You can make a self-hosted AI server with LM Studio 0.4.0

LM Studio is one of the best tools for running generative AI models locally on your computer, giving you something comparable to ChatGPT, Microsoft Copilot or Google Gemini without using cloud services. Now, LM Studio 0.4.0 has arrived with a revamped interface and new options for self-hosted servers.
If you’ve never used it before, LM Studio is a desktop application for Windows, Mac, and Linux that can run a variety of large language models (LLMs). Simply download the desired model from the app, such as GPT-OSS, Meta Llama, or Google Gemma, and LM Studio will run it using your computer’s GPU or NPU. The default interface is a chat window like ChatGPT, but it can also start a server that works like OpenAI’s API services.
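Because that server speaks the same protocol as OpenAI’s API, any OpenAI-compatible client can talk to it. As a minimal sketch (assuming the default port of 1234 and a placeholder model name — use whatever model you have downloaded), here is what a chat-completions request to the local server looks like:

```python
import json
import urllib.request

# Default local server address; LM Studio lets you change the port,
# so adjust this if your setup differs.
BASE_URL = "http://localhost:1234/v1"

def build_chat_request(model, user_message):
    """Build the URL and JSON payload for an OpenAI-style
    chat-completions call against the local server."""
    payload = {
        "model": model,  # placeholder; use a model you have downloaded
        "messages": [{"role": "user", "content": user_message}],
    }
    return f"{BASE_URL}/chat/completions", payload

url, payload = build_chat_request("gpt-oss-20b", "Hello!")

# To actually send it (requires the LM Studio server to be running):
# req = urllib.request.Request(url, data=json.dumps(payload).encode(),
#                              headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```

The point of the OpenAI-compatible format is that existing tooling works unchanged: you only swap the base URL to point at your own machine.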
LM Studio 0.4.0 separates the core application code from the GUI, allowing you to easily set up LM Studio on a self-hosted server, or simply use it from the terminal on your desktop. The core functionality now lives in the “llmster” tool, and the desktop application is a graphical layer on top of it.
The announcement blog post explained: “We’ve reorganized our software to separate the GUI from core functionality, allowing llmster to run as a standalone daemon. This means that llmster can be run completely application-agnostic and deployed anywhere: Linux machines, cloud servers, your GPU platform, or even Google Colabs. It can of course still be run on your local machine without the GUI, for those who prefer terminal-based workflows.”
LM Studio is a great choice on desktops because it’s easy to use, while still giving you the flexibility to dive into settings and optimizations as needed. There are already several tools for self-hosting generative AI models on servers (like Ollama), but if the new llmster tool is anything like LM Studio on desktop, it might be the best option for most people looking to run AI models.
There are, however, still some interesting changes for the LM Studio desktop application. The interface has been updated with “a more consistent and enjoyable experience, and it should still be familiar to anyone who has used ChatGPT or Google Gemini.” It can also now export your chats to PDF, Markdown, or plain text files, and there’s a split view mode for using multiple chats at the same time.
LM Studio uses the open source llama.cpp engine for LLM inference, and LM Studio 0.4.0 upgrades it to version 2.0 of llama.cpp. This unlocks concurrent inference, so a single loaded model can serve multiple requests in parallel.
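On the client side, that concurrency means you can fire several API calls at once instead of waiting for each reply in turn. A sketch of the pattern, with the network call stubbed out for illustration (in real use, `ask` would POST each prompt to the local server’s chat-completions endpoint):

```python
from concurrent.futures import ThreadPoolExecutor

def ask(prompt):
    # Stub for illustration. In real use this would POST the prompt to
    # the local server and return the model's reply; with concurrent
    # inference, these calls can be served in parallel by the same
    # loaded model instead of queueing behind one another.
    return f"echo: {prompt}"

prompts = ["Summarize X", "Translate Y", "Explain Z"]

# Issue the requests in parallel rather than one after another.
with ThreadPoolExecutor(max_workers=3) as pool:
    replies = list(pool.map(ask, prompts))
```

Before this change, a second request to the same model would simply wait for the first to finish, so this pattern only paid off if you loaded multiple copies of a model.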
There are some other improvements in this version. If you’re using the built-in API server, there’s a new REST endpoint that lets you store conversation response IDs for multi-step workflows. You can also start an interactive chat from your terminal with the “lms chat” command. Keep in mind that this is not the same as the new “llmster” backend; it runs through the regular LM Studio application.
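The announcement doesn’t spell out the endpoint’s shape, but the description matches the convention in OpenAI’s Responses API, where each reply carries an ID and a follow-up request references it. As a sketch under that assumption (the field names here mirror OpenAI’s convention and the response ID is hypothetical; check LM Studio’s API docs for the exact format):

```python
def build_followup_request(model, previous_response_id, user_message):
    """Build a payload that chains onto an earlier response by ID.
    Field names follow the OpenAI Responses API convention, which is
    an assumption here, not LM Studio's documented schema."""
    return {
        "model": model,
        "previous_response_id": previous_response_id,
        "input": user_message,
    }

# Suppose a first request returned an ID like "resp_123" (hypothetical):
payload = build_followup_request("gpt-oss-20b", "resp_123",
                                 "Now expand on your second point.")
```

Chaining by ID saves you from resending the whole conversation history with every request, which is the main appeal for multi-step workflows.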
You can download LM Studio from the official website for Windows, macOS and Linux. Instructions for installing the llmster backend can be found in the announcement blog post, linked below.
Source: LM Studio



