LM Studio lets you run large language models locally on your computer with a beautiful GUI and an OpenAI-compatible API server. No cloud costs, no data leaving your machine, full privacy.
## What Is LM Studio?
LM Studio is a desktop application for discovering, downloading, and running local LLMs. It supports GGUF models from Hugging Face and provides an OpenAI-compatible API endpoint, making it a drop-in replacement for OpenAI in your applications.
Key Features:
- OpenAI-compatible local API server
- GPU acceleration (CUDA, Metal, Vulkan)
- GGUF model format support
- Model discovery from Hugging Face
- Chat UI with conversation history
- Multi-model loading
- Quantization support (Q4, Q5, Q8)
## Getting Started

1. Download LM Studio from lmstudio.ai
2. Search for and download a model (e.g., Llama 3, Mistral, Phi-3)
3. Load the model and start the local server
4. Use the API at `http://localhost:1234`
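Before wiring up a client, it's worth confirming the server is actually reachable. Here's a minimal stdlib-only sketch — `server_is_up` is a made-up helper name, and it assumes the default port of 1234:

```python
import json
import urllib.error
import urllib.request

def server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an LM Studio server answers on /v1/models."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            data = json.load(resp)
            # LM Studio returns an OpenAI-style {"data": [...]} model listing
            return "data" in data
    except (OSError, json.JSONDecodeError):
        return False

if __name__ == "__main__":
    print(server_is_up("http://localhost:1234"))
```

If this prints `False`, make sure you've loaded a model and clicked "Start Server" in the Developer tab.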
## LM Studio API: OpenAI-Compatible Endpoint

```python
from openai import OpenAI

# Point the official OpenAI client at the local LM Studio server
client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio"  # any non-empty string works; LM Studio ignores it
)

# Chat completion (same call shape as OpenAI's hosted API)
response = client.chat.completions.create(
    model="llama-3.2-3b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to find prime numbers"}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)
```
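The response object mirrors OpenAI's schema: the reply text lives at `choices[0].message.content` and token counts under `usage`. A small helper to pull both out — `summarize_response` is hypothetical, not part of any SDK, and the stubbed response below just demonstrates the shape:

```python
from types import SimpleNamespace

def summarize_response(response) -> dict:
    """Extract the reply text and token usage from an OpenAI-style response."""
    return {
        "text": response.choices[0].message.content,
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
    }

# Works the same on a stub as on a real response object:
fake = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="def is_prime(n): ..."))],
    usage=SimpleNamespace(prompt_tokens=24, completion_tokens=12, total_tokens=36),
)
print(summarize_response(fake)["total_tokens"])  # 36
```

Watching `total_tokens` is handy locally too — it tells you how close you're getting to the model's context window.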
## Streaming Responses

```python
stream = client.chat.completions.create(
    model="llama-3.2-3b-instruct",
    messages=[{"role": "user", "content": "Explain Docker in simple terms"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
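If you also need the complete reply afterwards (for logging, caching, etc.), accumulate the deltas while printing. A sketch — `collect_stream` is a made-up helper, and it works on any iterable of OpenAI-style chunks, as the stub below shows:

```python
from types import SimpleNamespace

def collect_stream(stream) -> str:
    """Print streamed deltas as they arrive and return the full text."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta.content is typically None
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)

# Demo with stubbed chunks shaped like the SDK's:
_chunk = lambda t: SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=t))])
full = collect_stream([_chunk("Hel"), _chunk("lo"), _chunk(None)])
```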
## Embeddings API

```python
# Generate embeddings locally
embeddings = client.embeddings.create(
    model="nomic-embed-text-v1.5",
    input=["Kubernetes orchestrates containers", "Docker packages applications"]
)

for i, emb in enumerate(embeddings.data):
    print(f"Text {i}: {len(emb.embedding)} dimensions")
```
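Local embeddings are most useful for similarity search, and cosine similarity is the standard way to compare two vectors. A minimal pure-Python sketch (no numpy assumed):

```python
import math

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With real data, compare the two texts embedded above:
# cosine_similarity(embeddings.data[0].embedding, embeddings.data[1].embedding)
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```

Values close to 1.0 mean semantically similar texts; near 0.0 means unrelated.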
## Using with LangChain

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate

llm = ChatOpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",
    model="llama-3.2-3b-instruct"
)

template = PromptTemplate(
    input_variables=["topic"],
    template="Write a technical blog outline about {topic}"
)

# LCEL pipe syntax replaces the deprecated LLMChain
chain = template | llm
result = chain.invoke({"topic": "web scraping best practices"})
print(result.content)
```
## REST API Direct Access

```bash
# Chat completion
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.2-3b-instruct", "messages": [{"role": "user", "content": "Hello!"}]}'

# List loaded models
curl http://localhost:1234/v1/models

# Embeddings
curl http://localhost:1234/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text-v1.5", "input": "Hello world"}'
```
## LM Studio vs Ollama
| Feature | LM Studio | Ollama |
|---|---|---|
| GUI | Full desktop app | CLI only |
| API | OpenAI-compatible | OpenAI-compatible |
| Model format | GGUF | GGUF (Modelfile) |
| Model discovery | In-app HF browser | ollama.com library |
| Multi-model | Yes | Yes |
| OS | Win/Mac/Linux | Win/Mac/Linux |
## Resources
Need to scrape web data to feed your local LLMs? Check out my web scraping tools on Apify — production-ready actors for Reddit, Google Maps, and more. Questions? Email me at spinov001@gmail.com