TL;DR
- Ollama runs open LLMs locally: Llama 3.3, Mistral, Gemma, DeepSeek, Qwen, Phi, and vision models
- `ollama run llama3.3`: pull and start a model in one command
- REST API on `localhost:11434`: chat completions, generate, embeddings
- Python and JavaScript libraries for integration
- Modelfile for customizing system prompts, temperature, and context window
Ollama made local LLMs accessible to anyone who could run a Docker container. No Hugging Face model downloads to manage manually. No inference server configuration. `ollama run llama3.3` downloads the model and opens a chat session (Llama 3.3 ships only as a 70B model, so expect a large download). It has become one of the most common ways to run open models locally, with thousands of community models available alongside the official library.
Quick Start
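```bash
# Install on Linux (macOS and Windows: download the app from ollama.com/download)
curl -fsSL https://ollama.com/install.sh | sh

# Or run the server in Docker, exposing the API on port 11434
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Pull and run a model (llama3.3 is 70B-only; llama3.1:8b fits a typical 8GB GPU)
ollama run llama3.1:8b

# Run a single prompt
ollama run llama3.1:8b "Explain Docker containers in one sentence."

# REST API
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1:8b",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}'
```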
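You can call the same `/api/chat` endpoint without any client library. The sketch below is minimal and uses only the Python standard library. It assumes the server is listening at the default `localhost:11434` address, and `build_chat_request`, `iter_chat_tokens`, and `chat` are hypothetical helper names. With `"stream": true` the server returns one JSON object per line, and the last object carries `"done": true`.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# Default local endpoint (assumption: server running with default settings)
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(model: str, prompt: str, stream: bool = True) -> urllib.request.Request:
    """Prepare a POST to /api/chat carrying a single user message."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_chat_tokens(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield content pieces from a streamed response (one JSON object per line)."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        yield chunk.get("message", {}).get("content", "")
        if chunk.get("done"):
            break  # final object marks the end of the reply


def chat(model: str, prompt: str) -> str:
    """Send one prompt and return the full reply (needs a running Ollama server)."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        # HTTPResponse iterates line by line, matching the streamed format
        return "".join(iter_chat_tokens(resp))
```

Usage, once a model is pulled: `print(chat("llama3.1:8b", "Explain Docker containers in one sentence."))`. The parsing is kept separate from the network call, so you can check it against captured output without a server.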
Popular Models (May 2026)
| Model | Size | Memory |
|---|---|---|
| llama3.3:70b | 70B | ~128GB VRAM |
| llama3.1:8b | 8B | ~8GB VRAM |
| deepseek-r1:8b | 8B | ~8GB VRAM |
| qwen3:8b | 8B | ~8GB VRAM |
| gemma3:4b | 4B | ~4GB VRAM |
| phi4-mini:3.8b | 3.8B | ~4GB VRAM |
| llava:7b | 7B | ~12GB VRAM |
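You can customize any of these models with a Modelfile that sets a system prompt, temperature, and context window. The sketch below only writes the file text. `render_modelfile` is a hypothetical helper, and the base model and parameter values are illustrative. The directives it emits (`FROM`, `PARAMETER temperature`, `PARAMETER num_ctx`, `SYSTEM`) are standard Modelfile instructions.

```python
def render_modelfile(base: str, system: str, temperature: float, num_ctx: int) -> str:
    """Build Modelfile text: base model, sampling parameters, system prompt."""
    if '"""' in system:
        # SYSTEM is wrapped in triple quotes, so the prompt cannot contain them
        raise ValueError("system prompt must not contain triple quotes")
    lines = [
        f"FROM {base}",                          # model to build on
        f"PARAMETER temperature {temperature}",  # sampling randomness
        f"PARAMETER num_ctx {num_ctx}",          # context window in tokens
        f'SYSTEM """{system}"""',                # default system prompt
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_modelfile("llama3.1:8b", "You are a concise code reviewer.", 0.2, 8192), end="")
```

Redirect the output to a file named `Modelfile`, then build and run the custom model with `ollama create reviewer -f Modelfile` and `ollama run reviewer`. Here `reviewer` is an arbitrary name. If you need a one-off change instead of a saved model, the Python library's `chat` call also accepts per-request settings such as `options={'temperature': 0.2, 'num_ctx': 8192}`.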
Python Integration
```bash
pip install ollama
```
```python
import ollama

# Non-streaming: the full reply arrives in a single response
response = ollama.chat(
    model='llama3.1:8b',
    messages=[
        {'role': 'system', 'content': 'You are a code reviewer.'},
        {'role': 'user', 'content': 'Review: def add(a, b): return a + b'},
    ],
)
print(response['message']['content'])

# Streaming: stream=True returns an iterator of partial responses
for chunk in ollama.chat(
    model='llama3.1:8b',
    messages=[{'role': 'user', 'content': 'Hello'}],
    stream=True,
):
    print(chunk['message']['content'], end='', flush=True)
```
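The embeddings endpoint from the TL;DR is usually paired with a similarity measure. The sketch below embeds a batch of strings using the library's `ollama.embed` call and compares the vectors with cosine similarity. It assumes you have pulled an embedding model, and `nomic-embed-text` is only an example choice. `embed_texts` and `cosine_similarity` are hypothetical helper names.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 same direction, -1.0 opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero vector has no direction")
    return dot / norm


def embed_texts(texts: list[str], model: str = "nomic-embed-text") -> list[list[float]]:
    """Embed a batch of strings via a local Ollama server (model must be pulled)."""
    import ollama  # imported here so cosine_similarity works without the package

    response = ollama.embed(model=model, input=texts)
    return [list(vec) for vec in response["embeddings"]]
```

Usage: run `ollama pull nomic-embed-text`, then compare two sentences with `cosine_similarity(*embed_texts(["Docker packages apps", "Containers isolate processes"]))`. Higher scores mean the texts are closer in meaning.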
Originally published at meshworld.in