Ollama lets you run large language models (Llama 3, Mistral, Gemma, Phi) locally with a single command. No cloud, no API keys, no usage costs. Just ollama run llama3.2 and you have a local AI.
Quick Start
# Install (macOS)
brew install ollama
# Install (Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Run a model
ollama run llama3.2
The REST API
Ollama exposes a local API at http://localhost:11434:
# Generate completion
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Explain quantum computing in 3 sentences",
"stream": false
}'
# Chat
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is Docker?"}
],
"stream": false
}'
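With "stream": true (the default when the field is omitted), both endpoints return newline-delimited JSON, one chunk per piece of the reply. A minimal Python sketch of reassembling such a stream — the sample lines below are illustrative, not captured server output:

```python
import json

def collect_stream(ndjson_lines):
    """Join the content of streamed /api/chat chunks into one reply."""
    parts = []
    for line in ndjson_lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        if chunk.get("done"):
            break
        parts.append(chunk["message"]["content"])
    return "".join(parts)

# Illustrative chunks in the shape the chat endpoint streams:
sample = [
    '{"message": {"role": "assistant", "content": "Hello"}, "done": false}',
    '{"message": {"role": "assistant", "content": ", world"}, "done": false}',
    '{"done": true}',
]
print(collect_stream(sample))  # Hello, world
```

In a real client you would iterate over the HTTP response line by line instead of a list.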
# List models
curl http://localhost:11434/api/tags
# Model info
curl http://localhost:11434/api/show -d '{"name": "llama3.2"}'
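The /api/tags response is a JSON object with a models array, each entry carrying the tag name and size in bytes. A small Python sketch of summarizing it — the sample payload is illustrative, not real output:

```python
import json

def summarize_models(tags_json):
    """Return (name, size-in-GB) pairs from an /api/tags response body."""
    return [(m["name"], round(m["size"] / 1e9, 1))
            for m in json.loads(tags_json)["models"]]

# Illustrative payload in the shape /api/tags returns:
sample = '{"models": [{"name": "llama3.2:latest", "size": 2019393189}]}'
print(summarize_models(sample))  # [('llama3.2:latest', 2.0)]
```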
Popular Models
# Meta Llama 3.2 (3B — runs on laptop)
ollama pull llama3.2
# Mistral (7B — great for coding)
ollama pull mistral
# Google Gemma 2 (9B)
ollama pull gemma2
# Microsoft Phi-3 (3.8B — small and fast)
ollama pull phi3
# Code Llama (code generation)
ollama pull codellama
# DeepSeek Coder V2
ollama pull deepseek-coder-v2
Using from JavaScript
const response = await fetch('http://localhost:11434/api/chat', {
method: 'POST',
body: JSON.stringify({
model: 'llama3.2',
messages: [{ role: 'user', content: 'Write a haiku about coding' }],
stream: false,
}),
});
const data = await response.json();
console.log(data.message.content);
Using from Python
import requests
response = requests.post('http://localhost:11434/api/chat', json={
'model': 'llama3.2',
'messages': [{'role': 'user', 'content': 'Explain REST APIs'}],
'stream': False,
})
print(response.json()['message']['content'])
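The chat endpoint is stateless: for a multi-turn conversation you resend the full messages list on every call and append the assistant's reply yourself. A sketch of that bookkeeping (the actual network call is the requests.post shown above):

```python
def build_turn(history, user_text):
    """Return the messages payload for the next /api/chat call."""
    return history + [{"role": "user", "content": user_text}]

def record_reply(history, user_text, assistant_text):
    """Append both sides of a completed turn to the history."""
    return history + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]

history = [{"role": "system", "content": "You are terse."}]
payload = build_turn(history, "Hi")              # send this as "messages"
history = record_reply(history, "Hi", "Hello.")  # then keep the reply
```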
Custom Modelfiles
# Modelfile
FROM llama3.2
SYSTEM You are a senior software engineer. Give concise, code-focused answers.
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER num_ctx 4096
ollama create code-assistant -f Modelfile
ollama run code-assistant
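The same Modelfile can be generated from a script when you maintain several variants. A hedged sketch — build_modelfile is a helper of my own, and the commented-out subprocess line assumes the ollama CLI is on your PATH:

```python
import subprocess

def build_modelfile(base, system, **params):
    """Render a Modelfile string from a base model, system prompt, and parameters."""
    lines = [f"FROM {base}", f"SYSTEM {system}"]
    lines += [f"PARAMETER {k} {v}" for k, v in params.items()]
    return "\n".join(lines) + "\n"

text = build_modelfile(
    "llama3.2",
    "You are a senior software engineer. Give concise, code-focused answers.",
    temperature=0.3, top_p=0.9, num_ctx=4096,
)
with open("Modelfile", "w") as f:
    f.write(text)
# subprocess.run(["ollama", "create", "code-assistant", "-f", "Modelfile"], check=True)
```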
Embeddings
# Note: llama3.2 works here, but a dedicated embedding model
# (ollama pull nomic-embed-text) produces better vectors
curl http://localhost:11434/api/embeddings -d '{
"model": "llama3.2",
"prompt": "The quick brown fox"
}'
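Embeddings are only useful compared to one another, and cosine similarity is the usual metric. A self-contained Python sketch — the 3-d vectors are made up stand-ins for real embedding output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for real embedding output:
v_fox = [0.1, 0.8, 0.2]
v_dog = [0.2, 0.7, 0.3]
v_car = [0.9, 0.1, 0.0]
print(cosine(v_fox, v_dog) > cosine(v_fox, v_car))  # True
```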
Building AI-powered data tools? Check out my Apify actors for web scraping that feeds your AI models, or email spinov001@gmail.com for custom solutions.
Which local LLM do you run? Llama, Mistral, or something else? Share below!