## TL;DR

Ollama lets you run large language models locally on your machine. One command downloads and runs Llama 3, Mistral, Gemma, Phi, and 100+ other models — with an OpenAI-compatible API.
## What Is Ollama?

Ollama makes local AI simple:

- One command — `ollama run llama3` and you're chatting
- 100+ models — Llama 3, Mistral, Gemma, Phi, CodeLlama, etc.
- OpenAI-compatible API — drop-in replacement at `localhost:11434`
- GPU acceleration — NVIDIA, AMD, Apple Silicon
- Model customization — Modelfiles for custom system prompts
- Free — MIT license, runs on your hardware
## Quick Start

```bash
# Install
curl -fsSL https://ollama.com/install.sh | sh
# Or: brew install ollama

# Run a model (auto-downloads)
ollama run llama3.1

# Run smaller models for faster responses
ollama run phi3       # 3.8B — fast, good for coding
ollama run mistral    # 7B — great general purpose
ollama run gemma2     # 9B — Google's model
ollama run codellama  # for code generation
```
## REST API

```bash
# Chat completion
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Why is the sky blue?"}],
  "stream": false
}'

# Generate (simple completion)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Write a haiku about programming",
  "stream": false
}'

# List local models
curl http://localhost:11434/api/tags
```
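The same endpoints work from any HTTP client. Here is a minimal Python sketch using only the standard library; it assumes Ollama is running on its default port, and `build_chat_payload` and `chat` are illustrative helper names, not part of any SDK:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address

def build_chat_payload(model: str, prompt: str, stream: bool = False) -> dict:
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def chat(model: str, prompt: str) -> str:
    """POST to /api/chat and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/chat",
        data=json.dumps(build_chat_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Requires a running Ollama server:
# print(chat("llama3.1", "Why is the sky blue?"))
```

With `"stream": false` the server returns one JSON object whose reply lives under `message.content`, matching the curl examples above.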
## OpenAI-Compatible API

```javascript
import OpenAI from "openai";

// Point to Ollama instead of OpenAI
const client = new OpenAI({
  baseURL: "http://localhost:11434/v1",
  apiKey: "ollama", // required but unused
});

const response = await client.chat.completions.create({
  model: "llama3.1",
  messages: [
    { role: "system", content: "You are a helpful coding assistant." },
    { role: "user", content: "Write a Python function to sort a list" },
  ],
});

console.log(response.choices[0].message.content);
```
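From Python, you can hit the same `/v1` endpoint without installing any SDK. A standard-library sketch (the helper names are mine; the request and response shapes follow OpenAI's chat-completions format, with the reply under `choices[0].message.content`):

```python
import json
import urllib.request

def build_openai_payload(model: str, system: str, user: str) -> dict:
    """OpenAI-shaped request body for Ollama's /v1/chat/completions."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }

def chat_openai_style(model: str, system: str, user: str) -> str:
    """POST to the /v1 endpoint and return the first choice's content."""
    req = urllib.request.Request(
        "http://localhost:11434/v1/chat/completions",
        data=json.dumps(build_openai_payload(model, system, user)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer ollama",  # required but unused
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Requires a running Ollama server:
# print(chat_openai_style("llama3.1",
#                         "You are a helpful coding assistant.",
#                         "Write a Python function to sort a list"))
```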
## Python

```python
import ollama

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Explain recursion simply"}],
)
print(response["message"]["content"])

# Streaming
for chunk in ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Write a poem"}],
    stream=True,
):
    print(chunk["message"]["content"], end="")
```
## Custom Models (Modelfile)

```
# Modelfile
FROM llama3.1

SYSTEM You are a senior Python developer. You write clean, efficient code with type hints. Always include docstrings and tests.

PARAMETER temperature 0.3
PARAMETER top_p 0.9
```

```bash
ollama create python-expert -f Modelfile
ollama run python-expert
```
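If you maintain several custom models, the Modelfile is simple enough to template from code. A small sketch; `make_modelfile` is a hypothetical helper, and Ollama only cares about the file contents:

```python
def make_modelfile(base: str, system_prompt: str, **params: float) -> str:
    """Render a minimal Modelfile: FROM, SYSTEM, then PARAMETER lines."""
    lines = [f"FROM {base}", f"SYSTEM {system_prompt}"]
    lines += [f"PARAMETER {name} {value}" for name, value in params.items()]
    return "\n".join(lines) + "\n"

modelfile = make_modelfile(
    "llama3.1",
    "You are a senior Python developer.",
    temperature=0.3,
    top_p=0.9,
)
print(modelfile)
# Write it out, then: ollama create python-expert -f Modelfile
# with open("Modelfile", "w") as f:
#     f.write(modelfile)
```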
## Model Recommendations

| Model | Size | RAM Needed | Best For |
|---|---|---|---|
| phi3 | 3.8B | 4 GB | Fast responses, coding |
| mistral | 7B | 8 GB | General purpose |
| llama3.1 | 8B | 8 GB | Best open model |
| gemma2 | 9B | 8 GB | Instruction following |
| codellama | 13B | 16 GB | Code generation |
| llama3.1:70b | 70B | 48 GB | Near GPT-4 quality |
## Ollama vs Alternatives
| Feature | Ollama | LM Studio | GPT4All | llama.cpp |
|---|---|---|---|---|
| Setup | 1 command | GUI install | GUI install | Compile |
| API | REST + OpenAI compat | OpenAI compat | API | None |
| Models | 100+ (auto-download) | HuggingFace | Curated | Manual GGUF |
| GPU support | NVIDIA/AMD/Apple | NVIDIA/Apple | NVIDIA/Apple | All |
| Docker | Official image | No | No | Community |
| CLI | Excellent | No | No | Yes |
## Resources

- Ollama Website
- GitHub Repository — 110K+ stars
- Model Library
- API Documentation
Running AI locally on scraped data? My Apify tools extract web data — process it locally with Ollama for private, cost-free AI analysis. Questions? Email spinov001@gmail.com.