Ollama is a free tool that lets you run large language models locally on your machine. Run Llama 3, Mistral, Gemma, and more — no API keys, no cloud, no costs.
## What Is Ollama?
Ollama makes it ridiculously easy to run AI models on your own hardware. One command to install, one command to run any model.
Key features:
- Run LLMs locally (no internet needed)
- Supports 100+ models
- OpenAI-compatible API
- GPU acceleration (NVIDIA, AMD, Apple Silicon)
- Model customization (Modelfile)
- Multi-model serving
- Lightweight and fast
- Works on macOS, Linux, Windows
## Quick Start

### Install

```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Or download from ollama.com
```
### Run a Model

```bash
# Run Llama 3.2 (3B parameters)
ollama run llama3.2

# Run Mistral
ollama run mistral

# Run Code Llama
ollama run codellama

# Run Gemma 2
ollama run gemma2
```
The first run downloads the model weights; subsequent runs start almost instantly.
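Once the Ollama server is running, the same models are reachable over HTTP. A minimal stdlib-only Python sketch against the native `/api/generate` endpoint (assumes the default port 11434 and a `llama3.2` model already pulled):

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model: str, prompt: str) -> dict:
    """JSON body for a one-shot (non-streaming) generation request."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST a prompt to a locally running Ollama server and return the reply."""
    body = json.dumps(build_payload(model, prompt)).encode()
    req = request.Request(OLLAMA_URL, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (needs a running server): generate("llama3.2", "Explain Docker briefly")
```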
## Available Models
| Model | Size | Use Case |
|---|---|---|
| llama3.2:1b | 1.3GB | Fast, lightweight tasks |
| llama3.2:3b | 2GB | General purpose |
| llama3.1:8b | 4.7GB | High quality |
| mistral | 4.1GB | Balanced performance |
| codellama | 3.8GB | Code generation |
| gemma2:9b | 5.4GB | Google's model |
| phi3 | 2.3GB | Microsoft's small model |
| deepseek-coder | 776MB | Coding assistant |
## OpenAI-Compatible API
Ollama exposes an API compatible with OpenAI's format:
```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Explain Docker in 3 sentences"}]
  }'
```
### Python

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Write a Python function to sort a list"}]
)
print(response.choices[0].message.content)
```
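The chat endpoint can also stream: set `"stream": true` and the server sends OpenAI-style server-sent events, one JSON delta per `data:` line. A stdlib-only sketch (the parsing assumes OpenAI's streaming chunk format; the network part needs a running Ollama server):

```python
import json
from urllib import request

def parse_sse_line(line: bytes) -> str:
    """Pull the content delta out of one 'data: {...}' SSE line."""
    line = line.strip()
    if not line.startswith(b"data: ") or line == b"data: [DONE]":
        return ""
    chunk = json.loads(line[len(b"data: "):])
    delta = chunk["choices"][0].get("delta", {})
    return delta.get("content") or ""

def stream_chat(model: str, prompt: str,
                url: str = "http://localhost:11434/v1/chat/completions") -> str:
    """Stream a chat completion from a local Ollama server, printing as it goes."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode()
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    parts = []
    with request.urlopen(req) as resp:
        for line in resp:
            piece = parse_sse_line(line)
            print(piece, end="", flush=True)
            parts.append(piece)
    return "".join(parts)
```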
### JavaScript

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:11434/v1",
  apiKey: "ollama"
});

const response = await client.chat.completions.create({
  model: "llama3.2",
  messages: [{ role: "user", content: "Explain recursion" }]
});
console.log(response.choices[0].message.content);
```
## Custom Models (Modelfile)

```
# Modelfile
FROM llama3.2
SYSTEM "You are a senior software engineer. Be concise and provide code examples."
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
```

```bash
ollama create code-assistant -f Modelfile
ollama run code-assistant
```
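If you only need different settings for a single call, the same knobs can be passed per request via the `options` field of Ollama's native `/api/chat` endpoint instead of baking them into a Modelfile. A sketch (the defaults here mirror the SYSTEM and PARAMETER lines above):

```python
def chat_payload(model: str, user_msg: str,
                 system: str = "You are a senior software engineer. Be concise.",
                 temperature: float = 0.3, num_ctx: int = 4096) -> dict:
    """Request body for Ollama's /api/chat with per-request options,
    mirroring the SYSTEM and PARAMETER lines of the Modelfile."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_msg},
        ],
        "options": {"temperature": temperature, "num_ctx": num_ctx},
        "stream": False,
    }

# POST this (JSON-encoded) to http://localhost:11434/api/chat
```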
## Cost Comparison
| Service | Cost per 1M tokens | Privacy |
|---|---|---|
| OpenAI GPT-4o | $2.50-$10 | Cloud |
| Anthropic Claude | $3-$15 | Cloud |
| Google Gemini | $0.075-$5 | Cloud |
| Ollama (local) | $0 | 100% local |
Run unlimited queries. Zero cost. Complete privacy.
## Hardware Requirements
| Model Size | RAM Needed | GPU Needed |
|---|---|---|
| 1-3B | 4GB | Optional |
| 7-8B | 8GB | Recommended |
| 13B | 16GB | Recommended |
| 70B | 48GB+ | Required |
Apple Silicon Macs with unified memory work especially well.
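A rough way to sanity-check the table: a quantized model needs about `parameters × bits-per-weight / 8` bytes for its weights, plus headroom for the KV cache and runtime. A back-of-the-envelope sketch (the 4-bit default and 1.5× overhead factor are assumptions, not official Ollama figures):

```python
def estimate_ram_gb(params_billion: float, bits_per_weight: int = 4,
                    overhead: float = 1.5) -> float:
    """Crude RAM estimate for a quantized model: weights take roughly
    params * bits/8 bytes; `overhead` covers KV cache and runtime buffers."""
    weight_gb = params_billion * bits_per_weight / 8
    return round(weight_gb * overhead, 1)

# estimate_ram_gb(8) gives 6.0 GB, broadly in line with the 8 GB row above
```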
## Who Uses Ollama?

With over 120K GitHub stars, Ollama is used by:
- Developers testing AI integrations locally
- Companies needing data privacy
- Researchers experimenting with models
- Anyone wanting free AI without API keys
## Get Started

1. Install with one command
2. Run `ollama run llama3.2`
3. Use the OpenAI-compatible API in your apps
Free AI on your machine. No API keys. No limits.
Combining AI with web data? Check out my web scraping tools on Apify — feed scraped data into your local AI models. Custom solutions: spinov001@gmail.com