Ollama lets you run large language models locally with a single command. Llama 3, Mistral, Gemma, Phi — all running on your machine with a REST API.
What Is Ollama?
Ollama is a tool for running open-source LLMs locally. It handles model downloading, quantization, and serving, exposing both its own REST API and an OpenAI-compatible endpoint.
Supports:
- Llama 3.2, Llama 3.1, Llama 3
- Mistral, Mixtral
- Gemma 2, Phi-3
- Code Llama, DeepSeek Coder
- Custom models (GGUF)
Quick Start
# Install (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh
# Run a model
ollama run llama3.2
# Chat interface starts immediately
REST API
# Chat completion
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Explain Docker in 3 sentences"}],
  "stream": false
}'

# Generate (single-turn completion, no chat history)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a Python function to sort a list",
  "stream": false
}'

# Embeddings (recent versions also offer /api/embed, which takes "input")
curl http://localhost:11434/api/embeddings -d '{
  "model": "llama3.2",
  "prompt": "The quick brown fox"
}'

# List installed models
curl http://localhost:11434/api/tags
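The JSON from /api/tags is easy to consume from Python. A minimal stdlib-only sketch (the helper names `extract_model_names` and `list_local_models` are mine, not part of Ollama):

```python
import json
import urllib.request


def extract_model_names(tags_response: dict) -> list[str]:
    """Pull model names out of an /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]


def list_local_models(base_url: str = "http://localhost:11434") -> list[str]:
    # Requires a running Ollama server on the default port.
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return extract_model_names(json.load(resp))


# With a server running: list_local_models() returns names like "llama3.2:latest"
```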
Python Example
import requests
response = requests.post("http://localhost:11434/api/chat", json={
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "What is RAG?"}],
    "stream": False,
})
print(response.json()["message"]["content"])
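With "stream": true (the default), /api/chat returns one JSON object per line rather than a single body. A minimal sketch of consuming that stream with requests, assuming a server on the default port (the `chunk_text` and `stream_chat` helpers are illustrative names):

```python
import json

import requests


def chunk_text(line: bytes) -> str:
    """Extract the content fragment from one streamed /api/chat JSON line."""
    chunk = json.loads(line)
    return chunk.get("message", {}).get("content", "")


def stream_chat(prompt: str, model: str = "llama3.2") -> None:
    # Each non-empty response line is a standalone JSON object; the last has "done": true.
    with requests.post(
        "http://localhost:11434/api/chat",
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        stream=True,
    ) as resp:
        for line in resp.iter_lines():
            if line:
                print(chunk_text(line), end="", flush=True)
    print()
```

Streaming keeps the first token's latency low, which matters for interactive use.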
Use with OpenAI SDK
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
Use Cases
- Local AI development — no API costs
- Privacy — data never leaves your machine
- RAG prototyping — embeddings + chat
- Code generation — Code Llama/DeepSeek
- Offline AI — works without internet
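The RAG-prototyping use case combines the embeddings and chat endpoints. A minimal sketch using plain cosine similarity for retrieval, assuming a running server on the default port (the `embed`, `cosine`, and `answer` helpers are my own names):

```python
import json
import math
import urllib.request

OLLAMA = "http://localhost:11434"


def _post(path: str, payload: dict) -> dict:
    req = urllib.request.Request(
        OLLAMA + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def embed(text: str, model: str = "llama3.2") -> list[float]:
    return _post("/api/embeddings", {"model": model, "prompt": text})["embedding"]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def answer(question: str, docs: list[str]) -> str:
    # Retrieve the most similar document, then ground the chat prompt in it.
    q = embed(question)
    best = max(docs, key=lambda d: cosine(q, embed(d)))
    reply = _post("/api/chat", {
        "model": "llama3.2",
        "messages": [{"role": "user",
                      "content": f"Context: {best}\n\nQuestion: {question}"}],
        "stream": False,
    })
    return reply["message"]["content"]
```

For anything beyond a toy corpus you would cache the document embeddings and use a vector store, but this is enough to try the loop end to end.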
Need web data at scale? Check out my scraping tools on Apify or email spinov001@gmail.com for custom solutions.