Ollama lets you run LLMs locally — Llama 3, Mistral, Gemma, Phi, CodeLlama — with a single command. OpenAI-compatible API, zero cloud costs, complete privacy.
## Why Ollama?

- One command: `ollama run llama3` and you're chatting
- OpenAI-compatible: Same API format
- Private: Data never leaves your machine
- Free: No API keys, no costs
- GPU + CPU: Works on both
- 100+ models: Llama 3, Mistral, Gemma, Phi, CodeLlama
## Install

```bash
curl -fsSL https://ollama.com/install.sh | sh
```
## Run a Model

```bash
# Download and chat
ollama run llama3.1

# Specific size
ollama run llama3.1:70b

# Code model
ollama run codellama:34b
```
## REST API: Chat

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Explain Docker in 3 sentences"}],
  "stream": false
}'
```
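The same request works from Python's standard library, with no extra dependencies. A minimal sketch — the `build_chat_payload` and `chat` helpers are illustrative names, and a local Ollama server is assumed to be running on the default port:

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint

def build_chat_payload(model, user_message, stream=False):
    """Build the JSON body the /api/chat endpoint expects."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,
    }

def chat(model, user_message):
    """Send one chat turn and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(model, user_message)).encode()
    req = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# With Ollama running locally, this would print the model's reply:
# print(chat("llama3.1", "Explain Docker in 3 sentences"))
```

Setting `"stream": false` returns a single JSON object; with streaming on, the endpoint sends one JSON object per line instead.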
## REST API: Generate

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Write a Python function to sort a list",
  "stream": false
}'
```
## OpenAI-Compatible Endpoint

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama', // required but unused
});

const response = await client.chat.completions.create({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(response.choices[0].message.content);
```
### Python

```python
import openai

client = openai.OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

response = client.chat.completions.create(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Explain recursion'}]
)
print(response.choices[0].message.content)
```
## Embeddings

```bash
curl http://localhost:11434/api/embed -d '{
  "model": "nomic-embed-text",
  "input": "The quick brown fox jumps over the lazy dog"
}'
```
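The response carries the vectors under an `embeddings` key (one list of floats per input). A common next step is comparing two embeddings by cosine similarity — a self-contained sketch, with `cosine_similarity` as an illustrative helper:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors.

    Returns 1.0 for identical directions, 0.0 for orthogonal vectors.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings:
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Pairing this with the embed endpoint gives you local semantic search with no external API.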
## Create Custom Model

```dockerfile
# Modelfile
FROM llama3.1
SYSTEM You are a helpful coding assistant. Always provide code examples.
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
```

```bash
ollama create code-helper -f Modelfile
ollama run code-helper
```
## List Models

```bash
curl http://localhost:11434/api/tags
```
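The response is JSON with a `models` array. A small sketch for pulling out just the names — the `model_names` helper is an illustrative name, not part of any library:

```python
import json

def model_names(tags_response_text):
    """Return the model names from a /api/tags JSON response body."""
    data = json.loads(tags_response_text)
    return [m["name"] for m in data.get("models", [])]

# Example with a body shaped like the API's output:
sample = '{"models": [{"name": "llama3.1:latest"}, {"name": "codellama:34b"}]}'
print(model_names(sample))  # ['llama3.1:latest', 'codellama:34b']
```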
## Real-World Use Case
A startup prototyped their AI feature with GPT-4 ($2,000/mo). Before launch, they switched to Ollama + Llama 3.1 70B on their own GPU server. Same quality for their use case, $0/mo API costs, and customer data stays private — crucial for their healthcare clients.
Need to automate data collection? Check out my Apify actors for ready-made scrapers, or email spinov001@gmail.com for custom solutions.