Alex Spinov
Ollama Has a Free API: Run LLMs Locally With One Command

Cloud AI APIs cost money and see your data. Ollama runs LLMs on your laptop — free, private, and offline.

What Is Ollama?

Ollama runs open-source LLMs locally with a single command. Llama 3, Mistral, Gemma, Phi, CodeLlama — pull any of them and start chatting with one command.

# Install
curl -fsSL https://ollama.com/install.sh | sh

# Run a model
ollama run llama3.1
# Chat starts immediately

The REST API

Ollama serves a REST API on localhost:11434 — its own endpoints under /api, plus an OpenAI-compatible one under /v1:

# Chat completion
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Why is the sky blue?"}]
}'

# Generate (simple completion)
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Explain Docker in one sentence"
}'

# List local models
curl http://localhost:11434/api/tags

# OpenAI-compatible endpoint
curl http://localhost:11434/v1/chat/completions -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Hello!"}]
}'
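Note that /api/chat streams by default: the reply arrives as newline-delimited JSON, one chunk per line, finishing with a `"done": true` object (pass `"stream": false` in the request body to get a single response instead). A minimal sketch of collecting a streamed reply — the sample chunks here are illustrative, not real server output:

```python
import json

def collect_stream(ndjson_lines):
    """Concatenate the content pieces from an Ollama /api/chat
    streaming response (one JSON object per line)."""
    parts = []
    for line in ndjson_lines:
        chunk = json.loads(line)
        if chunk.get("done"):
            break
        parts.append(chunk["message"]["content"])
    return "".join(parts)

# Illustrative chunks in the shape /api/chat streams back
sample = [
    '{"message": {"role": "assistant", "content": "Ray"}, "done": false}',
    '{"message": {"role": "assistant", "content": "leigh scattering."}, "done": false}',
    '{"done": true}',
]
print(collect_stream(sample))  # Rayleigh scattering.
```

In a real client you would iterate over the HTTP response line by line and feed each line to the same loop.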

Use With Any OpenAI SDK

from openai import OpenAI

# Point to Ollama instead of OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "Write a Python function to sort a list"}]
)
print(response.choices[0].message.content)

Your existing OpenAI code works against Ollama — change the base_url and pass any non-empty string as the API key (the SDK requires one, but Ollama ignores it).

Available Models

ollama pull llama3.1        # Meta's flagship open model
ollama pull mistral         # fast general-purpose model
ollama pull codellama       # tuned for code
ollama pull gemma2          # Google's open model
ollama pull phi3            # Microsoft's small model
ollama pull deepseek-coder  # strong open coding model
ollama pull llava           # vision + text (multimodal)
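Once models are pulled, /api/tags lists them; the response body has a `models` array where each entry's `name` includes the tag (e.g. `llama3.1:latest`). A small sketch of extracting the names — the sample payload below is illustrative, not real server output:

```python
import json

def model_names(tags_response):
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", [])]

# Illustrative response in the documented shape
body = json.loads(
    '{"models": [{"name": "llama3.1:latest"}, {"name": "mistral:latest"}]}'
)
print(model_names(body))  # ['llama3.1:latest', 'mistral:latest']
```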

Why Ollama

  • Free — no API costs, ever
  • Private — your data never leaves your machine
  • Offline — works without internet after model download
  • Fast — optimized for Apple Silicon and NVIDIA GPUs
  • OpenAI-compatible — drop-in replacement for development
  • Custom models — create Modelfiles with system prompts and parameters
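The last bullet deserves a quick sketch. A Modelfile bakes a base model, generation parameters, and a system prompt into a named local model — the name `concise-coder` and the parameter values here are just examples:

```
# Modelfile
FROM llama3.1
PARAMETER temperature 0.3
SYSTEM "You are a concise coding assistant. Answer with code first."
```

Build and run it with:

```
ollama create concise-coder -f Modelfile
ollama run concise-coder
```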

Building AI applications? Check out my AI tools or email spinov001@gmail.com.
