DEV Community

Alex Spinov

Ollama Has a Free API — Run LLMs Locally in One Command

Ollama lets you run large language models locally with a single command. Llama 3, Mistral, Gemma, and Phi all run on your own machine behind a local REST API.

What Is Ollama?

Ollama is a tool for running open-source LLMs locally. It downloads models (usually in quantized form), runs them, and serves them over a local REST API on port 11434, including an OpenAI-compatible endpoint under /v1.

Supports:

  • Llama 3.2, Llama 3.1, Llama 3
  • Mistral, Mixtral
  • Gemma 2, Phi-3
  • Code Llama, DeepSeek Coder
  • Custom models (GGUF)

Quick Start

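# Install on Linux (macOS and Windows use the installer from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Run a model
ollama run llama3.2
# The first run downloads the model, then an interactive chat starts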
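
To confirm from code that a model was actually pulled, one option is the /api/tags endpoint listed in the REST section of this post. The sketch below assumes the response is a JSON object with a "models" array whose entries carry a "name" field; the helper names `installed_models` and `fetch_tags` are made up for illustration and are not part of Ollama.

```python
from __future__ import annotations

import json
from urllib.request import urlopen


def installed_models(tags: dict) -> list[str]:
    """Extract model names from an /api/tags response body (assumed shape)."""
    return [entry["name"] for entry in tags.get("models", [])]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the tag list from a running Ollama server (needs the server up)."""
    with urlopen(f"{base_url}/api/tags", timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Prints whatever models are installed locally, one per line.
    for name in installed_models(fetch_tags()):
        print(name)
```

Keeping the parsing separate from the HTTP call makes it possible to check the logic without a server running.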

REST API (Native Endpoints)

# Chat completion
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "Explain Docker in 3 sentences"}],
  "stream": false
}'

# Generate
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a Python function to sort a list",
  "stream": false
}'

# Embeddings
curl http://localhost:11434/api/embeddings -d '{
  "model": "llama3.2",
  "prompt": "The quick brown fox"
}'

# List models
curl http://localhost:11434/api/tags
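
The curl calls above set "stream": false to get a single JSON object back. With streaming enabled, /api/chat instead returns newline-delimited JSON, where each line carries a fragment of the reply and the last line has "done": true. Here is a minimal sketch of reading that stream in Python; `collect_stream` and `stream_chat` are hypothetical helper names, and `requests` is assumed to be installed.

```python
from __future__ import annotations

import json
from typing import Iterable


def collect_stream(lines: Iterable[bytes | str]) -> str:
    """Join the content fragments from a streamed /api/chat response."""
    parts = []
    for raw in lines:
        if not raw:  # skip blank keep-alive lines
            continue
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):  # final line of the stream
            break
    return "".join(parts)


def stream_chat(prompt: str, model: str = "llama3.2") -> str:
    """Send a streaming chat request to a local Ollama server."""
    import requests  # imported here so collect_stream works without it

    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    with requests.post(
        "http://localhost:11434/api/chat", json=payload, stream=True, timeout=120
    ) as response:
        response.raise_for_status()
        return collect_stream(response.iter_lines())
```

In a real UI you would print each fragment as it arrives instead of joining them at the end.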

Python Example

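import requests

# Send one non-streaming chat request to the local Ollama server
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "What is RAG?"}],
        "stream": False,
    },
    timeout=120,
)
response.raise_for_status()  # surface HTTP errors such as an unknown model
print(response.json()["message"]["content"])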
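
The /api/chat endpoint is stateless: the server only sees the messages you send in each request, so a multi-turn chat means resending the history every time. The sketch below shows one way to do that. `Conversation` and `post_chat` are hypothetical names, not part of Ollama or `requests`, and the sender is injectable so the history logic can be exercised without a running server.

```python
from __future__ import annotations


def post_chat(messages: list[dict], model: str = "llama3.2") -> dict:
    """Send the full history to /api/chat and return the assistant message."""
    import requests  # imported lazily; only needed for real requests

    response = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": model, "messages": messages, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["message"]


class Conversation:
    """Keeps the message history between calls (illustrative helper)."""

    def __init__(self, send=post_chat, system: str | None = None):
        self.send = send
        self.messages: list[dict] = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def ask(self, prompt: str) -> str:
        # Append the user turn, send everything so far, then store the reply.
        self.messages.append({"role": "user", "content": prompt})
        reply = self.send(self.messages)
        self.messages.append(
            {"role": reply.get("role", "assistant"), "content": reply["content"]}
        )
        return reply["content"]
```

Usage would look like `chat = Conversation(system="Be brief.")` followed by repeated `chat.ask(...)` calls. Long conversations will eventually exceed the model's context window, so trimming old turns is left as an exercise.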

Use with OpenAI SDK

from openai import OpenAI

# The SDK requires an api_key, but Ollama ignores its value
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
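
The same client can also stream tokens. With `stream=True`, the OpenAI SDK returns an iterator of chunks, and each chunk exposes its text under `choices[0].delta.content`. A minimal sketch, assuming the `openai` package is installed and Ollama is serving on the default port; `join_deltas` and `stream_reply` are hypothetical helper names.

```python
def join_deltas(chunks) -> str:
    """Concatenate the text deltas from a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks may carry no choices
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # content can be None or empty on the final chunk
            parts.append(piece)
    return "".join(parts)


def stream_reply(prompt: str, model: str = "llama3.2") -> str:
    """Stream a reply from the local OpenAI-compatible endpoint."""
    from openai import OpenAI  # imported lazily; join_deltas does not need it

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return join_deltas(stream)
```

Because only `base_url` points at Ollama, the same function should work against any other OpenAI-compatible server by changing that one argument.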

Use Cases

  1. Local AI development: no per-token API costs
  2. Privacy: prompts and data never leave your machine
  3. RAG prototyping: embeddings and chat from the same local server
  4. Code generation: Code Llama or DeepSeek Coder
  5. Offline AI: works without internet once the models are downloaded
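
For the RAG prototyping case, the retrieval step can be sketched in a few lines: embed each document and the query with /api/embeddings, then rank documents by cosine similarity. This is a toy illustration, not a production retriever. It assumes the endpoint returns a JSON object with an "embedding" list, and the function names `embed`, `cosine_similarity` and `top_k` are made up for this example.

```python
from __future__ import annotations

import math


def embed(text: str, model: str = "llama3.2") -> list[float]:
    """Fetch an embedding from a local Ollama server (needs the server up)."""
    import requests  # imported lazily; the ranking code below does not need it

    response = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["embedding"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return the indices of the k documents most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

The selected documents would then be pasted into the chat prompt as context. For more than a few thousand documents, a vector database is likely a better fit than a linear scan.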

