Alex Spinov
Ollama Has a Free API — Run LLMs Locally with One Command

Ollama lets you run large language models (Llama 3, Mistral, Gemma, Phi) locally with a single command. No cloud, no API keys, no per-token costs. Just ollama run llama3.2 and you have a local AI.

Quick Start

# Install (macOS)
brew install ollama

# Install (Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Run a model
ollama run llama3.2

The REST API

Ollama exposes a local API at http://localhost:11434:

# Generate completion
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantum computing in 3 sentences",
  "stream": false
}'

# Chat
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Docker?"}
  ],
  "stream": false
}'

# List models
curl http://localhost:11434/api/tags

# Model info
curl http://localhost:11434/api/show -d '{"name": "llama3.2"}'
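The examples above set "stream": false for a single JSON response. With streaming on (the default), the API instead returns one JSON object per line as tokens arrive. Here is a sketch of assembling a streamed /api/chat reply; collect_stream and the sample chunks are illustrative, not part of Ollama itself, but the chunk shape (message.content fragments, a final "done": true) matches the chat endpoint's streamed output.

```python
import json

def collect_stream(lines):
    """Assemble the full reply from a streamed /api/chat response,
    where each line is a standalone JSON object (NDJSON)."""
    parts = []
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        # Each chunk carries a fragment under message.content;
        # the final chunk has "done": true.
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Simulated stream, shaped like Ollama's streamed chat output:
sample = [
    '{"message": {"role": "assistant", "content": "Hello"}, "done": false}',
    '{"message": {"role": "assistant", "content": " world"}, "done": false}',
    '{"message": {"role": "assistant", "content": ""}, "done": true}',
]
print(collect_stream(sample))  # Hello world
```

With a real request you would pass stream=True to requests.post and feed response.iter_lines() into the same loop.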

Popular Models

# Meta Llama 3.2 (3B — runs on laptop)
ollama pull llama3.2

# Mistral (7B — great for coding)
ollama pull mistral

# Google Gemma 2 (9B)
ollama pull gemma2

# Microsoft Phi-3 (3.8B — small and fast)
ollama pull phi3

# Code Llama (code generation)
ollama pull codellama

# DeepSeek Coder V2
ollama pull deepseek-coder-v2

Using from JavaScript

const response = await fetch('http://localhost:11434/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2',
    messages: [{ role: 'user', content: 'Write a haiku about coding' }],
    stream: false,
  }),
});
const data = await response.json();
console.log(data.message.content);

Using from Python

import requests

response = requests.post('http://localhost:11434/api/chat', json={
    'model': 'llama3.2',
    'messages': [{'role': 'user', 'content': 'Explain REST APIs'}],
    'stream': False,
})
print(response.json()['message']['content'])
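The chat endpoint is stateless: the model only sees the messages you send, so multi-turn conversations mean resending the full history each time. A minimal sketch of managing that, assuming the payload shape shown above (ChatSession is an illustrative name of mine, not part of any Ollama SDK):

```python
class ChatSession:
    """Keeps conversation history and builds POST /api/chat payloads."""
    def __init__(self, model, system=None):
        self.model = model
        self.messages = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def user(self, content):
        """Append a user turn and return the request body to send."""
        self.messages.append({"role": "user", "content": content})
        return {"model": self.model, "messages": self.messages, "stream": False}

    def record_reply(self, content):
        """Store the assistant's answer so the next turn includes it."""
        self.messages.append({"role": "assistant", "content": content})

chat = ChatSession("llama3.2", system="You are a helpful assistant.")
body = chat.user("What is Docker?")
# send with: requests.post('http://localhost:11434/api/chat', json=body)
print(len(body["messages"]))  # 2 (system + user)
```

After each response, call record_reply(response.json()['message']['content']) so the follow-up question carries the context.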

Custom Modelfiles

# Modelfile
FROM llama3.2

SYSTEM You are a senior software engineer. Give concise, code-focused answers.

PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER num_ctx 4096
ollama create code-assistant -f Modelfile
ollama run code-assistant
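A Modelfile is just plain text, so you can generate variants programmatically, e.g. to sweep temperature values. A sketch, assuming only the directives shown above (the modelfile helper is my name, not an Ollama API):

```python
def modelfile(base, system, **params):
    """Render Modelfile text from a base model, a system prompt,
    and PARAMETER key/value pairs."""
    lines = [f"FROM {base}", f"SYSTEM {system}"]
    lines += [f"PARAMETER {k} {v}" for k, v in params.items()]
    return "\n".join(lines) + "\n"

text = modelfile(
    "llama3.2",
    "You are a senior software engineer. Give concise, code-focused answers.",
    temperature=0.3, top_p=0.9, num_ctx=4096,
)
print(text)
# write to a file named Modelfile, then: ollama create code-assistant -f Modelfile
```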

Embeddings

# llama3.2 works, but a dedicated embedding model
# (e.g. nomic-embed-text) produces better vectors
curl http://localhost:11434/api/embeddings -d '{
  "model": "llama3.2",
  "prompt": "The quick brown fox"
}'
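The response contains an "embedding" array of floats. The usual next step is comparing vectors with cosine similarity, e.g. for semantic search. A self-contained sketch with toy vectors standing in for real API output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dim vectors; real Ollama embeddings have thousands of dimensions.
v1 = [0.1, 0.8, 0.3]
v2 = [0.1, 0.7, 0.4]
print(round(cosine(v1, v2), 3))  # → 0.987
```

To rank documents against a query, embed each one, then sort by cosine(query_vec, doc_vec) descending.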

Building AI-powered data tools? Check out my Apify actors for web scraping that feeds your AI models, or email spinov001@gmail.com for custom solutions.

Which local LLM do you run? Llama, Mistral, or something else? Share below!
