OpenAI charges per token. Anthropic charges per token. Ollama lets you run Llama 3, Mistral, Gemma, and other LLMs on YOUR machine — completely free, offline, no API keys needed.
## What Ollama Gives You for Free

- Run LLMs locally — Llama 3, Mistral, Gemma, Phi, Code Llama
- One-command install — `ollama run llama3` and it works
- OpenAI-compatible API — swap OpenAI for Ollama in existing code
- GPU acceleration — NVIDIA, AMD, Apple Silicon
- No internet needed — fully offline after download
- Custom models — import GGUF, create Modelfiles
- Multi-model — run multiple models simultaneously
## Quick Start

```bash
# Install (macOS)
brew install ollama

# Or Linux
curl -fsSL https://ollama.com/install.sh | sh

# Run a model (downloads automatically)
ollama run llama3

# Chat!
>>> What is the capital of France?
The capital of France is Paris.
```
## Available Models

```bash
# Large models (for powerful machines)
ollama run llama3.1:70b       # 70B params, needs 40GB+ RAM
ollama run mixtral            # MoE, great quality
ollama run command-r-plus     # 104B, excellent reasoning

# Medium models (good balance)
ollama run llama3.1           # 8B, great all-rounder
ollama run mistral            # 7B, fast and capable
ollama run gemma2             # 9B, Google's best small model

# Small/fast models
ollama run phi3               # 3.8B, surprisingly capable
ollama run gemma2:2b          # 2B, runs on anything

# Code models
ollama run codellama          # Code generation
ollama run deepseek-coder-v2  # Strong open-source code model
ollama run starcoder2         # Fast code completion

# Embedding models (pulled, not chatted with — use via the API)
ollama pull nomic-embed-text  # For RAG applications
```
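The RAM notes above follow a simple rule of thumb: Ollama's default downloads are ~4-bit quantized, so weights take roughly half a byte per parameter, plus overhead for the KV cache and runtime. Here is a back-of-the-envelope sketch — the 4.5 bits/weight and 1.5 GB overhead figures are assumptions for estimation, not Ollama internals:

```python
def estimated_ram_gb(params_billions: float,
                     bits_per_weight: float = 4.5,
                     overhead_gb: float = 1.5) -> float:
    """Rough RAM estimate for a quantized model.

    Assumes ~4.5 bits/weight (typical of 4-bit quantization formats)
    plus a fixed allowance for KV cache and runtime. A guide only.
    """
    weight_gb = params_billions * bits_per_weight / 8
    return round(weight_gb + overhead_gb, 1)

print(estimated_ram_gb(8))   # → 6.0  (an 8B model fits comfortably in 8GB)
print(estimated_ram_gb(70))  # → 40.9 (matches the "40GB+" note above)
```

Actual usage varies with context length and quantization level, so treat the output as a lower bound.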
## OpenAI-Compatible API

```javascript
// Your existing OpenAI code works with ONE change:
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1', // ← just change this
  apiKey: 'ollama', // ← required but unused
});

const response = await client.chat.completions.create({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  stream: true,
});

for await (const chunk of response) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
```
## REST API

```bash
# Generate
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

# Chat
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}'

# Embeddings
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "The quick brown fox"
}'
```
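With `"stream": true` (the default), `/api/generate` returns newline-delimited JSON: one object per line, each carrying a `"response"` fragment, ending with an object where `"done"` is true. A minimal sketch of reassembling the full reply from captured lines — the sample fragments below are illustrative, not real model output:

```python
import json

def join_stream(ndjson_lines):
    """Reassemble a full reply from Ollama's streaming /api/generate
    output: concatenate each line's "response" field until "done"."""
    parts = []
    for line in ndjson_lines:
        if not line.strip():
            continue  # tolerate blank lines between objects
        obj = json.loads(line)
        parts.append(obj.get("response", ""))
        if obj.get("done"):
            break
    return "".join(parts)

sample = [
    '{"response": "The sky ", "done": false}',
    '{"response": "is blue.", "done": false}',
    '{"response": "", "done": true}',
]
print(join_stream(sample))  # → The sky is blue.
```

(Note that `/api/chat` streams its fragments under `"message"` rather than `"response"`, so the field lookup would differ there.)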
## Custom Models (Modelfile)

```dockerfile
# Modelfile
FROM llama3.1
SYSTEM "You are a senior software engineer. Give concise, practical answers with code examples."
PARAMETER temperature 0.3
PARAMETER top_p 0.9
```

```bash
ollama create coding-assistant -f Modelfile
ollama run coding-assistant
```
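Since a Modelfile is plain text, you can also generate one programmatically, e.g. when templating several assistants for a team. A small sketch — `render_modelfile` is a hypothetical helper, not part of Ollama:

```python
def render_modelfile(base_model: str, system_prompt: str, **parameters) -> str:
    """Build Modelfile text from its parts: FROM, SYSTEM, then one
    PARAMETER line per keyword argument (a convenience sketch only)."""
    lines = [f"FROM {base_model}", f'SYSTEM "{system_prompt}"']
    lines += [f"PARAMETER {key} {value}" for key, value in parameters.items()]
    return "\n".join(lines) + "\n"

text = render_modelfile(
    "llama3.1",
    "You are a senior software engineer.",
    temperature=0.3,
    top_p=0.9,
)
print(text)
```

Write the result to a file named `Modelfile` and feed it to `ollama create` as shown above.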
## Use With LangChain

```python
# pip install langchain-community
from langchain_community.llms import Ollama

llm = Ollama(model="llama3.1")
result = llm.invoke("Write a Python function to sort a list")
print(result)
```
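For the RAG use case mentioned with `nomic-embed-text`, the model returns plain vectors, so ranking documents only needs cosine similarity. A dependency-free sketch, with toy vectors standing in for real embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical
    direction, 0.0 means unrelated (orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# toy 2-d vectors in place of real 768-d embeddings
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # → 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # → 0.0
```

In a real pipeline you would embed each document once, store the vectors, embed the query at request time, and return the highest-scoring documents.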
## Performance by Hardware
| Hardware | Llama 3 8B | Mistral 7B | Phi-3 3.8B |
|---|---|---|---|
| M1 Mac (8GB) | 15 tok/s | 18 tok/s | 30 tok/s |
| M2 Pro (16GB) | 35 tok/s | 40 tok/s | 60 tok/s |
| RTX 3080 | 50 tok/s | 55 tok/s | 80 tok/s |
| RTX 4090 | 100 tok/s | 110 tok/s | 150 tok/s |
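To translate tokens-per-second into perceived latency, divide the expected reply length by the throughput. This ignores prompt-processing time, which adds to the total, so treat it as a floor:

```python
def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Approximate wall-clock time to generate n_tokens at a steady
    rate (excludes prompt processing, so real time is a bit longer)."""
    return n_tokens / tokens_per_second

# a 500-token answer on an M1 Mac at ~15 tok/s
print(round(generation_seconds(500, 15), 1))  # → 33.3
# the same answer on an RTX 4090 at ~100 tok/s
print(generation_seconds(500, 100))           # → 5.0
```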
## Ollama vs OpenAI vs Anthropic
| Aspect | Ollama | OpenAI | Anthropic |
|---|---|---|---|
| Cost | Free | $2-60/1M tokens | $3-75/1M tokens |
| Privacy | 100% local | Cloud | Cloud |
| Internet | Not needed | Required | Required |
| Quality (best) | Good (70B) | Excellent | Excellent |
| Speed | Hardware-dependent | Fast | Fast |
| Customization | Full (Modelfile) | Fine-tuning ($) | None |
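The cost row is easy to make concrete: cloud pricing is linear in tokens, while Ollama's marginal cost is zero once you own the hardware. A rough sketch — the 50M tokens/month volume and $5/1M rate are hypothetical figures, not a quote:

```python
def monthly_api_cost(tokens_per_month: int, usd_per_million: float) -> float:
    """Cloud API cost at a flat per-token rate (hypothetical usage)."""
    return tokens_per_month / 1_000_000 * usd_per_million

# e.g. 50M tokens/month at $5 per 1M tokens
print(monthly_api_cost(50_000_000, 5.0))  # → 250.0 USD/month; Ollama: $0
```

At steady volume, a few months of that bill can pay for a GPU that runs a 7B-9B model locally.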
## The Verdict

Ollama makes running LLMs locally as easy as `ollama run llama3`. Free, private, offline-capable, and compatible with the OpenAI API. For development, testing, and privacy-sensitive applications, Ollama is essential.
Need help building AI-powered data pipelines or web scrapers? I build custom solutions. Reach out: spinov001@gmail.com
Check out my awesome-web-scraping collection — 400+ tools for extracting web data.