Alex Spinov

Ollama Has a Free Tool That Runs LLMs Locally on Your Laptop

OpenAI charges per token. Anthropic charges per token. Ollama lets you run Llama 3, Mistral, Gemma, and other LLMs on YOUR machine — completely free, offline, no API keys needed.

What Ollama Gives You for Free

  • Run LLMs locally — Llama 3, Mistral, Gemma, Phi, Code Llama
  • One-command install — `ollama run llama3` and it works
  • OpenAI-compatible API — swap OpenAI for Ollama in existing code
  • GPU acceleration — NVIDIA, AMD, Apple Silicon
  • No internet needed — fully offline after download
  • Custom models — import GGUF, create Modelfiles
  • Multi-model — run multiple models simultaneously

Quick Start

```shell
# Install (macOS)
brew install ollama

# Or Linux
curl -fsSL https://ollama.com/install.sh | sh

# Run a model (downloads automatically)
ollama run llama3

# Chat!
>>> What is the capital of France?
The capital of France is Paris.
```
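Once the daemon is running, you can confirm which models are installed without the CLI — Ollama serves a JSON model list at `GET /api/tags`. A minimal stdlib sketch (the example return value is illustrative, not a guaranteed output):

```python
import json
import urllib.request

def parse_model_names(tags_json: str) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

def list_local_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama daemon which models are available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return parse_model_names(resp.read().decode())

# With the daemon running, list_local_models() returns something
# like ["llama3:latest", "mistral:latest"].
```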

Available Models

```shell
# Large models (for powerful machines)
ollama run llama3.1:70b        # 70B params, needs 40GB+ RAM
ollama run mixtral              # MoE, great quality
ollama run command-r-plus       # 104B, excellent reasoning

# Medium models (good balance)
ollama run llama3.1             # 8B, great all-rounder
ollama run mistral              # 7B, fast and capable
ollama run gemma2               # 9B, Google's best small model

# Small/fast models
ollama run phi3                 # 3.8B, surprisingly capable
ollama run gemma2:2b            # 2B, runs on anything

# Code models
ollama run codellama            # Code generation
ollama run deepseek-coder-v2    # Best OSS code model
ollama run starcoder2           # Fast code completion

# Embedding models (pull, then use via the embeddings API)
ollama pull nomic-embed-text    # For RAG applications
```

OpenAI-Compatible API

```javascript
// Your existing OpenAI code works with ONE change:
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1', // ← Just change this
  apiKey: 'ollama'                       // ← Required but unused
});

const response = await client.chat.completions.create({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  stream: true
});

for await (const chunk of response) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
```
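The same endpoint works from Python without any SDK, since `/v1/chat/completions` is just JSON over POST. A non-streaming stdlib sketch — the payload mirrors the JavaScript example above:

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # streaming returns server-sent events instead
    }

def chat(prompt: str, model: str = "llama3.1",
         base_url: str = "http://localhost:11434") -> str:
    """POST to Ollama's OpenAI-compatible endpoint; return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer ollama"},  # required but unused
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```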

REST API

```shell
# Generate
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

# Chat
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false
}'

# Embeddings
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "The quick brown fox"
}'
```
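The embeddings endpoint returns one vector per input; for RAG you typically rank documents by cosine similarity against the query vector. That ranking step needs no extra libraries:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# cosine_similarity([1, 0], [2, 0]) -> 1.0   (same direction)
# cosine_similarity([1, 0], [0, 1]) -> 0.0   (orthogonal)
```

Embed each document once, store the vectors, then embed the query at search time and return the highest-scoring documents.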

Custom Models (Modelfile)

```
# Modelfile
FROM llama3.1
SYSTEM "You are a senior software engineer. Give concise, practical answers with code examples."
PARAMETER temperature 0.3
PARAMETER top_p 0.9
```

```shell
ollama create coding-assistant -f Modelfile
ollama run coding-assistant
```
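Since a Modelfile is plain text, templating several assistants is trivial. A small hypothetical helper — the function and its defaults are mine, not part of Ollama:

```python
def make_modelfile(base: str, system: str,
                   temperature: float = 0.3, top_p: float = 0.9) -> str:
    """Render a Modelfile string like the one shown above."""
    return (
        f"FROM {base}\n"
        f'SYSTEM "{system}"\n'
        f"PARAMETER temperature {temperature}\n"
        f"PARAMETER top_p {top_p}\n"
    )

# Write the result to a file named Modelfile, then:
#   ollama create coding-assistant -f Modelfile
```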

Use With LangChain

```python
from langchain_community.llms import Ollama

llm = Ollama(model="llama3.1")
result = llm.invoke("Write a Python function to sort a list")
print(result)
```

Performance by Hardware

| Hardware | Llama 3 8B | Mistral 7B | Phi-3 3.8B |
| --- | --- | --- | --- |
| M1 Mac (8GB) | 15 tok/s | 18 tok/s | 30 tok/s |
| M2 Pro (16GB) | 35 tok/s | 40 tok/s | 60 tok/s |
| RTX 3080 | 50 tok/s | 55 tok/s | 80 tok/s |
| RTX 4090 | 100 tok/s | 110 tok/s | 150 tok/s |
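Throughput translates directly into wait time: tokens divided by tok/s gives seconds. For a ~500-token answer from Llama 3 8B:

```python
def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to generate a given number of tokens at a given throughput."""
    return tokens / tokens_per_second

# A 500-token reply at the table's rates:
# generation_seconds(500, 15)  -> ~33 s on an M1 Mac
# generation_seconds(500, 100) -> 5 s on an RTX 4090
```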

Ollama vs OpenAI vs Anthropic

| Aspect | Ollama | OpenAI | Anthropic |
| --- | --- | --- | --- |
| Cost | Free | $2-60/1M tokens | $3-75/1M tokens |
| Privacy | 100% local | Cloud | Cloud |
| Internet | Not needed | Required | Required |
| Quality (best) | Good (70B) | Excellent | Excellent |
| Speed | Hardware-dependent | Fast | Fast |
| Customization | Full (Modelfile) | Fine-tuning ($) | None |
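The cost row is easy to make concrete. Using the table's $2/1M lower bound purely as an illustration (actual pricing varies by model and changes over time):

```python
def monthly_api_cost(tokens_per_month: int, usd_per_million: float) -> float:
    """API spend for a given monthly token volume; Ollama's equivalent is $0."""
    return tokens_per_month / 1_000_000 * usd_per_million

# 50M tokens/month at $2 per 1M tokens:
# monthly_api_cost(50_000_000, 2.0) -> 100.0 (vs $0 locally)
```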

The Verdict

Ollama makes running LLMs locally as easy as `ollama run llama3`. Free, private, offline-capable, and compatible with the OpenAI API. For development, testing, and privacy-sensitive applications, Ollama is essential.


Need help building AI-powered data pipelines or web scrapers? I build custom solutions. Reach out: spinov001@gmail.com

Check out my awesome-web-scraping collection — 400+ tools for extracting web data.
