Alex Spinov

Ollama Has a Free Local AI Model Runner

Ollama is a free tool that lets you run large language models locally on your own machine: Llama 3, Mistral, Gemma, and more, with no API keys, no cloud, and no usage costs.

What Is Ollama?

Ollama makes it ridiculously easy to run AI models on your own hardware. One command to install, one command to run any model.

Key features:

  • Run LLMs locally (no internet needed)
  • Supports 100+ models
  • OpenAI-compatible API
  • GPU acceleration (NVIDIA, AMD, Apple Silicon)
  • Model customization (Modelfile)
  • Multi-model serving
  • Lightweight and fast
  • Works on macOS, Linux, Windows

Quick Start

Install

```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Or download the installer from ollama.com
```

Run a Model

```bash
# Run Llama 3.2 (3B parameters)
ollama run llama3.2

# Run Mistral
ollama run mistral

# Run Code Llama
ollama run codellama

# Run Gemma 2
ollama run gemma2
```

First run downloads the model. After that, it starts instantly.
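Every pulled model is also served over Ollama's local HTTP API on port 11434, not just the interactive CLI. A minimal sketch using only the Python standard library against the native `/api/generate` endpoint (it assumes the server is running and `llama3.2` has been pulled; `build_generate_payload` and `generate` are hypothetical helper names, not part of Ollama):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local port

def build_generate_payload(model, prompt):
    """Build the JSON body for Ollama's native /api/generate endpoint."""
    # stream=False returns one complete JSON object instead of NDJSON chunks
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model, prompt):
    """Send a one-shot prompt to the local Ollama server and return the reply text."""
    payload = build_generate_payload(model, prompt)
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Needs a running server and a pulled model, e.g.:
# print(generate("llama3.2", "Say hello in one word"))
```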

Available Models

| Model | Size | Use Case |
|---|---|---|
| llama3.2:1b | 1.3GB | Fast, lightweight tasks |
| llama3.2:3b | 2GB | General purpose |
| llama3.1:8b | 4.7GB | High quality |
| mistral | 4.1GB | Balanced performance |
| codellama | 3.8GB | Code generation |
| gemma2:9b | 5.4GB | Google's model |
| phi3 | 2.3GB | Microsoft's small model |
| deepseek-coder | 776MB | Coding assistant |

OpenAI-Compatible API

Ollama exposes an API compatible with OpenAI's format:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Explain Docker in 3 sentences"}]
  }'
```

Python

```python
from openai import OpenAI

# Point the client at the local Ollama server; any non-empty API key works
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Write a Python function to sort a list"}]
)

print(response.choices[0].message.content)
```

JavaScript

```javascript
import OpenAI from "openai";

// Point the client at the local Ollama server; any non-empty API key works
const client = new OpenAI({
  baseURL: "http://localhost:11434/v1",
  apiKey: "ollama"
});

const response = await client.chat.completions.create({
  model: "llama3.2",
  messages: [{ role: "user", content: "Explain recursion" }]
});

console.log(response.choices[0].message.content);
```
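Both clients also support streaming, which matters for local models since tokens arrive one at a time. A Python sketch (the `join_deltas` and `stream_chat` helper names are mine; the call itself assumes the openai package is installed and the local server is running):

```python
def join_deltas(deltas):
    """Assemble streamed content deltas (some may be None) into the full reply."""
    return "".join(d for d in deltas if d)

def stream_chat(prompt, model="llama3.2"):
    """Stream a chat completion from the local Ollama server, printing tokens as they arrive."""
    from openai import OpenAI  # imported here; requires the openai package

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    deltas = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
        deltas.append(delta)
    return join_deltas(deltas)

# Needs a running server: stream_chat("Explain recursion in one paragraph")
```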

Custom Models (Modelfile)

```
# Modelfile
FROM llama3.2

SYSTEM "You are a senior software engineer. Be concise and provide code examples."

PARAMETER temperature 0.3
PARAMETER num_ctx 4096
```

```bash
ollama create code-assistant -f Modelfile
ollama run code-assistant
```
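Because a Modelfile is plain text, it is easy to generate from code when you want to template several assistants. A hypothetical helper (`make_modelfile` is not part of Ollama, just a sketch):

```python
def make_modelfile(base, system, **params):
    """Render a Modelfile string: FROM, SYSTEM, then one PARAMETER line per option."""
    lines = [f"FROM {base}", f'SYSTEM "{system}"']
    for key, value in params.items():
        lines.append(f"PARAMETER {key} {value}")
    return "\n".join(lines) + "\n"

modelfile = make_modelfile(
    "llama3.2",
    "You are a senior software engineer. Be concise and provide code examples.",
    temperature=0.3,
    num_ctx=4096,
)
# Write it out, then: ollama create code-assistant -f Modelfile
# open("Modelfile", "w").write(modelfile)
```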

Cost Comparison

| Service | Cost per 1M tokens | Privacy |
|---|---|---|
| OpenAI GPT-4o | $2.50-$10 | Cloud |
| Anthropic Claude | $3-$15 | Cloud |
| Google Gemini | $0.075-$5 | Cloud |
| Ollama (local) | $0 | 100% local |

Run unlimited queries. Zero cost. Complete privacy.
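To put the table in concrete terms, here's a quick back-of-the-envelope sketch of monthly spend at a given token volume (prices are rough midpoints of the ranges in the table; electricity and hardware for local inference are ignored):

```python
# Approximate cost per 1M tokens: rough midpoints of the ranges above
PRICE_PER_M = {
    "OpenAI GPT-4o": 6.25,
    "Anthropic Claude": 9.00,
    "Google Gemini": 2.54,
    "Ollama (local)": 0.00,
}

def monthly_cost(tokens_per_month, price_per_million):
    """Dollar cost for a given monthly token volume at a per-1M-token price."""
    return tokens_per_month / 1_000_000 * price_per_million

# e.g. 50M tokens/month:
for service, price in PRICE_PER_M.items():
    print(f"{service}: ${monthly_cost(50_000_000, price):.2f}")
```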

Hardware Requirements

| Model Size | RAM Needed | GPU |
|---|---|---|
| 1-3B | 4GB | Optional |
| 7-8B | 8GB | Recommended |
| 13B | 16GB | Recommended |
| 70B | 48GB+ | Required |

Apple Silicon Macs with unified memory work especially well.
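The table translates directly into a small lookup. A hypothetical helper (`largest_model_class` is my name, thresholds taken straight from the table) that suggests the largest model class your RAM can hold:

```python
def largest_model_class(ram_gb):
    """Return the largest model size class from the table that fits in ram_gb of RAM."""
    # (minimum RAM in GB, model class), largest to smallest
    tiers = [(48, "70B"), (16, "13B"), (8, "7-8B"), (4, "1-3B")]
    for min_ram, size in tiers:
        if ram_gb >= min_ram:
            return size
    return None  # under 4GB: stick to heavily quantized models or the cloud

print(largest_model_class(16))  # "13B"
```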

Who Uses Ollama?

With 120K+ GitHub stars, Ollama is used by:

  • Developers testing AI integrations locally
  • Companies needing data privacy
  • Researchers experimenting with models
  • Anyone wanting free AI without API keys

Get Started

  1. Install with one command
  2. Run `ollama run llama3.2`
  3. Use the OpenAI-compatible API in your apps

Free AI on your machine. No API keys. No limits.


Combining AI with web data? Check out my web scraping tools on Apify — feed scraped data into your local AI models. Custom solutions: spinov001@gmail.com
