Alex Spinov

Ollama Has a Free Local AI Model Runner

Ollama is a free tool that lets you run large language models locally on your own machine: Llama 3, Mistral, Gemma, and more, with no API keys, no cloud, and no usage costs.

What Is Ollama?

Ollama makes it ridiculously easy to run AI models on your own hardware. One command to install, one command to run any model.

Key features:

  • Run LLMs locally (no internet needed)
  • Supports 100+ models
  • OpenAI-compatible API
  • GPU acceleration (NVIDIA, AMD, Apple Silicon)
  • Model customization (Modelfile)
  • Multi-model serving
  • Lightweight and fast
  • Works on macOS, Linux, Windows

Quick Start

Install

```bash
# macOS/Linux
curl -fsSL https://ollama.com/install.sh | sh

# Or download the installer from ollama.com
```

Run a Model

```bash
# Run Llama 3.2 (3B parameters)
ollama run llama3.2

# Run Mistral
ollama run mistral

# Run Code Llama
ollama run codellama

# Run Gemma 2
ollama run gemma2
```

First run downloads the model. After that, it starts instantly.
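Every pulled model is also served over Ollama's local HTTP API on port 11434, not just the interactive CLI. A minimal sketch using only the Python standard library against the native `/api/generate` endpoint (it assumes the server is running and `llama3.2` has been pulled; `build_generate_payload` and `generate` are hypothetical helper names, not part of Ollama):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local port

def build_generate_payload(model, prompt):
    """Build the JSON body for Ollama's native /api/generate endpoint."""
    # stream=False returns one complete JSON object instead of NDJSON chunks
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model, prompt):
    """Send a one-shot prompt to the local Ollama server and return the reply text."""
    payload = build_generate_payload(model, prompt)
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Needs a running server and a pulled model, e.g.:
# print(generate("llama3.2", "Say hello in one word"))
```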

Available Models

| Model | Size | Use Case |
|---|---|---|
| llama3.2:1b | 1.3GB | Fast, lightweight tasks |
| llama3.2:3b | 2GB | General purpose |
| llama3.1:8b | 4.7GB | High quality |
| mistral | 4.1GB | Balanced performance |
| codellama | 3.8GB | Code generation |
| gemma2:9b | 5.4GB | Google's model |
| phi3 | 2.3GB | Microsoft's small model |
| deepseek-coder | 776MB | Coding assistant |

OpenAI-Compatible API

Ollama exposes an API compatible with OpenAI's format:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Explain Docker in 3 sentences"}]
  }'
```

Python

```python
from openai import OpenAI

# Point the client at the local Ollama server; any non-empty API key works
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Write a Python function to sort a list"}]
)

print(response.choices[0].message.content)
```

JavaScript

```javascript
import OpenAI from "openai";

// Point the client at the local Ollama server; any non-empty API key works
const client = new OpenAI({
  baseURL: "http://localhost:11434/v1",
  apiKey: "ollama"
});

const response = await client.chat.completions.create({
  model: "llama3.2",
  messages: [{ role: "user", content: "Explain recursion" }]
});

console.log(response.choices[0].message.content);
```
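Both clients also support streaming, which matters for local models since tokens arrive one at a time. A Python sketch (the `join_deltas` and `stream_chat` helper names are mine; the call itself assumes the openai package is installed and the local server is running):

```python
def join_deltas(deltas):
    """Assemble streamed content deltas (some may be None) into the full reply."""
    return "".join(d for d in deltas if d)

def stream_chat(prompt, model="llama3.2"):
    """Stream a chat completion from the local Ollama server, printing tokens as they arrive."""
    from openai import OpenAI  # imported here; requires the openai package

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    deltas = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
        deltas.append(delta)
    return join_deltas(deltas)

# Needs a running server: stream_chat("Explain recursion in one paragraph")
```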

Custom Models (Modelfile)

```
# Modelfile
FROM llama3.2

SYSTEM "You are a senior software engineer. Be concise and provide code examples."

PARAMETER temperature 0.3
PARAMETER num_ctx 4096
```

```bash
ollama create code-assistant -f Modelfile
ollama run code-assistant
```
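Because a Modelfile is plain text, it is easy to generate from code when you want to template several assistants. A hypothetical helper (`make_modelfile` is not part of Ollama, just a sketch):

```python
def make_modelfile(base, system, **params):
    """Render a Modelfile string: FROM, SYSTEM, then one PARAMETER line per option."""
    lines = [f"FROM {base}", f'SYSTEM "{system}"']
    for key, value in params.items():
        lines.append(f"PARAMETER {key} {value}")
    return "\n".join(lines) + "\n"

modelfile = make_modelfile(
    "llama3.2",
    "You are a senior software engineer. Be concise and provide code examples.",
    temperature=0.3,
    num_ctx=4096,
)
# Write it out, then: ollama create code-assistant -f Modelfile
# open("Modelfile", "w").write(modelfile)
```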

Cost Comparison

| Service | Cost per 1M tokens | Privacy |
|---|---|---|
| OpenAI GPT-4o | $2.50-$10 | Cloud |
| Anthropic Claude | $3-$15 | Cloud |
| Google Gemini | $0.075-$5 | Cloud |
| Ollama (local) | $0 | 100% local |

Run unlimited queries. Zero cost. Complete privacy.
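To put the table in concrete terms, here's a quick back-of-the-envelope sketch of monthly spend at a given token volume (prices are rough midpoints of the ranges in the table; electricity and hardware for local inference are ignored):

```python
# Approximate cost per 1M tokens: rough midpoints of the ranges above
PRICE_PER_M = {
    "OpenAI GPT-4o": 6.25,
    "Anthropic Claude": 9.00,
    "Google Gemini": 2.54,
    "Ollama (local)": 0.00,
}

def monthly_cost(tokens_per_month, price_per_million):
    """Dollar cost for a given monthly token volume at a per-1M-token price."""
    return tokens_per_month / 1_000_000 * price_per_million

# e.g. 50M tokens/month:
for service, price in PRICE_PER_M.items():
    print(f"{service}: ${monthly_cost(50_000_000, price):.2f}")
```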

Hardware Requirements

| Model Size | RAM Needed | GPU |
|---|---|---|
| 1-3B | 4GB | Optional |
| 7-8B | 8GB | Recommended |
| 13B | 16GB | Recommended |
| 70B | 48GB+ | Required |

Apple Silicon Macs with unified memory work especially well.
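The table translates directly into a small lookup. A hypothetical helper (`largest_model_class` is my name, thresholds taken straight from the table) that suggests the largest model class your RAM can hold:

```python
def largest_model_class(ram_gb):
    """Return the largest model size class from the table that fits in ram_gb of RAM."""
    # (minimum RAM in GB, model class), largest to smallest
    tiers = [(48, "70B"), (16, "13B"), (8, "7-8B"), (4, "1-3B")]
    for min_ram, size in tiers:
        if ram_gb >= min_ram:
            return size
    return None  # under 4GB: stick to heavily quantized models or the cloud

print(largest_model_class(16))  # "13B"
```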

Who Uses Ollama?

With 120K+ GitHub stars, Ollama is used by:

  • Developers testing AI integrations locally
  • Companies needing data privacy
  • Researchers experimenting with models
  • Anyone wanting free AI without API keys

Get Started

  1. Install with one command
  2. Run `ollama run llama3.2`
  3. Use the OpenAI-compatible API in your apps

Free AI on your machine. No API keys. No limits.


Combining AI with web data? Check out my web scraping tools on Apify — feed scraped data into your local AI models. Custom solutions: spinov001@gmail.com
