DEV Community

Alex Spinov
Ollama Has a Free API That Lets You Run LLMs Locally With Zero Cloud Costs

Ollama runs LLMs on your laptop: Llama 3, Mistral, Phi-3, Gemma, all locally, with no API keys, no cloud costs, and no data leaving your machine.

Quick Start

curl -fsSL https://ollama.com/install.sh | sh
ollama run llama3.1
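Once the server is running you can check which models are installed via Ollama's /api/tags endpoint (the same data `ollama list` prints). A minimal TypeScript sketch; the fetch call assumes a local server on the default port, and the response shape below is a subset of the documented fields:

```typescript
// Subset of the /api/tags response shape.
interface TagsResponse {
  models: { name: string; size: number }[];
}

// Pure helper: pull the model names out of a /api/tags payload.
export function modelNames(tags: TagsResponse): string[] {
  return tags.models.map((m) => m.name);
}

// Only works while `ollama serve` is running locally.
export async function listLocalModels(): Promise<string[]> {
  const res = await fetch("http://localhost:11434/api/tags");
  return modelNames((await res.json()) as TagsResponse);
}
```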

REST API (OpenAI-Compatible)

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
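Ollama also has a native API at /api/chat, which streams newline-delimited JSON by default: one object per line carrying a message.content fragment, plus a final record with done: true. A sketch of a parser for that format; the field names follow Ollama's documented response shape, but verify against your server version:

```typescript
// One line of Ollama's native /api/chat stream, e.g.
// {"message":{"role":"assistant","content":"Hel"},"done":false}
interface ChatStreamLine {
  message?: { role: string; content: string };
  done: boolean;
}

// Pure helper: concatenate content fragments from a chunk of
// newline-delimited JSON, skipping blank lines and the final done record.
export function collectStream(ndjson: string): string {
  let out = "";
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue;
    const parsed = JSON.parse(line) as ChatStreamLine;
    if (parsed.message) out += parsed.message.content;
  }
  return out;
}

// Usage against a running server (sketch, not run here):
// const res = await fetch("http://localhost:11434/api/chat", {
//   method: "POST",
//   body: JSON.stringify({ model: "llama3.1",
//     messages: [{ role: "user", content: "Hello!" }] }),
// });
// then feed each decoded chunk of res.body through collectStream.
```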

Use it with the OpenAI SDK by pointing baseURL at the local server (any non-empty string works as the API key):

import OpenAI from 'openai'
const client = new OpenAI({ baseURL: 'http://localhost:11434/v1', apiKey: 'ollama' })
const response = await client.chat.completions.create({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Write a haiku about coding' }]
})
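The SDK returns the full completion object, but most of the time you only want the text. A small helper for that; the choices[0].message.content path is the standard OpenAI response shape, which Ollama mirrors:

```typescript
// Minimal slice of the OpenAI chat-completion response shape.
interface ChatCompletionLike {
  choices: { message: { role: string; content: string | null } }[];
}

// Pull the assistant text out of a completion, with a safe fallback
// for an empty choice list or null content.
export function completionText(res: ChatCompletionLike): string {
  return res.choices[0]?.message.content ?? "";
}
```

With the snippet above, `completionText(response)` gives you the haiku as a plain string.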

Popular Models

Model          Size    Use Case
llama3.1:8b    4.7GB   General purpose
mistral        4.1GB   Fast, good quality
codellama      3.8GB   Code generation
gemma2:9b      5.4GB   Google's model

Custom Models (Modelfile)

FROM llama3.1
PARAMETER temperature 0.7
SYSTEM You are a senior software engineer.
ollama create code-assistant -f Modelfile
ollama run code-assistant
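The created model is addressable by name through every endpoint shown above. A tiny helper that builds the request body for it; the model name `code-assistant` comes from the `ollama create` step, and `stream: false` (a single blocking response) is an assumption for this sketch:

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Build the JSON body for a non-streaming /api/chat call
// to a custom model created from a Modelfile.
export function chatBody(model: string, messages: ChatMessage[]): string {
  return JSON.stringify({ model, messages, stream: false });
}
```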

The Bottom Line

Free, private, offline AI. The OpenAI-compatible API means switching between local and cloud is just a base-URL swap.


Need to automate data collection or build custom scrapers? Check out my Apify actors for ready-made tools, or email spinov001@gmail.com for custom solutions.
