Alex Spinov
Ollama Has a Free API That Runs LLMs Locally — ChatGPT-Level AI on Your Laptop, No Cloud Needed

The AI API Problem

OpenAI charges per token. Anthropic charges per token. Your AI prototype costs $50/day in API calls. Your data leaves your machine.

Ollama runs the same models locally. Free. Private. Unlimited tokens.

What Ollama Gives You

One-Command Model Download

```bash
# Download and run Llama 3.1 (8B)
ollama run llama3.1

# Download Mistral
ollama run mistral

# Download CodeLlama for coding
ollama run codellama

# Download DeepSeek Coder
ollama run deepseek-coder-v2
```

OpenAI-Compatible API

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Explain Docker in one paragraph"}]
  }'
```

Same API format as OpenAI. Your existing code works — just change the base URL.

Python SDK

```python
import ollama

response = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Write a Python function to sort a list'}]
)
print(response['message']['content'])
```

JavaScript SDK

```javascript
import { Ollama } from 'ollama';

const ollama = new Ollama();
const response = await ollama.chat({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Explain REST APIs' }]
});
console.log(response.message.content);
```

Streaming Responses

```python
import ollama

for chunk in ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Write a haiku about coding'}],
    stream=True
):
    print(chunk['message']['content'], end='', flush=True)
```

Custom Models (Modelfile)

```
FROM llama3.1
SYSTEM """You are a senior Python developer. Always include type hints and docstrings."""
PARAMETER temperature 0.3
```
```bash
ollama create python-expert -f Modelfile
ollama run python-expert
```

Embeddings

```python
import ollama

embeddings = ollama.embed(
    model='nomic-embed-text',
    input='Your text to embed'
)
# Use for RAG, semantic search, clustering
```

Hardware Requirements

| Model | RAM needed | Quality |
|---|---|---|
| Llama 3.1 8B | 8GB | Good for most tasks |
| Mistral 7B | 8GB | Great for coding |
| Llama 3.1 70B | 48GB | Near-GPT-4 quality |
| Phi-3 Mini | 4GB | Works on any laptop |

Why This Matters

AI shouldn't require a cloud subscription. Ollama puts state-of-the-art language models on your laptop. Your data stays private. Your costs stay zero. And with the OpenAI-compatible API, you can switch between local and cloud with one line change.


Building AI apps that need real-world data? Check out my web scraping actors on Apify Store — feed structured web data into your local LLMs. For custom solutions, email spinov001@gmail.com.
