Alex Spinov
Ollama Has a Free API That Runs LLMs Locally — ChatGPT-Level AI on Your Laptop, No Cloud Needed

The AI API Problem

OpenAI charges per token. Anthropic charges per token. Your AI prototype costs $50/day in API calls. Your data leaves your machine.

Ollama runs the same models locally. Free. Private. Unlimited tokens.

What Ollama Gives You

One-Command Model Download

```bash
# Download and run Llama 3.1 (8B)
ollama run llama3.1

# Download Mistral
ollama run mistral

# Download CodeLlama for coding
ollama run codellama

# Download DeepSeek Coder
ollama run deepseek-coder-v2
```

OpenAI-Compatible API

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Explain Docker in one paragraph"}]
  }'
```

Same API format as OpenAI. Your existing code works — just change the base URL.

Python SDK

```python
import ollama

response = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Write a Python function to sort a list'}]
)
print(response['message']['content'])
```

JavaScript SDK

```javascript
import { Ollama } from 'ollama';

const ollama = new Ollama();
const response = await ollama.chat({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Explain REST APIs' }]
});
console.log(response.message.content);
```

Streaming Responses

```python
import ollama

for chunk in ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Write a haiku about coding'}],
    stream=True
):
    print(chunk['message']['content'], end='', flush=True)
```

Custom Models (Modelfile)

```
FROM llama3.1
SYSTEM """You are a senior Python developer. Always include type hints and docstrings."""
PARAMETER temperature 0.3
```
```bash
ollama create python-expert -f Modelfile
ollama run python-expert
```

Embeddings

```python
import ollama

embeddings = ollama.embed(
    model='nomic-embed-text',
    input='Your text to embed'
)
# Use for RAG, semantic search, clustering
```

Hardware Requirements

| Model | RAM needed | Quality |
|---|---|---|
| Llama 3.1 8B | 8GB | Good for most tasks |
| Mistral 7B | 8GB | Great for coding |
| Llama 3.1 70B | 48GB | Near-GPT-4 quality |
| Phi-3 Mini | 4GB | Works on any laptop |

Why This Matters

AI shouldn't require a cloud subscription. Ollama puts state-of-the-art language models on your laptop. Your data stays private. Your costs stay zero. And with the OpenAI-compatible API, you can switch between local and cloud with one line change.


Building AI apps that need real-world data? Check out my web scraping actors on Apify Store — feed structured web data into your local LLMs. For custom solutions, email spinov001@gmail.com.
