Alex Spinov

Ollama Has a Free Local LLM API That Runs AI Models Without the Cloud

Ollama runs open-source LLMs locally with a simple API. Run Llama 3, Mistral, Gemma, and more on your machine — no API keys, no cloud costs, no data leaving your network.

Setup

# Install
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3.1
ollama pull mistral
ollama pull codellama

REST API

# Chat completion
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Explain Docker in 3 sentences"}],
  "stream": false
}'

# Generate (simple completion)
curl http://localhost:11434/api/generate -d '{
  "model": "codellama",
  "prompt": "Write a Python function to merge two sorted lists",
  "stream": false
}'

# Embeddings (newer Ollama versions also expose /api/embed, which accepts a batched "input")
curl http://localhost:11434/api/embeddings -d '{
  "model": "llama3.1",
  "prompt": "Machine learning is a subset of AI"
}'
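The same REST endpoint is easy to call from Node without any client library. A minimal sketch using the built-in `fetch` (Node 18+); the `chat` helper name and default URL are illustrative:

```javascript
// Minimal wrapper around Ollama's /api/chat endpoint.
// Assumes the Ollama server is running on its default port, 11434.
async function chat(model, content, baseUrl = 'http://localhost:11434') {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model,
      messages: [{ role: 'user', content }],
      stream: false
    })
  });
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  const data = await res.json();
  return data.message.content; // the assistant's reply text
}

// Usage (requires a running server with the model pulled):
// const reply = await chat('llama3.1', 'Explain Docker in 3 sentences');
```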

JavaScript Client

import { Ollama } from 'ollama';

const ollama = new Ollama();

// Chat
const response = await ollama.chat({
  model: 'llama3.1',
  messages: [
    { role: 'system', content: 'You are a helpful coding assistant.' },
    { role: 'user', content: 'How do I handle errors in async/await?' }
  ]
});
console.log(response.message.content);

// Streaming
const stream = await ollama.chat({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Write a haiku about programming' }],
  stream: true
});
for await (const chunk of stream) {
  process.stdout.write(chunk.message.content);
}

// Embeddings for RAG
const embedding = await ollama.embeddings({
  model: 'llama3.1',
  prompt: 'What is vector search?'
});
// embedding.embedding = [0.123, -0.456, ...]
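Once you have embeddings, basic RAG retrieval is just a nearest-neighbor search over your stored vectors. A minimal sketch using cosine similarity over an in-memory list (function names and document shape are my own, not part of the Ollama client):

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored documents against a query embedding, highest score first.
// Each doc is { text, embedding }, where embedding came from ollama.embeddings().
function rankBySimilarity(queryEmbedding, docs) {
  return docs
    .map(d => ({ ...d, score: cosineSimilarity(queryEmbedding, d.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```

For anything beyond a few thousand documents you would swap this in-memory scan for a vector database, but the scoring math stays the same.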

OpenAI-Compatible Endpoint

// Works with any OpenAI SDK — just change the base URL
import OpenAI from 'openai';

const openai = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama' // Required but unused
});

const completion = await openai.chat.completions.create({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Hello!' }]
});
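Because `/v1` mirrors the OpenAI wire format, you can also hit it with plain `fetch` and read the OpenAI-shaped response. A sketch (the helper name is mine; the response shape is OpenAI's `choices` array rather than Ollama's native `message` field):

```javascript
// Call Ollama's OpenAI-compatible endpoint directly, no SDK required.
// Assumes the server is running on the default port 11434.
async function chatCompletion(model, content) {
  const res = await fetch('http://localhost:11434/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer ollama' // any token works; Ollama ignores it
    },
    body: JSON.stringify({ model, messages: [{ role: 'user', content }] })
  });
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content; // OpenAI-style response shape
}
```

This is handy for dropping Ollama behind any tool that only knows how to speak the OpenAI protocol.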

Model Management

ollama list              # Show downloaded models
ollama show llama3.1     # Model details
ollama rm mistral        # Remove a model
ollama cp llama3.1 my-model  # Copy/customize
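`ollama cp` is just a starting point; the usual way to customize a model is a Modelfile plus `ollama create`. A sketch, where the system prompt and temperature are purely illustrative:

```
# Modelfile — build a customized variant of llama3.1
FROM llama3.1
PARAMETER temperature 0.2
SYSTEM You are a concise code reviewer.
```

Build and run it with `ollama create my-reviewer -f Modelfile`, then `ollama run my-reviewer`.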

Why This Matters

  • Privacy: No data leaves your machine
  • Free: No API costs, no rate limits
  • Fast: GPU-accelerated inference
  • OpenAI compatible: Swap cloud AI for local with one URL change
  • Offline: Works without internet after model download

Need custom AI tools or local LLM integrations? I build developer tools. Check out my web scraping actors on Apify or reach out at spinov001@gmail.com for custom solutions.
