Alex Spinov
Ollama Has a Free API — Here's How to Run LLMs Locally Without OpenAI

Ollama lets you run LLMs locally — Llama 3, Mistral, Gemma, Phi, CodeLlama — with a single command. OpenAI-compatible API, zero cloud costs, complete privacy.

Why Ollama?

  • One command: ollama run llama3 and you're chatting
  • OpenAI-compatible: Same API format
  • Private: Data never leaves your machine
  • Free: No API keys, no costs
  • GPU + CPU: Works on both
  • 100+ models: Llama 3, Mistral, Gemma, Phi, CodeLlama

Install

curl -fsSL https://ollama.com/install.sh | sh

Run a Model

# Download and chat
ollama run llama3.1

# Specific size
ollama run llama3.1:70b

# Code model
ollama run codellama:34b

REST API: Chat

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Explain Docker in 3 sentences"}],
  "stream": false
}'
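With `"stream": true` (the default), `/api/chat` returns newline-delimited JSON, one fragment per line. Here's a minimal sketch of reassembling the reply in Python — the `extract_chat_text` helper is mine, and the commented-out request assumes a local server plus the third-party `requests` package:

```python
import json

def extract_chat_text(ndjson_lines):
    """Join the message fragments from a streamed /api/chat response.

    Each line is a JSON object; the text lives under message.content,
    and the final chunk carries "done": true.
    """
    parts = []
    for line in ndjson_lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Against a running server:
# import requests
# resp = requests.post(
#     "http://localhost:11434/api/chat",
#     json={"model": "llama3.1",
#           "messages": [{"role": "user", "content": "Hi"}]},
#     stream=True,
# )
# print(extract_chat_text(resp.iter_lines(decode_unicode=True)))
```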

REST API: Generate

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Write a Python function to sort a list",
  "stream": false
}'
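`/api/generate` also accepts an `options` object for sampling parameters such as `temperature` and `num_predict` — they go inside `options`, not at the top level. A small payload-building sketch (the helper name and defaults are my own):

```python
def build_generate_payload(model, prompt, temperature=0.2, num_predict=256):
    """Build a /api/generate request body with sampling options.

    Sampling knobs live under "options"; putting them at the top
    level silently does nothing.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "temperature": temperature,   # lower = more deterministic
            "num_predict": num_predict,   # cap on generated tokens
        },
    }

# curl equivalent: POST this dict as JSON to http://localhost:11434/api/generate
```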

OpenAI-Compatible Endpoint

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama',  // required but unused
});

const response = await client.chat.completions.create({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(response.choices[0].message.content);

Python

import openai

client = openai.OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

response = client.chat.completions.create(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Explain recursion'}]
)
print(response.choices[0].message.content)
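The same client can stream tokens with `stream=True`; each chunk carries a delta whose `content` may be `None` on the final chunk. A sketch (the `delta_text` helper is mine; the loop assumes a running server):

```python
def delta_text(chunk):
    """Pull the text fragment out of one streamed chunk (None-safe)."""
    return chunk.choices[0].delta.content or ""

# stream = client.chat.completions.create(
#     model='llama3.1',
#     messages=[{'role': 'user', 'content': 'Explain recursion'}],
#     stream=True,
# )
# for chunk in stream:
#     print(delta_text(chunk), end='', flush=True)
```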

Embeddings

curl http://localhost:11434/api/embed -d '{
  "model": "nomic-embed-text",
  "input": "The quick brown fox jumps over the lazy dog"
}'
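The response contains an `embeddings` array of vectors; comparing two texts is then a cosine similarity away. A pure-Python sketch (the commented-out call assumes the `requests` package and a local server):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# import requests
# resp = requests.post("http://localhost:11434/api/embed",
#     json={"model": "nomic-embed-text",
#           "input": ["first document", "second document"]}).json()
# v1, v2 = resp["embeddings"]
# print(cosine(v1, v2))
```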

Create Custom Model

# Modelfile
FROM llama3.1
SYSTEM You are a helpful coding assistant. Always provide code examples.
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
ollama create code-helper -f Modelfile
ollama run code-helper

List Models

curl http://localhost:11434/api/tags
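`/api/tags` returns a `models` array where each entry has a `name` and a `size` in bytes. A small sketch that turns it into readable (name, GB) pairs — the helper is my own naming:

```python
def summarize_tags(payload):
    """Turn a /api/tags response into (name, size-in-GB) pairs."""
    return [
        (m["name"], round(m["size"] / 1e9, 1))
        for m in payload.get("models", [])
    ]

# import requests
# payload = requests.get("http://localhost:11434/api/tags").json()
# for name, gb in summarize_tags(payload):
#     print(f"{name}: {gb} GB")
```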

Real-World Use Case

A startup prototyped their AI feature with GPT-4 ($2,000/mo). Before launch, they switched to Ollama + Llama 3.1 70B on their own GPU server. Same quality for their use case, $0/mo API costs, and customer data stays private — crucial for their healthcare clients.


Need to automate data collection? Check out my Apify actors for ready-made scrapers, or email spinov001@gmail.com for custom solutions.
