Alex Spinov
Ollama Has a Free API — Here's How to Run LLMs Locally Without OpenAI

Ollama lets you run LLMs locally — Llama 3, Mistral, Gemma, Phi, CodeLlama — with a single command. OpenAI-compatible API, zero cloud costs, complete privacy.

Why Ollama?

  • One command: ollama run llama3 and you're chatting
  • OpenAI-compatible: Same API format
  • Private: Data never leaves your machine
  • Free: No API keys, no costs
  • GPU + CPU: Works on both
  • 100+ models: Llama 3, Mistral, Gemma, Phi, CodeLlama

Install

curl -fsSL https://ollama.com/install.sh | sh

Run a Model

# Download and chat
ollama run llama3.1

# Specific size
ollama run llama3.1:70b

# Code model
ollama run codellama:34b

REST API: Chat

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Explain Docker in 3 sentences"}],
  "stream": false
}'
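With `"stream": true` (the default), `/api/chat` returns newline-delimited JSON, one fragment per line. Here's a minimal sketch of reassembling the reply in Python — the `extract_chat_text` helper is mine, and the commented-out request assumes a local server plus the third-party `requests` package:

```python
import json

def extract_chat_text(ndjson_lines):
    """Join the message fragments from a streamed /api/chat response.

    Each line is a JSON object; the text lives under message.content,
    and the final chunk carries "done": true.
    """
    parts = []
    for line in ndjson_lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Against a running server:
# import requests
# resp = requests.post(
#     "http://localhost:11434/api/chat",
#     json={"model": "llama3.1",
#           "messages": [{"role": "user", "content": "Hi"}]},
#     stream=True,
# )
# print(extract_chat_text(resp.iter_lines(decode_unicode=True)))
```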

REST API: Generate

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Write a Python function to sort a list",
  "stream": false
}'
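`/api/generate` also accepts an `options` object for sampling parameters such as `temperature` and `num_predict` — they go inside `options`, not at the top level. A small payload-building sketch (the helper name and defaults are my own):

```python
def build_generate_payload(model, prompt, temperature=0.2, num_predict=256):
    """Build a /api/generate request body with sampling options.

    Sampling knobs live under "options"; putting them at the top
    level silently does nothing.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "temperature": temperature,   # lower = more deterministic
            "num_predict": num_predict,   # cap on generated tokens
        },
    }

# curl equivalent: POST this dict as JSON to http://localhost:11434/api/generate
```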

OpenAI-Compatible Endpoint

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama',  // required but unused
});

const response = await client.chat.completions.create({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(response.choices[0].message.content);

Python

import openai

client = openai.OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

response = client.chat.completions.create(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Explain recursion'}]
)
print(response.choices[0].message.content)
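The same client can stream tokens with `stream=True`; each chunk carries a delta whose `content` may be `None` on the final chunk. A sketch (the `delta_text` helper is mine; the loop assumes a running server):

```python
def delta_text(chunk):
    """Pull the text fragment out of one streamed chunk (None-safe)."""
    return chunk.choices[0].delta.content or ""

# stream = client.chat.completions.create(
#     model='llama3.1',
#     messages=[{'role': 'user', 'content': 'Explain recursion'}],
#     stream=True,
# )
# for chunk in stream:
#     print(delta_text(chunk), end='', flush=True)
```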

Embeddings

curl http://localhost:11434/api/embed -d '{
  "model": "nomic-embed-text",
  "input": "The quick brown fox jumps over the lazy dog"
}'
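The response contains an `embeddings` array of vectors; comparing two texts is then a cosine similarity away. A pure-Python sketch (the commented-out call assumes the `requests` package and a local server):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# import requests
# resp = requests.post("http://localhost:11434/api/embed",
#     json={"model": "nomic-embed-text",
#           "input": ["first document", "second document"]}).json()
# v1, v2 = resp["embeddings"]
# print(cosine(v1, v2))
```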

Create Custom Model

# Modelfile
FROM llama3.1
SYSTEM You are a helpful coding assistant. Always provide code examples.
PARAMETER temperature 0.3
PARAMETER num_ctx 4096
ollama create code-helper -f Modelfile
ollama run code-helper

List Models

curl http://localhost:11434/api/tags
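`/api/tags` returns a `models` array where each entry has a `name` and a `size` in bytes. A small sketch that turns it into readable (name, GB) pairs — the helper is my own naming:

```python
def summarize_tags(payload):
    """Turn a /api/tags response into (name, size-in-GB) pairs."""
    return [
        (m["name"], round(m["size"] / 1e9, 1))
        for m in payload.get("models", [])
    ]

# import requests
# payload = requests.get("http://localhost:11434/api/tags").json()
# for name, gb in summarize_tags(payload):
#     print(f"{name}: {gb} GB")
```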

Real-World Use Case

A startup prototyped their AI feature with GPT-4 ($2,000/mo). Before launch, they switched to Ollama + Llama 3.1 70B on their own GPU server. Same quality for their use case, $0/mo API costs, and customer data stays private — crucial for their healthcare clients.


Need to automate data collection? Check out my Apify actors for ready-made scrapers, or email spinov001@gmail.com for custom solutions.
