Alex Spinov

Ollama Has a Free API — Here's How to Run LLMs Locally and Query Them

Ollama lets you run large language models locally on your machine. It exposes a REST API on localhost:11434, including an OpenAI-compatible endpoint, and it's completely free: no API keys, no accounts.

Installation

curl -fsSL https://ollama.com/install.sh | sh
# or download from ollama.com

Pull and Run Models

# Pull a model
ollama pull llama3.2
ollama pull codellama
ollama pull mistral

# Chat in terminal
ollama run llama3.2 "Explain web scraping in 3 sentences"

REST API — Generate

const response = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.2",
    prompt: "Write a Python function to scrape a webpage",
    stream: false
  })
});
const data = await response.json();
console.log(data.response);

Chat API (OpenAI Compatible)

const response = await fetch("http://localhost:11434/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.2",
    messages: [
      { role: "system", content: "You are a helpful coding assistant." },
      { role: "user", content: "How do I parse JSON in Go?" }
    ]
  })
});
const data = await response.json();
console.log(data.choices[0].message.content);
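For multi-turn conversations, you keep appending to the `messages` array and resending the whole history with each request. A tiny helper for that (a sketch; the `addTurn` name is illustrative):

```javascript
// Append one completed exchange (user message + assistant reply) to the history.
function addTurn(history, userContent, assistantContent) {
  return [
    ...history,
    { role: "user", content: userContent },
    { role: "assistant", content: assistantContent }
  ];
}

let history = [{ role: "system", content: "You are a helpful coding assistant." }];
history = addTurn(history, "How do I parse JSON in Go?", "Use encoding/json...");
console.log(history.length); // 3: system + user + assistant
```

Pass the updated `history` as `messages` on the next request so the model sees the full conversation.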

Streaming Responses

const response = await fetch("http://localhost:11434/api/generate", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ model: "llama3.2", prompt: "Write a haiku about code", stream: true })
});

// The stream is newline-delimited JSON. A single read() can return several
// lines (or a partial one), so buffer the text and split on newlines instead
// of parsing each chunk directly.
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // keep any incomplete trailing line for the next read
  for (const line of lines) {
    if (line.trim()) process.stdout.write(JSON.parse(line).response);
  }
}

Embeddings

Any pulled model works here, though a dedicated embedding model such as nomic-embed-text is usually a better fit than a chat model:

const response = await fetch("http://localhost:11434/api/embeddings", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.2",
    prompt: "Web scraping best practices"
  })
});
const { embedding } = await response.json();
console.log(`Embedding dimensions: ${embedding.length}`);
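Once you have vectors, the usual next step is comparing them, e.g. for semantic search. A minimal cosine-similarity helper (a sketch; the function name and sample vectors are illustrative):

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|). Ranges from -1 to 1;
// higher means the two embeddings point in more similar directions.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 2, 3], [1, 2, 3])); // ≈ 1 (identical direction)
console.log(cosineSimilarity([1, 0], [0, 1]));       // 0 (orthogonal)
```

Rank documents by similarity to a query embedding to build a simple local semantic search.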

Model Management

ollama list              # List installed models
ollama show llama3.2     # Model details
ollama rm codellama      # Remove a model
ollama cp llama3.2 my-model  # Copy/customize
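The cp command just duplicates a model; for real customization, Ollama supports Modelfiles that set a base model, system prompt, and parameters. A minimal sketch (the my-assistant name and the settings are illustrative):

```
# Modelfile
FROM llama3.2
SYSTEM "You are a concise coding assistant."
PARAMETER temperature 0.7
```

Build and run it with `ollama create my-assistant -f Modelfile` followed by `ollama run my-assistant`.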

Need to extract or automate web content at scale? Check out my web scraping tools on Apify — no coding required. Or email me at spinov001@gmail.com for custom solutions.
