Vishnu Damwala

Originally published at meshworld.in

Ollama Cheat Sheet: Local LLMs, Models, API & Integration (2026)

TL;DR

  • Ollama runs open LLMs locally: Llama 3.3, Mistral, Gemma, DeepSeek, Qwen, Phi, and vision models
  • ollama run llama3.3: pull and start in one command
  • REST API on localhost:11434: chat (/api/chat), text generation (/api/generate), and embeddings (/api/embed)
  • Python and JavaScript libraries for integration
  • Modelfile for customizing system prompts, temperature, and context window
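A Modelfile is a short text file that layers settings on top of a base model. The sketch below renders one from Python. FROM, SYSTEM, and PARAMETER (with temperature and num_ctx) are standard Modelfile directives. The helper name render_modelfile and the example values are hypothetical and only for illustration.

```python
def render_modelfile(base: str, system: str, temperature: float, num_ctx: int) -> str:
    """Return Modelfile text; FROM/SYSTEM/PARAMETER are standard Modelfile directives."""
    lines = [
        f"FROM {base}",  # base model tag to build on
        f'SYSTEM """{system}"""',  # triple quotes allow a multi-line system prompt
        f"PARAMETER temperature {temperature}",  # lower = more deterministic output
        f"PARAMETER num_ctx {num_ctx}",  # context window in tokens
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example values; save the output as a file named Modelfile.
    print(render_modelfile(
        base="llama3.3",
        system="You are a code reviewer. Be concise.",
        temperature=0.2,
        num_ctx=8192,
    ), end="")
```

After saving the output as Modelfile, ollama create code-reviewer -f Modelfile registers the customized model, and ollama run code-reviewer starts it. The name code-reviewer is hypothetical.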

Ollama made local LLMs accessible to anyone who can run a Docker container. You do not have to download and manage Hugging Face model files by hand, and you do not have to configure an inference server. The command ollama run llama3.3 downloads the model and opens a chat session. Ollama has become one of the most common ways to run open models locally, and its library offers many community-uploaded models alongside the official catalog.


Quick Start

# Install on Linux (macOS/Windows: download the app from ollama.com; Docker: use the ollama/ollama image)
curl -fsSL https://ollama.com/install.sh | sh

# Pull and run a model (the default llama3.3 tag is the 70B model; check the memory table below)
ollama run llama3.3

# Run a single prompt
ollama run llama3.3 "Explain Docker containers in one sentence."

# REST API
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1:8b",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}'
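You can make the same /api/chat call from Python with only the standard library. The sketch below assumes a local server on the default port 11434, and its helper names are hypothetical. With "stream": false, the server returns a single JSON object, and its message.content field holds the reply. With streaming on (the default), it returns newline-delimited JSON objects. Each object carries a fragment of message.content, and the last one has "done": true.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local endpoint


def build_chat_payload(model: str, prompt: str, stream: bool = False) -> bytes:
    """Encode a single-turn /api/chat request body as JSON bytes."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return json.dumps(body).encode("utf-8")


def join_stream(lines) -> str:
    """Concatenate message.content from NDJSON lines (str or bytes) until done is true."""
    parts = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between objects
        obj = json.loads(raw)
        parts.append(obj.get("message", {}).get("content", ""))
        if obj.get("done"):
            break
    return "".join(parts)


def chat(model: str, prompt: str) -> str:
    """Send a non-streaming request to a running Ollama server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_chat_payload(model, prompt, stream=False),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Calling chat("llama3.3", "Hello") requires a running server. For streaming, build the request with stream=True and pass the response object to join_stream, because iterating an HTTP response yields one line at a time.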

Popular Models (May 2026)

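| Model | Parameters | Approx. memory (VRAM or unified memory) |
|---|---|---|
| llama3.3:70b | 70B | ~48 GB |
| llama3.1:8b | 8B | ~8 GB |
| deepseek-r1:8b | 8B | ~8 GB |
| qwen3:8b | 8B | ~8 GB |
| gemma3:4b | 4B | ~4 GB |
| phi4-mini:3.8b | 3.8B | ~4 GB |
| llava:7b | 7B | ~12 GB |

Memory figures are rough estimates for the default 4-bit quantized tags. Larger context windows need more memory.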
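You can estimate memory needs like these with simple arithmetic. Weight size is roughly the parameter count times the bits per weight, divided by 8. On top of that, you need headroom for the KV cache and runtime buffers. The sketch below makes two assumptions, and both are rough: about 4.5 bits per weight as a stand-in for a 4-bit quantization, and a flat 20% overhead. This is not how Ollama itself sizes models.

```python
def estimate_memory_gb(params_billion: float, bits_per_weight: float = 4.5,
                       overhead: float = 0.2) -> float:
    """Rough memory estimate in GB: weight bytes plus a flat overhead fraction.

    Assumptions (not Ollama's sizing logic): ~4.5 bits/weight approximates a
    4-bit quantization; a 20% overhead stands in for KV cache and buffers.
    """
    # billions of params * bits per weight / 8 bits per byte = gigabytes
    weight_gb = params_billion * bits_per_weight / 8
    return round(weight_gb * (1 + overhead), 1)


for label, size in [("70B", 70), ("8B", 8), ("4B", 4)]:
    print(label, estimate_memory_gb(size), "GB at ~4-bit,",
          estimate_memory_gb(size, bits_per_weight=16), "GB at FP16")
```

By this estimate, an unquantized (FP16) 70B model needs well over 100 GB, which is why quantized tags make large models practical on a single machine.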

Python Integration

pip install ollama
import ollama

response = ollama.chat(
    model='llama3.1:8b',
    messages=[
        {'role': 'system', 'content': 'You are a code reviewer.'},
        {'role': 'user', 'content': 'Review: def add(a, b): return a + b'},
    ]
)
print(response['message']['content'])

# Streaming: stream=True makes chat() return an iterator of partial responses
for chunk in ollama.chat(model='llama3.1:8b',
                         messages=[{'role': 'user', 'content': 'Hello'}],
                         stream=True):
    print(chunk['message']['content'], end='', flush=True)
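ollama.chat is stateless. The model only sees the messages list you send, so a multi-turn chat means appending each reply to the history and resending the whole list. Below is a minimal sketch. The Conversation class is hypothetical. The chat function is injectable so the logic can be tested without a running server, and by default it uses ollama.chat from the official package.

```python
from typing import Callable, Optional


class Conversation:
    """Keep a running message history for a stateless chat endpoint (hypothetical helper)."""

    def __init__(self, model: str, system: Optional[str] = None,
                 chat_fn: Optional[Callable] = None):
        self.model = model
        self.messages = []
        if system:
            self.messages.append({"role": "system", "content": system})
        if chat_fn is None:
            import ollama  # imported lazily so a fake chat_fn can be injected instead
            chat_fn = ollama.chat
        self.chat_fn = chat_fn

    def ask(self, prompt: str) -> str:
        self.messages.append({"role": "user", "content": prompt})
        # Resend the full history every turn; the server keeps no session state.
        response = self.chat_fn(model=self.model, messages=self.messages)
        reply = response["message"]["content"]
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

With a server running, create convo = Conversation("llama3.3", system="You are a code reviewer.") and call convo.ask(...) once per turn. Long histories eventually exceed the model's context window (num_ctx), so real applications usually trim or summarize older turns.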
