DEV Community

Vishnu Damwala

Posted on • Originally published at meshworld.in

Ollama Cheat Sheet: Local LLMs, Models, API & Integration (2026)

TL;DR

  • Ollama runs open LLMs locally: Llama 3.3, Mistral, Gemma, DeepSeek, Qwen, Phi, and vision models
  • ollama run llama3.3 pulls the model and starts a chat in one command
  • REST API on localhost:11434 for chat, generate, and embeddings
  • Python and JavaScript libraries for integration
  • Modelfile for customizing system prompts, temperature, and context window

Ollama made local LLMs accessible to anyone who could run a Docker container. No Hugging Face model downloads to manage manually. No inference server configuration. ollama run llama3.3 downloads the model and opens a chat session. It has become the standard way to run open models locally, with a library of thousands of community models on top of the official catalog.


Quick Start

# Install on Linux (macOS and Windows installers: https://ollama.com/download; Docker image: ollama/ollama)
curl -fsSL https://ollama.com/install.sh | sh

# Pull and run a model
ollama run llama3.3

# Run a single prompt
ollama run llama3.3 "Explain Docker containers in one sentence."

# REST API
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.3",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}'
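The curl call above can also be made from Python without any third-party package. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat_once` are illustrative, not part of Ollama, and the response is assumed to carry the reply under `message.content`, as it does for a non-streaming `/api/chat` call.

```python
import json
import urllib.request

# Default local address, taken from the curl example
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming POST request for /api/chat."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat_once(model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["message"]["content"]
```

With the Ollama server running and the model already pulled, `chat_once("llama3.3", "Hello")` should return the reply as a plain string. Keeping request construction separate from sending makes the payload easy to inspect or log before anything goes over the wire.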

Popular Models (May 2026)

Model             Parameters   Approx. memory
llama3.3:70b      70B          ~128GB VRAM
llama3.1:8b       8B           ~8GB VRAM
deepseek-r1:8b    8B           ~8GB VRAM
qwen3:8b          8B           ~8GB VRAM
gemma3:4b         4B           ~4GB VRAM
phi4-mini:3.8b    3.8B         ~4GB VRAM
llava:7b          7B           ~12GB VRAM
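Memory figures like these depend heavily on quantization. As a rough cross-check, the size of the weights alone can be estimated as parameter count times bytes per weight. The sketch below is a rule of thumb, not something from Ollama itself: the function name is made up for illustration, and the estimate ignores the KV cache, context length, and runtime overhead, so real usage will be higher.

```python
def estimate_weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough size of the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # (params_billions * 1e9 weights * bytes_per_weight) / 1e9 bytes per GB
    return params_billions * bytes_per_weight


# Compare a small and a large model at common precisions
for name, params in [("8B", 8), ("70B", 70)]:
    for bits in (4, 8, 16):
        size = estimate_weight_memory_gb(params, bits)
        print(f"{name} at {bits}-bit: ~{size:.0f} GB for weights")
```

Under this estimate, an 8B model needs about 8 GB at 8 bits per weight and about half that at 4 bits, which is why the same model can fit very different hardware depending on the quantized variant that is pulled.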

Python Integration

# Install the Python client (shell)
pip install ollama

# Chat with a local model (Python)
import ollama

response = ollama.chat(
    model='llama3.3',
    messages=[
        {'role': 'system', 'content': 'You are a code reviewer.'},
        {'role': 'user', 'content': 'Review: def add(a, b): return a + b'},
    ]
)
print(response['message']['content'])

# Streaming: stream=True makes chat() yield chunks as they are generated
for chunk in ollama.chat(model='llama3.3',
                         messages=[{'role': 'user', 'content': 'Hello'}],
                         stream=True):
    print(chunk['message']['content'], end='', flush=True)
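When streaming, it is often useful to keep the full reply as well as printing it live. The helper below is a small sketch of that pattern; `collect_stream` is a hypothetical name, and the stand-in chunks only mimic the `chunk['message']['content']` shape used in the loop above so the example runs without a server.

```python
from typing import Iterable, Mapping


def collect_stream(chunks: Iterable[Mapping]) -> str:
    """Print each streamed fragment as it arrives and return the full reply."""
    parts = []
    for chunk in chunks:
        piece = chunk["message"]["content"]
        print(piece, end="", flush=True)
        parts.append(piece)
    print()  # finish the line once the stream ends
    return "".join(parts)


# Stand-in chunks with the same shape the streaming loop reads (illustrative data)
fake_chunks = [
    {"message": {"content": "Hel"}},
    {"message": {"content": "lo"}},
]
reply = collect_stream(fake_chunks)
```

Against a live server, the same helper can wrap the iterator returned by `ollama.chat(..., stream=True)`, which leaves the complete text available for logging or for appending to the message history.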
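The TL;DR mentions Modelfiles for setting the system prompt, temperature, and context window. A minimal sketch of generating one from Python follows. The `FROM`, `PARAMETER`, and `SYSTEM` instructions and the `temperature` and `num_ctx` parameters are standard Modelfile syntax; the function name and the example values are assumptions chosen for illustration.

```python
def render_modelfile(base_model: str, system_prompt: str,
                     temperature: float, context_window: int) -> str:
    """Return Modelfile text that customizes a base model."""
    lines = [
        f"FROM {base_model}",                       # model to build on
        f"PARAMETER temperature {temperature}",     # lower values give steadier output
        f"PARAMETER num_ctx {context_window}",      # context window size in tokens
        f'SYSTEM """{system_prompt}"""',            # system prompt baked into the model
    ]
    return "\n".join(lines) + "\n"


# Example values are illustrative only
modelfile_text = render_modelfile(
    base_model="llama3.3",
    system_prompt="You are a code reviewer.",
    temperature=0.2,
    context_window=8192,
)
print(modelfile_text)
```

Saving the output to a file named Modelfile and running `ollama create code-reviewer -f Modelfile` should register a custom model that `ollama run code-reviewer` can start; the name code-reviewer is a placeholder.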

