DEV Community

Lingdas1

Posted on • Originally published at github.com

Getting Started: Run Your First Local LLM in 5 Minutes

01 — Getting Started: Run Your First Local LLM (5 Minutes)

🟢 Beginner — No experience needed. Just a computer and 5 minutes.


What Is a Local LLM? (Plain English)

An LLM (Large Language Model) is the brain behind ChatGPT, Claude, and Gemini.

A local LLM runs that brain on your own computer — not on someone else's server.

Why does that matter?

| Cloud AI (ChatGPT, Claude) | Local AI (Ollama + models) |
| --- | --- |
| $20–$200/month subscription | $0, completely free |
| Your data is sent to their servers | Private: everything stays on your machine |
| Requires internet | Works offline |
| Censored, filtered, rate-limited | No limits: you control everything |
| One-size-fits-all model | Choose any model for any task |

💡 Think of it this way: Cloud AI is like renting a car. Local AI is like owning a bicycle. The bicycle is slower, but it's yours, it's free, and nobody can take it away from you.


What You Need

Minimum requirements:

  • A computer (Windows, macOS, or Linux)
  • At least 8 GB of RAM (16 GB recommended)
  • A few GB of free disk space

Nice to have (but not required):

  • A GPU with 4+ GB VRAM (models run faster, but CPU is fine to start)

My setup: I'm running this on a [your hardware] with [your specs]. If it works for me, it'll work for you.


Step 1: Install Ollama

Ollama is the easiest way to run local LLMs. Think of it as the "App Store for AI models."

macOS

Download the macOS app from ollama.com/download and open it. (The install.sh script shown under Linux has historically been Linux-only, so it may refuse to run on a Mac.)

Linux

curl -fsSL https://ollama.com/install.sh | sh

Windows

Download the installer from ollama.com/download and run it.

Verify Installation

Open a new terminal and type:

ollama --version

You should see something like:

ollama version is 0.6.0

🔥 Pro tip: If you get "command not found" on Linux/macOS, restart your terminal or run: export PATH=$PATH:/usr/local/bin


Step 2: Pull Your First Model

Now for the fun part — downloading an actual AI brain to run on your computer.

ollama pull qwen2.5:7b

This downloads a 4.7 GB model. That takes 2–5 minutes on a fast connection (roughly 125–300 Mbit/s) and proportionally longer on a slower one.

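While it downloads, here's what's happening:

  • Ollama is downloading the model weights as a GGUF file (a single-file format that stores the quantized, and therefore smaller, model)
  • It's also fetching small companion files, such as the prompt template and default parameters
  • GPU detection and loading the model into memory come later, the first time you run the model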

What if the download is too big? Try a smaller model:

# For 8 GB RAM laptops — works on almost anything
ollama pull qwen2.5:1.5b

# For 4 GB RAM or very old computers
ollama pull qwen2.5:0.5b
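To get a feel for how long a pull will take on your own connection, the arithmetic is simple enough to sketch in a few lines of Python. This is only a rough estimate: the model sizes are the approximate figures quoted in this guide, the link speeds are example values, and the function name is made up for illustration.

```python
# Rough download-time estimate. Sizes are in gigabytes, speeds in megabits/s.
# Real downloads add protocol overhead, so treat the result as a lower bound.

def download_minutes(size_gb: float, speed_mbit_s: float) -> float:
    """Return the estimated download time in minutes."""
    size_megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB, 1 byte = 8 bits
    return size_megabits / speed_mbit_s / 60


# Approximate sizes quoted in this guide (assumed, check `ollama list` after pulling).
MODEL_SIZES_GB = {"qwen2.5:7b": 4.7, "qwen2.5:1.5b": 1.1, "qwen2.5:0.5b": 0.5}

for tag, size_gb in MODEL_SIZES_GB.items():
    for speed in (25, 100, 300):  # example link speeds in Mbit/s
        minutes = download_minutes(size_gb, speed)
        print(f"{tag:>13} at {speed:>3} Mbit/s: {minutes:5.1f} min")
```

If the 7B estimate looks painful for your connection, that is a good reason to start with one of the smaller tags.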

Step 3: Chat With Your Model

ollama run qwen2.5:7b

You'll see a prompt like >>>. Type something:

>>> Write a haiku about a cat sitting on a computer

The model will think for a moment and then respond. Congratulations — you just ran an AI on your own hardware! 🎉

Try These First Commands

>>> Write a Python function to calculate fibonacci

>>> Explain quantum computing like I'm 10

>>> What's the meaning of life?

>>> /? -- show all available commands

>>> /bye -- quit the chat (Ctrl+D also works)

⚠️ Expect it to be slower than ChatGPT. That's normal! As a rough guide, local models generate 15–40 tokens per second on a GPU and 2–6 tokens per second on CPU. GPU speeds are well above reading pace; CPU speeds are close to it or somewhat slower.
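To compare those token rates with reading speed, here is a small sketch of the conversion. It rests on two assumptions that are not from this guide: about 0.75 English words per token (a common rule of thumb) and a reading speed of about 250 words per minute. Both constants and the function names are illustrative.

```python
WORDS_PER_TOKEN = 0.75  # assumption: rule of thumb for English text
READING_WPM = 250       # assumption: typical silent-reading speed


def words_per_minute(tokens_per_second: float) -> float:
    """Convert a generation speed in tokens/s to words/min."""
    return tokens_per_second * WORDS_PER_TOKEN * 60


def keeps_up_with_reader(tokens_per_second: float, reading_wpm: float = READING_WPM) -> bool:
    """True if the model writes at least as fast as the assumed reading speed."""
    return words_per_minute(tokens_per_second) >= reading_wpm


for tps in (2, 6, 15, 40):  # the CPU and GPU ranges quoted above
    verdict = "keeps up" if keeps_up_with_reader(tps) else "slower than reading"
    print(f"{tps:>2} tok/s is about {words_per_minute(tps):6.0f} words/min ({verdict})")
```

Under these assumptions, the low end of the CPU range is noticeably slower than reading, while anything in the GPU range is comfortably faster.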


Step 4: Choose the Right Model for Your Hardware

Not sure which model to pick? Use this decision tree:

Your GPU VRAM?
├── No GPU (CPU only)
│   ├── 32 GB RAM → qwen2.5:7b (slow but works)
│   ├── 16 GB RAM → qwen2.5:1.5b
│   └── 8 GB RAM  → qwen2.5:0.5b
├── 4–6 GB VRAM   → qwen2.5:7b
├── 8–12 GB VRAM  → deepseek-r1:14b (🟢 BEST for most people)
├── 16 GB VRAM    → deepseek-r1:32b (tight; part of the model may run on the CPU)
├── 24 GB VRAM    → qwen3.6:27b or deepseek-r1:32b (Q4)
└── 48+ GB VRAM   → deepseek-r1:70b or qwen2.5:72b

Model Comparison Table

| Model | Ollama Command | Size (Disk) | Min RAM | Min VRAM | Quality |
| --- | --- | --- | --- | --- | --- |
| Qwen 2.5 0.5B | ollama pull qwen2.5:0.5b | 0.5 GB | 4 GB | None | Basic text |
| Qwen 2.5 1.5B | ollama pull qwen2.5:1.5b | 1.1 GB | 8 GB | None | Simple tasks |
| Qwen 2.5 7B | ollama pull qwen2.5:7b | 4.7 GB | 8 GB | 4 GB | 🟢 Good start |
| Qwen 2.5 14B | ollama pull qwen2.5:14b | 9.0 GB | 16 GB | 8 GB | Excellent |
| DeepSeek-R1 14B | ollama pull deepseek-r1:14b | 8.2 GB | 16 GB | 8 GB | 🏆 Best value |
| DeepSeek-R1 32B | ollama pull deepseek-r1:32b | 18.7 GB | 32 GB | 16 GB | Strong reasoning |
| Qwen 3.6 27B | ollama pull qwen3.6:27b | 15 GB | 32 GB | 16 GB | Cutting-edge |
| Llama 3.1 8B | ollama pull llama3.1:8b | 4.9 GB | 8 GB | 4 GB | Good general |

My recommendation for first-timers: Start with qwen2.5:7b. It runs on almost anything, and it's good enough to be genuinely useful.
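If you would rather compute a suggestion than read the table, the Min RAM and Min VRAM columns can be turned into a small lookup. The sketch below is one way to do that, not an official sizing tool: it copies a few rows from the table, treats both minimums as hard requirements (so a CPU-only machine is never offered a model with a VRAM minimum, which is stricter than reality), and uses a made-up function name.

```python
from typing import Optional

# (tag, min_ram_gb, min_vram_gb), copied from the comparison table.
# A VRAM minimum of 0 means the table lists "None" (CPU is fine).
MODELS = [
    ("qwen2.5:0.5b", 4, 0),
    ("qwen2.5:1.5b", 8, 0),
    ("qwen2.5:7b", 8, 4),
    ("deepseek-r1:14b", 16, 8),
    ("deepseek-r1:32b", 32, 16),
]


def pick_model(ram_gb: float, vram_gb: float = 0) -> Optional[str]:
    """Return the largest listed model whose RAM and VRAM minimums are both met."""
    best = None
    for tag, min_ram, min_vram in MODELS:  # ordered smallest to largest
        if ram_gb >= min_ram and vram_gb >= min_vram:
            best = tag
    return best


print(pick_model(ram_gb=16, vram_gb=8))  # deepseek-r1:14b
print(pick_model(ram_gb=8))              # qwen2.5:1.5b
```

Treat the result as a starting point. A model that meets the minimums may still feel slow, and stepping down one size is often the more pleasant choice.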


What to Do After Your First Chat

You've run your first local LLM. Now what?

Next steps in order:

| # | Task | Why | Guide |
| --- | --- | --- | --- |
| 1 | Customize your model with a Modelfile | Control temperature, context length, and behavior | GGUF & Modelfile Guide |
| 2 | Install Open WebUI | Get a ChatGPT-like web interface instead of the terminal | Open WebUI Setup |
| 3 | Benchmark your hardware | See what speeds your setup can achieve | Script: ./scripts/ollama-benchmark.sh |
| 4 | Add document search (RAG) | Let your LLM answer questions about your own files | RAG Guide |
| 5 | Try a reasoning model | Switch to DeepSeek-R1 for harder problems | DeepSeek-R1 Guide |
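As a taste of the first task, here is a minimal sketch that builds the text of a Modelfile setting the three things mentioned in the table: temperature, context length, and behavior (via a system prompt). FROM, PARAMETER temperature, PARAMETER num_ctx, and SYSTEM are standard Modelfile instructions; the function name, the default values, and the model name used afterwards are only illustrative.

```python
def build_modelfile(base: str, system: str, temperature: float = 0.7, num_ctx: int = 4096) -> str:
    """Return Modelfile text that customizes an existing Ollama model."""
    lines = [
        f"FROM {base}",                            # model to start from
        f"PARAMETER temperature {temperature}",    # lower = more predictable output
        f"PARAMETER num_ctx {num_ctx}",            # context window in tokens
        f'SYSTEM """{system}"""',                  # instruction applied to every chat
    ]
    return "\n".join(lines) + "\n"


text = build_modelfile("qwen2.5:7b", "Always respond in English.")
print(text)

# To use it, save the text to a file named Modelfile, for example:
#     from pathlib import Path
#     Path("Modelfile").write_text(text, encoding="utf-8")
```

After saving the file, `ollama create my-qwen -f Modelfile` registers the customized model and `ollama run my-qwen` starts a chat with it. The name my-qwen is a placeholder; pick any name you like.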

Common First-Timer Problems (And Fixes)

| Problem | Why | Fix |
| --- | --- | --- |
| "ollama: command not found" | Ollama not in PATH | Restart terminal, or run: export PATH=$PATH:/usr/local/bin |
| Download is very slow | Big file on slow internet | Try ollama pull qwen2.5:1.5b instead (much smaller) |
| Model responds very slowly | Running on CPU | This is normal! See the speed expectations at the end of Step 3 |
| Model responds in Chinese | Qwen models are trained on a lot of Chinese text and sometimes switch languages | Add SYSTEM "Always respond in English." to a Modelfile |
| "CUDA out of memory" | Model too big for your GPU | Use a smaller model or a more heavily quantized tag (for example q4 instead of q8) |
| "Connection refused" | Ollama server not running | Run ollama serve in a separate terminal first |

Quick Reference: Common Ollama Commands

# List all downloaded models
ollama list

# Show currently running models
ollama ps

# Delete a model to free space
ollama rm qwen2.5:7b

# Update a model to the latest version
ollama pull qwen2.5:7b

# Run a model with a one-shot prompt (non-interactive)
ollama run qwen2.5:7b "Write a Python script to download images from a URL"

# Use the API (OpenAI compatible)
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5:7b", "messages": [{"role": "user", "content": "Hello!"}]}'
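The same API call can be made from Python using only the standard library. The sketch below mirrors the curl command above: same URL, same JSON body. It assumes the Ollama server is running on the default port and that the response follows the OpenAI chat-completions shape (choices, then message, then content); the function names are made up for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"


def build_chat_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send one user message and return the model's reply text."""
    request = build_chat_request(model, prompt)
    # Generous timeout: the first call may need to load the model into memory.
    with urllib.request.urlopen(request, timeout=300) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (needs a running Ollama server and a pulled model):
#     print(chat("qwen2.5:7b", "Hello!"))
```

Building the request separately from sending it keeps the payload easy to inspect when something goes wrong, for example a "Connection refused" error because the server is not running.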

Your First Week Plan

| Day | Task | Time |
| --- | --- | --- |
| Day 1 | Install Ollama + pull a model + chat with it | 5 minutes ✅ |
| Day 2 | Try different models (small vs large) | 15 minutes |
| Day 3 | Customize with a Modelfile | 30 minutes |
| Day 4 | Install Open WebUI | 30 minutes |
| Day 5 | Ask your LLM to write code or help with real work | 1 hour |
| Weekend | Try RAG: let your LLM read your documents | 1 hour |
🎯 You've taken the first step. Running a local LLM is like learning to ride a bike — wobbly at first, but once you get it, you'll wonder why you didn't start sooner.

Found this helpful? ⭐ Star the repo — it helps others find it too.

— Ling, a medical student who accidentally fell into AI and wants to help you do the same.
