You don't need a $10K rig or a cloud subscription to run AI locally. These five models work on a normal laptop — I tested them on a ThinkPad with 16GB RAM.
1. Phi-3 Mini (3.8B parameters)
Microsoft's tiny powerhouse. It outperforms models twice its size on reasoning tasks.
ollama pull phi3:mini
Use case: Quick Q&A, brainstorming, text formatting.
RAM needed: ~4GB
Speed on my laptop: ~25 tokens/sec
I use this daily for "how do I..." questions where I'd normally Google.
2. Mistral 7B
The best 7B model, period. French engineering at its finest.
ollama pull mistral:7b
Use case: General-purpose chat, summarization, translation.
RAM needed: ~6GB
Speed on my laptop: ~15 tokens/sec
Mistral punches way above its weight. I've had it write passable Python, explain complex papers, and draft emails. It's not GPT-4, but it's free and private.
3. CodeLlama 7B
Meta's code specialist. If you only install one model, make it this one.
ollama pull codellama:7b
Use case: Code completion, debugging, explaining code.
RAM needed: ~6GB
Speed on my laptop: ~18 tokens/sec
Feed it a function and ask "what's wrong here?" — it catches bugs about 70% of the time. Not perfect, but free and instant.
4. LLaVA (7B)
This one handles images. Show it a screenshot, a diagram, or a photo and ask questions about it.
ollama pull llava:7b
Use case: Image analysis, OCR, diagram understanding.
RAM needed: ~8GB
Speed on my laptop: ~10 tokens/sec
I've used it to read text from screenshots, explain architecture diagrams, and even debug CSS by showing it a screenshot of the broken layout.
5. Whisper (via faster-whisper)
Not Ollama-based, but essential. Local speech-to-text that's nearly as good as cloud services.
pip install faster-whisper
from faster_whisper import WhisperModel
model = WhisperModel("base")
segments, info = model.transcribe("audio.mp3")
for segment in segments:
print(segment.text)
Use case: Transcription, subtitles, voice notes.
RAM needed: ~2GB for the "base" model
Honorable Mentions
- Gemma 2B — Google's tiny model, great for very low-RAM machines
- Qwen2 7B — Excellent multilingual support
- DeepSeek Coder 6.7B — Strong alternative to CodeLlama
Performance Tips
-
Use quantized models —
q4_0orq4_K_Mvariants are 2-3x faster with minimal quality loss - Close other apps — AI models compete with your browser for RAM
- SSD matters — Model loading is disk-bound. NVMe > SATA > HDD
The Full Stack
For a complete guide to building your local AI stack (including image generation, voice cloning, and more), check privacy-ai-guide.vercel.app.
And if you occasionally need cloud-level quality without cloud-level tracking, NanoGPT lets you access top models with crypto payments and zero accounts.
The era of needing a cloud subscription for AI is over. These models prove it.
Top comments (0)