I've been running AI models locally on a Mac Mini M4 (64GB unified memory) for three months straight. Not for fun — this machine runs my entire business automation 24/7.
Here's the honest breakdown of every model I've tested, what actually works, and when local LLMs are a waste of time.
## The Setup
- Machine: Mac Mini M4 with 64GB unified memory
- Runtime: Ollama (dead simple, just works)
- Use case: Content generation, code review, summarization, translation
- Models tested: qwen3:30b, devstral-small-2, qwen3:14b, gemma3:27b, qwen3:8b, deepseek-r1:70b, llama3.1:70b
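If you'd rather script against Ollama than type into its CLI, it exposes a local HTTP API (`POST /api/generate` on port 11434 by default). Here's a minimal sketch in Python using only the standard library; the model tag and prompt are just placeholders, swap in whatever you've pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body Ollama's /api/generate endpoint expects."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST a prompt to the local Ollama server and return the response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Requires a running Ollama server with the model already pulled
    print(generate("qwen3:30b", "Explain unified memory in one sentence."))
```

No API keys, no billing dashboard. Everything below runs through some variation of this call.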
Total cost after 3 months: $0 in API fees. The machine paid for itself in month 2.
## The Tier List (Brutal Honesty)

### S-Tier: Daily Drivers
Qwen3 30B — The sweet spot. Fast enough for real-time use, smart enough for 90% of tasks. I use this for:
- Blog post drafts and rewrites
- Korean ↔ English translation (surprisingly good)
- Code explanation and documentation
- First-pass content review
Generation speed: ~25 tokens/sec on M4 64GB. That's fast enough to feel like a conversation, not a waiting game.
Gemma3 27B — Google's dark horse. Better than Qwen for:
- Structured data extraction
- Following complex formatting instructions
- Technical writing with specific constraints
Slightly slower than Qwen3 30B but more reliable at following instructions precisely.
### A-Tier: Specialized Use
Devstral Small 2 — Mistral's coding model. My go-to for code-specific work:
- Refactoring suggestions
- Bug detection in Python/JS
- Generating test cases
Not great for general conversation, but for code? It punches way above its weight class.
Qwen3 14B — The "good enough" model. When 30B is overkill:
- Quick summaries
- Simple translations
- Template filling
Runs at ~40 tokens/sec. For batch processing 50 product descriptions? This is the one.
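Batch jobs like this are trivially scriptable against the local server, since there are no rate limits or per-token costs to worry about. A rough sketch of what a description-rewrite batch might look like; the prompt template and model tag here are my own illustrative choices:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

PROMPT_TEMPLATE = (
    "Rewrite this product description in a concise, friendly tone. "
    "Keep it under 60 words.\n\nDescription:\n{text}"
)

def batch_prompts(descriptions: list[str]) -> list[str]:
    """Turn raw descriptions into ready-to-send prompts."""
    return [PROMPT_TEMPLATE.format(text=d) for d in descriptions]

def run_batch(descriptions: list[str], model: str = "qwen3:14b") -> list[str]:
    """Send each prompt to the local model sequentially. Slow-ish, but free."""
    results = []
    for prompt in batch_prompts(descriptions):
        body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
        req = urllib.request.Request(
            OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            results.append(json.loads(resp.read())["response"])
    return results
```

Fifty descriptions at ~40 tokens/sec finishes while you make coffee, and the meter reads $0.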
### B-Tier: Impressive but Impractical
DeepSeek-R1 70B — The thinking model. It's genuinely smart. The chain-of-thought reasoning is impressive. But:
- ~8 tokens/sec on 64GB (memory pressure is real)
- Takes 30-60 seconds just to start generating
- Eats all your RAM — nothing else runs smoothly
I use it maybe once a week for complex analysis. The rest of the time? Qwen3 30B at 3x the speed gives 95% of the quality.
Llama 3.1 70B — Meta's flagship. Similar problem:
- Too slow for interactive use
- Great quality, terrible experience
- Swap death if you try to multitask
### C-Tier: Skip It
Qwen3 8B — Too dumb for anything that matters. Saves RAM but the quality drop isn't worth it. If you need something this small, just use the API.
## The Numbers That Matter
| Model | Speed (tok/s) | RAM Used | Quality (1-10) | Daily Use? |
|---|---|---|---|---|
| Qwen3 30B | ~25 | 22GB | 8 | ✅ Primary |
| Gemma3 27B | ~22 | 20GB | 8 | ✅ Formatting |
| Devstral Small | ~35 | 12GB | 7 (code: 9) | ✅ Code only |
| Qwen3 14B | ~40 | 11GB | 7 | ✅ Batch jobs |
| DeepSeek-R1 70B | ~8 | 45GB | 9.5 | ⚠️ Weekly |
| Llama 3.1 70B | ~10 | 42GB | 9 | ❌ Retired |
| Qwen3 8B | ~55 | 6GB | 5 | ❌ Too weak |
## When Local LLMs Are a Waste of Time
Let me save you the experimentation:
Don't bother with local if:
- You need GPT-4/Claude-level reasoning consistently
- Your tasks require real-time conversation with users
- You're processing images or audio (multimodal local = pain)
- You need the model to stay updated on current events
Local absolutely wins when:
- Privacy matters (financial data, personal info)
- You're doing batch processing (translate 200 descriptions = $0)
- Uptime is critical (no API outages, no rate limits)
- You're iterating fast (no token counting, no billing anxiety)
## The Hidden Benefit Nobody Talks About
When AI costs $0, you use it differently. I run my LLM on every commit message, every blog draft, every product description — because why not? There's no meter running.
With APIs, I'd think twice about "wasting" tokens on a commit message. With local? I generate 5 variations and pick the best one. The quality compound effect is massive.
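The "generate 5 and pick the best" habit is easy to automate. A toy sketch of the idea: build the prompt, collect candidates from the local model, then pick one with a scoring heuristic. The heuristic below (prefer short subject lines, penalize anything over git's conventional 72-character limit) is purely illustrative:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def commit_prompt(diff_summary: str) -> str:
    """Ask for a one-line, imperative-mood commit subject."""
    return (
        "Write a one-line git commit message (imperative mood, under 72 chars) "
        f"for this change:\n{diff_summary}"
    )

def score(message: str) -> int:
    """Crude heuristic: shorter subjects win; over-72-char subjects are penalized."""
    subject = message.strip().splitlines()[0]
    return len(subject) + (100 if len(subject) > 72 else 0)

def best_of(candidates: list[str]) -> str:
    return min(candidates, key=score)

def generate_candidates(diff_summary: str, n: int = 5, model: str = "qwen3:30b") -> list[str]:
    """Ask the local model for n variations. It's free, so why not."""
    out = []
    for _ in range(n):
        body = json.dumps(
            {"model": model, "prompt": commit_prompt(diff_summary), "stream": False}
        ).encode()
        req = urllib.request.Request(
            OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            out.append(json.loads(resp.read())["response"])
    return out
```

With an API meter running, nobody writes this script. At $0 per call, it's a no-brainer.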
## My Actual Daily Workflow
- 6 AM: Qwen3 30B generates blog drafts from outlines
- 9 AM: Devstral reviews overnight code changes
- 12 PM: Qwen3 14B batch-processes product descriptions
- 3 PM: Gemma3 27B formats and structures data exports
- Night: DeepSeek-R1 70B analyzes weekly business metrics (runs while I sleep)
Total API cost: $0/month
Electricity: ~$8/month (Mac Mini M4 is stupidly efficient)
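Under the hood, that schedule is just a lookup table routing each time slot to a model and a task. A simplified sketch of how such a routing table might look; the model tags match my setup list, but the exact structure here is illustrative:

```python
# Routing table mirroring the daily schedule above.
# (model_tag, task_description) per slot; tags as pulled in Ollama.
DAILY_SCHEDULE = {
    "06:00": ("qwen3:30b", "draft blog posts from outlines"),
    "09:00": ("devstral-small-2", "review overnight code changes"),
    "12:00": ("qwen3:14b", "batch-process product descriptions"),
    "15:00": ("gemma3:27b", "format and structure data exports"),
    "23:00": ("deepseek-r1:70b", "analyze weekly business metrics"),
}

def model_for(slot: str) -> str:
    """Which model handles a given time slot."""
    model, _task = DAILY_SCHEDULE[slot]
    return model

def task_for(slot: str) -> str:
    """Which job runs in a given time slot."""
    _model, task = DAILY_SCHEDULE[slot]
    return task
```

The point of the table: heavy models get the slots where latency doesn't matter, fast models get the interactive hours.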
## Should You Do This?
If you have a Mac with 32GB+: Yes, start with Ollama + Qwen3 (14B for 32GB, 30B for 64GB). You'll be shocked how capable it is.
If you have 16GB or less: Skip it. The experience is terrible. Just use the API.
If you're on Linux with an NVIDIA GPU: Even better. You'll get 2-3x the speed I get on Apple Silicon.
The $600 Mac Mini running local AI 24/7 was the best infrastructure investment I've made this year. Not because any single model beats GPT-4 — it doesn't. But because "free" and "always available" changes how you work.
I run 6 businesses from this Mac Mini using AI agents and local LLMs. If you're building your own automation stack, here are some resources that might help:
📦 The $0 Developer Playbook — The complete free toolkit I use daily
🎮 Indie Game Dev Complete Toolkit — If you're building games on a budget