"Just use ChatGPT" — sure, until your API bill hits $500/month.
I've been running both local and cloud AI for over a year. Here are the real numbers.
The Test Setup
Cloud: OpenAI GPT-4o, Anthropic Claude Sonnet, Google Gemini Pro
Local: Ollama with Qwen 3.5 9B (Mac Mini M4) + Qwen 3 Coder 30B (RTX 3060)
Workload: ~500 queries/day — code review, content generation, customer support, data analysis.
Monthly Cloud API Costs
For 500 queries/day:
- OpenAI GPT-4o (200 queries): ~$90/month
- Anthropic Claude Sonnet (200 queries): ~$72/month
- Google Gemini Pro (100 queries): ~$25/month
- Total: ~$187/month
Monthly Local Setup Costs
- Mac Mini M4 (already owned): $0
- RTX 3060 12GB (used, eBay): $150 one-time
- Electricity 24/7: ~$12/month
- Total: ~$12/month ongoing
Break-even: less than 1 month.
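The arithmetic is simple enough to sanity-check in a few lines. Note that the dollar figures are my own usage estimates from the tables above, not official rate-card prices:

```python
# Rough break-even math for local vs. cloud inference.
# All dollar figures are estimates from my own usage, not official pricing.

CLOUD_MONTHLY = 90 + 72 + 25    # GPT-4o + Claude Sonnet + Gemini Pro
LOCAL_MONTHLY = 12              # electricity for 24/7 operation
HARDWARE_ONE_TIME = 150         # used RTX 3060 12GB

monthly_savings = CLOUD_MONTHLY - LOCAL_MONTHLY
break_even_months = HARDWARE_ONE_TIME / monthly_savings

print(f"Monthly savings: ${monthly_savings}")          # $175
print(f"Break-even: {break_even_months:.2f} months")   # ~0.86 months
```

Even if you doubled the hardware budget to $300, you'd still break even inside two months.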
Quality Comparison (What Surprised Me)
For 80% of daily tasks, local models are good enough:
- General chat: Qwen 3.5 9B is roughly GPT-4o quality (~90%)
- Code generation: Qwen 3 Coder 30B is close to Claude Sonnet (~85-90%)
- Simple Q&A and extraction: any 7B model matches cloud (~95%+)
- Complex multi-step reasoning: cloud still wins here
The Hybrid Approach I Use
User query
  -> Simple? (Q&A, formatting, extraction)
       -> Local Qwen 3.5 9B (free, instant)
  -> Code-heavy?
       -> Local Qwen 3 Coder 30B (free, ~12s)
  -> Complex reasoning?
       -> Cloud Claude Sonnet ($0.003-0.015 per query)
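A minimal version of that router fits in a dozen lines. The keyword heuristics below are illustrative placeholders I made up; in practice you'd use a small classifier, or let the cheap local model do the triage itself:

```python
# Sketch of a query router: cheap local models first, cloud only for
# hard reasoning. The keyword lists are illustrative placeholders.

CODE_HINTS = ("def ", "class ", "refactor", "bug", "stack trace")
REASONING_HINTS = ("step by step", "analyze", "compare", "plan")

def route(query: str) -> str:
    """Return which backend should handle this query."""
    q = query.lower()
    if any(h in q for h in CODE_HINTS):
        return "local/qwen3-coder:30b"   # code-heavy -> local coder model
    if any(h in q for h in REASONING_HINTS):
        return "cloud/claude-sonnet"     # complex reasoning -> cloud
    return "local/qwen3.5:9b"            # default: simple Q&A stays local

print(route("Refactor this function to remove the bug"))     # local coder
print(route("Compare these two architectures step by step")) # cloud
print(route("What's the capital of France?"))                # local 9B
```

The key design choice is that the default path is local and free; only queries that explicitly look expensive get escalated to a paid API.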
Result: cloud costs dropped from ~$187/month to ~$25/month.
Hidden Costs of Cloud
Things people forget:
- Rate limits — hit the ceiling during a deadline? Too bad.
- Latency — 500-2000ms per request vs 100-500ms local
- Privacy — your code and data live on someone else's server
- Vendor lock-in — OpenAI changes pricing, you're stuck
- Downtime — their outage = your workflow stops
Hidden Costs of Local
To be fair, local isn't free of trade-offs either:
- Initial hardware — $150-500 for a GPU (pays off in under a month)
- Setup time — 30 minutes with Ollama these days
- Storage — models are 4-40GB each
- Power — $10-15/month for 24/7 operation
- No frontier models — you won't be running GPT-4-class models locally yet
Getting Started in 10 Minutes
```shell
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull qwen3.5:9b

# Start chatting
ollama run qwen3.5:9b
```
Total time: 10 minutes. Total cost: $0.
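Once a model is pulled, you can also talk to Ollama over its local HTTP API (it listens on localhost:11434 by default), which is how you'd wire it into scripts or the router above. A minimal sketch, assuming the server is running and using only the standard library:

```python
# Query a local Ollama server over its HTTP API (default port 11434).
# Assumes `ollama serve` is running and the model has been pulled.
import json
import urllib.request

def build_payload(prompt: str, model: str) -> dict:
    # stream=False returns a single JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local(prompt: str, model: str = "qwen3.5:9b") -> str:
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running server):
# print(ask_local("Summarize: local AI cut my API bill by 85%."))
```

No API key, no per-token billing; the only cost is the electricity line item above.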
Need help setting up a local AI server? I do this professionally.
Follow along: Telegram @celebibot_en
Sam Hartley — building AI things that actually work.