DEV Community

brian austin

I tried running Qwen3.6 locally for a week — here's why I went back to a $2/month API

Qwen3.6-35B-A3B is genuinely impressive. 917 points on Hacker News. An open-source model that rivals GPT-4 on several benchmarks. The dream of free, private, local AI — finally real?

I spent a week running it. Here's what actually happened.

The Setup

I have a decent machine — 32GB RAM, RTX 4070 Ti. Not top-tier, but respectable. Enough to run a 35B model at Q4 quantization, at least on paper: a Q4 35B is roughly 20 GB of weights, well beyond the 4070 Ti's 12 GB of VRAM, so part of the model spills into system RAM.

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull Qwen3.6
ollama pull qwen3.6:35b-a3b

# Run it
ollama run qwen3.6:35b-a3b

First response: 47 seconds.

For a simple "explain this function" prompt.

The Real Costs

Electricity

RTX 4070 Ti at full load: ~285W. Plus CPU, fans, RAM: ~380W total.

A week of 4 hours/day usage:

  • 380W × 4h × 7 days = 10.6 kWh
  • At $0.12/kWh = $1.27 just in electricity

That's already more than half the $2/month API cost in a single week of casual use.

If you're in a country with higher electricity costs (Germany: $0.36/kWh, Australia: $0.29/kWh), that week of local inference costs $3.80 — almost 2 months of a $2/month API subscription.
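The electricity math above can be sketched as a quick back-of-envelope calculation (the wattage, hours, and rates are this post's estimates, not measurements):

```python
# Weekly electricity cost of local inference, using this post's
# estimates: ~380 W total draw, 4 h/day of use.
def weekly_inference_cost(watts, hours_per_day, days, price_per_kwh):
    kwh = watts / 1000 * hours_per_day * days
    return kwh, kwh * price_per_kwh

kwh, usd = weekly_inference_cost(380, 4, 7, 0.12)   # US average rate
print(f"{kwh:.1f} kWh -> ${usd:.2f}")

_, usd_de = weekly_inference_cost(380, 4, 7, 0.36)  # Germany's rate
print(f"Germany: ${usd_de:.2f}")
```

The exact cents shift slightly depending on where you round, but the conclusion doesn't.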

Time Tax

Every prompt: 15-47 seconds to first token. Compare:

  • $2/month Claude API via SimplyLouie: 1.2 seconds average
  • Local Qwen3.6: 15-47 seconds

If you run 20 prompts per day, you're spending 5-16 minutes per day just waiting for responses. That's roughly 35-110 minutes per week.

Even at minimum wage, that's real money.
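The time tax works out the same way (prompt count and latencies are this post's estimates):

```python
# Minutes per week spent waiting on first tokens, given a daily
# prompt count and a per-prompt latency in seconds.
def weekly_wait_minutes(prompts_per_day, latency_seconds, days=7):
    return prompts_per_day * latency_seconds * days / 60

lo = weekly_wait_minutes(20, 15)    # best-case local latency
hi = weekly_wait_minutes(20, 47)    # worst-case local latency
api = weekly_wait_minutes(20, 1.2)  # hosted API latency
print(f"local: {lo:.0f}-{hi:.0f} min/week, API: {api:.0f} min/week")
```

That gap — about 35 to 110 minutes locally versus under 3 minutes via the API — is the part that's easy to forget when comparing "free" to $2/month.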

Thermal Impact

My GPU temps: 83°C sustained. That's not great for hardware longevity: high sustained temperatures accelerate wear on the GPU core and its VRAM.

The "free" model has hidden hardware depreciation costs.

When Local AI Actually Makes Sense

I'm not anti-local-AI. There are legitimate use cases:

  • Sensitive data you cannot send to any cloud (medical records, classified code, NDA'd projects)
  • Offline development (no internet, air-gapped systems)
  • Research/experimentation (you want to understand the model internals)
  • High-volume batch processing (millions of prompts where API costs would be enormous)

But for most developers doing daily coding work? The economics don't hold.

The Actual Math for Daily Dev Work

Let me be direct about what I use AI for daily:

  • Code review (explain what this function does)
  • Writing tests (generate Jest tests for this module)
  • Debugging (why is this throwing a TypeError)
  • Documentation (write JSDoc for this API)
  • Email drafts (professional response to client complaint)

For these tasks:

Option                 Cost                                   Speed   Privacy
ChatGPT Plus           $20/month                              Fast    Low
Local Qwen3.6          "Free" + electricity + hardware wear   Slow    High
$2/month Claude API    $2/month                               Fast    Medium
Run nothing            $0                                     N/A     Perfect

The $2/month API wins on the cost/speed tradeoff for casual daily use.
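Extending the weekly electricity figure to a month makes the break-even explicit (again using this post's estimated wattage, hours, and US rate, and ignoring hardware wear entirely):

```python
# Monthly electricity cost of local inference vs. a flat $2/month API,
# under this post's assumptions: ~380 W draw, 4 h/day, 30 days.
def monthly_electricity(watts, hours_per_day, price_per_kwh, days=30):
    return watts / 1000 * hours_per_day * days * price_per_kwh

local = monthly_electricity(380, 4, 0.12)
print(f"local electricity alone: ${local:.2f}/month vs API: $2.00/month")
```

At US rates, electricity alone comes to around $5.50/month — more than double the API price before counting time or hardware depreciation.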

What I Actually Use Now

I kept Ollama installed. I use it for things that are genuinely sensitive — API keys in code, client NDA work, anything I wouldn't want trained on.

For everything else, I use SimplyLouie's $2/month Claude API — it's Claude Sonnet via a simple REST API, no rate limit gymnastics, no subscription tier confusion.

curl https://simplylouie.com/api/chat \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"message": "explain this function", "context": "const fn = (x) => x.reduce((a,b) => a+b, 0)"}'

Response in 1.2 seconds. $2/month total. It runs on Anthropic's infrastructure, not my GPU.

The Honest Verdict

Qwen3.6 is a landmark model. The open-source AI community deserves credit — this is genuinely impressive work.

But "open source" and "free" are different things. Running a 35B model locally has real costs: electricity, time, hardware wear, setup complexity.

For developers in markets where $20/month ChatGPT is genuinely unaffordable — Nigeria, Philippines, Indonesia, Kenya, India — a $2/month hosted API is often the better tradeoff than local inference on aging hardware.

  • Nigeria: ₦3,200/month (vs ₦32,000 for ChatGPT)
  • Philippines: ₱112/month (vs ₱1,120 for ChatGPT)
  • Indonesia: Rp32,000/month (vs Rp320,000 for ChatGPT)
  • Kenya: KSh260/month (vs KSh2,600 for ChatGPT)
  • India: ₹165/month (vs ₹1,600+ for ChatGPT)

If you have the hardware and the use case, run local. If you're doing daily dev work and want fast, cheap, reliable AI — the $2/month API is the more honest choice.


SimplyLouie is a $2/month Claude API. 50% of revenue goes to animal rescue. simplylouie.com
