DEV Community

David

i cancelled my AI subscriptions. qwen3.6 on my own GPU does the same thing for free.

You're paying $20/month for ChatGPT. $10 for Copilot. Maybe another $20 for Midjourney. And every prompt you type goes through someone else's server.

Meanwhile, Alibaba just open-sourced a model that scores 73.4 on SWE-bench Verified — the benchmark where an AI autonomously reads a GitHub issue, understands the codebase, writes a fix, and runs the tests. That's frontier-level coding ability. And it runs on your gaming laptop.

the model

Qwen3.6-35B-A3B. It's a Mixture-of-Experts model: 35 billion parameters total, but only 3 billion active per token. Your GPU loads 9 experts per token (8 routed + 1 shared) out of 256 total. The rest sit idle.

Result: it runs like a 3B model but thinks like a 30B+ model.
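The routing step can be sketched in a few lines of Python. This is a toy illustration of top-k expert selection, not Qwen's actual router (the real router is a learned linear layer over each token's hidden state):

```python
import math
import random

def top_k_route(router_logits, k=8):
    """Pick the k experts with the highest router scores for one token."""
    idx = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over just the selected scores -> mixing weights for those experts.
    exps = [math.exp(router_logits[i]) for i in idx]
    total = sum(exps)
    return list(zip(idx, [e / total for e in exps]))

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(256)]  # one score per expert
routed = top_k_route(logits, k=8)                  # 8 routed experts
# Plus 1 shared expert that every token visits -> 9 experts active per token.
print(len(routed), sum(w for _, w in routed))
```

Each token gets its own expert subset, so different tokens exercise different slices of the 35B weights while the per-token compute stays at roughly 3B parameters.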

Apache 2.0 license. No usage restrictions. No rate limits. No one reading your code.

what your $0/month gets you

Let's do the math on what you're replacing:

ChatGPT Plus ($20/month) — Qwen3.6 scores 86.0 on GPQA Diamond (graduate-level reasoning), 83.6 on HMMT (Harvard-MIT Math Tournament), and handles 119 languages. Vision is built in — drag an image into the chat and ask questions about it. For most daily tasks you won't notice a difference, and for coding specifically it's arguably better than GPT-4 at the stuff you actually do: fixing bugs, writing functions, understanding codebases.

GitHub Copilot ($10/month) — 73.4 on SWE-bench means this model can autonomously fix real bugs in real repositories. 51.5 on Terminal-Bench means it can operate a terminal to solve coding tasks. With the right frontend, it functions as a full coding agent, not just autocomplete.

Cloud API costs — no per-token pricing. Run it 24/7 on your own hardware. The model doesn't get slower during peak hours. It doesn't have outages. It doesn't change its behavior because the provider decided to add more safety filters.

the hardware you already own is enough

This is the part that surprises people. With Q4_K_M quantization:

  • 8 GB VRAM (RTX 3060, RTX 4060): runs at 30+ tokens/second
  • 12-24 GB VRAM (RTX 4070, RTX 3090): Q8 quantization, 20+ tok/s
  • Apple Silicon M1/M2/M3: runs great on unified memory

If you bought a GPU in the last 3-4 years, you probably have enough. The MoE architecture is the key — your GPU only processes 3B parameters per token regardless of the total model size.
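Some back-of-envelope arithmetic shows why the active-parameter count is what matters for generation speed (assuming roughly 4.5 bits per weight as an approximation for Q4_K_M):

```python
def gib(n_params, bits_per_weight):
    """Approximate weight size in GiB for n_params at a given quantization."""
    return n_params * bits_per_weight / 8 / 2**30

TOTAL, ACTIVE = 35e9, 3e9
Q4_BITS = 4.5  # Q4_K_M averages roughly 4.5 bits per weight (approximate)

print(f"full model weights     : {gib(TOTAL, Q4_BITS):.1f} GiB")
print(f"weights read per token : {gib(ACTIVE, Q4_BITS):.2f} GiB")
```

Token generation is mostly memory-bandwidth-bound, so touching ~1.6 GiB of expert weights per token is an order of magnitude cheaper than touching the full set — and runtimes can keep rarely-used experts in system RAM, pulling in only what each token routes to.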

the catch (being honest)

There are trade-offs. You should know them before you cancel anything:

  1. No real-time internet access — the model only knows what it was trained on. No "search the web" or "check the latest docs." You need to paste context manually or use RAG.

  2. Setup isn't zero — you need Ollama or a similar runtime, and a frontend. It's not "open a browser tab and start typing." More like 10-15 minutes to set up if you've never done it.

  3. Long context costs more locally — 262K native context is great on paper, but processing 100K+ tokens on consumer hardware gets slow. Cloud APIs hide this cost from you.

  4. No multimodal generation — Qwen3.6 can understand images (vision input) but can't generate them. For image generation you need a separate model (Stable Diffusion, Flux, etc.).

  5. Updates are manual — when a better model drops, you download and switch yourself. No silent upgrades.
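For the "paste context manually" workaround, a small script against Ollama's local HTTP API is enough. This is a sketch using Ollama's /api/generate endpoint; the model tag "qwen3.6" is a placeholder for whatever `ollama list` shows on your machine:

```python
import json
import urllib.request

def build_request(prompt, context="", model="qwen3.6"):
    """Assemble the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": f"{context}\n\n{prompt}" if context else prompt,
        "stream": False,  # one complete response instead of a token stream
    }

def ask_local(prompt, context="", model="qwen3.6"):
    """Send a prompt (plus hand-pasted context) to a local Ollama server."""
    body = json.dumps(build_request(prompt, context, model)).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama's default port
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running Ollama server with the model pulled):
# docs = open("api_changelog.md").read()   # paste the latest docs yourself
# print(ask_local("What changed in v2?", context=docs))
```

It's crude compared to built-in web search, but it keeps everything on your machine, and the same pattern extends to a proper RAG pipeline later.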

For people who type "write me a poem" into ChatGPT twice a week, this is overkill. For developers, researchers, and anyone processing sensitive data — the trade-offs are overwhelmingly in favor of local.

the stack that replaces everything

Here's what a complete local setup looks like in 2026:

  • Chat + reasoning: Qwen3.6-35B-A3B (this article)
  • Image generation: Stable Diffusion 3.5, Flux, or SDXL via ComfyUI
  • Video generation: Wan 2.1, FramePack
  • Code completion: same Qwen3.6, connected as a coding agent
  • Speech-to-text: Whisper (runs on CPU)

Total cost after hardware you already own: $0/month. Forever.

Or use a tool that bundles all of this. Locally Uncensored wraps Ollama + ComfyUI into one desktop app — chat, image gen, video gen, coding agent. v2.3.3 has Qwen3.6 day-0 support with vision and a full agent mode. AGPL-3.0, open source.

the real question

It's not "is local AI good enough yet?" — it passed that threshold months ago.

The real question is: how much longer are you going to pay monthly fees to send your data to someone else's server when the same capability runs on hardware sitting under your desk?

Qwen3.6 weights: HuggingFace


Locally Uncensored — open-source desktop app for local AI. Chat, coding agents, image gen, video gen. No cloud, no subscription. AGPL-3.0.
