Anthropic charges $25 per million output tokens for Claude Opus 4.7. That's their new flagship coding model, released today. It's good — 13% better than Opus 4.6 on coding benchmarks, improved vision, stronger at multi-step agentic work.
Meanwhile, also this week: Alibaba released Qwen3.6-35B-A3B under Apache 2.0. Scores 73.4 on SWE-bench Verified. Runs on an 8 GB GPU. Costs nothing.
Two models. Same week. Completely opposite philosophies. Let's break down what's actually happening.
the cloud tax is getting harder to justify
When GPT-4 launched in 2023, there was nothing local that came close. Paying for API access made sense because there was no alternative.
In 2024, open models started catching up. Llama 3, Qwen 2.5, Mistral — good enough for many tasks, but still clearly behind frontier models on the hard stuff.
In 2026, the gap has narrowed to the point where you have to seriously ask whether the remaining difference is worth $25 per million output tokens.
Here's a concrete example. A developer using Opus 4.7 as their primary coding agent, running maybe 50 complex coding sessions a day:
- Average session: ~10K input tokens (code context) + ~5K output tokens (response)
- 50 sessions: 500K input + 250K output tokens
- Daily cost: $2.50 + $6.25 = $8.75/day
- Monthly: ~$190/month just for one developer
Now scale that to a team of 5. That's nearly $1,000/month on AI coding assistance.
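The arithmetic above is easy to adapt to your own usage. A quick sketch, using the per-token prices and session sizes quoted in this article (the ~22 working days per month is an assumption that matches the ~$190 figure):

```python
# Back-of-envelope API cost for a team using a cloud coding agent.
# Prices per million tokens (Opus 4.7 figures quoted in this article).
INPUT_PRICE = 5.00
OUTPUT_PRICE = 25.00

def daily_cost(sessions: int, input_tokens: int, output_tokens: int) -> float:
    """Dollars per day for one developer's agent sessions."""
    total_in = sessions * input_tokens
    total_out = sessions * output_tokens
    return total_in / 1e6 * INPUT_PRICE + total_out / 1e6 * OUTPUT_PRICE

per_dev_day = daily_cost(sessions=50, input_tokens=10_000, output_tokens=5_000)
per_dev_month = per_dev_day * 22   # ~22 working days/month
team_month = per_dev_month * 5     # team of five

print(f"${per_dev_day:.2f}/day")       # $8.75/day
print(f"~${per_dev_month:.0f}/month")  # ~$190/month per developer
print(f"~${team_month:.0f}/month")     # ~$960/month for the team
```

Plug in your own session counts and context sizes; the totals scale linearly.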
The same team could buy a single RTX 4070 ($550 one-time) and run Qwen3.6 at 20+ tokens/second with near-zero ongoing costs (electricity aside).
what you actually get for $0
Qwen3.6-35B-A3B isn't just "a free model." It's specifically designed for the exact use case Opus 4.7 targets — coding agents:
Agentic coding benchmarks:
- SWE-bench Verified: 73.4 (fix real bugs in real repos autonomously)
- Terminal-Bench 2.0: 51.5 (operate a terminal to solve coding tasks)
- MCPMark: 37.0 (tool calling and agent protocols)
- QwenWebBench: 1397 Elo (frontend artifact generation)
Architecture advantages for local deployment:
- MoE: 35B total params, 3B active — runs like a small model, thinks like a big one
- Gated DeltaNet: 3 of 4 layers use linear attention — memory efficient on long contexts
- Native vision: understands screenshots, diagrams, code images without a separate model
- 262K context: plenty for most codebase contexts
What you give up vs Opus 4.7:
- Probably some edge on the hardest 10% of tasks
- Anthropic's specific safety/self-verification features
- The polish of a model trained with massive RLHF compute
- Cloud convenience (no GPU needed)
What you gain:
- Your code never leaves your machine
- No rate limits, no outages, no API key management
- No per-token costs, ever
- Full control over the model behavior
- Works offline, on a plane, in an air-gapped environment
- Apache 2.0 — fine-tune it, modify it, deploy it commercially
the $25/M question
Opus 4.7 is genuinely impressive. Anthropic's coding models have been best-in-class for a while, and Opus 4.7 extends that lead. The self-verification feature — where the model checks its own work before reporting back — is particularly useful for autonomous workflows.
But the honest question every developer should ask is: for my specific tasks, does the delta between Opus 4.7 and Qwen3.6 justify the cost?
For a solo developer building a startup: probably not. Qwen3.6 resolves 73.4% of SWE-bench Verified issues — real GitHub bugs — autonomously. That's more than enough for daily coding work.
For a large enterprise with strict compliance requirements and deep pockets: maybe. The convenience and Anthropic's enterprise features have real value.
For anyone processing sensitive code: local wins by default. No amount of ToS promises equals "the data literally never left my hardware."
how to try both and decide
Opus 4.7:

- API key from anthropic.com
- Model: `claude-opus-4-7`
- $5/M input, $25/M output

Qwen3.6 locally:

```
ollama run qwen3.6:35b-a3b
```
Or for a complete setup with a coding agent, vision, and tool calling — Locally Uncensored v2.3.3 supports both. Connect Anthropic's API for Opus 4.7 when you need it, run Qwen3.6 locally for everything else. Switch between them in the same interface. Best of both worlds.
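One way to wire up that switching workflow is a tiny per-request router: sensitive or routine work stays local, and only the hardest tasks escalate to the paid model. A sketch under assumptions — the endpoints, model ids, and the "hardest 10%" threshold are this article's figures, and `task_difficulty`/`sensitive` are hypothetical inputs your agent would supply:

```python
# Minimal backend router between a local Ollama model and a cloud API.
BACKENDS = {
    "local": {
        "endpoint": "http://localhost:11434/v1",  # Ollama's default local API
        "model": "qwen3.6:35b-a3b",
        "cost_per_m_output": 0.0,
    },
    "cloud": {
        "endpoint": "https://api.anthropic.com",
        "model": "claude-opus-4-7",               # model id as quoted above
        "cost_per_m_output": 25.0,
    },
}

def pick_backend(task_difficulty: float, sensitive: bool) -> dict:
    """Keep sensitive code local; escalate only the hardest ~10% of tasks."""
    if sensitive or task_difficulty < 0.9:
        return BACKENDS["local"]
    return BACKENDS["cloud"]

print(pick_backend(0.5, sensitive=False)["model"])   # qwen3.6:35b-a3b
print(pick_backend(0.95, sensitive=True)["model"])   # qwen3.6:35b-a3b
print(pick_backend(0.95, sensitive=False)["model"])  # claude-opus-4-7
```

The key design choice is that `sensitive` overrides everything: no difficulty score ever sends private code to the cloud.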
where this is heading
The pattern is clear. Every 3-4 months, a new open model appears that matches the paid frontier model from 6 months ago. The cost of "good enough" is trending toward zero.
Anthropic, OpenAI, and Google will keep pushing the frontier. Open models will keep closing the gap. And the developers in the middle will increasingly ask: "Is the remaining gap worth $25 per million tokens?"
Today, for most coding tasks, the answer is already no.
Locally Uncensored — open-source desktop app for running AI locally. Supports cloud APIs AND local models. Chat, coding agents, image gen, video gen. AGPL-3.0.