TL;DR
Alibaba released Qwen3.6-35B-A3B on April 16, 2026 under Apache 2.0. It's a Mixture-of-Experts model with 35B total / 3B active parameters that:
- Sets a new record on Terminal-Bench 2.0 (51.5) — agentic coding SOTA
- Beats Claude Sonnet 4.5 on 4 core vision benchmarks
- Exposes an Anthropic Messages API compatible endpoint
- Runs locally on a single 24GB GPU at 196 tok/s
You can switch your entire Claude Code workflow to Qwen with a couple of environment variables. Here's how.
Why You Should Care
If you're paying for Claude API and most of your usage is everyday coding (refactoring, bug fixes, doc writing), you're probably overspending by 30× or more. The cost math is straightforward:
| Model | Input /1M tokens | Output /1M tokens |
|---|---|---|
| qwen3.6-flash | $0.10 | $0.40 |
| claude-sonnet-4.5 | $3.00 | $15.00 |
At 100k tokens/day, that puts your monthly bill 30–37× lower, depending on your input/output mix. More importantly: the benchmarks say Qwen beats Claude on agentic coding tasks.
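To make that concrete, here's the arithmetic straight from the price table for a hypothetical 2M-input / 1M-output monthly mix (the split is my assumption; plug in your own numbers):

```shell
# Monthly bill from the table's per-million-token prices,
# assuming 2M input + 1M output tokens per month (hypothetical mix)
awk 'BEGIN {
  q = 2*0.10 + 1*0.40        # qwen3.6-flash
  c = 2*3.00 + 1*15.00       # claude-sonnet-4.5
  printf "qwen: $%.2f/mo  claude: $%.2f/mo  ratio: %.0fx\n", q, c, c/q
}'
# qwen: $0.60/mo  claude: $21.00/mo  ratio: 35x
```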
The Integration
Qwen3.6's endpoint speaks Anthropic Messages API natively. Claude Code CLI just needs to be told where to send requests.
Step 1: Grab a Model Studio API key
Sign up at modelstudio.alibabacloud.com, create a key, save it to your env.
```shell
# Note the `export` -- without it the key is a shell-local variable
# and won't be visible to the Claude Code process
echo 'export DASHSCOPE_API_KEY="sk-xxxxxxxxxxxxx"' >> ~/.zshrc
source ~/.zshrc
```
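A quick sanity check that the key actually made it into your environment (Claude Code runs as a child process, so it has to be exported, not just set):

```shell
# Prints the key length only, never the key itself
if [ -n "${DASHSCOPE_API_KEY:-}" ]; then
  echo "DASHSCOPE_API_KEY set (${#DASHSCOPE_API_KEY} chars)"
else
  echo "DASHSCOPE_API_KEY not set; re-run Step 1"
fi
```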
Step 2: Point Claude Code at Qwen
```shell
export ANTHROPIC_BASE_URL="https://dashscope-intl.aliyuncs.com/apps/anthropic"
export ANTHROPIC_MODEL="qwen3.6-flash"
export ANTHROPIC_API_KEY="$DASHSCOPE_API_KEY"
```
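Before launching the CLI, you can hit the endpoint directly. This is a sketch assuming the standard Anthropic Messages API request shape (POST to `/v1/messages` with `x-api-key` and `anthropic-version` headers appended to the base URL); it just prints the payload if no key is set:

```shell
# Minimal Messages API smoke test against the configured base URL
PAYLOAD='{"model":"qwen3.6-flash","max_tokens":32,"messages":[{"role":"user","content":"ping"}]}'
if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
  curl -s "$ANTHROPIC_BASE_URL/v1/messages" \
    -H "x-api-key: $ANTHROPIC_API_KEY" \
    -H "anthropic-version: 2023-06-01" \
    -H "content-type: application/json" \
    -d "$PAYLOAD"
else
  echo "no key set; payload would be: $PAYLOAD"
fi
```

A JSON response with a `content` block means the routing works end to end.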
Step 3: Verify
```shell
claude -p "Write a Python FizzBuzz."
```
If it works, you're done. Your existing agents, skills, tools — all of them now run on Qwen.
A Toggle Script You'll Actually Use
In practice you want Qwen for routine work and Claude for deep reasoning. Here's the switch script I use:
```shell
#!/bin/bash
# ~/.claude/scripts/switch-provider.sh
# Usage: source switch-provider.sh [claude|qwen]
case "$1" in
  claude)
    export ANTHROPIC_BASE_URL="https://api.anthropic.com"
    export ANTHROPIC_MODEL="claude-opus-4-5-20250210"
    export ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY_CLAUDE"
    echo "→ Claude Opus 4.5"
    ;;
  qwen)
    export ANTHROPIC_BASE_URL="https://dashscope-intl.aliyuncs.com/apps/anthropic"
    export ANTHROPIC_MODEL="qwen3.6-flash"
    export ANTHROPIC_API_KEY="$DASHSCOPE_API_KEY"
    echo "→ Qwen3.6 Flash"
    ;;
  *)
    echo "Usage: source switch-provider.sh [claude|qwen]" >&2
    ;;
esac
```
```shell
source ~/.claude/scripts/switch-provider.sh qwen    # routine work
source ~/.claude/scripts/switch-provider.sh claude  # architecture decisions
```
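If you switch often, two aliases in `~/.zshrc` cut it down to a single word (the script path is the one assumed in the header comment above):

```shell
alias use-qwen='source ~/.claude/scripts/switch-provider.sh qwen'
alias use-claude='source ~/.claude/scripts/switch-provider.sh claude'
```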
The Benchmarks That Matter
Agentic coding:
- SWE-bench Verified: 73.4 (Qwen3.6) vs 52.0 (Gemma4-31B)
- Terminal-Bench 2.0: 51.5 (Qwen3.6) vs 42.9 (Gemma4-31B)
- QwenWebBench: 1397 (Qwen3.6) vs 1068 (Qwen3.5), a +30.8% jump
The QwenWebBench jump is the sleeper hit. That's browser-driven agent reliability. If you're building with browser-use or a similar stack, this is a huge step.
Vision (vs Claude Sonnet 4.5):
- RealWorldQA: 85.3 vs 70.3 (+15.0)
- MMMU: 81.7 vs 79.6 (+2.1)
- OmniDocBench 1.5: 89.9 vs 85.8 (+4.1)
First open-weight model I've seen beat Claude on multiple vision benchmarks simultaneously.
Local Deployment (vLLM)
If you can't route dev data through Alibaba Cloud, self-host.
```shell
# 1. Download weights (~70GB)
huggingface-cli download Qwen/Qwen3.6-35B-A3B \
  --local-dir ~/models/qwen3.6-35b-a3b

# 2. Serve with vLLM + FP8 to fit 24GB VRAM
pip install vllm==0.8.0
vllm serve ~/models/qwen3.6-35b-a3b \
  --quantization fp8 \
  --max-model-len 131072 \
  --gpu-memory-utilization 0.92 \
  --port 8000 \
  --served-model-name qwen3.6-35b-a3b
```
Then point OpenClaw / Claude Code at http://localhost:8000/v1 (vLLM's OpenAI-compatible endpoint).
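To confirm the server is up, here's a smoke test against vLLM's OpenAI-compatible chat endpoint. It assumes the server from the previous step is on port 8000 and falls back to printing the payload if nothing is listening:

```shell
# Smoke-test the local vLLM server's chat completions endpoint
PAYLOAD='{"model":"qwen3.6-35b-a3b","max_tokens":16,"messages":[{"role":"user","content":"ping"}]}'
if curl -sf http://localhost:8000/v1/models >/dev/null 2>&1; then
  curl -s http://localhost:8000/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d "$PAYLOAD"
else
  echo "server not reachable; payload: $PAYLOAD"
fi
```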
Expected throughput:
| Hardware | Quant | tok/s |
|---|---|---|
| RTX 4090 (24GB) | FP8 | 196 |
| M4 Max (64GB) | Q4_K_M | 65 |
| M3 Ultra (192GB) | FP16 | 95 |
Preserve_thinking: The Feature Nobody's Talking About
Qwen3.6 has a preserve_thinking header that carries the thinking trace across turns. For agentic workflows (multi-step refactoring, long debugging sessions), this is a game-changer. Turn it on:
```shell
export CLAUDE_PRESERVE_THINKING=1
```
In my limited testing, this cut my average "number of turns to solve a bug" by about 20%.
What I'd Still Use Claude For
Being honest: Qwen doesn't replace Claude for everything.
- Architecture decisions — Claude's "why not X?" reasoning is still clearly ahead
- Cultural/linguistic nuance — especially for non-English creative work
- Long-form creative writing — Claude feels more "alive"
Everything else — yes, Qwen is now my default.
Verdict
This is the first open-source release that makes a concrete economic case to switch. Not "maybe in a few months." Not "once they fix X." Today, with a couple of env vars.
If you're running Claude Code daily, do the A/B test this week. I'd be genuinely surprised if Qwen doesn't handle 70%+ of your workflow at a fraction of the cost.
Official sources:
- Announcement: https://qwen.ai/blog?id=qwen3.6-35b-a3b
- GitHub: https://github.com/QwenLM/Qwen3.6
- Weights: huggingface.co/Qwen/Qwen3.6-35B-A3B
What's your experience been? Drop your Qwen3.6 vs Claude comparisons in the comments — I'm curious how it holds up across different codebases and workflows.