
정상록

Qwen3.6-35B-A3B: Swap Claude Code Backend With 2 Env Vars (and save 93% on inference)

TL;DR

Alibaba released Qwen3.6-35B-A3B on April 16, 2026 under Apache 2.0. It's a Mixture-of-Experts model with 35B total / 3B active parameters that:

  • Sets a new record on Terminal-Bench 2.0 (51.5) — agentic coding SOTA
  • Beats Claude Sonnet 4.5 on 4 core vision benchmarks
  • Exposes an Anthropic Messages API compatible endpoint
  • Runs locally on a single 24GB GPU at 196 tok/s

You can switch your entire Claude Code workflow to Qwen with two environment variables. I'm showing you how below.

Why You Should Care

If you're paying for Claude API and most of your usage is everyday coding (refactoring, bug fixes, doc writing), you're probably overspending by 15–16×. The cost math is straightforward:

Model               Input /1M tokens   Output /1M tokens
qwen3.6-flash       $0.10              $0.40
claude-sonnet-4.5   $3.00              $15.00

At 100k tokens/day, that's ~$4.50/month vs ~$72/month. More importantly: the benchmarks say Qwen beats Claude on agentic coding tasks.
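The per-token prices translate into monthly spend like this. A minimal calculator, assuming a 50/50 input/output split and 30 days: your real mix will shift the absolute dollar figures, though the gap stays large either way.

```python
# Per-million-token prices from the table above (USD)
PRICES = {
    "qwen3.6-flash": (0.10, 0.40),       # (input, output)
    "claude-sonnet-4.5": (3.00, 15.00),
}

def monthly_cost(model, tokens_per_day, output_ratio=0.5, days=30):
    """Estimated monthly spend in USD for a given daily token volume."""
    in_price, out_price = PRICES[model]
    millions = tokens_per_day * days / 1_000_000
    return millions * ((1 - output_ratio) * in_price + output_ratio * out_price)

qwen = monthly_cost("qwen3.6-flash", 100_000)
claude = monthly_cost("claude-sonnet-4.5", 100_000)
print(f"qwen: ${qwen:.2f}/mo, claude: ${claude:.2f}/mo, ratio: {claude / qwen:.0f}x")
# → qwen: $0.75/mo, claude: $27.00/mo, ratio: 36x
```

Adjust `output_ratio` toward 1.0 for generation-heavy workloads; the ratio between the two models barely moves.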

The Integration

Qwen3.6's endpoint speaks the Anthropic Messages API natively. The Claude Code CLI just needs to be told where to send requests.

Step 1: Grab a Model Studio API key

Sign up at modelstudio.alibabacloud.com, create a key, save it to your env.

echo 'export DASHSCOPE_API_KEY=sk-xxxxxxxxxxxxx' >> ~/.zshrc
source ~/.zshrc

Step 2: Point Claude Code at Qwen

export ANTHROPIC_BASE_URL="https://dashscope-intl.aliyuncs.com/apps/anthropic"
export ANTHROPIC_MODEL="qwen3.6-flash"
export ANTHROPIC_API_KEY="$DASHSCOPE_API_KEY"
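Before launching the CLI, it's worth a quick sanity check that all three variables actually made it into your environment. A small bash helper (the variable names come from the steps above; the function itself is just a convenience):

```shell
#!/usr/bin/env bash
# Confirm the three routing variables are set before launching Claude Code.
check_env() {
  local missing=0
  local v
  for v in ANTHROPIC_BASE_URL ANTHROPIC_MODEL ANTHROPIC_API_KEY; do
    if [ -z "${!v}" ]; then      # ${!v} is bash indirect expansion
      echo "missing: $v" >&2
      missing=1
    fi
  done
  return "$missing"
}

if check_env; then echo "environment looks good"; fi
```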

Step 3: Verify

claude -p "Write a Python FizzBuzz."

If it works, you're done. Your existing agents, skills, tools — all of them now run on Qwen.
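For reference, any model should come back with something equivalent to this plain FizzBuzz, included here just so you can eyeball the response:

```python
def fizzbuzz(n):
    """Return the FizzBuzz sequence for 1..n as a list of strings."""
    out = []
    for i in range(1, n + 1):
        if i % 15 == 0:
            out.append("FizzBuzz")
        elif i % 3 == 0:
            out.append("Fizz")
        elif i % 5 == 0:
            out.append("Buzz")
        else:
            out.append(str(i))
    return out

print(fizzbuzz(15))  # last element: 'FizzBuzz'
```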

A Toggle Script You'll Actually Use

In practice you want Qwen for routine work and Claude for deep reasoning. Here's the switch script I use:

#!/bin/bash
# ~/.claude/scripts/switch-provider.sh
# Usage: source switch-provider.sh [claude|qwen]

case "$1" in
  claude)
    export ANTHROPIC_BASE_URL="https://api.anthropic.com"
    export ANTHROPIC_MODEL="claude-opus-4-5-20250210"
    export ANTHROPIC_API_KEY="$ANTHROPIC_API_KEY_CLAUDE"
    echo "→ Claude Opus 4.5"
    ;;
  qwen)
    export ANTHROPIC_BASE_URL="https://dashscope-intl.aliyuncs.com/apps/anthropic"
    export ANTHROPIC_MODEL="qwen3.6-flash"
    export ANTHROPIC_API_KEY="$DASHSCOPE_API_KEY"
    echo "→ Qwen3.6 Flash"
    ;;
  *)
    echo "Usage: source switch-provider.sh [claude|qwen]" >&2
    ;;
esac
source ~/.claude/scripts/switch-provider.sh qwen     # routine work
source ~/.claude/scripts/switch-provider.sh claude   # architecture decisions

The Benchmarks That Matter

Agentic coding:

SWE-bench Verified:  73.4 (Qwen3.6)  vs  52.0 (Gemma4-31B)
Terminal-Bench 2.0:  51.5 (Qwen3.6)  vs  42.9 (Gemma4-31B)
QwenWebBench:        1397 (Qwen3.6)  vs  1068 (Qwen3.5)  — +30.8%

The QwenWebBench jump is the sleeper hit. That's browser-driven agent reliability. If you're building with browser-use or a similar stack, this is a huge step.

Vision (vs Claude Sonnet 4.5):

RealWorldQA:        85.3 vs 70.3   (+15.0)
MMMU:              81.7 vs 79.6   (+2.1)
OmniDocBench 1.5:  89.9 vs 85.8   (+4.1)

First open-weight model I've seen beat Claude on multiple vision benchmarks simultaneously.

Local Deployment (vLLM)

If you can't route dev data through Alibaba Cloud, self-host.

# 1. Download weights (~70GB)
huggingface-cli download Qwen/Qwen3.6-35B-A3B \
  --local-dir ~/models/qwen3.6-35b-a3b

# 2. Serve with vLLM + FP8 to fit 24GB VRAM
pip install vllm==0.8.0
vllm serve ~/models/qwen3.6-35b-a3b \
  --quantization fp8 \
  --max-model-len 131072 \
  --gpu-memory-utilization 0.92 \
  --port 8000 \
  --served-model-name qwen3.6-35b-a3b

Then point OpenClaw / Claude Code at http://localhost:8000/v1.
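To reuse the two-variable trick against the local server, the exports would look something like this. A sketch: it assumes your self-hosted endpoint accepts the same traffic the hosted one does (verify against the vLLM docs for your version), and the placeholder key is arbitrary since vLLM only enforces one when launched with --api-key:

```shell
# Route requests at the local vLLM instance instead of Alibaba Cloud.
# Some clients append /v1 themselves — match whatever yours expects.
export ANTHROPIC_BASE_URL="http://localhost:8000"
export ANTHROPIC_MODEL="qwen3.6-35b-a3b"
export ANTHROPIC_API_KEY="local-placeholder"
```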

Expected throughput:

Hardware           Quant    tok/s
RTX 4090 (24GB)    FP8      196
M4 Max (64GB)      Q4_K_M   65
M3 Ultra (192GB)   FP16     95

preserve_thinking: The Feature Nobody's Talking About

Qwen3.6 has a preserve_thinking header that carries the thinking trace across turns. For agentic workflows (multi-step refactoring, long debugging sessions), this is a game-changer. Turn it on:

export CLAUDE_PRESERVE_THINKING=1

In my limited testing, this cut my average "number of turns to solve a bug" by about 20%.
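There's no wire format shown above, so here is a hypothetical sketch of how a client might thread the previous turn's thinking back into the next request. The header name and payload fields are assumptions based on the feature description, not documented API:

```python
def build_turn(messages, prior_thinking=None, model="qwen3.6-flash"):
    """Assemble one Messages-API-style request, optionally carrying the
    previous turn's thinking trace (field/header names are hypothetical)."""
    headers = {"anthropic-version": "2023-06-01"}
    body = {"model": model, "max_tokens": 1024, "messages": list(messages)}
    if prior_thinking is not None:
        headers["preserve-thinking"] = "true"   # assumed header name
        body["thinking"] = prior_thinking       # assumed payload field
    return headers, body

headers, body = build_turn(
    [{"role": "user", "content": "Continue debugging the failing test."}],
    prior_thinking="Step 3 narrowed the bug to the cache layer.",
)
```

The point of the sketch: the trace travels with the request, so the model doesn't rebuild its working state from scratch each turn.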

What I'd Still Use Claude For

Being honest: Qwen doesn't replace Claude for everything.

  1. Architecture decisions — Claude's "why not X?" reasoning is still clearly ahead
  2. Cultural/linguistic nuance — especially for non-English creative work
  3. Long-form creative writing — Claude feels more "alive"

Everything else — yes, Qwen is now my default.

Verdict

This is the first open-source release that makes a concrete economic case to switch. Not "maybe in a few months." Not "once they fix X." Today, with two env vars.

If you're running Claude Code daily, do the A/B test this week. I'd be genuinely surprised if Qwen doesn't handle 70%+ of your workflow at a fraction of the cost.


What's your experience been? Drop your Qwen3.6 vs Claude comparisons in the comments — I'm curious how it holds up across different codebases and workflows.
