TL;DR
Qwen3.6-Plus is now officially released. It scores 78.8% on SWE-bench Verified and 61.6% on Terminal-Bench 2.0, outperforming Claude Opus 4.5 on terminal tasks. Features include a 1M token context window, the new preserve_thinking parameter for agent loops, and seamless integration with Claude Code, OpenClaw, and Qwen Code via an OpenAI-compatible API.
From Preview to Release
If you read our previous guide on Qwen 3.6 Plus Preview on OpenRouter, you know what this model can deliver. The preview launched on March 30 with no waitlist and free OpenRouter access, processing over 400 million tokens in just two days.
The official release brings a production-ready model available via Alibaba Cloud Model Studio. Now you get a stable API, SLA-backed uptime, and a new API parameter (preserve_thinking) that improves multi-step agent workflows.
This guide covers the key changes, how to use the API, and how to test your integrations with Apidog before production deployment.
What Qwen3.6-Plus Is
Qwen3.6-Plus is a hosted mixture-of-experts model from Alibaba's Qwen team. Like the Qwen3.5 series, it uses sparse activation for efficient compute.
Key specs:
- 1M token context window
- Mandatory chain-of-thought reasoning
- New preserve_thinking parameter for agentic tasks
- Native multimodal support (vision, video, document understanding)
- OpenAI-compatible API, Anthropic-compatible API, OpenAI Responses API
Open-source smaller variants will be available soon for self-hosted setups.
Benchmark Results
Coding Agents
Qwen3.6-Plus trails Claude Opus 4.5 slightly on SWE-bench Verified (78.8% vs 80.9%) but leads on terminal operations.
Terminal-Bench 2.0 evaluates real shell operations—file management, process control, multi-step workflows with significant compute. Qwen3.6-Plus scores 61.6%, beating Claude Opus 4.5 at 59.3%.
General Agents and Tool Use
| Benchmark | Claude Opus 4.5 | Qwen3.6-Plus |
|---|---|---|
| TAU3-Bench | 70.2% | 70.7% |
| DeepPlanning | 33.9% | 41.5% |
| MCPMark | 42.3% | 48.2% |
| MCP-Atlas | 71.8% | 74.1% |
| WideSearch | 76.4% | 74.3% |
MCPMark tests GitHub MCP tool calls. Qwen3.6-Plus leads on key planning and tool use tasks.
Reasoning and Knowledge
| Benchmark | Claude Opus 4.5 | Qwen3.6-Plus |
|---|---|---|
| GPQA | 87.0% | 90.4% |
| LiveCodeBench v6 | 84.8% | 87.1% |
| IFEval strict | 90.9% | 94.3% |
| MMLU-Pro | 89.5% | 88.5% |
Qwen3.6-Plus leads in science reasoning and instruction-following benchmarks, key for structured agentic tasks.
Multimodal
| Benchmark | Qwen3.6-Plus | Notes |
|---|---|---|
| OmniDocBench 1.5 | 91.2% | Top in table |
| RefCOCO avg | 93.5% | Top in table |
| We-Math | 89.0% | Top in table |
| CountBench | 97.6% | Top in table |
| OSWorld-Verified | 62.5% | Behind Claude (66.3%) |
Qwen3.6-Plus leads in document parsing, visual grounding, and object counting, though Claude remains ahead in desktop automation.
How to Call the API
Qwen3.6-Plus is available on Alibaba Cloud Model Studio. Get your API key at modelstudio.alibabacloud.com.
Regional Base URLs:
- Singapore: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
- Beijing: https://dashscope.aliyuncs.com/compatible-mode/v1
- US (Virginia): https://dashscope-us.aliyuncs.com/compatible-mode/v1
Basic Call With Streaming
```python
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3.6-plus",
    messages=[{"role": "user", "content": "Review this Python function and find bugs."}],
    extra_body={"enable_thinking": True},
    stream=True,
)

reasoning = ""
answer = ""
is_answering = False

for chunk in completion:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # Reasoning tokens stream in reasoning_content before the answer starts
    if hasattr(delta, "reasoning_content") and delta.reasoning_content:
        if not is_answering:
            reasoning += delta.reasoning_content
    # Answer tokens stream in content once reasoning is done
    if delta.content:
        if not is_answering:
            is_answering = True
        answer += delta.content
        print(delta.content, end="", flush=True)
```
The preserve_thinking Parameter
The preview only retained reasoning from the current turn. The official release adds preserve_thinking.
When preserve_thinking: true is set, the model keeps chain-of-thought from all prior turns—ideal for multi-step agent workflows. Disabled by default to save tokens.
```python
completion = client.chat.completions.create(
    model="qwen3.6-plus",
    messages=conversation_history,
    extra_body={
        "enable_thinking": True,
        "preserve_thinking": True,  # keep reasoning across all turns
    },
    stream=True,
)
```
Use Qwen3.6-Plus With Claude Code
Qwen's API supports Anthropic's protocol. Use Claude Code with Qwen3.6-Plus by setting environment variables:
```bash
npm install -g @anthropic-ai/claude-code

export ANTHROPIC_MODEL="qwen3.6-plus"
export ANTHROPIC_SMALL_FAST_MODEL="qwen3.6-plus"
export ANTHROPIC_BASE_URL=https://dashscope-intl.aliyuncs.com/apps/anthropic
export ANTHROPIC_AUTH_TOKEN=your_dashscope_api_key

claude
```
Use Qwen3.6-Plus With OpenClaw
OpenClaw is a self-hosted coding agent. Install and configure for Model Studio:
```bash
# Install (Node.js 22+)
curl -fsSL https://molt.bot/install.sh | bash

export DASHSCOPE_API_KEY=your_key
openclaw dashboard
```
Edit ~/.openclaw/openclaw.json to include:
```json
{
  "models": {
    "providers": [{
      "name": "alibaba-coding-plan",
      "baseUrl": "https://coding-intl.dashscope.aliyuncs.com/v1",
      "apiKey": "${DASHSCOPE_API_KEY}",
      "models": [{"id": "qwen3.6-plus", "reasoning": true}]
    }]
  },
  "agents": {
    "defaults": {"models": ["qwen3.6-plus"]}
  }
}
```
Use Qwen3.6-Plus With Qwen Code
Qwen Code is Alibaba's open-source terminal agent. 1,000 free API calls/day with Qwen OAuth.
```bash
npm install -g @qwen-code/qwen-code@latest
qwen
# Type /auth to sign in and activate free tier
```
Why preserve_thinking Changes Agent Behavior
Typical LLM APIs reset reasoning each turn. For multi-step agent tasks, this causes context drift.
With preserve_thinking, the model keeps all prior reasoning visible, making decisions more consistent over complex workflows and reducing repeated reasoning (saves tokens).
Example agent loop:
```python
conversation = []

def agent_step(user_message, preserve=True):
    conversation.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="qwen3.6-plus",
        messages=conversation,
        extra_body={
            "enable_thinking": True,
            "preserve_thinking": preserve,
        },
        stream=False,
    )
    message = response.choices[0].message
    # Append the assistant turn so later steps see the full history
    conversation.append({"role": "assistant", "content": message.content})
    return message.content

# Multi-step code review agent
result = agent_step("Analyze the auth module for security issues.")
result = agent_step("Now suggest fixes for the top 3 issues you found.")
result = agent_step("Write tests that validate each fix.")
```
Without preserve_thinking, step 3 sees only the final answers from steps 1 and 2, not the reasoning that identified and prioritized the issues.
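When preserve_thinking is off, any reasoning you stored in the conversation history is dead weight on the next request. A minimal sketch of pruning it before resending (the strip_reasoning helper is an assumption of this guide; the reasoning_content field name matches the response format shown earlier):

```python
def strip_reasoning(history):
    """Return a copy of the chat history with per-turn reasoning removed.

    Useful when preserve_thinking is disabled: stale reasoning_content
    fields only add prompt tokens without influencing the next turn.
    """
    cleaned = []
    for msg in history:
        msg = dict(msg)                      # shallow copy, leave input intact
        msg.pop("reasoning_content", None)   # drop stored reasoning, if any
        cleaned.append(msg)
    return cleaned

history = [
    {"role": "user", "content": "Analyze the auth module."},
    {"role": "assistant", "content": "Found 3 issues.",
     "reasoning_content": "First I checked the token validation path..."},
]

pruned = strip_reasoning(history)  # safe to send with preserve_thinking off
```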
What It's Best For
- Repository-level bug fixing: SWE-bench Verified 78.8%, Pro 56.6%. Strong for automated code repair/review pipelines.
- Terminal automation: Top performer on Terminal-Bench 2.0; ideal for shell-heavy workflows and build pipelines.
- MCP tool calling: MCPMark at 48.2%—best for MCP-based integrations.
- Long-context document/code analysis: 1M token window handles codebase reviews and large documents.
- Frontend code generation: Nearly tied with Claude Opus 4.5 for frontend tasks (QwenWebBench 1501.7 vs 1517.9).
- Multilingual scenarios: WMT24++ at 84.3%, MAXIFE at 88.2% across 23 languages.
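Before shipping a whole codebase into the 1M-token window, a rough pre-flight size check helps. A sketch under a stated assumption: the ~4-characters-per-token ratio below is a coarse English-text heuristic, not Qwen's actual tokenizer, so leave headroom for the reply and reasoning tokens.

```python
def fits_context(texts, context_tokens=1_000_000, chars_per_token=4):
    """Rough estimate of whether a batch of files fits the context window.

    Returns (estimated_tokens, fits). chars_per_token=4 is a heuristic;
    use the real tokenizer for precise budgeting.
    """
    estimated = sum(len(t) for t in texts) // chars_per_token
    return estimated, estimated < context_tokens

# e.g. three source files of 40k characters each -> ~30k tokens, fits easily
files = ["x" * 40_000] * 3
tokens, ok = fits_context(files)
```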
Testing Qwen3.6-Plus API Calls With Apidog
The endpoint is OpenAI-compatible. Import it into Apidog and test like any other API.
- POST to https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
- API key: Authorization: Bearer {{DASHSCOPE_API_KEY}}
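The same request Apidog sends can be assembled by hand for a quick smoke test. A sketch of the payload and headers (the top-level enable_thinking field mirrors what the SDK's extra_body merges into the request body; the key value is a placeholder):

```python
import json

API_KEY = "your_dashscope_api_key"  # placeholder, do not hardcode real keys

url = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
payload = {
    "model": "qwen3.6-plus",
    "messages": [{"role": "user", "content": "ping"}],
    "enable_thinking": True,
}
body = json.dumps(payload)  # send with any HTTP client, or paste into Apidog
```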
Sample response assertions:
```javascript
pm.test("Response contains choices", () => {
  const body = pm.response.json();
  pm.expect(body).to.have.property("choices");
  pm.expect(body.choices[0].message.content).to.be.a("string").and.not.empty;
});

pm.test("No empty reasoning when thinking enabled", () => {
  const choice = pm.response.json().choices[0];
  if (choice.message.reasoning_content !== undefined) {
    pm.expect(choice.message.reasoning_content).to.not.be.empty;
  }
});
```
- Use Smart Mock in Apidog to generate test responses without hitting the live API.
- For multi-turn agents, create a Test Scenario chaining requests. Validate that preserve_thinking carries reasoning across turns before production.
Download Apidog free to start testing.
What's Coming Next
Smaller open-source variants will be released soon, following the Qwen3.5 pattern (sparse MoE, Apache 2.0 weights).
Roadmap:
- Longer-horizon, repository-level tasks (complex, multi-file problem solving)
- Multimodal agent development, including GUI agents and visual coding
Qwen3.5 open-source models quickly became a default for self-hosted coding agents. Expect the same for Qwen3.6 variants.
Conclusion
Qwen3.6-Plus closes the gap with Claude Opus 4.5 on coding and leads in terminal, MCP tool use, and planning. With a 1M token context, Anthropic protocol support, and the new preserve_thinking parameter, it's ready for production agentic systems.
The official API brings stability, SLA coverage, and reliable agent-focused workflows.
Apidog simplifies testing: import the endpoint, add assertions, use mocks, and run regression tests as you update your model or API version.
FAQ
What is the difference between Qwen3.6-Plus and the preview?
The preview (qwen/qwen3.6-plus-preview) launched on OpenRouter on March 30, 2026. The official release adds the preserve_thinking parameter, SLA-backed uptime, and full Model Studio support. Smaller open-source variants are also coming.
What is preserve_thinking and when should I use it?
By default, only current-turn reasoning is kept. Set preserve_thinking: true to retain reasoning from all previous turns. Use for multi-step agent loops where past reasoning should inform next actions.
How does Qwen3.6-Plus compare to Claude Opus 4.5?
Claude leads on SWE-bench Verified (80.9% vs 78.8%) and OSWorld-Verified (66.3% vs 62.5%). Qwen3.6-Plus leads on Terminal-Bench 2.0 (61.6% vs 59.3%), MCPMark (48.2% vs 42.3%), DeepPlanning (41.5% vs 33.9%), and GPQA (90.4% vs 87.0%).
Can I use Qwen3.6-Plus with Claude Code?
Yes. Set ANTHROPIC_BASE_URL to the Dashscope Anthropic-compatible endpoint, ANTHROPIC_MODEL to qwen3.6-plus, and ANTHROPIC_AUTH_TOKEN to your Dashscope API key.
Is Qwen3.6-Plus open source?
The hosted API model is not open-weight. Smaller variants with public weights will be released soon.
How do I get free access?
Install Qwen Code (npm install -g @qwen-code/qwen-code@latest), run qwen, then /auth. Sign in with Qwen Code OAuth for 1,000 free API calls/day against Qwen3.6-Plus.
What context window does it support?
1 million tokens by default. Some benchmarks used 256K for comparison, but the API default is 1M.
How do I test the API integration before deploying?
Import the endpoint into Apidog, add your API key as an environment variable, write response assertions, and use Smart Mock for offline development. Chain requests into a Test Scenario to validate multi-turn agent behavior before production deployment.

