TL;DR
Qwen3.6-Plus is now officially available. It scores 78.8% on SWE-bench Verified and 61.6% on Terminal-Bench 2.0, where it beats Claude Opus 4.5. It also ships with a 1M token context window, a new preserve_thinking parameter for agent loops, and compatibility with Claude Code, OpenClaw, and Qwen Code through OpenAI-compatible and Anthropic-compatible APIs.
From preview to release
If you tested Qwen3.6-Plus Preview on OpenRouter, the official release is the production-ready version of that model. The preview launched quietly on March 30 with no waitlist and free access through OpenRouter. In its first two days, it processed more than 400 million completion tokens across roughly 400,000 requests.
The official release moves Qwen3.6-Plus to Alibaba Cloud Model Studio with a stable API, SLA-backed uptime, and a new API parameter designed for multi-step agent workflows.
This guide focuses on implementation:
- What changed from the preview
- How to call the API
- How to enable preserve_thinking
- How to connect Qwen3.6-Plus to Claude Code, OpenClaw, and Qwen Code
- How to test the integration with Apidog before deploying
What Qwen3.6-Plus is
Qwen3.6-Plus is a hosted mixture-of-experts model from Alibaba's Qwen team. Like the Qwen3.5 series, it uses sparse activation, so only a subset of parameters is active for each token. That helps deliver strong performance without the compute cost of a similarly capable dense model.
Launch specs:
- 1 million token context window by default
- Mandatory chain-of-thought reasoning, same as the preview
- New preserve_thinking parameter for agentic tasks
- Native multimodal support for vision, video, and document understanding
- OpenAI-compatible API
- Anthropic-compatible API
- OpenAI Responses API support
Smaller open-source variants are expected to follow. The hosted API model itself is not open-weight, so if you need self-hosted weights, wait for those variants, which are on the roadmap.
Benchmark results
Coding agents
Qwen3.6-Plus trails Claude Opus 4.5 slightly on SWE-bench Verified (78.8% versus 80.9%) and leads the comparison on terminal operations.
Terminal-Bench 2.0 evaluates real shell workflows, including file operations, process control, and multi-step terminal tasks under a 3-hour timeout with 32 CPU cores and 48GB RAM.
Qwen3.6-Plus scores 61.6% compared with Claude Opus 4.5 at 59.3%. That matters if your agent needs to run shell-heavy workflows such as builds, test execution, file edits, and process management.
General agents and tool use
| Benchmark | Claude Opus 4.5 | Qwen3.6-Plus |
|---|---|---|
| TAU3-Bench | 70.2% | 70.7% |
| DeepPlanning | 33.9% | 41.5% |
| MCPMark | 42.3% | 48.2% |
| MCP-Atlas | 71.8% | 74.1% |
| WideSearch | 76.4% | 74.3% |
MCPMark tests GitHub MCP v0.30.3 tool calls, with Playwright responses truncated at 32K tokens. Qwen3.6-Plus leading at 48.2% is relevant if you are building MCP-based developer tooling.
DeepPlanning at 41.5% versus Claude Opus 4.5 at 33.9% also points to stronger long-horizon planning performance.
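To make the margins in the table above easier to scan, here is a small sketch that computes the point difference per benchmark. The scores are copied from the table; the function and variable names are illustrative.

```python
# Scores copied from the table above: (Claude Opus 4.5, Qwen3.6-Plus), in percent.
scores = {
    "TAU3-Bench": (70.2, 70.7),
    "DeepPlanning": (33.9, 41.5),
    "MCPMark": (42.3, 48.2),
    "MCP-Atlas": (71.8, 74.1),
    "WideSearch": (76.4, 74.3),
}


def margins(table):
    """Return Qwen3.6-Plus minus Claude Opus 4.5, in percentage points."""
    return {name: round(qwen - claude, 1) for name, (claude, qwen) in table.items()}


for name, delta in margins(scores).items():
    leader = "Qwen3.6-Plus" if delta > 0 else "Claude Opus 4.5"
    print(f"{name}: {delta:+.1f} points ({leader} leads)")
```

The largest gaps are on DeepPlanning and MCPMark; TAU3-Bench is within half a point, and WideSearch is the one row where Claude Opus 4.5 is ahead.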
Reasoning and knowledge
| Benchmark | Claude Opus 4.5 | Qwen3.6-Plus |
|---|---|---|
| GPQA | 87.0% | 90.4% |
| LiveCodeBench v6 | 84.8% | 87.1% |
| IFEval strict | 90.9% | 94.3% |
| MMLU-Pro | 89.5% | 88.5% |
GPQA evaluates graduate-level science reasoning. IFEval strict measures whether a model follows precise formatting and constraint instructions.
Qwen3.6-Plus leads both GPQA and IFEval strict, which is useful for structured output, multi-step agents, and tool-calling workflows where instruction drift can break execution.
Multimodal
Qwen3.6-Plus is a native multimodal model and leads several document, spatial, and object detection benchmarks.
| Benchmark | Qwen3.6-Plus | Notes |
|---|---|---|
| OmniDocBench 1.5 | 91.2% | Top in table |
| RefCOCO avg | 93.5% | Top in table |
| We-Math | 89.0% | Top in table |
| CountBench | 97.6% | Top in table |
| OSWorld-Verified | 62.5% | Behind Claude, which scored 66.3% |
Claude Opus 4.5 leads OSWorld-Verified at 66.3% versus Qwen3.6-Plus at 62.5%. For document understanding and spatial grounding tasks, Qwen3.6-Plus leads in the reported comparison.
How to call the API
Qwen3.6-Plus is available through Alibaba Cloud Model Studio. Get an API key from modelstudio.alibabacloud.com.
Regional base URLs:
- Singapore: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
- Beijing: https://dashscope.aliyuncs.com/compatible-mode/v1
- US Virginia: https://dashscope-us.aliyuncs.com/compatible-mode/v1
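If your code runs in more than one region, it can help to keep the base URLs in one place. The sketch below is one way to do that; the region keys and the QWEN_REGION environment variable are hypothetical names chosen for this example, not part of the Model Studio API.

```python
import os

# Base URLs copied from the list above; the region keys are illustrative names.
BASE_URLS = {
    "singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "us-virginia": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
}


def resolve_base_url(region=None):
    """Pick a base URL, falling back to the hypothetical QWEN_REGION env var."""
    region = (region or os.environ.get("QWEN_REGION", "singapore")).lower()
    if region not in BASE_URLS:
        raise ValueError(f"Unknown region {region!r}; expected one of {sorted(BASE_URLS)}")
    return BASE_URLS[region]
```

Pass the result as base_url when constructing the client. Failing fast on an unknown region avoids silently sending requests to the wrong endpoint.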
Basic streaming call
Install the OpenAI Python SDK if needed:
pip install openai
Then call the OpenAI-compatible endpoint:
from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3.6-plus",
    messages=[
        {
            "role": "user",
            "content": "Review this Python function and find bugs."
        }
    ],
    extra_body={
        "enable_thinking": True
    },
    stream=True,
)

reasoning = ""
answer = ""
is_answering = False

for chunk in completion:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if hasattr(delta, "reasoning_content") and delta.reasoning_content:
        if not is_answering:
            reasoning += delta.reasoning_content
    if delta.content:
        if not is_answering:
            is_answering = True
        answer += delta.content
        print(delta.content, end="", flush=True)
Use this pattern when you want to stream the final answer while optionally collecting reasoning content separately.
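The same loop can be wrapped in a function so it is testable without a live connection. This is a minimal sketch: split_stream and fake_chunk are names invented for this example, and the fake chunks only imitate the attributes the loop above reads (choices, delta.content, delta.reasoning_content). Unlike the loop above, it keeps any reasoning text regardless of whether the answer has started.

```python
from types import SimpleNamespace


def split_stream(chunks):
    """Collect reasoning and answer text from an iterable of streaming chunks."""
    reasoning, answer = [], []
    for chunk in chunks:
        if not chunk.choices:  # some chunks carry no choices; skip them
            continue
        delta = chunk.choices[0].delta
        thought = getattr(delta, "reasoning_content", None)
        if thought:
            reasoning.append(thought)
        if delta.content:
            answer.append(delta.content)
    return "".join(reasoning), "".join(answer)


def fake_chunk(content=None, reasoning_content=None):
    """Build a stand-in object shaped like a streaming chunk (for tests only)."""
    delta = SimpleNamespace(content=content, reasoning_content=reasoning_content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [
    fake_chunk(reasoning_content="Check the loop bounds. "),
    fake_chunk(reasoning_content="Off-by-one on the last index."),
    SimpleNamespace(choices=[]),
    fake_chunk(content="The loop skips "),
    fake_chunk(content="the final element."),
]
reasoning, answer = split_stream(demo)
print(answer)
```

In real use, pass the object returned by client.chat.completions.create(..., stream=True) in place of demo.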
Use preserve_thinking for agent loops
The preview version only kept reasoning from the current turn. The official release adds preserve_thinking.
When you set preserve_thinking: true, the model retains chain-of-thought from previous turns in the conversation. Alibaba recommends this for agent scenarios because multi-step agents often need to reference the reasoning that led to earlier decisions.
It is disabled by default to control token usage. Enable it for multi-turn agent workflows.
completion = client.chat.completions.create(
    model="qwen3.6-plus",
    messages=conversation_history,
    extra_body={
        "enable_thinking": True,
        "preserve_thinking": True,
    },
    stream=True,
)
Minimal multi-turn agent pattern
conversation = []

def agent_step(user_message, preserve=True):
    conversation.append({
        "role": "user",
        "content": user_message
    })
    response = client.chat.completions.create(
        model="qwen3.6-plus",
        messages=conversation,
        extra_body={
            "enable_thinking": True,
            "preserve_thinking": preserve,
        },
        stream=False,
    )
    message = response.choices[0].message
    conversation.append({
        "role": "assistant",
        "content": message.content
    })
    return message.content

# Example: multi-step code review agent
result = agent_step("Analyze the auth module for security issues.")
result = agent_step("Now suggest fixes for the top 3 issues you found.")
result = agent_step("Write tests that validate each fix.")
Without preserve_thinking, the model on step 3 may not retain why it selected the issues in step 1. With preserve_thinking, the reasoning chain is preserved across turns.
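If you run more than one agent in a process, a module-level conversation list becomes awkward. The sketch below keeps the history per agent and takes the API call as an injected function, so the loop can be exercised without network access. Agent, build_request, and the send callable are hypothetical names for this example; only the request fields come from the code above.

```python
def build_request(conversation, preserve=True, model="qwen3.6-plus"):
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": model,
        "messages": list(conversation),  # snapshot, so later appends do not leak in
        "extra_body": {"enable_thinking": True, "preserve_thinking": preserve},
        "stream": False,
    }


class Agent:
    """Holds one conversation and delegates the actual API call to `send`."""

    def __init__(self, send, preserve=True):
        self.send = send  # callable(**kwargs) -> assistant reply text
        self.preserve = preserve
        self.conversation = []

    def step(self, user_message):
        self.conversation.append({"role": "user", "content": user_message})
        reply = self.send(**build_request(self.conversation, self.preserve))
        self.conversation.append({"role": "assistant", "content": reply})
        return reply
```

With the client from earlier, send could be `lambda **kw: client.chat.completions.create(**kw).choices[0].message.content`. In tests, a stub that returns a fixed string is enough to check that the history grows as expected.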
Use Qwen3.6-Plus with Claude Code
The Qwen API supports the Anthropic protocol, so Claude Code can run against Qwen3.6-Plus by changing environment variables.
Install Claude Code:
npm install -g @anthropic-ai/claude-code
Set the Qwen-compatible Anthropic endpoint:
export ANTHROPIC_MODEL="qwen3.6-plus"
export ANTHROPIC_SMALL_FAST_MODEL="qwen3.6-plus"
export ANTHROPIC_BASE_URL=https://dashscope-intl.aliyuncs.com/apps/anthropic
export ANTHROPIC_AUTH_TOKEN=your_dashscope_api_key
claude
That lets you keep your Claude Code workflow while routing model calls to Qwen3.6-Plus.
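Exporting these variables in your shell affects every later Claude Code session. If you would rather scope the override to a single launch, one option is a small wrapper script. The sketch below assumes the claude binary is on your PATH; claude_code_env and launch_claude are names made up for this example.

```python
import os
import subprocess


def claude_code_env(api_key, base=None):
    """Return a copy of the environment with the Qwen overrides from above applied."""
    env = dict(base if base is not None else os.environ)
    env.update({
        "ANTHROPIC_MODEL": "qwen3.6-plus",
        "ANTHROPIC_SMALL_FAST_MODEL": "qwen3.6-plus",
        "ANTHROPIC_BASE_URL": "https://dashscope-intl.aliyuncs.com/apps/anthropic",
        "ANTHROPIC_AUTH_TOKEN": api_key,
    })
    return env


def launch_claude(api_key):
    """Start the claude CLI with the overrides applied to this process only."""
    return subprocess.run(["claude"], env=claude_code_env(api_key), check=False)
```

Calling launch_claude(os.environ["DASHSCOPE_API_KEY"]) starts a session routed to Qwen3.6-Plus while leaving your shell environment unchanged.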
Use Qwen3.6-Plus with OpenClaw
OpenClaw, formerly Moltbot / Clawdbot, is an open-source self-hosted coding agent.
Install it:
# Node.js 22+
curl -fsSL https://molt.bot/install.sh | bash
Set your API key and start the dashboard:
export DASHSCOPE_API_KEY=your_key
openclaw dashboard
Edit ~/.openclaw/openclaw.json and merge these fields. Do not overwrite the whole file.
{
  "models": {
    "providers": [
      {
        "name": "alibaba-coding-plan",
        "baseUrl": "https://coding-intl.dashscope.aliyuncs.com/v1",
        "apiKey": "${DASHSCOPE_API_KEY}",
        "models": [
          {
            "id": "qwen3.6-plus",
            "reasoning": true
          }
        ]
      }
    ]
  },
  "agents": {
    "defaults": {
      "models": ["qwen3.6-plus"]
    }
  }
}
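Merging by hand is error-prone, so here is a sketch of a merge script. It assumes openclaw.json is strict JSON and treats lists as sets of entries (anything already present is left alone, new entries are appended); that merge policy is a choice made for this example, not documented OpenClaw behavior. deep_merge and merge_config_file are illustrative names.

```python
import json
from pathlib import Path


def deep_merge(base, patch):
    """Merge patch into a copy of base: dicts recurse, lists gain missing items."""
    if isinstance(base, dict) and isinstance(patch, dict):
        merged = dict(base)
        for key, value in patch.items():
            merged[key] = deep_merge(base[key], value) if key in base else value
        return merged
    if isinstance(base, list) and isinstance(patch, list):
        return base + [item for item in patch if item not in base]
    return patch  # scalars and mismatched types: the patch value wins


def merge_config_file(path, patch):
    """Read an existing JSON config (if any), merge the patch, and write it back."""
    path = Path(path).expanduser()
    current = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(deep_merge(current, patch), indent=2) + "\n")
```

Load the JSON fragment above into a dict and call merge_config_file("~/.openclaw/openclaw.json", patch). Because existing list entries are kept, running the script twice does not duplicate the provider. Back up the file first, since the write replaces it in place.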
Use Qwen3.6-Plus with Qwen Code
Qwen Code is Alibaba's open-source terminal agent for the Qwen model family. It provides 1,000 free API calls per day when you sign in with Qwen Code OAuth.
Install and authenticate:
npm install -g @qwen-code/qwen-code@latest
qwen
# In the Qwen Code UI, type:
/auth
After authentication, you can use Qwen3.6-Plus directly from the terminal agent.
Why preserve_thinking changes agent behavior
Most LLM APIs discard reasoning between turns: the model generates a response, its reasoning is dropped, and the next turn starts from the visible conversation only.
That works for simple Q&A. It is weaker for agents running 10 to 20 step tasks because the model may lose track of why it made earlier choices.
preserve_thinking keeps the reasoning from previous turns available when generating the next response. In practice, an agent working through a repository-level task on step 8 can still use its analysis from steps 2, 4, and 6.
Alibaba's benchmarks also indicate this can reduce redundant reasoning. If the model does not need to re-derive context it already established, it can use fewer tokens per turn on complex multi-step workflows.
Use preserve_thinking when your agent:
- Reviews or modifies multiple files over several steps
- Uses tools repeatedly
- Needs to keep a plan consistent
- Must explain or act on earlier decisions
- Chains terminal operations, tests, and code edits
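Whether preserved reasoning actually lowers per-turn token use depends on your workload, so it is worth measuring. The sketch below compares two runs of the same scenario using the usage.completion_tokens field that OpenAI-compatible responses carry. It assumes non-streaming responses and that reasoning tokens are counted in completion_tokens on this endpoint, which you should confirm against your own responses. The function names are illustrative.

```python
def completion_tokens_per_turn(responses):
    """Extract completion token counts from a list of chat completion responses."""
    return [response.usage.completion_tokens for response in responses]


def compare_runs(with_preserve, without_preserve):
    """Summarize total completion tokens for two runs of the same scenario."""
    with_total = sum(completion_tokens_per_turn(with_preserve))
    without_total = sum(completion_tokens_per_turn(without_preserve))
    return {
        "with_preserve_total": with_total,
        "without_preserve_total": without_total,
        "saved_tokens": without_total - with_total,
    }
```

Run the same prompts once with preserve_thinking enabled and once without, collect the response objects, and pass both lists to compare_runs. A negative saved_tokens means the preserved run used more completion tokens. Prompt tokens are not counted here, and they may rise when earlier reasoning is carried forward.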
What Qwen3.6-Plus is best for
Repository-level bug fixing
SWE-bench Verified at 78.8% and SWE-bench Pro at 56.6% make Qwen3.6-Plus competitive for automated code repair, review, and repository-level debugging pipelines.
Terminal automation
Terminal-Bench 2.0 leadership makes it strong for shell-heavy workflows such as:
- Multi-step file operations
- Process management
- Build pipelines
- Test execution
- CLI-driven debugging
MCP tool calling
MCPMark at 48.2% makes Qwen3.6-Plus a strong option for MCP-based tool integrations, especially GitHub and browser automation workflows.
Long-context document analysis
The 1M token context window is useful for:
- Full codebase review
- Large specification analysis
- Multi-file reasoning
- Long policy, legal, or technical document processing
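Even with a 1M token window, a large repository may not fit, so it helps to budget before sending. The sketch below greedily selects files against the window. The four-characters-per-token ratio is a rough rule of thumb, not the Qwen tokenizer, and the output reserve is an arbitrary value chosen for illustration.

```python
CONTEXT_WINDOW = 1_000_000  # from the launch specs
CHARS_PER_TOKEN = 4         # rough heuristic, NOT the Qwen tokenizer


def estimate_tokens(text):
    """Approximate the token count of a string from its character length."""
    return len(text) // CHARS_PER_TOKEN + 1


def pack_files(files, reserve_for_output=50_000):
    """Greedily select (name, text) pairs that fit in the remaining budget."""
    budget = CONTEXT_WINDOW - reserve_for_output
    selected, used = [], 0
    for name, text in files:
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # skip files that would overflow; later, smaller ones may still fit
        selected.append(name)
        used += cost
    return selected, used
```

Order the input by importance, since earlier files are admitted first. For exact counts, replace estimate_tokens with a real tokenizer for the model.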
Frontend code generation
The Qwen team's internal QwenWebBench covers seven categories:
- Web Design
- Web Apps
- Games
- SVG
- Data Visualization
- Animation
- 3D
Qwen3.6-Plus scores 1501.7 versus 1517.9 for Claude Opus 4.5, a 16.2-point gap that leaves the two models close on this reported frontend generation benchmark.
Multilingual workflows
Qwen3.6-Plus scores 84.3% on WMT24++ and 88.2% on MAXIFE across 23 language settings, making it useful for non-English and multilingual applications.
Testing Qwen3.6-Plus API calls with Apidog
The API is OpenAI-compatible, so you can test it in Apidog like any other HTTP API.
Create a POST request:
https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
Add your API key as an environment variable and pass it in the Authorization header:
Authorization: Bearer {{DASHSCOPE_API_KEY}}
Example request body:
{
  "model": "qwen3.6-plus",
  "messages": [
    {
      "role": "user",
      "content": "Review this Python function and find bugs."
    }
  ],
  "stream": false,
  "enable_thinking": true
}
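For comparison, the same request can be built in plain Python with only the standard library, which is handy for checking that a script sends what Apidog sends. This is a sketch: build_chat_request is a name made up for this example, and the request is constructed but not sent.

```python
import json
import urllib.request

URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions"


def build_chat_request(api_key, prompt, enable_thinking=True):
    """Build (but do not send) the POST request shown above."""
    body = {
        "model": "qwen3.6-plus",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "enable_thinking": enable_thinking,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending is left to the caller, for example:
#   with urllib.request.urlopen(build_chat_request(key, "hi"), timeout=60) as resp:
#       payload = json.load(resp)
```

Note that enable_thinking sits at the top level of the JSON body here, which is where the OpenAI SDK places fields passed through extra_body.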
Write assertions to validate the response before wiring it into production code:
pm.test("Response contains choices", () => {
  const body = pm.response.json();
  pm.expect(body).to.have.property("choices");
  pm.expect(body.choices).to.be.an("array").and.not.empty;
  pm.expect(body.choices[0].message.content).to.be.a("string").and.not.empty;
});

pm.test("No empty reasoning when thinking enabled", () => {
  const choice = pm.response.json().choices[0];
  if (choice.message.reasoning_content !== undefined) {
    pm.expect(choice.message.reasoning_content).to.not.be.empty;
  }
});
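The same checks are worth keeping in your application code, so a malformed response fails loudly at runtime too. Here is a rough Python equivalent of the two assertions above; validate_chat_response is an illustrative name, and it expects the already-parsed JSON body as a dict.

```python
def validate_chat_response(body):
    """Return a list of problems found in a parsed chat completion body."""
    choices = body.get("choices")
    if not isinstance(choices, list) or not choices:
        return ["'choices' must be a non-empty list"]

    problems = []
    message = choices[0].get("message", {})
    content = message.get("content")
    if not isinstance(content, str) or not content:
        problems.append("message.content must be a non-empty string")
    # Mirror the second test: reasoning_content is optional, but not empty if present.
    if "reasoning_content" in message and not message["reasoning_content"]:
        problems.append("reasoning_content is present but empty")
    return problems
```

An empty list means the response passed. Returning all problems at once, rather than raising on the first, makes log output more useful when a response is wrong in several ways.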
For agent workflows, create an Apidog Test Scenario that chains multiple requests:
- Send the first prompt.
- Save the assistant response.
- Send the second prompt with conversation history.
- Enable preserve_thinking.
- Validate the response structure after each step.
You can also use Apidog Smart Mock to generate test responses while developing your orchestration layer. That lets you test request handling, retries, parsing, and error paths without calling the live API on every run.
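The chained scenario and the mock-first approach can also be prototyped in a few lines of Python before you build them in Apidog. In this sketch, mock_send stands in for the live API (or a mock server) and returns a canned body shaped like a chat completion; both function names are hypothetical, and the payload fields mirror the request body shown earlier plus preserve_thinking.

```python
def mock_send(payload):
    """Stand-in for the live API: returns a canned body shaped like a chat completion."""
    turn = sum(1 for m in payload["messages"] if m["role"] == "user")
    return {"choices": [{"message": {"role": "assistant", "content": f"mock reply {turn}"}}]}


def run_scenario(prompts, send=mock_send):
    """Send prompts in order, carrying history forward and checking each response."""
    history, replies = [], []
    for prompt in prompts:
        history.append({"role": "user", "content": prompt})
        payload = {
            "model": "qwen3.6-plus",
            "messages": list(history),
            "stream": False,
            "enable_thinking": True,
            "preserve_thinking": True,
        }
        body = send(payload)
        content = body["choices"][0]["message"]["content"]
        assert isinstance(content, str) and content, "empty assistant content"
        history.append({"role": "assistant", "content": content})
        replies.append(content)
    return replies, history
```

Swap mock_send for a function that posts the payload to the real endpoint once the orchestration logic behaves as expected.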
What's coming next
The Qwen team confirmed that smaller open-source variants are expected within days. These are planned to follow the Qwen3.5 pattern: sparse MoE models with public Apache 2.0 weights.
The roadmap also includes:
- Longer-horizon repository-level tasks
- More complex multi-file problem solving
- Continued multimodal agent development
- GUI agents and visual coding as first-class capabilities
The Qwen3.5 open-source variants became widely deployed self-hosted models shortly after release. If Qwen3.6 follows the same pattern, the smaller variants may become common choices for self-hosted coding agents after they land.
Conclusion
Qwen3.6-Plus narrows the gap with Claude Opus 4.5 on coding tasks and leads in terminal operations, MCP tool calling, and long-horizon planning in the reported benchmarks.
For developers, the main implementation points are:
- Use the OpenAI-compatible API for standard chat completions.
- Enable enable_thinking when you need reasoning.
- Enable preserve_thinking for multi-step agents.
- Use the Anthropic-compatible endpoint for Claude Code.
- Test requests, assertions, mocks, and multi-turn scenarios in Apidog before deployment.
The official API adds production stability, SLA coverage, and the new agent-focused preserve_thinking parameter. If you are building coding agents, terminal automation, or MCP-based tooling, Qwen3.6-Plus is worth benchmarking against your current model stack.
FAQ
What is the difference between Qwen3.6-Plus and the preview?
The preview, qwen/qwen3.6-plus-preview, launched on OpenRouter on March 30, 2026. The official release adds the preserve_thinking parameter, SLA-backed uptime, and full Model Studio support. Smaller open-source variants are also expected.
What is preserve_thinking and when should I use it?
By default, only reasoning from the current turn is kept. When preserve_thinking: true is set, the model retains chain-of-thought from previous conversation turns. Use it for multi-step agent loops where previous reasoning should inform later actions.
How does Qwen3.6-Plus compare to Claude Opus 4.5?
Claude Opus 4.5 leads on SWE-bench Verified, 80.9% versus 78.8%, and OSWorld-Verified, 66.3% versus 62.5%.
Qwen3.6-Plus leads on:
- Terminal-Bench 2.0: 61.6% versus 59.3%
- MCPMark: 48.2% versus 42.3%
- DeepPlanning: 41.5% versus 33.9%
- GPQA: 90.4% versus 87.0%
Can I use Qwen3.6-Plus with Claude Code?
Yes. Set ANTHROPIC_BASE_URL to the DashScope Anthropic-compatible endpoint, ANTHROPIC_MODEL to qwen3.6-plus, and ANTHROPIC_AUTH_TOKEN to your DashScope API key.
Is Qwen3.6-Plus open source?
The hosted API model is not open-weight. The Qwen team has confirmed that smaller variants with public weights are expected within days.
How do I get free access?
Install Qwen Code:
npm install -g @qwen-code/qwen-code@latest
Run it:
qwen
Then type:
/auth
Sign in with Qwen Code OAuth to get 1,000 free API calls per day against Qwen3.6-Plus.
What context window does Qwen3.6-Plus support?
It supports 1 million tokens by default. Some official benchmarks used 256K for standardized comparison, but the API default is 1M.
How do I test the API integration before deploying?
Import the OpenAI-compatible endpoint into Apidog, add your API key as an environment variable, write response assertions, and use Smart Mock for offline development. For multi-turn agents, chain requests into a Test Scenario to validate behavior end to end.

