Thanawat Wongchai

Posted on Apr 2 • Originally published at apidog.com

Qwen3.6-Plus API: Beating Claude on Terminal Benchmarks

#ai #api #llm #news

Summary (TL;DR)

Qwen3.6-Plus is now officially released. It scores 78.8% on SWE-bench Verified and 61.6% on Terminal-Bench 2.0 (ahead of Claude Opus 4.5), and ships with a 1M-token context window. A new preserve_thinking parameter targets agent loops, and the model works directly with Claude Code, OpenClaw, and Qwen Code through OpenAI- and Anthropic-compatible APIs.

Try Apidog today

From Preview to Release

If you read the guide to Qwen 3.6 Plus Preview on OpenRouter, you already know what this model can do. The preview launched on March 30 and processed more than 400 million tokens in its first two days.

Qwen3.6-Plus is now officially released and available through Alibaba Cloud Model Studio, with a stable API, guaranteed uptime, and new parameters suited to multi-step agent work.

This article summarizes what is new, how to call the API correctly, and how to test your integration with Apidog before going to production.

What Is Qwen3.6-Plus

Qwen3.6-Plus is a Mixture-of-Experts model from Alibaba's Qwen team. It uses sparse activation (only a subset of the parameters is active for each token), which delivers high capability at lower compute cost than a dense model.
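
To make "sparse activation" concrete, here is a minimal sketch of generic top-k expert routing. The article gives no details of Qwen's router, so this is not the model's implementation: the function name, the number of experts, and k=2 are all illustrative assumptions.

```python
import math

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Generic top-k gating for illustration only (not Qwen's actual router).
    """
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in top)          # subtract the max for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Hypothetical gate scores for one token across four experts.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
for expert, weight in routes:
    print(f"expert {expert}: weight {weight:.3f}")
```

Only the selected experts run for that token, which is where the compute saving over a dense layer comes from.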

Key features:

1M-token context window
Chain-of-thought reasoning (always on)
preserve_thinking parameter for agents
Multimodal support (images, video, documents)
APIs compatible with OpenAI, Anthropic, and the OpenAI Responses API

A smaller open-source version is coming soon. If you need weights for self-hosting, watch for the announcement.

Benchmark Results

Coding Agents

Qwen3.6-Plus trails Claude Opus 4.5 slightly on most SWE-bench variants but leads every model on Terminal-Bench.

Terminal-Bench 2.0 tests real shell work such as file management, process control, and multi-step workflows. Qwen3.6-Plus scores 61.6% against 59.3% for Claude Opus 4.5, a 2.3-point lead on the kind of tasks developers actually run.

General Agents and Tool Use

Benchmark	Claude Opus 4.5	Qwen3.6-Plus
TAU3-Bench	70.2%	70.7%
DeepPlanning	33.9%	41.5%
MCPMark	42.3%	48.2%
MCP-Atlas	71.8%	74.1%
WideSearch	76.4%	74.3%

MCPMark, which tests tool use through GitHub MCP, stands out at 48.2%. DeepPlanning, at 41.5% against Claude's 33.9%, shows the gap on long-horizon planning tasks.

Reasoning and Knowledge

Benchmark	Claude Opus 4.5	Qwen3.6-Plus
GPQA	87.0%	90.4%
LiveCodeBench v6	84.8%	87.1%
IFEval strict	90.9%	94.3%
MMLU-Pro	89.5%	88.5%

Qwen3.6-Plus leads on GPQA and IFEval strict, which makes it a good fit for structured output and agent tasks that must follow constraints.
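
The gaps are easier to compare as point differences. The sketch below copies the scores from the two tables above (General Agents and Reasoning) and computes Qwen3.6-Plus minus Claude Opus 4.5 for each benchmark; the helper name `lead` is just an illustrative choice.

```python
# (Claude Opus 4.5, Qwen3.6-Plus) scores in percent, copied from the tables above.
SCORES = {
    "TAU3-Bench": (70.2, 70.7),
    "DeepPlanning": (33.9, 41.5),
    "MCPMark": (42.3, 48.2),
    "MCP-Atlas": (71.8, 74.1),
    "WideSearch": (76.4, 74.3),
    "GPQA": (87.0, 90.4),
    "LiveCodeBench v6": (84.8, 87.1),
    "IFEval strict": (90.9, 94.3),
    "MMLU-Pro": (89.5, 88.5),
}

def lead(benchmark):
    """Return the Qwen3.6-Plus lead over Claude Opus 4.5 in percentage points."""
    claude, qwen = SCORES[benchmark]
    return round(qwen - claude, 1)

# Print benchmarks from the largest Qwen lead to the largest Claude lead.
for name in sorted(SCORES, key=lead, reverse=True):
    print(f"{name:<18}{lead(name):+.1f}")
```

A positive number means Qwen3.6-Plus is ahead; a negative one means Claude Opus 4.5 is.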

Multimodal

Qwen3.6-Plus is a natively multimodal model and leads on document understanding, spatial reasoning, and object detection.

Benchmark	Qwen3.6-Plus	Notes
OmniDocBench 1.5	91.2%	Top score
RefCOCO avg	93.5%	Top score
We-Math	89.0%	Top score
CountBench	97.6%	Top score
OSWorld-Verified	62.5%	Trails Claude Opus 4.5 (66.3%)

Qwen3.6-Plus leads on document and spatial tasks.

How to Call the API

Qwen3.6-Plus is available on Alibaba Cloud Model Studio. Get an API key at modelstudio.alibabacloud.com.

Base URLs:

Singapore: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
Beijing: https://dashscope.aliyuncs.com/compatible-mode/v1
US (Virginia): https://dashscope-us.aliyuncs.com/compatible-mode/v1
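
If your code has to switch regions, one option is to keep the three base URLs above in a single lookup table. This is a small sketch: the region labels and the `base_url_for` helper are hypothetical names, and only the URLs come from the list above.

```python
# Base URLs copied from the list above; the region keys are hypothetical labels.
BASE_URLS = {
    "singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "us-virginia": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
}

def base_url_for(region: str) -> str:
    """Return the OpenAI-compatible base URL for a region label (case-insensitive)."""
    try:
        return BASE_URLS[region.lower()]
    except KeyError:
        raise ValueError(
            f"unknown region {region!r}; expected one of {sorted(BASE_URLS)}"
        ) from None

print(base_url_for("Singapore"))
```

The result can be passed as `base_url` when constructing the client.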

Basic Call with Streaming

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3.6-plus",
    messages=[{"role": "user", "content": "Review this Python function and find bugs."}],
    extra_body={"enable_thinking": True},
    stream=True
)

reasoning = ""
answer = ""
is_answering = False

for chunk in completion:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if hasattr(delta, "reasoning_content") and delta.reasoning_content:
        if not is_answering:
            reasoning += delta.reasoning_content
    if delta.content:
        if not is_answering:
            is_answering = True
        answer += delta.content
        print(delta.content, end="", flush=True)
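
The loop above can also be factored into a function that is testable without a network call. The sketch below is one way to do that, assuming chunks shaped like the ones in the loop (`choices[0].delta` with optional `reasoning_content` and `content`); `collect_stream` and `fake_chunk` are hypothetical helpers, and unlike the loop above it keeps any reasoning text regardless of where it arrives in the stream.

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Accumulate (reasoning, answer) text from an iterable of streaming chunks."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta
        thought = getattr(delta, "reasoning_content", None)
        if thought:
            reasoning_parts.append(thought)
        text = getattr(delta, "content", None)
        if text:
            answer_parts.append(text)
    return "".join(reasoning_parts), "".join(answer_parts)

def fake_chunk(reasoning=None, content=None):
    """Build a stand-in chunk with the same attribute layout (test helper only)."""
    delta = SimpleNamespace(reasoning_content=reasoning, content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

demo = [
    fake_chunk(reasoning="Check the loop bound. "),
    SimpleNamespace(choices=[]),
    fake_chunk(content="Off-by-one "),
    fake_chunk(content="in range()."),
]
reasoning, answer = collect_stream(demo)
print(answer)
```

With a real client, pass the `completion` iterator from the call above instead of `demo`.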

The preserve_thinking Parameter

Set preserve_thinking: true to have the model keep the chain of thought from all previous turns. This suits multi-step agent loops. It is off by default to save tokens.

completion = client.chat.completions.create(
    model="qwen3.6-plus",
    messages=conversation_history,
    extra_body={
        "enable_thinking": True,
        "preserve_thinking": True,  # keep reasoning from every turn
    },
    stream=True
)

Using Qwen3.6-Plus with Claude Code

The Qwen API supports the Anthropic protocol, so changing a few environment variables is enough to run Claude Code against Qwen3.6-Plus.

npm install -g @anthropic-ai/claude-code

export ANTHROPIC_MODEL="qwen3.6-plus"
export ANTHROPIC_SMALL_FAST_MODEL="qwen3.6-plus"
export ANTHROPIC_BASE_URL=https://dashscope-intl.aliyuncs.com/apps/anthropic
export ANTHROPIC_AUTH_TOKEN=your_dashscope_api_key

claude
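
If you would rather not export these variables in your shell, a per-process environment is an alternative. The sketch below builds that environment in Python; `claude_code_env` is a hypothetical helper, while the variable names and endpoint are the ones shown above.

```python
import os

def claude_code_env(api_key: str, model: str = "qwen3.6-plus") -> dict:
    """Return a copy of the current environment with the Claude Code overrides applied."""
    env = dict(os.environ)
    env.update({
        "ANTHROPIC_MODEL": model,
        "ANTHROPIC_SMALL_FAST_MODEL": model,
        "ANTHROPIC_BASE_URL": "https://dashscope-intl.aliyuncs.com/apps/anthropic",
        "ANTHROPIC_AUTH_TOKEN": api_key,
    })
    return env

# Usage (not executed here, since it needs the claude CLI installed):
#   import subprocess
#   subprocess.run(["claude"], env=claude_code_env(os.environ["DASHSCOPE_API_KEY"]))
```

This leaves the rest of your shell session pointing at whatever it used before.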

Using Qwen3.6-Plus with OpenClaw

OpenClaw (formerly Moltbot / Clawdbot) is a self-hosted coding agent.

# Install (requires Node.js 22+)
curl -fsSL https://molt.bot/install.sh | bash

export DASHSCOPE_API_KEY=your_key
openclaw dashboard

Edit ~/.openclaw/openclaw.json and add the fields below:

{
  "models": {
    "providers": [{
      "name": "alibaba-coding-plan",
      "baseUrl": "https://coding-intl.dashscope.aliyuncs.com/v1",
      "apiKey": "${DASHSCOPE_API_KEY}",
      "models": [{"id": "qwen3.6-plus", "reasoning": true}]
    }]
  },
  "agents": {
    "defaults": {"models": ["qwen3.6-plus"]}
  }
}
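
If the config file already has other providers, editing it by hand is error-prone. The sketch below merges the block above into an existing config dictionary; it assumes the schema is exactly as shown above, `add_qwen_provider` is a hypothetical helper, and note that it overwrites any existing `agents.defaults.models` value.

```python
import json

def add_qwen_provider(config: dict) -> dict:
    """Return a copy of an OpenClaw-style config with the provider block above merged in."""
    merged = json.loads(json.dumps(config))  # deep copy for JSON-shaped data
    provider = {
        "name": "alibaba-coding-plan",
        "baseUrl": "https://coding-intl.dashscope.aliyuncs.com/v1",
        "apiKey": "${DASHSCOPE_API_KEY}",
        "models": [{"id": "qwen3.6-plus", "reasoning": True}],
    }
    providers = merged.setdefault("models", {}).setdefault("providers", [])
    # Drop any earlier entry with the same name so the merge is idempotent.
    providers[:] = [p for p in providers if p.get("name") != provider["name"]]
    providers.append(provider)
    merged.setdefault("agents", {}).setdefault("defaults", {})["models"] = ["qwen3.6-plus"]
    return merged

print(json.dumps(add_qwen_provider({}), indent=2))
```

To apply it, load ~/.openclaw/openclaw.json with `json.load`, pass the result through the function, and write it back.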

Using Qwen3.6-Plus with Qwen Code

Qwen Code is an open-source terminal agent from Alibaba. After sign-in, it includes 1,000 free API calls per day.

npm install -g @qwen-code/qwen-code@latest
qwen
# Type /auth to sign in and enable the free tier

Why preserve_thinking Changes Agent Behavior

Typical LLM APIs do not carry reasoning across turns, so agents tend to drift when a task takes many steps.

preserve_thinking keeps the chain of reasoning across agent turns, which suits tasks where later steps depend on earlier reasoning.

conversation = []

def agent_step(user_message, preserve=True):
    conversation.append({"role": "user", "content": user_message})

    response = client.chat.completions.create(
        model="qwen3.6-plus",
        messages=conversation,
        extra_body={
            "enable_thinking": True,
            "preserve_thinking": preserve,
        },
        stream=False
    )

    message = response.choices[0].message
    conversation.append({"role": "assistant", "content": message.content})
    return message.content

# Example: a multi-step code review agent
result = agent_step("Analyze the auth module for security issues.")
result = agent_step("Now suggest fixes for the top 3 issues you found.")
result = agent_step("Write tests that validate each fix.")

Without preserve_thinking, the model no longer has the reasoning that led to the step 1 findings. With it enabled, the reasoning chain carries over and the context is not lost.
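
To check the loop's bookkeeping without spending tokens, one approach is to inject the client so a stub can stand in for it. This is a sketch under assumptions: `make_agent` and `StubClient` are hypothetical names, and the stub only mimics the `chat.completions.create` call shape used above; it says nothing about how the real service handles preserve_thinking.

```python
from types import SimpleNamespace

def make_agent(client, model="qwen3.6-plus", preserve=True):
    """Return (step, history); step() sends one user turn and records the reply."""
    history = []

    def step(user_message):
        history.append({"role": "user", "content": user_message})
        response = client.chat.completions.create(
            model=model,
            messages=history,
            extra_body={"enable_thinking": True, "preserve_thinking": preserve},
        )
        content = response.choices[0].message.content
        history.append({"role": "assistant", "content": content})
        return content

    return step, history

class StubClient:
    """Hypothetical stand-in for the OpenAI client: records calls, returns canned text."""

    def __init__(self):
        self.calls = []
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, **kwargs):
        # Snapshot the messages: the agent keeps mutating its history list afterwards.
        self.calls.append({**kwargs, "messages": list(kwargs["messages"])})
        reply = f"reply {len(self.calls)}"
        message = SimpleNamespace(content=reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])

stub = StubClient()
step, history = make_agent(stub)
step("Analyze the auth module for security issues.")
step("Now suggest fixes for the top 3 issues you found.")
print(len(history), "messages in history")
```

Swapping `StubClient()` for the real `client` runs the same loop against the API.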

What It Is Best For

Repository-level bug fixing: 78.8% on SWE-bench Verified and 56.6% on SWE-bench Pro
Terminal automation: leads Terminal-Bench 2.0
MCP tool calling: 48.2% on MCPMark (highest)
Long-context document analysis: the 1M-token context window suits large codebases and documents
Frontend code generation: 1501.7 points on QwenWebBench, close to Claude Opus 4.5
Multilingual: 84.3% on WMT24++ and 88.2% on MAXIFE, covering 23 languages

Testing Qwen3.6-Plus API Calls with Apidog

The endpoint is OpenAI-compatible, so you can import it into Apidog and test it like any other API.

Example setup:

URL: https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
Header: Authorization: Bearer {{DASHSCOPE_API_KEY}}
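
For reference, the same request can be built as a raw HTTP call with only the Python standard library. This sketch constructs the request without sending it; `build_request` is a hypothetical helper, and placing `enable_thinking` at the top level of the JSON body is an assumption based on how the SDK's `extra_body` fields are merged into the request.

```python
import json
import urllib.request

URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions"

def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completions POST request."""
    payload = {
        "model": "qwen3.6-plus",
        "messages": [{"role": "user", "content": prompt}],
        "enable_thinking": True,  # assumption: mirrors extra_body in the SDK examples
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("sk-test", "ping")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` with a real key would send it.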

Example assertions in Apidog:

pm.test("Response contains choices", () => {
  const body = pm.response.json();
  pm.expect(body).to.have.property("choices");
  pm.expect(body.choices[0].message.content).to.be.a("string").and.not.empty;
});

pm.test("No empty reasoning when thinking enabled", () => {
  const choice = pm.response.json().choices[0];
  if (choice.message.reasoning_content !== undefined) {
    pm.expect(choice.message.reasoning_content).to.not.be.empty;
  }
});
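
The same checks can run outside Apidog, for example in a CI script. Below is a rough Python equivalent of the two assertions above, assuming the response body has already been parsed into a dictionary; `check_response` is a hypothetical helper.

```python
def check_response(body: dict) -> list:
    """Return a list of problems found in a chat completions response body."""
    choices = body.get("choices")
    if not choices:
        return ["response has no choices"]
    problems = []
    message = choices[0].get("message", {})
    content = message.get("content")
    if not isinstance(content, str) or not content:
        problems.append("message.content is missing or empty")
    # Only check reasoning when the field is present, as in the script above.
    if "reasoning_content" in message and not message["reasoning_content"]:
        problems.append("reasoning_content is present but empty")
    return problems

sample = {"choices": [{"message": {"content": "Looks fine.", "reasoning_content": "Checked inputs."}}]}
print(check_response(sample))
```

An empty list means the body passed both checks.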

Use Apidog's Smart Mock to simulate responses during development, so you can iterate quickly without spending tokens.

If you are building a multi-turn agent, create a Test Scenario that chains several requests and verify that preserve_thinking really keeps the reasoning context before going to production.

Download Apidog for free to set up these tests.

What Is Coming Next

The Qwen team plans to release a smaller open-source version within days (Apache 2.0 license, sparse MoE like Qwen3.5).

On the roadmap:

Repository-level tasks with larger files
Multimodal agents with GUI control and visual coding as core capabilities

If Qwen3.6 follows the Qwen3.5 pattern, the open-source version could soon become a leading choice for self-hosted coding agents.

Conclusion

Qwen3.6-Plus narrows the gap with Claude Opus 4.5 on coding and leads on terminal work, MCP tool use, and long-horizon planning. The 1M-token context window, Anthropic-compatible protocol, and preserve_thinking make it well suited to production agentic systems.

The official API adds stability, an SLA, and new features aimed specifically at agents.

Apidog makes it convenient to run API tests, assertions, mocks, and regression tests every time the model or API is updated.

Frequently Asked Questions

How does Qwen3.6-Plus differ from the preview version?

The preview (qwen/qwen3.6-plus-preview) launched on OpenRouter on March 30, 2026. The official release adds preserve_thinking, an SLA, and full Model Studio support. A smaller open-source version is on the way.

What is preserve_thinking and when should I use it?

By default, reasoning is kept only for the current turn. With preserve_thinking: true, the reasoning chain is kept across turns. Use it for multi-step agent loops that need to see earlier reasoning.

How does Qwen3.6-Plus compare with Claude Opus 4.5?

Claude Opus 4.5 leads on SWE-bench Verified (80.9% vs 78.8%) and OSWorld-Verified (66.3% vs 62.5%), while Qwen3.6-Plus leads on Terminal-Bench 2.0 (61.6% vs 59.3%), MCPMark (48.2% vs 42.3%), DeepPlanning (41.5% vs 33.9%), and GPQA (90.4% vs 87.0%).

Can I use Qwen3.6-Plus with Claude Code?

Yes. Set ANTHROPIC_BASE_URL to the Dashscope Anthropic endpoint, ANTHROPIC_MODEL to qwen3.6-plus, and ANTHROPIC_AUTH_TOKEN to your Dashscope API key.

Is Qwen3.6-Plus open source?

The hosted API model is not open-weight, but smaller versions with public weights have been confirmed for release soon.

How can I get free access?

Install Qwen Code (npm install -g @qwen-code/qwen-code@latest), run qwen, then use /auth to sign in with Qwen Code OAuth. This gives you 1,000 free API calls per day.

How large a context window is supported?

1 million tokens by default. Some benchmarks use 256K for standardized comparison, but the API default is 1M.

How do I test the API integration before deploying?

Import the endpoint into Apidog, add your API key as an environment variable, write assertions, and use Smart Mock during development. Chain requests into a Test Scenario to verify multi-turn agents end to end.
