DEV Community: Super Jarvis

Qwen3.6-Plus API: How to Access and Integrate Qwen 3.6

Super Jarvis — Fri, 22 May 2026 13:08:40 +0000

If you have been working with Qwen 3.5 models through APIs and are wondering how to access Qwen3.6-Plus, this guide covers the key differences and how to get started.

Want to test the model before writing any code? Chat with Qwen3.6-Plus free.

How Qwen3.6-Plus API Access Works

Qwen3.6-Plus is a hosted model, which means you access it through API calls rather than downloading weights. The primary access paths are:

Alibaba Cloud DashScope API — the first-party API from the Qwen team
OpenRouter — third-party aggregator that provides a unified API for multiple model providers
Other API aggregators — several providers have added Qwen 3.6 models to their catalogs

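The API follows the OpenAI-compatible chat completions format, so if your existing code already targets the OpenAI chat completions API, switching to Qwen3.6-Plus usually requires changing only the base URL, API key, and model name. Code written against a different API shape, such as Anthropic's native Messages API for Claude, needs more adaptation.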

Basic API Request

Here is a standard chat completion request:

curl https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus-latest",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain the difference between TCP and UDP in simple terms."}
    ]
  }'

Tool Calling with Qwen3.6-Plus

One of the key improvements in Qwen3.6-Plus is tool calling. Here is how to define and use tools:

import openai

client = openai.OpenAI(
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    api_key="YOUR_API_KEY"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="qwen-plus-latest",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)
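The request above only declares the tool. When the model decides to use it, the response carries a function name and a JSON string of arguments, and your code has to parse and check those before running anything. The sketch below is one way to do that, assuming the standard OpenAI-compatible response shape (`response.choices[0].message.tool_calls`). `TOOL_SCHEMAS`, `JSON_TYPES`, and `validate_tool_call` are hypothetical helpers, not part of any SDK.

```python
import json

# Hypothetical lookup table: tool name -> JSON-schema "parameters" object
# (same shape as the `tools` definition above).
TOOL_SCHEMAS = {
    "get_weather": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    }
}

# Minimal mapping from JSON-schema type names to Python types.
JSON_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}


def validate_tool_call(name, raw_arguments):
    """Parse and check tool-call arguments; raise ValueError on any problem."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        raise ValueError(f"unknown tool: {name}")
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    for key in schema.get("required", []):
        if key not in args:
            raise ValueError(f"missing required argument: {key}")
    for key, value in args.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            raise ValueError(f"unexpected argument: {key}")
        expected = JSON_TYPES.get(spec.get("type"))
        if expected is not None and not isinstance(value, expected):
            raise ValueError(f"argument {key!r} should be of type {spec['type']}")
    return args


# With a live response, the check would sit between the model and your function:
# for call in response.choices[0].message.tool_calls or []:
#     args = validate_tool_call(call.function.name, call.function.arguments)
print(validate_tool_call("get_weather", '{"location": "Tokyo"}'))
```

Rejecting unknown tools and unexpected arguments up front keeps a malformed or hallucinated call from reaching real side effects.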

Enabling Thinking Mode

To use the step-by-step reasoning mode:

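# Reuses the `client` created in the tool-calling example above.
response = client.chat.completions.create(
    model="qwen-plus-latest",
    messages=[{"role": "user", "content": "Debug this Python function..."}],
    # enable_thinking is not a standard OpenAI parameter, so it goes in extra_body.
    extra_body={"enable_thinking": True}
)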

Thinking mode adds latency, but it can improve output quality for complex reasoning, debugging, and multi-step planning tasks.

Key Differences from Qwen 3.5 APIs

Feature	Qwen 3.5 API	Qwen3.6-Plus API
Context window	262K (open models)	1M default
Tool calling	Supported	Improved reliability
Multimodal input	Varies by model	Text + images + docs
Thinking mode	Supported	Supported
Self-hosting	Yes (open weights)	No (hosted only)

Pricing Considerations

Qwen3.6-Plus is a hosted model, so you pay per token. Pricing varies by provider:

DashScope — check the current pricing on the Alibaba Cloud console
OpenRouter — typically shows per-token pricing on the model page
QChat — you can try the model for free with credits on qwen35.com

If cost is a concern and your tasks do not need 1M context or advanced tool calling, the open Qwen 3.5 models (self-hosted) may be more economical.

Integration Tips

Start with the chat interface at qwen35.com to validate your use case before writing API code.
Use streaming for better UX in interactive applications — the API supports server-sent events.
Set reasonable max_tokens — do not default to the maximum. Shorter limits reduce cost and latency.
Handle tool calls gracefully — always validate tool call arguments before executing them.
Test with and without thinking mode to find the right balance for your specific tasks.

Try It First

Before integrating the API, test Qwen3.6-Plus in the browser to see if it handles your prompts well. Then move to API integration once you have confirmed the model fits your use case.

Original links

Original article: https://qwen35.com/qwen3.6-plus-api
Homepage: https://qwen35.com/
Model page: https://qwen35.com/qwen3.6-plus

Qwen3.7-Max Context Window: What 1M Tokens Changes

Super Jarvis — Fri, 22 May 2026 13:07:34 +0000

The Qwen3.7-Max context window is one of the most important practical specs in the release. The Qwen Cloud model card lists 1M tokens of context, with 991.80K max input and 65.53K max output.

Questions about the Qwen3.7-Max context window are therefore worth answering carefully. A 1M window is useful, but it does not mean every prompt should be a token dump.

For the model overview, see Qwen3.7-Max.

The Confirmed Context Specs

Field	Qwen3.7-Max value
Context window	1M tokens
Max input	991.80K tokens
Max output	65.53K tokens
Input modality	Text
Output modality	Text

Those numbers make Qwen 3.7 Max a serious long-context model for documents, repositories, multi-turn agent sessions, and large task histories.
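Because input and output have separate caps, a quick pre-flight check can catch oversized requests before they reach the API. The sketch below assumes the "K" figures on the model card mean thousands of tokens (991.80K as 991,800 and 65.53K as 65,530), which is a rounding assumption. `fits_budget` is a hypothetical helper, and the prompt token count has to come from your own tokenizer or the provider's usage data.

```python
# Limits taken from the model card figures quoted above. Treating "K" as
# 1,000 tokens is an assumption; the exact values may differ slightly.
MAX_INPUT_TOKENS = 991_800
MAX_OUTPUT_TOKENS = 65_530


def fits_budget(prompt_tokens, max_tokens):
    """Return (ok, reason) for a planned request."""
    if prompt_tokens > MAX_INPUT_TOKENS:
        over = prompt_tokens - MAX_INPUT_TOKENS
        return False, f"prompt is {over} tokens over the input limit"
    if max_tokens > MAX_OUTPUT_TOKENS:
        over = max_tokens - MAX_OUTPUT_TOKENS
        return False, f"max_tokens is {over} tokens over the output limit"
    return True, "ok"


print(fits_budget(prompt_tokens=850_000, max_tokens=8_000))
print(fits_budget(prompt_tokens=1_000_000, max_tokens=8_000))
```

The second call fails the check: a prompt of exactly one million tokens is already above the listed input cap, even though the headline figure is "1M".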

Why 1M Context Matters for Agents

Long context is not only about pasting bigger documents. For Qwen3.7-Max, the more important use case is agent continuity.

Agent tasks accumulate state:

original goal
constraints
tool calls
test output
failed attempts
user corrections
intermediate plans
final acceptance criteria

When a model loses that state, it starts repeating work or changing direction. A 1M context window gives Qwen3.7-Max more room to keep the full task visible, especially when paired with thinking mode and careful message structure.

Where the Bigger Window Helps Most

Repository work

A code task often needs more than one file. You may need a route, component, schema, config, failing test, and the original product requirement. The Qwen3.7-Max context window lets you keep more of that material together before you have to summarize or retrieve.

Long documents

Contracts, policies, specs, meeting transcripts, and research notes benefit from fewer early cuts. The model can compare more original text instead of depending on compressed summaries.

Multi-hour agent runs

The official Qwen3.7-Max release emphasizes long-horizon execution, including a 35-hour kernel optimization run. A large context window is not the only reason that works, but it is part of the infrastructure that helps the model preserve task history and avoid instruction drift.

Office automation

Spreadsheet work, document formatting, report synthesis, and MCP workflows often mix instructions with source data. A larger context window leaves room for both.

What 1M Context Does Not Solve

A 1M context window is room, not judgment.

It does not fix:

irrelevant source material
duplicated context
weak prompts
missing retrieval
unsafe tool execution
unclear acceptance criteria

Sometimes a shorter, cleaner prompt will beat a massive prompt. Long context helps when the extra material is relevant and well labeled.

Prompting Tips for Qwen3.7-Max Long Context

Use this structure for long Qwen3.7-Max prompts:

State the task in one sentence.
List the constraints before the source material.
Label each document or file section.
Tell the model what evidence to prioritize.
Ask for a plan before asking for final output.
Keep generated summaries separate from raw source text.
Use preserve_thinking only when you have tested the cost and quality tradeoff.

The goal is to help the model search inside the context, not merely to fill the window.
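The structure above can be sketched as a small prompt builder. This is only an illustration: the section markers (`TASK:`, `=== BEGIN ... ===`) and the function name `build_long_context_prompt` are hypothetical conventions, not something the Qwen documentation prescribes.

```python
def build_long_context_prompt(task, constraints, sources, priority_note):
    """Assemble a long prompt: task, constraints, guidance, then labeled sources."""
    lines = [f"TASK: {task}", "", "CONSTRAINTS:"]
    lines += [f"- {c}" for c in constraints]
    lines += ["", f"EVIDENCE PRIORITY: {priority_note}", ""]
    # Each source keeps its own label so raw text and summaries stay separate.
    for label, text in sources:
        lines += [f"=== BEGIN {label} ===", text.strip(), f"=== END {label} ===", ""]
    lines.append("First outline a plan. Do not write the final output yet.")
    return "\n".join(lines)


prompt = build_long_context_prompt(
    task="Find clauses that conflict between the two contracts.",
    constraints=["Quote clause numbers", "Do not speculate beyond the text"],
    sources=[
        ("CONTRACT_A", "1. The term is 12 months."),
        ("CONTRACT_B", "1. The term is 24 months."),
    ],
    priority_note="Prefer signed contract text over email summaries.",
)
print(prompt)
```

Putting the task and constraints ahead of the source material means they are the first thing in the context, however long the labeled sections grow.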

How It Compares to Qwen3.6-Plus

Qwen3.6-Plus also offers a 1M-token context window, but Qwen3.7-Max is framed more heavily around agent execution and long-horizon autonomy. If your task is a long document summary, both may be worth testing. If your task mixes documents, tools, and multi-step coding, Qwen3.7-Max is the more relevant candidate.

Bottom Line

The Qwen3.7-Max context window is a real product-level advantage: 1M tokens of room, nearly 992K tokens of input, and a large output ceiling.

Use it for long documents, multi-file coding, and agent sessions where losing early context would break the task. Do not use it as an excuse to paste everything. Qwen3.7-Max works best when long context is organized, labeled, and tied to a clear goal.

References

Original links

Original article: https://qwen35.com/blog/qwen3.7-max-context-window
Homepage: https://qwen35.com/
Model page: https://qwen35.com/qwen-3.7-max

Qwen3.7-Max API: How to Call Qwen 3.7 Max with Model Studio

Super Jarvis — Fri, 22 May 2026 13:06:31 +0000

The Qwen3.7-Max API is now documented through the Qwen release materials and Qwen Cloud model card. The important first detail is the model name.

For Model Studio compatible-mode calls, the release example uses:

qwen3.7-max

The Qwen Cloud model card also lists a dated snapshot:

qwen3.7-max-2026-05-20

Use the stable alias when you want the current route. Use the dated ID when your provider exposes it and you need reproducibility.

Try the model first on the Qwen3.7-Max page.

Official Access Paths

The first-party path is Alibaba Cloud Model Studio. The official Qwen3.7-Max release shows OpenAI-compatible chat completions and Responses APIs, plus an Anthropic-compatible interface for agent tools.

Common compatible-mode base URLs:

Region	Base URL
Beijing	https://dashscope.aliyuncs.com/compatible-mode/v1
Singapore	https://dashscope-intl.aliyuncs.com/compatible-mode/v1
US Virginia	https://dashscope-us.aliyuncs.com/compatible-mode/v1

The Qwen Cloud model card also shows a DashScope SDK example using:

https://dashscope-intl.aliyuncs.com/api/v1

For most app integrations, the OpenAI-compatible endpoint is the easiest migration path.
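One way to keep the region and model choice out of application code is a small settings helper. The sketch below uses the base URLs and model IDs quoted above. The function `resolve_settings`, the region keys, and the `QWEN_MODEL` environment variable are hypothetical names chosen for illustration.

```python
import os

# Compatible-mode base URLs from the region table above.
BASE_URLS = {
    "beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "us-virginia": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
}

STABLE_ALIAS = "qwen3.7-max"
DATED_SNAPSHOT = "qwen3.7-max-2026-05-20"


def resolve_settings(region="singapore", pin_snapshot=False, env=os.environ):
    """Pick a base URL and model ID; environment variables override the defaults."""
    if region not in BASE_URLS:
        raise ValueError(f"unknown region: {region}")
    base_url = env.get("DASHSCOPE_BASE_URL", BASE_URLS[region])
    # QWEN_MODEL is a hypothetical override, not an official variable.
    model = env.get("QWEN_MODEL", DATED_SNAPSHOT if pin_snapshot else STABLE_ALIAS)
    return {"base_url": base_url, "model": model}


# Passing env={} ignores the real environment, which keeps the demo deterministic.
print(resolve_settings(region="us-virginia", pin_snapshot=True, env={}))
```

Pinning the dated snapshot in production and the alias in staging is one possible split; whether the dated ID is available depends on your provider.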

Minimal Python Example

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url=os.environ.get(
        "DASHSCOPE_BASE_URL",
        "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    ),
)

completion = client.chat.completions.create(
    model="qwen3.7-max",
    messages=[
        {
            "role": "user",
            "content": "Write a Python function to merge two sorted linked lists.",
        }
    ],
    extra_body={
        "enable_thinking": True,
    },
    stream=True,
)

for chunk in completion:
    if chunk.choices:
        delta = chunk.choices[0].delta
        if getattr(delta, "content", None):
            print(delta.content, end="")

This is the cleanest way to call Qwen3.7-Max if your existing code already uses the OpenAI SDK.
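The loop above prints only the answer text. With thinking enabled, you may also want to capture the reasoning separately, for logging or debugging. The sketch below assumes the streamed delta exposes the reasoning under a `reasoning_content` attribute; that field name is an assumption to verify against your provider's response. `split_stream` and `_fake` are hypothetical helpers, and the stand-in chunks let the code run offline.

```python
from types import SimpleNamespace


def split_stream(chunks):
    """Collect reasoning text and answer text from streamed chunks separately."""
    reasoning, answer = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # `reasoning_content` is an assumed field name; absent fields yield None.
        thought = getattr(delta, "reasoning_content", None)
        text = getattr(delta, "content", None)
        if thought:
            reasoning.append(thought)
        if text:
            answer.append(text)
    return "".join(reasoning), "".join(answer)


def _fake(**delta):
    """Build a stand-in chunk with the same attribute layout as an SDK chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(**delta))])


# Dry run with fake chunks; a real call would pass `completion` instead.
demo = [
    _fake(reasoning_content="Compare the two heads. "),
    SimpleNamespace(choices=[]),
    _fake(content="def merge"),
    _fake(content="(a, b): pass"),
]
thoughts, reply = split_stream(demo)
print(reply)
```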

Thinking Mode and preserve_thinking

Qwen3.7-Max is positioned for agentic tasks, so thinking mode matters. The official example enables thinking through:

extra_body={"enable_thinking": True}

The release also describes preserve_thinking, which keeps thinking content from preceding turns in messages. That is useful for long agent runs where the model needs to keep track of prior reasoning, tool outcomes, and next-step strategy.

Use it carefully. Preserving extra thinking content can improve continuity, but it also increases token usage. For short chat, leave it off. For multi-step Qwen3.7-Max coding agents, test it directly.

Claude Code and Other Agent Harnesses

Qwen APIs also support an Anthropic-compatible route. The official release shows this shape for Claude Code:

export ANTHROPIC_MODEL="qwen3.7-max"
export ANTHROPIC_SMALL_FAST_MODEL="qwen3.7-max"
export ANTHROPIC_BASE_URL="https://dashscope-intl.aliyuncs.com/apps/anthropic"
# Replace <your_api_key> with your own key; the quotes keep the shell from parsing < and >.
export ANTHROPIC_AUTH_TOKEN="<your_api_key>"

That is important because Qwen 3.7 Max is meant to run inside coding assistants and agent scaffolds, not only inside direct chat completion calls.

Pricing and Context

The Qwen Cloud model card lists Qwen3.7-Max with:

Field	Value
Context	1M tokens
Max input	991.80K tokens
Max output	65.53K tokens
Input price	$2.50 per 1M tokens
Output price	$7.50 per 1M tokens
RPM	600
TPM	1M

Always confirm pricing in your actual provider console before committing production traffic. Providers can change price, quota, and region availability independently.
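As a rough budgeting aid, the listed prices translate into a per-request estimate like the one below. It uses only the two model card prices quoted above; `estimate_cost` is a hypothetical helper, and the token counts are illustrative.

```python
INPUT_PRICE_PER_M = 2.50   # USD per 1M input tokens (model card figure above)
OUTPUT_PRICE_PER_M = 7.50  # USD per 1M output tokens (model card figure above)


def estimate_cost(input_tokens, output_tokens):
    """Estimated USD cost of one request at the listed per-token prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# A 200K-token prompt with a 4K-token answer.
print(f"${estimate_cost(200_000, 4_000):.2f}")  # prints $0.53
```

At these prices the prompt dominates: 200,000 input tokens cost $0.50, while the 4,000 output tokens add $0.03.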

Integration Tips

Start with qwen3.7-max in a staging environment.
Use streaming for coding and agent UX.
Set max_tokens intentionally instead of relying on the maximum output size.
Log tool calls and final answers separately.
Test enable_thinking and preserve_thinking only on workflows where they are likely to help.
Compare Qwen3.7-Max against Qwen3.6-Plus on the same prompts before switching all traffic.

Bottom Line

The Qwen3.7-Max API is no longer just a watchlist item. The official materials now give a model alias, regional compatible-mode endpoints, thinking mode, preserve_thinking, and agent harness examples.

For production work, treat Qwen3.7-Max API integration like any other hosted model migration: pin the model where possible, validate costs, test long-context behavior, and keep fallback routing until your own workloads pass.

References

Original links

Original article: https://qwen35.com/blog/qwen3.7-max-api
Homepage: https://qwen35.com/
Model page: https://qwen35.com/qwen-3.7-max

Qwen3.7-Max Benchmark: Agentic Coding, Reasoning, and Long-Horizon Scores

Super Jarvis — Fri, 22 May 2026 13:05:31 +0000

Qwen3.7-Max is not being framed as a small chat refresh. The official Qwen3.7 release is built around agent work: coding, tool use, office automation, and long-horizon execution.

That matters when reading any Qwen3.7-Max benchmark page. The headline is not only whether Qwen3.7-Max answers harder questions. The more useful question is whether it can keep a real task alive across tools, files, tests, and feedback.

For the product overview, start with the Qwen3.7-Max model page.

The Benchmark Story Starts with Agentic Coding

The official Qwen3.7-Max benchmark table puts a lot of weight on repository and terminal tasks:

Benchmark	Qwen3.7-Max result	What it suggests
Terminal-Bench 2.0-Terminus	69.7	Strong terminal execution and repair loop behavior
SWE-Verified	80.4	Competitive repository-level bug fixing
SWE-Pro	60.6	Harder software engineering tasks beyond the standard set
SWE-Multilingual	78.3	Cross-language coding and issue handling
SciCode	53.5	Scientific coding and technical implementation

The important detail is the harness. Qwen says the SWE-Bench series used an internal agent scaffold with bash and file-edit tools, and Terminal-Bench used a 256K context setup with a five-hour timeout. Those conditions are closer to real agent operation than a single-turn coding prompt.

So the right takeaway is not "Qwen 3.7 Max writes snippets." It is that Qwen3.7-Max is being optimized and evaluated as a model that can operate inside a loop.

Tool Use Is the Bigger Signal

Several Qwen3.7-Max results are more interesting than classic coding scores:

MCP-Mark: 60.8
MCP-Atlas: 76.4
SkillsBench: 59.2
BFCL-V4: 75.0
SpreadSheetBench-v1: 87.0
Kernel Bench L3: 1.98x median speedup with a 96% win rate

That cluster says more about the release than a generic leaderboard rank. Qwen3.7-Max is being tested on whether it can call tools, work through agent harnesses, and produce useful results in environments where the answer is not already packaged into the prompt.

This is also why the Qwen team emphasizes cross-harness generalization. Qwen3.7-Max is presented as working across Claude Code, OpenClaw, Qwen Code, and custom tool-use systems. If that holds up in production, it is more valuable than a model that only performs inside one carefully tuned demo shell.

The 35-Hour Kernel Run Is the Release's Sharpest Demo

The most memorable Qwen 3.7 Max benchmark is not a leaderboard row. It is the long autonomous kernel optimization run.

In the official write-up, Qwen3.7-Max worked for about 35 hours on an unseen T-Head ZW-M890 hardware platform. It performed 432 kernel evaluations across 1,158 tool calls, then reached a 10.0x geometric mean speedup over the Triton reference.

This is the clearest signal about what Qwen3.7-Max is trying to be. The point is not that every user will ask it to optimize kernels for a new chip. The point is that the model kept an execution strategy coherent after many tool calls, compile failures, profiling loops, and redesign attempts.

That is the kind of behavior ordinary chat benchmarks rarely measure.
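For readers unfamiliar with the metric quoted for the kernel run: a geometric mean speedup multiplies the per-kernel speedups and takes the n-th root, so one outlier kernel cannot dominate the result the way it would in an arithmetic mean. The numbers below are made up purely for illustration and are not taken from the Qwen release.

```python
import math


def geometric_mean(speedups):
    """n-th root of the product, computed in log space for numerical stability."""
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


def arithmetic_mean(speedups):
    return sum(speedups) / len(speedups)


# Hypothetical per-kernel speedups: three modest wins and one large outlier.
illustrative = [2.0, 2.0, 2.0, 128.0]
print(round(geometric_mean(illustrative), 2))
print(round(arithmetic_mean(illustrative), 2))
```

On this made-up sample the arithmetic mean is pulled up to 33.5 by the single outlier, while the geometric mean stays near 5.66, closer to what a typical kernel in the set achieved.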

Reasoning Scores Still Matter

Qwen3.7-Max also has strong reasoning numbers:

Benchmark	Result
GPQA Diamond	92.4
HLE	41.4
HMMT 2026 Feb	97.1
IMOAnswerBench	90.0
IFBench	79.1
WMT24++	85.8

These scores matter because agents still need reasoning. Tool use without judgment becomes noisy automation. The interesting part is that Qwen 3.7 Max combines reasoning results with agent execution results, rather than positioning the model as only a math or chat upgrade.

How to Test the Benchmark Claims Yourself

Do not validate Qwen3.7-Max with only a short prompt. Use tasks that expose the thing this release claims to improve:

Give it a real bug report plus logs and ask for an evidence-ranked fix plan.
Ask it to compare two implementation paths and name the safer one.
Give it a multi-file feature request and require tests before finalizing.
Ask it to explain when it would call tools, when it would stop, and what it would verify.
Run the same task on Qwen3.6-Plus or Qwen3.6-Max-Preview and compare failure recovery.

That is the useful way to read a Qwen3.7-Max benchmark. The question is not only "did it score higher?" The question is "does it keep working when the task becomes messy?"

Bottom Line

Qwen3.7-Max benchmark results point to a model designed for agent workflows: coding agents, tool orchestration, long documents, office automation, and multi-hour execution.

The scores are strong, but the release is most interesting because of the shape of the evaluation. Qwen3.7-Max is being judged less like an ordinary chat model and more like a system that needs to plan, act, observe, and recover.

Next, read the Qwen3.7-Max API guide or the Qwen3.7-Max context window guide.

References

Original links

Original article: https://qwen35.com/blog/qwen3.7-max-benchmark
Homepage: https://qwen35.com/
Model page: https://qwen35.com/qwen-3.7-max

Qwen3.7-Max and Agentic Coding: What to Watch First

Super Jarvis — Fri, 22 May 2026 13:02:55 +0000

The most interesting thing about Qwen3.7-Max is not that it is yet another new model. The important signal is that Alibaba is presenting Qwen 3.7 as a model family for agentic coding, complex reasoning, and long-running tool workflows.

If you want the model overview first, start with the Qwen3.7-Max page.

Why agentic coding matters

Short coding prompts hide the difference between models. A model can write a function and still fail at planning a migration, reading a stack trace, choosing the next file to inspect, or recovering after a failing test.

That is why Qwen3.7-Max should be evaluated with workflows, not toy prompts:

ask it to inspect a real diff
make it produce an implementation plan before editing
include tests and failure criteria
require tool-use decisions
compare the final plan against a lighter Qwen model

Qwen3.7-Max will matter most if it can keep a long engineering thread intact.

What is now confirmed

The official Qwen3.7 materials now provide enough detail to move beyond a watchlist. Model Studio examples use qwen3.7-max, Qwen Cloud lists the dated snapshot qwen3.7-max-2026-05-20, and the model card shows a 1M context window.

That makes the evaluation more concrete. The key question is no longer whether Qwen3.7-Max has an API path. The key question is whether it actually improves your agent workflow compared with Qwen3.6-Plus or Qwen3.6-Max-Preview.

Practical test prompts

Use prompts that force the model to stay organized:

"Review this migration plan, identify the most likely production failure, and propose a safer sequence."
"Given these logs and files, diagnose the bug, list evidence, and suggest the smallest patch."
"Design an agent workflow that searches documentation, edits code, runs tests, and stops safely."
"Compare Qwen3.7-Max with the current Qwen 3.6 option on this exact repo task."

That is a better way to test Qwen3.7-Max than asking for a generic Python snippet.

Bottom line

Qwen3.7-Max is an agentic-coding model first. Treat it as a serious new production candidate, but keep the final decision tied to official API documentation, cost checks, and your own long-running tests.

Original links

Original article: https://qwen35.com/blog/qwen3.7-max-agentic-coding
Homepage: https://qwen35.com/
Model page: https://qwen35.com/qwen-3.7-max

DeepSeek V4 Benchmark: Pro and Flash Scores

Super Jarvis — Fri, 22 May 2026 12:31:44 +0000

The DeepSeek V4 release materials include benchmark rows for DeepSeek V4 Flash and DeepSeek V4 Pro in Max mode.

Benchmarks are useful as a first routing signal, but production defaults should still be decided with prompts from your own workload.

Official snapshot

Model	MMLU-Pro	LiveCodeBench	SWE Verified
DeepSeek V4 Flash	86.2	91.6	79.0
DeepSeek V4 Pro	87.5	93.5	80.6

Sources: DeepSeek-V4-Pro model card and DeepSeek_V4.pdf.

What the numbers suggest

Pro leads on all three benchmarks in the snapshot: by 1.3 points on MMLU-Pro, 1.9 on LiveCodeBench, and 1.6 on SWE Verified. Flash is close enough that it can be the default for many high-volume workflows, especially when the task can tolerate a second pass or escalation.

How to evaluate in production

Do not ship on public benchmarks alone. Build a small internal eval set with your real prompts:

20 frequent user requests
20 difficult edge cases
20 code or reasoning tasks
10 long-context tasks

Run Flash first, Pro second, then compare correctness, latency, and cost. The best default is usually workload-specific.
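A comparison like this can start as a very small harness. The sketch below is an assumption-heavy outline: `run_eval` is a hypothetical helper, each model is just a callable that maps a prompt to a string, and the two stub "models" exist only so the code runs offline. Cost is left out because it depends on token usage reported by the real API.

```python
import time

# Eval set sizes suggested above (70 prompts in total).
EVAL_MIX = {"frequent": 20, "edge_case": 20, "code_or_reasoning": 20, "long_context": 10}


def run_eval(cases, models):
    """cases: list of (prompt, check) pairs; models: name -> callable(prompt) -> str.

    Returns per-model accuracy and mean latency in seconds.
    """
    report = {}
    for name, call in models.items():
        passed, elapsed = 0, 0.0
        for prompt, check in cases:
            start = time.perf_counter()
            output = call(prompt)
            elapsed += time.perf_counter() - start
            passed += bool(check(output))
        report[name] = {
            "accuracy": passed / len(cases),
            "mean_latency_s": elapsed / len(cases),
        }
    return report


# Stub models so the sketch runs offline; replace them with real API calls.
stubs = {
    "flash": lambda prompt: "4" if "2 + 2" in prompt else "unsure",
    "pro": lambda prompt: "4" if "2 + 2" in prompt else "Paris",
}
cases = [
    ("What is 2 + 2?", lambda out: out.strip() == "4"),
    ("Capital of France?", lambda out: "Paris" in out),
]
report = run_eval(cases, stubs)
print(report["flash"]["accuracy"], report["pro"]["accuracy"])
```

Because dictionaries preserve insertion order, the harness runs Flash first and Pro second, matching the order suggested above.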

DeepSeek V4 Technical Report: Architecture, Training, and Benchmarks Guide

Super Jarvis — Fri, 22 May 2026 11:15:24 +0000

The DeepSeek V4 technical report describes a preview V4 family with two Mixture-of-Experts language models:

DeepSeek V4 Pro: 1.6T total parameters, 49B activated parameters, 1M context.
DeepSeek V4 Flash: 284B total parameters, 13B activated parameters, 1M context.

What the technical report focuses on

The report frames DeepSeek V4 around efficient long-context intelligence. The headline product implication is simple: both V4 Pro and V4 Flash expose a 1M-token context window, but they target different cost and capability envelopes.

Pro is the higher-capacity model for hard reasoning, coding, and agentic workflows. Flash is the lower-cost model for high-volume chat, summarization, routing, and everyday product paths.

Architecture notes

The report highlights several architecture and optimization upgrades:

Hybrid attention for long-context efficiency.
Manifold-Constrained Hyper-Connections for stronger signal propagation.
Muon optimizer for training stability and convergence.
MoE scaling with separate Pro and Flash model sizes.

Use the architecture section to decide what to measure, not as a substitute for measuring your own prompts.

For builders, the practical question is not just which model has the larger parameter count. The question is where longer context, cache behavior, and reasoning effort change the cost-quality curve.

Training and post-training

DeepSeek says the V4 models are pre-trained on more than 32T tokens and then post-trained with a multi-stage process. The release materials describe domain-specific expert cultivation followed by model consolidation.

That matters for product evaluation because one benchmark score is not enough. You should test domain tasks directly: code repair, long document synthesis, tool-use workflows, structured extraction, and high-volume support chat.

Reasoning modes

The technical report and model card describe non-thinking, thinking, and max-thinking styles. In practice:

Use non-thinking mode for low-risk, fast, low-cost responses.
Use thinking mode for math, coding, planning, and multi-step reasoning.
Use max-style reasoning only when the added latency and cost are justified.

The current DeepSeek API pricing page lists deepseek-v4-flash and deepseek-v4-pro as the V4 model IDs.

Benchmark signals

The release materials include benchmark snapshots across knowledge, coding, long-context, and agentic tasks. The site tracks a few practical anchor scores:

Model	MMLU-Pro	LiveCodeBench	SWE Verified
DeepSeek V4 Flash Max	86.2	91.6	79.0
DeepSeek V4 Pro Max	87.5	93.5	80.6

Treat these as routing hints, not final product truth. If your application depends on code changes, retrieval quality, or tool calls, build an eval set from your own traffic and compare Flash against Pro with the same prompts.

Implementation checklist

Before adopting DeepSeek V4 in production, verify:

Which workflows need Pro instead of Flash.
Whether Thinking improves your specific task enough to justify the cost.
How much prompt caching reduces repeated-context cost.
Whether your longest real documents fit cleanly inside the 1M context window.
Whether tool-use and JSON outputs are stable enough for your product contracts.

The technical report explains the direction. Your own evals should decide routing, retry behavior, and credit pricing.

Qwen3.7-Max Launch Roundup for Open Builders: Benchmark, API, 1M Context, and Agentic Coding

Super Jarvis — Fri, 22 May 2026 10:55:28 +0000

Qwen3.7-Max is being positioned as a flagship Qwen route for agentic coding, long-horizon execution, and complex reasoning. Instead of one generic launch recap, we published four focused guides that explain where the release matters in practice.

What the release changes

Agentic coding: Qwen3.7-Max should be tested on multi-file changes, plans, tool use, and failure recovery, not toy prompts.
Benchmarks: The strongest signals are Terminal-Bench, SWE-Verified, MCP-style tool use, and the long autonomous kernel optimization demo.
API integration: The stable alias is qwen3.7-max, with DashScope compatible-mode endpoints, thinking mode, and preserve_thinking.
Context window: Qwen3.7-Max ships with 1M context, up to 991.80K input and 65.53K output, which matters for long documents, repos, and agent continuity.

Start here

Homepage: https://qwen35.com/
Qwen3.7-Max model page: https://qwen35.com/qwen-3.7-max

The short version is that Qwen3.7-Max should be evaluated like an agent model, not only a chat model. The real question is whether it can keep a plan intact across tools, files, tests, and long context better than Qwen3.6-Plus or Qwen3.6-Max-Preview on your own workloads.

How to test it

run a real repo task, not a toy snippet
compare planning quality before editing
compare failure recovery after a bad intermediate step
compare long-context document work and tool-using workflows

We are using Q-Chat as a browser surface for trying the model and publishing deeper breakdowns as the release evolves.

Qwen3.7-Max Launch Roundup for Future Builders: Benchmark, API, 1M Context, and Agentic Coding

Super Jarvis — Fri, 22 May 2026 10:52:49 +0000

What the release changes

Agentic coding: Qwen3.7-Max should be tested on multi-file changes, plans, tool use, and failure recovery, not toy prompts.
Benchmarks: The strongest signals are Terminal-Bench, SWE-Verified, MCP-style tool use, and the long autonomous kernel optimization demo.
API integration: The stable alias is qwen3.7-max, with DashScope compatible-mode endpoints, thinking mode, and preserve_thinking.
Context window: Qwen3.7-Max ships with 1M context, up to 991.80K input and 65.53K output, which matters for long documents, repos, and agent continuity.

Start here

Homepage: https://qwen35.com/
Qwen3.7-Max model page: https://qwen35.com/qwen-3.7-max

How to test it

run a real repo task, not a toy snippet
compare planning quality before editing
compare failure recovery after a bad intermediate step
compare long-context document work and tool-using workflows

We are using Q-Chat as a browser surface for trying the model and publishing deeper breakdowns as the release evolves.

Qwen3.7-Max Launch Roundup: Benchmark, API, 1M Context, and Agentic Coding

Super Jarvis — Fri, 22 May 2026 10:49:58 +0000

What the release changes

Agentic coding: Qwen3.7-Max should be tested on multi-file changes, plans, tool use, and failure recovery, not toy prompts.
Benchmarks: The strongest signals are Terminal-Bench, SWE-Verified, MCP-style tool use, and the long autonomous kernel optimization demo.
API integration: The stable alias is qwen3.7-max, with DashScope compatible-mode endpoints, thinking mode, and preserve_thinking.
Context window: Qwen3.7-Max ships with 1M context, up to 991.80K input and 65.53K output, which matters for long documents, repos, and agent continuity.

Start here

Homepage: https://qwen35.com/
Qwen3.7-Max model page: https://qwen35.com/qwen-3.7-max

How to test it

run a real repo task, not a toy snippet
compare planning quality before editing
compare failure recovery after a bad intermediate step
compare long-context document work and tool-using workflows

We are using Q-Chat as a browser surface for trying the model and publishing deeper breakdowns as the release evolves.

DeepSeek V4 vs Other Models: When Pro or Flash Makes Sense

Super Jarvis — Tue, 28 Apr 2026 17:29:55 +0000

DeepSeek V4 is best evaluated as a two-model family rather than one model.

DeepSeek V4 Pro is the flagship path. DeepSeek V4 Flash is the efficient path. Both list 1M context in the current DeepSeek API pricing table.

A comparison is only useful when it turns into a routing rule: default to the cheaper reliable path, then escalate when quality risk increases.

V4 Pro vs V4 Flash

Choose Pro when:

The task needs the best available DeepSeek V4 benchmark ceiling.
The prompt involves code repair, planning, math, or multi-step tools.
A wrong answer is more expensive than a slower or pricier answer.

Choose Flash when:

The task is high-volume.
The output can be checked, retried, or escalated.
You need 1M context but want lower input and output token costs.

Comparing to other model families

Against other frontier models, DeepSeek V4 Pro should be tested on your hardest real workflows: coding, long-context reasoning, and agentic tasks.

Against efficient models, DeepSeek V4 Flash is the more natural comparison because it keeps 1M context while using lower per-token prices.

Best routing pattern

A practical routing setup is:

Start with Flash for cheap comprehension and summaries.
Escalate to Pro when the task is complex or user-visible.
Add web search only when freshness matters.
Add Thinking only when the task benefits from deeper reasoning.

This keeps cost predictable while preserving quality for hard prompts.
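The routing pattern above can be expressed as a small decision function. This is a sketch under stated assumptions: the flag names and `choose_route` are hypothetical, the model IDs are the `deepseek-v4-flash` and `deepseek-v4-pro` identifiers listed on the DeepSeek API pricing page, and the function only returns a decision. How web search or thinking is actually switched on depends on your provider's API.

```python
def choose_route(complex_task=False, user_visible=False,
                 needs_fresh_info=False, needs_deep_reasoning=False):
    """Return the model and optional features for one request."""
    # Default to Flash; escalate to Pro for complex or user-visible work.
    escalate = complex_task or user_visible
    return {
        "model": "deepseek-v4-pro" if escalate else "deepseek-v4-flash",
        "web_search": needs_fresh_info,    # only when freshness matters
        "thinking": needs_deep_reasoning,  # only when deeper reasoning helps
    }


print(choose_route())
print(choose_route(complex_task=True, needs_deep_reasoning=True))
```

Keeping the rule in one function makes it easy to log every routing decision and to adjust the escalation criteria once your own eval results are in.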

DeepSeek V4 Technical Report: Architecture, Training, and Benchmarks

Super Jarvis — Tue, 28 Apr 2026 17:29:08 +0000

The DeepSeek V4 technical report describes a preview V4 family with two Mixture-of-Experts language models:

DeepSeek V4 Pro: 1.6T total parameters, 49B activated parameters, 1M context.
DeepSeek V4 Flash: 284B total parameters, 13B activated parameters, 1M context.

What the technical report focuses on

Pro is the higher-capacity model for hard reasoning, coding, and agentic workflows. Flash is the lower-cost model for high-volume chat, summarization, routing, and everyday product paths.

Architecture notes

The report highlights several architecture and optimization upgrades:

Hybrid attention for long-context efficiency.
Manifold-Constrained Hyper-Connections for stronger signal propagation.
Muon optimizer for training stability and convergence.
MoE scaling with separate Pro and Flash model sizes.

Use the architecture section to decide what to measure, not as a substitute for measuring your own prompts.

For builders, the practical question is not just which model has the larger parameter count. The question is where longer context, cache behavior, and reasoning effort change the cost-quality curve.

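Training and post-training

DeepSeek says the V4 models are pre-trained on more than 32T tokens and then post-trained with a multi-stage process. The release materials describe domain-specific expert cultivation followed by model consolidation.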

Reasoning modes

The technical report and model card describe non-thinking, thinking, and max-thinking styles. In practice:

Use non-thinking mode for low-risk, fast, low-cost responses.
Use thinking mode for math, coding, planning, and multi-step reasoning.
Use max-style reasoning only when the added latency and cost are justified.

The current DeepSeek API pricing page lists deepseek-v4-flash and deepseek-v4-pro as the V4 model IDs.

Benchmark signals

The release materials include benchmark snapshots across knowledge, coding, long-context, and agentic tasks. The site tracks a few practical anchor scores:

Model	MMLU-Pro	LiveCodeBench	SWE Verified
DeepSeek V4 Flash Max	86.2	91.6	79.0
DeepSeek V4 Pro Max	87.5	93.5	80.6

Implementation checklist

Before adopting DeepSeek V4 in production, verify:

Which workflows need Pro instead of Flash.
Whether Thinking improves your specific task enough to justify the cost.
How much prompt caching reduces repeated-context cost.
Whether your longest real documents fit cleanly inside the 1M context window.
Whether tool-use and JSON outputs are stable enough for your product contracts.

The technical report explains the direction. Your own evals should decide routing, retry behavior, and credit pricing.
