DEV Community

Super Jarvis

Qwen3.7-Max API: How to Call Qwen 3.7 Max with Model Studio

The Qwen3.7-Max API is now documented in the Qwen release materials and the Qwen Cloud model card. Whether you searched for qwen-3.7 API, qwen3.7 API, or qwen 3.7 API, the first detail to get right is the exact model name.

For Model Studio compatible-mode calls, the release example uses:

qwen3.7-max

The Qwen Cloud model card also lists a dated snapshot:

qwen3.7-max-2026-05-20

Use the stable alias qwen3.7-max when you want whatever version is currently served under that name. Use the dated snapshot ID when your provider exposes it and you need reproducible behavior.
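One way to keep that choice out of application code is to resolve the model ID from configuration. The sketch below is illustrative only: the `QWEN_MODEL_ID` environment variable and the `resolve_model_id` helper are hypothetical names, not part of any Qwen SDK.

```python
import os

# Model IDs as given in the release example and the Qwen Cloud model card.
STABLE_ALIAS = "qwen3.7-max"
DATED_SNAPSHOT = "qwen3.7-max-2026-05-20"


def resolve_model_id(pin_snapshot=False, env=None):
    """Pick a model ID: explicit override first, then snapshot or alias."""
    env = os.environ if env is None else env
    # QWEN_MODEL_ID is a hypothetical variable name used only in this sketch.
    override = env.get("QWEN_MODEL_ID", "").strip()
    if override:
        return override
    return DATED_SNAPSHOT if pin_snapshot else STABLE_ALIAS
```

A staging deployment might call `resolve_model_id()` to follow the alias, while a production deployment calls `resolve_model_id(pin_snapshot=True)`, provided the dated ID is available from your provider.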

Try the model first on the Qwen3.7-Max page.

Official Access Paths

The first-party path is Alibaba Cloud Model Studio. The official Qwen3.7-Max release shows OpenAI-compatible chat completions, responses APIs, and an Anthropic-compatible interface for agent tools.

Common compatible-mode base URLs:

| Region | Base URL |
| --- | --- |
| Beijing | https://dashscope.aliyuncs.com/compatible-mode/v1 |
| Singapore | https://dashscope-intl.aliyuncs.com/compatible-mode/v1 |
| US Virginia | https://dashscope-us.aliyuncs.com/compatible-mode/v1 |

The Qwen Cloud model card also shows a DashScope SDK example using:

https://dashscope-intl.aliyuncs.com/api/v1

For most app integrations, the OpenAI-compatible endpoint is the easiest migration path.
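If your application runs in more than one region, a small lookup can keep the compatible-mode base URLs in one place. This is a minimal sketch: the URLs are the ones listed above, while the short region labels and the `base_url_for` helper are invented for illustration.

```python
# Compatible-mode base URLs as listed in this article. The short region keys
# ("beijing", "singapore", "us-virginia") are labels invented for this sketch.
COMPATIBLE_MODE_BASE_URLS = {
    "beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "us-virginia": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
}


def base_url_for(region):
    """Return the compatible-mode base URL for a region label."""
    key = region.strip().lower()
    if key not in COMPATIBLE_MODE_BASE_URLS:
        known = ", ".join(sorted(COMPATIBLE_MODE_BASE_URLS))
        raise ValueError(f"unknown region {region!r}; expected one of: {known}")
    return COMPATIBLE_MODE_BASE_URLS[key]
```

Raising on an unknown label, rather than silently defaulting to one region, makes a misconfigured deployment fail early instead of sending traffic to an unintended endpoint.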

Minimal Python Example

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url=os.environ.get(
        "DASHSCOPE_BASE_URL",
        "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    ),
)

completion = client.chat.completions.create(
    model="qwen3.7-max",
    messages=[
        {
            "role": "user",
            "content": "Write a Python function to merge two sorted linked lists.",
        }
    ],
    extra_body={
        "enable_thinking": True,
    },
    stream=True,
)

for chunk in completion:
    if chunk.choices:
        delta = chunk.choices[0].delta
        if getattr(delta, "content", None):
            print(delta.content, end="")

This is the simplest way to call Qwen3.7-Max if your existing code already uses the OpenAI SDK.
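The loop above prints only `delta.content`. With thinking enabled, a stream may also carry thinking text, and you may want to keep it apart from the answer. The sketch below assumes thinking text arrives in a `reasoning_content` attribute on each delta; that is an assumption about the response shape, not something the release example shows, so check a real response first. `split_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks exist only so the code runs without an API call.

```python
from types import SimpleNamespace


def split_stream(chunks):
    """Collect thinking text and answer text from streamed chunks separately."""
    thinking_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta
        # Assumption: thinking text, if present, is exposed as
        # delta.reasoning_content. Verify against your provider's responses.
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            thinking_parts.append(reasoning)
        content = getattr(delta, "content", None)
        if content:
            answer_parts.append(content)
    return "".join(thinking_parts), "".join(answer_parts)


def fake_chunk(content=None, reasoning_content=None):
    """Build a stand-in chunk so the helper can be tried offline."""
    delta = SimpleNamespace(content=content, reasoning_content=reasoning_content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


thinking, answer = split_stream([
    fake_chunk(reasoning_content="Compare heads, "),
    fake_chunk(reasoning_content="advance the smaller."),
    fake_chunk(content="def merge(a, b): ..."),
    SimpleNamespace(choices=[]),
])
print(answer)
```

In a real call you would pass the `completion` iterator to `split_stream` instead of the fake chunks.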

Thinking Mode and preserve_thinking

Qwen3.7-Max is positioned for agentic tasks, so thinking mode matters. The official example enables thinking through:

extra_body={"enable_thinking": True}

The release also describes preserve_thinking, which keeps thinking content from preceding turns in messages. That is useful for long agent runs where the model needs to keep track of prior reasoning, tool outcomes, and next-step strategy.

Use it carefully. Preserving thinking content can improve continuity, but it also increases token usage. For short chats, leave it off. For multi-step coding agents, measure its effect on your own tasks before enabling it by default.
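To make these switches explicit per workflow, you could assemble the request arguments in one place. The following is a sketch under stated assumptions: `enable_thinking` is passed through `extra_body` as in the official example, but passing `preserve_thinking` the same way is a guess that should be checked against the Model Studio documentation. `build_chat_kwargs` is a hypothetical helper.

```python
def build_chat_kwargs(messages, model="qwen3.7-max", enable_thinking=False,
                      preserve_thinking=False, max_tokens=None):
    """Assemble keyword arguments for client.chat.completions.create()."""
    extra_body = {}
    if enable_thinking:
        # Matches the official example: extra_body={"enable_thinking": True}
        extra_body["enable_thinking"] = True
    if preserve_thinking:
        # Assumption: preserve_thinking is accepted next to enable_thinking
        # in extra_body. Confirm the parameter placement before relying on it.
        extra_body["preserve_thinking"] = True

    kwargs = {"model": model, "messages": messages}
    if extra_body:
        kwargs["extra_body"] = extra_body
    if max_tokens is not None:
        kwargs["max_tokens"] = max_tokens
    return kwargs


# Short chat: thinking options left off.
chat_kwargs = build_chat_kwargs([{"role": "user", "content": "Hello"}])

# Long agent run: both options on, with an explicit output cap.
agent_kwargs = build_chat_kwargs(
    [{"role": "user", "content": "Refactor this module."}],
    enable_thinking=True,
    preserve_thinking=True,
    max_tokens=4096,
)
```

Either dictionary can then be unpacked into the call, for example `client.chat.completions.create(**agent_kwargs)`.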

Claude Code and Other Agent Harnesses

Qwen APIs also support an Anthropic-compatible route. The official release shows this shape for Claude Code:

export ANTHROPIC_MODEL="qwen3.7-max"
export ANTHROPIC_SMALL_FAST_MODEL="qwen3.7-max"
export ANTHROPIC_BASE_URL=https://dashscope-intl.aliyuncs.com/apps/anthropic
export ANTHROPIC_AUTH_TOKEN=<your_api_key>

This matters because Qwen3.7-Max is meant to run inside coding assistants and agent scaffolds, not only in direct chat completion calls.
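If you launch agent tools from a script rather than a shell profile, the same four variables can be built in Python. This is a minimal sketch: the variable names and default URL come from the release example above, while `claude_code_env` is a hypothetical helper.

```python
import os


def claude_code_env(api_key, model="qwen3.7-max",
                    base_url="https://dashscope-intl.aliyuncs.com/apps/anthropic",
                    base_env=None):
    """Return a copy of the environment with the four variables set."""
    # Copy so the caller's environment is never modified in place.
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "ANTHROPIC_MODEL": model,
        "ANTHROPIC_SMALL_FAST_MODEL": model,
        "ANTHROPIC_BASE_URL": base_url,
        "ANTHROPIC_AUTH_TOKEN": api_key,
    })
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run`, for example `subprocess.run(["claude"], env=claude_code_env(key))`, assuming the Claude Code CLI is installed as `claude` on your machine.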

Pricing and Context

The Qwen Cloud model card lists Qwen3.7-Max with:

| Field | Value |
| --- | --- |
| Context | 1M tokens |
| Max input | 991.80K tokens |
| Max output | 65.53K tokens |
| Input price | $2.50 per 1M tokens |
| Output price | $7.50 per 1M tokens |
| RPM | 600 |
| TPM | 1M |

Always confirm pricing in your actual provider console before committing production traffic. Providers can change price, quota, and region availability independently.
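For a rough budget check, the listed prices translate into a simple per-request estimate. The sketch below assumes flat per-token pricing with no tiers, discounts, or separate charges, which may not match how your provider actually bills; `estimate_cost_usd` is a hypothetical helper.

```python
# Prices from the model card figures quoted in this article (USD per 1M tokens).
# Confirm current values in your provider console before relying on them.
INPUT_USD_PER_MILLION = 2.50
OUTPUT_USD_PER_MILLION = 7.50


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of one request at flat per-token pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_USD_PER_MILLION
            + output_tokens * OUTPUT_USD_PER_MILLION) / 1_000_000


# 200,000 input tokens and 4,000 output tokens:
# 200000 * 2.50 / 1e6 = 0.50, plus 4000 * 7.50 / 1e6 = 0.03
print(f"${estimate_cost_usd(200_000, 4_000):.2f}")  # prints $0.53
```

Because output tokens cost three times as much as input tokens at these prices, long generated answers can dominate the bill even when prompts are large.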

Integration Tips

  1. Start with qwen3.7-max in a staging environment.
  2. Use streaming for coding and agent UX.
  3. Set max_tokens intentionally instead of relying on the maximum output size.
  4. Log tool calls and final answers separately.
  5. Test enable_thinking and preserve_thinking only on workflows where they are likely to help.
  6. Compare qwen3.7-max against Qwen3.6-Plus on the same prompts before switching all traffic.
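The side-by-side comparison in the last tip can be sketched as a small harness. Everything here is illustrative: `compare_models` is a hypothetical helper, `ask` is whatever function you write to call the API, and the ID `qwen3.6-plus` is a placeholder that should be replaced with the ID your provider actually lists.

```python
def compare_models(prompts, models, ask):
    """Run every prompt against every model and collect outputs side by side.

    `ask(model, prompt)` is a caller-supplied function that returns the text.
    """
    rows = []
    for prompt in prompts:
        row = {"prompt": prompt}
        for model in models:
            row[model] = ask(model, prompt)
        rows.append(row)
    return rows


def fake_ask(model, prompt):
    """Stand-in for a real API call, so the sketch runs offline."""
    return f"[{model}] answer to: {prompt}"


# "qwen3.6-plus" is a placeholder ID; check your provider's model list.
rows = compare_models(
    ["Reverse a linked list."],
    ["qwen3.7-max", "qwen3.6-plus"],
    fake_ask,
)
for row in rows:
    print(row)
```

Keeping the API call behind the `ask` parameter lets the same harness run against recorded responses in tests and against live endpoints in staging.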

Bottom Line

The Qwen3.7-Max API is no longer just a watchlist item. The official materials now give a model alias, regional compatible-mode endpoints, thinking mode, preserve_thinking, and agent harness examples.

For production work, treat a Qwen3.7-Max integration like any other hosted model migration: pin the model where possible, validate costs, test long-context behavior, and keep fallback routing until your own workloads pass.
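Fallback routing can be as simple as trying an ordered list of models. The sketch below is one possible shape, not a recommended production design: `call_with_fallback` and `flaky_ask` are hypothetical, `previous-model` is a placeholder for whatever you ran before, and real code should catch the SDK's specific error types rather than every exception.

```python
def call_with_fallback(prompt, models, ask):
    """Try each model in order; return (model, answer) from the first success."""
    errors = []
    for model in models:
        try:
            return model, ask(model, prompt)
        except Exception as exc:  # in real code, catch the SDK's specific errors
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def flaky_ask(model, prompt):
    """Stand-in for an API call: the pinned snapshot fails, the rest succeed."""
    if model == "qwen3.7-max-2026-05-20":
        raise TimeoutError("simulated timeout")
    return f"{model} handled: {prompt}"


# "previous-model" is a placeholder for the model you are migrating from.
used_model, reply = call_with_fallback(
    "Summarize this diff.",
    ["qwen3.7-max-2026-05-20", "previous-model"],
    flaky_ask,
)
print(used_model)  # prints previous-model
```

Returning the model name alongside the answer makes it easy to log how often the fallback path is taken.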

Related: Qwen3.7-Max benchmark and Qwen3.7-Max context window.
