Hassann

Posted on May 21 • Originally published at apidog.com

How to Use the Qwen 3.7 API?

Alibaba’s Qwen team shipped Qwen3.7-Max-Preview in mid-May 2026, and the first implementation question is simple: how do you call it from your own code? Qwen3.7-Max-Preview is a flagship reasoning model with a 1M-token context window and explicit chain-of-thought traces, making it useful for agent backends, long-document workflows, and code generation. Because it is still a preview model, access is gated, model IDs can change, and you should verify the current API details before shipping.


TL;DR

Qwen3.7-Max-Preview is Alibaba’s preview reasoning model, released on May 14, 2026, with a 1M-token context window. The fastest way to evaluate it is Qwen Chat at chat.qwen.ai. For application integration, use Alibaba Cloud Model Studio, also known as DashScope, through its OpenAI-compatible /chat/completions endpoint.

In practice, you:

Create an Alibaba Cloud Model Studio API key.
Choose the correct regional base URL.
Use the OpenAI SDK or raw HTTP.
Set the Qwen model ID in the model field.
Validate the current preview model name in the official docs before deploying.

Because the 3.7 tier is preview-only, use Apidog to test, save, and mock requests while API availability stabilizes.

How to access Qwen 3.7 right now

Qwen models are exposed through multiple channels, but they do not all become available at the same time. As of late May 2026, these are the practical options.

Option 1: Qwen Chat

Use chat.qwen.ai if you want to evaluate the model quickly.

Steps:

Sign in with a Qwen account.
Select qwen3.7-max-preview in the model selector.
Enable Thinking Mode if you want to inspect the reasoning trace.
Test prompts directly in the browser.

This path is free and rate-limited during preview. It is useful for prompt testing, but it is not an API integration path.

Option 2: Alibaba Cloud Model Studio / DashScope

Use Alibaba Cloud Model Studio when you need API access.

Model Studio exposes Qwen through an OpenAI-compatible API, so existing OpenAI SDK code can usually be adapted by changing:

the base_url
the API key
the model value

Older tiers such as qwen3.6-max-preview and the qwen-max family are already available through this pattern. The 3.7 preview tier may not have a public API entry when you read this, so confirm availability in the current Model Studio docs.

Option 3: OpenAI-compatible request pattern

Recent Qwen models on Model Studio use the same basic request structure:

POST /chat/completions
Authorization: Bearer <DASHSCOPE_API_KEY>
Content-Type: application/json

You send a standard messages array and receive a standard chat completion response. When Qwen 3.7 API access is available for your account, the main change should be the model identifier.
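If you would rather not add an SDK dependency, the same request pattern can be built with Python's standard library. This is a minimal sketch: `build_chat_request` and `send_chat_request` are hypothetical helper names, and the base URL is the Singapore endpoint from the region table later in this guide.

```python
import json
import urllib.request

BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"


def build_chat_request(api_key, model, messages, base_url=BASE_URL):
    """Build (but do not send) a POST /chat/completions request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the parsed JSON chat completion."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

To call it, pass your key from `os.environ["DASHSCOPE_API_KEY"]` and the confirmed model ID to `build_chat_request`, then hand the result to `send_chat_request`. Non-2xx responses raise `urllib.error.HTTPError`, whose body you can read for the error details.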

Treat the current Model Studio documentation (the model list and the API reference) as the source of truth.

For zero-cost preview access, see this guide on how to use Qwen 3.7 for free.

Access methods at a glance

Method	API access	Cost	Best for
Qwen Chat (chat.qwen.ai)	No	Free, rate-limited	Quick evaluation and prompt testing
Alibaba Cloud Model Studio / DashScope	Yes, OpenAI-compatible	Pay per token	Production integration
Qwen on Hugging Face	Weights, when released	Free if self-hosted	Open-weight models, not Max preview
Third-party gateways	Varies	Varies	Multi-model routing

One important distinction: open-weight Qwen models may appear on Hugging Face, but the Max-Preview tier is proprietary. Do not expect downloadable weights for qwen3.7-max-preview.

Get a Qwen 3.7 API key

API access goes through Alibaba Cloud Model Studio.

Step 1: Create an Alibaba Cloud account

Open the Model Studio console:

modelstudio.console.alibabacloud.com

Step 2: Activate Model Studio

Enable Model Studio for your account and selected region.

Keys are region-scoped. For example, a key created for the Singapore endpoint will not authenticate against the Beijing endpoint.

Step 3: Generate an API key

In the Model Studio console:

Open the API keys section.
Create a new key.
Copy it immediately.
Store it securely.

The key usually starts with:

sk-

Step 4: Choose your regional base URL

Region	Base URL
Singapore	`https://dashscope-intl.aliyuncs.com/compatible-mode/v1`
US / Virginia	`https://dashscope-us.aliyuncs.com/compatible-mode/v1`
Beijing / China	`https://dashscope.aliyuncs.com/compatible-mode/v1`
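Because keys are region-scoped, it helps to pick the base URL from the same setting you used when creating the key, instead of hardcoding one URL. A minimal sketch; the region labels and `base_url_for` are hypothetical names, and the URLs are copied from the table above.

```python
# Base URLs from the region table; a key only authenticates against its own region.
REGION_BASE_URLS = {
    "singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "us-virginia": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
    "beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
}


def base_url_for(region: str) -> str:
    """Return the OpenAI-compatible base URL for a Model Studio region label."""
    key = region.strip().lower()
    try:
        return REGION_BASE_URLS[key]
    except KeyError:
        known = ", ".join(sorted(REGION_BASE_URLS))
        raise ValueError(f"Unknown region {region!r}; expected one of: {known}") from None
```

Failing loudly on an unknown region is deliberate: a mismatched key and URL otherwise shows up later as a less obvious 401.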

Step 5: Store the key in an environment variable

Do not hardcode API keys in committed source code.

For macOS or Linux:

export DASHSCOPE_API_KEY="sk-your-key-here"

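For Windows PowerShell, set it for the current session:

$env:DASHSCOPE_API_KEY = "sk-your-key-here"

To persist it across sessions, use setx. Note that setx does not update the window you run it in, so open a new terminal afterward:

setx DASHSCOPE_API_KEY "sk-your-key-here"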

Your application should read DASHSCOPE_API_KEY at runtime. This keeps the secret out of your repository and lets you rotate keys without code changes.
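Reading the key at startup and failing with a clear message saves a confusing 401 later. A minimal sketch; `load_dashscope_key` is a hypothetical helper, and the `sk-` check is only a warning because the prefix is usual, not guaranteed.

```python
import os


def load_dashscope_key(env_var: str = "DASHSCOPE_API_KEY") -> str:
    """Read the API key at runtime and fail with a clear message if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell (or open a new "
            "terminal after running setx on Windows) before starting the app."
        )
    if not key.startswith("sk-"):
        # Keys usually start with "sk-"; warn instead of rejecting the value.
        print(f"Warning: {env_var} does not start with 'sk-'; double-check the value.")
    return key
```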

The same pattern applies to other model APIs; for example, see this guide to the Gemini 3.5 API.

Make your first Qwen request

Qwen’s Model Studio endpoint is OpenAI-compatible. You can call it with:

the OpenAI SDK
curl
any HTTP client

Before running these examples, confirm the live model ID in the Model Studio model list. qwen3.7-max-preview is the Qwen Chat preview identifier, but the API identifier may differ during preview.

Python example

Install the SDK:

pip install openai

Send a chat completion request:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen3.7-max-preview",  # Confirm current API model ID before use
    messages=[
        {"role": "system", "content": "You are a precise coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
)

print(response.choices[0].message.content)

The messages array follows the standard chat format:

system sets behavior.
user provides the request.
The generated answer is returned in choices[0].message.content.
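When you call the endpoint with raw HTTP instead of the SDK, the same field is reached through nested JSON keys. A minimal sketch; `extract_answer` is a hypothetical helper that assumes the standard chat completion shape described above.

```python
import json


def extract_answer(completion: dict) -> str:
    """Return the assistant text from a parsed chat completion JSON object."""
    choices = completion.get("choices") or []
    if not choices:
        raise ValueError("Response has no choices: " + json.dumps(completion)[:200])
    # content can be null (for example, on tool-call responses), so default to "".
    return choices[0]["message"].get("content") or ""
```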

curl example

Use curl to verify your key, model ID, and endpoint before writing app code:

curl 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen3.7-max-preview",
    "messages": [
      {
        "role": "user",
        "content": "Explain idempotency in REST APIs in two sentences."
      }
    ]
  }'

If the request is valid, the API returns a JSON chat completion. If not, inspect the error body for the failing field, region, model ID, or credential issue.
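To make that error body easier to read in logs, you can reduce it to one line. This sketch assumes an OpenAI-style `{"error": {"message", "type", "code"}}` object, with a fallback for a flat `{"code", "message"}` body; confirm the actual shape in the Model Studio error reference. `describe_error` is a hypothetical helper.

```python
import json


def describe_error(status: int, raw_body: str) -> str:
    """Turn an error response body into a one-line diagnostic string."""
    try:
        body = json.loads(raw_body)
    except ValueError:
        return f"HTTP {status}: {raw_body.strip()[:200]}"  # not JSON: show raw text
    if not isinstance(body, dict):
        return f"HTTP {status}: {raw_body.strip()[:200]}"
    # Assumed shapes: OpenAI-style {"error": {...}}, else a flat {"code": ..., "message": ...}.
    err = body["error"] if isinstance(body.get("error"), dict) else body
    code = err.get("code") or err.get("type") or "unknown"
    message = err.get("message") or "no message"
    return f"HTTP {status} [{code}]: {message}"
```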

JavaScript / Node.js example

Install the SDK:

npm install openai

Call Qwen through the DashScope-compatible endpoint:

// Save as an ES module (for example qwen.mjs) so top-level await works.
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
});

const response = await client.chat.completions.create({
  model: "qwen3.7-max-preview", // Confirm current API model ID before use
  messages: [
    {
      role: "user",
      content: "List three trade-offs of GraphQL versus REST.",
    },
  ],
});

console.log(response.choices[0].message.content);

The request shape is the same across Python, JavaScript, and raw HTTP.

Stream Qwen responses

For user-facing applications, stream responses instead of waiting for the full completion. This is especially useful for reasoning models, because they can take longer before producing a final answer.

Python streaming example

stream = client.chat.completions.create(
    model="qwen3.7-max-preview",
    messages=[
        {"role": "user", "content": "Summarize the CAP theorem."},
    ],
    stream=True,
)

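for chunk in stream:
    # Some chunks (for example a final usage chunk) can arrive with no choices.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()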

Node.js streaming example

const stream = await client.chat.completions.create({
  model: "qwen3.7-max-preview",
  messages: [
    {
      role: "user",
      content: "Summarize the CAP theorem.",
    },
  ],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

Streaming lets you show:

partial output
a typing indicator
reasoning progress, if your app chooses to expose it
the final answer as it forms
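If you stream over raw HTTP rather than an SDK, OpenAI-compatible endpoints send server-sent events: `data: {...}` lines carrying chunks, ending with `data: [DONE]`. This sketch assumes that framing; `collect_stream` is a hypothetical helper that accumulates the first choice's `delta.content`.

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[str]) -> str:
    """Accumulate delta.content from OpenAI-style server-sent event lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a trailing usage-only chunk
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            parts.append(text)
    return "".join(parts)
```

In a real UI you would render each `text` as it arrives instead of joining at the end; the join just keeps the sketch testable.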

Enable reasoning / Thinking Mode

Qwen3.7-Max-Preview is a reasoning model. It can produce an explicit reasoning trace before the final answer, either inline in <think> blocks or, depending on the serving API, in a separate field of the response.

On recent Qwen models served through DashScope, thinking behavior is controlled with an enable_thinking flag. Because reasoning controls have changed across Qwen versions, confirm the current parameter name and behavior in the Model Studio API reference before relying on it.

Conceptually, the request looks like this:

response = client.chat.completions.create(
    model="qwen3.7-max-preview",
    messages=[
        {
            "role": "user",
            "content": (
                "A train leaves at 2pm averaging 60mph. "
                "A second leaves at 3pm at 75mph on the same route. "
                "When does the second catch the first?"
            ),
        },
    ],
    extra_body={
        "enable_thinking": True
    },
)

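message = response.choices[0].message
# The trace may arrive in a separate reasoning_content field; confirm in the API reference.
reasoning = getattr(message, "reasoning_content", None)
if reasoning:
    print("Reasoning:", reasoning)
print("Answer:", message.content)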
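Reasoning prompts like this one have a checkable answer, which makes them useful smoke tests. The first train's one-hour head start is 60 miles, and the second closes that gap at 75 - 60 = 15 mph, so it needs 4 hours. A small sketch to compute the expected result; `catch_up_hour` is a hypothetical helper using 24-hour clock times.

```python
def catch_up_hour(t1: float, v1: float, t2: float, v2: float) -> float:
    """Clock hour (24h) at which a later, faster train catches an earlier one."""
    if v2 <= v1:
        raise ValueError("The second train must be faster to catch up.")
    head_start_miles = v1 * (t2 - t1)              # distance the first train covers alone
    hours_to_close = head_start_miles / (v2 - v1)  # gap divided by closing speed
    return t2 + hours_to_close


print(catch_up_hour(14, 60, 15, 75))  # 19.0, i.e. 7pm
```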

Use thinking mode selectively.

When to enable thinking

Enable it for:

multi-step math
code generation with edge cases
planning
long-context analysis
agentic workflows

When to disable thinking

Disable it for:

simple classification
formatting tasks
short factual answers
low-latency UI paths
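One way to apply these lists is to route each request by task type and build the thinking flag from it. This is a sketch under assumptions: the task labels are hypothetical, and `enable_thinking` is the flag name from the example above, which you should confirm in the current API reference.

```python
# Hypothetical task labels; adapt them to however your app classifies requests.
THINKING_TASKS = {"math", "code", "planning", "long_context", "agent"}
FAST_TASKS = {"classification", "formatting", "short_answer", "ui"}


def thinking_options(task: str) -> dict:
    """Return an extra_body payload that enables thinking only where it pays off."""
    if task in THINKING_TASKS:
        return {"enable_thinking": True}
    if task in FAST_TASKS:
        return {"enable_thinking": False}
    raise ValueError(f"Unclassified task type: {task!r}")
```

You would then pass `extra_body=thinking_options("math")` to `client.chat.completions.create`. Raising on unknown labels forces an explicit latency-versus-quality decision for each new task type.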

Decide whether to show the trace

Some applications expose <think> content so users can inspect the model’s reasoning. Others strip it and show only the final answer. Choose based on your product UX, safety requirements, and token budget.
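If the trace arrives inline in the message text, you can separate it from the answer before deciding what to show. A minimal sketch; `split_thinking` is a hypothetical helper, and if your endpoint returns the trace in a separate field, no parsing is needed.

```python
import re

# Non-greedy match so multiple <think> blocks are captured separately.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Separate inline <think>...</think> reasoning from the final answer."""
    reasoning = "\n".join(part.strip() for part in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return reasoning, answer
```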

For a broader comparison, see Qwen 3.7 vs GPT-5.5 vs Opus 4.7. If you are using reasoning models in agent loops, the techniques in this guide to reduce agent token costs apply directly.

Handle errors and rate limits

Preview APIs fail in predictable ways. Handle those cases explicitly.

HTTP status	Meaning	What to do
`400`	Bad request: malformed JSON or invalid parameter	Fix the request body; check model ID and field names
`401`	Invalid or missing API key	Verify the key and region
`403`	No access to the model	Confirm your account is enabled for the preview tier
`404`	Model not found	Check whether the model ID is valid in your region
`429`	Rate limit or quota exceeded	Back off and retry
`500` / `503`	Server-side error	Retry with exponential backoff

Preview models can return 403 or 404 more often than stable models because access is gated and identifiers may change. In those cases, the fix is usually account access or the model string, not your application code.

Python retry example

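import os
import random
import time

from openai import (
    APIConnectionError,
    APIStatusError,
    InternalServerError,
    OpenAI,
    RateLimitError,
)

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    max_retries=0,  # turn off the SDK's built-in retries so this loop owns backoff
)

def ask_qwen(prompt, max_retries=4):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="qwen3.7-max-preview",  # Confirm current API model ID before use
                messages=[
                    {"role": "user", "content": prompt}
                ],
            )
            return response.choices[0].message.content

        except (RateLimitError, InternalServerError, APIConnectionError) as e:
            # 429, 5xx, and network/timeout errors are usually transient.
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            wait = 2 ** attempt + random.uniform(0, 1)  # exponential backoff with jitter
            print(f"{type(e).__name__}. Retrying in {wait:.1f}s...")
            time.sleep(wait)

        except APIStatusError as e:
            # 400/401/403/404 usually require a request or access fix, so fail fast.
            print(f"API error {e.status_code}: {e.message}")
            raise

    raise RuntimeError("Failed after retries")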

Use this retry split:

retry 429 and 5xx
fail fast on most 4xx
log the response body for debugging
avoid retry loops that hammer the preview API
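If you call the API without the SDK's exception classes, the same split can be expressed on raw status codes. A minimal sketch; `should_retry` and `backoff_delay` are hypothetical helpers, and the 30-second cap is an arbitrary choice to keep retries from piling up against a preview quota.

```python
def should_retry(status: int) -> bool:
    """True for rate limits (429) and server errors (5xx); False for other 4xx."""
    return status == 429 or 500 <= status < 600


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff (base * 2**attempt) capped at `cap` seconds, before jitter."""
    return min(cap, base * 2 ** attempt)
```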

Test and mock the Qwen API with Apidog

Preview APIs are unstable by nature: model IDs may change, access may be gated, and rate limits may be tight. Instead of debugging only through application logs, test the Qwen request directly and save known-good scenarios.

Apidog helps with that workflow.

Build the request in Apidog

Create a request with:

POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions

Add headers:

Authorization: Bearer {{DASHSCOPE_API_KEY}}
Content-Type: application/json

Add the body:

{
  "model": "qwen3.7-max-preview",
  "messages": [
    {
      "role": "user",
      "content": "Explain idempotency in REST APIs in two sentences."
    }
  ]
}

Use an environment variable for the API key so you can switch between regions, accounts, and mock servers without editing the request.

Mock the endpoint while preview access is gated

If you do not have live access yet, define the expected request and response schema in Apidog, then use its mock server as a stand-in endpoint.

That lets your frontend, backend, or agent code develop against a stable API shape while the real Qwen preview endpoint is unavailable, throttled, or gated.

When live access is ready, switch the base URL from the Apidog mock server to DashScope. Your request body and client code can remain the same.
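On the application side, that switch can come from configuration rather than a code change. A minimal sketch; `QWEN_BASE_URL` is a hypothetical variable name you would point at your Apidog mock server's URL during development and leave unset in production.

```python
import os

DASHSCOPE_BASE_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"


def resolve_base_url() -> str:
    """Use QWEN_BASE_URL (e.g. a mock server) when set, else the live DashScope endpoint."""
    return os.environ.get("QWEN_BASE_URL", "").strip() or DASHSCOPE_BASE_URL
```

Pass the result as `base_url=resolve_base_url()` when constructing the OpenAI client; the request body stays identical for the mock and the live endpoint.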

For more on schema-first API workflows, see the spec-first mode walkthrough.

The same test-and-mock loop works for Qwen, Gemini, and the ERNIE 5.1 API. Preview models make mocking especially useful because the real endpoint is often the least stable dependency.

Conclusion

Calling Qwen 3.7 is straightforward once you have the correct access path. The main friction is preview gating, not the API shape.

Use Qwen Chat for evaluation. Use Alibaba Cloud Model Studio for API integration. Confirm the current model ID before shipping. Then test, save, and mock your requests so your application can keep moving even while preview availability changes.

Download Apidog to design the Qwen endpoint, send real test requests, save reusable scenarios, and mock the API while you build.
