<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Super Jarvis</title>
    <description>The latest articles on DEV Community by Super Jarvis (@super_jarvis_76aa3fc6035d).</description>
    <link>https://dev.to/super_jarvis_76aa3fc6035d</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3890917%2F7334052a-5af2-49af-b96a-5f2e69309689.png</url>
      <title>DEV Community: Super Jarvis</title>
      <link>https://dev.to/super_jarvis_76aa3fc6035d</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/super_jarvis_76aa3fc6035d"/>
    <language>en</language>
    <item>
      <title>Qwen3.6-Plus API: How to Access and Integrate Qwen 3.6</title>
      <dc:creator>Super Jarvis</dc:creator>
      <pubDate>Fri, 22 May 2026 13:08:40 +0000</pubDate>
      <link>https://dev.to/super_jarvis_76aa3fc6035d/qwen36-plus-api-how-to-access-and-integrate-qwen-36-36em</link>
      <guid>https://dev.to/super_jarvis_76aa3fc6035d/qwen36-plus-api-how-to-access-and-integrate-qwen-36-36em</guid>
      <description>&lt;p&gt;If you have been working with Qwen 3.5 models through APIs and are wondering how to access Qwen3.6-Plus, this guide covers the key differences and how to get started.&lt;/p&gt;

&lt;p&gt;Want to test the model before writing any code? &lt;a href="https://qwen35.com/chat" rel="noopener noreferrer"&gt;Chat with Qwen3.6-Plus free&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Qwen3.6-Plus API Access Works
&lt;/h2&gt;

&lt;p&gt;Qwen3.6-Plus is a hosted model, which means you access it through API calls rather than downloading weights. The primary access paths are:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Alibaba Cloud DashScope API&lt;/strong&gt; — the first-party API from the Qwen team&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenRouter&lt;/strong&gt; — third-party aggregator that provides a unified API for multiple model providers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Other API aggregators&lt;/strong&gt; — several providers have added Qwen 3.6 models to their catalogs&lt;/li&gt;
&lt;/ol&gt;

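&lt;p&gt;The API follows the OpenAI-compatible chat completions format, so if you have existing code built on the OpenAI SDK or another OpenAI-compatible client, switching to Qwen3.6-Plus usually requires changing only the base URL, the API key, and the model name. Code written against a provider's native, non-OpenAI API needs its request and response handling adapted as well.&lt;/p&gt;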

&lt;h2&gt;
  
  
  Basic API Request
&lt;/h2&gt;

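&lt;p&gt;Here is a standard chat completion request. The examples in this guide use the &lt;code&gt;qwen-plus-latest&lt;/code&gt; alias; check the DashScope model list to confirm which model ID maps to Qwen3.6-Plus for your account:&lt;br&gt;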
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_API_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "qwen-plus-latest",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain the difference between TCP and UDP in simple terms."}
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Tool Calling with Qwen3.6-Plus
&lt;/h2&gt;

&lt;p&gt;One of the key improvements in Qwen3.6-Plus is tool calling. Here is how to define a tool and pass it in a request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://dashscope.aliyuncs.com/compatible-mode/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_weather&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Get current weather for a location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;City name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen-plus-latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the weather in Tokyo?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tool_choice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Enabling Thinking Mode
&lt;/h2&gt;

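&lt;p&gt;To enable the step-by-step reasoning mode, pass &lt;code&gt;enable_thinking&lt;/code&gt; through &lt;code&gt;extra_body&lt;/code&gt;. This snippet reuses the &lt;code&gt;client&lt;/code&gt; from the tool calling example:&lt;br&gt;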
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen-plus-latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Debug this Python function...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;extra_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enable_thinking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Thinking mode adds latency and output tokens, but it generally improves output quality on complex reasoning, debugging, and multi-step planning tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Differences from Qwen 3.5 APIs
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Qwen 3.5 API&lt;/th&gt;
&lt;th&gt;Qwen3.6-Plus API&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;td&gt;262K (open models)&lt;/td&gt;
&lt;td&gt;1M default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool calling&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;td&gt;Improved reliability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multimodal input&lt;/td&gt;
&lt;td&gt;Varies by model&lt;/td&gt;
&lt;td&gt;Text + images + docs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Thinking mode&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-hosting&lt;/td&gt;
&lt;td&gt;Yes (open weights)&lt;/td&gt;
&lt;td&gt;No (hosted only)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Pricing Considerations
&lt;/h2&gt;

&lt;p&gt;Qwen3.6-Plus is a hosted model, so you pay per token. Pricing varies by provider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DashScope&lt;/strong&gt; — check the current pricing on the Alibaba Cloud console&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenRouter&lt;/strong&gt; — typically shows per-token pricing on the model page&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;QChat&lt;/strong&gt; — you can try the model for free with credits on &lt;a href="https://qwen35.com/chat" rel="noopener noreferrer"&gt;qwen35.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If cost is a concern and your tasks do not need 1M context or advanced tool calling, the open Qwen 3.5 models (self-hosted) may be more economical.&lt;/p&gt;

&lt;h2&gt;
  
  
  Integration Tips
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with the chat interface&lt;/strong&gt; at &lt;a href="https://qwen35.com/chat" rel="noopener noreferrer"&gt;qwen35.com&lt;/a&gt; to validate your use case before writing API code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use streaming&lt;/strong&gt; for better UX in interactive applications — the API supports server-sent events.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set reasonable max_tokens&lt;/strong&gt; — do not default to the maximum. Shorter limits reduce cost and latency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handle tool calls gracefully&lt;/strong&gt; — always validate tool call arguments before executing them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test with and without thinking mode&lt;/strong&gt; to find the right balance for your specific tasks.&lt;/li&gt;
&lt;/ol&gt;
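
&lt;p&gt;Tip 4 can be sketched as follows. This is a minimal example, not part of any SDK: the &lt;code&gt;validate_tool_arguments&lt;/code&gt; helper and the &lt;code&gt;ALLOWED_TOOLS&lt;/code&gt; table are hypothetical names, and it assumes tool arguments arrive as a JSON string, as they do in the OpenAI-compatible response format.&lt;/p&gt;

```python
import json

# Hypothetical allow-list for this sketch: tool name -> required string fields.
ALLOWED_TOOLS = {"get_weather": ["location"]}


def validate_tool_arguments(name, raw_arguments):
    """Return the parsed arguments, or raise ValueError if they look unsafe."""
    if name not in ALLOWED_TOOLS:
        raise ValueError(f"unknown tool: {name}")
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    for field in ALLOWED_TOOLS[name]:
        value = args.get(field)
        # Reject missing, non-string, or blank values before any tool runs.
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {field}")
    return args


print(validate_tool_arguments("get_weather", '{"location": "Tokyo"}'))
```

&lt;p&gt;In a real integration you would call a check like this on each entry of the response's tool calls before dispatching to your own function, and return the error text to the model when validation fails.&lt;/p&gt;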

&lt;h2&gt;
  
  
  Try It First
&lt;/h2&gt;

&lt;p&gt;Before integrating the API, &lt;a href="https://qwen35.com/chat" rel="noopener noreferrer"&gt;test Qwen3.6-Plus in the browser&lt;/a&gt; to see if it handles your prompts well. Then move to API integration once you have confirmed the model fits your use case.&lt;/p&gt;

&lt;h2&gt;
  
  
  Original links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Original article: &lt;a href="https://qwen35.com/qwen3.6-plus-api" rel="noopener noreferrer"&gt;https://qwen35.com/qwen3.6-plus-api&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Homepage: &lt;a href="https://qwen35.com/" rel="noopener noreferrer"&gt;https://qwen35.com/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model page: &lt;a href="https://qwen35.com/qwen3.6-plus" rel="noopener noreferrer"&gt;https://qwen35.com/qwen3.6-plus&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Qwen3.7-Max Context Window: What 1M Tokens Changes</title>
      <dc:creator>Super Jarvis</dc:creator>
      <pubDate>Fri, 22 May 2026 13:07:34 +0000</pubDate>
      <link>https://dev.to/super_jarvis_76aa3fc6035d/qwen37-max-context-window-what-1m-tokens-changes-3fo4</link>
      <guid>https://dev.to/super_jarvis_76aa3fc6035d/qwen37-max-context-window-what-1m-tokens-changes-3fo4</guid>
      <description>&lt;p&gt;The Qwen3.7-Max context window is one of the most important practical specs in the release. The Qwen Cloud model card lists &lt;strong&gt;1M tokens&lt;/strong&gt; of context, with &lt;strong&gt;991.80K max input&lt;/strong&gt; and &lt;strong&gt;65.53K max output&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Those numbers are worth reading carefully. A 1M window is useful, but it does not mean every prompt should be a token dump.&lt;/p&gt;

&lt;p&gt;For the model overview, see &lt;a href="https://qwen35.com/qwen-3.7-max" rel="noopener noreferrer"&gt;Qwen3.7-Max&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Confirmed Context Specs
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Qwen3.7-Max value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;td&gt;1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max input&lt;/td&gt;
&lt;td&gt;991.80K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max output&lt;/td&gt;
&lt;td&gt;65.53K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input modality&lt;/td&gt;
&lt;td&gt;Text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output modality&lt;/td&gt;
&lt;td&gt;Text&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Those numbers make Qwen3.7-Max a serious long-context model for documents, repositories, multi-turn agent sessions, and large task histories.&lt;/p&gt;
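
&lt;p&gt;A rough pre-flight check against these limits might look like the sketch below. The constants are the rounded figures from the table, so the exact integer limits may differ slightly, and the four-characters-per-token estimate is a crude assumption rather than the Qwen tokenizer. The function names are hypothetical.&lt;/p&gt;

```python
# Rounded figures from the table above (assumption: the exact integer
# limits enforced by the API may differ slightly).
MAX_INPUT_TOKENS = 991_800
MAX_OUTPUT_TOKENS = 65_530


def estimate_tokens(text, chars_per_token=4):
    """Crude size estimate; a real tokenizer will give different counts."""
    return len(text) // chars_per_token + 1


def fits_in_window(prompt, requested_output_tokens):
    """Check a prompt and an output budget against the listed limits."""
    input_ok = MAX_INPUT_TOKENS >= estimate_tokens(prompt)
    output_ok = MAX_OUTPUT_TOKENS >= requested_output_tokens
    return input_ok and output_ok


print(fits_in_window("word " * 1000, 2048))
```

&lt;p&gt;A check like this only catches prompts that are obviously too large. For anything near the limit, count tokens with the provider's own tooling.&lt;/p&gt;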

&lt;h2&gt;
  
  
  Why 1M Context Matters for Agents
&lt;/h2&gt;

&lt;p&gt;Long context is not only about pasting bigger documents. For Qwen3.7-Max, the more important use case is agent continuity.&lt;/p&gt;

&lt;p&gt;Agent tasks accumulate state:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;original goal&lt;/li&gt;
&lt;li&gt;constraints&lt;/li&gt;
&lt;li&gt;tool calls&lt;/li&gt;
&lt;li&gt;test output&lt;/li&gt;
&lt;li&gt;failed attempts&lt;/li&gt;
&lt;li&gt;user corrections&lt;/li&gt;
&lt;li&gt;intermediate plans&lt;/li&gt;
&lt;li&gt;final acceptance criteria&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a model loses that state, it starts repeating work or changing direction. A 1M context window gives Qwen3.7-Max more room to keep the full task visible, especially when paired with thinking mode and careful message structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the Bigger Window Helps Most
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Repository work
&lt;/h3&gt;

&lt;p&gt;A code task often needs more than one file. You may need a route, component, schema, config, failing test, and the original product requirement. The Qwen3.7-Max context window lets you keep more of that material together before you have to summarize or retrieve.&lt;/p&gt;

&lt;h3&gt;
  
  
  Long documents
&lt;/h3&gt;

&lt;p&gt;Contracts, policies, specs, meeting transcripts, and research notes benefit from fewer early cuts. The model can compare more original text instead of depending on compressed summaries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-hour agent runs
&lt;/h3&gt;

&lt;p&gt;The official Qwen3.7-Max release emphasizes long-horizon execution, including a 35-hour kernel optimization run. A large context window is not the only reason that works, but it is part of the infrastructure that helps the model preserve task history and avoid instruction drift.&lt;/p&gt;

&lt;h3&gt;
  
  
  Office automation
&lt;/h3&gt;

&lt;p&gt;Spreadsheet work, document formatting, report synthesis, and MCP workflows often mix instructions with source data. A larger context window leaves room for both.&lt;/p&gt;

&lt;h2&gt;
  
  
  What 1M Context Does Not Solve
&lt;/h2&gt;

&lt;p&gt;A 1M context window is room, not judgment.&lt;/p&gt;

&lt;p&gt;It does not fix:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;irrelevant source material&lt;/li&gt;
&lt;li&gt;duplicated context&lt;/li&gt;
&lt;li&gt;weak prompts&lt;/li&gt;
&lt;li&gt;missing retrieval&lt;/li&gt;
&lt;li&gt;unsafe tool execution&lt;/li&gt;
&lt;li&gt;unclear acceptance criteria&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sometimes a shorter, cleaner prompt will beat a massive prompt. Long context helps when the extra material is relevant and well labeled.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompting Tips for Qwen3.7-Max Long Context
&lt;/h2&gt;

&lt;p&gt;Use this structure for long Qwen3.7-Max prompts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;State the task in one sentence.&lt;/li&gt;
&lt;li&gt;List the constraints before the source material.&lt;/li&gt;
&lt;li&gt;Label each document or file section.&lt;/li&gt;
&lt;li&gt;Tell the model what evidence to prioritize.&lt;/li&gt;
&lt;li&gt;Ask for a plan before asking for final output.&lt;/li&gt;
&lt;li&gt;Keep generated summaries separate from raw source text.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;preserve_thinking&lt;/code&gt; only when you have tested the cost and quality tradeoff.&lt;/li&gt;
&lt;/ol&gt;
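
&lt;p&gt;Steps 1 through 5 can be sketched as a small prompt builder. This is one possible layout under stated assumptions, not an official template: the &lt;code&gt;build_long_prompt&lt;/code&gt; function, the section markers, and the sample file name are all hypothetical.&lt;/p&gt;

```python
def build_long_prompt(task, constraints, sources, priority_note):
    """Assemble a prompt: task, constraints, priorities, then labeled sources."""
    parts = [f"TASK: {task}", "CONSTRAINTS:"]
    parts.extend(f"- {item}" for item in constraints)
    parts.append(f"EVIDENCE PRIORITY: {priority_note}")
    for label, text in sources:
        # Each source gets explicit start and end markers so it can be cited.
        parts.append(f"=== SOURCE: {label} ===")
        parts.append(text.strip())
        parts.append(f"=== END SOURCE: {label} ===")
    parts.append("First outline a short plan. Do not write the final output yet.")
    return "\n".join(parts)


prompt = build_long_prompt(
    task="Explain why the checkout test fails.",
    constraints=["Do not change the public API."],
    sources=[("tests/test_checkout.py", "def test_total(): ...")],
    priority_note="Prefer the failing test output over comments.",
)
print(prompt)
```

&lt;p&gt;Putting the task and constraints before the source material keeps them easy to find, and the explicit markers make it simpler to ask the model which source a claim came from.&lt;/p&gt;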

&lt;p&gt;The goal is to help the model search inside the context, not merely to fill the window.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Compares to Qwen3.6-Plus
&lt;/h2&gt;

&lt;p&gt;Qwen3.6-Plus also advertises a 1M context window, but Qwen3.7-Max is framed more heavily around agent execution and long-horizon autonomy. If your task is a long document summary, both may be worth testing. If your task mixes documents, tools, and multi-step coding, Qwen3.7-Max is the more relevant candidate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;The Qwen3.7-Max context window is a real product-level advantage: 1M tokens of room, nearly 992K tokens of input, and roughly 65.5K tokens of output.&lt;/p&gt;

&lt;p&gt;Use it for long documents, multi-file coding, and agent sessions where losing early context would break the task. Do not use it as an excuse to paste everything. Qwen3.7-Max works best when long context is organized, labeled, and tied to a clear goal.&lt;/p&gt;

&lt;p&gt;Related: &lt;a href="https://qwen35.com/blog/qwen3.7-max-api" rel="noopener noreferrer"&gt;Qwen3.7-Max API&lt;/a&gt; and &lt;a href="https://qwen35.com/blog/qwen3.7-max-benchmark" rel="noopener noreferrer"&gt;Qwen3.7-Max benchmark&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.qwencloud.com/models/qwen3.7-max" rel="noopener noreferrer"&gt;Qwen Cloud, Qwen3.7-Max model card&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://qwen.ai/blog?id=qwen3.7" rel="noopener noreferrer"&gt;Qwen Team, Qwen3.7: The Agent Frontier&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.alibabacloud.com/blog/qwen3-7-the-agent-frontier_603154" rel="noopener noreferrer"&gt;Alibaba Cloud Community, Qwen3.7: The Agent Frontier&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Original links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Original article: &lt;a href="https://qwen35.com/blog/qwen3.7-max-context-window" rel="noopener noreferrer"&gt;https://qwen35.com/blog/qwen3.7-max-context-window&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Homepage: &lt;a href="https://qwen35.com/" rel="noopener noreferrer"&gt;https://qwen35.com/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model page: &lt;a href="https://qwen35.com/qwen-3.7-max" rel="noopener noreferrer"&gt;https://qwen35.com/qwen-3.7-max&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Qwen3.7-Max API: How to Call Qwen 3.7 Max with Model Studio</title>
      <dc:creator>Super Jarvis</dc:creator>
      <pubDate>Fri, 22 May 2026 13:06:31 +0000</pubDate>
      <link>https://dev.to/super_jarvis_76aa3fc6035d/qwen37-max-api-how-to-call-qwen-37-max-with-model-studio-4f2k</link>
      <guid>https://dev.to/super_jarvis_76aa3fc6035d/qwen37-max-api-how-to-call-qwen-37-max-with-model-studio-4f2k</guid>
      <description>&lt;p&gt;The Qwen3.7-Max API is now documented through the Qwen release materials and the Qwen Cloud model card. The first detail to get right is the model name.&lt;/p&gt;

&lt;p&gt;For Model Studio compatible-mode calls, the release example uses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;qwen3.7-max
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Qwen Cloud model card also lists a dated snapshot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;qwen3.7-max-2026-05-20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the stable alias (&lt;code&gt;qwen3.7-max&lt;/code&gt;) when you want requests routed to the current version. Use the dated ID when your provider exposes it and you need reproducibility.&lt;/p&gt;
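
&lt;p&gt;One way to make that choice explicit in code is a small switch like the one below. It is only a sketch: the &lt;code&gt;pick_model_id&lt;/code&gt; helper and the &lt;code&gt;QWEN_PIN_SNAPSHOT&lt;/code&gt; environment variable are hypothetical names, not part of any SDK.&lt;/p&gt;

```python
import os

STABLE_ALIAS = "qwen3.7-max"
DATED_SNAPSHOT = "qwen3.7-max-2026-05-20"


def pick_model_id(pin_snapshot):
    """Return the dated ID for reproducible runs, else the stable alias."""
    return DATED_SNAPSHOT if pin_snapshot else STABLE_ALIAS


# QWEN_PIN_SNAPSHOT is a hypothetical variable name used only in this sketch.
pin = os.environ.get("QWEN_PIN_SNAPSHOT", "0") == "1"
print(pick_model_id(pin))
```

&lt;p&gt;Pinning through configuration rather than in code lets evaluation jobs stay on the snapshot while interactive features follow the alias.&lt;/p&gt;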

&lt;p&gt;Try the model first on the &lt;a href="https://qwen35.com/qwen-3.7-max" rel="noopener noreferrer"&gt;Qwen3.7-Max page&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Official Access Paths
&lt;/h2&gt;

&lt;p&gt;The first-party path is Alibaba Cloud Model Studio. The official Qwen3.7-Max release shows OpenAI-compatible Chat Completions and Responses APIs, plus an Anthropic-compatible interface for agent tools.&lt;/p&gt;

&lt;p&gt;Common compatible-mode base URLs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Region&lt;/th&gt;
&lt;th&gt;Base URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Beijing&lt;/td&gt;
&lt;td&gt;&lt;code&gt;https://dashscope.aliyuncs.com/compatible-mode/v1&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Singapore&lt;/td&gt;
&lt;td&gt;&lt;code&gt;https://dashscope-intl.aliyuncs.com/compatible-mode/v1&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;US Virginia&lt;/td&gt;
&lt;td&gt;&lt;code&gt;https://dashscope-us.aliyuncs.com/compatible-mode/v1&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The Qwen Cloud model card also shows a DashScope SDK example using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://dashscope-intl.aliyuncs.com/api/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For most app integrations, the OpenAI-compatible endpoint is the easiest migration path.&lt;/p&gt;
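
&lt;p&gt;The regional base URLs from the table above can be kept in a small lookup so the region becomes a single configuration value. This is a sketch: the region keys and the &lt;code&gt;base_url_for&lt;/code&gt; helper are hypothetical, and only the URLs come from the table.&lt;/p&gt;

```python
# URLs are taken from the table above; the region keys are made up here.
COMPATIBLE_MODE_BASE_URLS = {
    "beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    "us-virginia": "https://dashscope-us.aliyuncs.com/compatible-mode/v1",
}


def base_url_for(region):
    """Look up the compatible-mode base URL for a region key."""
    key = region.strip().lower()
    if key not in COMPATIBLE_MODE_BASE_URLS:
        known = ", ".join(sorted(COMPATIBLE_MODE_BASE_URLS))
        raise ValueError(f"unknown region {region!r}; expected one of: {known}")
    return COMPATIBLE_MODE_BASE_URLS[key]


print(base_url_for("Singapore"))
```

&lt;p&gt;The returned string can be passed as &lt;code&gt;base_url&lt;/code&gt; when constructing the OpenAI client. Failing loudly on an unknown region avoids silently sending traffic to the wrong endpoint.&lt;/p&gt;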

&lt;h2&gt;
  
  
  Minimal Python Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DASHSCOPE_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DASHSCOPE_BASE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://dashscope-intl.aliyuncs.com/compatible-mode/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen3.7-max&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a Python function to merge two sorted linked lists.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;extra_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enable_thinking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the cleanest Qwen3.7-Max API shape if your existing code already uses the OpenAI SDK.&lt;/p&gt;

&lt;h2&gt;
  
  
  Thinking Mode and preserve_thinking
&lt;/h2&gt;

&lt;p&gt;Qwen3.7-Max is positioned for agentic tasks, so thinking mode matters. The official example enables thinking through:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;extra_body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enable_thinking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The release also describes &lt;code&gt;preserve_thinking&lt;/code&gt;, which keeps thinking content from preceding turns in messages. That is useful for long agent runs where the model needs to keep track of prior reasoning, tool outcomes, and next-step strategy.&lt;/p&gt;

&lt;p&gt;Use it carefully. Preserving thinking content can improve continuity, but it also increases token usage. For short chats, leave it off. For multi-step Qwen3.7-Max coding agents, measure its effect on your own tasks.&lt;/p&gt;
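&lt;p&gt;As a minimal sketch, both flags can be assembled in one place before the request is sent. The helper name &lt;code&gt;build_chat_kwargs&lt;/code&gt;, the &lt;code&gt;long_agent_run&lt;/code&gt; switch, and the 2048-token default are illustrative. Passing &lt;code&gt;preserve_thinking&lt;/code&gt; as a boolean inside &lt;code&gt;extra_body&lt;/code&gt; is an assumption modeled on &lt;code&gt;enable_thinking&lt;/code&gt;, so confirm the exact parameter shape in the official documentation.&lt;/p&gt;

```python
def build_chat_kwargs(messages, long_agent_run=False, max_tokens=2048):
    """Assemble keyword arguments for client.chat.completions.create()."""
    # enable_thinking mirrors the official example shown above.
    extra_body = {"enable_thinking": True}
    if long_agent_run:
        # ASSUMPTION: preserve_thinking is a boolean flag passed in extra_body.
        extra_body["preserve_thinking"] = True
    return {
        "model": "qwen3.7-max",
        "messages": messages,
        # Illustrative cap; set it deliberately for your workload.
        "max_tokens": max_tokens,
        "extra_body": extra_body,
        "stream": True,
    }


kwargs = build_chat_kwargs(
    [{"role": "user", "content": "Plan the refactor."}],
    long_agent_run=True,
)
print(kwargs["extra_body"])
```

&lt;p&gt;The returned dictionary is meant to be unpacked into the OpenAI SDK call, for example &lt;code&gt;client.chat.completions.create(**kwargs)&lt;/code&gt;. Keeping the switch in one helper makes it easy to compare token usage with the flag on and off.&lt;/p&gt;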

&lt;h2&gt;
  
  
  Claude Code and Other Agent Harnesses
&lt;/h2&gt;

&lt;p&gt;Qwen APIs also support an Anthropic-compatible route. The official release shows this shape for Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"qwen3.7-max"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_SMALL_FAST_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"qwen3.7-max"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://dashscope-intl.aliyuncs.com/apps/anthropic
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_AUTH_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;your_api_key&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is important because Qwen 3.7 Max is meant to run inside coding assistants and agent scaffolds, not only inside direct chat completion calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing and Context
&lt;/h2&gt;

&lt;p&gt;The Qwen Cloud model card lists Qwen3.7-Max with:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context&lt;/td&gt;
&lt;td&gt;1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max input&lt;/td&gt;
&lt;td&gt;991.80K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max output&lt;/td&gt;
&lt;td&gt;65.53K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input price&lt;/td&gt;
&lt;td&gt;$2.50 per 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output price&lt;/td&gt;
&lt;td&gt;$7.50 per 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RPM&lt;/td&gt;
&lt;td&gt;600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TPM&lt;/td&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Always confirm pricing in your actual provider console before committing production traffic. Providers can change price, quota, and region availability independently.&lt;/p&gt;
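&lt;p&gt;To make the table concrete, here is a rough estimator built only from the figures listed above. The function names are illustrative, and the token limits read "991.80K" and "65.53K" as rounded values, so treat the constants as approximations rather than exact provider limits.&lt;/p&gt;

```python
# Figures copied from the model card table above; verify in your console.
INPUT_PRICE_PER_M = 2.50    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 7.50   # USD per 1M output tokens
MAX_INPUT_TOKENS = 991_800  # ASSUMPTION: "991.80K" taken as a rounded figure
MAX_OUTPUT_TOKENS = 65_530  # ASSUMPTION: "65.53K" taken as a rounded figure


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of one request from the listed per-token prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


def fits_limits(input_tokens, output_tokens):
    """Check a request against the listed input and output limits."""
    return MAX_INPUT_TOKENS >= input_tokens and MAX_OUTPUT_TOKENS >= output_tokens


# Example: a 200K-token repository prompt with an 8K-token answer.
print(round(estimate_cost_usd(200_000, 8_000), 2))
print(fits_limits(200_000, 8_000))
```

&lt;p&gt;At the listed prices, that example request costs about $0.56, with most of the cost coming from input tokens. Long-context agent runs that resend a large prompt on every step multiply that input cost by the number of steps.&lt;/p&gt;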

&lt;h2&gt;
  
  
  Integration Tips
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Start with &lt;code&gt;qwen3.7-max&lt;/code&gt; in a staging environment.&lt;/li&gt;
&lt;li&gt;Use streaming for coding and agent UX.&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;max_tokens&lt;/code&gt; intentionally instead of relying on the maximum output size.&lt;/li&gt;
&lt;li&gt;Log tool calls and final answers separately.&lt;/li&gt;
&lt;li&gt;Test &lt;code&gt;enable_thinking&lt;/code&gt; and &lt;code&gt;preserve_thinking&lt;/code&gt; only on workflows where they are likely to help.&lt;/li&gt;
&lt;li&gt;Compare Qwen3.7-Max against Qwen3.6-Plus on the same prompts before switching all traffic.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;The Qwen3.7-Max API is no longer just a watchlist item. The official materials now give a model alias, regional compatible-mode endpoints, thinking mode, preserve_thinking, and agent harness examples.&lt;/p&gt;

&lt;p&gt;For production work, treat Qwen3.7-Max API integration like any other hosted model migration: pin the model where possible, validate costs, test long-context behavior, and keep fallback routing until your own workloads pass.&lt;/p&gt;

&lt;p&gt;Related: &lt;a href="https://qwen35.com/blog/qwen3.7-max-benchmark" rel="noopener noreferrer"&gt;Qwen3.7-Max benchmark&lt;/a&gt; and &lt;a href="https://qwen35.com/blog/qwen3.7-max-context-window" rel="noopener noreferrer"&gt;Qwen3.7-Max context window&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://qwen.ai/blog?id=qwen3.7" rel="noopener noreferrer"&gt;Qwen Team, Qwen3.7: The Agent Frontier&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.alibabacloud.com/blog/qwen3-7-the-agent-frontier_603154" rel="noopener noreferrer"&gt;Alibaba Cloud Community, Qwen3.7: The Agent Frontier&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.qwencloud.com/models/qwen3.7-max" rel="noopener noreferrer"&gt;Qwen Cloud, Qwen3.7-Max model card&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Original links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Original article: &lt;a href="https://qwen35.com/blog/qwen3.7-max-api" rel="noopener noreferrer"&gt;https://qwen35.com/blog/qwen3.7-max-api&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Homepage: &lt;a href="https://qwen35.com/" rel="noopener noreferrer"&gt;https://qwen35.com/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model page: &lt;a href="https://qwen35.com/qwen-3.7-max" rel="noopener noreferrer"&gt;https://qwen35.com/qwen-3.7-max&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Qwen3.7-Max Benchmark: Agentic Coding, Reasoning, and Long-Horizon Scores</title>
      <dc:creator>Super Jarvis</dc:creator>
      <pubDate>Fri, 22 May 2026 13:05:31 +0000</pubDate>
      <link>https://dev.to/super_jarvis_76aa3fc6035d/qwen37-max-benchmark-agentic-coding-reasoning-and-long-horizon-scores-1n9b</link>
      <guid>https://dev.to/super_jarvis_76aa3fc6035d/qwen37-max-benchmark-agentic-coding-reasoning-and-long-horizon-scores-1n9b</guid>
      <description>&lt;p&gt;Qwen3.7-Max is not being framed as a small chat refresh. The official Qwen3.7 release is built around agent work: coding, tool use, office automation, and long-horizon execution.&lt;/p&gt;

&lt;p&gt;That matters when reading any &lt;strong&gt;Qwen3.7-Max benchmark&lt;/strong&gt; page. The headline is not only whether Qwen3.7-Max answers harder questions. The more useful question is whether it can keep a real task alive across tools, files, tests, and feedback.&lt;/p&gt;

&lt;p&gt;For the product overview, start with the &lt;a href="https://qwen35.com/qwen-3.7-max" rel="noopener noreferrer"&gt;Qwen3.7-Max model page&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Benchmark Story Starts with Agentic Coding
&lt;/h2&gt;

&lt;p&gt;The official Qwen3.7-Max benchmark table puts a lot of weight on repository and terminal tasks:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Qwen3.7-Max result&lt;/th&gt;
&lt;th&gt;What it suggests&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Terminal-Bench 2.0-Terminus&lt;/td&gt;
&lt;td&gt;69.7&lt;/td&gt;
&lt;td&gt;Strong terminal execution and repair loop behavior&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Verified&lt;/td&gt;
&lt;td&gt;80.4&lt;/td&gt;
&lt;td&gt;Competitive repository-level bug fixing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Pro&lt;/td&gt;
&lt;td&gt;60.6&lt;/td&gt;
&lt;td&gt;Harder software engineering tasks beyond the standard set&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Multilingual&lt;/td&gt;
&lt;td&gt;78.3&lt;/td&gt;
&lt;td&gt;Cross-language coding and issue handling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SciCode&lt;/td&gt;
&lt;td&gt;53.5&lt;/td&gt;
&lt;td&gt;Scientific coding and technical implementation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The important detail is the harness. Qwen says the SWE-Bench series used an internal agent scaffold with bash and file-edit tools, and Terminal-Bench used a 256K context setup with a five-hour timeout. Those conditions are closer to real agent operation than a single-turn coding prompt.&lt;/p&gt;

&lt;p&gt;So the right takeaway is not "Qwen3.7-Max writes snippets." It is that Qwen3.7-Max is being optimized and evaluated as a model that can operate inside an agent loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tool Use Is the Bigger Signal
&lt;/h2&gt;

&lt;p&gt;Several Qwen3.7-Max results are more interesting than classic coding scores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MCP-Mark: 60.8&lt;/li&gt;
&lt;li&gt;MCP-Atlas: 76.4&lt;/li&gt;
&lt;li&gt;SkillsBench: 59.2&lt;/li&gt;
&lt;li&gt;BFCL-V4: 75.0&lt;/li&gt;
&lt;li&gt;SpreadSheetBench-v1: 87.0&lt;/li&gt;
&lt;li&gt;Kernel Bench L3: 1.98x median speedup with a 96% win rate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That cluster says more about the release than a generic leaderboard rank. Qwen3.7-Max is being tested on whether it can call tools, work through agent harnesses, and produce useful results in environments where the answer is not already packaged into the prompt.&lt;/p&gt;

&lt;p&gt;This is also why the Qwen team emphasizes cross-harness generalization. Qwen3.7-Max is presented as working across Claude Code, OpenClaw, Qwen Code, and custom tool-use systems. If that holds up in production, it is more valuable than a model that only performs inside one carefully tuned demo shell.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 35-Hour Kernel Run Is the Release's Sharpest Demo
&lt;/h2&gt;

&lt;p&gt;The most memorable Qwen 3.7 Max benchmark is not a leaderboard row. It is the long autonomous kernel optimization run.&lt;/p&gt;

&lt;p&gt;In the official write-up, Qwen3.7-Max worked for about 35 hours on an unseen T-Head ZW-M890 hardware platform. It performed 432 kernel evaluations across 1,158 tool calls, then reached a 10.0x geometric mean speedup over the Triton reference.&lt;/p&gt;

&lt;p&gt;This is the clearest signal about what Qwen3.7-Max is trying to be. The point is not that every user will ask it to optimize kernels for a new chip. The point is that the model kept an execution strategy coherent through many tool calls, compile failures, profiling loops, and redesign attempts.&lt;/p&gt;

&lt;p&gt;That is the kind of behavior ordinary chat benchmarks rarely measure.&lt;/p&gt;
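&lt;p&gt;The "geometric mean speedup" in that result is worth unpacking, because it behaves differently from a plain average. The sketch below shows how a geometric mean is computed. The per-kernel speedups are hypothetical sample values, not figures from the Qwen write-up.&lt;/p&gt;

```python
import math


def geometric_mean(speedups):
    """Return the n-th root of the product of n positive speedup ratios."""
    if not speedups:
        raise ValueError("need at least one speedup")
    # Averaging logs avoids overflow when many large ratios are multiplied.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# HYPOTHETICAL per-kernel speedups over a reference implementation.
sample = [2.0, 8.0, 50.0]
print(round(geometric_mean(sample), 2))        # geometric mean
print(round(sum(sample) / len(sample), 2))     # arithmetic mean, for contrast
```

&lt;p&gt;For these sample values the geometric mean is about 9.28 while the arithmetic mean is 20.0. A single outlier kernel pulls the arithmetic mean up sharply, which is why speedup results are usually summarized with the geometric mean.&lt;/p&gt;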

&lt;h2&gt;
  
  
  Reasoning Scores Still Matter
&lt;/h2&gt;

&lt;p&gt;Qwen3.7-Max also has strong reasoning numbers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPQA Diamond&lt;/td&gt;
&lt;td&gt;92.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HLE&lt;/td&gt;
&lt;td&gt;41.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HMMT 2026 Feb&lt;/td&gt;
&lt;td&gt;97.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IMOAnswerBench&lt;/td&gt;
&lt;td&gt;90.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IFBench&lt;/td&gt;
&lt;td&gt;79.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WMT24++&lt;/td&gt;
&lt;td&gt;85.8&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These scores matter because agents still need reasoning. Tool use without judgment becomes noisy automation. The interesting part is that Qwen 3.7 Max combines reasoning results with agent execution results, rather than positioning the model as only a math or chat upgrade.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Test the Benchmark Claims Yourself
&lt;/h2&gt;

&lt;p&gt;Do not validate Qwen3.7-Max with only a short prompt. Use tasks that expose the behavior this release claims to improve:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Give it a real bug report plus logs and ask for an evidence-ranked fix plan.&lt;/li&gt;
&lt;li&gt;Ask it to compare two implementation paths and name the safer one.&lt;/li&gt;
&lt;li&gt;Give it a multi-file feature request and require tests before finalizing.&lt;/li&gt;
&lt;li&gt;Ask it to explain when it would call tools, when it would stop, and what it would verify.&lt;/li&gt;
&lt;li&gt;Run the same task on Qwen3.6-Plus or Qwen3.6-Max-Preview and compare failure recovery.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is the useful way to read a Qwen3.7-Max benchmark. The question is not only "did it score higher?" The question is "does it keep working when the task becomes messy?"&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;Qwen3.7-Max benchmark results point to a model designed for agent workflows: coding agents, tool orchestration, long documents, office automation, and multi-hour execution.&lt;/p&gt;

&lt;p&gt;The scores are strong, but the release is most interesting because of the shape of the evaluation. Qwen3.7-Max is being judged less like an ordinary chat model and more like a system that needs to plan, act, observe, and recover.&lt;/p&gt;

&lt;p&gt;Next, read the &lt;a href="https://qwen35.com/blog/qwen3.7-max-api" rel="noopener noreferrer"&gt;Qwen3.7-Max API guide&lt;/a&gt; or the &lt;a href="https://qwen35.com/blog/qwen3.7-max-context-window" rel="noopener noreferrer"&gt;Qwen3.7-Max context window guide&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://qwen.ai/blog?id=qwen3.7" rel="noopener noreferrer"&gt;Qwen Team, Qwen3.7: The Agent Frontier&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.alibabacloud.com/blog/qwen3-7-the-agent-frontier_603154" rel="noopener noreferrer"&gt;Alibaba Cloud Community, Qwen3.7: The Agent Frontier&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.qwencloud.com/models/qwen3.7-max" rel="noopener noreferrer"&gt;Qwen Cloud, Qwen3.7-Max model card&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Original links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Original article: &lt;a href="https://qwen35.com/blog/qwen3.7-max-benchmark" rel="noopener noreferrer"&gt;https://qwen35.com/blog/qwen3.7-max-benchmark&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Homepage: &lt;a href="https://qwen35.com/" rel="noopener noreferrer"&gt;https://qwen35.com/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model page: &lt;a href="https://qwen35.com/qwen-3.7-max" rel="noopener noreferrer"&gt;https://qwen35.com/qwen-3.7-max&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Qwen3.7-Max and Agentic Coding: What to Watch First</title>
      <dc:creator>Super Jarvis</dc:creator>
      <pubDate>Fri, 22 May 2026 13:02:55 +0000</pubDate>
      <link>https://dev.to/super_jarvis_76aa3fc6035d/qwen37-max-and-agentic-coding-what-to-watch-first-1f8i</link>
      <guid>https://dev.to/super_jarvis_76aa3fc6035d/qwen37-max-and-agentic-coding-what-to-watch-first-1f8i</guid>
      <description>&lt;p&gt;The most interesting thing about &lt;strong&gt;Qwen3.7-Max&lt;/strong&gt; is not that it is another newer model. The important signal is that Alibaba is presenting qwen-3.7, qwen3.7, and qwen 3.7 as a model family for agentic coding, complex reasoning, and long-running tool workflows.&lt;/p&gt;

&lt;p&gt;If you want the model overview first, start with the &lt;a href="https://qwen35.com/qwen-3.7-max" rel="noopener noreferrer"&gt;Qwen3.7-Max page&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why agentic coding matters
&lt;/h2&gt;

&lt;p&gt;Short coding prompts hide the difference between models. A model can write a function and still fail at planning a migration, reading a stack trace, choosing the next file to inspect, or recovering after a failing test.&lt;/p&gt;

&lt;p&gt;That is why Qwen3.7-Max should be evaluated with workflows, not toy prompts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ask it to inspect a real diff&lt;/li&gt;
&lt;li&gt;make it produce an implementation plan before editing&lt;/li&gt;
&lt;li&gt;include tests and failure criteria&lt;/li&gt;
&lt;li&gt;require tool-use decisions&lt;/li&gt;
&lt;li&gt;compare the final plan against a lighter Qwen model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Qwen3.7-Max will matter most if it can keep a long engineering thread intact.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is now confirmed
&lt;/h2&gt;

&lt;p&gt;The official Qwen3.7 materials now provide enough detail to move beyond a watchlist. Model Studio examples use &lt;code&gt;qwen3.7-max&lt;/code&gt;, Qwen Cloud lists the dated snapshot &lt;code&gt;qwen3.7-max-2026-05-20&lt;/code&gt;, and the model card shows a 1M context window.&lt;/p&gt;

&lt;p&gt;That makes the evaluation more concrete. The key question is no longer whether Qwen3.7-Max has an API path. The key question is whether Qwen3.7-Max actually improves your agent workflow compared with Qwen3.6-Plus or Qwen3.6-Max-Preview.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical test prompts
&lt;/h2&gt;

&lt;p&gt;Use prompts that force the model to stay organized:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;"Review this migration plan, identify the most likely production failure, and propose a safer sequence."&lt;/li&gt;
&lt;li&gt;"Given these logs and files, diagnose the bug, list evidence, and suggest the smallest patch."&lt;/li&gt;
&lt;li&gt;"Design an agent workflow that searches documentation, edits code, runs tests, and stops safely."&lt;/li&gt;
&lt;li&gt;"Compare Qwen3.7-Max with the current Qwen 3.6 option on this exact repo task."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is a better way to test Qwen3.7-Max than asking for a generic Python snippet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;Qwen3.7-Max is an agentic-coding model first. Treat it as a serious new production candidate, but keep the final decision tied to official API documentation, cost checks, and your own long-running tests.&lt;/p&gt;

&lt;p&gt;Related: &lt;a href="https://qwen35.com/blog/qwen3.7-max-benchmark" rel="noopener noreferrer"&gt;Qwen3.7-Max benchmark&lt;/a&gt;, &lt;a href="https://qwen35.com/blog/qwen3.7-max-api" rel="noopener noreferrer"&gt;Qwen3.7-Max API&lt;/a&gt;, and &lt;a href="https://qwen35.com/blog/qwen3.7-max-context-window" rel="noopener noreferrer"&gt;Qwen3.7-Max context window&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Original links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Original article: &lt;a href="https://qwen35.com/blog/qwen3.7-max-agentic-coding" rel="noopener noreferrer"&gt;https://qwen35.com/blog/qwen3.7-max-agentic-coding&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Homepage: &lt;a href="https://qwen35.com/" rel="noopener noreferrer"&gt;https://qwen35.com/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model page: &lt;a href="https://qwen35.com/qwen-3.7-max" rel="noopener noreferrer"&gt;https://qwen35.com/qwen-3.7-max&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>DeepSeek V4 Benchmark: Pro and Flash Scores</title>
      <dc:creator>Super Jarvis</dc:creator>
      <pubDate>Fri, 22 May 2026 12:31:44 +0000</pubDate>
      <link>https://dev.to/super_jarvis_76aa3fc6035d/deepseek-v4-benchmark-pro-and-flash-scores-525j</link>
      <guid>https://dev.to/super_jarvis_76aa3fc6035d/deepseek-v4-benchmark-pro-and-flash-scores-525j</guid>
      <description>&lt;h1&gt;
  
  
  DeepSeek V4 Benchmark: Pro and Flash Scores
&lt;/h1&gt;

&lt;p&gt;The DeepSeek V4 release materials include benchmark rows for DeepSeek V4 Flash and DeepSeek V4 Pro in Max mode.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F50vvm3yd07bcvk5qgckn.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F50vvm3yd07bcvk5qgckn.jpg" alt="DeepSeek V4 benchmark dashboard" width="799" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Benchmarks are useful as a first routing signal, but production defaults should still be decided with prompts from your own workload.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Official snapshot
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;MMLU-Pro&lt;/th&gt;
&lt;th&gt;LiveCodeBench&lt;/th&gt;
&lt;th&gt;SWE Verified&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4 Flash&lt;/td&gt;
&lt;td&gt;86.2&lt;/td&gt;
&lt;td&gt;91.6&lt;/td&gt;
&lt;td&gt;79.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4 Pro&lt;/td&gt;
&lt;td&gt;87.5&lt;/td&gt;
&lt;td&gt;93.5&lt;/td&gt;
&lt;td&gt;80.6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Sources: &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro" rel="noopener noreferrer"&gt;DeepSeek-V4-Pro model card&lt;/a&gt; and &lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf" rel="noopener noreferrer"&gt;DeepSeek_V4.pdf&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the numbers suggest
&lt;/h2&gt;

&lt;p&gt;Pro leads the snapshot, especially where reasoning and coding ceilings matter. Flash is close enough that it can be the default for many high-volume workflows, especially when the task can tolerate a second pass or escalation.&lt;/p&gt;
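&lt;p&gt;To see how close "close enough" is, the gaps can be computed directly from the snapshot table. This sketch uses only the scores listed above; the dictionary layout and the function name are illustrative.&lt;/p&gt;

```python
# Scores copied from the official snapshot table above.
SCORES = {
    "DeepSeek V4 Flash": {"MMLU-Pro": 86.2, "LiveCodeBench": 91.6, "SWE Verified": 79.0},
    "DeepSeek V4 Pro": {"MMLU-Pro": 87.5, "LiveCodeBench": 93.5, "SWE Verified": 80.6},
}


def pro_minus_flash(scores):
    """Return the Pro lead over Flash, in points, for each benchmark."""
    flash = scores["DeepSeek V4 Flash"]
    pro = scores["DeepSeek V4 Pro"]
    # Round to one decimal to hide floating-point noise in the subtraction.
    return {name: round(pro[name] - flash[name], 1) for name in flash}


print(pro_minus_flash(SCORES))
```

&lt;p&gt;The Pro lead works out to 1.3 points on MMLU-Pro, 1.9 on LiveCodeBench, and 1.6 on SWE Verified. Gaps of that size are small enough that workload-specific testing is likely to matter more than the public numbers.&lt;/p&gt;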

&lt;h2&gt;
  
  
  How to evaluate in production
&lt;/h2&gt;

&lt;p&gt;Do not ship on public benchmarks alone. Build a small internal eval set with your real prompts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;20 frequent user requests&lt;/li&gt;
&lt;li&gt;20 difficult edge cases&lt;/li&gt;
&lt;li&gt;20 code or reasoning tasks&lt;/li&gt;
&lt;li&gt;10 long-context tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run Flash first, Pro second, then compare correctness, latency, and cost. The best default is usually workload-specific.&lt;/p&gt;
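&lt;p&gt;One way to organize that comparison is sketched below. The bucket sizes come from the list above. The record fields (&lt;code&gt;correct&lt;/code&gt;, &lt;code&gt;latency_s&lt;/code&gt;, &lt;code&gt;cost_usd&lt;/code&gt;) and the function names are assumptions for illustration, and the sample records are made up.&lt;/p&gt;

```python
from statistics import mean

# Eval set sizes taken from the list above (70 prompts in total).
EVAL_BUCKETS = {
    "frequent": 20,
    "edge_case": 20,
    "code_or_reasoning": 20,
    "long_context": 10,
}


def summarize(results):
    """Aggregate per-prompt records into correctness, latency, and cost."""
    return {
        "accuracy": mean(1.0 if r["correct"] else 0.0 for r in results),
        "avg_latency_s": mean(r["latency_s"] for r in results),
        "total_cost_usd": sum(r["cost_usd"] for r in results),
    }


def compare(flash_results, pro_results):
    """Summarize both runs side by side; both lists cover the same prompts."""
    return {"flash": summarize(flash_results), "pro": summarize(pro_results)}


# HYPOTHETICAL records for two prompts, only to show the expected shape.
flash_run = [
    {"correct": True, "latency_s": 1.0, "cost_usd": 0.001},
    {"correct": False, "latency_s": 2.0, "cost_usd": 0.002},
]
pro_run = [
    {"correct": True, "latency_s": 3.0, "cost_usd": 0.01},
    {"correct": True, "latency_s": 5.0, "cost_usd": 0.02},
]
print(compare(flash_run, pro_run))
```

&lt;p&gt;Grading each answer as correct or not is the hard part and is left to your own checks. Once the records exist, the summary makes the trade-off between the two models explicit.&lt;/p&gt;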




&lt;p&gt;Source article: &lt;a href="https://deepseekv4.space/blog/deepseek-v4-benchmark?utm_source=devto&amp;amp;utm_medium=syndication&amp;amp;utm_campaign=blog-en" rel="noopener noreferrer"&gt;Read the original post&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Homepage: &lt;a href="https://deepseekv4.space/" rel="noopener noreferrer"&gt;Visit the site&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Model pages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://deepseekv4.space/deepseek-v4-pro" rel="noopener noreferrer"&gt;https://deepseekv4.space/deepseek-v4-pro&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://deepseekv4.space/deepseek-v4-flash" rel="noopener noreferrer"&gt;https://deepseekv4.space/deepseek-v4-flash&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>DeepSeek V4 Technical Report: Architecture, Training, and Benchmarks Guide</title>
      <dc:creator>Super Jarvis</dc:creator>
      <pubDate>Fri, 22 May 2026 11:15:24 +0000</pubDate>
      <link>https://dev.to/super_jarvis_76aa3fc6035d/deepseek-v4-technical-report-architecture-training-and-benchmarks-guide-5do3</link>
      <guid>https://dev.to/super_jarvis_76aa3fc6035d/deepseek-v4-technical-report-architecture-training-and-benchmarks-guide-5do3</guid>
      <description>&lt;h1&gt;
  
  
  DeepSeek V4 Technical Report: Architecture, Training, and Benchmarks
&lt;/h1&gt;

&lt;p&gt;The DeepSeek V4 technical report describes a preview V4 family with two Mixture-of-Experts language models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek V4 Pro&lt;/strong&gt;: 1.6T total parameters, 49B activated parameters, 1M context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek V4 Flash&lt;/strong&gt;: 284B total parameters, 13B activated parameters, 1M context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Primary sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro" rel="noopener noreferrer"&gt;DeepSeek-V4-Pro model card&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf" rel="noopener noreferrer"&gt;DeepSeek_V4.pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://api-docs.deepseek.com/quick_start/pricing/" rel="noopener noreferrer"&gt;DeepSeek API pricing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What the technical report focuses on
&lt;/h2&gt;

&lt;p&gt;The report frames DeepSeek V4 around efficient long-context intelligence. The headline product implication is simple: both V4 Pro and V4 Flash expose a 1M-token context window, but they target different cost and capability envelopes.&lt;/p&gt;

&lt;p&gt;Pro is the higher-capacity model for hard reasoning, coding, and agentic workflows. Flash is the lower-cost model for high-volume chat, summarization, routing, and everyday product paths.&lt;/p&gt;
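&lt;p&gt;The parameter counts listed at the top of this summary also show how sparse each model is. The sketch below computes the share of parameters that is activated, using only those published figures. The dictionary layout and function name are illustrative.&lt;/p&gt;

```python
# Parameter counts in billions, as listed in the report summary above.
MODELS = {
    "DeepSeek V4 Pro": {"total_b": 1600, "active_b": 49},
    "DeepSeek V4 Flash": {"total_b": 284, "active_b": 13},
}


def active_fraction(spec):
    """Share of total parameters that is activated in a Mixture-of-Experts model."""
    return spec["active_b"] / spec["total_b"]


for name, spec in MODELS.items():
    print(f"{name}: {active_fraction(spec):.1%} of parameters active")
```

&lt;p&gt;Pro activates roughly 3% of its parameters and Flash roughly 4.6%. Pro still activates far more parameters in absolute terms (49B against 13B), which is consistent with its position as the higher-cost route.&lt;/p&gt;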

&lt;h2&gt;
  
  
  Architecture notes
&lt;/h2&gt;

&lt;p&gt;The report highlights several architecture and optimization upgrades:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hybrid attention for long-context efficiency.&lt;/li&gt;
&lt;li&gt;Manifold-Constrained Hyper-Connections for stronger signal propagation.&lt;/li&gt;
&lt;li&gt;Muon optimizer for training stability and convergence.&lt;/li&gt;
&lt;li&gt;MoE scaling with separate Pro and Flash model sizes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60nybr3t9rb84icrqxq5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60nybr3t9rb84icrqxq5.jpg" alt="DeepSeek V4 report layers and evidence map" width="799" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Use the architecture section to decide what to measure, not as a substitute for measuring your own prompts.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For builders, the practical question is not just which model has the larger parameter count. The question is where longer context, cache behavior, and reasoning effort change the cost-quality curve.&lt;/p&gt;

&lt;h2&gt;
  
  
  Training and post-training
&lt;/h2&gt;

&lt;p&gt;DeepSeek says the V4 models are pre-trained on more than 32T tokens and then post-trained with a multi-stage process. The release materials describe domain-specific expert cultivation followed by model consolidation.&lt;/p&gt;

&lt;p&gt;That matters for product evaluation because one benchmark score is not enough. You should test domain tasks directly: code repair, long document synthesis, tool-use workflows, structured extraction, and high-volume support chat.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reasoning modes
&lt;/h2&gt;

&lt;p&gt;The technical report and model card describe non-thinking, thinking, and max-thinking styles. In practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use non-thinking mode for low-risk, fast, low-cost responses.&lt;/li&gt;
&lt;li&gt;Use thinking mode for math, coding, planning, and multi-step reasoning.&lt;/li&gt;
&lt;li&gt;Use max-style reasoning only when the added latency and cost are justified.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The current DeepSeek API pricing page lists &lt;code&gt;deepseek-v4-flash&lt;/code&gt; and &lt;code&gt;deepseek-v4-pro&lt;/code&gt; as the V4 model IDs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark signals
&lt;/h2&gt;

&lt;p&gt;The release materials include benchmark snapshots across knowledge, coding, long-context, and agentic tasks. The site tracks a few practical anchor scores:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;MMLU-Pro&lt;/th&gt;
&lt;th&gt;LiveCodeBench&lt;/th&gt;
&lt;th&gt;SWE Verified&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4 Flash Max&lt;/td&gt;
&lt;td&gt;86.2&lt;/td&gt;
&lt;td&gt;91.6&lt;/td&gt;
&lt;td&gt;79.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4 Pro Max&lt;/td&gt;
&lt;td&gt;87.5&lt;/td&gt;
&lt;td&gt;93.5&lt;/td&gt;
&lt;td&gt;80.6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Treat these as routing hints, not final product truth. If your application depends on code changes, retrieval quality, or tool calls, build an eval set from your own traffic and compare Flash against Pro with the same prompts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation checklist
&lt;/h2&gt;

&lt;p&gt;Before adopting DeepSeek V4 in production, verify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which workflows need Pro instead of Flash.&lt;/li&gt;
&lt;li&gt;Whether thinking mode improves your specific task enough to justify the cost.&lt;/li&gt;
&lt;li&gt;How much prompt caching reduces repeated-context cost.&lt;/li&gt;
&lt;li&gt;Whether your longest real documents fit cleanly inside the 1M context window.&lt;/li&gt;
&lt;li&gt;Whether tool-use and JSON outputs are stable enough for your product contracts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The technical report explains the direction. Your own evals should decide routing, retry behavior, and credit pricing.&lt;/p&gt;
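&lt;p&gt;A routing rule that falls out of such evals often looks like "try Flash, escalate to Pro when a check fails." The following is a minimal sketch of that pattern. The three callables are placeholders you would supply yourself and are not part of any DeepSeek SDK; only the model IDs come from the pricing page mentioned above.&lt;/p&gt;

```python
def route(prompt, call_flash, call_pro, is_acceptable):
    """Try the cheaper model first and escalate only when the check fails.

    call_flash, call_pro, and is_acceptable are caller-supplied callables.
    They are hypothetical placeholders, not functions from a DeepSeek SDK.
    """
    answer = call_flash(prompt)
    if is_acceptable(answer):
        return "deepseek-v4-flash", answer
    # The check failed, so pay for the higher-capacity model.
    return "deepseek-v4-pro", call_pro(prompt)


# Stub callables that stand in for real API calls and a real validator.
model, answer = route(
    "Summarize this ticket.",
    call_flash=lambda p: "",
    call_pro=lambda p: "A longer, checked answer.",
    is_acceptable=lambda a: len(a) > 0,
)
print(model, answer)
```

&lt;p&gt;The acceptance check is where your eval results feed back in: it can be a JSON schema validation, a test run, or a length and format check, depending on the product contract.&lt;/p&gt;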




&lt;p&gt;Source article: &lt;a href="https://deepseekv4.space/blog/deepseek-v4-technical-report?utm_source=devto&amp;amp;utm_medium=syndication&amp;amp;utm_campaign=blog-en" rel="noopener noreferrer"&gt;Read the original post&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Homepage: &lt;a href="https://deepseekv4.space/" rel="noopener noreferrer"&gt;Visit the site&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Model pages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://deepseekv4.space/deepseek-v4-pro" rel="noopener noreferrer"&gt;https://deepseekv4.space/deepseek-v4-pro&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://deepseekv4.space/deepseek-v4-flash" rel="noopener noreferrer"&gt;https://deepseekv4.space/deepseek-v4-flash&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Qwen3.7-Max Launch Roundup for Open Builders: Benchmark, API, 1M Context, and Agentic Coding</title>
      <dc:creator>Super Jarvis</dc:creator>
      <pubDate>Fri, 22 May 2026 10:55:28 +0000</pubDate>
      <link>https://dev.to/super_jarvis_76aa3fc6035d/qwen37-max-launch-roundup-for-open-builders-benchmark-api-1m-context-and-agentic-coding-2oa2</link>
      <guid>https://dev.to/super_jarvis_76aa3fc6035d/qwen37-max-launch-roundup-for-open-builders-benchmark-api-1m-context-and-agentic-coding-2oa2</guid>
      <description>&lt;p&gt;Qwen3.7-Max is being positioned as a flagship Qwen route for agentic coding, long-horizon execution, and complex reasoning. Instead of one generic launch recap, we published four focused guides that explain where the release matters in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the release changes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agentic coding:&lt;/strong&gt; Qwen3.7-Max should be tested on multi-file changes, plans, tool use, and failure recovery, not toy prompts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmarks:&lt;/strong&gt; The strongest signals are Terminal-Bench, SWE-Verified, MCP-style tool use, and the long autonomous kernel optimization demo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API integration:&lt;/strong&gt; The stable alias is &lt;code&gt;qwen3.7-max&lt;/code&gt;, with DashScope compatible-mode endpoints, thinking mode, and &lt;code&gt;preserve_thinking&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context window:&lt;/strong&gt; Qwen3.7-Max ships with 1M context, up to 991.80K input and 65.53K output, which matters for long documents, repos, and agent continuity.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Start here
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Homepage: &lt;a href="https://qwen35.com/" rel="noopener noreferrer"&gt;https://qwen35.com/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Qwen3.7-Max model page: &lt;a href="https://qwen35.com/qwen-3.7-max" rel="noopener noreferrer"&gt;https://qwen35.com/qwen-3.7-max&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Original guides
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://qwen35.com/blog/qwen3.7-max-agentic-coding" rel="noopener noreferrer"&gt;Qwen3.7-Max and Agentic Coding: What to Watch First&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://qwen35.com/blog/qwen3.7-max-benchmark" rel="noopener noreferrer"&gt;Qwen3.7-Max Benchmark: Agentic Coding, Reasoning, and Long-Horizon Scores&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://qwen35.com/blog/qwen3.7-max-api" rel="noopener noreferrer"&gt;Qwen3.7-Max API: How to Call Qwen 3.7 Max&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://qwen35.com/blog/qwen3.7-max-context-window" rel="noopener noreferrer"&gt;Qwen3.7-Max Context Window: What 1M Tokens Changes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The short version is that Qwen3.7-Max should be evaluated like an agent model, not only a chat model. The real question is whether it can keep a plan intact across tools, files, tests, and long context better than Qwen3.6-Plus or Qwen3.6-Max-Preview on your own workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to test it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;run a real repo task, not a toy snippet&lt;/li&gt;
&lt;li&gt;compare planning quality before editing&lt;/li&gt;
&lt;li&gt;compare failure recovery after a bad intermediate step&lt;/li&gt;
&lt;li&gt;compare long-context document work and tool-using workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We are using Q-Chat as a browser surface for trying the model and publishing deeper breakdowns as the release evolves.&lt;/p&gt;

</description>
      <category>qwen</category>
    </item>
    <item>
      <title>Qwen3.7-Max Launch Roundup for Future Builders: Benchmark, API, 1M Context, and Agentic Coding</title>
      <dc:creator>Super Jarvis</dc:creator>
      <pubDate>Fri, 22 May 2026 10:52:49 +0000</pubDate>
      <link>https://dev.to/super_jarvis_76aa3fc6035d/qwen37-max-launch-roundup-for-future-builders-benchmark-api-1m-context-and-agentic-coding-hd7</link>
      <guid>https://dev.to/super_jarvis_76aa3fc6035d/qwen37-max-launch-roundup-for-future-builders-benchmark-api-1m-context-and-agentic-coding-hd7</guid>
      <description>&lt;p&gt;Qwen3.7-Max is being positioned as a flagship Qwen route for agentic coding, long-horizon execution, and complex reasoning. Instead of one generic launch recap, we published four focused guides that explain where the release matters in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the release changes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agentic coding:&lt;/strong&gt; Qwen3.7-Max should be tested on multi-file changes, plans, tool use, and failure recovery, not toy prompts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmarks:&lt;/strong&gt; The strongest signals are Terminal-Bench, SWE-bench Verified, MCP-style tool use, and the long autonomous kernel optimization demo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API integration:&lt;/strong&gt; The stable alias is &lt;code&gt;qwen3.7-max&lt;/code&gt;, with DashScope compatible-mode endpoints, thinking mode, and &lt;code&gt;preserve_thinking&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context window:&lt;/strong&gt; Qwen3.7-Max ships with 1M context, up to 991.80K input and 65.53K output, which matters for long documents, repos, and agent continuity.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Start here
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Homepage: &lt;a href="https://qwen35.com/" rel="noopener noreferrer"&gt;https://qwen35.com/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Qwen3.7-Max model page: &lt;a href="https://qwen35.com/qwen-3.7-max" rel="noopener noreferrer"&gt;https://qwen35.com/qwen-3.7-max&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Original guides
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://qwen35.com/blog/qwen3.7-max-agentic-coding" rel="noopener noreferrer"&gt;Qwen3.7-Max and Agentic Coding: What to Watch First&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://qwen35.com/blog/qwen3.7-max-benchmark" rel="noopener noreferrer"&gt;Qwen3.7-Max Benchmark: Agentic Coding, Reasoning, and Long-Horizon Scores&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://qwen35.com/blog/qwen3.7-max-api" rel="noopener noreferrer"&gt;Qwen3.7-Max API: How to Call Qwen 3.7 Max&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://qwen35.com/blog/qwen3.7-max-context-window" rel="noopener noreferrer"&gt;Qwen3.7-Max Context Window: What 1M Tokens Changes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The short version is that Qwen3.7-Max should be evaluated like an agent model, not only a chat model. The real question is whether it can keep a plan intact across tools, files, tests, and long context better than Qwen3.6-Plus or Qwen3.6-Max-Preview on your own workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to test it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;run a real repo task, not a toy snippet&lt;/li&gt;
&lt;li&gt;compare planning quality before editing&lt;/li&gt;
&lt;li&gt;compare failure recovery after a bad intermediate step&lt;/li&gt;
&lt;li&gt;compare long-context document work and tool-using workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We are using Q-Chat as a browser surface for trying the model and publishing deeper breakdowns as the release evolves.&lt;/p&gt;

</description>
      <category>qwen</category>
    </item>
    <item>
      <title>Qwen3.7-Max Launch Roundup: Benchmark, API, 1M Context, and Agentic Coding</title>
      <dc:creator>Super Jarvis</dc:creator>
      <pubDate>Fri, 22 May 2026 10:49:58 +0000</pubDate>
      <link>https://dev.to/super_jarvis_76aa3fc6035d/qwen37-max-launch-roundup-benchmark-api-1m-context-and-agentic-coding-op3</link>
      <guid>https://dev.to/super_jarvis_76aa3fc6035d/qwen37-max-launch-roundup-benchmark-api-1m-context-and-agentic-coding-op3</guid>
      <description>&lt;p&gt;Qwen3.7-Max is being positioned as a flagship Qwen route for agentic coding, long-horizon execution, and complex reasoning. Instead of one generic launch recap, we published four focused guides that explain where the release matters in practice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the release changes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agentic coding:&lt;/strong&gt; Qwen3.7-Max should be tested on multi-file changes, plans, tool use, and failure recovery, not toy prompts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmarks:&lt;/strong&gt; The strongest signals are Terminal-Bench, SWE-bench Verified, MCP-style tool use, and the long autonomous kernel optimization demo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API integration:&lt;/strong&gt; The stable alias is &lt;code&gt;qwen3.7-max&lt;/code&gt;, with DashScope compatible-mode endpoints, thinking mode, and &lt;code&gt;preserve_thinking&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context window:&lt;/strong&gt; Qwen3.7-Max ships with 1M context, up to 991.80K input and 65.53K output, which matters for long documents, repos, and agent continuity.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Start here
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Homepage: &lt;a href="https://qwen35.com/" rel="noopener noreferrer"&gt;https://qwen35.com/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Qwen3.7-Max model page: &lt;a href="https://qwen35.com/qwen-3.7-max" rel="noopener noreferrer"&gt;https://qwen35.com/qwen-3.7-max&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Original guides
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://qwen35.com/blog/qwen3.7-max-agentic-coding" rel="noopener noreferrer"&gt;Qwen3.7-Max and Agentic Coding: What to Watch First&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://qwen35.com/blog/qwen3.7-max-benchmark" rel="noopener noreferrer"&gt;Qwen3.7-Max Benchmark: Agentic Coding, Reasoning, and Long-Horizon Scores&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://qwen35.com/blog/qwen3.7-max-api" rel="noopener noreferrer"&gt;Qwen3.7-Max API: How to Call Qwen 3.7 Max&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://qwen35.com/blog/qwen3.7-max-context-window" rel="noopener noreferrer"&gt;Qwen3.7-Max Context Window: What 1M Tokens Changes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The short version is that Qwen3.7-Max should be evaluated like an agent model, not only a chat model. The real question is whether it can keep a plan intact across tools, files, tests, and long context better than Qwen3.6-Plus or Qwen3.6-Max-Preview on your own workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to test it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;run a real repo task, not a toy snippet&lt;/li&gt;
&lt;li&gt;compare planning quality before editing&lt;/li&gt;
&lt;li&gt;compare failure recovery after a bad intermediate step&lt;/li&gt;
&lt;li&gt;compare long-context document work and tool-using workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We are using Q-Chat as a browser surface for trying the model and publishing deeper breakdowns as the release evolves.&lt;/p&gt;

</description>
      <category>qwen</category>
    </item>
    <item>
      <title>DeepSeek V4 vs Other Models: When Pro or Flash Makes Sense</title>
      <dc:creator>Super Jarvis</dc:creator>
      <pubDate>Tue, 28 Apr 2026 17:29:55 +0000</pubDate>
      <link>https://dev.to/super_jarvis_76aa3fc6035d/deepseek-v4-vs-other-models-when-pro-or-flash-makes-sense-5hc</link>
      <guid>https://dev.to/super_jarvis_76aa3fc6035d/deepseek-v4-vs-other-models-when-pro-or-flash-makes-sense-5hc</guid>
      <description>&lt;p&gt;DeepSeek V4 is best evaluated as a two-model family rather than one model.&lt;/p&gt;

&lt;p&gt;DeepSeek V4 Pro is the flagship path. DeepSeek V4 Flash is the efficient path. Both are listed with 1M context in the current DeepSeek API pricing table.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faj6gn1f0100zhcdljl7m.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faj6gn1f0100zhcdljl7m.jpg" alt="DeepSeek V4 routing comparison dashboard" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;A comparison is only useful when it turns into a routing rule: default to the cheaper reliable path, then escalate when quality risk increases.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  V4 Pro vs V4 Flash
&lt;/h2&gt;

&lt;p&gt;Choose Pro when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The task needs the highest benchmark ceiling in the DeepSeek V4 family.&lt;/li&gt;
&lt;li&gt;The prompt involves code repair, planning, math, or multi-step tools.&lt;/li&gt;
&lt;li&gt;A wrong answer is more expensive than a slower or pricier answer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choose Flash when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The task is high-volume.&lt;/li&gt;
&lt;li&gt;The output can be checked, retried, or escalated.&lt;/li&gt;
&lt;li&gt;You need 1M context but want lower input and output token costs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Comparing to other model families
&lt;/h2&gt;

&lt;p&gt;Against other frontier models, DeepSeek V4 Pro should be tested on your hardest real workflows: coding, long-context reasoning, and agentic tasks.&lt;/p&gt;

&lt;p&gt;Against efficient models, DeepSeek V4 Flash is the more natural comparison because it keeps 1M context while charging lower per-token prices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Best routing pattern
&lt;/h2&gt;

&lt;p&gt;A practical routing setup is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start with Flash for cheap comprehension and summaries.&lt;/li&gt;
&lt;li&gt;Escalate to Pro when the task is complex or user-visible.&lt;/li&gt;
&lt;li&gt;Add web search only when freshness matters.&lt;/li&gt;
&lt;li&gt;Add Thinking only when the task benefits from deeper reasoning.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This keeps cost predictable while preserving quality for hard prompts.&lt;/p&gt;
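
&lt;p&gt;The four steps above can be sketched as a small routing function. This is a minimal illustration, not a reference implementation: the &lt;code&gt;Task&lt;/code&gt; flags and the shape of the returned decision are hypothetical, and only the two model IDs come from the DeepSeek API pricing page.&lt;/p&gt;

```python
from dataclasses import dataclass

# Model IDs as listed on the DeepSeek API pricing page.
FLASH = "deepseek-v4-flash"
PRO = "deepseek-v4-pro"


@dataclass
class Task:
    # All four flags are hypothetical labels that your own classifier would set.
    complex: bool = False
    user_visible: bool = False
    needs_fresh_data: bool = False
    needs_deep_reasoning: bool = False


def route(task):
    """Return a routing decision that follows the four steps above."""
    # Steps 1 and 2: default to Flash, escalate to Pro when stakes rise.
    model = PRO if (task.complex or task.user_visible) else FLASH
    return {
        "model": model,
        # Step 3: web search only when freshness matters.
        "web_search": task.needs_fresh_data,
        # Step 4: Thinking only when deeper reasoning pays off.
        "thinking": task.needs_deep_reasoning,
    }
```

&lt;p&gt;For example, &lt;code&gt;route(Task())&lt;/code&gt; stays on Flash with search and Thinking off, while &lt;code&gt;route(Task(complex=True))&lt;/code&gt; escalates to Pro. Keeping the decision in one function makes the rule easy to log and to revise as your evals change.&lt;/p&gt;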




&lt;p&gt;Source article: &lt;a href="https://deepseekv4.space/blog/deepseek-v4-vs-other-models?utm_source=devto&amp;amp;utm_medium=syndication&amp;amp;utm_campaign=blog-en" rel="noopener noreferrer"&gt;Read the original post&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Homepage: &lt;a href="https://deepseekv4.space/" rel="noopener noreferrer"&gt;Visit the site&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Model pages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://deepseekv4.space/deepseek-v4-pro" rel="noopener noreferrer"&gt;DeepSeek V4 Pro&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://deepseekv4.space/deepseek-v4-flash" rel="noopener noreferrer"&gt;DeepSeek V4 Flash&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>DeepSeek V4 Technical Report: Architecture, Training, and Benchmarks</title>
      <dc:creator>Super Jarvis</dc:creator>
      <pubDate>Tue, 28 Apr 2026 17:29:08 +0000</pubDate>
      <link>https://dev.to/super_jarvis_76aa3fc6035d/deepseek-v4-technical-report-architecture-training-and-benchmarks-k03</link>
      <guid>https://dev.to/super_jarvis_76aa3fc6035d/deepseek-v4-technical-report-architecture-training-and-benchmarks-k03</guid>
      <description>&lt;p&gt;The DeepSeek V4 technical report describes a preview V4 family with two Mixture-of-Experts language models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek V4 Pro&lt;/strong&gt;: 1.6T total parameters, 49B activated parameters, 1M context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DeepSeek V4 Flash&lt;/strong&gt;: 284B total parameters, 13B activated parameters, 1M context.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Primary sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro" rel="noopener noreferrer"&gt;DeepSeek-V4-Pro model card&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf" rel="noopener noreferrer"&gt;DeepSeek_V4.pdf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://api-docs.deepseek.com/quick_start/pricing/" rel="noopener noreferrer"&gt;DeepSeek API pricing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What the technical report focuses on
&lt;/h2&gt;

&lt;p&gt;The report frames DeepSeek V4 around efficient long-context intelligence. The headline product implication is simple: both V4 Pro and V4 Flash expose a 1M-token context window, but they target different cost and capability envelopes.&lt;/p&gt;

&lt;p&gt;Pro is the higher-capacity model for hard reasoning, coding, and agentic workflows. Flash is the lower-cost model for high-volume chat, summarization, routing, and everyday product paths.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture notes
&lt;/h2&gt;

&lt;p&gt;The report highlights several architecture and optimization upgrades:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hybrid attention for long-context efficiency.&lt;/li&gt;
&lt;li&gt;Manifold-Constrained Hyper-Connections for stronger signal propagation.&lt;/li&gt;
&lt;li&gt;Muon optimizer for training stability and convergence.&lt;/li&gt;
&lt;li&gt;MoE scaling with separate Pro and Flash model sizes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60nybr3t9rb84icrqxq5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F60nybr3t9rb84icrqxq5.jpg" alt="DeepSeek V4 report layers and evidence map" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Use the architecture section to decide what to measure, not as a substitute for measuring your own prompts.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For builders, the practical question is not just which model has the larger parameter count. The question is where longer context, cache behavior, and reasoning effort change the cost-quality curve.&lt;/p&gt;

&lt;h2&gt;
  
  
  Training and post-training
&lt;/h2&gt;

&lt;p&gt;DeepSeek says the V4 models are pre-trained on more than 32T tokens and then post-trained with a multi-stage process. The release materials describe domain-specific expert cultivation followed by model consolidation.&lt;/p&gt;

&lt;p&gt;That matters for product evaluation because one benchmark score is not enough. You should test domain tasks directly: code repair, long document synthesis, tool-use workflows, structured extraction, and high-volume support chat.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reasoning modes
&lt;/h2&gt;

&lt;p&gt;The technical report and model card describe non-thinking, thinking, and max-thinking modes. In practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use non-thinking mode for low-risk, fast, low-cost responses.&lt;/li&gt;
&lt;li&gt;Use thinking mode for math, coding, planning, and multi-step reasoning.&lt;/li&gt;
&lt;li&gt;Use max-thinking mode only when the added latency and cost are justified.&lt;/li&gt;
&lt;/ul&gt;
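
&lt;p&gt;One way to make that guidance explicit is a small helper that maps a task kind to a mode. This is a sketch under stated assumptions: the task categories and the &lt;code&gt;max_justified&lt;/code&gt; flag are hypothetical, and the code only picks a mode name. It does not show how a mode is requested from the API.&lt;/p&gt;

```python
# Hypothetical task categories, taken from the examples in the list above.
REASONING_TASKS = {"math", "coding", "planning", "multi-step"}


def pick_mode(task_kind, max_justified=False):
    """Pick a reasoning mode name for a task kind.

    max_justified is a hypothetical flag meaning the caller accepts the
    extra latency and cost of the heaviest mode.
    """
    # Low-risk work stays on the fast, cheap path.
    if task_kind not in REASONING_TASKS:
        return "non-thinking"
    # Reasoning-heavy work thinks by default and only goes to max on request.
    return "max-thinking" if max_justified else "thinking"
```

&lt;p&gt;Note that &lt;code&gt;max_justified&lt;/code&gt; is ignored for low-risk task kinds, so a summary request never pays for the heaviest mode by accident.&lt;/p&gt;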

&lt;p&gt;The current DeepSeek API pricing page lists &lt;code&gt;deepseek-v4-flash&lt;/code&gt; and &lt;code&gt;deepseek-v4-pro&lt;/code&gt; as the V4 model IDs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark signals
&lt;/h2&gt;

&lt;p&gt;The release materials include benchmark snapshots across knowledge, coding, long-context, and agentic tasks. The site tracks a few practical anchor scores:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;MMLU-Pro&lt;/th&gt;
&lt;th&gt;LiveCodeBench&lt;/th&gt;
&lt;th&gt;SWE-bench Verified&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4 Flash Max&lt;/td&gt;
&lt;td&gt;86.2&lt;/td&gt;
&lt;td&gt;91.6&lt;/td&gt;
&lt;td&gt;79.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V4 Pro Max&lt;/td&gt;
&lt;td&gt;87.5&lt;/td&gt;
&lt;td&gt;93.5&lt;/td&gt;
&lt;td&gt;80.6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Treat these as routing hints, not final product truth. If your application depends on code changes, retrieval quality, or tool calls, build an eval set from your own traffic and compare Flash against Pro with the same prompts.&lt;/p&gt;
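
&lt;p&gt;A same-prompt comparison can be as simple as the sketch below. It assumes you supply two hypothetical callables: &lt;code&gt;run_model&lt;/code&gt;, which wraps your own API client, and &lt;code&gt;grade&lt;/code&gt;, which decides whether an output is acceptable for your product. Neither is part of any DeepSeek SDK.&lt;/p&gt;

```python
def compare_models(prompts, run_model, grade,
                   models=("deepseek-v4-flash", "deepseek-v4-pro")):
    """Return the pass rate per model over the same prompts.

    run_model(model, prompt) and grade(prompt, output) are hypothetical
    callables you provide: the first calls your API client, the second
    returns True when the output is acceptable.
    """
    if not prompts:
        raise ValueError("prompts must not be empty")
    rates = {}
    for model in models:
        # Every model sees exactly the same prompts, in the same order.
        passed = sum(1 for p in prompts if grade(p, run_model(model, p)))
        rates[model] = passed / len(prompts)
    return rates
```

&lt;p&gt;The result is a dictionary of pass rates keyed by model ID, which is enough to see whether Pro earns its price on a given workflow.&lt;/p&gt;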

&lt;h2&gt;
  
  
  Implementation checklist
&lt;/h2&gt;

&lt;p&gt;Before adopting DeepSeek V4 in production, verify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which workflows need Pro instead of Flash.&lt;/li&gt;
&lt;li&gt;Whether Thinking improves your specific task enough to justify the cost.&lt;/li&gt;
&lt;li&gt;How much prompt caching reduces repeated-context cost.&lt;/li&gt;
&lt;li&gt;Whether your longest real documents fit cleanly inside the 1M context window.&lt;/li&gt;
&lt;li&gt;Whether tool-use and JSON outputs are stable enough for your product contracts.&lt;/li&gt;
&lt;/ul&gt;
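
&lt;p&gt;For the context-window item, a rough pre-flight check might look like the following. It is only an estimate under loud assumptions: it treats 1M as 1,000,000 tokens, uses a guessed ratio of about 4 characters per token, and reserves a hypothetical output budget. Use the real tokenizer for anything that matters.&lt;/p&gt;

```python
# Assumption: "1M context" is treated as 1,000,000 tokens.
CONTEXT_WINDOW_TOKENS = 1_000_000

# Assumption: roughly 4 characters per token. Real ratios depend on the
# tokenizer and the language of the text.
CHARS_PER_TOKEN = 4


def fits_in_context(document, reserved_output_tokens=8_000):
    """Estimate whether a document leaves room for the model's reply.

    reserved_output_tokens is a hypothetical budget for the answer.
    """
    estimated_tokens = len(document) // CHARS_PER_TOKEN + 1
    budget = CONTEXT_WINDOW_TOKENS - reserved_output_tokens
    return budget >= estimated_tokens
```

&lt;p&gt;Running this over your longest real documents gives a quick list of candidates that need chunking or retrieval before you spend any tokens.&lt;/p&gt;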

&lt;p&gt;The technical report explains the direction. Your own evals should decide routing, retry behavior, and credit pricing.&lt;/p&gt;




&lt;p&gt;Source article: &lt;a href="https://deepseekv4.space/blog/deepseek-v4-technical-report?utm_source=devto&amp;amp;utm_medium=syndication&amp;amp;utm_campaign=blog-en" rel="noopener noreferrer"&gt;Read the original post&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Homepage: &lt;a href="https://deepseekv4.space/" rel="noopener noreferrer"&gt;Visit the site&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Model pages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://deepseekv4.space/deepseek-v4-pro" rel="noopener noreferrer"&gt;DeepSeek V4 Pro&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://deepseekv4.space/deepseek-v4-flash" rel="noopener noreferrer"&gt;DeepSeek V4 Flash&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
  </channel>
</rss>
