<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Preecha</title>
    <description>The latest articles on DEV Community by Preecha (@preecha).</description>
    <link>https://dev.to/preecha</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3891818%2Ffc0ea1ab-a477-4892-93a0-711e6f361ce2.png</url>
      <title>DEV Community: Preecha</title>
      <link>https://dev.to/preecha</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/preecha"/>
    <language>en</language>
    <item>
      <title>How to use the GLM-5.1 API: complete guide with code examples</title>
      <dc:creator>Preecha</dc:creator>
      <pubDate>Fri, 05 Jun 2026 13:02:32 +0000</pubDate>
      <link>https://dev.to/preecha/how-to-use-the-glm-51-api-complete-guide-with-code-examples-4dil</link>
      <guid>https://dev.to/preecha/how-to-use-the-glm-51-api-complete-guide-with-code-examples-4dil</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;GLM-5.1 is available through the BigModel API at &lt;a href="https://open.bigmodel.cn/api/paas/v4/" rel="noopener noreferrer"&gt;https://open.bigmodel.cn/api/paas/v4/&lt;/a&gt;. The API is OpenAI-compatible: same endpoint structure, same request format, same streaming pattern. You need a BigModel account, an API key, and the model name &lt;code&gt;glm-5.1&lt;/code&gt;. This guide shows how to authenticate, send your first request, stream responses, handle tool calls, and test your integration with Apidog.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frwj1xywo0a4e75uuhqi6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frwj1xywo0a4e75uuhqi6.png" alt="Image" width="800" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;GLM-5.1 is Z.AI's flagship agentic model, released April 2026. It ranks #1 on SWE-Bench Pro and leads GLM-5 on every major coding benchmark. If you're building an AI coding assistant, autonomous agent, or application that needs long-horizon task execution, you can integrate GLM-5.1 through the BigModel API.&lt;/p&gt;

&lt;p&gt;The developer-friendly part: the API is OpenAI-compatible. If your app already uses GPT-style chat completions, you usually only need to change:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The base URL&lt;/li&gt;
&lt;li&gt;The model name&lt;/li&gt;
&lt;li&gt;The API key&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Testing is the harder part for agentic workflows. A model-driven loop can run many tool calls over several minutes, and repeatedly testing against the live API consumes quota. Apidog's Smart Mock and Test Scenarios help you simulate normal completions, streaming responses, tool calls, and error states before production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before you start, prepare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A BigModel account at &lt;code&gt;bigmodel.cn&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;An API key from the BigModel console under API Keys&lt;/li&gt;
&lt;li&gt;Python 3.8+ or Node.js 18+&lt;/li&gt;
&lt;li&gt;The OpenAI SDK, &lt;code&gt;requests&lt;/code&gt;, or &lt;code&gt;fetch&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Set your API key as an environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BIGMODEL_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_api_key_here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do not hardcode API keys in source code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Authentication
&lt;/h2&gt;

&lt;p&gt;Every request requires a Bearer token:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Authorization: Bearer YOUR_API_KEY
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;BigModel API keys use a two-part format similar to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;xxxxxxxx.xxxxxxxxxxxxxxxx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This differs from OpenAI's &lt;code&gt;sk-&lt;/code&gt; prefix, but you use it the same way in the &lt;code&gt;Authorization&lt;/code&gt; header.&lt;/p&gt;
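&lt;p&gt;As a rough sketch of the header described above, the helper below reads the key from the environment and builds the request headers. The function name &lt;code&gt;build_auth_headers&lt;/code&gt; is hypothetical and not part of any SDK; it assumes the &lt;code&gt;BIGMODEL_API_KEY&lt;/code&gt; variable from the prerequisites.&lt;/p&gt;

```python
import os


def build_auth_headers(env_var="BIGMODEL_API_KEY"):
    """Return request headers carrying the API key as a Bearer token."""
    # Hypothetical helper: read the key from the environment, never from source.
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

&lt;p&gt;Failing early on a missing key tends to give a clearer error than an authentication failure returned by the server.&lt;/p&gt;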

&lt;h2&gt;
  
  
  Base URL and endpoint
&lt;/h2&gt;

&lt;p&gt;Use this base URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://open.bigmodel.cn/api/paas/v4/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The chat completions endpoint is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://open.bigmodel.cn/api/paas/v4/chat/completions
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
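&lt;p&gt;If you build URLs by hand, a small join helper avoids doubled or missing slashes between the base URL and the path. This is only a sketch; &lt;code&gt;endpoint_url&lt;/code&gt; is an illustrative name, not something the API or SDK provides.&lt;/p&gt;

```python
BASE_URL = "https://open.bigmodel.cn/api/paas/v4/"


def endpoint_url(path, base_url=BASE_URL):
    """Join the base URL and a relative path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


print(endpoint_url("chat/completions"))
# prints https://open.bigmodel.cn/api/paas/v4/chat/completions
```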



&lt;h2&gt;
  
  
  Make your first request
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: curl
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://open.bigmodel.cn/api/paas/v4/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$BIGMODEL_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "glm-5.1",
    "messages": [
      {
        "role": "user",
        "content": "Write a Python function that finds all prime numbers up to n using the Sieve of Eratosthenes."
      }
    ],
    "max_tokens": 1024,
    "temperature": 0.7
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 2: Python with requests
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BIGMODEL_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://open.bigmodel.cn/api/paas/v4/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;glm-5.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a Python function that finds all prime numbers up to n using the Sieve of Eratosthenes.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Option 3: OpenAI SDK
&lt;/h3&gt;

&lt;p&gt;Because the BigModel API is OpenAI-compatible, you can use the OpenAI SDK with a custom base URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BIGMODEL_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://open.bigmodel.cn/api/paas/v4/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;glm-5.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a Python function that finds all prime numbers up to n using the Sieve of Eratosthenes.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is usually the simplest integration path if your app already uses OpenAI-compatible clients.&lt;/p&gt;

&lt;h2&gt;
  
  
  Response format
&lt;/h2&gt;

&lt;p&gt;The response structure follows the OpenAI chat completions format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chatcmpl-abc123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat.completion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"created"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1744000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"glm-5.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"def sieve_of_eratosthenes(n):&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;    ..."&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"finish_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stop"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"completion_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;215&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;247&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read the assistant output from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Track the &lt;code&gt;usage&lt;/code&gt; field to monitor quota consumption. During peak hours (14:00-18:00 UTC+8), GLM-5.1 requests consume quota at 3x the normal rate.&lt;/p&gt;
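&lt;p&gt;To make the peak-hour arithmetic concrete, here is a minimal sketch that estimates quota for one response. It assumes the multiplier applies to &lt;code&gt;total_tokens&lt;/code&gt; and that the window runs from 14:00 inclusive to 18:00 exclusive; both are assumptions, so check your BigModel billing page for the exact rules. The function names are illustrative.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

# Figures from the article: 3x quota between 14:00 and 18:00 UTC+8.
PEAK_TZ = timezone(timedelta(hours=8))
PEAK_HOURS = range(14, 18)  # assumption: 14:00 inclusive, 18:00 exclusive
PEAK_MULTIPLIER = 3


def quota_multiplier(moment):
    """Return the multiplier that applies at a timezone-aware datetime."""
    local_hour = moment.astimezone(PEAK_TZ).hour
    return PEAK_MULTIPLIER if local_hour in PEAK_HOURS else 1


def quota_units(usage, moment):
    """Estimate quota for one response (assumes billing on total_tokens)."""
    return usage["total_tokens"] * quota_multiplier(moment)


usage = {"prompt_tokens": 32, "completion_tokens": 215, "total_tokens": 247}
peak = datetime(2026, 6, 5, 7, 30, tzinfo=timezone.utc)      # 15:30 in UTC+8
off_peak = datetime(2026, 6, 5, 13, 0, tzinfo=timezone.utc)  # 21:00 in UTC+8
print(quota_units(usage, peak), quota_units(usage, off_peak))
# prints 741 247
```

&lt;p&gt;Under these assumptions, the same 247-token response costs three times as much quota at 15:30 as it does at 21:00 (UTC+8), so batch jobs may be worth scheduling off-peak.&lt;/p&gt;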

&lt;h2&gt;
  
  
  Stream responses
&lt;/h2&gt;

&lt;p&gt;For long code generation or analysis tasks, enable streaming so your app can display tokens as they arrive.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BIGMODEL_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://open.bigmodel.cn/api/paas/v4/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;glm-5.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain how a B-tree index works in a database, with a code example.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each chunk contains only the newly generated delta. The final chunk includes a &lt;code&gt;finish_reason&lt;/code&gt;, such as &lt;code&gt;stop&lt;/code&gt; or &lt;code&gt;length&lt;/code&gt;.&lt;/p&gt;
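&lt;p&gt;Because each chunk carries only a delta, your app has to join the pieces itself if it needs the full text. The sketch below does that for chunks already parsed into dictionaries; the sample chunks are hand-written in the OpenAI-style shape, not captured API output, and &lt;code&gt;collect_stream&lt;/code&gt; is an illustrative name. It assumes a single choice per request.&lt;/p&gt;

```python
def collect_stream(chunks):
    """Join streamed deltas into the full text and capture the finish reason."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            # A delta may be missing, empty, or carry content set to None.
            text = (choice.get("delta") or {}).get("content")
            if text:
                parts.append(text)
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason


# Hand-written sample chunks for illustration only.
sample = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]},
    {"choices": [{"index": 0, "delta": {"content": "def merge_sort"}}]},
    {"choices": [{"index": 0, "delta": {"content": "(items):"}}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
]
print(collect_stream(sample))
# prints ('def merge_sort(items):', 'stop')
```

&lt;p&gt;A &lt;code&gt;finish_reason&lt;/code&gt; of &lt;code&gt;length&lt;/code&gt; usually means the output hit &lt;code&gt;max_tokens&lt;/code&gt;, so the collected text may be cut off mid-statement.&lt;/p&gt;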

&lt;h3&gt;
  
  
  Streaming with raw requests
&lt;/h3&gt;

&lt;p&gt;If you do not want to use the OpenAI SDK, handle the server-sent event stream directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BIGMODEL_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://open.bigmodel.cn/api/paas/v4/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;glm-5.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a merge sort in Python.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iter_lines&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;

    &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;

    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[DONE]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;

    &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

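    &lt;span class="c1"&gt;# Skip chunks whose content is missing or null, such as tool-call deltas.&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;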
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
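&lt;p&gt;The parsing steps in the loop above can be exercised offline before you spend tokens on a live request. The sketch below is a minimal version that assumes OpenAI-style server-sent event lines; &lt;code&gt;collect_text&lt;/code&gt; is a hypothetical helper name, and the sample events are hand-written rather than captured API output.&lt;/p&gt;

```python
import json


def collect_text(lines):
    """Reassemble streamed text from raw server-sent event lines (bytes)."""
    parts = []
    for raw in lines:
        if not raw:
            continue  # blank separator line between events
        line = raw.decode("utf-8")
        if not line.startswith("data: "):
            continue  # ignore anything that is not a data event
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks may carry no choices at all
        content = choices[0].get("delta", {}).get("content")
        if content:
            parts.append(content)
    return "".join(parts)


# Hand-written sample events (an assumption about the wire format, not real output).
sample = [
    b'data: {"choices": [{"delta": {"role": "assistant", "content": ""}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    b'data: {"choices": [{"delta": {"content": ", world"}}]}',
    b'data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}',
    b"data: [DONE]",
]

print(collect_text(sample))
```

&lt;p&gt;Passing &lt;code&gt;response.iter_lines()&lt;/code&gt; instead of &lt;code&gt;sample&lt;/code&gt; should return the full reply as one string, which is convenient when you want the streamed text for logging or assertions.&lt;/p&gt;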



&lt;h2&gt;
  
  
  Use tool calling
&lt;/h2&gt;

&lt;p&gt;GLM-5.1 supports tool calling. The model can request function execution mid-conversation, which is useful for agents that need to run code, query systems, read files, call APIs, or perform actions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Define tools
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BIGMODEL_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://open.bigmodel.cn/api/paas/v4/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Execute Python code and return the output. Use this to test, profile, or benchmark code.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The Python code to execute&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Read the contents of a file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;File path to read&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;glm-5.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a function to compute Fibonacci numbers, test it for n=10, and show me the output.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tool_choice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Finish reason: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;finish_reason&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Tool called: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Arguments: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Execute tool calls and return results
&lt;/h3&gt;

&lt;p&gt;When the model returns &lt;code&gt;finish_reason: "tool_calls"&lt;/code&gt;, execute each requested tool and append its output as a &lt;code&gt;tool&lt;/code&gt; message that carries the &lt;code&gt;tool_call_id&lt;/code&gt; of the call it answers.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Execute the tool and return the result.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Caution: this runs model-generated code on the local machine; sandbox it outside of experiments.&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
                &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TimeoutExpired&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Report the timeout to the model instead of crashing the loop.&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: execution timed out after 10 seconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdout&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;read_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;FileNotFoundError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error: file &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unknown tool: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
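&lt;p&gt;Model-generated arguments are not guaranteed to be valid JSON or to match your schema, so it can help to return errors to the model as text rather than raise. The sketch below is one possible variant, not part of any SDK: &lt;code&gt;TOOL_HANDLERS&lt;/code&gt;, &lt;code&gt;safe_execute_tool&lt;/code&gt;, and &lt;code&gt;fake_tool_call&lt;/code&gt; are hypothetical names, and the stand-in object only mimics the &lt;code&gt;function.name&lt;/code&gt; and &lt;code&gt;function.arguments&lt;/code&gt; attributes read above so the dispatcher can be tried without an API call.&lt;/p&gt;

```python
import json
import subprocess
import sys
from types import SimpleNamespace


def run_python(code: str, timeout: int = 10) -> str:
    """Run code in a child interpreter; return stdout, or stderr if stdout is empty."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"Error: execution exceeded {timeout} seconds"
    return result.stdout or result.stderr


# Hypothetical registry: maps tool names to local handler functions.
TOOL_HANDLERS = {"run_python": run_python}


def safe_execute_tool(tool_call) -> str:
    """Dispatch a tool call, returning an error string instead of raising."""
    try:
        args = json.loads(tool_call.function.arguments)
    except json.JSONDecodeError as exc:
        return f"Error: arguments are not valid JSON ({exc})"

    handler = TOOL_HANDLERS.get(tool_call.function.name)
    if handler is None:
        return f"Unknown tool: {tool_call.function.name}"

    try:
        return handler(**args)
    except TypeError as exc:
        # Raised when a required argument is missing or an unexpected one is sent.
        return f"Error: bad arguments for {tool_call.function.name} ({exc})"


def fake_tool_call(name: str, arguments: str):
    """Stand-in exposing the attributes the dispatcher reads from a tool call."""
    return SimpleNamespace(
        id="call_demo",
        function=SimpleNamespace(name=name, arguments=arguments),
    )


print(safe_execute_tool(fake_tool_call("run_python", json.dumps({"code": "print(2 + 3)"}))))
print(safe_execute_tool(fake_tool_call("run_python", "{not json")))
print(safe_execute_tool(fake_tool_call("delete_everything", "{}")))
```

&lt;p&gt;Because every failure comes back as a string, the model sees the error in the next turn and can retry with corrected arguments.&lt;/p&gt;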



&lt;p&gt;Then run the agent loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent_loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Run a full agent loop with tool calling.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_message&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;glm-5.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tool_choice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;choice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_dump&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

        &lt;span class="c1"&gt;# Any finish reason other than "tool_calls" (such as "stop" or "length") ends the loop.&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;finish_reason&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;choice&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;finish_reason&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_calls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;tool_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Max iterations reached&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_agent_loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a quicksort implementation, test it with a random list of 1000 integers, and report the time.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the standard loop:&lt;/p&gt;

&lt;ol&gt;
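&lt;li&gt;Send the user request together with the tool definitions.&lt;/li&gt;
&lt;li&gt;Let the model decide whether to call tools.&lt;/li&gt;
&lt;li&gt;Execute each requested tool locally.&lt;/li&gt;
&lt;li&gt;Append each result as a &lt;code&gt;tool&lt;/code&gt; message and send the conversation back.&lt;/li&gt;
&lt;li&gt;Repeat until the model finishes without requesting a tool or the iteration cap is reached.&lt;/li&gt;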
&lt;/ol&gt;
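&lt;p&gt;To see the message flow of these steps without calling the API, the loop can be driven by a scripted stand-in for the model. This is only an illustration: &lt;code&gt;scripted_model&lt;/code&gt;, &lt;code&gt;add&lt;/code&gt;, and &lt;code&gt;run_loop&lt;/code&gt; are hypothetical names, and the message dictionaries assume the OpenAI-style shapes used earlier in this guide.&lt;/p&gt;

```python
import json


def scripted_model(messages):
    """Stand-in for the chat completions call: requests one tool, then finishes."""
    last = messages[-1]
    if last["role"] == "tool":
        return {
            "finish_reason": "stop",
            "message": {"role": "assistant", "content": f"The result is {last['content']}."},
        }
    return {
        "finish_reason": "tool_calls",
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [
                {
                    "id": "call_1",
                    "type": "function",
                    "function": {"name": "add", "arguments": json.dumps({"a": 2, "b": 3})},
                }
            ],
        },
    }


def add(a, b):
    """Toy tool; results are returned as strings, like the tool outputs above."""
    return str(a + b)


def run_loop(user_message, max_iterations=5):
    messages = [{"role": "user", "content": user_message}]  # step 1
    for _ in range(max_iterations):
        reply = scripted_model(messages)  # step 2
        messages.append(reply["message"])
        if reply["finish_reason"] != "tool_calls":
            return messages  # step 5: the model finished
        for call in reply["message"]["tool_calls"]:
            args = json.loads(call["function"]["arguments"])
            output = add(**args)  # step 3
            # step 4: the result is tied to the request through tool_call_id
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": output})
    return messages


history = run_loop("What is 2 + 3?")
for m in history:
    print(m["role"], "->", m.get("content"))
```

&lt;p&gt;The resulting history has four entries in the order user, assistant, tool, assistant, which is the same shape the real loop sends back to the model on each round.&lt;/p&gt;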

&lt;h2&gt;
  
  
  Key parameters
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Default&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;model&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;required&lt;/td&gt;
&lt;td&gt;Use &lt;code&gt;"glm-5.1"&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;messages&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;array&lt;/td&gt;
&lt;td&gt;required&lt;/td&gt;
&lt;td&gt;Conversation history&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;max_tokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;integer&lt;/td&gt;
&lt;td&gt;&lt;code&gt;1024&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Max tokens to generate, up to &lt;code&gt;163840&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;temperature&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;float&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0.95&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Randomness. Lower values are more deterministic. Range: &lt;code&gt;0.0&lt;/code&gt;-&lt;code&gt;1.0&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;top_p&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;float&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0.7&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Nucleus sampling threshold. The benchmark settings below use &lt;code&gt;0.95&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;stream&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;boolean&lt;/td&gt;
&lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Enable streaming responses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tools&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;array&lt;/td&gt;
&lt;td&gt;&lt;code&gt;null&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Function definitions for tool calling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tool_choice&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string/object&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"auto"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;"auto"&lt;/code&gt;, &lt;code&gt;"none"&lt;/code&gt;, or a specific tool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;stop&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string/array&lt;/td&gt;
&lt;td&gt;&lt;code&gt;null&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Custom stop sequences&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Recommended settings for coding tasks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"glm-5.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"temperature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"top_p"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"max_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;163840&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Z.AI uses these settings in its benchmark evaluations. For more deterministic code generation, lower &lt;code&gt;temperature&lt;/code&gt; to &lt;code&gt;0.2&lt;/code&gt;-&lt;code&gt;0.4&lt;/code&gt;.&lt;/p&gt;
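
&lt;p&gt;As a minimal sketch of how these two profiles might be applied in code, the helper below builds a request body with either the benchmark-style settings or a lower-temperature variant. The function name &lt;code&gt;build_payload&lt;/code&gt;, the &lt;code&gt;deterministic&lt;/code&gt; flag, and the choice of &lt;code&gt;0.3&lt;/code&gt; inside the 0.2-0.4 range are illustrative assumptions, not part of the BigModel API.&lt;/p&gt;

```python
MAX_OUTPUT_TOKENS = 163840  # upper bound for max_tokens from the parameter table


def build_payload(messages, deterministic=False, max_tokens=MAX_OUTPUT_TOKENS):
    """Build a chat completions body (hypothetical helper for illustration)."""
    # Benchmark-style sampling by default; 0.3 is an assumed midpoint of 0.2-0.4.
    temperature = 0.3 if deterministic else 1.0
    return {
        "model": "glm-5.1",
        "messages": list(messages),
        "temperature": temperature,
        "top_p": 0.95,
        # Clamp to the documented ceiling so oversized values are never sent.
        "max_tokens": min(max_tokens, MAX_OUTPUT_TOKENS),
    }


payload = build_payload(
    [{"role": "user", "content": "Write a prime sieve."}],
    deterministic=True,
)
print(payload["temperature"], payload["max_tokens"])
```

&lt;p&gt;Keeping the sampling values in one place makes it easier to switch between exploratory and repeatable runs without editing every call site.&lt;/p&gt;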

&lt;h2&gt;
  
  
  Use GLM-5.1 with coding assistants
&lt;/h2&gt;

&lt;p&gt;The Z.AI Coding Plan lets you route Claude Code, Cline, Kilo Code, and other AI coding assistants through GLM-5.1 via the BigModel API. This is useful if you want to test GLM-5.1 in an existing coding workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code setup
&lt;/h3&gt;

&lt;p&gt;In your Claude Code configuration file, such as &lt;code&gt;~/.claude/settings.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"glm-5.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"baseURL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://open.bigmodel.cn/api/paas/v4/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your_bigmodel_api_key"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cline / Roo Code setup
&lt;/h3&gt;

&lt;p&gt;In your VS Code settings or Cline extension config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cline.apiProvider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cline.openAIBaseURL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://open.bigmodel.cn/api/paas/v4/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cline.openAIApiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your_bigmodel_api_key"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cline.openAIModelId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"glm-5.1"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Quota consumption
&lt;/h2&gt;

&lt;p&gt;GLM-5.1 uses the Z.AI quota system rather than per-token billing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Peak hours, 14:00-18:00 UTC+8: 3x quota per request&lt;/li&gt;
&lt;li&gt;Off-peak: 2x quota per request&lt;/li&gt;
&lt;li&gt;Promotional rate through April 2026: 1x during off-peak&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For heavy agentic workloads, schedule long-running jobs during off-peak hours when possible.&lt;/p&gt;
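
&lt;p&gt;To make the scheduling advice concrete, here is a small sketch that maps a timestamp to the quota multiplier listed above. It assumes the peak window includes 14:00 and excludes 18:00 (the boundary is not stated in the list), and it ignores the promotional rate. The name &lt;code&gt;quota_multiplier&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # UTC+8, the zone the peak window is given in
PEAK_HOURS = range(14, 18)  # 14:00 up to, but not including, 18:00 (assumed boundary)


def quota_multiplier(moment):
    """Return the quota multiplier for a timezone-aware datetime."""
    local_hour = moment.astimezone(BEIJING).hour
    return 3 if local_hour in PEAK_HOURS else 2


# 07:30 UTC is 15:30 in UTC+8, which falls inside the peak window.
print(quota_multiplier(datetime(2026, 6, 5, 7, 30, tzinfo=timezone.utc)))  # 3
# 12:00 UTC is 20:00 in UTC+8, which is off-peak.
print(quota_multiplier(datetime(2026, 6, 5, 12, 0, tzinfo=timezone.utc)))  # 2
```

&lt;p&gt;A job scheduler could call a check like this before starting a long run and defer the job while the multiplier is 3.&lt;/p&gt;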

&lt;h2&gt;
  
  
  Test the GLM-5.1 API with Apidog
&lt;/h2&gt;

&lt;p&gt;Agentic integrations need to handle several response types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standard completions&lt;/li&gt;
&lt;li&gt;Streaming chunks&lt;/li&gt;
&lt;li&gt;Tool call requests&lt;/li&gt;
&lt;li&gt;Tool result messages&lt;/li&gt;
&lt;li&gt;Rate limits&lt;/li&gt;
&lt;li&gt;Server errors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Testing every case against the real API consumes quota and depends on a live network connection.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxt99yzuso3v67x35fnz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzxt99yzuso3v67x35fnz.png" alt="Image" width="799" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Apidog's Smart Mock lets you define these response states and test your client without calling the real API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create a mock endpoint
&lt;/h3&gt;

&lt;p&gt;In Apidog, create this endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://open.bigmodel.cn/api/paas/v4/chat/completions
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add a mock expectation for a standard success response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chatcmpl-test123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat.completion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"created"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1744000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"glm-5.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"def sieve(n): ..."&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"finish_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stop"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"completion_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;152&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add another expectation for a tool call response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chatcmpl-tool456"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat.completion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"created"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1744000001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"glm-5.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"tool_calls"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"call_abc"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"function"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"function"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"run_python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
              &lt;/span&gt;&lt;span class="nl"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;code&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;print(2+2)&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;}"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"finish_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tool_calls"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;48&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"completion_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;35&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;83&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add a rate limit response with HTTP &lt;code&gt;429&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Rate limit exceeded. Please retry after 60 seconds."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rate_limit_error"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rate_limit_exceeded"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
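
&lt;p&gt;On the client side, the three mock states above can be told apart with a small dispatcher. The sketch below is one possible shape, not a prescribed one: &lt;code&gt;classify_response&lt;/code&gt; and its return labels are made-up names, and the sample bodies are trimmed copies of the mocks.&lt;/p&gt;

```python
def classify_response(status_code, body):
    """Map an HTTP status and parsed JSON body to a coarse response kind."""
    if status_code == 429:
        return "rate_limited"
    if status_code != 200:
        return "error"
    choice = body["choices"][0]
    if choice["finish_reason"] == "tool_calls":
        return "tool_call"
    return "completion"


# Trimmed versions of the mock bodies defined in Apidog.
success = {"choices": [{"finish_reason": "stop",
                        "message": {"role": "assistant", "content": "def sieve(n): ..."}}]}
tool = {"choices": [{"finish_reason": "tool_calls",
                     "message": {"role": "assistant", "content": None, "tool_calls": []}}]}
limited = {"error": {"type": "rate_limit_error", "code": "rate_limit_exceeded"}}

print(classify_response(200, success))  # completion
print(classify_response(200, tool))     # tool_call
print(classify_response(429, limited))  # rate_limited
```

&lt;p&gt;Pointing a function like this at each mock expectation in turn is a quick way to confirm that every branch of your client is reachable.&lt;/p&gt;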



&lt;h3&gt;
  
  
  Test the full agent loop
&lt;/h3&gt;

&lt;p&gt;Use Apidog Test Scenarios to chain requests together.&lt;/p&gt;

&lt;p&gt;Example scenario:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Send the initial &lt;code&gt;POST /chat/completions&lt;/code&gt; request.&lt;/li&gt;
&lt;li&gt;Assert status is &lt;code&gt;200&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Assert &lt;code&gt;choices[0].finish_reason&lt;/code&gt; equals &lt;code&gt;"tool_calls"&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Extract &lt;code&gt;choices[0].message.tool_calls[0].id&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Send the next request with a &lt;code&gt;tool&lt;/code&gt; message containing the tool result.&lt;/li&gt;
&lt;li&gt;Assert status is &lt;code&gt;200&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Assert &lt;code&gt;choices[0].finish_reason&lt;/code&gt; equals &lt;code&gt;"stop"&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Assert the final content contains the expected code or output.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This lets you test the agent loop without spending quota. You can also switch the mock to return &lt;code&gt;429&lt;/code&gt; and verify your retry logic.&lt;/p&gt;

&lt;p&gt;For multi-step workflows, use variables to pass values such as &lt;code&gt;request_id&lt;/code&gt; or &lt;code&gt;tool_call_id&lt;/code&gt; between steps. This mirrors a real agent loop and catches integration issues before production.&lt;/p&gt;
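
&lt;p&gt;Steps 4 and 5 of the scenario can be sketched in client code as follows. This assumes the OpenAI-style convention of replying with a &lt;code&gt;tool&lt;/code&gt; message that carries the matching &lt;code&gt;tool_call_id&lt;/code&gt;. The &lt;code&gt;run_python&lt;/code&gt; stub, the &lt;code&gt;HANDLERS&lt;/code&gt; registry, and &lt;code&gt;tool_followup_messages&lt;/code&gt; are hypothetical names used only for illustration.&lt;/p&gt;

```python
import json


def run_python(code):
    """Stand-in tool for illustration; a real agent would sandbox execution."""
    return "stubbed output for: " + code


HANDLERS = {"run_python": run_python}  # hypothetical tool registry


def tool_followup_messages(history, response):
    """Return history plus the assistant turn and one tool message per call."""
    assistant = response["choices"][0]["message"]
    messages = list(history) + [assistant]
    for call in assistant["tool_calls"]:
        # Arguments arrive as a JSON-encoded string, as in the tool call mock.
        args = json.loads(call["function"]["arguments"])
        result = HANDLERS[call["function"]["name"]](**args)
        messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    return messages


mock_response = {"choices": [{"index": 0, "finish_reason": "tool_calls", "message": {
    "role": "assistant", "content": None, "tool_calls": [{
        "id": "call_abc", "type": "function",
        "function": {"name": "run_python", "arguments": "{\"code\": \"print(2+2)\"}"}}]}}]}

history = [{"role": "user", "content": "What is 2+2? Use the tool."}]
messages = tool_followup_messages(history, mock_response)
print(messages[-1]["tool_call_id"])  # call_abc
```

&lt;p&gt;The returned list becomes the &lt;code&gt;messages&lt;/code&gt; field of the next request, which is the request the scenario expects to finish with &lt;code&gt;"stop"&lt;/code&gt;.&lt;/p&gt;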

&lt;h2&gt;
  
  
  Error handling
&lt;/h2&gt;

&lt;p&gt;The API returns standard HTTP status codes.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;200&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Success&lt;/td&gt;
&lt;td&gt;Process response normally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;400&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Bad request&lt;/td&gt;
&lt;td&gt;Check your request format&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;401&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Unauthorized&lt;/td&gt;
&lt;td&gt;Verify your API key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;429&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Rate limit&lt;/td&gt;
&lt;td&gt;Wait for the interval given in the &lt;code&gt;Retry-After&lt;/code&gt; header, then retry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;500&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Server error&lt;/td&gt;
&lt;td&gt;Retry with exponential backoff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;503&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Service unavailable&lt;/td&gt;
&lt;td&gt;Retry with exponential backoff&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Use retries for rate limits, timeouts, and transient server errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_with_retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://open.bigmodel.cn/api/paas/v4/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;BIGMODEL_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;retry_after&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Retry-After&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
                &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rate limited. Waiting &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;retry_after&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;retry_after&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;

            &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exceptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;wait&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Timeout on attempt &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. Retrying in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Max retries exceeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
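
&lt;p&gt;The retry function above calls &lt;code&gt;int()&lt;/code&gt; on the &lt;code&gt;Retry-After&lt;/code&gt; header, which fails if a server sends an HTTP date instead of a number of seconds (the HTTP specification allows both forms). Whether BigModel ever sends the date form is not confirmed here, so treat the following as a defensive sketch. The helper name &lt;code&gt;retry_after_seconds&lt;/code&gt; and the 60-second fallback are assumptions.&lt;/p&gt;

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, default=60.0, now=None):
    """Convert a Retry-After header (seconds or HTTP date) to a wait in seconds."""
    if header_value is None:
        return default
    text = header_value.strip()
    if text.isdigit():
        return float(text)
    try:
        when = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        # Unparseable value: fall back to the assumed default wait.
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # Never return a negative wait if the date is already in the past.
    return max(0.0, (when - now).total_seconds())


fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_after_seconds("30"))  # 30.0
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now=fixed_now))  # 60.0
print(retry_after_seconds(None))  # 60.0
```

&lt;p&gt;The result can be passed straight to &lt;code&gt;time.sleep()&lt;/code&gt; in place of the &lt;code&gt;int()&lt;/code&gt; conversion.&lt;/p&gt;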



&lt;p&gt;For long agentic runs, use a generous timeout such as 120-300 seconds. Individual steps may take longer when the model generates complete code files or analyzes complex results.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;GLM-5.1's OpenAI-compatible API makes integration straightforward if you already use GPT-style chat completions. Update the base URL, use the &lt;code&gt;glm-5.1&lt;/code&gt; model name, and handle responses with the same chat completions structure.&lt;/p&gt;

&lt;p&gt;For agentic applications, focus on the loop: tool definitions, tool execution, streamed output, quota-aware retries, and mock-based testing. Apidog's Smart Mock and Test Scenarios help validate those paths before your agent runs against the live API.&lt;/p&gt;

&lt;p&gt;For background on what GLM-5.1 is and how its benchmarks compare, see the GLM-5.1 model overview. For more on building and testing AI agent workflows with Apidog, see how AI agent memory works.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is the GLM-5.1 API OpenAI-compatible?
&lt;/h3&gt;

&lt;p&gt;Yes. The request format, response structure, streaming protocol, and tool calling format are compatible with the OpenAI chat completions API. You can use the OpenAI Python SDK or another OpenAI-compatible client by setting the base URL to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://open.bigmodel.cn/api/paas/v4/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What model name should I use?
&lt;/h3&gt;

&lt;p&gt;Use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;glm-5.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use this short name exactly as shown rather than a longer, versioned identifier.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does GLM-5.1 API pricing work?
&lt;/h3&gt;

&lt;p&gt;The BigModel API uses a quota system. GLM-5.1 consumes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3x quota during peak hours, 14:00-18:00 UTC+8&lt;/li&gt;
&lt;li&gt;2x quota during off-peak hours&lt;/li&gt;
&lt;li&gt;1x quota during off-peak hours through the end of April 2026 as a promotional rate&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What is the maximum context length?
&lt;/h3&gt;

&lt;p&gt;GLM-5.1 supports a 200,000-token input context. Maximum output is 163,840 tokens. For long agentic runs, set &lt;code&gt;max_tokens&lt;/code&gt; to a large value such as &lt;code&gt;32768&lt;/code&gt; or higher to reduce the chance of truncating output mid-task.&lt;/p&gt;
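
&lt;p&gt;One way to notice truncation at runtime is to inspect &lt;code&gt;finish_reason&lt;/code&gt;. The sketch below assumes the API reports &lt;code&gt;"length"&lt;/code&gt; when output hits the token limit, as OpenAI-style chat completions do; this article only shows &lt;code&gt;"stop"&lt;/code&gt; and &lt;code&gt;"tool_calls"&lt;/code&gt;, so verify the value against a real response. Both helper names are hypothetical.&lt;/p&gt;

```python
MAX_OUTPUT_TOKENS = 163840  # maximum output size stated above


def was_truncated(response):
    """Return True if any choice stopped because it hit the token limit."""
    # "length" is assumed from the OpenAI-style convention.
    return any(choice.get("finish_reason") == "length" for choice in response["choices"])


def next_max_tokens(current):
    """Double the output budget for a retry, capped at the stated maximum."""
    return min(current * 2, MAX_OUTPUT_TOKENS)


response = {"choices": [{"index": 0, "finish_reason": "length",
                         "message": {"role": "assistant", "content": "def sieve(n):"}}]}

if was_truncated(response):
    print(next_max_tokens(32768))  # 65536
```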

&lt;h3&gt;
  
  
  Can I use GLM-5.1 for function calling or tool use?
&lt;/h3&gt;

&lt;p&gt;Yes. Define tools with &lt;code&gt;type: "function"&lt;/code&gt;, pass them in the &lt;code&gt;tools&lt;/code&gt; array, and handle responses where &lt;code&gt;finish_reason&lt;/code&gt; is &lt;code&gt;"tool_calls"&lt;/code&gt;.&lt;/p&gt;
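
&lt;p&gt;As a rough illustration, a request body with one tool might look like the following. The &lt;code&gt;run_python&lt;/code&gt; tool name matches the mock response used earlier in this guide; its description and parameter schema are invented for the example, and the schema layout follows the OpenAI-style function format that the API is described as compatible with.&lt;/p&gt;

```python
import json

# Hypothetical tool definition; only the "type": "function" wrapper comes from the text.
run_python_tool = {
    "type": "function",
    "function": {
        "name": "run_python",
        "description": "Execute a Python snippet and return its output.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Python source to run"},
            },
            "required": ["code"],
        },
    },
}

payload = {
    "model": "glm-5.1",
    "messages": [{"role": "user", "content": "Compute 2+2 with the tool."}],
    "tools": [run_python_tool],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

print(json.dumps(payload, indent=2))
```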

&lt;h3&gt;
  
  
  How do I test GLM-5.1 API calls without spending quota?
&lt;/h3&gt;

&lt;p&gt;Use Apidog Smart Mock to define mock responses for success cases, tool calls, rate limits, and errors. Run your client or test suite against the mock endpoint during development, then use the real API for final validation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where can I find the GLM-5.1 model weights?
&lt;/h3&gt;

&lt;p&gt;The open-source weights are on Hugging Face at &lt;code&gt;zai-org/GLM-5.1&lt;/code&gt;. They are released under the MIT License and can be served locally with vLLM or SGLang.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to use GLM-5.1 with Claude Code: full setup guide</title>
      <dc:creator>Preecha</dc:creator>
      <pubDate>Fri, 05 Jun 2026 01:01:54 +0000</pubDate>
      <link>https://dev.to/preecha/how-to-use-glm-51-with-claude-code-full-setup-guide-38j</link>
      <guid>https://dev.to/preecha/how-to-use-glm-51-with-claude-code-full-setup-guide-38j</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;You can use GLM-5.1 with Claude Code by routing Claude Code through the BigModel OpenAI-compatible API. Set the base URL to &lt;code&gt;https://open.bigmodel.cn/api/paas/v4/&lt;/code&gt;, use model name &lt;code&gt;glm-5.1&lt;/code&gt;, and authenticate with your BigModel API key. Once configured, Claude Code can use GLM-5.1 for coding tasks, repo exploration, refactoring, and longer agent-style workflows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Claude Code is a strong interface for AI-assisted coding, but the interface and the model are separate layers. If your Claude Code setup supports OpenAI-compatible providers, you can keep the same coding workflow while swapping the backend model.&lt;/p&gt;

&lt;p&gt;GLM-5.1 is worth testing in that setup. Z.AI released GLM-5.1 as its flagship model for agentic engineering, with published results including #1 on SWE-Bench Pro, a large improvement over GLM-5 on Terminal-Bench 2.0, and stronger long-horizon behavior on coding tasks that run for many iterations.&lt;/p&gt;

&lt;p&gt;If you already like how Claude Code handles files, tools, and iterative edits, this guide shows how to run GLM-5.1 behind that same interface.&lt;/p&gt;

&lt;p&gt;If you're comparing model backends for a coding workflow, &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=blog-sync"&gt;Apidog&lt;/a&gt; can help on the API side. You can document the BigModel endpoint, test OpenAI-compatible responses, and validate how your internal tooling handles different providers before wiring them into production systems.&lt;/p&gt;

&lt;p&gt;This guide covers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the exact Claude Code configuration values&lt;/li&gt;
&lt;li&gt;how the BigModel request path works&lt;/li&gt;
&lt;li&gt;a small validation workflow&lt;/li&gt;
&lt;li&gt;common setup issues&lt;/li&gt;
&lt;li&gt;when GLM-5.1 is worth using inside Claude Code&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why use GLM-5.1 with Claude Code?
&lt;/h2&gt;

&lt;p&gt;There are three practical reasons to try this setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Keep Claude Code's workflow, change the model
&lt;/h3&gt;

&lt;p&gt;Claude Code is useful because it can inspect files, propose edits, iterate on bugs, and stay inside a coding loop.&lt;/p&gt;

&lt;p&gt;If your setup supports custom OpenAI-compatible providers, you can keep that workflow while routing requests to GLM-5.1 instead of the default backend.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Test a model built for longer coding sessions
&lt;/h3&gt;

&lt;p&gt;GLM-5.1's strongest published results are focused on long-running, tool-heavy coding tasks rather than short answers. Z.AI showed improvements across hundreds of iterations and thousands of tool calls on optimization tasks.&lt;/p&gt;

&lt;p&gt;That maps well to Claude Code-style usage, where you usually run a coding session instead of asking one isolated prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Add another cost/performance option
&lt;/h3&gt;

&lt;p&gt;Depending on your workload, GLM-5.1 may be useful as another backend for coding-heavy sessions.&lt;/p&gt;

&lt;p&gt;BigModel bills against a quota rather than the usual per-token pricing, so compare its effective cost against Anthropic or OpenAI backends on your own workload.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwsk0rku7nw47otcpnfxo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwsk0rku7nw47otcpnfxo.png" alt="Image" width="800" height="670"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the full model overview and benchmark context, see the separate guide on what GLM-5.1 is.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before configuring Claude Code, make sure you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a BigModel account at &lt;code&gt;https://bigmodel.cn&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;a BigModel API key&lt;/li&gt;
&lt;li&gt;Claude Code installed locally&lt;/li&gt;
&lt;li&gt;a Claude Code build or configuration path that supports OpenAI-compatible custom providers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important point: GLM-5.1 does not require a special GLM SDK. It works through BigModel's OpenAI-compatible API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuration values
&lt;/h2&gt;

&lt;p&gt;You only need three core values.&lt;/p&gt;

&lt;h3&gt;
  
  
  Base URL
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://open.bigmodel.cn/api/paas/v4/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Model name
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;glm-5.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Authorization header
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Authorization: Bearer YOUR_BIGMODEL_API_KEY
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything else depends on where your Claude Code setup expects provider settings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Create and store your BigModel API key
&lt;/h2&gt;

&lt;p&gt;Create an API key in the BigModel developer console.&lt;/p&gt;

&lt;p&gt;Then save it as an environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;BIGMODEL_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_api_key_here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you use &lt;code&gt;zsh&lt;/code&gt;, add it to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;~/.zshrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you use &lt;code&gt;bash&lt;/code&gt;, add it to one of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;~/.bashrc
~/.bash_profile
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Reload your shell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;source&lt;/span&gt; ~/.zshrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, for bash:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;source&lt;/span&gt; ~/.bashrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify the variable is available:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$BIGMODEL_API_KEY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If nothing prints, Claude Code will not be able to authenticate with BigModel.&lt;/p&gt;

&lt;p&gt;Avoid hardcoding the key in project files. Environment variables are easier to rotate and less likely to be committed by accident.&lt;/p&gt;
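
&lt;p&gt;If you also call BigModel from your own scripts, the same rule applies there. Here is a minimal sketch of reading the key from the environment and failing early when it is missing. The helper names &lt;code&gt;load_api_key&lt;/code&gt; and &lt;code&gt;mask&lt;/code&gt; are made up for this example; only the &lt;code&gt;BIGMODEL_API_KEY&lt;/code&gt; variable comes from the step above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os


def load_api_key(env=None, name="BIGMODEL_API_KEY"):
    """Return the API key from the environment, stripped of stray whitespace."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty Bearer token.
        raise RuntimeError(f"{name} is not set; export it in your shell profile first")
    return key


def mask(key):
    """Hide all but (at most) the last four characters for safe logging."""
    return "*" * max(len(key) - 4, 0) + key[-4:]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Logging &lt;code&gt;mask(load_api_key())&lt;/code&gt; confirms the variable is loaded without printing the full secret to your terminal history or CI logs.&lt;/p&gt;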

&lt;h2&gt;
  
  
  Step 2: Update Claude Code settings
&lt;/h2&gt;

&lt;p&gt;In many setups, Claude Code stores local settings in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;~/.claude/settings.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A minimal OpenAI-compatible provider configuration looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"glm-5.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"baseURL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://open.bigmodel.cn/api/paas/v4/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"your_bigmodel_api_key"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your Claude Code build supports environment variable expansion, prefer that instead of pasting the raw key.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"glm-5.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"baseURL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://open.bigmodel.cn/api/paas/v4/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"apiKeyEnv"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"BIGMODEL_API_KEY"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



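&lt;p&gt;The exact field names vary by Claude Code build, so treat the keys above as illustrative and check your build's settings reference. Official Claude Code releases, for example, read provider overrides from environment variables such as &lt;code&gt;ANTHROPIC_BASE_URL&lt;/code&gt; and &lt;code&gt;ANTHROPIC_AUTH_TOKEN&lt;/code&gt; and expect an Anthropic-style Messages endpoint, so an OpenAI-compatible URL may need a translating gateway in front of it. Whatever the mechanism, the pattern is the same:&lt;/p&gt;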

&lt;ul&gt;
&lt;li&gt;provider mode: OpenAI-compatible&lt;/li&gt;
&lt;li&gt;base URL: BigModel&lt;/li&gt;
&lt;li&gt;model: &lt;code&gt;glm-5.1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;auth: your BigModel API key&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you already configured Claude Code for another OpenAI-compatible provider, this should be a small config change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Test the BigModel API directly
&lt;/h2&gt;

&lt;p&gt;Before debugging Claude Code, confirm the BigModel endpoint works with a raw request.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://open.bigmodel.cn/api/paas/v4/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$BIGMODEL_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "glm-5.1",
    "messages": [
      {
        "role": "user",
        "content": "Write a Python function that removes duplicate lines from a file."
      }
    ],
    "max_tokens": 2048,
    "temperature": 0.7
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This test verifies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;your API key is valid&lt;/li&gt;
&lt;li&gt;the model name is correct&lt;/li&gt;
&lt;li&gt;the endpoint is reachable&lt;/li&gt;
&lt;li&gt;BigModel returns an OpenAI-style chat completion response&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also why the integration can work: a Claude Code setup configured for an OpenAI-compatible provider only needs a backend that speaks the chat completions format.&lt;/p&gt;
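
&lt;p&gt;If you would rather script this check than paste a curl command, the same request can be sketched with Python's standard library. The helper names &lt;code&gt;build_chat_request&lt;/code&gt; and &lt;code&gt;send&lt;/code&gt; are made up for this example; the URL, headers, and body mirror the curl call above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import urllib.request

BASE_URL = "https://open.bigmodel.cn/api/paas/v4/"


def build_chat_request(api_key, prompt, model="glm-5.1"):
    """Build the same POST request the curl command sends."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 2048,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        BASE_URL + "chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Pass the key from your environment to &lt;code&gt;build_chat_request&lt;/code&gt; and hand the result to &lt;code&gt;send&lt;/code&gt;. An HTTP error at this stage points at the key, the model name, or the endpoint rather than at Claude Code.&lt;/p&gt;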

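&lt;p&gt;For the full API walkthrough with Python and Node examples, see &lt;a href="https://dev.to/preecha/how-to-use-the-glm-51-api-complete-guide-with-code-examples-4dil"&gt;How to use the GLM-5.1 API&lt;/a&gt;.&lt;/p&gt;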

&lt;h2&gt;
  
  
  Step 4: Run a small Claude Code validation task
&lt;/h2&gt;

&lt;p&gt;Do not start with a large repo. First, run a small task to validate the integration.&lt;/p&gt;

&lt;p&gt;Good first prompts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Write a Python script that scans a folder for JSON files and prints invalid ones.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Refactor this function for readability and add tests.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read this file, explain what it does, and suggest two safe improvements.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You are checking four things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude Code accepts the configuration&lt;/li&gt;
&lt;li&gt;BigModel authentication works&lt;/li&gt;
&lt;li&gt;GLM-5.1 returns responses in the expected format&lt;/li&gt;
&lt;li&gt;tool-use behavior inside Claude Code still works cleanly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If those pass, move to a real repository task.&lt;/p&gt;
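
&lt;p&gt;The "expected format" check can also be made explicit in your own tooling. The sketch below assumes the OpenAI-style response shape (&lt;code&gt;choices[0].message.content&lt;/code&gt;, with optional &lt;code&gt;tool_calls&lt;/code&gt;); the function name &lt;code&gt;extract_reply&lt;/code&gt; is hypothetical, and BigModel may return extra fields this code ignores.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def extract_reply(response):
    """Return the assistant text from an OpenAI-style chat completion dict.

    Raises ValueError with a specific reason when the shape is unexpected.
    """
    if not isinstance(response, dict):
        raise ValueError("response is not a JSON object")
    choices = response.get("choices")
    if not isinstance(choices, list) or len(choices) == 0:
        raise ValueError("response has no choices")
    first = choices[0]
    message = first.get("message") if isinstance(first, dict) else None
    if not isinstance(message, dict):
        raise ValueError("first choice has no message")
    if message.get("tool_calls"):
        # Tool-call turns may legitimately carry no text content.
        return message.get("content") or ""
    content = message.get("content")
    if not isinstance(content, str):
        raise ValueError("message has no text content")
    return content
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A specific error such as "response has no choices" is easier to act on than a generic parsing failure inside the editor integration.&lt;/p&gt;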

&lt;h2&gt;
  
  
  Best tasks for GLM-5.1 inside Claude Code
&lt;/h2&gt;

&lt;p&gt;GLM-5.1 is most useful when the coding task benefits from iteration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Good fits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;bug fixing across multiple files&lt;/li&gt;
&lt;li&gt;repository exploration and summarization&lt;/li&gt;
&lt;li&gt;test generation&lt;/li&gt;
&lt;li&gt;test repair&lt;/li&gt;
&lt;li&gt;iterative refactoring&lt;/li&gt;
&lt;li&gt;performance tuning&lt;/li&gt;
&lt;li&gt;long-running agent loops&lt;/li&gt;
&lt;li&gt;benchmark-driven code improvement&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Less ideal fits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;pure writing tasks&lt;/li&gt;
&lt;li&gt;short factual questions&lt;/li&gt;
&lt;li&gt;very small one-shot edits&lt;/li&gt;
&lt;li&gt;workflows where Claude's native behavior is more important than the backend swap&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The best use case is a sustained coding session where the model needs to inspect, edit, test, and iterate.&lt;/p&gt;

&lt;h2&gt;
  
  
  GLM-5.1 vs Claude inside Claude Code
&lt;/h2&gt;

&lt;p&gt;GLM-5.1 is not automatically better than Claude for every coding task.&lt;/p&gt;

&lt;p&gt;Claude still has strengths in reasoning-heavy edits, instruction following, and some repository navigation workflows. GLM-5.1 is worth benchmarking when your tasks look like SWE-Bench-style coding or long tool-driven sessions.&lt;/p&gt;

&lt;p&gt;To compare fairly, run both models on the same repository task and track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;code quality&lt;/li&gt;
&lt;li&gt;number of turns required&lt;/li&gt;
&lt;li&gt;test pass rate&lt;/li&gt;
&lt;li&gt;tool-use behavior&lt;/li&gt;
&lt;li&gt;latency&lt;/li&gt;
&lt;li&gt;cost or quota usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A simple comparison format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| Metric | Claude | GLM-5.1 |
|---|---:|---:|
| Turns to solution |  |  |
| Tests passed |  |  |
| Manual fixes needed |  |  |
| Latency |  |  |
| Cost/quota usage |  |  |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If GLM-5.1 solves the same task with similar quality and lower effective cost, it may be a good backend option. If Claude consistently produces cleaner changes in your workflow, keep using Claude for those tasks.&lt;/p&gt;

&lt;p&gt;Side-by-side results on your own tasks are more useful than general opinions about either model.&lt;/p&gt;
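
&lt;p&gt;To keep the comparison consistent across runs, you can record each run in a small structure and render the table from it. This is only a sketch: the &lt;code&gt;RunResult&lt;/code&gt; fields and the &lt;code&gt;comparison_table&lt;/code&gt; helper are hypothetical names chosen to match the table above, and you fill in the values from your own sessions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass


@dataclass
class RunResult:
    model: str
    turns: int
    tests_passed: int
    tests_total: int
    manual_fixes: int
    latency_s: float
    cost_or_quota: str  # free-form, since the two backends bill differently


def comparison_table(a, b):
    """Render two runs of the same task as a markdown table."""
    rows = [
        ("Turns to solution", a.turns, b.turns),
        ("Tests passed",
         f"{a.tests_passed}/{a.tests_total}",
         f"{b.tests_passed}/{b.tests_total}"),
        ("Manual fixes needed", a.manual_fixes, b.manual_fixes),
        ("Latency (s)", f"{a.latency_s:.1f}", f"{b.latency_s:.1f}"),
        ("Cost/quota usage", a.cost_or_quota, b.cost_or_quota),
    ]
    lines = [f"| Metric | {a.model} | {b.model} |", "|---|---:|---:|"]
    lines += [f"| {name} | {left} | {right} |" for name, left, right in rows]
    return "\n".join(lines)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run both models on the same task, build one &lt;code&gt;RunResult&lt;/code&gt; per model, and paste the output of &lt;code&gt;comparison_table&lt;/code&gt; into your notes or pull request.&lt;/p&gt;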

&lt;h2&gt;
  
  
  Common problems and fixes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Authentication failed
&lt;/h3&gt;

&lt;p&gt;This usually means the API key is wrong or Claude Code is not reading it.&lt;/p&gt;

&lt;p&gt;Check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the key works in a raw &lt;code&gt;curl&lt;/code&gt; request&lt;/li&gt;
&lt;li&gt;the environment variable is loaded in the current shell&lt;/li&gt;
&lt;li&gt;the config file points to the correct key field&lt;/li&gt;
&lt;li&gt;the key has no trailing spaces&lt;/li&gt;
&lt;li&gt;the settings file is valid JSON (straight double quotes, no trailing commas)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$BIGMODEL_API_KEY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://open.bigmodel.cn/api/paas/v4/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$BIGMODEL_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "glm-5.1",
    "messages": [
      {
        "role": "user",
        "content": "Say hello"
      }
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
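
&lt;p&gt;Several items on that checklist can be automated. The sketch below is one possible checker, not part of Claude Code: &lt;code&gt;diagnose&lt;/code&gt; is a hypothetical helper, and the &lt;code&gt;apiKey&lt;/code&gt; and &lt;code&gt;apiKeyEnv&lt;/code&gt; field names are taken from the example config in this guide, so adjust them to whatever your build actually reads.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json


def diagnose(settings_text, env, key_name="BIGMODEL_API_KEY"):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    raw_key = env.get(key_name)
    if not raw_key:
        problems.append(f"{key_name} is not set in this shell")
    elif raw_key != raw_key.strip():
        problems.append(f"{key_name} has leading or trailing whitespace")
    try:
        settings = json.loads(settings_text)
    except json.JSONDecodeError as err:
        problems.append(
            f"settings file is not valid JSON: {err.msg} at line {err.lineno}"
        )
        return problems
    if not isinstance(settings, dict):
        problems.append("settings file must contain a JSON object")
        return problems
    # Field names follow the example config in this guide; adjust for your build.
    if not (settings.get("apiKey") or settings.get("apiKeyEnv")):
        problems.append("settings file has no apiKey or apiKeyEnv field")
    return problems
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Feed it the contents of your settings file and &lt;code&gt;os.environ&lt;/code&gt;. If it returns an empty list and the raw request still fails, the key itself is the most likely culprit.&lt;/p&gt;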



&lt;h3&gt;
  
  
  Model not found
&lt;/h3&gt;

&lt;p&gt;Make sure the model name is exactly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;glm-5.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do not use a longer or guessed version name.&lt;/p&gt;

&lt;h3&gt;
  
  
  Claude Code ignores the custom provider
&lt;/h3&gt;

&lt;p&gt;Some setups cache settings or require a restart after config changes.&lt;/p&gt;

&lt;p&gt;Try:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Save the config file.&lt;/li&gt;
&lt;li&gt;Restart Claude Code.&lt;/li&gt;
&lt;li&gt;Run a small test prompt.&lt;/li&gt;
&lt;li&gt;Confirm the provider settings are loaded from the expected config path.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Requests work, but output quality feels off
&lt;/h3&gt;

&lt;p&gt;This may be a task-fit issue rather than a setup issue.&lt;/p&gt;

&lt;p&gt;Try:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lowering temperature if your config allows it&lt;/li&gt;
&lt;li&gt;giving clearer repo-specific instructions&lt;/li&gt;
&lt;li&gt;asking for a plan before edits&lt;/li&gt;
&lt;li&gt;using GLM-5.1 on iterative coding tasks instead of general reasoning prompts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Inspect the failing tests first. Do not edit files yet. Explain the likely root cause and list the files you need to inspect next.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then continue with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Apply the smallest safe fix, run the relevant tests, and summarize the diff.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Quota drains too fast
&lt;/h3&gt;

&lt;p&gt;BigModel applies quota multipliers to GLM-5.1, and requests during peak hours consume more quota than off-peak requests.&lt;/p&gt;

&lt;p&gt;For long coding sessions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;run heavy jobs off-peak when possible&lt;/li&gt;
&lt;li&gt;reduce unnecessary context&lt;/li&gt;
&lt;li&gt;start with smaller validation tasks&lt;/li&gt;
&lt;li&gt;avoid repeatedly sending large files unless needed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Testing the integration with Apidog
&lt;/h2&gt;

&lt;p&gt;If you want to validate the setup outside Claude Code, &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=blog-sync"&gt;Apidog&lt;/a&gt; is useful for testing the BigModel endpoint directly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fweca3nc8eqwzzeqiufy2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fweca3nc8eqwzzeqiufy2.png" alt="Image" width="799" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A practical workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define the BigModel chat completions endpoint in Apidog.&lt;/li&gt;
&lt;li&gt;Save a request using model &lt;code&gt;glm-5.1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Send a normal completion request.&lt;/li&gt;
&lt;li&gt;Test error cases, such as invalid auth.&lt;/li&gt;
&lt;li&gt;Test rate-limit behavior if applicable.&lt;/li&gt;
&lt;li&gt;Mock the endpoint so internal tools can be tested without consuming quota.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://open.bigmodel.cn/api/paas/v4/chat/completions
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example request body:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"glm-5.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Write a TypeScript function that validates an email address."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"max_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"temperature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is useful if your team is building wrappers around AI coding tools or routing traffic between multiple model providers. With Apidog's Smart Mock and Test Scenarios, you can verify API behavior independently from the editor integration.&lt;/p&gt;
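
&lt;p&gt;To show what "mocking the endpoint" means at the protocol level, here is a minimal local stand-in built on Python's standard library. It is not Apidog Smart Mock and not BigModel's real behavior: the class and function names are made up, and both the canned reply and the 401 error body are assumed OpenAI-style shapes used for illustration only.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class MockChatHandler(BaseHTTPRequestHandler):
    """Answer any POST with a canned OpenAI-style chat completion."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        if not self.headers.get("Authorization", "").startswith("Bearer "):
            # Simulate the invalid-auth case without touching the real API.
            self._reply(401, {"error": {"message": "missing bearer token"}})
            return
        self._reply(200, {
            "model": request.get("model"),
            "choices": [{
                "index": 0,
                "message": {"role": "assistant", "content": "mock reply"},
                "finish_reason": "stop",
            }],
        })

    def _reply(self, status, payload):
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep test output quiet


def start_mock(port=0):
    """Start the mock on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MockChatHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Point your internal tool's base URL at &lt;code&gt;http://127.0.0.1:PORT/&lt;/code&gt;, using the port from &lt;code&gt;server.server_address&lt;/code&gt;, to exercise success and auth-failure paths without consuming quota. A hosted mock gives you the same idea with shared, versioned responses.&lt;/p&gt;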

&lt;h2&gt;
  
  
  Should you use GLM-5.1 with Claude Code?
&lt;/h2&gt;

&lt;p&gt;Use GLM-5.1 with Claude Code if you want to test a strong agentic coding model without changing your coding interface.&lt;/p&gt;

&lt;p&gt;It is especially worth trying if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;you already use Claude Code daily&lt;/li&gt;
&lt;li&gt;your tasks involve multi-step coding sessions&lt;/li&gt;
&lt;li&gt;you want another backend option&lt;/li&gt;
&lt;li&gt;you are cost sensitive&lt;/li&gt;
&lt;li&gt;you want to benchmark multiple models against the same coding loop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Claude may still be the better fit for short editing help, careful reasoning, or workflows where its native behavior works best for you.&lt;/p&gt;

&lt;p&gt;But if you do sustained code work with iterative fixes and tool-heavy agent loops, GLM-5.1 is worth testing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Using GLM-5.1 with Claude Code requires three main values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Base URL: https://open.bigmodel.cn/api/paas/v4/
Model: glm-5.1
Auth: Bearer YOUR_BIGMODEL_API_KEY
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because BigModel exposes an OpenAI-compatible API, the integration is mostly a provider configuration change.&lt;/p&gt;

&lt;p&gt;The main reason to do this is practical benchmarking. Run GLM-5.1 on the same Claude Code tasks you already care about, compare the results, and decide whether it deserves a place in your backend options.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Can Claude Code use GLM-5.1 directly?
&lt;/h3&gt;

&lt;p&gt;Yes, if your Claude Code setup supports OpenAI-compatible custom providers.&lt;/p&gt;

&lt;h3&gt;
  
  
  What base URL should I use?
&lt;/h3&gt;

&lt;p&gt;Use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://open.bigmodel.cn/api/paas/v4/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What model name should I enter?
&lt;/h3&gt;

&lt;p&gt;Use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;glm-5.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Do I need a special GLM SDK?
&lt;/h3&gt;

&lt;p&gt;No. GLM-5.1 works through the BigModel OpenAI-compatible API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use GLM-5.1 with other coding tools too?
&lt;/h3&gt;

&lt;p&gt;Yes. The same setup pattern works for tools like Cline, Roo Code, and OpenCode.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is GLM-5.1 better than Claude for all coding tasks?
&lt;/h3&gt;

&lt;p&gt;No. It depends on your workflow. The best way to decide is to run the same repository tasks through both and compare the results.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Claude Mythos: Anthropic says this model is too dangerous to release</title>
      <dc:creator>Preecha</dc:creator>
      <pubDate>Thu, 04 Jun 2026 13:02:09 +0000</pubDate>
      <link>https://dev.to/preecha/claude-mythos-anthropic-says-this-model-is-too-dangerous-to-release-5d4h</link>
      <guid>https://dev.to/preecha/claude-mythos-anthropic-says-this-model-is-too-dangerous-to-release-5d4h</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Claude Mythos Preview appears to be a restricted Anthropic model being tested through Project Glasswing, a cybersecurity-focused preview program rather than a public launch. Reported benchmark numbers suggest it could be far stronger than Claude Opus 4.6 on software engineering tasks, but Anthropic has not released it broadly. The likely reason is dual-use risk: a model that helps defenders may also help attackers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Every major AI lab says it takes safety seriously. Very few labs prove it by holding back a powerful model instead of pushing it into the market as fast as possible.&lt;/p&gt;

&lt;p&gt;That is what makes Claude Mythos Preview interesting. Anthropic has not announced it like a normal Claude release. There is no broad public API rollout, no standard chat product launch, and no public "try it now" page aimed at everyone. Instead, the model surfaced through reporting tied to Project Glasswing, a restricted program focused on defensive cybersecurity work.&lt;/p&gt;

&lt;p&gt;The benchmark numbers attached to Claude Mythos Preview make the story bigger. Reported results suggest a large jump over Claude Opus 4.6 on SWE-Bench-style coding tasks. If those numbers hold up, Anthropic may already have a model that materially changes the balance between offensive and defensive cyber capability.&lt;/p&gt;

&lt;p&gt;For developers building AI integrations, the practical takeaway is simple: do not assume the public model catalog is the full frontier. Design your tooling so you can test provider changes, restricted-access flows, model routing, and fallback behavior before a model becomes broadly available. Tools like &lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=blog-sync"&gt;Apidog&lt;/a&gt; can help teams mock future API endpoints and validate integration logic without waiting for full access.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Claude Mythos Preview?
&lt;/h2&gt;

&lt;p&gt;Based on current reporting, Claude Mythos Preview is an unreleased Anthropic model being made available only to selected defensive cybersecurity partners and researchers.&lt;/p&gt;

&lt;p&gt;That wording matters.&lt;/p&gt;

&lt;p&gt;This does not look like a standard Claude family launch such as Sonnet or Opus. It looks more like a controlled preview model with access restrictions tied to a narrow use case. Reuters reported that Anthropic is working with major partners including Amazon, Microsoft, Apple, Google, Nvidia, CrowdStrike, and Palo Alto Networks under Project Glasswing. The purpose is defensive cybersecurity research, not mass consumer access.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5segp8y06r03jd8kb4p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu5segp8y06r03jd8kb4p.png" alt="Image" width="800" height="268"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fthydb0n6dkd5qaehqg1g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fthydb0n6dkd5qaehqg1g.png" alt="Image" width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The clearest current description is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Claude Mythos Preview appears to be a restricted-access Anthropic model for defensive security work, not a public Claude tier.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why developers are paying attention
&lt;/h2&gt;

&lt;p&gt;The reported benchmark numbers are unusually high.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Claude Mythos Preview&lt;/th&gt;
&lt;th&gt;Claude Opus 4.6&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Bench Verified&lt;/td&gt;
&lt;td&gt;93.9%&lt;/td&gt;
&lt;td&gt;80.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SWE-Bench Pro&lt;/td&gt;
&lt;td&gt;77.8%&lt;/td&gt;
&lt;td&gt;53.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If accurate, this is a major jump rather than an incremental upgrade: 13.1 percentage points on SWE-Bench Verified and 24.4 on SWE-Bench Pro.&lt;/p&gt;

&lt;p&gt;SWE-Bench matters because it is one of the clearest public proxies for real software engineering ability. It tests whether a model can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;read a repository&lt;/li&gt;
&lt;li&gt;understand an issue&lt;/li&gt;
&lt;li&gt;identify the relevant files&lt;/li&gt;
&lt;li&gt;make correct code changes&lt;/li&gt;
&lt;li&gt;solve the task under realistic constraints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A jump of this size would suggest Anthropic has moved well beyond its previous public frontier in coding-heavy and agentic tasks.&lt;/p&gt;

&lt;p&gt;The key point is not only that Anthropic may have a stronger model. It is that Anthropic may already have that model and still be choosing not to release it publicly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Anthropic might be keeping Claude Mythos private
&lt;/h2&gt;

&lt;p&gt;The most likely explanation is dual-use risk.&lt;/p&gt;

&lt;p&gt;A model that helps defenders find vulnerabilities, analyze attack paths, review unsafe code, and automate remediation can also make offensive workflows easier. The same capability that helps a blue team patch systems faster can help a malicious actor move faster too.&lt;/p&gt;

&lt;p&gt;That tradeoff becomes sharper when a model improves at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repo-scale code understanding&lt;/li&gt;
&lt;li&gt;autonomous tool use&lt;/li&gt;
&lt;li&gt;vulnerability reproduction&lt;/li&gt;
&lt;li&gt;long-horizon problem solving&lt;/li&gt;
&lt;li&gt;chaining many actions without losing context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those are exactly the capabilities developers want from coding agents. They are also exactly the capabilities that raise cybersecurity concerns.&lt;/p&gt;

&lt;p&gt;Anthropic has been signaling that frontier model releases may need more targeted rollout strategies. Claude Mythos Preview looks like a concrete example of that strategy: restrict first, learn from vetted users, then decide what broader access should look like.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Project Glasswing seems to mean
&lt;/h2&gt;

&lt;p&gt;Project Glasswing is the frame that explains the Mythos story.&lt;/p&gt;

&lt;p&gt;The reported idea is not simply:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Here is a better model.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is closer to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Here is a better model, but only trusted defensive partners can use it right now.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That changes the product story.&lt;/p&gt;

&lt;p&gt;Instead of a consumer launch, this looks more like a security preview program. Instead of growth being the main KPI, the main goal may be controlled evaluation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What can the model do for defenders?&lt;/li&gt;
&lt;li&gt;What misuse risks appear in practice?&lt;/li&gt;
&lt;li&gt;Are the release safeguards sufficient?&lt;/li&gt;
&lt;li&gt;Which workflows should remain restricted?&lt;/li&gt;
&lt;li&gt;Which capabilities can be safely exposed through public APIs?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That may become the norm for models with strong cyber capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is Claude Mythos stronger than Opus 4.6?
&lt;/h2&gt;

&lt;p&gt;Based on the reported benchmark numbers, it may be.&lt;/p&gt;

&lt;p&gt;But precision matters.&lt;/p&gt;

&lt;p&gt;What we can say:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reported numbers suggest Claude Mythos Preview is significantly ahead of Opus 4.6 on SWE-Bench-style software engineering tasks.&lt;/li&gt;
&lt;li&gt;Anthropic is reportedly treating it as a higher-risk model.&lt;/li&gt;
&lt;li&gt;The model is not being rolled out like a normal public Claude release.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What we cannot say with full certainty yet:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;that it is stronger than Opus 4.6 across every category&lt;/li&gt;
&lt;li&gt;that the comparison conditions were identical in every detail&lt;/li&gt;
&lt;li&gt;that public users would see the same gains in every workflow&lt;/li&gt;
&lt;li&gt;that a public API version would expose the same capabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The careful version:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Claude Mythos Preview appears to be materially stronger than Claude Opus 4.6 on at least some important coding benchmarks, and strong enough that Anthropic may be restricting access because of the risks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is still a very big story.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means for implementation planning
&lt;/h2&gt;

&lt;p&gt;Most developers cannot use Claude Mythos today. But the situation is still useful as a design signal.&lt;/p&gt;

&lt;p&gt;If frontier models may appear first through restricted programs, your AI integration should be built to handle unavailable, changing, or provider-specific models.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Do not hard-code model names everywhere
&lt;/h3&gt;

&lt;p&gt;Avoid scattering model IDs throughout your codebase.&lt;/p&gt;

&lt;p&gt;Instead, centralize model selection. The model IDs below are placeholders, not real Anthropic model names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;ModelTier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;default&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;coding&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;security_review&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fallback&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;modelMap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;ModelTier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DEFAULT_MODEL&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-public-default&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;coding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CODING_MODEL&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-public-coding&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;security_review&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SECURITY_MODEL&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-public-security&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;FALLBACK_MODEL&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-public-fallback&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ModelTier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;modelMap&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;tier&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then route by capability instead of directly depending on a specific model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;coding&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;aiClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Find and fix the bug described in this issue.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a restricted model becomes available later, you change configuration instead of rewriting application logic.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Add fallback behavior for restricted access
&lt;/h3&gt;

&lt;p&gt;Calls to restricted models can fail for reasons unrelated to your code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;account not allowlisted&lt;/li&gt;
&lt;li&gt;region not supported&lt;/li&gt;
&lt;li&gt;endpoint not enabled&lt;/li&gt;
&lt;li&gt;policy guardrail triggered&lt;/li&gt;
&lt;li&gt;preview access revoked&lt;/li&gt;
&lt;li&gt;rate limit changed&lt;/li&gt;
&lt;/ul&gt;

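&lt;p&gt;Your integration should treat model access as dynamic: catch access errors and retry on a fallback tier. The &lt;code&gt;model_not_available&lt;/code&gt; error code below is illustrative, so check your provider's actual error format.&lt;br&gt;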
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;runWithFallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;aiClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;getModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;security_review&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
      &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;404&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
      &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;model_not_available&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;aiClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;getModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;fallback&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern is useful for any provider, not just Anthropic.&lt;/p&gt;
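&lt;p&gt;The fallback function above only reacts to 403 and 404, while the failure list also mentions rate limits. One way to keep those cases apart is a small classifier. This is a minimal sketch: &lt;code&gt;classifyAccessError&lt;/code&gt;, the &lt;code&gt;AccessAction&lt;/code&gt; names, and the status-to-action mapping are assumptions for illustration, not any provider's documented behavior.&lt;/p&gt;

```typescript
// Hypothetical helper: the names and the status-to-action mapping are illustrative assumptions.
type AccessAction = "use_fallback" | "retry_later" | "rethrow";

interface ProviderError {
  status?: number;
  code?: string;
}

function classifyAccessError(error: ProviderError): AccessAction {
  // 403/404 or an explicit "model_not_available" code: the model is not reachable for this account.
  if (error.status === 403 || error.status === 404 || error.code === "model_not_available") {
    return "use_fallback";
  }
  // 429: the model is reachable but the quota is exhausted, so back off instead of switching models.
  if (error.status === 429) {
    return "retry_later";
  }
  // Anything else is treated as a genuine failure and surfaced to the caller.
  return "rethrow";
}
```

&lt;p&gt;Keeping this decision in one function means the catch block only has to act on the returned action, and the mapping can change per provider without touching the call sites.&lt;/p&gt;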

&lt;h3&gt;
  
  
  3. Mock future model endpoints before you get access
&lt;/h3&gt;

&lt;p&gt;If a model is in limited preview, you may not be able to call it directly. You can still prepare your integration by mocking the expected API shape.&lt;/p&gt;

&lt;p&gt;Example OpenAPI definition for a hypothetical model invocation endpoint, which a mock server can serve:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;openapi&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3.0.3&lt;/span&gt;
&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Restricted Model Preview API&lt;/span&gt;
  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.1.0&lt;/span&gt;
&lt;span class="na"&gt;paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;/v1/messages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;post&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Create a model response&lt;/span&gt;
      &lt;span class="na"&gt;requestBody&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;application/json&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;object&lt;/span&gt;
              &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;model&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;messages&lt;/span&gt;
              &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;string&lt;/span&gt;
                  &lt;span class="na"&gt;example&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;restricted-preview-model&lt;/span&gt;
                &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;array&lt;/span&gt;
                  &lt;span class="na"&gt;items&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;object&lt;/span&gt;
                    &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;role&lt;/span&gt;
                      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;content&lt;/span&gt;
                    &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;string&lt;/span&gt;
                        &lt;span class="na"&gt;enum&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;assistant&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;system&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
                      &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                        &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;string&lt;/span&gt;
      &lt;span class="na"&gt;responses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;200"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Successful response&lt;/span&gt;
        &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;403"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Model access not allowed&lt;/span&gt;
        &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;429"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Rate limit exceeded&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets you test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;request construction&lt;/li&gt;
&lt;li&gt;auth headers&lt;/li&gt;
&lt;li&gt;retry behavior&lt;/li&gt;
&lt;li&gt;403/429 handling&lt;/li&gt;
&lt;li&gt;logging and observability&lt;/li&gt;
&lt;li&gt;fallback routing&lt;/li&gt;
&lt;/ul&gt;
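&lt;p&gt;Several items in the list above can be exercised without a network call. The sketch below uses an in-memory stand-in for a provider client that returns a scripted status per model ID. Everything here (&lt;code&gt;createMockClient&lt;/code&gt;, &lt;code&gt;sendWithFallback&lt;/code&gt;, the result shape) is hypothetical and does not mirror a real SDK.&lt;/p&gt;

```typescript
// Hypothetical in-memory stand-in for a provider client; not a real SDK.
interface MockResult {
  status: number;
  body: string;
}

// Maps a model ID to the HTTP status the mock should return for it.
type Script = { [modelId: string]: number };

function createMockClient(script: Script) {
  const calls: string[] = [];
  return {
    calls, // records which model IDs were requested, in order
    send(model: string, prompt: string): MockResult {
      calls.push(model);
      // Model IDs missing from the script behave like "not found".
      const status = model in script ? script[model] : 404;
      return { status, body: status === 200 ? "mock reply to: " + prompt : "" };
    },
  };
}

// Tries each model in order; moves on only when access is denied (403/404).
function sendWithFallback(
  client: { send(model: string, prompt: string): MockResult },
  models: string[],
  prompt: string
): MockResult {
  let last: MockResult = { status: 404, body: "" };
  for (const model of models) {
    last = client.send(model, prompt);
    if (last.status === 200) return last;
    const accessDenied = last.status === 403 || last.status === 404;
    if (!accessDenied) break; // e.g. 429: surface it instead of switching models
  }
  return last;
}
```

&lt;p&gt;Scripting the restricted model as 403 and the public model as 200 should show two recorded calls and a successful result, while scripting a 429 should stop after the first call.&lt;/p&gt;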

&lt;h3&gt;
  
  
  4. Log model availability separately from model quality
&lt;/h3&gt;

&lt;p&gt;When evaluating AI systems, separate these two questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Did the model produce a good answer?&lt;/li&gt;
&lt;li&gt;Was the model actually available for this user, account, region, and workflow?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A simple event shape helps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ai_model_call"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model_tier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"security_review"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"configured-model-id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fallback_used"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"error_code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"model_not_available"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"latency_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1840&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That makes it easier to debug whether poor performance came from the model itself or from access restrictions.&lt;/p&gt;
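&lt;p&gt;A typed builder can keep these events consistent across call sites. In this sketch the field names mirror the JSON event above, but the &lt;code&gt;"ok"&lt;/code&gt; and &lt;code&gt;"failed"&lt;/code&gt; status values and both helper functions are assumptions added for illustration.&lt;/p&gt;

```typescript
// Field names mirror the JSON event above; the extra status values and helpers are hypothetical.
type CallStatus = "ok" | "fallback_used" | "failed";

interface ModelCallEvent {
  event: "ai_model_call";
  provider: string;
  model_tier: string;
  model_id: string;
  status: CallStatus;
  error_code: string | null;
  latency_ms: number;
}

function buildModelCallEvent(
  provider: string,
  modelTier: string,
  modelId: string,
  latencyMs: number,
  errorCode: string | null,
  usedFallback: boolean
): ModelCallEvent {
  let status: CallStatus = "ok";
  if (usedFallback) {
    status = "fallback_used";
  } else if (errorCode !== null) {
    status = "failed";
  }
  return {
    event: "ai_model_call",
    provider,
    model_tier: modelTier,
    model_id: modelId,
    status,
    error_code: errorCode,
    latency_ms: latencyMs,
  };
}

// Share of calls served by the requested model, independent of answer quality.
function primaryAvailability(events: ModelCallEvent[]): number {
  if (events.length === 0) return 0;
  let served = 0;
  for (const e of events) {
    if (e.status === "ok") served += 1;
  }
  return served / events.length;
}
```

&lt;p&gt;Tracking this ratio per tier gives an availability number you can read next to your quality metrics instead of mixing the two.&lt;/p&gt;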

&lt;h2&gt;
  
  
  What this could mean for developers
&lt;/h2&gt;

&lt;p&gt;Three implications stand out.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Public Claude models may not reflect Anthropic's frontier ceiling
&lt;/h3&gt;

&lt;p&gt;Many developers assume the best public Claude model is close to Anthropic's best internal capability. Claude Mythos Preview suggests the gap may be larger than expected.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Cyber capability may become the main release bottleneck
&lt;/h3&gt;

&lt;p&gt;The biggest constraint on release may not be model quality. It may be whether the model crosses a threshold where offensive misuse risk becomes too high.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The best models may arrive through restricted enterprise programs first
&lt;/h3&gt;

&lt;p&gt;Instead of seeing the strongest systems first in public chat apps, developers may see them inside:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;narrow partner networks&lt;/li&gt;
&lt;li&gt;industry pilots&lt;/li&gt;
&lt;li&gt;enterprise previews&lt;/li&gt;
&lt;li&gt;security-specific programs&lt;/li&gt;
&lt;li&gt;controlled API allowlists&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That affects roadmap planning, provider evaluation, and access-risk management.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this could mean for the AI industry
&lt;/h2&gt;

&lt;p&gt;Claude Mythos Preview may be less important as a product and more important as a signal.&lt;/p&gt;

&lt;p&gt;If Anthropic is willing to hold back a model because of cyber risk, other labs may do the same. That could create a two-track AI market:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;public models with broad access and heavier constraints&lt;/li&gt;
&lt;li&gt;restricted models with stronger capabilities and tighter access controls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That split would affect benchmarking too. A lab could have a much stronger system than the public has seen while still competing publicly with a safer, weaker release.&lt;/p&gt;

&lt;p&gt;It would also make it harder for outsiders to judge the true frontier from public APIs alone.&lt;/p&gt;

&lt;p&gt;From a policy perspective, this is the kind of case lawmakers and security researchers have been anticipating. The important question is not whether powerful models will exist. It is whether labs can create release mechanisms that preserve defensive value without making offensive misuse dramatically easier.&lt;/p&gt;

&lt;p&gt;Claude Mythos Preview may be an early high-profile example of a lab trying to solve that problem in real time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Should developers care right now?
&lt;/h2&gt;

&lt;p&gt;Yes, but not because you need to switch tools tomorrow.&lt;/p&gt;

&lt;p&gt;You should care because this changes how you read model announcements.&lt;/p&gt;

&lt;p&gt;When a lab says a public model is its "best available" model, that may no longer mean it is the strongest model the lab has. It may only mean it is the strongest model the lab is willing to release widely.&lt;/p&gt;

&lt;p&gt;That is a different statement.&lt;/p&gt;

&lt;p&gt;You should also care because this affects competitive positioning across providers. If Anthropic is holding back a stronger coding model, then comparisons between public Claude, GPT, Gemini, GLM, and open-weight coding models may understate what private frontier systems can already do.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical checklist for AI integration teams
&lt;/h2&gt;

&lt;p&gt;If you are building AI-powered developer tools, security automation, or coding agents, use this checklist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Centralize model configuration.&lt;/li&gt;
&lt;li&gt;Route by capability, not hard-coded model ID.&lt;/li&gt;
&lt;li&gt;Add fallback behavior for unavailable models.&lt;/li&gt;
&lt;li&gt;Treat &lt;code&gt;403&lt;/code&gt;, &lt;code&gt;404&lt;/code&gt;, and &lt;code&gt;model_not_available&lt;/code&gt; as expected states.&lt;/li&gt;
&lt;li&gt;Mock restricted-access endpoints before preview access is granted.&lt;/li&gt;
&lt;li&gt;Log model availability separately from model quality.&lt;/li&gt;
&lt;li&gt;Keep benchmark results separate from production evaluation.&lt;/li&gt;
&lt;li&gt;Design for provider-specific policy and access controls.&lt;/li&gt;
&lt;li&gt;Avoid assuming public APIs expose the lab's strongest capabilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Claude Mythos Preview is not a normal product launch. It looks like a restricted Anthropic model that may be significantly stronger than Claude Opus 4.6 on software engineering tasks, and risky enough that Anthropic appears unwilling to release it broadly.&lt;/p&gt;

&lt;p&gt;If the reported benchmarks are accurate, the headline is not just that Anthropic built a better model. The real headline is that Anthropic may already be operating in a world where some frontier models are too capable, or at least too risky, for immediate public release.&lt;/p&gt;

&lt;p&gt;For developers, the implementation lesson is clear: build AI systems that can handle restricted access, fast-changing model catalogs, provider-specific policies, and fallback routing.&lt;/p&gt;

&lt;p&gt;That may become the default architecture for working with frontier models.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Claude Mythos Preview?
&lt;/h3&gt;

&lt;p&gt;Based on current reporting, it is a restricted Anthropic preview model being tested with selected defensive cybersecurity partners rather than released publicly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Claude Mythos available to the public?
&lt;/h3&gt;

&lt;p&gt;No public general release has been announced. Current reporting suggests access is restricted through Project Glasswing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Claude Mythos stronger than Claude Opus 4.6?
&lt;/h3&gt;

&lt;p&gt;Reported benchmark numbers suggest it may be significantly stronger on SWE-Bench-style coding tasks, but that does not prove it is stronger across every category.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Project Glasswing?
&lt;/h3&gt;

&lt;p&gt;Project Glasswing appears to be Anthropic's restricted-access program for evaluating Claude Mythos Preview in defensive cybersecurity settings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why would Anthropic refuse to release a stronger model?
&lt;/h3&gt;

&lt;p&gt;The likely reason is dual-use risk. A model that helps defenders automate code and security work can also make offensive misuse easier.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can developers use Claude Mythos today?
&lt;/h3&gt;

&lt;p&gt;Not broadly. At the moment, it appears to be limited to selected partners and researchers rather than public API users.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Best AI image generators with no restrictions in 2026</title>
      <dc:creator>Preecha</dc:creator>
      <pubDate>Thu, 04 Jun 2026 01:02:15 +0000</pubDate>
      <link>https://dev.to/preecha/best-ai-image-generators-with-no-restrictions-in-2026-4o9e</link>
      <guid>https://dev.to/preecha/best-ai-image-generators-with-no-restrictions-in-2026-4o9e</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;The only AI image generators with genuinely no restrictions are local tools: Stable Diffusion, FLUX, and ComfyUI running on your own hardware. Every cloud service, including Grok Imagine, Midjourney, and DALL-E, enforces a content policy at the service level. This guide compares both categories, explains what cloud tools typically filter, and shows how to set up a local image-generation pipeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Developers often ask the same question: which AI image generator actually has no restrictions?&lt;/p&gt;

&lt;p&gt;The practical answer is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloud generators always have content policies.&lt;/strong&gt; Some are more permissive than others, but none allow every prompt or output.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local generators give you the most control.&lt;/strong&gt; When you run the model on your own machine, there is no hosted API, shared safety layer, or third-party service between your prompt and the generated image.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This guide covers both options. First, it breaks down what major cloud tools filter in practice. Then it walks through local setup options for Stable Diffusion, FLUX, and ComfyUI.&lt;/p&gt;

&lt;p&gt;If you are building an image-generation feature into your own app, you also need to test failure states such as content-policy rejections, rate limits, and timeouts. Apidog Smart Mock can simulate API responses like &lt;code&gt;400 content_policy_violation&lt;/code&gt; and &lt;code&gt;429 rate_limit_exceeded&lt;/code&gt;, so you can validate frontend behavior before calling a paid API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why every cloud generator has restrictions
&lt;/h2&gt;

&lt;p&gt;Cloud image generators run on shared infrastructure. A typical request like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /v1/images/generations
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;usually passes through at least two enforcement layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Prompt filtering&lt;/strong&gt; before generation starts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output classification&lt;/strong&gt; before the image is returned.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These checks run at the service or model-serving layer. They generally apply to every account, plan, and API key.&lt;/p&gt;

&lt;p&gt;The business reason is liability. Commercial providers restrict categories such as explicit sexual content, content involving minors, realistic non-consensual depictions of real people, and graphic violence.&lt;/p&gt;

&lt;p&gt;The technical reason is that filtering is usually not a per-user toggle. There is no general-purpose “admin mode” that disables moderation for one API customer.&lt;/p&gt;

&lt;p&gt;That is why local generation is the only practical option if your requirement is full control over the model execution path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cloud generators: what they actually filter
&lt;/h2&gt;

&lt;p&gt;The tools below are not “no-restrictions” generators. They differ mainly in how strict their filters are and which use cases they prioritize.&lt;/p&gt;

&lt;h3&gt;
  
  
  Grok Imagine
&lt;/h3&gt;

&lt;p&gt;Grok Imagine has been positioned as a more permissive mainstream cloud option than tools like DALL-E or Adobe Firefly, but it still applies safety filters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Typically blocked:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explicit sexual content&lt;/li&gt;
&lt;li&gt;Realistic depictions of real public figures in compromising situations&lt;/li&gt;
&lt;li&gt;Graphic violence with realistic gore&lt;/li&gt;
&lt;li&gt;Content involving minors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Typically allowed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stylized violence in artistic or cinematic contexts&lt;/li&gt;
&lt;li&gt;Suggestive but non-explicit content&lt;/li&gt;
&lt;li&gt;Fictional characters in mature themes&lt;/li&gt;
&lt;li&gt;Dark, horror, or surreal imagery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;API shape:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://api.x.ai/v1/images/generations
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example request body (verify the model name against the current xAI documentation):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"grok-imagine-image"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same service-level filters apply through the API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; Good cloud option for mature artistic content, but not unrestricted.&lt;/p&gt;

&lt;h3&gt;
  
  
  Midjourney
&lt;/h3&gt;

&lt;p&gt;Midjourney is strong for visual quality and artistic output. Its “stealth” mode affects visibility in public galleries, not content filtering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Typically blocked:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explicit sexual content&lt;/li&gt;
&lt;li&gt;Photorealistic depictions of real people in fictional sexual contexts&lt;/li&gt;
&lt;li&gt;Photorealistic gore&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Typically allowed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stylized nudity in artistic contexts&lt;/li&gt;
&lt;li&gt;Mature themes in clearly fictional settings&lt;/li&gt;
&lt;li&gt;Stylized violence&lt;/li&gt;
&lt;li&gt;Dark and horror themes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; Strong artistic quality with moderate restrictions. Not an unrestricted option.&lt;/p&gt;

&lt;h3&gt;
  
  
  DALL-E 3
&lt;/h3&gt;

&lt;p&gt;DALL-E 3 is optimized for broad commercial safety and general-purpose creative use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Typically blocked:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explicit sexual content&lt;/li&gt;
&lt;li&gt;Suggestive content involving real people&lt;/li&gt;
&lt;li&gt;Realistic violence&lt;/li&gt;
&lt;li&gt;Broad categories interpreted as harmful or unsafe&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prompts involving weapons, drugs, or controversial topics may be rejected even when the intent is educational or journalistic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Typically allowed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;General creative imagery&lt;/li&gt;
&lt;li&gt;Fantasy and sci-fi scenes&lt;/li&gt;
&lt;li&gt;Stylized characters&lt;/li&gt;
&lt;li&gt;Marketing and product concepts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; Best for safe commercial and general creative work, not edge-case prompting.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adobe Firefly
&lt;/h3&gt;

&lt;p&gt;Adobe Firefly is designed for commercial-safe creative workflows and licensed-content positioning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Typically blocked:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nudity&lt;/li&gt;
&lt;li&gt;Sexual content&lt;/li&gt;
&lt;li&gt;Violence&lt;/li&gt;
&lt;li&gt;Controversial political content&lt;/li&gt;
&lt;li&gt;Broad unsafe-content categories&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Typically allowed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product photography&lt;/li&gt;
&lt;li&gt;Marketing imagery&lt;/li&gt;
&lt;li&gt;Commercial-safe creative assets&lt;/li&gt;
&lt;li&gt;Text-in-image workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; Use it when commercial safety matters more than flexibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  Leonardo AI
&lt;/h3&gt;

&lt;p&gt;Leonardo AI is more permissive than many mainstream cloud providers for mature artistic content, especially on paid plans where additional content settings may be available.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Typically blocked:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explicit sexual content on default settings&lt;/li&gt;
&lt;li&gt;Content that violates platform policy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Typically allowed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mature creative content within policy limits&lt;/li&gt;
&lt;li&gt;Wider stylistic range than stricter platforms&lt;/li&gt;
&lt;li&gt;Community model workflows depending on settings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; One of the more flexible cloud options, but still not uncensored.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ideogram
&lt;/h3&gt;

&lt;p&gt;Ideogram is strongest when you need text rendered inside images.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Typically blocked:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explicit content&lt;/li&gt;
&lt;li&gt;Real-person deepfakes&lt;/li&gt;
&lt;li&gt;Violence&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Typically allowed:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;General creative content&lt;/li&gt;
&lt;li&gt;Text-heavy designs&lt;/li&gt;
&lt;li&gt;Posters, logos, and typography-focused images&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Verdict:&lt;/strong&gt; Useful for text-in-image generation, not relevant if your main requirement is unrestricted output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary comparison table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Generator&lt;/th&gt;
&lt;th&gt;Restriction level&lt;/th&gt;
&lt;th&gt;NSFW option&lt;/th&gt;
&lt;th&gt;Listed price (may be outdated)&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Grok Imagine&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;$30/month SuperGrok&lt;/td&gt;
&lt;td&gt;Mature artistic content, API access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Midjourney&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;$10-$120/month&lt;/td&gt;
&lt;td&gt;Artistic quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Leonardo AI&lt;/td&gt;
&lt;td&gt;Moderate with paid settings&lt;/td&gt;
&lt;td&gt;Yes on paid plans&lt;/td&gt;
&lt;td&gt;Free-$48/month&lt;/td&gt;
&lt;td&gt;Mature creative content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DALL-E 3&lt;/td&gt;
&lt;td&gt;Strict&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;$20/month ChatGPT Plus&lt;/td&gt;
&lt;td&gt;Commercial and marketing imagery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adobe Firefly&lt;/td&gt;
&lt;td&gt;Very strict&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;$5-$55/month&lt;/td&gt;
&lt;td&gt;Commercial-safe content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ideogram&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Free-$16/month&lt;/td&gt;
&lt;td&gt;Text-in-image&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stable Diffusion local&lt;/td&gt;
&lt;td&gt;None at service layer&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Hardware cost&lt;/td&gt;
&lt;td&gt;Full local control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FLUX local&lt;/td&gt;
&lt;td&gt;None at service layer&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Hardware cost&lt;/td&gt;
&lt;td&gt;Full local control, high quality&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Local generation: the actual no-restrictions option
&lt;/h2&gt;

&lt;p&gt;Running a model locally means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model runs on your own machine.&lt;/li&gt;
&lt;li&gt;Prompts are not sent to a third-party image API.&lt;/li&gt;
&lt;li&gt;No hosted service applies prompt or output moderation.&lt;/li&gt;
&lt;li&gt;You control the model, weights, workflow, and runtime.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tradeoff is hardware. You need enough GPU memory for the model and resolution you want.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;VRAM needed&lt;/th&gt;
&lt;th&gt;Approx. generation speed on RTX 3080&lt;/th&gt;
&lt;th&gt;Quality tier&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SDXL Turbo&lt;/td&gt;
&lt;td&gt;6GB&lt;/td&gt;
&lt;td&gt;~1 second per image&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SDXL 1.0&lt;/td&gt;
&lt;td&gt;8GB&lt;/td&gt;
&lt;td&gt;15-30 seconds&lt;/td&gt;
&lt;td&gt;Very good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FLUX.1-schnell&lt;/td&gt;
&lt;td&gt;8GB&lt;/td&gt;
&lt;td&gt;3-5 seconds&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FLUX.1-dev&lt;/td&gt;
&lt;td&gt;12GB&lt;/td&gt;
&lt;td&gt;20-40 seconds&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FLUX.1-pro via API&lt;/td&gt;
&lt;td&gt;N/A cloud&lt;/td&gt;
&lt;td&gt;~8 seconds&lt;/td&gt;
&lt;td&gt;Best&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
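
&lt;p&gt;As a rough illustration of how to read the table, the sketch below filters the local models by available VRAM. The figures are copied from the table; the names &lt;code&gt;VRAM_REQUIRED_GB&lt;/code&gt; and &lt;code&gt;models_that_fit&lt;/code&gt; are hypothetical, and real memory use also depends on resolution, precision, and offloading.&lt;/p&gt;

```python
# VRAM requirements in GB, copied from the table above (local models only).
VRAM_REQUIRED_GB = {
    "SDXL Turbo": 6,
    "SDXL 1.0": 8,
    "FLUX.1-schnell": 8,
    "FLUX.1-dev": 12,
}


def models_that_fit(vram_gb: float) -> list:
    """Return the models whose listed VRAM requirement is within vram_gb."""
    return [name for name, need in VRAM_REQUIRED_GB.items() if vram_gb >= need]


if __name__ == "__main__":
    # Example: a card with 10 GB of VRAM.
    print(models_that_fit(10))
```

&lt;p&gt;Treat the result as a starting point, then confirm with a test generation at your target resolution.&lt;/p&gt;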

&lt;p&gt;Apple Silicon Macs can run local workflows using the MPS backend. Performance is slower than comparable NVIDIA GPUs, but usable for many workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Set up Stable Diffusion locally
&lt;/h2&gt;

&lt;p&gt;Stable Diffusion is the most established local image-generation stack. AUTOMATIC1111 WebUI gives you a browser-based interface that runs on your machine.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Install or prepare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.10 or 3.11&lt;/li&gt;
&lt;li&gt;NVIDIA GPU with 8GB+ VRAM, or Apple Silicon Mac&lt;/li&gt;
&lt;li&gt;At least 20GB free disk space&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Install on Windows or Linux with NVIDIA GPU
&lt;/h3&gt;

&lt;p&gt;Clone the WebUI repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
&lt;span class="nb"&gt;cd &lt;/span&gt;stable-diffusion-webui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the launcher.&lt;/p&gt;

&lt;p&gt;Linux or macOS:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./webui.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Windows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight batchfile"&gt;&lt;code&gt;&lt;span class="kd"&gt;webui&lt;/span&gt;&lt;span class="na"&gt;-user&lt;/span&gt;.bat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first launch downloads dependencies and the default model. After startup, open:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://127.0.0.1:7860
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Install on Apple Silicon Mac
&lt;/h3&gt;

&lt;p&gt;Clone the repository, then launch with flags that skip the CUDA check and run in full precision:&lt;br&gt;
&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
&lt;span class="nb"&gt;cd &lt;/span&gt;stable-diffusion-webui
./webui.sh &lt;span class="nt"&gt;--skip-torch-cuda-test&lt;/span&gt; &lt;span class="nt"&gt;--precision&lt;/span&gt; full &lt;span class="nt"&gt;--no-half&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Load a model
&lt;/h3&gt;

&lt;p&gt;Download a model from Hugging Face or CivitAI, then place it here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stable-diffusion-webui/models/Stable-diffusion/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart the WebUI and select the model from the dropdown.&lt;/p&gt;

&lt;p&gt;Many community fine-tunes are SDXL-based and provide better quality than older SD 1.5 workflows. Always check the license for any model or fine-tune you use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Generate images through the local AUTOMATIC1111 API
&lt;/h2&gt;

&lt;p&gt;AUTOMATIC1111 can expose a local REST API when you start the WebUI with the &lt;code&gt;--api&lt;/code&gt; flag. This is useful if you want to build your own app, CLI, or backend service around a local model.&lt;/p&gt;

&lt;p&gt;Example Python request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://127.0.0.1:7860/sdapi/v1/txt2img&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your prompt here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;negative_prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;low quality, blurry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;steps&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;width&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;height&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cfg_scale&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;
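    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# generation can be slow on smaller GPUs&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# stop on an HTTP error instead of decoding a bad body&lt;/span&gt;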

&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;image_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;b64decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No API key is required.&lt;/li&gt;
&lt;li&gt;No external rate limit applies.&lt;/li&gt;
&lt;li&gt;The request stays on your machine.&lt;/li&gt;
&lt;li&gt;No cloud content filter sits in the request path.&lt;/li&gt;
&lt;/ul&gt;
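
&lt;p&gt;The request above saves only the first image. When a response carries several images, a small helper can decode and write all of them. This is a minimal sketch that assumes the same response shape as the example (an &lt;code&gt;images&lt;/code&gt; list of base64 strings); the function name &lt;code&gt;save_images&lt;/code&gt; and its parameters are hypothetical.&lt;/p&gt;

```python
import base64
from pathlib import Path


def save_images(payload: dict, out_dir: str, prefix: str = "output") -> list:
    """Decode every base64 string in payload["images"] and write it to out_dir.

    Assumes a JSON object with an "images" list of base64-encoded strings,
    as in the txt2img example above. Returns the list of written paths.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for index, encoded in enumerate(payload.get("images", [])):
        path = target / f"{prefix}_{index}.png"
        path.write_bytes(base64.b64decode(encoded))
        written.append(path)
    return written
```

&lt;p&gt;With the earlier example, this would be called as &lt;code&gt;save_images(response.json(), "outputs")&lt;/code&gt;.&lt;/p&gt;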

&lt;h2&gt;
  
  
  Set up FLUX locally
&lt;/h2&gt;

&lt;p&gt;FLUX from Black Forest Labs produces sharp, high-quality output in many workflows. FLUX.1-schnell is the fastest variant and is released under the Apache 2.0 license, which permits commercial and personal use. Check the license of any other variant before using it commercially.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run FLUX with &lt;code&gt;diffusers&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Install dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;diffusers torch transformers accelerate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Generate an image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;diffusers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FluxPipeline&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="n"&gt;pipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FluxPipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;black-forest-labs/FLUX.1-schnell&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;torch_dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# use "mps" for Apple Silicon
&lt;/span&gt;
&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a photorealistic portrait of a red fox in a forest at dawn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;height&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_inference_steps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_sequence_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;guidance_scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fox.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;FLUX.1-schnell&lt;/code&gt; uses a small number of inference steps.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;cuda&lt;/code&gt; for NVIDIA GPUs.&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;mps&lt;/code&gt; for Apple Silicon.&lt;/li&gt;
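&lt;li&gt;First run downloads the model weights.&lt;/li&gt;
&lt;li&gt;If the pipeline does not fit in GPU memory, replace &lt;code&gt;pipe.to("cuda")&lt;/code&gt; with &lt;code&gt;pipe.enable_model_cpu_offload()&lt;/code&gt;, which trades speed for lower VRAM use.&lt;/li&gt;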
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Run FLUX with ComfyUI
&lt;/h3&gt;

&lt;p&gt;ComfyUI is recommended if you want advanced workflows, node-based editing, and reusable generation graphs.&lt;/p&gt;

&lt;p&gt;Install ComfyUI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/comfyanonymous/ComfyUI
&lt;span class="nb"&gt;cd &lt;/span&gt;ComfyUI
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
python main.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Download FLUX model weights from Hugging Face.&lt;/li&gt;
&lt;li&gt;Place them in one of the supported ComfyUI model directories, such as:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ComfyUI/models/unet/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ComfyUI/models/diffusion_models/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Restart ComfyUI so it picks up the new weights.&lt;/li&gt;
&lt;li&gt;Import a community workflow JSON or build your own graph.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;ComfyUI is a good choice when you need repeatable pipelines, image-to-image flows, ControlNet-style conditioning, or multiple model stages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Test image-generation APIs with Apidog mocks
&lt;/h2&gt;

&lt;p&gt;If you are integrating an image API, do not only test the happy path. Your application should handle at least these states:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;200&lt;/code&gt; successful generation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;400&lt;/code&gt; content policy rejection&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;429&lt;/code&gt; rate limit&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;503&lt;/code&gt; model overload or timeout&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Testing every branch against a real provider can waste credits and make local development slow. Instead, create mock responses first.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example: mock a Grok-style image endpoint
&lt;/h3&gt;

&lt;p&gt;Create this endpoint in Apidog:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://api.x.ai/v1/images/generations
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add a mock expectation for a successful response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"created"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1710000000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://example.com/test-image.png"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then add another mock expectation that matches a test keyword, such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;trigger_policy_error
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Return a &lt;code&gt;400&lt;/code&gt; response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Your request was rejected as a result of our safety system."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"invalid_request_error"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"content_policy_violation"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now your frontend can verify that it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shows the correct user-facing message.&lt;/li&gt;
&lt;li&gt;Does not retry policy errors, because the same prompt will be rejected again.&lt;/li&gt;
&lt;li&gt;Logs the error for debugging.&lt;/li&gt;
&lt;li&gt;Keeps the generation UI in a valid state.&lt;/li&gt;
&lt;/ul&gt;
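
&lt;p&gt;One way to make those checks concrete is to map each response to a coarse UI state before rendering. The sketch below assumes error bodies shaped like the mock above, with an &lt;code&gt;error.code&lt;/code&gt; field; the function name &lt;code&gt;classify_response&lt;/code&gt; and the state labels are hypothetical, and a real provider may use different codes.&lt;/p&gt;

```python
def classify_response(status: int, body: dict) -> str:
    """Map an HTTP status and parsed JSON body to a coarse UI state.

    Assumes errors look like {"error": {"code": "..."}}, as in the mock above.
    """
    error = body.get("error") or {}
    code = error.get("code", "")
    if status == 200:
        return "success"
    if status == 400 and code == "content_policy_violation":
        return "policy_rejected"  # show a message, do not retry
    if status == 429:
        return "rate_limited"  # retry later
    if status == 503:
        return "unavailable"  # retry later
    return "error"  # anything else: log and surface a generic failure
```

&lt;p&gt;Keeping this mapping in one function makes it easy to run against every mock expectation and confirm each branch is reachable.&lt;/p&gt;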

&lt;h3&gt;
  
  
  Mock rate limits
&lt;/h3&gt;

&lt;p&gt;Add another mock response with status &lt;code&gt;429&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Rate limit exceeded. Please try again later."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rate_limit_error"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rate_limit_exceeded"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your client should handle this differently from a policy rejection. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;shouldRetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;code&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;content_policy_violation&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Mock AUTOMATIC1111 locally
&lt;/h3&gt;

&lt;p&gt;You can also mock the AUTOMATIC1111 response shape before your GPU environment is ready.&lt;/p&gt;

&lt;p&gt;Example mock response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"images"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"BASE64_IMAGE_DATA_HERE"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test prompt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"steps"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"width"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"height"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{}"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets frontend developers build the UI while the model runtime is still being configured.&lt;/p&gt;
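
&lt;p&gt;As a rough sketch of how a frontend might consume that mock, the snippet below turns each base64 string into a data URL. The &lt;code&gt;Txt2ImgResponse&lt;/code&gt; type and the &lt;code&gt;toDataUrls&lt;/code&gt; helper are hypothetical names for illustration, and the PNG MIME type is an assumption you should check against your own runtime settings.&lt;/p&gt;

```typescript
// Shape of the mock response shown above. Only `images` is used here.
interface Txt2ImgResponse {
  images: string[];
  parameters: { [key: string]: unknown };
  info: string;
}

// Assumption: each entry in `images` is base64-encoded PNG data.
// Pass a different MIME type if your runtime returns another format.
function toDataUrls(response: Txt2ImgResponse, mimeType: string = "image/png"): string[] {
  return response.images.map((b64) => `data:${mimeType};base64,${b64}`);
}

// Same values as the mock response above.
const mockResponse: Txt2ImgResponse = {
  images: ["BASE64_IMAGE_DATA_HERE"],
  parameters: { prompt: "test prompt", steps: 20, width: 1024, height: 1024 },
  info: "{}",
};

const dataUrls = toDataUrls(mockResponse);
```

&lt;p&gt;Each returned string can be assigned to an image element's &lt;code&gt;src&lt;/code&gt;, so the UI code stays the same when the mock is replaced by the real local endpoint.&lt;/p&gt;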

&lt;h2&gt;
  
  
  Which option should you choose?
&lt;/h2&gt;

&lt;p&gt;Use this decision path.&lt;/p&gt;

&lt;h3&gt;
  
  
  You need cloud generation with fewer restrictions
&lt;/h3&gt;

&lt;p&gt;Start with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Leonardo AI paid plan with available mature-content settings.&lt;/li&gt;
&lt;li&gt;Grok Imagine through SuperGrok or API access.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both are more permissive than DALL-E or Firefly for mature artistic content, but neither is unrestricted.&lt;/p&gt;

&lt;h3&gt;
  
  
  You need genuinely no service-level restrictions and have a GPU
&lt;/h3&gt;

&lt;p&gt;Use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;FLUX.1-schnell&lt;/code&gt; with &lt;code&gt;diffusers&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;FLUX through ComfyUI&lt;/li&gt;
&lt;li&gt;SDXL through AUTOMATIC1111&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives you full local control over the generation pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  You want the easiest local setup
&lt;/h3&gt;

&lt;p&gt;Use AUTOMATIC1111 with an SDXL-based model.&lt;/p&gt;

&lt;p&gt;It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browser UI&lt;/li&gt;
&lt;li&gt;Local REST API&lt;/li&gt;
&lt;li&gt;Large community support&lt;/li&gt;
&lt;li&gt;Many compatible fine-tunes and extensions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  You use a Mac without a discrete GPU
&lt;/h3&gt;

&lt;p&gt;Use FLUX or Stable Diffusion with the MPS backend on Apple Silicon.&lt;/p&gt;

&lt;p&gt;Expect slower generation than NVIDIA CUDA, but the workflow is functional for local experimentation.&lt;/p&gt;

&lt;h3&gt;
  
  
  You need commercial-safe cloud generation
&lt;/h3&gt;

&lt;p&gt;Use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adobe Firefly&lt;/li&gt;
&lt;li&gt;DALL-E 3&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tools are built for safer commercial workflows and stricter policy enforcement.&lt;/p&gt;

&lt;h3&gt;
  
  
  You are building an image-generation product
&lt;/h3&gt;

&lt;p&gt;Before calling the real API, mock:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Success responses&lt;/li&gt;
&lt;li&gt;Policy rejections&lt;/li&gt;
&lt;li&gt;Rate limits&lt;/li&gt;
&lt;li&gt;Timeouts&lt;/li&gt;
&lt;li&gt;Provider-specific error formats&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This keeps frontend and backend development moving without burning credits on every test run.&lt;/p&gt;

&lt;p&gt;Hypereal is a hosted inference platform that gives API access to many of the same open models you would run locally, including image and video models. It sits between fully local generation and large cloud providers if you want open-model endpoints without managing GPUs yourself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fttg7p59t8i5kv9y999a9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fttg7p59t8i5kv9y999a9.png" alt="Image" width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;No cloud image generator gives you genuinely unrestricted generation. Grok Imagine and Leonardo AI are among the more permissive cloud options for mature artistic content, but they still enforce platform policies.&lt;/p&gt;

&lt;p&gt;If your requirement is full control, run the model locally. Stable Diffusion, FLUX, and ComfyUI work on consumer hardware, have active communities, and support practical developer workflows through local APIs and reusable pipelines.&lt;/p&gt;

&lt;p&gt;The setup takes some effort, but after installation your main limits are hardware, model choice, and licensing.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Which AI image generator has no restrictions at all?
&lt;/h3&gt;

&lt;p&gt;Only local tools such as Stable Diffusion, FLUX, and ComfyUI running on your own hardware avoid cloud service-level content policies. Cloud services enforce restrictions through their hosted APIs and model-serving layers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Grok Imagine still free in 2026?
&lt;/h3&gt;

&lt;p&gt;According to the original article, no. xAI removed the free tier on March 19, 2026, and image generation requires SuperGrok at $30/month.&lt;/p&gt;

&lt;h3&gt;
  
  
  What GPU do I need for local AI image generation?
&lt;/h3&gt;

&lt;p&gt;For SDXL or FLUX.1-schnell, use an NVIDIA GPU with at least 8GB VRAM, such as an RTX 3060 or better. FLUX.1-dev and heavier workflows benefit from 12GB+ VRAM, such as the 12GB variant of the RTX 3080 or better.&lt;/p&gt;

&lt;p&gt;Apple Silicon Macs can run local generation through the MPS backend, but performance is slower than comparable NVIDIA hardware.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is it legal to run unrestricted local image generation?
&lt;/h3&gt;

&lt;p&gt;Running local models may be legal, but what you generate is your responsibility under the laws of your jurisdiction. Content involving real people without consent, minors, or other prohibited categories can carry legal risk regardless of whether a filter blocks it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use local image-generation models commercially?
&lt;/h3&gt;

&lt;p&gt;It depends on the model license.&lt;/p&gt;

&lt;p&gt;From the original article:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FLUX.1-schnell uses Apache 2.0 and allows commercial use.&lt;/li&gt;
&lt;li&gt;FLUX.1-dev is non-commercial only.&lt;/li&gt;
&lt;li&gt;Stable Diffusion base models such as SD 1.5 and SDXL generally allow commercial use.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Always check the license for the exact model and fine-tune you use.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the best free AI image generator with the fewest restrictions?
&lt;/h3&gt;

&lt;p&gt;For cloud tools, Ideogram and Leonardo AI free tiers are among the more permissive free options mentioned in the original article.&lt;/p&gt;

&lt;p&gt;For local generation, FLUX.1-schnell with ComfyUI or &lt;code&gt;diffusers&lt;/code&gt; is a strong option if you have compatible hardware.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I test an image-generation API without spending credits?
&lt;/h3&gt;

&lt;p&gt;Use Apidog Smart Mock to define fake responses for each state:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;200&lt;/code&gt; success&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;400&lt;/code&gt; content policy rejection&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;429&lt;/code&gt; rate limit&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;503&lt;/code&gt; timeout or overload&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Point your frontend at the mock during development, then switch to the real provider for final integration testing.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How to build long-running AI agents with Claude?</title>
      <dc:creator>Preecha</dc:creator>
      <pubDate>Wed, 03 Jun 2026 13:02:11 +0000</pubDate>
      <link>https://dev.to/preecha/how-to-build-long-running-ai-agents-with-claude--2llj</link>
      <guid>https://dev.to/preecha/how-to-build-long-running-ai-agents-with-claude--2llj</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Claude Managed Agents is Anthropic's hosted runtime for production agents. It provides sandboxed execution, long-running sessions, scoped permissions, tracing, and optional multi-agent coordination without requiring your team to build that infrastructure from scratch. If your agent needs to call internal tools or third-party APIs, or to run long workflows, Apidog helps you validate those tool contracts before you let an agent touch real systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Claude Managed Agents targets one of the biggest reasons agent projects stall: the runtime is harder to ship than the prompt.&lt;/p&gt;

&lt;p&gt;Anthropic now offers a hosted way to run long-lived agents with sandboxing, permissions, tracing, and session persistence built in. That means teams can spend less time building queues, workers, session storage, retry logic, and observability, and more time shipping useful workflows.&lt;/p&gt;

&lt;p&gt;For API teams, the hard part is no longer just whether Claude can reason through a task. The hard part is whether the agent can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;call the right tools safely&lt;/li&gt;
&lt;li&gt;handle malformed responses&lt;/li&gt;
&lt;li&gt;recover from failed API calls&lt;/li&gt;
&lt;li&gt;respect permission boundaries&lt;/li&gt;
&lt;li&gt;keep working when a task runs longer than a normal chat request&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you plan to expose internal APIs or tool endpoints to an agent, test that surface before launch. Apidog gives you a direct way to mock tool endpoints, validate JSON Schema, chain multi-step test scenarios, and run regression checks in CI with Apidog CLI.&lt;/p&gt;

&lt;p&gt;That is a safer starting point than giving a new hosted agent live access and discovering contract bugs in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why production agents are still hard to ship
&lt;/h2&gt;

&lt;p&gt;A weekend demo agent is easy. A production agent is not.&lt;/p&gt;

&lt;p&gt;Once you move beyond a single request and response, the operational work grows quickly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Secure code execution for file generation, data transformation, or custom scripts&lt;/li&gt;
&lt;li&gt;Persistent state that survives network drops and browser refreshes&lt;/li&gt;
&lt;li&gt;Permission boundaries so an agent can read one system without silently editing another&lt;/li&gt;
&lt;li&gt;Traces for debugging incidents&lt;/li&gt;
&lt;li&gt;Retry and recovery logic for failed steps&lt;/li&gt;
&lt;li&gt;Predictable contracts for the APIs and tools the agent calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where many teams get stuck between prototype and launch. The model keeps improving, but the runtime still consumes engineering time.&lt;/p&gt;

&lt;p&gt;The same pattern appears across coding assistants, research agents, meeting prep tools, and workflow automation products: the agent runtime becomes a product of its own.&lt;/p&gt;

&lt;p&gt;Claude Managed Agents is Anthropic's attempt to collapse that runtime layer into a managed service.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Claude Managed Agents includes
&lt;/h2&gt;

&lt;p&gt;According to Anthropic's launch post, Claude Managed Agents combines a Claude-tuned orchestration harness with hosted production infrastructure.&lt;/p&gt;

&lt;p&gt;For API and platform teams, five capabilities matter most.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Hosted agent runtime
&lt;/h3&gt;

&lt;p&gt;You define the job, tool access, and guardrails. Anthropic runs the agent loop on hosted infrastructure.&lt;/p&gt;

&lt;p&gt;That removes a large amount of backend work your team would otherwise need to build, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;queue management&lt;/li&gt;
&lt;li&gt;sandbox workers&lt;/li&gt;
&lt;li&gt;session lifecycle handling&lt;/li&gt;
&lt;li&gt;execution control&lt;/li&gt;
&lt;li&gt;runtime observability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most teams can already call a model. What they often lack is a reliable runtime for real work.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Long-running sessions
&lt;/h3&gt;

&lt;p&gt;Anthropic says sessions can run for hours and persist outputs and progress even if the client disconnects.&lt;/p&gt;

&lt;p&gt;That matters for workflows such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;research tasks&lt;/li&gt;
&lt;li&gt;report generation&lt;/li&gt;
&lt;li&gt;large file creation&lt;/li&gt;
&lt;li&gt;document processing&lt;/li&gt;
&lt;li&gt;multi-step planning&lt;/li&gt;
&lt;li&gt;background operational work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your agent writes reports, audits codebases, processes documents, or assembles deliverables from several systems, long-running sessions remove a major constraint.&lt;/p&gt;

&lt;p&gt;Instead of designing around short chat windows, you can design around completed work.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Sandboxed execution and governance
&lt;/h3&gt;

&lt;p&gt;The launch emphasizes secure sandboxing, authentication, identity, and scoped permissions.&lt;/p&gt;

&lt;p&gt;That is not a secondary detail. It is the difference between a demo and an enterprise-ready agent.&lt;/p&gt;

&lt;p&gt;An agent that can open a pull request, generate a spreadsheet, or interact with finance data should not have broad access by default.&lt;/p&gt;

&lt;p&gt;Hosted governance gives teams a clearer way to constrain what the runtime can do and gives security reviewers a smaller surface to evaluate.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Built-in tracing and troubleshooting
&lt;/h3&gt;

&lt;p&gt;Anthropic says tool calls, decisions, analytics, and failure modes are visible in Claude Console.&lt;/p&gt;

&lt;p&gt;Good tracing shortens the gap between:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Something failed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This request called this tool, received this response, followed this branch, and failed here.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is especially important when debugging tools instead of prompts. In many agent systems, the weak point is the API contract around the tool, not the model itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Multi-agent coordination in research preview
&lt;/h3&gt;

&lt;p&gt;Anthropic also announced multi-agent coordination, where agents can direct other agents to parallelize work.&lt;/p&gt;

&lt;p&gt;This is still in research preview, so it should not be the primary reason to adopt the platform today. But it signals the direction of the product: from single agents to coordinated teams of agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  How this changes agent architecture
&lt;/h2&gt;

&lt;p&gt;Before Managed Agents, a team usually had two options.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option A: Build the runtime yourself
&lt;/h3&gt;

&lt;p&gt;This gives you maximum control, but you also own the full runtime stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;container or VM isolation&lt;/li&gt;
&lt;li&gt;tool execution lifecycle&lt;/li&gt;
&lt;li&gt;session persistence&lt;/li&gt;
&lt;li&gt;checkpointing&lt;/li&gt;
&lt;li&gt;secrets and credentials&lt;/li&gt;
&lt;li&gt;permissioning&lt;/li&gt;
&lt;li&gt;logs and traces&lt;/li&gt;
&lt;li&gt;retries and recovery&lt;/li&gt;
&lt;li&gt;operations after launch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This path still makes sense when you need unusual infrastructure, strict in-house hosting requirements, or deeply custom orchestration logic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option B: Use a managed runtime
&lt;/h3&gt;

&lt;p&gt;This trades some control for speed.&lt;/p&gt;

&lt;p&gt;The runtime is already hosted, so your team can focus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;task design&lt;/li&gt;
&lt;li&gt;user experience&lt;/li&gt;
&lt;li&gt;tool quality&lt;/li&gt;
&lt;li&gt;permission design&lt;/li&gt;
&lt;li&gt;workflow reliability&lt;/li&gt;
&lt;li&gt;API contract testing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anthropic frames Managed Agents as a way to get to production faster. The launch post also says internal testing on structured file generation showed task success gains of up to 10 points over a standard prompting loop, with the biggest gains on harder problems.&lt;/p&gt;

&lt;p&gt;The important shift is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Hosted agent infrastructure is becoming a product category, not a side project inside your stack.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Claude Managed Agents vs DIY agent infrastructure
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Decision area&lt;/th&gt;
&lt;th&gt;Claude Managed Agents&lt;/th&gt;
&lt;th&gt;DIY runtime&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Time to first production launch&lt;/td&gt;
&lt;td&gt;Fast, because the runtime is already hosted&lt;/td&gt;
&lt;td&gt;Slower, because you build the runtime first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sandboxing and governance&lt;/td&gt;
&lt;td&gt;Built in&lt;/td&gt;
&lt;td&gt;You own the full design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long-running sessions&lt;/td&gt;
&lt;td&gt;Built in&lt;/td&gt;
&lt;td&gt;You build and maintain session state&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tracing&lt;/td&gt;
&lt;td&gt;Available in Claude Console&lt;/td&gt;
&lt;td&gt;You build your own observability layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flexibility&lt;/td&gt;
&lt;td&gt;Good for the supported model and runtime pattern&lt;/td&gt;
&lt;td&gt;Highest flexibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ongoing ops load&lt;/td&gt;
&lt;td&gt;Lower&lt;/td&gt;
&lt;td&gt;Higher&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best fit&lt;/td&gt;
&lt;td&gt;Teams that want to ship agent products quickly&lt;/td&gt;
&lt;td&gt;Teams with unusual infrastructure or strict custom runtime needs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Use &lt;strong&gt;Claude Managed Agents&lt;/strong&gt; if your team wants to ship an agent product this quarter and your differentiator is the workflow, UI, or proprietary tools behind it.&lt;/p&gt;

&lt;p&gt;Use &lt;strong&gt;DIY infrastructure&lt;/strong&gt; if the runtime itself is part of your moat, you need full control over hosting and orchestration, or your security model requires deeper custom handling than a managed service can provide.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing and tradeoffs
&lt;/h2&gt;

&lt;p&gt;Managed Agents uses standard Claude Platform token pricing plus &lt;strong&gt;$0.08 per active session-hour&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That changes how you should think about cost.&lt;/p&gt;

&lt;p&gt;With a normal chat API workflow, cost mostly comes from tokens. With a managed runtime, cost comes from tokens plus elapsed active runtime.&lt;/p&gt;
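
&lt;p&gt;To make that arithmetic concrete, here is a minimal estimator. It uses the $0.08 session-hour rate quoted above. The function name and the example figures are hypothetical, and the token cost is left as an input because it depends on the model and usage.&lt;/p&gt;

```typescript
// Session-hour rate quoted above, in USD.
const SESSION_HOUR_RATE_USD = 0.08;

// Estimated cost of one run = token cost + active runtime cost.
function estimateRunCostUsd(tokenCostUsd: number, activeSessionHours: number): number {
  return tokenCostUsd + SESSION_HOUR_RATE_USD * activeSessionHours;
}

// Hypothetical run: 1.50 USD of tokens over 2.5 active session-hours.
const exampleCost = estimateRunCostUsd(1.5, 2.5);
console.log(exampleCost.toFixed(2)); // prints "1.70"
```

&lt;p&gt;In this example the runtime charge is small next to the token cost. It becomes more noticeable for sessions that stay active for many hours while using few tokens.&lt;/p&gt;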

&lt;p&gt;Design your agents to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;finish work cleanly&lt;/li&gt;
&lt;li&gt;fail fast on invalid inputs&lt;/li&gt;
&lt;li&gt;avoid unnecessary loops&lt;/li&gt;
&lt;li&gt;separate short synchronous tasks from longer background jobs&lt;/li&gt;
&lt;li&gt;set clear timeout behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before adopting it, answer these questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How often will a session run for minutes versus hours?&lt;/li&gt;
&lt;li&gt;How much value does one completed run create for the user?&lt;/li&gt;
&lt;li&gt;Which tasks should stay synchronous?&lt;/li&gt;
&lt;li&gt;Which tasks should move into background execution?&lt;/li&gt;
&lt;li&gt;What should happen when a tool call fails halfway through a workflow?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your agent mostly performs short deterministic calls, a normal API integration may be enough.&lt;/p&gt;

&lt;p&gt;If your agent researches, writes, patches, coordinates tools, and returns a deliverable later, a managed runtime becomes more attractive.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to test agent tool APIs with Apidog before launch
&lt;/h2&gt;

&lt;p&gt;The weak point in many agent launches is not the model. It is the tool layer.&lt;/p&gt;

&lt;p&gt;If your agent can call tools such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;search_customers
create_invoice
open_pr
send_slack_message
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;then every tool is an API contract.&lt;/p&gt;

&lt;p&gt;You need to know what happens when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the payload is malformed&lt;/li&gt;
&lt;li&gt;the schema changes&lt;/li&gt;
&lt;li&gt;a required field disappears&lt;/li&gt;
&lt;li&gt;an enum value changes&lt;/li&gt;
&lt;li&gt;the auth token has the wrong scope&lt;/li&gt;
&lt;li&gt;the downstream service returns a timeout&lt;/li&gt;
&lt;li&gt;the tool returns an error the agent does not expect&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdprzdkwxiczf618g1a36.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdprzdkwxiczf618g1a36.png" alt="Image" width="799" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Apidog fits this workflow because you can model and test tool contracts before the agent reaches production.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Define each tool as an API contract
&lt;/h2&gt;

&lt;p&gt;Start by treating every agent tool as a real API endpoint.&lt;/p&gt;

&lt;p&gt;For example, an internal &lt;code&gt;create_invoice&lt;/code&gt; tool might map to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /invoices
Content-Type: application/json
Authorization: Bearer &amp;lt;token&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"customer_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cus_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"line_items"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"API usage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"quantity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"unit_price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"invoice_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"inv_456"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"draft"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"total"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For an agent, this contract matters because the model will rely on field names, required properties, enums, and error shapes.&lt;/p&gt;

&lt;p&gt;If the contract is ambiguous, the agent behavior becomes harder to debug.&lt;/p&gt;
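
&lt;p&gt;One way to make the contract explicit in code is to mirror it as types plus a runtime guard. The sketch below is based only on the example request above. The type and function names are hypothetical, and a real service would likely enforce more constraints, such as positive quantities or a currency enum.&lt;/p&gt;

```typescript
// Types mirror the example request shown above.
interface LineItem {
  description: string;
  quantity: number;
  unit_price: number;
}

interface CreateInvoiceRequest {
  customer_id: string;
  line_items: LineItem[];
  currency: string;
}

type Fields = { [key: string]: unknown };

// True for any non-null object, so its fields can be inspected safely.
function isRecord(value: unknown): value is Fields {
  if (typeof value !== "object") return false;
  return value !== null;
}

function isLineItem(value: unknown): value is LineItem {
  if (!isRecord(value)) return false;
  if (typeof value.description !== "string") return false;
  if (typeof value.quantity !== "number") return false;
  return typeof value.unit_price === "number";
}

// Runtime guard: rejects payloads with missing or mistyped fields.
function isCreateInvoiceRequest(value: unknown): value is CreateInvoiceRequest {
  if (!isRecord(value)) return false;
  if (typeof value.customer_id !== "string") return false;
  if (typeof value.currency !== "string") return false;
  const items = value.line_items;
  if (!Array.isArray(items)) return false;
  return items.every(isLineItem);
}

// The example request passes; a payload without line_items does not.
const validPayload = isCreateInvoiceRequest({
  customer_id: "cus_123",
  line_items: [{ description: "API usage", quantity: 1, unit_price: 99 }],
  currency: "USD",
});
const invalidPayload = isCreateInvoiceRequest({ customer_id: "cus_123", currency: "USD" });
```

&lt;p&gt;Running a guard like this at the tool boundary means a malformed agent payload is rejected with a clear error instead of reaching the invoice service.&lt;/p&gt;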

&lt;h2&gt;
  
  
  Step 2: Use Smart Mock to stand up tool endpoints early
&lt;/h2&gt;

&lt;p&gt;Smart Mock generates realistic responses from your API spec and respects JSON Schema constraints.&lt;/p&gt;

&lt;p&gt;That gives your team a fast way to stand up fake tool endpoints while the real backend is still changing.&lt;/p&gt;

&lt;p&gt;For agent development, this is useful because you can test planning and tool selection before every downstream service is ready.&lt;/p&gt;

&lt;p&gt;If your managed agent expects fields such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ticket_priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"account_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"acc_001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"open"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Smart Mock can return data that matches the schema instead of hand-written placeholders that hide bugs.&lt;/p&gt;

&lt;p&gt;Use mocks when you need to validate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tool selection&lt;/li&gt;
&lt;li&gt;expected response shape&lt;/li&gt;
&lt;li&gt;error handling&lt;/li&gt;
&lt;li&gt;branching behavior&lt;/li&gt;
&lt;li&gt;multi-step planning before backend completion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;See also API Testing Without Postman in 2026 if you are standardizing this workflow across the team.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Build multi-step Test Scenarios for agent workflows
&lt;/h2&gt;

&lt;p&gt;Apidog Test Scenarios are useful when one tool call feeds the next.&lt;/p&gt;

&lt;p&gt;The docs describe support for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sequential execution&lt;/li&gt;
&lt;li&gt;data passing between requests&lt;/li&gt;
&lt;li&gt;flow control&lt;/li&gt;
&lt;li&gt;predefined test data&lt;/li&gt;
&lt;li&gt;CI/CD integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That maps directly to agent systems.&lt;/p&gt;

&lt;p&gt;A realistic validation flow might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. POST /tasks
2. Extract task_id from the response
3. GET /tasks/{task_id}
4. Assert status transitions
5. Trigger an auth failure
6. Verify the error payload stays within contract
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example assertions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response.status == 200
response.body.task_id exists
response.body.status in ["queued", "running", "completed", "failed"]
response.body.error.code exists when status == "failed"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This kind of scenario catches tool bugs before the agent runtime has to recover from them in production.&lt;/p&gt;
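
&lt;p&gt;The same assertions can also be written as plain code, which is handy for unit tests outside Apidog. This is a sketch under assumptions, not Apidog's assertion syntax: the &lt;code&gt;TaskBody&lt;/code&gt; shape is inferred from the assertions above, and &lt;code&gt;checkTaskResponse&lt;/code&gt; and the &lt;code&gt;task_1&lt;/code&gt; id are hypothetical.&lt;/p&gt;

```typescript
const ALLOWED_STATUSES = ["queued", "running", "completed", "failed"];

// Response body shape inferred from the example assertions above.
interface TaskBody {
  task_id?: string;
  status?: string;
  error?: { code?: string };
}

// Returns a list of failed checks; an empty list means the response passed.
function checkTaskResponse(httpStatus: number, body: TaskBody): string[] {
  const failures: string[] = [];
  if (httpStatus !== 200) failures.push("response.status should be 200");
  if (body.task_id === undefined) failures.push("task_id is missing");
  if (body.status === undefined) {
    failures.push("status is missing");
  } else if (ALLOWED_STATUSES.indexOf(body.status) === -1) {
    failures.push(`unexpected status: ${body.status}`);
  }
  if (body.status === "failed") {
    // A failed task must carry an error code the agent can act on.
    const code = body.error === undefined ? undefined : body.error.code;
    if (code === undefined) failures.push("error.code is missing for a failed task");
  }
  return failures;
}

// A running task passes; a failed task without error.code is flagged.
const runningTask = checkTaskResponse(200, { task_id: "task_1", status: "running" });
const failedTask = checkTaskResponse(200, { task_id: "task_1", status: "failed" });
```

&lt;p&gt;Returning every failed check, rather than stopping at the first, makes it easier to see how far a response has drifted from the contract.&lt;/p&gt;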

&lt;h2&gt;
  
  
  Step 4: Validate contract drift before it breaks the agent
&lt;/h2&gt;

&lt;p&gt;Agents are sensitive to schema drift.&lt;/p&gt;

&lt;p&gt;A renamed field, a looser enum, or a missing nested property can break a tool chain in ways that look like reasoning failures.&lt;/p&gt;

&lt;p&gt;For example, this response may work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"customer_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cus_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"subscription_status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"active"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But this changed response may break the agent if the tool definition was not updated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"customer_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cus_123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"active"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use Apidog to lock down request and response shapes with OpenAPI and JSON Schema, then run scenario-based checks when the backend changes.&lt;/p&gt;

&lt;p&gt;This is especially important if your team generates tool definitions from API specs, because the agent will trust the spec you give it.&lt;/p&gt;
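
&lt;p&gt;As a rough illustration of what a drift check does, the sketch below compares a response against the field names the tool definition expects. It uses only the Python standard library; &lt;code&gt;EXPECTED_FIELDS&lt;/code&gt; and &lt;code&gt;find_drift&lt;/code&gt; are hypothetical names, and a real setup would more likely validate against the full JSON Schema.&lt;/p&gt;

```python
# Hypothetical contract: the fields the tool definition promises to the agent.
EXPECTED_FIELDS = {"customer_id": str, "subscription_status": str}


def find_drift(response, expected=EXPECTED_FIELDS):
    """Compare a response dict against the expected field names and types."""
    missing = sorted(name for name in expected if name not in response)
    unexpected = sorted(name for name in response if name not in expected)
    wrong_type = sorted(
        name
        for name, kind in expected.items()
        if name in response and not isinstance(response[name], kind)
    )
    return {"missing": missing, "unexpected": unexpected, "wrong_type": wrong_type}


original = {"customer_id": "cus_123", "subscription_status": "active"}
changed = {"customer_id": "cus_123", "status": "active"}

print(find_drift(original))  # nothing reported
print(find_drift(changed))   # subscription_status missing, status unexpected
```

&lt;p&gt;Applied to the two responses above, the renamed field shows up as one missing name and one unexpected name, which is usually enough to tell a contract problem apart from a reasoning problem.&lt;/p&gt;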

&lt;h2&gt;
  
  
  Step 5: Add CLI checks to CI for regression coverage
&lt;/h2&gt;

&lt;p&gt;Apidog CLI can run test suites from the command line and output reports, including HTML reports in the generated &lt;code&gt;apidog-reports/&lt;/code&gt; directory.&lt;/p&gt;

&lt;p&gt;That makes it a good fit for pre-merge or pre-deploy checks on agent tools.&lt;/p&gt;

&lt;p&gt;A simple CI policy is enough:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every tool endpoint needs a schema check&lt;/li&gt;
&lt;li&gt;Every write action needs at least one auth failure test&lt;/li&gt;
&lt;li&gt;Every long-running workflow needs a timeout and retry case&lt;/li&gt;
&lt;li&gt;Every high-risk tool needs one negative test for bad state&lt;/li&gt;
&lt;li&gt;Every response used by the agent should have stable field names and documented error shapes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example CI workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Developer opens PR
  -&amp;gt; API spec changes
  -&amp;gt; Apidog CLI runs tool contract tests
  -&amp;gt; Test report is generated
  -&amp;gt; Merge is blocked if schema or scenario checks fail
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you do this, your managed agent enters production with a cleaner tool surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  A simple architecture pattern to start with
&lt;/h2&gt;

&lt;p&gt;You do not need a large agent platform on day one.&lt;/p&gt;

&lt;p&gt;Start with a narrow architecture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User request
  -&amp;gt; Claude Managed Agent session
  -&amp;gt; tool selection
  -&amp;gt; internal APIs and third-party services
  -&amp;gt; result artifact or action
  -&amp;gt; trace review in Claude Console
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before launch, validate the tool layer separately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Apidog spec
  -&amp;gt; Smart Mock
  -&amp;gt; Test Scenarios
  -&amp;gt; CLI regression checks in CI
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This split is healthy: the tool layer can be verified without a live agent session.&lt;/p&gt;

&lt;p&gt;Let Claude Managed Agents handle runtime concerns such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;session management&lt;/li&gt;
&lt;li&gt;hosted execution&lt;/li&gt;
&lt;li&gt;orchestration&lt;/li&gt;
&lt;li&gt;long-running work&lt;/li&gt;
&lt;li&gt;tracing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let Apidog handle API quality concerns such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;contract design&lt;/li&gt;
&lt;li&gt;mock responses&lt;/li&gt;
&lt;li&gt;schema validation&lt;/li&gt;
&lt;li&gt;multi-step tests&lt;/li&gt;
&lt;li&gt;regression checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That keeps the model layer and the API quality layer separate, which is what most teams need.&lt;/p&gt;

&lt;h2&gt;
  
  
  When this launch matters most
&lt;/h2&gt;

&lt;p&gt;Claude Managed Agents is most interesting for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;teams building coding or debugging agents&lt;/li&gt;
&lt;li&gt;teams running document or research workflows that take more than a few minutes&lt;/li&gt;
&lt;li&gt;product teams that want background task execution inside an app&lt;/li&gt;
&lt;li&gt;enterprise teams that need governance, tracing, and scoped permissions&lt;/li&gt;
&lt;li&gt;API teams that already have internal tools and want a faster route to agent products&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your team is still proving the use case, start with a narrow workflow and a small tool surface.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Good first workflow:
User asks for a customer account summary
  -&amp;gt; agent calls customer profile API
  -&amp;gt; agent calls billing status API
  -&amp;gt; agent calls recent tickets API
  -&amp;gt; agent returns a structured summary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Avoid starting with a broad agent that can call every internal API.&lt;/p&gt;

&lt;p&gt;Instead, start with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one user goal&lt;/li&gt;
&lt;li&gt;three to five tools&lt;/li&gt;
&lt;li&gt;explicit permissions&lt;/li&gt;
&lt;li&gt;mocked failure cases&lt;/li&gt;
&lt;li&gt;CI checks for every tool contract&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the workflow works and infrastructure is the bottleneck, Claude Managed Agents is worth serious attention.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Claude Managed Agents is not just another model feature. It is Anthropic's attempt to productize the messy part of agent delivery: hosted execution, persistence, governance, and tracing.&lt;/p&gt;

&lt;p&gt;That shifts the build question from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How do we create an agent runtime?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Which workflows deserve an agent, and how safe are the tools behind it?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That second question is where Apidog fits.&lt;/p&gt;

&lt;p&gt;Before you expose an internal API to a long-running hosted agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model the contract&lt;/li&gt;
&lt;li&gt;mock the responses&lt;/li&gt;
&lt;li&gt;test the happy path&lt;/li&gt;
&lt;li&gt;test the failure paths&lt;/li&gt;
&lt;li&gt;validate auth behavior&lt;/li&gt;
&lt;li&gt;add regression coverage in CI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That work gives the agent a cleaner surface to operate on and gives your team fewer production surprises.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Claude Managed Agents?
&lt;/h3&gt;

&lt;p&gt;Claude Managed Agents is Anthropic's hosted runtime for cloud-based agents on the Claude Platform. It includes sandboxed execution, long-running sessions, tracing, scoped permissions, and hosted orchestration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Claude Managed Agents available now?
&lt;/h3&gt;

&lt;p&gt;Yes. Anthropic announced it as a public beta on April 8, 2026. Some features, such as multi-agent coordination and self-evaluation loops, are still in research preview.&lt;/p&gt;

&lt;h3&gt;
  
  
  How is Claude Managed Agents priced?
&lt;/h3&gt;

&lt;p&gt;Anthropic says standard Claude Platform token pricing applies, plus $0.08 per active session-hour.&lt;/p&gt;

&lt;h3&gt;
  
  
  When should you use Managed Agents instead of building your own runtime?
&lt;/h3&gt;

&lt;p&gt;Use Managed Agents when speed to production matters more than deep runtime customization.&lt;/p&gt;

&lt;p&gt;Build your own runtime if your team needs unusual hosting, strict in-house control, or custom orchestration that a managed platform cannot support.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why should API teams test agent tools separately?
&lt;/h3&gt;

&lt;p&gt;Because many agent failures come from broken tool contracts, auth issues, or schema drift instead of poor reasoning.&lt;/p&gt;

&lt;p&gt;Testing tools separately helps you catch those failures before they reach the agent runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  How can Apidog help with agent tool testing?
&lt;/h3&gt;

&lt;p&gt;Apidog helps you define the tool contract, generate mocked responses from the schema with Smart Mock, chain multi-step validations with Test Scenarios, and run regression checks in CI with Apidog CLI.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Best Google Vertex AI alternatives in 2026: simpler setup, no GCP lock-in</title>
      <dc:creator>Preecha</dc:creator>
      <pubDate>Wed, 03 Jun 2026 01:01:40 +0000</pubDate>
      <link>https://dev.to/preecha/best-google-vertex-ai-alternatives-in-2026-simpler-setup-no-gcp-lock-in-oim</link>
      <guid>https://dev.to/preecha/best-google-vertex-ai-alternatives-in-2026-simpler-setup-no-gcp-lock-in-oim</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Google Vertex AI is a comprehensive ML platform, but it also requires GCP expertise, cloud configuration, and ongoing infrastructure management. If your use case is production AI inference rather than full MLOps, consider alternatives like WaveSpeed, Replicate, Fal.ai, or the OpenAI API. Test candidate providers in Apidog before migrating.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Vertex AI is Google Cloud’s enterprise platform for the full ML lifecycle: training, deployment, evaluation, and monitoring. It is a strong option for teams already invested in GCP and building custom ML pipelines.&lt;/p&gt;

&lt;p&gt;For developers who only need to call an AI model and return a result, Vertex AI can add unnecessary operational overhead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GCP IAM and service account setup&lt;/li&gt;
&lt;li&gt;Region-specific endpoint configuration&lt;/li&gt;
&lt;li&gt;Cloud billing and quota management&lt;/li&gt;
&lt;li&gt;Deployment and infrastructure decisions&lt;/li&gt;
&lt;li&gt;Vendor lock-in to Google Cloud&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your workload is inference-only, a hosted API provider may be faster to implement and easier to maintain.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Vertex AI does well
&lt;/h2&gt;

&lt;p&gt;Vertex AI is designed for teams that need a managed ML platform, not just an inference API.&lt;/p&gt;

&lt;p&gt;Common Vertex AI capabilities include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full ML lifecycle management&lt;/strong&gt;: training, evaluation, deployment, and monitoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom model deployment&lt;/strong&gt;: host your own trained models on Google infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini API access&lt;/strong&gt;: use Google models through the Vertex AI platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GCP integration&lt;/strong&gt;: connect with BigQuery, Cloud Storage, IAM, and other Google Cloud services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use Vertex AI when you need those platform capabilities and already have GCP expertise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Vertex AI creates friction
&lt;/h2&gt;

&lt;p&gt;For many developer teams, the main friction is not model quality. It is setup and operations.&lt;/p&gt;

&lt;p&gt;Typical blockers include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GCP expertise required&lt;/strong&gt;: meaningful setup requires familiarity with Google Cloud IAM, projects, regions, quotas, and billing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Longer setup time&lt;/strong&gt;: new model deployments can take days or weeks depending on the environment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vendor lock-in&lt;/strong&gt;: infrastructure, billing, and operations are tightly coupled to GCP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost complexity&lt;/strong&gt;: GCP pricing can be layered and harder to predict&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overkill for simple inference&lt;/strong&gt;: you may only need an HTTPS API call, not a full MLOps platform&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Top Vertex AI alternatives for inference
&lt;/h2&gt;

&lt;h3&gt;
  
  
  WaveSpeed
&lt;/h3&gt;

&lt;p&gt;WaveSpeed is a hosted inference provider focused on fast setup and access to many visual AI models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Useful when you need:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API-key-based setup&lt;/li&gt;
&lt;li&gt;First request in minutes&lt;/li&gt;
&lt;li&gt;600+ models&lt;/li&gt;
&lt;li&gt;Access to models from the ByteDance and Alibaba ecosystems&lt;/li&gt;
&lt;li&gt;Transparent pay-per-use pricing&lt;/li&gt;
&lt;li&gt;No GCP dependency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of configuring GCP projects, IAM roles, and Vertex AI endpoints, you can call WaveSpeed with a Bearer token.&lt;/p&gt;

&lt;p&gt;Example request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://api.wavespeed.ai/api/v2/bytedance/seedream-4-5
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A professional office building lobby, architectural photography style"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;WaveSpeed is a good fit if your team wants hosted model access without managing cloud ML infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Replicate
&lt;/h3&gt;

&lt;p&gt;Replicate is a practical option for teams that want access to open-source models through a simple API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Useful when you need:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1,000+ community models&lt;/li&gt;
&lt;li&gt;Simple setup&lt;/li&gt;
&lt;li&gt;No GCP dependency&lt;/li&gt;
&lt;li&gt;Open-source model access&lt;/li&gt;
&lt;li&gt;Support for custom models through Cog&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Replicate is often a straightforward path when you want to experiment with multiple open-source models without managing infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="http://fal.ai/?ref=apidog.com" rel="noopener noreferrer"&gt;Fal.ai&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Fal.ai focuses on serverless inference and speed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Useful when you need:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;600+ serverless models&lt;/li&gt;
&lt;li&gt;Fast inference&lt;/li&gt;
&lt;li&gt;Simple API access&lt;/li&gt;
&lt;li&gt;No GCP dependency&lt;/li&gt;
&lt;li&gt;Per-output pricing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fal.ai can be a good fit for latency-sensitive applications that need hosted inference without cloud platform setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenAI API
&lt;/h3&gt;

&lt;p&gt;The OpenAI API is a strong alternative if your Vertex AI usage is mainly centered on general-purpose text, image, audio, or multimodal capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Useful when you need:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPT models&lt;/li&gt;
&lt;li&gt;Image generation&lt;/li&gt;
&lt;li&gt;Whisper&lt;/li&gt;
&lt;li&gt;Strong API documentation&lt;/li&gt;
&lt;li&gt;Simple authentication&lt;/li&gt;
&lt;li&gt;No GCP dependency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example image generation request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://api.openai.com/v1/images/generations
Authorization: Bearer {{OPENAI_API_KEY}}
Content-Type: application/json
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-image-1.5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A professional office building lobby, architectural photography style"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1024x1024"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Comparison table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Setup time&lt;/th&gt;
&lt;th&gt;GCP required&lt;/th&gt;
&lt;th&gt;Custom models&lt;/th&gt;
&lt;th&gt;Price transparency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vertex AI&lt;/td&gt;
&lt;td&gt;Days to weeks&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Complex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WaveSpeed&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Simple&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replicate&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes, with Cog&lt;/td&gt;
&lt;td&gt;Per-second&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="http://fal.ai/?ref=apidog.com" rel="noopener noreferrer"&gt;Fal.ai&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Per-output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI API&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Fine-tuning&lt;/td&gt;
&lt;td&gt;Per-token&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Testing alternatives with Apidog
&lt;/h2&gt;

&lt;p&gt;Before migrating away from Vertex AI, test the same prompts against each provider.&lt;/p&gt;

&lt;p&gt;Vertex AI usually requires GCP authentication, such as service accounts or OAuth tokens, before you can test an endpoint. Most hosted inference APIs use simpler Bearer token authentication.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Create environments
&lt;/h3&gt;

&lt;p&gt;Create one Apidog environment per provider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Vertex AI&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;WaveSpeed&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Replicate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Fal.ai&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;OpenAI&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Add provider credentials as Secret variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WAVESPEED_API_KEY
OPENAI_API_KEY
REPLICATE_API_KEY
FAL_API_KEY
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Add provider requests
&lt;/h3&gt;

&lt;p&gt;For WaveSpeed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://api.wavespeed.ai/api/v2/bytedance/seedream-4-5
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A professional office building lobby, architectural photography style"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For OpenAI image generation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://api.openai.com/v1/images/generations
Authorization: Bearer {{OPENAI_API_KEY}}
Content-Type: application/json
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-image-1.5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A professional office building lobby, architectural photography style"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1024x1024"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Run the same production prompts
&lt;/h3&gt;

&lt;p&gt;Use the same prompts, parameters, and expected output criteria across providers.&lt;/p&gt;

&lt;p&gt;Compare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Response time&lt;/li&gt;
&lt;li&gt;Output quality&lt;/li&gt;
&lt;li&gt;Failure rate&lt;/li&gt;
&lt;li&gt;Response schema&lt;/li&gt;
&lt;li&gt;Pricing model&lt;/li&gt;
&lt;li&gt;Authentication complexity&lt;/li&gt;
&lt;li&gt;Integration effort&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 4: Validate response parsing
&lt;/h3&gt;

&lt;p&gt;Each provider returns different JSON. Before switching traffic, confirm your application can parse the new response shape.&lt;/p&gt;

&lt;p&gt;For example, do not assume every provider returns image URLs or generated text in the same field.&lt;/p&gt;
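
&lt;p&gt;One way to keep that parsing explicit is a per-provider lookup that fails loudly when the shape changes. The sketch below is illustrative only: the provider keys and field paths in &lt;code&gt;IMAGE_URL_PATHS&lt;/code&gt; are made up, so replace them with the shapes you observe in your own test runs.&lt;/p&gt;

```python
# Hypothetical field paths, for illustration only. Check each provider's
# documentation for the real response shape before relying on these.
IMAGE_URL_PATHS = {
    "provider_a": ("data", 0, "url"),
    "provider_b": ("output", 0),
    "provider_c": ("images", 0, "url"),
}


def extract_image_url(provider, body):
    """Walk the configured path and fail loudly if the shape has changed."""
    value = body
    for step in IMAGE_URL_PATHS[provider]:
        try:
            value = value[step]
        except (KeyError, IndexError, TypeError) as exc:
            raise ValueError(f"{provider}: no value at step {step!r}") from exc
    if not isinstance(value, str):
        raise ValueError(f"{provider}: expected a URL string, got {type(value).__name__}")
    return value


print(extract_image_url("provider_a", {"data": [{"url": "https://example.com/1.png"}]}))
```

&lt;p&gt;Keeping the paths in one table means a provider switch changes data, not parsing logic scattered through the application.&lt;/p&gt;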

&lt;h2&gt;
  
  
  Migration checklist from Vertex AI
&lt;/h2&gt;

&lt;p&gt;Use this checklist for inference-only migrations.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Identify current Vertex AI usage
&lt;/h3&gt;

&lt;p&gt;Document what you are using Vertex AI for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text generation&lt;/li&gt;
&lt;li&gt;Image generation&lt;/li&gt;
&lt;li&gt;Embeddings&lt;/li&gt;
&lt;li&gt;Audio&lt;/li&gt;
&lt;li&gt;Custom model inference&lt;/li&gt;
&lt;li&gt;Batch jobs&lt;/li&gt;
&lt;li&gt;Monitoring&lt;/li&gt;
&lt;li&gt;Training pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you rely on Vertex AI training, monitoring, or explainability, an inference API alone will not replace those features.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Map each model to an alternative
&lt;/h3&gt;

&lt;p&gt;For each Vertex AI model or endpoint, identify the closest replacement.&lt;/p&gt;

&lt;p&gt;Example mapping:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Current usage&lt;/th&gt;
&lt;th&gt;Possible alternative&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemini text generation&lt;/td&gt;
&lt;td&gt;OpenAI API or Gemini API directly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image generation&lt;/td&gt;
&lt;td&gt;WaveSpeed, Fal.ai, OpenAI API, Replicate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Open-source model inference&lt;/td&gt;
&lt;td&gt;Replicate or Fal.ai&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visual AI model access&lt;/td&gt;
&lt;td&gt;WaveSpeed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom model hosting&lt;/td&gt;
&lt;td&gt;Replicate with Cog or another model-hosting option&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  3. Update authentication
&lt;/h3&gt;

&lt;p&gt;Vertex AI commonly uses GCP credentials.&lt;/p&gt;

&lt;p&gt;Alternatives usually use Bearer tokens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Authorization: Bearer {{API_KEY}}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This simplifies local testing, CI, and API client setup.&lt;/p&gt;
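
&lt;p&gt;In application code, the Bearer pattern might look like the following sketch, which uses only the Python standard library. The function name &lt;code&gt;build_request&lt;/code&gt;, the example URL, and the environment variable name are assumptions for illustration; the request is built but not sent.&lt;/p&gt;

```python
import json
import os
import urllib.request


def build_request(url, payload, key_env_var):
    """Build (but do not send) a JSON POST request with a Bearer token."""
    api_key = os.environ.get(key_env_var)
    if not api_key:
        raise RuntimeError(f"environment variable {key_env_var} is not set")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; a real key should come from your secret store.
os.environ.setdefault("DEMO_API_KEY", "test-key")
request = build_request("https://example.com/v1/generate", {"prompt": "hello"}, "DEMO_API_KEY")
print(request.get_method(), request.full_url)
```

&lt;p&gt;Reading the key from the environment keeps it out of source control and lets CI inject a different key per environment.&lt;/p&gt;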

&lt;h3&gt;
  
  
  4. Update endpoints
&lt;/h3&gt;

&lt;p&gt;Vertex AI endpoints follow GCP URL patterns and often include project, region, and publisher-specific paths.&lt;/p&gt;

&lt;p&gt;Hosted APIs usually expose standard HTTPS endpoints.&lt;/p&gt;

&lt;p&gt;Before migration, update:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Base URL&lt;/li&gt;
&lt;li&gt;Endpoint path&lt;/li&gt;
&lt;li&gt;Headers&lt;/li&gt;
&lt;li&gt;Request body&lt;/li&gt;
&lt;li&gt;Query parameters&lt;/li&gt;
&lt;li&gt;Timeout settings&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Test in Apidog before changing production traffic
&lt;/h3&gt;

&lt;p&gt;Run your production prompts against the new provider first.&lt;/p&gt;

&lt;p&gt;Validate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Request body format&lt;/li&gt;
&lt;li&gt;Auth headers&lt;/li&gt;
&lt;li&gt;Model parameters&lt;/li&gt;
&lt;li&gt;Response schema&lt;/li&gt;
&lt;li&gt;Error responses&lt;/li&gt;
&lt;li&gt;Rate limits&lt;/li&gt;
&lt;li&gt;Timeout behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Update response parsing
&lt;/h3&gt;

&lt;p&gt;Do not migrate by only changing the URL. Response formats differ.&lt;/p&gt;

&lt;p&gt;Update your application code to handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Output field names&lt;/li&gt;
&lt;li&gt;Nested JSON structures&lt;/li&gt;
&lt;li&gt;Async job IDs&lt;/li&gt;
&lt;li&gt;Polling endpoints, if required&lt;/li&gt;
&lt;li&gt;Error codes&lt;/li&gt;
&lt;li&gt;Retry behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  7. Cut over gradually
&lt;/h3&gt;

&lt;p&gt;For production applications, avoid a hard switch when possible.&lt;/p&gt;

&lt;p&gt;Use one of these patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Route a small percentage of traffic to the new provider&lt;/li&gt;
&lt;li&gt;Run both providers in parallel and compare outputs&lt;/li&gt;
&lt;li&gt;Keep Vertex AI as a fallback during rollout&lt;/li&gt;
&lt;li&gt;Monitor latency, errors, and output quality&lt;/li&gt;
&lt;/ul&gt;
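
&lt;p&gt;The first and third patterns can be combined in a few lines. This is a minimal sketch under stated assumptions: &lt;code&gt;route&lt;/code&gt; and &lt;code&gt;call_with_fallback&lt;/code&gt; are hypothetical names, and the two providers are passed in as plain callables that take a prompt and return a result.&lt;/p&gt;

```python
import random


def call_with_fallback(prompt, primary, fallback):
    """Try the primary provider and fall back to the other one on any error."""
    try:
        return primary(prompt)
    except Exception:
        return fallback(prompt)


def route(prompt, new_provider, vertex, rollout_fraction, rng=random.random):
    """Send roughly rollout_fraction of requests to the new provider."""
    if rollout_fraction > rng():
        return call_with_fallback(prompt, new_provider, vertex)
    return vertex(prompt)


# Stand-in callables; real ones would wrap the provider HTTP clients.
def fake_new(prompt):
    return "new:" + prompt


def fake_vertex(prompt):
    return "vertex:" + prompt


print(route("hello", fake_new, fake_vertex, rollout_fraction=0.1))
```

&lt;p&gt;Passing &lt;code&gt;rng&lt;/code&gt; as a parameter makes the split deterministic in tests. In production you would also log which provider served each request so latency and error rates can be compared.&lt;/p&gt;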

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Can I access Google’s Gemini models without Vertex AI?
&lt;/h3&gt;

&lt;p&gt;Yes. Google’s Gemini API is available directly through Google AI Studio with simpler authentication than Vertex AI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Vertex AI cheaper than alternatives for high-volume workloads?
&lt;/h3&gt;

&lt;p&gt;For very high-volume enterprise workloads with committed use discounts, Vertex AI can be cost-competitive. For variable workloads without committed use, pay-per-use alternatives are typically simpler and may be cheaper.&lt;/p&gt;

&lt;h3&gt;
  
  
  What about Vertex AI’s monitoring and MLOps features?
&lt;/h3&gt;

&lt;p&gt;Simple inference APIs do not replace Vertex AI’s full MLOps features. If you rely on Vertex AI training pipeline management, model monitoring, or explainability tools, you will need separate tooling to replace those capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  How long does migration from Vertex AI take?
&lt;/h3&gt;

&lt;p&gt;For inference-only workloads, updating the API endpoint and authentication can take a few hours. A complete migration, including testing and production cutover, usually takes 1–3 days depending on workload complexity.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Best RunPod alternatives in 2026: pay per inference, not per hour</title>
      <dc:creator>Preecha</dc:creator>
      <pubDate>Tue, 02 Jun 2026 13:01:42 +0000</pubDate>
      <link>https://dev.to/preecha/best-runpod-alternatives-in-2026-pay-per-inference-not-per-hour-2l34</link>
      <guid>https://dev.to/preecha/best-runpod-alternatives-in-2026-pay-per-inference-not-per-hour-2l34</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;RunPod is a GPU cloud marketplace charging $0.34-$0.79/hour regardless of actual usage. Its main limitations are idle cost, because you pay even when your GPU is not generating; complex setup, including Docker containers and ML framework installation; and manual scaling. Simpler alternatives include WaveSpeed for pay-per-inference with zero setup, Replicate for API access to 1,000+ models, and Fal.ai for fast serverless inference.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;RunPod fills a real need: cheap, flexible GPU access for workloads that require raw compute. If you are running custom training jobs, fine-tuning experiments, or workloads that do not fit standard inference APIs, hourly GPU rental can be the right model.&lt;/p&gt;

&lt;p&gt;For teams using RunPod mainly for model inference, the economics often become harder to justify. You pay $0.34/hour whether the GPU is serving 100 requests or sitting idle. You also maintain Docker containers, install ML frameworks, and manage deployment details yourself. Managed inference APIs remove much of that operational overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  What RunPod provides
&lt;/h2&gt;

&lt;p&gt;RunPod is useful when you need control over the GPU environment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPU marketplace&lt;/strong&gt;: Consumer GPUs such as RTX 3090 and 4090, plus enterprise GPUs such as A100 and H100, available at hourly rates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexible deployment&lt;/strong&gt;: Run any Docker container with any ML framework&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent storage&lt;/strong&gt;: Keep datasets, model weights, and generated assets across sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pod and serverless options&lt;/strong&gt;: Use always-on pods or serverless functions depending on the workload&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The limitations at production scale
&lt;/h2&gt;

&lt;p&gt;The trade-off is that you own more of the infrastructure layer.&lt;/p&gt;

&lt;p&gt;Common production issues include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Idle cost&lt;/strong&gt;: $0.34-$0.79/hour whether the GPU is generating or not; running 24/7 adds up to roughly $245-$570/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Setup overhead&lt;/strong&gt;: Docker configuration, CUDA setup, framework installation, and model loading before the first inference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual scaling&lt;/strong&gt;: No automatic scale-to-zero for always-on pods; you manage capacity and replica counts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment time&lt;/strong&gt;: New models can take hours to configure, deploy, and validate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintenance&lt;/strong&gt;: Framework updates, security patches, monitoring, and runtime issues stay with your team&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For inference workloads, these costs matter most when traffic is bursty or unpredictable.&lt;/p&gt;
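
&lt;p&gt;To make the idle-cost figure concrete, here is a minimal Python sketch of the arithmetic behind it. The hourly rates are the ones quoted above; the 30-day month and the function name &lt;code&gt;always_on_monthly_cost&lt;/code&gt; are assumptions made for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a 30-day billing month


def always_on_monthly_cost(hourly_rate, days=DAYS_PER_MONTH):
    """Cost of keeping one pod running around the clock for a month."""
    return hourly_rate * HOURS_PER_DAY * days


for rate in (0.34, 0.79):  # hourly rates quoted above
    print(f"${rate:.2f}/hour: ${always_on_monthly_cost(rate):.2f}/month")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under these assumptions the two rates come out at $244.80 and $568.80 per month, which is where the rough $245-$570 range comes from.&lt;/p&gt;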

&lt;h2&gt;
  
  
  Top alternatives for inference workloads
&lt;/h2&gt;

&lt;h3&gt;
  
  
  WaveSpeed
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best fit:&lt;/strong&gt; Standard image and video generation workloads where you want pay-per-inference pricing.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pricing&lt;/strong&gt;: Per-inference only, zero idle costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt;: 600+ pre-deployed models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Setup&lt;/strong&gt;: API key, then first request in minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Potential savings&lt;/strong&gt;: 85-95% versus RunPod for sporadic workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;WaveSpeed’s pay-per-inference model eliminates idle costs. You pay only when generating. For teams using RunPod for standard image or video generation models, the cost difference can be significant: around $0.02-$0.08 per image instead of paying for GPU-hours whether requests are running or not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Replicate
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best fit:&lt;/strong&gt; Teams that want access to a large model catalog without running containers.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pricing&lt;/strong&gt;: Per-second of compute, for example $0.000225/s on Nvidia T4&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt;: 1,000+ community models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cold starts&lt;/strong&gt;: 10-30 seconds on first request&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Replicate scales to zero between requests. You avoid idle costs and container management. The 1,000+ model catalog also means many common workloads are already available through an API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fal.ai
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best fit:&lt;/strong&gt; Fast serverless inference for optimized image and video models.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pricing&lt;/strong&gt;: Per output, such as per megapixel for images or per second for video&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt;: 600+ optimized models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt;: 2-3x faster inference than a standard GPU setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fal.ai’s serverless architecture is closest to RunPod’s serverless tier, but with managed model deployment. Instead of running containers, you call an API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Novita AI
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best fit:&lt;/strong&gt; Teams that need both managed inference APIs and access to raw GPU instances.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pricing&lt;/strong&gt;: $0.0015/image, spot GPU instances at 50% off&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt;: 200+ APIs plus GPU instance access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unique point&lt;/strong&gt;: Hybrid API and raw GPU access in one account&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Novita AI is the closest hosted alternative to RunPod for teams that need both managed inference and raw GPU capacity. You can use the API for standard workloads and GPU instances for custom training.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost comparison
&lt;/h2&gt;

&lt;p&gt;The right choice depends on GPU utilization. Use this table as a starting point:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;RunPod cost&lt;/th&gt;
&lt;th&gt;WaveSpeed cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100 images, RTX 3090, 1 hour&lt;/td&gt;
&lt;td&gt;$0.34 (one GPU-hour, idle or active)&lt;/td&gt;
&lt;td&gt;~$2-$8&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1,000 images/month, sporadic&lt;/td&gt;
&lt;td&gt;$50-$200+ including idle time&lt;/td&gt;
&lt;td&gt;$20-$80&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10,000 images/month, consistent&lt;/td&gt;
&lt;td&gt;$245+ for 24/7 GPU&lt;/td&gt;
&lt;td&gt;$200-$800&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;RunPod becomes cost-competitive when your GPU is busy most of the time. As a rule of thumb, if utilization is below 80%, managed inference APIs are often cheaper.&lt;/p&gt;
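
&lt;p&gt;One way to see where that crossover sits is to compute a break-even volume: the number of images per month at which per-image billing costs the same as an always-on pod. The sketch below uses the $0.34/hour rate and the $0.02-$0.08 per-image range quoted in this article; the 30-day month and the function name &lt;code&gt;break_even_images&lt;/code&gt; are assumptions, and setup and maintenance time are deliberately left out.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def break_even_images(monthly_pod_cost, price_per_image):
    """Images per month at which per-image billing equals the pod's fixed cost."""
    return monthly_pod_cost / price_per_image


monthly_pod_cost = 0.34 * 24 * 30  # one always-on pod, 30-day month (assumption)

for price in (0.02, 0.08):  # per-image range quoted for WaveSpeed
    images = break_even_images(monthly_pod_cost, price)
    print(f"${price:.2f}/image: break-even near {images:,.0f} images/month")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With these inputs the break-even lands near 12,240 images/month at $0.02 and 3,060 images/month at $0.08. Below that volume, per-image pricing is likely the cheaper side of the comparison.&lt;/p&gt;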

&lt;p&gt;To estimate your real cost, calculate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;monthly_runpod_cost = gpu_hourly_rate * total_hours_running
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then compare it with managed API usage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;managed_api_cost = number_of_outputs * cost_per_output
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important detail is to include idle hours in the RunPod calculation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing with Apidog
&lt;/h2&gt;

&lt;p&gt;RunPod requires deploying a pod before you can test anything. Managed APIs can usually be tested in minutes with a direct HTTP request.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft06s31y8tv4j9gtx3vif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft06s31y8tv4j9gtx3vif.png" alt="Image" width="799" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is a practical way to test WaveSpeed in Apidog.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Create an environment variable
&lt;/h3&gt;

&lt;p&gt;Create an environment and add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API_KEY = your_wavespeed_api_key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Store it as a secret variable.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Send a test request
&lt;/h3&gt;

&lt;p&gt;Use this request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://api.wavespeed.ai/api/v2/bytedance/seedream-4-5
Authorization: Bearer {{API_KEY}}
Content-Type: application/json
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Request body:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A 3D render of a modern office desk setup, soft lighting"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"image_size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"landscape_4_3"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
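
&lt;p&gt;If you also want to reproduce the request outside Apidog, a minimal Python sketch using only the standard library could look like this. The endpoint, headers, and body fields are taken from the request above; the function names &lt;code&gt;build_request&lt;/code&gt; and &lt;code&gt;send&lt;/code&gt; are illustrative, and the response is assumed to be JSON.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import urllib.request

# Endpoint and body fields are copied from the request above.
ENDPOINT = "https://api.wavespeed.ai/api/v2/bytedance/seedream-4-5"


def build_request(api_key, prompt, image_size="landscape_4_3"):
    """Build the POST request without sending it."""
    body = json.dumps({"prompt": prompt, "image_size": image_size}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request, timeout=60):
    """Send the request and return (status code, parsed JSON body)."""
    # Assumption: the API answers with a JSON document.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, json.load(response)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Calling &lt;code&gt;send(build_request(api_key, prompt))&lt;/code&gt; performs the actual call. Keeping construction and sending separate makes it easy to inspect the request before spending any credits.&lt;/p&gt;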



&lt;h3&gt;
  
  
  3. Add assertions
&lt;/h3&gt;

&lt;p&gt;Add checks for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Status code is 200
Response body &amp;gt; outputs &amp;gt; 0 &amp;gt; url exists
Response time &amp;lt; 30000ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
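
&lt;p&gt;The same three checks can be expressed in a few lines of Python if you want to run them in a script as well. This is a sketch, not Apidog's own assertion syntax: the function name &lt;code&gt;check_response&lt;/code&gt; is hypothetical, and the response shape (&lt;code&gt;outputs[0].url&lt;/code&gt;) is assumed from the checks listed above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import operator

MAX_RESPONSE_MS = 30000  # threshold from the checks above


def check_response(status_code, body, elapsed_ms):
    """Return a list of failed checks; an empty list means all three passed."""
    failures = []
    if status_code != 200:
        failures.append(f"status code was {status_code}, expected 200")
    # Assumed response shape: a dict with outputs[0].url
    outputs = body.get("outputs") or []
    first = outputs[0] if outputs else {}
    if not (isinstance(first, dict) and first.get("url")):
        failures.append("outputs[0].url is missing")
    # operator.lt(a, b) is True when a is strictly less than b
    if not operator.lt(elapsed_ms, MAX_RESPONSE_MS):
        failures.append(f"response took {elapsed_ms} ms, limit is {MAX_RESPONSE_MS} ms")
    return failures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Returning the list of failures, rather than stopping at the first one, shows every problem with a response in a single run.&lt;/p&gt;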



&lt;h3&gt;
  
  
  4. Run a small benchmark
&lt;/h3&gt;

&lt;p&gt;Run 10 requests and record:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Average response time&lt;/li&gt;
&lt;li&gt;Success rate&lt;/li&gt;
&lt;li&gt;Cost per output&lt;/li&gt;
&lt;li&gt;Total cost for the batch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then compare it with your RunPod cost for the same period, including idle time.&lt;/p&gt;

&lt;p&gt;In formula form:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RunPod cost = hourly_rate * hours_pod_was_running
Managed API cost = request_count * cost_per_request
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you a workload-specific answer instead of relying on generic pricing comparisons.&lt;/p&gt;
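
&lt;p&gt;As a rough sketch of how the benchmark numbers could be summarized, the Python below takes one &lt;code&gt;(status_code, elapsed_ms)&lt;/code&gt; pair per request and a per-output price. The function name, the sample batch, and the $0.04 price are all hypothetical, and it assumes only successful requests are billed, which you should confirm with your provider.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from statistics import mean


def summarize_benchmark(results, cost_per_output):
    """Summarize (status_code, elapsed_ms) pairs from a batch of requests."""
    successes = [elapsed for status, elapsed in results if status == 200]
    return {
        "average_response_ms": mean(successes) if successes else None,
        "success_rate": len(successes) / len(results),
        # Assumption: only successful requests are billed.
        "total_cost": len(successes) * cost_per_output,
    }


# Hypothetical batch: nine successes and one failure, at an assumed $0.04/output.
sample = [(200, 4200)] * 9 + [(500, 900)]
print(summarize_benchmark(sample, cost_per_output=0.04))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The average is computed over successful requests only, so a fast failure does not make the API look quicker than it is.&lt;/p&gt;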

&lt;h2&gt;
  
  
  When RunPod is still the right choice
&lt;/h2&gt;

&lt;p&gt;RunPod remains the better option when you need raw GPU control.&lt;/p&gt;

&lt;p&gt;Use RunPod when you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Custom model weights&lt;/strong&gt;: Your fine-tuned model does not exist on any managed platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High, consistent utilization&lt;/strong&gt;: The GPU is busy 80%+ of the time, which can justify hourly rental&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proprietary frameworks&lt;/strong&gt;: You depend on unusual ML libraries that managed APIs do not support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training workloads&lt;/strong&gt;: Fine-tuning and training require direct GPU access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For pure inference on standard models, managed APIs are usually faster to set up and cheaper to run.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How much does RunPod’s idle cost actually add up to?
&lt;/h3&gt;

&lt;p&gt;At $0.34/hour for 24/7 operation, the cost is about $245/month.&lt;/p&gt;

&lt;p&gt;At 8 hours/day, the cost is about $82/month.&lt;/p&gt;

&lt;p&gt;For workloads with sporadic traffic patterns, pay-per-inference is often significantly cheaper.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use a managed API for some workloads and RunPod for others?
&lt;/h3&gt;

&lt;p&gt;Yes. Many teams use managed APIs for production inference and RunPod for training or experimentation. The workloads do not need to run on the same platform.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the fastest way to estimate if switching saves money?
&lt;/h3&gt;

&lt;p&gt;Calculate your actual RunPod hours from last month, including idle time.&lt;/p&gt;

&lt;p&gt;Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;runpod_monthly_cost = actual_hours * hourly_rate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compare that with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;managed_api_monthly_cost = number_of_inferences * cost_per_inference
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also include setup and maintenance time. If your GPU spends a lot of time idle, a managed inference API will usually be the simpler and cheaper option.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Best Recraft alternatives in 2026: more models, video, flexible pricing</title>
      <dc:creator>Preecha</dc:creator>
      <pubDate>Tue, 02 Jun 2026 01:01:34 +0000</pubDate>
      <link>https://dev.to/preecha/best-recraft-alternatives-in-2026-more-models-video-flexible-pricing-50d3</link>
      <guid>https://dev.to/preecha/best-recraft-alternatives-in-2026-more-models-video-flexible-pricing-50d3</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Recraft is strong for vector/SVG generation and brand-consistent design assets. Its main limitations are a single proprietary model, subscription pricing, limited API access, and no video generation. The top alternatives are WaveSpeed for broad model access and video, GPT Image 1.5 for text rendering via API, and Ideogram 2.0 for text-in-image generation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Recraft has a clear niche: design-focused AI image generation with vector/SVG output and tools for maintaining brand consistency.&lt;/p&gt;

&lt;p&gt;That makes it useful for teams creating logos, icons, illustrations, campaign assets, and other visual brand materials. But if your workflow needs multiple model options, video generation, deeper API access, or usage-based pricing, you may need an alternative.&lt;/p&gt;

&lt;p&gt;This guide compares Recraft with practical alternatives and shows how to test image-generation APIs directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Recraft does well
&lt;/h2&gt;

&lt;p&gt;Recraft is a good fit when your output needs to behave like a design asset rather than a generic AI image.&lt;/p&gt;

&lt;p&gt;Key strengths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vector and SVG output&lt;/strong&gt;: Native vector generation instead of raster images converted after the fact.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Brand style consistency&lt;/strong&gt;: Tools for style guides, locked colors, and consistent visual direction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design-focused interface&lt;/strong&gt;: Built around design workflows rather than only prompt-based image generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text rendering&lt;/strong&gt;: Strong typography support in generated images.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured design assets&lt;/strong&gt;: Useful for icons, illustrations, and marketing templates.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where Recraft falls short
&lt;/h2&gt;

&lt;p&gt;Recraft’s strengths are specific, but the tradeoffs matter if you are building production workflows.&lt;/p&gt;

&lt;p&gt;Main limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single model&lt;/strong&gt;: You only get Recraft’s proprietary model, not access to Flux, Seedream, or other model families.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subscription pricing&lt;/strong&gt;: Plans range from free to paid monthly tiers with credit limits. Variable workloads may not map cleanly to a fixed subscription.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited API access&lt;/strong&gt;: Not every product feature is available programmatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No video generation&lt;/strong&gt;: You need a separate provider for image-to-video or text-to-video workflows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Less flexibility outside design use cases&lt;/strong&gt;: Photorealistic images and broader creative styles are not Recraft’s primary strength.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Top Recraft alternatives
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. WaveSpeed
&lt;/h3&gt;

&lt;p&gt;WaveSpeed is the most direct alternative if you want design-quality images plus broader model access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key points:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt;: 600+ models, including Seedream 4.5 for design and text rendering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Text rendering&lt;/strong&gt;: Comparable to Recraft through Seedream 4.5&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Video generation&lt;/strong&gt;: Yes, including models such as Kling, Hailuo, and Seedance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing&lt;/strong&gt;: Pay-per-use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: Full REST API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;WaveSpeed’s Seedream 4.5 model can handle design-style prompts with strong text rendering, making it a practical replacement for many Recraft image-generation workflows.&lt;/p&gt;

&lt;p&gt;It also gives you access to video models and many other image models, which is useful if your team needs more than static design assets.&lt;/p&gt;

&lt;p&gt;At around 50 images per month, WaveSpeed costs roughly half as much as Recraft for comparable usage.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. GPT Image 1.5
&lt;/h3&gt;

&lt;p&gt;GPT Image 1.5 is the strongest option when text accuracy inside images is the main requirement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key points:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Text rendering&lt;/strong&gt;: Best-in-class and an LM Arena leader&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing&lt;/strong&gt;: About &lt;code&gt;$0.04&lt;/code&gt; to &lt;code&gt;$0.08&lt;/code&gt; per image&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: Full REST API with clear documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use GPT Image 1.5 when your images need readable, accurate Latin-character text, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Social media banners&lt;/li&gt;
&lt;li&gt;Product announcement graphics&lt;/li&gt;
&lt;li&gt;Ads with headlines&lt;/li&gt;
&lt;li&gt;Event posters&lt;/li&gt;
&lt;li&gt;UI mockup-style visuals&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For many text-in-image cases, GPT Image 1.5 performs better than Recraft.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Ideogram 2.0
&lt;/h3&gt;

&lt;p&gt;Ideogram is focused specifically on text-in-image generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key points:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Specialty&lt;/strong&gt;: Accurate text inside generated images&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing&lt;/strong&gt;: &lt;code&gt;$8&lt;/code&gt; to &lt;code&gt;$96&lt;/code&gt; per month subscription&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: Available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your Recraft workflow is mostly about producing images with readable text, Ideogram is worth testing. It was built around this use case and matches one of Recraft’s strongest capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Adobe Firefly API
&lt;/h3&gt;

&lt;p&gt;Adobe Firefly API is the best fit for teams already using Adobe Creative Cloud.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key points:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Integration&lt;/strong&gt;: Creative Cloud, Illustrator, and Photoshop&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector support&lt;/strong&gt;: Available through Creative Cloud workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Licensing&lt;/strong&gt;: Commercially safe, based on licensed training data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for&lt;/strong&gt;: Teams already working in Adobe tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Firefly is less of a direct API-first replacement and more of a natural fit for design teams that already rely on Illustrator, Photoshop, and Creative Cloud asset pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Vector/SVG&lt;/th&gt;
&lt;th&gt;Text rendering&lt;/th&gt;
&lt;th&gt;Video&lt;/th&gt;
&lt;th&gt;Brand consistency&lt;/th&gt;
&lt;th&gt;Pricing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Recraft&lt;/td&gt;
&lt;td&gt;Yes, native&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Built-in tools&lt;/td&gt;
&lt;td&gt;&lt;code&gt;$0-$20/mo&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WaveSpeed&lt;/td&gt;
&lt;td&gt;No, raster&lt;/td&gt;
&lt;td&gt;Strong with Seedream&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Via API workflow&lt;/td&gt;
&lt;td&gt;Pay-per-use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT Image 1.5&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Best-in-class&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;&lt;code&gt;$0.04-$0.08/img&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ideogram 2.0&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;&lt;code&gt;$8-$96/mo&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adobe Firefly&lt;/td&gt;
&lt;td&gt;Via Creative Cloud&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Creative Cloud integration&lt;/td&gt;
&lt;td&gt;Creative Cloud subscription&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The vector/SVG tradeoff
&lt;/h2&gt;

&lt;p&gt;Recraft’s native vector output is its clearest differentiator.&lt;/p&gt;

&lt;p&gt;Most other AI image APIs generate raster images. You can convert those images to vector afterward, but the result depends heavily on image complexity.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple icons may vectorize well.&lt;/li&gt;
&lt;li&gt;Logos with clean shapes may be usable after cleanup.&lt;/li&gt;
&lt;li&gt;Detailed illustrations often require manual editing.&lt;/li&gt;
&lt;li&gt;Photorealistic outputs usually do not convert cleanly to SVG.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If native SVG output is essential to your workflow, Recraft may still be the best option despite its limitations.&lt;/p&gt;

&lt;p&gt;If you do not need true vector output, alternatives can offer better flexibility, broader model coverage, video support, and usage-based pricing.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to test alternatives with Apidog
&lt;/h2&gt;

&lt;p&gt;The fastest way to compare these tools is to run the same prompt across multiple APIs and inspect the outputs side by side.&lt;/p&gt;

&lt;p&gt;Test for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text accuracy&lt;/li&gt;
&lt;li&gt;Typography quality&lt;/li&gt;
&lt;li&gt;Prompt adherence&lt;/li&gt;
&lt;li&gt;Brand/style consistency&lt;/li&gt;
&lt;li&gt;API latency&lt;/li&gt;
&lt;li&gt;Cost per usable image&lt;/li&gt;
&lt;li&gt;Post-processing requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Test WaveSpeed Seedream 4.5
&lt;/h3&gt;

&lt;p&gt;Use this request to test a design-style logo prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://api.wavespeed.ai/api/v2/bytedance/seedream-4-5
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A minimalist logo for a tech company called 'Apex', blue and white, geometric shapes, clean design"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Test GPT Image 1.5
&lt;/h3&gt;

&lt;p&gt;Use this request to test text rendering inside a banner.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://api.openai.com/v1/images/generations
Authorization: Bearer {{OPENAI_API_KEY}}
Content-Type: application/json
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-image-1.5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A social media banner reading 'Spring Collection 2026' in elegant serif font, pastel background"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1536x1024"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
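
&lt;p&gt;To run one prompt against both providers from a script, a small Python sketch along these lines may help. The endpoints and the model name are copied from the requests above; the &lt;code&gt;PROVIDERS&lt;/code&gt; table, its key names, and the function &lt;code&gt;build_requests&lt;/code&gt; are hypothetical, and optional fields such as image size are left out.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import urllib.request

# Endpoints and model name are copied from the two requests above.
PROVIDERS = {
    "wavespeed-seedream-4.5": {
        "url": "https://api.wavespeed.ai/api/v2/bytedance/seedream-4-5",
        "extra_body": {},
    },
    "gpt-image-1.5": {
        "url": "https://api.openai.com/v1/images/generations",
        "extra_body": {"model": "gpt-image-1.5"},
    },
}


def build_requests(prompt, api_keys):
    """Return one unsent POST request per provider, all sharing the same prompt."""
    requests = {}
    for name, config in PROVIDERS.items():
        body = {"prompt": prompt, **config["extra_body"]}
        requests[name] = urllib.request.Request(
            config["url"],
            data=json.dumps(body).encode("utf-8"),
            method="POST",
            headers={
                "Authorization": f"Bearer {api_keys[name]}",
                "Content-Type": "application/json",
            },
        )
    return requests
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each request can then be sent with &lt;code&gt;urllib.request.urlopen&lt;/code&gt;. Building them from one prompt string keeps the wording identical across providers, so the side-by-side comparison stays fair.&lt;/p&gt;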



&lt;h3&gt;
  
  
  Suggested comparison workflow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Create one prompt for each asset type you care about.&lt;/li&gt;
&lt;li&gt;Run the exact same prompt through Recraft and each alternative.&lt;/li&gt;
&lt;li&gt;Save all outputs with the model name and timestamp.&lt;/li&gt;
&lt;li&gt;Score each result on text accuracy, visual quality, and editability.&lt;/li&gt;
&lt;li&gt;Calculate the cost per usable image, not just the cost per generated image.&lt;/li&gt;
&lt;li&gt;Repeat with at least 5 to 10 prompts before choosing a provider.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For design teams, the most important metric is usually not “best single image.” It is how often the API produces an output that can be used with minimal edits.&lt;/p&gt;
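
&lt;p&gt;Cost per usable image is simple arithmetic, but it can reverse a ranking based on list price. The Python sketch below uses made-up numbers for two unnamed providers; the function name &lt;code&gt;cost_per_usable_image&lt;/code&gt; and every figure in &lt;code&gt;runs&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def cost_per_usable_image(generated, usable, price_per_image):
    """Total spend divided by the number of images that needed only minimal edits."""
    if usable == 0:
        raise ValueError("no usable images, so cost per usable image is undefined")
    return generated * price_per_image / usable


# Hypothetical scores for two providers after a 20-image test run.
runs = {
    "provider_a": {"generated": 20, "usable": 16, "price_per_image": 0.08},
    "provider_b": {"generated": 20, "usable": 6, "price_per_image": 0.04},
}
for name, run in runs.items():
    print(f"{name}: ${cost_per_usable_image(**run):.3f} per usable image")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this made-up run, provider B charges half as much per generated image but ends up at about $0.133 per usable image against $0.100 for provider A, because fewer of its outputs survive review.&lt;/p&gt;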

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is there any API that generates SVG output like Recraft?
&lt;/h3&gt;

&lt;p&gt;No other major AI image API generates native SVG in 2026; Recraft is unique here.&lt;/p&gt;

&lt;p&gt;For vector output from alternatives, use raster-to-vector tools as a post-processing step, such as Vectorizer.ai or Adobe Illustrator’s Image Trace.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I replicate Recraft’s brand style consistency on alternatives?
&lt;/h3&gt;

&lt;p&gt;Yes, but it requires more setup.&lt;/p&gt;

&lt;p&gt;With LoRA fine-tuning on Flux 2 Max or Stable Diffusion, you can train a model on your brand’s visual style. This gives you more control, but it is more technical than Recraft’s built-in style guide workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which alternative is cheapest for 100 design images per month?
&lt;/h3&gt;

&lt;p&gt;WaveSpeed is likely the cheapest among the listed options, at approximately &lt;code&gt;$3-$10&lt;/code&gt; depending on the model tier.&lt;/p&gt;

&lt;p&gt;That compares with Recraft’s Pro tier at &lt;code&gt;$20/month&lt;/code&gt; for 1,000 credits, where image cost varies based on quality settings.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Recraft’s text rendering beat GPT Image 1.5?
&lt;/h3&gt;

&lt;p&gt;No. GPT Image 1.5 leads on most text-rendering benchmarks.&lt;/p&gt;

&lt;p&gt;Recraft is strong for text-in-image generation, but GPT Image 1.5 is generally better for text accuracy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;Choose &lt;strong&gt;Recraft&lt;/strong&gt; if native SVG/vector output and built-in brand consistency tools are core requirements.&lt;/p&gt;

&lt;p&gt;Choose &lt;strong&gt;WaveSpeed&lt;/strong&gt; if you want broader model access, video generation, REST API flexibility, and pay-per-use pricing.&lt;/p&gt;

&lt;p&gt;Choose &lt;strong&gt;GPT Image 1.5&lt;/strong&gt; if your highest priority is accurate text inside generated images.&lt;/p&gt;

&lt;p&gt;Choose &lt;strong&gt;Ideogram 2.0&lt;/strong&gt; if your workflow is heavily focused on text-in-image generation.&lt;/p&gt;

&lt;p&gt;Choose &lt;strong&gt;Adobe Firefly API&lt;/strong&gt; if your team already works inside Adobe Creative Cloud and wants tighter integration with existing design tools.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Best Modal alternatives in 2026: skip the infrastructure, call an API instead</title>
      <dc:creator>Preecha</dc:creator>
      <pubDate>Mon, 01 Jun 2026 13:01:43 +0000</pubDate>
      <link>https://dev.to/preecha/best-modal-alternatives-in-2026-skip-the-infrastructure-call-an-api-instead-1hd5</link>
      <guid>https://dev.to/preecha/best-modal-alternatives-in-2026-skip-the-infrastructure-call-an-api-instead-1hd5</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Modal is a serverless Python infrastructure platform for running custom code on cloud GPUs. It works well when you need custom Python execution, but it adds coding overhead because you still write and maintain Python functions and container image definitions. If you only need standard AI model inference, alternatives like WaveSpeed, Replicate, and Fal.ai can be faster to implement because they expose managed APIs instead of requiring deployment code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Modal is useful when you have custom Python code that needs GPU execution and automatic scaling. For example, running a Python function on an A100 with Modal is much simpler than provisioning GPU instances, configuring drivers, and managing Kubernetes or EC2 infrastructure yourself.&lt;/p&gt;

&lt;p&gt;The tradeoff is that Modal still requires you to think like an infrastructure owner. You write Python functions, define containers, manage dependencies, and maintain deployment logic over time.&lt;/p&gt;

&lt;p&gt;If your use case is standard AI inference — image generation, video generation, or text generation — a managed API may be simpler. Instead of deploying your own function, you send an HTTP request to a hosted model endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Modal does
&lt;/h2&gt;

&lt;p&gt;Modal provides a higher-level way to run Python code on cloud GPUs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Serverless GPU execution&lt;/strong&gt;: Write Python functions and run them on cloud GPUs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic scaling&lt;/strong&gt;: Functions can scale to zero and back up without manual configuration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container management&lt;/strong&gt;: Modal handles Python dependencies and GPU runtime setup.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fast cold starts&lt;/strong&gt;: Startup time is faster than many traditional container orchestration setups.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A typical Modal workflow looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;App&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpu-example&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;modal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debian_slim&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;pip_install&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A100&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_inference&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Load model, run inference, return output
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;done&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is easier than managing your own GPU cluster, but it is still deployment code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where teams look for alternatives
&lt;/h2&gt;

&lt;p&gt;Teams usually evaluate Modal alternatives when they want to reduce implementation and maintenance work.&lt;/p&gt;

&lt;p&gt;Common reasons include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Coding overhead&lt;/strong&gt;: You write Python functions, image definitions, and deployment logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No zero-code path&lt;/strong&gt;: Modal is developer-friendly, but there is no API-only option: you deploy code before you can call a model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No pre-deployed model catalog&lt;/strong&gt;: You bring and deploy your own models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-second billing&lt;/strong&gt;: Costs can include time spent loading models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ongoing maintenance&lt;/strong&gt;: Your functions need updates as dependencies change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learning curve&lt;/strong&gt;: Modal has its own programming model and patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your team is running standard models, a hosted API can remove most of this work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Top alternatives
&lt;/h2&gt;

&lt;h3&gt;
  
  
  WaveSpeed
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best fit:&lt;/strong&gt; Teams that want hosted image, video, or text generation APIs without writing deployment code.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt;: 600+ pre-deployed models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interface&lt;/strong&gt;: REST API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coding required&lt;/strong&gt;: No Python container required&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Examples mentioned&lt;/strong&gt;: ByteDance Seedream, Kling 2.0, Alibaba WAN&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing model&lt;/strong&gt;: Pay per API call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For teams using Modal to run image or video generation models, WaveSpeed removes the infrastructure layer. You do not write Modal functions, configure containers, or maintain GPU runtime dependencies. You call an endpoint and process the response.&lt;/p&gt;

&lt;p&gt;WaveSpeed covers model categories such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Image generation: Flux, Seedream, Stable Diffusion&lt;/li&gt;
&lt;li&gt;Video generation: Kling, Runway, Hailuo&lt;/li&gt;
&lt;li&gt;Text generation: Qwen, DeepSeek&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your Modal functions are wrapping standard models already available through WaveSpeed, migration can be as simple as replacing the Modal function call with an HTTP request.&lt;/p&gt;

&lt;p&gt;Example request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://api.wavespeed.ai/api/v2/black-forest-labs/flux-2-pro
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json

{
  "prompt": "An isometric illustration of a city block, minimal style, soft colors",
  "image_size": "square_hd"
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Replicate
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best fit:&lt;/strong&gt; Teams looking for hosted open-source models with a simple API.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt;: 1,000+ community models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interface&lt;/strong&gt;: REST API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Billing model&lt;/strong&gt;: Per-second billing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom deployment&lt;/strong&gt;: Cog tool for packaging custom models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Replicate is useful when your main requirement is access to common open-source models. If you are using Modal because you could not find a hosted version of your target model, check Replicate’s catalog first.&lt;/p&gt;

&lt;p&gt;Implementation usually follows this pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Find the model in Replicate’s catalog.&lt;/li&gt;
&lt;li&gt;Send input parameters through the REST API.&lt;/li&gt;
&lt;li&gt;Poll or receive the result depending on the API flow.&lt;/li&gt;
&lt;li&gt;Replace your Modal-specific inference wrapper with the hosted API call.&lt;/li&gt;
&lt;/ol&gt;
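
&lt;p&gt;The polling step in that pattern can be sketched in Python. This is a minimal sketch, not Replicate’s client library: &lt;code&gt;wait_for_prediction&lt;/code&gt; and its &lt;code&gt;fetch&lt;/code&gt; callable are hypothetical names, and the terminal status values are an assumption you should confirm against the provider’s API reference.&lt;/p&gt;

```python
import time
from typing import Callable

# Assumed terminal states; confirm them in the provider's API reference.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(
    fetch: Callable[[], dict],
    interval: float = 1.0,
    max_polls: int = 60,
) -> dict:
    """Call fetch() until the returned prediction reaches a terminal state."""
    for _ in range(max_polls):
        prediction = fetch()
        if prediction.get("status") in TERMINAL_STATES:
            return prediction
        time.sleep(interval)  # wait before asking again
    raise TimeoutError("prediction did not finish within the polling budget")
```

&lt;p&gt;In a real integration, &lt;code&gt;fetch&lt;/code&gt; would wrap an authenticated GET request to the status URL returned when the prediction was created. Keeping the HTTP call outside the loop makes the polling logic easy to test with a stub.&lt;/p&gt;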

&lt;h3&gt;
  
  
  Fal.ai
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best fit:&lt;/strong&gt; Teams that want serverless AI inference with managed models.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt;: 600+ serverless AI models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt;: Proprietary inference engine, advertised as 2–3x faster generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interface&lt;/strong&gt;: REST API with Python SDK&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fal.ai is architecturally closer to Modal than a basic hosted API: it is serverless, scalable, and designed for fast inference. The main difference is that Fal.ai manages the model deployments for you.&lt;/p&gt;

&lt;p&gt;Instead of writing deployment code, you call an API.&lt;/p&gt;

&lt;p&gt;Example request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://fal.run/fal-ai/flux-pro
Authorization: Key {{FAL_API_KEY}}
Content-Type: application/json

{
  "prompt": "An isometric illustration of a city block, minimal style, soft colors"
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Comparison table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Coding required&lt;/th&gt;
&lt;th&gt;Pre-deployed models&lt;/th&gt;
&lt;th&gt;Cold starts&lt;/th&gt;
&lt;th&gt;Pricing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Modal&lt;/td&gt;
&lt;td&gt;Yes, Python&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Per-second compute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WaveSpeed&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;600+&lt;/td&gt;
&lt;td&gt;Zero&lt;/td&gt;
&lt;td&gt;Per API call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replicate&lt;/td&gt;
&lt;td&gt;No, standard API&lt;/td&gt;
&lt;td&gt;1,000+&lt;/td&gt;
&lt;td&gt;10–30s&lt;/td&gt;
&lt;td&gt;Per-second compute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="http://fal.ai/?ref=apidog.com" rel="noopener noreferrer"&gt;Fal.ai&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;600+&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;td&gt;Per output&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Testing with Apidog
&lt;/h2&gt;

&lt;p&gt;The key implementation difference between Modal and hosted API alternatives is testability.&lt;/p&gt;

&lt;p&gt;With Modal, you usually need to deploy or run a function before validating the full inference flow. With hosted APIs, you can test requests directly in Apidog before writing integration code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6f248z9bkz68i6xltx3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6f248z9bkz68i6xltx3.png" alt="Image" width="799" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Test WaveSpeed in Apidog
&lt;/h3&gt;

&lt;p&gt;Create a new request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://api.wavespeed.ai/api/v2/black-forest-labs/flux-2-pro
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json

{
  "prompt": "An isometric illustration of a city block, minimal style, soft colors",
  "image_size": "square_hd"
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Test Fal.ai in Apidog
&lt;/h3&gt;

&lt;p&gt;Create another request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://fal.run/fal-ai/flux-pro
Authorization: Key {{FAL_API_KEY}}
Content-Type: application/json

{
  "prompt": "An isometric illustration of a city block, minimal style, soft colors"
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Compare providers with the same prompt
&lt;/h3&gt;

&lt;p&gt;Store each provider’s credentials as Apidog environment variables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;WAVESPEED_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;FAL_API_KEY&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Any other provider-specific credentials&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then run the same prompt across providers and compare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Output quality&lt;/li&gt;
&lt;li&gt;Response time&lt;/li&gt;
&lt;li&gt;Error format&lt;/li&gt;
&lt;li&gt;JSON response shape&lt;/li&gt;
&lt;li&gt;Cost per request&lt;/li&gt;
&lt;li&gt;Required integration work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives you a practical migration benchmark instead of relying on assumptions.&lt;/p&gt;
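
&lt;p&gt;If you also want the response-time comparison in code, a small timing harness is enough. The sketch below is an illustration under stated assumptions: &lt;code&gt;benchmark&lt;/code&gt; is a hypothetical helper, and each provider is represented by a zero-argument callable that you supply (for example, a function that sends the same prompt to one endpoint).&lt;/p&gt;

```python
import statistics
import time
from typing import Callable


def benchmark(
    providers: dict[str, Callable[[], object]],
    runs: int = 3,
) -> dict[str, float]:
    """Return the median wall-clock latency in seconds for each provider call."""
    results = {}
    for name, call in providers.items():
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            call()  # send one request; the return value is ignored here
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return results
```

&lt;p&gt;The median is used instead of the mean so that a single slow run, such as a cold start, does not dominate the comparison. Output quality and cost still need to be judged separately.&lt;/p&gt;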

&lt;h2&gt;
  
  
  When Modal is still the right choice
&lt;/h2&gt;

&lt;p&gt;Modal is still the better option when you need custom GPU-backed Python execution rather than a standard hosted model.&lt;/p&gt;

&lt;p&gt;Use Modal when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need custom Python logic around inference.&lt;/li&gt;
&lt;li&gt;You have preprocessing, post-processing, or multi-step pipelines.&lt;/li&gt;
&lt;li&gt;Your model is not available on a hosted platform.&lt;/li&gt;
&lt;li&gt;You are running custom fine-tunes or proprietary architectures.&lt;/li&gt;
&lt;li&gt;You need GPU access for non-AI workloads such as simulation, data processing, or rendering.&lt;/li&gt;
&lt;li&gt;You require specific GPU types for performance or compliance reasons.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For standard model inference, hosted APIs are usually faster to deploy and easier to maintain.&lt;/p&gt;

&lt;h2&gt;
  
  
  Migration checklist
&lt;/h2&gt;

&lt;p&gt;If you are moving from Modal to a hosted API, use this process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Identify the model&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Confirm whether the same or equivalent model is available on WaveSpeed, Replicate, or Fal.ai.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Map inputs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compare your Modal function arguments with the hosted API request body.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Test the API&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Send sample requests in Apidog using real prompts and parameters.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Update application code&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Replace the Modal function call with an HTTP request.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Update response parsing&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adjust your code for the provider’s JSON response format.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Remove Modal-specific dependencies&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Delete Modal imports, app definitions, image definitions, and deployment scripts if they are no longer needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Benchmark&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compare latency, output quality, and cost before switching production traffic.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A simplified replacement might look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
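&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="c1"&gt;# Read the API key from the environment rather than hard-coding it
&lt;/span&gt;&lt;span class="n"&gt;WAVESPEED_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;WAVESPEED_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;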

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.wavespeed.ai/api/v2/black-forest-labs/flux-2-pro&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;WAVESPEED_API_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;An isometric illustration of a city block, minimal style, soft colors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;square_hd&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
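    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# avoid hanging on a stalled request
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# surface HTTP errors before parsing
&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;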
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Can I use Modal and WaveSpeed together in the same application?
&lt;/h3&gt;

&lt;p&gt;Yes. Use Modal for custom Python logic, preprocessing, post-processing, or orchestration. Use WaveSpeed for standard AI model inference. Many production systems combine infrastructure-level tools with hosted model APIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Modal cheaper than pay-per-use APIs?
&lt;/h3&gt;

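&lt;p&gt;It depends on utilization. Modal’s per-second billing means idle time costs nothing, but the seconds spent on cold starts and model loading are billed. For high-utilization workloads, Modal can be cheaper. For sporadic workloads, where startup time is a larger share of each bill, pay-per-use APIs are often more economical.&lt;/p&gt;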
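
&lt;p&gt;To make that trade-off concrete, here is a rough cost sketch in Python. Every number in it is a placeholder chosen for illustration, not a real price from Modal or any hosted provider, and the function names are hypothetical. Substitute your own measured durations and the rates from your provider’s pricing page.&lt;/p&gt;

```python
def per_second_cost(requests, seconds_per_request, rate_per_second,
                    cold_starts=0, cold_start_seconds=0.0):
    """Per-second billing: inference time plus any billed cold-start time."""
    billed_seconds = requests * seconds_per_request + cold_starts * cold_start_seconds
    return billed_seconds * rate_per_second


def per_call_cost(requests, price_per_call):
    """Per-call billing: a flat price for every request."""
    return requests * price_per_call


# Placeholder inputs (hypothetical, for illustration only).
RATE = 0.001           # USD per GPU-second
PRICE_PER_CALL = 0.01  # USD per hosted API call

# Steady traffic: the container stays warm, so only one cold start is billed.
steady = per_second_cost(1000, 5.0, RATE, cold_starts=1, cold_start_seconds=20.0)
# Sporadic traffic: every request triggers its own cold start.
sporadic = per_second_cost(1000, 5.0, RATE, cold_starts=1000, cold_start_seconds=20.0)
hosted = per_call_cost(1000, PRICE_PER_CALL)

print(f"steady: {steady:.2f}  sporadic: {sporadic:.2f}  hosted API: {hosted:.2f}")
```

&lt;p&gt;With these placeholder inputs, per-second billing comes out cheaper for the steady workload and more expensive for the sporadic one. The crossover point depends entirely on your real cold-start time, request duration, and prices.&lt;/p&gt;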

&lt;h3&gt;
  
  
  What does migrating from Modal to a hosted API look like?
&lt;/h3&gt;

&lt;p&gt;Replace your Modal function call with an HTTP request to the equivalent API endpoint. Then update your response parsing for the new JSON shape and remove Modal dependencies from your project. For simple inference wrappers, this can be a small code change.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Best Baseten alternatives in 2026: faster setup, no DevOps, lower cost</title>
      <dc:creator>Preecha</dc:creator>
      <pubDate>Mon, 01 Jun 2026 01:01:41 +0000</pubDate>
      <link>https://dev.to/preecha/best-baseten-alternatives-in-2026-faster-setup-no-devops-lower-cost-4b6n</link>
      <guid>https://dev.to/preecha/best-baseten-alternatives-in-2026-faster-setup-no-devops-lower-cost-4b6n</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Baseten is an enterprise ML infrastructure platform for deploying custom models with its Truss framework. It is a good fit when you need to serve your own trained models with control over GPU infrastructure, but it adds setup time and DevOps overhead, and it does not provide a ready-to-use model catalog. Practical alternatives include WaveSpeed for hosted production APIs, Replicate for community models, and Fal.ai for fast inference on standard models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Baseten solves a specific problem: deploying custom ML models into production infrastructure. Its Truss packaging framework helps teams define model runtime behavior, GPU requirements, replicas, and scaling configuration.&lt;/p&gt;

&lt;p&gt;For many AI application teams, that is more infrastructure than they need. If your goal is to generate images, videos, text, or audio from existing models, a hosted inference API is usually faster to integrate and easier to maintain.&lt;/p&gt;

&lt;p&gt;This guide compares Baseten with hosted alternatives and shows how to test an API-based workflow using Apidog.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Baseten does
&lt;/h2&gt;

&lt;p&gt;Baseten is designed for teams that want control over their model-serving stack.&lt;/p&gt;

&lt;p&gt;Typical Baseten use cases include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Packaging custom trained models with Truss&lt;/li&gt;
&lt;li&gt;Deploying models to GPU-backed infrastructure&lt;/li&gt;
&lt;li&gt;Configuring replicas and autoscaling behavior&lt;/li&gt;
&lt;li&gt;Managing production inference endpoints&lt;/li&gt;
&lt;li&gt;Giving MLOps or DevOps teams deployment-level control&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A simplified workflow looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Train or fine-tune model
        ↓
Package model with Truss
        ↓
Configure deployment
        ↓
Deploy to Baseten
        ↓
Send inference requests
        ↓
Monitor and scale
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That workflow is useful when the model is unique to your organization. It is less useful when you only need access to existing models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Baseten falls short for most teams
&lt;/h2&gt;

&lt;p&gt;Baseten’s main tradeoff is that you manage more of the deployment lifecycle.&lt;/p&gt;

&lt;p&gt;Common friction points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup time:&lt;/strong&gt; Expect hours to days before the first production-ready inference request.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No pre-deployed catalog:&lt;/strong&gt; You bring your own model; there is no default catalog of ready-to-call models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Baseten-specific packaging:&lt;/strong&gt; Truss is useful inside Baseten, but what you learn from it transfers poorly to other platforms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise pricing:&lt;/strong&gt; Contract-based pricing can be inefficient for variable or smaller workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DevOps overhead:&lt;/strong&gt; Infrastructure management is still part of your team’s responsibility.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are building an application and do not need to deploy custom weights, a hosted inference API usually removes most of this work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Top Baseten alternatives
&lt;/h2&gt;

&lt;h3&gt;
  
  
  WaveSpeed
&lt;/h3&gt;

&lt;p&gt;WaveSpeed is a strong alternative when you want production AI model access without managing deployment infrastructure.&lt;/p&gt;

&lt;p&gt;Key points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Models:&lt;/strong&gt; 600+ pre-deployed models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Setup:&lt;/strong&gt; API key and first request in minutes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Access:&lt;/strong&gt; Includes models such as ByteDance Seedream, Kling, and Alibaba WAN&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing:&lt;/strong&gt; Pay-per-use with no minimum commitments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SLA:&lt;/strong&gt; 99.9% uptime&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use WaveSpeed when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need image, video, text, or audio generation quickly&lt;/li&gt;
&lt;li&gt;You do not have custom-trained models&lt;/li&gt;
&lt;li&gt;You want to avoid GPU orchestration and model packaging&lt;/li&gt;
&lt;li&gt;Your workload is variable and better suited to pay-per-use pricing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Estimated savings can exceed 90% for variable workloads compared with enterprise-style contracts, depending on usage patterns.&lt;/p&gt;

&lt;h3&gt;
  
  
  Replicate
&lt;/h3&gt;

&lt;p&gt;Replicate is useful when you want broad access to public and community-hosted models.&lt;/p&gt;

&lt;p&gt;Key points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Models:&lt;/strong&gt; 1,000+ community models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Setup:&lt;/strong&gt; API key with immediate access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing:&lt;/strong&gt; Per-second compute, for example &lt;code&gt;$0.000225/s&lt;/code&gt; on Nvidia T4&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom models:&lt;/strong&gt; Supported through Cog&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use Replicate when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want to experiment with many open-source models&lt;/li&gt;
&lt;li&gt;You need common models such as Stable Diffusion, Flux, Llama, or Whisper&lt;/li&gt;
&lt;li&gt;You want simple API access without packaging models yourself&lt;/li&gt;
&lt;li&gt;You may later need to package a custom model with Cog&lt;/li&gt;
&lt;/ul&gt;
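
&lt;p&gt;Per-second pricing is easier to reason about with a quick calculation. The sketch below uses the T4 rate quoted above; the 10-second run time and the monthly volume are hypothetical inputs, so replace them with your own measurements.&lt;/p&gt;

```python
T4_RATE_PER_SECOND = 0.000225  # USD per second, the T4 figure quoted above


def run_cost(seconds: float, rate_per_second: float = T4_RATE_PER_SECOND) -> float:
    """Cost of one run billed per second of compute."""
    return seconds * rate_per_second


# Hypothetical workload: each run takes 10 seconds, 100,000 runs per month.
single_run = run_cost(10)
monthly = single_run * 100_000

print(f"one run: ${single_run:.5f}")  # one run: $0.00225
print(f"monthly: ${monthly:.2f}")     # monthly: $225.00
```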

&lt;h3&gt;
  
  
  Fal.ai
&lt;/h3&gt;

&lt;p&gt;Fal.ai is a good fit when low-latency inference and production reliability are priorities.&lt;/p&gt;

&lt;p&gt;Key points:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Models:&lt;/strong&gt; 600+ models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed:&lt;/strong&gt; Proprietary inference engine, often positioned for faster inference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing:&lt;/strong&gt; Output-based, such as per megapixel or per video second&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SLA:&lt;/strong&gt; 99.99% uptime&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use Fal.ai when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need hosted inference without managing deployment infrastructure&lt;/li&gt;
&lt;li&gt;You care about response time for standard models&lt;/li&gt;
&lt;li&gt;You want serverless-style usage and production reliability&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Comparison table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Setup time&lt;/th&gt;
&lt;th&gt;Custom models&lt;/th&gt;
&lt;th&gt;Pre-deployed catalog&lt;/th&gt;
&lt;th&gt;Pricing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Baseten&lt;/td&gt;
&lt;td&gt;Hours to days&lt;/td&gt;
&lt;td&gt;Yes, with Truss&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Enterprise contract&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WaveSpeed&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;600+&lt;/td&gt;
&lt;td&gt;Pay-per-use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replicate&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;Yes, with Cog&lt;/td&gt;
&lt;td&gt;1,000+&lt;/td&gt;
&lt;td&gt;Per-second compute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="http://fal.ai/?ref=apidog.com" rel="noopener noreferrer"&gt;Fal.ai&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;600+&lt;/td&gt;
&lt;td&gt;Per-output&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Testing a Baseten alternative with Apidog
&lt;/h2&gt;

&lt;p&gt;Baseten requires deploying your model before you can test inference. Hosted alternatives let you test an API request immediately.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ll90njq7wi7rypejx5m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ll90njq7wi7rypejx5m.png" alt="Image" width="799" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is an example WaveSpeed request you can test in Apidog.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://api.wavespeed.ai/api/v2/bytedance/seedream-4-5
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Request body:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A product photo of a white ceramic coffee mug, studio lighting"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"image_size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"square_hd"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 1: Create an Apidog environment
&lt;/h3&gt;

&lt;p&gt;Create an environment with this variable:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variable&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;WAVESPEED_API_KEY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Secret&lt;/td&gt;
&lt;td&gt;Your WaveSpeed API key&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Use a secret variable so the key is not exposed in shared collections.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Create the request
&lt;/h3&gt;

&lt;p&gt;In Apidog:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a new &lt;code&gt;POST&lt;/code&gt; request.&lt;/li&gt;
&lt;li&gt;Set the URL:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://api.wavespeed.ai/api/v2/bytedance/seedream-4-5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Add the authorization header:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Authorization: Bearer {{WAVESPEED_API_KEY}}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Set the content type:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;Content-Type: application/json
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="5"&gt;
&lt;li&gt;Add the JSON body:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A product photo of a white ceramic coffee mug, studio lighting"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"image_size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"square_hd"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Add assertions
&lt;/h3&gt;

&lt;p&gt;Add assertions to make the test repeatable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Status code is 200
Response body &amp;gt; outputs &amp;gt; 0 &amp;gt; url exists
Response time &amp;lt; 30000ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These checks help validate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The API key is configured correctly&lt;/li&gt;
&lt;li&gt;The model returns an output URL&lt;/li&gt;
&lt;li&gt;The request completes within your latency budget&lt;/li&gt;
&lt;/ul&gt;
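
&lt;p&gt;The same three checks can be expressed in Python if you want to run them outside Apidog, for example in CI. This is a minimal sketch: &lt;code&gt;check_response&lt;/code&gt; is a hypothetical helper, and the &lt;code&gt;outputs[0].url&lt;/code&gt; path simply mirrors the assertion above, so treat the response shape as an assumption until you have inspected a real response.&lt;/p&gt;

```python
def check_response(status_code: int, body: dict, elapsed_ms: float,
                   budget_ms: float = 30000) -> list[str]:
    """Return the list of failed checks; an empty list means all three passed."""
    failures = []

    # Check 1: the request succeeded.
    if status_code != 200:
        failures.append(f"expected status 200, got {status_code}")

    # Check 2: the first output carries a URL (assumed response shape).
    outputs = body.get("outputs") or []
    first = outputs[0] if outputs else None
    if not (isinstance(first, dict) and first.get("url")):
        failures.append("outputs[0].url is missing")

    # Check 3: the response arrived within the latency budget.
    if elapsed_ms >= budget_ms:
        failures.append(f"response took {elapsed_ms:.0f} ms, budget is {budget_ms:.0f} ms")

    return failures
```

&lt;p&gt;Returning a list of failures instead of raising on the first one lets a single run report every problem at once.&lt;/p&gt;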

&lt;h3&gt;
  
  
  Step 4: Compare results
&lt;/h3&gt;

&lt;p&gt;Run the same production-style prompts across providers and compare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Output quality&lt;/li&gt;
&lt;li&gt;Response time&lt;/li&gt;
&lt;li&gt;Error rate&lt;/li&gt;
&lt;li&gt;Pricing per successful output&lt;/li&gt;
&lt;li&gt;Response format complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With a hosted API, you can usually run your first request within minutes. With Baseten, you first need to package and deploy the model before sending inference traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Baseten is still the right choice
&lt;/h2&gt;

&lt;p&gt;Baseten is still the right tool when you need infrastructure-level control.&lt;/p&gt;

&lt;p&gt;Choose Baseten if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have custom-trained models that are not available on hosted platforms&lt;/li&gt;
&lt;li&gt;Your organization requires on-premises or VPC deployment for compliance&lt;/li&gt;
&lt;li&gt;You need fine-grained control over GPU type, replica count, and autoscaling&lt;/li&gt;
&lt;li&gt;Your team has dedicated MLOps capacity&lt;/li&gt;
&lt;li&gt;You want to own more of the model deployment lifecycle&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For standard model inference, hosted APIs are usually faster to integrate and require less maintenance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Migration checklist: Baseten to hosted inference API
&lt;/h2&gt;

&lt;p&gt;If you are evaluating a move away from Baseten, use this checklist.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Identify your current models
&lt;/h3&gt;

&lt;p&gt;List each model currently served through Baseten:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Model name
Model type
Input format
Output format
Average latency
Monthly request volume
Current cost
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Find hosted equivalents
&lt;/h3&gt;

&lt;p&gt;Check whether each model has an equivalent on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WaveSpeed&lt;/li&gt;
&lt;li&gt;Replicate&lt;/li&gt;
&lt;li&gt;Fal.ai&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For each option, compare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model quality&lt;/li&gt;
&lt;li&gt;Supported parameters&lt;/li&gt;
&lt;li&gt;Input and output formats&lt;/li&gt;
&lt;li&gt;Rate limits&lt;/li&gt;
&lt;li&gt;Pricing model&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Update API integration code
&lt;/h3&gt;

&lt;p&gt;A Baseten integration typically calls your own deployed model endpoint. Moving to a hosted API usually changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Base URL&lt;/li&gt;
&lt;li&gt;Authentication header&lt;/li&gt;
&lt;li&gt;Request body schema&lt;/li&gt;
&lt;li&gt;Response parsing logic&lt;/li&gt;
&lt;li&gt;Error handling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://api.wavespeed.ai/api/v2/bytedance/seedream-4-5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Authorization&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;WAVESPEED_API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;A product photo of a white ceramic coffee mug, studio lighting&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;image_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;square_hd&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Validate with test cases
&lt;/h3&gt;

&lt;p&gt;Before switching traffic, test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Common prompts&lt;/li&gt;
&lt;li&gt;Edge-case prompts&lt;/li&gt;
&lt;li&gt;Large inputs&lt;/li&gt;
&lt;li&gt;Invalid inputs&lt;/li&gt;
&lt;li&gt;Timeout behavior&lt;/li&gt;
&lt;li&gt;Provider errors&lt;/li&gt;
&lt;/ul&gt;
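
&lt;p&gt;Timeout behavior and provider errors are easier to test when every call returns a classified outcome instead of throwing. A minimal sketch, assuming Node.js 18+ (global &lt;code&gt;AbortController&lt;/code&gt;); the &lt;code&gt;callWithTimeout&lt;/code&gt; helper and its outcome labels are hypothetical:&lt;/p&gt;

```javascript
// Hypothetical helper: run a request with a hard timeout and classify the outcome.
// `doFetch` is any function with the same call shape as the global fetch,
// which makes it easy to pass a stub in tests.
async function callWithTimeout(doFetch, url, options, timeoutMs) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await doFetch(url, { ...options, signal: controller.signal });
    if (!response.ok) {
      // The provider answered, but with a non-2xx status.
      return { kind: "provider_error", status: response.status };
    }
    return { kind: "ok", data: await response.json() };
  } catch (err) {
    // An aborted signal means our own timer fired before the response arrived.
    if (controller.signal.aborted) return { kind: "timeout" };
    return { kind: "network_error", message: String(err) };
  } finally {
    clearTimeout(timer);
  }
}

// Usage (not executed here):
// const outcome = await callWithTimeout(fetch, endpointUrl, requestOptions, 30000);
```

&lt;p&gt;Passing the fetch function as an argument is a design choice for testability: a stub can simulate slow responses and error statuses without calling a real provider.&lt;/p&gt;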

&lt;h3&gt;
  
  
  5. Roll out gradually
&lt;/h3&gt;

&lt;p&gt;Use a staged migration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Local testing
    ↓
Internal staging
    ↓
Small production percentage
    ↓
Full migration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep your Baseten deployment available until the hosted API path is stable.&lt;/p&gt;
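
&lt;p&gt;The "small production percentage" stage can be sketched as a deterministic split on a stable key such as a user or request ID. This is one possible approach, not a Baseten or provider feature; &lt;code&gt;bucketFor&lt;/code&gt; and &lt;code&gt;chooseBackend&lt;/code&gt; are hypothetical names:&lt;/p&gt;

```javascript
// Hypothetical router: send a fixed percentage of traffic to the hosted API.
function bucketFor(key) {
  // Simple deterministic string hash mapped to 0..99 (not cryptographic).
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.codePointAt(0)) % 1000003;
  }
  return hash % 100;
}

function chooseBackend(key, hostedPercent) {
  // Keys whose bucket falls below the percentage go to the hosted API.
  // The same key always lands in the same bucket, so a user does not
  // flip between backends from one request to the next.
  return hostedPercent > bucketFor(key) ? "hosted" : "baseten";
}

console.log(chooseBackend("user-123", 5));
```

&lt;p&gt;Raising &lt;code&gt;hostedPercent&lt;/code&gt; from 5 toward 100 moves more keys to the hosted path, and setting it back to 0 routes everything to Baseten again.&lt;/p&gt;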

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Can I deploy fine-tuned versions of popular models on Baseten?
&lt;/h3&gt;

&lt;p&gt;Yes. Baseten’s Truss framework supports fine-tuned model weights. Replicate also supports custom model packaging through Cog.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the migration path from Baseten to a hosted API?
&lt;/h3&gt;

&lt;p&gt;Identify the models you are serving, find equivalents on WaveSpeed, Replicate, or Fal.ai, then update your API endpoints, authentication, request bodies, and response parsing code. Response formats differ between platforms, so test each integration before switching production traffic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Baseten cheaper than hosted APIs at high volume?
&lt;/h3&gt;

&lt;p&gt;For consistently high and predictable workloads, a Baseten enterprise contract may be cost-competitive. For variable workloads, pay-per-use hosted APIs are often cheaper because you are not committing to fixed infrastructure capacity.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I test a Baseten alternative before committing?
&lt;/h3&gt;

&lt;p&gt;Use Apidog to create an environment with the provider’s API key, run your production prompts, and compare output quality, response time, and response structure against your current Baseten baseline.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Best NightCafe alternatives in 2026: API access, enterprise features, lower costs</title>
      <dc:creator>Preecha</dc:creator>
      <pubDate>Sun, 31 May 2026 13:01:38 +0000</pubDate>
      <link>https://dev.to/preecha/best-nightcafe-alternatives-in-2026-api-access-enterprise-features-lower-costs-56p3</link>
      <guid>https://dev.to/preecha/best-nightcafe-alternatives-in-2026-api-access-enterprise-features-lower-costs-56p3</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;NightCafe is a community-focused AI art platform for hobbyists and artists. For professional or developer workflows, the main blockers are the lack of a production API, web-only generation, and credit-based pricing that gets hard to model at scale. Practical alternatives include WaveSpeed for API-first image generation, Replicate for developer access to community/open-source models, and GPT Image 1.5 for high-quality image generation via API.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;NightCafe built a large community around social AI art features: sharing creations, joining challenges, browsing other users’ work, and experimenting with different styles. If you generate images occasionally and care about community discovery, it works well.&lt;/p&gt;

&lt;p&gt;If you need to integrate image generation into an app, backend job, content pipeline, or internal tool, NightCafe becomes difficult to use. There is no production API, generation happens through the web UI, and the credit system makes cost forecasting harder as usage grows.&lt;/p&gt;

&lt;p&gt;This guide focuses on what to use instead when you need developer-friendly image generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What NightCafe does well
&lt;/h2&gt;

&lt;p&gt;NightCafe is useful when your workflow is interactive and community-driven:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Community platform&lt;/strong&gt;: Share, browse, and get inspiration from other AI art users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple style models&lt;/strong&gt;: Access Stable Diffusion, DALL-E, and other backends&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Art challenges&lt;/strong&gt;: Participate in recurring themed competitions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple interface&lt;/strong&gt;: Generate images without writing code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Free credits&lt;/strong&gt;: Daily credits make casual experimentation accessible&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where NightCafe falls short for developers
&lt;/h2&gt;

&lt;p&gt;NightCafe is not designed as an API-first image generation platform.&lt;/p&gt;

&lt;p&gt;Key limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No production API&lt;/strong&gt;: No official way to integrate generation into applications or backend services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web-only workflow&lt;/strong&gt;: High-volume generation requires manual interaction in the browser&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credit complexity&lt;/strong&gt;: Pricing depends on credits and tiers, which can be harder to model than per-request pricing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expensive at scale&lt;/strong&gt;: Monthly plans from &lt;code&gt;$10&lt;/code&gt; to &lt;code&gt;$48&lt;/code&gt; include limited credits, and top-ups add up&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No batch processing API&lt;/strong&gt;: Bulk jobs cannot be queued or scripted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unclear commercial terms&lt;/strong&gt;: Business usage terms are less explicit than those of dedicated API providers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your use case includes automation, batch jobs, product features, or predictable cost modeling, you’ll likely want an API-based alternative.&lt;/p&gt;

&lt;h2&gt;
  
  
  Top NightCafe alternatives
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. WaveSpeed
&lt;/h3&gt;

&lt;p&gt;WaveSpeed is the most complete professional upgrade if you need image generation through an API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production apps&lt;/li&gt;
&lt;li&gt;Backend automation&lt;/li&gt;
&lt;li&gt;Batch image generation&lt;/li&gt;
&lt;li&gt;Cost-controlled image pipelines&lt;/li&gt;
&lt;li&gt;Teams that need enterprise features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: Full REST API with SDKs in Python, Node.js, and Go&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt;: 600+ production-ready models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing&lt;/strong&gt;: &lt;code&gt;$0.03&lt;/code&gt; to &lt;code&gt;$0.30&lt;/code&gt; per image depending on model tier&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Billing&lt;/strong&gt;: Pay-per-use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SLA&lt;/strong&gt;: 99.9% uptime&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise&lt;/strong&gt;: SOC 2 compliance, RBAC, and audit logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Compared with NightCafe’s credit model, WaveSpeed is easier to automate and can provide 40–70% cost savings at scale. It also provides clearer commercial licensing.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. GPT Image 1.5
&lt;/h3&gt;

&lt;p&gt;GPT Image 1.5 is the best fit when output quality matters more than model variety.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-quality generated images&lt;/li&gt;
&lt;li&gt;Simple API integration&lt;/li&gt;
&lt;li&gt;Teams already using OpenAI APIs&lt;/li&gt;
&lt;li&gt;Workflows where documentation and reliability are priorities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quality&lt;/strong&gt;: LM Arena Elo 1,264, highest rated in 2026&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: Standard OpenAI REST API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Price&lt;/strong&gt;: &lt;code&gt;$0.04&lt;/code&gt; to &lt;code&gt;$0.08&lt;/code&gt; per image&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation&lt;/strong&gt;: Best-in-class&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want a straightforward API with strong documentation and high-quality image output, GPT Image 1.5 is the benchmark.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Replicate
&lt;/h3&gt;

&lt;p&gt;Replicate is useful when you want developer access to community and open-source models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stable Diffusion variants&lt;/li&gt;
&lt;li&gt;Open-source model experimentation&lt;/li&gt;
&lt;li&gt;Model-specific workflows&lt;/li&gt;
&lt;li&gt;Developers who want many model choices behind one API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt;: 1,000+ community models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: Full REST API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unique advantage&lt;/strong&gt;: Access to many of the same open-source model families used by platforms like NightCafe&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing&lt;/strong&gt;: Per-second compute&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you use NightCafe for specific Stable Diffusion styles or model types, Replicate may give you similar base model access through a developer-friendly API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;API&lt;/th&gt;
&lt;th&gt;Models&lt;/th&gt;
&lt;th&gt;Pricing&lt;/th&gt;
&lt;th&gt;Enterprise&lt;/th&gt;
&lt;th&gt;Commercial&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;NightCafe&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;~10 popular&lt;/td&gt;
&lt;td&gt;Credits, &lt;code&gt;$10&lt;/code&gt;–&lt;code&gt;$48/mo&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Unclear&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WaveSpeed&lt;/td&gt;
&lt;td&gt;Full REST&lt;/td&gt;
&lt;td&gt;600+&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;$0.03&lt;/code&gt;–&lt;code&gt;$0.30/image&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Yes, SOC 2&lt;/td&gt;
&lt;td&gt;Clear&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT Image 1.5&lt;/td&gt;
&lt;td&gt;Full REST&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;$0.04&lt;/code&gt;–&lt;code&gt;$0.08/image&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Clear&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Replicate&lt;/td&gt;
&lt;td&gt;Full REST&lt;/td&gt;
&lt;td&gt;1,000+&lt;/td&gt;
&lt;td&gt;Per-second compute&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Model-dependent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Testing alternatives with Apidog
&lt;/h2&gt;

&lt;p&gt;NightCafe has no API to test in Apidog. With API-first alternatives, you can send your first request in under 10 minutes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4n7p2vwfl0j23w1778e4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4n7p2vwfl0j23w1778e4.png" alt="Image" width="799" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A practical way to compare providers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create one environment per provider in Apidog.&lt;/li&gt;
&lt;li&gt;Store each API key as a Secret variable.&lt;/li&gt;
&lt;li&gt;Use the same prompt across providers.&lt;/li&gt;
&lt;li&gt;Compare response format, latency, output quality, and cost.&lt;/li&gt;
&lt;li&gt;Keep the request examples as reusable team documentation.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  WaveSpeed test request
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://api.wavespeed.ai/api/v2/black-forest-labs/flux-2-pro
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A surrealist oil painting of a lighthouse on a floating island, dramatic clouds"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"image_size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"portrait_4_3"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  GPT Image 1.5 test request
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://api.openai.com/v1/images/generations
Authorization: Bearer {{OPENAI_API_KEY}}
Content-Type: application/json
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gpt-image-1.5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A surrealist oil painting of a lighthouse on a floating island, dramatic clouds"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1024x1536"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Apidog, define provider-specific variables such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WAVESPEED_API_KEY = Secret
OPENAI_API_KEY = Secret
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then reuse the same prompt across requests. This gives you a consistent baseline for comparing quality, pricing, and integration complexity.&lt;/p&gt;
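
&lt;p&gt;The same idea can be sketched in code: build both request definitions from a single prompt so the only variable is the provider. The URLs and body fields below are copied from the two test requests above and are not independently verified; the &lt;code&gt;buildRequests&lt;/code&gt; helper and the use of environment variables for the keys are assumptions for illustration.&lt;/p&gt;

```javascript
// Hypothetical helper: build one request per provider from a shared prompt.
// API keys are read from environment variables named like the Apidog variables.
function buildRequests(prompt) {
  const jsonHeaders = (apiKey) => ({
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  });
  return [
    {
      provider: "wavespeed",
      url: "https://api.wavespeed.ai/api/v2/black-forest-labs/flux-2-pro",
      method: "POST",
      headers: jsonHeaders(process.env.WAVESPEED_API_KEY),
      body: JSON.stringify({ prompt, image_size: "portrait_4_3" }),
    },
    {
      provider: "openai",
      url: "https://api.openai.com/v1/images/generations",
      method: "POST",
      headers: jsonHeaders(process.env.OPENAI_API_KEY),
      body: JSON.stringify({ model: "gpt-image-1.5", prompt, size: "1024x1536" }),
    },
  ];
}

const requests = buildRequests(
  "A surrealist oil painting of a lighthouse on a floating island, dramatic clouds"
);
console.log(requests.map((r) => r.provider));

// To send one (not executed here):
// const res = await fetch(requests[0].url, requests[0]);
```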

&lt;h2&gt;
  
  
  Cost comparison at scale
&lt;/h2&gt;

&lt;p&gt;NightCafe’s credit system makes cost modeling more complex than per-image or usage-based APIs.&lt;/p&gt;

&lt;p&gt;For equivalent professional usage:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Monthly volume&lt;/th&gt;
&lt;th&gt;NightCafe&lt;/th&gt;
&lt;th&gt;WaveSpeed estimate (at roughly &lt;code&gt;$0.03&lt;/code&gt;–&lt;code&gt;$0.10&lt;/code&gt; per image)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100 images/month&lt;/td&gt;
&lt;td&gt;Hobbyist plan, &lt;code&gt;$4.79/mo&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;~&lt;code&gt;$3&lt;/code&gt;–&lt;code&gt;$10&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;500 images/month&lt;/td&gt;
&lt;td&gt;Pro plan, &lt;code&gt;$9.99/mo&lt;/code&gt;, limited credits&lt;/td&gt;
&lt;td&gt;~&lt;code&gt;$15&lt;/code&gt;–&lt;code&gt;$50&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5,000 images/month&lt;/td&gt;
&lt;td&gt;Multiple accounts or top-ups may be required&lt;/td&gt;
&lt;td&gt;~&lt;code&gt;$150&lt;/code&gt;–&lt;code&gt;$500&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At higher volumes, NightCafe’s credit pricing is consistently harder to forecast and typically more expensive than pay-per-use APIs.&lt;/p&gt;
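
&lt;p&gt;Per-image pricing makes the estimate a single multiplication. The sketch below reproduces the WaveSpeed column of the table, which works out to about &lt;code&gt;$0.03&lt;/code&gt; to &lt;code&gt;$0.10&lt;/code&gt; per image, the lower part of the &lt;code&gt;$0.03&lt;/code&gt; to &lt;code&gt;$0.30&lt;/code&gt; range quoted earlier. The &lt;code&gt;estimateMonthlyCost&lt;/code&gt; helper is hypothetical; substitute the actual price of the model tier you plan to use.&lt;/p&gt;

```javascript
// Hypothetical helper: estimate a monthly cost range from a per-image price range.
function estimateMonthlyCost(imagesPerMonth, minPricePerImage, maxPricePerImage) {
  // Round to whole cents to avoid floating-point noise in the result.
  const toUsd = (price) => Math.round(imagesPerMonth * price * 100) / 100;
  return { low: toUsd(minPricePerImage), high: toUsd(maxPricePerImage) };
}

// Volumes from the table; the price range is the one implied by its estimates.
for (const volume of [100, 500, 5000]) {
  const { low, high } = estimateMonthlyCost(volume, 0.03, 0.1);
  console.log(`${volume} images/month: $${low} to $${high}`);
}
```

&lt;p&gt;With a premium tier at &lt;code&gt;$0.30&lt;/code&gt; per image, the upper bound is three times higher, so check model pricing before budgeting.&lt;/p&gt;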

&lt;h2&gt;
  
  
  Migration checklist
&lt;/h2&gt;

&lt;p&gt;Use this checklist if you’re moving from NightCafe to an API-based provider:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Define your use case&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interactive generation&lt;/li&gt;
&lt;li&gt;Batch jobs&lt;/li&gt;
&lt;li&gt;Product feature&lt;/li&gt;
&lt;li&gt;Internal automation&lt;/li&gt;
&lt;li&gt;Content pipeline&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Pick a provider&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;WaveSpeed&lt;/strong&gt; for production API workflows and model variety.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;GPT Image 1.5&lt;/strong&gt; for highest-quality outputs through a simple API.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Replicate&lt;/strong&gt; for open-source and community model access.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create test requests&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the same prompt across providers.&lt;/li&gt;
&lt;li&gt;Store API keys as Secret variables.&lt;/li&gt;
&lt;li&gt;Save working requests in your API workspace.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Compare outputs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Image quality&lt;/li&gt;
&lt;li&gt;Prompt adherence&lt;/li&gt;
&lt;li&gt;Latency&lt;/li&gt;
&lt;li&gt;Response format&lt;/li&gt;
&lt;li&gt;Cost per image&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Check commercial terms&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Confirm licensing before using generated images in customer-facing or commercial contexts.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Download existing NightCafe assets&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Export images you want to keep before leaving the platform.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Does switching mean losing NightCafe’s community and inspiration features?
&lt;/h3&gt;

&lt;p&gt;Yes. NightCafe’s community features do not exist on API-first platforms. If community engagement is important to your workflow, that is a real tradeoff.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I generate the same art styles as NightCafe on alternatives?
&lt;/h3&gt;

&lt;p&gt;The same underlying model families, including Stable Diffusion variants, are available on platforms like Replicate and WaveSpeed. NightCafe-specific style presets will not transfer directly, but many base models are available elsewhere.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happens to images I already generated on NightCafe?
&lt;/h3&gt;

&lt;p&gt;Download them before you leave. NightCafe does not guarantee that images will stay hosted indefinitely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is there any API access to NightCafe?
&lt;/h3&gt;

&lt;p&gt;No official production API exists as of April 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;NightCafe is strong for casual AI art creation and community discovery. It is not a good fit for developer workflows that require API access, automation, batch processing, or predictable pricing.&lt;/p&gt;

&lt;p&gt;If you need production image generation, start by testing WaveSpeed, GPT Image 1.5, and Replicate with the same prompt set. Use an API client like Apidog to store requests, manage API keys securely, and compare providers before committing to one.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Best Clipdrop alternatives in 2026: more models, video, full API access</title>
      <dc:creator>Preecha</dc:creator>
      <pubDate>Sun, 31 May 2026 01:01:40 +0000</pubDate>
      <link>https://dev.to/preecha/best-clipdrop-alternatives-in-2026-more-models-video-full-api-access-22h</link>
      <guid>https://dev.to/preecha/best-clipdrop-alternatives-in-2026-more-models-video-full-api-access-22h</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Clipdrop by Stability AI is useful for image editing workflows like background removal, upscaling, cleanup, style transfer, and image variation. For developers, the main constraints are limited API coverage, a small Stability AI-based tool catalog, no video generation, and subscription pricing. If you need broader model access or production API workflows, consider WaveSpeed, Stability AI API, Adobe Firefly API, or Remove.bg depending on your use case.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apidog.com/?utm_source=dev.to&amp;amp;utm_medium=wanda&amp;amp;utm_content=n8n-post-automation" class="crayons-btn crayons-btn--primary"&gt;Try Apidog today&lt;/a&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Clipdrop built a polished product around practical image editing tasks: remove a background, upscale a photo, clean up an image, or generate variations. The web UI is straightforward, and the tools work well for those focused tasks.&lt;/p&gt;

&lt;p&gt;For developers, the tradeoffs are mostly architectural:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The API exposes only part of what the web UI supports.&lt;/li&gt;
&lt;li&gt;The model catalog is narrow: around 10 Stability AI-based tools.&lt;/li&gt;
&lt;li&gt;There is no video generation.&lt;/li&gt;
&lt;li&gt;Subscription pricing can be less flexible for variable workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are building an automated image pipeline, you should evaluate Clipdrop against alternatives based on API coverage, model availability, input/output formats, pricing model, and batch-processing needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Clipdrop does
&lt;/h2&gt;

&lt;p&gt;Clipdrop focuses on common image-editing workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Background removal&lt;/strong&gt;: Create clean cutouts from product and portrait photos.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image upscaling&lt;/strong&gt;: Enhance image resolution up to 2x.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image cleanup&lt;/strong&gt;: Remove objects and perform inpainting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Style transfer&lt;/strong&gt;: Apply visual styles to existing images.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reimagine&lt;/strong&gt;: Generate variations of an existing image.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These features are useful for ecommerce, social content, design workflows, and basic image automation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Clipdrop falls short for developers
&lt;/h2&gt;

&lt;p&gt;Clipdrop is easiest to use through its web interface. If you need API-first workflows, there are several limitations to account for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Narrow catalog&lt;/strong&gt;: Around 10 Stability AI-based tools compared with broader developer platforms.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited API access&lt;/strong&gt;: The full feature set is not exposed through the API.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No video generation&lt;/strong&gt;: Clipdrop is image-focused.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subscription pricing&lt;/strong&gt;: Monthly plans may not match variable usage patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single model ecosystem&lt;/strong&gt;: Tools are tied to Stability AI models, without access to alternatives like Flux, Seedream, or other model families.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Top Clipdrop alternatives
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. WaveSpeed
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Developers who need one API for image editing, generation, upscaling, and video.&lt;/p&gt;

&lt;p&gt;WaveSpeed offers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt;: 600+ across image, video, editing, and upscaling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: Full REST API for available capabilities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Editing tools&lt;/strong&gt;: Inpainting, upscaling, style transfer, background removal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Video&lt;/strong&gt;: Supported through models such as Kling, Hailuo, and Seedance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pricing&lt;/strong&gt;: Pay-per-use credits instead of a fixed subscription.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;WaveSpeed covers Clipdrop-style editing workflows and adds access to a much broader model catalog. If your application needs programmatic access to image and video generation, WaveSpeed is a stronger API-first option.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Stability AI API
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Developers who want direct access to Stability AI models with more control.&lt;/p&gt;

&lt;p&gt;The Stability AI API gives you access to the underlying Stability AI ecosystem without Clipdrop’s simplified interface layer.&lt;/p&gt;

&lt;p&gt;Key differences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uses Stability AI models directly.&lt;/li&gt;
&lt;li&gt;Offers more API control than Clipdrop.&lt;/li&gt;
&lt;li&gt;Uses usage-based pricing through the Stability AI platform.&lt;/li&gt;
&lt;li&gt;Better fit when you specifically want Stability AI capabilities but do not need Clipdrop’s web UI.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Adobe Firefly API
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Teams already using Adobe Creative Cloud.&lt;/p&gt;

&lt;p&gt;Adobe Firefly API is designed for creative workflows where Photoshop, Illustrator, Express, or brand-safe production pipelines are already part of the stack.&lt;/p&gt;

&lt;p&gt;It offers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creative Cloud integration.&lt;/li&gt;
&lt;li&gt;Background removal.&lt;/li&gt;
&lt;li&gt;Generative fill.&lt;/li&gt;
&lt;li&gt;Style matching.&lt;/li&gt;
&lt;li&gt;Commercial licensing clarity based on Adobe’s licensed-content training approach.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Firefly is especially relevant for agencies, brand teams, and enterprise design workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;a href="http://remove.bg/?ref=apidog.com" rel="noopener noreferrer"&gt;Remove.bg&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Best for:&lt;/strong&gt; Dedicated background removal.&lt;/p&gt;

&lt;p&gt;Remove.bg focuses on one task: removing image backgrounds.&lt;/p&gt;

&lt;p&gt;It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A simple background-removal API.&lt;/li&gt;
&lt;li&gt;Per-image credit pricing.&lt;/li&gt;
&lt;li&gt;Strong quality for difficult edges such as hair, fur, and transparent objects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If background removal is your only requirement, a specialized service like Remove.bg may outperform general-purpose image editing APIs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Model count&lt;/th&gt;
&lt;th&gt;Video&lt;/th&gt;
&lt;th&gt;Background removal&lt;/th&gt;
&lt;th&gt;API completeness&lt;/th&gt;
&lt;th&gt;Pricing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Clipdrop&lt;/td&gt;
&lt;td&gt;~10&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Subscription&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WaveSpeed&lt;/td&gt;
&lt;td&gt;600+&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Full REST&lt;/td&gt;
&lt;td&gt;Pay-per-use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stability AI API&lt;/td&gt;
&lt;td&gt;Varies&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;More than Clipdrop&lt;/td&gt;
&lt;td&gt;Usage-based&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adobe Firefly API&lt;/td&gt;
&lt;td&gt;Multiple&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Full&lt;/td&gt;
&lt;td&gt;Creative Cloud subscription&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="http://remove.bg/?ref=apidog.com" rel="noopener noreferrer"&gt;Remove.bg&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes, specialized&lt;/td&gt;
&lt;td&gt;Full&lt;/td&gt;
&lt;td&gt;Per-image&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Test the APIs with Apidog
&lt;/h2&gt;

&lt;p&gt;Before migrating from Clipdrop, test the exact API calls your app needs: input format, authentication, response body, output format, and error handling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Clipdrop background removal request
&lt;/h3&gt;

&lt;p&gt;Clipdrop uses multipart form data for image upload.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://clipdrop-api.co/remove-background/v1
x-api-key: {{CLIPDROP_API_KEY}}
Content-Type: multipart/form-data

image_file: your-product-photo.jpg
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
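
&lt;p&gt;As a rough sketch, the same request can be assembled in Python with only the standard library. The endpoint, the &lt;code&gt;x-api-key&lt;/code&gt; header, and the &lt;code&gt;image_file&lt;/code&gt; field come from the request above; the function name &lt;code&gt;build_clipdrop_request&lt;/code&gt; and the guessed part content type are assumptions for illustration. The function only builds the request and does not send it.&lt;/p&gt;

```python
import mimetypes
import urllib.request
import uuid

CLIPDROP_URL = "https://clipdrop-api.co/remove-background/v1"


def build_clipdrop_request(image_bytes, filename, api_key):
    """Build (but do not send) a multipart POST carrying one image file."""
    boundary = uuid.uuid4().hex
    # Guess the part's content type from the file name; fall back to a generic type.
    part_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image_file"; filename="{filename}"\r\n'
        f"Content-Type: {part_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        CLIPDROP_URL,
        data=head + image_bytes + tail,
        method="POST",
        headers={
            "x-api-key": api_key,
            # The boundary must match the one used in the body.
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

&lt;p&gt;Pass the result to &lt;code&gt;urllib.request.urlopen&lt;/code&gt; when you are ready to send it, and read the image bytes from the response body.&lt;/p&gt;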



&lt;h3&gt;
  
  
  WaveSpeed background removal request
&lt;/h3&gt;

&lt;p&gt;WaveSpeed accepts image URLs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST https://api.wavespeed.ai/api/v2/removal/background
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json

{
  "image_url": "https://example.com/product-photo.jpg"
}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
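
&lt;p&gt;A minimal Python sketch of the same JSON request, again using only the standard library. The endpoint and the &lt;code&gt;image_url&lt;/code&gt; field are copied from the request above and should be confirmed against the provider’s current documentation; &lt;code&gt;build_wavespeed_request&lt;/code&gt; is a hypothetical helper name. The function builds the request without sending it.&lt;/p&gt;

```python
import json
import urllib.request

# Endpoint taken from the request shown above; verify it before relying on it.
WAVESPEED_URL = "https://api.wavespeed.ai/api/v2/removal/background"


def build_wavespeed_request(image_url, api_key):
    """Build (but do not send) a JSON POST that references a hosted image."""
    payload = json.dumps({"image_url": image_url}).encode("utf-8")
    return urllib.request.Request(
        WAVESPEED_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```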



&lt;p&gt;The key implementation difference is input handling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clipdrop expects a multipart file upload.&lt;/li&gt;
&lt;li&gt;WaveSpeed accepts a hosted image URL.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your current app uploads local files directly to Clipdrop, you may need to add an upload step before calling an API that expects an image URL.&lt;/p&gt;

&lt;p&gt;Example migration flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User uploads image
        ↓
Store image in object storage
        ↓
Generate public or signed image URL
        ↓
Send image_url to the target API
        ↓
Parse output URL or image response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
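
&lt;p&gt;The flow above could be wired together roughly like this. Both &lt;code&gt;upload_to_storage&lt;/code&gt; and &lt;code&gt;call_image_api&lt;/code&gt; are hypothetical callables you would supply for your own storage and provider, and the &lt;code&gt;output_url&lt;/code&gt; field name is an assumption that may differ by provider.&lt;/p&gt;

```python
def remove_background_via_url(local_path, upload_to_storage, call_image_api):
    """Run the upload-then-call flow with caller-supplied steps.

    upload_to_storage(path) should return a public or signed URL.
    call_image_api(image_url) should return the provider's parsed JSON as a dict.
    """
    image_url = upload_to_storage(local_path)
    if not image_url.startswith("https://"):
        raise ValueError(f"expected an https URL, got {image_url!r}")

    response = call_image_api(image_url)

    # Assumed field name; adjust to whatever the target API actually returns.
    output_url = response.get("output_url")
    if not output_url:
        raise ValueError("response has no output_url field")
    return output_url
```

&lt;p&gt;Injecting the two steps as functions keeps the flow testable with fakes before any real storage bucket or API key is involved.&lt;/p&gt;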



&lt;h2&gt;
  
  
  Add API assertions in Apidog
&lt;/h2&gt;

&lt;p&gt;When testing Clipdrop and alternatives, define basic assertions before integrating them into production.&lt;/p&gt;

&lt;p&gt;Recommended checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Status code is 200
Response body contains output_url or equivalent output field
Response time is within acceptable limits
Content type matches expected format
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For APIs that return binary image data, validate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Status code is 200
Content-Type is image/png or expected image type
Response body is not empty
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For APIs that return JSON with an output URL, validate that the body contains a field like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"output_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://example.com/result.png"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your actual response field may differ by provider, so confirm it in Apidog before updating application code.&lt;/p&gt;
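
&lt;p&gt;If you also want the same checks in application code or CI, a small validator along these lines is one option. It is a sketch, not Apidog’s assertion syntax: &lt;code&gt;check_response&lt;/code&gt; is a hypothetical helper, &lt;code&gt;output_url&lt;/code&gt; is the example field from above, and response-time checks are left out.&lt;/p&gt;

```python
import json


def check_response(status, content_type, body, expect="json"):
    """Return a list of failed checks; an empty list means the response passed."""
    failures = []
    if status != 200:
        failures.append(f"status was {status}, expected 200")

    if expect == "binary":
        if not content_type.startswith("image/"):
            failures.append(f"content type was {content_type!r}, expected image/*")
        if not body:
            failures.append("response body is empty")
        return failures

    try:
        parsed = json.loads(body)
    except ValueError:
        failures.append("body is not valid JSON")
        return failures
    # Assumed field name; replace it with the provider's real output field.
    if not isinstance(parsed, dict) or not parsed.get("output_url"):
        failures.append("missing output_url field")
    return failures
```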

&lt;h2&gt;
  
  
  Migration checklist from Clipdrop
&lt;/h2&gt;

&lt;p&gt;Use this checklist when replacing Clipdrop in an existing workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Map each Clipdrop feature to an alternative
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Clipdrop feature&lt;/th&gt;
&lt;th&gt;Possible replacement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Background removal&lt;/td&gt;
&lt;td&gt;WaveSpeed or Remove.bg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Upscaling&lt;/td&gt;
&lt;td&gt;WaveSpeed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cleanup / object removal&lt;/td&gt;
&lt;td&gt;WaveSpeed inpainting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reimagine&lt;/td&gt;
&lt;td&gt;WaveSpeed generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stability AI-specific workflows&lt;/td&gt;
&lt;td&gt;Stability AI API&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  2. Update image input handling
&lt;/h3&gt;

&lt;p&gt;Clipdrop commonly uses multipart uploads, while some alternatives expect image URLs instead.&lt;/p&gt;

&lt;p&gt;If the target API requires URLs, add storage before the API call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Local file → object storage → image URL → API request
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Update response parsing
&lt;/h3&gt;

&lt;p&gt;Do not assume every provider returns the same response format.&lt;/p&gt;

&lt;p&gt;Check whether the API returns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Binary image data.&lt;/li&gt;
&lt;li&gt;A JSON response with an image URL.&lt;/li&gt;
&lt;li&gt;A task ID for async polling.&lt;/li&gt;
&lt;li&gt;Metadata plus output links.&lt;/li&gt;
&lt;/ul&gt;
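
&lt;p&gt;One way to handle these shapes is to classify the response before parsing it further. The sketch below assumes the field names &lt;code&gt;task_id&lt;/code&gt;, &lt;code&gt;output_url&lt;/code&gt;, and &lt;code&gt;outputs&lt;/code&gt;; they are placeholders, so substitute the names your provider documents.&lt;/p&gt;

```python
import json


def classify_response(content_type, body):
    """Label a response as one of the four shapes listed above."""
    if content_type.startswith("image/"):
        return "binary_image"

    data = json.loads(body)
    if not isinstance(data, dict):
        return "unknown"
    # All three field names below are assumptions for illustration.
    if "task_id" in data:
        return "async_task"
    if "output_url" in data:
        return "image_url"
    if "outputs" in data:
        return "metadata_with_links"
    return "unknown"
```

&lt;p&gt;Routing on the returned label keeps provider-specific parsing in one place when you switch APIs.&lt;/p&gt;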

&lt;h3&gt;
  
  
  4. Verify output format
&lt;/h3&gt;

&lt;p&gt;Clipdrop typically returns PNG output. Confirm the default output format from the replacement provider before shipping.&lt;/p&gt;

&lt;p&gt;Check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;File type.&lt;/li&gt;
&lt;li&gt;Transparency support.&lt;/li&gt;
&lt;li&gt;Image dimensions.&lt;/li&gt;
&lt;li&gt;Compression behavior.&lt;/li&gt;
&lt;li&gt;Whether the result is returned directly or by URL.&lt;/li&gt;
&lt;/ul&gt;
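
&lt;p&gt;For PNG results, file type, dimensions, and alpha support can be read from the first bytes of the file without an imaging library. This is a minimal sketch: &lt;code&gt;png_info&lt;/code&gt; is a hypothetical helper, and it only detects a true alpha channel, not palette-based transparency.&lt;/p&gt;

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_info(data):
    """Read width, height, and alpha support from a PNG's IHDR chunk."""
    # 8-byte signature + 4-byte length + 4-byte chunk type + 13-byte IHDR payload.
    header = data[:29]
    if len(header) != 29 or not header.startswith(PNG_SIGNATURE) or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")

    # IHDR begins with width, height, bit depth, color type (big-endian).
    width, height, _bit_depth, color_type = struct.unpack("!IIBB", header[16:26])
    return {
        "width": width,
        "height": height,
        # Color types 4 (gray + alpha) and 6 (RGBA) carry an alpha channel.
        "has_alpha_channel": color_type in (4, 6),
    }
```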

&lt;h3&gt;
  
  
  5. Test edge cases
&lt;/h3&gt;

&lt;p&gt;Use your hardest production images, not only clean samples.&lt;/p&gt;

&lt;p&gt;For background removal, test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hair.&lt;/li&gt;
&lt;li&gt;Fur.&lt;/li&gt;
&lt;li&gt;Transparent objects.&lt;/li&gt;
&lt;li&gt;Low-contrast subjects.&lt;/li&gt;
&lt;li&gt;Busy backgrounds.&lt;/li&gt;
&lt;li&gt;Product photos with shadows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For upscaling, test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Text-heavy images.&lt;/li&gt;
&lt;li&gt;Faces.&lt;/li&gt;
&lt;li&gt;Product details.&lt;/li&gt;
&lt;li&gt;Low-resolution source files.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is there a direct API replacement for Clipdrop background removal?
&lt;/h3&gt;

&lt;p&gt;Remove.bg is the specialist option for background removal and often performs well on complex edges. WaveSpeed is a broader option if you also need inpainting, upscaling, generation, or video from one API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use WaveSpeed for batch background removal?
&lt;/h3&gt;

&lt;p&gt;Yes. WaveSpeed supports batch processing by letting you send API requests in parallel. Clipdrop does not support batch processing through its standard API.&lt;/p&gt;
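
&lt;p&gt;On the client side, parallel requests can be sketched with a thread pool. Here &lt;code&gt;remove_background&lt;/code&gt; is a hypothetical function that calls the API for one image URL, and &lt;code&gt;max_workers=4&lt;/code&gt; is an arbitrary starting point; check the provider’s rate limits before raising it.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor


def process_batch(image_urls, remove_background, max_workers=4):
    """Call remove_background(url) for each URL in parallel.

    Returns a dict mapping each URL to ("ok", result) or ("error", message),
    so one failed image does not abort the whole batch.
    """
    def safe_call(url):
        try:
            return url, ("ok", remove_background(url))
        except Exception as exc:  # record the failure and keep going
            return url, ("error", str(exc))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(safe_call, image_urls))
```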

&lt;h3&gt;
  
  
  What is the cost difference?
&lt;/h3&gt;

&lt;p&gt;Clipdrop’s Pro plan starts at $9.99/month for a fixed credit pack. WaveSpeed uses pay-per-use pricing, with editing operations costing roughly $0.02–$0.05 each.&lt;/p&gt;

&lt;p&gt;The better option depends on your monthly usage. Estimate cost with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;monthly cost = number of images × average cost per operation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then compare that with Clipdrop’s subscription cost and credit limits.&lt;/p&gt;
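
&lt;p&gt;The comparison can be worked through in a few lines. The prices are the ones quoted above; the function names are hypothetical, and the sketch ignores Clipdrop’s credit limits, which you would need to factor in separately.&lt;/p&gt;

```python
def monthly_cost(images, cost_per_operation):
    """Pay-per-use spend for one month."""
    return images * cost_per_operation


def break_even_images(subscription_price, cost_per_operation):
    """Monthly image count at which pay-per-use spend equals the subscription price."""
    return subscription_price / cost_per_operation


if __name__ == "__main__":
    # Price points quoted in this article: $0.02 and $0.05 per operation, $9.99 per month.
    for price in (0.02, 0.05):
        spend = monthly_cost(300, price)
        threshold = break_even_images(9.99, price)
        print(f"${price:.2f}/op: 300 images cost ${spend:.2f}; break-even near {threshold:.0f} images")
```

&lt;p&gt;At these prices, pay-per-use matches the $9.99 subscription somewhere between roughly 200 and 500 images per month, depending on the per-operation cost.&lt;/p&gt;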

&lt;h3&gt;
  
  
  Does switching affect output quality?
&lt;/h3&gt;

&lt;p&gt;Yes, it can. Different providers and models handle images differently.&lt;/p&gt;

&lt;p&gt;Before migrating, test your real production images in Apidog and compare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Subject edge quality.&lt;/li&gt;
&lt;li&gt;Background cleanup.&lt;/li&gt;
&lt;li&gt;Transparency.&lt;/li&gt;
&lt;li&gt;Color preservation.&lt;/li&gt;
&lt;li&gt;Artifacts.&lt;/li&gt;
&lt;li&gt;Output size and format.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For clean product photos, WaveSpeed and Remove.bg can both work well. For complex cases like hair, fur, or transparent objects, test the exact images your app needs to process.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
