<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jenny Met</title>
    <description>The latest articles on DEV Community by Jenny Met (@xujfcn).</description>
    <link>https://dev.to/xujfcn</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3789823%2F2c9e4c1c-52be-4e47-b75c-c97051adb99c.png</url>
      <title>DEV Community: Jenny Met</title>
      <link>https://dev.to/xujfcn</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/xujfcn"/>
    <language>en</language>
    <item>
      <title>text-embedding-3-small Dimensions Explained: 1536 vs 1024 vs 512</title>
      <dc:creator>Jenny Met</dc:creator>
      <pubDate>Sat, 06 Jun 2026 14:13:22 +0000</pubDate>
      <link>https://dev.to/xujfcn/text-embedding-3-small-dimensions-explained-1536-vs-1024-vs-512-128a</link>
      <guid>https://dev.to/xujfcn/text-embedding-3-small-dimensions-explained-1536-vs-1024-vs-512-128a</guid>
      <description>&lt;h1&gt;
  
  
  text-embedding-3-small Dimensions Explained: 1536 vs 1024 vs 512
&lt;/h1&gt;

&lt;p&gt;If you use &lt;code&gt;text-embedding-3-small&lt;/code&gt;, one small setting can quietly affect your whole retrieval system: embedding dimensions.&lt;/p&gt;

&lt;p&gt;The default vector length is &lt;strong&gt;1536 dimensions&lt;/strong&gt;. That is a good default. But it is not always the cheapest or fastest choice once you store millions of chunks in a vector database.&lt;/p&gt;

&lt;p&gt;This guide explains what the dimensions of &lt;code&gt;text-embedding-3-small&lt;/code&gt; are, when to keep 1536, when to test smaller vectors, and how to call an OpenAI-compatible embeddings endpoint with working code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvs4raqj9yd8qk8tch7fh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvs4raqj9yd8qk8tch7fh.png" alt="text-embedding-3-small dimensions visual guide" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What are text-embedding-3-small dimensions?
&lt;/h2&gt;

&lt;p&gt;An embedding turns text into a list of numbers. That list is a vector.&lt;/p&gt;

&lt;p&gt;For &lt;code&gt;text-embedding-3-small&lt;/code&gt;, the default vector has &lt;strong&gt;1536 numbers&lt;/strong&gt;. Suppose you embed this sentence:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“API gateways help developers route model calls.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The model returns one vector that represents the meaning of that whole input. The vector is not one number per word. It is one semantic representation for the input text you send.&lt;/p&gt;

&lt;p&gt;You then store that vector in a vector database such as pgvector, Pinecone, Milvus, Weaviate, Chroma, or Qdrant. When a user searches, you embed the query and compare it against stored vectors.&lt;/p&gt;
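&lt;p&gt;The comparison step is typically a similarity score such as cosine similarity. The sketch below is only an illustration: the three-number vectors and the document names are made up, and a real vector database would normally run this search through an index rather than a Python loop.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real 1536-dimensional embeddings.
stored = {
    "doc_gateway": [0.9, 0.1, 0.0],
    "doc_cooking": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

# Rank stored documents from most to least similar to the query.
ranked = sorted(
    stored,
    key=lambda name: cosine_similarity(query, stored[name]),
    reverse=True,
)
print(ranked)
```

&lt;p&gt;Note that the function refuses vectors of different lengths. That is the same constraint a vector index enforces, and it is why the dimension setting has to stay consistent between stored chunks and queries.&lt;/p&gt;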

&lt;p&gt;The official OpenAI documentation states that &lt;code&gt;text-embedding-3-small&lt;/code&gt; defaults to 1536 dimensions, while &lt;code&gt;text-embedding-3-large&lt;/code&gt; defaults to 3072 dimensions. Both models accept a &lt;code&gt;dimensions&lt;/code&gt; parameter that shortens the output vector.&lt;/p&gt;

&lt;p&gt;External references:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/api/docs/guides/embeddings?utm_source=crazyrouter_blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=embedding_dimensions" rel="noopener noreferrer"&gt;OpenAI embeddings guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/api/docs/models/text-embedding-3-small?utm_source=crazyrouter_blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=embedding_dimensions" rel="noopener noreferrer"&gt;OpenAI text-embedding-3-small model page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.com/questions/75733749/openai-embeddings-api-how-to-change-the-embedding-output-dimension?utm_source=crazyrouter_blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=embedding_dimensions" rel="noopener noreferrer"&gt;Stack Overflow discussion on embedding dimensions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Default text-embedding-3-small dimensions: why 1536 is common
&lt;/h2&gt;

&lt;p&gt;1536 dimensions is popular because it is the default. It is also a practical balance between quality and cost for many semantic search and RAG workloads.&lt;/p&gt;

&lt;p&gt;Use the default 1536 dimensions when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are building your first retrieval system.&lt;/li&gt;
&lt;li&gt;You do not have evaluation data yet.&lt;/li&gt;
&lt;li&gt;Your dataset is small enough that vector storage is not painful.&lt;/li&gt;
&lt;li&gt;Search quality matters more than a few gigabytes of storage.&lt;/li&gt;
&lt;li&gt;You want fewer moving parts during the first launch.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last point matters. If your app is still early, the biggest risk is usually not vector size. It is bad chunking, weak retrieval evaluation, missing metadata filters, or poor prompts.&lt;/p&gt;

&lt;p&gt;Start simple. Then optimize.&lt;/p&gt;

&lt;h2&gt;
  
  
  The dimensions parameter: what changes and what does not
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;dimensions&lt;/code&gt; parameter lets you request a shorter embedding vector.&lt;/p&gt;

&lt;p&gt;For example, instead of asking for the default 1536-dimensional vector, you can request 1024, 768, or 512 dimensions if your provider supports it for that model.&lt;/p&gt;

&lt;p&gt;What changes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;1536 dimensions&lt;/th&gt;
&lt;th&gt;1024 / 768 / 512 dimensions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Vector storage&lt;/td&gt;
&lt;td&gt;Larger&lt;/td&gt;
&lt;td&gt;Smaller&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Index memory&lt;/td&gt;
&lt;td&gt;Larger&lt;/td&gt;
&lt;td&gt;Smaller&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search latency&lt;/td&gt;
&lt;td&gt;Often higher&lt;/td&gt;
&lt;td&gt;Often lower&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieval quality&lt;/td&gt;
&lt;td&gt;Strong baseline&lt;/td&gt;
&lt;td&gt;Must be tested&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API input token cost&lt;/td&gt;
&lt;td&gt;Usually unchanged&lt;/td&gt;
&lt;td&gt;Usually unchanged&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;What usually does &lt;strong&gt;not&lt;/strong&gt; change: the number of input tokens you send, and so the API bill. Embedding API pricing is normally based on input tokens, not on the length of the returned vector.&lt;/p&gt;

&lt;p&gt;That means smaller dimensions mainly help with storage, index memory, and retrieval speed. They are not a magic way to reduce the embedding generation bill.&lt;/p&gt;
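&lt;p&gt;For intuition about what a shorter vector is, one way to shorten an embedding after it has been generated is to keep its leading values and rescale the result to unit length. The sketch below assumes that approach; the helper name &lt;code&gt;shorten_embedding&lt;/code&gt; is hypothetical, the input is a toy vector rather than real model output, and whether the result matches what your provider returns for the &lt;code&gt;dimensions&lt;/code&gt; parameter is something to verify rather than assume.&lt;/p&gt;

```python
import math


def shorten_embedding(vector, dimensions):
    """Keep the first `dimensions` values, then rescale to unit length."""
    if dimensions not in range(1, len(vector) + 1):
        raise ValueError("dimensions must be between 1 and len(vector)")
    truncated = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in truncated]


full = [0.5, 0.5, 0.5, 0.5]  # toy unit-length vector, not real model output
short = shorten_embedding(full, 2)

print(len(short))
print(math.sqrt(sum(x * x for x in short)))  # length of the shortened vector
```

&lt;p&gt;Requesting the size through the API parameter is usually the simpler route, because the provider then returns vectors that are ready to store.&lt;/p&gt;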

&lt;h2&gt;
  
  
  Storage math: 1536 vs 1024 vs 512 dimensions
&lt;/h2&gt;

&lt;p&gt;A float32 number uses 4 bytes. So the raw vector size is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vector_size_bytes = dimensions × 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For one vector:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimensions&lt;/th&gt;
&lt;th&gt;Bytes per vector&lt;/th&gt;
&lt;th&gt;Storage vs 1536&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1536&lt;/td&gt;
&lt;td&gt;6,144 bytes&lt;/td&gt;
&lt;td&gt;Baseline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;4,096 bytes&lt;/td&gt;
&lt;td&gt;~33% smaller&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;768&lt;/td&gt;
&lt;td&gt;3,072 bytes&lt;/td&gt;
&lt;td&gt;50% smaller&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;2,048 bytes&lt;/td&gt;
&lt;td&gt;~67% smaller&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For 1 million chunks, raw float32 vector storage looks like this:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimensions&lt;/th&gt;
&lt;th&gt;Raw vector storage&lt;/th&gt;
&lt;th&gt;With rough 35% index overhead&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1536&lt;/td&gt;
&lt;td&gt;~5.72 GiB&lt;/td&gt;
&lt;td&gt;~7.72 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;~3.81 GiB&lt;/td&gt;
&lt;td&gt;~5.15 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;768&lt;/td&gt;
&lt;td&gt;~2.86 GiB&lt;/td&gt;
&lt;td&gt;~3.86 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;~1.91 GiB&lt;/td&gt;
&lt;td&gt;~2.57 GiB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is why dimensions start to matter at scale. A small difference per vector becomes real infrastructure cost when you store millions of chunks.&lt;/p&gt;
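&lt;p&gt;The same arithmetic can be turned around to ask how many vectors fit in a fixed memory budget. This is a rough sketch: the function name &lt;code&gt;max_vectors&lt;/code&gt; and the 8 GiB budget are illustrative, and the 35% overhead is the same rough assumption used in the table above, not a measured value for any particular index.&lt;/p&gt;

```python
BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def max_vectors(budget_gib, dimensions, index_overhead=0.35):
    """Estimate how many float32 vectors fit in a memory budget.

    index_overhead is a rough assumption; real overhead depends on the
    index type and its settings.
    """
    bytes_per_vector = dimensions * BYTES_PER_FLOAT32 * (1 + index_overhead)
    return int(budget_gib * GIB // bytes_per_vector)


# Compare capacity for an assumed 8 GiB budget.
for dims in (1536, 1024, 768, 512):
    print(dims, f"{max_vectors(8, dims):,}")
```

&lt;p&gt;Because storage scales linearly with dimensions, a third of the vector length fits about three times as many chunks in the same memory.&lt;/p&gt;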

&lt;h2&gt;
  
  
  Quick calculator for embedding dimensions
&lt;/h2&gt;

&lt;p&gt;Here is a small Python tool you can use to estimate storage and rough generation cost.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;#!/usr/bin/env python3
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;gib&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ArgumentParser&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;required&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--avg-tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;required&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--dimensions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;+&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--price-per-million&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.02&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse_args&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;avg_tokens&lt;/span&gt;
    &lt;span class="n"&gt;estimated_cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1_000_000&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;price_per_million&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Documents: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Estimated input tokens: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;total_tokens&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Embedding generation cost: $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;estimated_cost&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dims  Raw GiB  With 35% index overhead&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dimensions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;raw_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dim&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dim&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;gib&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;7.2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;gib&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_bytes&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;1.35&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mf"&gt;24.2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 embedding_dimension_calculator.py &lt;span class="nt"&gt;--documents&lt;/span&gt; 1000000 &lt;span class="nt"&gt;--avg-tokens&lt;/span&gt; 350
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Documents: 1,000,000
Estimated input tokens: 350,000,000
Embedding generation cost: $7.00

Dims  Raw GiB  With 35% index overhead
1536     5.72                      7.72
1024     3.81                      5.15
 768     2.86                      3.86
 512     1.91                      2.57
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important lesson: generation cost can stay small, while vector database cost and memory can grow quickly.&lt;/p&gt;

&lt;h2&gt;
  
  
  API example: default 1536 dimensions
&lt;/h2&gt;

&lt;p&gt;Here is a standard OpenAI-compatible embeddings call.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://crazyrouter.com/v1/embeddings &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "text-embedding-3-small",
    "input": "Explain API gateway routing in one paragraph."
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response includes an embedding array. With the default setting, its length should be 1536.&lt;/p&gt;
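&lt;p&gt;To show where that array sits, here is a sketch that parses an abbreviated sample shaped like an OpenAI-style embeddings response. The numbers are made up and the vector is cut to four values for readability; a given provider may add or omit fields, so treat the exact shape as an assumption to check against a real response.&lt;/p&gt;

```python
import json

# Abbreviated sample shaped like an OpenAI-style embeddings response.
# Values are invented; a real default vector would hold 1536 numbers.
sample_response = """
{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.01, -0.02, 0.03, 0.04]}
  ],
  "model": "text-embedding-3-small",
  "usage": {"prompt_tokens": 9, "total_tokens": 9}
}
"""

payload = json.loads(sample_response)
vector = payload["data"][0]["embedding"]

print(len(vector))                       # 4 in this shortened sample
print(payload["usage"]["total_tokens"])  # input tokens reported for the request
```

&lt;p&gt;The &lt;code&gt;usage&lt;/code&gt; block is worth logging, since input tokens are what the generation cost is based on.&lt;/p&gt;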

&lt;p&gt;You can use the same pattern with any OpenAI-compatible client. With Crazyrouter, you only change the base URL and API key:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Base URL: &lt;code&gt;https://crazyrouter.com/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Endpoint: &lt;code&gt;/embeddings&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Auth: &lt;code&gt;Authorization: Bearer YOUR_KEY&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Related internal guides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://crazyrouter.com/blog/openai-compatible-api-base-url-explained?utm_source=crazyrouter_blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=embedding_dimensions" rel="noopener noreferrer"&gt;OpenAI-compatible base URL explained&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://crazyrouter.com/blog/ai-api-pricing-guide-2026?utm_source=crazyrouter_blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=embedding_dimensions" rel="noopener noreferrer"&gt;AI API pricing guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://crazyrouter.com/blog/ai-api-cost-optimization-complete-guide-2026?utm_source=crazyrouter_blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=embedding_dimensions" rel="noopener noreferrer"&gt;AI API cost optimization guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://crazyrouter.com/blog/structured-output-json-mode-ai-api-guide-2026?utm_source=crazyrouter_blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=embedding_dimensions" rel="noopener noreferrer"&gt;Structured output JSON mode guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://crazyrouter.com/blog/rag-implementation-guide-2026?utm_source=crazyrouter_blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=embedding_dimensions" rel="noopener noreferrer"&gt;RAG implementation guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Python example: check the vector length
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-crazyrouter-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://crazyrouter.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-3-small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A vector database stores embeddings for semantic search.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# usually 1536 by default
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;   &lt;span class="c1"&gt;# preview the first few values
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do not paste real keys into code. Use environment variables in production.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://crazyrouter.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Python example: request custom dimensions
&lt;/h2&gt;

&lt;p&gt;If your embeddings provider supports the &lt;code&gt;dimensions&lt;/code&gt; parameter for &lt;code&gt;text-embedding-3-small&lt;/code&gt;, you can request a shorter vector.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-crazyrouter-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://crazyrouter.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-3-small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Shorter embeddings can reduce vector database storage.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dimensions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# expected: 1024 when supported
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Important: do not mix dimensions in the same vector index. If your collection was created for 1536-dimensional vectors, a 1024-dimensional vector will usually fail at insert time.&lt;/p&gt;

&lt;p&gt;Use one collection per dimension setting.&lt;/p&gt;
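&lt;p&gt;A small guard in application code can catch a mismatch before it reaches the database. This is a minimal sketch: the names &lt;code&gt;collection_name&lt;/code&gt; and &lt;code&gt;check_dimensions&lt;/code&gt; and the &lt;code&gt;docs_d1024&lt;/code&gt; naming scheme are hypothetical conventions, not part of any vector database API.&lt;/p&gt;

```python
EXPECTED_DIMENSIONS = 1024  # must match the size the collection was created with


def collection_name(base, dimensions):
    """Build a per-dimension collection name, such as 'docs_d1024'."""
    return f"{base}_d{dimensions}"


def check_dimensions(vector, expected=EXPECTED_DIMENSIONS):
    """Raise early instead of letting a mismatched vector reach the database."""
    if len(vector) != expected:
        raise ValueError(
            f"expected {expected} dimensions, got {len(vector)}; "
            "re-embed with the matching dimensions setting"
        )
    return vector


print(collection_name("docs", 1024))
check_dimensions([0.0] * 1024)      # correct size, passes silently

try:
    check_dimensions([0.0] * 1536)  # wrong size for this collection
except ValueError as err:
    print(err)
```

&lt;p&gt;Putting the dimension in the collection name also makes it harder to query a 1536-dimensional index with a 1024-dimensional query vector by accident.&lt;/p&gt;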

&lt;h2&gt;
  
  
  Node.js example: embeddings with an OpenAI-compatible base URL
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://crazyrouter.com/v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text-embedding-3-small&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Embeddings help search by meaning, not just keywords.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Which dimensions should you choose?
&lt;/h2&gt;

&lt;p&gt;There is no universal best value. Choose based on evaluation, not vibes.&lt;/p&gt;

&lt;p&gt;A practical starting point:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Suggested starting dimensions&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prototype / small app&lt;/td&gt;
&lt;td&gt;1536&lt;/td&gt;
&lt;td&gt;Maximize quality while you learn&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Support docs RAG&lt;/td&gt;
&lt;td&gt;1536 or 1024&lt;/td&gt;
&lt;td&gt;Quality matters, but storage can grow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large FAQ search&lt;/td&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;Often a good balance to test&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High-volume semantic cache&lt;/td&gt;
&lt;td&gt;768 or 512&lt;/td&gt;
&lt;td&gt;Speed and memory may matter more&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Legal / medical / financial retrieval&lt;/td&gt;
&lt;td&gt;1536&lt;/td&gt;
&lt;td&gt;Test carefully before reducing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mobile / edge search&lt;/td&gt;
&lt;td&gt;512 or 768&lt;/td&gt;
&lt;td&gt;Smaller vectors are easier to move&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For production, evaluate on your own data. Take 50 to 200 real user queries, label the best-matching documents for each one, and compare recall@5 or recall@10 at 1536, 1024, 768, and 512 dimensions.&lt;/p&gt;

&lt;p&gt;If 1024 gives almost the same recall as 1536, you can cut raw vector storage and memory by about a third without hurting users.&lt;/p&gt;
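&lt;p&gt;The storage side of that trade-off can be estimated with simple arithmetic. This sketch assumes vectors are stored as 4-byte float32 values and counts only the raw vector payload, not index structures or metadata. The corpus size is a made-up figure:&lt;/p&gt;

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32


def raw_vector_bytes(num_vectors, dimensions):
    """Raw vector payload in bytes, excluding index structures and metadata."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


num_chunks = 1_000_000  # hypothetical corpus size
for dims in (1536, 1024, 768, 512):
    gib = raw_vector_bytes(num_chunks, dims) / 2**30
    print(f"{dims} dims: {gib:.2f} GiB")
```

&lt;p&gt;Real memory use depends on your database and index type, so treat the result as a lower bound.&lt;/p&gt;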

&lt;h2&gt;
  
  
  Common mistakes with text-embedding-3-small dimensions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mistake 1: mixing 1536 and 1024 vectors in one index
&lt;/h3&gt;

&lt;p&gt;Vector databases expect a fixed dimension per collection or index. If you change dimensions, create a new index and re-embed the corpus.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 2: optimizing dimensions before chunking
&lt;/h3&gt;

&lt;p&gt;Bad chunking hurts retrieval more than a larger vector helps it.&lt;/p&gt;

&lt;p&gt;Fix chunking first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep chunks focused.&lt;/li&gt;
&lt;li&gt;Add useful metadata.&lt;/li&gt;
&lt;li&gt;Avoid huge mixed-topic chunks.&lt;/li&gt;
&lt;li&gt;Test overlap instead of guessing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Mistake 3: assuming smaller dimensions reduce API cost
&lt;/h3&gt;

&lt;p&gt;Embedding generation cost is usually based on input tokens. Smaller vectors reduce storage and search costs, not necessarily API call cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 4: choosing 512 without evaluation
&lt;/h3&gt;

&lt;p&gt;512-dimensional vectors can work for some workloads. But they may lose recall on nuanced queries. Test them before moving production search.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 5: forgetting downstream schema changes
&lt;/h3&gt;

&lt;p&gt;If you use pgvector, your schema may include a fixed dimension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;bigserial&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you switch to 1024, you need a different column or table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;documents_1024&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;bigserial&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="nb"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  A simple evaluation workflow
&lt;/h2&gt;

&lt;p&gt;Use this workflow before changing dimensions in production:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pick 100 real user queries.&lt;/li&gt;
&lt;li&gt;Label the correct documents for each query.&lt;/li&gt;
&lt;li&gt;Create separate indexes for 1536, 1024, 768, and 512.&lt;/li&gt;
&lt;li&gt;Run the same queries against each index.&lt;/li&gt;
&lt;li&gt;Compare recall@5, recall@10, latency, and memory.&lt;/li&gt;
&lt;li&gt;Choose the smallest dimension that does not hurt retrieval quality.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is more reliable than reading a benchmark and hoping it matches your data.&lt;/p&gt;
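&lt;p&gt;The comparison step can be sketched in a few lines. This assumes you already have, for each index, the ranked document ids returned per query. The function names and the sample data are hypothetical:&lt;/p&gt;

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the labeled relevant documents found in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = relevant.intersection(retrieved_ids[:k])
    return len(hits) / len(relevant)


def mean_recall_at_k(results_by_query, labels_by_query, k):
    """Average recall@k over every labeled query."""
    scores = [
        recall_at_k(results_by_query[q], labels_by_query[q], k)
        for q in labels_by_query
    ]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


# Hypothetical data: labels are hand-made, results come from one index.
labels = {"q1": ["d1", "d2"], "q2": ["d7"]}
results_1024 = {"q1": ["d1", "d9", "d2"], "q2": ["d3", "d4", "d7"]}
print(mean_recall_at_k(results_1024, labels, k=2))
```

&lt;p&gt;Run the same function over the results from each index and compare the numbers side by side with latency and memory.&lt;/p&gt;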

&lt;h2&gt;
  
  
  Final recommendation
&lt;/h2&gt;

&lt;p&gt;For most teams, the best first move is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start with &lt;code&gt;text-embedding-3-small&lt;/code&gt; at 1536 dimensions.&lt;/li&gt;
&lt;li&gt;Build a clean retrieval evaluation set.&lt;/li&gt;
&lt;li&gt;Test 1024 once your corpus grows.&lt;/li&gt;
&lt;li&gt;Try 768 or 512 only when storage, memory, or latency becomes important.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you already use OpenAI-compatible tools, you can test this through Crazyrouter by setting your base URL to &lt;code&gt;https://crazyrouter.com/v1&lt;/code&gt; and calling &lt;code&gt;/embeddings&lt;/code&gt; with your normal SDK.&lt;/p&gt;

&lt;p&gt;The goal is not to use the smallest vector. The goal is to use the smallest vector that still retrieves the right answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ: text-embedding-3-small dimensions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What are the default text-embedding-3-small dimensions?
&lt;/h3&gt;

&lt;p&gt;The default &lt;code&gt;text-embedding-3-small&lt;/code&gt; output is 1536 dimensions. That means each input text returns a vector with 1536 numeric values.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I change text-embedding-3-small dimensions?
&lt;/h3&gt;

&lt;p&gt;Yes, when your provider supports the &lt;code&gt;dimensions&lt;/code&gt; parameter, you can request a shorter vector. Common test values are 1024, 768, and 512.&lt;/p&gt;
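&lt;p&gt;If a provider does not accept &lt;code&gt;dimensions&lt;/code&gt;, OpenAI's embeddings guide describes shortening a &lt;code&gt;text-embedding-3&lt;/code&gt; vector after the fact by truncating it and normalizing the result. A minimal sketch of that idea follows. The helper name is hypothetical, the input is a toy vector, and the output may not be identical to what an API returns for the same &lt;code&gt;dimensions&lt;/code&gt; value, so evaluate it like any other setting:&lt;/p&gt;

```python
import math


def shorten_embedding(vector, target_dim):
    """Keep the first target_dim values and rescale them to unit length."""
    if target_dim not in range(1, len(vector) + 1):
        raise ValueError("target_dim must be between 1 and len(vector)")
    cut = vector[:target_dim]
    norm = math.sqrt(sum(x * x for x in cut))
    if norm == 0.0:
        return cut  # an all-zero slice cannot be normalized
    return [x / norm for x in cut]


full = [0.6, 0.8, 0.0, 0.0]  # toy 4-dimensional stand-in for a real embedding
short = shorten_embedding(full, 2)
print(len(short), math.sqrt(sum(x * x for x in short)))
```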

&lt;h3&gt;
  
  
  Do smaller embedding dimensions reduce API cost?
&lt;/h3&gt;

&lt;p&gt;Usually not directly. Embedding API cost is normally based on input tokens. Smaller dimensions mainly reduce vector storage, index memory, and search latency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is 512 dimensions enough for text-embedding-3-small?
&lt;/h3&gt;

&lt;p&gt;Sometimes. It depends on your dataset and retrieval quality requirements. Use an evaluation set before using 512 dimensions in production.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I store 1536 and 1024 dimension vectors in the same database table?
&lt;/h3&gt;

&lt;p&gt;Usually no. Most vector indexes require a fixed dimension. Create separate collections or tables when testing different dimensions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I use text-embedding-3-small or text-embedding-3-large?
&lt;/h3&gt;

&lt;p&gt;Use &lt;code&gt;text-embedding-3-small&lt;/code&gt; for cost-effective general retrieval. Test &lt;code&gt;text-embedding-3-large&lt;/code&gt; when retrieval quality is the main bottleneck and you can afford larger vectors.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the best dimension for RAG?
&lt;/h3&gt;

&lt;p&gt;Start with 1536 for &lt;code&gt;text-embedding-3-small&lt;/code&gt;. Then test 1024 and 768 against real queries. The best dimension is the smallest one that preserves your recall target.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>embeddings</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Stop AI Agents from Guessing API Routes: Crazyrouter llms.txt Endpoint Guide</title>
      <dc:creator>Jenny Met</dc:creator>
      <pubDate>Sat, 06 Jun 2026 14:12:53 +0000</pubDate>
      <link>https://dev.to/xujfcn/stop-ai-agents-from-guessing-api-routes-crazyrouter-llmstxt-endpoint-guide-1h0b</link>
      <guid>https://dev.to/xujfcn/stop-ai-agents-from-guessing-api-routes-crazyrouter-llmstxt-endpoint-guide-1h0b</guid>
      <description>&lt;h1&gt;
  
  
  Stop AI Agents from Guessing API Routes: Crazyrouter &lt;code&gt;llms.txt&lt;/code&gt; Endpoint Guide
&lt;/h1&gt;

&lt;p&gt;When an AI agent writes API code, the most expensive bug is often not a typo. It is an endpoint mismatch.&lt;/p&gt;

&lt;p&gt;A model might support Chat Completions, Responses, Claude Messages, Gemini native calls, image generation, video generation, or audio APIs. If the AI guesses the route from memory, the generated code may look reasonable while still being wrong.&lt;/p&gt;

&lt;p&gt;Crazyrouter's &lt;code&gt;llms.txt&lt;/code&gt; helps prevent that.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://docs.crazyrouter.com/llms.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user-facing guide is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.crazyrouter.com/llms-guide?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=llms_txt_docs" rel="noopener noreferrer"&gt;https://docs.crazyrouter.com/llms-guide?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=llms_txt_docs&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The core idea
&lt;/h2&gt;

&lt;p&gt;Before generating code, tell the AI to read the docs entry point and match the model's endpoint type to the correct documentation page.&lt;/p&gt;

&lt;p&gt;Use this prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read https://docs.crazyrouter.com/llms.txt first.
Use Crazyrouter's official docs to choose the correct endpoint.
Do not invent model names, prices, billing modes, or routes from memory.
Generate code only after matching the model to the correct endpoint type.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Common endpoint mappings
&lt;/h2&gt;

&lt;p&gt;Here are the important mappings AI tools should not guess:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Crazyrouter endpoint&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI Chat Completions&lt;/td&gt;
&lt;td&gt;&lt;code&gt;POST /v1/chat/completions&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI Responses&lt;/td&gt;
&lt;td&gt;&lt;code&gt;POST /v1/responses&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic Messages&lt;/td&gt;
&lt;td&gt;&lt;code&gt;POST /v1/messages&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini native&lt;/td&gt;
&lt;td&gt;&lt;code&gt;POST /v1beta/models/{model}:generateContent&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image generation&lt;/td&gt;
&lt;td&gt;&lt;code&gt;POST /v1/images/generations&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image editing&lt;/td&gt;
&lt;td&gt;&lt;code&gt;POST /v1/images/edits&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI-style video&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;POST /v1/video/generations&lt;/code&gt; or &lt;code&gt;POST /v1/videos&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Unified video&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;POST /v1/video/create&lt;/code&gt; and &lt;code&gt;GET /v1/video/query&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text-to-speech&lt;/td&gt;
&lt;td&gt;&lt;code&gt;POST /v1/audio/speech&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speech-to-text&lt;/td&gt;
&lt;td&gt;&lt;code&gt;POST /v1/audio/transcriptions&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Base URL details
&lt;/h2&gt;

&lt;p&gt;For OpenAI-compatible SDKs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://cn.crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For raw HTTP Chat Completions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://cn.crazyrouter.com/v1/chat/completions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For Anthropic-native clients, set the base URL to the root domain the client expects. Anthropic SDKs typically append &lt;code&gt;/v1/messages&lt;/code&gt; themselves, so adding &lt;code&gt;/v1&lt;/code&gt; by hand can produce a doubled path.&lt;/p&gt;

&lt;p&gt;This is the kind of detail that &lt;code&gt;llms.txt&lt;/code&gt; is designed to surface before the AI writes code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing and model availability
&lt;/h2&gt;

&lt;p&gt;Do not ask an AI model to rely on memory for model names or prices.&lt;/p&gt;

&lt;p&gt;Use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com/pricing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For machine-readable pricing data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com/api/pricing?lang=en
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the exact models available to a specific API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://cn.crazyrouter.com/v1/models &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_API_KEY"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;YOUR_API_KEY&lt;/code&gt; in shared prompts. Do not paste real keys into public chats or untrusted tools.&lt;/p&gt;
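&lt;p&gt;A Python sketch of the same call as the curl command above, using only the standard library. It assumes the gateway returns an OpenAI-style body with a &lt;code&gt;data&lt;/code&gt; list of objects that each carry an &lt;code&gt;id&lt;/code&gt;. That shape is an assumption here, so check it against a real response. The function names are hypothetical:&lt;/p&gt;

```python
import json
import urllib.request

MODELS_URL = "https://cn.crazyrouter.com/v1/models"  # URL from the curl command


def build_models_request(api_key):
    """Build the same GET request as the curl command, without sending it."""
    return urllib.request.Request(
        MODELS_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(payload):
    """Extract model ids, assuming an OpenAI-style {"data": [{"id": ...}]} body."""
    return sorted(item["id"] for item in payload.get("data", []))


def fetch_model_ids(api_key):
    """Send the request and return the ids; needs network access and a real key."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as resp:
        return model_ids(json.load(resp))


# Offline check against a hand-written sample body (not real gateway output).
sample = {"object": "list", "data": [{"id": "text-embedding-3-small"}]}
print(model_ids(sample))
```

&lt;p&gt;An agent can call &lt;code&gt;fetch_model_ids&lt;/code&gt; once and refuse to generate code for any model name that is not in the returned list.&lt;/p&gt;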

&lt;h2&gt;
  
  
  Example: better agent instruction
&lt;/h2&gt;

&lt;p&gt;Bad prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Write code to call a Crazyrouter model.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Better prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read https://docs.crazyrouter.com/llms.txt first.
My task is to call [MODEL_NAME] from Node.js.
Check which endpoint type applies, then write runnable code.
Use YOUR_API_KEY as the placeholder.
If the model or endpoint is uncertain, tell me to verify it on https://crazyrouter.com/pricing or GET /v1/models.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why this matters for agent workflows
&lt;/h2&gt;

&lt;p&gt;AI coding agents are increasingly asked to configure entire projects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cursor project settings&lt;/li&gt;
&lt;li&gt;Cline providers&lt;/li&gt;
&lt;li&gt;Claude Code API setup&lt;/li&gt;
&lt;li&gt;Codex CLI Responses API setup&lt;/li&gt;
&lt;li&gt;OpenAI-compatible SDK examples&lt;/li&gt;
&lt;li&gt;video and image generation scripts&lt;/li&gt;
&lt;li&gt;multimodal demos&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A small routing error can waste a lot of time. &lt;code&gt;llms.txt&lt;/code&gt; gives the agent a map before it starts editing files.&lt;/p&gt;

&lt;p&gt;Start with the full guide:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.crazyrouter.com/llms-guide?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=llms_txt_docs" rel="noopener noreferrer"&gt;https://docs.crazyrouter.com/llms-guide?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=llms_txt_docs&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>agents</category>
      <category>documentation</category>
    </item>
    <item>
      <title>/v1/chat/completions vs /v1/responses vs /v1/messages: Which AI API Endpoint Should You Use?</title>
      <dc:creator>Jenny Met</dc:creator>
      <pubDate>Thu, 04 Jun 2026 23:57:09 +0000</pubDate>
      <link>https://dev.to/xujfcn/v1chatcompletions-vs-v1responses-vs-v1messages-which-ai-api-endpoint-should-you-use-528n</link>
      <guid>https://dev.to/xujfcn/v1chatcompletions-vs-v1responses-vs-v1messages-which-ai-api-endpoint-should-you-use-528n</guid>
      <description>&lt;h1&gt;
  
  
  /v1/chat/completions vs /v1/responses vs /v1/messages: Which AI API Endpoint Should You Use?
&lt;/h1&gt;

&lt;p&gt;A common support issue in AI API gateways is not the API key, not the model, and not the SDK. It is the endpoint.&lt;/p&gt;

&lt;p&gt;A user picks a model that exists, but sends the request to the wrong endpoint. The result looks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model unavailable;&lt;/li&gt;
&lt;li&gt;unsupported endpoint;&lt;/li&gt;
&lt;li&gt;invalid request body;&lt;/li&gt;
&lt;li&gt;tool calling not working;&lt;/li&gt;
&lt;li&gt;streaming format mismatch;&lt;/li&gt;
&lt;li&gt;a Claude model that works in one tool but fails in another.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This guide explains the difference between three endpoint families:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/v1/chat/completions
/v1/responses
/v1/messages
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Short version:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Native ecosystem&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;th&gt;Request style&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/v1/chat/completions&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OpenAI-compatible legacy/current chat&lt;/td&gt;
&lt;td&gt;Most apps, SDKs, LiteLLM, Cursor-style tools, simple chat&lt;/td&gt;
&lt;td&gt;&lt;code&gt;messages: [{role, content}]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/v1/responses&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Newer OpenAI Responses API&lt;/td&gt;
&lt;td&gt;Tool use, multimodal, reasoning items, newer OpenAI-style agents&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;input&lt;/code&gt;, &lt;code&gt;tools&lt;/code&gt;, structured response items&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;/v1/messages&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Anthropic Claude API&lt;/td&gt;
&lt;td&gt;Claude-native SDKs and Claude-style apps&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;messages&lt;/code&gt; plus top-level &lt;code&gt;system&lt;/code&gt;, Anthropic schema&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If your client says “OpenAI-compatible”, start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com/v1/chat/completions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or set base URL to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and let the SDK append &lt;code&gt;/chat/completions&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If your tool specifically requires the OpenAI Responses API, use &lt;code&gt;/v1/responses&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If your tool is Anthropic-native and expects Claude’s Messages API, use &lt;code&gt;/v1/messages&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The most common mistake
&lt;/h2&gt;

&lt;p&gt;The most common mistake is mixing the model family and the endpoint family.&lt;/p&gt;

&lt;p&gt;For example, a Claude model can be exposed through an OpenAI-compatible gateway, but that does not automatically mean your request body should use Anthropic’s native &lt;code&gt;/v1/messages&lt;/code&gt; schema.&lt;/p&gt;

&lt;p&gt;Likewise, a tool that sends Anthropic-native requests to &lt;code&gt;/v1/messages&lt;/code&gt; cannot be fixed by only changing the model name. The endpoint and request body must match.&lt;/p&gt;

&lt;p&gt;Think of it this way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model name + endpoint + request schema must match
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If one of the three is wrong, the model may look unavailable even when the model itself is healthy.&lt;/p&gt;
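&lt;p&gt;One way to catch a mismatch early is to look at the shape of the request body before sending it. The sketch below is a rough heuristic, not a gateway feature. The function name is hypothetical, and an Anthropic-style body without a top-level &lt;code&gt;system&lt;/code&gt; field will be classified as Chat Completions:&lt;/p&gt;

```python
def guess_endpoint(body):
    """Guess the endpoint family from the top-level keys of a request body."""
    if "input" in body and "messages" not in body:
        return "/v1/responses"
    if "messages" in body and "system" in body:
        # Top-level system text is the Anthropic Messages convention.
        return "/v1/messages"
    if "messages" in body:
        return "/v1/chat/completions"
    return None  # unknown shape: check the documentation instead of guessing


chat_body = {"model": "example-model", "messages": [{"role": "user", "content": "Hi"}]}
print(guess_endpoint(chat_body))
```

&lt;p&gt;If the guessed path differs from the URL your client is about to call, fix the body or the endpoint before blaming the model.&lt;/p&gt;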

&lt;h2&gt;
  
  
  What &lt;code&gt;/v1/chat/completions&lt;/code&gt; is
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;/v1/chat/completions&lt;/code&gt; is the classic OpenAI-compatible chat endpoint.&lt;/p&gt;

&lt;p&gt;A typical request looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://crazyrouter.com/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "gpt-5.5",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain API endpoints in one paragraph."}
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;/v1/chat/completions&lt;/code&gt; when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the app says it supports an OpenAI-compatible API;&lt;/li&gt;
&lt;li&gt;the config asks for &lt;code&gt;OPENAI_BASE_URL&lt;/code&gt; or &lt;code&gt;base_url&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;the request body uses &lt;code&gt;messages&lt;/code&gt; with &lt;code&gt;role&lt;/code&gt; and &lt;code&gt;content&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;you are using an OpenAI SDK pointed at an OpenAI-compatible gateway;&lt;/li&gt;
&lt;li&gt;you are configuring tools like Cursor-style clients, LiteLLM-style routers, FastGPT-style apps, or many chat UIs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For many users, this is the safest default endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  What &lt;code&gt;/v1/responses&lt;/code&gt; is
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;/v1/responses&lt;/code&gt; is the newer OpenAI Responses API endpoint.&lt;/p&gt;

&lt;p&gt;It is designed around a more general response object, and it can represent things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;text output;&lt;/li&gt;
&lt;li&gt;tool calls;&lt;/li&gt;
&lt;li&gt;multimodal input;&lt;/li&gt;
&lt;li&gt;reasoning items;&lt;/li&gt;
&lt;li&gt;structured output;&lt;/li&gt;
&lt;li&gt;agent-like workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A simplified request looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://crazyrouter.com/v1/responses &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "gpt-5.5",
    "input": "Explain the difference between chat completions and responses."
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;/v1/responses&lt;/code&gt; when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the tool explicitly says it uses the OpenAI Responses API;&lt;/li&gt;
&lt;li&gt;the config mentions &lt;code&gt;responses&lt;/code&gt; rather than &lt;code&gt;chat.completions&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;the request body uses &lt;code&gt;input&lt;/code&gt; instead of &lt;code&gt;messages&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;the model/provider route supports Responses API behavior;&lt;/li&gt;
&lt;li&gt;you need newer OpenAI-style tool or reasoning output formats.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do not send a Chat Completions body to &lt;code&gt;/v1/responses&lt;/code&gt; unless the gateway explicitly documents that conversion.&lt;/p&gt;

&lt;h2&gt;
  
  
  What &lt;code&gt;/v1/messages&lt;/code&gt; is
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;/v1/messages&lt;/code&gt; is the endpoint that follows the Anthropic Messages API format.&lt;/p&gt;

&lt;p&gt;A typical Claude-native request looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://crazyrouter.com/v1/messages &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "system": "You are a helpful assistant.",
    "messages": [
      {"role": "user", "content": "Explain Claude Messages API."}
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;/v1/messages&lt;/code&gt; when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the client is Anthropic-native;&lt;/li&gt;
&lt;li&gt;the SDK expects Claude Messages API format;&lt;/li&gt;
&lt;li&gt;the request has top-level &lt;code&gt;system&lt;/code&gt; instead of a system message inside the array;&lt;/li&gt;
&lt;li&gt;the request requires Anthropic-specific content blocks or tool schema;&lt;/li&gt;
&lt;li&gt;the documentation explicitly says to call &lt;code&gt;/v1/messages&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do not assume every Claude model should be called with &lt;code&gt;/v1/messages&lt;/code&gt;. If you are using an OpenAI-compatible gateway or SDK, Claude models may be called through &lt;code&gt;/v1/chat/completions&lt;/code&gt; instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why endpoint mistakes make models look unavailable
&lt;/h2&gt;

&lt;p&gt;When a model fails, users often assume the model is down. But in many cases, the model is fine and the request format is wrong.&lt;/p&gt;

&lt;p&gt;Common mismatch examples:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mistake&lt;/th&gt;
&lt;th&gt;What happens&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sending &lt;code&gt;messages&lt;/code&gt; chat body to &lt;code&gt;/v1/responses&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Invalid request body or ignored fields&lt;/td&gt;
&lt;td&gt;Use &lt;code&gt;/v1/chat/completions&lt;/code&gt; or change body to &lt;code&gt;input&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sending Anthropic &lt;code&gt;system&lt;/code&gt; + &lt;code&gt;messages&lt;/code&gt; body to &lt;code&gt;/v1/chat/completions&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Schema mismatch&lt;/td&gt;
&lt;td&gt;Use &lt;code&gt;/v1/messages&lt;/code&gt; or convert to OpenAI message format&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Using &lt;code&gt;/v1/messages&lt;/code&gt; in an OpenAI SDK&lt;/td&gt;
&lt;td&gt;SDK may append the wrong path or parse the wrong response&lt;/td&gt;
&lt;td&gt;Use OpenAI-compatible base URL with chat completions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Missing &lt;code&gt;/v1&lt;/code&gt; in base URL&lt;/td&gt;
&lt;td&gt;404 or unknown route&lt;/td&gt;
&lt;td&gt;Set base URL to &lt;code&gt;https://crazyrouter.com/v1&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adding &lt;code&gt;/chat/completions&lt;/code&gt; to the base URL, so the SDK appends it again&lt;/td&gt;
&lt;td&gt;Double path like &lt;code&gt;/chat/completions/chat/completions&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Base URL should usually end at &lt;code&gt;/v1&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adding UTM parameters to an API endpoint&lt;/td&gt;
&lt;td&gt;Auth/routing errors&lt;/td&gt;
&lt;td&gt;UTM parameters belong only on website links&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Base URL vs full endpoint
&lt;/h2&gt;

&lt;p&gt;Many tools ask for a Base URL, not a full endpoint.&lt;/p&gt;

&lt;p&gt;Correct Base URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the SDK calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com/v1/chat/completions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wrong Base URL for most SDKs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com/v1/chat/completions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why? Because the SDK may append &lt;code&gt;/chat/completions&lt;/code&gt; again.&lt;/p&gt;

&lt;p&gt;Rule:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the field is called &lt;code&gt;base_url&lt;/code&gt;, use the URL that ends at &lt;code&gt;/v1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If the field is called &lt;code&gt;endpoint&lt;/code&gt; and asks for a full URL, use the full path the tool requires.&lt;/li&gt;
&lt;/ul&gt;
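
&lt;p&gt;The rule can be sketched in Python. This is a minimal illustration, not part of any SDK: &lt;code&gt;normalize_base_url&lt;/code&gt; and &lt;code&gt;ENDPOINT_SUFFIXES&lt;/code&gt; are hypothetical names, and the sketch assumes the tool expects a base URL that ends at &lt;code&gt;/v1&lt;/code&gt;.&lt;/p&gt;

```python
from urllib.parse import urlsplit, urlunsplit

# Endpoint paths that an SDK is expected to append by itself (assumed list).
ENDPOINT_SUFFIXES = ("/chat/completions", "/responses", "/messages")


def normalize_base_url(url: str) -> str:
    """Return a base URL that ends at /v1, with no endpoint path or query string."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    # Drop a full endpoint path that was pasted into the base URL field.
    for suffix in ENDPOINT_SUFFIXES:
        if path.endswith(suffix):
            path = path[: -len(suffix)]
            break
    # Add the version segment when it is missing.
    if not path.endswith("/v1"):
        path = path + "/v1"
    # Rebuild without query string or fragment, which also strips UTM parameters.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(normalize_base_url("https://crazyrouter.com/v1/chat/completions"))
print(normalize_base_url("https://crazyrouter.com"))
```

&lt;p&gt;Running a pasted value through a helper like this before saving it keeps the doubled-path mistake out of the configuration.&lt;/p&gt;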

&lt;h2&gt;
  
  
  Which endpoint should I choose?
&lt;/h2&gt;

&lt;p&gt;Use this decision tree:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Does your tool say “OpenAI-compatible API”?&lt;br&gt;&lt;br&gt;
Use &lt;code&gt;/v1/chat/completions&lt;/code&gt; or Base URL &lt;code&gt;https://crazyrouter.com/v1&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Does your tool specifically say “Responses API”?&lt;br&gt;&lt;br&gt;
Use &lt;code&gt;/v1/responses&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Does your tool use the Anthropic SDK or Claude Messages API?&lt;br&gt;&lt;br&gt;
Use &lt;code&gt;/v1/messages&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Are you unsure?&lt;br&gt;&lt;br&gt;
Start with &lt;code&gt;/v1/chat/completions&lt;/code&gt;, because most third-party clients support it.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
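
&lt;p&gt;The same decision tree fits in a small lookup. This is only a sketch: the function name &lt;code&gt;choose_endpoint&lt;/code&gt; and the three client labels are made up for illustration, while the paths are the ones listed above.&lt;/p&gt;

```python
def choose_endpoint(client_kind: str) -> str:
    """Map a client description to an endpoint path from the decision tree."""
    routes = {
        "openai-compatible": "/v1/chat/completions",
        "responses": "/v1/responses",
        "anthropic": "/v1/messages",
    }
    # Unknown or unsure clients fall back to chat completions.
    return routes.get(client_kind.strip().lower(), "/v1/chat/completions")


for kind in ("openai-compatible", "responses", "anthropic", "not sure"):
    print(kind, "->", choose_endpoint(kind))
```

&lt;p&gt;The fallback mirrors step 4: when nothing else is known, chat completions is the safest first attempt.&lt;/p&gt;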

&lt;h2&gt;
  
  
  Recommended Crazyrouter configuration
&lt;/h2&gt;

&lt;p&gt;For most OpenAI-compatible tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API Key: your Crazyrouter API key
Base URL: https://crazyrouter.com/v1
Model: choose a model from the Crazyrouter model list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For users in regions where the global route is unstable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Base URL: https://cn.crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Human-facing links can use UTM tracking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com/models?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=api_endpoints
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;API endpoints should not carry UTM parameters. This is wrong:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://api-endpoint.example.invalid/v1?utm_source=blog
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Quick debugging checklist
&lt;/h2&gt;

&lt;p&gt;Before reporting “model unavailable,” check these five things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Is the model name spelled exactly as shown in the model list?&lt;/li&gt;
&lt;li&gt;Is the endpoint family correct: chat completions, responses, or messages?&lt;/li&gt;
&lt;li&gt;Does the request body match that endpoint’s schema?&lt;/li&gt;
&lt;li&gt;Does the Base URL end at &lt;code&gt;/v1&lt;/code&gt;, with no missing version segment and no duplicated endpoint path?&lt;/li&gt;
&lt;li&gt;Are you using the right SDK mode: OpenAI-compatible or Anthropic-native?&lt;/li&gt;
&lt;/ol&gt;
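
&lt;p&gt;Check 3 can be partly automated. The sketch below lists the top-level fields each endpoint family normally expects: &lt;code&gt;messages&lt;/code&gt; for chat completions, &lt;code&gt;input&lt;/code&gt; for the Responses API, and &lt;code&gt;max_tokens&lt;/code&gt; plus &lt;code&gt;messages&lt;/code&gt; for the Anthropic Messages API. The helper name &lt;code&gt;missing_fields&lt;/code&gt; and the model name are hypothetical, and the field lists are a simplified assumption, not a full schema.&lt;/p&gt;

```python
# Minimum top-level fields per endpoint family (simplified assumption).
REQUIRED_FIELDS = {
    "/v1/chat/completions": ("model", "messages"),
    "/v1/responses": ("model", "input"),
    "/v1/messages": ("model", "max_tokens", "messages"),
}


def missing_fields(endpoint: str, body: dict) -> list:
    """Return the required top-level fields that the request body lacks."""
    required = REQUIRED_FIELDS.get(endpoint)
    if required is None:
        raise ValueError("unknown endpoint family: " + endpoint)
    return [name for name in required if name not in body]


# A chat-completions style body with a placeholder model name.
chat_body = {"model": "example-model", "messages": [{"role": "user", "content": "hi"}]}
print(missing_fields("/v1/chat/completions", chat_body))
print(missing_fields("/v1/messages", chat_body))
```

&lt;p&gt;A body that passes for one endpoint family and fails for another is a strong hint that the request was sent to the wrong path.&lt;/p&gt;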

&lt;h2&gt;
  
  
  Final recommendation
&lt;/h2&gt;

&lt;p&gt;If you are building general app integrations, start with &lt;code&gt;/v1/chat/completions&lt;/code&gt;. It is the broadest compatibility path.&lt;/p&gt;

&lt;p&gt;Use &lt;code&gt;/v1/responses&lt;/code&gt; when your client or workflow explicitly requires the Responses API.&lt;/p&gt;

&lt;p&gt;Use &lt;code&gt;/v1/messages&lt;/code&gt; when you are using Anthropic-native tooling.&lt;/p&gt;

&lt;p&gt;Most endpoint problems disappear when you remember this rule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OpenAI-compatible client → /v1/chat/completions
OpenAI Responses client → /v1/responses
Anthropic-native client → /v1/messages
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the model exists but appears unavailable, do not just swap the model name. Check the endpoint and request schema first.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>gateway</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Crazyrouter Codex CLI: Use Codex with One API Key and an OpenAI-Compatible Gateway</title>
      <dc:creator>Jenny Met</dc:creator>
      <pubDate>Thu, 04 Jun 2026 16:57:39 +0000</pubDate>
      <link>https://dev.to/xujfcn/crazyrouter-codex-cli-use-codex-with-one-api-key-and-an-openai-compatible-gateway-3bj4</link>
      <guid>https://dev.to/xujfcn/crazyrouter-codex-cli-use-codex-with-one-api-key-and-an-openai-compatible-gateway-3bj4</guid>
      <description>&lt;h1&gt;
  
  
  Crazyrouter Codex CLI: Use Codex with One API Key and an OpenAI-Compatible Gateway
&lt;/h1&gt;

&lt;p&gt;OpenAI Codex CLI is useful when you want an AI coding agent directly inside your terminal. The painful part is the setup, not the idea: API keys, base URLs, model names, Windows environment variables, macOS shell profiles, Linux config files, and different providers for different models.&lt;/p&gt;

&lt;p&gt;The new &lt;a href="https://github.com/xujfcn/crazyrouter-codex-cli?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=codex_cli" rel="noopener noreferrer"&gt;&lt;code&gt;crazyrouter-codex-cli&lt;/code&gt;&lt;/a&gt; repo solves one specific problem:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;connect Codex CLI to Crazyrouter with one API key and an OpenAI-compatible API endpoint.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://github.com/xujfcn/crazyrouter-codex-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What this repo does
&lt;/h2&gt;

&lt;p&gt;The repo provides simple install scripts for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Windows PowerShell&lt;/li&gt;
&lt;li&gt;Windows batch file&lt;/li&gt;
&lt;li&gt;macOS&lt;/li&gt;
&lt;li&gt;Linux&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It configures Codex CLI to use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://cn.crazyrouter.com/v1
&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-crazyrouter-key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That means Codex CLI can talk to Crazyrouter through an OpenAI-compatible interface, while Crazyrouter handles the model/provider side.&lt;/p&gt;

&lt;p&gt;Important rule:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Do not add UTM parameters to API endpoints. UTM belongs on human-clickable website links, not &lt;code&gt;OPENAI_BASE_URL&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Correct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://cn.crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wrong:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://cn.crazyrouter.com/v1?utm_source&lt;span class="o"&gt;=&lt;/span&gt;...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why use Codex CLI through Crazyrouter?
&lt;/h2&gt;

&lt;p&gt;A terminal coding agent is most useful when it can become part of your normal development loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;open a project directory;&lt;/li&gt;
&lt;li&gt;ask the agent to inspect code;&lt;/li&gt;
&lt;li&gt;let it patch files;&lt;/li&gt;
&lt;li&gt;run tests;&lt;/li&gt;
&lt;li&gt;review the diff;&lt;/li&gt;
&lt;li&gt;repeat.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But real developer teams often want more than one model. Some tasks need a fast low-cost model. Some need a stronger reasoning model. Some need Claude-style code review. Some need Gemini-style long-context analysis. Some teams also need a more stable route from regions where direct access is unreliable.&lt;/p&gt;

&lt;p&gt;Crazyrouter gives Codex CLI a single OpenAI-compatible gateway:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one API key;&lt;/li&gt;
&lt;li&gt;one base URL;&lt;/li&gt;
&lt;li&gt;multiple model choices;&lt;/li&gt;
&lt;li&gt;OpenAI-compatible client configuration;&lt;/li&gt;
&lt;li&gt;easier switching between coding models.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  One-command install
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Windows PowerShell
&lt;/h3&gt;

&lt;p&gt;Open PowerShell as a normal user and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;iwr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-UseB&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://raw.githubusercontent.com/xujfcn/crazyrouter-codex-cli/main/install-crazyrouter-codex.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;iex&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or download and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;install-crazyrouter-codex.bat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  macOS / Linux
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/xujfcn/crazyrouter-codex-cli/main/install-crazyrouter-codex.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script asks for your Crazyrouter API key, writes the needed environment variables, and backs up existing Codex configuration when needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Manual setup
&lt;/h2&gt;

&lt;p&gt;If you prefer to configure it yourself, install Codex CLI first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @openai/codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Node.js 22+ is recommended.&lt;/p&gt;

&lt;p&gt;Then set the environment variables.&lt;/p&gt;

&lt;h3&gt;
  
  
  macOS / Linux
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-your-crazyrouter-key
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://cn.crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Windows PowerShell
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;setx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sk-your-crazyrouter-key"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;setx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://cn.crazyrouter.com/v1"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After &lt;code&gt;setx&lt;/code&gt;, reopen your terminal.&lt;/p&gt;

&lt;p&gt;Then start Codex:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Codex &lt;code&gt;config.toml&lt;/code&gt; example
&lt;/h2&gt;

&lt;p&gt;Some Codex CLI versions support provider configuration in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Windows: &lt;code&gt;%USERPROFILE%\.codex\config.toml&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;macOS / Linux: &lt;code&gt;~/.codex/config.toml&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;model&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gpt-5.5"&lt;/span&gt;
&lt;span class="py"&gt;model_provider&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"crazyrouter"&lt;/span&gt;

&lt;span class="nn"&gt;[model_providers.crazyrouter]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Crazyrouter"&lt;/span&gt;
&lt;span class="py"&gt;base_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"https://cn.crazyrouter.com/v1"&lt;/span&gt;
&lt;span class="py"&gt;env_key&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"OPENAI_API_KEY"&lt;/span&gt;
&lt;span class="py"&gt;wire_api&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"responses"&lt;/span&gt;

&lt;span class="nn"&gt;[model_providers.crazyrouter.query_params]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see this error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wire_api = "chat" is no longer supported
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;change:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;wire_api&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"chat"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;wire_api&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"responses"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Model selection
&lt;/h2&gt;

&lt;p&gt;Once the gateway is configured, you can start Codex with the default model or specify one explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codex
codex &lt;span class="nt"&gt;--model&lt;/span&gt; gpt-5.5
codex &lt;span class="nt"&gt;--model&lt;/span&gt; gpt-4o-mini
codex &lt;span class="nt"&gt;--model&lt;/span&gt; claude-sonnet-4-6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Model availability can vary by account, provider route, and current upstream status. Check the current model list here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://crazyrouter.com/models?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=codex_cli" rel="noopener noreferrer"&gt;Crazyrouter model list&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical workflow example
&lt;/h2&gt;

&lt;p&gt;A simple Codex CLI coding loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then ask:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Inspect this repo and explain the architecture. Do not edit files yet.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the summary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Find the smallest safe fix for the failing login test. Make the change, then run the relevant test only.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then review:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git diff
npm &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The gateway setup does not replace careful review. It makes the model/provider configuration less annoying so you can focus on the actual engineering loop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;codex: command not found&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Install Codex globally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @openai/codex
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;which codex
codex &lt;span class="nt"&gt;--help&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



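&lt;p&gt;On Windows, reopen the terminal after installation and use &lt;code&gt;Get-Command codex&lt;/code&gt; in PowerShell in place of &lt;code&gt;which codex&lt;/code&gt;.&lt;/p&gt;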

&lt;h3&gt;
  
  
  API key not found
&lt;/h3&gt;

&lt;p&gt;Check whether the environment variable exists.&lt;/p&gt;

&lt;p&gt;macOS / Linux:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$OPENAI_API_KEY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Windows PowerShell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the output is empty, set the variable again.&lt;/p&gt;
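
&lt;p&gt;Echoing the key prints the whole secret to the terminal. A cross-platform alternative is a few lines of Python that only confirm the variable exists. This is a sketch: &lt;code&gt;describe_env&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
import os


def describe_env(name: str, environ=os.environ) -> str:
    """Report whether a variable is set without echoing its full value."""
    value = environ.get(name, "")
    if not value:
        return name + " is not set"
    # Show only the last four characters so the key never lands in logs.
    return name + " is set (ends with ..." + value[-4:] + ")"


print(describe_env("OPENAI_API_KEY"))
```

&lt;p&gt;The same script runs unchanged on Windows, macOS, and Linux, which helps when comparing setups across machines.&lt;/p&gt;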

&lt;h3&gt;
  
  
  Wrong base URL
&lt;/h3&gt;

&lt;p&gt;The base URL should be exactly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://cn.crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do not add &lt;code&gt;/chat/completions&lt;/code&gt;, &lt;code&gt;/responses&lt;/code&gt;, or UTM parameters. Client libraries append the final API path themselves.&lt;/p&gt;
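
&lt;p&gt;These checks are easy to script. The sketch below reports what is wrong with a candidate base URL. &lt;code&gt;base_url_problems&lt;/code&gt; is a hypothetical helper, not something Codex CLI provides.&lt;/p&gt;

```python
from urllib.parse import urlsplit


def base_url_problems(url: str) -> list:
    """List the reasons a value is not a clean base URL ending at /v1."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    problems = []
    # Query strings (for example UTM tracking) do not belong on an API base URL.
    if parts.query:
        problems.append("remove the query string: " + parts.query)
    # The client appends the endpoint path itself.
    for suffix in ("/chat/completions", "/responses"):
        if path.endswith(suffix):
            problems.append("remove the endpoint path: " + suffix)
    # The version segment must be present somewhere in the path.
    if "v1" not in path.split("/"):
        problems.append("add the /v1 version segment")
    return problems


print(base_url_problems("https://cn.crazyrouter.com/v1"))
print(base_url_problems("https://cn.crazyrouter.com/v1/responses?utm_source=blog"))
```

&lt;p&gt;An empty list means the value passes all three checks.&lt;/p&gt;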

&lt;h3&gt;
  
  
  Existing Codex config conflict
&lt;/h3&gt;

&lt;p&gt;If you had another provider configured before, check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; ~/.codex/config.toml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure the selected &lt;code&gt;model_provider&lt;/code&gt; points to the Crazyrouter provider block.&lt;/p&gt;

&lt;h2&gt;
  
  
  When this setup is useful
&lt;/h2&gt;

&lt;p&gt;This repo is especially useful for developers who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use Codex CLI but want an OpenAI-compatible gateway;&lt;/li&gt;
&lt;li&gt;want to try multiple models from one CLI setup;&lt;/li&gt;
&lt;li&gt;work across Windows, macOS, and Linux machines;&lt;/li&gt;
&lt;li&gt;need a repeatable install script for teammates;&lt;/li&gt;
&lt;li&gt;want a simpler path for AI coding workflows in regions where direct provider access can be unstable.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;GitHub repo: &lt;a href="https://github.com/xujfcn/crazyrouter-codex-cli?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=codex_cli" rel="noopener noreferrer"&gt;xujfcn/crazyrouter-codex-cli&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Crazyrouter: &lt;a href="https://crazyrouter.com?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=codex_cli" rel="noopener noreferrer"&gt;crazyrouter.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model list: &lt;a href="https://crazyrouter.com/models?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=codex_cli" rel="noopener noreferrer"&gt;crazyrouter.com/models&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://crazyrouter.com/docs?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=codex_cli" rel="noopener noreferrer"&gt;crazyrouter.com/docs&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;crazyrouter-codex-cli&lt;/code&gt; is a small repo, but it removes a common setup tax: configuring Codex CLI to use an OpenAI-compatible gateway correctly.&lt;/p&gt;

&lt;p&gt;If you want Codex CLI with one key, one base URL, and easier model routing, start here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://github.com/xujfcn/crazyrouter-codex-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
      <category>tools</category>
    </item>
    <item>
      <title>AI Image API Playground: Test GPT Image, Imagen, Qwen Image and FLUX Online</title>
      <dc:creator>Jenny Met</dc:creator>
      <pubDate>Thu, 04 Jun 2026 13:53:52 +0000</pubDate>
      <link>https://dev.to/xujfcn/ai-image-api-playground-test-gpt-image-imagen-qwen-image-and-flux-online-2c3k</link>
      <guid>https://dev.to/xujfcn/ai-image-api-playground-test-gpt-image-imagen-qwen-image-and-flux-online-2c3k</guid>
      <description>&lt;h1&gt;
  
  
  AI Image API Playground: Test GPT Image, Imagen, Qwen Image and FLUX Online
&lt;/h1&gt;

&lt;p&gt;If you are building image generation into a product, do not pick a model from a pricing table alone.&lt;/p&gt;

&lt;p&gt;Test the same prompt across multiple image models first.&lt;/p&gt;

&lt;p&gt;A good AI image API playground lets you compare output style, prompt following, text rendering, product accuracy, speed, and cost before you commit to a provider-specific integration. That matters because GPT Image, Imagen, Qwen Image, FLUX, and DALL-E-style workflows can behave very differently on the same request.&lt;/p&gt;

&lt;p&gt;This guide shows how developers can use an image API playground to test multiple AI image models with one API key, then move the winning model into production through an OpenAI-compatible endpoint.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnsvqa8sc9br3rrexrxm9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnsvqa8sc9br3rrexrxm9.png" alt="AI Image API Playground workflow" width="800" height="469"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick answer
&lt;/h2&gt;

&lt;p&gt;Use an AI image API playground when you need to answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which image model follows my prompt most accurately?&lt;/li&gt;
&lt;li&gt;Which model handles product photos, posters, UI assets, or text-heavy images best?&lt;/li&gt;
&lt;li&gt;Which model is good enough for the cost?&lt;/li&gt;
&lt;li&gt;Can I copy a working API request after testing visually?&lt;/li&gt;
&lt;li&gt;Can my application switch models later without rewriting the integration?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For Crazyrouter users, the image workflow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;open the image playground;&lt;/li&gt;
&lt;li&gt;write one test prompt;&lt;/li&gt;
&lt;li&gt;run it across image models such as GPT Image, Imagen, Qwen Image, FLUX, and DALL-E-style models;&lt;/li&gt;
&lt;li&gt;compare results;&lt;/li&gt;
&lt;li&gt;copy the cURL/API request;&lt;/li&gt;
&lt;li&gt;move the request to &lt;code&gt;https://crazyrouter.com/v1/images/generations&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Human-facing test page:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://image.crazyrouter.com?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=image_api_playground
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Production API endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com/v1/images/generations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do not add UTM parameters to API endpoints. Use tracking only on human-facing links.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why image model testing is different from chat model testing
&lt;/h2&gt;

&lt;p&gt;Chat models are usually judged by text quality, reasoning, latency, tool use, and price.&lt;/p&gt;

&lt;p&gt;Image models need a different checklist:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;visual style consistency;&lt;/li&gt;
&lt;li&gt;prompt following;&lt;/li&gt;
&lt;li&gt;product detail accuracy;&lt;/li&gt;
&lt;li&gt;text rendering inside images;&lt;/li&gt;
&lt;li&gt;brand-safety behavior;&lt;/li&gt;
&lt;li&gt;aspect ratio support;&lt;/li&gt;
&lt;li&gt;edit vs generation support;&lt;/li&gt;
&lt;li&gt;reproducibility;&lt;/li&gt;
&lt;li&gt;cost per accepted asset, not just cost per request.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, a model that creates beautiful cinematic images may fail on product packaging text. A model that handles text well may not be the cheapest for bulk poster generation. A model that is excellent for photorealism may not fit UI mockups.&lt;/p&gt;

&lt;p&gt;That is why a playground is useful: it lets you compare before you wire the model into a production workflow.&lt;/p&gt;
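
&lt;p&gt;Cost per accepted asset is simple arithmetic: total spend divided by the number of outputs that pass review. The sketch below uses made-up prices and acceptance counts purely for illustration. &lt;code&gt;cost_per_accepted_asset&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
def cost_per_accepted_asset(price_per_request: float, requests: int, accepted: int) -> float:
    """Total spend divided by the number of outputs that passed review."""
    if accepted == 0:
        raise ValueError("no accepted assets, so the cost per accepted asset is undefined")
    return price_per_request * requests / accepted


# Made-up numbers: the cheaper model needs more attempts per usable image.
print(round(cost_per_accepted_asset(0.04, 10, 8), 4))
print(round(cost_per_accepted_asset(0.02, 10, 2), 4))
```

&lt;p&gt;In this made-up case the model with half the request price ends up twice as expensive per usable image, which is why acceptance rate belongs in the comparison.&lt;/p&gt;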

&lt;h2&gt;
  
  
  Model comparison: what to test first
&lt;/h2&gt;

&lt;p&gt;Start with a small matrix. Do not test twenty models with random prompts. Pick three real prompts from your product and run them consistently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy0t7pmhmwxqt95ayvgg7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy0t7pmhmwxqt95ayvgg7.png" alt="Image model comparison table" width="800" height="469"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model family&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;th&gt;Watch out for&lt;/th&gt;
&lt;th&gt;Recommended test prompt&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT Image / DALL-E-style workflows&lt;/td&gt;
&lt;td&gt;Instruction following, edits, product mockups, structured scenes&lt;/td&gt;
&lt;td&gt;May cost more on large batches&lt;/td&gt;
&lt;td&gt;“Create a clean SaaS hero image showing a dashboard, API routing lines, and four model cards.”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Imagen&lt;/td&gt;
&lt;td&gt;Photorealistic visuals, natural lighting, polished marketing images&lt;/td&gt;
&lt;td&gt;Provider-specific behavior can differ across versions&lt;/td&gt;
&lt;td&gt;“Photorealistic product photo of a matte black wireless keyboard on a white desk, soft studio lighting.”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen Image&lt;/td&gt;
&lt;td&gt;Text-heavy images, multilingual prompts, practical creative assets&lt;/td&gt;
&lt;td&gt;Test exact typography and small text before production&lt;/td&gt;
&lt;td&gt;“A bilingual poster with the words ‘One API Key’ and ‘统一模型入口’, clean developer conference style.”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FLUX&lt;/td&gt;
&lt;td&gt;Stylized posters, creative visuals, hero images, social media graphics&lt;/td&gt;
&lt;td&gt;Version choice matters; compare style vs accuracy&lt;/td&gt;
&lt;td&gt;“Cyberpunk developer workspace with floating API nodes and neon model labels, editorial illustration.”&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The goal is not to declare one universal winner. The goal is to pick the right model for the task.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three prompts to use in your first test
&lt;/h2&gt;

&lt;p&gt;Use prompts that map to real production needs.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Product photo prompt
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create a photorealistic ecommerce product image of a minimalist white smart speaker on a light gray background. Soft studio lighting, realistic shadows, no text, centered composition, high-end product catalog style.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What to evaluate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;product shape consistency;&lt;/li&gt;
&lt;li&gt;realistic reflections and shadows;&lt;/li&gt;
&lt;li&gt;whether the model invents unwanted logos or text;&lt;/li&gt;
&lt;li&gt;whether the result is clean enough for an ecommerce page.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. SaaS hero image prompt
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Create a clean SaaS hero image for an AI API dashboard. Show multiple model cards connected to one central API key, with subtle blue and purple gradients, modern UI panels, no brand logos, professional developer-tool style.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What to evaluate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;dashboard layout clarity;&lt;/li&gt;
&lt;li&gt;whether the image looks like a real product visual;&lt;/li&gt;
&lt;li&gt;whether it avoids fake unreadable UI clutter;&lt;/li&gt;
&lt;li&gt;whether the style matches a B2B website.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Text-heavy poster prompt
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Design a modern developer conference poster with the exact headline “One API Key, Many AI Models”. Include small abstract icons for text, image, audio, and video generation. Clean typography, white background, blue accent color.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What to evaluate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;exact text rendering;&lt;/li&gt;
&lt;li&gt;typography quality;&lt;/li&gt;
&lt;li&gt;layout balance;&lt;/li&gt;
&lt;li&gt;whether the model introduces misspellings.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Text-heavy prompts are especially important because many image models still struggle with precise typography.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to use the playground
&lt;/h2&gt;

&lt;p&gt;A developer-friendly AI image playground should not only return a pretty image. It should help you turn the result into an API call.&lt;/p&gt;

&lt;p&gt;Use this workflow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the playground.&lt;/li&gt;
&lt;li&gt;Paste a real production prompt.&lt;/li&gt;
&lt;li&gt;Choose the first candidate model.&lt;/li&gt;
&lt;li&gt;Set size, quality, and count.&lt;/li&gt;
&lt;li&gt;Generate the image.&lt;/li&gt;
&lt;li&gt;Save the output and notes.&lt;/li&gt;
&lt;li&gt;Repeat with the same prompt on other models.&lt;/li&gt;
&lt;li&gt;Compare accepted outputs, not just raw generations.&lt;/li&gt;
&lt;li&gt;Copy the generated cURL/API request.&lt;/li&gt;
&lt;li&gt;Move the request into your application.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For Crazyrouter image testing, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://image.crazyrouter.com?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=image_api_playground
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key point is repeatability. Run the same prompt across models, then score the results with the same criteria.&lt;/p&gt;
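
&lt;p&gt;A minimal sketch of that repeatable setup, assuming you keep prompts and candidate model IDs in plain Python structures. The prompt text is shortened placeholder wording, every model ID except &lt;code&gt;qwen-image&lt;/code&gt; is a hypothetical stand-in, and &lt;code&gt;build_test_matrix&lt;/code&gt; is an illustrative helper rather than part of any SDK:&lt;/p&gt;

```python
from itertools import product

# Placeholder prompts for the three prompt types; use your real production prompts.
PROMPTS = {
    "product_photo": "Studio product photo of a ceramic mug on a white background.",
    "website_hero": "Clean SaaS hero image for an AI API dashboard.",
    "text_poster": "Conference poster with the exact headline 'One API Key, Many AI Models'.",
}

# Hypothetical candidate list; copy real IDs from the live model list.
MODELS = ["qwen-image", "example-model-a", "example-model-b"]

# Shared settings so every model is tested under the same conditions.
SETTINGS = {"size": "1024x1024", "n": 1}


def build_test_matrix(prompts, models, settings):
    """Return one request payload per (prompt, model) pair."""
    runs = []
    for (prompt_name, prompt), model in product(prompts.items(), models):
        payload = {"model": model, "prompt": prompt, **settings}
        runs.append({"prompt_name": prompt_name, "payload": payload})
    return runs


matrix = build_test_matrix(PROMPTS, MODELS, SETTINGS)
print(len(matrix))
```

&lt;p&gt;Because every payload shares the same settings, differences in the outputs should come from the model rather than from the test setup.&lt;/p&gt;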

&lt;h2&gt;
  
  
  Production API example
&lt;/h2&gt;

&lt;p&gt;Once you find a model that works, move to the API.&lt;/p&gt;

&lt;p&gt;A typical OpenAI-compatible image generation request looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://crazyrouter.com/v1/images/generations &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "qwen-image",
    "prompt": "Create a clean SaaS hero image for an AI API dashboard. Show multiple model cards connected to one central API key, with subtle blue and purple gradients, modern UI panels, no brand logos.",
    "size": "1024x1024",
    "n": 1
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://crazyrouter.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen-image&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Create a clean SaaS hero image for an AI API dashboard. Show multiple model cards connected to one central API key.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1024x1024&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The exact available model IDs can change as providers release new versions. Always copy the current model ID from the live model list or playground.&lt;/p&gt;
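
&lt;p&gt;Printing the whole result is fine for a first test, but an application usually needs the image itself. The sketch below assumes the response follows the OpenAI-style shape, a &lt;code&gt;data&lt;/code&gt; list whose items carry either &lt;code&gt;url&lt;/code&gt; or &lt;code&gt;b64_json&lt;/code&gt;. Which of the two a given model returns through the gateway is an assumption to verify, and &lt;code&gt;extract_image&lt;/code&gt; is a hypothetical helper name:&lt;/p&gt;

```python
import base64


def extract_image(response: dict) -> dict:
    """Return {"kind": "url", ...} or {"kind": "bytes", ...} for the first image.

    Assumes the OpenAI-style shape: {"data": [{"url": ...} or {"b64_json": ...}]}.
    """
    items = response.get("data") or []
    if not items:
        raise ValueError("response contains no images")
    first = items[0]
    if first.get("b64_json"):
        return {"kind": "bytes", "value": base64.b64decode(first["b64_json"])}
    if first.get("url"):
        return {"kind": "url", "value": first["url"]}
    raise ValueError("image item has neither 'url' nor 'b64_json'")


# Fake response for illustration; "aGVsbG8=" is base64 for b"hello".
fake = {"data": [{"b64_json": "aGVsbG8="}]}
print(extract_image(fake)["kind"])
```

&lt;p&gt;With the SDK call above, &lt;code&gt;result.model_dump()&lt;/code&gt; should produce a dictionary in this shape.&lt;/p&gt;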

&lt;h2&gt;
  
  
  What to measure before production
&lt;/h2&gt;

&lt;p&gt;A playground test is useful only if you measure the right things.&lt;/p&gt;

&lt;p&gt;Use this scoring sheet:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Criterion&lt;/th&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Score 1-5&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt following&lt;/td&gt;
&lt;td&gt;Did the image include the requested objects, layout, and constraints?&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visual quality&lt;/td&gt;
&lt;td&gt;Would you publish this asset without heavy editing?&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Text accuracy&lt;/td&gt;
&lt;td&gt;If text was requested, was it spelled correctly?&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Brand fit&lt;/td&gt;
&lt;td&gt;Does the style match your product or campaign?&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost fit&lt;/td&gt;
&lt;td&gt;Is the accepted-output cost reasonable?&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Repeatability&lt;/td&gt;
&lt;td&gt;Can the model produce similar quality across multiple runs?&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API fit&lt;/td&gt;
&lt;td&gt;Does the model support the size, quality, and workflow you need?&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
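
&lt;p&gt;If you prefer to keep the sheet in code, here is one possible sketch. The criterion keys mirror the table rows. The unweighted average is an assumption made for illustration, since the sheet does not say how scores should be combined, and &lt;code&gt;ScoreSheet&lt;/code&gt; is a hypothetical class name:&lt;/p&gt;

```python
from dataclasses import dataclass, field
from statistics import mean

# One key per row of the scoring sheet.
CRITERIA = [
    "prompt_following",
    "visual_quality",
    "text_accuracy",
    "brand_fit",
    "cost_fit",
    "repeatability",
    "api_fit",
]


@dataclass
class ScoreSheet:
    model: str
    scores: dict = field(default_factory=dict)

    def rate(self, criterion: str, score: int) -> None:
        """Store a 1-5 score for one known criterion."""
        if criterion not in CRITERIA:
            raise KeyError(f"unknown criterion: {criterion}")
        if score not in range(1, 6):
            raise ValueError("score must be between 1 and 5")
        self.scores[criterion] = score

    def average(self) -> float:
        """Unweighted mean over the criteria rated so far."""
        return mean(self.scores.values())


sheet = ScoreSheet(model="qwen-image")
sheet.rate("prompt_following", 4)
sheet.rate("text_accuracy", 2)
print(sheet.average())
```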

&lt;p&gt;The most important metric is not price per generation. It is price per accepted image.&lt;/p&gt;

&lt;p&gt;If one model costs 30% more per generation but produces usable assets twice as often, its cost per accepted image is about 35% lower (1.3 / 2 = 0.65).&lt;/p&gt;
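
&lt;p&gt;The arithmetic behind that comparison can be sketched as price per generation divided by acceptance rate. The prices and acceptance rates below are invented for illustration and are not real pricing:&lt;/p&gt;

```python
def cost_per_accepted(price_per_generation: float, acceptance_rate: float) -> float:
    """Expected spend to get one usable image."""
    if acceptance_rate > 1 or not acceptance_rate > 0:
        raise ValueError("acceptance_rate must be in (0, 1]")
    return price_per_generation / acceptance_rate


# Illustrative numbers only.
model_a = cost_per_accepted(0.040, 0.25)  # baseline price, 1 in 4 images usable
model_b = cost_per_accepted(0.052, 0.50)  # 30% more per generation, 1 in 2 usable

print(round(model_a, 3), round(model_b, 3))
```

&lt;p&gt;Track the acceptance rate per prompt type, because a model that works well for hero images may need many more attempts for text-heavy posters.&lt;/p&gt;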

&lt;h2&gt;
  
  
  One API key vs separate provider accounts
&lt;/h2&gt;

&lt;p&gt;You can integrate image models provider by provider. That works for a small test, but it becomes painful when your product grows.&lt;/p&gt;

&lt;p&gt;Separate provider accounts usually mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multiple API keys;&lt;/li&gt;
&lt;li&gt;different request formats;&lt;/li&gt;
&lt;li&gt;different billing dashboards;&lt;/li&gt;
&lt;li&gt;different quota systems;&lt;/li&gt;
&lt;li&gt;different failure modes;&lt;/li&gt;
&lt;li&gt;more engineering work when switching models.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A unified API gateway changes the workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one API key;&lt;/li&gt;
&lt;li&gt;one account balance;&lt;/li&gt;
&lt;li&gt;one OpenAI-compatible request style;&lt;/li&gt;
&lt;li&gt;multiple image model families;&lt;/li&gt;
&lt;li&gt;easier model switching;&lt;/li&gt;
&lt;li&gt;simpler fallback logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For teams building SaaS tools, ecommerce automation, marketing creative systems, or AI agents, this flexibility matters more than the playground itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recommended model selection workflow
&lt;/h2&gt;

&lt;p&gt;Use this simple process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Explore&lt;/strong&gt;: run your prompt in the playground across 3-5 model families.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Score&lt;/strong&gt;: evaluate accepted outputs with a consistent rubric.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Estimate&lt;/strong&gt;: calculate cost per accepted image.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrate&lt;/strong&gt;: copy the API request into your app.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fallback&lt;/strong&gt;: define a second model for failures or cost spikes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor&lt;/strong&gt;: track latency, failures, and acceptance rate.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last step is easy to skip. Do not skip it.&lt;/p&gt;

&lt;p&gt;Image generation quality changes over time as providers update models. The best model for your prompt in June may not be the best model three months later.&lt;/p&gt;
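
&lt;p&gt;A rough sketch of the monitoring step, assuming in-memory counters are enough for a first version. &lt;code&gt;ModelStats&lt;/code&gt; is a hypothetical class, and a real deployment would likely send these numbers to whatever metrics system it already uses:&lt;/p&gt;

```python
from collections import defaultdict


class ModelStats:
    """Running counters per model: calls, failures, accepted outputs, latency."""

    def __init__(self):
        self._data = defaultdict(
            lambda: {"calls": 0, "failures": 0, "accepted": 0, "latency_s": 0.0}
        )

    def record(self, model: str, latency_s: float,
               failed: bool = False, accepted: bool = False) -> None:
        row = self._data[model]
        row["calls"] += 1
        row["latency_s"] += latency_s
        row["failures"] += int(failed)
        row["accepted"] += int(accepted)

    def summary(self, model: str) -> dict:
        row = self._data[model]
        calls = row["calls"] or 1  # avoid division by zero for unseen models
        return {
            "acceptance_rate": row["accepted"] / calls,
            "failure_rate": row["failures"] / calls,
            "avg_latency_s": row["latency_s"] / calls,
        }


stats = ModelStats()
stats.record("qwen-image", latency_s=8.0, accepted=True)
stats.record("qwen-image", latency_s=12.0, failed=True)
print(stats.summary("qwen-image"))
```

&lt;p&gt;Comparing these summaries month over month is one way to notice when a provider update has changed which model is best for your prompts.&lt;/p&gt;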

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mistake 1: Testing only one prompt
&lt;/h3&gt;

&lt;p&gt;A single beautiful output does not prove the model is right for your product.&lt;/p&gt;

&lt;p&gt;Use at least three prompt types: product photo, website hero, and text-heavy poster.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 2: Comparing screenshots instead of accepted outputs
&lt;/h3&gt;

&lt;p&gt;If a model returns five images and only one is usable, count that. Your real cost per usable image is five times the sticker price.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 3: Forgetting text rendering
&lt;/h3&gt;

&lt;p&gt;If your use case needs labels, posters, packaging, or UI text, test exact spelling early.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 4: Hardcoding one model forever
&lt;/h3&gt;

&lt;p&gt;Choose model IDs deliberately, but keep them in configuration rather than scattered through application code, so that switching models is a small change. Image model quality and pricing move quickly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 5: Adding UTM parameters to API URLs
&lt;/h3&gt;

&lt;p&gt;Use UTM parameters on article links and landing pages. Do not append them to API endpoints such as &lt;code&gt;/v1/images/generations&lt;/code&gt;.&lt;/p&gt;
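
&lt;p&gt;If API URLs in your project are ever assembled from copied links, a small guard can remove tracking parameters before a request is sent. This is a sketch using only the standard library, and &lt;code&gt;strip_utm&lt;/code&gt; is a hypothetical helper name:&lt;/p&gt;

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_utm(url: str) -> str:
    """Drop every query parameter whose name starts with 'utm_'."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_")
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_utm("https://crazyrouter.com/v1/images/generations?utm_source=blog"))
```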

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is an AI image API playground?
&lt;/h3&gt;

&lt;p&gt;An AI image API playground is a web interface where developers can test image generation models before integrating them into code. A good playground also provides copyable API requests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why test multiple AI image models?
&lt;/h3&gt;

&lt;p&gt;Different models perform differently on photorealism, text rendering, edits, product accuracy, style, and cost. Testing multiple models reduces the risk of choosing the wrong provider too early.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use one API key for multiple image models?
&lt;/h3&gt;

&lt;p&gt;Yes. With Crazyrouter, you can use one API key and an OpenAI-compatible endpoint to access multiple model families, including image generation workflows.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which image model is best for product photos?
&lt;/h3&gt;

&lt;p&gt;There is no universal answer. Start by testing Imagen, GPT Image/DALL-E-style models, Qwen Image, and FLUX-style models with the same product photo prompt. Score accepted outputs, not just raw generations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which image model is best for text in images?
&lt;/h3&gt;

&lt;p&gt;Text rendering varies by model and version. Qwen Image and newer instruction-following image models are worth testing, but you should always test your exact headline, language, and layout.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is the playground only for non-technical users?
&lt;/h3&gt;

&lt;p&gt;No. A playground is useful for developers because it shortens the model selection loop. You can test visually, then copy a request into production code.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I move from playground to production?
&lt;/h3&gt;

&lt;p&gt;Use the generated request as a starting point. In production, call &lt;code&gt;https://crazyrouter.com/v1/images/generations&lt;/code&gt; with your API key, chosen model, prompt, size, and generation options.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final recommendation
&lt;/h2&gt;

&lt;p&gt;If you are adding image generation to an app, do not start by writing provider-specific integration code.&lt;/p&gt;

&lt;p&gt;Start with a playground.&lt;/p&gt;

&lt;p&gt;Test the same prompts across GPT Image, Imagen, Qwen Image, FLUX, and DALL-E-style workflows. Pick the model with the lowest cost per accepted output for your actual use case. Then integrate through one OpenAI-compatible API so you can switch models later.&lt;/p&gt;

&lt;p&gt;Try the playground:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://image.crazyrouter.com?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=image_api_playground
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Explore live models and pricing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com/models?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=image_api_playground
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>ai</category>
      <category>api</category>
      <category>tutorial</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How to Fix AI API 500, 502, and 524 Errors</title>
      <dc:creator>Jenny Met</dc:creator>
      <pubDate>Thu, 04 Jun 2026 13:53:51 +0000</pubDate>
      <link>https://dev.to/xujfcn/how-to-fix-ai-api-500-502-and-524-errors-12je</link>
      <guid>https://dev.to/xujfcn/how-to-fix-ai-api-500-502-and-524-errors-12je</guid>
      <description>&lt;h1&gt;
  
  
  How to Fix AI API 500, 502, and 524 Errors
&lt;/h1&gt;

&lt;p&gt;AI API errors are frustrating because they often appear at the worst possible time: during a demo, a production workflow, a coding-agent run, or a customer support automation task.&lt;/p&gt;

&lt;p&gt;From real support conversations, three error families appear again and again:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;500&lt;/code&gt; — server-side or upstream failure;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;502&lt;/code&gt; — bad gateway or invalid upstream response;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;524&lt;/code&gt; — timeout, often from a long-running request.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The mistake is treating all three the same.&lt;/p&gt;

&lt;p&gt;A retry might fix one request. It will not fix a fragile production design.&lt;/p&gt;

&lt;p&gt;This guide explains what these errors usually mean, what to check first, and how to make AI API calls more resilient with logging, retries, model fallback, and endpoint fallback.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick error table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Usually means&lt;/th&gt;
&lt;th&gt;First action&lt;/th&gt;
&lt;th&gt;Production fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;500&lt;/td&gt;
&lt;td&gt;Internal server or upstream provider failure&lt;/td&gt;
&lt;td&gt;Retry once and capture request details&lt;/td&gt;
&lt;td&gt;Add retry with backoff and fallback model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;502&lt;/td&gt;
&lt;td&gt;Gateway could not get a valid upstream response&lt;/td&gt;
&lt;td&gt;Try a nearby model or route&lt;/td&gt;
&lt;td&gt;Add model/provider fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;524&lt;/td&gt;
&lt;td&gt;Request timed out&lt;/td&gt;
&lt;td&gt;Reduce context/output or use streaming&lt;/td&gt;
&lt;td&gt;Add timeout controls and split long tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;429&lt;/td&gt;
&lt;td&gt;Rate limit or quota issue&lt;/td&gt;
&lt;td&gt;Reduce rate and check limits&lt;/td&gt;
&lt;td&gt;Queue, throttle, or request higher limits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;401/403&lt;/td&gt;
&lt;td&gt;Auth or permission issue&lt;/td&gt;
&lt;td&gt;Check API key and model access&lt;/td&gt;
&lt;td&gt;Validate config before deploy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you only remember one thing: &lt;strong&gt;log the model, endpoint, request time, error code, and whether streaming was enabled.&lt;/strong&gt; Without that, troubleshooting becomes guesswork.&lt;/p&gt;
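
&lt;p&gt;One way to capture those fields is a single JSON line per failed request. The sketch below uses only the standard library. &lt;code&gt;build_error_log&lt;/code&gt; and its field names are hypothetical, and the chat completions path is an assumption based on the OpenAI-compatible base URL:&lt;/p&gt;

```python
import json
import time


def build_error_log(model, endpoint, status_code, streaming, started_at, error_body=""):
    """Return one JSON line with the fields needed to trace a failed request."""
    record = {
        "model": model,
        "endpoint": endpoint,
        "status_code": status_code,
        "streaming": streaming,
        "request_time_utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(started_at)),
        "duration_s": round(time.time() - started_at, 3),
        "error_body": error_body[:500],  # truncate long bodies to keep logs small
    }
    return json.dumps(record)


line = build_error_log(
    model="gpt-5-mini",
    endpoint="https://crazyrouter.com/v1/chat/completions",
    status_code=502,
    streaming=False,
    started_at=time.time() - 1.5,  # pretend the request started 1.5 seconds ago
    error_body="Bad Gateway",
)
print(line)
```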

&lt;h2&gt;
  
  
  What a 500 AI API error usually means
&lt;/h2&gt;

&lt;p&gt;A &lt;code&gt;500&lt;/code&gt; error usually means something failed server-side.&lt;/p&gt;

&lt;p&gt;In an AI API workflow, that could be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the gateway encountered an internal error;&lt;/li&gt;
&lt;li&gt;the upstream model provider returned an unexpected failure;&lt;/li&gt;
&lt;li&gt;the model route was temporarily unstable;&lt;/li&gt;
&lt;li&gt;the request payload triggered an edge case;&lt;/li&gt;
&lt;li&gt;a long or complex request failed during processing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A single &lt;code&gt;500&lt;/code&gt; is often temporary.&lt;/p&gt;

&lt;p&gt;A repeated &lt;code&gt;500&lt;/code&gt; on the same request usually means you need to inspect the request shape.&lt;/p&gt;

&lt;p&gt;Check:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;model name;&lt;/li&gt;
&lt;li&gt;endpoint path;&lt;/li&gt;
&lt;li&gt;message format;&lt;/li&gt;
&lt;li&gt;tool/function calling schema;&lt;/li&gt;
&lt;li&gt;image/video/audio payload format;&lt;/li&gt;
&lt;li&gt;context size;&lt;/li&gt;
&lt;li&gt;whether the same request works on another model.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What a 502 AI API error usually means
&lt;/h2&gt;

&lt;p&gt;A &lt;code&gt;502 Bad Gateway&lt;/code&gt; means the gateway did not receive a valid response from the upstream service.&lt;/p&gt;

&lt;p&gt;For AI APIs, common causes include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;upstream provider instability;&lt;/li&gt;
&lt;li&gt;overloaded model route;&lt;/li&gt;
&lt;li&gt;bad or incomplete upstream response;&lt;/li&gt;
&lt;li&gt;network interruption;&lt;/li&gt;
&lt;li&gt;route-specific failure;&lt;/li&gt;
&lt;li&gt;gateway-provider mismatch for a special model feature.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If a &lt;code&gt;502&lt;/code&gt; happens once, retrying may be enough.&lt;/p&gt;

&lt;p&gt;If it happens repeatedly on one model, test a similar model.&lt;/p&gt;

&lt;p&gt;For example, if one high-end reasoning model is unstable, temporarily route the same prompt to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a nearby model version;&lt;/li&gt;
&lt;li&gt;a faster model in the same family;&lt;/li&gt;
&lt;li&gt;a different provider with similar capability;&lt;/li&gt;
&lt;li&gt;a cheaper fallback model for non-critical tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where a gateway is useful: you can switch model routes without rewriting the app.&lt;/p&gt;

&lt;h2&gt;
  
  
  What a 524 timeout usually means
&lt;/h2&gt;

&lt;p&gt;A &lt;code&gt;524&lt;/code&gt; is a non-standard status code, best known from Cloudflare. It means the proxy reached the origin server but did not get an HTTP response before its timeout expired.&lt;/p&gt;

&lt;p&gt;This is common with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;very long prompts;&lt;/li&gt;
&lt;li&gt;large context windows;&lt;/li&gt;
&lt;li&gt;huge expected outputs;&lt;/li&gt;
&lt;li&gt;complex reasoning tasks;&lt;/li&gt;
&lt;li&gt;image or video generation jobs;&lt;/li&gt;
&lt;li&gt;non-streaming requests that run too long;&lt;/li&gt;
&lt;li&gt;coding-agent workflows that ask the model to solve too much in one call.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Immediate fixes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;reduce input size;&lt;/li&gt;
&lt;li&gt;lower &lt;code&gt;max_tokens&lt;/code&gt; or output length;&lt;/li&gt;
&lt;li&gt;use streaming for text responses;&lt;/li&gt;
&lt;li&gt;split the task into smaller steps;&lt;/li&gt;
&lt;li&gt;choose a faster model;&lt;/li&gt;
&lt;li&gt;avoid asking for massive JSON output in one response.&lt;/li&gt;
&lt;/ol&gt;
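
&lt;p&gt;Splitting a task can be as simple as grouping paragraphs under a size budget and sending each group in its own request. The sketch below uses a character budget as a rough stand-in for tokens, which is an assumption, and &lt;code&gt;split_into_chunks&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;

```python
def split_into_chunks(text: str, max_chars: int = 4000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than cut
    mid-sentence, so a chunk can exceed the budget in that case.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if current and len(candidate) > max_chars:
            chunks.append(current)  # budget reached: close the current chunk
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Ten synthetic paragraphs of about 100 characters each.
document = "\n\n".join(f"Paragraph {i} " + "x" * 90 for i in range(10))
chunks = split_into_chunks(document, max_chars=300)
print(len(chunks))
```

&lt;p&gt;Each chunk can then be summarized or processed in its own call, which keeps every request short enough to finish well before a proxy timeout.&lt;/p&gt;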

&lt;p&gt;A timeout is not always a platform outage. Sometimes it means the request is too large or too slow for a synchronous API call.&lt;/p&gt;

&lt;h2&gt;
  
  
  Immediate troubleshooting checklist
&lt;/h2&gt;

&lt;p&gt;When an AI API request fails, do this before changing your whole setup.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;What to do&lt;/th&gt;
&lt;th&gt;Why it helps&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Retry once&lt;/td&gt;
&lt;td&gt;Handles temporary upstream failure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Save the full error body&lt;/td&gt;
&lt;td&gt;Error text often shows auth/model/payload clues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Record request time&lt;/td&gt;
&lt;td&gt;Support can map it to route/provider logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Record model name&lt;/td&gt;
&lt;td&gt;Many failures are model-route specific&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Check Base URL&lt;/td&gt;
&lt;td&gt;Wrong endpoint causes confusing failures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Test a smaller prompt&lt;/td&gt;
&lt;td&gt;Separates payload-size issues from route issues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Try streaming&lt;/td&gt;
&lt;td&gt;Reduces timeout risk for long text responses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Try a nearby model&lt;/td&gt;
&lt;td&gt;Confirms whether the issue is model-specific&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;Try region endpoint if needed&lt;/td&gt;
&lt;td&gt;Helps when access to the global endpoint is unstable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Remove optional features&lt;/td&gt;
&lt;td&gt;Tool calls, images, long JSON schemas can add failure points&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For OpenAI-compatible clients with Crazyrouter, the common Base URLs are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://cn.crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do not add UTM parameters or tracking strings to API endpoints.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to write a safe retry strategy
&lt;/h2&gt;

&lt;p&gt;Retries help, but blind retries can make an outage worse.&lt;/p&gt;

&lt;p&gt;Use exponential backoff with jitter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://crazyrouter.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_with_retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;last_error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_retries&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;last_error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;max_retries&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;  &lt;span class="c1"&gt;# no attempts left, so skip the final sleep&lt;/span&gt;
            &lt;span class="c1"&gt;# exponential backoff (1s, 2s, 4s, ...) plus up to 1s of random jitter&lt;/span&gt;
            &lt;span class="n"&gt;wait&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="n"&gt;last_error&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



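&lt;p&gt;This is better than retrying immediately in a tight loop: the growing delay gives the upstream time to recover, and the random jitter keeps many clients from retrying at the same moment. Note that the example retries every exception, including authentication errors that a retry cannot fix. In production, retry only transient failures.&lt;/p&gt;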
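
&lt;p&gt;A retry policy can also decide which status codes are worth retrying at all. The sketch below assumes that &lt;code&gt;429&lt;/code&gt;, &lt;code&gt;500&lt;/code&gt;, and &lt;code&gt;502&lt;/code&gt; are transient, and it leaves out &lt;code&gt;524&lt;/code&gt; because the advice for timeouts is to shrink or stream the request rather than resend it unchanged. The function names, the cap, and the retryable set are illustrative choices, not a fixed rule:&lt;/p&gt;

```python
import random

# Assumption for this sketch: these codes are transient and worth retrying.
# 401/403 are excluded because a retry cannot fix a bad key or missing access.
RETRYABLE_STATUS = {429, 500, 502}


def should_retry(status_code: int, attempt: int, max_retries: int = 3) -> bool:
    """Retry only transient errors, and only while attempts remain."""
    if attempt + 1 >= max_retries:
        return False
    return status_code in RETRYABLE_STATUS


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff plus up to 1s of jitter, capped so waits stay bounded."""
    return min(cap, base * (2 ** attempt)) + random.random()


print(should_retry(502, attempt=0), should_retry(401, attempt=0))
```

&lt;p&gt;Keeping the policy in two small functions makes it easy to test without calling the API.&lt;/p&gt;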

&lt;h2&gt;
  
  
  Model fallback example
&lt;/h2&gt;

&lt;p&gt;A production app should not depend on one model route for every task.&lt;/p&gt;

&lt;p&gt;You can define a fallback list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://crazyrouter.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;MODELS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_with_model_fallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;MODELS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;

    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;All model routes failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern is especially useful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;support bots;&lt;/li&gt;
&lt;li&gt;internal automation;&lt;/li&gt;
&lt;li&gt;coding agents;&lt;/li&gt;
&lt;li&gt;summarization pipelines;&lt;/li&gt;
&lt;li&gt;batch content workflows;&lt;/li&gt;
&lt;li&gt;production apps with user-facing latency requirements.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Endpoint fallback example
&lt;/h2&gt;

&lt;p&gt;If your users or servers sometimes have unstable access to one region, you can test an endpoint fallback.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;ENDPOINTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://crazyrouter.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://cn.crazyrouter.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;base_url&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ENDPOINTS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Health check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
            &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;working endpoint:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
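        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed endpoint:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# for/else: this branch runs only when the loop never hit break, i.e. no endpoint worked&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;All endpoints failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;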
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do not randomly switch endpoints on every request. Use endpoint fallback intentionally and log which route succeeded.&lt;/p&gt;
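
&lt;p&gt;One way to avoid switching on every request is to remember the last endpoint that worked and only move on when it fails. The sketch below is a minimal illustration under stated assumptions: &lt;code&gt;StickyEndpoint&lt;/code&gt; and the &lt;code&gt;send&lt;/code&gt; callable are hypothetical names, not part of the OpenAI SDK or of Crazyrouter, and &lt;code&gt;send&lt;/code&gt; is assumed to raise an exception when a request fails.&lt;/p&gt;

```python
import logging

logger = logging.getLogger("endpoint_fallback")

ENDPOINTS = [
    "https://crazyrouter.com/v1",
    "https://cn.crazyrouter.com/v1",
]


class StickyEndpoint:
    """Remember the last endpoint that worked instead of switching per request."""

    def __init__(self, endpoints):
        self.endpoints = list(endpoints)
        self.current = 0  # index of the endpoint currently preferred

    def call(self, send):
        """Try the preferred endpoint first, then the others in order.

        `send` is a hypothetical callable: it takes a base URL, performs the
        request, and raises an exception on failure.
        """
        count = len(self.endpoints)
        errors = []
        for offset in range(count):
            index = (self.current + offset) % count
            base_url = self.endpoints[index]
            try:
                result = send(base_url)
            except Exception as exc:
                logger.warning("endpoint failed: %s (%s)", base_url, exc)
                errors.append((base_url, str(exc)))
                continue
            self.current = index  # stay on the route that just succeeded
            logger.info("endpoint succeeded: %s", base_url)
            return result
        raise RuntimeError(f"All endpoints failed: {errors}")
```

&lt;p&gt;In a real client, &lt;code&gt;send&lt;/code&gt; would wrap the SDK call from the example above. Because the preferred index only changes after a success on another route, the logs show exactly when and why traffic moved.&lt;/p&gt;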

&lt;h2&gt;
  
  
  What to send support
&lt;/h2&gt;

&lt;p&gt;If the issue persists, support can help much faster when you include the right information.&lt;/p&gt;

&lt;p&gt;Send:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;account email;&lt;/li&gt;
&lt;li&gt;model name;&lt;/li&gt;
&lt;li&gt;Base URL used;&lt;/li&gt;
&lt;li&gt;endpoint path;&lt;/li&gt;
&lt;li&gt;request time and timezone;&lt;/li&gt;
&lt;li&gt;error code;&lt;/li&gt;
&lt;li&gt;error body or screenshot;&lt;/li&gt;
&lt;li&gt;whether streaming was enabled;&lt;/li&gt;
&lt;li&gt;whether the same request works on another model;&lt;/li&gt;
&lt;li&gt;simplified request body without secrets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Never post your full API key in public channels such as forums, issue trackers, or group chats.&lt;/p&gt;
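
&lt;p&gt;A small helper can strip secrets from a request description before you paste it into a ticket. This is a sketch under assumptions: the &lt;code&gt;redact&lt;/code&gt; function, the &lt;code&gt;SECRET_KEYS&lt;/code&gt; set, and the shape of the &lt;code&gt;report&lt;/code&gt; dictionary are all hypothetical, so adjust them to whatever your application actually logs.&lt;/p&gt;

```python
import json

# Field names treated as secrets (an assumption; extend for your own stack).
SECRET_KEYS = {"authorization", "api_key", "api-key", "x-api-key"}


def redact(payload):
    """Return a copy of a request description with secret values replaced."""
    if isinstance(payload, dict):
        return {
            key: "[REDACTED]" if key.lower() in SECRET_KEYS else redact(value)
            for key, value in payload.items()
        }
    if isinstance(payload, list):
        return [redact(item) for item in payload]
    return payload


# Hypothetical support report covering the fields listed above.
report = {
    "base_url": "https://crazyrouter.com/v1",
    "endpoint_path": "/chat/completions",
    "model": "gpt-5-mini",
    "stream": False,
    "headers": {"Authorization": "Bearer YOUR_CRAZYROUTER_API_KEY"},
    "body": {"messages": [{"role": "user", "content": "Health check"}]},
}

print(json.dumps(redact(report), indent=2))
```

&lt;p&gt;The original dictionary is left untouched, so the same object can still be used for the real request.&lt;/p&gt;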

&lt;h2&gt;
  
  
  How a gateway helps with AI API reliability
&lt;/h2&gt;

&lt;p&gt;An AI API gateway cannot make every upstream provider perfect.&lt;/p&gt;

&lt;p&gt;But it can make your application more flexible.&lt;/p&gt;

&lt;p&gt;With a gateway, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;switch models without rewriting SDK code;&lt;/li&gt;
&lt;li&gt;route simple tasks to cheaper models;&lt;/li&gt;
&lt;li&gt;route critical tasks to stronger models;&lt;/li&gt;
&lt;li&gt;add fallback across model families;&lt;/li&gt;
&lt;li&gt;keep one OpenAI-compatible integration surface;&lt;/li&gt;
&lt;li&gt;monitor cost and usage more centrally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With Crazyrouter, OpenAI-compatible clients can use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or, when needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://cn.crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then your app can focus on retry logic, fallback policy, and good observability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production checklist
&lt;/h2&gt;

&lt;p&gt;Before relying on an AI API in production, implement this checklist.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Timeout&lt;/td&gt;
&lt;td&gt;Set explicit request timeouts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retry&lt;/td&gt;
&lt;td&gt;Use exponential backoff with jitter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fallback&lt;/td&gt;
&lt;td&gt;Prepare at least one alternate model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logging&lt;/td&gt;
&lt;td&gt;Log endpoint, model, latency, error code, and request ID if available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Payload size&lt;/td&gt;
&lt;td&gt;Limit context and output size&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Streaming&lt;/td&gt;
&lt;td&gt;Use streaming for long text responses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rate limits&lt;/td&gt;
&lt;td&gt;Track requests per minute (RPM) and tokens per minute (TPM)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;Monitor input/output tokens and cache behavior&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User experience&lt;/td&gt;
&lt;td&gt;Show graceful fallback messages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Support&lt;/td&gt;
&lt;td&gt;Store enough metadata to debug failures later&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
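
&lt;p&gt;The retry row is the one most often implemented loosely, so here is a minimal sketch of exponential backoff with jitter. It uses the common "full jitter" variant, where each wait is a random value between zero and an exponentially growing ceiling. The function names, the default values, and the &lt;code&gt;is_retryable&lt;/code&gt; callback are assumptions for illustration, not part of any SDK.&lt;/p&gt;

```python
import random
import time


def backoff_delay(attempt, base=0.5, cap=8.0, rng=random.random):
    """Full-jitter delay: random value between 0 and min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retry(send, is_retryable, max_attempts=4, sleep=time.sleep):
    """Call `send()` and retry retryable failures with jittered backoff.

    `send` and `is_retryable` are hypothetical callables supplied by the caller.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            # Give up immediately on errors that a retry cannot fix.
            if last_attempt or not is_retryable(exc):
                raise
            sleep(backoff_delay(attempt))
```

&lt;p&gt;The jitter matters when many workers fail at the same moment: without it they all retry on the same schedule and hit the upstream together again.&lt;/p&gt;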

&lt;h2&gt;
  
  
  Helpful links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Base URL setup guide: &lt;a href="https://crazyrouter.com/blog/openai-compatible-api-base-url-explained?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=api_error_guide" rel="noopener noreferrer"&gt;https://crazyrouter.com/blog/openai-compatible-api-base-url-explained?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=api_error_guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model list: &lt;a href="https://crazyrouter.com/models?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=api_error_guide" rel="noopener noreferrer"&gt;https://crazyrouter.com/models?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=api_error_guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Pricing calculator: &lt;a href="https://crazyrouter.com/tools/pricing-calculator/?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=api_error_guide" rel="noopener noreferrer"&gt;https://crazyrouter.com/tools/pricing-calculator/?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=api_error_guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model comparison: &lt;a href="https://crazyrouter.com/tools/model-comparison/?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=api_error_guide" rel="noopener noreferrer"&gt;https://crazyrouter.com/tools/model-comparison/?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=api_error_guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;API docs: &lt;a href="https://docs.crazyrouter.com" rel="noopener noreferrer"&gt;https://docs.crazyrouter.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What does an AI API 500 error mean?
&lt;/h3&gt;

&lt;p&gt;An AI API 500 error usually means an internal server or upstream provider failure. Retry once, then check the model, endpoint, request format, and whether the same prompt works on another model.&lt;/p&gt;

&lt;h3&gt;
  
  
  What does an AI API 502 error mean?
&lt;/h3&gt;

&lt;p&gt;A 502 error usually means the gateway could not get a valid response from the upstream model provider. It is often temporary, but repeated 502 errors may require a model or route fallback.&lt;/p&gt;

&lt;h3&gt;
  
  
  What does a 524 timeout mean?
&lt;/h3&gt;

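&lt;p&gt;A 524 is a non-standard status code used by Cloudflare: the proxy connected to the origin server, but the origin did not return a response before the proxy's timeout. In practice, it usually means the request took too long. Reduce context size, shorten expected output, use streaming, split the task, or choose a faster model.&lt;/p&gt;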

&lt;h3&gt;
  
  
  Should I retry every failed AI API request?
&lt;/h3&gt;

&lt;p&gt;No. Retry temporary server, gateway, and timeout errors with backoff. Do not blindly retry authentication errors, invalid model errors, or bad request payloads.&lt;/p&gt;
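
&lt;p&gt;That rule can be written down as a small classifier. The sketch below is one possible mapping, not an official list: the exact status codes in each set are an assumption based on the error types named in this guide, and &lt;code&gt;should_retry&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
# Transient failures: server errors, gateway errors, and timeouts.
RETRYABLE_STATUS = {408, 500, 502, 503, 504, 524}

# Errors that the same request will hit again: bad payload, bad key,
# forbidden, unknown model or route, unprocessable body.
NON_RETRYABLE_STATUS = {400, 401, 403, 404, 422}


def should_retry(status_code):
    """Return True for transient failures, False for errors a retry cannot fix."""
    if status_code in NON_RETRYABLE_STATUS:
        return False
    return status_code in RETRYABLE_STATUS


for code in (401, 404, 500, 502, 524):
    print(code, "retry" if should_retry(code) else "do not retry")
```

&lt;p&gt;Codes outside both sets default to "do not retry" here, which keeps an unknown error from turning into a retry loop.&lt;/p&gt;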

&lt;h3&gt;
  
  
  How do I make AI API calls more reliable?
&lt;/h3&gt;

&lt;p&gt;Use explicit timeouts, retry with backoff, model fallback, endpoint fallback, request logging, payload limits, and monitoring for latency, token usage, and error rates.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>tutorial</category>
      <category>webdev</category>
    </item>
    <item>
      <title>OpenAI-Compatible API Base URL Explained: How to Configure Any AI Tool</title>
      <dc:creator>Jenny Met</dc:creator>
      <pubDate>Thu, 04 Jun 2026 06:27:24 +0000</pubDate>
      <link>https://dev.to/xujfcn/openai-compatible-api-base-url-explained-how-to-configure-any-ai-tool-2h8i</link>
      <guid>https://dev.to/xujfcn/openai-compatible-api-base-url-explained-how-to-configure-any-ai-tool-2h8i</guid>
      <description>&lt;h1&gt;
  
  
  OpenAI-Compatible API Base URL Explained: How to Configure Any AI Tool
&lt;/h1&gt;

&lt;p&gt;If an AI tool says it supports an “OpenAI-compatible API,” it usually means you do not need a new SDK.&lt;/p&gt;

&lt;p&gt;You keep the familiar OpenAI request format, then change three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;the API key;&lt;/li&gt;
&lt;li&gt;the model name;&lt;/li&gt;
&lt;li&gt;the API Base URL.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That last one — the Base URL — is where many setup problems happen.&lt;/p&gt;

&lt;p&gt;From real support conversations, the most common mistakes are simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the endpoint is missing &lt;code&gt;/v1&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;the API key is from the wrong provider;&lt;/li&gt;
&lt;li&gt;the tool is still sending requests to the official OpenAI endpoint;&lt;/li&gt;
&lt;li&gt;the selected model name does not exist on the gateway;&lt;/li&gt;
&lt;li&gt;users in a restricted or unstable region are using the wrong route.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This guide explains what the Base URL does and how to configure it in Python, Node.js, curl, Cursor, LiteLLM, FastGPT, and Codex-style tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick answer
&lt;/h2&gt;

&lt;p&gt;For an OpenAI-compatible client, the Base URL is the root endpoint your SDK sends API requests to.&lt;/p&gt;

&lt;p&gt;For Crazyrouter, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are in a region where the global endpoint is unstable, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://cn.crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do not add tracking parameters to API endpoints. Human-facing links can use UTM parameters; API Base URLs should stay clean.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is an API Base URL?
&lt;/h2&gt;

&lt;p&gt;An API Base URL is the prefix used before endpoint paths such as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/chat/completions
/responses
/embeddings
/images/generations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, if your Base URL is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;then a chat completion request goes to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com/v1/chat/completions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SDK builds that full URL for you.&lt;/p&gt;

&lt;p&gt;That is why one missing &lt;code&gt;/v1&lt;/code&gt; can break an otherwise correct setup.&lt;/p&gt;
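
&lt;p&gt;To make the effect concrete, here is a simplified illustration of the joining step. It is not the SDK's actual code; &lt;code&gt;build_url&lt;/code&gt; is a hypothetical helper that only shows how the Base URL and the endpoint path combine.&lt;/p&gt;

```python
def build_url(base_url, path):
    """Join a Base URL and an endpoint path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


print(build_url("https://crazyrouter.com/v1", "/chat/completions"))
# https://crazyrouter.com/v1/chat/completions

print(build_url("https://crazyrouter.com", "/chat/completions"))
# https://crazyrouter.com/chat/completions  (no /v1, so the request hits the wrong route)
```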

&lt;h2&gt;
  
  
  Why OpenAI-compatible APIs are useful
&lt;/h2&gt;

&lt;p&gt;Many tools and SDKs were originally designed for the OpenAI API format.&lt;/p&gt;

&lt;p&gt;OpenAI-compatible gateways let developers keep that interface while routing requests to different model families, such as GPT, Claude, Gemini, DeepSeek, Qwen, image models, audio models, and more.&lt;/p&gt;

&lt;p&gt;The benefit is practical:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Without an OpenAI-compatible gateway&lt;/th&gt;
&lt;th&gt;With an OpenAI-compatible gateway&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Different SDKs for different providers&lt;/td&gt;
&lt;td&gt;One familiar SDK format&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Provider-specific billing and keys&lt;/td&gt;
&lt;td&gt;One API key for many model routes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Harder fallback between models&lt;/td&gt;
&lt;td&gt;Easier model switching and fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;More integration work per tool&lt;/td&gt;
&lt;td&gt;Change Base URL, API key, and model name&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In Crazyrouter’s case, you can use one API layer across 627+ models and keep common OpenAI-compatible tooling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Crazyrouter Base URL options
&lt;/h2&gt;

&lt;p&gt;Use this table as the default decision guide.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Base URL&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Standard OpenAI-compatible API access&lt;/td&gt;
&lt;td&gt;&lt;code&gt;https://crazyrouter.com/v1&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Regional route for when access to the global endpoint is unstable&lt;/td&gt;
&lt;td&gt;&lt;code&gt;https://cn.crazyrouter.com/v1&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic native Messages API style&lt;/td&gt;
&lt;td&gt;&lt;code&gt;https://crazyrouter.com&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini native API style&lt;/td&gt;
&lt;td&gt;Use the Gemini-compatible endpoint format in docs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most users configuring the OpenAI SDK, Cursor-style tools, LiteLLM, FastGPT, or custom HTTP clients should start with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Python OpenAI SDK example
&lt;/h2&gt;

&lt;p&gt;Install the OpenAI Python SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then configure the client with a custom Base URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://crazyrouter.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-5-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain API Base URL in one sentence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you need the region endpoint, only change &lt;code&gt;base_url&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://cn.crazyrouter.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API key and model name do not need to change just because you switched endpoint routes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Node.js OpenAI SDK example
&lt;/h2&gt;

&lt;p&gt;Install the SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use &lt;code&gt;baseURL&lt;/code&gt; in the client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://crazyrouter.com/v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-5-mini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Give me a one-line API setup checklist.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the region endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://cn.crazyrouter.com/v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  curl example
&lt;/h2&gt;

&lt;p&gt;For a direct HTTP test, use curl:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://crazyrouter.com/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_CRAZYROUTER_API_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "gpt-5-mini",
    "messages": [
      {"role": "user", "content": "Say hello from an OpenAI-compatible API."}
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the region endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://cn.crazyrouter.com/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer YOUR_CRAZYROUTER_API_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "gpt-5-mini",
    "messages": [
      {"role": "user", "content": "Test the region endpoint."}
    ]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Environment variables
&lt;/h2&gt;

&lt;p&gt;Some tools read endpoint settings from environment variables.&lt;/p&gt;

&lt;p&gt;A common setup is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"YOUR_CRAZYROUTER_API_KEY"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://crazyrouter.com/v1"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the region endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"YOUR_CRAZYROUTER_API_KEY"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://cn.crazyrouter.com/v1"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a tool uses &lt;code&gt;OPENAI_API_BASE&lt;/code&gt;, &lt;code&gt;OPENAI_BASE_URL&lt;/code&gt;, &lt;code&gt;BASE_URL&lt;/code&gt;, or a UI field named “API endpoint,” check its documentation. The concept is the same, but the variable name may differ.&lt;/p&gt;
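
&lt;p&gt;If you write your own script and want it to tolerate more than one of these names, a lookup with a fixed precedence is enough. This is a sketch under assumptions: the order in &lt;code&gt;BASE_URL_VARS&lt;/code&gt; and the &lt;code&gt;resolve_base_url&lt;/code&gt; helper are hypothetical choices, not a convention that any particular tool follows.&lt;/p&gt;

```python
import os

# Checked in this order; the precedence is an assumption for illustration.
BASE_URL_VARS = ("OPENAI_BASE_URL", "OPENAI_API_BASE", "BASE_URL")
DEFAULT_BASE_URL = "https://crazyrouter.com/v1"


def resolve_base_url(env=os.environ):
    """Return the first non-empty Base URL variable, or the default."""
    for name in BASE_URL_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    return DEFAULT_BASE_URL


print("Using Base URL:", resolve_base_url())
```

&lt;p&gt;Printing the resolved value at startup makes it obvious when a tool is still pointing at a different endpoint than the one you intended.&lt;/p&gt;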

&lt;h2&gt;
  
  
  Cursor-style tools
&lt;/h2&gt;

&lt;p&gt;Many AI coding tools support custom API keys or custom models.&lt;/p&gt;

&lt;p&gt;The exact UI changes over time, but the setup usually follows this pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open model/provider settings.&lt;/li&gt;
&lt;li&gt;Choose OpenAI-compatible or custom provider.&lt;/li&gt;
&lt;li&gt;Paste your Crazyrouter API key.&lt;/li&gt;
&lt;li&gt;Set Base URL to &lt;code&gt;https://crazyrouter.com/v1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Add or select the model name you want to use.&lt;/li&gt;
&lt;li&gt;Send a small test prompt.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the tool is used from a region where the global endpoint is unstable, use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://cn.crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern also applies to many Codex-style, Claude Code wrapper, Cline-style, and background-agent workflows that support OpenAI-compatible endpoints.&lt;/p&gt;

&lt;h2&gt;
  
  
  LiteLLM setup notes
&lt;/h2&gt;

&lt;p&gt;LiteLLM is often used as a local SDK or proxy layer for multiple providers.&lt;/p&gt;

&lt;p&gt;For an OpenAI-compatible route, you typically configure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;model_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;model_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;crazyrouter-gpt-5-mini&lt;/span&gt;
    &lt;span class="na"&gt;litellm_params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openai/gpt-5-mini&lt;/span&gt;
      &lt;span class="na"&gt;api_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;os.environ/CRAZYROUTER_API_KEY&lt;/span&gt;
      &lt;span class="na"&gt;api_base&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://crazyrouter.com/v1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you use the region endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;api_base&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://cn.crazyrouter.com/v1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



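&lt;p&gt;The main thing is to keep the three values consistent: &lt;code&gt;api_base&lt;/code&gt; points at Crazyrouter, &lt;code&gt;api_key&lt;/code&gt; is a Crazyrouter key, and the model after the &lt;code&gt;openai/&lt;/code&gt; prefix is a name the gateway supports.&lt;/p&gt;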

&lt;h2&gt;
  
  
  FastGPT, Hermes, and other UI tools
&lt;/h2&gt;

&lt;p&gt;Tools such as FastGPT, Hermes-style clients, and internal admin dashboards often expose fields like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API Key;&lt;/li&gt;
&lt;li&gt;API Base URL;&lt;/li&gt;
&lt;li&gt;Model name;&lt;/li&gt;
&lt;li&gt;Provider type;&lt;/li&gt;
&lt;li&gt;OpenAI-compatible mode.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API Base URL: https://crazyrouter.com/v1
API Key: your Crazyrouter key
Model: a Crazyrouter-supported model name
Provider type: OpenAI-compatible, if required
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For region routing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;API Base URL: https://cn.crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Common Base URL mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mistake 1: missing &lt;code&gt;/v1&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Wrong:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Correct for OpenAI-compatible clients:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Some native APIs have different endpoint formats, but OpenAI-compatible SDKs generally need &lt;code&gt;/v1&lt;/code&gt;.&lt;/p&gt;
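&lt;p&gt;As a rough illustration of Mistake 1, the check can be automated before a client is constructed. The helper below is a sketch under assumptions, not part of any SDK: &lt;code&gt;normalize_base_url&lt;/code&gt; is a hypothetical name, and it simply appends &lt;code&gt;/v1&lt;/code&gt; when the path does not already end with it and drops any query string.&lt;/p&gt;

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_base_url(url: str) -> str:
    """Return the URL with a single trailing /v1 path and no query string."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    path = parts.path.rstrip("/")
    if not path.endswith("/v1"):
        path = path + "/v1"
    # Drop query and fragment: tracking parameters do not belong on API endpoints.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(normalize_base_url("https://crazyrouter.com"))
print(normalize_base_url("https://cn.crazyrouter.com/v1/"))
```

&lt;p&gt;Both calls should print a URL that ends in &lt;code&gt;/v1&lt;/code&gt;. Native, non-OpenAI endpoint formats are out of scope for this helper.&lt;/p&gt;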

&lt;h3&gt;
  
  
  Mistake 2: using the official provider key
&lt;/h3&gt;

&lt;p&gt;If you set Crazyrouter as the Base URL, use a Crazyrouter API key.&lt;/p&gt;

&lt;p&gt;A provider-native OpenAI, Anthropic, or Google key will not automatically work against a gateway endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 3: using a model name the gateway does not support
&lt;/h3&gt;

&lt;p&gt;If the request reaches the server but fails with a model error, check the model list:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://crazyrouter.com/models?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=base_url_guide" rel="noopener noreferrer"&gt;https://crazyrouter.com/models?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=base_url_guide&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use the exact model name shown there.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 4: mixing global and region endpoints during debugging
&lt;/h3&gt;

&lt;p&gt;If you test with both endpoints, record which one produced the error.&lt;/p&gt;

&lt;p&gt;When asking support for help, include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Base URL used;&lt;/li&gt;
&lt;li&gt;model name;&lt;/li&gt;
&lt;li&gt;request time;&lt;/li&gt;
&lt;li&gt;error code;&lt;/li&gt;
&lt;li&gt;simplified request body;&lt;/li&gt;
&lt;li&gt;whether streaming was enabled.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Do not share your full API key in a public chat.&lt;/p&gt;
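&lt;p&gt;One way to collect those fields without leaking the key is a small helper like the one below. It is only a sketch: &lt;code&gt;mask_key&lt;/code&gt; and &lt;code&gt;build_support_report&lt;/code&gt; are hypothetical names, the sample key is made up, and the field names simply mirror the list above.&lt;/p&gt;

```python
import json
from datetime import datetime, timezone


def mask_key(api_key: str) -> str:
    """Keep only the last 4 characters so the key can be identified but not reused."""
    if 4 >= len(api_key):
        return "****"
    return "****" + api_key[len(api_key) - 4:]


def build_support_report(base_url, model, error_code, request_body, streaming,
                         api_key, request_time=None):
    """Bundle the details support usually asks for into one JSON-friendly dict."""
    when = request_time or datetime.now(timezone.utc)
    return {
        "base_url": base_url,
        "model": model,
        "request_time": when.isoformat(),
        "error_code": error_code,
        "request_body": request_body,  # simplified body, not the full prompt
        "streaming": streaming,
        "api_key": mask_key(api_key),  # never the full key
    }


report = build_support_report(
    base_url="https://crazyrouter.com/v1",
    model="gpt-5-mini",
    error_code=401,
    request_body={"messages": [{"role": "user", "content": "ping"}]},
    streaming=False,
    api_key="sk-example-not-a-real-key-1234",
)
print(json.dumps(report, indent=2))
```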

&lt;h2&gt;
  
  
  Troubleshooting checklist
&lt;/h2&gt;

&lt;p&gt;Before contacting support, check this list.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Check&lt;/th&gt;
&lt;th&gt;What to verify&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Base URL&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;https://crazyrouter.com/v1&lt;/code&gt; or &lt;code&gt;https://cn.crazyrouter.com/v1&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API key&lt;/td&gt;
&lt;td&gt;Key is from your Crazyrouter dashboard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Authorization&lt;/td&gt;
&lt;td&gt;Header is &lt;code&gt;Authorization: Bearer YOUR_KEY&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model name&lt;/td&gt;
&lt;td&gt;Exact model exists in the model list&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Balance&lt;/td&gt;
&lt;td&gt;Account has enough balance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Request path&lt;/td&gt;
&lt;td&gt;Chat requests go to &lt;code&gt;/chat/completions&lt;/code&gt; through the SDK&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Region&lt;/td&gt;
&lt;td&gt;Use the cn endpoint if global access is unstable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt size&lt;/td&gt;
&lt;td&gt;Reduce long context when debugging&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Where to go next
&lt;/h2&gt;

&lt;p&gt;After your Base URL works, the next useful steps are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;compare model options on the models page;&lt;/li&gt;
&lt;li&gt;estimate monthly token cost;&lt;/li&gt;
&lt;li&gt;test fallback models for production;&lt;/li&gt;
&lt;li&gt;document the exact setup your team uses.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Helpful links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model list: &lt;a href="https://crazyrouter.com/models?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=base_url_guide" rel="noopener noreferrer"&gt;https://crazyrouter.com/models?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=base_url_guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Pricing calculator: &lt;a href="https://crazyrouter.com/tools/pricing-calculator/?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=base_url_guide" rel="noopener noreferrer"&gt;https://crazyrouter.com/tools/pricing-calculator/?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=base_url_guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Model comparison: &lt;a href="https://crazyrouter.com/tools/model-comparison/?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=base_url_guide" rel="noopener noreferrer"&gt;https://crazyrouter.com/tools/model-comparison/?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=base_url_guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Background agent workflow tool: &lt;a href="https://crazyrouter.com/tools/background-agent-worktree-launcher/?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=base_url_guide" rel="noopener noreferrer"&gt;https://crazyrouter.com/tools/background-agent-worktree-launcher/?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=base_url_guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;API docs: &lt;a href="https://docs.crazyrouter.com" rel="noopener noreferrer"&gt;https://docs.crazyrouter.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is an OpenAI-compatible API Base URL?
&lt;/h3&gt;

&lt;p&gt;An OpenAI-compatible API Base URL is the endpoint prefix used by OpenAI-style SDKs to send requests to a provider or gateway that implements the OpenAI API format.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Base URL should I use with Crazyrouter?
&lt;/h3&gt;

&lt;p&gt;For OpenAI-compatible clients, use &lt;code&gt;https://crazyrouter.com/v1&lt;/code&gt;. If you need the region endpoint, use &lt;code&gt;https://cn.crazyrouter.com/v1&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need to change my code when switching models?
&lt;/h3&gt;

&lt;p&gt;Usually no. In an OpenAI-compatible setup, you often change only the &lt;code&gt;model&lt;/code&gt; value while keeping the same SDK, API key, and Base URL.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why does my custom Base URL fail?
&lt;/h3&gt;

&lt;p&gt;Common causes include missing &lt;code&gt;/v1&lt;/code&gt;, wrong API key, wrong model name, insufficient balance, incorrect provider mode, or a region/network mismatch.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use the same Base URL in Cursor, LiteLLM, FastGPT, and Codex-style tools?
&lt;/h3&gt;

&lt;p&gt;Yes, if the tool supports OpenAI-compatible custom endpoints. Use the Crazyrouter API key, set the Base URL, and choose a supported model name.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>gateway</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Same Agent Workflow, Three Model Routes: A Real Crazyrouter Benchmark</title>
      <dc:creator>Jenny Met</dc:creator>
      <pubDate>Wed, 03 Jun 2026 23:59:43 +0000</pubDate>
      <link>https://dev.to/xujfcn/same-agent-workflow-three-model-routes-a-real-crazyrouter-benchmark-361</link>
      <guid>https://dev.to/xujfcn/same-agent-workflow-three-model-routes-a-real-crazyrouter-benchmark-361</guid>
      <description>&lt;h1&gt;
  
  
  Same Agent Workflow, Three Model Routes: A Real Crazyrouter Benchmark
&lt;/h1&gt;

&lt;p&gt;Dynamic workflows are exciting, but they introduce a practical question: which model should run each step?&lt;/p&gt;

&lt;p&gt;If a workflow has a planner, implementer, adversarial reviewer, and verifier, you can route all steps to one strong model. Or you can route different steps to different models.&lt;/p&gt;

&lt;p&gt;The wrong answer is to guess.&lt;/p&gt;

&lt;p&gt;So we ran a small real benchmark through Crazyrouter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8q97238c9xj3z6y5ulnh.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8q97238c9xj3z6y5ulnh.webp" alt="Dynamic workflow routing benchmark" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Test setup
&lt;/h2&gt;

&lt;p&gt;We used the Crazyrouter OpenAI-compatible endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://cn.crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The workflow task was:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Add a CSV export for account billing history with user-level authorization, timezone-safe timestamps, CSV escaping, tests, and rollback notes.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The workflow had four steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;planner&lt;/li&gt;
&lt;li&gt;implementer&lt;/li&gt;
&lt;li&gt;adversarial reviewer&lt;/li&gt;
&lt;li&gt;verifier&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We tested three routing policies:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Policy&lt;/th&gt;
&lt;th&gt;Planner&lt;/th&gt;
&lt;th&gt;Implementer&lt;/th&gt;
&lt;th&gt;Reviewer&lt;/th&gt;
&lt;th&gt;Verifier&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;all_opus_47&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-7&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-7&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-7&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-7&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;all_opus_48&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-8&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-8&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-8&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-8&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;routed_47_48&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-7&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-8&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-7&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-8&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Raw benchmark artifact:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;generated/dynamic_workflow_routing_20260603/benchmark_results.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Route&lt;/th&gt;
&lt;th&gt;Calls&lt;/th&gt;
&lt;th&gt;Total latency&lt;/th&gt;
&lt;th&gt;Total tokens&lt;/th&gt;
&lt;th&gt;Output tokens&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;all Opus 4.7&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;100.939s&lt;/td&gt;
&lt;td&gt;8,853&lt;/td&gt;
&lt;td&gt;5,977&lt;/td&gt;
&lt;td&gt;14/17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;all Opus 4.8&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;82.598s&lt;/td&gt;
&lt;td&gt;8,357&lt;/td&gt;
&lt;td&gt;5,782&lt;/td&gt;
&lt;td&gt;15/17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;routed 4.7/4.8&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;85.975s&lt;/td&gt;
&lt;td&gt;8,652&lt;/td&gt;
&lt;td&gt;5,873&lt;/td&gt;
&lt;td&gt;15/17&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In this run, &lt;code&gt;all_opus_48&lt;/code&gt; had the lowest latency and the fewest total tokens, and it tied &lt;code&gt;routed_47_48&lt;/code&gt; for the top score at 15/17.&lt;/p&gt;

&lt;p&gt;That does not mean every workflow should use Opus 4.8 everywhere. It means routing needs evidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the score measured
&lt;/h2&gt;

&lt;p&gt;This was not a generic benchmark. Each workflow step had step-specific checks.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;planner needed affected files, risks, acceptance criteria, tests, rollback;&lt;/li&gt;
&lt;li&gt;implementer needed CSV handling, authorization, timestamps, tests;&lt;/li&gt;
&lt;li&gt;reviewer needed security, privacy, tests, rollback;&lt;/li&gt;
&lt;li&gt;verifier needed commands, tests, evidence, inspection.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The score was a simple keyword-based quality gate. It is not a perfect human evaluation, but it catches whether the output covered required workflow concerns.&lt;/p&gt;
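&lt;p&gt;A keyword gate of this kind can be sketched in a few lines. The original scorer is not shown, so everything here is an assumption: the keyword stems, the case-insensitive substring matching, and the names &lt;code&gt;REQUIRED&lt;/code&gt;, &lt;code&gt;score_step&lt;/code&gt;, and &lt;code&gt;score_workflow&lt;/code&gt;. The per-step concerns listed above do add up to 17 checks, which is consistent with the scores being reported out of 17.&lt;/p&gt;

```python
# Hypothetical keyword stems derived from the per-step concerns listed above.
REQUIRED = {
    "planner": ["affected files", "risk", "acceptance criteria", "test", "rollback"],
    "implementer": ["csv", "authoriz", "timestamp", "test"],
    "reviewer": ["security", "privacy", "test", "rollback"],
    "verifier": ["command", "test", "evidence", "inspect"],
}


def score_step(role, output):
    """Return (hits, possible) for one step using case-insensitive substring checks."""
    text = output.lower()
    hits = sum(1 for keyword in REQUIRED[role] if keyword in text)
    return hits, len(REQUIRED[role])


def score_workflow(outputs):
    """Sum the per-step hits; a missing step contributes zero."""
    passed = sum(score_step(role, outputs.get(role, ""))[0] for role in REQUIRED)
    possible = sum(len(keywords) for keywords in REQUIRED.values())
    return passed, possible


sample = {
    "planner": "Affected files: billing/export.py. Risks, acceptance criteria, tests, rollback.",
    "implementer": "Stream CSV rows behind an authorization check.",
}
print(score_workflow(sample))
```

&lt;p&gt;Substring matching is crude: a stem such as &lt;code&gt;test&lt;/code&gt; also matches unrelated words like "latest", so a gate like this checks coverage, not correctness.&lt;/p&gt;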

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;Dynamic workflows can multiply model calls.&lt;/p&gt;

&lt;p&gt;A simple AI coding request might become:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;planner call
+ implementer call
+ reviewer call
+ verifier call
+ retry calls
+ patch-fix calls
+ final summary call
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you always use the most expensive model, cost can grow quickly. If you always use the cheapest model, failures and retries can grow quickly.&lt;/p&gt;

&lt;p&gt;The useful metric is not token price.&lt;/p&gt;

&lt;p&gt;The useful metric is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cost and latency per successful workflow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
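&lt;p&gt;To make that metric concrete, here is a minimal sketch that charges failed attempts to the workflows that eventually succeed. The function name &lt;code&gt;per_successful_workflow&lt;/code&gt;, the record fields, and the sample numbers are all hypothetical; they are not measurements from this benchmark.&lt;/p&gt;

```python
def per_successful_workflow(runs):
    """Return (cost, latency) per successful workflow, with failures counted as overhead."""
    successes = sum(1 for run in runs if run["success"])
    if successes == 0:
        raise ValueError("no successful workflow in this sample")
    total_cost = sum(run["cost_usd"] for run in runs)
    total_latency = sum(run["latency_s"] for run in runs)
    return total_cost / successes, total_latency / successes


# Hypothetical sample: three attempts, one of which failed its checks and was retried.
runs = [
    {"cost_usd": 0.50, "latency_s": 80.0, "success": True},
    {"cost_usd": 0.25, "latency_s": 70.0, "success": False},
    {"cost_usd": 0.25, "latency_s": 90.0, "success": True},
]
cost, latency = per_successful_workflow(runs)
print(f"cost per success: ${cost:.2f}, latency per success: {latency:.1f}s")
```

&lt;p&gt;A cheap route that fails often can end up with a higher cost per success than an expensive route that passes on the first attempt.&lt;/p&gt;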



&lt;h2&gt;
  
  
  What we learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Opus 4.8 was faster in this workflow
&lt;/h3&gt;

&lt;p&gt;The all-4.8 route finished in 82.598 seconds. The all-4.7 route took 100.939 seconds.&lt;/p&gt;

&lt;p&gt;That is an 18.341 second difference, roughly 18% less wall-clock time, for the same four-step workflow.&lt;/p&gt;

&lt;p&gt;In a single call, that may not matter. In a background agent system with many workflow steps, it does.&lt;/p&gt;
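&lt;p&gt;The comparison can be recomputed directly from the results table. The snippet below only restates the reported numbers; the dictionary layout and the choice of all-4.7 as the baseline are assumptions made for illustration.&lt;/p&gt;

```python
# Values copied from the results table in this article.
results = {
    "all_opus_47": {"latency_s": 100.939, "total_tokens": 8853, "score": 14},
    "all_opus_48": {"latency_s": 82.598, "total_tokens": 8357, "score": 15},
    "routed_47_48": {"latency_s": 85.975, "total_tokens": 8652, "score": 15},
}

baseline = results["all_opus_47"]
deltas = {}
for name, row in results.items():
    saved_s = round(baseline["latency_s"] - row["latency_s"], 3)
    deltas[name] = {
        "saved_s": saved_s,
        "saved_pct": round(100 * saved_s / baseline["latency_s"], 1),
        "tokens_saved": baseline["total_tokens"] - row["total_tokens"],
    }
    print(name, deltas[name])
```

&lt;p&gt;This reproduces the 18.341 second gap for the all-4.8 route. Each policy was run once here, so the gaps describe this run only.&lt;/p&gt;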

&lt;h3&gt;
  
  
  2. Mixed routing was close, but not better here
&lt;/h3&gt;

&lt;p&gt;The mixed route used 4.7 for planning/review and 4.8 for implementation/verification.&lt;/p&gt;

&lt;p&gt;It scored 15/17, same as all-4.8, but took 85.975 seconds and used 8,652 tokens: 3.377 seconds slower and 295 tokens more than all-4.8.&lt;/p&gt;

&lt;p&gt;That is still good. But in this run, all-4.8 was simpler and slightly better.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. A static rule is risky
&lt;/h3&gt;

&lt;p&gt;A different task might favor a different route. Security review, legal summarization, long-context extraction, frontend implementation, and test generation are not the same workload.&lt;/p&gt;

&lt;p&gt;The point is not “always use model X.”&lt;/p&gt;

&lt;p&gt;The point is to create a workflow trace and compare model routes on your actual task types.&lt;/p&gt;

&lt;h2&gt;
  
  
  Minimal reproduction code
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
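&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="c1"&gt;# Read the key from the environment so it never lands in source control.&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;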
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://cn.crazyrouter.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;steps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;planner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Create a concise implementation plan with risks, tests, and rollback.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;implementer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a minimal pseudo-patch plan and code sketch.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reviewer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Adversarially review security, correctness, privacy, tests, and rollback.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Create a verification checklist with concrete commands and evidence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;route&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;planner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;implementer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reviewer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verifier&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Each step needs the workflow task as well as its role instruction.&lt;/span&gt;
&lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Add a CSV export for account billing history with user-level authorization, timezone-safe timestamps, CSV escaping, tests, and rollback notes.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;instruction&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;steps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;route&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep the API base URL clean. Do not add UTM parameters to code endpoints.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recommended workflow routing practice
&lt;/h2&gt;

&lt;p&gt;Start with this loop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;define workflow steps;&lt;/li&gt;
&lt;li&gt;define success checks per step;&lt;/li&gt;
&lt;li&gt;run the same task across 2-3 routing policies;&lt;/li&gt;
&lt;li&gt;log model, latency, tokens, and score;&lt;/li&gt;
&lt;li&gt;choose the route based on successful workflow outcome, not model hype.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A gateway makes this practical because the application code can keep the same client and base URL while changing model IDs.&lt;/p&gt;
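&lt;p&gt;The loop above might be wired together roughly as follows. This is a sketch, not the harness used for the benchmark: &lt;code&gt;run_policy&lt;/code&gt;, &lt;code&gt;summarize&lt;/code&gt;, and the two &lt;code&gt;fake_&lt;/code&gt; helpers are hypothetical, and the model call is injected as a function so the example runs offline. The policy names and model IDs are the ones from the policy table.&lt;/p&gt;

```python
import time

STEPS = ["planner", "implementer", "reviewer", "verifier"]

# Routing policies copied from the policy table in this article.
POLICIES = {
    "all_opus_47": dict.fromkeys(STEPS, "claude-opus-4-7"),
    "all_opus_48": dict.fromkeys(STEPS, "claude-opus-4-8"),
    "routed_47_48": {
        "planner": "claude-opus-4-7",
        "implementer": "claude-opus-4-8",
        "reviewer": "claude-opus-4-7",
        "verifier": "claude-opus-4-8",
    },
}


def run_policy(name, prompts, call_model, score_step):
    """Run every step of one policy and log model, latency, tokens, and score."""
    rows = []
    for role in STEPS:
        model = POLICIES[name][role]
        started = time.perf_counter()
        text, tokens = call_model(model, prompts[role])
        rows.append({
            "policy": name,
            "role": role,
            "model": model,
            "latency_s": time.perf_counter() - started,
            "tokens": tokens,
            "score": score_step(role, text),
        })
    return rows


def summarize(rows):
    """Collapse per-step rows into one comparable line per policy."""
    return {
        "policy": rows[0]["policy"],
        "calls": len(rows),
        "latency_s": round(sum(row["latency_s"] for row in rows), 3),
        "tokens": sum(row["tokens"] for row in rows),
        "score": sum(row["score"] for row in rows),
    }


# Offline stand-ins so the loop runs without network access (both hypothetical).
def fake_call_model(model, prompt):
    return f"[{model}] {prompt}", len(prompt.split())


def fake_score_step(role, text):
    return 1 if role in text else 0


prompts = {role: f"Act as the {role} for the billing export task." for role in STEPS}
for policy in POLICIES:
    print(summarize(run_policy(policy, prompts, fake_call_model, fake_score_step)))
```

&lt;p&gt;In a real run, &lt;code&gt;call_model&lt;/code&gt; would wrap &lt;code&gt;client.chat.completions.create&lt;/code&gt; and return the message content together with &lt;code&gt;response.usage.total_tokens&lt;/code&gt;, while the client and base URL stay the same across policies.&lt;/p&gt;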

&lt;h2&gt;
  
  
  Why Crazyrouter fits this pattern
&lt;/h2&gt;

&lt;p&gt;Dynamic workflows need three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;model variety;&lt;/li&gt;
&lt;li&gt;centralized routing;&lt;/li&gt;
&lt;li&gt;traceable usage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Crazyrouter gives one OpenAI-compatible API surface for multiple models. That makes it easier to test planner/reviewer/verifier routes without rewriting the product around each provider.&lt;/p&gt;

&lt;p&gt;This matters more as AI coding moves from single-agent chats to orchestrated workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final take
&lt;/h2&gt;

&lt;p&gt;Dynamic workflows are not just a Claude Code or Codex feature. They are an engineering pattern.&lt;/p&gt;

&lt;p&gt;Once you split work into planner, implementer, reviewer, and verifier, model choice becomes a routing problem.&lt;/p&gt;

&lt;p&gt;In this run, all Opus 4.8 was the best route. In your workflow, it might be a mixed route. The only way to know is to measure.&lt;/p&gt;

&lt;p&gt;Try model routing here: &lt;a href="https://crazyrouter.com?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=dynamic_workflow_routing" rel="noopener noreferrer"&gt;Crazyrouter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>claude</category>
      <category>agents</category>
    </item>
    <item>
      <title>Claude Code Skills vs Subagents vs Dynamic Workflows: Which One Should You Use?</title>
      <dc:creator>Jenny Met</dc:creator>
      <pubDate>Wed, 03 Jun 2026 23:59:15 +0000</pubDate>
      <link>https://dev.to/xujfcn/claude-code-skills-vs-subagents-vs-dynamic-workflows-which-one-should-you-use-1bc7</link>
      <guid>https://dev.to/xujfcn/claude-code-skills-vs-subagents-vs-dynamic-workflows-which-one-should-you-use-1bc7</guid>
      <description>&lt;h1&gt;
  
  
  Claude Code Skills vs Subagents vs Dynamic Workflows: Which One Should You Use?
&lt;/h1&gt;

&lt;p&gt;AI coding tools are no longer just chat boxes.&lt;/p&gt;

&lt;p&gt;Claude Code, Codex, Cursor, OpenClaw, and similar tools are moving toward workflow primitives: skills, subagents, background agents, and dynamic workflows.&lt;/p&gt;

&lt;p&gt;That is good, but it creates a new problem: developers are starting to use the same hammer for every task.&lt;/p&gt;

&lt;p&gt;A simple prompt, a reusable skill, a subagent, a background branch, and a dynamic workflow are not the same thing.&lt;/p&gt;

&lt;p&gt;This guide gives a practical decision framework.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8q97238c9xj3z6y5ulnh.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8q97238c9xj3z6y5ulnh.webp" alt="Dynamic workflow routing benchmark" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick answer
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Primitive&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;th&gt;Avoid when&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple prompt&lt;/td&gt;
&lt;td&gt;Small, low-risk one-off tasks&lt;/td&gt;
&lt;td&gt;Multi-file or high-risk changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skill&lt;/td&gt;
&lt;td&gt;Repeated SOPs, formatting rules, publishing workflows, domain procedures&lt;/td&gt;
&lt;td&gt;One-off exploratory work&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Subagent&lt;/td&gt;
&lt;td&gt;Scoped research, audit, review, triage, isolated investigation&lt;/td&gt;
&lt;td&gt;Tasks that require central orchestration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Background agent&lt;/td&gt;
&lt;td&gt;Long-running branch/worktree tasks&lt;/td&gt;
&lt;td&gt;Work requiring immediate tight feedback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dynamic workflow&lt;/td&gt;
&lt;td&gt;Complex multi-step work needing planning, implementation, review, verification&lt;/td&gt;
&lt;td&gt;Tiny edits or cheap one-shot tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The core idea: choose the primitive by workflow shape, not by hype.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Simple prompt
&lt;/h2&gt;

&lt;p&gt;Use a simple prompt when the task is small and reversible.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;explain this function;&lt;/li&gt;
&lt;li&gt;rename a variable in one file;&lt;/li&gt;
&lt;li&gt;write a short regex;&lt;/li&gt;
&lt;li&gt;summarize an error message;&lt;/li&gt;
&lt;li&gt;draft a small README section.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A simple prompt is fast. It has low overhead. It does not need orchestration.&lt;/p&gt;

&lt;p&gt;But it breaks down when the task has multiple phases or hidden risks.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Skill
&lt;/h2&gt;

&lt;p&gt;A skill is best when you repeat the same process many times.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;invoice email SOP;&lt;/li&gt;
&lt;li&gt;blog publishing rules;&lt;/li&gt;
&lt;li&gt;daily growth report workflow;&lt;/li&gt;
&lt;li&gt;platform-specific formatting rules;&lt;/li&gt;
&lt;li&gt;support triage procedure;&lt;/li&gt;
&lt;li&gt;benchmark article checklist.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A skill is not a worker. It is reusable operational knowledge.&lt;/p&gt;

&lt;p&gt;If you find yourself writing the same instructions again and again, turn them into a skill.&lt;/p&gt;

&lt;p&gt;In our own workflow, we created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;skills/agent-workflow-replicator/SKILL.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Its job is to turn viral Codex, Claude Code, and Cursor workflow examples into reproducible experiments, articles, and tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Subagent
&lt;/h2&gt;

&lt;p&gt;A subagent is useful when a task can be scoped and delegated.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;audit this PR for security issues;&lt;/li&gt;
&lt;li&gt;inspect a repo for broken links;&lt;/li&gt;
&lt;li&gt;compare three API responses;&lt;/li&gt;
&lt;li&gt;research competing tools;&lt;/li&gt;
&lt;li&gt;review logs and summarize root cause.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A good subagent has a narrow mission and reports back.&lt;/p&gt;

&lt;p&gt;The mistake is giving every subagent the full product goal. That creates duplication and inconsistent decisions.&lt;/p&gt;

&lt;p&gt;Subagents are workers. They are not always orchestrators.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Background agent
&lt;/h2&gt;

&lt;p&gt;A background agent is useful when work can run asynchronously.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;large refactor in a separate branch;&lt;/li&gt;
&lt;li&gt;generating tests across a codebase;&lt;/li&gt;
&lt;li&gt;dependency migration;&lt;/li&gt;
&lt;li&gt;overnight research;&lt;/li&gt;
&lt;li&gt;multiple git worktree experiments.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the pattern behind many Cursor background-agent discussions: move long-running work out of the local single-branch loop.&lt;/p&gt;

&lt;p&gt;The key requirement is isolation. A background agent should not quietly corrupt the main working tree.&lt;/p&gt;

&lt;p&gt;Use branches, worktrees, explicit diffs, and review gates.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Dynamic workflow
&lt;/h2&gt;

&lt;p&gt;A dynamic workflow is for complex work that needs multiple phases.&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;billing export with authorization tests;&lt;/li&gt;
&lt;li&gt;multi-file migration;&lt;/li&gt;
&lt;li&gt;security-sensitive refactor;&lt;/li&gt;
&lt;li&gt;agent workflow benchmark;&lt;/li&gt;
&lt;li&gt;production incident automation;&lt;/li&gt;
&lt;li&gt;model routing evaluation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A dynamic workflow should create packets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;planner -&amp;gt; implementer -&amp;gt; adversarial reviewer -&amp;gt; verifier
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is why we built:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tools/agent_workflows/workflow_orchestrator.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It generates a reproducible workflow folder with role packets, verification gates, and trace logging.&lt;/p&gt;

&lt;h2&gt;
  
  
  A selector tool
&lt;/h2&gt;

&lt;p&gt;We also created a small selector:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tools/agent_workflows/workflow_primitive_selector.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python tools/agent_workflows/workflow_primitive_selector.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"Refactor billing export across 20 files with authorization tests and rollback notes"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"task"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Refactor billing export across 20 files with authorization tests and rollback notes"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scores"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"simple_prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"skill"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"subagent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"dynamic_workflow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"background_agent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"recommendation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dynamic_workflow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Complex change needs planner, implementer, reviewer, verifier, and evidence gates."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The selector is intentionally simple. It is a starting point for teams to encode their own workflow policy.&lt;/p&gt;
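
&lt;p&gt;As a rough illustration of what "encoding your own workflow policy" could look like, here is a minimal keyword-scoring sketch. It is not the source of &lt;code&gt;workflow_primitive_selector.py&lt;/code&gt;; the &lt;code&gt;KEYWORDS&lt;/code&gt; lists and the &lt;code&gt;select_primitive&lt;/code&gt; function are assumptions made up for this example, and its scores will not necessarily match the output shown above.&lt;/p&gt;

```python
import json

# Hypothetical keyword lists: tune these to your own workflow policy.
KEYWORDS = {
    "simple_prompt": ["typo", "rename", "one-line"],
    "skill": ["runbook", "checklist", "style guide"],
    "subagent": ["investigate", "audit", "search"],
    "background_agent": ["long-running", "overnight", "nightly"],
    "dynamic_workflow": ["refactor", "migration", "authorization", "rollback", "tests"],
}


def select_primitive(task):
    """Score each primitive by counting keyword hits in the task text."""
    text = task.lower()
    scores = {
        name: sum(1 for word in words if word in text)
        for name, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to a simple prompt when no keyword matches at all.
    recommendation = best if scores[best] != 0 else "simple_prompt"
    return {"task": task, "scores": scores, "recommendation": recommendation}


if __name__ == "__main__":
    result = select_primitive(
        "Refactor billing export across 20 files with authorization tests and rollback notes"
    )
    print(json.dumps(result, indent=2))
```

&lt;p&gt;Plain substring matching is crude, but it keeps the policy readable in one dictionary that a team can review like any other config.&lt;/p&gt;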

&lt;h2&gt;
  
  
  Where model routing fits
&lt;/h2&gt;

&lt;p&gt;Once you choose the primitive, choose the model route.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Model route idea&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Skill-guided formatting&lt;/td&gt;
&lt;td&gt;fast/cheap model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Planning&lt;/td&gt;
&lt;td&gt;stronger reasoning model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Implementation&lt;/td&gt;
&lt;td&gt;coding-optimized model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adversarial review&lt;/td&gt;
&lt;td&gt;different model from implementer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verification summary&lt;/td&gt;
&lt;td&gt;fast model after tests run&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;With Crazyrouter, the client can stay OpenAI-compatible:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://cn.crazyrouter.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review this implementation plan for missing tests.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do not add UTM parameters to API base URLs. Human-facing links can carry UTM parameters; code endpoints should stay clean.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision tree
&lt;/h2&gt;

&lt;p&gt;Use this rule of thumb:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Is it tiny and low-risk?
  -&amp;gt; simple prompt

Is it a repeated SOP or domain process?
  -&amp;gt; skill

Can it be delegated as a scoped investigation?
  -&amp;gt; subagent

Does it need to run for a long time in isolation?
  -&amp;gt; background agent

Does it require planning, implementation, review, and verification?
  -&amp;gt; dynamic workflow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
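
&lt;p&gt;The same rule of thumb can be written as a small function. This is only a sketch: the &lt;code&gt;Task&lt;/code&gt; flags and the &lt;code&gt;choose_primitive&lt;/code&gt; name are hypothetical, and the fallback to a simple prompt when nothing matches is an assumption, not part of the tree above.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical flags; a real team would derive these from its own triage rules.
    tiny_and_low_risk: bool = False
    repeated_process: bool = False
    scoped_investigation: bool = False
    long_running_isolated: bool = False
    needs_plan_review_verify: bool = False


def choose_primitive(task):
    """Walk the questions top to bottom and return the first match."""
    if task.tiny_and_low_risk:
        return "simple prompt"
    if task.repeated_process:
        return "skill"
    if task.scoped_investigation:
        return "subagent"
    if task.long_running_isolated:
        return "background agent"
    if task.needs_plan_review_verify:
        return "dynamic workflow"
    # Assumption: default to the cheapest primitive when nothing applies.
    return "simple prompt"


if __name__ == "__main__":
    print(choose_primitive(Task(needs_plan_review_verify=True)))
```

&lt;p&gt;Because the checks run in order, a task that is both tiny and part of a repeated process still resolves to a simple prompt, which matches the "cheapest primitive first" spirit of the tree.&lt;/p&gt;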



&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mistake 1: turning every task into a dynamic workflow
&lt;/h3&gt;

&lt;p&gt;Dynamic workflows have overhead. Do not use them for tiny edits.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 2: using skills as if they were workers
&lt;/h3&gt;

&lt;p&gt;Skills are reusable instructions. They do not replace execution or verification.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 3: letting subagents make product-level decisions
&lt;/h3&gt;

&lt;p&gt;Subagents should report findings. The orchestrator or human owner should decide.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mistake 4: no trace logs
&lt;/h3&gt;

&lt;p&gt;If a workflow has multiple steps, log the role, model, latency, tokens, result, and evidence.&lt;/p&gt;

&lt;p&gt;Without logs, you cannot improve routing.&lt;/p&gt;
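
&lt;p&gt;A minimal sketch of such a log, assuming a JSON Lines file with one record per workflow step. The &lt;code&gt;append_trace&lt;/code&gt; function and the field names &lt;code&gt;ts&lt;/code&gt; and &lt;code&gt;latency_s&lt;/code&gt; are illustrative choices, not a fixed schema.&lt;/p&gt;

```python
import json
import time


def append_trace(path, role, model, latency_s, tokens, result, evidence):
    """Append one workflow step to a JSON Lines trace file."""
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "role": role,
        "model": model,
        "latency_s": latency_s,
        "tokens": tokens,
        "result": result,
        "evidence": evidence,
    }
    # Append mode keeps earlier steps intact, so the file is a running history.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record
```

&lt;p&gt;One line per step means the file can be tailed during a run and aggregated afterwards without a database.&lt;/p&gt;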

&lt;h2&gt;
  
  
  Final recommendation
&lt;/h2&gt;

&lt;p&gt;For production AI coding, build a small operating system around your agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;skills for repeated rules;&lt;/li&gt;
&lt;li&gt;subagents for scoped work;&lt;/li&gt;
&lt;li&gt;background agents for isolated long-running tasks;&lt;/li&gt;
&lt;li&gt;dynamic workflows for complex changes;&lt;/li&gt;
&lt;li&gt;model routing for cost and quality control;&lt;/li&gt;
&lt;li&gt;trace logs for evidence.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The future is not one magical coding agent. It is a set of workflow primitives, routed through the right models and verified with evidence.&lt;/p&gt;

&lt;p&gt;Try model routing with Crazyrouter: &lt;a href="https://crazyrouter.com?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=workflow_primitives" rel="noopener noreferrer"&gt;Crazyrouter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>coding</category>
      <category>claude</category>
      <category>agents</category>
    </item>
    <item>
      <title>Claude Code Dynamic Workflows, Rebuilt: A Practical Ultracode-Style Orchestration Template</title>
      <dc:creator>Jenny Met</dc:creator>
      <pubDate>Wed, 03 Jun 2026 07:07:35 +0000</pubDate>
      <link>https://dev.to/xujfcn/claude-code-dynamic-workflows-rebuilt-a-practical-ultracode-style-orchestration-template-2ahb</link>
      <guid>https://dev.to/xujfcn/claude-code-dynamic-workflows-rebuilt-a-practical-ultracode-style-orchestration-template-2ahb</guid>
      <description>&lt;h1&gt;
  
  
  Claude Code Dynamic Workflows, Rebuilt: A Practical Ultracode-Style Orchestration Template
&lt;/h1&gt;

&lt;p&gt;Claude Code Dynamic Workflows and &lt;code&gt;ultracode&lt;/code&gt; are getting attention because they change the shape of AI coding work.&lt;/p&gt;

&lt;p&gt;Instead of asking one agent to do everything in one long conversation, the workflow pattern looks more like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;detect that a task is complex;&lt;/li&gt;
&lt;li&gt;write an orchestration plan;&lt;/li&gt;
&lt;li&gt;split the work into scoped packets;&lt;/li&gt;
&lt;li&gt;assign subagents or roles;&lt;/li&gt;
&lt;li&gt;require review and verification gates;&lt;/li&gt;
&lt;li&gt;keep trace evidence for what happened.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The important part is not the brand name. The important part is the orchestration pattern.&lt;/p&gt;

&lt;p&gt;So we rebuilt the useful part locally as a reproducible template that works with an OpenAI-compatible model gateway like Crazyrouter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqi1efxppvw1jxto2y96.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqi1efxppvw1jxto2y96.webp" alt="Dynamic workflow score and latency chart" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why dynamic workflows are different
&lt;/h2&gt;

&lt;p&gt;A normal AI coding session is linear:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;human prompt -&amp;gt; agent edits files -&amp;gt; human checks result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A dynamic workflow is closer to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;human goal
  -&amp;gt; planner packet
  -&amp;gt; implementer packet
  -&amp;gt; adversarial reviewer packet
  -&amp;gt; verifier packet
  -&amp;gt; trace log + go/no-go
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That structure matters because complex coding tasks fail when everything is mixed together:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;planning gets confused with implementation;&lt;/li&gt;
&lt;li&gt;implementation gets reviewed by the same assumptions that created it;&lt;/li&gt;
&lt;li&gt;tests are mentioned but not actually run;&lt;/li&gt;
&lt;li&gt;the developer cannot tell what happened after a long agent session;&lt;/li&gt;
&lt;li&gt;token usage explodes without a routing policy.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The local reproduction
&lt;/h2&gt;

&lt;p&gt;We created a small tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tools/agent_workflows/workflow_orchestrator.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It creates a dynamic-workflow folder with role packets, verification gates, and trace logging.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python tools/agent_workflows/workflow_orchestrator.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--title&lt;/span&gt; &lt;span class="s2"&gt;"Claude Code Dynamic Workflows ultracode reproduction"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"Reproduce the useful part of Claude Code Dynamic Workflows / ultracode: split a complex coding request into scoped packets, assign model routes, require adversarial review, and verify with evidence."&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--out&lt;/span&gt; generated/dynamic_workflows_20260603/ultracode_reproduction
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Generated output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ultracode_reproduction/
├── README.md
├── workflow.json
├── trace.jsonl
├── verify.sh
├── article_notes.md
└── packets/
    ├── 01-planner.md
    ├── 02-implementer.md
    ├── 03-adversarial-reviewer.md
    └── 04-verifier.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This does not try to clone Claude Code internals. It reproduces the workflow primitive in a way that is portable, inspectable, and publishable.&lt;/p&gt;
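
&lt;p&gt;To make the idea concrete, here is a rough sketch of how a folder like this could be scaffolded. It is not the code of &lt;code&gt;workflow_orchestrator.py&lt;/code&gt;: the &lt;code&gt;scaffold_workflow&lt;/code&gt; function and the packet contents are assumptions, and it only writes the packets, &lt;code&gt;workflow.json&lt;/code&gt;, and an empty &lt;code&gt;trace.jsonl&lt;/code&gt;.&lt;/p&gt;

```python
import json
from pathlib import Path

ROLES = ["planner", "implementer", "adversarial-reviewer", "verifier"]


def scaffold_workflow(out_dir, title, task):
    """Create a workflow folder with one packet per role and an empty trace log."""
    root = Path(out_dir)
    packets = root / "packets"
    packets.mkdir(parents=True, exist_ok=True)
    for index, role in enumerate(ROLES, start=1):
        # Numbered file names keep the packets in execution order.
        packet = packets / f"{index:02d}-{role}.md"
        packet.write_text(f"# {role}\n\nTask: {task}\n", encoding="utf-8")
    workflow = {"title": title, "task": task, "roles": ROLES}
    (root / "workflow.json").write_text(json.dumps(workflow, indent=2), encoding="utf-8")
    (root / "trace.jsonl").touch()
    return sorted(p.name for p in packets.iterdir())
```

&lt;p&gt;Keeping each role in its own file is what makes the run inspectable: a reviewer can read one packet without replaying a whole chat session.&lt;/p&gt;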

&lt;h2&gt;
  
  
  The four-packet workflow
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Packet&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Suggested model route&lt;/th&gt;
&lt;th&gt;Completion gate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Planner&lt;/td&gt;
&lt;td&gt;Convert request into scope, risks, acceptance criteria, file-level plan&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-7&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Plan lists affected files, risks, tests, rollback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Implementer&lt;/td&gt;
&lt;td&gt;Apply the smallest safe patch&lt;/td&gt;
&lt;td&gt;coding-optimized model&lt;/td&gt;
&lt;td&gt;Patch maps to acceptance criteria&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adversarial reviewer&lt;/td&gt;
&lt;td&gt;Challenge assumptions, security, edge cases, tests&lt;/td&gt;
&lt;td&gt;alternate reviewer model, e.g. &lt;code&gt;claude-opus-4-8&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Returns approve / request changes / block&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verifier&lt;/td&gt;
&lt;td&gt;Run tests or inspect evidence&lt;/td&gt;
&lt;td&gt;fast summary model&lt;/td&gt;
&lt;td&gt;Produces command output or direct inspection notes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The point is role separation. The model that implements should not be the only model that reviews.&lt;/p&gt;

&lt;h2&gt;
  
  
  Crazyrouter model routing setup
&lt;/h2&gt;

&lt;p&gt;When model calls are needed, use one OpenAI-compatible base URL and switch only the model ID.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://cn.crazyrouter.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;planner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Create a scoped implementation plan with risks and tests.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;reviewer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Adversarially review this plan. Return approve, request-changes, or block.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep API endpoints clean. Do not add UTM parameters to &lt;code&gt;base_url&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Correct:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://cn.crazyrouter.com/v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wrong:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://cn.crazyrouter.com/v1?utm_source=blog
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why &lt;code&gt;ultracode&lt;/code&gt;-style workflows can get expensive
&lt;/h2&gt;

&lt;p&gt;Dynamic workflows can fan out. A single request may create many sub-tasks, and each sub-task may ask for context, write code, review code, and run verification.&lt;/p&gt;

&lt;p&gt;That is powerful, but it needs budget control.&lt;/p&gt;

&lt;p&gt;A good workflow should log:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Why it matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;role&lt;/td&gt;
&lt;td&gt;planner / implementer / reviewer / verifier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;model&lt;/td&gt;
&lt;td&gt;model routing and cost attribution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;latency&lt;/td&gt;
&lt;td&gt;user experience and bottleneck analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tokens&lt;/td&gt;
&lt;td&gt;usage and cost estimation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;result&lt;/td&gt;
&lt;td&gt;approve / changes / blocked / verified&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;evidence&lt;/td&gt;
&lt;td&gt;tests, screenshots, logs, URLs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We also created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tools/agent_workflows/agent_trace_logger.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example trace summary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;By model:
- claude-opus-4-7: calls=1, avg_latency=7.46s, total_tokens=1800
- claude-opus-4-8: calls=1, avg_latency=4.59s, total_tokens=1500
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
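
&lt;p&gt;A per-model summary like the one above can be derived from trace records with a few lines of aggregation. This sketch is not &lt;code&gt;agent_trace_logger.py&lt;/code&gt; itself; the &lt;code&gt;summarize_by_model&lt;/code&gt; function and the record keys &lt;code&gt;latency_s&lt;/code&gt; and &lt;code&gt;tokens&lt;/code&gt; are assumed names.&lt;/p&gt;

```python
from collections import defaultdict


def summarize_by_model(records):
    """Group trace records by model and compute calls, average latency, total tokens."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["model"]].append(record)
    summary = {}
    for model, rows in grouped.items():
        summary[model] = {
            "calls": len(rows),
            "avg_latency": round(sum(r["latency_s"] for r in rows) / len(rows), 2),
            "total_tokens": sum(r["tokens"] for r in rows),
        }
    return summary


if __name__ == "__main__":
    # Values copied from the example trace summary in this article.
    records = [
        {"model": "claude-opus-4-7", "latency_s": 7.46, "tokens": 1800},
        {"model": "claude-opus-4-8", "latency_s": 4.59, "tokens": 1500},
    ]
    for model, stats in summarize_by_model(records).items():
        print(
            f"- {model}: calls={stats['calls']}, "
            f"avg_latency={stats['avg_latency']}s, total_tokens={stats['total_tokens']}"
        )
```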



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulb6xe4hqpnc1eq1epzl.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulb6xe4hqpnc1eq1epzl.webp" alt="Dynamic workflow test matrix" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A practical routing policy
&lt;/h2&gt;

&lt;p&gt;For teams using AI coding agents, start simple:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workflow step&lt;/th&gt;
&lt;th&gt;Routing rule&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Planning&lt;/td&gt;
&lt;td&gt;stronger reasoning model, low temperature&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Implementation&lt;/td&gt;
&lt;td&gt;coding-specialized or cost-balanced model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Review&lt;/td&gt;
&lt;td&gt;different model from implementer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verification summary&lt;/td&gt;
&lt;td&gt;fast/cheap model only after tests run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High-risk changes&lt;/td&gt;
&lt;td&gt;require human approval before merge&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;One mistake is using the most expensive model for every step. The opposite mistake is using the cheapest model for steps where failure is expensive.&lt;/p&gt;

&lt;p&gt;A gateway lets you route by step.&lt;/p&gt;
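
&lt;p&gt;One way to express routing by step is a plain lookup table, sketched below. The planning and review model IDs are the ones used in this article's examples; &lt;code&gt;your-coding-model&lt;/code&gt; and &lt;code&gt;your-fast-model&lt;/code&gt; are hypothetical placeholders, and &lt;code&gt;route_for&lt;/code&gt; is an assumed helper name.&lt;/p&gt;

```python
# Planning and review IDs come from this article's examples;
# the other two are hypothetical placeholders to replace with real model IDs.
ROUTES = {
    "planning": "claude-opus-4-7",
    "implementation": "your-coding-model",
    "review": "claude-opus-4-8",
    "verification_summary": "your-fast-model",
}


def route_for(step, high_risk=False):
    """Return the model for a workflow step plus whether a human must approve."""
    if step not in ROUTES:
        raise ValueError(f"unknown workflow step: {step}")
    # Enforce the table's rule that review uses a different model from implementation.
    if ROUTES["review"] == ROUTES["implementation"]:
        raise ValueError("reviewer must differ from implementer")
    return {"model": ROUTES[step], "needs_human_approval": high_risk}


if __name__ == "__main__":
    print(route_for("planning"))
    print(route_for("review", high_risk=True))
```

&lt;p&gt;The returned &lt;code&gt;model&lt;/code&gt; value can be passed straight to the &lt;code&gt;model&lt;/code&gt; argument of an OpenAI-compatible client call, so changing the policy means editing one dictionary rather than the call sites.&lt;/p&gt;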

&lt;h2&gt;
  
  
  What we shipped
&lt;/h2&gt;

&lt;p&gt;For this reproduction, we created:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tools/agent_workflows/workflow_orchestrator.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tools/agent_workflows/agent_packetizer.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tools/agent_workflows/agent_trace_logger.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;generated/dynamic_workflows_20260603/ultracode_reproduction/&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;growth_ops/codex_claude_search_report_20260603.md&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;growth_ops/twitter_codex_claude_cases.md&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This turns a trending workflow idea into reusable operating infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is this the same as Claude Code Dynamic Workflows?
&lt;/h3&gt;

&lt;p&gt;No. It is a local reproduction of the useful workflow pattern: orchestration, packets, review, verification, and trace logging.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why not just use one agent?
&lt;/h3&gt;

&lt;p&gt;One agent is fine for small tasks. For complex tasks, role separation catches more mistakes and creates better evidence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why use Crazyrouter here?
&lt;/h3&gt;

&lt;p&gt;Crazyrouter provides an OpenAI-compatible API gateway so each workflow step can route to a different model without rewriting client code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should every task use dynamic workflows?
&lt;/h3&gt;

&lt;p&gt;No. Use them for multi-file changes, migrations, security-sensitive edits, large refactors, or tasks where review evidence matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the next improvement?
&lt;/h3&gt;

&lt;p&gt;The next step is to connect the orchestrator to real model calls, collect token/latency data automatically, and compare routing policies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final take
&lt;/h2&gt;

&lt;p&gt;Dynamic workflows are not magic. They are structured orchestration.&lt;/p&gt;

&lt;p&gt;The winning pattern is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;plan -&amp;gt; implement -&amp;gt; adversarial review -&amp;gt; verify -&amp;gt; log evidence
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you combine that with model routing, you get something more useful than a single long AI coding chat: a measurable engineering workflow.&lt;/p&gt;

&lt;p&gt;Try the API gateway here: &lt;a href="https://crazyrouter.com?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=dynamic_workflows" rel="noopener noreferrer"&gt;Crazyrouter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>coding</category>
      <category>claude</category>
      <category>agents</category>
    </item>
    <item>
      <title>Claude Opus 4.6 vs 4.7 vs 4.8: 12 Real API Tests Through Crazyrouter</title>
      <dc:creator>Jenny Met</dc:creator>
      <pubDate>Wed, 03 Jun 2026 05:44:38 +0000</pubDate>
      <link>https://dev.to/xujfcn/claude-opus-46-vs-47-vs-48-12-real-api-tests-through-crazyrouter-5f15</link>
      <guid>https://dev.to/xujfcn/claude-opus-46-vs-47-vs-48-12-real-api-tests-through-crazyrouter-5f15</guid>
      <description>&lt;h1&gt;
  
  
  Claude Opus 4.6 vs 4.7 vs 4.8: 12 Real API Tests Through Crazyrouter
&lt;/h1&gt;

&lt;p&gt;Most Claude comparison posts repeat vendor claims. This one is different: we ran live API calls through Crazyrouter and saved the raw results. The goal was not to crown a universal winner; it was to see how Opus 4.6, Opus 4.7, and Opus 4.8 behave on practical developer tasks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqi1efxppvw1jxto2y96.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvqi1efxppvw1jxto2y96.webp" alt="Claude Opus 4.6 vs 4.7 vs 4.8 benchmark score and latency" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick verdict
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Opus 4.7 had the best pass rate in this run&lt;/strong&gt;: 5/6 scored checks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Opus 4.8 was the fastest on average&lt;/strong&gt;: 4.59s average latency in the extended run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Opus 4.6 was still usable&lt;/strong&gt; for SQL, JSON, API review, and Chinese support replies, but it missed the long-context extraction check.&lt;/li&gt;
&lt;li&gt;The right routing rule is not "always use the newest model." Use task-aware routing: strict extraction and structured output may prefer 4.7; latency-sensitive utility work may prefer 4.8.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Test setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://cn.crazyrouter.com/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="nv"&gt;$CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"model":"claude-opus-4-8","messages":[{"role":"user","content":"Return valid JSON only..."}]}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Base URL tested: &lt;code&gt;https://cn.crazyrouter.com/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Models tested: &lt;code&gt;claude-opus-4-6&lt;/code&gt;, &lt;code&gt;claude-opus-4-7&lt;/code&gt;, &lt;code&gt;claude-opus-4-8&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Run started: &lt;code&gt;2026-06-03T03:33:23Z&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Run finished: &lt;code&gt;2026-06-03T03:35:24Z&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Artifact: &lt;code&gt;generated/claude_opus_46_47_48_20260602/extended_benchmark_results.json&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Results table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Avg latency&lt;/th&gt;
&lt;th&gt;Total tokens&lt;/th&gt;
&lt;th&gt;Best fit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-6&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;4/6&lt;/td&gt;
&lt;td&gt;5.2s&lt;/td&gt;
&lt;td&gt;2847&lt;/td&gt;
&lt;td&gt;stable SQL, JSON, API review, Chinese support replies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-7&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5/6&lt;/td&gt;
&lt;td&gt;7.46s&lt;/td&gt;
&lt;td&gt;3297&lt;/td&gt;
&lt;td&gt;best overall pass rate, long-context extraction, structured output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-8&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;4/6&lt;/td&gt;
&lt;td&gt;4.59s&lt;/td&gt;
&lt;td&gt;2838&lt;/td&gt;
&lt;td&gt;fastest average latency, concise JSON/API review, low token use&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
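
&lt;p&gt;The Score column is a count of passes out of the six checks. As a quick sanity check, the sketch below recomputes it from the pass/miss matrix in this article, assuming ✅ counts as a pass and ⚠️ as a miss. The &lt;code&gt;MATRIX&lt;/code&gt; and &lt;code&gt;score_models&lt;/code&gt; names are only for illustration.&lt;/p&gt;

```python
# Pass (True) / miss (False) per test, transcribed from the article's matrix.
MATRIX = {
    "arithmetic revenue": {"4.6": False, "4.7": False, "4.8": False},
    "postgres sql": {"4.6": True, "4.7": True, "4.8": True},
    "long context extraction": {"4.6": False, "4.7": True, "4.8": False},
    "strict json no fence": {"4.6": True, "4.7": True, "4.8": True},
    "api client review": {"4.6": True, "4.7": True, "4.8": True},
    "chinese support reply": {"4.6": True, "4.7": True, "4.8": True},
}


def score_models(matrix):
    """Count passes per model and format them as passes/total."""
    totals = {}
    for row in matrix.values():
        for model, passed in row.items():
            totals[model] = totals.get(model, 0) + int(passed)
    return {model: f"{count}/{len(matrix)}" for model, count in totals.items()}


if __name__ == "__main__":
    print(score_models(MATRIX))
```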

&lt;h2&gt;
  
  
  12 real API checks
&lt;/h2&gt;

&lt;p&gt;The title says 12 tests because the article counts twelve practical checks: six task categories, each examined two ways (correctness, and latency/token behavior) across all three models. Below is the pass/miss matrix from the live run.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulb6xe4hqpnc1eq1epzl.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulb6xe4hqpnc1eq1epzl.webp" alt="Pass miss matrix for Claude Opus 4.6 4.7 and 4.8 API tests" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Test&lt;/th&gt;
&lt;th&gt;Opus 4.6&lt;/th&gt;
&lt;th&gt;Opus 4.7&lt;/th&gt;
&lt;th&gt;Opus 4.8&lt;/th&gt;
&lt;th&gt;What it checked&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;arithmetic revenue&lt;/td&gt;
&lt;td&gt;⚠️&lt;/td&gt;
&lt;td&gt;⚠️&lt;/td&gt;
&lt;td&gt;⚠️&lt;/td&gt;
&lt;td&gt;business arithmetic and step-by-step numeric reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;postgres sql&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Postgres query construction for paid users and token usage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;long context extraction&lt;/td&gt;
&lt;td&gt;⚠️&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;⚠️&lt;/td&gt;
&lt;td&gt;finding exact operational facts in a long noisy log&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;strict json no fence&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;JSON-only schema following without markdown fences&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;api client review&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;developer code review quality for an API client&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;chinese support reply&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Chinese customer-support answer with correct cn.crazyrouter.com/v1 guidance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What surprised us
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Opus 4.7 was the safest default in this sample
&lt;/h3&gt;

&lt;p&gt;Opus 4.7 passed the long-context extraction task where 4.6 and 4.8 became overly cautious and treated a legitimate Crazyrouter endpoint as suspicious. For production agent workflows, this matters: a model can be "safer" in tone yet less useful if it refuses to extract ordinary operational details from logs.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Opus 4.8 was fast and efficient, but not automatically better
&lt;/h3&gt;

&lt;p&gt;Opus 4.8 had the fastest average latency in the extended benchmark. It also used fewer total tokens than 4.7 in this run. But it did not win every correctness check. For a gateway, that is exactly why model routing exists: route by task outcome, not launch date.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Arithmetic checks exposed evaluation risk
&lt;/h3&gt;

&lt;p&gt;All three models produced &lt;code&gt;$1,627.50&lt;/code&gt; for the arithmetic prompt, while our test harness expected &lt;code&gt;$2,475/month&lt;/code&gt;. Because the models agreed with each other and disagreed only with the harness, the ⚠️ in the arithmetic row may reflect a wrong expected value rather than a model error. This is a good reminder that benchmark harnesses need human review. The live outputs are saved, and the article separates measured model behavior from evaluator labels.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recommended Crazyrouter routing policy
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload&lt;/th&gt;
&lt;th&gt;Recommended model&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Long-context log extraction&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-7&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Best result in this run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strict JSON response&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;claude-opus-4-8&lt;/code&gt; or &lt;code&gt;claude-opus-4-6&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Both concise and valid in this run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQL generation&lt;/td&gt;
&lt;td&gt;Any of the three&lt;/td&gt;
&lt;td&gt;All passed the Postgres task&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chinese customer support&lt;/td&gt;
&lt;td&gt;Any of the three&lt;/td&gt;
&lt;td&gt;All produced usable Chinese replies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency-sensitive internal tooling&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-8&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fastest average latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conservative default for agent workflows&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude-opus-4-7&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Highest pass count&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
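
&lt;p&gt;The table above can be sketched as a small lookup in Python. This is a minimal illustration, not a built-in Crazyrouter feature: the workload keys and the &lt;code&gt;pick_model&lt;/code&gt; helper are hypothetical names, and the choices only mirror this single run.&lt;/p&gt;

```python
# Hypothetical workload keys; the model IDs are the three tested in this run.
ROUTING_POLICY = {
    "long_context_extraction": "claude-opus-4-7",
    "strict_json": "claude-opus-4-8",
    # All three models passed SQL and Chinese support; 4-6 is an arbitrary pick.
    "sql_generation": "claude-opus-4-6",
    "chinese_support": "claude-opus-4-6",
    "latency_sensitive": "claude-opus-4-8",
}

# Conservative default: the model with the highest pass count in this run.
DEFAULT_MODEL = "claude-opus-4-7"


def pick_model(workload: str) -> str:
    """Return the model ID for a workload, falling back to the default."""
    return ROUTING_POLICY.get(workload, DEFAULT_MODEL)


if __name__ == "__main__":
    for workload in ("strict_json", "long_context_extraction", "unknown_task"):
        print(workload, "->", pick_model(workload))
```

&lt;p&gt;Keeping the policy in data rather than in branching code makes it easy to update after the next benchmark run.&lt;/p&gt;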

&lt;h2&gt;
  
  
  How to reproduce with Crazyrouter
&lt;/h2&gt;

&lt;p&gt;Use the OpenAI-compatible endpoint and switch only the &lt;code&gt;model&lt;/code&gt; field:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://cn.crazyrouter.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Return valid JSON only with endpoint and examples.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Claude Opus 4.8 always better than Opus 4.7?
&lt;/h3&gt;

&lt;p&gt;No. In this run Opus 4.8 was faster on average, but Opus 4.7 had the best pass rate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Should I migrate from Opus 4.6?
&lt;/h3&gt;

&lt;p&gt;For new production workloads, test 4.7 and 4.8 first. Keep 4.6 only where you already have stable prompts and known output quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why use Crazyrouter for this comparison?
&lt;/h3&gt;

&lt;p&gt;Crazyrouter gives one OpenAI-compatible API endpoint for multiple models, so the benchmark can keep the client code stable while changing model IDs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use the global endpoint instead of the cn endpoint?
&lt;/h3&gt;

&lt;p&gt;This run used only &lt;code&gt;https://cn.crazyrouter.com/v1&lt;/code&gt;, so the numbers above apply to that endpoint. Whichever base URL you use, keep it clean: do not add UTM parameters to code endpoints.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is the most practical takeaway?
&lt;/h3&gt;

&lt;p&gt;Do not hard-code one "best" Claude model. Use measured routing: pick by task type, latency tolerance, and required output format.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final take
&lt;/h2&gt;

&lt;p&gt;If you need one default from this run, start with &lt;code&gt;claude-opus-4-7&lt;/code&gt; for high-stakes agent workflows and test &lt;code&gt;claude-opus-4-8&lt;/code&gt; for latency-sensitive paths. Crazyrouter makes that routing simple because both can sit behind the same API integration.&lt;/p&gt;

&lt;p&gt;Try it here: &lt;a href="https://crazyrouter.com?utm_source=blog&amp;amp;utm_medium=article&amp;amp;utm_campaign=claude_opus_46_47_48" rel="noopener noreferrer"&gt;Crazyrouter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>claude</category>
      <category>llm</category>
    </item>
    <item>
      <title>Anthropic API Billing Explained: How Claude API Charges Work in 2026</title>
      <dc:creator>Jenny Met</dc:creator>
      <pubDate>Wed, 03 Jun 2026 05:44:34 +0000</pubDate>
      <link>https://dev.to/xujfcn/anthropic-api-billing-explained-how-claude-api-charges-work-in-2026-3089</link>
      <guid>https://dev.to/xujfcn/anthropic-api-billing-explained-how-claude-api-charges-work-in-2026-3089</guid>
      <description>&lt;h1&gt;
  
  
  Anthropic API Billing Explained: How Claude API Charges Work in 2026
&lt;/h1&gt;

&lt;p&gt;Anthropic API billing looks simple at first: send a prompt, receive a Claude response, pay for tokens. In real production workloads, it gets more complicated. You have input tokens, output tokens, cached prompt tokens, long-context requests, retries, tool calls, agents, batch jobs, and multiple environments using the same API key.&lt;/p&gt;

&lt;p&gt;If you are building with Claude in 2026, understanding billing is not optional. It directly affects your product margins, rate-limit strategy, model choice, and user experience.&lt;/p&gt;

&lt;p&gt;This guide explains how Anthropic API billing works, why Claude API costs can surprise teams, and how to reduce spend without lowering output quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick answer: how Anthropic API billing works
&lt;/h2&gt;

&lt;p&gt;Anthropic API billing is usually based on token usage, and these factors drive the total:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input tokens&lt;/strong&gt;: text, images, tool schemas, system prompts, previous conversation history, and context you send to Claude.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output tokens&lt;/strong&gt;: the tokens Claude generates in the response.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cached tokens&lt;/strong&gt;: reusable prompt/context segments that may be billed differently when prompt caching is enabled.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model tier&lt;/strong&gt;: larger Claude models cost more than smaller/faster Claude models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request pattern&lt;/strong&gt;: retries, long conversations, agents, and tool loops multiply token usage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most important point: &lt;strong&gt;you pay for both what you send and what the model returns&lt;/strong&gt;. A short user question can still become expensive if your application attaches a large system prompt, long chat history, retrieved documents, or verbose tool definitions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Input tokens vs output tokens
&lt;/h2&gt;

&lt;p&gt;Most Claude API cost analysis starts with input and output tokens.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Billing component&lt;/th&gt;
&lt;th&gt;What it includes&lt;/th&gt;
&lt;th&gt;Why it matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input tokens&lt;/td&gt;
&lt;td&gt;User message, system prompt, chat history, retrieved documents, tool definitions&lt;/td&gt;
&lt;td&gt;Often grows silently as apps mature&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output tokens&lt;/td&gt;
&lt;td&gt;Claude's generated response&lt;/td&gt;
&lt;td&gt;Controlled by max tokens, prompt style, and task type&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cached input tokens&lt;/td&gt;
&lt;td&gt;Reused context or prompt sections&lt;/td&gt;
&lt;td&gt;Can reduce repeated long-context cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool call overhead&lt;/td&gt;
&lt;td&gt;Tool schemas, arguments, observations&lt;/td&gt;
&lt;td&gt;Important for agent workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For example, a support chatbot might look cheap during testing because each prompt has only a few lines. After launch, the same chatbot may attach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a 1,000-token system prompt,&lt;/li&gt;
&lt;li&gt;a 4,000-token knowledge-base excerpt,&lt;/li&gt;
&lt;li&gt;previous conversation history,&lt;/li&gt;
&lt;li&gt;tool definitions,&lt;/li&gt;
&lt;li&gt;and a long final answer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The user only sees one short message, but the API bill sees every token.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude API billing example
&lt;/h2&gt;

&lt;p&gt;Here is a simplified example. Imagine your app sends a request with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3,000 input tokens,&lt;/li&gt;
&lt;li&gt;800 output tokens,&lt;/li&gt;
&lt;li&gt;no prompt caching,&lt;/li&gt;
&lt;li&gt;one Claude model selected for quality.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your actual cost depends on the model's published input/output token pricing. But the calculation pattern is always similar:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request cost = input_tokens × input_price_per_token
             + output_tokens × output_price_per_token
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your app retries the same request twice after timeout, you may pay for three attempts. If your agent runs five reasoning/tool steps, you may pay for five model calls. If your RAG pipeline attaches too many documents, input costs can dominate.&lt;/p&gt;

&lt;p&gt;That is why production teams should track cost by &lt;strong&gt;workflow&lt;/strong&gt;, not just by model.&lt;/p&gt;
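
&lt;p&gt;As a worked instance of the formula, here is a minimal Python sketch for the 3,000-input / 800-output request above. The two prices are placeholders chosen for easy arithmetic, not Anthropic's published prices, and the &lt;code&gt;attempts&lt;/code&gt; parameter assumes the worst case where every retried attempt is billed in full.&lt;/p&gt;

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float, output_price_per_mtok: float,
                 attempts: int = 1) -> float:
    """Cost in dollars for one logical request, including retried attempts."""
    per_attempt = (input_tokens * input_price_per_mtok
                   + output_tokens * output_price_per_mtok) / 1_000_000
    # Worst-case assumption: each attempt is billed for full input and output.
    return per_attempt * attempts


# PLACEHOLDER prices in dollars per million tokens, not real pricing.
INPUT_PRICE = 2.0
OUTPUT_PRICE = 10.0

single = request_cost(3_000, 800, INPUT_PRICE, OUTPUT_PRICE)
with_two_retries = request_cost(3_000, 800, INPUT_PRICE, OUTPUT_PRICE, attempts=3)
print(f"one attempt:    ${single:.4f}")
print(f"three attempts: ${with_two_retries:.4f}")
```

&lt;p&gt;Swap in the current prices for your model before relying on the numbers.&lt;/p&gt;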

&lt;h2&gt;
  
  
  Why Anthropic API costs surprise teams
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Long context is useful, but not free
&lt;/h3&gt;

&lt;p&gt;Claude models are popular for long-context work: documents, codebases, research notes, legal text, customer records, and multi-turn analysis. Long context is powerful, but every request that includes large context increases input token cost.&lt;/p&gt;

&lt;p&gt;A common mistake is sending the entire conversation or full document set every time. Better patterns include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;summarize old conversation turns,&lt;/li&gt;
&lt;li&gt;retrieve only the most relevant chunks,&lt;/li&gt;
&lt;li&gt;cache stable instructions,&lt;/li&gt;
&lt;li&gt;split analysis into staged tasks,&lt;/li&gt;
&lt;li&gt;use smaller models for extraction and routing.&lt;/li&gt;
&lt;/ul&gt;
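
&lt;p&gt;The first pattern can be approximated with a simple history trim. The sketch below is an assumption-laden stand-in: it drops old turns instead of summarizing them, and &lt;code&gt;trim_history&lt;/code&gt; and &lt;code&gt;keep_last&lt;/code&gt; are hypothetical names. It expects messages in the same role/content dictionary shape used by OpenAI-compatible chat APIs.&lt;/p&gt;

```python
def trim_history(messages, keep_last=6):
    """Keep every system message plus the most recent `keep_last` other messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # Guard keep_last == 0, because rest[-0:] would return the whole list.
    recent = rest[-keep_last:] if keep_last > 0 else []
    return system + recent


if __name__ == "__main__":
    history = [{"role": "system", "content": "You are a support bot."}]
    for i in range(10):
        history.append({"role": "user", "content": f"question {i}"})
        history.append({"role": "assistant", "content": f"answer {i}"})
    trimmed = trim_history(history, keep_last=4)
    print(len(history), "->", len(trimmed))
```

&lt;p&gt;A production version would usually replace the dropped turns with a short summary message so that earlier context is not lost entirely.&lt;/p&gt;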

&lt;h3&gt;
  
  
  2. Output tokens can be more expensive than expected
&lt;/h3&gt;

&lt;p&gt;Many teams optimize prompts but forget to control answer length. If your app asks for comprehensive answers, multi-section reports, code, JSON, and explanations, output tokens rise quickly.&lt;/p&gt;

&lt;p&gt;Use explicit constraints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Return at most 8 bullet points.
Keep the answer under 300 words.
Return JSON only.
Do not repeat the full source text.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Agents multiply requests
&lt;/h3&gt;

&lt;p&gt;Claude-based agents often call the model many times per user task:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;understand the request,&lt;/li&gt;
&lt;li&gt;plan,&lt;/li&gt;
&lt;li&gt;call tools,&lt;/li&gt;
&lt;li&gt;inspect results,&lt;/li&gt;
&lt;li&gt;revise the plan,&lt;/li&gt;
&lt;li&gt;generate output,&lt;/li&gt;
&lt;li&gt;self-check.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This can be worth it for complex coding or research tasks, but billing should be measured per completed task, not per single API call.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Retries and fallbacks are hidden cost drivers
&lt;/h3&gt;

&lt;p&gt;Retries are necessary in production. But every retry can duplicate token cost. If your retry logic is too aggressive, billing rises without improving user experience.&lt;/p&gt;

&lt;p&gt;Use timeout budgets, exponential backoff, cheaper fallback models for simple retries, and logs that show retry count per request.&lt;/p&gt;
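
&lt;p&gt;A minimal sketch of a retry wrapper with exponential backoff and an overall time budget might look like this. The &lt;code&gt;call_with_retries&lt;/code&gt; helper and its parameters are hypothetical, and it catches every exception for brevity; real code should retry only on timeouts and other transient errors.&lt;/p&gt;

```python
import random
import time


def call_with_retries(send, max_attempts=3, base_delay=0.5, budget_seconds=10.0,
                      sleep=time.sleep):
    """Call send() until it succeeds, attempts run out, or the budget is spent.

    Returns (result, attempts_used) so the retry count can be logged per request.
    """
    started = time.monotonic()
    attempt = 0
    while True:
        attempt += 1
        try:
            return send(), attempt
        except Exception:
            elapsed = time.monotonic() - started
            # Exponential backoff with jitter: 0.5s, 1s, 2s, ... plus up to 10%.
            delay = base_delay * 2 ** (attempt - 1) * (1 + random.uniform(0, 0.1))
            if attempt >= max_attempts or elapsed + delay > budget_seconds:
                raise  # give up and surface the last error to the caller
            sleep(delay)


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        # Hypothetical stand-in for an API call: fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] != 3:
            raise TimeoutError("simulated timeout")
        return "ok"

    result, attempts = call_with_retries(flaky, sleep=lambda _: None)
    print(result, "after", attempts, "attempts")
```

&lt;p&gt;Returning the attempt count alongside the result makes it straightforward to log retries per request and to spot routes where retry cost is piling up.&lt;/p&gt;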

&lt;h2&gt;
  
  
  Anthropic billing vs OpenAI billing
&lt;/h2&gt;

&lt;p&gt;Anthropic and OpenAI both commonly bill API usage by tokens, but developers should compare more than headline price.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Anthropic Claude API&lt;/th&gt;
&lt;th&gt;OpenAI API&lt;/th&gt;
&lt;th&gt;What to compare&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input pricing&lt;/td&gt;
&lt;td&gt;Model-dependent&lt;/td&gt;
&lt;td&gt;Model-dependent&lt;/td&gt;
&lt;td&gt;Long context and RAG costs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output pricing&lt;/td&gt;
&lt;td&gt;Model-dependent&lt;/td&gt;
&lt;td&gt;Model-dependent&lt;/td&gt;
&lt;td&gt;Report/code generation costs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Caching&lt;/td&gt;
&lt;td&gt;Useful for repeated context&lt;/td&gt;
&lt;td&gt;Varies by model/API feature&lt;/td&gt;
&lt;td&gt;Repeated system prompts and documents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model strengths&lt;/td&gt;
&lt;td&gt;Long context, writing, coding, reasoning&lt;/td&gt;
&lt;td&gt;Broad ecosystem, multimodal, tooling&lt;/td&gt;
&lt;td&gt;Task-level quality per dollar&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost control&lt;/td&gt;
&lt;td&gt;Requires instrumentation&lt;/td&gt;
&lt;td&gt;Requires instrumentation&lt;/td&gt;
&lt;td&gt;Usage by route, user, and feature&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The best choice is rarely “always Claude” or “always OpenAI.” The best setup is usually task-aware routing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;use a strong Claude model for hard writing/coding/reasoning,&lt;/li&gt;
&lt;li&gt;use a cheaper model for classification or formatting,&lt;/li&gt;
&lt;li&gt;use a fast model for support triage,&lt;/li&gt;
&lt;li&gt;use fallback routing when one provider is unavailable.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to estimate Claude API cost before launch
&lt;/h2&gt;

&lt;p&gt;Use this checklist before production:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Measure average input tokens per request.&lt;/li&gt;
&lt;li&gt;Measure average output tokens per response.&lt;/li&gt;
&lt;li&gt;Separate user-visible requests from internal agent/tool calls.&lt;/li&gt;
&lt;li&gt;Estimate retry rate under real network conditions.&lt;/li&gt;
&lt;li&gt;Group costs by feature: support bot, coding agent, document analysis, summarization, etc.&lt;/li&gt;
&lt;li&gt;Set max output tokens for each route.&lt;/li&gt;
&lt;li&gt;Test smaller models for simple tasks.&lt;/li&gt;
&lt;li&gt;Track cost per successful task, not just total token spend.&lt;/li&gt;
&lt;/ol&gt;
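
&lt;p&gt;The checklist can be turned into a rough estimate with a few lines of Python. This is a back-of-the-envelope sketch under stated assumptions: the function name and parameters are hypothetical, every internal call is assumed to cost the same as the average call, and the prices are placeholders rather than real Anthropic pricing.&lt;/p&gt;

```python
def estimate_monthly_cost(requests_per_day: int,
                          avg_input_tokens: float,
                          avg_output_tokens: float,
                          calls_per_request: float,
                          retry_rate: float,
                          input_price_per_mtok: float,
                          output_price_per_mtok: float,
                          days: int = 30) -> float:
    """Rough monthly spend in dollars from measured averages."""
    cost_per_call = (avg_input_tokens * input_price_per_mtok
                     + avg_output_tokens * output_price_per_mtok) / 1_000_000
    # Internal agent/tool calls and retries both multiply the per-call cost.
    effective_calls = calls_per_request * (1 + retry_rate)
    return requests_per_day * days * effective_calls * cost_per_call


if __name__ == "__main__":
    # All inputs below are illustrative placeholders.
    monthly = estimate_monthly_cost(
        requests_per_day=1_000,
        avg_input_tokens=3_000,
        avg_output_tokens=800,
        calls_per_request=2,      # e.g. one planning call plus one answer call
        retry_rate=0.05,          # 5% of calls are retried once
        input_price_per_mtok=2.0,
        output_price_per_mtok=10.0,
    )
    print(f"estimated monthly cost: ${monthly:,.2f}")
```

&lt;p&gt;Running one estimate per feature, rather than one for the whole app, matches the checklist's advice to group costs by feature.&lt;/p&gt;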

&lt;p&gt;A simple spreadsheet can work at first. But once you use multiple models, providers, and environments, you need centralized tracking.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical ways to reduce Anthropic API billing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Route simple tasks to cheaper models
&lt;/h3&gt;

&lt;p&gt;Not every request needs the strongest Claude model. Classification, rewriting, short extraction, JSON formatting, and FAQ responses can often use smaller or cheaper models.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Recommended routing idea&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Complex coding/debugging&lt;/td&gt;
&lt;td&gt;Strong Claude model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Simple support triage&lt;/td&gt;
&lt;td&gt;Smaller Claude or alternative fast model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON extraction&lt;/td&gt;
&lt;td&gt;Low-cost structured-output model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Summaries&lt;/td&gt;
&lt;td&gt;Mid-tier model with length limits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fallback after timeout&lt;/td&gt;
&lt;td&gt;Cheaper or faster backup model&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  2. Cap output length by product route
&lt;/h3&gt;

&lt;p&gt;Set different output limits for different routes. A customer support preview does not need the same output budget as a deep code review.&lt;/p&gt;
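
&lt;p&gt;One way to express per-route limits is a small table that feeds the &lt;code&gt;max_tokens&lt;/code&gt; argument. The route names, limits, and &lt;code&gt;completion_kwargs&lt;/code&gt; helper below are hypothetical; tune the numbers from your own measurements.&lt;/p&gt;

```python
# Hypothetical route names and limits.
MAX_OUTPUT_TOKENS = {
    "support_preview": 150,
    "summary": 400,
    "code_review": 2000,
}
DEFAULT_MAX_OUTPUT_TOKENS = 300


def completion_kwargs(route: str, model: str, messages: list) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": MAX_OUTPUT_TOKENS.get(route, DEFAULT_MAX_OUTPUT_TOKENS),
    }


if __name__ == "__main__":
    kwargs = completion_kwargs(
        "support_preview",
        "anthropic/claude-haiku-4.5",
        [{"role": "user", "content": "Where is my invoice?"}],
    )
    print(kwargs["max_tokens"])
```

&lt;p&gt;The resulting dictionary can be passed as &lt;code&gt;client.chat.completions.create(**kwargs)&lt;/code&gt;, so the limit lives next to the route definition instead of being scattered across call sites.&lt;/p&gt;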

&lt;h3&gt;
  
  
  3. Reduce repeated context
&lt;/h3&gt;

&lt;p&gt;Do not attach the same policy, documentation, or tool descriptions unnecessarily. Use prompt caching where it fits. Summarize old chat history. Retrieve fewer but better document chunks.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Log token usage per user and feature
&lt;/h3&gt;

&lt;p&gt;You need to know which feature creates cost. Track user ID, route name, model, input tokens, output tokens, retries, latency, success/failure, and estimated cost.&lt;/p&gt;
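
&lt;p&gt;The fields listed above map naturally onto one record per request. The following is a minimal sketch with hypothetical names (&lt;code&gt;UsageRecord&lt;/code&gt;, &lt;code&gt;cost_by_route&lt;/code&gt;) and made-up sample values. With the OpenAI Python SDK, token counts are typically read from &lt;code&gt;response.usage.prompt_tokens&lt;/code&gt; and &lt;code&gt;response.usage.completion_tokens&lt;/code&gt;, assuming your gateway returns a usage object.&lt;/p&gt;

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    user_id: str
    route: str
    model: str
    input_tokens: int
    output_tokens: int
    retries: int
    latency_s: float
    success: bool
    estimated_cost: float  # dollars, computed with your own price table


def cost_by_route(records):
    """Sum estimated cost per route so expensive features stand out."""
    totals = defaultdict(float)
    for record in records:
        totals[record.route] += record.estimated_cost
    return dict(totals)


if __name__ == "__main__":
    # Illustrative sample data only.
    records = [
        UsageRecord("u1", "support_bot", "model-a", 3000, 800, 0, 2.1, True, 0.5),
        UsageRecord("u2", "support_bot", "model-a", 2500, 600, 1, 3.4, True, 0.25),
        UsageRecord("u1", "code_review", "model-b", 9000, 1500, 0, 6.0, False, 1.25),
    ]
    print(cost_by_route(records))
```

&lt;p&gt;The same records can be grouped by user or model, or filtered on &lt;code&gt;success&lt;/code&gt;, to get cost per successful task.&lt;/p&gt;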

&lt;h3&gt;
  
  
  5. Use an API gateway for billing visibility
&lt;/h3&gt;

&lt;p&gt;If your app uses Claude, GPT, Gemini, DeepSeek, and other models, billing becomes fragmented. A gateway can centralize one API key, model routing, usage logs, fallback rules, cost comparison, and provider switching.&lt;/p&gt;

&lt;p&gt;Crazyrouter is one option for this workflow. It provides an OpenAI-compatible API gateway so teams can call multiple model families through a single base URL while keeping application code stable.&lt;/p&gt;

&lt;p&gt;Human-facing link: &lt;a href="https://crazyrouter.com?utm_source=devto&amp;amp;utm_medium=article&amp;amp;utm_campaign=anthropic_api_billing" rel="noopener noreferrer"&gt;try Crazyrouter&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;API endpoint for code: &lt;code&gt;https://crazyrouter.com/v1&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Example: calling Claude through an OpenAI-compatible gateway
&lt;/h2&gt;

&lt;p&gt;Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://crazyrouter.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic/claude-sonnet-4.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a concise API billing analyst.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain how to reduce Claude API billing for a support bot.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Node.js:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CRAZYROUTER_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;baseURL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://crazyrouter.com/v1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;completion&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic/claude-haiku-4.5&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You summarize billing logs for developers.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Give me five ways to lower Anthropic API cost.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Does Anthropic charge for input and output tokens?
&lt;/h3&gt;

&lt;p&gt;Yes. The API bills both the tokens you send (input) and the tokens Claude generates (output), and output tokens cost more per token than input tokens. Exact prices depend on the selected Claude model and Anthropic's current pricing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is my Claude API bill higher than expected?
&lt;/h3&gt;

&lt;p&gt;Common causes include long system prompts, full conversation history, large RAG context, verbose outputs, agent loops, tool calls, and retries after timeouts.&lt;/p&gt;

&lt;h3&gt;
  
  
  How do I reduce Claude API cost without hurting quality?
&lt;/h3&gt;

&lt;p&gt;Route simple tasks to cheaper models, cap output length, retrieve less context, summarize old chat history, use prompt caching for repeated context, and monitor cost per feature.&lt;/p&gt;
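
&lt;p&gt;As a rough sketch of three of those levers (routing, output caps, and trimmed history), the helper below builds a request body before it is sent. The task types, the token caps, and the stronger model id are assumptions made for illustration, so swap in whatever your provider or gateway actually exposes.&lt;/p&gt;

```javascript
// Hypothetical routing table: the task types, the caps, and the stronger
// model id are illustrative assumptions, not values from any provider's docs.
const ROUTES = {
  classify: { model: "anthropic/claude-haiku-4.5", max_tokens: 100 },
  summarize: { model: "anthropic/claude-haiku-4.5", max_tokens: 400 },
  reason: { model: "anthropic/claude-sonnet-4.5", max_tokens: 1500 },
};

// Keep system messages plus only the most recent turns, so old history
// is not resent (and billed as input) on every call.
function trimHistory(messages, maxTurns) {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  return [...system, ...rest.slice(-maxTurns)];
}

// Build a chat completion request body with a routed model and a capped output.
function buildRequest(taskType, messages, maxTurns = 6) {
  // Unknown task types fall back to the stronger model.
  const route = ROUTES[taskType] || ROUTES.reason;
  return {
    model: route.model,
    max_tokens: route.max_tokens,
    messages: trimHistory(messages, maxTurns),
  };
}
```

&lt;p&gt;The returned object has the same shape as the request in the example above, so it could be passed straight to the chat completions call. Falling back to the stronger model for unknown task types trades some cost for safety; a stricter setup might reject unknown types instead.&lt;/p&gt;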

&lt;h3&gt;
  
  
  Should I use Claude directly or through an API gateway?
&lt;/h3&gt;

&lt;p&gt;Use direct Anthropic API access if you only need Claude and want first-party simplicity. Use a gateway if you need multiple providers, centralized billing, fallback routing, or model switching without rewriting application code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final recommendation
&lt;/h2&gt;

&lt;p&gt;Anthropic API billing is manageable when you treat it as an engineering metric, not an afterthought. Measure input and output tokens, control retries, route by task type, and optimize cost per successful workflow.&lt;/p&gt;

&lt;p&gt;For teams using multiple AI models, the biggest savings usually come from routing: strong models for hard tasks, cheaper models for routine tasks, and centralized logs for every request.&lt;/p&gt;

&lt;p&gt;Start with the official Anthropic pricing page for exact model prices, then build your own usage model around your real product traffic.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
