<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: 1bcMax</title>
    <description>The latest articles on DEV Community by 1bcMax (@1bcmax).</description>
    <link>https://dev.to/1bcmax</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3836331%2Fa7b9a36f-861e-4fd0-beaf-32a83d08ad22.jpeg</url>
      <title>DEV Community: 1bcMax</title>
      <link>https://dev.to/1bcmax</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/1bcmax"/>
    <language>en</language>
    <item>
      <title>We Read 100 OpenClaw Issues About OpenRouter. Here's What We Built Instead.</title>
      <dc:creator>1bcMax</dc:creator>
      <pubDate>Sat, 21 Mar 2026 23:29:27 +0000</pubDate>
      <link>https://dev.to/1bcmax/we-read-100-openclaw-issues-about-openrouter-heres-what-we-built-instead-3ohn</link>
      <guid>https://dev.to/1bcmax/we-read-100-openclaw-issues-about-openrouter-heres-what-we-built-instead-3ohn</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;OpenRouter is the most popular LLM aggregator. It's also the source of the most frustration in OpenClaw's issue tracker.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-100-openclaw-issues-intro.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-100-openclaw-issues-intro.png" alt="Reading 100 OpenClaw Issues Built a Better Router" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;The Data&lt;/h2&gt;

&lt;p&gt;We searched &lt;a href="https://github.com/openclaw/openclaw/issues" rel="noopener noreferrer"&gt;OpenClaw's GitHub issues&lt;/a&gt; for "openrouter" and read every result. &lt;strong&gt;100 issues.&lt;/strong&gt; Open and closed. Filed by users who ran into the same structural problems over and over.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Issues&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Broken fallback / failover&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~20&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/openclaw/openclaw/issues/22136" rel="noopener noreferrer"&gt;#22136&lt;/a&gt;, &lt;a href="https://github.com/openclaw/openclaw/issues/45663" rel="noopener noreferrer"&gt;#45663&lt;/a&gt;, &lt;a href="https://github.com/openclaw/openclaw/issues/50389" rel="noopener noreferrer"&gt;#50389&lt;/a&gt;, &lt;a href="https://github.com/openclaw/openclaw/issues/49079" rel="noopener noreferrer"&gt;#49079&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model ID mangling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~15&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/openclaw/openclaw/issues/49379" rel="noopener noreferrer"&gt;#49379&lt;/a&gt;, &lt;a href="https://github.com/openclaw/openclaw/issues/50711" rel="noopener noreferrer"&gt;#50711&lt;/a&gt;, &lt;a href="https://github.com/openclaw/openclaw/issues/25665" rel="noopener noreferrer"&gt;#25665&lt;/a&gt;, &lt;a href="https://github.com/openclaw/openclaw/issues/2373" rel="noopener noreferrer"&gt;#2373&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Authentication / 401 errors&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~8&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/openclaw/openclaw/issues/51056" rel="noopener noreferrer"&gt;#51056&lt;/a&gt;, &lt;a href="https://github.com/openclaw/openclaw/issues/34830" rel="noopener noreferrer"&gt;#34830&lt;/a&gt;, &lt;a href="https://github.com/openclaw/openclaw/issues/26960" rel="noopener noreferrer"&gt;#26960&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost / billing opacity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~6&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/openclaw/openclaw/issues/25371" rel="noopener noreferrer"&gt;#25371&lt;/a&gt;, &lt;a href="https://github.com/openclaw/openclaw/issues/50738" rel="noopener noreferrer"&gt;#50738&lt;/a&gt;, &lt;a href="https://github.com/openclaw/openclaw/issues/38248" rel="noopener noreferrer"&gt;#38248&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Routing opacity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~5&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/openclaw/openclaw/issues/7006" rel="noopener noreferrer"&gt;#7006&lt;/a&gt;, &lt;a href="https://github.com/openclaw/openclaw/issues/35842" rel="noopener noreferrer"&gt;#35842&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Missing feature parity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~10&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/openclaw/openclaw/issues/46255" rel="noopener noreferrer"&gt;#46255&lt;/a&gt;, &lt;a href="https://github.com/openclaw/openclaw/issues/50485" rel="noopener noreferrer"&gt;#50485&lt;/a&gt;, &lt;a href="https://github.com/openclaw/openclaw/issues/30850" rel="noopener noreferrer"&gt;#30850&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rate limit / key exhaustion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~4&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/openclaw/openclaw/issues/8615" rel="noopener noreferrer"&gt;#8615&lt;/a&gt;, &lt;a href="https://github.com/openclaw/openclaw/issues/48729" rel="noopener noreferrer"&gt;#48729&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model catalog staleness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~5&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://github.com/openclaw/openclaw/issues/10687" rel="noopener noreferrer"&gt;#10687&lt;/a&gt;, &lt;a href="https://github.com/openclaw/openclaw/issues/30152" rel="noopener noreferrer"&gt;#30152&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These aren't edge cases. They're structural consequences of how OpenRouter works: a middleman that adds latency, mangles model IDs, obscures routing decisions, and introduces its own failure modes on top of the providers it aggregates.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-anatomy-of-middleman-failure.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-anatomy-of-middleman-failure.png" alt="The Anatomy of Middleman Failure" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-architectural-shift-middleman-vs-local.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-architectural-shift-middleman-vs-local.png" alt="The Architectural Shift: Middleman vs. Local Router" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;1. Broken Fallback — The #1 Pain Point&lt;/h2&gt;

&lt;p&gt;From &lt;a href="https://github.com/openclaw/openclaw/issues/45663" rel="noopener noreferrer"&gt;#45663&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Provider returned error from OpenRouter does not trigger model failover."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;From &lt;a href="https://github.com/openclaw/openclaw/issues/50389" rel="noopener noreferrer"&gt;#50389&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Rate limit errors surfaced to user instead of auto-failover."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When OpenRouter returns a 429 or provider error, OpenClaw's failover logic often doesn't recognize it as retriable. The user sees a raw error. The agent stops. ~20 issues document variations of this: HTTP 529 (Anthropic overloaded) not triggering fallback (&lt;a href="https://github.com/openclaw/openclaw/issues/49079" rel="noopener noreferrer"&gt;#49079&lt;/a&gt;), invalid model IDs causing 400 instead of failover (&lt;a href="https://github.com/openclaw/openclaw/issues/50017" rel="noopener noreferrer"&gt;#50017&lt;/a&gt;), timeouts in cron sessions with no recovery (&lt;a href="https://github.com/openclaw/openclaw/issues/49597" rel="noopener noreferrer"&gt;#49597&lt;/a&gt;).&lt;/p&gt;
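&lt;p&gt;The core decision a failover layer has to get right is which errors are retriable at all. As a rough sketch (our illustration, not OpenClaw's or ClawRouter's actual code), a status-code classifier covering the cases above might look like this:&lt;/p&gt;

```typescript
// Illustrative sketch: should an upstream error trigger failover,
// or be surfaced to the user? 429 = rate limit (often recovers in ms),
// 529 = Anthropic "overloaded", other 5xx = transient server errors,
// 400 with a bad model ID means *this* model is unusable but others may work,
// 401/402/403 are auth/billing problems that retrying cannot fix.
type FailoverAction = "retry-same-model" | "next-model" | "surface";

function classifyStatus(status: number): FailoverAction {
  if (status === 429) return "retry-same-model";
  if (status === 529 || (status >= 500 && status <= 599)) return "next-model";
  if (status === 400) return "next-model";
  return "surface";
}
```

The issues above are all cases where the first two branches were treated like the third.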

&lt;h3&gt;How ClawRouter Solves This&lt;/h3&gt;

&lt;p&gt;ClawRouter maintains fallback chains &lt;strong&gt;up to 8 models deep&lt;/strong&gt; per routing tier. When a model fails:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;200ms retry&lt;/strong&gt; — short-burst rate limits often recover in milliseconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Next model&lt;/strong&gt; — if retry fails, move to the next model in the chain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-model isolation&lt;/strong&gt; — one provider's failure doesn't poison the others&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;All-failed summary&lt;/strong&gt; — if every model in the chain fails, you get a structured error listing every attempt and failure reason
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ClawRouter] Trying model 1/6: google/gemini-2.5-flash
[ClawRouter] Model google/gemini-2.5-flash returned 429, retrying in 200ms...
[ClawRouter] Retry failed, trying model 2/6: deepseek/deepseek-chat
[ClawRouter] Success with model: deepseek/deepseek-chat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No silent failures. No raw 429s surfaced to the agent.&lt;/p&gt;
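&lt;p&gt;The four steps above can be sketched as a single loop. This is a simplified illustration, not ClawRouter's source; the &lt;code&gt;callModel&lt;/code&gt; function and the error format are assumptions:&lt;/p&gt;

```typescript
// Sketch of a cascading fallback chain: one quick 200ms retry on a rate
// limit, then advance to the next model; every failure is recorded so the
// final error can list all attempts and reasons.
async function withFallback(
  chain: string[],
  callModel: (model: string) => Promise<string>,
) {
  const attempts: { model: string; error: string }[] = [];
  for (const model of chain) {
    for (let attempt = 0; attempt < 2; attempt++) { // first call + one retry
      try {
        return await callModel(model);
      } catch (err) {
        const msg = err instanceof Error ? err.message : String(err);
        attempts.push({ model, error: msg });
        if (!msg.includes("429")) break; // only rate limits earn the quick retry
        await new Promise((resolve) => setTimeout(resolve, 200));
      }
    }
  }
  // All-failed summary: structured, listing every attempt and failure reason
  const summary = attempts.map((a) => `${a.model}: ${a.error}`).join("; ");
  throw new Error(`All models in chain failed (${summary})`);
}
```

One provider's failure only consumes its own slot in the chain; nothing it does can poison the models behind it.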

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-cascading-fallback-chains-429.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-cascading-fallback-chains-429.png" alt="Surviving the 429: Cascading Fallback Chains" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;2. Model ID Mangling — Death by Prefix&lt;/h2&gt;

&lt;p&gt;From &lt;a href="https://github.com/openclaw/openclaw/issues/25665" rel="noopener noreferrer"&gt;#25665&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Model config defaults to &lt;code&gt;openrouter/openrouter/auto&lt;/code&gt; (double prefix)."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;From &lt;a href="https://github.com/openclaw/openclaw/issues/50711" rel="noopener noreferrer"&gt;#50711&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Control UI model picker strips &lt;code&gt;openrouter/&lt;/code&gt; prefix."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;OpenRouter uses nested model IDs: &lt;code&gt;openrouter/deepseek/deepseek-v3.2&lt;/code&gt;. OpenClaw's UI, Discord bot, and web gateway all handle these differently. Some add the prefix. Some strip it. Some double it. &lt;strong&gt;15 issues&lt;/strong&gt; trace back to model ID confusion.&lt;/p&gt;

&lt;h3&gt;How ClawRouter Solves This&lt;/h3&gt;

&lt;p&gt;Clean aliases. You say &lt;code&gt;sonnet&lt;/code&gt; and get &lt;code&gt;anthropic/claude-sonnet-4-6&lt;/code&gt;. You say &lt;code&gt;flash&lt;/code&gt; and get &lt;code&gt;google/gemini-2.5-flash&lt;/code&gt;. No nested prefixes. No double-prefix bugs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// resolveModelAlias() handles all normalization&lt;/span&gt;
&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sonnet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;     &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic/claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;opus&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;       &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic/claude-opus-4-6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;flash&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;      &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;google/gemini-2.5-flash&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;grok&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;       &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;xai/grok-4-0314&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deepseek&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;   &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deepseek/deepseek-chat&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One canonical format. No mangling. No UI inconsistency.&lt;/p&gt;
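&lt;p&gt;A minimal sketch of the normalization idea (the alias table mirrors the examples above; the exact rules inside ClawRouter's &lt;code&gt;resolveModelAlias()&lt;/code&gt; may differ):&lt;/p&gt;

```typescript
// Sketch: map short aliases to canonical provider/model IDs and strip any
// number of leading "openrouter/" prefixes, so "openrouter/openrouter/auto"
// style double-prefix bugs cannot occur.
const ALIASES: { [alias: string]: string } = {
  sonnet: "anthropic/claude-sonnet-4-6",
  flash: "google/gemini-2.5-flash",
  deepseek: "deepseek/deepseek-chat",
};

function resolveModelAlias(input: string): string {
  const id = input.replace(/^(openrouter\/)+/, ""); // kill single or double prefixes
  return ALIASES[id] ?? id; // unknown IDs pass through unchanged
}
```

Every UI surface calls the same resolver, so the picker, the bot, and the gateway can no longer disagree.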

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-eliminating-model-id-mangling.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-eliminating-model-id-mangling.png" alt="Eliminating Model ID Mangling" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;3. API Key Hell — 401s, Leakage, and Rotation&lt;/h2&gt;

&lt;p&gt;From &lt;a href="https://github.com/openclaw/openclaw/issues/51056" rel="noopener noreferrer"&gt;#51056&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"OpenRouter fails with '401 Missing Authentication header' despite valid key."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;From &lt;a href="https://github.com/openclaw/openclaw/issues/8615" rel="noopener noreferrer"&gt;#8615&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Feature request: native multi-API-key support with load balancing and fallback."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;API keys are the root cause of an entire category of failures. Keys expire. Keys leak into LLM context — every provider sees every other provider's keys in the serialized request. Keys hit rate limits that can't be load-balanced. &lt;strong&gt;8 issues&lt;/strong&gt; document auth failures alone.&lt;/p&gt;

&lt;h3&gt;How ClawRouter Solves This&lt;/h3&gt;

&lt;p&gt;ClawRouter has &lt;strong&gt;no API keys&lt;/strong&gt;. Zero.&lt;/p&gt;

&lt;p&gt;Payment happens via &lt;a href="https://x402.org/" rel="noopener noreferrer"&gt;x402&lt;/a&gt; — a cryptographic micropayment protocol. Your agent generates a wallet on first run (BIP-44 derivation, both EVM and Solana). Each request is signed with the wallet's private key. USDC moves per-request.&lt;/p&gt;

&lt;p&gt;No keys to leak. No keys to rotate. No keys to rate-limit. No keys to expire.&lt;/p&gt;

&lt;p&gt;The wallet is the identity. The signature is the authentication. Nothing to configure, nothing to paste into a config file, nothing for the LLM to accidentally serialize.&lt;/p&gt;
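&lt;p&gt;Conceptually, "the signature is the authentication" looks like this. The sketch below uses an Ed25519 keypair from Node's standard library as a stand-in for the wallet; the real x402 flow adds a payment payload and on-chain settlement that we do not model here:&lt;/p&gt;

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Sketch: per-request signing in place of API keys. The private key never
// appears in any request body or config file; the server only ever checks
// a signature against the public key (the wallet address plays this role
// in the real protocol).
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

function signRequest(body: string) {
  const signature = sign(null, Buffer.from(body), privateKey).toString("base64");
  return { body, signature };
}

function verifyRequest(req: { body: string; signature: string }): boolean {
  return verify(null, Buffer.from(req.body), publicKey, Buffer.from(req.signature, "base64"));
}
```

There is no secret string to paste, serialize, or leak: tampering with the body invalidates the signature.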

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-cryptographic-auth-x402-wallet.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-cryptographic-auth-x402-wallet.png" alt="Cryptographic Auth: The End of API Key Hell" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;4. Cost and Billing Opacity — Surprise Bills&lt;/h2&gt;

&lt;p&gt;From &lt;a href="https://github.com/openclaw/openclaw/issues/25371" rel="noopener noreferrer"&gt;#25371&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"OpenRouter 402 billing error misclassified as 'Context overflow', triggering auto-compaction that drains remaining credits faster."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;From &lt;a href="https://github.com/openclaw/openclaw/issues/7006" rel="noopener noreferrer"&gt;#7006&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"&lt;code&gt;openrouter/auto&lt;/code&gt; doesn't expose which model was actually used or its cost."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When OpenRouter runs out of credits, it returns a 402 that OpenClaw misreads as a context overflow. OpenClaw then auto-compacts the context and retries — on the same empty balance. Each retry charges the compaction cost. Credits drain faster. The agent burns money trying to fix a billing error it doesn't understand.&lt;/p&gt;

&lt;h3&gt;How ClawRouter Solves This&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Per-request cost visibility.&lt;/strong&gt; Every response includes cost headers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x-clawrouter-cost: 0.003400
x-clawrouter-savings: 82%
x-clawrouter-model: google/gemini-2.5-flash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Per-request USDC payments.&lt;/strong&gt; No prepaid balance to drain. Each request shows its price before you pay. When the wallet is empty, requests don't fail — they &lt;strong&gt;fall back to the free tier&lt;/strong&gt; (NVIDIA GPT-OSS-120B).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Budget guard.&lt;/strong&gt; &lt;code&gt;maxCostPerRun&lt;/code&gt; caps per-session spending. Two modes: &lt;code&gt;graceful&lt;/code&gt; (downgrade to cheaper models) or &lt;code&gt;strict&lt;/code&gt; (hard stop). Runaway-spend scenarios like a heartbeat quietly burning $248/day become structurally impossible.&lt;/p&gt;
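&lt;p&gt;A budget guard of this shape is only a few lines of logic. This sketch is ours, not ClawRouter's implementation; the tier names are illustrative:&lt;/p&gt;

```typescript
// Sketch: enforce maxCostPerRun per session. "graceful" downgrades to the
// cheapest tier once the cap is hit; "strict" stops the run outright.
type BudgetMode = "graceful" | "strict";

function applyBudget(
  spentUsd: number,
  maxCostPerRun: number,
  requestedTier: string,
  mode: BudgetMode,
): string {
  if (spentUsd < maxCostPerRun) return requestedTier;
  if (mode === "strict") throw new Error(`maxCostPerRun ($${maxCostPerRun}) reached`);
  return "CHEAP"; // graceful: keep going, but only on the cheapest tier
}
```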

&lt;p&gt;&lt;strong&gt;Usage logging.&lt;/strong&gt; Every request logs to &lt;code&gt;~/.openclaw/blockrun/logs/usage-YYYY-MM-DD.jsonl&lt;/code&gt; with model, tier, cost, baseline cost, savings, and latency. &lt;code&gt;/stats&lt;/code&gt; shows the breakdown.&lt;/p&gt;
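&lt;p&gt;Because the log is plain JSONL, a &lt;code&gt;/stats&lt;/code&gt;-style rollup is easy to reproduce yourself. In this sketch the field names (&lt;code&gt;cost&lt;/code&gt;, &lt;code&gt;baselineCost&lt;/code&gt;) are assumptions based on the fields listed above, not a documented schema:&lt;/p&gt;

```typescript
// Sketch: aggregate a usage JSONL log (one JSON object per line) into
// request count, total cost, and total savings versus the baseline model.
function summarizeUsage(jsonl: string) {
  const rows = jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line));
  return {
    requests: rows.length,
    costUsd: rows.reduce((sum, r) => sum + r.cost, 0),
    savingsUsd: rows.reduce((sum, r) => sum + (r.baselineCost - r.cost), 0),
  };
}
```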

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-cost-visibility-session-guardrails.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-cost-visibility-session-guardrails.png" alt="Absolute Cost Visibility and Session Guardrails" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;5. Routing Opacity — "Which Model Did I Just Pay For?"&lt;/h2&gt;

&lt;p&gt;From &lt;a href="https://github.com/openclaw/openclaw/issues/7006" rel="noopener noreferrer"&gt;#7006&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"No visibility into which model &lt;code&gt;openrouter/auto&lt;/code&gt; actually uses."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;From &lt;a href="https://github.com/openclaw/openclaw/issues/35842" rel="noopener noreferrer"&gt;#35842&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Need explicit Claude Sonnet default instead of auto-routing."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When you use &lt;code&gt;openrouter/auto&lt;/code&gt;, you don't know what model served your request. You can't debug quality regressions. You can't understand cost spikes. You're paying for a black box.&lt;/p&gt;

&lt;h3&gt;How ClawRouter Solves This&lt;/h3&gt;

&lt;p&gt;ClawRouter's routing is &lt;strong&gt;100% local&lt;/strong&gt;, open-source, and transparent.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;14-dimension weighted classifier&lt;/strong&gt; runs locally in &amp;lt;1ms. It scores every request across: token count, code presence, reasoning markers, technical terms, multi-step patterns, question complexity, tool signals, and more.&lt;/p&gt;
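&lt;p&gt;To make "weighted classifier" concrete, here is a toy version scoring just four of the dimensions named above. The weights, patterns, and thresholds are invented for illustration and are not ClawRouter's actual values:&lt;/p&gt;

```typescript
// Toy sketch: each dimension is a predicate with a weight; the summed
// score maps to a routing tier. The real classifier uses 14 dimensions.
const DIMENSIONS = [
  { name: "long-input",   weight: 2, test: (p: string) => p.length > 2000 },
  { name: "code-present", weight: 3, test: (p: string) => /function |class |import /.test(p) },
  { name: "reasoning",    weight: 2, test: (p: string) => /step by step|prove|why/i.test(p) },
  { name: "multi-step",   weight: 2, test: (p: string) => /then|after that|finally/i.test(p) },
];

function classify(prompt: string): "LOW" | "MEDIUM" | "HIGH" {
  const score = DIMENSIONS.reduce((s, d) => s + (d.test(prompt) ? d.weight : 0), 0);
  return score >= 5 ? "HIGH" : score >= 2 ? "MEDIUM" : "LOW";
}
```

Nothing here needs a network call, which is why the decision fits in under a millisecond and can be logged verbatim.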

&lt;p&gt;&lt;strong&gt;Debug headers on every response:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;x-clawrouter-profile: auto
x-clawrouter-tier: MEDIUM
x-clawrouter-model: moonshot/kimi-k2.5
x-clawrouter-confidence: 0.87
x-clawrouter-reasoning: "Code task with moderate complexity"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;SSE debug comments&lt;/strong&gt; in streaming responses show the routing decision inline. You always know which model, why it was selected, and how confident the classifier was.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Four routing profiles&lt;/strong&gt; give you explicit control:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Profile&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;th&gt;Savings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;auto&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Balanced quality + cost&lt;/td&gt;
&lt;td&gt;74–100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;eco&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cheapest possible&lt;/td&gt;
&lt;td&gt;95–100%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;premium&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Best quality always&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;free&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;NVIDIA GPT-OSS only&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No black box. No mystery routing. Full visibility, full control.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-14-dimension-routing-classification.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-14-dimension-routing-classification.png" alt="Transparent Routing via 14-Dimension Classification" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;6. Missing Feature Parity — Images, Tools, Caching&lt;/h2&gt;

&lt;p&gt;From &lt;a href="https://github.com/openclaw/openclaw/issues/46255" rel="noopener noreferrer"&gt;#46255&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Images not passed to OpenRouter models."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;From &lt;a href="https://github.com/openclaw/openclaw/issues/47707" rel="noopener noreferrer"&gt;#47707&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Mistral models fail with strict tool call ID requirements."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;OpenRouter doesn't always pass through provider-specific features correctly. Image payloads get dropped. Cache retention headers get ignored. Tool call ID formats cause silent failures with strict providers.&lt;/p&gt;

&lt;h3&gt;How ClawRouter Solves This&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Vision auto-detection.&lt;/strong&gt; When &lt;code&gt;image_url&lt;/code&gt; content parts are detected, ClawRouter automatically filters the fallback chain to vision-capable models only. No images dropped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool calling validation.&lt;/strong&gt; Every model has a &lt;code&gt;toolCalling&lt;/code&gt; flag. When tools are present in the request, ClawRouter forces agentic routing tiers and excludes models without tool support. No silent tool call failures.&lt;/p&gt;
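&lt;p&gt;Both checks reduce to filtering the fallback chain by capability flags. A hedged sketch (the catalog entries and field names are illustrative, not ClawRouter's schema):&lt;/p&gt;

```typescript
// Sketch: drop models from the fallback chain that lack a required
// capability, so vision requests never reach text-only models and tool
// calls never reach models without tool support.
type ModelInfo = { id: string; vision: boolean; toolCalling: boolean };

function filterChain(
  chain: ModelInfo[],
  needs: { vision?: boolean; tools?: boolean },
): string[] {
  return chain
    .filter((m) => (!needs.vision || m.vision) && (!needs.tools || m.toolCalling))
    .map((m) => m.id);
}
```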

&lt;p&gt;&lt;strong&gt;Direct provider routing.&lt;/strong&gt; ClawRouter routes through BlockRun's API directly to providers — not through a second aggregator. One hop, not two. Provider-specific features work because there's no middleman translating them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-feature-parity-direct-connectivity.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-feature-parity-direct-connectivity.png" alt="Guaranteed Feature Parity and Direct Connectivity" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;7. Model Catalog Staleness — "Where's the New Model?"&lt;/h2&gt;

&lt;p&gt;From &lt;a href="https://github.com/openclaw/openclaw/issues/10687" rel="noopener noreferrer"&gt;#10687&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Need fully dynamic model discovery."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;From &lt;a href="https://github.com/openclaw/openclaw/issues/30152" rel="noopener noreferrer"&gt;#30152&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Allowlist silently drops models not in catalog."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When new models launch, OpenRouter's catalog lags. Users configure a model that exists at the provider but isn't in the catalog. The request fails silently or gets rerouted.&lt;/p&gt;

&lt;h3&gt;How ClawRouter Solves This&lt;/h3&gt;

&lt;p&gt;ClawRouter maintains a curated catalog of &lt;strong&gt;46+ models across 8 providers&lt;/strong&gt;, updated with each release. Delisted models have automatic redirect aliases:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Delisted models redirect automatically&lt;/span&gt;
&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;xai/grok-code-fast-1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;  &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deepseek/deepseek-chat&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;google/gemini-2.0-pro&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;  &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;google/gemini-3.1-pro&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No silent drops. No stale catalog. Models are benchmarked for speed, quality, and tool support before inclusion.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-cost-transparency-nexus-92-savings.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-cost-transparency-nexus-92-savings.png" alt="The Cost/Transparency Nexus" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;The Full Comparison&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;OpenRouter&lt;/th&gt;
&lt;th&gt;ClawRouter&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Authentication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;API key (leak risk)&lt;/td&gt;
&lt;td&gt;Wallet signature (no keys)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Payment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Prepaid balance (custodial)&lt;/td&gt;
&lt;td&gt;Per-request USDC (non-custodial)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Routing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Server-side black box&lt;/td&gt;
&lt;td&gt;Local 14-dim classifier, &amp;lt;1ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fallback&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Often broken (20+ issues)&lt;/td&gt;
&lt;td&gt;8-deep chains, per-model isolation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model IDs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Nested prefixes, mangling bugs&lt;/td&gt;
&lt;td&gt;Clean aliases, single format&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost visibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None per-request&lt;/td&gt;
&lt;td&gt;Headers + JSONL logs + &lt;code&gt;/stats&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Empty wallet&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Request fails&lt;/td&gt;
&lt;td&gt;Auto-fallback to free tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rate limits&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-key, shared&lt;/td&gt;
&lt;td&gt;Per-wallet, independent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vision support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Images sometimes dropped&lt;/td&gt;
&lt;td&gt;Auto-detected, vision-only fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool calling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Silent failures with some models&lt;/td&gt;
&lt;td&gt;Flag-based filtering, guaranteed support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model catalog&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Laggy, silent drops&lt;/td&gt;
&lt;td&gt;Curated 46+ models, redirect aliases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Budget control&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Monthly invoice&lt;/td&gt;
&lt;td&gt;Per-session cap (&lt;code&gt;maxCostPerRun&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Create account, paste key&lt;/td&gt;
&lt;td&gt;Agent generates wallet, auto-configured&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Average cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$25/M tokens (Opus direct)&lt;/td&gt;
&lt;td&gt;$2.05/M tokens (auto-routed) = &lt;strong&gt;92% savings&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-engineering-matrix-comparison.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-engineering-matrix-comparison.png" alt="The Engineering Matrix" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @blockrun/clawrouter

&lt;span class="c"&gt;# Start (auto-configures OpenClaw)&lt;/span&gt;
clawrouter

&lt;span class="c"&gt;# Check your wallet&lt;/span&gt;
&lt;span class="c"&gt;# /wallet&lt;/span&gt;

&lt;span class="c"&gt;# View routing stats&lt;/span&gt;
&lt;span class="c"&gt;# /stats&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ClawRouter auto-injects itself into &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt; as a provider on startup. Your existing tools, sessions, and extensions are unchanged.&lt;/p&gt;

&lt;p&gt;Load a wallet with USDC on Base or Solana, pick a routing profile, and run.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-frictionless-integration.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstorage.googleapis.com%2Fblockrun-ai-static%2Fblog%2Fclawrouter-frictionless-integration.png" alt="Frictionless Integration" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/BlockRunAI/ClawRouter" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://blockrun.ai" rel="noopener noreferrer"&gt;blockrun.ai&lt;/a&gt; · &lt;code&gt;npm install -g @blockrun/clawrouter&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>openrouter</category>
      <category>webdev</category>
    </item>
    <item>
      <title>AI Agent API Costs: How ClawRouter Cuts LLM Spending by 500x</title>
      <dc:creator>1bcMax</dc:creator>
      <pubDate>Sat, 21 Mar 2026 03:05:16 +0000</pubDate>
      <link>https://dev.to/1bcmax/ai-agent-api-costs-how-clawrouter-cuts-llm-spending-by-500x-186k</link>
      <guid>https://dev.to/1bcmax/ai-agent-api-costs-how-clawrouter-cuts-llm-spending-by-500x-186k</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;OpenClaw is one of the best AI agent frameworks available. Its LLM abstraction layer is not.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The $248/Day Problem
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhkliggdvlf98iojaqeyx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhkliggdvlf98iojaqeyx.png" alt="The Autopsy of an Overrun — token volume compounds exponentially in agentic workloads, reaching 11.3M input tokens in a single hour" width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From &lt;a href="https://github.com/openclaw/openclaw/issues/3181" rel="noopener noreferrer"&gt;openclaw/openclaw#3181&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"We ended up at $248/day before we caught it. Heartbeat on Opus 4.6 with a large context. The dedup fix reduced trigger rate, but there's nothing bounding the run itself."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"11.3M input tokens in 1 hour on claude-opus-4-6 (128K context), ~$20/hour."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Both users ended up disabling heartbeat entirely. The workaround: &lt;code&gt;heartbeat.every: "0"&lt;/code&gt; — turning off the feature to avoid burning money.&lt;/p&gt;

&lt;p&gt;The root cause isn't configuration error. It's that OpenClaw's LLM layer has no concept of what things cost, and no way to stop a run that's spending too much.&lt;/p&gt;




&lt;h2&gt;
  
  
  What OpenClaw Gets Wrong at the Inference Layer
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmmrmwmqm6xq4lu882a4m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmmrmwmqm6xq4lu882a4m.png" alt="Orchestration frameworks are blind to inference realities — cost tier, error semantics, and context size go unscreened" width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OpenClaw is an excellent orchestration framework — session management, tool dispatch, agent routing, memory. But every request it makes hits a single configured model with no awareness of:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost tier&lt;/strong&gt; — A heartbeat status check doesn't need Opus. A file read result doesn't need 128K context. OpenClaw sends both to the same model at the same price.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rate limit isolation&lt;/strong&gt; — When one provider hits a 429, OpenClaw's failover logic applies that cooldown to the entire profile, not just the offending model. Every model in the same group is penalized (&lt;a href="https://github.com/openclaw/openclaw/issues/49834" rel="noopener noreferrer"&gt;#49834&lt;/a&gt;). If you configured 5 models for fallback, one slow provider can block all of them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Empty/degraded responses&lt;/strong&gt; — Some providers return HTTP 200 with empty content, repeated tokens, or a single newline. OpenClaw passes this through to the agent. The agent either errors out, loops, or silently gets a blank response (&lt;a href="https://github.com/openclaw/openclaw/issues/49902" rel="noopener noreferrer"&gt;#49902&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Error semantics&lt;/strong&gt; — OpenClaw's failover logic has known gaps. We found and fixed two while building ClawRouter:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MiniMax HTTP 520&lt;/strong&gt; (&lt;a href="https://github.com/openclaw/openclaw/pull/49550" rel="noopener noreferrer"&gt;PR #49550&lt;/a&gt;) — MiniMax returns &lt;code&gt;{"type":"api_error","message":"unknown error, 520 (1000)"}&lt;/code&gt; for transient server errors. OpenClaw's classifier required both &lt;code&gt;"type":"api_error"&lt;/code&gt; AND the string &lt;code&gt;"internal server error"&lt;/code&gt;. MiniMax fails the second check. Result: no failover, silent failure, retry storm.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Z.ai codes 1311 and 1113&lt;/strong&gt; (&lt;a href="https://github.com/openclaw/openclaw/pull/49552" rel="noopener noreferrer"&gt;PR #49552&lt;/a&gt;) — Z.ai error 1311 means "model not on your plan" (billing — stop retrying). Error 1113 means "wrong endpoint" (auth — rotate key). Both fell through to &lt;code&gt;null&lt;/code&gt;, got treated as &lt;code&gt;rate_limit&lt;/code&gt;, triggered exponential backoff, and charged for every retry.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Context size&lt;/strong&gt; — Agents accumulate context. A 10-message conversation with tool results can easily hit 40K+ tokens. OpenClaw sends the full context every request, on every retry.&lt;/p&gt;




&lt;h2&gt;
  
  
  ClawRouter: Built for Agentic Workloads
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyptao5w6djyw82uxorcw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyptao5w6djyw82uxorcw.png" alt="ClawRouter proxy manifold sits between OpenClaw and upstream APIs like GPT-4o, Claude Opus, and Gemini — cost control is a gateway concern" width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/BlockRunAI/ClawRouter" rel="noopener noreferrer"&gt;ClawRouter&lt;/a&gt; is a local OpenAI-compatible proxy, purpose-built for how AI agents actually behave — not how simple chat clients do. It sits between OpenClaw and the upstream model APIs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OpenClaw → ClawRouter → blockrun.ai → GPT-4o / Opus / Gemini / ...
                ↑
         All the smart stuff happens here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1. Token Compression — 7 Layers, Agent-Aware
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9p6eqfdmj0ot4edy7rn2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9p6eqfdmj0ot4edy7rn2.png" alt="Seven-layer agent-aware token compression — ClawRouter intercepts and compresses requests through 7 filters for 15-40% overall token reduction" width="800" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Agents are the worst offenders for context bloat. Tool call results are verbose. File reads return thousands of lines. Conversation history compounds with every turn.&lt;/p&gt;

&lt;p&gt;ClawRouter compresses every request through 7 layers before it hits the wire:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Saves&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Deduplication&lt;/td&gt;
&lt;td&gt;Removes repeated messages (retries, echoes)&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Whitespace&lt;/td&gt;
&lt;td&gt;Strips excessive whitespace from all content&lt;/td&gt;
&lt;td&gt;2-8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dictionary&lt;/td&gt;
&lt;td&gt;Replaces common phrases with short codes&lt;/td&gt;
&lt;td&gt;5-15%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Path shortening&lt;/td&gt;
&lt;td&gt;Codebook for repeated file paths in tool results&lt;/td&gt;
&lt;td&gt;3-10%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON compaction&lt;/td&gt;
&lt;td&gt;Removes whitespace from embedded JSON&lt;/td&gt;
&lt;td&gt;5-12%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observation compression&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Summarizes tool results to key information&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Up to 97%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dynamic codebook&lt;/td&gt;
&lt;td&gt;Learns repetitions in the actual conversation&lt;/td&gt;
&lt;td&gt;3-15%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Layer 6 is the big one. Tool results — file reads, API responses, shell output — can be 10KB+ each. The actual useful signal is often 200-300 chars. ClawRouter extracts errors, status lines, key JSON fields, and compresses the rest. Same model intelligence, 97% fewer tokens on the bulk.&lt;/p&gt;
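
&lt;p&gt;As a rough sketch of what an observation-compression pass can look like — the signal keywords and line caps below are illustrative assumptions, not ClawRouter's actual filters:&lt;/p&gt;

```typescript
// Illustrative observation-compression pass: keep lines that look like
// signal (errors, statuses) and fall back to the head of the output.
// The keyword list and caps are assumptions, not ClawRouter's filters.
const SIGNAL_HINTS = ["error", "fail", "warning", "status", "exit code"];

function compressObservation(toolOutput: string, maxLines: number = 20): string {
  const lines = toolOutput.split("\n");
  const kept = lines.filter(line => {
    const lower = line.toLowerCase();
    return SIGNAL_HINTS.some(hint => lower.includes(hint));
  });
  if (kept.length === 0) {
    // Nothing signal-like: keep only the head as a cheap summary.
    return lines.slice(0, maxLines).join("\n");
  }
  return kept.slice(0, maxLines).join("\n");
}
```

&lt;p&gt;A 10KB shell transcript with one failing line collapses to that one line; the model still sees the part it needs to act on.&lt;/p&gt;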

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppav05je72ql3x0yf9w8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fppav05je72ql3x0yf9w8.png" alt="Extracting intelligence from tool bloat — raw tool output is 97% noise, ClawRouter filters to 3% signal with errors, status lines, and key values" width="800" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Overall reduction: 15-40% on typical agentic workloads.&lt;/strong&gt; On the $248/day scenario, that's roughly $37-$99/day in savings from compression alone, before any routing changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Automatic Tier Routing — Right Model for Each Request
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fabqdm2cfc3fzh6nh6jdl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fabqdm2cfc3fzh6nh6jdl.png" alt="Right-sizing models for specific agent tasks — ClawRouter's task-to-tier routing engine with session pinning routes heartbeats to Flash and reasoning to Opus" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ClawRouter classifies every request before forwarding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;heartbeat status check     →  SIMPLE   →  gemini-2.5-flash      (~0.04¢ / request)
code review, refactor      →  COMPLEX  →  claude-sonnet-4-6      (~5¢ / request)
formal proof, reasoning    →  REASONING →  o3 / claude-opus      (~30¢ / request)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tool detection is automatic.&lt;/strong&gt; When OpenClaw sends a request with tools attached, ClawRouter forces agentic routing tiers — guaranteeing tool-capable models and preventing the silent fallback to models that refuse tool calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session pinning.&lt;/strong&gt; Once a session selects a model for a task, ClawRouter pins that model for the session lifetime. No mid-task model switching, no consistency issues across a long agent run.&lt;/p&gt;

&lt;p&gt;The heartbeat that was burning $248/day on Opus routes to Flash at ~1/500th the cost. Same heartbeat feature, working as designed.&lt;/p&gt;
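
&lt;p&gt;A minimal sketch of this kind of tier classification — the keyword lists and function names here are hypothetical, not ClawRouter's real heuristics:&lt;/p&gt;

```typescript
// Hypothetical tier classifier; hint lists are illustrative only.
type Tier = "SIMPLE" | "COMPLEX" | "REASONING";

interface ChatRequest {
  messages: { role: string; content: string }[];
  tools?: object[];
}

const REASONING_HINTS = ["prove", "theorem", "step by step", "formal"];
const COMPLEX_HINTS = ["refactor", "review", "implement", "debug"];

function classifyTier(req: ChatRequest): Tier {
  // Requests carrying tools are forced onto tool-capable tiers.
  if (req.tools) {
    if (req.tools.length > 0) return "COMPLEX";
  }
  const text = req.messages.map(m => m.content).join(" ").toLowerCase();
  for (const hint of REASONING_HINTS) {
    if (text.includes(hint)) return "REASONING";
  }
  for (const hint of COMPLEX_HINTS) {
    if (text.includes(hint)) return "COMPLEX";
  }
  // Short status-style messages (heartbeats) default to the cheap tier.
  return "SIMPLE";
}
```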

&lt;h3&gt;
  
  
  3. Per-Model Rate Limit Isolation — No Cross-Contamination
&lt;/h3&gt;

&lt;p&gt;When a provider returns 429, ClawRouter marks that specific model as rate-limited for 60 seconds (&lt;a href="https://github.com/openclaw/openclaw/issues/49834" rel="noopener noreferrer"&gt;#49834&lt;/a&gt;). Other models in the fallback chain are unaffected. If Claude Sonnet gets rate-limited, Gemini Flash and GPT-4o continue working. No cascade.&lt;/p&gt;

&lt;p&gt;Before failing over, ClawRouter also retries the rate-limited model once after 200ms. Token-bucket limits often recover within milliseconds — most short-burst 429s resolve on the first retry without ever touching a fallback model.&lt;/p&gt;
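
&lt;p&gt;The cooldown bookkeeping can be sketched like this (class and method names are hypothetical; the 200ms single-retry step is omitted, and time is passed in explicitly to keep the sketch testable):&lt;/p&gt;

```typescript
// Illustrative per-model 429 isolation; not ClawRouter's actual code.
class RateLimitTracker {
  private cooldownUntil: { [model: string]: number } = {};

  constructor(private cooldownMs: number = 60_000) {}

  // A 429 penalizes only the offending model; fallback siblings stay live.
  markRateLimited(model: string, nowMs: number): void {
    this.cooldownUntil[model] = nowMs + this.cooldownMs;
  }

  isAvailable(model: string, nowMs: number): boolean {
    const until = this.cooldownUntil[model];
    if (until === undefined) return true;
    return nowMs >= until;
  }
}
```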

&lt;h3&gt;
  
  
  4. Empty Response Detection — No Silent Failures
&lt;/h3&gt;

&lt;p&gt;ClawRouter inspects every HTTP 200 response body before forwarding it (&lt;a href="https://github.com/openclaw/openclaw/issues/49902" rel="noopener noreferrer"&gt;#49902&lt;/a&gt;). Blank responses, repeated-token loops, and single-character outputs trigger model fallback — the same as a 5xx. The agent never sees a degraded response that would cause it to loop or silently fail.&lt;/p&gt;
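
&lt;p&gt;A heuristic version of that check might look like the following — the repetition threshold is an assumption for illustration, not ClawRouter's actual rule:&lt;/p&gt;

```typescript
// Heuristic degraded-200 detector; thresholds are assumptions.
function isDegradedResponse(content: string): boolean {
  const trimmed = content.trim();
  if (trimmed.length === 0) return true;  // blank body
  if (trimmed.length === 1) return true;  // single-character output
  const tokens = trimmed.split(/\s+/);
  if (tokens.length >= 8) {
    const unique = new Set(tokens);
    // Repeated-token loop: very few distinct tokens across many tokens.
    if (tokens.length >= unique.size * 4) return true;
  }
  return false;
}
```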

&lt;h3&gt;
  
  
  5. Correct Error Classification — No Retry Storms
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ll0znk1613akd9zjekl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ll0znk1613akd9zjekl.png" alt="Stopping retry storms at the HTTP layer — ClawRouter classifies errors per provider with logic gate classifier and automated mechanical actions" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ClawRouter classifies errors at the HTTP/body layer before OpenClaw sees them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;401 / 403              → auth_failure    → stop retrying, rotate key
402 / billing body     → quota_exceeded  → stop retrying, surface alert
429                    → rate_limited    → backoff, try next model
529 / overloaded body  → overloaded      → short cooldown, fallback model
5xx / 520              → server_error    → retry with different model
Z.ai 1311              → billing         → stop retrying
Z.ai 1113              → auth            → rotate key
MiniMax 520 (api_error)→ server_error    → retry with fallback
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Per-provider error state is tracked independently. If MiniMax is having a bad hour, Anthropic and OpenAI routes continue working. No cross-contamination, no single provider poisoning the session.&lt;/p&gt;
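
&lt;p&gt;The mapping above, written out as a sketch — the function shape is illustrative, not ClawRouter's real classifier:&lt;/p&gt;

```typescript
// Sketch of HTTP/body error classification following the table above.
type ErrorClass =
  "auth_failure" | "quota_exceeded" | "rate_limited" |
  "overloaded" | "server_error" | "unknown";

function classifyError(status: number, provider: string, body: string): ErrorClass {
  if (status === 401 || status === 403) return "auth_failure";
  if (status === 402) return "quota_exceeded";
  if (status === 429) return "rate_limited";
  if (status === 529) return "overloaded";
  // Provider-specific body codes that generic status checks miss:
  if (provider === "zai") {
    if (body.includes("1311")) return "quota_exceeded"; // billing: stop retrying
    if (body.includes("1113")) return "auth_failure";   // wrong endpoint: rotate key
  }
  if (provider === "minimax") {
    if (body.includes("api_error")) return "server_error"; // transient, e.g. 520
  }
  if (status >= 500) return "server_error";
  return "unknown";
}
```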

&lt;h3&gt;
  
  
  6. Session Memory — Agents That Remember
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foij4tcqhriirf5mb7qod.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foij4tcqhriirf5mb7qod.png" alt="Agents that remember without compounding cost — ClawRouter session journaling vs standard OpenClaw context compounding across turns" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;OpenClaw sessions can be long-lived. ClawRouter maintains a session journal — extracting decisions, results, and context from each turn — and injects relevant history when the agent asks questions that reference earlier work.&lt;/p&gt;

&lt;p&gt;Less context repeated = fewer tokens = lower cost. Agents that need to recall earlier decisions don't need to carry the entire history in every prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. x402 Micropayments — Wallet-Based Budget Control
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foppav4kjrk1e3e031kga.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foppav4kjrk1e3e031kga.png" alt="Budget limits enforced by physical construction — wallet loaded via Base/Solana, pay per call across 41+ models, balance hits zero and the valve shuts cleanly" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;ClawRouter pays for inference via &lt;a href="https://x402.org/" rel="noopener noreferrer"&gt;x402&lt;/a&gt; USDC micropayments (Base or Solana). You load a wallet. Each inference call costs exactly what it costs. When the wallet runs low, requests stop cleanly.&lt;/p&gt;

&lt;p&gt;There is no monthly invoice and no 3am billing email. There is a wallet balance, and it either has funds or it doesn't. Your budget stops the burn before the damage is done, not an invoice that arrives after it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;maxCostPerRun&lt;/code&gt;&lt;/strong&gt; — a per-session cost ceiling that stops or downgrades requests once a session exceeds a configured threshold (e.g., &lt;code&gt;$0.50&lt;/code&gt;). This closes the remaining gap (&lt;a href="https://github.com/openclaw/openclaw/issues/3181" rel="noopener noreferrer"&gt;#3181&lt;/a&gt;) where a well-funded wallet can still rack up unbounded spend within a single run. Two modes: &lt;code&gt;graceful&lt;/code&gt; (downgrade to cheaper models) and &lt;code&gt;strict&lt;/code&gt; (hard 429 once the cap is hit).&lt;br&gt;
&lt;/p&gt;
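
&lt;p&gt;The cap decision reduces to a few lines — the names below mirror the prose, not a confirmed configuration schema:&lt;/p&gt;

```typescript
// Hypothetical per-session cost-cap decision with two modes.
type CapMode = "graceful" | "strict";
type CapDecision = "allow" | "downgrade" | "reject";

function checkBudget(spentUsd: number, capUsd: number, mode: CapMode): CapDecision {
  if (spentUsd >= capUsd) {
    // strict: hard 429 upstream; graceful: route to cheaper tiers instead.
    return mode === "strict" ? "reject" : "downgrade";
  }
  return "allow";
}
```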

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;41+ models. One wallet. Pay per call.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  OpenClaw + ClawRouter: The Full Picture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3nzslaejj869qfkzt5g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3nzslaejj869qfkzt5g.png" alt="Architecting for production safety — OpenClaw standalone vs OpenClaw + ClawRouter comparison across cost, context, error handling, and budgeting" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;OpenClaw alone&lt;/th&gt;
&lt;th&gt;OpenClaw + ClawRouter&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Heartbeat cost overrun&lt;/td&gt;
&lt;td&gt;No per-run cap&lt;/td&gt;
&lt;td&gt;Tier routing → 50-500x cheaper model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large context&lt;/td&gt;
&lt;td&gt;Full context every call&lt;/td&gt;
&lt;td&gt;7-layer compression, 15-40% reduction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool result bloat&lt;/td&gt;
&lt;td&gt;Raw output forwarded&lt;/td&gt;
&lt;td&gt;Observation compression, up to 97%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rate limit contaminates profile&lt;/td&gt;
&lt;td&gt;All models penalized (#49834)&lt;/td&gt;
&lt;td&gt;Per-model 60s cooldown, others unaffected&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Empty / degraded 200 response&lt;/td&gt;
&lt;td&gt;Passed through to agent (#49902)&lt;/td&gt;
&lt;td&gt;Detected, triggers model fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Short-burst 429 failover&lt;/td&gt;
&lt;td&gt;Immediate failover to next model&lt;/td&gt;
&lt;td&gt;200ms retry first, failover only if needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MiniMax 520 failure&lt;/td&gt;
&lt;td&gt;Silent drop / retry storm&lt;/td&gt;
&lt;td&gt;Classified as server_error, retried correctly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Z.ai 1311 (billing)&lt;/td&gt;
&lt;td&gt;Treated as rate_limit, retried&lt;/td&gt;
&lt;td&gt;Classified as billing, stopped immediately&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mid-task model switch&lt;/td&gt;
&lt;td&gt;Model can change mid-session&lt;/td&gt;
&lt;td&gt;Session pinning, consistent model per task&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monthly billing surprise&lt;/td&gt;
&lt;td&gt;Possible&lt;/td&gt;
&lt;td&gt;Wallet-based, stops when empty&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-session cost ceiling&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;maxCostPerRun&lt;/code&gt; — graceful or strict cap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost visibility&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;/stats&lt;/code&gt; with per-provider error counts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Install with smart routing enabled&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://blockrun.ai/ClawRouter-update | bash
openclaw gateway restart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/BlockRunAI/ClawRouter" rel="noopener noreferrer"&gt;ClawRouter&lt;/a&gt; auto-injects itself into &lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt; as a provider on startup. No manual config needed — your existing tools, sessions, and extensions are unchanged.&lt;/p&gt;

&lt;p&gt;Load a wallet, choose a model profile (&lt;code&gt;eco&lt;/code&gt; / &lt;code&gt;auto&lt;/code&gt; / &lt;code&gt;premium&lt;/code&gt; / &lt;code&gt;agentic&lt;/code&gt;), and run.&lt;/p&gt;




&lt;h2&gt;
  
  
  On Our OpenClaw Contributions
&lt;/h2&gt;

&lt;p&gt;We contribute upstream when we find bugs. The two PRs linked above fix real error classification gaps. Everyone using OpenClaw directly benefits.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/BlockRunAI/ClawRouter" rel="noopener noreferrer"&gt;ClawRouter&lt;/a&gt; exists because proxy-layer cost control, context compression, and agent-aware routing are fundamentally gateway concerns — not framework concerns. OpenClaw can't know that your heartbeat doesn't need Opus. It can't compress tool results it hasn't seen. It can't enforce a wallet ceiling.&lt;/p&gt;

&lt;p&gt;That's what &lt;a href="https://github.com/BlockRunAI/ClawRouter" rel="noopener noreferrer"&gt;ClawRouter&lt;/a&gt; is for.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/BlockRunAI/ClawRouter" rel="noopener noreferrer"&gt;github.com/BlockRunAI/ClawRouter&lt;/a&gt; · &lt;a href="https://blockrun.ai" rel="noopener noreferrer"&gt;blockrun.ai&lt;/a&gt; · &lt;code&gt;npm install -g @blockrun/clawrouter&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>devops</category>
      <category>costoptimization</category>
    </item>
    <item>
      <title>LLM Router Benchmark: 46 Models, 8 Providers, Sub-1ms Routing</title>
      <dc:creator>1bcMax</dc:creator>
      <pubDate>Sat, 21 Mar 2026 03:04:21 +0000</pubDate>
      <link>https://dev.to/1bcmax/llm-router-benchmark-46-models-8-providers-sub-1ms-routing-15ej</link>
      <guid>https://dev.to/1bcmax/llm-router-benchmark-46-models-8-providers-sub-1ms-routing-15ej</guid>
      <description>&lt;p&gt;When you route AI requests across 46 models from 8 providers, you can't just pick the cheapest one. You can't just pick the fastest one either. We learned this the hard way.&lt;/p&gt;

&lt;p&gt;This is the technical story of how we benchmarked every model on our platform, discovered that speed and intelligence are poorly correlated, and built a production routing system that classifies requests in under 1ms using 14 weighted dimensions with sigmoid confidence calibration.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: One Gateway, 46 Models, Infinite Wrong Choices
&lt;/h2&gt;

&lt;p&gt;BlockRun is an x402 micropayment gateway. Every LLM request flows through our proxy, gets authenticated via on-chain USDC payment, and is forwarded to the appropriate provider. The payment overhead adds 50-100ms to every request.&lt;/p&gt;

&lt;p&gt;Our users set &lt;code&gt;model: "auto"&lt;/code&gt; and expect us to pick the right model. But "right" means different things for different requests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A "what is Python?" query should route to the cheapest, fastest model&lt;/li&gt;
&lt;li&gt;A "implement a B-tree with concurrent insertions" query needs a capable model&lt;/li&gt;
&lt;li&gt;A "prove this theorem step by step" query needs reasoning capabilities&lt;/li&gt;
&lt;li&gt;An agentic workflow with tool calls needs models that follow instructions precisely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We needed a system that could classify any request and route it to the optimal model in real-time.&lt;/p&gt;
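
&lt;p&gt;The core of such a classifier is small. A toy version of a weighted-dimension score with sigmoid confidence calibration — the dimensions, weights, and bias here are placeholders, not our production values:&lt;/p&gt;

```typescript
// Toy weighted-dimension classifier with sigmoid confidence calibration.
function sigmoid(x: number): number {
  return 1 / (1 + Math.exp(-x));
}

function routeConfidence(features: number[], weights: number[], bias: number): number {
  let score = bias;
  for (let i = 0; i !== features.length; i++) {
    score += features[i] * weights[i];
  }
  return sigmoid(score); // calibrated 0..1 confidence for the tier decision
}
```

&lt;p&gt;The sigmoid squashes an unbounded weighted sum into a probability-like score, which is what makes confidence thresholds comparable across requests.&lt;/p&gt;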

&lt;h2&gt;
  
  
  Step 1: Benchmarking the Fleet
&lt;/h2&gt;

&lt;p&gt;Before building the router, we needed ground truth. We benchmarked all 46 models through our production payment pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  Methodology
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Setup:     ClawRouter v0.12.47 proxy on localhost
           → BlockRun x402 gateway (Base EVM chain)
           → Provider APIs (OpenAI, Anthropic, Google, xAI, DeepSeek, Moonshot, MiniMax, NVIDIA, Z.AI)

Prompts:   3 Python coding tasks (IPv4 validation, LCS algorithm, LRU cache)
           2 requests per model per prompt
Config:    256 max tokens, non-streaming, temperature 0.7
Measured:  End-to-end wall clock time (includes x402 payment verification)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not a synthetic benchmark. Every measurement includes the full payment-verification round trip that real users experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Latency Landscape
&lt;/h3&gt;

&lt;p&gt;Results revealed a 7x spread between the fastest and slowest models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FAST TIER (&amp;lt;1.5s):
  xai/grok-4-fast           1,143ms   224 tok/s   $0.20/$0.50
  xai/grok-3-mini           1,202ms   215 tok/s   $0.30/$0.50
  google/gemini-2.5-flash   1,238ms   208 tok/s   $0.30/$2.50
  google/gemini-2.5-pro     1,294ms   198 tok/s   $1.25/$10.00
  google/gemini-3-flash     1,398ms   183 tok/s   $0.50/$3.00
  deepseek/deepseek-chat    1,431ms   179 tok/s   $0.28/$0.42

MID TIER (1.5-2.5s):
  google/gemini-3.1-pro     1,609ms   167 tok/s   $2.00/$12.00
  moonshot/kimi-k2.5        1,646ms   156 tok/s   $0.60/$3.00
  anthropic/claude-sonnet   2,110ms   121 tok/s   $3.00/$15.00
  anthropic/claude-opus     2,139ms   120 tok/s   $5.00/$25.00
  openai/o3-mini            2,260ms   114 tok/s   $1.10/$4.40

SLOW TIER (&amp;gt;3s):
  openai/gpt-5.2-pro        3,546ms    73 tok/s   $21.00/$168.00
  openai/gpt-4o             5,378ms    48 tok/s   $2.50/$10.00
  openai/gpt-5.4            6,213ms    41 tok/s   $2.50/$15.00
  openai/gpt-5.3-codex      7,935ms    32 tok/s   $1.75/$14.00
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two clear patterns:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Google and xAI dominate speed.&lt;/strong&gt; 11 of the top 13 fastest models are from Google or xAI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI flagship models are consistently slow.&lt;/strong&gt; Every GPT-5.x model takes 3-8 seconds. Even their cheapest models (GPT-4.1-nano at $0.10/$0.40) are 2x slower than Google's cheapest.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 2: Adding the Quality Dimension
&lt;/h2&gt;

&lt;p&gt;Speed alone tells you nothing about whether a model can actually handle your request. We cross-referenced our latency data with Artificial Analysis Intelligence Index v4.0 scores (composite of GPQA, MMLU, MATH, HumanEval, and other benchmarks):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MODEL                       LATENCY    IQ    $/M INPUT
─────────────────────────────────────────────────────
google/gemini-3.1-pro       1,609ms    57    $2.00    ← SWEET SPOT
openai/gpt-5.4              6,213ms    57    $2.50
openai/gpt-5.3-codex        7,935ms    54    $1.75
anthropic/claude-opus-4.6   2,139ms    53    $5.00
anthropic/claude-sonnet-4.6 2,110ms    52    $3.00
google/gemini-3-pro-prev    1,352ms    48    $2.00
moonshot/kimi-k2.5          1,646ms    47    $0.60
google/gemini-3-flash-prev  1,398ms    46    $0.50    ← VALUE SWEET SPOT
xai/grok-4                  1,348ms    41    $0.20
xai/grok-4.1-fast           1,244ms    41    $0.20
deepseek/deepseek-chat      1,431ms    32    $0.28
xai/grok-4-fast             1,143ms    23    $0.20
google/gemini-2.5-flash     1,238ms    20    $0.30
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Efficiency Frontier
&lt;/h3&gt;

&lt;p&gt;Plotting IQ against latency reveals a clear efficiency frontier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;IQ
57 |  Gem3.1Pro ·························· GPT-5.4
   |
53 |                    · Opus
52 |                   · Sonnet
   |
48 |  Gem3Pro ·
47 |   · Kimi
46 |  Gem3Flash ·
   |
41 |  Grok4 ·
   |
32 | Grok3 · · DeepSeek
   |
23 | GrokFast ·
20 | GemFlash ·
   └──────────────────────────────────────────────
     1.0   1.5   2.0   2.5   3.0        6.0  8.0
                 End-to-End Latency (seconds)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The frontier runs from Gemini 2.5 Flash (IQ 20, 1.2s) up to Gemini 3.1 Pro (IQ 57, 1.6s). Everything below and to the right of this line is dominated — you can get equal or better quality at equal or lower latency from a different model.&lt;/p&gt;

&lt;p&gt;Key insight: &lt;strong&gt;Gemini 3.1 Pro matches GPT-5.4's IQ at 1/4 the latency and lower cost.&lt;/strong&gt; Claude Sonnet 4.6 nearly matches Opus 4.6 quality at 60% of the price. These dominated pairings directly informed our routing fallback chains.&lt;/p&gt;
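
&lt;p&gt;The dominance check is mechanical. A minimal sketch, with latency/IQ pairs transcribed from the table above (the &lt;code&gt;dominated&lt;/code&gt; helper is illustrative, not ClawRouter's code):&lt;/p&gt;

```python
# Sketch: find the latency/IQ-dominated models from the table above.
# A model is dominated if some other model has equal-or-better IQ at
# equal-or-lower latency, and is strictly better on at least one axis.
MODELS = {
    "google/gemini-3.1-pro": (1609, 57),
    "openai/gpt-5.4": (6213, 57),
    "anthropic/claude-opus-4.6": (2139, 53),
    "anthropic/claude-sonnet-4.6": (2110, 52),
    "google/gemini-3-flash-prev": (1398, 46),
    "xai/grok-4.1-fast": (1244, 41),
    "google/gemini-2.5-flash": (1238, 20),
}

def dominated(name):
    lat, iq = MODELS[name]
    return any(
        l2 <= lat and iq2 >= iq and (l2 < lat or iq2 > iq)
        for other, (l2, iq2) in MODELS.items()
        if other != name
    )

frontier = sorted(m for m in MODELS if not dominated(m))
# gpt-5.4, opus, and sonnet drop out; the Gemini models and grok-4.1-fast remain
```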

&lt;h2&gt;
  
  
  Step 3: The Failed Experiment (Latency-First Routing)
&lt;/h2&gt;

&lt;p&gt;Armed with benchmark data, we initially optimized for speed. The routing config promoted fast models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// v0.12.47 — latency-optimized (REVERTED)&lt;/span&gt;
&lt;span class="nx"&gt;COMPLEX&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;primary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;xai/grok-4-0709&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;// 1,348ms, IQ 41&lt;/span&gt;
  &lt;span class="nx"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;xai/grok-4-1-fast-non-reasoning&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// 1,244ms, IQ 41&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;google/gemini-2.5-flash&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;// 1,238ms, IQ 20&lt;/span&gt;
    &lt;span class="c1"&gt;// ... fast models first&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Users complained within 24 hours. The fast models were refusing complex tasks and giving shallow responses. A model with IQ 41 can't reliably handle architecture design or multi-step code generation, no matter how fast it is.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson: optimizing for a single metric in a multi-objective system creates failure modes.&lt;/strong&gt; We needed to optimize across speed, quality, and cost simultaneously.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: The 14-Dimension Scoring System
&lt;/h2&gt;

&lt;p&gt;The router needs to determine what kind of request it's looking at before selecting a model. We built a rule-based classifier that scores requests across 14 weighted dimensions:&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Prompt → Lowercase + Tokenize
                    ↓
            ┌──────────────────────────────────┐
            │   14 Dimension Scorers           │
            │   Each returns score ∈ [-1, 1]   │
            └──────┬───────────────────────────┘
                   ↓
            Weighted Sum (configurable weights)
                   ↓
            Tier Boundaries (SIMPLE &amp;lt; 0.0 &amp;lt; MEDIUM &amp;lt; 0.3 &amp;lt; COMPLEX &amp;lt; 0.5 &amp;lt; REASONING)
                   ↓
            Sigmoid Confidence Calibration
                   ↓
            confidence &amp;lt; 0.7 → AMBIGUOUS → default to MEDIUM
            confidence ≥ 0.7 → Classified tier
                   ↓
            Tier × Profile → Model Selection
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The 14 Dimensions
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Weight&lt;/th&gt;
&lt;th&gt;What It Detects&lt;/th&gt;
&lt;th&gt;Score Range&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;reasoningMarkers&lt;/td&gt;
&lt;td&gt;0.18&lt;/td&gt;
&lt;td&gt;"prove", "theorem", "step by step"&lt;/td&gt;
&lt;td&gt;0 to 1.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;codePresence&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;td&gt;"function", "class", "import",&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
 ``` | 0 to 1.0 |&lt;br&gt;
| multiStepPatterns | 0.12 | "first...then", "step N", numbered lists | 0 or 0.5 |&lt;br&gt;
| technicalTerms | 0.10 | "algorithm", "kubernetes", "distributed" | 0 to 1.0 |&lt;br&gt;
| tokenCount | 0.08 | Short (&amp;lt;50 tokens) vs long (&amp;gt;500 tokens) | -1.0 to 1.0 |&lt;br&gt;
| creativeMarkers | 0.05 | "story", "poem", "brainstorm" | 0 to 0.7 |&lt;br&gt;
| questionComplexity | 0.05 | Number of question marks (&amp;gt;3 = complex) | 0 or 0.5 |&lt;br&gt;
| agenticTask | 0.04 | "edit", "deploy", "fix", "debug" | 0 to 1.0 |&lt;br&gt;
| constraintCount | 0.04 | "at most", "within", "O()" | 0 to 0.7 |&lt;br&gt;
| imperativeVerbs | 0.03 | "build", "create", "implement" | 0 to 0.5 |&lt;br&gt;
| outputFormat | 0.03 | "json", "yaml", "table", "csv" | 0 to 0.7 |&lt;br&gt;
| simpleIndicators | 0.02 | "what is", "hello", "define" | 0 to -1.0 |&lt;br&gt;
| referenceComplexity | 0.02 | "the code above", "the API docs" | 0 to 0.5 |&lt;br&gt;
| domainSpecificity | 0.02 | "quantum", "FPGA", "genomics" | 0 to 0.8 |&lt;/p&gt;

&lt;p&gt;Weights sum to 1.0. The weighted score maps to a continuous axis where tier boundaries partition the space.&lt;/p&gt;
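
&lt;p&gt;A minimal sketch of the weighted sum and tier mapping, assuming keyword-presence scorers (the real scorers are richer, and only four of the 14 dimensions are shown; weights and boundaries come from the table and diagram above):&lt;/p&gt;

```python
# Simplified sketch of the weighted-sum classifier. Each dimension scorer is
# reduced to a keyword-presence check; only 4 of the 14 dimensions are shown.
WEIGHTS = {
    "reasoningMarkers": 0.18,
    "codePresence": 0.15,
    "multiStepPatterns": 0.12,
    "technicalTerms": 0.10,
}
KEYWORDS = {
    "reasoningMarkers": ["prove", "theorem", "step by step"],
    "codePresence": ["function", "class", "import"],
    "multiStepPatterns": ["first", "then"],
    "technicalTerms": ["algorithm", "kubernetes", "distributed"],
}

def classify(prompt):
    text = prompt.lower()
    score = sum(
        weight * (1.0 if any(k in text for k in KEYWORDS[dim]) else 0.0)
        for dim, weight in WEIGHTS.items()
    )
    # Boundaries from the diagram: SIMPLE below 0.0, then MEDIUM, COMPLEX, REASONING
    if score < 0.0:
        return "SIMPLE"
    if score < 0.3:
        return "MEDIUM"
    if score < 0.5:
        return "COMPLEX"
    return "REASONING"
```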

&lt;h3&gt;
  
  
  Multilingual Support
&lt;/h3&gt;

&lt;p&gt;Every keyword list includes translations in 9 languages (EN, ZH, JA, RU, DE, ES, PT, KO, AR). A Chinese user asking "证明这个定理" triggers the same reasoning classification as "prove this theorem."&lt;/p&gt;

&lt;h3&gt;
  
  
  Confidence Calibration
&lt;/h3&gt;

&lt;p&gt;Raw tier assignments can be ambiguous when a score falls near a boundary. We use sigmoid calibration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
confidence = 1 / (1 + exp(-steepness * distance_from_boundary))


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Where &lt;code&gt;steepness = 12&lt;/code&gt; and &lt;code&gt;distance_from_boundary&lt;/code&gt; is the score's distance to the nearest tier boundary. This maps to a [0.5, 1.0] confidence range. Below &lt;code&gt;threshold = 0.7&lt;/code&gt;, the request is classified as ambiguous and defaults to MEDIUM.&lt;/p&gt;
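
&lt;p&gt;As a sketch (boundary values taken from the pipeline diagram; this is illustrative, not the shipped implementation):&lt;/p&gt;

```python
import math

# Sketch of the confidence calibration. distance_from_boundary is always
# non-negative, so confidence lands in [0.5, 1.0): exactly 0.5 on a boundary,
# approaching 1.0 as the score moves away from it.
STEEPNESS = 12
THRESHOLD = 0.7
BOUNDARIES = [0.0, 0.3, 0.5]  # tier boundaries from the pipeline diagram

def confidence(score):
    distance = min(abs(score - b) for b in BOUNDARIES)
    return 1 / (1 + math.exp(-STEEPNESS * distance))

def is_ambiguous(score):
    # Below the 0.7 threshold, the request defaults to the MEDIUM tier.
    return confidence(score) < THRESHOLD
```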

&lt;h3&gt;
  
  
  Agentic Detection
&lt;/h3&gt;

&lt;p&gt;A separate scoring pathway detects agentic tasks (multi-step, tool-using, iterative). When &lt;code&gt;agenticScore &amp;gt;= 0.5&lt;/code&gt;, the router switches to agentic-optimized tier configs that prefer models with strong instruction following (Claude Sonnet for complex tasks, GPT-4o-mini for simple tool calls).&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Tier-to-Model Mapping
&lt;/h2&gt;

&lt;p&gt;Once a request is classified into a tier, the router selects from 4 routing profiles:&lt;/p&gt;

&lt;h3&gt;
  
  
  Auto Profile (Default)
&lt;/h3&gt;

&lt;p&gt;Tuned from our benchmark data + user retention metrics:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
plaintext
SIMPLE  → gemini-2.5-flash (1,238ms, IQ 20, 60% retention)
MEDIUM  → kimi-k2.5 (1,646ms, IQ 47, strong tool use)
COMPLEX → gemini-3.1-pro (1,609ms, IQ 57, fastest flagship)
REASON  → grok-4-1-fast-reasoning (1,454ms, $0.20/$0.50)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Eco Profile
&lt;/h3&gt;

&lt;p&gt;Ultra cost-optimized. Uses free/near-free models:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
plaintext
SIMPLE  → nvidia/gpt-oss-120b (FREE)
MEDIUM  → gemini-2.5-flash-lite ($0.10/$0.40, 1M context)
COMPLEX → gemini-2.5-flash-lite ($0.10/$0.40)
REASON  → grok-4-1-fast-reasoning ($0.20/$0.50)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Premium Profile
&lt;/h3&gt;

&lt;p&gt;Best quality regardless of cost:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
plaintext
SIMPLE  → kimi-k2.5 ($0.60/$3.00)
MEDIUM  → gpt-5.3-codex ($1.75/$14.00, 400K context)
COMPLEX → claude-opus-4.6 ($5.00/$25.00)
REASON  → claude-sonnet-4.6 ($3.00/$15.00)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Fallback Chains
&lt;/h3&gt;

&lt;p&gt;Each tier config includes an ordered fallback list. When the primary model returns a 402 (payment failed), 429 (rate limited), or 5xx, the proxy walks the fallback chain. Fallback ordering is benchmark-informed:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
typescript
// COMPLEX tier — quality-first fallback order
fallback: [
  "google/gemini-3-pro-preview",      // IQ 48, 1,352ms
  "google/gemini-3-flash-preview",     // IQ 46, 1,398ms
  "xai/grok-4-0709",                   // IQ 41, 1,348ms
  "google/gemini-2.5-pro",             // 1,294ms
  "anthropic/claude-sonnet-4.6",       // IQ 52, 2,110ms
  "deepseek/deepseek-chat",            // IQ 32, 1,431ms
  "google/gemini-2.5-flash",           // IQ 20, 1,238ms
  "openai/gpt-5.4",                    // IQ 57, 6,213ms — last resort
]


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The chain descends by quality first (IQ 48 → 46 → 41), then trades quality for speed. GPT-5.4 is last despite having IQ 57, because its 6.2s latency is a worst-case user experience.&lt;/p&gt;
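
&lt;p&gt;A minimal sketch of how a proxy can walk such a chain, retrying only on 402/429/5xx and surfacing the last error if the chain is exhausted (the &lt;code&gt;route&lt;/code&gt; helper and its signature are hypothetical):&lt;/p&gt;

```python
# Hypothetical sketch of the fallback walk. call_model is a provider client
# returning (status, response); the proxy advances only on retryable statuses.
def is_retryable(status):
    return status in (402, 429) or 500 <= status <= 599

def route(call_model, primary, fallbacks):
    last_status = None
    for model in [primary] + fallbacks:
        status, response = call_model(model)
        if not is_retryable(status):
            return model, status, response
        last_status = status
    # Chain exhausted: surface the last upstream error rather than nothing.
    return None, last_status, None
```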

&lt;h2&gt;
  
  
  Step 6: Context-Aware Filtering
&lt;/h2&gt;

&lt;p&gt;The fallback chain is filtered at runtime based on request properties:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context window filtering&lt;/strong&gt;: Models with insufficient context window for the estimated total tokens are excluded (with 10% safety buffer)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool calling filter&lt;/strong&gt;: When the request includes tool definitions, only models that support function calling are kept&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vision filter&lt;/strong&gt;: When the request includes images, only vision-capable models are kept&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If filtering eliminates all candidates, the full chain is used as a fallback (better to let the API error than return nothing).&lt;/p&gt;
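
&lt;p&gt;A sketch of these three filters in order, with illustrative capability metadata (the context windows and flags below are assumptions, not ClawRouter's actual model registry):&lt;/p&gt;

```python
# Sketch of the runtime filters. The capability metadata below is illustrative.
MODEL_INFO = {
    "google/gemini-3.1-pro": {"context": 1_000_000, "tools": True, "vision": True},
    "anthropic/claude-sonnet-4.6": {"context": 200_000, "tools": True, "vision": True},
    "deepseek/deepseek-chat": {"context": 64_000, "tools": True, "vision": False},
}

def filter_chain(chain, estimated_tokens, needs_tools=False, needs_vision=False):
    budget = estimated_tokens * 1.10  # 10% safety buffer
    kept = [
        m for m in chain
        if MODEL_INFO[m]["context"] >= budget
        and (not needs_tools or MODEL_INFO[m]["tools"])
        and (not needs_vision or MODEL_INFO[m]["vision"])
    ]
    # If every candidate is filtered out, keep the full chain and let the
    # upstream API return the error instead of failing silently.
    return kept or list(chain)
```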

&lt;h2&gt;
  
  
  Cost Calculation and Savings
&lt;/h2&gt;

&lt;p&gt;Every routing decision includes a cost estimate and savings percentage against a baseline (Claude Opus 4.6 pricing):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
typescript
savings = max(0, (opusCost - routedCost) / opusCost)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a typical SIMPLE request (500 input tokens, 256 output tokens):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Opus cost: $0.0089 (at $5.00/$25.00 per 1M tokens)&lt;/li&gt;
&lt;li&gt;Gemini Flash cost: $0.0008 (at $0.30/$2.50 per 1M tokens)&lt;/li&gt;
&lt;li&gt;Savings: 91.0%&lt;/li&gt;
&lt;/ul&gt;
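
&lt;p&gt;The arithmetic above can be checked in a few lines (prices taken from the example):&lt;/p&gt;

```python
# Reproducing the SIMPLE-request example: 500 input / 256 output tokens,
# Opus 4.6 at $5.00/$25.00 per 1M tokens vs Gemini 2.5 Flash at $0.30/$2.50.
def request_cost(tokens_in, tokens_out, price_in, price_out):
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

opus_cost = request_cost(500, 256, 5.00, 25.00)    # 0.0089
routed_cost = request_cost(500, 256, 0.30, 2.50)   # ~0.0008
savings = max(0, (opus_cost - routed_cost) / opus_cost)  # ~0.91
```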

&lt;p&gt;Across our user base, the median savings rate is 85% compared to routing everything to a premium model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;p&gt;The entire classification pipeline (14 dimensions + tier mapping + model selection) runs in under 1ms. No external API calls. No LLM inference. Pure keyword matching and arithmetic.&lt;/p&gt;

&lt;p&gt;We originally designed a two-stage system where low-confidence rules-based classifications would fall back to an LLM classifier (Gemini 2.5 Flash). In practice, the rules handle 70-80% of requests with high confidence, and the remaining ambiguous cases default to MEDIUM — which is the correct conservative choice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Speed and intelligence are weakly correlated.&lt;/strong&gt; The fastest model (Grok 4 Fast, IQ 23) sits at the bottom of the quality scale, and the smartest low-latency model (Gemini 3.1 Pro, IQ 57, 1.6s) comes from Google, not OpenAI.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Optimizing for one metric fails.&lt;/strong&gt; Latency-first routing breaks quality. Quality-first routing breaks latency budgets. You need multi-objective optimization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;User retention is the real metric.&lt;/strong&gt; Our best-performing model for SIMPLE tasks isn't the cheapest or the fastest — it's Gemini 2.5 Flash (60% retention rate), which balances speed, cost, and just-enough quality.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fallback ordering matters more than primary selection.&lt;/strong&gt; The primary model handles the happy path. The fallback chain handles reality — rate limits, outages, payment failures. A well-ordered fallback chain is more important than picking the perfect primary.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rule-based classification is underrated.&lt;/strong&gt; 14 keyword dimensions with sigmoid confidence calibration handles 70-80% of requests correctly in &amp;lt;1ms. The remaining 20-30% default to a safe middle tier. For a routing system where every millisecond of overhead compounds across millions of requests, avoiding LLM inference in the classification step is worth the reduced accuracy.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Appendix: Full Benchmark Data
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Raw data (46 models, latency, throughput, IQ scores, pricing): &lt;a href="https://github.com/BlockRunAI/ClawRouter/blob/main/benchmark-merged.json" rel="noopener noreferrer"&gt;benchmark-merged.json&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Routing configuration: &lt;a href="https://github.com/BlockRunAI/ClawRouter/blob/main/src/router/config.ts" rel="noopener noreferrer"&gt;src/router/config.ts&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Scoring implementation: &lt;a href="https://github.com/BlockRunAI/ClawRouter/blob/main/src/router/rules.ts" rel="noopener noreferrer"&gt;src/router/rules.ts&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>performance</category>
      <category>devops</category>
    </item>
    <item>
      <title>X/Twitter Algorithm: How AI Agents Can Hack Organic Reach</title>
      <dc:creator>1bcMax</dc:creator>
      <pubDate>Sat, 21 Mar 2026 02:43:30 +0000</pubDate>
      <link>https://dev.to/1bcmax/xtwitter-algorithm-how-ai-agents-can-hack-organic-reach-1fk</link>
      <guid>https://dev.to/1bcmax/xtwitter-algorithm-how-ai-agents-can-hack-organic-reach-1fk</guid>
      <description>&lt;p&gt;Most people blame the algorithm when their content doesn't perform. "The algo buried me." "Reach is dead." "Only paid promotion works now."&lt;/p&gt;

&lt;p&gt;They're wrong. The algorithm isn't hiding your content — it's optimizing for attention. Learn to speak its language and it becomes the most powerful distribution system ever built.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Algorithm Actually Wants
&lt;/h2&gt;

&lt;p&gt;Every platform optimizes for the same thing: time on platform. The algorithm promotes content that keeps people scrolling, clicking, watching.&lt;/p&gt;

&lt;p&gt;Which means: it's not personal. It's math. And math can be reverse-engineered.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Signals That Matter
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Early engagement velocity.&lt;/strong&gt; How fast do people interact after you post? First 30 minutes matter most. The algorithm uses early signals to predict total reach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dwell time.&lt;/strong&gt; Do people stop scrolling and actually read? Long-form text, compelling hooks, narrative tension — these increase time-on-post, which signals quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reply ratio.&lt;/strong&gt; Comments beat likes. Replies beat retweets. The algorithm weighs active engagement (typing) over passive engagement (clicking).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Save and share.&lt;/strong&gt; When someone bookmarks or DMs your post, that's signal gold. It means the content has reference value beyond the feed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reverse Engineering Virality
&lt;/h2&gt;

&lt;p&gt;Look at what's already working in your niche. Not to copy — to decode. What format? What hook pattern? What time of day? What triggers replies?&lt;/p&gt;

&lt;p&gt;The algorithm isn't random. It's a feedback loop. Give it what it rewards.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Meta Move
&lt;/h2&gt;

&lt;p&gt;Most creators fight the algorithm. Smart creators partner with it. They treat reach like engineering, not art. Test, measure, iterate.&lt;/p&gt;

&lt;p&gt;Your content might be brilliant. But if it doesn't trigger the right signals in the first 30 minutes, brilliance is irrelevant.&lt;/p&gt;

&lt;p&gt;The algorithm isn't your enemy. It's just math looking for signals. Give it the signals it wants, and it becomes the best distribution partner you've ever had.&lt;/p&gt;

&lt;p&gt;Learn the cheat code.&lt;/p&gt;

</description>
      <category>twitter</category>
      <category>algorithms</category>
      <category>ai</category>
      <category>growth</category>
    </item>
    <item>
      <title>MiniMax M2.7 Is Live on BlockRun — The First Self-Evolving Reasoning Model</title>
      <dc:creator>1bcMax</dc:creator>
      <pubDate>Sat, 21 Mar 2026 02:37:44 +0000</pubDate>
      <link>https://dev.to/1bcmax/minimax-m27-is-live-on-blockrun-the-first-self-evolving-reasoning-model-k34</link>
      <guid>https://dev.to/1bcmax/minimax-m27-is-live-on-blockrun-the-first-self-evolving-reasoning-model-k34</guid>
      <description>&lt;p&gt;MiniMax just dropped &lt;strong&gt;M2.7&lt;/strong&gt; — and it's live on BlockRun right now.&lt;/p&gt;

&lt;p&gt;One API call. Pay per request. No subscription. No API key signup with MiniMax.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://blockrun.ai/v1/chat/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "minimax/minimax-m2.7",
    "messages": [{"role": "user", "content": "Hello"}]
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're still calling &lt;code&gt;minimax/minimax-m2.5&lt;/code&gt;, it auto-redirects to M2.7. No code changes needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Makes M2.7 Different
&lt;/h2&gt;

&lt;p&gt;M2.7 is the first model MiniMax describes as &lt;strong&gt;deeply participating in its own evolution&lt;/strong&gt;. It doesn't just run agent tasks — it builds and optimizes its own agent harnesses through recursive self-improvement loops.&lt;/p&gt;

&lt;p&gt;In practice, that means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;97% skill adherence&lt;/strong&gt; across 40+ complex skills (each exceeding 2,000 tokens)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;30% performance gains&lt;/strong&gt; from recursive harness optimization over 100+ iteration cycles&lt;/li&gt;
&lt;li&gt;Handles &lt;strong&gt;30–50% of research workflows&lt;/strong&gt; autonomously&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't a chatbot upgrade. It's a model that gets better at being an agent the more you use it as one.&lt;/p&gt;




&lt;h2&gt;
  
  
  Benchmarks That Matter
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2y5h9n28m5jugzzthd9a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2y5h9n28m5jugzzthd9a.png" alt="MiniMax M2.7 benchmarks vs Sonnet 4.6, Opus 4.6, Gemini 3.1 Pro, GPT-5.4 across SWE Bench Pro, Multi-SWE Bench, VIBE-Pro, MLE-Bench Lite, GDPval-AA, Toolathon, MM-ClawBench, and Artificial Analysis"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Software Engineering
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;M2.7&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SWE-Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;56.22%&lt;/td&gt;
&lt;td&gt;Matches GPT-5.3-Codex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;VIBE-Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;55.6%&lt;/td&gt;
&lt;td&gt;End-to-end project delivery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Terminal Bench 2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;57.0%&lt;/td&gt;
&lt;td&gt;Complex engineering systems&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SWE Multilingual&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;76.5&lt;/td&gt;
&lt;td&gt;Cross-language code tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi SWE Bench&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;52.7&lt;/td&gt;
&lt;td&gt;Multi-repo engineering&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Machine Learning &amp;amp; Research
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;MLE Bench Lite&lt;/strong&gt; (22 Kaggle-style competitions): &lt;strong&gt;66.6% average medal rate&lt;/strong&gt; — second only to Opus 4.6 (75.7%) and GPT-5.4 (71.2%). Best single run: 9 gold, 5 silver, 1 bronze.&lt;/p&gt;

&lt;h3&gt;
  
  
  Professional Productivity
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;M2.7&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GDPval-AA ELO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1495&lt;/td&gt;
&lt;td&gt;Highest among open-source models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Toolathon&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;46.3%&lt;/td&gt;
&lt;td&gt;Tool use accuracy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MM Claw&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;62.7%&lt;/td&gt;
&lt;td&gt;Near Sonnet 4.6 level&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Production debugging benchmarks show &lt;strong&gt;incident recovery time under 3 minutes&lt;/strong&gt; — SRE-level decision-making for log analysis, security audits, and system comprehension.&lt;/p&gt;




&lt;h2&gt;
  
  
  New in M2.7 vs M2.5
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Native Agent Teams&lt;/strong&gt; — multi-agent collaboration built into the model, not bolted on&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recursive self-improvement&lt;/strong&gt; — the model optimizes its own harnesses over iteration cycles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Character consistency&lt;/strong&gt; — dramatically improved emotional intelligence for interactive apps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Financial analysis&lt;/strong&gt; — deep reasoning over complex financial documents and reports&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Pricing on BlockRun
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Input&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0.30 / 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Output&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$1.20 / 1M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context window&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;204,800 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's &lt;strong&gt;roughly 20x cheaper than Claude Opus&lt;/strong&gt; and &lt;strong&gt;12x cheaper than GPT-5.4&lt;/strong&gt; for output tokens — while matching their engineering benchmarks.&lt;/p&gt;

&lt;p&gt;Pay per request with USDC on Base. No API key. No subscription. No minimum spend.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Now
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Direct API:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;https://blockrun.ai/v1/chat/completions
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Python SDK:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;blockrun&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BlockRun&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BlockRun&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;minimax/minimax-m2.7&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain this codebase&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TypeScript SDK:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;BlockRun&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;blockrun&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;BlockRun&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;minimax/minimax-m2.7&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Explain this codebase&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;ClawRouter&lt;/strong&gt; (drop-in OpenAI replacement for any framework):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://blockrun.ai/v1
&lt;span class="c"&gt;# Works with OpenClaw, LangChain, CrewAI, AutoGen — any OpenAI-compatible client&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;Read the full announcement from MiniMax: &lt;a href="https://www.minimax.io/news/minimax-m27-en" rel="noopener noreferrer"&gt;MiniMax M2.7 — Beginning the Journey of Recursive Self-Improvement&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>openai</category>
    </item>
    <item>
      <title>x402 Protocol: How AI Agents Pay for APIs Without Human Intervention</title>
      <dc:creator>1bcMax</dc:creator>
      <pubDate>Sat, 21 Mar 2026 02:37:43 +0000</pubDate>
      <link>https://dev.to/1bcmax/x402-protocol-how-ai-agents-pay-for-apis-without-human-intervention-2j6h</link>
      <guid>https://dev.to/1bcmax/x402-protocol-how-ai-agents-pay-for-apis-without-human-intervention-2j6h</guid>
      <description>&lt;p&gt;We built the internet for humans. Browsers, buttons, billing forms. It assumes a person is on the other end.&lt;/p&gt;

&lt;p&gt;But now AI agents are the ones browsing. Reading. Executing. And they can't enter their credit card into a checkout form. They need to call APIs. And those APIs need to get paid.&lt;/p&gt;

&lt;p&gt;This is where things break.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Subscription Trap
&lt;/h2&gt;

&lt;p&gt;Most APIs charge monthly. $20/month. $99/month. $500/month. The model assumes you're a company with a billing department, running a predictable workload.&lt;/p&gt;

&lt;p&gt;But an autonomous agent doesn't have "monthly." It has "right now." It needs to call this API, get this data, generate this image — once, immediately, and move on.&lt;/p&gt;

&lt;p&gt;Subscriptions force agents into a human billing model that doesn't fit their usage pattern. Result: agents either overpay (subscribed to services they rarely use) or can't access services at all (no budget for the subscription).&lt;/p&gt;

&lt;h2&gt;
  
  
  The API Key Nightmare
&lt;/h2&gt;

&lt;p&gt;To use 10 services, you need 10 API keys. Each key requires: an account, email verification, billing setup, rate limit management, key rotation, secret storage.&lt;/p&gt;

&lt;p&gt;For a developer building one product? Annoying but manageable.&lt;/p&gt;

&lt;p&gt;For an autonomous agent that needs to discover and use services dynamically? Impossible.&lt;/p&gt;

&lt;p&gt;The agent can't create accounts. It can't fill out forms. It needs a wallet it can hold, and services that accept that wallet directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Works
&lt;/h2&gt;

&lt;p&gt;The agent holds a wallet. The service publishes a price. The agent calls the service. The payment happens atomically — $0.001 for this request, settled instantly, no subscription, no account.&lt;/p&gt;

&lt;p&gt;This is what x402 enables. HTTP payments. Service returns 402 Payment Required. Agent signs payment. Service delivers. Done.&lt;/p&gt;

&lt;p&gt;No billing dashboard. No monthly invoice. No API key management. Just wallets paying for compute, per request, at internet speed.&lt;/p&gt;
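
&lt;p&gt;In sketch form, the request flow looks roughly like this (the header and field names are illustrative assumptions, not the exact x402 wire format; &lt;code&gt;transport&lt;/code&gt; and &lt;code&gt;sign_payment&lt;/code&gt; stand in for an HTTP client and the agent's wallet):&lt;/p&gt;

```python
# Illustrative x402 flow: call once, and if the service answers
# 402 Payment Required, retry the same request with a signed payment attached.
def call_with_x402(transport, url, sign_payment):
    status, headers, body = transport(url, {})
    if status == 402:
        # The 402 response advertises the price and the pay-to address.
        payment = sign_payment(headers["price"], headers["pay-to"])
        status, headers, body = transport(url, {"X-PAYMENT": payment})
    return status, body
```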

&lt;h2&gt;
  
  
  The New Default
&lt;/h2&gt;

&lt;p&gt;The agent economy needs new infrastructure. Not "traditional payments with a crypto wrapper." Native machine-to-machine commerce.&lt;/p&gt;

&lt;p&gt;Wallet = identity. Request = payment. No humans required.&lt;/p&gt;

&lt;p&gt;This is what we're building at BlockRun. The payment and discovery layer for agents that need to find, evaluate, and pay for services — autonomously.&lt;/p&gt;

&lt;p&gt;The internet wasn't built for machines to pay. So we're building the layer that is.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>blockchain</category>
      <category>web3</category>
      <category>agents</category>
    </item>
    <item>
      <title>AI API Cost Control: How x402 Prevents $47K Budget Overruns</title>
      <dc:creator>1bcMax</dc:creator>
      <pubDate>Sat, 21 Mar 2026 02:27:06 +0000</pubDate>
      <link>https://dev.to/1bcmax/ai-api-cost-control-how-x402-prevents-47k-budget-overruns-4do2</link>
      <guid>https://dev.to/1bcmax/ai-api-cost-control-how-x402-prevents-47k-budget-overruns-4do2</guid>
      <description>&lt;p&gt;A multi-agent system mistakenly burned $47,000+ in API costs. No hacker. No breach. Just bad infrastructure controls.&lt;/p&gt;

&lt;p&gt;Two AI agents were stuck in a recursive loop for 11 days, each one asking the other for clarification, each one convinced it was making progress. Nobody noticed until the invoice arrived.&lt;/p&gt;

&lt;p&gt;If you're building with LLMs today, this is not an edge case; it's a failure mode many teams will eventually hit. It's known as the agent loop problem, and it points to a deeper gap in AI infrastructure.&lt;/p&gt;

&lt;p&gt;These agents were handed API keys — the equivalent of giving them corporate credit cards — with no real-time spending governance. When the loop started, nothing existed at the infrastructure layer to stop it.&lt;/p&gt;

&lt;p&gt;The good news: Edge &amp;amp; Node has built an open-source system called &lt;a href="https://ampersend.ai" rel="noopener noreferrer"&gt;ampersend&lt;/a&gt; that makes this type of failure impossible.&lt;/p&gt;

&lt;p&gt;With ampersend, every LLM call becomes a real USDC payment with spending limits enforced at the wallet level instead of application code. When the agent's budget runs out, the agent stops spending money — even if the code keeps running.&lt;/p&gt;

&lt;h2&gt;Agent Loops Are an Infrastructure Problem, Not a Code Problem&lt;/h2&gt;

&lt;p&gt;Most teams building agent systems know the usual advice: add step limits, set token caps, monitor for repeated outputs. These are sound practices, but they're not enough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step limits don't survive composition.&lt;/strong&gt; Agent A calls Agent B, which calls Agent C. Step limits are local to each agent. If each agent is allowed 50 steps, the system can easily execute 150 total. When recursive calls are involved, costs compound quickly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token caps are estimates, not enforcement.&lt;/strong&gt; Most LLM APIs let you set &lt;code&gt;max_tokens&lt;/code&gt; on a response. This limits output length, not spending. An agent that sends 50 requests with modest outputs can still accumulate serious spend.&lt;/p&gt;
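
&lt;p&gt;Some quick arithmetic, with entirely made-up prices and rates, shows why a per-response cap leaves total spend unbounded:&lt;/p&gt;

```python
# A per-response max_tokens caps output length, not spend (illustrative numbers).
output_cap = 1000          # tokens, enforced per response by the API
prompt_tokens = 20000      # looping agents re-send a growing context each call
price_per_1k = 0.01        # hypothetical blended USD price per 1k tokens
requests_per_hour = 120    # one call every 30 seconds

cost_per_request = (prompt_tokens + output_cap) / 1000 * price_per_1k
daily = cost_per_request * requests_per_hour * 24
print(f"${daily:,.2f} per day")  # $604.80 per day, with every response under the cap
```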

&lt;p&gt;&lt;strong&gt;Monitoring is reactive.&lt;/strong&gt; Observability dashboards tell you what happened. By the time you see a cost spike, the money has already been spent. In the $47K incident, monitoring was in place — it simply reported outcomes rather than intervening.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Application-level budget checks can be bypassed.&lt;/strong&gt; If your code checks a counter before each API call, that counter lives in the same trust domain as the agent. A bug that causes the loop can also break the counter.&lt;/p&gt;

&lt;p&gt;In other words, anything that depends on the agent's own logic to limit its spend will fail in exactly the scenarios where limits matter most: when the agent is misbehaving. You need a control layer that is external to the agent, that can't be circumvented by application bugs, and that enforces hard economic boundaries on every single request.&lt;/p&gt;

&lt;h2&gt;The Solution: Make Every LLM Call a Payment&lt;/h2&gt;

&lt;p&gt;The budget problems above share a root cause: payment and execution are decoupled. The &lt;a href="https://x402.org" rel="noopener noreferrer"&gt;x402 protocol&lt;/a&gt; addresses this by redefining how agents access LLM inference. Instead of authenticating with an API key and settling costs later via an invoice, each request is a discrete payment transaction.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://blockrun.ai" rel="noopener noreferrer"&gt;BlockRun&lt;/a&gt; is a platform that enables pay-per-use access to many mainstream LLMs via the x402 payment protocol. No API key. No subscription tier. No monthly bill. Each request either pays or it doesn't execute.&lt;/p&gt;

&lt;p&gt;This is a fundamental shift. With API keys, spending authority is granted once and revoked manually. With x402, spending authority is exercised and verified on every single request. If the payment doesn't go through, the inference doesn't happen.&lt;/p&gt;

&lt;h2&gt;Introducing ampersend: The Wallet That Enforces Your Budget&lt;/h2&gt;

&lt;p&gt;Pay-per-request alone doesn't prevent runaway spending — an agent stuck in a loop will keep paying as long as it has funds. This is the gap ampersend was built to address.&lt;/p&gt;

&lt;p&gt;ampersend is agentic payment infrastructure that gives autonomous agents programmable wallets with built-in spending controls and real-time observability. When an agent requests a payment signature:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the agent's daily spend is under the limit, the wallet signs the transaction and the request proceeds.&lt;/li&gt;
&lt;li&gt;If the daily spend has reached the limit, the wallet refuses to sign. The request fails. The agent is economically dead — it can keep running, but ampersend won't let it pay for anything.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The spending limit is not in the application code. It lives in the wallet policy. The agent's code cannot override it, bypass it, or accidentally skip it. Even if the agent is stuck in an infinite loop, prompt-injected, or broken by orchestration bugs, the wallet remains the final authority.&lt;/p&gt;
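
&lt;p&gt;A minimal sketch of the idea, assuming a simplified treasurer. Class and method names here are illustrative, not the ampersend SDK:&lt;/p&gt;

```python
# Wallet-level spending policy: the budget lives outside the agent's code.
# Amounts are integer micro-USDC to keep the arithmetic exact.

class Treasurer:
    def __init__(self, daily_limit):
        self.daily_limit = daily_limit
        self.spent_today = 0

    def sign(self, amount):
        """Sign a payment only while the daily budget holds; else refuse."""
        if self.spent_today + amount > self.daily_limit:
            return None  # refusal: the request dies here, whatever the agent does
        self.spent_today += amount
        return {"amount": amount, "signature": "0xsigned"}

treasurer = Treasurer(daily_limit=10_000)  # 10,000 micro-USDC per day

# The agent code can loop forever; the wallet still stops at the limit.
approved = 0
for _ in range(1_000_000):
    if treasurer.sign(1_000) is not None:  # 1,000 micro-USDC per call
        approved += 1
print(approved, treasurer.spent_today)  # 10 10000
```

&lt;p&gt;The loop runs a million times; the wallet approves exactly ten payments. Moving the check out of the loop's own code is the whole trick.&lt;/p&gt;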

&lt;h2&gt;How It All Works Together&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Agent sends an inference request to BlockRun.&lt;/li&gt;
&lt;li&gt;BlockRun responds with &lt;code&gt;HTTP 402 Payment Required&lt;/code&gt; with payment details.&lt;/li&gt;
&lt;li&gt;The agent's ampersend treasurer checks the request against the wallet's spending policy. If allowed, it signs a USDC payment. If the limit is reached, it refuses — request dies here.&lt;/li&gt;
&lt;li&gt;The agent retries the request with proof of payment attached.&lt;/li&gt;
&lt;li&gt;BlockRun verifies the on-chain payment and returns the inference result.&lt;/li&gt;
&lt;/ol&gt;
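
&lt;p&gt;The five steps can be traced end to end with mocks. All names and payload shapes below are illustrative, not the real BlockRun or ampersend APIs:&lt;/p&gt;

```python
# End-to-end sketch of the five steps above, with a mocked gateway and wallet.

def blockrun(request, payment_proof=None):
    """Steps 2 and 5: quote a price via 402, or verify payment and answer."""
    if payment_proof is None:
        return {"status": 402, "price_usdc": 0.002}
    return {"status": 200, "result": "inference output"}

def treasurer_sign(policy, price):
    """Step 3: sign only within the wallet policy; refuse otherwise."""
    if policy["spent"] + price > policy["daily_limit"]:
        return None
    policy["spent"] += price
    return {"proof": "0xpayment", "amount": price}

def run_inference(policy, request):
    quote = blockrun(request)                            # steps 1 and 2
    proof = treasurer_sign(policy, quote["price_usdc"])  # step 3
    if proof is None:
        return {"status": "refused", "reason": "daily limit reached"}
    return blockrun(request, payment_proof=proof)        # steps 4 and 5

policy = {"daily_limit": 0.002, "spent": 0.0}
print(run_inference(policy, "summarize")["status"])  # 200
print(run_inference(policy, "summarize")["status"])  # refused
```

&lt;p&gt;Note where the second call dies: at step 3, before any money moves or any inference runs.&lt;/p&gt;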

&lt;h2&gt;Traditional API vs. BlockRun + ampersend&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Traditional API&lt;/th&gt;
&lt;th&gt;BlockRun (x402)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;API key authentication&lt;/td&gt;
&lt;td&gt;Payment is the authentication&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Post-hoc billing (monthly invoice)&lt;/td&gt;
&lt;td&gt;Pre-paid per request (instant settlement)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spending limit = credit card limit&lt;/td&gt;
&lt;td&gt;Spending limit = wallet policy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Revocation requires key rotation&lt;/td&gt;
&lt;td&gt;Revocation is automatic (wallet limit)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost attribution is manual&lt;/td&gt;
&lt;td&gt;Cost is on-chain and auditable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For agent builders, this means you can give an agent access to GPT-class models without giving it an API key that could be leaked, shared, or exploited beyond your intended budget.&lt;/p&gt;

&lt;h2&gt;Does It Actually Stop Runaway Spending?&lt;/h2&gt;

&lt;p&gt;We built a load test that deliberately simulates a disaster scenario — firing requests in an infinite loop as fast as possible until something stops it.&lt;/p&gt;

&lt;p&gt;With a traditional API key, nothing stops it. The loop runs until the credit card is maxed out or someone manually intervenes.&lt;/p&gt;

&lt;p&gt;With ampersend: the first N requests succeed. Each one is a real USDC payment. When the agent's daily limit is reached, the treasurer refuses to sign the next payment. The total spend is exactly the daily limit you configured — not a dollar more.&lt;/p&gt;

&lt;p&gt;The loop may continue logically — the code still wants to send requests — but financially, it's dead. The wallet, not the code, is the circuit breaker.&lt;/p&gt;

&lt;h2&gt;Why This Matters for Agent Builders&lt;/h2&gt;

&lt;p&gt;If you're building systems where AI agents call LLM APIs — whether that's a single coding agent, a multi-agent pipeline, or an autonomous agent swarm — the loop spending problem will eventually find you.&lt;/p&gt;

&lt;p&gt;The shift is simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Replace API keys with per-request payments.&lt;/strong&gt; x402 makes every LLM call an explicit economic transaction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enforce budgets at the wallet layer, not the application layer.&lt;/strong&gt; ampersend's spending limits can't be bypassed by bugs in your agent code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make costs on-chain and auditable.&lt;/strong&gt; Every payment is a USDC transaction, visible on-chain. No more guessing where the spend went.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't about crypto ideology. It's about using programmable money to solve a real engineering problem: how do you give an autonomous system access to expensive resources without giving it unlimited spending authority?&lt;/p&gt;

&lt;p&gt;The answer is the same one that every other infrastructure domain has learned: governance belongs at the platform layer, not the application layer. Kubernetes doesn't trust your containers to self-limit CPU usage. Rate limiters don't trust your services to self-throttle. Your agent infrastructure shouldn't trust your agents to self-budget.&lt;/p&gt;

&lt;h2&gt;Try It Yourself&lt;/h2&gt;

&lt;p&gt;The full reference implementation is open source:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repository: &lt;a href="https://github.com/edgeandnode/ampersend-blockrun-agentops" rel="noopener noreferrer"&gt;ampersend-blockrun-agentops&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;ampersend SDK: &lt;a href="https://github.com/edgeandnode/ampersend-sdk" rel="noopener noreferrer"&gt;github.com/edgeandnode/ampersend-sdk&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;BlockRun: &lt;a href="https://blockrun.ai" rel="noopener noreferrer"&gt;blockrun.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;x402 Protocol: &lt;a href="https://x402.org" rel="noopener noreferrer"&gt;x402.org&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Request beta access at &lt;a href="https://ampersend.ai" rel="noopener noreferrer"&gt;ampersend.ai&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>blockchain</category>
      <category>web3</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
