<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: GDS K S</title>
    <description>The latest articles on DEV Community by GDS K S (@thegdsks).</description>
    <link>https://dev.to/thegdsks</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3592860%2F7dec468f-4f91-4b1d-9d24-99091e204707.jpg</url>
      <title>DEV Community: GDS K S</title>
      <link>https://dev.to/thegdsks</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thegdsks"/>
    <language>en</language>
    <item>
      <title>Your bundle is 4000x bigger than Quake. The 9-step audit that fixes it.</title>
      <dc:creator>GDS K S</dc:creator>
      <pubDate>Thu, 14 May 2026 04:29:48 +0000</pubDate>
      <link>https://dev.to/thegdsks/your-bundle-is-4000x-bigger-than-quake-the-9-step-audit-that-fixes-it-5cpb</link>
      <guid>https://dev.to/thegdsks/your-bundle-is-4000x-bigger-than-quake-the-9-step-audit-that-fixes-it-5cpb</guid>
      <description>&lt;p&gt;In February 2026 a developer named daivuk shipped a playable Quake-like first person shooter in a 64 kilobyte Windows executable. Multiple levels, four enemy types, textures, music, the whole game. The trick was not magic. He wrote a custom language and a custom virtual machine because the standard toolchain shipped too many features he did not use. Two extra kilobytes of generic runtime would have killed the fourth level.&lt;/p&gt;

&lt;p&gt;That story sat with me for a week, because almost every web app I open is 30 to 60 times the size of that game, QUOD. The page you are reading right now, by the time it finishes loading on Dev.to, weighs more than four hundred copies of QUOD running at once. The marketing page for the framework your app is built on is heavier than QUOD by three orders of magnitude. We have collectively forgotten what bytes cost.&lt;/p&gt;

&lt;p&gt;This article is the audit playbook I use when a Next.js or Vite project crosses my desk and the Lighthouse score reads orange. Nine steps, in the exact order, with the commands, the expected output, and the typical wins. Everything you need to cut your bundle by 50 to 90 percent in a single afternoon. No "rewrite in Rust" theater. Just deletions.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;What you run&lt;/th&gt;
&lt;th&gt;Typical win&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Baseline&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;npx next build&lt;/code&gt; then read the output table&lt;/td&gt;
&lt;td&gt;knowing where you stand&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Visualise&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;@next/bundle-analyzer&lt;/code&gt; or &lt;code&gt;rollup-plugin-visualizer&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;the map&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Kill date libraries&lt;/td&gt;
&lt;td&gt;swap &lt;code&gt;moment&lt;/code&gt; for &lt;code&gt;date-fns&lt;/code&gt; or native &lt;code&gt;Intl&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;50 to 90 KB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Kill icon sets&lt;/td&gt;
&lt;td&gt;one import per icon, never the full pack&lt;/td&gt;
&lt;td&gt;20 to 200 KB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5. Kill lodash&lt;/td&gt;
&lt;td&gt;swap &lt;code&gt;lodash&lt;/code&gt; for &lt;code&gt;lodash-es&lt;/code&gt; or native&lt;/td&gt;
&lt;td&gt;60 to 80 KB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6. Audit polyfills&lt;/td&gt;
&lt;td&gt;drop IE 11 support; target ES2022&lt;/td&gt;
&lt;td&gt;30 to 100 KB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7. Code-split routes&lt;/td&gt;
&lt;td&gt;dynamic imports for non-critical pages&lt;/td&gt;
&lt;td&gt;100 KB to 1 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8. Replace images&lt;/td&gt;
&lt;td&gt;AVIF or modern WebP, properly sized&lt;/td&gt;
&lt;td&gt;200 KB to 2 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9. Re-baseline&lt;/td&gt;
&lt;td&gt;run step 1 again, write the number down&lt;/td&gt;
&lt;td&gt;confidence&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The numbers in the table come from documented case studies on web.dev, the HTTP Archive 2025 annual report, and the Vercel Next.js docs. Your mileage will vary. The order will not.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Baseline
&lt;/h2&gt;

&lt;p&gt;You cannot improve what you have not measured. Before you touch anything, get an honest number.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Next.js&lt;/span&gt;
npx next build
&lt;span class="c"&gt;# read the "First Load JS" table at the bottom&lt;/span&gt;

&lt;span class="c"&gt;# Vite&lt;/span&gt;
npx vite build
&lt;span class="c"&gt;# read the dist/ output sizes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Next.js, the numbers you want are "First Load JS shared by all" and the First Load JS of your largest individual route. In Vite, use the sizes of the entry chunk and of your largest route chunk. Write both down. These numbers are your accountability for the rest of the audit. If they do not drop by at least 30 percent by step 9, either you skipped something or the project was genuinely small to begin with, in which case you are done.&lt;/p&gt;

&lt;p&gt;The HTTP Archive's 2025 annual web almanac reports a median JavaScript transfer size of 612 KB on desktop and 555 KB on mobile. If your number is meaningfully bigger than that, you have low hanging fruit. If it is meaningfully smaller, you are already ahead of most of the industry.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Visualise the bundle
&lt;/h2&gt;

&lt;p&gt;A list of files is not a map. You need the map.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Next.js&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--save-dev&lt;/span&gt; @next/bundle-analyzer
&lt;span class="c"&gt;# in next.config.js wrap your config with the analyzer&lt;/span&gt;
&lt;span class="nv"&gt;ANALYZE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true &lt;/span&gt;npm run build

&lt;span class="c"&gt;# Vite&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--save-dev&lt;/span&gt; rollup-plugin-visualizer
&lt;span class="c"&gt;# add it to vite.config.ts&lt;/span&gt;
npx vite build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
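
&lt;p&gt;A minimal sketch of the "wrap your config" step, assuming a CommonJS &lt;code&gt;next.config.js&lt;/code&gt; and the wrapper usage described in the &lt;code&gt;@next/bundle-analyzer&lt;/code&gt; README. The empty &lt;code&gt;nextConfig&lt;/code&gt; object is a placeholder for whatever your project already exports.&lt;/p&gt;

```javascript
// next.config.js
// The analyzer only runs when ANALYZE=true is set, so normal builds are unaffected.
const withBundleAnalyzer = require('@next/bundle-analyzer')({
  enabled: process.env.ANALYZE === 'true',
});

/** @type {import('next').NextConfig} */
const nextConfig = {
  // placeholder: keep your existing options here
};

module.exports = withBundleAnalyzer(nextConfig);
```

&lt;p&gt;For Vite, the equivalent is importing &lt;code&gt;visualizer&lt;/code&gt; from &lt;code&gt;rollup-plugin-visualizer&lt;/code&gt; and adding &lt;code&gt;visualizer()&lt;/code&gt; to the &lt;code&gt;plugins&lt;/code&gt; array in &lt;code&gt;vite.config.ts&lt;/code&gt;.&lt;/p&gt;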



&lt;p&gt;The analyzer opens a treemap in your browser. The treemap is the entire audit's source of truth. Every fat block is a question. Every question is one of the next seven steps.&lt;/p&gt;

&lt;p&gt;Spend ten minutes here. Hover the rectangles. Find the ones that are unfamiliar. The ones you cannot explain are the ones that have the most byte fat.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Kill the date library
&lt;/h2&gt;

&lt;p&gt;The single most common bundle bloat in the entire JavaScript ecosystem. Moment.js is 67 KB minified before gzip. day.js is 7 KB. date-fns with tree shaking can drop to 12 KB. Native &lt;code&gt;Intl.DateTimeFormat&lt;/code&gt; is zero.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// before&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;moment&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;moment&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;formatted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;moment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;date&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;YYYY-MM-DD&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// after, native, zero bytes added&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;formatted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;Intl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DateTimeFormat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;en-CA&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// or with date-fns, tree shakes cleanly&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;format&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;date-fns&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;formatted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;yyyy-MM-dd&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
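
&lt;p&gt;The native one-liner above leans on the &lt;code&gt;en-CA&lt;/code&gt; locale happening to print year, month, day in that order, which depends on the runtime's locale data. If something downstream parses the string, a slightly longer sketch that assembles the parts by name may be safer. &lt;code&gt;formatIsoDate&lt;/code&gt; is a hypothetical helper name, and defaulting to UTC is an assumption; pass your own time zone if you need local dates.&lt;/p&gt;

```javascript
// Hypothetical helper (not from any library): builds YYYY-MM-DD from named parts
// instead of relying on how a particular locale orders its fields.
function formatIsoDate(date, timeZone = 'UTC') {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric',
    month: '2-digit',
    day: '2-digit',
  }).formatToParts(date);
  // formatToParts returns entries like { type: 'month', value: '05' }
  const pick = (type) => parts.find((part) => part.type === type).value;
  return `${pick('year')}-${pick('month')}-${pick('day')}`;
}

console.log(formatIsoDate(new Date(Date.UTC(2026, 4, 14)))); // 2026-05-14
```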



&lt;p&gt;Run a global grep for &lt;code&gt;moment&lt;/code&gt; and &lt;code&gt;dayjs&lt;/code&gt; in your codebase. If you find moment, you have a 50 to 90 KB win sitting on the floor. The migration is mechanical and well documented.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Kill the icon set import
&lt;/h2&gt;

&lt;p&gt;The second most common bundle bloat, especially in dashboards built on Material UI, Chakra, or any "we have icons" library. The trap is the barrel import from the package root.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// before, ships the entire icon set&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Search&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Menu&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@mui/icons-material&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;// after, ships only the three icons&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Search&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@mui/icons-material/Search&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;User&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@mui/icons-material/Person&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Menu&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@mui/icons-material/Menu&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In many icon packs, the barrel import from the package root can pull in the entire set, around 2 MB of SVG. The per-icon import path ships only what you reference. Material UI's documentation explicitly warns about this. Many teams ignore it. Check yours.&lt;/p&gt;

&lt;p&gt;For Lucide, Heroicons, and Phosphor, tree-shaking generally works correctly if your bundler is set up right. Verify it in the analyzer. If you see the full icon library in your treemap, the tree shake did not happen and you need to fix the import path.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Kill the utility library
&lt;/h2&gt;

&lt;p&gt;Lodash is 70 KB. Most apps use seven functions from it. The fix is either &lt;code&gt;lodash-es&lt;/code&gt; with tree shaking, or replacing the seven functions with native equivalents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// before&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;_&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;lodash&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;grouped&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;category&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;unique&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// after, native&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;grouped&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupBy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;category&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;unique&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="c1"&gt;// or, tree shaken&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;groupBy&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;lodash-es/groupBy&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;uniq&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;lodash-es/uniq&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
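
&lt;p&gt;If some of your targets predate &lt;code&gt;Object.groupBy&lt;/code&gt;, a small fallback keeps the native path without pulling lodash back in. This is a sketch under that assumption; &lt;code&gt;groupByKey&lt;/code&gt; is a hypothetical name, and unlike &lt;code&gt;_.groupBy&lt;/code&gt; it takes a function rather than a property string.&lt;/p&gt;

```javascript
// Hypothetical helper: uses the native Object.groupBy when present,
// otherwise groups with a plain loop.
function groupByKey(items, getKey) {
  if (typeof Object.groupBy === 'function') {
    return Object.groupBy(items, getKey);
  }
  // Match the native behaviour of returning an object with no prototype.
  const groups = Object.create(null);
  for (const item of items) {
    const key = getKey(item);
    (groups[key] ||= []).push(item);
  }
  return groups;
}

const items = [
  { name: 'apple', category: 'fruit' },
  { name: 'leek', category: 'vegetable' },
  { name: 'pear', category: 'fruit' },
];
const grouped = groupByKey(items, (item) => item.category);
console.log(grouped.fruit.length); // 2
```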



&lt;p&gt;&lt;code&gt;Object.groupBy&lt;/code&gt; shipped in 2024 and is widely available. &lt;code&gt;Map.groupBy&lt;/code&gt; is also there. The Set constructor handles uniqueness in one line. Underscore is even worse than lodash for the same reason. Check your dependency tree, find them, replace them, save bytes.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Audit the polyfill load
&lt;/h2&gt;

&lt;p&gt;If your project supports browsers older than the last two years of Chrome, Safari, and Firefox, you are shipping polyfills you do not need. The &lt;code&gt;.browserslistrc&lt;/code&gt; or &lt;code&gt;browserslist&lt;/code&gt; field in package.json governs this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;package.json&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;before&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"browserslist"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&amp;gt; 0.5%"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"last 2 versions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Firefox ESR"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"not dead"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;package.json&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;after,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;modern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;targets&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;only&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"browserslist"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"last 2 chrome versions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"last 2 firefox versions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                 &lt;/span&gt;&lt;span class="s2"&gt;"last 2 safari versions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"last 2 edge versions"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The wins here vary by project. A React app that explicitly targets IE 11 ships about 50 KB more than the same app targeting last-two-versions. Vue and Svelte have similar ratios. Check the analyzer for &lt;code&gt;core-js&lt;/code&gt;, &lt;code&gt;regenerator-runtime&lt;/code&gt;, &lt;code&gt;@babel/runtime&lt;/code&gt;. Each of those is polyfill or transpiler helper code, and each shrinks meaningfully when you raise the target.&lt;/p&gt;

&lt;p&gt;The honest tradeoff: if you serve enterprise customers stuck on Internet Explorer, you cannot do this. Almost everyone else can.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Code split by route
&lt;/h2&gt;

&lt;p&gt;The biggest single lever for JavaScript size. Most apps load every component on every page because the bundler does not know which routes need what. The fix is dynamic imports for non-critical paths.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="c1"&gt;// before, eager import&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;HeavyDashboard&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./HeavyDashboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="c1"&gt;// after, lazy&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;lazy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Suspense&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;react&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;HeavyDashboard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;lazy&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./HeavyDashboard&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;App&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Suspense&lt;/span&gt; &lt;span class="na"&gt;fallback&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Spinner&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;HeavyDashboard&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;Suspense&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Next.js the App Router does most of this automatically per route. The wins come from splitting heavy components inside a route. A chart library, a markdown editor, a video player, a payment SDK. Each of those is a candidate.&lt;/p&gt;
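
&lt;p&gt;For a heavy dependency that is not a React component, the same split works with a bare &lt;code&gt;import()&lt;/code&gt; call triggered on first use. A minimal sketch, assuming a bundler that splits chunks at &lt;code&gt;import()&lt;/code&gt;; &lt;code&gt;createOnDemandLoader&lt;/code&gt; and &lt;code&gt;./heavy-chart.js&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```javascript
// Hypothetical helper: defers the import until the first call and
// hands every later caller the same promise.
function createOnDemandLoader(importer) {
  let promise;
  return function load() {
    if (promise === undefined) {
      promise = importer();
    }
    return promise;
  };
}

// Nothing is fetched here. The chunk is requested only when loadChart() runs,
// for example inside a click handler.
const loadChart = createOnDemandLoader(() => import('./heavy-chart.js'));
```

&lt;p&gt;Calling &lt;code&gt;loadChart()&lt;/code&gt; from an event handler keeps the chart code out of the initial bundle entirely.&lt;/p&gt;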

&lt;p&gt;Run the analyzer again after this step. The shared bundle should drop by 100 KB to a megabyte, depending on what you split. The page-specific bundles will be larger, but only loaded when needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. Replace your images
&lt;/h2&gt;

&lt;p&gt;Almost forgot the part where pictures of food account for 70 percent of the bytes on the average e-commerce page.&lt;/p&gt;

&lt;p&gt;The 2026 image stack is straightforward. Serve AVIF with a WebP fallback and a JPEG fallback. Size them to the actual display dimensions, not the original camera resolution. Use the native &lt;code&gt;&amp;lt;picture&amp;gt;&lt;/code&gt; element or a framework wrapper like &lt;code&gt;next/image&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;picture&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;source&lt;/span&gt; &lt;span class="na"&gt;srcset=&lt;/span&gt;&lt;span class="s"&gt;"hero.avif"&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"image/avif"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;source&lt;/span&gt; &lt;span class="na"&gt;srcset=&lt;/span&gt;&lt;span class="s"&gt;"hero.webp"&lt;/span&gt; &lt;span class="na"&gt;type=&lt;/span&gt;&lt;span class="s"&gt;"image/webp"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;img&lt;/span&gt; &lt;span class="na"&gt;src=&lt;/span&gt;&lt;span class="s"&gt;"hero.jpg"&lt;/span&gt; &lt;span class="na"&gt;alt=&lt;/span&gt;&lt;span class="s"&gt;"..."&lt;/span&gt; &lt;span class="na"&gt;width=&lt;/span&gt;&lt;span class="s"&gt;"1200"&lt;/span&gt; &lt;span class="na"&gt;height=&lt;/span&gt;&lt;span class="s"&gt;"630"&lt;/span&gt; &lt;span class="na"&gt;loading=&lt;/span&gt;&lt;span class="s"&gt;"lazy"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/picture&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



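&lt;p&gt;The width and height attributes prevent layout shift and give the browser an early hint. The loading=lazy attribute defers off-screen images; leave it off anything above the fold, such as a real hero image, because lazy-loading the largest visible image delays Largest Contentful Paint. The AVIF source typically shaves 30 to 50 percent off the file size compared to JPEG at the same quality.&lt;/p&gt;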

&lt;p&gt;A typical e-commerce site that does the full image audit drops its page weight by a megabyte or two. That single change moves Lighthouse scores more than the previous six steps combined.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Re-baseline and write the number down
&lt;/h2&gt;

&lt;p&gt;Run step 1 again. Write the new number next to the old one. Compare.&lt;/p&gt;

&lt;p&gt;If you ran all eight steps on a typical Next.js app with one heavy dashboard, an icon library, lodash, and unoptimized images, you should see the First Load JS drop from a starting point of 400 to 600 KB down to 100 to 200 KB. The Lighthouse performance score should jump 20 to 40 points. The Time to Interactive should fall by a full second on a throttled mid-range Android device.&lt;/p&gt;

&lt;p&gt;If you did not get those wins, one of two things happened. Either your app is already lean, in which case congratulations, or you skipped a step. Run the analyzer again and find the rectangle that is still too big.&lt;/p&gt;

&lt;h2&gt;
  
  
  The framework you can actually keep
&lt;/h2&gt;

&lt;p&gt;The nine steps above are a one-time audit. The hard part is keeping the wins after the audit ends. Three rules I run on every project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Rule 1: A bundle budget in CI.
  Bundle size has to be a number in a green or red box on every PR.
  npm install --save-dev bundlewatch
  Add it to your test script. Set a max. Fail the build on regression.

Rule 2: A dependency review on every PR that touches package.json.
  Use the @sentry/bundle-analyzer or @next/bundle-analyzer in CI.
  Post the diff as a comment. The team will see it. The team will care.

Rule 3: A monthly "what got fat" report.
  Once a month, run the analyzer and look at the biggest rectangles.
  One of them will surprise you. Fix it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
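
&lt;p&gt;To make Rule 1 concrete, here is a rough sketch of what a budget check does under the hood: gzip each asset and compare it to a limit. This is not bundlewatch's API or config format; &lt;code&gt;findOverBudget&lt;/code&gt;, the asset shape, and the limit are all hypothetical.&lt;/p&gt;

```javascript
const zlib = require('node:zlib');

// Hypothetical budget check: returns the assets whose gzipped size
// exceeds the limit, so CI can fail when the list is not empty.
function findOverBudget(assets, maxGzipBytes) {
  return assets
    .map(({ name, source }) => ({
      name,
      gzipBytes: zlib.gzipSync(Buffer.from(source)).length,
    }))
    .filter((asset) => asset.gzipBytes > maxGzipBytes);
}

// Illustrative input: in a real check the sources would be read from the build output.
const offenders = findOverBudget(
  [
    { name: 'tiny.js', source: 'x' },
    { name: 'big.js', source: Array.from({ length: 2000 }, (_, i) => i).join(',') },
  ],
  100
);
console.log(offenders.map((asset) => asset.name));
```

&lt;p&gt;In CI, a non-empty result would set a non-zero exit code, which is the "red box" the rule asks for.&lt;/p&gt;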



&lt;p&gt;Without these three rules the wins drift back inside six months. With them, the bundle stays at the size you decided it should be at the audit, indefinitely.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest take
&lt;/h2&gt;

&lt;p&gt;You are not going to ship your next SaaS in 64 KB. Nobody is asking you to. But the lesson from QUOD is not about the absolute number, it is about the constraint mindset. The standard toolchain ships every feature you do not use. Every dependency is a vote against your users on a slow connection. Every imported icon set is a tax on the laptop battery of the person reading your page on a flight.&lt;/p&gt;

&lt;p&gt;The good news is that the audit pays back in hours, not weeks. The first time I ran this playbook on a real codebase, I cut a 540 KB First Load JS down to 168 KB in one afternoon. The before and after Lighthouse score difference would have taken six months of "performance work" if I had done it gradually. Doing it all in one focused sweep is dramatically faster.&lt;/p&gt;

&lt;p&gt;The next time you reach for a 4 MB library to format a date, think about QUOD. Then think about whether your users would rather download your full app, or four hundred copies of QUOD running at the same time, with guns in them.&lt;/p&gt;

&lt;p&gt;Question for the comments: what is the biggest single byte win you ever shipped, and what tool did you replace?&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;GDS K S&lt;/strong&gt; · &lt;a href="https://thegdsks.com" rel="noopener noreferrer"&gt;thegdsks.com&lt;/a&gt; · follow on X &lt;a href="https://x.com/thegdsks" rel="noopener noreferrer"&gt;@thegdsks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Every byte in your bundle is a tiny vote against your users on a slow connection.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>performance</category>
      <category>javascript</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>I asked Cursor to rename a function. It sent 8,400 tokens. I checked.</title>
      <dc:creator>GDS K S</dc:creator>
      <pubDate>Wed, 13 May 2026 03:58:11 +0000</pubDate>
      <link>https://dev.to/thegdsks/i-asked-cursor-to-rename-a-function-it-sent-8400-tokens-i-checked-434h</link>
      <guid>https://dev.to/thegdsks/i-asked-cursor-to-rename-a-function-it-sent-8400-tokens-i-checked-434h</guid>
      <description>&lt;p&gt;&lt;em&gt;The afternoon I learned what my AI subscription was actually doing, and the 200 lines that took my next bill down 41 percent.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I had been using Cursor for six months when I noticed the discrepancy. I was renaming a function. A short one. Three lines of body. One call site. The kind of refactor that takes, at most, six seconds of human attention.&lt;/p&gt;

&lt;p&gt;I had two windows open. The Cursor chat panel where I had typed "rename getUser to fetchUser". And the Anthropic console in another tab, because I had been debugging a different project earlier and forgot to close it.&lt;/p&gt;

&lt;p&gt;The Anthropic console refreshed while the Cursor request was in flight. I watched the token counter tick up live. The number it landed on for that single rename request was 8,400 input tokens. The actual prompt I had typed was four words.&lt;/p&gt;

&lt;p&gt;I sat there for a moment. Then I opened a fresh terminal and made the same call directly through the Anthropic API with my own minimal prompt. Same model. Same intent. Same outcome.&lt;/p&gt;

&lt;p&gt;The direct call used 1,900 input tokens. Cursor had sent 6,500 extra tokens of context to perform the same rename.&lt;/p&gt;

&lt;p&gt;That observation was the start of the spreadsheet that ate the next four hours of my evening.&lt;/p&gt;

&lt;h2&gt;
  
  
  What was in those 6,500 tokens
&lt;/h2&gt;

&lt;p&gt;I do not have inside knowledge of Cursor's internals. I have my own experiments and the public documentation, neither of which gives a complete picture. Here is what I have: when I ran the same prompt 50 times across different parts of my codebase, the input token count varied between roughly 4,000 and 14,000 with a median around 8,000. The variation correlated loosely with how many open buffers I had and how recently I had viewed files in the same directory.&lt;/p&gt;

&lt;p&gt;The reasonable inference is that Cursor was sending the model a system prompt, plus indexing context derived from my recent activity, plus tool definitions for the agent framework, plus the actual prompt I had typed. The first three are the routing layer doing its job. The fourth was the only one I could see in the chat panel.&lt;/p&gt;

&lt;p&gt;Some of that context helps. When I ask for a refactor that touches several files, Cursor knowing about those files is the entire point. When I ask to rename a function whose call sites fit in three lines of context, sending 6,500 tokens of unrelated buffer state is the routing layer playing it safe on my behalf in a way that benefits Cursor more than it benefits me.&lt;/p&gt;

&lt;p&gt;Cursor charges a fixed seat fee. Anthropic charges Cursor by the token. The math runs the wrong direction for me as a heavy user, because the marginal token cost ends up in my own direct API calls (which Cursor's seat does not cover) plus the fixed seat itself. Conservative context is cheap to ship and expensive to consume. The incentive lands on the wrong side of the table.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bill that made me pay attention
&lt;/h2&gt;

&lt;p&gt;My March bill arrived on April 2. The Anthropic line item had grown 50 percent month over month for three months running. Cursor at $20 was flat. Copilot at $10 was flat. The variable line was the API I was hitting from my own CLI for things Cursor was not the right tool for.&lt;/p&gt;

&lt;p&gt;The growth was not from doing more work. I checked. My weekly logged hours were stable. The growth was from an increasing fraction of those hours involving AI calls that I was making more casually because the AI was getting more useful.&lt;/p&gt;

&lt;p&gt;The trend was straight. If I extrapolated, by August I was going to be paying more in direct Anthropic API spend than in Cursor seats, and my Cursor seat would still be running the same conservative context overhead on every chat panel turn. The bill was going to keep growing in two places at once.&lt;/p&gt;

&lt;p&gt;I cancelled Cursor that weekend.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 200 lines
&lt;/h2&gt;

&lt;p&gt;The thing I built to replace the routing piece of Cursor was small enough to embarrass me for not having built it months earlier. A regex-based intent classifier with five rules. Trivial prompts route to Haiku. Code prompts route to Sonnet. Planning prompts route to Opus. Embedding-style classification prompts route to a cheap OpenAI model. Anything that matches none of those defaults to Sonnet.&lt;/p&gt;

&lt;p&gt;That is the entire routing logic. Two hundred lines of TypeScript including imports, error handling, a pricing table, and a cost calculator that logs every call. The full file fits on one screen if you have a tall display.&lt;/p&gt;
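&lt;p&gt;A minimal sketch of what a router like that can look like. The original file is not shown here, so everything below is an assumption: the tier labels, the regex patterns, and especially the prices, which are placeholders to be replaced with the numbers from your providers' pricing pages.&lt;/p&gt;

```typescript
// Hypothetical tier labels. Map them to real model IDs in your own client code.
type Tier = "haiku" | "sonnet" | "opus" | "openai-cheap";

interface RouteRule {
  label: string;
  pattern: RegExp;
  tier: Tier;
}

// Illustrative patterns only. The first matching rule wins, so order matters.
const ROUTE_RULES: RouteRule[] = [
  { label: "classify", pattern: /\b(classify|categorize|tag|label)\b/i, tier: "openai-cheap" },
  { label: "plan", pattern: /\b(plan|architect|design|roadmap|trade-?offs?)\b/i, tier: "opus" },
  { label: "trivial", pattern: /\b(rename|typo|reformat|one-liner)\b/i, tier: "haiku" },
  { label: "code", pattern: /\b(refactor|implement|debug|function|class|test)\b/i, tier: "sonnet" },
];

interface ModelPrice {
  inputPerMTok: number; // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

// PLACEHOLDER prices, not quoted from any vendor. Replace before trusting the output.
const PLACEHOLDER_PRICES: { [T in Tier]: ModelPrice } = {
  haiku: { inputPerMTok: 1, outputPerMTok: 5 },
  sonnet: { inputPerMTok: 3, outputPerMTok: 15 },
  opus: { inputPerMTok: 15, outputPerMTok: 75 },
  "openai-cheap": { inputPerMTok: 0.15, outputPerMTok: 0.6 },
};

// Walk the rules in order and fall back to the mid tier when nothing matches.
function pickTier(prompt: string): Tier {
  for (const rule of ROUTE_RULES) {
    if (rule.pattern.test(prompt)) return rule.tier;
  }
  return "sonnet";
}

// Cost of one call in USD, given token counts reported by the provider.
function estimateCostUsd(tier: Tier, inputTokens: number, outputTokens: number): number {
  const price = PLACEHOLDER_PRICES[tier];
  return (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) / 1_000_000;
}
```

&lt;p&gt;Putting the trivial rule ahead of the code rule is a deliberate choice in this sketch: "rename this function" should take the cheap path even though it mentions a function.&lt;/p&gt;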

&lt;p&gt;I tested it on a hundred prompts I had logged from the previous week. The breakdown shifted hard. Sonnet went from 70 percent of calls to 25 percent. Haiku went from zero to 60 percent. Opus stayed at 5 percent. The estimated cost reduction was 47 percent on the test set.&lt;/p&gt;

&lt;p&gt;I did not believe the number. I assumed I had a bug. I instrumented the router to log the actual model picked per prompt and the cost in real time, and I ran my normal workflow for two weeks. The actual reduction came in at 41 percent on the May 2 bill, with 30 percent more total calls because the cheaper per-call cost made me reach for AI more often.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I did not understand before
&lt;/h2&gt;

&lt;p&gt;The routing layer is the most valuable part of the AI tool stack right now, and the wrappers want to own that part most of all.&lt;/p&gt;

&lt;p&gt;Every coding tool I have looked at in the last 90 days has shipped a model dropdown. Cursor added one. GitHub Copilot added one. Windsurf added one. The story they tell is customer choice. The story underneath, I think, is that they have all noticed the same thing I noticed in April. The user can route their own calls. The user is starting to. If the wrappers do not own the routing layer, they own a chat panel and an autocomplete and not much else.&lt;/p&gt;

&lt;p&gt;The chat panel is real value. The autocomplete is real value. They are not $20 a month of value for a heavy user who would prefer to route directly. They are maybe $5 to $10 a month of value, sized to the actual work they save.&lt;/p&gt;

&lt;p&gt;I think we are two quarters away from a wave of users doing this exercise. The wrappers know that. The pricing pages are starting to reflect it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would tell my March self
&lt;/h2&gt;

&lt;p&gt;Three things, in the order they matter.&lt;/p&gt;

&lt;p&gt;The first: open the Anthropic dashboard. Look at the input token count on three of your normal Cursor turns. If the number runs more than 3x your direct call baseline, the routing layer is not earning its cost on your usage pattern. That does not mean cancel. That means notice.&lt;/p&gt;

&lt;p&gt;The second: log every AI call you make for one week. Cost per call, model picked, prompt length, output length. The log takes twenty lines of code per provider. The data will surprise you. No honest way exists to optimize a bill you cannot see.&lt;/p&gt;

&lt;p&gt;The third: write the router. Two hundred lines. The first version does not have to be smart. Five regex rules for intent capture 70 percent of the savings. Iterate on the rules later.&lt;/p&gt;

&lt;p&gt;The reason I would tell my March self these things: I would have done the same exercise three months earlier and saved roughly $300 in subscription overlap. The cost of doing the exercise: one Saturday afternoon. The cost of not doing it: whatever your bill grows to in the next quarter, which for most people building with AI right now lands at a number bigger than they want to admit.&lt;/p&gt;
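&lt;p&gt;The first two checks fit in one short file. This is a sketch under my own assumptions: the record fields are illustrative rather than taken from any provider SDK, and the token counts come from the rename example earlier in this piece.&lt;/p&gt;

```typescript
// One row per AI call. Field names are illustrative, not from any provider SDK.
interface CallRecord {
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

interface ModelSummary {
  calls: number;
  costUsd: number;
}

// Group a week of records by model so the expensive line is visible at a glance.
function summarizeByModel(records: CallRecord[]): { [model: string]: ModelSummary } {
  const summary: { [model: string]: ModelSummary } = {};
  for (const rec of records) {
    const row = summary[rec.model] ?? { calls: 0, costUsd: 0 };
    row.calls += 1;
    row.costUsd += rec.costUsd;
    summary[rec.model] = row;
  }
  return summary;
}

// Ratio of what a wrapper sent to what a direct call needed for the same task.
function contextOverheadRatio(wrapperInputTokens: number, directInputTokens: number): number {
  if (!(directInputTokens > 0)) {
    throw new RangeError("directInputTokens must be positive");
  }
  return wrapperInputTokens / directInputTokens;
}

// The rename example: 1,900 tokens direct, 6,500 extra through the wrapper.
const renameOverhead = contextOverheadRatio(1900 + 6500, 1900);
const worthALook = renameOverhead > 3; // the 3x threshold from the first check
```

&lt;p&gt;Feed &lt;code&gt;summarizeByModel&lt;/code&gt; a week of records and sort the result by cost. The line at the top is the one to route differently.&lt;/p&gt;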

&lt;h2&gt;
  
  
  What this does not solve
&lt;/h2&gt;

&lt;p&gt;The router does not give me a multi file editing agent. It does not index my codebase. It does not know about my open buffers. It does not autocomplete inline. None of that is its job.&lt;/p&gt;

&lt;p&gt;I kept Copilot for the inline ghost text in VS Code, because that is a different product solving a different problem and the $10 is not the line that hurts. For the multi file agent work I would have used Cursor for, I now use Claude Code from the terminal, which I pay for separately through my Max plan. The total stack is cheaper than Cursor plus my old direct API spend.&lt;/p&gt;

&lt;p&gt;If your usage pattern is different, your math will be different. If you live in the chat panel and rarely go outside it, Cursor is probably still a fair trade. If your AI work spans chat, agent loops, embedding pipelines, and one off CLI calls, the routing layer is the one piece worth owning yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  The closing
&lt;/h2&gt;

&lt;p&gt;The bill came in on April 2. The new bill came in on May 2. The difference between them was 41 percent and 200 lines of code and one weekend afternoon I was going to spend half asleep in front of a movie.&lt;/p&gt;

&lt;p&gt;The lesson should have been obvious. The wrappers have an incentive to send more tokens than necessary. The user has an incentive to send fewer. The routing layer is where the two incentives meet. Whoever owns the routing layer wins.&lt;/p&gt;

&lt;p&gt;The routing layer can be 200 lines of yours.&lt;/p&gt;

&lt;p&gt;What is the line on your AI bill that grew the fastest last month? Drop it in a reply. I read everything.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by &lt;strong&gt;GDS K S&lt;/strong&gt; (&lt;a href="https://thegdsks.com" rel="noopener noreferrer"&gt;thegdsks.com&lt;/a&gt;), building &lt;a href="https://glincker.com" rel="noopener noreferrer"&gt;Glincker&lt;/a&gt;.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;If this was useful, follow me on &lt;a href="https://x.com/thegdsks" rel="noopener noreferrer"&gt;X / @thegdsks&lt;/a&gt;. I write about the parts of the AI stack vendors keep off the pricing page.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cursor</category>
      <category>claude</category>
      <category>productivity</category>
    </item>
    <item>
      <title>AWS Lambda Is Dead. The $0.20 Was Never the Price</title>
      <dc:creator>GDS K S</dc:creator>
      <pubDate>Tue, 12 May 2026 04:24:50 +0000</pubDate>
      <link>https://dev.to/thegdsks/aws-lambda-is-dead-the-020-was-never-the-price-2k4j</link>
      <guid>https://dev.to/thegdsks/aws-lambda-is-dead-the-020-was-never-the-price-2k4j</guid>
      <description>&lt;p&gt;Last quarter we migrated 47 Lambda functions off AWS. The monthly bill dropped from $8,362 to $1,790. Lambda invocations were 22% of that bill. The rest was the part AWS never put on the pricing page, never put in the AWS Lambda 101 docs, and never came up when our solutions architect ran our forecast workshop in 2024.&lt;/p&gt;

&lt;p&gt;We are committing to one position in this piece and we are not flipping at the end. Lambda is dead for the API, webhook, auth, and edge workload that most teams actually deploy. Every comparison article we read while researching the move closed with "but evaluate your needs carefully." That sentence is what kept us on Lambda for an extra year. We are not writing that sentence today.&lt;/p&gt;

&lt;p&gt;If you are skimming, the punchline is at the top.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Thing&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Why Lambda loses&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;INIT billing (Aug 2025)&lt;/td&gt;
&lt;td&gt;Cold start init time now bills like duration&lt;/td&gt;
&lt;td&gt;The price floor moved up, quietly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;The orchestration bundle&lt;/td&gt;
&lt;td&gt;API Gateway, CloudWatch, NAT, egress&lt;/td&gt;
&lt;td&gt;Lambda is 20 to 40% of your serverless bill&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;The crossover point&lt;/td&gt;
&lt;td&gt;Where Fargate or Workers wins&lt;/td&gt;
&lt;td&gt;Moved from 20M to about 2M invocations a month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;V8 isolates&lt;/td&gt;
&lt;td&gt;Cold start of 2 to 5ms on Workers&lt;/td&gt;
&lt;td&gt;No warm-up tax, no provisioned concurrency to buy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  1. The $0.20 Per Million Is a Loss Leader
&lt;/h2&gt;

&lt;p&gt;Lambda lists at $0.20 per million requests and $0.0000166667 per GB-second. People build their forecasts on this number. Then the bill arrives and the Lambda line item is a quarter of the total.&lt;/p&gt;

&lt;p&gt;A breakdown from a real account, redacted, one month:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Lambda invocations + duration          $1,847   (22%)
API Gateway                            $1,612   (19%)
CloudWatch Logs                        $1,398   (17%)
NAT Gateway hours                      $1,287   (15%)
Data egress                            $1,094   (13%)
X-Ray, KMS, Secrets, parameter store   $1,124   (13%)
                                     ─────────
Total                                  $8,362
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffn1nga3oqwgdstvelty0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffn1nga3oqwgdstvelty0.png" alt="Cost Breakdown - GLINR" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

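&lt;p&gt;Read that table again. Lambda is the largest single line, and it is still only 22% of the total. API Gateway, CloudWatch Logs, and NAT Gateway together come to $4,297, more than double the Lambda line. The headline product is the wrapper. The bundle around it is the bill.&lt;/p&gt;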
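&lt;p&gt;The shares in that breakdown are each line divided by the total. A short sketch of the arithmetic, using only the dollar figures from the breakdown. The variable and function names are mine.&lt;/p&gt;

```typescript
// Dollar figures copied from the one-month breakdown above.
const lambdaUsd = 1847;

const billLines: { item: string; usd: number }[] = [
  { item: "Lambda invocations + duration", usd: lambdaUsd },
  { item: "API Gateway", usd: 1612 },
  { item: "CloudWatch Logs", usd: 1398 },
  { item: "NAT Gateway hours", usd: 1287 },
  { item: "Data egress", usd: 1094 },
  { item: "X-Ray, KMS, Secrets, parameter store", usd: 1124 },
];

const billTotalUsd = billLines.reduce((sum, line) => sum + line.usd, 0);

// Share of the bill as a whole-number percentage.
function shareOfBillPercent(usd: number, totalUsd: number): number {
  return Math.round((usd / totalUsd) * 100);
}

// Everything on the bill that is not the Lambda line itself.
const nonLambdaUsd = billTotalUsd - lambdaUsd;

for (const line of billLines) {
  console.log(`${line.item}: ${shareOfBillPercent(line.usd, billTotalUsd)}%`);
}
```

&lt;p&gt;Run the same loop against your own Cost Explorer export before building a forecast on the per-request price.&lt;/p&gt;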

&lt;p&gt;Lambda by itself was fine. Lambda inside the AWS bundle was a loss leader for the bundle. CloudWatch Logs ingestion adds up even for functions that log a single audit line per call. NAT Gateway hours rack up once your function sits in a VPC to reach a private RDS and still has to call anything outside that VPC, which is, you know, basically every real function. Data egress at $0.09 per gigabyte compounds with every response payload.&lt;/p&gt;

&lt;p&gt;We have a name for this pattern in our internal docs. The Token Tab. The headline price advertises the wrapper. The real money is in the orchestration around it. AWS did not invent this pattern. AWS just runs the most disciplined version of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. INIT Billing Changed the Floor (And Nobody Noticed)
&lt;/h2&gt;

&lt;p&gt;In August 2025, AWS quietly changed how Lambda billed cold start initialization. Before the change, INIT time was free for managed runtimes. Now it bills as duration.&lt;/p&gt;

&lt;p&gt;For a JVM function with 800ms of init, every cold invocation bills an extra 800ms of duration. If the handler itself runs for one second, that is an extra 80% on top of the actual work. Boot up Spring Boot? Pay for the boot. Cold start your fat Python ML container? Pay for the import storm.&lt;/p&gt;
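&lt;p&gt;A sketch of the INIT arithmetic. The per GB-second rate is the list price quoted in section 1. The one-second handler and the 1 GB memory size are assumptions for illustration, not measurements from our account.&lt;/p&gt;

```typescript
// List price quoted in section 1 of this piece.
const USD_PER_GB_SECOND = 0.0000166667;

// Duration charge for one invocation at a given memory size.
function durationCostUsd(billedMs: number, memoryGb: number): number {
  return (billedMs / 1000) * memoryGb * USD_PER_GB_SECOND;
}

// Extra billed time on a cold invocation, as a percentage of the handler's own time.
function coldSurchargePercent(initMs: number, handlerMs: number): number {
  return (initMs * 100) / handlerMs;
}

// Assumed numbers: 800ms init, 1000ms handler, 1 GB of memory.
const initMs = 800;
const handlerMs = 1000;
const memoryGb = 1;

const warmCostUsd = durationCostUsd(handlerMs, memoryGb);
const coldCostUsd = durationCostUsd(handlerMs + initMs, memoryGb);
const surchargePercent = coldSurchargePercent(initMs, handlerMs);
```

&lt;p&gt;The shorter the handler, the worse the ratio. The same 800ms of init on a 100ms handler is an 800% surcharge on every cold call.&lt;/p&gt;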

&lt;p&gt;The change shipped in a release note. Not a keynote. Not a tweet from a Principal Engineer. A release note.&lt;/p&gt;

&lt;p&gt;We did not find out about it until April 2026 when we audited a function that was supposed to cost $40 and was billing $110.&lt;/p&gt;

&lt;p&gt;Reports floating around early 2026 (&lt;a href="https://medium.com/infradecodedops/aws-lambda-is-your-worst-performing-cost-saver-the-2026-cold-start-data-will-shock-you-0822418e57c8" rel="noopener noreferrer"&gt;source: InfraDecodedOps cold-start teardown&lt;/a&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;23% of customer-facing Lambda invocations hit a cold start&lt;/li&gt;
&lt;li&gt;p99 cold latency of 1.8 seconds on those endpoints&lt;/li&gt;
&lt;li&gt;AWS counters that fewer than 1% of invocations across Lambda are cold&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both numbers can be true at once. The 1% headline is an average across all of Lambda, dominated by hot, internal, async functions running at scale. The 23% reflects what your users actually feel on your auth endpoint at 6:42am UTC when traffic is sparse.&lt;/p&gt;

&lt;p&gt;SnapStart helps for Java, Python, .NET. SnapStart does not help for the part of your bill that is API Gateway and NAT.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. The Crossover Moved (And It Moved a Lot)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ib7lr30517fbhi4onob.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ib7lr30517fbhi4onob.png" alt="Crossover GRAPH" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Folk wisdom in 2022 was that Lambda was cheaper until you hit roughly 20 million invocations a month. After that, Fargate. After that, real instances.&lt;/p&gt;

&lt;p&gt;With INIT billing layered onto the bundle pricing, &lt;a href="https://dev.to/alanwest/aws-lambdas-hidden-costs-when-to-migrate-to-containers-and-how-2h1n"&gt;the crossover is closer to 2 million&lt;/a&gt; for typical API workloads. That is an order of magnitude shift in three years. Nobody updated the blog posts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Monthly cost
   ▲
   │                    ╱── Lambda + bundle
   │              ╱────╱
   │        ╱────╱
   │  ╱────╱─────────── Fargate (1 task, ALB)
   │═════════════════════ Workers + KV/D1
   │
   └──────────────────────►
    0    1M   2M    5M   10M  invocations/mo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Workers stays close to flat because Workers bills requests and CPU time, not wall clock. If your function waits 200ms on a database response but only burns 15ms of CPU, you pay for 15ms. Lambda bills for the full 200ms.&lt;/p&gt;

&lt;p&gt;That is not a small detail. That is the whole game. The typical API spends 80% of its time waiting on a database, a downstream service, or an LLM. Lambda charges you for the wait. Workers does not.&lt;/p&gt;
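&lt;p&gt;The difference is easier to see as billed milliseconds. A sketch using the 200ms and 15ms figures from above. The type and function names are mine, and it compares billed time only, because the per-millisecond rates differ between platforms.&lt;/p&gt;

```typescript
// Timing profile of one request: total elapsed time and time actually on the CPU.
interface InvocationProfile {
  wallMs: number;
  cpuMs: number;
}

type BillingModel = "wall-clock" | "cpu-time";

// Milliseconds that end up on the invoice under each billing model.
function billableMs(profile: InvocationProfile, model: BillingModel): number {
  return model === "wall-clock" ? profile.wallMs : profile.cpuMs;
}

// Fraction of the request spent waiting on something else.
function waitFraction(profile: InvocationProfile): number {
  return (profile.wallMs - profile.cpuMs) / profile.wallMs;
}

// The example from the text: 200ms end to end, 15ms of real CPU work.
const dbBoundCall: InvocationProfile = { wallMs: 200, cpuMs: 15 };

const wallBilled = billableMs(dbBoundCall, "wall-clock");
const cpuBilled = billableMs(dbBoundCall, "cpu-time");
const billedTimeRatio = wallBilled / cpuBilled;
```

&lt;p&gt;For this profile the wall-clock model bills a little over thirteen times as many milliseconds for the same request.&lt;/p&gt;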

&lt;h2&gt;
  
  
  4. The Cope Chart (Optimizations That Do Not Save You)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjexak1f75irjun1tjge.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjexak1f75irjun1tjge.png" alt="Cope Chart" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Things we tried in 2024 and 2025 to "fix" our Lambda bill:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tactic&lt;/th&gt;
&lt;th&gt;Effect on Lambda line&lt;/th&gt;
&lt;th&gt;Effect on total bill&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Trim function memory&lt;/td&gt;
&lt;td&gt;-8% on Lambda&lt;/td&gt;
&lt;td&gt;-1.8% on total&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Switch Python to Node&lt;/td&gt;
&lt;td&gt;-12% on Lambda&lt;/td&gt;
&lt;td&gt;-2.6% on total&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ARM64 (Graviton2)&lt;/td&gt;
&lt;td&gt;-15% on Lambda&lt;/td&gt;
&lt;td&gt;-3.3% on total&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add provisioned concurrency&lt;/td&gt;
&lt;td&gt;+30% on Lambda&lt;/td&gt;
&lt;td&gt;+6.6% on total&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SnapStart on JVM functions&lt;/td&gt;
&lt;td&gt;-22% on Lambda&lt;/td&gt;
&lt;td&gt;-4.8% on total&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audit and prune CloudWatch retention&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;-9% on total&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Add VPC endpoints to kill NAT&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;-14% on total&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You see the pattern. The Lambda optimizations move the Lambda line a little. The bundle optimizations move the bill a lot. We spent a year picking up dimes on the Lambda line while the bundle was charging us in 50s.&lt;/p&gt;
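&lt;p&gt;The right-hand column of that table is the middle column multiplied by Lambda's share of the bill. A sketch of that arithmetic, assuming the 22% share from section 1. The function name is mine.&lt;/p&gt;

```typescript
// Lambda's share of the total bill, from the breakdown in section 1.
const LAMBDA_SHARE_OF_BILL = 0.22;

// A change that only touches the Lambda line moves the total by that change times Lambda's share.
function effectOnTotalPercent(effectOnLambdaPercent: number, lambdaShare: number): number {
  return effectOnLambdaPercent * lambdaShare;
}

// Two rows from the table above.
const arm64OnTotal = effectOnTotalPercent(-15, LAMBDA_SHARE_OF_BILL);
const provisionedOnTotal = effectOnTotalPercent(30, LAMBDA_SHARE_OF_BILL);

console.log(`ARM64: ${arm64OnTotal.toFixed(1)}% on total`);
console.log(`Provisioned concurrency: +${provisionedOnTotal.toFixed(1)}% on total`);
```

&lt;p&gt;No Lambda-only tactic can move the total by more than 22%, and that ceiling needs the Lambda line to drop to zero.&lt;/p&gt;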

&lt;p&gt;The audit-and-kill-bundle tactics are what actually save money on AWS. Nobody writes blog posts about them because they are not sexy. There is no AWS re:Invent talk about deleting your unused log groups. There should be.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Workers vs Lambda, Side by Side
&lt;/h2&gt;

&lt;p&gt;A real example. A signed URL generator for S3, ported to R2 on Workers.&lt;/p&gt;

&lt;p&gt;Lambda version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;S3Client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;GetObjectCommand&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-sdk/client-s3&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;getSignedUrl&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@aws-sdk/s3-request-presigner&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;S3Client&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;us-east-1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathParameters&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;missing key&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;GetObjectCommand&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;assets&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getSignedUrl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;expiresIn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cold start with the SDK loaded takes 400 to 700ms. Warm at 30 to 80ms. Bills wall clock. Behind API Gateway. Logs go to CloudWatch by default. Outbound to S3 may hit NAT depending on VPC config.&lt;/p&gt;

&lt;p&gt;Workers version with R2, which streams the object through the bucket binding instead of handing back a signed URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
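&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Env lists this Worker's bindings. ASSETS is assumed to be an R2 bucket binding;&lt;/span&gt;
&lt;span class="c1"&gt;// R2Bucket is an ambient type from @cloudflare/workers-types.&lt;/span&gt;
&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;Env&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;ASSETS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;R2Bucket&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;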
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;missing key&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ASSETS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;not found&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;404&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cache-control&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;public, max-age=300&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cold start at 2 to 5ms (&lt;a href="https://blog.rebalai.com/en/2026/03/09/cloudflare-workers-vs-aws-lambda-which-edge-runtim/" rel="noopener noreferrer"&gt;Cloudflare's own dashboards&lt;/a&gt; and a six-month production comparison back this up). R2 egress to the public internet is zero. No log retention to forget about. No NAT in the path. The pricing page does not need a footnote.&lt;/p&gt;

&lt;p&gt;Same job. Different platform. The bill says everything.&lt;/p&gt;

&lt;h2&gt;
  
  
  5b. The Platform Comparison
&lt;/h2&gt;

&lt;p&gt;Here is the table that should be at the top of every serverless conversation in 2026. Pricing is approximate and cold starts come from production benchmarks reported by independent teams. The links go to canonical comparison or pricing pages for each platform.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Cold start&lt;/th&gt;
&lt;th&gt;Billing model&lt;/th&gt;
&lt;th&gt;Free egress&lt;/th&gt;
&lt;th&gt;Sweet spot&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.vantage.sh/blog/cloudflare-workers-vs-aws-lambda-cost" rel="noopener noreferrer"&gt;AWS Lambda&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;200-700ms (cold), bills INIT since Aug 2025&lt;/td&gt;
&lt;td&gt;Wall clock + bundle&lt;/td&gt;
&lt;td&gt;No ($0.09/GB)&lt;/td&gt;
&lt;td&gt;AWS-native event glue, GPU jobs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://leaper.dev/blog/cloudflare-workers-vs-lambda-2026" rel="noopener noreferrer"&gt;Cloudflare Workers&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;2-5ms (V8 isolate)&lt;/td&gt;
&lt;td&gt;CPU time only&lt;/td&gt;
&lt;td&gt;Yes (R2 zero)&lt;/td&gt;
&lt;td&gt;HTTP APIs, edge, webhooks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://blog.rebalai.com/en/2026/03/09/cloudflare-workers-vs-aws-lambda-which-edge-runtim/" rel="noopener noreferrer"&gt;Google Cloud Run&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;50-200ms&lt;/td&gt;
&lt;td&gt;Wall clock&lt;/td&gt;
&lt;td&gt;Yes within region&lt;/td&gt;
&lt;td&gt;Container portability, ML inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://leanopstech.com/blog/aws-lambda-pricing-2026/" rel="noopener noreferrer"&gt;Azure Functions Premium&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Pre-warmed (0)&lt;/td&gt;
&lt;td&gt;Flat + per-exec&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Enterprise, no-cold-start needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://dev.to/alanwest/aws-lambdas-hidden-costs-when-to-migrate-to-containers-and-how-2h1n"&gt;Fly.io Machines&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;250ms-2s (cold)&lt;/td&gt;
&lt;td&gt;Per-second machine&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Stateful regional apps, full VM control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://leaper.dev/blog/cloudflare-workers-vs-lambda-2026" rel="noopener noreferrer"&gt;Railway&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Container boot&lt;/td&gt;
&lt;td&gt;Usage-based&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Solo and small team backend hosting&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://leaper.dev/blog/cloudflare-workers-vs-lambda-2026" rel="noopener noreferrer"&gt;Baselime reported&lt;/a&gt; an 80% cloud-cost drop after migrating from AWS to Cloudflare. &lt;a href="https://www.sitepoint.com/case-study-cloud-to-local-ai-pwa/" rel="noopener noreferrer"&gt;Sitepoint published a case study&lt;/a&gt; where a team moved their Lambda proxy entirely into the browser and cut their bill from $2,400 to $140 a month. Different workloads, same direction.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. The Honest Counter (And Why It Does Not Save Lambda)
&lt;/h2&gt;

&lt;p&gt;Three places Lambda still earns its keep. We are not pretending otherwise.&lt;/p&gt;

&lt;p&gt;GPU and long-running compute. Workers has a 30-second CPU cap and no GPU. If you are doing inference, video transcoding, or anything past a 30-second budget, Lambda or SageMaker or a real instance is the answer. We kept zero of these. We do not run any.&lt;/p&gt;

&lt;p&gt;Heavy AWS-native event glue. If your workload is S3 to DynamoDB to SQS to Step Functions and you never leave AWS, the orchestration bundle is the platform, and Lambda is the right glue. We kept four functions for this. They run for cents.&lt;/p&gt;

&lt;p&gt;Sparse async cron jobs. A nightly batch at 3am that runs once a day for 200ms? Lambda is essentially free at that scale. Workers Cron Triggers are also fine. Either works.&lt;/p&gt;

&lt;p&gt;Notice what is absent from that list. The typical API. The webhook receiver. The edge function. The auth gateway. The fan-out worker that most teams actually deploy. That whole pile is the dead zone for Lambda in 2026.&lt;/p&gt;

&lt;p&gt;If your team's Lambda fleet is mostly the typical API workload, you are in the dead zone whether you have noticed yet or not. The bill will tell you eventually. Usually after a quarterly investor update where someone asks why infrastructure spend grew faster than revenue.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. The Migration Playbook
&lt;/h2&gt;

&lt;p&gt;If you want to move the workloads that no longer make sense on Lambda:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pull a 30-day Cost Explorer report. Tag by Lambda function name. Find the top 10 by total cost.&lt;/li&gt;
&lt;li&gt;For each, count outbound calls per invocation, average duration, average response size, cold start frequency, and whether it crosses a NAT.&lt;/li&gt;
&lt;li&gt;Candidates that move first are HTTP-fronted, low-CPU, high-invocation, network-bound. Those are the ones bleeding wall-clock dollars.&lt;/li&gt;
&lt;li&gt;Port to Workers or Cloud Run. Keep the AWS-native event glue on Lambda. Do not try a big-bang migration. Move one function, watch it for two days, move the next.&lt;/li&gt;
&lt;li&gt;Watch the bill for two cycles. Decommission API Gateway routes and CloudWatch log groups as functions go quiet. The teardown is where 30% of the savings actually live.&lt;/li&gt;
&lt;/ol&gt;
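
&lt;p&gt;Steps 1 through 3 can be sketched as a small filter over per-function stats. This is a minimal sketch, not the tooling we used: the &lt;code&gt;FunctionStats&lt;/code&gt; fields and both thresholds are assumptions chosen for illustration, and the numbers would come from your own Cost Explorer and CloudWatch exports.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class FunctionStats:
    """Per-function numbers for a 30-day window (all fields hypothetical)."""
    name: str
    monthly_cost_usd: float
    invocations: int
    avg_duration_ms: float   # wall-clock time billed by Lambda
    avg_cpu_ms: float        # time actually spent on CPU
    http_fronted: bool       # sits behind API Gateway or a function URL


def top_by_cost(stats: list[FunctionStats], n: int = 10) -> list[FunctionStats]:
    """Step 1: the n most expensive functions."""
    return sorted(stats, key=lambda s: s.monthly_cost_usd, reverse=True)[:n]


def is_first_mover(s: FunctionStats,
                   min_invocations: int = 1_000_000,
                   max_cpu_fraction: float = 0.2) -> bool:
    """Step 3: HTTP-fronted, high-invocation, and mostly waiting on the network.

    Both thresholds are illustrative assumptions, not measured cutoffs.
    """
    if s.avg_duration_ms == 0:
        return False
    cpu_fraction = s.avg_cpu_ms / s.avg_duration_ms
    return (s.http_fronted
            and s.invocations >= min_invocations
            and max_cpu_fraction >= cpu_fraction)


def migration_candidates(stats: list[FunctionStats]) -> list[str]:
    """Names of top-cost functions that fit the first-mover profile."""
    return [s.name for s in top_by_cost(stats) if is_first_mover(s)]
```

&lt;p&gt;Whatever passes the filter is only a shortlist. Step 4 still applies: move one function, watch it, then move the next.&lt;/p&gt;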

&lt;p&gt;We did this over six weeks. The hardest part was not the rewrites. The hardest part was untangling which CloudWatch log groups still mattered and which ones quietly cost $40 a month to keep indexed. The second hardest part was convincing one of our seniors that the AWS SDK he had written wrappers around for two years was the part we were throwing out.&lt;/p&gt;

&lt;h2&gt;
  
  
  8. What We Are Not Coming Back For
&lt;/h2&gt;

&lt;p&gt;Lambda was the right shape in 2015. The shape of cloud has moved. V8 isolates that start in 2ms. Container platforms with sub-second cold boots. Edge runtimes that bill CPU instead of wall clock. Those are the new default for the workload Lambda used to own.&lt;/p&gt;

&lt;p&gt;We will not migrate back. Not because Workers is perfect, but because the bundle math does not reverse. If AWS dropped Lambda pricing to $0.10 per million tomorrow, our bill would drop by 11%. The other 89% would still be there, and most of it is the orchestration tax.&lt;/p&gt;
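
&lt;p&gt;The arithmetic behind that 11% can be run backwards. A minimal sketch, assuming the only thing that changes is the per-request fee and that it halves from $0.20 to $0.10 per million; the function name is hypothetical.&lt;/p&gt;

```python
def implied_request_share(bill_drop: float, old_price: float, new_price: float) -> float:
    """Fraction of the total bill that must be per-request fees
    for a price cut to shrink the whole bill by `bill_drop`."""
    price_cut = 1 - new_price / old_price   # 0.5 when the fee halves
    return bill_drop / price_cut


share = implied_request_share(bill_drop=0.11, old_price=0.20, new_price=0.10)
print(f"request fees: {share:.0%} of the bill")
print(f"left on the bill after the cut: {1 - 0.11:.0%}")
```

&lt;p&gt;Under those assumptions request fees come to roughly 22% of the bill, so even free requests would leave most of it standing.&lt;/p&gt;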

&lt;p&gt;The $0.20 per million was a beautiful marketing line. That was never the price. The price was always the bundle.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;If your team is building an API in 2026 and Lambda is the first thing on the architecture diagram, ask why. Ask it out loud. The honest answer is almost always organizational momentum, not technical fit. Someone built this pattern at their last job. The first hire wired it up because the AWS reference architecture said to. Nobody questioned it because nobody had time to audit Cost Explorer.&lt;/p&gt;

&lt;p&gt;We had that conversation. Six weeks later our bill was 79% smaller and our p99 was 22 times faster.&lt;/p&gt;

&lt;p&gt;Lambda is dead for the workload most teams actually run. The pricing page never told you why. The bill always will.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;GDS K S&lt;/strong&gt; · &lt;a href="https://thegdsks.com" rel="noopener noreferrer"&gt;thegdsks.com&lt;/a&gt; · follow on X &lt;a href="https://x.com/thegdsks" rel="noopener noreferrer"&gt;@thegdsks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Lambda is dead for the workloads most teams actually run. The bundle around it was always the real product.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://medium.com/infradecodedops/aws-lambda-is-your-worst-performing-cost-saver-the-2026-cold-start-data-will-shock-you-0822418e57c8" rel="noopener noreferrer"&gt;Sandesh, &lt;em&gt;AWS Lambda Is Your Worst-Performing Cost-Saver: The 2026 Cold Start Data&lt;/em&gt;&lt;/a&gt;. Source for 23% cold start figure and 1.8s p99.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://leanopstech.com/blog/aws-lambda-pricing-2026/" rel="noopener noreferrer"&gt;LeanOps, &lt;em&gt;AWS Lambda Pricing 2026: Costs, Fees and Hidden Traps&lt;/em&gt;&lt;/a&gt;. Source for 20-40% base Lambda share of total spend.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/alanwest/aws-lambdas-hidden-costs-when-to-migrate-to-containers-and-how-2h1n"&gt;Alan West, &lt;em&gt;AWS Lambda's Hidden Costs: When to Migrate to Containers&lt;/em&gt;&lt;/a&gt;. Source for the crossover point math.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.vantage.sh/blog/cloudflare-workers-vs-aws-lambda-cost" rel="noopener noreferrer"&gt;Vantage, &lt;em&gt;Cloudflare Workers vs AWS Lambda Cost&lt;/em&gt;&lt;/a&gt;. Source for CPU-time vs wall-clock pricing breakdown.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://leaper.dev/blog/cloudflare-workers-vs-lambda-2026" rel="noopener noreferrer"&gt;Leaper, &lt;em&gt;Cloudflare Workers vs AWS Lambda 2026&lt;/em&gt;&lt;/a&gt;. Source for 50-80% cost reduction at scale and Baselime case.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://blog.rebalai.com/en/2026/03/09/cloudflare-workers-vs-aws-lambda-which-edge-runtim/" rel="noopener noreferrer"&gt;Rebal AI, &lt;em&gt;Cloudflare Workers vs AWS Lambda: Six Months of Production Reality&lt;/em&gt;&lt;/a&gt;. Source for six-month production comparison.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.sitepoint.com/case-study-cloud-to-local-ai-pwa/" rel="noopener noreferrer"&gt;SitePoint, &lt;em&gt;Case Study: Cloud to Local-First AI Migration&lt;/em&gt;&lt;/a&gt;. Source for the $2,400 to $140 PWA migration.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>cloudflare</category>
      <category>webdev</category>
    </item>
    <item>
      <title>The first time you watch an AI agent buy something, you will feel something you cannot name.</title>
      <dc:creator>GDS K S</dc:creator>
      <pubDate>Sun, 10 May 2026 17:08:53 +0000</pubDate>
      <link>https://dev.to/thegdsks/the-first-time-you-watch-an-ai-agent-buy-something-you-will-feel-something-you-cannot-name-35f3</link>
      <guid>https://dev.to/thegdsks/the-first-time-you-watch-an-ai-agent-buy-something-you-will-feel-something-you-cannot-name-35f3</guid>
      <description>&lt;p&gt;&lt;em&gt;A 91 second experiment, an $11.78 charge, and a moment of hesitation that surprised me more than the result.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I knew the agent was going to spend money. I had set the cap. I had created the Stripe Project. I had signed the OAuth flow. I had watched a YouTube demo of somebody else doing the same thing two hours earlier.&lt;/p&gt;

&lt;p&gt;When the moment came, my hand still moved toward Control C.&lt;/p&gt;

&lt;p&gt;The agent was 38 seconds into its run. It had checked the Cloudflare API, found &lt;code&gt;quiet-thunder-7821.dev&lt;/code&gt; available, queried the registrar, and received a 402 Payment Required response with a price of $11.78 for one year. The next thing it would do, in the next four seconds: charge the card.&lt;/p&gt;

&lt;p&gt;I let it.&lt;/p&gt;

&lt;p&gt;The charge cleared. The domain registered. The Worker deployed. The smoke test passed. The agent printed a URL. I copied the URL into a browser. The page said &lt;code&gt;hi from an agent&lt;/code&gt; in plain text on a white background. From the moment I ran the script to the moment the page rendered: 91 seconds.&lt;/p&gt;

&lt;p&gt;I sat there for a long minute after that, not doing anything. Just looking at the cursor.&lt;/p&gt;

&lt;p&gt;This piece is about that minute.&lt;/p&gt;

&lt;h2&gt;
  
  
  The protocol, briefly
&lt;/h2&gt;

&lt;p&gt;Cloudflare and Stripe shipped something this week called Machine Payments Protocol. The technical version is HTTP 402 with a JSON price body, OAuth scoped to a per agent Stripe Project, and a default $100 monthly spending cap that lives on Stripe's side. The marketing version is "agents can now provision Cloudflare accounts, register domains, and deploy applications without human intervention".&lt;/p&gt;

&lt;p&gt;The cap is the part that lets you sleep. Stripe enforces it server side. The agent cannot raise it from inside its own runtime. If the agent goes wild and tries to spend $5,000 on premium domains, Stripe stops it at $100 and you get a notification. The blast radius stays bounded.&lt;/p&gt;
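
&lt;p&gt;The decision an agent faces on a 402 can be sketched in a few lines. This is an illustration only, not the Machine Payments Protocol API: the JSON field names (&lt;code&gt;amount_usd&lt;/code&gt;, &lt;code&gt;currency&lt;/code&gt;), the &lt;code&gt;decide&lt;/code&gt; function, and the local budget check are all assumptions. The real cap is enforced on Stripe's side; a client-side check like this would only save a doomed round trip.&lt;/p&gt;

```python
import json
from decimal import Decimal

MONTHLY_CAP_USD = Decimal("100")   # the default cap described in the text


def decide(status: int, body: str, spent_usd: Decimal) -> str:
    """Return 'proceed', 'pay', or 'stop' for one HTTP response.

    `body` is assumed to be JSON like {"amount_usd": "11.78", "currency": "USD"}.
    That shape is hypothetical; check the real protocol docs.
    """
    if status != 402:
        return "proceed"                      # nothing to pay for
    quote = json.loads(body)
    if quote.get("currency") != "USD":
        return "stop"                         # this sketch only handles USD
    price = Decimal(str(quote["amount_usd"]))
    if spent_usd + price > MONTHLY_CAP_USD:
        return "stop"                         # would exceed the cap
    return "pay"
```

&lt;p&gt;Prices are parsed as &lt;code&gt;Decimal&lt;/code&gt; rather than floats so that cents add up exactly against the cap.&lt;/p&gt;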

&lt;p&gt;I knew that. I had read the docs page twice. The cap was not the reason my hand moved.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why my hand moved
&lt;/h2&gt;

&lt;p&gt;The thing I underestimated was the difference between "the agent could spend money" as a concept and "the agent is about to spend my money" as a live event.&lt;/p&gt;

&lt;p&gt;I have given AI tools access to my code repository. I have given them access to my email. I have given them production database read credentials. None of those felt the way this felt. The pattern is similar. The instinct was different.&lt;/p&gt;

&lt;p&gt;I think the difference is that money is the one resource I have a lifelong physical relationship with. I have handled cash. I have signed checks. I have watched receipts print at gas stations. My brain encoded "spending" through years of physical signal. The mental model for "reading email" did not get the same wiring. When the agent was about to spend, that part of my brain woke up and asked who had approved this.&lt;/p&gt;

&lt;p&gt;I had approved it. I had set the cap. The amount was small. My reasoning brain knew that. My reasoning brain was not driving my hand.&lt;/p&gt;

&lt;p&gt;I think a lot of the resistance to agentic payments over the next year is going to look like this. Not measured objections. Not policy debates. A specific and slightly embarrassing instinct that fires the first time you watch one happen.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thing I was not expecting
&lt;/h2&gt;

&lt;p&gt;The agent picked the domain name itself. I had told it to pick something in the format adjective-noun-number.dev. I had not told it to pick &lt;code&gt;quiet-thunder-7821&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I sat with that for longer than I sat with the payment. The payment was a transaction with bounded outcomes. The naming was an aesthetic choice. An aesthetic choice made by software, on my behalf, with my money, about a thing that would now exist in the world under my account and bill to my card every year if I did not delete it.&lt;/p&gt;

&lt;p&gt;I do not know what to do with that observation. The agent's choice was fine. &lt;code&gt;quiet-thunder-7821.dev&lt;/code&gt; is a perfectly cromulent domain. If you had asked me to pick one in 30 seconds I might have done worse.&lt;/p&gt;

&lt;p&gt;But the domain was not mine. The domain belonged to the agent's taste. The card was mine, the legal liability was mine, the renewal would be mine. The taste was the agent's.&lt;/p&gt;

&lt;p&gt;This is the part that nobody is talking about yet. The protocol is well designed. The cap is well placed. The OAuth scoping is correct. The unanswered question is what happens when agents start making aesthetic and judgment calls inside the bounds we set for them, and we discover that the bounds were the easy part.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I did with the domain
&lt;/h2&gt;

&lt;p&gt;I deleted it the next morning. The Worker came down. The DNS unwound. The Stripe Project archived itself with a complete ledger of every cent the agent had touched. Total spend on the experiment: $11.78 for the registration plus a fraction of a cent for the Worker compute.&lt;/p&gt;

&lt;p&gt;The deletion took six seconds. The registration was non refundable. So somewhere out there, in a Cloudflare ledger, $11.78 of mine paid for a domain that lived for 14 hours and whose only content was the string &lt;code&gt;hi from an agent&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I am fine with that. The 14 hour domain was the price of the demonstration. The demonstration was worth more than the domain.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I am going to do next
&lt;/h2&gt;

&lt;p&gt;The next experiment is bigger. I am going to give an agent a slightly larger budget and ask it to spin up a real piece of software. A small SaaS. Frontend, backend, storage, a Stripe Checkout flow that takes payments from real users and routes them to a real revenue split. End to end, from a cold start, with no human in the deploy loop.&lt;/p&gt;

&lt;p&gt;I am giving the agent a $200 monthly cap. I am giving it a virtual card with a $300 balance. I am giving it a domain budget of $20. I am scoping the Stripe Project so it cannot reach my main account.&lt;/p&gt;

&lt;p&gt;The reason I am doing this is not because I think it will work the first time. I think it will fail somewhere in the middle. I think the failure mode will be interesting. I think the cap will save me.&lt;/p&gt;

&lt;p&gt;The reason I am writing it down ahead of time is that I want to commit, in public, to running the experiment without bailing out at the moment my hand moves toward Control C. If you check back in two weeks, I will tell you what happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  The closing
&lt;/h2&gt;

&lt;p&gt;The first time you watch an AI agent buy something, you will feel something you cannot put a name on. The feeling will fire in the middle of an event you approved, sized correctly, capped appropriately, and conceptually understood. The feeling will not care.&lt;/p&gt;

&lt;p&gt;I think the right thing to do with the feeling is to notice it, decide whether to act on it, and let the agent finish if you decide not to. The cap is real. The audit trail is real. The protocol is well designed. The instinct that fires anyway is older than any of those things.&lt;/p&gt;

&lt;p&gt;If you are about to run your first agent payment, run it small. Run it on a thing you can delete. Run it when you have time to sit with the cursor for a minute afterward. The minute is part of the experience.&lt;/p&gt;

&lt;p&gt;What are you going to give an agent a card for?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by &lt;strong&gt;GDS K S&lt;/strong&gt; (&lt;a href="https://thegdsks.com" rel="noopener noreferrer"&gt;thegdsks.com&lt;/a&gt;), building &lt;a href="https://glincker.com" rel="noopener noreferrer"&gt;Glincker&lt;/a&gt;.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;If this was useful, follow me on &lt;a href="https://x.com/thegdsks" rel="noopener noreferrer"&gt;X / @thegdsks&lt;/a&gt;. I write about the parts of the AI stack vendors keep off the pricing page.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>cloudflare</category>
      <category>stripe</category>
    </item>
    <item>
      <title>Anthropic hit $30B ARR in 16 months. I went looking for where the money is actually coming from.</title>
      <dc:creator>GDS K S</dc:creator>
      <pubDate>Sat, 09 May 2026 16:41:10 +0000</pubDate>
      <link>https://dev.to/thegdsks/anthropic-hit-b-arr-in-16-months-i-went-looking-for-where-the-money-is-actually-coming-from-5f25</link>
      <guid>https://dev.to/thegdsks/anthropic-hit-b-arr-in-16-months-i-went-looking-for-where-the-money-is-actually-coming-from-5f25</guid>
      <description>&lt;p&gt;&lt;em&gt;A revenue chart that goes vertical does not tell you who pays. It tells you who gets charged.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A friend who runs infrastructure at a 200 person SaaS company called me on Monday. He had just gotten the quarterly bill from his AI vendor through an enterprise contract. He told me the number. I asked him to repeat it.&lt;/p&gt;

&lt;p&gt;The number ran high. Not "we are building a foundation model" high. High enough that he had a week of his life mapped out to write a proposal to bring some of the workloads on prem so the company could keep the AI features without the bill getting written into the next earnings call.&lt;/p&gt;

&lt;p&gt;That conversation kept me thinking. Anthropic's annualized revenue went from roughly $1 billion in late 2024 to $30 billion as of April 2026. A 30x in 16 months. That is not a SaaS curve. It is not a marketplace curve. It is a curve that comes from a small number of contracts each writing a number with a comma in it.&lt;/p&gt;

&lt;p&gt;So I went looking for the line items. Not the press releases. The places the money actually flows from. Here is what I found.&lt;/p&gt;

&lt;h2&gt;
  
  
  What just happened, in numbers
&lt;/h2&gt;

&lt;p&gt;The revenue trajectory:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Late 2024: ~$1 billion ARR&lt;/li&gt;
&lt;li&gt;Mid 2025: ~$5 billion ARR&lt;/li&gt;
&lt;li&gt;Late 2025: ~$15 billion ARR&lt;/li&gt;
&lt;li&gt;April 2026: ~$30 billion ARR&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In April, Anthropic passed OpenAI on revenue for the first time. The two companies shifted from "duopoly with one obvious leader" to "duopoly where the lead changes hands by quarter."&lt;/p&gt;

&lt;p&gt;Behind the headline number sits a much smaller story than most people want to make it. The breakdown that leaked out across coverage points to roughly three quarters of revenue coming from API calls, the rest from Pro and Max subscriptions plus smaller enterprise integrations. Within the API revenue, a handful of customers account for an outsized share. AWS, through Bedrock. Microsoft, through the new Copilot integrations. Three hyperscalers and four large coding tool vendors who resell Claude wrapped in their own products.&lt;/p&gt;

&lt;p&gt;When Anthropic says they hit $30 billion ARR, the honest reading is: a handful of large enterprise contracts, a handful of large coding tool vendors paying through the nose for Opus 4.7 access on behalf of their users, and a tail of API and subscription revenue from everyone else.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thing nobody is saying
&lt;/h2&gt;

&lt;p&gt;Individual developers do not write the $30 billion check. Their employers do, twice removed.&lt;/p&gt;

&lt;p&gt;Trace the path. A senior engineer at a mid sized company opens GitHub Copilot, picks Claude Sonnet 4.6 from the model dropdown that GitHub added last month, and asks it to refactor a function. That call goes to GitHub. GitHub forwards it to Anthropic. Anthropic charges GitHub the per token API rate. GitHub charges the company a Copilot Enterprise seat plus, starting June 1, premium request budgets. The company charges, ultimately, the customers of its product through whatever line item is closest to "engineering labor cost".&lt;/p&gt;

&lt;p&gt;Every leg of that chain takes margin. The model call costs $0.04 in raw inference. By the time three companies have wrapped, billed, and amortized that call, the end customer pays a multiple of the original cost.&lt;/p&gt;
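
&lt;p&gt;The compounding is easy to see with made-up numbers. Only the $0.04 raw inference cost comes from the paragraph above; the three markup percentages are hypothetical placeholders, not anyone's real margins.&lt;/p&gt;

```python
RAW_INFERENCE_USD = 0.04   # from the text

# Hypothetical markup at each leg of the chain (not real figures).
MARKUPS = [
    ("model vendor", 0.50),
    ("tool vendor", 0.40),
    ("employer, passed on to customers", 0.30),
]


def price_through_chain(cost: float, markups: list[tuple[str, float]]) -> float:
    """Apply each leg's markup on top of the price it was charged."""
    price = cost
    for _leg, markup in markups:
        price *= 1 + markup
    return price


end_price = price_through_chain(RAW_INFERENCE_USD, MARKUPS)
print(f"end price: ${end_price:.4f} ({end_price / RAW_INFERENCE_USD:.2f}x raw cost)")
```

&lt;p&gt;Markups multiply rather than add, which is why three modest-looking margins turn into a multiple of the original cost.&lt;/p&gt;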

&lt;p&gt;Anthropic gets the smallest cut by percentage. They get the largest cut by absolute dollars, because the volume runs enormous and the gross margin on inference at scale runs high. The wrappers earn the rest. The end users notice nothing because the cost lives buried in seat fees and monthly minimums.&lt;/p&gt;

&lt;p&gt;That is the mechanism. The $30 billion is not magic. The number reflects the visible part of an iceberg of indirect billing that runs through every developer tool that touched a Claude API key in the last 12 months.&lt;/p&gt;

&lt;h2&gt;
  
  
  Numbers that matter
&lt;/h2&gt;

&lt;p&gt;A few benchmarks to anchor the scale:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Vendor&lt;/th&gt;
&lt;th&gt;Approx ARR&lt;/th&gt;
&lt;th&gt;When&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;$30B&lt;/td&gt;
&lt;td&gt;April 2026&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Snowflake&lt;/td&gt;
&lt;td&gt;$4B&lt;/td&gt;
&lt;td&gt;Late 2024&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Databricks&lt;/td&gt;
&lt;td&gt;$3B&lt;/td&gt;
&lt;td&gt;Late 2024&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MongoDB&lt;/td&gt;
&lt;td&gt;$2B&lt;/td&gt;
&lt;td&gt;2024&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Anthropic now ranks larger by revenue than any independent data company in history. They got there in two years from product launch. The closest comparable: the early growth of AWS, which took eight years to reach the same scale and had Amazon's retail business funding it.&lt;/p&gt;

&lt;p&gt;The other number that matters: gross margin. Public analyst estimates put inference gross margin at large hyperscale operators in the 50 to 70 percent range. If Anthropic sits at the lower end, $30 billion ARR generates roughly $15 billion in gross profit. That covers a lot of training compute. Not yet a lot of net profit, because training the next model class still costs enough to consume most of the gross.&lt;/p&gt;

&lt;p&gt;Which is where the pressure on the consumer subscriptions comes from. Pro plan subscribers cost more to serve than they pay, on average, when they use Claude Code heavily. Enterprise customers pay per token and are profitable per call. The math points in one direction. Optimize for the segment that pays per use. Defend the consumer segment as a brand and onboarding play, but do not let it grow faster than the gross can carry.&lt;/p&gt;

&lt;h2&gt;
  
  
  The honest take
&lt;/h2&gt;

&lt;p&gt;The thing that makes this market different from past compute waves: cost scales sharply in both directions. A single team using Claude Code aggressively can spend more in a month than they did on cloud the entire previous year. A single agent loop run wrong can spend a thousand dollars overnight. The unit economics of an inference call now drive the unit economics of software.&lt;/p&gt;

&lt;p&gt;Exhilarating if you build the wrappers. Uncomfortable if you write the checks. The companies my friend works with are realizing that the line item labeled "AI tooling" will keep growing every quarter for the foreseeable future, because the tools are getting better at burning tokens, not just at writing code.&lt;/p&gt;

&lt;p&gt;Two things follow.&lt;/p&gt;

&lt;p&gt;First, the wrapper layer holds the next big margin compression. Every coding tool that resells Claude or GPT sells on convenience, not on inference cost. The convenience is real. The convenience is also not defensible. Anyone with two days and an API key can build a thinner wrapper for their own team that captures 80% of the value. The wrappers know this. That explains why GitHub, Cursor, and Windsurf have all announced model dropdowns and pricing tiers in the last 90 days. They are racing to build platform features around the model so that switching costs grow before the customer notices the bill.&lt;/p&gt;

&lt;p&gt;Second, the agentic workflow becomes the new unit. Inference cost per chat message converges across vendors and approaches a floor. Inference cost per agent run varies by ten times depending on how you write the agent. The next wave of optimization work happens here, and individual engineering teams can actually move the needle on their own bills without waiting for the vendor to drop prices.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I am doing about it
&lt;/h2&gt;

&lt;p&gt;Three concrete things, in case they help.&lt;/p&gt;

&lt;p&gt;I cancelled the AI seat subscription that I was using for chat and switched to direct API access via a small script. The gross saving is small. The visibility is large. I now know exactly what each conversation costs.&lt;/p&gt;

&lt;p&gt;I built a multi model router that picks the cheapest model that can do the job. About 60 percent of my queries route to Haiku. The bill dropped 41 percent in the first month with no perceptible quality loss on the routed traffic.&lt;/p&gt;
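
&lt;p&gt;A router like that does not need much. The sketch below shows the idea, not the actual router: the relative costs, the capability tiers, and the keyword-and-length heuristic are all assumptions for illustration. A real router would classify requests with something better than string matching.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    relative_cost: float   # hypothetical cost units, not real prices
    capability: int        # 1 = simple tasks, 3 = hardest tasks


MODELS = [
    Model("haiku", 1.0, 1),
    Model("sonnet", 4.0, 2),
    Model("opus", 20.0, 3),
]

HARD_HINTS = ("refactor", "architecture", "prove")   # illustrative keywords


def required_capability(prompt: str) -> int:
    """Crude stand-in for a real difficulty classifier."""
    text = prompt.lower()
    if any(hint in text for hint in HARD_HINTS):
        return 3
    if len(text) > 2000:
        return 2
    return 1


def route(prompt: str) -> Model:
    """Pick the cheapest model whose capability covers the prompt."""
    need = required_capability(prompt)
    able = [m for m in MODELS if m.capability >= need]
    return min(able, key=lambda m: m.relative_cost)
```

&lt;p&gt;For example, &lt;code&gt;route("What is the capital of France?")&lt;/code&gt; lands on the cheapest tier, while a prompt that mentions a refactor goes to the most capable one.&lt;/p&gt;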

&lt;p&gt;I started logging the cost of every AI call in our internal projects, broken out by feature. The first week of data was an unflattering surprise. One feature accounted for half the bill. We cut its budget by limiting context length and the entire feature still works. Nobody noticed.&lt;/p&gt;
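
&lt;p&gt;Per-feature logging is mostly bookkeeping. A minimal sketch, assuming you can read input and output token counts off each response; the per-million-token prices and the &lt;code&gt;CostLedger&lt;/code&gt; class are placeholders, not real rates or real code from our projects.&lt;/p&gt;

```python
from collections import defaultdict

# Placeholder prices in USD per million tokens (assumptions, not real rates).
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


class CostLedger:
    """Accumulates estimated spend per feature."""

    def __init__(self) -> None:
        self.by_feature: dict[str, float] = defaultdict(float)

    def record(self, feature: str, input_tokens: int, output_tokens: int) -> float:
        """Add one call's estimated cost to its feature and return it."""
        cost = (input_tokens * PRICE_PER_MTOK["input"]
                + output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000
        self.by_feature[feature] += cost
        return cost

    def shares(self) -> dict[str, float]:
        """Each feature's fraction of total spend."""
        total = sum(self.by_feature.values())
        if total == 0:
            return {}
        return {name: cost / total for name, cost in self.by_feature.items()}
```

&lt;p&gt;Calling &lt;code&gt;shares()&lt;/code&gt; after a week of traffic is the quickest way to spot a single feature eating half the bill.&lt;/p&gt;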

&lt;p&gt;None of this is novel. Old infrastructure habits applied to a new compute primitive. The only reason these moves feel novel: the AI vendor pitch has read "do not worry about the cost, the wrapper handles it" for so long that thinking about the bill counts as a contrarian position.&lt;/p&gt;

&lt;h2&gt;
  
  
  The closing
&lt;/h2&gt;

&lt;p&gt;The headline number is real. The story behind it runs smaller and stranger than it sounds. A handful of contracts with hyperscalers. A handful of coding tool vendors paying through the nose to put Opus 4.7 in front of their users. The rest: a fast growing tail of API and subscription revenue from individual developers and small teams.&lt;/p&gt;

&lt;p&gt;Anthropic earned the number. They built the better model, they shipped the better agent, and the market rewarded them with a revenue chart that reads more like an oil discovery than a software company. None of that comes off as wrong.&lt;/p&gt;

&lt;p&gt;Worth keeping in mind: a feedback loop funds the chart. Better model. More usage. Higher bill. Higher bill funds more training. More training produces a better model. The loop runs as long as the customers keep paying without flinching. The day they flinch will be the day the wrapper market starts compressing and the routing layer becomes the most valuable real estate in the AI stack.&lt;/p&gt;

&lt;p&gt;I think that day comes sooner than the press releases suggest. The pricing tremors of April 2026 are the early reading. Watch the bill.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by &lt;strong&gt;GDS K S&lt;/strong&gt; (&lt;a href="https://thegdsks.com" rel="noopener noreferrer"&gt;thegdsks.com&lt;/a&gt;), building &lt;a href="https://glincker.com" rel="noopener noreferrer"&gt;Glincker&lt;/a&gt;.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;If this was useful, follow me on &lt;a href="https://x.com/thegdsks" rel="noopener noreferrer"&gt;X / @thegdsks&lt;/a&gt;. I write about the parts of the AI stack vendors keep off the pricing page.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>anthropic</category>
      <category>ai</category>
      <category>startup</category>
      <category>economics</category>
    </item>
    <item>
      <title>Anthropic just rented Elon Musk's data center. The price of a Claude token is about to make sense.</title>
      <dc:creator>GDS K S</dc:creator>
      <pubDate>Thu, 07 May 2026 13:25:45 +0000</pubDate>
      <link>https://dev.to/thegdsks/anthropic-just-rented-elon-musks-data-center-the-price-of-a-claude-token-is-about-to-make-sense-lc3</link>
      <guid>https://dev.to/thegdsks/anthropic-just-rented-elon-musks-data-center-the-price-of-a-claude-token-is-about-to-make-sense-lc3</guid>
      <description>&lt;p&gt;&lt;em&gt;A 300 megawatt deal, a $180 a month decision, and the strange feeling of having a vendor solve your problem 12 minutes before you were about to pay them more.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I had the Anthropic billing page open. Two columns. Max 5x on the left, what I was paying. Max 20x on the right, what I was about to pay. The difference was $180 a month. The reason for the difference was that my Tuesday night Claude Code sessions had started bumping into the peak hour throttle, and the throttle was making the agent loop sluggish in the exact moments I needed it to be fast.&lt;/p&gt;

&lt;p&gt;I was on the upgrade page because I had decided, two days earlier, that another $180 a month was a fair price for not having to feel the throttle. I had walked away from the page once already. I had come back. The cursor was hovering over the Confirm button.&lt;/p&gt;

&lt;p&gt;A friend texted me a CNBC link.&lt;/p&gt;

&lt;p&gt;I read the headline. I read the subhead. I closed the upgrade tab.&lt;/p&gt;

&lt;p&gt;The headline said Anthropic had signed a deal with SpaceX for 300 megawatts of compute capacity at the Memphis Colossus 1 data center. Two hundred and twenty thousand NVIDIA GPUs coming online within a month. The blog post Anthropic put out alongside it said the new capacity would lift peak hour reductions on Pro and Max accounts and raise per request limits on Opus.&lt;/p&gt;

&lt;p&gt;In other words, the throttle I was about to pay $180 a month to outrun was about to ease on its own. Within 30 days. For free.&lt;/p&gt;

&lt;p&gt;This piece is about why I think that timing matters more than the headline lets on.&lt;/p&gt;

&lt;h2&gt;
  
  
  The throttle was telling us something
&lt;/h2&gt;

&lt;p&gt;For the last 60 days, every Anthropic pricing headline read like a tighter limit. April 21, the brief disappearance of Claude Code from the $20 Pro plan. Late April, the weekly caps creeping down. Early May, the peak hour reductions that pushed me to consider Max 20x in the first place.&lt;/p&gt;

&lt;p&gt;I had read those moves the way I think a lot of users read them: as repricing. As Anthropic discovering that the bundle did not work at current usage levels and starting to walk it back. The implicit story was about cost.&lt;/p&gt;

&lt;p&gt;The SpaceX deal tells me the story was always about capacity. Not what the inference cost, but how many requests the existing fleet could serve at peak. When you cannot serve every customer who wants in, you do not raise prices. You throttle. The throttle looks like a pricing problem from outside. Inside, the throttle is a queue.&lt;/p&gt;

&lt;p&gt;A 300 megawatt deal solves a queue. Repricing solves a cost problem. The fact that Anthropic chose to spend a number with three commas in it on capacity, rather than push a price increase to the existing fleet, tells you which problem they thought they had.&lt;/p&gt;

&lt;p&gt;The implication for those of us watching the pricing page: the moves of the last two months were tactical, not structural. They are reversible. The SpaceX deal is the reversal.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changes for the Max 5x holder
&lt;/h2&gt;

&lt;p&gt;I am going to be specific because vague is what made me almost hit Confirm.&lt;/p&gt;

&lt;p&gt;Max 5x today gets me roughly 5x the messages and Claude Code usage of Pro, with peak hour reductions kicking in around 4 PM Eastern on weekdays. The reductions are not a hard cap. They slow my agent loop down by about 30 to 40 percent during the affected window. On a Tuesday night when I am in the middle of a refactor, that 30 to 40 percent feels like sand in the gears.&lt;/p&gt;

&lt;p&gt;Max 20x removes most of the peak reductions and lifts the message ceiling four times over. The price is around $200 a month, give or take depending on the billing cycle.&lt;/p&gt;

&lt;p&gt;The SpaceX capacity comes online in 30 days. According to Anthropic's own messaging, peak hour reductions on Max accounts ease as that capacity lights up. The question for me, sitting on the upgrade page, became: am I willing to pay $180 a month for the next 30 days of unthrottled access, knowing that by month two the throttle on Max 5x will likely ease anyway?&lt;/p&gt;

&lt;p&gt;I closed the tab. I am going to let the 30 days play out. If by mid June my Tuesday nights still feel like sand, I will revisit. If they do not, I just kept $180.&lt;/p&gt;

&lt;p&gt;I am not telling everybody on Max 5x to do the same. I am suggesting everybody on Max 5x should pause before clicking Confirm. The new capacity is real and the timeline is short.&lt;/p&gt;

&lt;h2&gt;
  
  
  The wrappers should be more nervous than they look
&lt;/h2&gt;

&lt;p&gt;Cursor, GitHub Copilot, Windsurf. They all resell Claude under their own pricing. They all spent the last two months absorbing the same throttle I was feeling. Their tactic: a pricing change of their own (premium request budgets, daily quotas, model dropdowns) that pushed some of the cost back onto the user.&lt;/p&gt;

&lt;p&gt;When Anthropic's serving capacity doubles in 30 days, the wrappers lose part of their pitch. The pitch was something like: we route around the capacity issues so you do not have to. If there are no capacity issues to route around, the routing layer is doing less work, and the user is paying for that routing in a seat fee that just got harder to justify.&lt;/p&gt;

&lt;p&gt;I do not think the wrappers go away. I think the conversation with their CFO gets harder in Q3. The pitch that mattered for the last 12 months was "we provide reliable capacity". The pitch that will matter for the next 12 is "we provide better routing than you can build yourself". The first was easy to charge for. The second is a much harder sell when the user can write 200 lines of TypeScript and own the routing themselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  The orbital footnote
&lt;/h2&gt;

&lt;p&gt;I want to mention this and then drop it, because I do not want it to swallow the piece. The SpaceX press release also said Anthropic had expressed interest in working with SpaceX to develop gigawatts of compute capacity in space. Not next year. Not anytime soon. The phrase: "expressed interest".&lt;/p&gt;

&lt;p&gt;That sentence will spawn a thousand think pieces about orbital data centers. Most of them will be silly. The line shows up in the press release because the comparative cost of cooling in orbit gets interesting once you are spending hundreds of millions a year on terrestrial cooling, and SpaceX has the only credible path to putting mass in orbit at a price that makes the math work.&lt;/p&gt;

&lt;p&gt;I am not writing that think piece today. I just want the record to show that the line was in the announcement. In 18 months somebody will argue Anthropic hid it, and the receipts will say otherwise.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I am going to do
&lt;/h2&gt;

&lt;p&gt;Three things, all small.&lt;/p&gt;

&lt;p&gt;Sit on Max 5x for 30 more days. Watch the peak hour throttle. If it eases, the upgrade decision is dead. If it does not, I revisit.&lt;/p&gt;

&lt;p&gt;Log my Anthropic API spend daily, not weekly. The 30 day window is short enough that a weekly snapshot misses the inflection. Daily is cheap to set up. I want a clean before and after on the throttle change.&lt;/p&gt;
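&lt;p&gt;A minimal sketch of what daily logging could look like, assuming you already capture one record per API call with a timestamp and a dollar cost. The &lt;code&gt;CallRecord&lt;/code&gt; shape and the &lt;code&gt;dailySpend&lt;/code&gt; helper are hypothetical names for illustration, not part of any SDK.&lt;/p&gt;

```typescript
// Hypothetical record shape: one entry per API call, however you capture it.
interface CallRecord {
  timestamp: string; // ISO 8601, e.g. "2026-05-12T14:00:00Z"
  costUsd: number;
}

// Sum cost per UTC calendar day, keyed by "YYYY-MM-DD".
function dailySpend(calls: CallRecord[]): { [day: string]: number } {
  const totals: { [day: string]: number } = {};
  for (const call of calls) {
    const day = new Date(call.timestamp).toISOString().slice(0, 10);
    totals[day] = (totals[day] ?? 0) + call.costUsd;
  }
  return totals;
}

// Made-up sample data, only to show the grouping.
const sample: CallRecord[] = [
  { timestamp: "2026-05-12T14:00:00Z", costUsd: 0.5 },
  { timestamp: "2026-05-12T22:30:00Z", costUsd: 0.25 },
  { timestamp: "2026-05-13T01:15:00Z", costUsd: 1 },
];

// 2026-05-12 totals 0.75, 2026-05-13 totals 1
console.log(dailySpend(sample));
```

&lt;p&gt;Grouping by UTC day keeps the buckets stable wherever the script runs. If the throttle window you care about is defined in Eastern time, bucket by a local date instead.&lt;/p&gt;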

&lt;p&gt;Send the upgrade page screenshot to the Glincker channel as a reminder. The thing I almost did made sense at the moment I almost did it. Twelve minutes later, the same decision read wrong. The lesson: pricing decisions made under throttle differ from pricing decisions made when the throttle is going away. Wait if you can.&lt;/p&gt;

&lt;h2&gt;
  
  
  The closing
&lt;/h2&gt;

&lt;p&gt;I had the upgrade page open. The cursor was hovering. A friend sent a link. I closed the page.&lt;/p&gt;

&lt;p&gt;The thing I want you to take, if you take anything: when a vendor announces capacity that solves the exact problem you were about to pay them more to outrun, wait. Not forever. Thirty days. The cost of waiting is one Tuesday night with a slow agent. The cost of upgrading is $180 a month, every month, until you remember to cancel.&lt;/p&gt;

&lt;p&gt;Most of us are not going to remember to cancel.&lt;/p&gt;

&lt;p&gt;If you are on Max 5x and were eyeing Max 20x for the same reason I was, the only honest take I can give you is the one I gave myself this afternoon. Watch the throttle through the first week of June. Decide then. The capacity will have arrived or it will not. Right now we do not know. Spending $180 to find out is not the right way to learn.&lt;/p&gt;

&lt;p&gt;What is on your billing page right now that you were about to upgrade?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by &lt;strong&gt;GDS K S&lt;/strong&gt; (&lt;a href="https://thegdsks.com" rel="noopener noreferrer"&gt;thegdsks.com&lt;/a&gt;), building &lt;a href="https://glincker.com" rel="noopener noreferrer"&gt;Glincker&lt;/a&gt;.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;If this was useful, follow me on &lt;a href="https://x.com/thegdsks" rel="noopener noreferrer"&gt;X / @thegdsks&lt;/a&gt;. I write about the parts of the AI stack vendors keep off the pricing page.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>anthropic</category>
      <category>ai</category>
      <category>infrastructure</category>
      <category>startup</category>
    </item>
    <item>
      <title>I built a 200 line AI router in TypeScript. My monthly bill dropped 41%.</title>
      <dc:creator>GDS K S</dc:creator>
      <pubDate>Thu, 07 May 2026 03:34:05 +0000</pubDate>
      <link>https://dev.to/thegdsks/i-built-a-200-line-ai-router-in-typescript-my-monthly-bill-dropped-41-23ok</link>
      <guid>https://dev.to/thegdsks/i-built-a-200-line-ai-router-in-typescript-my-monthly-bill-dropped-41-23ok</guid>
      <description>&lt;p&gt;I track my own AI spend across three projects. In March, the line item that grew fastest was not Claude or GPT calls. It was my Cursor seat plus my Copilot seat plus the Anthropic API I was hitting from a personal CLI. Three subscriptions, three meters, and the same Opus tokens billed twice because Cursor was sending the same context to its own backend that I was already passing through to Anthropic directly.&lt;/p&gt;

&lt;p&gt;The wrappers do not advertise this. The router code is not their product. The product is the convenience of not thinking about which model handles which prompt. You pay the orchestration tax in margin baked into the seat price.&lt;/p&gt;

&lt;p&gt;I got tired of paying it. So I wrote the router. It is 200 lines of TypeScript. My April bill came in 41% under March on roughly the same volume of work.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input $/M tokens&lt;/th&gt;
&lt;th&gt;Output $/M tokens&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Haiku 4.5&lt;/td&gt;
&lt;td&gt;0.80&lt;/td&gt;
&lt;td&gt;4.00&lt;/td&gt;
&lt;td&gt;Lookups, classification, typo fixes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sonnet 4.6&lt;/td&gt;
&lt;td&gt;3.00&lt;/td&gt;
&lt;td&gt;15.00&lt;/td&gt;
&lt;td&gt;Default coding, refactors, code review&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Opus 4.7&lt;/td&gt;
&lt;td&gt;5.00&lt;/td&gt;
&lt;td&gt;25.00&lt;/td&gt;
&lt;td&gt;Multi step planning, architecture&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5 mini&lt;/td&gt;
&lt;td&gt;0.50&lt;/td&gt;
&lt;td&gt;2.00&lt;/td&gt;
&lt;td&gt;Cheap classification, embeddings prep&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 41% saving came from one thing: stopping Sonnet from handling tasks that Haiku could finish at roughly a quarter of the per-token price. Most coding queries are lookups dressed up as questions. Route by intent, not by habit.&lt;/p&gt;
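&lt;p&gt;A rough sketch of the per-request arithmetic behind that saving, using the list prices from the table above. The request size (1,000 tokens in, 200 tokens out) is an assumption for illustration, not a measured figure, and &lt;code&gt;costUsd&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

```typescript
// Prices in USD per million tokens, copied from the table above.
interface Price {
  input: number;
  output: number;
}

const HAIKU: Price = { input: 0.8, output: 4 };
const SONNET: Price = { input: 3, output: 15 };

// Cost of one request at a given price point.
function costUsd(price: Price, inputTokens: number, outputTokens: number): number {
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

// Assumed lookup-sized request: 1,000 tokens in, 200 tokens out.
const haikuCost = costUsd(HAIKU, 1000, 200);
const sonnetCost = costUsd(SONNET, 1000, 200);

// prints "0.0016 0.0060 3.75"
console.log(haikuCost.toFixed(4), sonnetCost.toFixed(4), (sonnetCost / haikuCost).toFixed(2));
```

&lt;p&gt;The ratio only holds when both models use the same number of tokens for the task. A model that answers in fewer tokens widens the gap, and one that needs retries narrows it.&lt;/p&gt;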

&lt;h2&gt;
  
  
  1. The orchestration tax is real
&lt;/h2&gt;

&lt;p&gt;Every wrapper makes the same trade. They pick a model for you, they prepend a system prompt you cannot edit, and they hold a context window you cannot inspect. In return, you do not have to think.&lt;/p&gt;

&lt;p&gt;The cost of not thinking shows up two ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The wrapper calls the most expensive model that fits its SLA, because that is what makes the demo look good&lt;/li&gt;
&lt;li&gt;The wrapper bills you for context it sent on your behalf, including its own system prompt and tool definitions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I logged 30 days of Cursor usage against the Anthropic dashboard. Cursor was sending an average of 8,400 input tokens per chat turn. My direct API calls for the same chats averaged 1,900. The 6,500 token delta is Cursor's frame, plus indexing context, plus its agent scaffolding. Useful, but not free.&lt;/p&gt;
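&lt;p&gt;To put a dollar figure on that delta, here is a back-of-the-envelope sketch. It assumes every turn is billed at Sonnet's list input price from the table and ignores prompt caching. The volume of 100 turns a day is a made-up number, so substitute your own.&lt;/p&gt;

```typescript
// Figures from the paragraph above.
const WRAPPER_INPUT_TOKENS = 8400;
const DIRECT_INPUT_TOKENS = 1900;

// Assumption: turns are billed at Sonnet's list input price, $3.00 per million tokens.
const INPUT_PRICE_PER_MILLION = 3;

// Hypothetical volume for illustration: 100 chat turns a day for 30 days.
const TURNS = 100 * 30;

const extraTokensPerTurn = WRAPPER_INPUT_TOKENS - DIRECT_INPUT_TOKENS;
const extraCostPerTurn = (extraTokensPerTurn * INPUT_PRICE_PER_MILLION) / 1_000_000;
const extraCostPerMonth = extraCostPerTurn * TURNS;

// prints "6500 0.0195 58.50"
console.log(extraTokensPerTurn, extraCostPerTurn.toFixed(4), extraCostPerMonth.toFixed(2));
```

&lt;p&gt;Two cents a turn looks like nothing until it is multiplied by a month of agent loops.&lt;/p&gt;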

&lt;p&gt;When you build the router yourself, you choose what to send. That is the whole game.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The 200 line router
&lt;/h2&gt;

&lt;p&gt;Here is the file. Drop it in a project, give it your API keys, and it picks a model per request based on rules you control.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// router.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;Anthropic&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@anthropic-ai/sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Intent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;trivial&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;code&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;plan&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;embed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;RouteRule&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Intent&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;ModelConfig&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ROUTES&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Intent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ModelConfig&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;trivial&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-haiku-4-5-20251001&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;plan&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8192&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-5-mini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;RULES&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;RouteRule&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;trivial&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\?&lt;/span&gt;&lt;span class="sr"&gt;$/&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;trivial&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;what is|define|fix typo|rename&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;plan&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;refactor|design|architect|migrate|plan&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;code&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;``&lt;/span&gt;&lt;span class="err"&gt;`
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nx"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="nc"&gt;const&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;embed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;CLASSIFY:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;pickIntent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;Intent&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rule&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;RULES&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;rule&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;code&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;anthropic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ANTHROPIC_API_KEY&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;RouteResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;inputTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;outputTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;costUsd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;PRICING&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;out&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-haiku-4-5-20251001&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;25&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-5-mini&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;in&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;priceCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;inTok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;outTok&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;PRICING&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;inTok&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;outTok&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="nx"&gt;_000_000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;RouteResult&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;intent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pickIntent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cfg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ROUTES&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;inputTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;outputTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;costUsd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;priceCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;prompt_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;completion_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]?.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;inputTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;outputTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completion_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;costUsd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;priceCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cfg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completion_tokens&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That is it. Two providers, four intents, five rules, and a cost calculator. Use it like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { route } from "./router";

const out = await route("rename this function from getUser to fetchUser");
console.log(out.model, out.costUsd.toFixed(5));
// claude-haiku-4-5-20251001 0.00012
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The rules are deliberately dumb. Length plus regex covers maybe 70% of routing decisions correctly. For the other 30%, override with a prefix:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;await route("[force:opus] design a permissions model for ...");
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add a one liner to &lt;code&gt;pickIntent&lt;/code&gt; to read the prefix. I left it out to keep the example tight.&lt;/p&gt;
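&lt;p&gt;For reference, a minimal sketch of that prefix check might look like this. &lt;code&gt;parseForcePrefix&lt;/code&gt; and its return shape are hypothetical names, not part of the router above, and mapping the forced name to a real model ID is left out.&lt;/p&gt;

```typescript
// Hypothetical helper: names and return shape are illustrative,
// not taken from the article's router.
type Forced = { model: string | null; prompt: string };

// Matches a leading "[force:NAME]" tag, e.g. "[force:opus] design ..."
const FORCE_PREFIX = /^\s*\[force:([a-z0-9.-]+)\]\s*/i;

function parseForcePrefix(raw: string): Forced {
  const m = FORCE_PREFIX.exec(raw);
  if (m === null) return { model: null, prompt: raw };
  // Strip the tag so the model never sees it.
  return { model: (m[1] ?? "").toLowerCase(), prompt: raw.slice(m[0].length) };
}

const forced = parseForcePrefix("[force:opus] design a permissions model");
console.log(forced.model, "|", forced.prompt); // opus | design a permissions model
```

&lt;p&gt;When &lt;code&gt;model&lt;/code&gt; comes back as &lt;code&gt;null&lt;/code&gt;, fall through to the normal rules.&lt;/p&gt;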

&lt;h2&gt;
  
  
  3. Routing rules that actually work
&lt;/h2&gt;

&lt;p&gt;The naive approach is to send a tiny classifier call to a cheap model and have it pick the route. That sounds smart and costs more than it saves, because every request now eats two API calls. The cost of &lt;code&gt;pickIntent&lt;/code&gt; must be zero: no network call, only string checks.&lt;/p&gt;

&lt;p&gt;Five regex rules cover most of my workload:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Short and ends in a question mark: trivial&lt;/li&gt;
&lt;li&gt;Starts with "what is", "define", "fix typo", "rename": trivial&lt;/li&gt;
&lt;li&gt;Contains "refactor", "design", "architect", "migrate", "plan": plan&lt;/li&gt;
&lt;li&gt;Contains code fence or function keyword: code&lt;/li&gt;
&lt;li&gt;Starts with "CLASSIFY:" prefix: embed (cheap classifier)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Default to code. A wrong route from trivial to code costs maybe 4x more on that one request. A wrong route from code to opus costs 1.6x. Neither is a disaster. The bug to avoid is sending Haiku a multi step plan it cannot hold context for, so default conservatively.&lt;/p&gt;
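&lt;p&gt;As a rough sketch, the five rules plus the default could be written as an ordered table where the first match wins. The 80 character cutoff for "short", the exact patterns, and the choice to check the &lt;code&gt;CLASSIFY:&lt;/code&gt; prefix first are my assumptions for illustration; &lt;code&gt;pickIntentSketch&lt;/code&gt; is a stand-in name.&lt;/p&gt;

```typescript
// Sketch only: the cutoff, the patterns, and the rule order are assumptions,
// not the article's exact pickIntent.
type Intent = "trivial" | "plan" | "code" | "embed";
type Rule = { intent: Intent; matches: (p: string) => boolean };

const SHORT_LIMIT = 80; // assumed meaning of "short", in characters

// First matching rule wins. The explicit prefix goes first so a short
// "CLASSIFY: ...?" prompt is not mistaken for a trivial question.
const RULES: Rule[] = [
  { intent: "embed", matches: (p) => /^CLASSIFY:/.test(p) },
  { intent: "trivial", matches: (p) => (p.length > SHORT_LIMIT ? false : /\?$/.test(p)) },
  { intent: "trivial", matches: (p) => /^(what is|define|fix typo|rename)\b/i.test(p) },
  { intent: "plan", matches: (p) => /\b(refactor|design|architect|migrate|plan)/i.test(p) },
  { intent: "code", matches: (p) => /`{3}|\bfunction\b/.test(p) },
];

function pickIntentSketch(prompt: string): Intent {
  const p = prompt.trim();
  for (const rule of RULES) {
    if (rule.matches(p)) return rule.intent;
  }
  return "code"; // conservative default
}

console.log(pickIntentSketch("rename this function from getUser to fetchUser")); // trivial
console.log(pickIntentSketch("migrate the billing tables to the new schema"));   // plan
console.log(pickIntentSketch("summarize these release notes"));                  // code
```

&lt;p&gt;Keeping the rules in an array makes the two week tuning loop cheap: a logged miss usually turns into one new entry.&lt;/p&gt;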

&lt;p&gt;I also log every miss. After two weeks I had a small CSV of "this prompt routed to X but should have been Y". I added two regex rules and the miss rate dropped from 8% to under 2%.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. The 41% number, broken down
&lt;/h2&gt;

&lt;p&gt;March bill, no router:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Calls&lt;/th&gt;
&lt;th&gt;Spend&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cursor seat&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;$20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Copilot seat&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;$10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic direct&lt;/td&gt;
&lt;td&gt;4,200&lt;/td&gt;
&lt;td&gt;$87&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI direct&lt;/td&gt;
&lt;td&gt;800&lt;/td&gt;
&lt;td&gt;$14&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$131&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;April bill, with router (cancelled Cursor, kept Copilot for IDE inline only):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Calls&lt;/th&gt;
&lt;th&gt;Spend&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cursor seat&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Copilot seat&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;$10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic via router&lt;/td&gt;
&lt;td&gt;5,100&lt;/td&gt;
&lt;td&gt;$54&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI via router&lt;/td&gt;
&lt;td&gt;1,400&lt;/td&gt;
&lt;td&gt;$13&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$77&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That is 41% lower on 30% more total calls. The router shifted 62% of calls onto Haiku, which took over workloads Sonnet had been handling. API spend per call dropped from about $0.020 ($101 over 5,000 calls) to about $0.010 ($67 over 6,500 calls).&lt;/p&gt;
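&lt;p&gt;If you want to check the headline percentages, the arithmetic from the two tables is short. The object shape and helper names below are just for this sketch.&lt;/p&gt;

```typescript
// Figures copied from the March and April tables.
type Month = { seats: number; apiSpend: number; apiCalls: number };

const march: Month = { seats: 20 + 10, apiSpend: 87 + 14, apiCalls: 4200 + 800 };
const april: Month = { seats: 0 + 10, apiSpend: 54 + 13, apiCalls: 5100 + 1400 };

const total = (m: Month) => m.seats + m.apiSpend;
const pctChange = (before: number, after: number) => ((after - before) / before) * 100;

console.log(total(march), total(april));                            // 131 77
console.log(pctChange(total(march), total(april)).toFixed(1));      // -41.2
console.log(pctChange(march.apiCalls, april.apiCalls).toFixed(1));  // 30.0
```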

&lt;p&gt;The Cursor cancel was the one time saving: $20 a month. The router was the repeating, compounding saving: API spend fell from $101 to $67 while calls went up. Both come from the same idea: the wrapper is hiding decisions you could make better yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. What this does not do
&lt;/h2&gt;

&lt;p&gt;This is not an agent framework. It does not stream. It does not retry. It does not cache. It does not handle rate limits. It does not do tool use. It does not know about your codebase.&lt;/p&gt;

&lt;p&gt;Adding any of those takes work. Streaming is two changes. Caching with the Anthropic prompt cache is a &lt;code&gt;cache_control&lt;/code&gt; marker on the content blocks you want cached. Retries with exponential backoff are about 20 lines. Tool use requires schema plumbing you would write anyway.&lt;/p&gt;
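&lt;p&gt;To make the retry claim concrete, here is one way those lines could look. This is a sketch, not code from the router: &lt;code&gt;withRetry&lt;/code&gt;, its options, and the delays are names and values I picked for illustration, and the result is typed &lt;code&gt;unknown&lt;/code&gt; to keep it short.&lt;/p&gt;

```typescript
// Sketch of retry with exponential backoff. `withRetry` and its options are
// illustrative names, not part of the router above.
type RetryOpts = {
  retries: number;                  // extra attempts after the first call
  baseMs: number;                   // delay before the first retry
  sleep?: (ms: number) => unknown;  // injectable so tests do not have to wait
};

const realSleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(fn: () => unknown, opts: RetryOpts) {
  const sleep = opts.sleep ?? realSleep;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= opts.retries) throw err;   // out of attempts: surface the error
      await sleep(opts.baseMs * 2 ** attempt);  // 1x, 2x, 4x, ... the base delay
    }
  }
}

// Usage: a fake call that fails twice, then succeeds.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls > 2) return "ok";
  throw new Error("rate limited");
};

withRetry(flaky, { retries: 3, baseMs: 10 }).then((result) => console.log(result, calls)); // ok 3
```

&lt;p&gt;In real use you would retry only on errors that are worth retrying, such as rate limits and timeouts, and add jitter so parallel callers do not retry in lockstep.&lt;/p&gt;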

&lt;p&gt;If you need all of that, use a real framework. If you want to stop paying the orchestration tax on 80% of your calls, the 200 lines above will do it. Add the rest as you actually hit each problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Wrappers exist because routing AI calls is annoying. It is also the highest leverage thing you can own in your own code. The 200 lines above are not a moat. They are a Tuesday afternoon. The reason to write them is that you cannot improve a bill you cannot see.&lt;/p&gt;

&lt;p&gt;What is your current ratio of cheap model to expensive model calls? If you do not know, that is the first thing to fix. Wire up cost logging before you wire up the router. The numbers will surprise you.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;GDS K S&lt;/strong&gt; · &lt;a href="https://thegdsks.com" rel="noopener noreferrer"&gt;thegdsks.com&lt;/a&gt; · building &lt;a href="https://glincker.com" rel="noopener noreferrer"&gt;Glincker&lt;/a&gt; · follow on X &lt;a href="https://x.com/thegdsks" rel="noopener noreferrer"&gt;@thegdsks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The orchestration tax is the part of the AI bill that does not show up on the pricing page.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>typescript</category>
      <category>ai</category>
      <category>webdev</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Cloudflare and Stripe just let agents buy domains and ship code. Here is the API.</title>
      <dc:creator>GDS K S</dc:creator>
      <pubDate>Thu, 07 May 2026 03:27:56 +0000</pubDate>
      <link>https://dev.to/thegdsks/cloudflare-and-stripe-just-let-agents-buy-domains-and-ship-code-here-is-the-api-59pb</link>
      <guid>https://dev.to/thegdsks/cloudflare-and-stripe-just-let-agents-buy-domains-and-ship-code-here-is-the-api-59pb</guid>
      <description>&lt;p&gt;On April 30, Cloudflare quietly published a blog post and a docs page that nobody outside the agentic AI crowd noticed for about 24 hours. The headline buried a real shift. AI agents can now provision a Cloudflare account, register a domain, attach a Stripe payment method, deploy a Worker, and hand you back a live URL. With a single CLI command. With no human in the loop after you accept terms and add a card.&lt;/p&gt;

&lt;p&gt;Stripe Projects is the open beta side. The Cloudflare Agents SDK is the runtime side. The protocol underneath is called Machine Payments Protocol (MPP), co authored by Tempo Labs and Stripe, with x402 (HTTP 402 Payment Required) doing the actual money plumbing. The whole thing went live this week.&lt;/p&gt;

&lt;p&gt;This is the first time the major infra players have published a real specification for "agent buys thing, deploys thing, pays for thing" without manual intervention. The spec is short. The implications are not.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Who provides it&lt;/th&gt;
&lt;th&gt;What it costs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Agent provisions Cloudflare account&lt;/td&gt;
&lt;td&gt;Cloudflare Agents SDK&lt;/td&gt;
&lt;td&gt;Free in beta&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent registers domain&lt;/td&gt;
&lt;td&gt;Cloudflare Registrar via MPP&lt;/td&gt;
&lt;td&gt;At cost, $100/month default cap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent attaches payment&lt;/td&gt;
&lt;td&gt;Stripe Projects&lt;/td&gt;
&lt;td&gt;Card on file at parent Stripe account&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent deploys Worker, Pages, R2&lt;/td&gt;
&lt;td&gt;Workers platform&lt;/td&gt;
&lt;td&gt;Per usage, agent metered&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Spending guardrail&lt;/td&gt;
&lt;td&gt;Per provider cap, default $100/month&lt;/td&gt;
&lt;td&gt;Caller controls&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  1. The protocol in one paragraph
&lt;/h2&gt;

&lt;p&gt;Three components. Discovery is a REST/JSON catalog where each provider publishes what an agent can buy from them and at what price. Authorization uses identity attestation plus OAuth, so the agent calls under a delegated identity scoped to your Stripe project. Payment uses tokenization through Stripe with a default $100/month per provider spending cap, which you can raise or lower per project.&lt;/p&gt;

&lt;p&gt;That is the whole protocol. The cleverness is that x402 turns "you owe money for this API call" into a normal HTTP response status. The agent sees a 402, looks up the price, decides if it can pay, and either pays or asks for a higher cap. No new wire format. No new auth scheme. Just HTTP.&lt;/p&gt;
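&lt;p&gt;The decision loop in that paragraph can be sketched without any network at all. Everything below is a toy: &lt;code&gt;fakeProvider&lt;/code&gt;, the &lt;code&gt;priceUsd&lt;/code&gt; field, and the outcome names are assumptions for illustration, not the x402 or MPP wire format.&lt;/p&gt;

```typescript
// Illustrative only: the fake provider and field names are assumptions,
// not the real x402 or MPP wire format.
type Reply =
  | { status: 200; body: string }
  | { status: 402; priceUsd: number };

type Outcome =
  | { kind: "done"; body: string; spentUsd: number }
  | { kind: "needs-higher-cap"; priceUsd: number };

// Stand-in for a paid endpoint: answers 402 with a price until enough is attached.
function fakeProvider(paidUsd: number): Reply {
  const priceUsd = 9.5;
  return paidUsd >= priceUsd
    ? { status: 200, body: "domain registered" }
    : { status: 402, priceUsd };
}

function buy(remainingCapUsd: number): Outcome {
  const first = fakeProvider(0); // 1. call without payment
  if (first.status === 200) return { kind: "done", body: first.body, spentUsd: 0 };
  // 2. the 402 carries the price; 3. decide whether the cap allows it
  if (first.priceUsd > remainingCapUsd) {
    return { kind: "needs-higher-cap", priceUsd: first.priceUsd };
  }
  const second = fakeProvider(first.priceUsd); // 4. retry with payment attached
  if (second.status !== 200) throw new Error("payment was not accepted");
  return { kind: "done", body: second.body, spentUsd: first.priceUsd };
}

console.log(buy(100).kind); // done
console.log(buy(5).kind);   // needs-higher-cap
```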

&lt;h2&gt;
  
  
  2. The CLI command that started it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;stripe projects init my-side-project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That command does five things in sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Creates a Stripe Project (a sandboxed sub account scoped to one agent run)&lt;/li&gt;
&lt;li&gt;Provisions API credentials and an OAuth token for the agent&lt;/li&gt;
&lt;li&gt;Connects the project to your Stripe account's payment methods&lt;/li&gt;
&lt;li&gt;Sets a default monthly spend cap (you override with &lt;code&gt;--cap 200&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Outputs a &lt;code&gt;.stripe-project.json&lt;/code&gt; that the Cloudflare Agents SDK reads&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After that, an agent calls Cloudflare with the OAuth token, gets a Workers account, registers &lt;code&gt;my-side-project.dev&lt;/code&gt;, deploys a Worker, and emits the live URL. End to end, in the demos, around 90 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. The code an agent runs
&lt;/h2&gt;

&lt;p&gt;Here is roughly what the Cloudflare Agents SDK invocation looks like for an agent that builds and deploys a small TypeScript service. I have stripped error handling so the shape is visible.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;CloudflareAgent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@cloudflare/agents-sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;StripeProjectAuth&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@stripe/agents&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;StripeProjectAuth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromEnv&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CloudflareAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;auth&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// 1. Get or create an account&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;account&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;accounts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ensure&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;demo-account&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// 2. Buy a domain (this hits MPP under the hood and returns 402 on the first call)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;registrar&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;purchase&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;my-side-project.dev&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;budgetUsd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// hard ceiling for this call&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// 3. Deploy a Worker with the source&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;cf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;workers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deploy&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;account&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;account&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hello&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`
    export default {
      async fetch(request) {
        return new Response("hi from an agent");
      }
    };
  `&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/*`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;zone_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;zone_id&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// 4. Tell the human&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`live at https://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three points worth pulling out.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;budgetUsd&lt;/code&gt; parameter is per call. The Stripe Project's monthly cap is the global ceiling. So you have two layers: a per request budget (so the agent cannot accidentally spend $50 on a single domain it should not have bought) and a per provider cap (so the agent cannot spend more than $100 in a month total at Cloudflare). Both are enforced server side on Stripe's end. The agent cannot bypass them by editing client code.&lt;/p&gt;

&lt;p&gt;The OAuth token from &lt;code&gt;StripeProjectAuth.fromEnv()&lt;/code&gt; is scoped to the project. If you revoke the project, the token dies. There is no way for the agent to reach into your main Stripe account or your other projects.&lt;/p&gt;

&lt;p&gt;The MPP &lt;code&gt;402&lt;/code&gt; flow is invisible in the snippet above. Behind the scenes, the SDK retries the registrar call with a payment authorization header after seeing the 402. You can hook into that with an &lt;code&gt;onPayment&lt;/code&gt; callback if you want a confirmation step. Most agents will not.&lt;/p&gt;
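&lt;p&gt;The two budget layers from the first point reduce to two comparisons. A minimal sketch of that check, assuming a simple in-memory ledger; &lt;code&gt;authorize&lt;/code&gt;, &lt;code&gt;Ledger&lt;/code&gt;, and the reason strings are hypothetical names, and the real enforcement lives on the server, not in code you control.&lt;/p&gt;

```typescript
// Sketch of the two checks. All names are hypothetical; this only
// illustrates the order of the comparisons.
type Ledger = { monthlyCapUsd: number; spentThisMonthUsd: number };
type Decision =
  | { ok: true }
  | { ok: false; reason: "over-call-budget" | "over-monthly-cap" };

function authorize(ledger: Ledger, priceUsd: number, budgetUsd: number): Decision {
  // Layer 1: the per call ceiling.
  if (priceUsd > budgetUsd) return { ok: false, reason: "over-call-budget" };
  // Layer 2: the per provider monthly cap.
  if (ledger.spentThisMonthUsd + priceUsd > ledger.monthlyCapUsd) {
    return { ok: false, reason: "over-monthly-cap" };
  }
  ledger.spentThisMonthUsd += priceUsd; // record the charge only when both layers pass
  return { ok: true };
}

const ledger: Ledger = { monthlyCapUsd: 100, spentThisMonthUsd: 92 };
console.log(authorize(ledger, 50, 12)); // rejected: 50 is over the 12 per call budget
console.log(authorize(ledger, 10, 12)); // rejected: 92 + 10 is over the 100 cap
console.log(authorize(ledger, 8, 12));  // accepted: 92 + 8 lands exactly on the cap
```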

&lt;h2&gt;
  
  
  4. Where the safety story is real and where it is not
&lt;/h2&gt;

&lt;p&gt;Real:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The $100 default cap is server enforced. The agent cannot raise it from inside the agent runtime.&lt;/li&gt;
&lt;li&gt;OAuth scoping is per project, not per Stripe account. Compromise of a single agent run does not reach your other money.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;budgetUsd&lt;/code&gt; per call is a hard ceiling, not a hint.&lt;/li&gt;
&lt;li&gt;Stripe ledger logs every cent. You see exactly which agent run spent what.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not real:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent can still loop. A buggy agent that calls &lt;code&gt;registrar.purchase&lt;/code&gt; in a retry loop will hit the cap, but it will hit it fast. The cap stops the bleeding, not the bug.&lt;/li&gt;
&lt;li&gt;The agent decides what to deploy. If the prompt was "build me a phishing kit", the agent will buy a domain and deploy a phishing kit. Cloudflare and Stripe screen for known abuse patterns, but the threat model is "agent acting on a legitimate user's behalf for an illegitimate purpose" and that is not solved.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;stripe projects init&lt;/code&gt; gives the agent a card. You authorized that card. If it is also your personal card, the agent can spend up to the cap on things you would not have bought.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The mitigation pattern that works today: create a Stripe Project per agent run, attach a virtual card with a $100 balance, give the agent the project, and burn the project when the run finishes. You can script this in 10 lines.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. What this changes in practice
&lt;/h2&gt;

&lt;p&gt;Three things become possible that were not yesterday.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agents that can finish.&lt;/strong&gt; Most coding agents today produce a PR or a local file and stop there because deploying needs human credentials. With this, the agent can ship. That changes how you think about iteration: the loop is no longer "agent writes code, human deploys, human tests" but "agent writes, agent deploys, human reviews live".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost as a programmable signal.&lt;/strong&gt; The 402 response with a price is data the agent can reason about. An agent that wants to register a domain can compare prices across registrars and pick the cheapest. An agent that needs storage can decide between R2 and S3 based on the per request fee. This is the first time price has been a first class API parameter in a way agents can actually use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Per agent budgets as a defense layer.&lt;/strong&gt; Today most teams running agents in production have a single API key with a monthly cap on the underlying model provider. With this, you can give each agent run its own scoped budget across compute, storage, and DNS at the same time. That is a much sharper control than you had before.&lt;/p&gt;

&lt;h2&gt;
  
  
  At Glincker we wired this in last night
&lt;/h2&gt;

&lt;p&gt;Our internal agent that drafts and tests code for our content engine now has its own Stripe Project with a $25 weekly cap. It can register subdomains for staging tests, deploy preview Workers, and tear them down when the test finishes. Before, those steps required a human to provision and manually delete. The cap is the only thing standing between us and a runaway loop, and so far the cap has done its job.&lt;/p&gt;

&lt;p&gt;The thing I was nervous about before flipping it on was a buggy retry loop burning $25 in 30 seconds. That has not happened, but I want to log a few hundred runs before I trust it. If you do this, log every 402 and every successful payment with the agent run ID. You will catch the loop the first time it happens.&lt;/p&gt;
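&lt;p&gt;A minimal sketch of that logging, with a crude loop check on top. The event shape, the 30 second window, and the limit of five events are assumptions to tune for your own agent, not values from any SDK.&lt;/p&gt;

```typescript
// Hypothetical log shape and thresholds; tune both for your own agent.
type PaymentEvent = { runId: string; kind: "402" | "paid"; atMs: number };

const paymentLog: PaymentEvent[] = [];

function record(event: PaymentEvent): void {
  paymentLog.push(event);
  console.log(JSON.stringify(event)); // one structured line per event
}

// True when a single run produced more than `limit` events inside the window.
function looksLikeLoop(runId: string, nowMs: number, windowMs = 30000, limit = 5): boolean {
  const recent = paymentLog
    .filter((e) => e.runId === runId)
    .filter((e) => e.atMs > nowMs - windowMs);
  return recent.length > limit;
}

// Six 402s in six seconds from the same run trips the check.
[1, 2, 3, 4, 5, 6].forEach((s) => record({ runId: "run-a", kind: "402", atMs: s * 1000 }));
record({ runId: "run-b", kind: "paid", atMs: 6000 });
console.log(looksLikeLoop("run-a", 6000), looksLikeLoop("run-b", 6000)); // true false
```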

&lt;h2&gt;
  
  
  The bottom line
&lt;/h2&gt;

&lt;p&gt;This is the first agentic payment protocol from infrastructure providers that ships with real guardrails and a real spec. The spec is small enough to read on a Tuesday lunch. The implementation already works. The cap is not a substitute for thinking about the threat model, but it is a real cap.&lt;/p&gt;

&lt;p&gt;If you build with agents, this is a control plane you have wanted for a year. If you do not, the existence of this protocol is a sign that the question "should agents spend money" has officially shifted to "how do we put the right limits on what they spend".&lt;/p&gt;

&lt;p&gt;What are you about to give an agent a card for? I want to read the answer in the comments.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;GDS K S&lt;/strong&gt; · &lt;a href="https://thegdsks.com" rel="noopener noreferrer"&gt;thegdsks.com&lt;/a&gt; · building &lt;a href="https://glincker.com" rel="noopener noreferrer"&gt;Glincker&lt;/a&gt; · follow on X &lt;a href="https://x.com/thegdsks" rel="noopener noreferrer"&gt;@thegdsks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Agents with a card change the threat model from "what can it do" to "what can it spend before you notice".&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>javascript</category>
      <category>webdev</category>
      <category>cloudflare</category>
    </item>
    <item>
      <title>Giving an AI agent permission to spawn sub-agents (without losing control)</title>
      <dc:creator>GDS K S</dc:creator>
      <pubDate>Sun, 03 May 2026 22:14:05 +0000</pubDate>
      <link>https://dev.to/thegdsks/giving-an-ai-agent-permission-to-spawn-sub-agents-without-losing-control-5901</link>
      <guid>https://dev.to/thegdsks/giving-an-ai-agent-permission-to-spawn-sub-agents-without-losing-control-5901</guid>
      <description>&lt;h1&gt;
  
  
  Giving an AI agent permission to spawn sub-agents (without losing control)
&lt;/h1&gt;

&lt;p&gt;A reader asked me last week: "If my main agent spawns a sub-agent, what permissions does the sub-agent get? How do I make sure it cannot do more than the parent?"&lt;/p&gt;

&lt;p&gt;This is the agent delegation problem. It comes up the moment you have agents that work in tandem. A planner that hands off to a coder. An orchestrator that fans out to specialists. An MCP server that calls another MCP server on a user's behalf.&lt;/p&gt;

&lt;p&gt;The naive answer is: give the sub-agent the same API key as the parent. This is wrong. Once you do that, the sub-agent can do everything the parent can. If it goes off the rails, you cannot kill it without killing the parent. There is no audit trail per agent. You cannot apply different rate limits.&lt;/p&gt;

&lt;p&gt;The right answer is scoped delegation with revocation. Here is what that looks like.&lt;/p&gt;

&lt;h2&gt;
  
  
  What scoped delegation means
&lt;/h2&gt;

&lt;p&gt;When the parent agent spawns a sub-agent, it issues the sub-agent a token that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Is its own credential, not a copy of the parent's&lt;/li&gt;
&lt;li&gt;Inherits a &lt;em&gt;subset&lt;/em&gt; of the parent's permissions, never more&lt;/li&gt;
&lt;li&gt;Has a parent reference, so revoking the parent revokes everything downstream&lt;/li&gt;
&lt;li&gt;Has its own expiry, separate from the parent&lt;/li&gt;
&lt;li&gt;Has a depth limit, so a sub-agent of a sub-agent of a sub-agent eventually hits a wall&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;KavachOS calls this a delegation chain. The data model is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;parent_agent (token A)
  └── sub_agent_1 (token B, parent=A, scopes ⊆ A)
        └── sub_agent_2 (token C, parent=B, scopes ⊆ B)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Revoking A also revokes B and C. Revoking B also revokes C but leaves A working. Each token has its own audit trail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createKavach&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;kavachos&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;delegate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;revoke&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;kavachos/agent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;kavach&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createKavach&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sqlite&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;kavach.db&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// parent agent has a token with these scopes&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parentToken&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;kavach&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;agentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;planner-001&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;scopes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;github:read&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;github:write&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deploy:staging&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1h&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// parent spawns a coder sub-agent&lt;/span&gt;
&lt;span class="c1"&gt;// it can read and write GitHub, but not deploy&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;coderToken&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;delegate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parentToken&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;agentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;coder-001&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;scopes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;github:read&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;github:write&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;// dropped deploy:staging&lt;/span&gt;
  &lt;span class="na"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;30m&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                                &lt;span class="c1"&gt;// shorter than parent&lt;/span&gt;
  &lt;span class="na"&gt;maxDepth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                               &lt;span class="c1"&gt;// coder cannot spawn its own&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;delegate&lt;/code&gt; call:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verifies the requested scopes are a subset of the parent's&lt;/li&gt;
&lt;li&gt;Throws &lt;code&gt;ScopeEscalationError&lt;/code&gt; if the sub-agent asks for a scope the parent does not have&lt;/li&gt;
&lt;li&gt;Sets &lt;code&gt;parent_token_id&lt;/code&gt; so revocation cascades&lt;/li&gt;
&lt;li&gt;Issues a fresh token with its own JTI
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// this throws: ScopeEscalationError&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;badToken&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;delegate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parentToken&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;agentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rogue-001&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;scopes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;github:read&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deploy:production&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;// parent does not have it&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="c1"&gt;// kavach.error.code === "SCOPE_ESCALATION"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The library refuses to issue a token with scopes the parent does not have. This is enforced at issue time, not at validation time, so a misbehaving caller cannot sneak it through.&lt;/p&gt;
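&lt;p&gt;A minimal sketch of what an issue-time subset check can look like, assuming scopes are plain strings compared exactly. The names &lt;code&gt;assertScopesSubset&lt;/code&gt; and &lt;code&gt;ScopeEscalationSketchError&lt;/code&gt; are mine for illustration. This is not the KavachOS source.&lt;/p&gt;

```typescript
// Illustrative only: hypothetical names, not the KavachOS implementation.
class ScopeEscalationSketchError extends Error {
  readonly code = "SCOPE_ESCALATION";
  readonly missing: string[];

  constructor(missing: string[]) {
    super("requested scopes not held by parent: " + missing.join(", "));
    this.missing = missing;
  }
}

// Throws unless every requested scope is already held by the parent.
function assertScopesSubset(parentScopes: string[], requested: string[]): void {
  const held = new Set(parentScopes);
  const missing = requested.filter((scope) => !held.has(scope));
  if (missing.length > 0) {
    throw new ScopeEscalationSketchError(missing);
  }
}

// Passes: both scopes are held by the parent.
assertScopesSubset(
  ["github:read", "github:write", "deploy:staging"],
  ["github:read", "github:write"],
);
```

&lt;p&gt;Running the check before any token is minted means a rejected request leaves nothing behind to clean up or revoke.&lt;/p&gt;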

&lt;h2&gt;
  
  
  Revocation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;revoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;parentToken&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// every descendant token is also revoked&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Behind the scenes, KavachOS marks the parent token id as revoked. Any token with &lt;code&gt;parent_token_id&lt;/code&gt; matching it (recursively) is rejected on next validation. The agent table has an index on &lt;code&gt;parent_token_id&lt;/code&gt; so this stays fast.&lt;/p&gt;
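&lt;p&gt;To make the recursive part concrete, here is a rough sketch of a validation-time check that walks up the parent chain. The &lt;code&gt;TokenRow&lt;/code&gt; shape and the in-memory table are assumptions for illustration. A real store would query the database, and KavachOS may well implement this differently.&lt;/p&gt;

```typescript
// Hypothetical in-memory stand-in for the token table.
type TokenRow = { id: string; parentTokenId: string | null; revoked: boolean };

// A token is rejected if it, or any ancestor, has been revoked.
function isEffectivelyRevoked(
  tokens: { [id: string]: TokenRow },
  tokenId: string,
): boolean {
  const seen = new Set();
  let id: string | null = tokenId;
  while (id !== null) {
    const row: TokenRow | undefined = tokens[id];
    // Fail closed on unknown ids and on cycles in the parent chain.
    if (row === undefined || seen.has(id) || row.revoked) {
      return true;
    }
    seen.add(id);
    id = row.parentTokenId;
  }
  return false;
}
```

&lt;p&gt;The walk costs one lookup per level, so a depth limit also bounds how much work each validation does.&lt;/p&gt;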

&lt;p&gt;You can also revoke a single sub-agent without touching the parent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;revoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;coderToken&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// parent and other siblings keep working&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Audit
&lt;/h2&gt;

&lt;p&gt;Every action an agent takes through KavachOS goes into the audit log:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tool.call&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;agent_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;coder-001&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;parent_agent_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;planner-001&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;github:write&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;repos/kavachos/kavachos/contents/README.md&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;2026-04-29T11:42:13Z&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;request_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;req_abc123&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;allowed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When something goes wrong at 3am, you can trace exactly which sub-agent of which parent of which root user did what. This is how you resolve incidents of the "the agent merged a bad PR" kind. Without it, you have a Slack message with no fingerprint.&lt;/p&gt;
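&lt;p&gt;As a rough illustration of that trace, the sketch below rebuilds an agent's lineage from audit records shaped like the one above. The function name &lt;code&gt;lineageOf&lt;/code&gt; is hypothetical, and it assumes a root agent is logged with a null &lt;code&gt;parent_agent_id&lt;/code&gt; or not logged at all.&lt;/p&gt;

```typescript
// Field names follow the audit record above; the helper itself is hypothetical.
type AuditEvent = {
  event: string;
  agent_id: string;
  parent_agent_id: string | null;
  scope: string;
  outcome: string;
};

// Returns the chain from the given agent up to its root, child first.
function lineageOf(events: AuditEvent[], agentId: string): string[] {
  const parentOf: { [agentId: string]: string | null } = {};
  for (const e of events) {
    parentOf[e.agent_id] = e.parent_agent_id;
  }

  const chain: string[] = [];
  let current: string | null = agentId;
  while (current !== null) {
    if (chain.indexOf(current) >= 0) {
      break; // guard against a corrupted, cyclic log
    }
    chain.push(current);
    const next: string | null | undefined = parentOf[current];
    current = next === undefined ? null : next;
  }
  return chain;
}
```

&lt;p&gt;In practice you would run the same walk as a query against the audit store, filtered by &lt;code&gt;request_id&lt;/code&gt; or a time window.&lt;/p&gt;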

&lt;h2&gt;
  
  
  Why depth limits matter
&lt;/h2&gt;

&lt;p&gt;A planner spawns a coder. The coder spawns a tester. The tester spawns a fixer. The fixer spawns a re-tester. At some point, the chain has to stop. Otherwise an agent stuck in a loop can exhaust your token budget and rate limits.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parentToken&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;kavach&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;agentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;planner-001&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;scopes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;github:read&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;github:write&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deploy:staging&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;maxDepth&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1h&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An attempt to issue a fourth-level descendant fails. Catch this in your orchestrator and decide how to handle it: escalate to a human, fail the task, or split the work differently.&lt;/p&gt;
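&lt;p&gt;Depth tracking can be sketched in a few lines, assuming the root counts as depth 0 and every descendant inherits the root's limit. The names &lt;code&gt;issueRoot&lt;/code&gt; and &lt;code&gt;delegateSketch&lt;/code&gt; are made up for this example, and the real library may count levels differently.&lt;/p&gt;

```typescript
// Hypothetical shapes for illustration; not the KavachOS token format.
type ChainToken = { agentId: string; depth: number; maxDepth: number };

function issueRoot(agentId: string, maxDepth: number): ChainToken {
  return { agentId, depth: 0, maxDepth };
}

function delegateSketch(parent: ChainToken, agentId: string): ChainToken {
  const depth = parent.depth + 1;
  if (depth > parent.maxDepth) {
    throw new Error(
      "delegation depth " + depth + " exceeds maxDepth " + parent.maxDepth,
    );
  }
  // The child carries the same limit forward, so it can never raise it.
  return { agentId, depth, maxDepth: parent.maxDepth };
}

// With maxDepth 3: coder (1), tester (2), and fixer (3) are issued.
const planner = issueRoot("planner-001", 3);
const coder = delegateSketch(planner, "coder-001");
const tester = delegateSketch(coder, "tester-001");
const fixer = delegateSketch(tester, "fixer-001");
// delegateSketch(fixer, "retester-001") would throw here.
```

&lt;p&gt;Storing the depth on the token itself keeps the check local: the issuer never has to count ancestors at delegation time.&lt;/p&gt;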

&lt;h2&gt;
  
  
  Where this matters most
&lt;/h2&gt;

&lt;p&gt;This pattern is what enterprises will demand once agents are doing real work. Right now most teams ship with shared API keys and a hope. That works until it does not. When it does not, you are trying to reconstruct what happened from log fragments.&lt;/p&gt;

&lt;p&gt;Building this from scratch is doable but boring. Token issuance is straightforward. Cascading revocation is not. Depth tracking takes care to get right. The audit log is its own project.&lt;/p&gt;

&lt;p&gt;If you do not want to build it yourself, KavachOS handles all of it as a primitive in the same library that does human auth.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;kavachos
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Source: &lt;a href="https://github.com/kavachos/kavachos" rel="noopener noreferrer"&gt;github.com/kavachos/kavachos&lt;/a&gt;. MIT.&lt;/p&gt;




&lt;p&gt;If you run multi-agent systems in production today, how do you scope sub-agent permissions? Shared API keys? Per-agent tokens? Something stricter? Curious where the pain is for you.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>security</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Free SVG brand icons in 2026: thesvg.org vs svgl vs Simple Icons</title>
      <dc:creator>GDS K S</dc:creator>
      <pubDate>Sat, 02 May 2026 18:01:02 +0000</pubDate>
      <link>https://dev.to/thegdsks/free-svg-brand-icons-in-2026-thesvgorg-vs-svgl-vs-simple-icons-44od</link>
      <guid>https://dev.to/thegdsks/free-svg-brand-icons-in-2026-thesvgorg-vs-svgl-vs-simple-icons-44od</guid>
      <description>&lt;p&gt;If you need brand logos for a dashboard, a marketing page, or a "trusted by" section, the answer in 2026 is no longer just Simple Icons. Three libraries now cover most real-world needs, and they win on different brackets.&lt;/p&gt;

&lt;p&gt;This post is the comparison I wish existed when picking one for a recent build. Real numbers, no invented benchmarks, honest tradeoffs.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Library&lt;/th&gt;
&lt;th&gt;Icons&lt;/th&gt;
&lt;th&gt;Color&lt;/th&gt;
&lt;th&gt;License&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Simple Icons&lt;/td&gt;
&lt;td&gt;3000+&lt;/td&gt;
&lt;td&gt;Monochrome only&lt;/td&gt;
&lt;td&gt;CC0&lt;/td&gt;
&lt;td&gt;Footer rows, social icons&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;svgl&lt;/td&gt;
&lt;td&gt;657&lt;/td&gt;
&lt;td&gt;Full color, dual variants&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;td&gt;Curated brand grids, design references&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;thesvg.org&lt;/td&gt;
&lt;td&gt;5600+&lt;/td&gt;
&lt;td&gt;Full color, multi-variant per brand&lt;/td&gt;
&lt;td&gt;Free, no attribution&lt;/td&gt;
&lt;td&gt;Brand dashboards, logo walls, marketing pages&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you need an aggregator across 200+ icon sets (Lucide, Heroicons, Material, plus brand sets), &lt;a href="https://iconify.design" rel="noopener noreferrer"&gt;Iconify&lt;/a&gt; is the long-tail option. It is not in the same category as the three above, so I leave it out of the head-to-head.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Simple Icons
&lt;/h2&gt;

&lt;p&gt;Simple Icons is the historical default for free SVG brand icons. Three thousand plus brands, all monochrome, CC0 license. You import what you need.&lt;/p&gt;

&lt;p&gt;The catch hides in the word "monochrome." Every icon ships as a single-fill SVG. Stripe renders purple in real life. Visa is blue and yellow. Slack uses four colors. Simple Icons gives you the silhouette and a hex code in the metadata. You apply the brand color yourself.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;siStripe&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;simple-icons/icons&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;StripeIcon&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;size&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;svg&lt;/span&gt; &lt;span class="na"&gt;width&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;size&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;size&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="na"&gt;viewBox&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"0 0 24 24"&lt;/span&gt; &lt;span class="na"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;`#&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;siStripe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;hex&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;path&lt;/span&gt; &lt;span class="na"&gt;d&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;siStripe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;svg&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That works for footer rows where everything renders at one size with one color. It does not work when a designer asks for the actual Stripe gradient, or when you need both the wordmark and the symbol.&lt;/p&gt;

&lt;p&gt;The bundle question matters too. The full Simple Icons npm package weighs around 13 MB unpacked. Modern bundlers strip what you do not import, but a Next.js build can pull in the whole package because of a barrel file. Import per-icon paths to be safe.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. svgl
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://svgl.app" rel="noopener noreferrer"&gt;svgl&lt;/a&gt; is a curated collection of brand SVGs that has gained traction over the past two years. As of writing, the site shows 657 logos across categories like AI, Analytics, Browser, Database, Devtool, Framework, Hosting, Payment, and Social.&lt;/p&gt;

&lt;p&gt;The strength is curation. Every entry looks production-ready. The Adobe entries, for example, ship both color and grayscale variants, so you can match the surrounding UI without hunting for an alternate file. The MIT license keeps usage simple.&lt;/p&gt;

&lt;p&gt;The tradeoff is breadth. 657 logos cover popular SaaS, dev tools, and consumer brands, but if your fintech dashboard needs every regional bank or every payment processor in a niche market, you will hit gaps. svgl also leans on a copy-from-site model rather than a heavy npm install, which is good for bundle size and works as long as you have network at build time.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. thesvg.org
&lt;/h2&gt;

&lt;p&gt;Disclosure: thesvg.org is my project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://thesvg.org" rel="noopener noreferrer"&gt;thesvg.org&lt;/a&gt; is a free brand SVG library focused on coverage and clean source files. The current catalog is 5600+ icons, with several variants per brand where the brand ships them (typically a wordmark, a symbol, and a monochrome version). License: free, no attribution required.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;thesvg
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Browse and download from &lt;a href="https://thesvg.org" rel="noopener noreferrer"&gt;thesvg.org&lt;/a&gt;. The site supports category filtering (AI, Analytics, Browser, Database, Devtool, Framework, Hosting, Payment, Social, plus dozens more) and alphabetical sort, so finding a specific brand takes seconds.&lt;/p&gt;

&lt;p&gt;The pitch is breadth at a strong fidelity bar. 5600+ vs 657 is the headline number. The structural choice that matters for production work is consistent variant coverage: when you build a logo wall, you want every brand to ship a symbol and a wordmark in the same SVG style, not a mix of "we got this one as a square avatar and that one as a horizontal logo."&lt;/p&gt;

&lt;p&gt;The tradeoff is curation overhead. Adding a brand correctly means sourcing the original, optimizing through SVGO with a per-brand config, naming the variants, and verifying the output renders the same on dark and light backgrounds. The library trades faster growth for cleaner files.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. The structural comparison
&lt;/h2&gt;

&lt;p&gt;I will not invent benchmarks I have not run. Instead, here is how the three stack up structurally on the dimensions that actually matter on a real project.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Simple Icons&lt;/th&gt;
&lt;th&gt;svgl&lt;/th&gt;
&lt;th&gt;thesvg.org&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Smallest bundle (monochrome)&lt;/td&gt;
&lt;td&gt;Wins&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best curation per icon&lt;/td&gt;
&lt;td&gt;Tie&lt;/td&gt;
&lt;td&gt;Wins&lt;/td&gt;
&lt;td&gt;Wins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Most full-color brands&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;Wins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Variant coverage (symbol + wordmark + mono)&lt;/td&gt;
&lt;td&gt;n/a&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Wins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Easiest CDN drop-in&lt;/td&gt;
&lt;td&gt;Tie&lt;/td&gt;
&lt;td&gt;Wins&lt;/td&gt;
&lt;td&gt;Wins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;License simplicity&lt;/td&gt;
&lt;td&gt;CC0&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;td&gt;Free, no attribution&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you want hard byte numbers, run the same component three ways on your own build. Tree-shaking and your framework choice matter more than what any blog post claims.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. The license question nobody reads
&lt;/h2&gt;

&lt;p&gt;This part bit me on a freelance project two years ago. Free does not mean unrestricted.&lt;/p&gt;

&lt;p&gt;Simple Icons is CC0 for the SVG paths. The brand names and logos themselves remain trademarks of their owners. CC0 covers the SVG file, not the trademark right to use it.&lt;/p&gt;

&lt;p&gt;svgl is MIT for the file collection, with the same trademark caveat per brand.&lt;/p&gt;

&lt;p&gt;thesvg.org is free with no attribution required for the file collection, again with trademark rules applying per brand.&lt;/p&gt;

&lt;p&gt;Practical rule: free SVG brand icon libraries give you the artwork, not the right to imply endorsement. If you build a "trusted by" marketing page or a competitor comparison, check the brand's actual logo guidelines. Most companies allow editorial and "as a customer" use, but not rebranding.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. When to pick which
&lt;/h2&gt;

&lt;p&gt;Three patterns I keep falling back to.&lt;/p&gt;

&lt;p&gt;If you need a footer row of twenty social or tech logos, all monochrome, all the same size, Simple Icons is the right answer. Smallest bundle, simplest API.&lt;/p&gt;

&lt;p&gt;If you want a curated set of around 600 popular brands and value tight curation over raw count, svgl is the right answer. Strong category organization, MIT license, easy CDN drop-in.&lt;/p&gt;

&lt;p&gt;If you need the broadest catalog of full-color brand SVGs with consistent variant coverage, &lt;a href="https://thesvg.org" rel="noopener noreferrer"&gt;thesvg.org&lt;/a&gt; is what I built for that case. 5600+ icons, multi-variant, free with no attribution.&lt;/p&gt;

&lt;p&gt;If you need icons across 200+ sets and brands matter only as part of a wider icon system, look at Iconify as a separate option. It serves a different search intent and a different bundling story.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bottom line
&lt;/h2&gt;

&lt;p&gt;No single library wins every bracket because the brackets are different. Match the tool to the job. Audit your trademark exposure before launch. And do not assume tree-shaking will save you; check the bundle output of your own build.&lt;/p&gt;

&lt;p&gt;What is your default brand icon library, and what made you switch? Drop a comment with your stack.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;GDS K S&lt;/strong&gt; · &lt;a href="https://thegdsks.com" rel="noopener noreferrer"&gt;thegdsks.com&lt;/a&gt; · building &lt;a href="https://thesvg.org" rel="noopener noreferrer"&gt;thesvg.org&lt;/a&gt; and &lt;a href="https://glincker.com" rel="noopener noreferrer"&gt;Glincker&lt;/a&gt; · follow on X &lt;a href="https://x.com/thegdsks" rel="noopener noreferrer"&gt;@thegdsks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Free SVG brand icons are a solved problem only if you pick the right library for the bracket you are in.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>design</category>
      <category>webdev</category>
      <category>opensource</category>
      <category>javascript</category>
    </item>
    <item>
      <title>I rewrote my auth library to run on Cloudflare Workers. Here is what broke.</title>
      <dc:creator>GDS K S</dc:creator>
      <pubDate>Thu, 30 Apr 2026 15:48:36 +0000</pubDate>
      <link>https://dev.to/thegdsks/i-rewrote-my-auth-library-to-run-on-cloudflare-workers-here-is-what-broke-5ceh</link>
      <guid>https://dev.to/thegdsks/i-rewrote-my-auth-library-to-run-on-cloudflare-workers-here-is-what-broke-5ceh</guid>
      <description>&lt;p&gt;Most TypeScript auth libraries assume Node.js. They reach for &lt;code&gt;crypto.randomBytes&lt;/code&gt;, &lt;code&gt;Buffer&lt;/code&gt;, the Node &lt;code&gt;fs&lt;/code&gt; module, sometimes &lt;code&gt;process.env&lt;/code&gt; directly. That works on Vercel serverless, AWS Lambda with the Node runtime, Railway, Fly. It does not work on Cloudflare Workers.&lt;/p&gt;

&lt;p&gt;Workers do not have Node. They have Web APIs. &lt;code&gt;crypto&lt;/code&gt; means Web Crypto, not the Node &lt;code&gt;crypto&lt;/code&gt; module. &lt;code&gt;Buffer&lt;/code&gt; is gone. &lt;code&gt;fs&lt;/code&gt; is gone. &lt;code&gt;process.env&lt;/code&gt; does not exist. Bindings are injected into a request handler.&lt;/p&gt;

&lt;p&gt;I rewrote KavachOS to run on Workers in February. Here is what I had to change, in case you are going through the same migration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why bother
&lt;/h2&gt;

&lt;p&gt;Workers are cheap. They run close to the user. They cold-start in under 5ms. For an auth library that gets called on every authenticated request, that latency floor matters. If your auth library adds 80ms of cold start every time someone hits an endpoint, your app feels slow even when the actual logic is fast.&lt;/p&gt;

&lt;p&gt;Most AI agent infrastructure is also moving to the edge. MCP servers, agent runtimes, and function-call handlers all want to be near the user. If your auth layer cannot follow them there, it becomes the slow link.&lt;/p&gt;

&lt;h2&gt;
  
  
  Buffer is gone. Use Uint8Array.
&lt;/h2&gt;

&lt;p&gt;This was the biggest change. I had &lt;code&gt;Buffer.from(x)&lt;/code&gt; in maybe 60 places. All of it had to go.&lt;/p&gt;

&lt;p&gt;Before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;utf-8&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hex&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TextEncoder&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;padStart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is more verbose. It is the price of running on Web APIs.&lt;/p&gt;
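&lt;p&gt;Since the same conversion shows up everywhere, it is worth wrapping once. A minimal sketch of such a helper: the name &lt;code&gt;bytesToHex&lt;/code&gt; matches the calls later in this post, but the body and the &lt;code&gt;utf8ToHex&lt;/code&gt; wrapper are illustrations, not necessarily the exact KavachOS source.&lt;/p&gt;

```typescript
// Hex-encode a byte array using only standard JavaScript, no Buffer.
function bytesToHex(bytes: Uint8Array): string {
  let hex = "";
  for (const b of bytes) {
    // Each byte becomes exactly two lowercase hex digits.
    hex += b.toString(16).padStart(2, "0");
  }
  return hex;
}

// Hypothetical convenience wrapper, the equivalent of
// Buffer.from(text, "utf-8").toString("hex").
function utf8ToHex(text: string): string {
  return bytesToHex(new TextEncoder().encode(text));
}
```

&lt;p&gt;With that in place, most call sites shrink back to a single line.&lt;/p&gt;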

&lt;p&gt;For base64url I added a small helper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;bytesToBase64Url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Uint8Array&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;binary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;binary&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromCharCode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;btoa&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;binary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\+&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\/&lt;/span&gt;&lt;span class="sr"&gt;/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;_&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/=+$/&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That replaced every &lt;code&gt;Buffer.from(bytes).toString("base64url")&lt;/code&gt; call.&lt;/p&gt;
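&lt;p&gt;The decoding direction needs the mirror image. A minimal sketch, assuming the input is unpadded base64url; the name &lt;code&gt;base64UrlToBytes&lt;/code&gt; is made up for this example.&lt;/p&gt;

```typescript
// Decode unpadded base64url text back into raw bytes.
function base64UrlToBytes(input: string): Uint8Array {
  // Undo the URL-safe alphabet substitutions.
  const base64 = input.replace(/-/g, "+").replace(/_/g, "/");
  // Restore the "=" padding that base64url strips.
  const padded = base64.padEnd(Math.ceil(base64.length / 4) * 4, "=");
  // atob returns a binary string: one character per byte.
  const binary = atob(padded);
  return Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
}
```

&lt;p&gt;This stands in for &lt;code&gt;Buffer.from(text, "base64url")&lt;/code&gt; on the read side.&lt;/p&gt;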

&lt;h2&gt;
  
  
  Node crypto out, Web Crypto in
&lt;/h2&gt;

&lt;p&gt;Node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;randomBytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;createHash&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;node:crypto&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;randomBytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hex&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createHash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sha256&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hex&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Web Crypto:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Uint8Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getRandomValues&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;bytesToHex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TextEncoder&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hashBuffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subtle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SHA-256&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hash&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;bytesToHex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Uint8Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hashBuffer&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



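&lt;p&gt;Web Crypto is async. Node &lt;code&gt;createHash&lt;/code&gt; is sync. That cascaded through the library, because every function that called &lt;code&gt;createHash&lt;/code&gt; had to become async too. About 20 internal helpers turned &lt;code&gt;async&lt;/code&gt; and their callers gained an &lt;code&gt;await&lt;/code&gt;. (&lt;code&gt;bytesToHex&lt;/code&gt; above is just the hex conversion from the previous section wrapped in a function.)&lt;/p&gt;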
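&lt;p&gt;To make the cascade concrete, here is a rough sketch of one such chain. It assumes a runtime with a global &lt;code&gt;crypto&lt;/code&gt; (Workers, Deno, Bun, recent Node), and both function names are invented for illustration.&lt;/p&gt;

```typescript
// Hash a string with Web Crypto. digest() returns a promise,
// so this helper has to be async.
async function sha256Hex(input: string) {
  const data = new TextEncoder().encode(input);
  const digest = await crypto.subtle.digest("SHA-256", data);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

// Hypothetical caller: it could be synchronous on top of createHash.
// Now it has to be async and await, and so does everything above it.
async function keyFingerprint(publicKey: string) {
  const hash = await sha256Hex(publicKey);
  return hash.slice(0, 16);
}
```

&lt;p&gt;Nothing about the logic changed. Only the function colour did, which is why the diff touches so many files.&lt;/p&gt;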

&lt;p&gt;For JWT signing I switched from &lt;code&gt;jsonwebtoken&lt;/code&gt; (Node-only) to &lt;code&gt;jose&lt;/code&gt;, which is built on Web Crypto. They cover the same ground, though &lt;code&gt;jose&lt;/code&gt; is promise-based, so its call sites join the async cascade. In exchange, &lt;code&gt;jose&lt;/code&gt; runs on Workers, Bun, and Deno without a polyfill.&lt;/p&gt;

&lt;h2&gt;
  
  
  SQLite to D1
&lt;/h2&gt;

&lt;p&gt;Cloudflare D1 is SQLite, but the API is different. You cannot pass a &lt;code&gt;url&lt;/code&gt; string. You pass a &lt;code&gt;binding&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;kavach&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createKavach&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;d1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;binding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// injected by the Workers runtime&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I added a &lt;code&gt;d1&lt;/code&gt; provider next to the existing &lt;code&gt;sqlite&lt;/code&gt; provider. They share most of the schema and migration code. The difference is the prepared statement API. D1 uses &lt;code&gt;db.prepare(sql).bind(...).run()&lt;/code&gt;, which returns a promise, instead of the synchronous &lt;code&gt;better-sqlite3&lt;/code&gt; style &lt;code&gt;db.prepare(sql).run(...)&lt;/code&gt;.&lt;/p&gt;
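&lt;p&gt;One way to hide that difference is a single async entry point that both providers go through. The sketch below uses simplified stand-in types (&lt;code&gt;SqliteLike&lt;/code&gt;, &lt;code&gt;D1Like&lt;/code&gt;) and a made-up &lt;code&gt;execute&lt;/code&gt; function. It is not the KavachOS provider code, only the shape of the idea.&lt;/p&gt;

```typescript
// Minimal structural types for the two statement styles. These are
// simplified stand-ins for illustration, not the real library typings.
interface SqliteLike {
  prepare(sql: string): { run(...params: unknown[]): unknown };
}
interface D1Like {
  prepare(sql: string): {
    bind(...params: unknown[]): { run(): unknown };
  };
}

type Db =
  | { provider: "sqlite"; db: SqliteLike }
  | { provider: "d1"; db: D1Like };

// One async entry point hides the difference in call shape.
async function execute(target: Db, sql: string, params: unknown[]) {
  if (target.provider === "d1") {
    // D1: parameters go through bind(), and run() returns a promise.
    return await target.db.prepare(sql).bind(...params).run();
  }
  // better-sqlite3 style: parameters go straight into a synchronous run().
  return target.db.prepare(sql).run(...params);
}
```

&lt;p&gt;Making the shared function async costs the SQLite path nothing, and the rest of the library no longer cares which database it is talking to.&lt;/p&gt;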

&lt;p&gt;The migration utility had to ship two paths. On Node, it reads the schema file from disk. On Workers, the schema is bundled at build time as a string. The library imports it conditionally based on which provider is active.&lt;/p&gt;

&lt;h2&gt;
  
  
  process.env is gone
&lt;/h2&gt;

&lt;p&gt;Workers do not have &lt;code&gt;process.env&lt;/code&gt;. Secrets and bindings arrive as the &lt;code&gt;env&lt;/code&gt; argument of the handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;kavach&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createKavach&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AUTH_SECRET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;d1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;binding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DB&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;kavach&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;handle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I had to push every secret read into a config object instead of having the library reach for &lt;code&gt;process.env.X&lt;/code&gt; internally. This paid off in testing too: a test can pass in a plain object and never touch real env vars.&lt;/p&gt;
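&lt;p&gt;The pattern is small enough to show in full. A minimal sketch, with &lt;code&gt;AuthConfig&lt;/code&gt; and &lt;code&gt;configFromEnv&lt;/code&gt; as hypothetical names rather than the library's real API:&lt;/p&gt;

```typescript
interface AuthConfig {
  secret: string;
}

// The only place that knows about environment variables. The library
// itself just receives the resulting AuthConfig.
function configFromEnv(env: { AUTH_SECRET?: string }): AuthConfig {
  const secret = env.AUTH_SECRET;
  if (!secret) {
    // Fail loudly at startup instead of signing tokens with "undefined".
    throw new Error("AUTH_SECRET is not set");
  }
  return { secret };
}

// Node entry point:      configFromEnv(process.env)
// Workers fetch handler: configFromEnv(env)
// Unit test:             configFromEnv({ AUTH_SECRET: "test-secret" })
```

&lt;p&gt;The runtime-specific part shrinks to one argument at the entry point.&lt;/p&gt;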

&lt;h2&gt;
  
  
  TypeScript 5.8 ArrayBuffer / Uint8Array typing
&lt;/h2&gt;

&lt;p&gt;This was the most annoying change. After upgrading to TypeScript 5.8, code like this stopped compiling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Uint8Array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subtle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SHA-256&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;crypto.subtle.digest&lt;/code&gt; returns &lt;code&gt;ArrayBuffer&lt;/code&gt;, not &lt;code&gt;Uint8Array&lt;/code&gt;. You have to wrap it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Uint8Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subtle&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SHA-256&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There were 30 places where I was implicitly casting. Each one needed an explicit &lt;code&gt;new Uint8Array(...)&lt;/code&gt; constructor.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I gained
&lt;/h2&gt;

&lt;p&gt;Bundle size dropped from 240KB to 90KB after pulling out Node-only dependencies. Cold start on Workers is around 4ms for a typed handler. The library now runs on Workers, Bun, Deno, and Node from the same source tree, with no polyfills.&lt;/p&gt;

&lt;p&gt;The "no Node-specific APIs" rule is also why KavachOS works in Vercel Edge runtime, AWS Lambda@Edge, Netlify Edge, and Deno Deploy. One auth library, every edge runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I would skip if I did this again
&lt;/h2&gt;

&lt;p&gt;I spent two days on a custom AES-GCM helper before realizing Web Crypto already has it. Read the Web Crypto API docs first. Most of what you reach for in Node &lt;code&gt;crypto&lt;/code&gt; has a direct counterpart. You just have to learn the new API surface.&lt;/p&gt;
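&lt;p&gt;For reference, the built-in AES-GCM path is short. A minimal sketch, assuming a runtime with a global &lt;code&gt;crypto&lt;/code&gt; (Workers, Deno, Bun, recent Node); the three function names are made up for this example.&lt;/p&gt;

```typescript
// Generate a 256-bit AES-GCM key that cannot be exported from the runtime.
function generateAesKey() {
  return crypto.subtle.generateKey(
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt", "decrypt"],
  );
}

// Encrypt with AES-GCM. A fresh 12-byte IV is generated per message and
// must be stored next to the ciphertext. It is not secret.
async function encryptText(key: CryptoKey, plaintext: string) {
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(plaintext),
  );
  return { iv, ciphertext: new Uint8Array(ciphertext) };
}

// Decrypt. The promise rejects if the key is wrong or the data was modified.
async function decryptText(key: CryptoKey, iv: Uint8Array, ciphertext: Uint8Array) {
  const plaintext = await crypto.subtle.decrypt({ name: "AES-GCM", iv }, key, ciphertext);
  return new TextDecoder().decode(plaintext);
}
```

&lt;p&gt;The authentication tag is appended to the ciphertext automatically, so there is no separate tag to carry around as there is with Node's cipher API.&lt;/p&gt;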

&lt;p&gt;I also wrote my own &lt;code&gt;bytesToBase64&lt;/code&gt; encoder from scratch before realizing &lt;code&gt;btoa&lt;/code&gt; works fine once the bytes are packed into a binary string, one character per byte. Use the platform.&lt;/p&gt;

&lt;p&gt;If you are migrating and want a reference, KavachOS is open source. The PR that introduced D1 support is small enough to read in 10 minutes. The PR that switched away from &lt;code&gt;node:crypto&lt;/code&gt; is bigger but documented commit by commit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://github.com/kavachos/kavachos
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;If you have ported a Node library to Workers or Bun, what was the gnarliest API surface you ran into? &lt;code&gt;Buffer&lt;/code&gt; was easy in retrospect. &lt;code&gt;process.env&lt;/code&gt; and the async cascade from Web Crypto cost me real days.&lt;/p&gt;

</description>
      <category>cloudflare</category>
      <category>typescript</category>
      <category>webdev</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Adding OAuth 2.1 to your MCP server in TypeScript</title>
      <dc:creator>GDS K S</dc:creator>
      <pubDate>Wed, 29 Apr 2026 17:40:05 +0000</pubDate>
      <link>https://dev.to/thegdsks/adding-oauth-21-to-your-mcp-server-in-typescript-4ap9</link>
      <guid>https://dev.to/thegdsks/adding-oauth-21-to-your-mcp-server-in-typescript-4ap9</guid>
      <description>&lt;p&gt;If you're building an MCP server, sooner or later someone is going to ask: how does authentication work?&lt;/p&gt;

&lt;p&gt;The MCP spec leaves this open. Most early servers shipped with no auth at all, or a hardcoded API key in an environment variable. That's fine for local Claude Desktop use. It falls apart the moment you publish a remote MCP server that real users connect to.&lt;/p&gt;

&lt;p&gt;The right answer is OAuth 2.1 with PKCE, plus four RFCs that nobody enjoys reading: 9728 (Protected Resource Metadata), 8707 (Resource Indicators), 8414 (Authorization Server Metadata), and 7591 (Dynamic Client Registration). I know that sounds like a lot. Let me show you what it looks like in practice.&lt;/p&gt;

&lt;p&gt;I'll use KavachOS, the auth library I built for AI agents. You don't have to use it. The point of this post is to show what a compliant MCP OAuth setup actually requires, and why each piece exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you'll build
&lt;/h2&gt;

&lt;p&gt;A Hono-based MCP server that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;exposes &lt;code&gt;/.well-known/oauth-protected-resource&lt;/code&gt; (RFC 9728)&lt;/li&gt;
&lt;li&gt;handles dynamic client registration so Claude Desktop can register itself (RFC 7591)&lt;/li&gt;
&lt;li&gt;runs the authorization code flow with PKCE S256 (OAuth 2.1)&lt;/li&gt;
&lt;li&gt;issues access tokens scoped to a specific resource indicator (RFC 8707)&lt;/li&gt;
&lt;li&gt;validates incoming MCP requests against those tokens
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client                Auth Server          MCP Server
  │                       │                    │
  │  GET .well-known/...  │                    │
  ├──────────────────────────────────────────► │
  │  401 + auth metadata pointer               │
  │ ◄──────────────────────────────────────────┤
  │                       │                    │
  │  POST /register       │                    │
  ├──────────────────────►│                    │
  │  client_id            │                    │
  │ ◄─────────────────────┤                    │
  │                       │                    │
  │  /authorize (PKCE)    │                    │
  │ ◄────────────────────►│                    │
  │  code                 │                    │
  │ ◄─────────────────────┤                    │
  │                       │                    │
  │  /token (code+verif)  │                    │
  ├──────────────────────►│                    │
  │  access_token         │                    │
  │ ◄─────────────────────┤                    │
  │                       │                    │
  │  POST /mcp + bearer   │                    │
  ├──────────────────────────────────────────► │
  │                       │   200 + result     │
  │ ◄──────────────────────────────────────────┤
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;kavachos @kavachos/hono hono
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;index.ts&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createKavach&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;kavachos&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;mcpOAuth&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;kavachos/mcp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createHonoAdapter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@kavachos/hono&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Hono&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hono&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;kavach&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createKavach&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sqlite&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;kavach.db&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nf"&gt;mcpOAuth&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;issuer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://your-mcp-server.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://your-mcp-server.com/mcp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Hono&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/auth&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;createHonoAdapter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;kavach&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the auth surface. You still need an MCP handler. Wire it like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;requireToken&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;kavachos&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/mcp&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nf"&gt;requireToken&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;mcp:tools&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;agent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="c1"&gt;// route the MCP request to your tool implementation&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;requireToken&lt;/code&gt; validates the bearer, checks scope, and attaches the agent identity to the request context. If the token is missing, expired, revoked, or has the wrong scope, the middleware rejects the request before your handler runs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What each RFC actually does
&lt;/h2&gt;

&lt;h3&gt;
  
  
  RFC 9728: Protected Resource Metadata
&lt;/h3&gt;

&lt;p&gt;When Claude Desktop tries to connect to a remote MCP server, it does not know where the OAuth endpoints are. RFC 9728 fixes that. You expose a &lt;code&gt;.well-known/oauth-protected-resource&lt;/code&gt; endpoint that points the client at the authorization server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET /.well-known/oauth-protected-resource
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://your-mcp-server.com/mcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"authorization_servers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"https://your-mcp-server.com"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scopes_supported"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mcp:tools"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mcp:resources"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this, Claude Desktop has no way to discover the auth flow. The connection just fails with a confusing error.&lt;/p&gt;
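&lt;p&gt;If you're not using a library, the two pieces you'd produce yourself look roughly like this. It's a sketch with made-up function names. The &lt;code&gt;resource_metadata&lt;/code&gt; parameter on the &lt;code&gt;WWW-Authenticate&lt;/code&gt; header is what RFC 9728 defines for pointing a client at the metadata URL from a 401.&lt;/p&gt;

```typescript
// Build the RFC 9728 metadata document served at
// /.well-known/oauth-protected-resource.
function protectedResourceMetadata(resource: string, authServer: string, scopes: string[]) {
  return {
    resource,
    authorization_servers: [authServer],
    scopes_supported: scopes,
  };
}

// Build the WWW-Authenticate value for a 401 response. It tells the
// client where to fetch the metadata document above.
function bearerChallenge(origin: string) {
  const metadataUrl = new URL("/.well-known/oauth-protected-resource", origin);
  return `Bearer resource_metadata="${metadataUrl.href}"`;
}
```

&lt;p&gt;The client reads the header, fetches the metadata, and from there knows which authorization server to talk to.&lt;/p&gt;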

&lt;h3&gt;
  
  
  RFC 7591: Dynamic Client Registration
&lt;/h3&gt;

&lt;p&gt;You do not want every MCP user filing a ticket to get a &lt;code&gt;client_id&lt;/code&gt;. RFC 7591 lets clients register themselves at runtime. Claude Desktop calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;POST /register
Content-Type: application/json

{ "redirect_uris": ["http://localhost:3334/callback"] }
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You return a &lt;code&gt;client_id&lt;/code&gt; (and optionally a &lt;code&gt;client_secret&lt;/code&gt;). The client uses it for the auth code flow. With this in place, your MCP install is one command from the user's side, not a support email.&lt;/p&gt;
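
&lt;p&gt;To make the registration step concrete, here is a minimal sketch of what a &lt;code&gt;/register&lt;/code&gt; handler might do with the request body. It is not how KavachOS implements it: the in-memory store, the type names, and the &lt;code&gt;registerClient&lt;/code&gt; function are assumptions for illustration, and a real server would persist clients and apply its own redirect URI policy.&lt;/p&gt;

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical types for illustration; only the field names
// (redirect_uris, client_id, client_id_issued_at) come from RFC 7591.
interface RegistrationRequest {
  redirect_uris: string[];
}

interface RegisteredClient {
  client_id: string;
  redirect_uris: string[];
  client_id_issued_at: number;
}

// Assumption: an in-memory store keyed by client_id, good enough for a demo.
const clients: { [clientId: string]: RegisteredClient } = {};

function registerClient(request: RegistrationRequest): RegisteredClient {
  if (!Array.isArray(request.redirect_uris) || request.redirect_uris.length === 0) {
    throw new Error("invalid_redirect_uri");
  }
  for (const uri of request.redirect_uris) {
    new URL(uri); // throws a TypeError if the URI is malformed
  }
  const client: RegisteredClient = {
    client_id: randomUUID(),
    redirect_uris: [...request.redirect_uris],
    client_id_issued_at: Math.floor(Date.now() / 1000), // seconds since epoch
  };
  clients[client.client_id] = client;
  return client;
}

const registered = registerClient({
  redirect_uris: ["http://localhost:3334/callback"],
});
console.log(registered.client_id);
```

&lt;p&gt;Each caller gets its own &lt;code&gt;client_id&lt;/code&gt;, which is what later makes per-client revocation and rate limiting possible.&lt;/p&gt;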

&lt;h3&gt;
  
  
  OAuth 2.1 with PKCE S256
&lt;/h3&gt;

&lt;p&gt;OAuth 2.1 is essentially OAuth 2.0 minus the mistakes. PKCE is mandatory. The implicit grant is gone. Refresh tokens issued to public clients must be either sender-constrained or rotated on every use.&lt;/p&gt;

&lt;p&gt;The flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Client generates a random &lt;code&gt;code_verifier&lt;/code&gt; and derives a &lt;code&gt;code_challenge&lt;/code&gt; from it (the SHA-256 hash of the verifier, base64url-encoded without padding)&lt;/li&gt;
&lt;li&gt;Client redirects to &lt;code&gt;/authorize?code_challenge=...&amp;amp;code_challenge_method=S256&amp;amp;...&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;User approves&lt;/li&gt;
&lt;li&gt;Client receives an authorization code at the redirect URI&lt;/li&gt;
&lt;li&gt;Client exchanges the code plus the original &lt;code&gt;code_verifier&lt;/code&gt; at &lt;code&gt;/token&lt;/code&gt; for an access token&lt;/li&gt;
&lt;/ol&gt;
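
&lt;p&gt;For readers who want to see what steps 1 and 5 amount to, here is a minimal sketch using Node's built-in &lt;code&gt;crypto&lt;/code&gt; module. The function names are made up for this example and are not a KavachOS API; the server-side check assumes the challenge was stored when the client hit &lt;code&gt;/authorize&lt;/code&gt;.&lt;/p&gt;

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Step 1 (client): 32 random bytes encode to a 43-character base64url verifier.
function generateCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// Step 1 (client): S256 challenge = BASE64URL(SHA-256(ASCII(verifier))), no padding.
function deriveCodeChallenge(codeVerifier: string): string {
  return createHash("sha256").update(codeVerifier, "ascii").digest("base64url");
}

// Step 5 (server): recompute the challenge from the presented verifier and
// compare it with the challenge stored during the /authorize request.
function verifyCodeVerifier(codeVerifier: string, storedChallenge: string): boolean {
  const expected = Buffer.from(deriveCodeChallenge(codeVerifier), "ascii");
  const stored = Buffer.from(storedChallenge, "ascii");
  // timingSafeEqual throws on unequal lengths, so check that first.
  if (expected.length !== stored.length) return false;
  return timingSafeEqual(expected, stored);
}

const codeVerifier = generateCodeVerifier();
const codeChallenge = deriveCodeChallenge(codeVerifier);
console.log(verifyCodeVerifier(codeVerifier, codeChallenge));
```

&lt;p&gt;Only the challenge travels through the browser redirect. The verifier stays with the client until the token request, which is why an intercepted authorization code is useless on its own.&lt;/p&gt;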

&lt;p&gt;KavachOS handles this end to end. You write zero lines of OAuth code.&lt;/p&gt;

&lt;h3&gt;
  
  
  RFC 8707: Resource Indicators
&lt;/h3&gt;

&lt;p&gt;Without resource indicators, an access token is valid for "anything this auth server protects," so a token leaked by one service can be replayed against every other. RFC 8707 binds a token to a specific resource (your MCP server URL).&lt;/p&gt;

&lt;p&gt;When Claude Desktop requests a token, it includes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;https://your-mcp-server.com/mcp&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The token comes back with that resource baked in. Your MCP server validates it. A token meant for one MCP server will not work against another. This matters more than people realize once a Claude user juggles ten MCP servers from one client.&lt;/p&gt;
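
&lt;p&gt;The validation step can be as small as an audience check. The sketch below assumes the token is a JWT whose signature has already been verified and whose &lt;code&gt;aud&lt;/code&gt; claim carries the resource URL; that token format, the &lt;code&gt;TokenClaims&lt;/code&gt; type, and the function name are assumptions for illustration, not something KavachOS documents here.&lt;/p&gt;

```typescript
// Hypothetical shape of already-verified token claims.
// Assumption: the auth server writes the requested resource into "aud".
interface TokenClaims {
  aud?: string | string[];
}

// Accept the token only if this server's resource URL is among its audiences.
// The comparison is an exact string match, so the server must use the same
// URL it advertises in its discovery metadata.
function tokenMatchesResource(claims: TokenClaims, expectedResource: string): boolean {
  if (claims.aud === undefined) return false;
  const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
  return audiences.some((audience) => audience === expectedResource);
}

const thisServer = "https://your-mcp-server.com/mcp";
console.log(tokenMatchesResource({ aud: thisServer }, thisServer));
console.log(tokenMatchesResource({ aud: "https://other-mcp-server.com/mcp" }, thisServer));
```

&lt;p&gt;Rejecting tokens with a missing audience is deliberate: an unbound token is exactly the case resource indicators exist to rule out.&lt;/p&gt;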

&lt;h2&gt;
  
  
  Testing the flow
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In another terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Discover the auth server&lt;/span&gt;
curl http://localhost:3000/.well-known/oauth-protected-resource

&lt;span class="c"&gt;# 2. Register a client&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:3000/register &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"redirect_uris":["http://localhost:3334/callback"]}'&lt;/span&gt;

&lt;span class="c"&gt;# 3. Walk the auth code flow&lt;/span&gt;
&lt;span class="c"&gt;# The MCP Inspector from Anthropic does this end to end:&lt;/span&gt;
&lt;span class="c"&gt;# https://github.com/modelcontextprotocol/inspector&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your server passes the inspector, Claude Desktop should connect cleanly.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you get
&lt;/h2&gt;

&lt;p&gt;Once the server ships:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An audit log of which agent called which tool, with which scope, at which timestamp&lt;/li&gt;
&lt;li&gt;Per-agent revocation, so you can kill one client without affecting others&lt;/li&gt;
&lt;li&gt;Rate limits per agent, not per IP&lt;/li&gt;
&lt;li&gt;A path to enterprise SSO later, since KavachOS supports SAML and OIDC providers as the upstream identity&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Common pitfalls I keep seeing
&lt;/h2&gt;

&lt;p&gt;A few things I have watched teams trip on while wiring this up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skipping the &lt;code&gt;.well-known&lt;/code&gt; discovery endpoint.&lt;/strong&gt; Without it, Claude Desktop has no idea where to send the user. It returns a generic "could not connect" error and the user blames the MCP server. This endpoint costs you four lines of code; ship it first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardcoding &lt;code&gt;client_id&lt;/code&gt; for testing, then forgetting to swap it for dynamic registration.&lt;/strong&gt; The single-tenant test setup looks identical to a production setup until the second user shows up on the same &lt;code&gt;client_id&lt;/code&gt; and steamrolls the first user's client. Add dynamic registration before you share the server with anyone else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ignoring the &lt;code&gt;resource&lt;/code&gt; indicator.&lt;/strong&gt; I have seen teams treat OAuth as a generic login layer and issue tokens with no resource binding. The token then works against any MCP server in the same auth realm, which means a compromised server gets credentials valid against every other server you protect. Always bind tokens to the specific resource URL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not setting up token revocation early.&lt;/strong&gt; Most teams add it after the first incident. By then they have already issued thousands of tokens, none of which they can recall cleanly. The KavachOS revocation API runs in milliseconds. Wire it before you need it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Skipping audit logs because "we will add observability later."&lt;/strong&gt; When the first agent does something weird, the only way to find out which token it used is the audit log. Six lines of middleware now save you a week of forensic SQL later.&lt;/p&gt;

&lt;h2&gt;
  
  
  A note on where MCP auth is going
&lt;/h2&gt;

&lt;p&gt;The MCP spec is still moving. Recent drafts formalize the discovery flow and add elicitation (asking the user to confirm an action mid-flow). KavachOS tracks the spec, so updates land in the SDK without changes on your side.&lt;/p&gt;

&lt;p&gt;If you want to skip the build and try it, install is one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx create-kavachos-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Source: &lt;a href="https://github.com/kavachos/kavachos" rel="noopener noreferrer"&gt;github.com/kavachos/kavachos&lt;/a&gt;. MIT.&lt;/p&gt;




&lt;p&gt;If you ship an MCP server, what's the auth setup you actually use today: hardcoded key, an Auth0/Clerk pass-through, or something custom? Curious which trade-offs you've made and where it's biting you.&lt;/p&gt;

</description>
      <category>tutorial</category>
      <category>oauth</category>
      <category>typescript</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
