<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ahmad ammar</title>
    <description>The latest articles on DEV Community by Ahmad ammar (@ahmadammar).</description>
    <link>https://dev.to/ahmadammar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4008655%2F02a38ff7-7a96-4951-ace5-49b80fbf78ab.jpeg</url>
      <title>DEV Community: Ahmad ammar</title>
      <link>https://dev.to/ahmadammar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ahmadammar"/>
    <language>en</language>
    <item>
      <title>Multi-agent fleets burn ~15x the tokens — here's the budget layer the playbooks skip</title>
      <dc:creator>Ahmad ammar</dc:creator>
      <pubDate>Mon, 29 Jun 2026 19:41:26 +0000</pubDate>
      <link>https://dev.to/ahmadammar/multi-agent-fleets-burn-15x-the-tokens-heres-the-budget-layer-the-playbooks-skip-4i9e</link>
      <guid>https://dev.to/ahmadammar/multi-agent-fleets-burn-15x-the-tokens-heres-the-budget-layer-the-playbooks-skip-4i9e</guid>
      <description>&lt;p&gt;Give ten agents a shared, metered tool (a paid search or research API where every call is real&lt;br&gt;
money) and you've handed all ten the same company credit card. Each one reasons "I'll just run&lt;br&gt;
a quick search." You find out the total on the invoice.&lt;/p&gt;

&lt;p&gt;Anthropic's own multi-agent write-up clocks a fleet at roughly 15x the tokens of a single chat.&lt;br&gt;
That's the token pool. The paid external tools are the line item the orchestration playbooks skip —&lt;br&gt;
and it's the one that shows up in actual dollars. I run four production projects on Claude Code with&lt;br&gt;
no external agent framework, and this is the pattern that keeps that line item from surprising me.&lt;/p&gt;

&lt;h2&gt;The fix that doesn't work&lt;/h2&gt;

&lt;p&gt;You can't solve this by asking agents to be frugal. "Be mindful of the budget" in a prompt is a&lt;br&gt;
wish, and an LLM will sail past "max 8 searches" the moment the task still feels unfinished. If your&lt;br&gt;
budget lives in the prompt, you don't have a budget — you have a hope.&lt;/p&gt;

&lt;h2&gt;The split most write-ups collapse&lt;/h2&gt;

&lt;p&gt;Cost governance is two layers, and conflating them is why prompt-level "budgets" fail:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Enforcement is deterministic and lives below the model.&lt;/strong&gt; The harness owns the paid&lt;br&gt;
credential and keeps a hard counter that denies every call past a fixed cap of N. The model can't&lt;br&gt;
argue with it, jailbreak it, or sneak "just one more search" past it. This is what actually bounds&lt;br&gt;
the dollars, and it's plain middleware, not an agent. A dumb metered proxy in front of the API does this fine.&lt;/p&gt;
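
&lt;p&gt;A minimal sketch of such a metered proxy, assuming Python and a single-threaded harness. The names &lt;code&gt;MeteredProxy&lt;/code&gt;, &lt;code&gt;BudgetExceeded&lt;/code&gt; and &lt;code&gt;fake_paid_search&lt;/code&gt; are hypothetical stand-ins, not part of any real library or of my own setup:&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable


class BudgetExceeded(Exception):
    """Raised when the cap is spent; no paid call is made."""


@dataclass
class MeteredProxy:
    # Hypothetical wrapper: it alone holds the paid callable (and so the credential).
    paid_call: Callable[[str], str]
    max_calls: int
    used: int = 0

    def search(self, query: str) -> str:
        # Deny before touching the paid API, so the call past the cap costs nothing.
        if self.used >= self.max_calls:
            raise BudgetExceeded(f"cap of {self.max_calls} paid calls reached")
        self.used += 1  # Count the attempt, whether or not it succeeds.
        return self.paid_call(query)


def fake_paid_search(query: str) -> str:
    # Stand-in for a real metered API client.
    return f"results for {query}"


proxy = MeteredProxy(paid_call=fake_paid_search, max_calls=2)
outcomes = []
for q in ["a", "b", "c"]:
    try:
        outcomes.append(proxy.search(q))
    except BudgetExceeded:
        outcomes.append("denied")
```

&lt;p&gt;The third query is refused by ordinary control flow rather than by anything the model was told, which is the whole point: the agent never sees the credential, only the proxy's &lt;code&gt;search&lt;/code&gt; method.&lt;/p&gt;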

&lt;p&gt;&lt;strong&gt;2. Judgment is the half a proxy can't do.&lt;/strong&gt; A gateway can cap spend, but it can't decide whether&lt;br&gt;
&lt;em&gt;this task deserves to spend at all&lt;/em&gt;. That decision is model-shaped, and it's where the agent earns&lt;br&gt;
its place:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A novelty-gate.&lt;/strong&gt; Most tasks don't qualify to spend: CRUD, mechanical edits, known facts → zero
research. The biggest budget win isn't a smaller cap — it's that the majority of work never
reaches the paid tool. A proxy can't make that call; it has no notion of "is this architectural or
trivial." The agent does.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A tier + cap per qualifying call&lt;/strong&gt; (quick-fact vs deep-dive), declared as policy that the
harness then enforces mechanically — the agent proposes the tier, the counter imposes the limit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Honest degradation.&lt;/strong&gt; When the paid source fails, the agent falls back to a free one and flags
the reliability downgrade in its output — so a weaker citation never silently poses as
authoritative.&lt;/li&gt;
&lt;/ul&gt;
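
&lt;p&gt;The three pieces above can be sketched roughly as follows. This assumes the agent's novelty judgment arrives as a proposed tier name (or &lt;code&gt;None&lt;/code&gt; for "do not spend"); the tier caps, &lt;code&gt;plan_spend&lt;/code&gt;, &lt;code&gt;fetch&lt;/code&gt; and the &lt;code&gt;Answer&lt;/code&gt; type are illustrative assumptions, not a fixed design:&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical policy table: the agent proposes a tier, the harness owns the numbers.
TIER_CAPS = {"quick-fact": 1, "deep-dive": 5}


@dataclass
class Answer:
    text: str
    source: str     # "paid" or "free"
    degraded: bool  # True when a weaker source stood in for the paid one


def plan_spend(tier: Optional[str]) -> int:
    """Paid calls allowed for this task; 0 unless a known tier was proposed."""
    # The gate defaults to "no": None or an unrecognised tier gets zero research.
    return TIER_CAPS.get(tier, 0)


def fetch(query: str, paid: Callable[[str], str], free: Callable[[str], str]) -> Answer:
    try:
        return Answer(text=paid(query), source="paid", degraded=False)
    except Exception:
        # Honest degradation: fall back, but flag the downgrade in the output.
        # (Catching every exception is a simplification for this sketch.)
        return Answer(text=free(query), source="free", degraded=True)


def broken_paid(query: str) -> str:
    # Simulates the paid source failing.
    raise TimeoutError("paid API unavailable")


def free_search(query: str) -> str:
    return f"free results for {query}"


result = fetch("raft vs paxos", broken_paid, free_search)
```

&lt;p&gt;The number returned by &lt;code&gt;plan_spend&lt;/code&gt; is what the harness would load into its counter, and the &lt;code&gt;degraded&lt;/code&gt; flag travels with the answer so a downstream reader can see that the citation came from the weaker source.&lt;/p&gt;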

&lt;p&gt;So the division is clean: &lt;strong&gt;the gateway enforces, the agent judges.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The novel part isn't "centralize the credential" — that's least-privilege, decades old, and&lt;br&gt;
centralizing alone doesn't cap anything (N callers can still demand N searches through one door). The&lt;br&gt;
leverage is pairing a deterministic cap with model-shaped judgment about &lt;em&gt;whether to spend at all&lt;/em&gt; —&lt;br&gt;
with the gate defaulting to "no."&lt;/p&gt;

&lt;h2&gt;The tradeoff nobody mentions&lt;/h2&gt;

&lt;p&gt;If the budget is a single shared number across parallel agents, you're holding a distributed-counter&lt;br&gt;
problem: two agents can both read "budget remaining," both decide they're clear, and both spend —&lt;br&gt;
blowing the cap. You have two honest options. The simplest is to serialize every paid call through&lt;br&gt;
the one budgeted agent: correct and easy to reason about, but that agent becomes a throughput&lt;br&gt;
bottleneck for the whole fleet. The other is an atomic reservation — each call reserves its slice of&lt;br&gt;
the quota &lt;em&gt;before&lt;/em&gt; spending and releases the remainder after, which keeps agents parallel at the cost&lt;br&gt;
of a little more plumbing. For metered money, paying either price beats discovering the race&lt;br&gt;
condition on the invoice — just don't pretend it isn't there.&lt;/p&gt;
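
&lt;p&gt;A sketch of the atomic-reservation option, assuming all agents run as threads in one Python process so a &lt;code&gt;threading.Lock&lt;/code&gt; is enough. &lt;code&gt;ReservationBudget&lt;/code&gt; is a hypothetical name; agents in separate processes or machines would need the same check-and-decrement done atomically in a shared store instead:&lt;/p&gt;

```python
import threading


class ReservationBudget:
    """Shared quota where check and decrement happen under one lock."""

    def __init__(self, total: int) -> None:
        self._remaining = total
        self._lock = threading.Lock()

    def reserve(self, amount: int) -> bool:
        # Check and decrement are one atomic step, so two agents cannot
        # both read the same remaining budget and both spend it.
        with self._lock:
            if amount > self._remaining:
                return False
            self._remaining -= amount
            return True

    def release(self, unused: int) -> None:
        # Hand back whatever part of a reservation was not spent.
        with self._lock:
            self._remaining += unused

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._remaining


budget = ReservationBudget(total=10)
granted = []


def agent(agent_id: int) -> None:
    # Every agent asks for 3 units before doing any paid work.
    if budget.reserve(3):
        granted.append(agent_id)


threads = [threading.Thread(target=agent, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

&lt;p&gt;With a total of 10 and reservations of 3, exactly three of the twenty agents are granted a slice whatever the thread ordering, and one unit is left over. An agent that finishes under its reservation would call &lt;code&gt;release&lt;/code&gt; with the unused amount.&lt;/p&gt;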

&lt;h2&gt;The takeaway&lt;/h2&gt;

&lt;p&gt;Most "agent budget" advice optimizes the cap. The leverage is one level up: a gate that makes most&lt;br&gt;
work never spend at all, and a hard counter the model can't talk its way past. Stop asking your fleet&lt;br&gt;
to be frugal. Give exactly one agent a budget the harness enforces — and the judgment to know when&lt;br&gt;
not to use it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
