<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Taha Baş</title>
    <description>The latest articles on DEV Community by Taha Baş (@rtahabas).</description>
    <link>https://dev.to/rtahabas</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3996401%2Fe1dcf932-0716-44d6-ac3c-0bff4ec8f90f.jpg</url>
      <title>DEV Community: Taha Baş</title>
      <link>https://dev.to/rtahabas</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rtahabas"/>
    <language>en</language>
    <item>
      <title>Observability told me exactly how much money my agents wasted. I wanted something that says no.</title>
      <dc:creator>Taha Baş</dc:creator>
      <pubDate>Mon, 22 Jun 2026 07:49:25 +0000</pubDate>
      <link>https://dev.to/rtahabas/observability-told-me-exactly-how-much-money-my-agents-wasted-i-wanted-something-that-says-no-4176</link>
      <guid>https://dev.to/rtahabas/observability-told-me-exactly-how-much-money-my-agents-wasted-i-wanted-something-that-says-no-4176</guid>
      <description>&lt;p&gt;Most AI cost tooling is an autopsy. It tells you, in detail, what you already spent: token counts, per-call traces, a
dashboard that turns red after the bill is locked in. None of it does the one thing I kept wanting: refuse the call before
it goes out.&lt;/p&gt;

&lt;p&gt;I ran into this while building agent tooling. Once I had more than a couple of agents hitting paid APIs on a schedule, two
problems showed up that nothing off the shelf solved cleanly.&lt;/p&gt;

&lt;p&gt;Problem 1: observability is not control&lt;/p&gt;

&lt;p&gt;Watching spend and stopping spend are different systems, and every tool I tried lived on the watching side. I could
reconstruct, after the fact, that agent 4 had a bad night. What I couldn't do was tell agent 4 "you're done for today".
That takes a hard limit that fires before the request leaves.&lt;/p&gt;

&lt;p&gt;The closest thing providers offer is per-key budgeting. That sounds right until you run more than one agent. Keys get
shared, and the moment three agents share an API key, a per-key cap can't tell them apart. You've lost the unit that
actually matters, which is the agent.&lt;/p&gt;

&lt;p&gt;So the cap I wanted was specific:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;per agent, not per key&lt;/li&gt;
&lt;li&gt;enforced in the request path — over budget means the call is refused before it goes out, not logged after it returns&lt;/li&gt;
&lt;li&gt;two dimensions: calls/day and a max per single call&lt;/li&gt;
&lt;li&gt;a kill-switch on call-rate spikes, because the runaway-loop case is the one that hurts at 3am&lt;/li&gt;
&lt;/ul&gt;
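
&lt;p&gt;To make the last item concrete, here is a minimal sketch of a call-rate kill-switch, assuming a sliding time window and a manual resume. The class name &lt;code&gt;SpikeSwitch&lt;/code&gt;, its parameters, and keying the pause by agent id are all hypothetical choices for illustration, not a description of any particular product's internals.&lt;/p&gt;

```typescript
// Hypothetical sketch: pause an agent when it makes too many calls in a short window.
class SpikeSwitch {
  // Timestamps (ms) of recent calls, per agent.
  private recent: { [agentId: string]: number[] } = {};
  // Agents that tripped the switch and are waiting for a manual resume.
  private paused: { [agentId: string]: boolean } = {};
  private windowMs: number;
  private maxCalls: number;

  constructor(windowMs: number, maxCalls: number) {
    this.windowMs = windowMs;
    this.maxCalls = maxCalls;
  }

  // Record a call at time `now` and report whether it may proceed.
  allow(agentId: string, now: number): boolean {
    if (this.paused[agentId] === true) return false;

    // Keep only the calls that still fall inside the window, then add this one.
    const cutoff = now - this.windowMs;
    const kept = (this.recent[agentId] ?? []).filter((t) => t > cutoff);
    kept.push(now);
    this.recent[agentId] = kept;

    if (kept.length > this.maxCalls) {
      this.paused[agentId] = true; // stays paused until resume() is called
      return false;
    }
    return true;
  }

  // Assumed to be triggered by a human after looking at what went wrong.
  resume(agentId: string): void {
    this.paused[agentId] = false;
    this.recent[agentId] = [];
  }
}
```

&lt;p&gt;The pause is sticky on purpose in this sketch: a runaway loop that is only throttled keeps burning calls at the throttled rate, while a paused one stops until someone looks at it.&lt;/p&gt;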

&lt;p&gt;Problem 2: I didn't want to hand over my keys&lt;/p&gt;

&lt;p&gt;Plenty of "AI gateway" products will do governance for you by becoming the thing that holds your API keys and signs
requests on your behalf. For a fleet that touches real money, handing custody of credentials to a third party is a hard no.
I wanted enforcement without custody: keep my own keys, let something in front of the fleet enforce the rules.&lt;/p&gt;

&lt;p&gt;What I ended up building&lt;/p&gt;

&lt;p&gt;I couldn't find a drop-in that did per-agent, request-path enforcement without taking custody, so I built one. It's a proxy
you point agents at. They keep their own keys. No rewrite, no framework lock-in: LangChain, CrewAI, or a raw script all
talk to the same proxy.&lt;/p&gt;

&lt;p&gt;The integration is boring on purpose:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;import { createPaymentClient } from "@gatewards/agent-sdk";&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;const client = createPaymentClient({&lt;br&gt;
    apiKey: process.env.GATEWARDS_AGENT_KEY, // identifies THIS agent&lt;br&gt;
    proxy: true,&lt;br&gt;
  });&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;// your agent's calls go through the proxy unchanged&lt;br&gt;
  const res = await client.get("https://api.example.com/data");&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;You set the cap per agent (calls/day + max per call). When an agent goes over, the proxy refuses the call in the request
path: your call gets a 429, not a silent overage you discover tomorrow. When an agent's rate spikes into loop territory,
the pipeline auto-pauses instead of grinding through your budget.&lt;/p&gt;
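
&lt;p&gt;The decision itself is small. Below is a minimal sketch of a per-agent check that runs before the upstream request is sent. Everything in it is an assumption made for illustration: the &lt;code&gt;AgentBudget&lt;/code&gt; class, the &lt;code&gt;Cap&lt;/code&gt; shape, treating "max per call" as an abstract size number, and answering unknown agents with a 403. It also leaves out the daily reset.&lt;/p&gt;

```typescript
// Hypothetical shapes for illustration; not the Gatewards API.
type Cap = { callsPerDay: number; maxPerCall: number };
type Decision = { allowed: boolean; status: number; reason: string };

class AgentBudget {
  private caps: { [agentId: string]: Cap };
  // Calls already let through today, keyed by agent id (daily reset omitted).
  private used: { [agentId: string]: number } = {};

  constructor(caps: { [agentId: string]: Cap }) {
    this.caps = caps;
  }

  // Decide BEFORE the upstream request leaves. `callSize` is an abstract unit.
  check(agentId: string, callSize: number): Decision {
    const cap = this.caps[agentId];
    if (cap === undefined) {
      // Assumption: an agent with no configured cap is refused outright.
      return { allowed: false, status: 403, reason: "unknown agent" };
    }
    if (callSize > cap.maxPerCall) {
      return { allowed: false, status: 429, reason: "over max per call" };
    }
    const usedToday = this.used[agentId] ?? 0;
    if (usedToday >= cap.callsPerDay) {
      return { allowed: false, status: 429, reason: "daily call cap reached" };
    }
    // Only calls that are actually let through count against the daily cap.
    this.used[agentId] = usedToday + 1;
    return { allowed: true, status: 200, reason: "ok" };
  }
}

// Usage: two calls per day, each at most 10 units.
const demoBudget = new AgentBudget({ "agent-4": { callsPerDay: 2, maxPerCall: 10 } });
console.log(demoBudget.check("agent-4", 5));
```

&lt;p&gt;Because the counter is keyed by agent id rather than by API key, three agents sharing one upstream key still get three separate budgets.&lt;/p&gt;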

&lt;p&gt;Because every call is already tagged by agent identity, attribution stops being a grep session. You get "which agent spent
what" for free, as a side effect of the thing that enforces the caps.&lt;/p&gt;

&lt;p&gt;The one that surprised me: cross-agent dedup&lt;/p&gt;

&lt;p&gt;This one I didn't plan for. Several agents poll the same endpoints: same GET, same params, different agents. The proxy
caches identical GET responses across the whole fleet, so five agents making the same call pay for one. On a polling-heavy
fleet that turned out to be a bigger line-item win than the caps.&lt;/p&gt;
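
&lt;p&gt;A minimal sketch of the idea, assuming a fleet-wide cache keyed by method plus normalized URL and a fixed time-to-live. The &lt;code&gt;FleetCache&lt;/code&gt; name, the TTL policy, and ignoring request headers are assumptions made for illustration. The important detail is what the key leaves out: the agent's identity.&lt;/p&gt;

```typescript
// Hypothetical sketch of cross-agent GET dedup; names and TTL policy are assumptions.
type Entry = { body: string; expiresAt: number };

class FleetCache {
  private entries: { [key: string]: Entry } = {};
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Same method + same URL + same params (in any order) gives the same key.
  // The agent id is deliberately NOT part of the key.
  static key(method: string, rawUrl: string): string {
    const url = new URL(rawUrl);
    url.searchParams.sort();
    return method.toUpperCase() + " " + url.toString();
  }

  // Returns a cached body, or undefined if the caller must go upstream.
  lookup(method: string, rawUrl: string, now: number): string | undefined {
    if (method.toUpperCase() !== "GET") return undefined; // GET-only by default
    const hit = this.entries[FleetCache.key(method, rawUrl)];
    if (hit === undefined || now >= hit.expiresAt) return undefined;
    return hit.body;
  }

  // Remember an upstream response so the next agent does not pay for it.
  store(method: string, rawUrl: string, body: string, now: number): void {
    if (method.toUpperCase() !== "GET") return;
    this.entries[FleetCache.key(method, rawUrl)] = {
      body,
      expiresAt: now + this.ttlMs,
    };
  }
}
```

&lt;p&gt;Sorting the query parameters before building the key means two agents that pass the same params in a different order still share one cached response.&lt;/p&gt;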

&lt;p&gt;What it deliberately doesn't do&lt;/p&gt;

&lt;p&gt;Honesty matters more than a clean pitch, so here are the limits up front:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It doesn't do dollar caps. Caps are calls/day and max-per-call, not "$5/day". Estimating real-time per-call cost
across arbitrary upstream APIs is a guess, and I'd rather give you a primitive that's exact than a dollar figure that's
wrong. If you genuinely need a $ cap, I want to hear it. That's an open design question for me.&lt;/li&gt;
&lt;li&gt;Dedup is GET-only by default. POST caching is opt-in per pipeline, because deduping a non-idempotent call is how you ship
a bug.&lt;/li&gt;
&lt;li&gt;It's a proxy in your request path. That's a dependency. It's built to fail open on its own errors rather than take your
fleet down, but you should know it's there.&lt;/li&gt;
&lt;/ul&gt;
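
&lt;p&gt;The fail-open point in the last item is easy to get backwards, so here is a tiny sketch of the distinction, assuming the enforcer is a function that either returns a verdict or throws. The names &lt;code&gt;Verdict&lt;/code&gt; and &lt;code&gt;enforce&lt;/code&gt; are hypothetical.&lt;/p&gt;

```typescript
// Hypothetical sketch: a deliberate refusal is honored, an enforcer crash is not.
type Verdict = "allow" | "refuse";

function enforce(decide: () => Verdict): Verdict {
  try {
    // A policy decision, including "refuse", is passed through untouched.
    return decide();
  } catch (err) {
    // The enforcer itself broke: let the call through rather than stop the fleet.
    console.warn("enforcer error, failing open:", err);
    return "allow";
  }
}
```

&lt;p&gt;The trade-off is explicit: while the enforcer is broken, caps are not enforced, which is the price of not taking the fleet down with it.&lt;/p&gt;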

&lt;p&gt;Where it is&lt;/p&gt;

&lt;p&gt;It's live at &lt;a href="https://gatewards.com/" rel="noopener noreferrer"&gt;gatewards.com&lt;/a&gt;, and the SDK is open source (Apache-2.0): &lt;strong&gt;npm i @gatewards/agent-sdk&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're running a fleet and fighting the same thing, I'd genuinely like to compare notes, especially on the cap-primitive
question. Is calls/day + max-per-call enough, or does the lack of a dollar cap break it for you? Tell me where this falls
short.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>devops</category>
      <category>typescript</category>
    </item>
  </channel>
</rss>
