<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ADARSH PRASHAR</title>
    <description>The latest articles on DEV Community by ADARSH PRASHAR (@prashar32).</description>
    <link>https://dev.to/prashar32</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3971653%2Fe56245e3-6baa-4e60-8815-94a62ec4e9b8.jpg</url>
      <title>DEV Community: ADARSH PRASHAR</title>
      <link>https://dev.to/prashar32</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/prashar32"/>
    <language>en</language>
    <item>
      <title>The $47K agent loop: why logging, monitoring, and max_tokens all failed to stop it</title>
      <dc:creator>ADARSH PRASHAR</dc:creator>
      <pubDate>Sun, 07 Jun 2026 15:28:36 +0000</pubDate>
      <link>https://dev.to/prashar32/the-47k-agent-loop-why-logging-monitoring-and-maxtokens-all-failed-to-stop-it-19ch</link>
      <guid>https://dev.to/prashar32/the-47k-agent-loop-why-logging-monitoring-and-maxtokens-all-failed-to-stop-it-19ch</guid>
      <description>&lt;p&gt;In November 2025, four AI agents ran for eleven days and produced a $47,000 bill.&lt;/p&gt;

&lt;p&gt;You've probably seen the story. A market-research pipeline: four LangChain agents coordinating over A2A. Two of them — an Analyzer and a Verifier — started ping-ponging. The Analyzer produced analysis, the Verifier asked for more, the Analyzer produced more. No termination condition, no budget cap. Eleven days later the invoice showed up.&lt;/p&gt;

&lt;p&gt;The number is what makes it travel. But the number is not the interesting part. This is the interesting part, from the post-mortem:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;They had logging. They had monitoring. They did not have a hard limit.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Sit with that. The team was not flying blind. The data was all there — every call, every token, every dollar, streaming into a dashboard the whole time. None of it mattered, because &lt;strong&gt;observability is a witness, not a circuit breaker.&lt;/strong&gt; It can tell you the building is on fire. It cannot close the gas valve.&lt;/p&gt;

&lt;p&gt;I want to walk through &lt;em&gt;why&lt;/em&gt; the usual defenses don't catch this, and what the thing that actually catches it has to look like. Not the product pitch — the mechanism. If you run agents in production, this is the failure mode that should keep you up at night, and it's more structural than it looks.&lt;/p&gt;

&lt;h2&gt;The failure mode: a thousand perfectly valid calls&lt;/h2&gt;

&lt;p&gt;A runaway agent is not one big anomalous request. It's a thousand small, individually reasonable ones.&lt;/p&gt;

&lt;p&gt;Every single call the Analyzer and Verifier made was well-formed. Each was under its &lt;code&gt;max_tokens&lt;/code&gt;. Each returned a 200. Each looked, in isolation, exactly like a healthy agent doing its job. The pathology only exists at the level of the &lt;em&gt;run&lt;/em&gt; — the loop that never closes — and almost nothing in a normal stack is watching at that level.&lt;/p&gt;

&lt;p&gt;That's why the obvious guards slide right off it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;max_tokens&lt;/code&gt; is per-call, not per-run.&lt;/strong&gt; It bounds the size of one response. It has nothing to say about ten thousand responses. A loop is the size of a tweet, ten thousand times.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost dashboards are post-hoc.&lt;/strong&gt; They render spend after the calls have already happened and the money is already gone. By design they trail reality. The fire has to start before the smoke detector has anything to detect.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alerts need a human in the loop, awake, watching.&lt;/strong&gt; "Spend exceeded $X" fires into a Slack channel at 3am on a Saturday. Nobody saw it for eleven days because seeing it was a person's job, and people sleep, and go on vacation, and assume the thing that's been fine is still fine.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these is useful. None of them is a stop. They are all instruments on the dashboard; not one of them is the brake pedal.&lt;/p&gt;

&lt;h2&gt;What a stop actually requires&lt;/h2&gt;

&lt;p&gt;If you want to &lt;em&gt;stop&lt;/em&gt; a runaway run — not narrate it — the enforcement has to satisfy three properties. Miss any one and you're back to writing post-mortems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. It has to be deterministic.&lt;/strong&gt; No model in the decision path. The whole problem is an unbounded non-deterministic system; you do not get to bound it with another non-deterministic system and call it safe. "We added an LLM that decides when to stop the LLM" is not a control, it's a second thing that can fail. The limit is &lt;code&gt;total_cost &amp;gt; ceiling&lt;/code&gt; evaluated in compiled code, or it is not a limit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. It has to be pre-call.&lt;/strong&gt; The check runs &lt;em&gt;before&lt;/em&gt; the next request leaves your process, and refuses it. Anything that runs after the call has, by definition, already let the call happen and the dollars leave. Post-hoc enforcement is a contradiction — the enforcement and the spend race, and the spend wins.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. It has to be per-run, not per-call.&lt;/strong&gt; The unit that goes wrong is the run: total dollars, total loop iterations, total wall-clock time, accumulated across every call the run makes, plus a kill switch you can pull from outside. That's the altitude the pathology lives at, so that's the altitude the budget has to live at.&lt;/p&gt;

&lt;p&gt;Deterministic, pre-call, per-run. That's the shape of a brake pedal. I'll say the obvious thing: this is not novel computer science. It's a hard-coded resource governor, the kind of thing operating systems and databases and trading systems have had for decades. The novelty is purely that the agent world skipped it.&lt;/p&gt;
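&lt;p&gt;To make the three properties concrete, here is a minimal Python sketch of such a governor. It is an illustration under assumptions, not RiskKernel's code: &lt;code&gt;RunBudget&lt;/code&gt;, &lt;code&gt;BudgetExceeded&lt;/code&gt;, and &lt;code&gt;governed_call&lt;/code&gt; are hypothetical names, and &lt;code&gt;send&lt;/code&gt; and &lt;code&gt;cost_of&lt;/code&gt; stand in for your provider call and your pricing logic.&lt;/p&gt;

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised before a call is sent when the run is already over a ceiling."""


@dataclass
class RunBudget:
    # All three ceilings are per-run totals, not per-call limits.
    max_dollars: float
    max_calls: int
    max_seconds: float
    spent_dollars: float = 0.0
    calls_made: int = 0
    killed: bool = False  # kill switch, flipped from outside the agent
    started_at: float = field(default_factory=time.monotonic)

    def check(self):
        """Pre-call gate: plain comparisons, no model in the decision path."""
        if self.killed:
            raise BudgetExceeded("kill switch pulled")
        if self.spent_dollars > self.max_dollars:
            raise BudgetExceeded("dollar ceiling crossed")
        if self.calls_made >= self.max_calls:
            raise BudgetExceeded("call ceiling reached")
        if time.monotonic() - self.started_at > self.max_seconds:
            raise BudgetExceeded("wall-clock ceiling crossed")

    def record(self, cost_dollars):
        """Post-call bookkeeping: accumulate what the call actually cost."""
        self.spent_dollars += cost_dollars
        self.calls_made += 1


def governed_call(budget, send, cost_of):
    """Refuse before sending if the run is over budget; else send and record."""
    budget.check()                    # runs before the request leaves the process
    response = send()                 # hypothetical provider call
    budget.record(cost_of(response))  # hypothetical pricing function
    return response
```

&lt;p&gt;With a $1.00 ceiling and calls that cost $0.40 each, three calls go out and the fourth is refused before it is sent: the third call tips the total to $1.20, and the check at the top of the fourth sees it.&lt;/p&gt;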

&lt;h2&gt;I've built this before, in a less forgiving domain&lt;/h2&gt;

&lt;p&gt;I've spent years building deterministic risk engines that wrap non-deterministic systems — the kind where a mistake costs real money, in real time, with no undo. And the lesson was always the same: the thing that kept those systems safe was never the smart part. It was a dumb, hard-coded, deterministic layer wrapped around the smart part, with the authority to say &lt;em&gt;no&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The smart part proposes. The deterministic layer disposes. Every irreversible action is gated. You do not let the clever, probabilistic component hold the kill switch, because the whole reason you need a kill switch is that the clever component is the thing that goes wrong.&lt;/p&gt;

&lt;p&gt;Agents are exactly this pattern wearing new clothes. The LLM is the strategy. The risk engine belongs in compiled, statically-typed code that the LLM cannot talk its way past.&lt;/p&gt;

&lt;h2&gt;What it looks like in practice&lt;/h2&gt;

&lt;p&gt;That conviction is why I've been building &lt;a href="https://github.com/prashar32/riskkernel" rel="noopener noreferrer"&gt;RiskKernel&lt;/a&gt; — an open-source, self-hosted runtime that puts that deterministic layer in front of an agent you already have. I'll keep this concrete rather than salesy, because the &lt;em&gt;shape&lt;/em&gt; is the point and you could build a version of it yourself.&lt;/p&gt;

&lt;p&gt;You point an existing agent at it with one environment variable:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;OPENAI_BASE_URL&lt;/span&gt;=&lt;span class="n"&gt;http&lt;/span&gt;://&lt;span class="n"&gt;localhost&lt;/span&gt;:&lt;span class="m"&gt;7070&lt;/span&gt;/&lt;span class="n"&gt;v1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now every call routes through a governor. You set a per-run budget — dollars, loop count, wall-clock seconds — and the moment the run crosses a ceiling, the &lt;em&gt;next&lt;/em&gt; call is refused with an HTTP 402 instead of being forwarded. Enforced in Go, never by a model. Bring your own provider key; nothing leaves your machine except the call you were already making.&lt;/p&gt;
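&lt;p&gt;On the agent side, the one thing worth getting right is that a 402 is terminal, so retry logic must not treat it like a transient error. The sketch below is one way to do that with the standard library only. It assumes the refusal surfaces as an &lt;code&gt;urllib.error.HTTPError&lt;/code&gt;; &lt;code&gt;run_agent_loop&lt;/code&gt; and &lt;code&gt;step&lt;/code&gt; are hypothetical names, and nothing here depends on the body of the refusal response, which this post does not specify.&lt;/p&gt;

```python
import urllib.error

BUDGET_REFUSED = 402              # status the governor returns on refusal
TRANSIENT = (502, 503, 504)       # assumption: upstream hiccups worth retrying


def run_agent_loop(step, max_transient_retries=3):
    """Call step() until it returns None or the governor refuses the run.

    `step` is a hypothetical callable that makes one LLM call through the
    proxy and returns its result, or None when the agent has finished.
    """
    results = []
    retries = 0
    while True:
        try:
            result = step()
        except urllib.error.HTTPError as err:
            if err.code == BUDGET_REFUSED:
                # Terminal: the run is out of budget. Stop, do not retry.
                return results, "budget_exceeded"
            if err.code in TRANSIENT and max_transient_retries > retries:
                retries += 1
                continue
            raise
        if result is None:
            return results, "finished"
        results.append(result)
```

&lt;p&gt;The loop hands back whatever it collected before the refusal, so the caller can decide whether to raise the budget and continue or to stop there.&lt;/p&gt;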

&lt;p&gt;One detail I think is worth stealing regardless of what you use: &lt;strong&gt;a call that already reached the provider is never silently discarded.&lt;/strong&gt; You paid for it, so it's returned to you — and it's the &lt;em&gt;following&lt;/em&gt; call that gets refused. The ledger stays honest; the budget never double-counts the request that tipped it over. The brake engages on the next rotation of the loop, not by throwing away work you already paid for.&lt;/p&gt;

&lt;p&gt;And because the failure mode I actually lose sleep over is a long, legitimate run dying halfway through, RiskKernel checkpoints: you can &lt;code&gt;kill -9&lt;/code&gt; a run and resume it without re-spending or restarting from zero. That's the part that makes a long agent run &lt;em&gt;safe to leave running&lt;/em&gt;, which is the whole game.&lt;/p&gt;
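&lt;p&gt;If you roll your own, the idea behind resumable runs can be sketched in a few lines. This is a simplified illustration and not how RiskKernel stores its state: &lt;code&gt;run_with_checkpoints&lt;/code&gt; is a hypothetical helper, the checkpoint is a plain JSON file, and each entry in &lt;code&gt;steps&lt;/code&gt; stands in for one paid LLM call whose result is JSON-serializable.&lt;/p&gt;

```python
import json
import os


def run_with_checkpoints(steps, path):
    """Run `steps` in order, persisting each result so a restart skips paid work.

    `steps` is a list of zero-argument callables standing in for LLM calls.
    `path` is where the list of completed results is stored as JSON.
    """
    done = []
    if os.path.exists(path):
        # Resume: load results from steps that already completed.
        with open(path, "r", encoding="utf-8") as f:
            done = json.load(f)

    for index in range(len(done), len(steps)):
        done.append(steps[index]())
        # Write to a temp file and rename it into place, so a kill in the
        # middle of a write does not corrupt the previous checkpoint.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(done, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    return done
```

&lt;p&gt;One gap remains in a sketch this small: if the process dies after a step returns but before its checkpoint lands, that one step runs again on resume. Closing that gap means recording the call in the ledger before the result is handed back to the agent.&lt;/p&gt;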

&lt;h2&gt;The honest part&lt;/h2&gt;

&lt;p&gt;The honest edges, today: it runs as a single instance on SQLite with one API token, and it does not support streaming yet. The proxy returns a clean 501 for &lt;code&gt;stream: true&lt;/code&gt;, because mid-stream enforcement is genuinely hard and I'd rather ship it right than ship it loud. Native providers are Anthropic and OpenAI, with the long tail via an upstream gateway. It's Apache-2.0, and it phones home to no one: the only outbound traffic is to your provider and the backends you point it at. I'd rather you know the edges than discover them.&lt;/p&gt;

&lt;p&gt;It is also not trying to be your observability stack or your policy firewall. It emits OpenTelemetry to whatever you already run; it interoperates with the gateways and the dashboards. It competes on exactly one thing: deterministically stopping a run before it hurts you.&lt;/p&gt;

&lt;h2&gt;The takeaway&lt;/h2&gt;

&lt;p&gt;If you run agents, the question to ask isn't &lt;em&gt;"will I find out when one goes runaway?"&lt;/em&gt; You will — eventually, in the logs, in the bill, in the post-mortem. The question is &lt;em&gt;"what stops it before I find out?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Logging is not that thing. Monitoring is not that thing. &lt;code&gt;max_tokens&lt;/code&gt; is not that thing. A deterministic, pre-call, per-run limit with a kill switch is — and it's a few hours of work to put one in front of an agent you already have, whether you use what I built or roll your own.&lt;/p&gt;

&lt;p&gt;Eleven days. Forty-seven thousand dollars. A dashboard that saw all of it. Don't be the dashboard.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/prashar32/riskkernel" rel="noopener noreferrer"&gt;RiskKernel&lt;/a&gt; is open-source (Apache-2.0) and self-hosted — &lt;code&gt;pip install riskkernel&lt;/code&gt; or &lt;code&gt;docker run&lt;/code&gt;. If you put it in front of an agent and the guardrails are too strict or too loose, I'd genuinely like to hear where — that feedback is what the next release is made of.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>devops</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
