DEV Community: Joe Carpenter

How an AI Agent Ran Up a $47,000 Bill in 11 Days (And How to Stop It)

Joe Carpenter — Sat, 25 Apr 2026 07:52:13 +0000

How an AI Agent Ran Up a $47,000 Bill in 11 Days (And How to Stop It)

Published by Innovative Systems Global — April 2026

In November 2025, four AI agents entered an infinite retry loop.

Nobody noticed for 11 days.

When the bill arrived, it was $47,000. All of it from LLM API calls. All of it preventable. The team had logging. They had monitoring. They did not have a hard limit.

This is not a unique incident. It's becoming a rite of passage for engineering teams running agents in production.

Why this keeps happening

Every major LLM provider — OpenAI, Anthropic, Google — charges per token. The more your agent runs, the more you pay. This is the correct model. The problem is that agents don't know how much they're spending, and nothing stops them when they exceed a budget.

Current "solutions":

Spend alerts — fire after the damage is done. An alert at $1,000 doesn't help when an agent burns $4,700 per day.
API rate limits — these throttle requests per minute, not total spend.
Observability platforms (Helicone, LangSmith) — they show you what happened. They don't prevent it.
Cloud billing alerts — by the time AWS or OpenAI sends an alert, the loop has been running for days.

What's missing: a hard gate that runs before the LLM call, checks the budget, and refuses to proceed if the limit is exceeded.

The two-line problem

Here's what most agent code looks like:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)

There is no cost tracking here. No budget check. No receipt. If this code runs 50,000 times in an infinite loop, you find out when the bill arrives.

The fix: meter every call, enforce every limit

We built dingdawg-governance to solve this. Three new MCP tools in v2.1.0:

meter_llm_call — call this after every LLM response. Pass the model, tokens in, tokens out, and your agent ID. Get back the cost, your cumulative spend, and your budget status.

{
  "receipt_id": "mtr_abc123_def456",
  "agent_id": "my-research-agent",
  "provider": "openai",
  "model": "gpt-4o",
  "prompt_tokens": 1200,
  "completion_tokens": 800,
  "cost_usd": 0.018,
  "cumulative_spend_usd": 12.43,
  "budget_status": "ok",
  "budget_limit_usd": 50.00
}

set_llm_budget — set a hard limit for any agent. Daily or monthly. Warning fires at 80% by default.

{
  "agent_id": "my-research-agent",
  "limit_usd": 50.00,
  "period": "daily"
}

get_spend_report — query spend by agent, model, and date range. See exactly which agents cost what.

How the $47K incident gets prevented

With dingdawg-governance wired:

Day 1: Agent starts loop. meter_llm_call tracks each call.
Day 1, ~$40 in: budget_status flips to "warning". Your code can log, alert, or throttle.
Day 1, $50 in: budget_status flips to "exceeded". Your code stops the agent.
Total damage: $50, not $47,000.

The enforcement is in YOUR code — you decide what to do when the budget is exceeded. The meter gives you the signal.

Installation

# As an MCP server (Claude Desktop, Cursor, any MCP-compatible client)
npx dingdawg-governance

# Claude Code
claude mcp add dingdawg-governance npx dingdawg-governance

Free tier: unlimited meter_llm_call and set_llm_budget calls. Local filesystem storage. No API key required.

Paid tier ($19/month): cloud receipt storage, team dashboards, cross-session spend history, PDF export. API key at dingdawg.com/developers.

Price table

Built in. Covers 30+ models across OpenAI, Anthropic, Google, Groq, Mistral, Cohere, and DeepSeek. Updated with each release.

If your model isn't in the table, it returns cost_usd: 0 with a note — it never silently miscalculates.

Works with any agent framework

dingdawg-governance is an MCP server. Any agent that can call MCP tools can use it — LangChain, AutoGen, CrewAI, custom agents, Claude Code, Cursor. No SDK required. No framework lock-in.

The broader problem

The $47K incident is the visible symptom. The real problem is that enterprises are deploying agents with no spend governance at all. Every dollar an agent spends is invisible until it's gone.

As agents become more autonomous — running overnight, chaining into other agents, operating without human supervision — the spend problem compounds. A single misconfigured retry policy can turn a $50 research job into a $50,000 infrastructure incident.

Budget enforcement isn't a nice-to-have. It's the seatbelt.

Get started

npx dingdawg-governance

Source: github.com/dingdawg/governance-sdk
Pricing: dingdawg.com/developers

Innovative Systems Global builds AI governance infrastructure for teams running agents in production. Based in the Rio Grande Valley, Texas.

I built a governance layer for AI agents after watching them fail silently in production

Joe Carpenter — Tue, 07 Apr 2026 15:03:33 +0000

Picture this: a healthcare AI agent is triaging patient intake. It's running on a solid model, well-prompted, tested in staging. In production, a patient describes symptoms that match two possible care pathways — one urgent, one routine. The agent picks routine. No error is thrown. No log entry flags it. No human is notified. The patient waits three days for a callback that should have been a same-day referral.

Nobody finds out until a follow-up call two weeks later.

I'm not describing a real incident. But I've talked to enough people shipping agents into healthcare, fintech, and legal workflows to know this scenario isn't hypothetical — it's a near-miss waiting in every ungoverned production agent.

The actual problem

When we started shipping AI agents into regulated environments, the agents themselves weren't the problem. The problem was what surrounded them. Or didn't.

No audit trail. When something went wrong, we had inference logs at best — token inputs and outputs, no semantic record of why a decision was made or what policy it touched.

No rollback. If an agent executed a bad action — sent a message, wrote a record, triggered a workflow — we had no native mechanism to undo it or even flag it for review.

No explainability. When a compliance officer asked "why did your agent do that?", the honest answer was "we don't know, here's the prompt."

No governance gate. Actions executed on intent match. There was no intercept layer that could say: this action requires human review before proceeding.

In consumer apps, that's a bad UX. In regulated industries, that's liability.

What we built

DingDawg is a governance layer that wraps any AI agent and intercepts every action before it executes. It's MCP-native, which means it slots directly into Claude Code, Codex, and Cursor without custom middleware. It also works with any Python agent via a two-line install.

pip install dingdawg-loop

from dingdawg import schedule_governed

schedule_governed(agent_id="@hipaa-intake", cron="0 9 * * *")

That's it. Every action the agent takes is now routed through a governance gate before execution.

What the governance receipt looks like

Every governed action produces a receipt:

{
  "action_id": "act_9f3a21bc",
  "agent_id": "@hipaa-intake",
  "timestamp": "2026-04-06T09:00:14Z",
  "action": "route_patient",
  "policy_result": "BLOCKED",
  "lnn_trace": {
    "features": [
      { "name": "symptom_urgency_score", "weight": 0.84, "direction": "ESCALATE" },
      { "name": "prior_visit_flag", "weight": 0.61, "direction": "ESCALATE" },
      { "name": "routing_decision", "weight": -0.91, "direction": "CONFLICT" }
    ],
    "explanation": "Agent routing conflicts with urgency signal at 0.84 confidence. Human review required before execution."
  },
  "ipfs_cid": "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi",
  "policy_version": "hipaa-v2.1"
}

The LNN causal trace is not a black-box score. It's a weighted feature explanation — you can see exactly which signals triggered the block and why. The ipfs_cid is a content-addressed, immutable proof stored on IPFS. Your regulator can verify it. You cannot alter it after the fact.

The open-core model

The SDK, governance primitives, LNN trace engine, and MCP integration are Apache 2.0. Free. Open on GitHub at github.com/dingdawg/governance-sdk.

The cloud tier adds multi-agent orchestration, managed IPFS pinning, enterprise policy management, and a creator marketplace where governance plugins can be published and monetized. We think the core infrastructure should be auditable. You shouldn't have to take our word for it on something this critical.

The regulatory window is closing

EU AI Act enforcement starts August 2026. It requires audit trails, explainability, and human oversight mechanisms for high-risk AI systems — healthcare, hiring, credit, law enforcement, critical infrastructure.

Colorado SB 205 hits June 30 2026. Narrower but sharper — specifically targeting consequential automated decisions with a right-to-explanation requirement.

If you're shipping agents in any of these domains and you don't have governance infrastructure in place, you're building technical debt that will be expensive to retrofit under deadline pressure.

Try it

Free harness score — 2 minutes, shows exactly where your agent governance gaps are: dingdawg.com/harness

Free compliance scan:

pip install dingdawg-compliance

GitHub: github.com/dingdawg/governance-sdk
npm: npmjs.com/package/dingdawg-governance

If you're shipping agents in regulated environments, I'd genuinely like to hear what you're running into. The governance problem is underspecified and we're building in public.