DEV Community

Patrick Hughes

Posted on • Originally published at bmdpat.com

When a $100B company burns its 2026 AI budget by April

Uber burned its entire 2026 internal AI tooling budget on Claude Code by the end of April. Four months. A budget sized for twelve.

This is a $100B company with a real finance org. They still missed by 3x.

That data point matters. Not because Uber is incompetent, but because if their planning model breaks, every smaller team's planning model breaks the same way, only faster.

The actual numbers

The story landed on Hacker News at 347 points in four hours. The shape of it:

  • 2026 budget sized for ~12 months of Claude Code use across engineering.
  • Real burn rate ran ~3x projection.
  • Budget hit zero end of April 2026.
  • Adoption was bottom-up. Engineers self-onboarded. Finance had no per-team caps.
  • Internal review now considering hard per-team monthly caps and per-repo token budgets.
  • Triggered an explicit FinOps workstream around AI agent spend.

If you run any team that lets engineers use coding agents, this is your near future. The only question is whether you instrument before or after the bill arrives.

Why this happens

Three things stacked.

1. Bottom-up adoption. Coding agents do not roll out from a procurement deck. Engineers try them on a Tuesday and ship a feature on Wednesday. By Friday a team is dependent. There is no purchase order, no per-seat license review, no central tracking. Just personal API keys and a credit card on file.

2. Non-linear cost surface. A coding agent does not cost X dollars per task. It costs X dollars per token, and a single agent run can range from 5K tokens to 5M depending on what the engineer asks. You cannot extrapolate a month from a week. You cannot extrapolate next quarter from this quarter. Adoption goes up. Average task complexity goes up. Context window usage goes up. Each one multiplies the others.

3. No per-team caps. Most teams instrument the entire org against a single billing key. One runaway agent on one engineer's machine in one repo can spike the whole org's monthly spend before anyone notices. There is no circuit breaker. The first signal is the invoice.
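The non-linear cost surface in point 2 is easy to see with toy numbers. Everything below is illustrative, not Uber's data: three modest weekly growth factors compound multiplicatively, so a month costs far more than four times the first week.

```python
# Toy model of the multiplicative cost surface (all numbers illustrative).
week1_cost = 10_000  # dollars of agent spend in week 1

# Modest weekly growth in each driver; they multiply, not add.
adoption_growth = 1.3     # more engineers self-onboard each week
complexity_growth = 1.2   # bigger tasks per run
context_growth = 1.15     # larger context windows per run

weekly_multiplier = adoption_growth * complexity_growth * context_growth

cost = week1_cost
total = 0.0
for _ in range(4):  # four weeks
    total += cost
    cost *= weekly_multiplier

naive = week1_cost * 4  # "extrapolate the month from one week"
print(f"naive estimate: ${naive:,.0f}, compounded actual: ${total:,.0f}")
```

With these toy factors the month lands near 3x the naive extrapolation, which is roughly the gap in the story above.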

Uber hit all three. Most teams will too.

The pattern beneath it

Coding agents are the first widely deployed class of software whose cost is not bounded by the user's own effort. A human writing code generates roughly the same amount of code per hour every day. An agent generates as much code as you let it. Tell it to refactor a directory and it might burn 2M tokens. Tell it to refactor a repo and it might burn 50M.

That is a different model from SaaS pricing. It is closer to electricity. You do not pay per click. You pay per kilowatt-hour. And like electricity, the only sane defense is a meter and a breaker.

The meter tells you what you are using right now. The breaker stops you from burning the building down.

What instrumentation actually looks like

Not a dashboard. Dashboards are a lagging indicator. By the time you read the chart, the money is gone.

The minimum viable controls:

  • Per-agent budget cap. Each agent run knows its own token budget. When it hits the cap, it stops. No exceptions. No override flag that becomes the default.
  • Hard kill. Not a soft warning. Not an email to the team channel. A process termination when the cap hits. The agent dies. The engineer sees the cap was hit. They go raise it intentionally if they want to keep going.
  • Audit log. Every run writes a row. Tokens used, model, repo, time, cost. You can answer the question 'who burned $4K last Tuesday' in five seconds, not five days.
  • Per-team rate limits. A team's monthly token allocation is fixed. The team can spend it fast or slow. They cannot spend more.
  • Alerts at thresholds, not at zero. 50%, 75%, 90% of cap. Not just 'budget exhausted.'
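The first four controls fit in one small wrapper. A minimal sketch of the pattern only: the class, method names, cap values, and log path here are all illustrative assumptions, not any vendor's actual API.

```python
import csv
import time

class BudgetExceeded(Exception):
    """Raised when an agent run hits its hard token cap."""

class BudgetedRun:
    """Per-run token budget with threshold alerts, a hard kill, and an audit log.

    Illustrative sketch of the controls listed above; names and numbers are assumptions.
    """

    ALERT_THRESHOLDS = (0.5, 0.75, 0.9)  # alert at 50%, 75%, 90% of cap, not at zero

    def __init__(self, cap_tokens, repo, model, log_path="agent_audit.csv"):
        self.cap = cap_tokens
        self.used = 0
        self.repo = repo
        self.model = model
        self.log_path = log_path
        self._alerted = set()

    def charge(self, tokens):
        """Record token usage; kill the run the moment the cap is hit."""
        self.used += tokens
        for t in self.ALERT_THRESHOLDS:
            if self.used >= t * self.cap and t not in self._alerted:
                self._alerted.add(t)
                print(f"ALERT: {int(t * 100)}% of {self.cap}-token cap used")
        if self.used >= self.cap:
            self._write_audit_row(status="killed")
            raise BudgetExceeded(f"hard cap {self.cap} hit at {self.used} tokens")

    def finish(self):
        self._write_audit_row(status="completed")

    def _write_audit_row(self, status):
        # One row per run: enough to answer "who burned $4K last Tuesday" fast.
        with open(self.log_path, "a", newline="") as f:
            csv.writer(f).writerow(
                [time.time(), self.repo, self.model, self.used, status]
            )
```

The agent loop calls `charge()` after every model response and does not catch `BudgetExceeded`, so hitting the cap ends the run outright. Raising the cap is a deliberate act by the engineer, not a default override flag.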

These are not novel ideas. Every cloud platform has run them on compute for fifteen years. The pattern just has not made it to AI agent runtime yet because the platforms shipping the agents have no incentive to add it. Their revenue is your token usage.

What we built

AgentGuard is an open-source implementation of this pattern. `pip install agentguard`. Wrap your agent. Set a budget. The runtime enforces it.

It is the smallest possible thing that solves the problem. One Python package. No SaaS dependency. No data leaves your machine. Your agent does not run if the budget is exhausted. That is the whole pitch.

We built it because we were burning real money on our own coding agents and the cloud-native solutions all wanted us to pipe agent calls through a proxy. We did not want a proxy. We wanted a guardrail in the same process as the agent.

GitHub: github.com/bmdhodl/agentguard

It is MIT-licensed. Take it. Fork it. Ship it. Do not let your team be the next Uber.

What to do this week

If you have any coding agent in production:

  1. Pull last month's actual spend. Not the projection. The real number.
  2. Multiply by 3, the factor Uber's actual burn ran over projection. That is your ceiling for next year if nothing changes.
  3. If that number scares you, instrument before the bill scares you instead.
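Steps 1 and 2 are one line of arithmetic, but writing it down keeps the assumption explicit. The 3x factor mirrors the overrun described above; the annualized figure is an extra step added here for budget conversations, not something the steps require.

```python
def spend_ceiling(last_month_actual, miss_factor=3):
    """Project agent spend from one month of real data.

    miss_factor=3 mirrors the ~3x projection miss described above;
    it is a planning assumption, not a forecast.
    Returns (monthly ceiling, annualized ceiling).
    """
    monthly = last_month_actual * miss_factor
    return monthly, monthly * 12

# Example: $8,000 of real agent spend last month (illustrative figure)
monthly, annual = spend_ceiling(8_000)
print(monthly, annual)  # 24000 288000
```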

If you do not have a coding agent in production yet but you are evaluating one:

  1. Pick the agent.
  2. Pick the budget cap.
  3. Pick the cap before you pick the rollout plan.

The cheapest moment to instrument is before adoption. The most expensive is after the budget is gone and finance has questions.

Uber's 2026 review is going to land on per-team caps and audit logs. Save yourself the cycle.

Try AgentGuard for the runtime cost-control pattern this post describes.
