Token-based billing exposed AI's ROI problem: what the real numbers say

#ai #devops #architecture #webdev

In Q1 2026, OpenAI and Anthropic moved enterprise customers from flat-rate plans to token-based billing. The change looks administrative, but it had a direct consequence for engineering teams: the real cost of AI became visible for the first time. The market's reaction over the following two months was enough to reopen a question many considered settled: does AI actually deliver measurable ROI?

What happened when the bill arrived

The most documented case is Uber. The company had encouraged all employees to use agentic tools as much as possible and even ranked AI usage internally on leaderboards. The result: the entire annual budget was consumed in four months. The response was a $1,500/month cap per employee per agentic coding tool (Claude Code, Cursor, and similar). At Brex, engineers were limited to $500/week in tokens; employees outside engineering received a $5/week cap. T-Mobile temporarily capped usage at $2,000/month per user with plans to migrate to a tiered system. One unnamed company, according to Ed Zitron in "AI Is Slowing Down" (June 2026), spent $500 million on Anthropic models in a single month because it had no spend controls in place.

These are not isolated cases. A KPMG survey reported by the Wall Street Journal in June 2026 found that only 26% of companies have a comprehensive view of their AI costs; 50% have partial visibility; and 22% only find out what they owe after the bill arrives. Steve Chase, KPMG's global head of AI, told the Journal: "It's a new resource that needs to be managed that didn't exist quite that way, and we're seeing exponential growth."

The structural problem behind the spending caps

The spending caps are a symptom. The root cause, as Zitron details in the same article, is that the economics of generative AI require numbers that currently seem out of reach.

Anthropic has made over $330 billion in compute commitments with Google, Amazon, and Microsoft, plus another $45 billion with CoreWeave and SpaceX. To cover those commitments, it needs $174 billion in annual revenue by 2029. OpenAI is projected to burn at least $852 billion through the end of 2030 and has over $770 billion in compute commitments. The combined projected 2026 revenue for both companies sits around $60 billion, meaning they would need 496% growth by 2029.
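To make the 496% figure concrete, here is a quick back-of-the-envelope check. It rests on two assumptions of ours, not Zitron's: that "496% growth" means the 2029 figure equals the ~$60 billion 2026 base plus 496% of it, and that the growth compounds evenly over the three years from 2026 to 2029.

```python
# Back-of-the-envelope check of the growth figure cited above.
# Assumptions (ours, not the article's): "496% growth" means the 2029 figure
# is the 2026 base plus 496% of it, and growth compounds evenly over 3 years.

def implied_target(base: float, growth_pct: float) -> float:
    """Value after growing `base` by `growth_pct` percent."""
    return base * (1 + growth_pct / 100)

def annual_rate(base: float, target: float, years: int) -> float:
    """Compound annual growth rate needed to go from `base` to `target`."""
    return (target / base) ** (1 / years) - 1

base_2026 = 60.0  # combined projected 2026 revenue, in $ billions
target_2029 = implied_target(base_2026, 496)
cagr = annual_rate(base_2026, target_2029, years=3)

print(f"Implied 2029 revenue: ${target_2029:.1f}B")
print(f"Required growth per year: {cagr:.0%}")
```

Under those assumptions, the two companies would have to go from about $60 billion to roughly $358 billion in combined annual revenue, which works out to about 81% growth every single year.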

According to The Information, cited in the article, OpenAI and Anthropic account for 89% of all AI startup revenues. Outside of NVIDIA, hyperscalers, OpenAI, and Anthropic, Zitron states he cannot identify any company spending more than a few hundred million on compute. The entire sector would need to generate over $2 trillion in annual revenue by 2030 to justify the infrastructure being built.

Microsoft AI CEO Mustafa Suleyman publicly stated that Anthropic's models are too expensive and that he intends to reduce Microsoft's use of them to zero. That matters because Microsoft represents a significant share of Anthropic's customer base.

What token-based billing changed for engineering teams

Before the migration to token-based billing, model errors (loops, incorrect responses, reprocessing) effectively cost the end user nothing, since the cost was covered by the flat-rate plan. Starting in Q1 2026, every token consumed appears on the invoice.

Zitron addresses this directly:

"Think of it like this: if you're using an AI subscription with rate limits but no actual costs, any mistakes a model makes — such as getting stuck in a loop or just doing the wrong thing — can be dismissed as the troubled nature of early-stage technology, because the 'cost' was $20, $100, or $200 for the entire month. Anthropic, OpenAI and every other AI company deliberately obfuscated these costs because they knew that the second a user actually had to pay for the fuckups of an AI model they'd scream like they were being stung to death by bees."

Ed Zitron, "AI Is Slowing Down"

In other words, flat-rate billing acted as a buffer that made model inefficiency invisible. With token-based billing, every unnecessary agent iteration appears on the invoice.
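A rough model shows why loops are so expensive under token billing. In many agent setups, each iteration resends the accumulated conversation as input, so every extra step also pays again for everything before it. The sketch below assumes exactly that pattern and ignores prompt caching or other discounts; the per-token prices and token counts are hypothetical placeholders, not any vendor's actual rates.

```python
# Hypothetical prices in dollars per million tokens; substitute your contract's rates.
PRICE_IN_PER_M = 3.0
PRICE_OUT_PER_M = 15.0

def agent_run_cost(iterations: int, base_context: int, output_per_step: int) -> float:
    """Estimate the cost of an agent run in which every step resends the full
    history (initial context plus all previous outputs) as input tokens.
    Assumes no prompt caching and a fixed output size per step."""
    total = 0.0
    history = base_context
    for _ in range(iterations):
        total += history * PRICE_IN_PER_M / 1_000_000          # input: whole history
        total += output_per_step * PRICE_OUT_PER_M / 1_000_000  # output: this step
        history += output_per_step  # this step's output joins the context
    return total

useful = agent_run_cost(iterations=5, base_context=20_000, output_per_step=2_000)
looping = agent_run_cost(iterations=25, base_context=20_000, output_per_step=2_000)
print(f"5 steps: ${useful:.2f}  25 steps: ${looping:.2f}")
```

With these placeholder numbers, five steps cost about $0.51 and twenty-five cost about $4.05: five times the iterations, roughly eight times the bill. Under a flat-rate plan that difference was invisible; under token billing it is a line item.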

This places new requirements on engineering teams adopting agentic tools:

Cost observability: real-time dashboards showing token consumption per team, per tool, per task type
Preventive limits: spending caps configured before scaling adoption, not after receiving the first surprise invoice
Output metrics: connecting token spend to measurable results (pull requests merged, tickets closed, features shipped to production)
Agentic workflow review: identifying unnecessary loops or reprocessing steps that consume tokens without producing different output
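These four requirements can be prototyped in a small in-process ledger before reaching for a dedicated tool. This is a minimal sketch under our own assumptions: `UsageEvent`, `CostLedger` and `BudgetExceeded` are hypothetical names, each call's dollar cost is assumed to be known when it is recorded (for example, computed from the token counts the provider reports), the tool labels are arbitrary strings, and comparing output hashes is only a crude stand-in for "produced no different output".

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class UsageEvent:
    team: str
    tool: str
    task_type: str
    cost_usd: float
    output_hash: str  # hash of the model output, used to spot repeated steps

class BudgetExceeded(Exception):
    pass

@dataclass
class CostLedger:
    # Preventive limits: caps are configured per team before any usage is recorded.
    team_caps_usd: dict[str, float]
    events: list[UsageEvent] = field(default_factory=list)
    spent: dict[str, float] = field(default_factory=lambda: defaultdict(float))

    def record(self, event: UsageEvent) -> None:
        """Reject the event if it would push its team over the configured cap."""
        cap = self.team_caps_usd.get(event.team, 0.0)
        if self.spent[event.team] + event.cost_usd > cap:
            raise BudgetExceeded(f"{event.team} would exceed its ${cap:,.2f} cap")
        self.spent[event.team] += event.cost_usd
        self.events.append(event)

    def spend_by(self, dimension: str) -> dict[str, float]:
        """Cost observability: total spend grouped by 'team', 'tool' or 'task_type'."""
        totals: dict[str, float] = defaultdict(float)
        for e in self.events:
            totals[getattr(e, dimension)] += e.cost_usd
        return dict(totals)

    def cost_per_output(self, team: str, outputs: int) -> float:
        """Output metric: team spend divided by a count such as merged pull requests."""
        return self.spent[team] / outputs if outputs else float("inf")

    def repeated_outputs(self) -> int:
        """Workflow review: count events whose output duplicated an earlier one."""
        seen: set[str] = set()
        repeats = 0
        for e in self.events:
            if e.output_hash in seen:
                repeats += 1
            seen.add(e.output_hash)
        return repeats

# Usage with made-up numbers.
ledger = CostLedger(team_caps_usd={"platform": 1_500.0})
ledger.record(UsageEvent("platform", "claude-code", "refactor", 400.0, "h1"))
ledger.record(UsageEvent("platform", "cursor", "bugfix", 300.0, "h1"))  # same output again
try:
    ledger.record(UsageEvent("platform", "cursor", "feature", 900.0, "h2"))
except BudgetExceeded as err:
    print(err)  # platform would exceed its $1,500.00 cap
print(ledger.spend_by("tool"))                        # {'claude-code': 400.0, 'cursor': 300.0}
print(ledger.cost_per_output("platform", outputs=7))  # 100.0
print(ledger.repeated_outputs())                      # 1
```

The cap check runs before the spend is booked, which is the point of a preventive limit: the third call is refused instead of showing up on next month's invoice.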

What to evaluate before scaling adoption

The shift to token-based billing does not make AI tools less useful. What it does is make the cost honest. For teams that do not yet have this instrumentation, the risk is the same as Uber's: rapid adoption growth, budget exhausted in months, and a forced decision to cut access rather than optimize usage.
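To see how quickly compounding adoption can exhaust a budget, here is a minimal projection, assuming monthly spend grows by a fixed percentage each month. The budget, starting spend and growth rate are hypothetical illustrations, not Uber's figures.

```python
from typing import Optional

def months_until_exhausted(annual_budget: float, first_month_spend: float,
                           monthly_growth: float, horizon: int = 12) -> Optional[int]:
    """Return the month (1-based) in which cumulative spend first exceeds the
    budget, or None if the budget lasts the whole horizon.
    Assumes spend grows by `monthly_growth` (0.5 means +50%) every month."""
    spent = 0.0
    spend = first_month_spend
    for month in range(1, horizon + 1):
        spent += spend
        if spent > annual_budget:
            return month
        spend *= 1 + monthly_growth
    return None

# Hypothetical: a budget sized for flat usage (12 x $100k) versus 50% monthly growth.
print(months_until_exhausted(1_200_000, 100_000, 0.0))
print(months_until_exhausted(1_200_000, 100_000, 0.5))
```

With flat usage the budget lasts exactly the year (the first call returns `None`); with 50% month-over-month growth the same budget runs out in month 5.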

We should evaluate, workflow by workflow, which use cases have verifiable ROI and which are still experimental. The difference between the two is not a judgment call but an operational one: experimentation needs a separate budget and explicit exit criteria.
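One way to make that distinction operational is to give each experimental workflow its own budget and a written exit rule, then evaluate it mechanically. The sketch below is hypothetical: the `Experiment` fields, the cost-per-output threshold and the three outcomes are illustrative choices, not something the article prescribes.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Experiment:
    name: str
    budget_usd: float           # separate budget, not drawn from production spend
    end_date: date              # explicit deadline for a decision
    max_cost_per_output: float  # exit criterion, e.g. dollars per merged PR

def decide(exp: Experiment, spent_usd: float, outputs: int, today: date) -> str:
    """Return 'promote', 'stop' or 'continue' for an experimental workflow."""
    cost_per_output = spent_usd / outputs if outputs else float("inf")
    if spent_usd >= exp.budget_usd or today >= exp.end_date:
        # Budget or time is up: promote only if the exit criterion was met.
        return "promote" if cost_per_output <= exp.max_cost_per_output else "stop"
    return "continue"

trial = Experiment("agentic test generation", budget_usd=5_000,
                   end_date=date(2026, 9, 30), max_cost_per_output=50.0)
print(decide(trial, spent_usd=2_000, outputs=30, today=date(2026, 8, 1)))   # continue
print(decide(trial, spent_usd=5_000, outputs=120, today=date(2026, 9, 1)))  # promote
print(decide(trial, spent_usd=5_000, outputs=40, today=date(2026, 9, 1)))   # stop
```

"Promote" here means the workflow has shown verifiable ROI and can move into the regular budget; "stop" means the experiment ends when its money or time does, instead of quietly becoming a permanent cost.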

How is your team handling cost visibility for agentic tools?

Source: AI Is Slowing Down — Ed Zitron