Everyone's cheering the latest token price drops from OpenAI and Anthropic. Great. But my cloud bill doesn't seem to care. It's still climbing.
What gives?
It's the "agentic" workflow trap. We've moved past simple text-in, text-out chatbots. Now we're building agents that think, loop, and run multiple steps to complete a task.
A simple chatbot call might use 2k tokens. An agent figuring out a multi-step problem? I've seen them burn through 50k-100k tokens for a single task. The reasoning loops, error correction, and tool usage stack up fast.
Gartner just put out a warning about this: agents can use 5x to 30x more tokens than a standard chatbot call. Run the numbers. An 80% per-token price cut only breaks even if usage grows 5x; at 30x, your bill is 6x what it was. The math isn't in our favor.
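A quick back-of-the-envelope sketch makes the break-even point concrete. The prices here are hypothetical placeholders, not any provider's actual rates:

```python
# Does an 80% price cut offset 5x-30x token usage? (prices are made up)
OLD_PRICE_PER_1K = 1.00                     # hypothetical old price per 1k tokens
NEW_PRICE_PER_1K = OLD_PRICE_PER_1K * 0.20  # 80% cheaper

def task_cost(tokens, price_per_1k):
    return tokens / 1000 * price_per_1k

chatbot_tokens = 2_000                      # one simple chatbot call
agent_low = chatbot_tokens * 5              # low end of agent usage
agent_high = chatbot_tokens * 30            # high end of agent usage

baseline = task_cost(chatbot_tokens, OLD_PRICE_PER_1K)   # 2.0
best_case = task_cost(agent_low, NEW_PRICE_PER_1K)        # 2.0 -> break-even
worst_case = task_cost(agent_high, NEW_PRICE_PER_1K)      # 12.0 -> 6x the bill

print(baseline, best_case, worst_case)
```

At the low end of Gartner's range you pay exactly what you paid before the price drop; at the high end you pay six times more.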
The second part of the problem is per-customer attribution. If you have a multi-tenant SaaS, how do you know which customer's agent just went rogue and spent $50? Most basic monitoring just shows a single, terrifying number going up. You can't bill it back, you can't warn the user, you can't do anything but pay it.
This is the stuff that kills margins in AI products.
fwiw, I've been dealing with this with better monitoring: I built LLMeter for per-user cost attribution. It's open source (AGPL), hooks into OpenAI, Anthropic, etc., and shows me exactly which user ID is responsible for which costs.
At least now when the bill spikes, I know who to blame.