Everyone's shipping agents right now. ReAct, tool-calling loops, whatever.
Looks great in demos. But nobody mentions the billing dashboard the morning after.
Agent loops are entirely unpredictable. A simple task might take 2 LLM calls. Or the model gets confused, tries a failing tool 20 times, and burns 40 calls before timing out.
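The fix for the runaway case is structural, not hopeful. A minimal sketch of a capped tool-calling loop, assuming placeholder `call_llm` and `run_tool` functions (not any real SDK):

```python
# Hypothetical sketch: a tool-calling agent loop with a hard iteration cap.
# `call_llm` and `run_tool` are stand-ins, not a real provider SDK.

class AgentBudgetExceeded(Exception):
    """Raised when the agent hits its iteration cap instead of finishing."""
    pass

def run_agent(task, call_llm, run_tool, max_iterations=5):
    """Run a tool-calling loop, but bail out after max_iterations."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_iterations):
        reply = call_llm(messages)
        if reply.get("tool_call") is None:
            return reply["content"]  # model answered without a tool: done
        result = run_tool(reply["tool_call"])
        messages.append({"role": "tool", "content": result})
    # Fail loudly instead of silently burning 40 more calls.
    raise AgentBudgetExceeded(f"agent hit {max_iterations} iterations")
```

The key design choice: the cap raises an error your app can catch and surface, rather than returning a half-finished answer that hides the fact the agent was spinning.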
Local test: $0.05.
Prod: user triggers a loop, agent hallucinates, costs $4 for a single request. Multiply by 100 users.
Devs treat LLM calls like standard API calls. They aren't. They're variable-cost compute disguised as a REST endpoint.
If you run agents in prod, you need defensive monitoring:
- Hard iteration caps. Never let an agent run "until complete". max_iterations=5. Return an error instead of a massive bill.
- Per-tenant attribution. Global tracking is useless. When your Anthropic usage spikes 300%, you need to know exactly which userId caused it so you can rate-limit them.
- Budget alerts. Set up webhooks that fire the second a user crosses their daily quota.
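The attribution and alerting pieces can be sketched in a few lines. This is a toy in-memory version under assumed names (`CostTracker`, `on_alert`); a real setup would persist to a database and fire an actual webhook:

```python
# Minimal sketch of per-tenant cost attribution with a daily quota alert.
# In-memory only; class and parameter names are illustrative assumptions.
from collections import defaultdict
from datetime import date

class CostTracker:
    def __init__(self, daily_quota_usd=1.00, on_alert=print):
        self.daily_quota = daily_quota_usd
        self.on_alert = on_alert  # swap in a webhook call in production
        # (user_id, day, model) -> accumulated USD
        self.costs = defaultdict(float)

    def record(self, user_id, model, cost_usd):
        """Attribute one LLM call's cost; alert if the user is over quota."""
        today = date.today().isoformat()
        self.costs[(user_id, today, model)] += cost_usd
        daily_total = sum(
            v for (uid, day, _), v in self.costs.items()
            if uid == user_id and day == today
        )
        if daily_total > self.daily_quota:
            self.on_alert(f"user {user_id} over daily quota: ${daily_total:.2f}")

    def top_spenders(self, n=5):
        """Answer 'who caused the spike?' by total spend."""
        totals = defaultdict(float)
        for (uid, _, _), v in self.costs.items():
            totals[uid] += v
        return sorted(totals.items(), key=lambda kv: -kv[1])[:n]
```

The point of keying on (user, day, model) rather than a single global counter: when the bill spikes, `top_spenders()` names the tenant, and the per-day granularity is what makes a daily quota enforceable at all.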
tbh I got tired of building this from scratch for every project, so I built LLMeter.
It's an open-source dashboard (AGPL-3.0) for multi-tenant LLM cost tracking. Works with OpenAI, Anthropic, DeepSeek, and OpenRouter. You pass the user ID, it tracks the cost per user, per day, per model.
Code is on GitHub (https://llmeter.org). fwiw, running agents without per-user monitoring is just asking for a denial-of-wallet attack. ymmv.