It started with a Stripe notification.
$200 charged. Overnight. While I slept.
I opened my OpenAI dashboard expecting an anomaly. What I found was worse - everything looked normal. Thousands of API calls. Clean responses. No errors.
My AI agent had been working perfectly. It just hadn't stopped.
What Is a Token Loop
A token loop happens when your AI agent enters a cycle it can never escape. It calls a tool. The tool returns an ambiguous result. The agent retries. And again.
No exception is thrown. No alert fires. Your logs show healthy execution. Meanwhile, the meter is running.
Common triggers:
- Ambiguous tool outputs - the LLM can't decide if the result succeeded or failed
- Missing stop conditions - no maximum retry count
- Cost-unaware architecture - the agent was never designed to ask "how much has this run cost so far?"
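The first two triggers can be handled in code. Below is a minimal Python sketch. It assumes a hypothetical `ToolResult` type whose explicit `status` field tells the agent whether a call succeeded, and it adds a hard `max_retries` stop condition. The names are illustrative and do not come from any particular agent framework.

```python
from dataclasses import dataclass
from typing import Callable, Literal


# Hypothetical structured tool result: the explicit status removes ambiguity,
# so the agent never has to guess whether a call succeeded or failed.
@dataclass
class ToolResult:
    status: Literal["ok", "retryable_error", "fatal_error"]
    data: str = ""


class RetryLimitExceeded(Exception):
    """Raised when the stop condition fires, instead of looping forever."""


def call_with_stop_condition(
    tool: Callable[[], ToolResult], max_retries: int = 3
) -> ToolResult:
    """Call a tool, retrying only on explicit retryable errors, at most max_retries times."""
    for _ in range(max_retries):
        result = tool()
        if result.status == "ok":
            return result
        if result.status == "fatal_error":
            # Retrying cannot help here, so surface the failure immediately.
            raise RuntimeError(f"tool failed permanently: {result.data}")
        # "retryable_error": try again, but only until the limit is reached.
    raise RetryLimitExceeded(f"gave up after {max_retries} attempts")
```

If a tool keeps returning `retryable_error`, this makes exactly `max_retries` calls and then raises `RetryLimitExceeded` instead of running all night.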
The Math That Should Scare You
GPT-4o input tokens cost roughly $0.005 per 1K. At 500 cycles per hour and about 4K input tokens per cycle: 500 cycles x 4 (thousand tokens) x $0.005 = $10/hour, before counting output tokens.
Let it run for 20 hours while you sleep: $200. Gone. No customer value delivered. Just a loop that did not know it was a loop.
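Here is the same arithmetic as a small Python helper, so you can plug in your own model price and loop rate. `loop_cost_usd` is a hypothetical name. The inputs are the figures above, and the helper counts input tokens only.

```python
def loop_cost_usd(
    cycles_per_hour: int,
    tokens_per_cycle: int,
    price_per_1k_tokens: float,
    hours: float,
) -> float:
    """Estimate the spend of a runaway loop (input tokens only; output tokens add more)."""
    cost_per_cycle = tokens_per_cycle / 1000 * price_per_1k_tokens
    return cycles_per_hour * cost_per_cycle * hours


hourly = loop_cost_usd(500, 4000, 0.005, hours=1)
overnight = loop_cost_usd(500, 4000, 0.005, hours=20)
print(f"${hourly:.2f}/hour, ${overnight:.2f} overnight")  # prints "$10.00/hour, $200.00 overnight"
```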
What Monitoring Actually Looks Like
1. Execution Duration - If a run exceeds 2x the average, flag it immediately.
2. Token Count Per Run - Not per call but per run. A 10x spike is your early warning.
3. Cost Per Execution - $0.001 per run is fine. $4.50 per run is not. Set a threshold before damage is done.
4. Consecutive Failure Patterns - Three failed tool calls in a row is a loop signature. Halt automatically.
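The sketch below shows how the four signals might be checked inside a single run, in Python. `RunMonitor`, `Thresholds`, and `HaltRun` are hypothetical names. The averages would come from your own historical runs, and the $1.00 budget is an illustrative value. Duration and token spikes only raise flags for alerting. The cost budget and the consecutive-failure count halt the run outright.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Thresholds:
    # Hypothetical defaults mirroring the four signals; tune them per agent.
    duration_multiplier: float = 2.0      # flag runs longer than 2x the average
    token_spike_multiplier: float = 10.0  # flag runs using 10x the average tokens
    max_cost_usd: float = 1.00            # per-run budget (illustrative value)
    max_consecutive_failures: int = 3     # three failures in a row = loop signature


class HaltRun(Exception):
    """Raised to stop a run that has tripped a hard limit."""


@dataclass
class RunMonitor:
    avg_duration_s: float
    avg_tokens: float
    price_per_1k_tokens: float
    thresholds: Thresholds = field(default_factory=Thresholds)
    started_at: float = field(default_factory=time.monotonic)
    tokens: int = 0
    consecutive_failures: int = 0
    flags: list[str] = field(default_factory=list)

    @property
    def cost_usd(self) -> float:
        # Cost accumulated over the whole run so far, not per call.
        return self.tokens / 1000 * self.price_per_1k_tokens

    def record_call(self, tokens_used: int, succeeded: bool) -> None:
        """Call after every LLM or tool step; updates run totals and checks limits."""
        self.tokens += tokens_used
        self.consecutive_failures = 0 if succeeded else self.consecutive_failures + 1
        t = self.thresholds
        elapsed = time.monotonic() - self.started_at

        # Soft signals: record each flag once so an alert can be sent.
        if elapsed > t.duration_multiplier * self.avg_duration_s:
            self._flag("duration")
        if self.tokens > t.token_spike_multiplier * self.avg_tokens:
            self._flag("token_spike")

        # Hard limits: stop the run before more money is spent.
        if self.cost_usd > t.max_cost_usd:
            raise HaltRun(f"cost ${self.cost_usd:.2f} exceeds ${t.max_cost_usd:.2f}")
        if self.consecutive_failures >= t.max_consecutive_failures:
            raise HaltRun(f"{self.consecutive_failures} consecutive failures")

    def _flag(self, name: str) -> None:
        if name not in self.flags:
            self.flags.append(name)
```

Call `record_call` after every model or tool step. Because totals accumulate per run rather than per call, a run that stays cheap on each call but never ends still trips the budget.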
The Organizational Cost Nobody Calculates
The $200 is the visible damage. What does not show up on the invoice:
- Developer time debugging a run that left no useful trace
- Customer trust if the agent silently failed to complete their task
- Team morale - nobody ships AI features confidently if they might wake up to a Stripe surprise
The ROI of AI agent monitoring is not just cost savings. It is the ability to ship new agents without fear.
What I Built To Solve This
After the $200 incident, I built AI Agents Control Tower at https://agents.opsveritas.com - observability that sits outside your agent framework.
It tracks token usage and cost per execution, duration anomalies, and consecutive failure patterns. Real-time alerts to Slack or email when thresholds are breached. One SDK call at the start and end of each execution. Everything else is automatic.
Because the best time to add monitoring is before the $200 lesson. The second best time is right now.
Drop a comment if you are building AI agents in production - I would love to hear what has surprised you.