I've been building AI agents for a while now. A few weeks ago I left one running overnight. Woke up to a $200 Anthropic bill. The agent got stuck in a loop — same API call, over and over, for 6 hours. Nobody noticed.
After digging into it, I realized this wasn't just a "me problem." Agents get stuck. Retries cascade. Budgets blow up. And most of the time you only find out when the invoice arrives.
So I built ARIA — guardrails for AI agents.
It sits between your app and your AI provider's API (Claude, GPT, Gemini), watches every call, and catches:
Agent loops: blocked at call #3, not call #100
Cascade failures: retry chains that can multiply your costs 5x
Budget overruns: hard stop when your remaining budget hits $0
Duplicate calls: served instantly from cache at $0 cost
Corrupted input: blocked before the model hallucinates on garbage
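To make the list above concrete, here is a minimal Python sketch of how a guard sitting in front of a provider call could handle four of these checks. It is not ARIA's code or API. `CallGuard`, `GuardError`, hashing the request as JSON to spot repeats, the policy of serving the 2nd identical request from cache and blocking the 3rd, and the flat per-call price are all assumptions made for illustration. `call_provider` stands in for whatever SDK call your app actually makes.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class GuardError(RuntimeError):
    """Raised when the (hypothetical) guard blocks a call."""


class CallGuard:
    """Hypothetical guard in front of a provider call; not ARIA's actual API."""

    def __init__(self, call_provider: Callable[[Dict[str, Any]], str],
                 budget_usd: float, cost_per_call_usd: float,
                 loop_threshold: int = 3) -> None:
        self.call_provider = call_provider
        self.remaining_usd = budget_usd
        self.cost_per_call_usd = cost_per_call_usd  # assumption: flat price per call
        self.loop_threshold = loop_threshold
        self.cache: Dict[str, str] = {}
        self.repeat_counts: Dict[str, int] = {}

    @staticmethod
    def fingerprint(request: Dict[str, Any]) -> str:
        # Canonical JSON so that dict key order does not change the hash.
        canonical = json.dumps(request, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def call(self, request: Dict[str, Any]) -> str:
        # Corrupted input: reject a missing, empty or non-string prompt.
        prompt = request.get("prompt")
        if not isinstance(prompt, str) or not prompt.strip():
            raise GuardError("corrupted input: empty or non-string prompt")

        key = self.fingerprint(request)
        count = self.repeat_counts.get(key, 0) + 1
        self.repeat_counts[key] = count

        # Loop: the Nth identical request (N = loop_threshold) is blocked outright.
        if count >= self.loop_threshold:
            raise GuardError(f"loop: identical request #{count} blocked")

        # Duplicate: a repeat below the threshold is served from cache at $0.
        if key in self.cache:
            return self.cache[key]

        # Budget: hard stop when the remaining budget cannot cover another call.
        if self.remaining_usd < self.cost_per_call_usd:
            raise GuardError("budget exhausted")

        response = self.call_provider(request)
        self.remaining_usd -= self.cost_per_call_usd
        self.cache[key] = response
        return response
```

A real guard would price calls from actual token usage rather than a flat rate, and would reset repeat counts over time (for example per agent run), since a healthy app may legitimately send the same prompt more than twice in a day.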
It has two modes:
Detection (default): watches every call and gives you a health report showing what's going wrong. Zero risk, since it never blocks or modifies a call.
Prevention (beta): actually blocks failures in real time. Your agent tries to loop? Blocked at call #3. Budget exhausted? Hard stop. Cascade starting? Paused before it spirals.
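The two modes can share the same checks and differ only in what happens when one fires. Below is a minimal Python sketch of that idea applied to retry cascades. It is an assumption about how such a switch could work, not ARIA's implementation: `Mode`, `RetryMonitor`, `CascadeBlocked`, the sliding-window rule and its defaults (more than 5 retries within 60 seconds) are all hypothetical.

```python
import time
from collections import deque
from enum import Enum
from typing import Deque, List, Optional


class Mode(Enum):
    DETECTION = "detection"    # observe and report only
    PREVENTION = "prevention"  # block when a check fires


class CascadeBlocked(RuntimeError):
    """Raised in prevention mode when retries pile up too fast (hypothetical)."""


class RetryMonitor:
    """Hypothetical cascade check: too many retries inside a sliding time window."""

    def __init__(self, mode: Mode, max_retries: int = 5,
                 window_s: float = 60.0) -> None:
        self.mode = mode
        self.max_retries = max_retries
        self.window_s = window_s
        self.retry_times: Deque[float] = deque()
        self.report: List[str] = []  # health-report findings (detection output)

    def record_retry(self, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self.retry_times.append(now)
        # Drop retries that have aged out of the window.
        while self.retry_times and now - self.retry_times[0] > self.window_s:
            self.retry_times.popleft()
        if len(self.retry_times) > self.max_retries:
            finding = (f"cascade: {len(self.retry_times)} retries "
                       f"in {self.window_s:.0f}s")
            if self.mode is Mode.PREVENTION:
                # Prevention: stop the caller before the next retry goes out.
                raise CascadeBlocked(finding)
            # Detection: record the finding and let the call proceed.
            self.report.append(finding)
```

In detection mode a finding only lands in `report`; in prevention mode the same finding raises, and the caller can back off instead of issuing the next retry. The optional `now` argument is there only to make the logic easy to test; normally the monitor reads `time.monotonic()`.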
Here's what it looks like blocking a loop; watch the response times. Real API calls take 700ms+, while blocked calls return in 3ms because they never reach the API.
Tested on 354 real API calls across 3 providers. 0 false positives. 12/12 stuck agents caught.
Works with Node.js and Python. Takes 5 minutes to set up.
GitHub: github.com/clutchitggs/ARIA
It's early — I'm a solo founder looking for feedback. If you're running agents and hitting cost issues, I'd love to know what failure types you see most. And if you want to try prevention mode, just open an issue and I'll send you a key.
