AI agents are powerful, but they can also be expensive in a very quiet way.
When I use a normal chatbot, I send one message and get one answer. The cost is easy to understand. But when I let an AI coding agent work, it may read files, edit code, run tests, fail, retry, send more context, and call the model again and again.
Sometimes that is useful. Sometimes it is just stuck in a loop.
That made me think: most LLM dashboards only tell you what you spent after the money is already gone. I wanted something that could stop a dangerous agent run before the next provider call happens.
So I built AgentCostFirewall.
It is a local-first OpenAI-compatible proxy that sits between your AI agent and your model provider.
```
Cursor / Continue / OpenClaw / local agent
                ↓
        AgentCostFirewall
                ↓
    OpenAI-compatible provider
```
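Wiring an agent up is just a base-URL swap. Here is a minimal sketch using the Python OpenAI SDK; the port 8787 and the key-forwarding behavior are assumptions for illustration, not the project's documented defaults:

```python
from openai import OpenAI

# Point the agent's client at AgentCostFirewall instead of the provider.
# The localhost port is an assumption for this example.
client = OpenAI(
    base_url="http://localhost:8787/v1",
    api_key="your-provider-key",  # assumed to be forwarded upstream
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello from behind the firewall"}],
)
print(response.choices[0].message.content)
```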
The idea is simple: detect risky or over-budget agent runs before they burn your API budget.
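As a rough illustration of what "pre-call" means, here is a hypothetical budget gate (my sketch, not the actual AgentCostFirewall code): estimate the worst-case cost of the next call from token counts, and refuse to forward it if that would push the run over its budget.

```python
# Hypothetical pre-call budget gate; prices and thresholds are made up.
PRICE_PER_1K_INPUT = 0.005   # assumed $/1K input tokens
PRICE_PER_1K_OUTPUT = 0.015  # assumed $/1K output tokens

def worst_case_cost(input_tokens: int, max_output_tokens: int) -> float:
    """Upper bound on the cost of one provider call, in dollars."""
    return (input_tokens * PRICE_PER_1K_INPUT
            + max_output_tokens * PRICE_PER_1K_OUTPUT) / 1000

def allow_call(spent: float, budget: float,
               input_tokens: int, max_output_tokens: int) -> bool:
    """Forward the request only if the worst case stays under budget."""
    return spent + worst_case_cost(input_tokens, max_output_tokens) <= budget

# Example: $0.47 already spent on this run, against a $0.50 budget.
if not allow_call(spent=0.47, budget=0.50,
                  input_tokens=6_000, max_output_tokens=2_000):
    print("blocked: next call could push this run over budget")
```

The point of the design is that the check runs before the provider ever sees the request, so a blocked call costs nothing.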
Right now it supports:

- pre-call budget checks
- over-budget blocking
- basic runaway loop detection (sketched after this list)
- exact-match response cache
- cache savings metrics
- local dashboard
- password auth
- streaming passthrough
- tool call passthrough
- no-key demo mode
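The runaway loop detection mentioned above could work roughly like this sketch (my guess at the approach, not the repo's actual logic): hash each request body and block when the same body repeats too often inside a time window.

```python
import hashlib
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 120  # assumed detection window
MAX_REPEATS = 3       # assumed threshold before blocking

_seen: dict[str, deque] = defaultdict(deque)

def is_runaway(request_body: bytes) -> bool:
    """True if this exact request has repeated too often recently."""
    key = hashlib.sha256(request_body).hexdigest()
    now = time.monotonic()
    stamps = _seen[key]
    while stamps and now - stamps[0] > WINDOW_SECONDS:
        stamps.popleft()  # drop hits that fell outside the window
    stamps.append(now)
    return len(stamps) > MAX_REPEATS
```

An exact-match cache presumably reuses the same kind of hashing: on a hit, return the stored response instead of forwarding, which is where the cache savings metrics would come from.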
GitHub: https://github.com/z13661122409-hub/AgentCostFirewall
I am looking for feedback from people using Cursor, Continue.dev, OpenClaw, Codex API-key mode, Cline, Roo Code, or custom local agents.
Would you put something like this in front of your AI agent?
