A few weeks ago I woke up to a $100 charge from my AI provider.
For a lot of people that's nothing. For me, a solo dev who obsessively keeps infrastructure costs near zero, it genuinely stung. But that wasn't even the part that got me.
The part that got me was what I found when I went looking for answers.
The Actual Problem
Reddit threads. Developer forums. People waking up to $10,000. $30,000. $80,000 bills.
Three root causes, over and over:
- Leaked API keys scraped from public GitHub repos
- Autonomous agents stuck in retry loops, burning tokens all night
- Provider "budget alerts" that notify you after the money is already gone
That last one is what really broke my brain. The alerts are just dashboards with email attachments. They don't stop anything. You still get the bill.
The Thing I Built
I built Loopers, a reverse proxy that sits between your application and your LLM provider and enforces a hard dollar cap.
Not a soft alert. A kill switch.
# Your app talks to Loopers instead of OpenAI directly
curl http://localhost:8080/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer lp-your-key" \
  -H "X-Loopers-Provider-Key: sk-your-openai-key" \
  -d '{"model": "gpt-4o-mini", "messages": [...]}'
If your budget is hit, the request dies right there. The provider is never called. No tokens burned. No bill.
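To make the "provider is never called" part concrete, here is a minimal sketch of a pre-call gate in Go. It is not the Loopers source: `Budget`, `TryReserve`, and `Gate` are hypothetical names, the counter lives in memory instead of Redis, and the 429 status code is an assumption.

```go
package main

import (
	"net/http"
	"sync"
)

// Budget is a hypothetical in-memory stand-in for the Redis-backed counter.
// Amounts are integers (for example micro-dollars) to avoid float drift.
type Budget struct {
	mu    sync.Mutex
	spent int64
	limit int64
}

// TryReserve books amount against the cap and reports whether it fit.
func (b *Budget) TryReserve(amount int64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.spent+amount > b.limit {
		return false
	}
	b.spent += amount
	return true
}

// Gate wraps the upstream handler. If the reservation does not fit, it
// answers locally and never calls next, so the provider sees no traffic.
func Gate(b *Budget, estimate int64, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !b.TryReserve(estimate) {
			// 429 is an assumed status code for this sketch.
			http.Error(w, "budget exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```

In a real proxy, `next` would be the handler that forwards to the provider (for example one built on `httputil.ReverseProxy`), and `estimate` would come from the request's model and token limits.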
The Interesting Engineering Bit
The hard part isn't blocking a request before the call. That's easy. The hard part is streaming.
With SSE streaming, the provider is already sending you tokens by the time you realize cost is climbing. So Loopers intercepts the stream in real time, counts tokens chunk by chunk, and severs the connection the moment cost crosses the reservation.
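The chunk-by-chunk cutoff can be sketched roughly like this. Everything here is illustrative and assumed: `relayUntilBudget` and `relayString` are hypothetical names, and counting whitespace-separated words in each `data:` line is only a placeholder for a real tokenizer or the provider's usage fields.

```go
package main

import (
	"bufio"
	"io"
	"strings"
)

// relayUntilBudget copies SSE lines from src to dst and charges every
// "data:" payload against reservation. It stops relaying as soon as the
// running cost reaches or crosses the reservation.
func relayUntilBudget(dst io.Writer, src io.Reader, pricePerToken, reservation int64) (tokens int64, cut bool, err error) {
	sc := bufio.NewScanner(src)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "data:") {
			payload := strings.TrimPrefix(line, "data:")
			// Placeholder token count: one token per whitespace-separated word.
			tokens += int64(len(strings.Fields(payload)))
		}
		if _, werr := io.WriteString(dst, line+"\n"); werr != nil {
			// A write error usually means the client went away.
			return tokens, false, werr
		}
		if tokens*pricePerToken >= reservation {
			return tokens, true, nil // caller closes the upstream connection
		}
	}
	return tokens, false, sc.Err()
}

// relayString is a small convenience wrapper for trying the relay on a string.
func relayString(in string, pricePerToken, reservation int64) (string, int64, bool) {
	var out strings.Builder
	tokens, cut, _ := relayUntilBudget(&out, strings.NewReader(in), pricePerToken, reservation)
	return out.String(), tokens, cut
}
```

The returned token count is what the accounting step needs afterwards, whether the stream ended normally, was cut, or failed on a write.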
And when a client disconnects mid-generation (dropped connection, timeout, whatever), Loopers captures the exact token count generated up to that millisecond and refunds the remainder of the reservation back to Redis. No phantom charges.
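The refund itself is simple arithmetic. As a sketch (the function name and the integer micro-dollar units are assumptions, not taken from the repo): with 1000 units reserved, 150 tokens generated, and a price of 2 units per token, 300 units were used and 700 go back to the budget.

```go
package main

// refund returns the unused part of a reservation once generation stops.
// All amounts share one integer unit (for example micro-dollars).
func refund(reserved, pricePerToken, tokensGenerated int64) int64 {
	used := tokensGenerated * pricePerToken
	if used >= reserved {
		return 0 // nothing left to hand back
	}
	return reserved - used
}
```

In a Go HTTP handler, a client disconnect surfaces as cancellation of `r.Context()`, which is a natural place to trigger this settlement.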
The budget enforcement itself runs through Redis Lua scripts: each check-and-update executes as a single atomic operation, so there are no TOCTOU race conditions, even under heavy concurrent load.
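For readers who haven't used Redis scripting, a check-and-reserve script in this style might look like the sketch below. This is an assumed shape, not the actual Loopers script: the key layout, argument order, and return values are invented for illustration. `reserveModel` is a plain-Go mirror of the same logic so the example runs without a Redis server.

```go
package main

// reserveLua is an illustrative script. Redis runs a script to completion
// before serving other commands, so the read and the increment cannot
// interleave with another client's reservation.
// KEYS[1] = spend counter, ARGV[1] = amount to reserve, ARGV[2] = limit.
const reserveLua = `
local spent = tonumber(redis.call('GET', KEYS[1]) or '0')
if spent + tonumber(ARGV[1]) > tonumber(ARGV[2]) then
  return 0
end
redis.call('INCRBY', KEYS[1], ARGV[1])
return 1
`

// reserveModel mirrors the script on a plain map. It is not safe for
// concurrent use by itself; in Redis the atomicity comes from the server.
func reserveModel(store map[string]int64, key string, amount, limit int64) int64 {
	spent := store[key]
	if spent+amount > limit {
		return 0
	}
	store[key] = spent + amount
	return 1
}
```

With a client such as go-redis (an assumption; the post doesn't say which client is used), the script would be loaded with `redis.NewScript(reserveLua)` and invoked with `Run(ctx, rdb, []string{key}, amount, limit)`.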
What It Supports
- 6 providers: OpenAI, Anthropic, Gemini, AWS Bedrock, Azure OpenAI, Mistral
- 5 budget windows: per-minute, hourly, daily, weekly, monthly - first limit hit wins
- Session budgets: cap an entire agent run across N steps
- Fail-closed: if Redis goes down, all requests are blocked. Your wallet is safe.
- MIT-licensed, self-hosted, Docker Compose
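The "first limit hit wins" and fail-closed behaviors from the list above can be sketched together. `Window`, `Decide`, and `ErrStoreDown` are hypothetical names, and the loader function stands in for whatever actually reads the counters from Redis.

```go
package main

import "errors"

// Window is one budget window with its current spend and its cap.
type Window struct {
	Name  string
	Spent int64
	Limit int64
}

// ErrStoreDown is a hypothetical error for an unreachable budget store.
var ErrStoreDown = errors.New("budget store unavailable")

// Decide checks every window in order and blocks on the first one the
// request would exceed. If the counters cannot be loaded at all, it
// blocks too: failing closed protects the wallet, not availability.
func Decide(load func() ([]Window, error), amount int64) (allowed bool, reason string) {
	windows, err := load()
	if err != nil {
		return false, "store unavailable"
	}
	for _, w := range windows {
		if w.Spent+amount > w.Limit {
			return false, w.Name
		}
	}
	return true, ""
}
```

A request that fits the per-minute cap but would push the daily total over its limit is rejected with `"daily"` as the reason. In a real implementation the check and the increments across all windows would need to happen in one atomic step.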
What I'm Still Figuring Out
The concurrent Lua atomicity holds up in tests (100 goroutines, same key), but I'd genuinely love a second pair of eyes on the scripts from anyone who's done serious Redis work.
As for the streaming reconciliation pattern, I'm curious whether others have solved mid-stream token accounting differently.
Try It / Rip It Apart
go run github.com/loopers-oss/loopers/cmd/loopers init
docker-compose up -d
→ github.com/CURSED-ME/loopers-oss
This is my first major Go project. I'd love brutal, honest feedback on the architecture, the code, the README clarity, anything. Drop it in the comments.
The core is fully MIT. I'm building a managed cloud version to fund continued OSS work, but nothing is held back from the community repo.