I got a $100 AI bill. Then I found the $80,000 ones. So I built a kill switch. (2026)

#go #opensource #ai #devops

A few weeks ago I woke up to a $100 charge from my AI provider.

For a lot of people that's nothing. For me, a solo dev who obsessively keeps infrastructure costs near zero, it genuinely stung. But that wasn't even the part that got me.

The part that got me was what I found when I went looking for answers.

The Actual Problem

Reddit threads. Developer forums. People waking up to $10,000. $30,000. $80,000 bills.

Three root causes, over and over:

Leaked API keys scraped from public GitHub repos
Autonomous agents stuck in retry loops, burning tokens all night
Provider "budget alerts" that notify you after the money is already gone

That last one is what really broke my brain. The alerts are just dashboards with email attachments. They don't stop anything. You still get the bill.

The Thing I Built

I built Loopers, a reverse proxy that sits between your application and your LLM provider and enforces a hard dollar cap.

Not a soft alert. A kill switch.

# Your app talks to Loopers instead of OpenAI directly
curl http://localhost:8080/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer lp-your-key" \
  -H "X-Loopers-Provider-Key: sk-your-openai-key" \
  -d '{"model": "gpt-4o-mini", "messages": [...]}'

If your budget is hit, the request dies right there. The provider is never called. No tokens burned. No bill.
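To make the "provider is never called" part concrete, here is a minimal Go sketch of a pre-call gate. It is an illustration, not Loopers' actual code: the `Gate` function, the `reserve` callback, and the 429 status are all assumptions made for this example.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// Gate wraps the upstream handler (for example, a reverse proxy to the provider).
// reserve is a hypothetical callback that tries to book budget for the request.
// If it returns false, the gate answers by itself and upstream is never invoked.
func Gate(reserve func() bool, upstream http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !reserve() {
			// 429 is an assumed status code for "budget exceeded".
			http.Error(w, "budget exceeded", http.StatusTooManyRequests)
			return
		}
		upstream.ServeHTTP(w, r)
	})
}

// blockedDemo sends one request through a gate whose budget is exhausted
// and reports the status code and whether upstream was reached.
func blockedDemo() (status int, upstreamCalled bool) {
	upstream := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		upstreamCalled = true
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/openai/v1/chat/completions", nil)
	Gate(func() bool { return false }, upstream).ServeHTTP(rec, req)
	return rec.Code, upstreamCalled
}
```

Because the check happens before `upstream.ServeHTTP`, a rejected request costs nothing on the provider side.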

The Interesting Engineering Bit

The hard part isn't blocking a request before it reaches the provider. That's easy. The hard part is streaming.

With SSE streaming, the provider is already sending you tokens by the time you realize cost is climbing. So Loopers intercepts the stream in real time, counts tokens chunk by chunk, and severs the connection the moment the running cost crosses the amount reserved for that request.

And when a client disconnects mid-generation (dropped connection, timeout, whatever), Loopers captures the exact token count generated up to that millisecond and refunds the remainder of the reservation back to Redis. No phantom charges.

The budget enforcement itself runs through Redis Lua scripts. Each script executes as a single atomic transaction, so there are no TOCTOU race conditions, even under heavy concurrent load.
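For readers who have not used Redis scripting, here is a hedged sketch of what an atomic check-and-reserve could look like. The Lua script is illustrative and not the one shipped in Loopers; the Go `memStore` is an in-memory stand-in (a mutex instead of Redis) so the example runs without a server, and `hammer` is a hypothetical helper that fires concurrent reservations at one key.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// reserveScript is an illustrative Lua script, not Loopers' actual one.
// Redis runs a script to completion before serving any other command, so the
// read, the comparison, and the increment cannot interleave with another caller.
// KEYS[1] = spend counter, ARGV[1] = cost, ARGV[2] = limit (integer micro-dollars).
const reserveScript = `
local spent = tonumber(redis.call('GET', KEYS[1]) or '0')
local cost  = tonumber(ARGV[1])
if spent + cost > tonumber(ARGV[2]) then
  return 0
end
redis.call('INCRBY', KEYS[1], cost)
return 1
`

// memStore mimics the script's semantics in memory: the mutex plays the role
// of Redis's one-script-at-a-time execution.
type memStore struct {
	mu    sync.Mutex
	spent map[string]int64
}

// reserve checks the limit and books the cost in one critical section.
func (m *memStore) reserve(key string, cost, limit int64) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.spent[key]+cost > limit {
		return false
	}
	m.spent[key] += cost
	return true
}

// hammer fires n concurrent reservations at one key and returns how many
// were granted and how much ended up booked.
func hammer(n int, cost, limit int64) (granted int64, spent int64) {
	store := &memStore{spent: map[string]int64{}}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if store.reserve("key", cost, limit) {
				atomic.AddInt64(&granted, 1)
			}
		}()
	}
	wg.Wait()
	return granted, store.spent["key"]
}
```

If the check and the increment were two separate round trips, two callers could both read the same `spent` value and both pass the check; doing both inside one script (or one critical section) closes that window.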

What It Supports

6 providers: OpenAI, Anthropic, Gemini, AWS Bedrock, Azure OpenAI, Mistral
5 budget windows: per-minute, hourly, daily, weekly, monthly - first limit hit wins
Session budgets: cap an entire agent run across N steps
Fail-closed: if Redis goes down, all requests are blocked. Your wallet is safe.
MIT-licensed, self-hosted, Docker Compose
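The "first limit hit wins" and fail-closed behaviors can be sketched together in Go. This is an illustration under assumptions: `Window`, `SpendLookup`, and `Allow` are hypothetical names, and a real implementation would do the check and the reservation atomically rather than only checking.

```go
package main

import "errors"

// Window is one budget window; the names and fields are illustrative.
type Window struct {
	Name  string
	Limit int64 // micro-dollars allowed per window
}

// SpendLookup returns the amount already spent in a window.
// An error means the store (Redis, in Loopers' case) could not be reached.
type SpendLookup func(window string) (int64, error)

// ErrStoreDown is a sample error a lookup might return when the store is unreachable.
var ErrStoreDown = errors.New("budget store unavailable")

// Allow checks the windows in order and reports the first one the request
// would breach. Any lookup error blocks the request: fail-closed, not fail-open.
func Allow(windows []Window, cost int64, lookup SpendLookup) (ok bool, blockedBy string) {
	for _, w := range windows {
		spent, err := lookup(w.Name)
		if err != nil {
			return false, "store-error"
		}
		if spent+cost > w.Limit {
			return false, w.Name
		}
	}
	return true, ""
}
```

With windows ordered from shortest to longest, a request is rejected by whichever limit it hits first, and an unreachable store rejects everything.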

What I'm Still Figuring Out

The concurrent Lua atomicity holds up in tests (100 goroutines, same key), but I'd genuinely love a second pair of eyes on the scripts from anyone who's done serious Redis work.

As for the streaming reconciliation pattern, I'm curious whether others have solved mid-stream token accounting differently.

Try It / Rip It Apart

go run github.com/loopers-oss/loopers/cmd/loopers init
docker-compose up -d

→ github.com/CURSED-ME/loopers-oss

This is my first major Go project. I'd love brutal, honest feedback on the architecture, the code, the README clarity, anything. Drop it in the comments.

The core is fully MIT-licensed. I'm building a managed cloud version to fund continued OSS work, but nothing is held back from the community repo.

Top comments (1)

Stephen Keegan • Jun 7

The 5-minute cache expiry change is brutal - it turns a well-behaved polling loop into a $6K incident through no fault of the developer. The dashboard lag makes it worse: by the time you see the spike, the damage is done. Real-time per-agent budget enforcement on the request path (429 before the provider) is the only mechanical fix I've seen work at scale.