Vildan Bina

Posted on Jul 5

Bursora, a tool that blocks AI spend before the call, not after the bill

#ai #showdev #opensource #typescript

One of my AI features got stuck in a loop and spent way more than it should have. I found out the next morning, from the provider dashboard.

That's the problem with every cost tool I tried: Helicone, Langfuse, the provider dashboards. They all show you the bill after the money is already spent.

I wanted one that says no before the call goes out. So I built it and made it open source. It's called Bursora.

The idea

Check the budget first. If the call would go over your limit, block it. If not, let it through and record what it cost.

A traffic light, not a speed camera.
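To make the check-then-record flow concrete, here is a minimal in-memory sketch. It is not Bursora's actual code: `BudgetGuard` and `guardedCall` are hypothetical names, the budget lives in a plain variable rather than a server, and the caller supplies the cost estimate.

```typescript
// Hypothetical in-memory budget guard, for illustration only.
class BudgetGuard {
  private spentUsd = 0;
  private readonly limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  get spent(): number {
    return this.spentUsd;
  }

  // Preflight: would this call push total spend past the limit?
  canSpend(estimatedUsd: number): boolean {
    return this.spentUsd + estimatedUsd <= this.limitUsd;
  }

  // After the call: record what it actually cost.
  record(actualUsd: number): void {
    this.spentUsd += actualUsd;
  }
}

// Check first, then run the call, then record the real cost.
// `call` is never invoked when the preflight check fails.
function guardedCall<T>(
  guard: BudgetGuard,
  estimatedUsd: number,
  call: () => { result: T; costUsd: number },
): T {
  if (!guard.canSpend(estimatedUsd)) {
    throw new Error("budget exceeded: call blocked before it was sent");
  }
  const { result, costUsd } = call();
  guard.record(costUsd);
  return result;
}
```

The point of the shape is the ordering: the block happens before `call` runs, so a rejected request costs nothing.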

The whole setup

Install it:

npm i @bursora/sdk openai

Wrap your client once:

// lib/openai.ts
import OpenAI from "openai";
import { wrap } from "@bursora/sdk";

export const openai = wrap(new OpenAI(), {
  apiKey: process.env.BURSORA_API_KEY!,
  endpoint: process.env.BURSORA_ENDPOINT!,
});

That's it. You still call the API the same way you always did:

await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "hi" }],
});

Each call asks "can I spend this?" first. On yes, it goes straight to OpenAI. No proxy: the request itself never passes through my servers. After the call, the SDK reports the real token cost back.

When you hit a limit

The call throws before it ever reaches the provider:

import { BudgetExceededError } from "@bursora/sdk";

try {
  await openai.chat.completions.create({ /* ... */ });
} catch (err) {
  if (err instanceof BudgetExceededError) {
    // handle it your way
  } else {
    throw err;
  }
}

So a runaway loop stops itself instead of billing you for 10,000 calls.
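"Handle it your way" can mean more than a hard stop. One option is to retry on a cheaper model when a call is blocked. The sketch below shows that pattern under some assumptions: `completeWithFallback` is a hypothetical helper, and the error class is a local stand-in so the snippet runs on its own. Whether a cheaper model actually fits under your limit depends on how your budgets are set up.

```typescript
// Local stand-in so this sketch runs on its own; in a real app you would
// import BudgetExceededError from "@bursora/sdk" instead.
class BudgetExceededError extends Error {
  constructor(message = "budget exceeded") {
    super(message);
    this.name = "BudgetExceededError";
    // Keeps instanceof working even when compiled to older JS targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical helper: try each model in order, moving to the next one only
// when the call was blocked by the budget check. Other errors are rethrown.
async function completeWithFallback<T>(
  models: string[],
  callModel: (model: string) => Promise<T>,
): Promise<T> {
  let lastBlock: BudgetExceededError | undefined;
  for (const model of models) {
    try {
      return await callModel(model);
    } catch (err) {
      if (!(err instanceof BudgetExceededError)) throw err;
      lastBlock = err;
    }
  }
  // Every model was blocked (or the list was empty).
  throw lastBlock ?? new BudgetExceededError("no models to try");
}
```

In practice `callModel` would wrap `openai.chat.completions.create` with the given model name. If every model is blocked, the last block error still reaches the caller.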

Finding who spent it

One big number tells you nothing. Tag your calls, and spend gets grouped by customer, agent, or workflow:

import { withTags } from "@bursora/sdk";

await withTags({ tenant_id: "acme", agent_id: "support-bot" }, async () => {
  await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "hi" }],
  });
});

Now a spike has a name.
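For a feel of what grouping by tag means, here is a small sketch of the aggregation. The record shape and the `spendBy` function are assumptions made for illustration, not the format Bursora stores.

```typescript
// Hypothetical shape of one recorded call.
interface SpendRecord {
  tags: { [key: string]: string | undefined };
  costUsd: number;
}

// Sum cost per value of one tag key, e.g. per tenant_id or per agent_id.
function spendBy(records: SpendRecord[], tagKey: string): Map<string, number> {
  const totals = new Map<string, number>();
  for (const record of records) {
    // Calls made outside withTags would have no value for this key.
    const key = record.tags[tagKey] ?? "(untagged)";
    totals.set(key, (totals.get(key) ?? 0) + record.costUsd);
  }
  return totals;
}

const sample: SpendRecord[] = [
  { tags: { tenant_id: "acme", agent_id: "support-bot" }, costUsd: 0.5 },
  { tags: { tenant_id: "acme", agent_id: "summarizer" }, costUsd: 0.25 },
  { tags: { tenant_id: "globex", agent_id: "support-bot" }, costUsd: 1 },
  { tags: {}, costUsd: 0.125 },
];

const byTenant = spendBy(sample, "tenant_id");
const byAgent = spendBy(sample, "agent_id");
```

The same records answer different questions depending on the key: `byTenant` shows which customer is expensive, `byAgent` shows which workflow is.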

If Bursora is down

Your call still goes through. I never want this to be the reason your app can't reach OpenAI. That one call just goes untracked.
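This is a fail-open policy: only an explicit "no" blocks a call, and an unreachable checker counts as a yes. A rough sketch of that decision logic follows. The types and function names are hypothetical, and the SDK's real internals may look quite different.

```typescript
// Three possible results of asking the budget service.
type CheckOutcome =
  | { kind: "allowed" }
  | { kind: "denied"; reason: string }
  | { kind: "unreachable"; error: unknown };

interface Decision {
  proceed: boolean; // should the provider call go out?
  tracked: boolean; // will this call show up in the spend numbers?
}

// Fail-open: an outage lets the call through, but it goes untracked.
function decide(outcome: CheckOutcome): Decision {
  switch (outcome.kind) {
    case "allowed":
      return { proceed: true, tracked: true };
    case "denied":
      return { proceed: false, tracked: true };
    case "unreachable":
      return { proceed: true, tracked: false };
  }
}

// Turn a check that may throw (network error, server down) into an outcome.
// `check` is a placeholder for whatever actually talks to the budget service.
async function runCheck(check: () => Promise<boolean>): Promise<CheckOutcome> {
  try {
    const ok = await check();
    return ok ? { kind: "allowed" } : { kind: "denied", reason: "over budget" };
  } catch (error) {
    return { kind: "unreachable", error };
  }
}
```

The trade-off of failing open is that limits are not enforced while the checker is unreachable. The opposite choice, fail-closed, would flip the `unreachable` case to `proceed: false`.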

The license

The dashboard is Apache-2.0, the SDK is MIT. You can self-host all of it on your own Postgres, no feature gates. One billing module is under a separate commercial license, but self-hosters don't need it and the open build leaves it out. I'm saying that plainly so nobody feels tricked when they read the source tree.

Still rough

Pricing sync only covers the big providers right now. For a smaller model, you might have to set the token cost by hand.
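Whatever way you enter a manual price, the arithmetic behind it is the usual per-token calculation. Here is a sketch, assuming prices are quoted in USD per million tokens and that input and output tokens are priced separately. The `ModelPrice` shape and the numbers are made up for the example; they are not real prices or a Bursora config format.

```typescript
// Hypothetical price entry for one model, in USD per million tokens.
interface ModelPrice {
  inputUsdPerMTok: number;
  outputUsdPerMTok: number;
}

// Token counts as reported by the provider after a call
// (OpenAI's chat completions return them as usage.prompt_tokens
// and usage.completion_tokens).
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

function callCostUsd(price: ModelPrice, usage: TokenUsage): number {
  const inputCost = usage.promptTokens * price.inputUsdPerMTok;
  const outputCost = usage.completionTokens * price.outputUsdPerMTok;
  return (inputCost + outputCost) / 1_000_000;
}

// Made-up prices for a made-up model.
const examplePrice: ModelPrice = { inputUsdPerMTok: 2, outputUsdPerMTok: 8 };
const exampleCost = callCostUsd(examplePrice, {
  promptTokens: 500_000,
  completionTokens: 250_000,
});
```

With these made-up numbers, the input side costs 1 USD and the output side costs 2 USD, so `exampleCost` is 3.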

Self-host it: https://github.com/bursora/core
Docs: https://bursora.com/docs

If you've been hit by a surprise AI bill, I'd like to hear how you found out.

Top comments (2)

Raju Dandigam • Jul 7

“A traffic light, not a speed camera” is a strong framing because it gets to the real failure mode quickly: post-hoc dashboards are useful for attribution, but they do nothing for the runaway loop that is already spending. The preflight budget check is the interesting part, especially since you kept the provider call path direct instead of turning the whole thing into a traffic proxy. I also like the tagging angle because budget controls become much more actionable once spend is attributable to a customer, workflow, or agent boundary instead of one big total. This feels especially relevant for agent systems where retries and tool loops can compound quietly. Have you found that teams want hard fail-closed limits by default, or more “soft cap + degrade model” behavior once they wire this into production flows?

Vildan Bina • Jul 12

Thanks, you read it exactly how I hoped. On your last question: it splits by where the limit sits, not by team type. Most teams start soft almost everywhere because nobody wants their first day with Bursora to be "it blocked real production traffic." Alert-only for a week, build trust in the numbers, then turn on hard limits in the two spots that can actually blow up: per-customer caps and the agent/tool-loop boundary you mentioned. One customer or one looping agent hitting a wall is fine; the whole app going dark is not. Usually there's also a single workspace-wide hard cap sitting way above normal spend as the "this should never happen" backstop.

The "soft cap + degrade" pattern is real, but it's less about Bursora and more about what the app does on a block. We just return a clean block signal; the interesting part lives on their side, catching it and dropping to a cheaper model, shrinking context, or queuing the request. So the same hard limit becomes a graceful degrade or a hard stop depending on how they handle it. The traffic light only says red. The app decides whether red means stop or take the side road.