How to Actually Cap AI Spend for Your Users: 3 Edge Cases Everyone Misses

#ai #opensource #webdev #infrastructure

The spend cap for my first AI project was just a limit on an Anthropic API key.

If you're crossing your fingers that users never hit that limit, and that it's high enough for their usage anyway, you need a real control layer.

AI spend control is not trivial

It seems like it would be easy, right? It's just a meter per user and a few if statements.

Then the questions and edge cases start. And don't really stop.

Is that meter counting a credit balance burning down, or raw spend counting up?
Does it include usage the customer's plan already pays for, or only the part that spills over?
If they've got a promo credit grant active, does that count against the cap too, or is the cap only for money actually billed?
Is the meter in my own abstract units or tokens, and how do I map that to all of my vendor costs?
Does the cap reset on its own schedule, or does it follow the plan, billing period, or token reset schedule?

None of these are exotic. They're among the first questions any real customer conversation raises, and they apply anywhere AI usage gets billed.

Limitr Open-Source Project

Defining spend well enough to cap it touches your pricing (vendors, tiers, entitlements), your credit system, and your billing period logic all at once.

That's why we added spend caps to the Limitr open-source project.

Limitr is a runtime for observing, enforcing, pricing, and optimizing usage-based software. It lets us define plans, entitlements, usage limits, and credits as a single config document: managed in one place, enforced everywhere.

Using it to define, observe, and control costs will make your life much better, for free.

Limitr Cloud adds managed policies, real-time alerting, billing/payments integrations, analytics, and more on top of the same engine.

Edge 1: Overage-only vs. total spend

Say a customer's plan includes $50/month of AI usage before overage kicks in. You (or they) want to cap their overage at $20. Capping their total spend at $20 instead would cut them off before they've used even half of what they're already paying for.

These are two different caps, and conflating them is a common DIY mistake. A cap that watches total spend will trip long before a cap that watches overage, and customers on a plan with generous included usage will hit a wall that makes no sense to them.

In Limitr, this is a flag on the cap itself:

await policy.addCustomerCap('cus_123', 20, {
  cap_id: 'ai_overage_cap',
  overage_only: true, // only counts spend beyond what the plan includes
});

Without overage_only, the cap counts from dollar one. With it, the first $50 of included usage doesn't touch the cap at all — only the spend that spills past the plan boundary does.
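To see why the two caps diverge, here's a minimal sketch of the arithmetic in plain TypeScript. It doesn't call Limitr, and countedSpend and its field names are hypothetical, made up for illustration rather than taken from the library:

```typescript
// Hypothetical helper, not part of Limitr: which dollars would a cap count?
interface CapInput {
  totalSpendUsd: number; // everything the customer has used this period
  includedUsd: number;   // usage the plan already pays for
  overageOnly: boolean;  // mirrors the idea behind overage_only
}

function countedSpend(input: CapInput): number {
  if (!input.overageOnly) return input.totalSpendUsd; // counts from dollar one
  // Only the part that spills past the plan boundary counts.
  return Math.max(0, input.totalSpendUsd - input.includedUsd);
}

const capUsd = 20;
// The customer has used $30 of the $50 their plan includes.
const usage = { totalSpendUsd: 30, includedUsd: 50 };

const totalCounted = countedSpend({ ...usage, overageOnly: false });
const overageCounted = countedSpend({ ...usage, overageOnly: true });

// A total-spend cap already blocks them, with $20 of included usage unused.
console.log("total-spend cap tripped:", totalCounted >= capUsd);
// An overage-only cap hasn't started counting yet.
console.log("overage-only cap tripped:", overageCounted >= capUsd);
```

Same customer, same usage, opposite outcomes. The only thing that changed is which dollars the meter is allowed to see.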

Edge 2: Does a credit grant count against the cap?

Say you gave this customer a $10 promo credit as a goodwill gesture after a support ticket. They start using it. Does that spend count against their $20 overage cap?

There's a real argument for both answers. If the cap exists to protect your margin, promo credit spend absolutely should count — you're still paying the model provider. If the cap exists to protect the customer from bill shock, credit-covered spend shouldn't count, because they're not being billed for it.

Most DIY spend-limit code picks one answer implicitly and never surfaces it as a decision — usually by accident, whichever way the meter happens to be wired. It's the kind of bug that doesn't show up until a customer emails asking why their "free" credits triggered a spending block.

await policy.addCustomerCap('cus_123', 20, {
  cap_id: 'ai_overage_cap',
  overage_only: true,
  ignore_grants: true, // spend covered by an active credit grant doesn't count
});

ignore_grants: true means the cap only tracks spend the customer is actually being billed for. Flip it off and grant-covered usage counts too, which is useful if the cap is really a margin guardrail rather than a customer-facing promise.
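Here's a rough sketch of how the two answers diverge on the same usage. It's plain TypeScript with a made-up UsageEvent shape and capDelta helper (neither is Limitr's), amounts are in cents, and I'm assuming every event is already overage:

```typescript
// Hypothetical model, not Limitr's internals. Amounts are in cents.
interface UsageEvent {
  costCents: number;         // what this usage is priced at
  grantCoveredCents: number; // portion paid for by the promo credit grant
}

// How much a single event adds to the cap's meter.
function capDelta(event: UsageEvent, ignoreGrants: boolean): number {
  return ignoreGrants
    ? event.costCents - event.grantCoveredCents // billed portion only
    : event.costCents;                          // everything counts
}

// A $10 (1000 cent) grant gets used up across the first two events.
const events: UsageEvent[] = [
  { costCents: 600, grantCoveredCents: 600 }, // fully covered by the grant
  { costCents: 700, grantCoveredCents: 400 }, // grant runs out partway through
  { costCents: 500, grantCoveredCents: 0 },   // billed in full
];

const meter = (ignoreGrants: boolean): number =>
  events.reduce((total, e) => total + capDelta(e, ignoreGrants), 0);

const billShockMeter = meter(true); // 0 + 300 + 500 = 800 cents
const marginMeter = meter(false);   // 600 + 700 + 500 = 1800 cents

console.log({ billShockMeter, marginMeter });
```

Against a $20 (2000 cent) cap, the margin-style meter is nearly full at $18 while the bill-shock meter sits at $8. That $10 gap is exactly the grant, which is why this needs to be an explicit decision.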

Edge 3: Independent reset schedules

Plans reset monthly. Caps often need to reset on a completely different cadence: a $15 weekly guardrail layered on top of a $50 monthly plan limit, for instance. If your reset logic is a single cron job that zeroes out "the counter," you don't have two limits; you have one limit with an identity crisis.

Caps in Limitr carry their own reset schedule, independent of the plan or credit they're watching:

await policy.addCustomerCap('cus_123', 15, {
  cap_id: 'ai_weekly_guardrail',
  overage_only: true,
  reset_sch: 'weekly:mon', // resets every Monday
});

That weekly guardrail and the customer's monthly plan meter now tick independently, on their own clocks, without needing to know about each other.
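If you wanted to build the "own clock" idea by hand, a minimal sketch might look like the following. The Meter class and periodStart helper are hypothetical. I'm borrowing the 'weekly:mon' and 'monthly:1' strings purely as labels, the parsing is mine rather than Limitr's, and everything is in UTC to keep it simple:

```typescript
// Hypothetical sketch, not Limitr's implementation.
type Schedule = "weekly:mon" | "monthly:1";

// Start (UTC midnight) of the period that contains `now`.
function periodStart(schedule: Schedule, now: Date): Date {
  const d = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
  if (schedule === "monthly:1") {
    d.setUTCDate(1);
  } else {
    const daysSinceMonday = (d.getUTCDay() + 6) % 7; // getUTCDay(): 0 = Sunday
    d.setUTCDate(d.getUTCDate() - daysSinceMonday);
  }
  return d;
}

class Meter {
  private schedule: Schedule;
  private periodStartMs: number;
  private value = 0;

  constructor(schedule: Schedule, now: Date) {
    this.schedule = schedule;
    this.periodStartMs = periodStart(schedule, now).getTime();
  }

  // Records spend, resetting first if this meter's own period has rolled over.
  add(amount: number, now: Date): number {
    const start = periodStart(this.schedule, now).getTime();
    if (start !== this.periodStartMs) {
      this.periodStartMs = start;
      this.value = 0;
    }
    this.value += amount;
    return this.value;
  }
}

const wednesday = new Date(Date.UTC(2024, 0, 10));  // Wed 2024-01-10
const nextMonday = new Date(Date.UTC(2024, 0, 15)); // Mon 2024-01-15

const weekly = new Meter("weekly:mon", wednesday);
const monthly = new Meter("monthly:1", wednesday);

weekly.add(10, wednesday);
monthly.add(10, wednesday);

const weeklyAfter = weekly.add(4, nextMonday);   // new week: reset, then 4
const monthlyAfter = monthly.add(4, nextMonday); // same month: 10 + 4 = 14

console.log({ weeklyAfter, monthlyAfter });
```

Each meter checks its own period boundary when it's written to, so there's no shared cron job and no shared counter to fight over.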

What this looks like end to end

Here's the shape of it: a customer on a plan with included Claude usage at a custom price, a monthly overage cap that ignores promo credits, and a tighter weekly guardrail underneath it.

// npm i @formata/limitr
import { Limitr } from '@formata/limitr';

// JSON, YAML, TOML, or STOF (default)
const doc = `
policy: {
    credits: {
        claude_sonnet_4: {
            description: 'Claude Sonnet 4 token'
            overhead_cost: 1.5e-7
            price: { amount: 0.0003 }
        }
    }
    plans: {
        starter: {
            label: 'Starter Plan'
            entitlements: {
                ai_chat: {
                    description: 'AI chat feature'
                    limit: {
                        credit: 'claude_sonnet_4'
                        mode: 'soft'  // allow overage & send overage events
                        value: 166667 // ~$50 included at $0.0003/token
                        resets: true
                        reset_sch: 'monthly:1'
                    }
                }
            }
        }
    }
}`;

const policy = await Limitr.new(doc);
await policy.createCustomer('cus_123', 'starter', 'user', 'Jane Doe', [], [], {
  email: 'jane@example.com',
});

// $20/month overage cap - doesn't count included usage or promo credits
await policy.addCustomerCap('cus_123', 20, {
  cap_id: 'ai_overage_cap',
  overage_only: true,
  ignore_grants: true,
  reset_sch: 'monthly:1',
});

// $15/week guardrail - independent clock, same overage-only logic
await policy.addCustomerCap('cus_123', 15, {
  cap_id: 'ai_weekly_guardrail',
  overage_only: true,
  ignore_grants: true,
  reset_sch: 'weekly:mon',
});

// Spend! Call into LLMs, upload files, run GPU jobs - Limitr handles it all
// Vendor and price agnostic - change in the config without touching code
if (await policy.allow('cus_123', 'ai_chat', 6420)) {
  // within plan + caps, allowed and recorded
} else {
  // spend capped, hard limit hit, usage governed, etc.
}

// Get current state information
const cap = await policy.customerCap('cus_123', 'ai_overage_cap');
console.log(`Current customer overage (USD): $${cap?.meter_value ?? 0}`);

Three flags, two independent clocks, one call to allow(). Everything upstream of that call — plan boundaries, grant interactions, reset timing — is already resolved by the time you get a yes or no.

Three easy decisions, one hard system

Each of those flags looks like a five-minute internal conversation with an obvious answer in the moment. Overage-only or total? Grants in or out? Reset on its own clock or the plan's?

But in a real system, both sides of each question turn out to have a real customer behind them, and reversing a decision later isn't free. Flip overage_only after an enterprise customer is already under contract, and you've quietly changed what they agreed to pay for. Ship the wrong ignore_grants default, and support finds out from an angry email, not a design review.

There's more to spend than a single dollar figure, too. Vendors don't all meter the same way, and mapping tokens, GPU-seconds, and flat fees onto one number is its own can of worms. That's a topic for another post.

If you're building metered AI features and want to poke at this yourself, make sure to check out Limitr.