
Topher Ross

Posted on • Originally published at github.com

The Quote-as-Ceiling Billing Pattern

AI agent APIs bill by the step. You dispatch a job and pay for however many tool calls, reasoning steps, or browser actions the model decides to take. Before dispatch, you don't know the final bill — and neither does your user.

That's a problem if you're building a product on top of one of these APIs. "This run will cost about $X" in a product UI is a promise, and you don't want to break it. The gap between your estimate and the actual cost has to go somewhere — either you eat it, or your user does.

Most billing code I've seen hopes the estimate is close and just bills whatever came back. That's a bug. It shows up the first time a single run costs 3× what you quoted, and your support inbox lights up.

The pattern

The estimate shown before dispatch is the maximum ever charged. Overruns are absorbed by the platform, not passed to the user.

It's a correctness guarantee for usage-metered APIs — not a suggestion, not a dashboard metric. A CI-enforced ceiling.

Three invariants, asserted at runtime in the billing path:

1. Double-entry      billed_credits + platform_absorbed_credits == actual_credits
2. Ceiling           billed_credits <= quoted_credits
3. Idempotency       retries on the same run_id are a no-op

Every call to apply_quote_ceiling proves these. They don't live in tests — they live in assert statements on the hot path. A bug in the billing slice crashes loud in CI instead of silently overcharging a subscriber.
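A minimal sketch of what that enforcement can look like. The function name apply_quote_ceiling comes from the article; the signature, the dataclass, and the in-memory ledger are my own illustrative assumptions, not the library's actual API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LedgerEntry:
    run_id: str
    billed_credits: float
    platform_absorbed_credits: float

# Idempotency store: one entry per run_id (a real system would use a DB table).
_ledger: dict[str, LedgerEntry] = {}

def apply_quote_ceiling(run_id: str, quoted: float, actual: float) -> LedgerEntry:
    """Cap billing at the quote, absorb the overrun, and assert the invariants."""
    if run_id in _ledger:
        return _ledger[run_id]                  # 3. Idempotency: retries are a no-op

    billed = min(actual, quoted)                # 2. Ceiling
    absorbed = actual - billed                  # 1. Double-entry, by construction

    # The invariants live on the hot path, not in a test file.
    assert billed + absorbed == actual, "double-entry violated"
    assert billed <= quoted, "ceiling violated"

    entry = LedgerEntry(run_id, billed, absorbed)
    _ledger[run_id] = entry
    return entry
```

Note that the double-entry invariant holds by construction here; the assert exists so that any future edit to this slice that breaks the balance fails loudly instead of silently mis-billing.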

What it looks like

I extracted the pattern from a production product I'm building (Sentimentary) into a tiny Apache-2.0 Python package: quote-ceiling-estimator. A FastAPI demo exposes two endpoints.

First, get a quote. The returned credits field is the ceiling.

curl -s -X POST http://127.0.0.1:8000/estimate \
  -H "Content-Type: application/json" \
  -d '{"target_population_size": 25,
       "channels": [{"slug":"wsj","category":"news"}],
       "tier_name": "Investigator"}'
# => {"credits": 5.0, "cost_usd": 0.0625, ...}

Then dispatch the job. When the actuals come back, you record them against the quote:

# A run that overran 3×
curl -s -X POST http://127.0.0.1:8000/run/record \
  -H "Content-Type: application/json" \
  -d '{"run_id":"demo-1",
       "quote_credits":5.0,
       "actual_credits":15.0,
       "sale_usd_per_credit":0.0125}'
# => {"inserted": true, "entry": {
#      "billed_credits": 5.0,              # capped at the quote
#      "platform_absorbed_credits": 10.0,  # the overrun, absorbed
#      "platform_absorbed_usd": 0.125,     # accrued to the platform
#      "enforced": true
#    }}

The subscriber pays the quote — not a penny more. The overrun is still there, but instead of being an invisible gap between what the agent vendor charged you and what you charged your user, it's now a visible number: platform_absorbed_usd. Track it, alert on it, reconcile it at close.

The accounting angle (skip if you're not in finance)

Once you start absorbing overruns you've made an accounting decision, whether you meant to or not. The money the platform eats has to land somewhere on the books — as a cost, a discount, or just a mysterious gap. This library picks the answer most revenue accountants would give for usage-metered products:

  • Revenue = what the subscriber pays (the quote, or the actual if it came in under).
  • Overrun = a cost of goods sold, same category as the agent-vendor bill itself. Not a discount off revenue.

That lines up with US GAAP's ASC 606 standard (IFRS 15 if you're not in the US) — specifically the part about "variable consideration," which is a formal name for "the price might move after the sale." A short memo in the repo (docs/BILLING_CONTROLS.md) walks through the reasoning.

The practical win: platform_absorbed_usd is a real number on every run, per tier. You can alert on it, graph it, and hand your finance team a clean cost line — instead of reconstructing it from Stripe exports three weeks after the quarter closed.
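Using the numbers from the demo run above, the split can be sketched as follows. This is a toy illustration of the policy (overrun valued at the sale price, booked as a cost line rather than a revenue discount), not the library's actual ledger code:

```python
def run_economics(quote_credits: float, actual_credits: float,
                  sale_usd_per_credit: float) -> dict:
    """Split one run into the lines a revenue accountant would want."""
    billed = min(actual_credits, quote_credits)
    return {
        # Revenue: what the subscriber pays (the quote, or the actual if under).
        "revenue_usd": billed * sale_usd_per_credit,
        # Overrun: a cost line, same category as the vendor bill, not a discount.
        "platform_absorbed_usd": (actual_credits - billed) * sale_usd_per_credit,
    }
```

With the demo's 3× overrun (quote 5.0 credits, actual 15.0, sale price $0.0125/credit), revenue is $0.0625 and the absorbed cost line is $0.125, matching the /run/record response.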

Safe-by-default rollout

You can't ship a hard ceiling on top of an existing billing path overnight. The library ships three phases behind flags:

Phase 0 — Shadow (default). Flags: all off. Quote is computed and persisted; nothing enforces it yet. Compare in logs.
Phase 1 — Deadline. Flag: estimator_drive_deadline=True. Orchestrator uses pipeline_seconds × 1.05 as the wall-clock ceiling.
Phase 2 — Billing. Flags: estimator_enforce_quote_ceiling=True + estimator_absorb_variance=True. billed_credits is capped; overruns accrue to platform_absorbed_credits and platform_absorbed_usd.

Run in Phase 0 for a week and watch how often your estimator misses — the logs show the drift (actual - quote per run). Tune until the misses are reasonable, then flip the Phase 2 flags and start honoring the ceiling.
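The phased behavior can be sketched as one gate in the billing path. The flag names come from the table above; the function shape and logging are my own assumptions:

```python
import logging

log = logging.getLogger("billing")

def record_run(run_id: str, quote: float, actual: float,
               flags: dict) -> tuple[float, float]:
    """Return (billed_credits, absorbed_credits) under the current rollout phase."""
    # Always log the drift, in every phase: this is the Phase 0 signal.
    log.info("run=%s quote=%.2f actual=%.2f drift=%+.2f",
             run_id, quote, actual, actual - quote)

    if flags.get("estimator_enforce_quote_ceiling") and \
       flags.get("estimator_absorb_variance"):
        billed = min(actual, quote)   # Phase 2: honor the ceiling
    else:
        billed = actual               # Phase 0 shadow: bill actuals unchanged
    return billed, actual - billed
```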

How this library got hardened: extraction + AI test generation

This repo isn't a clean-room implementation — it's an extraction from a working product (Sentimentary). The quote-ceiling pattern had been live in production for months, wired into an orchestrator, a ledger service, a subscriber UI, and a metering path. It worked in its home.

The moment custom code leaves that home, though, every implicit contract becomes a real contract. Error shapes, schema field names, zero-value semantics — things the original callers knew from context are suddenly the API. You can copy-paste and hope, or you can find those assumptions systematically.

I ran TestSprite (an AI test-generation agent) against the extracted FastAPI demo immediately after publishing it. Round 1 passed 3 of 8 tests. The five failures fell into three categories:

  • Contract stability — my Unknown tier_name 400 response included the sorted tier list inside the detail string. No client could write a stable assertion against "unknown tier_name: 'X'. Known tiers: ['Curious', 'Investigator', …]". Moved the list to an X-Known-Tiers header; the detail became the stable literal "Unknown tier_name".
  • Documentation drift — TestSprite generated tests from a code summary I'd written by hand; I'd hallucinated a target_samples field on the response that doesn't exist (the real field is per_agent[].target_size). Fixed the summary.
  • Semantic inference — TestSprite inferred from my docs that quote_credits == 0, actual_credits > 0 should return an error. In the pattern, that case is full platform absorption — which is the point. I hadn't written that down anywhere. Documentation fix + explicit tests.
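The contract-stability fix from the first bullet reduces to one rule: volatile data belongs in a header, stable literals in the detail. A framework-free sketch of that shape (the tier names and function are illustrative, not the library's code; the real endpoint returns this via FastAPI):

```python
KNOWN_TIERS = ["Investigator", "Curious"]  # hypothetical tier registry

def unknown_tier_error(tier_name: str) -> dict:
    """Build the 400 payload: stable literal in detail, volatile list in a header."""
    return {
        "status_code": 400,
        "detail": "Unknown tier_name",  # stable: clients can assert on this forever
        "headers": {
            # Volatile: the tier list changes as the product grows.
            "X-Known-Tiers": ", ".join(sorted(KNOWN_TIERS)),
        },
    }
```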

Round 2, after fixes: 10/10. The 59 pytest cases that back the ceiling invariants grew out of the same loop — each Round 1 finding added a local regression test, so the contract is now nailed down at both the HTTP and function level.

The part I didn't expect: some of the same issues existed in the source platform's production code too, hidden because the existing test suite never exercised those paths. Extracting the pattern to share it ended up being a way to audit the production implementation by proxy.

That's a workflow I didn't plan — "extract, let an AI agent probe the contract, use the findings to audit the source" — but it seems broadly useful. Any time you have a useful pattern locked inside a larger system and you're considering open-sourcing it, the extraction itself becomes a forcing function for the contract hygiene you've been getting away without.

When to use this

Ordered most-specific-first — if any of these sound like you, drop the module in:

  1. AI agent APIs that charge by the step. TinyFish, OpenAI Agents SDK, Anthropic tool use, Gemini function calls — any provider where you dispatch a job and pay for however many steps the model takes.
  2. Usage-metered SaaS. Any product where a backend call fans out to an unbounded number of downstream billable units.
  3. Per-run billing with SLA budgets. Anywhere you've written "this run cost ~$X" in a product UI and don't want to be wrong.

Links

If you're shipping on step-metered APIs and want a hand wiring this into your stack, I'm around — the full orchestrator (estimator, ceiling, accrual, investor reconciliation) runs the billing path for Sentimentary today.
