I charge per invoice extracted. One upload of a five-invoice PDF should cost five credits, a failed extraction should cost nothing, and a user with one credit left must never get two. That last rule is where it got interesting.
This is the billing design behind GSTExtract — a tool that reads Indian GST invoice PDFs into Excel — and the concurrency bug a second-pass audit found in it. Writing it up because "meter a paid API correctly" is one of those problems that looks trivial until you hold it up to the light.
The shape of the problem
Each extraction calls a vision model, which costs me real money. So the user spends a credit per invoice. The catch: I don't know how many invoices are in a PDF until after the model reads it. A single page can hold three invoices; a "10-page" file might be one. I can't charge up front because I don't know the count, and I can't charge after without a window where a user with a zero balance has already burned my API budget.
The answer most billing systems land on is reserve → settle → refund:
- Reserve one credit before doing the expensive work. This is the gate — if it fails, the user is out of credits and nothing runs.
- Settle after success: charge the actual count minus the one already reserved.
- Refund the reserve if the extraction failed, so failures are never billed.
Money and credits are integers. Every change is also written to an append-only ledger with an idempotency key, so a retry can never double-charge. Balance is derived: the sum of remaining credits across non-expired "lots" (a lot is one grant — a signup bonus, a purchase). Consumption is FIFO by soonest expiry, so credits get used before they lapse.
That part worked. The bug was in the word "atomic."
The bug: two uploads, one credit
The original charge function did the obvious thing:
# read the balance
bal = current_balance(db, user_id)
if bal < n:
return False, 0
# ... then walk the lots and decrement in Python
for lot in lots:
take = min(lot.remaining, remaining)
lot.remaining -= take # <-- writes an ABSOLUTE value
Read the balance, decide, write the new value. Single-threaded, this is fine. Now picture a user with exactly one credit firing two uploads at the same moment (a double-click, or two tabs):
- Request A reads
remaining = 1. Decides it's allowed. - Request B reads
remaining = 1. Also decides it's allowed. - A writes
remaining = 0, inserts a-1ledger row. - B writes
remaining = 0, inserts a-1ledger row.
Both succeed. The user spent one credit and got two extractions. The idempotency key didn't save me — the two requests had different keys, because they're two genuinely different uploads. The read-then-write-absolute is the classic lost-update race.
The fix: let the database do the deciding
The cure is to stop computing the new value in Python and let the write itself be the gate — a conditional UPDATE that only fires if there's still enough left, checked under the row lock:
res = db.execute(
update(CreditLot)
.where(CreditLot.id == lot_id, CreditLot.remaining >= take)
.values(remaining=CreditLot.remaining - take)
)
if res.rowcount == 1:
taken += take # we actually got it
# rowcount 0 → someone else took it first; move on or roll back
remaining = remaining - take instead of remaining = <number I computed>. The WHERE remaining >= take re-checks the balance at write time, and SQLite serializes writers, so the second request blocks, re-evaluates against the now-committed remaining = 0, gets rowcount = 0, and is correctly told it can't reserve. Exactly one upload wins. Balance can never go negative.
I proved it with a test that fires N concurrent reservers at a one-credit account on a real file-backed SQLite DB (WAL mode, busy timeout — same as production, because an in-memory shared-connection DB won't reproduce writer contention). Exactly one succeeds, every time.
The second bug: a retry that costs me money
There was a sneakier one, found in the same audit. The workspace retries an upload once on a dropped connection, reusing the same request_id so it doesn't double-charge. And it didn't double-charge — the ledger keys saw to that. But the idempotency cache was written after the model call, not before. So a retry that arrived while the first call was still in flight sailed past the cache check and fired a second vision-model call. No double charge, but I paid for two API calls and logged a duplicate usage event.
Fix: mark the request in-flight before the expensive call, while everything above it is still synchronous (so two requests can't both pass the check), and have a retry return 409 instead of re-running:
cached = _REQUESTS.get(rid)
if cached:
if cached.get("in_flight"):
return JSONResponse({"code": "in_progress"}, status_code=409)
return JSONResponse(cached["response"])
# ... reserve the credit ...
_REQUESTS[rid] = {"in_flight": True} # before the model call
The lesson that keeps repeating: idempotency has to cover the side effect, not just the database row. The charge was idempotent. The expensive API call wasn't.
Why an append-only ledger earns its keep
The thing that made all of this debuggable: nothing is ever updated or deleted in the transaction log. Every grant, charge, and refund is an immutable row with a signed delta. That gives you a reconciliation invariant you can assert on a schedule:
the sum of all ledger deltas for a user == the sum of
remainingacross all their lots
If those two numbers ever disagree, a write went wrong, and an alert fires. (I also learned to not fold a cache-freshness check into that alert — a balance cache that goes stale when a credit quietly expires is not corruption, and it had me chasing phantom "drift" emails for a day.)
Takeaways
- Don't read-modify-write a balance. Make the write conditional (
WHERE remaining >= n) and trustrowcount. - Reserve → settle → refund cleanly separates "can they pay" from "what did it actually cost."
- Idempotency must guard the expensive side effect, not only the ledger insert.
- An append-only ledger gives you a cheap, always-true invariant to reconcile against.
- Write the concurrency test against a real on-disk DB, or it won't reproduce the race you're trying to kill.
The hosted tool has a free daily tier and twenty free credits on signup, with prepaid packs for heavier use — so this ledger runs on every extraction, and I'd rather it be boringly correct. If you want to see it in action, it's at gstextract.com; the extraction engine is open source, though the billing layer above is part of the hosted product.
If you've metered a paid API differently — especially how you handle the count-unknown-until-after problem — I'd genuinely like to hear it.
Top comments (0)