Most Stripe billing bugs get described as webhook bugs.
Did the event arrive? Did the signature verify? Did the handler return 200? Is the handler idempotent? Can we replay failed events?
Those are good questions. But they miss another one:
Does the access state in your app match the billing state in Stripe right now?
That check is final-state reconciliation.
The failure mode
A common SaaS setup looks like this:
- Stripe is the billing source of truth.
- Your database decides who gets access.
- Webhooks and custom code keep the two in sync.
That works until it doesn't.
A few normal ways it breaks:
- customer.subscription.deleted arrives during a deploy and the handler fails.
- The webhook handler returns 200, but the database write rolls back.
- Support manually enables or disables access in the admin panel.
- A migration changes plan/status fields and misses old rows.
- A customer cancels and later resubscribes, leaving multiple subscription records.
- Lazy sync only runs when someone opens the billing page, but the user keeps hitting API endpoints.
Now Stripe says one thing and your app says another.
There are two different problems here:
Unpaid but active
Stripe says canceled, unpaid, or past due, but the app still grants access. This is usually silent. Nobody opens a support ticket to say they are still getting free compute.
Paid but blocked
Stripe says active or paid, but the app blocks or downgrades the customer. This is urgent because the customer will probably notice before your cron does.
Those two cases should not be handled the same way.
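One way to keep the two directions apart is to make the direction part of the result rather than returning a generic "mismatch". The sketch below is illustrative only: the status groupings and the names `classify` and `Drift` are assumptions, and which Stripe statuses should grant access in your product (trials, grace periods) is a decision the code cannot make for you.

```python
from enum import Enum

# Assumption: illustrative groupings of Stripe subscription statuses.
# Adjust them to your own grace-period and dunning policy.
PAID_STATUSES = {"active"}
UNPAID_STATUSES = {"canceled", "unpaid", "past_due"}


class Drift(Enum):
    OK = "ok"
    UNPAID_BUT_ACTIVE = "unpaid_but_active"
    PAID_BUT_BLOCKED = "paid_but_blocked"
    NEEDS_REVIEW = "needs_review"


def classify(stripe_status: str, app_grants_access: bool) -> Drift:
    """Compare one Stripe subscription status with one app access flag."""
    if stripe_status in PAID_STATUSES:
        return Drift.OK if app_grants_access else Drift.PAID_BUT_BLOCKED
    if stripe_status in UNPAID_STATUSES:
        return Drift.UNPAID_BUT_ACTIVE if app_grants_access else Drift.OK
    # Statuses such as trialing, incomplete, or paused are left to a human.
    return Drift.NEEDS_REVIEW
```

Anything the function does not recognize falls through to `NEEDS_REVIEW` instead of being forced into one of the two directions.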
Webhook reliability is not the same check
A reliable webhook pipeline asks:
- Did we receive the event?
- Did we process it once?
- Can we retry failed deliveries?
- Can we inspect what happened?
Final-state reconciliation asks:
- What does Stripe say now?
- What does the app grant now?
- Do those states agree?
- If not, which side needs review?
You probably need both.
Webhook infrastructure prevents a lot of failures. Reconciliation catches the ones that still escape. It also catches problems that never went through the webhook path: admin overrides, migrations, backfills, legacy status fields, and access logic that drifted over time.
What a small reconciliation check needs
You do not need a huge system to start.
From Stripe, export or query:
- customer ID
- subscription ID
- subscription status
- product or plan
- amount/MRR, if you want exposure estimates
- current period end / cancel-at-period-end, if relevant
From your app, export:
- internal user or workspace ID
- Stripe customer ID
- access flag or entitlement status
- plan/tier, if your app stores it
- any field your request path actually reads
That last part matters.
If your middleware checks access_enabled, your rate limiter checks plan_tier, and your billing page checks subscription_status, your first drift problem might be inside your own database.
Start with the fields that actually grant or deny access.
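As a rough sketch of that internal check, the snippet below compares the three fields mentioned above within a single row. The field names come from the example; the `AppRow` type, the "free" tier value, and the rule that only an "active" status counts as paid are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AppRow:
    workspace_id: str
    access_enabled: bool      # read by the middleware
    plan_tier: str            # read by the rate limiter, e.g. "free" or "pro"
    subscription_status: str  # shown on the billing page


def internal_conflicts(row: AppRow) -> list[str]:
    """Return human-readable conflicts between fields of one app row."""
    problems = []
    # Assumption: only "active" counts as paid in this sketch.
    looks_paid = row.subscription_status == "active"
    if row.access_enabled and not looks_paid:
        problems.append("access_enabled is true but subscription_status is not active")
    if not row.access_enabled and looks_paid:
        problems.append("subscription_status is active but access_enabled is false")
    # Assumption: "free" is the only unpaid tier.
    if row.plan_tier != "free" and not row.access_enabled:
        problems.append("plan_tier is a paid tier but access_enabled is false")
    return problems
```

A row that returns an empty list is at least self-consistent, which makes the later comparison against Stripe easier to interpret.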
Do not auto-fix on day one
It is tempting to auto-suspend every unpaid-but-active account.
I would not start there.
A first reconciliation job should usually flag, not fix. There are too many legitimate edge cases: trials, grace periods, dunning windows, enterprise comps, test accounts, manual support exceptions, and custom contracts.
A safer first workflow:
- Run the comparison nightly or weekly.
- Split findings by direction.
- Treat paid-but-blocked as urgent.
- Put unpaid-but-active and ambiguous cases into review.
- Add notes for known exceptions.
- Only automate actions after you trust the classification.
The first version should help you see drift, not create a new production incident.
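The workflow above can be sketched as a triage step that only sorts findings into queues and never touches access state. The `Finding` type, the queue names, and the `known_exceptions` mapping (customer ID to a note) are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    customer_id: str
    kind: str  # "paid_but_blocked", "unpaid_but_active", or "ambiguous"
    note: str = ""


def triage(findings, known_exceptions):
    """Sort findings into queues. Read-only: no access state is changed."""
    queues = {"urgent": [], "review": [], "excepted": []}
    for finding in findings:
        if finding.customer_id in known_exceptions:
            # Keep the reason next to the finding so reviewers can audit it.
            finding.note = known_exceptions[finding.customer_id]
            queues["excepted"].append(finding)
        elif finding.kind == "paid_but_blocked":
            queues["urgent"].append(finding)
        else:
            # unpaid_but_active and ambiguous cases wait for a human.
            queues["review"].append(finding)
    return queues


queues = triage(
    [Finding("cus_1", "paid_but_blocked"), Finding("cus_2", "unpaid_but_active")],
    known_exceptions={"cus_2": "enterprise comp, approved by support"},
)
```

Known exceptions are filtered first in this sketch, so an annotated account stops reappearing in the review queue on every run. Whether an exception should also silence a paid-but-blocked finding is a policy choice.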
I built a small local-first prototype
I built EntitleGuard to test this workflow as a free local audit.
It compares:
- a Stripe CSV export
- a minimal app users/workspaces CSV export
The comparison runs in the browser.
No Stripe API key. No database credentials. No account. No upload.
It flags:
- unpaid-but-active
- paid-but-blocked
- missing billing links
- orphaned Stripe subscriptions
- ambiguous cases that need review
The source is public, so the local-only claim is easy to inspect.
Live audit:
https://entitleguard.amertech.online/audit
Source:
https://github.com/impara/EntitleGuard
The product question I am testing now is whether this should stay as a one-time diagnostic or become recurring monitoring: nightly diff, alerting, review history, and an evidence trail for each mismatch.
My guess is that most teams only care about this after they have seen drift once.
If you run a Stripe SaaS
A practical first check:
- Export active and non-active Stripe subscriptions.
- Export the app table that controls access.
- Join on stripe_customer_id if you store it.
- Treat customer ID as more stable than subscription ID for access-level reconciliation.
- If a customer can have multiple subscriptions, rank by status instead of assuming one row.
- Keep the first version read-only.
- Review both directions separately.
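The steps above can be sketched with two CSV exports and the standard library. The column names (`customer_id`, `status`, `stripe_customer_id`, `access_enabled`), the status ranking, and the finding labels are all assumptions for this example; real exports will need their own mapping.

```python
import csv
import io

# Assumption: illustrative ranking. A lower number wins when one customer
# has several subscriptions (for example, canceled and later resubscribed).
STATUS_RANK = {"active": 0, "trialing": 1, "past_due": 2, "unpaid": 3, "canceled": 4}
UNKNOWN_RANK = len(STATUS_RANK)


def best_status_per_customer(stripe_rows):
    """Collapse multiple subscriptions into one status per customer ID."""
    best = {}
    for row in stripe_rows:
        cid, status = row["customer_id"], row["status"]
        rank = STATUS_RANK.get(status, UNKNOWN_RANK)
        if cid not in best or rank < STATUS_RANK.get(best[cid], UNKNOWN_RANK):
            best[cid] = status
    return best


def reconcile(stripe_rows, app_rows):
    """Read-only comparison. Returns (customer_id, finding) pairs."""
    best = best_status_per_customer(stripe_rows)
    findings = []
    for row in app_rows:
        cid = row["stripe_customer_id"]
        has_access = row["access_enabled"] == "true"
        status = best.get(cid)
        if status is None:
            findings.append((cid, "no_stripe_match"))
        elif status == "active" and not has_access:
            findings.append((cid, "paid_but_blocked"))
        elif status in {"canceled", "unpaid", "past_due"} and has_access:
            findings.append((cid, "unpaid_but_active"))
        # Other combinations (for example trialing) are not flagged here.
    return findings


stripe_csv = """customer_id,subscription_id,status
cus_A,sub_1,canceled
cus_A,sub_2,active
cus_B,sub_3,canceled
"""
app_csv = """workspace_id,stripe_customer_id,access_enabled
ws_1,cus_A,false
ws_2,cus_B,true
"""
stripe_rows = list(csv.DictReader(io.StringIO(stripe_csv)))
app_rows = list(csv.DictReader(io.StringIO(app_csv)))
findings = reconcile(stripe_rows, app_rows)
```

In the sample data, cus_A has a canceled and an active subscription. Ranking by status picks the active one, so the blocked workspace is reported as paid-but-blocked rather than being judged by whichever row happened to come first.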
This is not a replacement for correct webhook handling.
It is a backstop for the final state your users actually experience.
If Stripe and your app disagree, the user does not care that the webhook pipeline looked healthy.
Top comments (1)
This is the trap: webhooks are deltas, but access is state. You're reconstructing current state from a stream that can arrive out of order, drop an event, or replay one. Even with a perfectly idempotent handler, one missed event means your DB's access flag diverges from Stripe forever, because nothing ever re-checks.
What finally stopped the drift for us was to stop treating the webhook as the source of truth. Treat it as a "go look" trigger: on any subscription event, re-fetch the subscription from the API and reconcile access against its real current status, instead of mutating access straight from the event payload. Then run a scheduled reconciliation (nightly, or hourly for higher stakes) that pulls active subscriptions and heals whatever slipped through.
Webhooks tell you "something changed." They're a bad place to store what the state now is. The final-state check you're describing is exactly what makes the system self-correcting.