Amer tech

Posted on Jul 2

Stripe webhooks can work and your app access can still be wrong

#saas #stripe #webdev #startup

Most Stripe billing bugs get described as webhook bugs.

Did the event arrive? Did the signature verify? Did the handler return 200? Is the handler idempotent? Can we replay failed events?

Those are good questions. But they miss another one:

Does the access state in your app match the billing state in Stripe right now?

That check is final-state reconciliation.

The failure mode

A common SaaS setup looks like this:

Stripe is the billing source of truth.
Your database decides who gets access.
Webhooks and custom code keep the two in sync.

That works until it doesn't.

A few normal ways it breaks:

customer.subscription.deleted arrives during a deploy and the handler fails.
The webhook handler returns 200, but the database write rolls back.
Support manually enables or disables access in the admin panel.
A migration changes plan/status fields and misses old rows.
A customer cancels and later resubscribes, leaving multiple subscription records.
Lazy sync only runs when someone opens the billing page, but the user keeps hitting API endpoints.

Now Stripe says one thing and your app says another.

There are two different problems here:

Unpaid but active

Stripe says canceled, unpaid, or past due, but the app still grants access. This is usually silent. Nobody opens a support ticket to say they are still getting free compute.

Paid but blocked

Stripe says active or paid, but the app blocks or downgrades the customer. This is urgent because the customer will probably notice before your cron does.

Those two cases should not be handled the same way.
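To make the split concrete, here is a minimal Python sketch that classifies one customer's current state by direction. The status groupings are assumptions, not Stripe guidance. It treats `active` and `trialing` as paid; `canceled`, `unpaid`, `past_due`, and `incomplete_expired` as not paid; and anything else (such as `incomplete` or `paused`) as ambiguous. `classify` and `Finding` are hypothetical names. Teams with grace periods will likely want `past_due` handled differently.

```python
from enum import Enum

# Assumed grouping of Stripe subscription statuses; adjust to your own
# trial, grace-period, and dunning rules.
PAID_STATUSES = {"active", "trialing"}
UNPAID_STATUSES = {"canceled", "unpaid", "past_due", "incomplete_expired"}


class Finding(Enum):
    OK = "ok"
    UNPAID_BUT_ACTIVE = "unpaid_but_active"  # usually silent: send to review
    PAID_BUT_BLOCKED = "paid_but_blocked"    # customer-facing: treat as urgent
    AMBIGUOUS = "ambiguous"                  # status outside both sets: review


def classify(stripe_status: str, app_grants_access: bool) -> Finding:
    """Compare Stripe's current status with what the app grants right now."""
    if stripe_status in PAID_STATUSES:
        return Finding.OK if app_grants_access else Finding.PAID_BUT_BLOCKED
    if stripe_status in UNPAID_STATUSES:
        return Finding.UNPAID_BUT_ACTIVE if app_grants_access else Finding.OK
    return Finding.AMBIGUOUS
```

For example, `classify("canceled", True)` returns `Finding.UNPAID_BUT_ACTIVE`, which goes to review. `classify("active", False)` returns `Finding.PAID_BUT_BLOCKED`, which should be treated as urgent.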

Webhook reliability is not the same check

A reliable webhook pipeline asks:

Did we receive the event?
Did we process it once?
Can we retry failed deliveries?
Can we inspect what happened?

Final-state reconciliation asks:

What does Stripe say now?
What does the app grant now?
Do those states agree?
If not, which side needs review?

You probably need both.

Webhook infrastructure prevents a lot of failures. Reconciliation catches the ones that still escape. It also catches problems that never went through the webhook path: admin overrides, migrations, backfills, legacy status fields, and access logic that drifted over time.

What a small reconciliation check needs

You do not need a huge system to start.

From Stripe, export or query:

customer ID
subscription ID
subscription status
product or plan
amount/MRR, if you want exposure estimates
current period end and cancel_at_period_end, if relevant

From your app, export:

internal user or workspace ID
Stripe customer ID
access flag or entitlement status
plan/tier, if your app stores it
any field your request path actually reads
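Here is a sketch of loading those two exports into typed records with Python's standard `csv` module. The column names (`customer_id`, `subscription_id`, `status`, `workspace_id`, `access_enabled`, and so on) are hypothetical. Rename them to match your actual Stripe export and app query. The access flag is parsed explicitly because CSV values are strings, and any non-empty string, including `"false"`, is truthy in Python.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StripeSub:
    customer_id: str
    subscription_id: str
    status: str
    plan: str


@dataclass(frozen=True)
class AppAccount:
    workspace_id: str
    stripe_customer_id: Optional[str]  # None when no billing link is stored
    access_enabled: bool
    plan_tier: str


def load_stripe(text: str) -> list[StripeSub]:
    # Hypothetical column names: rename to match your export's headers.
    return [
        StripeSub(r["customer_id"], r["subscription_id"], r["status"].strip().lower(), r["plan"])
        for r in csv.DictReader(io.StringIO(text))
    ]


def load_app(text: str) -> list[AppAccount]:
    rows = []
    for r in csv.DictReader(io.StringIO(text)):
        rows.append(AppAccount(
            workspace_id=r["workspace_id"],
            stripe_customer_id=r["stripe_customer_id"].strip() or None,
            # Parse the flag explicitly; bool("false") would be True.
            access_enabled=r["access_enabled"].strip().lower() in {"true", "t", "1", "yes"},
            plan_tier=r["plan_tier"],
        ))
    return rows
```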

That last part matters.

If your middleware checks access_enabled, your rate limiter checks plan_tier, and your billing page checks subscription_status, your first drift problem might be inside your own database.

Start with the fields that actually grant or deny access.
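One way to look for drift inside your own database before involving Stripe at all is to check whether the fields that different code paths read agree with each other on the same row. This is a minimal sketch. `AppRow`, the field names (taken from the middleware, rate limiter, and billing page example above), and the rule that a non-free tier implies an active status are all assumptions to adapt.

```python
from dataclasses import dataclass


@dataclass
class AppRow:
    """Hypothetical app row holding the fields different code paths read."""
    workspace_id: str
    access_enabled: bool      # read by middleware
    plan_tier: str            # read by the rate limiter
    subscription_status: str  # read by the billing page


FREE_TIERS = {"free", ""}                 # assumption: tiers that imply no paid access
ACTIVE_STATUSES = {"active", "trialing"}  # assumption: statuses that imply access


def internal_conflicts(row: AppRow) -> list[str]:
    """Return readable conflicts between fields stored on the same row."""
    problems = []
    paid_tier = row.plan_tier not in FREE_TIERS
    active_status = row.subscription_status in ACTIVE_STATUSES
    if row.access_enabled != active_status:
        problems.append(
            f"access_enabled={row.access_enabled} but subscription_status={row.subscription_status!r}"
        )
    if paid_tier != active_status:
        problems.append(
            f"plan_tier={row.plan_tier!r} but subscription_status={row.subscription_status!r}"
        )
    return problems
```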

Do not auto-fix on day one

It is tempting to auto-suspend every unpaid-but-active account.

I would not start there.

A first reconciliation job should usually flag, not fix. There are too many legitimate edge cases: trials, grace periods, dunning windows, enterprise comps, test accounts, manual support exceptions, and custom contracts.

A safer first workflow:

Run the comparison nightly or weekly.
Split findings by direction.
Treat paid-but-blocked as urgent.
Put unpaid-but-active and ambiguous cases into review.
Add notes for known exceptions.
Only automate actions after you trust the classification.
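A flag-only version of that workflow might look like the following Python sketch. It never changes access. It only sorts customers into three buckets: urgent (paid but blocked), review (access granted without a clearly paid status, including customers Stripe has no row for), and noted exceptions. The status sets, the `build_report` name, and the choice to let exceptions suppress only review items, never urgent ones, are all assumptions.

```python
from dataclasses import dataclass, field

PAID = {"active", "trialing"}                                      # assumption
UNPAID = {"canceled", "unpaid", "past_due", "incomplete_expired"}  # assumption (unused below, documents intent)


@dataclass
class Report:
    urgent: list[str] = field(default_factory=list)    # paid but blocked
    review: list[str] = field(default_factory=list)    # unpaid but active, or ambiguous
    excepted: list[str] = field(default_factory=list)  # known exceptions, noted only


def build_report(
    stripe_status: dict[str, str],   # customer_id -> current Stripe status
    app_access: dict[str, bool],     # customer_id -> does the app grant access?
    exceptions: dict[str, str],      # customer_id -> note (comp, test account, ...)
) -> Report:
    """Flag-only pass: reads both sides and never writes to either."""
    report = Report()
    for customer_id, grants in sorted(app_access.items()):
        status = stripe_status.get(customer_id)  # None if Stripe has no row
        if status in PAID:
            if not grants:
                report.urgent.append(customer_id)  # urgent even if noted as an exception
            continue
        if not grants:
            continue  # not paid and not granted: consistent
        # Access granted without a clearly paid status.
        if customer_id in exceptions:
            report.excepted.append(f"{customer_id}: {exceptions[customer_id]}")
        else:
            report.review.append(customer_id)
    return report
```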

The first version should help you see drift, not create a new production incident.

I built a small local-first prototype

I built EntitleGuard to test this workflow as a free local audit.

It compares:

a Stripe CSV export
a minimal app users/workspaces CSV export

The comparison runs in the browser.

No Stripe API key. No database credentials. No account. No upload.

It flags:

unpaid-but-active
paid-but-blocked
missing billing links
orphaned Stripe subscriptions
ambiguous cases that need review
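The link-level categories come from comparing in both directions rather than looking up from one side only. The Python below is not EntitleGuard's code. It is a hypothetical sketch of one reasonable reading of "missing billing link" (an account that grants access but has no Stripe customer ID, or has one that does not appear in the Stripe export) and "orphaned subscription" (a Stripe subscription whose customer no app account references).

```python
from typing import Optional


def link_problems(
    stripe_subs: list[tuple[str, str]],                    # (customer_id, subscription_id)
    app_accounts: list[tuple[str, Optional[str], bool]],   # (workspace_id, stripe_customer_id, access_enabled)
) -> dict[str, list[str]]:
    """Hypothetical link checks between a Stripe export and an app export."""
    stripe_customers = {customer_id for customer_id, _ in stripe_subs}
    # Any customer ID referenced by an app account counts as linked, granted or not.
    linked_customers = {customer_id for _, customer_id, _ in app_accounts if customer_id}
    missing_link = [
        workspace_id
        for workspace_id, customer_id, access_enabled in app_accounts
        if access_enabled and (not customer_id or customer_id not in stripe_customers)
    ]
    orphaned = [sub_id for customer_id, sub_id in stripe_subs if customer_id not in linked_customers]
    return {"missing_billing_link": missing_link, "orphaned_subscription": orphaned}
```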

The source is public, so the local-only claim is easy to inspect.

Live audit:

https://entitleguard.amertech.online/audit

Source:

https://github.com/impara/EntitleGuard

The product question I am testing now is whether this should stay as a one-time diagnostic or become recurring monitoring: nightly diff, alerting, review history, and an evidence trail for each mismatch.

My guess is that most teams only care about this after they have seen drift once.

If you run a Stripe SaaS

A practical first check:

Export active and non-active Stripe subscriptions.
Export the app table that controls access.
Join on stripe_customer_id if you store it.
Treat customer ID as more stable than subscription ID for access-level reconciliation.
If a customer can have multiple subscriptions, pick one effective status per customer by ranking statuses (for example, active over canceled) instead of assuming one row per customer.
Keep the first version read-only.
Review both directions separately.
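Here is a minimal sketch of the join and ranking steps in Python. It assumes the Stripe export has been reduced to `(customer_id, status)` pairs and the app export to `(workspace_id, stripe_customer_id, access_enabled)` rows. The priority order in `STATUS_RANK` is an assumption. It prefers the most access-granting status so that an old canceled subscription does not mask a newer active one.

```python
from typing import Optional

# Assumed priority when one customer has several subscriptions: lower wins.
STATUS_RANK = {
    "active": 0, "trialing": 1, "past_due": 2, "unpaid": 3,
    "incomplete": 4, "paused": 5, "canceled": 6, "incomplete_expired": 7,
}


def effective_status_by_customer(subs: list[tuple[str, str]]) -> dict[str, str]:
    """Collapse (customer_id, status) rows to one status per customer."""
    best: dict[str, str] = {}
    for customer_id, status in subs:
        current = best.get(customer_id)
        # Unknown statuses rank last so they never hide a known one.
        if current is None or STATUS_RANK.get(status, 99) < STATUS_RANK.get(current, 99):
            best[customer_id] = status
    return best


def join_on_customer(
    subs: list[tuple[str, str]],
    app_rows: list[tuple[str, str, bool]],
) -> list[tuple[str, str, Optional[str], bool]]:
    """Read-only join: attach each app row's effective Stripe status (None if absent)."""
    effective = effective_status_by_customer(subs)
    return [(ws, cus, effective.get(cus), access) for ws, cus, access in app_rows]
```

A `None` status in the joined output means the Stripe export has no subscription for that customer ID. In a read-only first version, that row goes to review rather than to an automatic fix.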

This is not a replacement for correct webhook handling.

It is a backstop for the final state your users actually experience.

If Stripe and your app disagree, the user does not care that the webhook pipeline looked healthy.

Top comments (2)

Mihir kanzariya • Jul 2

This is the trap: webhooks are deltas, but access is state. You're reconstructing current state from a stream that can arrive out of order, drop an event, or replay one. Even with a perfectly idempotent handler, one missed event means your DB's access flag diverges from Stripe forever, because nothing ever re-checks.

What finally stopped the drift for us was to stop treating the webhook as the source of truth. Treat it as a "go look" trigger: on any subscription event, re-fetch the subscription from the API and reconcile access against its real current status, instead of mutating access straight from the event payload. Then run a scheduled reconciliation (nightly, or hourly for higher stakes) that pulls active subscriptions and heals whatever slipped through.

Webhooks tell you "something changed." They're a bad place to store what the state now is. The final-state check you're describing is exactly what makes the system self-correcting.

Amer tech • Jul 2

That “webhooks are deltas, access is state” framing is exactly it.

I like the “go look” trigger pattern too. It avoids treating an event payload as the final answer, especially when events arrive late or out of order.

The piece I’m trying to validate is the scheduled backstop around it: not just re-fetching after events, but periodically asking “does Stripe’s current state still match what the app is granting?” That’s where missed events, manual overrides, migrations, and old access flags show up.

Did you end up keeping a review queue/history for mismatches, or did the scheduled job just heal known cases automatically?