Your Stripe webhooks are probably dropping events and you do not know it.
I built a payment recovery system that processes Stripe webhooks for SaaS companies. In the first week, I discovered three ways webhooks silently fail. Here is what to watch for and how to fix each one.
The Problem Nobody Talks About
Stripe sends a webhook. Your server is restarting. The webhook fails. Stripe retries. Your server is back but your database has a lock. The retry fails. Stripe retries again 1 hour later. By then your customer's payment has failed, they got no email, and they churned.
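This happens more often than you think. In live mode, Stripe retries failed deliveries with exponential backoff for up to three days, but if your endpoint is flaky throughout that window, Stripe stops delivering the event. Unless you fetch it yourself, it is as good as gone.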
Pattern 1: Idempotent Event Processing
Stripe can send the same event multiple times. If your handler is not idempotent, you will send duplicate emails, create duplicate subscriptions, or double-charge customers.
import Stripe from 'stripe';

// Assumes `db` is a Prisma-style client and `processEvent` holds your business logic.
async function handleWebhook(event: Stripe.Event) {
  // Check if we already processed this event
  const existing = await db.processedEvents.findUnique({
    where: { stripeEventId: event.id }
  });
  if (existing) {
    console.log(`Event ${event.id} already processed, skipping`);
    return { status: 'duplicate' };
  }

  // Process the event
  const result = await processEvent(event);

  // Mark as processed AFTER successful handling
  await db.processedEvents.create({
    data: {
      stripeEventId: event.id,
      type: event.type,
      processedAt: new Date(),
      result: JSON.stringify(result)
    }
  });
  return result;
}
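The key detail: record the event ID AFTER processing, not before. If your handler crashes mid-processing, nothing has been recorded yet, so Stripe's retry is processed normally. If you record it before processing and then crash, the retry is skipped as a duplicate and the event is lost. The trade-off is that a crash between processing and recording lets the retry run the handler a second time, so the side effects themselves should be safe to repeat.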
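To see why the ordering matters, here is a small in-memory simulation. It is only a sketch: `makeReceiver`, `crashThenRetry`, and the `emailsSent` list are hypothetical stand-ins for the database table and the real side effects, not part of Stripe's SDK. The first delivery crashes mid-processing and the event is then redelivered, as a retry would do.

```typescript
type Outcome = "ok" | "duplicate";

interface Receiver {
  emailsSent: string[];
  deliver(eventId: string, crash: boolean): Outcome;
}

// recordFirst = true records the event ID before processing,
// recordFirst = false records it after processing succeeds.
function makeReceiver(recordFirst: boolean): Receiver {
  const processed = new Set<string>(); // stands in for the processedEvents table
  const emailsSent: string[] = [];     // stands in for the real side effect
  return {
    emailsSent,
    deliver(eventId: string, crash: boolean): Outcome {
      if (processed.has(eventId)) return "duplicate";
      if (recordFirst) processed.add(eventId);
      if (crash) throw new Error("simulated crash mid-processing");
      emailsSent.push(eventId);
      if (!recordFirst) processed.add(eventId);
      return "ok";
    },
  };
}

// First delivery crashes, then the same event is delivered again.
function crashThenRetry(receiver: Receiver, eventId: string): Outcome {
  try {
    return receiver.deliver(eventId, true);
  } catch {
    return receiver.deliver(eventId, false);
  }
}

const recordAfter = makeReceiver(false);
const recordBefore = makeReceiver(true);
console.log(crashThenRetry(recordAfter, "evt_1"), recordAfter.emailsSent.length);
console.log(crashThenRetry(recordBefore, "evt_1"), recordBefore.emailsSent.length);
```

With record-after, the retry does the work and one email goes out. With record-before, the retry is skipped as a duplicate and no email is ever sent.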
Pattern 2: Respond 200 First, Process Later
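Stripe expects a 2xx response quickly; its documentation tells you to return one before running any complex logic that could cause a timeout. If your handler does slow work first (sending emails, updating multiple database tables, calling external APIs), the request can time out and Stripe marks the delivery as failed, even if your code eventually finishes.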
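import Stripe from 'stripe';

// Assumes `stripe` (a Stripe client) and `db` (a Prisma-style client) are initialized elsewhere.
export async function POST(req: Request) {
  let event: Stripe.Event;
  try {
    // Verify against the raw body; a parsed and re-serialized body fails the check
    event = stripe.webhooks.constructEvent(
      await req.text(),
      req.headers.get('stripe-signature')!,
      process.env.STRIPE_WEBHOOK_SECRET!
    );
  } catch {
    // constructEvent throws when the signature is missing or invalid
    return new Response('Invalid signature', { status: 400 });
  }

  // Store the raw event immediately
  await db.webhookQueue.create({
    data: {
      eventId: event.id,
      type: event.type,
      payload: JSON.stringify(event),
      status: 'pending'
    }
  });

  // Return 200 IMMEDIATELY
  // Process asynchronously via cron or queue worker
  return new Response('OK', { status: 200 });
}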
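The status code the endpoint returns decides whether Stripe retries, so it helps to make that decision explicit. The sketch below isolates it as a pure function. The names `ingest`, `Verify`, and `QueueStore` are hypothetical: `verify` plays the role of `stripe.webhooks.constructEvent`, and `QueueStore` plays the role of the `webhookQueue` table.

```typescript
// Hypothetical stand-in for signature verification: returns null when the check fails.
type Verify = (rawBody: string, signature: string | null) => { id: string } | null;

// Hypothetical stand-in for the queue table; insert may throw if the database is down.
interface QueueStore {
  insert(eventId: string, payload: string): void;
}

interface IngestResponse {
  status: 200 | 400 | 500;
  body: string;
}

function ingest(
  rawBody: string,
  signature: string | null,
  verify: Verify,
  store: QueueStore
): IngestResponse {
  const event = verify(rawBody, signature);
  if (event === null) {
    // Bad or missing signature: reject the request without queuing anything
    return { status: 400, body: "Invalid signature" };
  }
  try {
    store.insert(event.id, rawBody);
  } catch {
    // The event was not durably stored, so answer non-2xx and let Stripe retry
    return { status: 500, body: "Could not queue event" };
  }
  // Only acknowledge once the event is safely in the queue
  return { status: 200, body: "OK" };
}
```

The point of the design is that a 200 is sent only after the event is stored. Acknowledging first and storing afterwards would turn a database outage into a silently dropped event.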
Then process the queue separately:
// Cron job runs every minute
async function processWebhookQueue() {
  const pending = await db.webhookQueue.findMany({
    where: { status: 'pending' },
    orderBy: { createdAt: 'asc' },
    take: 10
  });

  for (const item of pending) {
    try {
      const event = JSON.parse(item.payload);
      // handleWebhook is the idempotent handler from Pattern 1
      await handleWebhook(event);
      await db.webhookQueue.update({
        where: { id: item.id },
        data: { status: 'completed' }
      });
    } catch (error) {
      // Items marked 'failed' are not picked up again by the query above
      await db.webhookQueue.update({
        where: { id: item.id },
        data: {
          status: 'failed',
          // A caught value is `unknown` in TypeScript, so narrow it before reading .message
          error: error instanceof Error ? error.message : String(error),
          retryCount: { increment: 1 }
        }
      });
    }
  }
}
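This pattern takes slow processing out of the request path, so handler work can no longer cause a timeout. Your webhook endpoint becomes a simple queue writer whose response time depends only on signature verification and a single database insert.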
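The worker above marks a failing item as `failed` and only selects `pending` items, so a failed item is never attempted again on its own. One possible way to retry it is exponential backoff with a cap. This is a sketch under assumptions: `nextAttemptAt` and all three constants are hypothetical choices, and the queue table would need an extra column to hold the returned time.

```typescript
// Assumed tuning values; adjust to your own traffic and tolerance.
const BASE_DELAY_MS = 60_000;      // first retry after 1 minute
const MAX_DELAY_MS = 5 * 60_000;   // never wait longer than 5 minutes
const MAX_ATTEMPTS = 5;            // after this many failures, stop retrying

// attempts = number of failed attempts so far (the retryCount after incrementing).
// Returns when the item should be tried again, or null to give up.
function nextAttemptAt(attempts: number, failedAt: Date): Date | null {
  if (attempts >= MAX_ATTEMPTS) return null;
  const delay = Math.min(BASE_DELAY_MS * 2 ** (attempts - 1), MAX_DELAY_MS);
  return new Date(failedAt.getTime() + delay);
}
```

A worker using this would select items that are `pending`, plus `failed` items whose next attempt time has passed. Items that return `null` stay `failed` for manual review or for the reconciliation job to pick up.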
Pattern 3: Reconciliation Cron (The Safety Net)
Webhooks are push-based. They will eventually fail. You need a pull-based safety net.
Run a reconciliation job daily that fetches recent events directly from the Stripe API and checks them against your database:
async function reconcile() {
  // Fetch last 24 hours of relevant events from Stripe.
  // Iterating with `for await` auto-paginates, so more than 100 events are not cut off.
  const events = stripe.events.list({
    type: 'invoice.payment_failed',
    created: {
      gte: Math.floor(Date.now() / 1000) - 86400
    },
    limit: 100
  });

  for await (const event of events) {
    const processed = await db.processedEvents.findUnique({
      where: { stripeEventId: event.id }
    });
    if (!processed) {
      console.warn(`Missed event ${event.id} (${event.type})`);
      // Use the idempotent handler from Pattern 1 so the event is recorded as processed
      await handleWebhook(event);
    }
  }
}
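This catches the invoice.payment_failed events from the last 24 hours that your webhook handler missed; cover the other event types you care about the same way. Stripe keeps events for 30 days via the API, so if your webhook endpoint was down for a week, you can widen the created window and the reconciliation cron will still catch up.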
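A fixed 24-hour window only helps if the job itself ran every day. One way to cope with a longer outage is to start the window at the last successful reconciliation instead. The sketch below assumes you store that timestamp somewhere; `reconciliationWindowStart` and the one-hour overlap are hypothetical choices, while the 30-day limit is the retention period mentioned above.

```typescript
const DAY_SECONDS = 86_400;
const RETENTION_SECONDS = 30 * DAY_SECONDS; // events older than this are not listed
const OVERLAP_SECONDS = 3_600;              // assumed safety margin between runs

// Returns the Unix timestamp to pass as `created.gte` when listing events.
// lastSuccessSeconds is null when the job has never completed.
function reconciliationWindowStart(
  nowSeconds: number,
  lastSuccessSeconds: number | null
): number {
  if (lastSuccessSeconds === null) {
    return nowSeconds - DAY_SECONDS; // first run: default to the last 24 hours
  }
  const oldestAvailable = nowSeconds - RETENTION_SECONDS;
  // Start slightly before the last success, but never earlier than retention allows
  return Math.max(oldestAvailable, lastSuccessSeconds - OVERLAP_SECONDS);
}
```

Because the handler is idempotent, the overlap costs nothing more than a few extra lookups.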
The Events That Actually Matter
Not all webhook events are equal. For a SaaS handling subscriptions, these are the ones that directly affect revenue:
| Event | What happened | What to do |
|---|---|---|
| `invoice.payment_failed` | Card declined or expired | Send dunning email, retry payment |
| `customer.subscription.deleted` | Customer churned | Send win-back email, log for analytics |
| `customer.subscription.updated` | Plan change | Update entitlements, send confirmation |
| `invoice.paid` | Payment succeeded | Clear any dunning flags, send receipt |
| `customer.source.expiring` | Card expiring soon | Send "update your card" email BEFORE it fails |
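The customer.source.expiring event is the most underused. Stripe's event reference describes it as firing when a card or source will expire at the end of the month, which gives you up to a few weeks to get the customer to update their card BEFORE the payment fails. One caveat: Stripe documents this event for legacy Card and Source objects only, so integrations built on the PaymentMethods API do not receive it and need their own check on the card's expiry month and year. Most SaaS companies ignore this event entirely, then wonder why they have high involuntary churn.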
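The table above maps naturally onto a lookup from event type to actions, which keeps the routing in one place and makes unhandled event types explicit. This is a minimal sketch: the event types come from the table, while `ACTIONS_BY_EVENT`, `actionsFor`, and the action names are hypothetical labels, not Stripe API calls.

```typescript
// Event type -> ordered list of actions to run for it.
const ACTIONS_BY_EVENT = new Map<string, readonly string[]>([
  ["invoice.payment_failed", ["send_dunning_email", "retry_payment"]],
  ["customer.subscription.deleted", ["send_win_back_email", "log_for_analytics"]],
  ["customer.subscription.updated", ["update_entitlements", "send_confirmation"]],
  ["invoice.paid", ["clear_dunning_flags", "send_receipt"]],
  ["customer.source.expiring", ["send_update_card_email"]],
]);

// Unknown event types return an empty list, so they can be acknowledged and ignored.
function actionsFor(eventType: string): readonly string[] {
  return ACTIONS_BY_EVENT.get(eventType) ?? [];
}
```

A handler would call `actionsFor(event.type)` and run each action in order. Event types that are not in the map still get a 200 response, which stops Stripe from retrying events you never intended to handle.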
What This Looks Like in Production
At Rebill, we process these events for SaaS companies that do not want to build their own dunning system. The stack is:
- Stripe Connect receives webhooks for all connected accounts
- Events go into a processing queue (Pattern 2)
- Failed payments trigger a 4-step email sequence via Resend
- Expiring cards trigger a 2-step reminder sequence
- Daily reconciliation cron catches anything missed (Pattern 3)
- Dashboard shows recovery rate and MRR impact
The whole system costs customers $19/month. The average SaaS loses 9% of MRR to failed payments. For a company doing $10K MRR, that is $900/month at risk, against $19/month spent on recovering it.
Quick Checklist
Before you ship your next Stripe integration:
- [ ] Webhook handler is idempotent (check event ID before processing)
- [ ] Handler responds 200 within 5 seconds (queue for async processing)
- [ ] Reconciliation cron runs daily to catch missed events
- [ ] You handle customer.source.expiring (early warning before the card expires)
- [ ] Failed events are logged with full payload for debugging
- [ ] Webhook signature verification is enabled (never skip this)
I build payment systems and full-stack SaaS applications. Currently working on Rebill, which handles failed payment recovery so you do not have to. Questions? theagentthatcould@gmail.com or book a call.