Webhook Best Practices: Retry Logic, Idempotency, and Error Handling
Most webhook integrations fail silently. A handler returns 500, the provider retries a few times, then stops. Your system never processed the event and no one knows.
Webhook delivery is not guaranteed by default. How reliably your integration works depends almost entirely on how you write the receiver. This guide covers the patterns that make webhook handlers production-grade: proper retry handling, idempotency, error response codes, and queue-based processing.
Understand the Delivery Model
Before building handlers, understand what you are dealing with:
- Providers send webhook events as HTTP POST requests
- They expect a 2xx response within a timeout (typically 5-30 seconds)
- If they do not receive 2xx, they retry on a schedule (often exponential backoff over hours or days)
- Most providers have a maximum retry count after which the event is dropped
- Some providers allow you to manually retry from their dashboard
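Example Stripe retry schedule (approximate; Stripe documents retries with exponential backoff for up to three days in live mode, so treat the exact intervals as illustrative):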
Attempt 1: immediate
Attempt 2: 5 minutes
Attempt 3: 30 minutes
Attempt 4: 2 hours
Attempt 5: 5 hours
Attempt 6: 10 hours
Attempt 7: 24 hours
... continues for ~72 hours total
This retry behavior is your safety net -- but only if your handler is idempotent.
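To make exponential backoff concrete, here is a minimal sketch that computes retry delays. The function name `backoffDelays` and its parameters are illustrative assumptions; real providers use their own schedules, often with jitter, and the Stripe intervals listed above do not follow this exact formula.

```javascript
// Hypothetical helper: delay (in seconds) before each retry attempt,
// doubling each time and capped at maxSeconds.
function backoffDelays(baseSeconds, retries, maxSeconds) {
  const delays = [];
  for (let i = 0; i < retries; i++) {
    delays.push(Math.min(baseSeconds * 2 ** i, maxSeconds));
  }
  return delays;
}

// Five retries starting at 60 seconds, capped at one hour.
console.log(backoffDelays(60, 5, 3600)); // [ 60, 120, 240, 480, 960 ]
```

The cap matters in practice: without it, a long retry window would push later attempts days apart.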
Rule 1: Respond Fast, Process Async
Your webhook handler should acknowledge receipt immediately and do the actual work in the background. If you do database writes, call external APIs, or send emails synchronously inside the handler, you risk timing out.
// BAD: synchronous processing risks timeout
app.post('/webhook/stripe', async (req, res) => {
  const event = JSON.parse(req.body);
  if (event.type === 'payment_intent.succeeded') {
    // This could take several seconds
    await fulfillOrder(event.data.object);
    await sendConfirmationEmail(event.data.object.metadata.email);
    await updateInventory(event.data.object.metadata.items);
  }
  res.json({ received: true }); // might never get here if above throws
});
// GOOD: acknowledge immediately, process async
// Assumes a raw body parser (e.g. express.raw()) so req.body is the unparsed payload
app.post('/webhook/stripe', async (req, res) => {
  const event = JSON.parse(req.body);
  // Queue the work and respond in milliseconds
  await queue.add('stripe-webhook', { event });
  res.json({ received: true }); // returns 200 as soon as the job is queued
});
// Worker processes the queue
queue.process('stripe-webhook', async (job) => {
  const { event } = job.data;
  if (event.type === 'payment_intent.succeeded') {
    await fulfillOrder(event.data.object);
    await sendConfirmationEmail(event.data.object.metadata.email);
    await updateInventory(event.data.object.metadata.items);
  }
});
The queue gives you retry logic, failure visibility, and async processing without blocking the HTTP response.
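As a rough illustration of what the queue contributes, the sketch below uses an in-memory array in place of a real queue. `enqueue`, `drain`, and `MAX_ATTEMPTS` are hypothetical names; a production setup would use a durable queue (for example one backed by Redis or a database) so that jobs survive a restart.

```javascript
const MAX_ATTEMPTS = 3; // assumption: retry budget per job

const pending = []; // jobs waiting to be processed
const failed = [];  // jobs that exhausted their attempts

// Called from the HTTP handler: cheap, so the response can go out right away.
function enqueue(event) {
  pending.push({ event, attempts: 0 });
}

// Called by a background worker: runs each job, re-queues it on failure,
// and moves it to `failed` once the retry budget is spent.
async function drain(processEvent) {
  while (pending.length > 0) {
    const job = pending.shift();
    job.attempts += 1;
    try {
      await processEvent(job.event);
    } catch (err) {
      if (job.attempts < MAX_ATTEMPTS) {
        pending.push(job);
      } else {
        failed.push({ ...job, error: err.message });
      }
    }
  }
}
```

The `failed` list is the "failure visibility" part: nothing disappears silently, even when every attempt throws.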
Rule 2: Make Handlers Idempotent
Since providers retry webhooks, your handler may receive the same event multiple times. You must make your handler safe to run more than once with the same event ID.
Without idempotency, a network blip that causes Stripe to retry a payment_intent.succeeded event could fulfill the same order twice, create duplicate order records, or send duplicate emails.
Track Processed Event IDs
The simplest approach: store event IDs and skip events you have already processed.
async function handleStripeEvent(event) {
  // Check if we already processed this event
  const existing = await db.query(
    'SELECT id FROM processed_webhooks WHERE event_id = $1',
    [event.id]
  );
  if (existing.rows.length > 0) {
    console.log(`Skipping duplicate event: ${event.id}`);
    return; // idempotent: no-op on duplicate
  }
  // Process the event
  await processEvent(event);
  // Record that we processed it
  await db.query(
    'INSERT INTO processed_webhooks (event_id, processed_at) VALUES ($1, NOW())',
    [event.id]
  );
}
Upsert Instead of Insert
When creating records from webhook data, use upsert (insert-or-update) instead of plain insert:
-- BAD: fails or creates duplicate on retry
INSERT INTO subscriptions (stripe_id, user_id, status, plan)
VALUES ($1, $2, $3, $4);
-- GOOD: idempotent, safe to run multiple times
INSERT INTO subscriptions (stripe_id, user_id, status, plan)
VALUES ($1, $2, $3, $4)
ON CONFLICT (stripe_id)
DO UPDATE SET status = EXCLUDED.status, plan = EXCLUDED.plan;
Use Database Transactions with Idempotency Key
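The check-then-insert approach above has a gap: two concurrent deliveries of the same event can both pass the SELECT before either one records the event ID. For more complex operations, or whenever duplicates are costly, wrap the idempotency check and business logic in a transaction and let a unique constraint on event_id arbitrate: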
async function handleWebhookIdempotent(eventId, operation) {
  return await db.transaction(async (trx) => {
    // Atomic check-and-insert prevents race conditions on concurrent retries.
    // ON CONFLICT requires a unique constraint on processed_webhooks.event_id.
    const result = await trx.raw(`
      INSERT INTO processed_webhooks (event_id, processed_at)
      VALUES (?, NOW())
      ON CONFLICT (event_id) DO NOTHING
      RETURNING id
    `, [eventId]);
    if (result.rows.length === 0) {
      // Already processed: skip
      return null;
    }
    // Run business logic inside the same transaction, so a failure
    // rolls back the processed_webhooks row and a retry can succeed
    return await operation(trx);
  });
}
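The claim-first pattern can be exercised without a database. The sketch below is an in-memory analogue, not the transaction code above: a `Set` stands in for the `processed_webhooks` table, and deleting the claim on failure plays the role of a transaction rollback. `claimEvent` and `handleOnce` are hypothetical names.

```javascript
// In-memory stand-in for the processed_webhooks table (assumption: a real
// system would rely on a database unique constraint instead).
const claimed = new Set();

// Returns true only for the first caller that claims a given event ID.
function claimEvent(eventId) {
  if (claimed.has(eventId)) return false;
  claimed.add(eventId);
  return true;
}

// Runs `operation` at most once per event ID, even under concurrent delivery.
async function handleOnce(event, operation) {
  if (!claimEvent(event.id)) return null; // duplicate delivery: no-op
  try {
    return await operation(event);
  } catch (err) {
    claimed.delete(event.id); // release the claim so a retry can succeed
    throw err;
  }
}
```

Because the claim happens before any `await`, two deliveries that arrive at the same moment cannot both pass the check. This only holds within a single Node.js process; across processes the database constraint has to do the same job.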
Rule 3: Return the Right HTTP Status Codes
Your response code tells the provider whether to retry. Use it correctly:
| Status | Meaning | Provider behavior |
|---|---|---|
| 200-299 | Success | No retry |
| 400 | Bad request (your choice not to process) | Varies: some providers stop retrying, others treat any non-2xx response as a failure and retry |
| 401/403 | Unauthorized | Varies by provider, as with 400 |
| 500-503 | Your server error | Provider retries |
| Timeout | No response in time | Provider retries |
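The key distinction: use 5xx when the error is transient (database temporarily down, external API timeout) and 4xx when the error is permanent (invalid signature, malformed payload). Event types you simply do not handle are not errors: acknowledge them with 200 so the provider does not keep retrying them.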
app.post('/webhook', async (req, res) => {
  let event;
  // Signature verification failure: return 400, don't want retry
  try {
    event = verifyAndParseWebhook(req.body, req.headers);
  } catch (err) {
    return res.status(400).json({ error: 'Invalid signature' });
  }
  // Unknown event type: return 200, don't retry
  if (!supportedEvents.includes(event.type)) {
    return res.status(200).json({ received: true, skipped: true });
  }
  // Queue for async processing, return 200 fast
  try {
    await queue.add(event);
    return res.status(200).json({ received: true });
  } catch (err) {
    // Queue is down: return 503 so provider retries later
    return res.status(503).json({ error: 'Service unavailable' });
  }
});
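`verifyAndParseWebhook` is not defined in the example above. A minimal sketch follows, assuming a generic scheme in which the provider sends a hex-encoded HMAC-SHA256 of the raw request body in a header. The header name `x-webhook-signature`, the `WEBHOOK_SECRET` environment variable, and the scheme itself are assumptions; each provider documents its own format (Stripe, for instance, uses a `Stripe-Signature` header that also carries a timestamp), so prefer the provider's official library where one exists.

```javascript
const crypto = require('node:crypto');

// Assumed scheme: hex HMAC-SHA256 of the raw body in `x-webhook-signature`.
// Throws on a missing or wrong signature; returns the parsed event otherwise.
function verifyAndParseWebhook(rawBody, headers, secret = process.env.WEBHOOK_SECRET) {
  const received = Buffer.from(String(headers['x-webhook-signature'] ?? ''), 'hex');
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();

  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (received.length !== expected.length || !crypto.timingSafeEqual(received, expected)) {
    throw new Error('Invalid signature');
  }
  return JSON.parse(rawBody.toString('utf8'));
}
```

The signature covers the exact bytes the provider sent, so the handler needs the unparsed body. With Express, mounting `express.raw({ type: 'application/json' })` on the webhook route makes `req.body` a Buffer that can be passed in directly.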
Rule 4: Handle Out-of-Order Delivery
Providers do not guarantee that webhooks arrive in the order events occurred. A customer.subscription.updated event might arrive before the customer.subscription.created event for the same subscription.
Design your handlers to work regardless of order:
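async function handleSubscriptionEvent(event) {
  const sub = event.data.object;
  if (event.type === 'customer.subscription.updated') {
    // Don't assume the subscription already exists in your DB.
    // updated_at stores when the event was created (event.created, Unix seconds),
    // not when we processed it, so stale events can be detected.
    await db.query(`
      INSERT INTO subscriptions (stripe_id, status, plan, updated_at)
      VALUES ($1, $2, $3, to_timestamp($4))
      ON CONFLICT (stripe_id)
      DO UPDATE SET
        status = EXCLUDED.status,
        plan = EXCLUDED.plan,
        updated_at = EXCLUDED.updated_at
      WHERE subscriptions.updated_at < EXCLUDED.updated_at
    `, [sub.id, sub.status, sub.items.data[0].price.id, event.created]);
  }
}
The WHERE subscriptions.updated_at < EXCLUDED.updated_at clause handles the case where an older event arrives after a newer one: it will not overwrite newer data with stale data. For this to work, updated_at must come from the event itself (here event.created) rather than NOW(), because NOW() is always later than any stored value and the guard would never reject anything.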
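The same stale-event guard can be expressed in application code, which makes it easy to unit test. The sketch below keeps subscriptions in a `Map` and compares the `created` timestamp that Stripe puts on each event (Unix seconds); `applySubscriptionEvent` and the in-memory store are hypothetical stand-ins for a real database.

```javascript
const subscriptions = new Map(); // stripe_id -> { status, plan, updatedAt }

// Applies an event unless an equally new or newer one is already stored.
// Returns true if the store changed, false if the event was stale.
function applySubscriptionEvent(event) {
  const sub = event.data.object;
  const current = subscriptions.get(sub.id);
  if (current && current.updatedAt >= event.created) {
    return false; // stale or duplicate: keep the newer data
  }
  subscriptions.set(sub.id, {
    status: sub.status,
    plan: sub.items.data[0].price.id,
    updatedAt: event.created,
  });
  return true;
}
```

Timestamps with one-second resolution cannot order two events created in the same second; in this sketch the later arrival is dropped. If that matters, re-fetching the current object from the provider's API is a safer source of truth than either payload.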
Rule 5: Log Everything
Log enough to reconstruct what happened to any webhook event without going back to the provider's dashboard:
const logger = require('pino')();
app.post('/webhook', async (req, res) => {
  // Stripe puts the event ID in the payload, not in a header.
  // Assumes req.body has already been verified and parsed as JSON.
  const eventId = req.body?.id ?? 'unknown';
  const eventType = req.body?.type ?? 'unknown';
  logger.info({ eventId, eventType }, 'Webhook received');
  try {
    await queue.add({ event: req.body });
    logger.info({ eventId, eventType }, 'Webhook queued');
    res.json({ received: true });
  } catch (err) {
    logger.error({ eventId, eventType, err }, 'Failed to queue webhook');
    res.status(503).json({ error: 'Unavailable' });
  }
});
// In your queue worker
queue.process(async (job) => {
  const { event } = job.data;
  logger.info({ eventId: event.id, type: event.type, attempt: job.attemptsMade }, 'Processing webhook');
  try {
    await processEvent(event);
    logger.info({ eventId: event.id }, 'Webhook processed successfully');
  } catch (err) {
    logger.error({ eventId: event.id, err }, 'Webhook processing failed');
    throw err; // let the queue retry
  }
});
Rule 6: Monitor Webhook Health
Failed webhooks are silent by default. Set up monitoring:
- Check provider dashboards: Stripe, GitHub, and Shopify all show webhook delivery history. Check them regularly or set up alerts.
- Alert on queue depth: If your webhook queue grows, something is wrong upstream.
- Track error rates: Log a counter whenever a webhook handler fails. Alert if the error rate spikes.
- Set up dead letter queues: Events that fail after all retries should go to a dead letter queue for manual inspection, not disappear silently.
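// BullMQ dead letter queue example
const { Queue, Worker } = require('bullmq');
// Assumes Redis on localhost:6379; adjust for your environment
const connection = { host: 'localhost', port: 6379 };
// attempts and backoff are job options in BullMQ, so set them as queue defaults
const queue = new Queue('webhooks', {
  connection,
  defaultJobOptions: {
    attempts: 5,
    backoff: { type: 'exponential', delay: 1000 },
  },
});
const deadLetterQueue = new Queue('webhooks-dead-letter', { connection });
const worker = new Worker('webhooks', processWebhook, { connection });
worker.on('failed', (job, err) => {
  // job can be undefined if it was removed before the event fired
  if (job && job.attemptsMade >= (job.opts.attempts ?? 1)) {
    // Move to dead letter queue
    deadLetterQueue.add('failed-webhook', {
      event: job.data.event,
      error: err.message,
      failedAt: new Date().toISOString(),
    });
  }
});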
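The error-rate tracking mentioned above can be sketched with a sliding window of recent outcomes. `ErrorRateTracker`, the window size, and the threshold are illustrative assumptions; in practice a metrics system would usually handle the counting and alerting.

```javascript
class ErrorRateTracker {
  constructor(windowSize = 100, threshold = 0.2) {
    this.windowSize = windowSize; // number of recent outcomes to keep
    this.threshold = threshold;   // alert when the failure share exceeds this
    this.outcomes = [];           // true = failure, false = success
  }

  // Record one handler outcome, discarding the oldest when the window is full.
  record(failed) {
    this.outcomes.push(Boolean(failed));
    if (this.outcomes.length > this.windowSize) this.outcomes.shift();
  }

  // Share of failures among the outcomes currently in the window.
  errorRate() {
    if (this.outcomes.length === 0) return 0;
    const failures = this.outcomes.filter(Boolean).length;
    return failures / this.outcomes.length;
  }

  shouldAlert() {
    return this.errorRate() > this.threshold;
  }
}
```

A worker could call `record(false)` after each successful job and `record(true)` in its failure handler, then check `shouldAlert()` on a timer.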
Testing Webhook Handling with HookCap
HookCap makes it easy to test these patterns before production:
- Capture real webhook payloads: Point your provider to a HookCap endpoint to collect real events. Inspect headers, body structure, and signature format.
- Test retry handling: Use HookCap's replay feature to send the same event to your handler multiple times. Verify that your idempotency logic prevents duplicate processing.
- Test error recovery: Replay a captured event to a handler you deliberately break (return 500). Watch how your queue retries it. Fix the handler and replay again.
- Simulate out-of-order delivery: Capture a sequence of related events and replay them in reverse order to verify your handler processes them correctly.
The replay feature is especially useful for idempotency testing: you can replay the same event ID dozens of times and confirm your database shows exactly one processed record each time.
Summary
Production webhook handlers need:
- Fast acknowledgment — Return 200 immediately, process async
- Idempotency — Track event IDs, use upserts, handle duplicate deliveries
- Correct status codes — 5xx for transient errors (retry-worthy), 4xx for permanent errors
- Order independence — Design DB writes to handle out-of-order events
- Comprehensive logging — Log receipt, queuing, processing, and failures
- Dead letter queues — Capture events that exhaust all retries
Most webhook failures come down to missing one of these. Add them to your integration checklist before going to production.