A payment webhook fires once. You miss it. The customer thinks they paid. Your dashboard says they didn't.
Welcome to my Tuesday morning, two years ago.
I've shipped four payment webhook systems in my career. The first three taught me everything I now refuse to do again. The fourth — the one running inside Atoa today — handles open banking payment notifications across our Node.js services without a single missed event in the last 14 months.
Here's the boring, opinionated, production-tested pattern.
The lie webhooks tell you
Every payment platform sells webhooks the same way:
"We'll notify your endpoint the moment the payment status changes."
What they don't sell you on:
- Webhooks retry. Sometimes 8 times. Sometimes never.
- Webhooks arrive out of order. failed can land before pending.
- Webhooks lie about idempotency. Two succeeded events for the same payment is normal, not a bug.
- Webhooks drop. Network blip, your pod restart, a bad DNS lookup: one missed delivery and your reconciliation is wrong.
If your webhook handler is a 30-line controller that updates a row in your database, you don't have a payment system. You have a hope.
The four-layer pattern
Every webhook flow we run at Atoa has four layers. Skip any one and you'll be reconciling spreadsheets at midnight.
1. Verify the signature before you parse the body
The most common bug I see in code reviews from junior devs: parsing the JSON before checking the HMAC.
// webhook.controller.ts
@Post('atoa')
async handle(
  @Headers('x-atoa-signature') signature: string,
  @RawBody() body: Buffer, // raw bytes, not the parsed JSON
) {
  // Verify against the exact bytes the sender signed, before anything else runs.
  if (!this.crypto.verify(body, signature, this.secret)) {
    throw new UnauthorizedException();
  }
  // Only now is it safe to parse.
  const event = JSON.parse(body.toString());
  await this.queue.enqueue('payment.webhook', event);
  return { received: true };
}
Two non-negotiables:
- Use the raw body for HMAC verification. NestJS's default JSON parser hands you a parsed object, and re-serialising it will not reproduce the original bytes, so your signature check breaks. Enable rawBody: true on the app.
- Reject before you do anything else. No DB hits, no logging the payload at info level, nothing.
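The check itself is small. Here is a minimal sketch of what a helper like this.crypto.verify could look like, assuming the sender signs the raw body with HMAC-SHA256 and sends the digest hex-encoded. Both are assumptions for illustration (check your provider's docs for the real scheme), and `verifySignature` is a hypothetical name.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

/**
 * Returns true only if `signatureHex` is the HMAC-SHA256 of `rawBody` under `secret`.
 * Assumption: the provider sends a hex-encoded HMAC-SHA256 digest.
 */
function verifySignature(
  rawBody: Buffer,
  signatureHex: string | undefined,
  secret: string,
): boolean {
  if (!signatureHex) return false; // missing header: reject

  const expected = createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, 'hex');

  // timingSafeEqual throws when lengths differ, so compare lengths first.
  if (received.length !== expected.length) return false;

  // Constant-time comparison, so response timing does not leak how many bytes matched.
  return timingSafeEqual(received, expected);
}
```

Feed it the same payload after a JSON.parse and JSON.stringify round trip and the check fails whenever the sender's formatting differs from what JSON.stringify produces, which is exactly why the raw bytes matter.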
2. Acknowledge fast. Process slow.
The webhook controller does two things: verify, enqueue. That's it.
async handle(...) {
  // verify (above)
  await this.queue.enqueue('payment.webhook', event);
  return { received: true }; // 200 within ~50ms
}
If your handler takes 8 seconds because you're hitting Stripe + your DB + sending an email, the sender will time out and retry. Now you have two events. Then four. Then the on-call engineer.
We use BullMQ on Redis. You can use SQS, NATS, Kafka — pick your poison. The point is: the HTTP response is decoupled from the work.
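To make the decoupling concrete, here is a toy sketch with an in-memory array standing in for the queue. `InMemoryQueue` and `handleWebhook` are hypothetical names, and an in-memory queue is for illustration only: the real one has to be durable and survive a pod restart.

```typescript
type WebhookEvent = { event_id: string; payment_id: string; status: string };

// In-memory stand-in for a durable queue (BullMQ, SQS, ...). Illustration only.
class InMemoryQueue {
  private readonly jobs: WebhookEvent[] = [];

  enqueue(event: WebhookEvent): void {
    this.jobs.push(event);
  }

  // Runs the worker over everything queued so far; returns how many jobs ran.
  async drain(worker: (event: WebhookEvent) => Promise<void>): Promise<number> {
    let processed = 0;
    let job: WebhookEvent | undefined;
    while ((job = this.jobs.shift()) !== undefined) {
      await worker(job);
      processed += 1;
    }
    return processed;
  }
}

// The HTTP-facing part: no database, no email, no third-party calls.
function handleWebhook(queue: InMemoryQueue, event: WebhookEvent): { received: true } {
  queue.enqueue(event);
  return { received: true };
}
```

The acknowledgement is returned before any worker has touched the event, so a slow worker only delays processing. It never delays the response the sender is waiting on.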
3. Idempotency keys are not optional
Every event has an event_id. Before you do anything in your worker:
@Processor('payment.webhook')
export class WebhookProcessor {
  async process(job: Job<WebhookEvent>) {
    const { event_id, payment_id, status } = job.data;
    // true only when this event_id was inserted for the first time
    const isFirstDelivery = await this.events.firstSeen(event_id);
    if (!isFirstDelivery) {
      this.logger.log(`Duplicate event ${event_id}, skipping`);
      return;
    }
    await this.applyStatus(payment_id, status, event_id);
  }
}
firstSeen is a write to a Postgres table with event_id as the primary key. If the insert succeeds, this is the first time we've seen this event. If it conflicts, we've processed it before. No race conditions, no Redis dance — just let the database do the work it's good at.
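A minimal sketch of what firstSeen could look like. The table name `webhook_events` and the `Queryable` interface are assumptions for illustration; the interface is modelled on the `query(text, values)` call of the `pg` client so the example runs without a database.

```typescript
// Minimal slice of a Postgres client, modelled on pg's query(text, values).
interface Queryable {
  query(text: string, values: unknown[]): Promise<{ rowCount: number | null }>;
}

// Hypothetical table:
//   CREATE TABLE webhook_events (
//     event_id    text PRIMARY KEY,
//     received_at timestamptz NOT NULL DEFAULT now()
//   );
async function firstSeen(db: Queryable, eventId: string): Promise<boolean> {
  const result = await db.query(
    'INSERT INTO webhook_events (event_id) VALUES ($1) ON CONFLICT (event_id) DO NOTHING',
    [eventId],
  );
  // One row inserted: first delivery. Zero rows: the primary key already existed.
  return result.rowCount === 1;
}
```

One design note: if the worker crashes after this insert but before the status is applied, a retry will look like a duplicate. Running the insert and the status change in the same transaction is one way to close that gap.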
4. State machines, not status updates
This is the one that took me three failed payment systems to learn.
A payment doesn't have a "status field." It has a state machine. Some transitions are legal. Most aren't.
const ALLOWED: Record<PaymentStatus, PaymentStatus[]> = {
  initiated: ['authorising', 'failed'],
  authorising: ['succeeded', 'failed'],
  succeeded: [], // terminal
  failed: [], // terminal
};

async applyStatus(id: string, next: PaymentStatus, eventId: string) {
  const payment = await this.payments.findById(id);
  if (!ALLOWED[payment.status].includes(next)) {
    this.logger.warn(`Illegal transition: ${payment.status} → ${next}`);
    return; // do not update, do not throw: this is normal
  }
  await this.payments.transition(id, next, eventId);
}
Why this matters: when events arrive out of order (failed before pending, and it will happen), a stale failed must never downgrade a payment that has already succeeded. With a state machine, the invalid transition is dropped. The reconciler picks it up later. The customer's payment stays correct.
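To see the table at work, here is a small in-memory sketch that replays a sequence of incoming statuses against the same ALLOWED map. The `replay` helper is hypothetical and exists only for illustration; it is not part of the production flow.

```typescript
type PaymentStatus = 'initiated' | 'authorising' | 'succeeded' | 'failed';

const ALLOWED: Record<PaymentStatus, readonly PaymentStatus[]> = {
  initiated: ['authorising', 'failed'],
  authorising: ['succeeded', 'failed'],
  succeeded: [], // terminal
  failed: [], // terminal
};

// Applies each incoming status in arrival order; illegal transitions are
// collected in `dropped` instead of being applied.
function replay(
  start: PaymentStatus,
  incoming: readonly PaymentStatus[],
): { status: PaymentStatus; dropped: PaymentStatus[] } {
  let status = start;
  const dropped: PaymentStatus[] = [];
  for (const next of incoming) {
    if (ALLOWED[status].includes(next)) {
      status = next;
    } else {
      dropped.push(next);
    }
  }
  return { status, dropped };
}
```

Replaying authorising, succeeded, failed from initiated ends on succeeded, with the stale failed dropped. Replaying succeeded, authorising from initiated ends on authorising, with succeeded dropped: the early event was illegal at the time it arrived, which is the case the reconciler exists to repair.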
What we'd never do again
Three patterns I see in the wild that I had to unlearn:
- Polling instead of webhooks. "We'll just check the status every 30 seconds." Sure — and you'll burn rate limits, miss the 5-second window where a customer is staring at the spinner, and pay for compute that does nothing 99% of the time.
- Replaying webhooks by re-running the handler. If the handler does five things, replaying it does five things again. Idempotency keys mean replays are free.
- Logging the full payload at info level. Payment payloads carry personal data, so between PSD2 and GDPR your logs are PII now. Log the event_id and the status. Nothing else.
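For the logging point, a minimal sketch of an allow-list: pick the two safe fields explicitly instead of trying to strip the dangerous ones. `toLogFields` is a hypothetical helper, and the extra payload fields in the comment are made up for illustration.

```typescript
type WebhookEvent = { event_id: string; status: string; [key: string]: unknown };

// Allow-list, not deny-list: anything not named here never reaches the logs,
// including fields the provider adds later (payer name, account details, ...).
function toLogFields(event: WebhookEvent): { event_id: string; status: string } {
  return { event_id: event.event_id, status: event.status };
}
```

An allow-list fails safe: when the provider adds a new field to the payload, it stays out of your logs by default.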
Where this gets you
We process open banking payment notifications across dozens of UK merchants on this exact pattern. Zero missed events in 14 months. Reconciliation runs once a day and finds nothing to reconcile.
The pattern doesn't care which payment provider you use. Stripe, GoCardless, Atoa — same four layers.
If you want to see what these webhooks look like on the open banking side, our API docs walk through the full payment lifecycle and the webhook events we fire: docs.atoa.me. Sandbox is free, no card needed.
Build the boring layers first. Sleep through Tuesday mornings.
Arun is co-founder & CTO of Atoa, a UK open banking payments platform. He's @mickyarun on X and dev.to. Driven by passion.