Dhruv Khara

Posted on Jan 14

The Race Condition You're Probably Shipping Right Now With Stripe Webhooks

#stripe #bullmq #typescript #architecture

A real-world case study on eliminating duplicate payments and race conditions in Stripe webhook architecture.

TL;DR: Stop doing eager sync. Seriously. Webhooks should only verify and enqueue—nothing else. Let a single worker handle all writes. 217 lines replaced 3,769.

By “single worker,” I mean a single logical writer per Stripe object, enforced via queue partitioning — not literally one process.
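A minimal sketch of that kind of partitioning. It assumes a fixed count of 8 partitions and a "<queue>-<n>" naming scheme, both hypothetical rather than our production values. The idea is to hash the key with a stable function, route the event to that partition's queue, and run each partition's worker with concurrency 1.

```typescript
// Hypothetical partitioning: every event for the same key lands in the same
// partition, and each partition's worker runs with concurrency 1.
const PARTITION_COUNT = 8; // assumption, not a production value

// 32-bit FNV-1a: a stable hash, so a key maps to the same partition
// on every process and every deploy
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

export function partitionFor(key: string, partitions: number = PARTITION_COUNT): number {
  return fnv1a(key) % partitions;
}

// Queue name for the key's partition, e.g. 'credit-purchase-<n>'
export function queueNameFor(baseQueue: string, key: string): string {
  return `${baseQueue}-${partitionFor(key)}`;
}

console.log(queueNameFor('credit-purchase', 'cus_123') === queueNameFor('credit-purchase', 'cus_123')); // true
```

The key has to be shared by every event that touches the same record. A checkout session and its later charge.refunded carry different IDs, so in practice you would partition on something both events share, such as the customer ID.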

The Problem: Simple Payments, Complex Failures

A user buys credits. Payment succeeds. They get credited twice.

Another user subscribes. Webhook arrives late. Their subscription shows "pending" for 10 minutes while they refresh angrily.

A third user? Their purchase disappears entirely after a Redis blip.

We thought Stripe webhooks were simple. We were wrong.
Here's how 3,769 lines of "helpful" code created a race condition that could take down our payment system—and the boring fix that solved everything.

What We Were Building

Our platform serves a lot of users. We process payments through Stripe:

Credit purchases (one-time payments)
Subscriptions (recurring billing)

The flow seems straightforward:

Create a checkout session
Wait for payment confirmation
Credit the user's account

What could go wrong? Everything.

Inline Webhook Processing: The "Simple" Approach That Backfires

Like most teams, we started with the obvious approach:


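// webhooks.controller.ts - The "simple" approach
import express from 'express';
// `app` (Express) and `stripe` (Stripe client) are set up elsewhere

// express.raw keeps req.body as a Buffer: signature verification needs the raw bytes
app.post('/webhooks/stripe', express.raw({ type: 'application/json' }), async (req, res) => {
  // 1. Verify the webhook signature (throws on mismatch)
  const event = stripe.webhooks.constructEvent(
    req.body,
    req.headers['stripe-signature'] as string,
    process.env.STRIPE_WEBHOOK_SECRET!
  );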

  // 2. Process the event inline
  switch (event.type) {
    case 'checkout.session.completed':
      await handleCheckoutCompleted(event.data.object);
      break;
    case 'customer.subscription.created':
      await handleSubscriptionCreated(event.data.object);
      break;
    case 'invoice.payment_succeeded':
      await handleInvoicePayment(event.data.object);
      break;
    case 'charge.refunded':
      await handleRefund(event.data.object);
      break;
    // ... 15 more event types
  }

  // 3. Return 200 to acknowledge receipt
  res.status(200).json({ received: true });
});

Looks clean. Ships fast. Breaks at scale.

The Hidden Assumptions

This code assumes:

Assumption	Reality
Processing is fast	Stripe times out at 30s. Our handlers took 35s during traffic spikes.
Dependencies are available	Redis goes down. We crash. No 200. Stripe retries. Duplicate processing.
Order doesn't matter	invoice.payment_succeeded arrives before customer.subscription.created. Handler fails.
We won't crash mid-processing	Commit to MongoDB, crash, no 200. Stripe retries. Duplicate credit.

Did you know? If your endpoint doesn't return a 2xx, Stripe keeps retrying in live mode for up to three days, with exponential backoff. That "duplicate" you see at 3am? It may be the retry of yesterday's timeout. Debugging webhook issues feels like time travel.
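The "commit, crash, retry" row is easy to reproduce in miniature. In this sketch, deliver is a hypothetical stand-in for Stripe's retrying sender: it redelivers until the handler returns normally (our "200"). A Map stands in for MongoDB. The handler commits the credit and then dies once before acknowledging, so the same event is applied twice.

```typescript
// A Map stands in for the wallet collection
type PaymentEvent = { id: string; userId: string; credits: number };
const balances = new Map<string, number>();

// Hypothetical stand-in for Stripe's retries: re-send until the handler
// returns normally (our "200"), up to maxAttempts
function deliver(event: PaymentEvent, handler: (e: PaymentEvent) => void, maxAttempts = 3): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      handler(event);
      return attempt; // acknowledged on this attempt
    } catch {
      // no 200, so the sender tries again
    }
  }
  return maxAttempts;
}

let hasCrashed = false;

// Not idempotent: the credit is committed before the acknowledgement,
// and the process dies in between exactly once
function naiveHandler(e: PaymentEvent): void {
  balances.set(e.userId, (balances.get(e.userId) ?? 0) + e.credits); // committed
  if (!hasCrashed) {
    hasCrashed = true;
    throw new Error('process died before responding');
  }
}

const attempts = deliver({ id: 'evt_1', userId: 'u1', credits: 100 }, naiveHandler);
console.log(attempts, balances.get('u1')); // 2 200
```

Storing the event ID in the same commit as the credit, and skipping IDs already stored, is what turns the retry into a no-op.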

The Consequences

With a large user base, these "edge cases" became daily incidents:

Duplicate credits from retry storms
Missing subscriptions from out-of-order events
Timeouts during high-traffic periods
Customers seeing inconsistent balances

The webhook controller bloated trying to handle every edge case inline.

The "Eager Sync" Optimization That Made Everything Worse

To improve UX, we added eager synchronization. The idea: don't make users wait for webhooks.

When a user completes checkout and returns to our app, we immediately check with Stripe:


// checkout-return.controller.ts - The "eager" approach

app.get('/checkout/return', async (req, res) => {
  const session_id = req.query.session_id as string;

  // Fetch the session from Stripe
  const session = await stripe.checkout.sessions.retrieve(session_id);

  // If payment succeeded, process immediately
  if (session.payment_status === 'paid') {
    await syncCheckoutSession(session);  // Credit the user NOW
  }

  res.redirect('/dashboard?purchase=success');
});

Instant feedback. Happy users. Right?

Why Eager Sync Felt Right

Instant feedback. User sees credits immediately, not "pending."
No webhook delays. Webhooks can lag by seconds or minutes.
Handles webhook failures. If our endpoint is down, eager sync still works.

The Hidden Problem: Two Writers, One Race Condition

Now we had two systems processing the same payment:


Timeline A (User is fast):
  0ms   - User completes payment
  100ms - User redirected to /checkout/return
  150ms - Eager sync processes payment ✓
  500ms - Webhook arrives
  550ms - Webhook processes payment ✓ (DUPLICATE!)

Timeline B (Race condition):
  0ms   - User completes payment
  50ms  - Webhook arrives, starts processing
  60ms  - User redirected to /checkout/return
  70ms  - Eager sync checks "already processed?" → No (webhook hasn't committed yet)
  80ms  - Eager sync starts processing
  90ms  - Webhook commits transaction
  100ms - Eager sync commits transaction (DUPLICATE!)

The idempotency check didn't help because both systems checked before either committed.
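Timeline B is a classic check-then-act race. The sketch below uses an in-memory array in place of MongoDB and a setTimeout-based delay in place of network and commit latency (both assumptions). Two writers run the same "already processed?" check, and both pass it before either one commits.

```typescript
// In-memory stand-ins for the Transaction collection and the wallet
const transactions: { sessionId: string; credits: number }[] = [];
let walletBalance = 0;

const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// The shape both writers shared: check, do some work, then commit
async function syncCheckout(sessionId: string, writer: string, latencyMs: number): Promise<string> {
  const existing = transactions.find((t) => t.sessionId === sessionId); // check
  if (existing) return `${writer}: skipped`;
  await delay(latencyMs);                         // gap between check and commit
  transactions.push({ sessionId, credits: 100 }); // commit
  walletBalance += 100;
  return `${writer}: credited`;
}

async function main(): Promise<void> {
  // Webhook and eager sync both start before either has committed (Timeline B)
  const results = await Promise.all([
    syncCheckout('cs_123', 'webhook', 40),
    syncCheckout('cs_123', 'eager-sync', 30),
  ]);
  console.log(results, walletBalance);
}

main();
// [ 'webhook: credited', 'eager-sync: credited' ] 200
```

Any check that runs before the other writer's commit is visible will pass. That is why adding more checks in either path didn't help.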

Three Failed Fixes (And Why They Failed)

Fix #1: Database Locks


async function syncCheckoutSession(session: Stripe.Checkout.Session) {
  const lock = await acquireLock(`checkout:${session.id}`);
  try {
    const existing = await Transaction.findOne({ stripeSessionId: session.id });
    if (existing) return;
    await creditUserAccount(session);
  } finally {
    await releaseLock(lock);
  }
}

Failed because: Distributed locks across two different code paths are error-prone. Lock contention, deadlocks, and expiration issues.
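One of those expiration issues is worth spelling out. A lock with a TTL only protects you while the holder finishes inside the TTL. If the holder stalls (a GC pause, a slow Stripe call), the lock silently expires and a second writer acquires it. This sketch models a TTL lock with an explicit clock (simulated milliseconds). The TtlLock class is hypothetical, not the acquireLock helper above.

```typescript
// Hypothetical TTL lock modeled on Redis SET key value NX PX ttl,
// with time passed in explicitly so the interleaving is deterministic
class TtlLock {
  private holders = new Map<string, { owner: string; expiresAt: number }>();

  acquire(key: string, owner: string, ttlMs: number, now: number): boolean {
    const current = this.holders.get(key);
    if (current && current.expiresAt > now) return false; // held and not expired
    this.holders.set(key, { owner, expiresAt: now + ttlMs });
    return true;
  }

  // Only the current owner may release; false means the lock was lost
  release(key: string, owner: string): boolean {
    const current = this.holders.get(key);
    if (!current || current.owner !== owner) return false;
    this.holders.delete(key);
    return true;
  }
}

const lock = new TtlLock();
const key = 'checkout:cs_123';

const webhookGot = lock.acquire(key, 'webhook', 5_000, 0);       // t = 0s
// ...the webhook handler stalls for 6s, longer than its 5s TTL...
const eagerGot = lock.acquire(key, 'eager-sync', 5_000, 6_000);  // t = 6s
const webhookReleased = lock.release(key, 'webhook');            // t = 7s

console.log(webhookGot, eagerGot, webhookReleased); // true true false
```

Between t = 6s and t = 7s, both writers believe they hold the lock, and both go on to credit the session.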

Fix #2: Unique Constraints


const transactionSchema = new Schema({
  stripeSessionId: { type: String, unique: true }
});

Failed because: Prevents duplicates but creates partial failures. Writer A creates transaction, crashes before crediting wallet. Writer B sees transaction exists, skips everything. User has record but no credits.

Fix #3: Redis Idempotency Keys


const wasSet = await redis.set(idempotencyKey, '1', 'NX', 'EX', 3600);
if (!wasSet) return; // Another process handling this

Failed because: Crash after setting key but before processing = payment stuck forever. Added cleanup jobs, TTLs, state tracking. Complexity exploded.

After three failed fixes, and one very long postmortem, we asked a different question.

The Root Cause: A Second Writer We Didn't Need

We were solving the wrong problem.

The issue wasn't "how do we coordinate two writers?"

The issue was "why do we have two writers?"

The Two Generals Problem (1975): two parties communicating over an unreliable channel can never be certain they have agreed on a shared decision. It's a proven impossibility result in distributed systems. Our eager sync put us in the same kind of bind: two writers trying to agree, with no single authority, on who had already processed a payment.

The fix? Don't have two generals. Have one general (the queue worker) and one messenger (the webhook endpoint).

Eager sync existed because we didn't trust webhooks. But instead of fixing webhook reliability, we added a second system that made everything worse.

Counter-intuitive: Showing "Processing..." for 2 seconds feels faster than showing "Success!" immediately and then correcting to "Actually, duplicate." Users trust systems that appear deliberate, not systems that appear to lie.

Queue-Based Webhook Processing with BullMQ

The Core Principles

Principle	Implementation
Webhooks are the source of truth	Frontend only reads state, never writes
Webhook handlers do one thing	Verify signature, queue event, return 200
Single writer processes events	Worker with idempotency (actually works now)
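The three principles fit in a few lines. This in-memory sketch stands in for BullMQ and MongoDB and does not use their APIs (InMemoryQueue and applyCheckout are hypothetical). enqueue ignores a repeated event ID the way BullMQ ignores a repeated jobId, and one consumer applies events strictly one at a time, so the idempotency check and the write can never interleave with another writer.

```typescript
type StripeLikeEvent = { id: string; sessionId: string; credits: number };

// Hypothetical stand-in for a queue: a repeated event ID is a no-op,
// like a repeated jobId in BullMQ
class InMemoryQueue {
  private pending: StripeLikeEvent[] = [];
  private seen = new Set<string>();

  enqueue(event: StripeLikeEvent): boolean {
    if (this.seen.has(event.id)) return false;
    this.seen.add(event.id);
    this.pending.push(event);
    return true;
  }

  // The single writer: events are applied strictly one after another
  drain(apply: (event: StripeLikeEvent) => void): void {
    for (let next = this.pending.shift(); next !== undefined; next = this.pending.shift()) {
      apply(next);
    }
  }
}

const processedSessions = new Set<string>();
let balance = 0;

// Worker logic: nothing else writes, so check-then-write is safe here
function applyCheckout(event: StripeLikeEvent): void {
  if (processedSessions.has(event.sessionId)) return;
  processedSessions.add(event.sessionId);
  balance += event.credits;
}

const queue = new InMemoryQueue();
const evt = { id: 'evt_1', sessionId: 'cs_123', credits: 100 };
queue.enqueue(evt);                                                // first delivery
queue.enqueue(evt);                                                // Stripe retry: ignored
queue.enqueue({ id: 'evt_2', sessionId: 'cs_123', credits: 100 }); // distinct event, same session
queue.drain(applyCheckout);
console.log(balance); // 100
```

The two layers catch different things. The event-ID dedupe absorbs Stripe's retries of one event, while the per-session check in the worker absorbs distinct events that refer to the same payment.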

The New Architecture

┌────────────────────────────────────────────────┐
│                   BEFORE                       │
├────────────────────────────────────────────────┤
│                                                │
│   Stripe                    User               │
│     │                         │                │
│     ▼                         ▼                │
│  [Webhook]             [Checkout Return]       │
│     │                         │                │
│     ▼                         ▼                │
│  [Verify + Process]      [Eager Sync]          │
│  (3,769 lines)                │                │
│     │                         │                │
│     └───────────┬─────────────┘                │
│                 │                              │
│                 ▼                              │
│        ┌────────────────┐                      │
│        │  SAME WALLET!  │ ← Both race here     │
│        └────────────────┘                      │
│                                                │
└────────────────────────────────────────────────┘

┌────────────────────────────────────────────────┐
│                    AFTER                       │
├────────────────────────────────────────────────┤
│                                                │
│   Stripe                    User               │
│     │                         │                │
│     ▼                         ▼                │
│  [Webhook]             [Checkout Return]       │
│     │                         │                │
│     ▼                         ▼                │
│  [Verify]─▶[Queue]─▶200  [Poll]─▶Dashboard     │
│               │           (read-only!)         │
│               ▼                                │
│        ┌────────────┐                          │
│        │Redis Queue │                          │
│        │  (BullMQ)  │                          │
│        └─────┬──────┘                          │
│              │                                 │
│              ▼                                 │
│        ┌────────────┐                          │
│        │   Worker   │ ← Single writer          │
│        │(217 lines) │                          │
│        └─────┬──────┘                          │
│              │                                 │
│              ▼                                 │
│        ┌────────────┐                          │
│        │   Wallet   │ ← No race                │
│        └────────────┘                          │
│                                                │
└────────────────────────────────────────────────┘

The New Webhook Controller (47 Lines)


// webhooks.controller.ts

import express from 'express';
import Stripe from 'stripe';
import { queueService } from '@/shared/infrastructure/queue';
import { verifyStripeSignature } from './webhooks.service';

// express.raw keeps req.body as a Buffer, which signature verification needs
app.post('/webhooks/stripe', express.raw({ type: 'application/json' }), async (req, res) => {
  try {
    // 1. Verify signature (ONLY job of this endpoint)
    const event = verifyStripeSignature(req);

    // 2. Queue the event for async processing
    const { queued } = await queueService.addStripeEvent(event);

    // 3. Acknowledge receipt immediately
    return res.status(200).json({ received: true, queued });

  } catch (error) {
    // stripe-node's constructEvent throws this class on a bad signature
    // (assuming verifyStripeSignature wraps constructEvent)
    if (error instanceof Stripe.errors.StripeSignatureVerificationError) {
      return res.status(400).json({ error: 'Invalid signature' });
    }

    // Redis/queue failure - return 503 so Stripe retries later
    logger.error('Webhook queue failure', { error });
    return res.status(503).json({ error: 'Service unavailable' });
  }
});

Verify. Queue. Return 200. That's it.
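"Verify" here means Stripe's signature check. In practice stripe.webhooks.constructEvent does this, and verifyStripeSignature presumably wraps it (its body isn't shown here). As a sketch of what happens underneath, following the manual-verification scheme in Stripe's docs: the Stripe-Signature header carries a timestamp t and one or more v1 signatures, each an HMAC-SHA256 of "<t>.<raw body>" keyed with the endpoint secret. The function name is hypothetical; the 300-second tolerance matches stripe-node's default.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';
import { Buffer } from 'node:buffer';

// Sketch of Stripe's v1 webhook signature scheme using only node:crypto.
// In production, call stripe.webhooks.constructEvent instead.
export function isValidStripeSignature(
  rawBody: string | Buffer,
  header: string,
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
  toleranceSeconds = 300,
): boolean {
  // Header format: t=<unix seconds>,v1=<hex hmac>[,v1=<hex hmac>...]
  let timestamp = NaN;
  const signatures: string[] = [];
  for (const part of header.split(',')) {
    const [key, value] = part.split('=', 2);
    if (key === 't') timestamp = Number(value);
    if (key === 'v1' && value) signatures.push(value);
  }
  if (Number.isNaN(timestamp) || signatures.length === 0) return false;

  // Reject old (or far-future) timestamps to limit replays of captured requests
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) return false;

  // Signed payload is "<timestamp>.<raw body>", HMAC-SHA256 with the endpoint secret
  const expected = Buffer.from(
    createHmac('sha256', secret).update(`${timestamp}.`).update(rawBody).digest('hex'),
  );

  // Constant-time comparison against each v1 signature in the header
  return signatures.some((sig) => {
    const candidate = Buffer.from(sig);
    return candidate.length === expected.length && timingSafeEqual(candidate, expected);
  });
}
```

With express.raw, req.body (a Buffer) and the stripe-signature header can be passed straight through. Parsing the JSON first and re-serializing it would change the bytes and break the HMAC.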

The Queue Service (217 Lines)

// queue.service.ts

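import { Queue } from 'bullmq';
import type Stripe from 'stripe';
// `redis` and `logger` are assumed to be the app's shared connection and logger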

const QUEUES = {
  CREDIT_PURCHASE: new Queue('credit-purchase', { connection: redis }),
  SUBSCRIPTION: new Queue('subscription', { connection: redis }),
};

// Declarative event routing
const EVENT_QUEUE_MAP: Record<string, keyof typeof QUEUES> = {
  'checkout.session.completed': 'CREDIT_PURCHASE',
  'charge.refunded': 'CREDIT_PURCHASE',
  'customer.subscription.created': 'SUBSCRIPTION',
  'customer.subscription.updated': 'SUBSCRIPTION',
  'invoice.payment_succeeded': 'SUBSCRIPTION',
};

export async function addStripeEvent(
  event: Stripe.Event
): Promise<{ queued: boolean }> {
  const queueName = EVENT_QUEUE_MAP[event.type];

  if (!queueName) {
    logger.debug(`Unhandled event type: ${event.type}`);
    return { queued: false };
  }

  await QUEUES[queueName].add(event.type, event, {
    jobId: event.id,  // BullMQ deduplicates: same ID = no-op
    priority: queueName === 'CREDIT_PURCHASE' ? 1 : 5,  // Lower = sooner, but only within this queue
    attempts: 3,
    backoff: { type: 'exponential', delay: 1000 },
    removeOnComplete: { age: 86400 * 3 },  // Keep for 3 days (Stripe retry window)
  });

  return { queued: true };
}

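Routing: credit and subscription events go to separate queues, each drained by its own worker, so a subscription backlog can't delay credits. (BullMQ priorities only order jobs within a single queue, so the 1-vs-5 setting doesn't rank one queue above the other.) Failed jobs retry automatically with exponential backoff.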
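How long do those automatic retries last? Per BullMQ's documentation, the built-in exponential strategy waits 2^(n-1) × delay milliseconds before retry n. By that formula, attempts: 3 with delay: 1000 gives up about three seconds after the first failure. Stripe has already received its 200 by then, so nothing upstream retries; the job stays in BullMQ's failed set until someone replays it. The helper below is a hypothetical model of that formula, not BullMQ code.

```typescript
// Model of BullMQ's built-in 'exponential' backoff as its docs describe it:
// the wait before retry n is 2^(n - 1) * delay milliseconds.
export function exponentialBackoffSchedule(attempts: number, delayMs: number): number[] {
  const waits: number[] = [];
  // "attempts" counts the first run too, so there are attempts - 1 retries
  for (let retry = 1; retry < attempts; retry++) {
    waits.push(2 ** (retry - 1) * delayMs);
  }
  return waits;
}

// The queue service's settings: attempts: 3, backoff { type: 'exponential', delay: 1000 }
const waits = exponentialBackoffSchedule(3, 1000);
const totalMs = waits.reduce((sum, ms) => sum + ms, 0);
console.log(waits, totalMs); // [ 1000, 2000 ] 3000
```

A Redis or MongoDB outage usually lasts longer than three seconds, which is why the dead-letter alerting listed further down matters.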

Gotcha: BullMQ's jobId deduplication only works while the job still exists in Redis. Once it's removed (for example by removeOnComplete), the same jobId can be added again. We set removeOnComplete: { age: 86400 * 3 } (the age is in seconds) to cover Stripe's 3-day retry window; anything redelivered later falls through to the database idempotency check, which is the real safety net.
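The gotcha is easiest to see with a model of the retention window. This is not BullMQ's implementation: DedupWindowQueue is a hypothetical in-memory stand-in in which a completed job's ID blocks re-adds only until its retention age has passed, mirroring removeOnComplete: { age } in seconds.

```typescript
// Hypothetical model of jobId dedup + removeOnComplete: { age } (seconds).
// Not BullMQ's implementation; time is passed in explicitly.
class DedupWindowQueue {
  private jobs = new Map<string, { completedAt?: number }>();
  private readonly retentionSec: number;

  constructor(retentionSec: number) {
    this.retentionSec = retentionSec;
  }

  // true = new job created; false = an existing job has this ID (no-op)
  add(jobId: string, now: number): boolean {
    this.evictExpired(now);
    if (this.jobs.has(jobId)) return false;
    this.jobs.set(jobId, {});
    return true;
  }

  complete(jobId: string, now: number): void {
    const job = this.jobs.get(jobId);
    if (job) job.completedAt = now;
  }

  // Completed jobs older than the retention age disappear, and their IDs with them
  private evictExpired(now: number): void {
    for (const [id, job] of this.jobs) {
      if (job.completedAt !== undefined && now - job.completedAt > this.retentionSec) {
        this.jobs.delete(id);
      }
    }
  }
}

const DAY = 86_400;
const queue = new DedupWindowQueue(3 * DAY);

queue.add('evt_1', 0);                               // first delivery: created
queue.complete('evt_1', 60);                         // worker finishes a minute later
const redeliveredDay2 = queue.add('evt_1', 2 * DAY); // still retained: ignored
const redeliveredDay4 = queue.add('evt_1', 4 * DAY); // evicted: runs again
console.log(redeliveredDay2, redeliveredDay4);       // false true
```

Any late redelivery lands in the day-4 case, where only the worker's database check stands between the event and a second credit.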

The Worker (Single Writer)


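// credit-purchase.worker.ts

import { Worker } from 'bullmq';
import mongoose from 'mongoose';
import type Stripe from 'stripe';

const worker = new Worker('credit-purchase', async (job) => {
  const event = job.data as Stripe.Event;

  switch (event.type) {
    case 'checkout.session.completed':
      await processCheckoutCompleted(event.data.object as Stripe.Checkout.Session);
      break;
    case 'charge.refunded':
      await processRefund(event.data.object as Stripe.Charge);
      break;
  }
}, { connection: redis, concurrency: 5 });

async function processCheckoutCompleted(session: Stripe.Checkout.Session) {
  // Idempotency check - NOW works because we're the only writer
  const existing = await Transaction.findOne({ stripeSessionId: session.id });
  if (existing) {
    logger.info('Already processed', { sessionId: session.id });
    return;
  }

  // Credit and record atomically: if the worker died between two separate
  // writes, the retry would find no Transaction and credit the user again.
  // Requires a replica set; assumes both helpers accept a ClientSession
  // (hypothetical signature).
  await mongoose.connection.transaction(async (dbSession) => {
    await creditUserAccount(session, dbSession);
    await createTransaction(session, dbSession);
  });

  // Outside the transaction: if this throws, the retry skips the already
  // committed payment, so the email may be lost but credits never double
  await sendConfirmationEmail(session);
}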

Single writer = idempotency checks actually work.

Frontend: Read-Only Status Polling


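// Checkout return - no more eager sync
app.get('/checkout/return', (req, res) => {
  const sessionId = String(req.query.session_id ?? '');
  res.redirect(`/dashboard?session_id=${encodeURIComponent(sessionId)}`);
});

// Status endpoint - read only
app.get('/checkout/status', async (req, res) => {
  const sessionId = req.query.session_id;
  // Accept only a plain string: with the extended query parser,
  // ?session_id[$ne]=x arrives as an object and becomes a MongoDB operator
  if (typeof sessionId !== 'string') {
    return res.status(400).json({ error: 'session_id is required' });
  }

  const transaction = await Transaction.findOne({ stripeSessionId: sessionId });

  return res.json({
    status: transaction ? 'completed' : 'pending',
    credits: transaction?.credits
  });
});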


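// React hook - polls until complete
import { useQuery } from '@tanstack/react-query';

export function useCheckoutPolling(sessionId: string | null) {
  return useQuery({
    queryKey: ['checkout-status', sessionId],
    // `api.get` is assumed to resolve to the parsed JSON body
    queryFn: () => api.get(`/checkout/status?session_id=${encodeURIComponent(sessionId ?? '')}`),
    enabled: !!sessionId,
    // TanStack Query v5 passes the query here, not the data
    refetchInterval: (query) =>
      query.state.data?.status === 'completed' ? false : 2000,
  });
}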

Frontend polls status. The worker is the only writer. No race condition.
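The hook keeps polling for as long as the page is open, even if the webhook never lands. Here is a framework-agnostic sketch of the same read-only loop with an upper bound. pollCheckoutStatus and fetchStatus are hypothetical names; the 2-second default mirrors the hook, and the 60-second cap is an assumption.

```typescript
type CheckoutStatus = { status: 'pending' | 'completed'; credits?: number };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Read-only polling with an upper bound: returns the completed status,
// or 'timed-out' so the UI can fall back to "still processing"
export async function pollCheckoutStatus(
  fetchStatus: () => Promise<CheckoutStatus>,
  intervalMs = 2000,
  timeoutMs = 60_000,
): Promise<CheckoutStatus | 'timed-out'> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const result = await fetchStatus();
    if (result.status === 'completed') return result;
    await sleep(intervalMs);
  }
  return 'timed-out';
}

// Usage with a fake endpoint that reports completion on the third call
let calls = 0;
const fakeFetch = async (): Promise<CheckoutStatus> =>
  ++calls >= 3 ? { status: 'completed', credits: 100 } : { status: 'pending' };

pollCheckoutStatus(fakeFetch, 10, 1000).then((result) => console.log(result, calls));
// { status: 'completed', credits: 100 } 3
```

In the React hook, the equivalent is returning false from refetchInterval once a deadline has passed.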

The Results

Metric	Before	After
Webhook controller	3,769 lines	47 lines
Queue routing	N/A	217 lines
Duplicate transactions	Daily	Zero
Stripe timeouts	During traffic spikes	None
Debugging time	Hours	Minutes (queue inspection)
Race conditions	Constant	Eliminated

What This Doesn't Handle (Honest Assessment)

Limitation	Mitigation
Queue going down (Redis failure)	Return 503 → Stripe retries for up to 3 days
Poison messages (always fail)	Dead-letter after 3 attempts + alerting
Event ordering	Handlers are idempotent, check current state
Worker crashes mid-processing	Job returns to queue, next attempt reprocesses
Signature verification failures	Alert on failure rate > threshold (possible replay attack)

Key Takeaways

Two writers = race condition. Not redundancy. Coordination nightmare.
Stop doing eager sync. If you don't trust your webhooks, fix your webhooks—don't add another writer.
Webhooks should only enqueue. Verify signature. Queue event. Return 200. That's it. Nothing else.
Idempotency needs single writers. findOne → create isn't atomic.
When in doubt, queue it. Free retries, backpressure, observability.

The Diff


 src/features/billing/webhooks/webhooks.controller.ts |  -156 lines
 src/features/billing/webhooks/webhooks.service.ts    | -3,613 lines
 src/shared/infrastructure/queue/queue.service.ts    |   +217 lines

 47 files changed, 368 insertions(+), 3793 deletions(-)

The best code is the code you delete.

Irony: 217 lines is approximately the length of a single well-commented function in our old codebase. The entire queue architecture is smaller than the error handling we needed for one edge case.

Your Turn

If your webhook handler has more than 100 lines, you're probably doing too much inline.

Action items:

Count your webhook handler lines (be honest)
List every place that writes payment state
If you have two writers, pick one

The queue-based approach took 2 weeks to implement; the race conditions it replaced had taken far longer to diagnose. Choose your battles.

Coming Next: The Wallet Race Condition

Fixing webhook duplicates was just the beginning.
We had another bug. A nastier one:

Op A: Add 100 credits    (reads balance: 50, writes: 150)
Op B: Deduct 30 credits  (reads balance: 50, writes: 20)
// Both operations race on the same balance
// Final balance: 20 or 150, depending on who commits last
// Correct answer: 120

Concurrent credits and deductions on the same wallet: the balance going up and down at the same time. It's the classic lost update problem, and it happens even with a single-writer queue when different event types modify the same resource.
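The interleaving above is easy to reproduce. This sketch uses an in-memory wallet and a setTimeout delay as stand-ins for MongoDB and network latency (assumptions). The read-then-write version loses one of the two updates. Applying each change as a delta against the value at write time, which is what $inc does on the server, keeps both.

```typescript
type WalletDoc = { balance: number };

const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Wrong: compute the new balance from an earlier read, then overwrite it
async function readThenWrite(wallet: WalletDoc, amount: number, latencyMs: number): Promise<void> {
  const read = wallet.balance;     // both operations read 50
  await delay(latencyMs);          // network / processing gap
  wallet.balance = read + amount;  // last writer wins
}

// Right: apply the delta to whatever the value is at write time
// (the in-process analogue of MongoDB's { $inc: { balance: amount } })
async function atomicIncrement(wallet: WalletDoc, amount: number, latencyMs: number): Promise<void> {
  await delay(latencyMs);
  wallet.balance += amount;        // no await between read and write
}

async function main(): Promise<[number, number]> {
  const racy: WalletDoc = { balance: 50 };
  await Promise.all([readThenWrite(racy, 100, 20), readThenWrite(racy, -30, 10)]);

  const safe: WalletDoc = { balance: 50 };
  await Promise.all([atomicIncrement(safe, 100, 20), atomicIncrement(safe, -30, 10)]);

  return [racy.balance, safe.balance];
}

main().then(([racy, safe]) => console.log(racy, safe)); // 150 120
```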

The naive fixes that failed:

Mutex locks (deadlocks, performance cliffs)
Optimistic locking (retry storms under load)
Read-then-write patterns (the race IS the read-then-write)

The actual fix: Atomic balance operations that never read before writing.

// Wrong - read then write
const balance = await getBalance(userId);
await setBalance(userId, balance + credits);

// Right - atomic increment
await Wallet.updateOne(
  { userId },
  { $inc: { balance: credits } }
);

// There is an even better solution that we implemented (next post 🤫)

But it gets more complex with validation (can't go negative), multi-currency, and audit trails.
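For the "can't go negative" part, one widely used pattern (not necessarily the one the next post describes) moves the check into the update's filter: Wallet.updateOne({ userId, balance: { $gte: amount } }, { $inc: { balance: -amount } }), treating matchedCount === 0 as insufficient funds. MongoDB applies a single-document update atomically, so no other operation can slip in between the check and the write. Below, the same logic is modeled in memory; debit is a hypothetical helper.

```typescript
type Wallet = { userId: string; balance: number };

// In-memory analogue of
//   Wallet.updateOne({ userId, balance: { $gte: amount } },
//                    { $inc: { balance: -amount } })
// The filter and the decrement are evaluated together, so nothing can
// run between the check and the write.
function debit(wallets: Map<string, Wallet>, userId: string, amount: number): { matchedCount: number } {
  const wallet = wallets.get(userId);
  if (!wallet || wallet.balance < amount) return { matchedCount: 0 }; // filter didn't match
  wallet.balance -= amount;
  return { matchedCount: 1 };
}

const wallets = new Map([['u1', { userId: 'u1', balance: 50 }]]);

const first = debit(wallets, 'u1', 30);  // 50 >= 30: matched, balance 20
const second = debit(wallets, 'u1', 30); // 20 < 30: no match, balance stays 20
console.log(first.matchedCount, second.matchedCount, wallets.get('u1')!.balance); // 1 0 20
```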

Next post: How to Fix Wallet Race Conditions: Atomic Operations Without Losing Your Audit Trail

How we made wallet operations atomic without sacrificing the ability to validate, audit, and roll back.

DEV Community