137Foundry

Why Your Webhook Endpoint Keeps Getting Duplicate Events (And How to Fix It)

You checked the logs. The order was fulfilled. The confirmation email was sent. And yet the webhook provider is showing another retry in the delivery history, and now you have two fulfilled orders for one payment. The logic ran twice, and you have no idea when it started happening or how many times it has happened before.

Duplicate webhook event processing is one of those problems that sits unnoticed in production for weeks or months, then surfaces suddenly as a customer complaint or a billing discrepancy. Understanding exactly why it happens makes it straightforward to prevent.

Why Duplicate Events Happen

Webhook providers retry delivery when your endpoint does not return a successful response within their timeout window. The provider does not know whether the timeout happened because your server was down, because your processing was slow, or because your handler successfully ran but crashed before sending the response. From their perspective, the delivery failed. So they try again.

The specific scenario that causes most duplicate processing is this: your endpoint starts processing a payment event, completes the order fulfillment logic, and then your process crashes, restarts, or hits an uncaught exception before executing res.status(200).send(). The event processing succeeded. The response never went out. The provider retries. You process it again.

This is not a provider bug. Retry-on-failure is a core design feature of reliable event delivery systems. The problem is on your side: your endpoint is processing events without any protection against running the same logic twice.


The Two-Part Fix

Solving duplicate events requires two things working together.

1. Return 202 Before You Process Anything

The first change is structural: your endpoint should not process the event synchronously at all. It should validate the incoming request, store the raw payload, and return HTTP 202 immediately. Processing happens in a background worker after the acknowledgment is sent.

This eliminates the window where a processing success can be paired with a failed response. The endpoint stores the event and returns. The provider receives its 202 and stops retrying. A worker handles the actual logic independently.

```javascript
app.post('/webhooks/orders', async (req, res) => {
  // Reject anything that does not carry a valid provider signature
  if (!verifySignature(req)) {
    return res.status(401).end();
  }

  // Store the raw event for a background worker to pick up
  await queue.push({
    id: req.body.id,
    type: req.body.type,
    payload: req.body,
    received_at: new Date()
  });

  res.status(202).end(); // Respond before processing
});
```

Even with this change, the provider might still deliver the same event twice in some edge cases (concurrent retries, network partitions). The 202 pattern reduces the frequency significantly but does not eliminate duplicates entirely. That is where idempotency comes in.
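The other half of this pattern is the worker that drains the stored events. A minimal in-process sketch, assuming the `queue.push` above feeds something like Python's `queue.Queue` (in production this would typically be Redis, SQS, or a database table, and `handle_event` stands in for the real business logic):

```python
import queue
import threading

event_queue = queue.Queue()
processed = []

def handle_event(event):
    # Placeholder for the real business logic (fulfillment, emails, ...)
    processed.append(event["id"])

def worker():
    # Drains events independently of the HTTP request/response cycle
    while True:
        event = event_queue.get()
        if event is None:  # Sentinel value shuts the worker down
            break
        try:
            handle_event(event)
        finally:
            event_queue.task_done()

# The webhook endpoint only enqueues and returns 202; processing runs here
threading.Thread(target=worker, daemon=True).start()
```

Because the worker never touches the HTTP response, a crash mid-processing can no longer turn a successful run into a provider retry.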

2. Implement Idempotency Using the Event ID

Every webhook provider includes a unique event ID in the payload. Use it. Before your background worker processes any event, check whether that event ID already exists in a processed_events table. If it does, skip it. If it does not, process the event and record the ID.

```python
def process_event(event_id, event_type, payload):
    # Skip if already processed
    existing = db.query(
        "SELECT 1 FROM processed_events WHERE event_id = %s",
        (event_id,)
    ).fetchone()

    if existing:
        return  # Silently skip duplicate

    # Run business logic
    execute_business_logic(event_type, payload)

    # Record completion after success
    db.execute(
        "INSERT INTO processed_events (event_id, processed_at) VALUES (%s, NOW())",
        (event_id,)
    )
    db.commit()
```

Record the event ID after the business logic succeeds, not before. If you record it first and processing fails, the event is permanently skipped on future retries.
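One way to make this ordering robust is to run the business-logic writes and the processed_events insert in the same database transaction, so they commit or roll back together. A sketch using SQLite for a runnable illustration (`fulfill_order` and the `fulfillments` table are hypothetical stand-ins for your real business write):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY, processed_at TEXT)")
db.execute("CREATE TABLE fulfillments (order_id TEXT)")

def fulfill_order(conn, payload):
    # Hypothetical business write; runs inside the caller's transaction
    conn.execute("INSERT INTO fulfillments (order_id) VALUES (?)", (payload["order_id"],))

def process_event(event_id, payload):
    existing = db.execute(
        "SELECT 1 FROM processed_events WHERE event_id = ?", (event_id,)
    ).fetchone()
    if existing:
        return  # Duplicate delivery: skip

    try:
        fulfill_order(db, payload)
        db.execute(
            "INSERT INTO processed_events (event_id, processed_at) VALUES (?, datetime('now'))",
            (event_id,),
        )
        db.commit()  # Business write and event record commit together
    except Exception:
        db.rollback()  # Neither is recorded, so a future retry can reprocess
        raise
```

If processing fails, the rollback discards both writes, and the next retry sees an unprocessed event.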

The Most Common Implementation Mistake

Teams that implement idempotency checks often still see occasional duplicates. The most common cause is a race condition in the check-then-insert sequence: two worker processes both query for the event ID, both find nothing, both proceed to process, and both succeed before either inserts the record.

The fix is to make the check and the claim atomic. In PostgreSQL, INSERT ... ON CONFLICT DO NOTHING RETURNING event_id does this in a single operation. If the row already exists, the insert is silently skipped and RETURNING returns nothing, which your code treats as a duplicate signal. Two processes executing this simultaneously will have exactly one succeed and one skip.

This atomic approach is safer than any version of "check, then decide" because the database engine enforces the uniqueness constraint at the storage level, regardless of how many concurrent workers are running.
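A sketch of the atomic claim, using SQLite so it runs anywhere; `INSERT OR IGNORE` plays the role of PostgreSQL's `INSERT ... ON CONFLICT DO NOTHING` (in Postgres you would check the `RETURNING` result instead of `rowcount`). Deleting the claim on failure is one way to keep failed events retryable while still claiming before processing:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed_events (event_id TEXT PRIMARY KEY)")

def claim_event(event_id):
    # Atomic check-and-claim: of any concurrent callers, exactly one inserts.
    # SQLite's INSERT OR IGNORE mirrors PostgreSQL's
    # INSERT ... ON CONFLICT (event_id) DO NOTHING.
    cur = db.execute(
        "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
        (event_id,),
    )
    db.commit()
    return cur.rowcount == 1  # True only if this caller won the claim

def process_event(event_id, payload):
    if not claim_event(event_id):
        return "skipped"  # Another worker already claimed this event
    try:
        # ... run business logic here ...
        return "processed"
    except Exception:
        # Release the claim so a later retry can reprocess the event
        db.execute("DELETE FROM processed_events WHERE event_id = ?", (event_id,))
        db.commit()
        raise
```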


"Every duplicate event problem I have seen traces back to the same thing: the endpoint was doing too much before it returned a response. Responding fast and processing asynchronously eliminates the window where duplicates happen." - Dennis Traina, 137Foundry


Why This Problem Hides for So Long

Duplicate event processing is easy to miss because the system usually looks correct from the outside. The order was fulfilled - that part worked. The second fulfillment might succeed silently, or it might fail gracefully because the record already exists, or it might produce a visible error that gets swallowed by error handling. None of these outcomes necessarily produce an obvious alert.

The places to look for this problem proactively are your provider's delivery dashboard (do any events show multiple successful deliveries?) and any tables where your webhook logic writes data (are there duplicate records with identical payload IDs?). For payment integrations specifically, reconciling webhook-triggered transactions against payment provider records catches duplicates that application logs miss.
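Checking your own tables for this comes down to a simple aggregate query; a sketch against SQLite, assuming a hypothetical fulfillments table that stores the webhook payload ID alongside each record (adjust names to your schema):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE fulfillments (order_id TEXT, payload_id TEXT)")
# Seed data: one clean order, and one event processed twice
db.executemany(
    "INSERT INTO fulfillments (order_id, payload_id) VALUES (?, ?)",
    [("ord_1", "evt_1"), ("ord_2", "evt_2"), ("ord_2b", "evt_2")],
)

# Any payload ID appearing more than once indicates duplicate processing
duplicates = db.execute(
    """
    SELECT payload_id, COUNT(*) AS deliveries
    FROM fulfillments
    GROUP BY payload_id
    HAVING COUNT(*) > 1
    """
).fetchall()
```

Running this periodically, or as part of reconciliation, catches duplicates that never raised a visible error.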

For a complete walkthrough of webhook reliability patterns, including async processing, idempotency, dead letter queues, and monitoring lag, the guide on building webhook integrations that handle failures gracefully covers the full architecture. 137Foundry works with engineering teams on exactly these integration challenges, particularly for systems where duplicates have financial or operational consequences.


Testing Your Idempotency Implementation

Once you have idempotency in place, it is worth verifying it works correctly before you rely on it in production. The test is straightforward: trigger a real event from your provider's dashboard, let your handler process it, then use your provider's replay feature or a tool like Postman to resend the exact same event to your endpoint.

Your handler should return 200 or 202 on the second delivery without running any business logic. Check that no duplicate records were created in your database and that the idempotency table contains exactly one entry for the event ID. If your duplicate check works correctly, the second delivery is a no-op at the business logic level.

Also test the race condition path: send two deliveries of the same event in rapid succession before either has been processed. Both requests should resolve cleanly, with exactly one processed and one skipped. If you are using a read-then-write check rather than an atomic insert, this test will expose the race condition - you may see both requests proceed to processing. An atomic INSERT ... ON CONFLICT eliminates this window entirely.

A Quick Checklist Before Your Next Webhook Goes Live

Before any webhook integration reaches production, verify these three things:

  1. Your endpoint returns 202 (or 200) immediately without waiting for processing to complete
  2. Your processing logic checks the event ID before running any business logic
  3. Your idempotency insert uses an atomic operation (ON CONFLICT or equivalent) rather than a separate read-then-write

These three changes transform a webhook handler from brittle to resilient. They take less than a day to implement correctly, and they prevent a category of production incidents that is genuinely difficult to debug after the fact.
