What Is an Idempotency Key? How to Implement Idempotent APIs and Webhooks
You ship a payment flow. The provider sends a webhook. Your app does the work, but the network drops before the 200 gets back. The provider retries. Your handler runs again. Now the customer gets two credits, two emails, or two paid seats. That is the bug idempotency is supposed to kill.
An idempotency key is not abstract HTTP trivia. It's a concrete way to say, "If this exact operation shows up again, treat it as the same request, not a new one." If you build APIs, webhooks, or payment integrations, you need that guarantee early, not after the first duplicate incident.
Key Takeaways
- An idempotency key is a stable identifier for one logical operation, not one transport attempt.
- The pattern needs three pieces: a stable key, a dedup store with TTL, and an atomic claim step before side effects.
- Verify webhook signatures first, then claim the key transactionally, then run the handler, then return `200`.
- Stripe's `Idempotency-Key` docs are the clearest industry reference point.
If you want the broader request-hardening context around signature checks and origin trust, start with API security layers including request authenticity and backend hardening.
What Is an Idempotency Key, Really?
Stripe's API docs define the pattern cleanly: clients send a unique key with a POST, Stripe stores the first result for that key, and later retries return the same outcome instead of doing the work again (Stripe, 2026). That is the practical definition you should keep in your head.
An idempotency key is a stable identifier for one logical mutation. The request may arrive one time or five times. The server should still apply the state change once.
That's why "same payload" is not always enough. Two identical-looking POST /charges requests might be accidental retries, or they might be two real purchases. The key tells the server which interpretation is correct.
For client-driven APIs, the key usually comes from the caller. For provider-driven webhooks, the key usually comes from the event ID or a deterministic identifier derived from the event payload. Either way, the job is the same: collapse retried delivery attempts into one side effect.
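To make the two key sources concrete, here is a minimal sketch. The `type` and `id` field names in the webhook payload are illustrative assumptions, not any specific provider's schema:

```python
import uuid

def client_idempotency_key() -> str:
    # Client-driven APIs: generate one key per logical operation (one checkout
    # tap, one "create order" intent) and reuse it verbatim for every retry.
    return str(uuid.uuid4())

def webhook_idempotency_key(event: dict) -> str:
    # Provider-driven webhooks: derive the key from the event itself, so every
    # delivery attempt of the same event maps to the same key.
    # "type" and "id" are assumed field names for illustration.
    return f"evt:{event['type']}:{event['id']}"

retry_1 = webhook_idempotency_key({"type": "order_created", "id": "123"})
retry_2 = webhook_idempotency_key({"type": "order_created", "id": "123"})
assert retry_1 == retry_2  # same logical event, same key, one side effect
```

The important property in both cases is that the key identifies the logical operation, so two transport attempts collapse to one side effect.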
Why Does Idempotency Matter in Production?
Lemon Squeezy retries failed webhook deliveries up to three more times with exponential backoff, using intervals such as 5 seconds, 25 seconds, and 125 seconds, until your endpoint returns 200 (Lemon Squeezy, 2026). That behavior is correct. Your system has to be correct too.
Here are the failures idempotency prevents:
- Duplicate charges or credits on webhook retry. Your first handler run succeeds, but the response is lost. The retry arrives and performs the same state change again.
- Double-created resources after client timeout. A mobile app times out waiting for `POST /orders`, retries, and now you've created two orders for one tap.
- Race conditions between workers. Two processes receive the same message near-simultaneously, both check "not seen yet," and both proceed.
- Replay of previously accepted payloads. Idempotency does not replace signature verification, but it does stop the same signed event from being accepted repeatedly during your retention window.
What catches teams is that these bugs are intermittent. You won't see them in a happy-path local demo. You'll see them when latency spikes, a load balancer retries, or the provider redelivers events under pressure. That's why the pattern matters.
Most duplicate-processing bugs are not caused by "bad providers." They come from an application that treats delivery attempts as business events. Those are different things.
For the database angle behind atomic claim logic, see ACID transactions and atomic check-then-write behavior.
Where Do You Actually Need Idempotency?
RFC 7231 defines GET, HEAD, OPTIONS, and TRACE as safe methods, and it defines safe methods plus PUT and DELETE as idempotent methods by HTTP semantics (RFC 7231). POST is not idempotent by default. That's where most real bugs live.
You usually need an explicit idempotency pattern in these places:
- `POST` endpoints that create orders, invoices, subscriptions, tasks, or users.
- Some `PATCH` endpoints when the caller may retry a mutation and you need "apply once" semantics, not "re-apply blindly."
- Webhook handlers, because providers generally use at-least-once delivery.
- Payment flows where duplicate execution has obvious customer impact.
- Background jobs and queue consumers because "at least once" delivery shows up there too.
You usually do not need a custom idempotency key for ordinary GET requests. They are already idempotent by HTTP semantics. But "idempotent" and "safe" are not the same thing. DELETE is idempotent because deleting the same resource twice leaves the server in the same end state, even though it is definitely not read-only.
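A toy sketch makes the safe-versus-idempotent distinction concrete. The resource paths and status codes here are illustrative, not any specific API:

```python
# Toy in-memory resource store to illustrate "idempotent but not safe".
store = {"users/42": {"name": "Ada"}}

def delete(path: str) -> int:
    # The first delete changes state, so DELETE is not safe. Repeating it
    # leaves the server in the same end state, so it is idempotent. Real APIs
    # often return 404 on the second call; that is fine, because idempotency
    # is about the resulting *state*, not identical responses.
    if store.pop(path, None) is not None:
        return 204
    return 404

first = delete("users/42")   # removes the resource
second = delete("users/42")  # no further state change
```

After both calls, the store is identical to what it was after the first call, which is exactly the RFC 7231 sense of idempotent.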
That distinction matters in API design. When developers say "make this endpoint idempotent," they usually mean "make retries safe for state changes," not "turn it into a safe method."
What Is the Core Pattern?
Firestore transactions run all reads before writes and automatically retry if a concurrently modified document invalidates the transaction's read set (Google Cloud, 2026). That is exactly the kind of atomic guard you want around duplicate suppression.
The pattern has three parts:
A stable unique key
For client APIs, that might be a caller-generated UUID. For webhooks, it should be a provider event ID if one exists. If the provider doesn't give you one, derive a deterministic key from fields that identify the logical event, not the raw transport attempt.

A dedup store with TTL
Store seen keys in a durable place such as Redis, Postgres, DynamoDB, or Firestore. Add a TTL so the store doesn't grow forever. Stripe notes that idempotency keys can be pruned after they are at least 24 hours old (Stripe, 2026). Your own window should match the provider's retry behavior plus your operational replay needs.

A transactional claim wrapper
Do an atomic "if key does not exist, create it and continue" step before the handler performs side effects. If the key already exists, you treat the request as a retry and short-circuit safely.
If you skip any one of those pieces, the pattern falls apart. A key without a durable store is forgotten on the next restart. A store without a TTL becomes a forever-growing audit table. A dedup check without a transaction is a race condition waiting to happen.
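The three pieces fit together in a few lines. This sketch uses SQLite as a stand-in for the dedup store because it ships with Python; in production you would use the equivalent atomic primitive in your real store, such as Redis `SET NX` or Postgres `INSERT ... ON CONFLICT DO NOTHING`. The table and column names are mine:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY, expires_at REAL)"
)

TTL_SECONDS = 7 * 24 * 3600  # retention window: retries + operational replay

def claim(key: str) -> bool:
    """Atomically claim a key: True on first sight, False on a retry."""
    now = time.time()
    with conn:  # one transaction around expiry cleanup + the claim itself
        conn.execute("DELETE FROM idempotency_keys WHERE expires_at < ?", (now,))
        cur = conn.execute(
            "INSERT OR IGNORE INTO idempotency_keys (key, expires_at) VALUES (?, ?)",
            (key, now + TTL_SECONDS),
        )
    # rowcount is 1 if the row was inserted, 0 if the key already existed
    return cur.rowcount == 1

assert claim("ls:order_created:123") is True    # first delivery: do the work
assert claim("ls:order_created:123") is False   # retry: short-circuit safely
```

The crucial detail is that "check and create" is one atomic statement, not a read followed by a separate write.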
What Does the Webhook Flow Look Like?
Lemon Squeezy signs webhook payloads with an HMAC-SHA256 digest in the X-Signature header, calculated from the raw body and your signing secret (Lemon Squeezy, 2026). That means signature verification is not optional ceremony. It is the first gate.
The sequence should look like this:
```mermaid
sequenceDiagram
    participant LS as Lemon Squeezy
    participant API as Your webhook endpoint
    participant FS as Firestore
    participant H as Business handler
    LS->>API: POST webhook + raw body + X-Signature
    API->>API: Verify HMAC signature
    API->>API: Build stable webhook_id
    API->>FS: Transactional claim(webhook_id)
    alt Already claimed
        FS-->>API: duplicate
        API-->>LS: 200 OK
    else New claim
        FS-->>API: claimed
        API->>H: Process event once
        H-->>API: success
        API->>FS: Mark processed
        API-->>LS: 200 OK
    end
```
The ordering matters more than the syntax:
- Verify the signature before touching your dedup store.
- Claim the key before side effects.
- Return `200` only when you are satisfied the event was already processed or has just been processed successfully.
If you need a bigger payments context around retry-heavy flows, see payment gateway tradeoffs and integration concerns for developers.
How Do You Implement It With Python and Firestore?
Firestore's Python client exposes a @firestore.transactional decorator, and the docs call out an important detail: transaction functions may run more than once when there is contention (Google Cloud Python docs, 2026). That means the transaction function should claim metadata only. Do not put external side effects inside it.
Here's a production-friendly pattern for Lemon Squeezy webhooks:
```python
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone

from google.cloud import firestore
from flask import Request

db = firestore.Client()
WEBHOOK_SECRET = "replace-me"
TTL_DAYS = 7


def verify_lemon_squeezy_signature(raw_body: bytes, signature: str, secret: str) -> None:
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature or ""):
        raise ValueError("invalid Lemon Squeezy signature")


def build_webhook_id(payload: dict) -> str:
    """
    Prefer a provider event id if one exists.
    Lemon Squeezy's basic webhook docs emphasize event name + resource payload,
    so this example derives a stable key from fields that identify the mutation.
    """
    event_name = payload["meta"]["event_name"]
    resource_id = payload["data"]["id"]
    updated_at = payload["data"]["attributes"].get("updated_at", "")
    return f"ls:{event_name}:{resource_id}:{updated_at}"


@firestore.transactional
def claim_webhook(
    transaction: firestore.Transaction,
    claim_ref,
    *,
    webhook_id: str,
    event_name: str,
    received_at: datetime,
    expires_at: datetime,
) -> bool:
    snapshot = claim_ref.get(transaction=transaction)
    if snapshot.exists:
        return False
    transaction.set(
        claim_ref,
        {
            "webhook_id": webhook_id,
            "event_name": event_name,
            "status": "processing",
            "received_at": received_at,
            "expires_at": expires_at,  # Firestore native TTL field
        },
    )
    return True


def mark_processed(claim_ref) -> None:
    claim_ref.update(
        {
            "status": "processed",
            "processed_at": datetime.now(timezone.utc),
        }
    )


def rollback_claim(claim_ref) -> None:
    # Simple recovery path: let provider retries try again on handler failure.
    # If your handler triggers irreversible external side effects, replace this
    # with a state machine or outbox pattern instead of deleting the claim.
    claim_ref.delete()


def handle_lemonsqueezy_event(payload: dict) -> None:
    event_name = payload["meta"]["event_name"]
    if event_name == "order_created":
        # Put your real business logic here:
        # - provision account access
        # - write billing records
        # - enqueue downstream jobs
        pass


def lemonsqueezy_webhook(request: Request):
    raw_body = request.get_data()
    signature = request.headers.get("X-Signature", "")

    # 1. Verify authenticity first.
    verify_lemon_squeezy_signature(raw_body, signature, WEBHOOK_SECRET)

    payload = json.loads(raw_body)
    event_name = payload["meta"]["event_name"]
    webhook_id = build_webhook_id(payload)

    # 2. Transactionally claim the event.
    now = datetime.now(timezone.utc)
    expires_at = now + timedelta(days=TTL_DAYS)
    claim_ref = db.collection("webhook_claims").document(webhook_id)
    claimed = claim_webhook(
        db.transaction(),
        claim_ref,
        webhook_id=webhook_id,
        event_name=event_name,
        received_at=now,
        expires_at=expires_at,
    )
    if not claimed:
        return {"status": "duplicate_ignored"}, 200

    # 3. Process once.
    try:
        handle_lemonsqueezy_event(payload)
    except Exception:
        rollback_claim(claim_ref)
        raise

    # 4. Mark success and acknowledge delivery.
    mark_processed(claim_ref)
    return {"status": "ok"}, 200
```
What should you notice here?
- Signature verification happens first. You do not want unsigned garbage burning write capacity in your dedup collection.
- The transaction only claims metadata. Firestore may retry the transaction function, so keep it free of side effects.
- The business handler runs after claim. That is what makes the whole flow idempotent.
- `expires_at` is a native TTL field. Firestore TTL policies automatically delete expired documents, though Google notes deletion is typically within 24 hours after expiration and is not instantaneous (Google Cloud, 2026).
In a real webhook system, the hardest part is rarely "how do I hash a key?" It's deciding what counts as the same business event when the provider doesn't hand you a perfect event ID.
What Mistakes Break the Pattern?
Stripe saves the first result associated with an idempotency key and replays that result for later retries of the same request (Stripe, 2026). That only works because the idempotency decision happens before the mutation, not after it.
The most common failures are predictable:
Checking outside the transaction
Two workers both read "missing," then both write. That's the classic race.

Checking after processing
If the email, charge, or provisioning step already happened, the dedup check is too late.

Processing before signature verification
This lets forged requests pollute your dedup store and maybe trigger business logic.

Using a TTL shorter than the retry window
If the key expires while the provider can still retry, you have recreated the duplicate bug you thought you solved.

Using unstable keys
Timestamps that change per delivery attempt, random UUIDs generated server-side for webhooks, or payload hashes over fields that legitimately vary between retries will all break deduplication.

Assuming TTL deletion is instant
Firestore TTL is automatic, but not synchronous. Expired documents can still appear before cleanup completes.
If you only remember one rule, remember this: idempotency is an ordering problem first and a storage problem second.
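The unstable-key mistake is easy to demonstrate. In this sketch, `delivery_id` and `sent_at` are assumed per-attempt metadata fields, and the payload shape is illustrative:

```python
import hashlib
import json

def unstable_key(delivery: dict) -> str:
    # BAD: hashes the whole delivery attempt, including per-attempt fields
    # like delivery_id and sent_at, so every retry produces a different key.
    raw = json.dumps(delivery, sort_keys=True).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()

def stable_key(delivery: dict) -> str:
    # GOOD: derives the key only from fields that identify the logical event.
    event = delivery["event"]
    return f"{event['name']}:{event['order_id']}"

attempt_1 = {"delivery_id": "d-1", "sent_at": "t1",
             "event": {"name": "order_created", "order_id": "42"}}
attempt_2 = {"delivery_id": "d-2", "sent_at": "t2",
             "event": {"name": "order_created", "order_id": "42"}}

assert unstable_key(attempt_1) != unstable_key(attempt_2)  # retry looks "new"
assert stable_key(attempt_1) == stable_key(attempt_2)      # retry deduplicates
```

Both attempts carry the same business event, but only the stable key recognizes them as duplicates.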
Why Is Stripe the Canonical Reference?
Stripe's docs are still the best public explanation of the pattern because they make the contract explicit: the client provides an Idempotency-Key, Stripe saves the first result, and identical retries get the same result back (Stripe, 2026). That is the industry-standard mental model.
Your implementation does not need to copy Stripe feature-for-feature. You probably won't store full response bodies for every internal webhook. But Stripe gets the core idea exactly right:
- one logical mutation
- one stable key
- one stored first result or processing record
- safe retries after transport failure
For external provider webhooks, you often cannot demand a header like Stripe's. So you adapt the same idea to the provider's event model instead.
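On the client side of a Stripe-style API, the contract means generating the key once per logical operation and resending it unchanged on every retry. Here is a sketch with an injected `post` callable standing in for the real HTTP client; the header name follows Stripe's convention, and your API may use a different one:

```python
import uuid

def create_order_with_retries(post, body: dict, max_attempts: int = 3):
    # One key per logical operation, reused across every transport attempt.
    key = str(uuid.uuid4())
    last_error = None
    for _ in range(max_attempts):
        try:
            return post("/orders", body, headers={"Idempotency-Key": key})
        except TimeoutError as exc:
            last_error = exc  # retry with the SAME key, not a fresh one
    raise last_error

# Fake transport for illustration: first attempt times out, second succeeds.
seen_keys = []
attempts = {"n": 0}

def fake_post(path, body, headers):
    seen_keys.append(headers["Idempotency-Key"])
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise TimeoutError("response lost")
    return {"status": "created"}

result = create_order_with_retries(fake_post, {"sku": "pro-plan"})
assert result == {"status": "created"}
assert len(set(seen_keys)) == 1  # both attempts carried the same key
```

Generating a fresh key inside the retry loop is the classic client-side mistake: it turns every retry back into a new logical operation.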
What Should You Think About in Production?
Google's Firestore TTL docs note that expired documents are usually deleted within 24 hours after expiration, not at the exact expiration timestamp (Google Cloud, 2026). That small implementation detail affects real operating decisions.
Here is the production checklist I care about most:
Pick TTL based on retry reality, not gut feel.
Lemon Squeezy's documented retry window is short, but operationally you may still want several days of retention for manual redelivery, replay debugging, and lagging downstream jobs. Seven days is a practical default. For client-supplied idempotency keys, 24 hours is a reasonable minimum because Stripe explicitly documents pruning after that point.

Alert on duplicate-claim spikes.
A sudden jump in duplicate claims usually means an upstream timeout, a slow handler, or a networking issue. Hookdeck's webhook metrics guidance recommends watching delivery success rate, retry rate, and average attempts per event because these are early stress signals (Hookdeck, 2026).

Shard the dedup collection if it gets hot.
If all writes land in one narrow keyspace, you can create hot ranges. Prefix the document ID with a short hash or date bucket when volume gets high, for example `3f/ls:order_created:12345:...`.

Separate "claimed" from "processed" when the workflow gets more complex.
The example above is enough for many webhook handlers. For irreversible side effects or long-running work, move to a small state machine or outbox pattern instead of relying on a simple claim document.

Log the business identifier with the claim.
Store event name, provider object ID, and maybe tenant/store ID. When duplicates spike, you want to answer "which events?" immediately.
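The hot-keyspace advice can be sketched as a tiny helper that derives a short hash bucket for the document ID. The two-hex-character prefix length is an assumption; tune it to your write volume:

```python
import hashlib

def sharded_doc_id(webhook_id: str, prefix_len: int = 2) -> str:
    # Spread writes across the keyspace by prefixing a short, stable hash
    # bucket, so sequential event IDs don't all land in one hot range.
    bucket = hashlib.sha256(webhook_id.encode("utf-8")).hexdigest()[:prefix_len]
    return f"{bucket}/{webhook_id}"

doc_id = sharded_doc_id("ls:order_created:12345:2026-01-01")
assert doc_id.endswith("ls:order_created:12345:2026-01-01")
assert sharded_doc_id("x") == sharded_doc_id("x")  # deterministic per key
```

Because the prefix is derived from the key itself, lookups stay a single document read: recompute the prefix, then fetch.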
For infrastructure context, see tradeoffs when choosing backend infrastructure and managed data stores.
Frequently Asked Questions
What is idempotency?
Idempotency means repeating the same request should leave the server in the same end state as running it once. RFC 7231 defines idempotent HTTP methods as methods whose intended effect is unchanged by multiple identical requests, even if response details differ (RFC 7231).
Are GET requests idempotent?
Yes. GET is both safe and idempotent under RFC 7231 because it is defined as read-only from the client's perspective (RFC 7231). You normally do not need a custom idempotency key for ordinary GET requests.
What is the difference between idempotent and safe methods?
Safe methods are read-only by intent. Idempotent methods can change state, but doing them multiple times has the same intended effect as doing them once. DELETE is the classic example: it changes state, so it is not safe, but deleting the same resource twice is still idempotent by HTTP semantics (RFC 7231).
The Practical Rule to Keep
If a request or event can be retried, it must have a stable identity. If it changes state, that identity must be claimed atomically before the side effect runs. Everything else is implementation detail.
That is the idempotency pattern in one sentence.
When you implement it, keep the order brutally simple: verify authenticity, derive the stable key, claim it transactionally, run the handler once, then acknowledge success. If you do that consistently, duplicate charges, duplicate provisioning, and webhook replay bugs get much harder to ship.
Related reading:

- API security layers that complement webhook signature verification
- database transaction fundamentals behind atomic claim logic
- payment integration tradeoffs for developers shipping billing systems