DEV Community

Gabriel Anhaia
Idempotency at Scale: The Pattern That Prevents Double-Charging


A user taps "Pay $48.20" on flaky hotel Wi-Fi. The request leaves the phone, hits the API, the server charges the card, the response packet drops on the way back. The phone sees a timeout, the user taps again. Two charges land. Customer support spends 14 minutes refunding one. Multiply by every retry on every request a payments system handles in a day.

Idempotency is the load-bearing pattern that keeps that user from being double-charged, and the same pattern that keeps a write-heavy distributed system from creating duplicate orders, duplicate emails, or duplicate outbox jobs. The shape of the fix is well-understood. The details (where the key comes from, how long it lives, what you store, the intent vs result contract) are where most implementations are subtly wrong.

The contract, in one paragraph

A client generates a unique key per logical request (typically a UUIDv4) and sends it in a header (Idempotency-Key). On first arrival the server executes the operation and persists the result keyed by that key. On subsequent arrivals within a TTL window, the server returns the stored result instead of executing again. If the request body differs from the original under the same key, the server rejects with 422.

That paragraph contains four decisions, each with a wrong answer most teams reach for first.

Where keys come from

The key has to come from the client, not the server. A server-generated key is useless: by the time you've talked to the server to get one, you might already be on the second attempt.

Stripe's API uses an Idempotency-Key header and recommends UUIDv4. The constraint is high entropy. 16 bytes random is enough; a 4-character string is not. A 4-character alphanumeric key has roughly 24 bits of entropy, around 16 million possible values; by the birthday bound, a busy endpoint hits a collision in the low thousands of requests, silently merging unrelated operations.
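The birthday arithmetic is easy to check. A quick sketch (the helper function is just for illustration, using the standard approximation for collision probability):

```python
import math

def birthday_collision_requests(keyspace: int, probability: float = 0.5) -> int:
    """Approximate request count before a key collision is more likely than
    `probability`, via the birthday approximation n ~ sqrt(2N ln(1/(1-p)))."""
    return math.ceil(math.sqrt(2 * keyspace * math.log(1 / (1 - probability))))

alnum_4 = 62 ** 4    # 4-char [A-Za-z0-9] key: ~14.8 million values
uuid_v4 = 2 ** 122   # UUIDv4: 122 random bits

print(birthday_collision_requests(alnum_4))   # a few thousand requests
print(birthday_collision_requests(uuid_v4))   # ~2.7e18: effectively never
```

A busy endpoint burns through a few thousand requests in minutes, which is why the 4-character key fails silently rather than loudly.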

Two client patterns work.

One key per user intent. The user taps "Pay": generate a UUID, attach it to the request, retry as many times as needed with the same key until you get a response. New tap = new UUID.

Hash-of-payload keys. Derive the key from the request body plus a client-side request ID. Useful when you don't control retry logic. Trickier because float serialization differences across SDK versions have caused real outages.

Pick the first; payload-hash keys turn every SDK serialization tweak into a potential outage you can't reproduce locally.
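A minimal sketch of the first pattern, with a fake transport standing in for the real HTTP client (`pay_once` and `send` are illustrative names, not any real SDK):

```python
import uuid

def pay_once(amount_cents: int, send, max_attempts: int = 5):
    """One key per user intent: mint the UUID once, reuse it on every retry.
    `send` is a stand-in for the HTTP client (hypothetical signature)."""
    key = str(uuid.uuid4())          # a new tap calls pay_once again => new key
    last_error = None
    for _ in range(max_attempts):
        try:
            return send({"amount": amount_cents},
                        headers={"Idempotency-Key": key})
        except TimeoutError as exc:  # outcome unknown: retry with the SAME key
            last_error = exc
    raise last_error

# Simulated flaky transport: the first attempt times out after the server
# has already processed it -- the hotel Wi-Fi scenario from the intro.
seen_keys = []
def flaky_send(body, headers):
    seen_keys.append(headers["Idempotency-Key"])
    if len(seen_keys) == 1:
        raise TimeoutError("response packet dropped")
    return {"status": "ok"}

pay_once(4820, flaky_send)
print(len(seen_keys), len(set(seen_keys)))  # 2 attempts, 1 distinct key
```

The server sees two requests with one key, which is exactly what the dedup table is built to absorb.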

How long keys live

Stripe's documented TTL is 24 hours; keys are eligible for removal after that. That's a good default.

Twenty-four hours is longer than any realistic client retry budget. Mobile clients give up after minutes. Server-to-server retries with exponential backoff cap out at hours. Twenty-four hours covers the offline-buffered case.

It's short enough that the dedup table doesn't grow unboundedly. At 1,000 ops per second, 24 hours is 86 million rows; the unique-key index lands in the 4–6 GB range and stays resident on a 32 GB node.
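Those numbers are easy to sanity-check (the bytes-per-entry figure is a rough estimate for a UUID-keyed btree, not a measurement):

```python
OPS_PER_SECOND = 1_000
TTL_HOURS = 24
BYTES_PER_INDEX_ENTRY = 60   # rough: 36-char UUID key plus btree overhead

rows = OPS_PER_SECOND * TTL_HOURS * 3600
index_gb = rows * BYTES_PER_INDEX_ENTRY / 1024**3
print(rows)                 # 86_400_000
print(round(index_gb, 1))   # ~4.8
```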

The rule for non-default TTLs: longer than the longest client retry, shorter than your storage budget. Don't be clever with per-endpoint TTLs unless you have a reason. The cost of explaining the variance to the next engineer outweighs any gain.

The intent vs result contract

This is the place most implementations are wrong.

Naive idempotency stores only the result: response body and status code. On a duplicate, return the stored response. Done.

The bug: what does the server do when the duplicate arrives while the original is still in flight? Two requests 50ms apart, both miss the cache, both start processing. You've executed twice.

The fix: store intent before executing. Insert a row with status IN_PROGRESS before doing the work. A unique constraint on the key means the second request gets a duplicate-key error and waits for the in-progress row. When work completes, update the row with the final result.

Two states minimum: IN_PROGRESS and COMPLETED. Some systems add FAILED. The state column makes this safe under concurrency.
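The insert-before-execute ordering is the whole fix. An in-memory sketch of the same state machine, with a dict and a lock standing in for the Postgres table and its unique constraint:

```python
import threading

class InMemoryIdempotency:
    """Toy version of the dedup table. The atomic claim() plays the role of
    the unique-constraint insert; a real system uses Postgres (below)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}  # key -> {"status": ..., "result": ...}

    def claim(self, key: str) -> bool:
        """Atomically insert an IN_PROGRESS row; False if the key exists."""
        with self._lock:
            if key in self._rows:
                return False
            self._rows[key] = {"status": "IN_PROGRESS", "result": None}
            return True

    def complete(self, key: str, result) -> None:
        self._rows[key] = {"status": "COMPLETED", "result": result}

store = InMemoryIdempotency()
executions = []

def handle(key: str) -> None:
    if not store.claim(key):   # duplicate: another request holds the intent row
        return
    executions.append(key)     # the side effect runs exactly once
    store.complete(key, "charge_123")

threads = [threading.Thread(target=handle, args=("k1",)) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(len(executions))  # 1
```

Ten concurrent duplicates, one execution. A check-then-act version without the atomic claim would let several threads through the gap between the read and the write.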

Brandur Leach's writeup at Stripe goes deeper on recovery: what to do when IN_PROGRESS rows go stale because the worker crashed. A janitor process re-checks rows older than a threshold and either resumes or marks them FAILED.

What counts as "the result" matters. For a payment, the result is the charge ID and the processor status. Not the HTTP response verbatim, which might include timestamps and request IDs that change between calls. Store durable identifiers and reconstruct the response on replay.
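A sketch of that separation, with illustrative field names (`served_at` and `request_id` stand for the volatile parts regenerated on every replay; nothing here is a real payment API):

```python
import datetime
import uuid

def stored_result(charge_id: str, processor_status: str) -> dict:
    """What goes into response_body: durable identifiers only."""
    return {"charge_id": charge_id, "processor_status": processor_status}

def render_response(result: dict) -> dict:
    """Rebuild the HTTP response on replay; timestamps and request IDs
    are regenerated per call, never stored."""
    return {
        **result,
        "served_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "request_id": str(uuid.uuid4()),
    }

saved = stored_result("ch_9f2", "succeeded")
first, second = render_response(saved), render_response(saved)
print(first["charge_id"] == second["charge_id"])    # True: durable part replays
print(first["request_id"] == second["request_id"])  # False: volatile part fresh
```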

What table to use

Postgres works. Redis works for short-TTL workloads. DynamoDB works with conditional writes. The choice mostly comes down to what's already in your stack.

Postgres has two advantages. You can colocate the dedup table with the actual write and wrap both in a single transaction, atomic from the client's perspective. And the unique constraint is solid; no reasoning about eventual consistency.

Redis has speed. If the idempotency layer sits in front of an external API call (not your own database), Redis with a TTL is simpler. Tradeoff: Redis durability is configurable, and a SET ... NX EX that claims the key followed by a crash before the result is written leaves the key set with no stored result, blocking retries until the TTL expires.

If your service already writes to Postgres, keep the dedup table there; the rest of this post assumes that.

A 60-line Postgres-backed idempotency layer

This handles the four cases that matter: first request, in-progress duplicate, completed duplicate, and changed-payload conflict.

CREATE TABLE idempotency_keys (
    key             TEXT PRIMARY KEY,
    request_hash    BYTEA NOT NULL,
    status          TEXT NOT NULL CHECK (status IN ('IN_PROGRESS','COMPLETED','FAILED')),
    response_body   JSONB,
    response_status INT,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    completed_at    TIMESTAMPTZ
);

CREATE INDEX idempotency_keys_created_at_idx
    ON idempotency_keys (created_at);
import hashlib
import json
import time
import psycopg
from typing import Callable, Any

TTL_HOURS = 24

def _hash(payload: dict) -> bytes:
    canonical = json.dumps(payload, sort_keys=True).encode()
    return hashlib.blake2b(canonical, digest_size=32).digest()


def idempotent_call(
    conn: psycopg.Connection,
    key: str,
    payload: dict,
    do_work: Callable[[dict], tuple[int, dict]],
) -> tuple[int, dict]:
    req_hash = _hash(payload)

    with conn.transaction():
        cur = conn.execute(
            """
            INSERT INTO idempotency_keys
                (key, request_hash, status)
            VALUES (%s, %s, 'IN_PROGRESS')
            ON CONFLICT (key) DO NOTHING
            RETURNING xmax = 0 AS inserted
            """,
            (key, req_hash),
        )
        row = cur.fetchone()
        inserted = row is not None and row[0]

    if not inserted:
        return _replay(conn, key, req_hash)

    try:
        status, body = do_work(payload)
    except Exception:
        # psycopg connections are not autocommit by default; the transaction
        # block makes sure the FAILED marker commits even as we re-raise.
        with conn.transaction():
            conn.execute(
                """
                UPDATE idempotency_keys
                SET status='FAILED', completed_at=now()
                WHERE key=%s
                """,
                (key,),
            )
        raise

    with conn.transaction():
        conn.execute(
            """
            UPDATE idempotency_keys
            SET status='COMPLETED',
                response_body=%s,
                response_status=%s,
                completed_at=now()
            WHERE key=%s
            """,
            (json.dumps(body), status, key),
        )
    return status, body

The replay path is its own helper because the inline path needs to fall through to it on conflict, and a janitor job will eventually call the same logic when resuming stale rows.

def _replay(
    conn: psycopg.Connection, key: str, req_hash: bytes
) -> tuple[int, dict]:
    deadline = time.time() + 5.0
    while time.time() < deadline:
        # Under READ COMMITTED each statement takes a fresh snapshot, so the
        # loop sees the writer's COMPLETED update as soon as it commits.
        row = conn.execute(
            """
            SELECT status, request_hash,
                   response_status, response_body
            FROM idempotency_keys
            WHERE key=%s
            """,
            (key,),
        ).fetchone()
        if row is None:
            raise RuntimeError("idempotency row vanished")
        status, stored_hash, resp_status, resp_body = row
        if stored_hash != req_hash:
            return 422, {"error": "idempotency_key_reuse"}
        if status == "COMPLETED":
            return resp_status, resp_body
        if status == "FAILED":
            return 500, {"error": "previous_attempt_failed"}
        time.sleep(0.1)
    return 504, {"error": "in_progress_timeout"}

The RETURNING clause on INSERT ... ON CONFLICT DO NOTHING produces a row only when a row was actually inserted, so fetchone() returning None already means the key existed. The xmax = 0 check is belt and braces: it is the standard way to tell "inserted" from "already existed" if the statement ever changes to ON CONFLICT ... DO UPDATE, where a row comes back in both cases.

The replay path polls for completion with a 5-second budget. In a real system you'd lift this into a background queue and have the client poll a status endpoint, but for an inline implementation polling is simpler and the typical wait is sub-second.

A janitor job runs daily:

DELETE FROM idempotency_keys
WHERE created_at < now() - INTERVAL '24 hours';

Run it as a pg_cron job or a sidecar. Don't skip it; the created_at index makes this fast, but a year of unpruned rows will eventually slow the unique-constraint check on inserts.

What this does not solve

Two cases this layer doesn't handle.

Cross-service idempotency. This protects one service. If the operation triggers downstream services with their own side effects, each downstream call needs its own idempotency key derived from the original. The outbox pattern with deterministic message keys is the standard answer.
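One way to derive those downstream keys deterministically, so a retry of the original operation regenerates the same keys for every downstream step (the derivation scheme and step names are illustrative, not a standard):

```python
import hashlib

def downstream_key(original_key: str, step: str) -> str:
    """Derive a per-step idempotency key from the original request's key.
    Same original key + same step name => same derived key, so the
    downstream service's own dedup layer absorbs retries too."""
    digest = hashlib.blake2b(
        f"{original_key}:{step}".encode(), digest_size=16
    ).hexdigest()
    return f"{step}-{digest}"

k1 = downstream_key("a1b2c3", "send-receipt")
k2 = downstream_key("a1b2c3", "send-receipt")
k3 = downstream_key("a1b2c3", "reserve-room")
print(k1 == k2, k1 == k3)  # True False
```

Random per-call keys downstream would defeat the purpose: a replay of the original request would fan out fresh keys and execute the side effects again.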

Long-running operations. A 30-second payment that crashes at second 25 leaves an IN_PROGRESS row and an unknown processor state. Recovery is to query the processor by your idempotency key (Stripe accepts the same key for this) and reconcile. Don't retry blindly.

What this buys you

A working idempotency layer makes retries a no-op. The client can be aggressive (every network blip becomes a retry) and the server stays consistent. Clients get simpler, queues get simpler, customer support stops getting tickets that begin with "I was charged twice."

The pattern is small. The discipline is applying it consistently across every mutating endpoint, every outbox publisher, every webhook delivery, and keeping the contract honest about what's stored and how long it lives.

If this was useful

The System Design Pocket Guide: Fundamentals covers idempotency alongside the patterns it pairs with: exactly-once messaging, the outbox, the saga, retry budgets, at the depth where you can defend the choices in a design review. The Event-Driven Architecture Pocket Guide goes further on the messaging side: how idempotency keys propagate across queues and consumers, and the traps that cause exactly-once to silently regress to at-least-once.

System Design Pocket Guide: Fundamentals

Event-Driven Architecture Pocket Guide
