A friend once messaged me, mildly furious, with a screenshot from his bank: the same $500 payment, twice, two seconds apart. He'd tapped "Pay," the app spun for a moment, nothing happened, so he tapped again. Except the first tap had gone through - the response just never made it back to his phone. The server happily charged him a second time.
Nobody wrote a bug for this. Every individual piece worked exactly as designed. The payment endpoint did its job, twice, because from its point of view it received two perfectly valid requests. The problem lives in the space between the client and the server, and that space is where a huge category of production incidents hides.
This article is about the pattern that fixes it: idempotency keys. We'll build up from the failure, write a real implementation in Python (FastAPI + SQLAlchemy + Postgres), and - like its sibling, the transactional outbox - we'll spend most of our time on the four hard questions that the tidy "just store the key" tutorials skip.
The villain: retries are not optional
Here's the uncomfortable truth that makes idempotency necessary: on a network, you cannot tell the difference between "my request failed" and "my request succeeded but the response got lost." From the client's side, both look identical - you sent something, you waited, you got nothing back.
So clients retry. They should retry - it's the only sane response to a timeout. Mobile apps retry on flaky connections. Load balancers and API gateways retry on 502s. Message consumers redeliver. SDKs ship with automatic retries built in. Impatient users double-tap buttons. Retries are a permanent fact of distributed life, and the only honest delivery guarantee the network gives you is at-least-once.
Which means duplicate requests aren't an edge case you can validate away. They are guaranteed to arrive. The server's job is not to prevent them - it can't - but to absorb them: to make a repeated request harmless. An operation you can safely apply many times with the same result as applying it once is called idempotent. GET, PUT, and DELETE are idempotent by their HTTP definitions. POST - "create a payment," "place an order" - is the dangerous one, and it's exactly the one that moves money.
The idea: a client-supplied key that means "this is the same operation"
The fix is wonderfully simple in concept. The client generates a unique key for each logical operation - one key per "I want to pay $500 for order #42," reused across every retry of that same intent - and sends it as an HTTP header:
    POST /payments HTTP/1.1
    Idempotency-Key: 5f5c1b0e-0b8a-4c7e-9d6a-2c1f3e4a5b6c
    Content-Type: application/json

    { "order_id": "42", "amount_paise": 50000 }
The server makes a promise in return: for a given key, I will execute the operation at most once. If I see the same key again, I will replay the response I already produced - without doing the work a second time.
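On the client side, the contract boils down to "generate the key once, outside the retry loop." Here is a minimal sketch of that, assuming a hypothetical `send` callable that performs the HTTP call and raises `TimeoutError` when no response comes back. Neither `send` nor `pay_with_retries` is taken from a real SDK.

```python
import uuid
from typing import Callable

Headers = dict[str, str]


def pay_with_retries(
    send: Callable[[Headers, dict], dict],
    payload: dict,
    attempts: int = 3,
) -> dict:
    """Retry one logical payment, reusing a single idempotency key."""
    # Generated once per intent, NOT once per attempt.
    headers = {
        "Idempotency-Key": str(uuid.uuid4()),
        "Content-Type": "application/json",
    }
    last_error = None
    for _ in range(attempts):
        try:
            return send(headers, payload)
        except TimeoutError as exc:
            # We cannot know whether the server did the work,
            # so the retry carries the same key.
            last_error = exc
    raise RuntimeError("payment outcome unknown after retries") from last_error
```

If the key were generated inside the loop instead, every retry would look like a brand-new operation to the server, and the pattern would protect nothing.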
This isn't a niche trick. Stripe has done it for years, and it's being standardized by the IETF as the Idempotency-Key HTTP header field (a Standards-Track draft, currently -07). If you've used a payments API, you've used this pattern from the outside.
The happy path has two halves: the first request does the work, and the second replays the answer.
Easy, right? It is, until you try to make it correct. The simplest implementation is a check-then-act:
    # DON'T ship this - it's a race waiting to happen.
    async def handle(key, request):
        existing = await store.get(key)
        if existing:
            return existing.response  # replay
        response = await do_the_work(request)
        await store.save(key, response)  # record
        return response
This is broken in at least three ways, and each one is a question worth answering properly.
1. What exactly do you store and where do you intercept?
When you "replay the response," what is the response? Not the return value of your handler function - the actual HTTP response: its status code, a selected set of headers, and the body, byte for byte. A replay that returns 200 when the original returned 201, or drops the Location header, is a subtly broken replay that will confuse clients.
So you must capture and persist something like this:
    from typing import TypedDict

    class StoredResponse(TypedDict):
        status_code: int
        headers: list[tuple[str, str]]  # an allow-list, not everything
        body_b64: str                   # base64 of the raw bytes
A crucial detail: don't replay every header. Store an allow-list (content-type, location, etag, …). Headers like Date, Server, or anything connection-specific should be regenerated fresh, not served from a week-old cache. Replaying them is how you ship weird bugs.
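To make the allow-list concrete, here is a small sketch of the capture and replay halves working on plain values. The names `REPLAYABLE_HEADERS`, `to_stored`, and `from_stored` are hypothetical, and the three headers in the set are only the examples named above, not a recommended final list.

```python
import base64
from typing import TypedDict

# Assumed allow-list; extend it to whatever your API actually promises.
REPLAYABLE_HEADERS = frozenset({"content-type", "location", "etag"})


class StoredResponse(TypedDict):
    status_code: int
    headers: list[tuple[str, str]]
    body_b64: str


def to_stored(
    status_code: int, headers: list[tuple[str, str]], body: bytes
) -> StoredResponse:
    """Keep the status, the allow-listed headers, and the exact body bytes."""
    kept = [(k, v) for k, v in headers if k.lower() in REPLAYABLE_HEADERS]
    return StoredResponse(
        status_code=status_code,
        headers=kept,
        body_b64=base64.b64encode(body).decode("ascii"),
    )


def from_stored(
    stored: StoredResponse,
) -> tuple[int, list[tuple[str, str]], bytes]:
    """Rebuild the parts needed to emit a faithful replay."""
    body = base64.b64decode(stored["body_b64"])
    return stored["status_code"], stored["headers"], body
```

Base64 keeps the body safe to store in a JSON column even when it is not valid UTF-8, at the cost of roughly a third more space.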
This requirement also answers where the logic lives, and it's a question people get wrong. There are three candidate layers:
A decorator on the handler - but a decorator sees the function's return value, not the serialized HTTP response. It can't faithfully capture the status and headers FastAPI is about to generate.
A dependency (Depends(...)) - runs before the handler returns, so it can detect a replay and short-circuit, but it can't cleanly emit a full stored response without forcing your handlers into an unnatural shape.
ASGI middleware - wraps the entire request/response lifecycle and owns the final response object end to end. This is the only layer that can both intercept early (to replay) and capture late (to store).
So the right home is middleware, with an optional per-route marker to opt routes in:
    from starlette.middleware.base import BaseHTTPMiddleware
    from starlette.responses import JSONResponse

    # fingerprint(), replay() and capture() are helpers defined elsewhere.
    class IdempotencyMiddleware(BaseHTTPMiddleware):
        def __init__(self, app, store, methods=("POST", "PATCH"), ttl=86_400):
            super().__init__(app)
            self.store = store
            self.methods = set(methods)
            self.ttl = ttl

        async def dispatch(self, request, call_next):
            if request.method not in self.methods:
                return await call_next(request)
            key = request.headers.get("Idempotency-Key")
            if key is None:
                return await call_next(request)  # opt-in: no key, no magic

            body = await request.body()
            fp = fingerprint(request.method, request.url.path, body)
            outcome = await self.store.claim(key, fp, self.ttl)
            if outcome.kind == "conflict":
                return JSONResponse(
                    {"error": "idempotency key reuse"}, status_code=409
                )
            if outcome.kind == "replay":
                return replay(outcome.stored)

            # outcome.kind == "fresh": we own this key - execute once.
            response = await call_next(request)
            # call_next returns a streaming response, so capture() has to
            # buffer the body without exhausting it for the client.
            await self.store.complete(key, capture(response))
            return response
(There's a real-world wrinkle: reading request.body() in middleware consumes the stream, so you must make it re-readable downstream. It's a few lines of plumbing I'm eliding here to keep the shape clear - but if you implement this, that's the bit that'll bite you first.)
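One common shape for that plumbing, if you drop down to a plain ASGI middleware, is to buffer the body and hand the downstream app a `receive` callable that replays it. The sketch below relies only on the ASGI `http.request` message format. `read_body` and `replaying_receive` are hypothetical helper names, and this is not necessarily how the elided code does it.

```python
from typing import Awaitable, Callable

Message = dict
Receive = Callable[[], Awaitable[Message]]


async def read_body(receive: Receive) -> bytes:
    """Drain the ASGI request stream into a single bytes object."""
    chunks = []
    while True:
        message = await receive()
        if message["type"] != "http.request":
            break  # e.g. http.disconnect
        chunks.append(message.get("body", b""))
        if not message.get("more_body", False):
            break
    return b"".join(chunks)


def replaying_receive(body: bytes, receive: Receive) -> Receive:
    """Return a receive() that yields the buffered body once, then defers upstream."""
    replayed = False

    async def wrapped() -> Message:
        nonlocal replayed
        if not replayed:
            replayed = True
            return {"type": "http.request", "body": body, "more_body": False}
        return await receive()

    return wrapped
```

The middleware would call `read_body(receive)` to compute the fingerprint, then pass `replaying_receive(body, receive)` to the wrapped app in place of the original `receive`.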
2. What if someone reuses a key with a different request?
Keys are client-generated, and clients have bugs. Sooner or later a buggy client will send key k1 for a $500 payment and then reuse k1 for a $900 payment. If you blindly replay, you'll return the $500 receipt for a $900 charge - or worse, silently swallow the second payment. Both are terrible.
The defense is a fingerprint: a hash of the request's identity - method, path, and canonicalized body. You store it alongside the key, and on every hit you compare:
    import hashlib

    def fingerprint(method: str, path: str, body: bytes) -> str:
        h = hashlib.sha256()
        h.update(method.encode())
        h.update(b"\x00")
        h.update(path.encode())
        h.update(b"\x00")
        h.update(body)
        return h.hexdigest()
The rule becomes:
Same key, same fingerprint, completed → replay the stored response. (A genuine retry.)
Same key, different fingerprint → 409 Conflict. The client reused a key for a different operation; that's a bug on their side and you refuse to guess.
Same key, same fingerprint, still in flight → wait (see Question 3).
That 409 is you being a good API citizen: you'd rather loudly reject an ambiguous request than quietly do the wrong thing with someone's money. (In practice you canonicalize the body before hashing - sort JSON keys, normalize whitespace - so that semantically identical payloads don't produce different fingerprints over trivial formatting differences.)
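The canonicalization step could look something like the sketch below. It assumes JSON bodies and falls back to the raw bytes for anything else. `canonical_body` and `canonical_fingerprint` are hypothetical names, and re-serializing with sorted keys is only one of several reasonable canonical forms.

```python
import hashlib
import json


def canonical_body(body: bytes) -> bytes:
    """Re-serialize JSON with sorted keys and no insignificant whitespace."""
    try:
        parsed = json.loads(body)
    except ValueError:
        return body  # not JSON (or empty): fall back to the raw bytes
    canonical = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
    return canonical.encode("utf-8")


def canonical_fingerprint(method: str, path: str, body: bytes) -> str:
    """Hash method, path, and canonical body, separated by NUL bytes."""
    h = hashlib.sha256()
    for part in (method.upper().encode(), path.encode(), canonical_body(body)):
        h.update(part)
        h.update(b"\x00")
    return h.hexdigest()
```

With this in place, a retry whose JSON library happens to emit keys in a different order still matches the original request, while a changed amount still produces a different fingerprint.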
3. What happens when duplicates arrive at the same time?
This is the question that separates a toy from a tool, and it's the one the check-then-act code fails hardest. Picture not a slow retry seconds later, but fifty identical requests landing in the same instant - a stampede from an aggressive client SDK, or a gateway that fanned out a retry. Every one of them runs store.get(key), every one finds nothing recorded yet (because none has finished), and every one proceeds to execute. Fifty charges.
The race is structural: "check if it exists" and "record that it exists" are two steps, and concurrency loves a gap between two steps. To close it you need a single-flight lock: a per-key lock that guarantees exactly one request executes while the rest wait and then replay.
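The gap is easy to reproduce in a single process. The sketch below fires fifty concurrent duplicates at a check-then-act handler, once bare and once behind a per-key `asyncio.Lock`. The in-process lock is only a stand-in for a real cross-process lock, and `run_stampede` is a hypothetical name for the demo.

```python
import asyncio
from collections import defaultdict


async def run_stampede(use_lock: bool, n: int = 50) -> int:
    """Send n duplicates of one key; return how many times the work ran."""
    store: dict[str, str] = {}
    locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
    executions = 0

    async def do_the_work() -> str:
        nonlocal executions
        executions += 1
        await asyncio.sleep(0.01)  # simulate a slow payment call
        return "charged"

    async def check_then_act(key: str) -> str:
        if key in store:
            return store[key]  # replay
        response = await do_the_work()
        store[key] = response  # record
        return response

    async def handle(key: str) -> str:
        if not use_lock:
            return await check_then_act(key)
        async with locks[key]:  # single-flight: one executor per key
            return await check_then_act(key)

    await asyncio.gather(*(handle("k1") for _ in range(n)))
    return executions


unlocked = asyncio.run(run_stampede(use_lock=False))
locked = asyncio.run(run_stampede(use_lock=True))
```

Without the lock every duplicate passes the check before any of them records a result, so the work runs once per request. With the lock, the first request executes and the rest find the stored response when their turn comes.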
On Postgres, the cleanest tool for this is an advisory lock keyed on the idempotency key. You take it at the start, hold it until the transaction commits, and the database serializes everyone competing for that key:
    from datetime import datetime, timedelta, timezone

    from sqlalchemy import text

    async def claim(session, key: str, fp: str, ttl: int) -> Outcome:
        # Serialize everyone using this key until we commit/rollback.
        await session.execute(
            text("SELECT pg_advisory_xact_lock(hashtext(:k))"), {"k": key}
        )
        row = await session.get(IdempotencyKey, key)
        if row is None:
            # expires_at is NOT NULL in the table, so derive it from the TTL.
            expires_at = datetime.now(timezone.utc) + timedelta(seconds=ttl)
            session.add(IdempotencyKey(key=key, fingerprint=fp,
                                       status="in_flight",
                                       expires_at=expires_at))
            return Outcome(kind="fresh")
        if row.fingerprint != fp:
            return Outcome(kind="conflict")  # → 409
        if row.status == "completed":
            return Outcome(kind="replay", stored=row.response)
        # in_flight, but WE now hold the lock - the previous owner is gone.
        # Its work was never recorded, so this request retries it safely.
        return Outcome(kind="fresh")
Why an advisory lock and not a row lock? Because on the very first request there's no row to lock yet - and reaching for a row lock creates an insert race precisely where you can least afford one. An advisory lock lets you serialize on a key that may not exist yet. The honest trade-off: holding pg_advisory_xact_lock for the whole handler means holding a transaction open for the whole handler, which you don't want for slow operations. For those, a short-lived lock plus an in_flight state column is the variation - more bookkeeping, shorter locks.
Redis is the popular alternative for speed: SET key value NX PX <ttl> is an atomic "acquire lock if absent," and a small Lua check-and-delete releases it. It's faster, but read Question 4 before you reach for it on a payments path.
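For reference, the Redis variant could be sketched as below. It assumes `client` is a redis-py style client, i.e. one whose `set(name, value, nx=True, px=...)` returns a truthy value only when the key was absent, and whose `eval(script, numkeys, *keys_and_args)` runs a Lua script. The `idem-lock:` prefix and the function names are hypothetical.

```python
import uuid
from typing import Optional

# Delete the lock only if it still holds OUR token (atomic check-and-delete).
RELEASE_SCRIPT = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
end
return 0
"""


def acquire(client, key: str, ttl_ms: int) -> Optional[str]:
    """Try to take the per-key lock; return our token on success, else None."""
    token = uuid.uuid4().hex
    # SET ... NX PX: succeeds only if the key is absent, and expires by itself.
    if client.set(f"idem-lock:{key}", token, nx=True, px=ttl_ms):
        return token
    return None


def release(client, key: str, token: str) -> bool:
    """Release the lock only if we still own it."""
    return client.eval(RELEASE_SCRIPT, 1, f"idem-lock:{key}", token) == 1
```

The random token is what makes the release safe: if our lock expired and someone else acquired it, the script sees a different value and leaves their lock alone.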
4. What if the request holding the lock crashes?
Here's the failure that turns a clever idea into a 3am outage: Request A acquires the lock, starts executing, and the process dies mid-flight - OOM, deploy, pod eviction. The lock is held. Every retry of that key now blocks forever on a lock owned by a process that no longer exists. The key is wedged, and the customer can never complete their payment.
A correct implementation makes a dead lock-holder release its lock automatically. How depends on the backend:
Postgres advisory locks are owned by the database session. If the connection drops - which is exactly what happens when your process dies - Postgres releases the lock automatically. There's no timeout to tune and no reaper job to write; crash safety is a property of the mechanism. The key simply becomes claimable again, and the next retry re-executes (the original work was never recorded, so re-executing is correct).
Redis SET NX PX relies on the PX expiry: a dead holder's lock evaporates after the TTL. This is where Redis gets subtle. If your handler runs longer than the lock TTL while still alive, the lock can expire out from under a living request, and a second request can start - a double execution. The classic distributed-lock caveat. You mitigate by setting the TTL safely above your handler timeout, but you cannot make it airtight without fencing tokens.
So my honest guidance, the kind I'd give in a design review: use Postgres for payment-critical idempotency - its lock gives you a strong, crash-safe single-flight guarantee. Use Redis when you want speed and can tolerate best-effort single-flight (it's still a massive improvement over nothing). Don't let "Redis is faster" talk you into weaker correctness on the path that moves money.
Putting it together: the state machine
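Step back and the whole thing is a small, legible state machine per key: a key starts out unclaimed, becomes in_flight when a request claims it, becomes completed once the response is stored, and disappears again when its TTL expires.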
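One way to write that state machine down is as a transition table. This is a sketch of my reading of the sections above, not a piece of the implementation: "absent" stands for "no row exists for the key," and the event names are labels chosen for illustration.

```python
# (state, event) -> next state. "absent" means no row exists for the key.
TRANSITIONS = {
    ("absent", "claim"): "in_flight",        # first request takes ownership
    ("in_flight", "complete"): "completed",  # response captured and stored
    ("in_flight", "crash"): "absent",        # owner died; key is claimable again
    ("completed", "expire"): "absent",       # TTL sweep removes the row
}


def step(state: str, event: str) -> str:
    """Apply one event, rejecting transitions the pattern does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} from {state!r}") from None
```

Note what is missing from the table: there is no way out of "completed" except expiry. A duplicate that arrives for a completed key is a read (a replay), not a transition.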
And the storage is a single table:
    from datetime import datetime

    from sqlalchemy import DateTime, String, func
    from sqlalchemy.dialects.postgresql import JSONB
    from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

    class Base(DeclarativeBase):
        pass

    class IdempotencyKey(Base):
        __tablename__ = "idempotency_key"

        key: Mapped[str] = mapped_column(String, primary_key=True)
        fingerprint: Mapped[str] = mapped_column(String)
        status: Mapped[str] = mapped_column(String)  # in_flight | completed
        response: Mapped[dict | None] = mapped_column(JSONB, nullable=True)
        created_at: Mapped[datetime] = mapped_column(
            DateTime(timezone=True), server_default=func.now()
        )
        expires_at: Mapped[datetime] = mapped_column(DateTime(timezone=True))
That expires_at matters more than it looks. Idempotency keys are not forever: you keep them for a TTL (24 hours is a common choice, and roughly what Stripe uses) and then let them expire, so the table doesn't grow without bound and keys can eventually be reused. A periodic sweep deletes rows past expires_at. The TTL is a deliberate product decision: long enough to cover realistic retries, short enough to bound storage.
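The sweep itself is a single DELETE. The sketch below uses the standard library's sqlite3 as a stand-in for Postgres so that it runs anywhere. The two-column table and the `sweep_expired` helper are simplifications for illustration; a real job would run the same statement against the full table on a schedule.

```python
import sqlite3
import time


def sweep_expired(conn: sqlite3.Connection, now: float) -> int:
    """Delete rows whose expires_at has passed; return how many were removed."""
    cur = conn.execute(
        "DELETE FROM idempotency_key WHERE expires_at <= ?", (now,)
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE idempotency_key (key TEXT PRIMARY KEY, expires_at REAL NOT NULL)"
)
now = time.time()
conn.executemany(
    "INSERT INTO idempotency_key VALUES (?, ?)",
    [("old", now - 1), ("live", now + 86_400)],
)
conn.commit()

removed = sweep_expired(conn, now)
remaining = [row[0] for row in conn.execute("SELECT key FROM idempotency_key")]
```

An index on expires_at keeps this cheap once the table is large.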
What idempotency keys do not give you
Every honest pattern comes with a boundary, and naming it is what separates engineering from cargo-culting.
Idempotency keys make a specific request safe to repeat. They do not make your business logic idempotent in general. If two different keys both create a payment for the same order, idempotency keys won't stop you - that's a job for a unique constraint on (order_id) in your domain, a different layer of defense. Keys protect against retries of the same intent, not against distinct requests with the same effect.
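That second layer of defense might look like the following. Again sqlite3 stands in for Postgres, and the `payment` table and `create_payment` helper are made up for the example. The point is only that the constraint rejects a second payment for the same order no matter which idempotency key the request carried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment ("
    " id INTEGER PRIMARY KEY,"
    " order_id TEXT NOT NULL UNIQUE,"  # the domain-level guard
    " amount_paise INTEGER NOT NULL)"
)


def create_payment(order_id: str, amount_paise: int) -> bool:
    """Insert a payment; return False if this order already has one."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO payment (order_id, amount_paise) VALUES (?, ?)",
                (order_id, amount_paise),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = create_payment("42", 50_000)   # request carrying key A
second = create_payment("42", 50_000)  # different key B, same order
```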
They also don't help idempotent-by-definition methods. A GET doesn't need a key; a well-designed PUT (full replacement) is already idempotent. Reserve the machinery for the genuinely unsafe verbs - POST, sometimes PATCH - where re-execution causes harm. Putting an idempotency layer in front of everything is overhead with no payoff.
And one more piece of honesty, echoing the outbox pattern's "exactly-once is a lie": idempotency keys give you at-most-once execution per key, plus faithful replay - which, combined with client retries, feels like exactly-once from the outside. That's the strongest truthful promise. The duplicate request still arrives; you've just made it harmless. That's the whole game in this corner of distributed systems - you can't stop the duplicate, so you make it not matter.
The one-paragraph version
Clients will retry, because the network can't tell a lost response from a failed request, so duplicate requests are guaranteed, not hypothetical. Let clients send an Idempotency-Key header, and promise to execute each key at most once and replay the stored response thereafter. Store the real HTTP response (status, allow-listed headers, body) so replays are faithful. Fingerprint the request and return 409 if a key is reused for a different body. Use a per-key single-flight lock - a Postgres advisory lock for the strong, crash-safe guarantee - so a stampede of simultaneous duplicates produces exactly one execution. Expire keys on a TTL. Do that, and the bug that charged my friend $1000 for a $500 payment simply stops being possible.
This is the inbound cousin of the transactional outbox pattern - together they cover both directions of "do this exactly once" in a payments system. I write about backend reliability and the unglamorous distributed-systems details that only surface in production.



