A user clicks "Place Order" on a flaky cellular connection. The request times out at five seconds. The retry library on the client kicks in and sends the same request again. The retry succeeds. The order goes through.
The next morning the user's credit card statement shows two charges. The customer support ticket is open by the time engineering walks in.
This is the canonical idempotency bug, and it shows up in every retry-prone system that does not design for it. The fix is idempotency keys, and getting the design right requires a few specific properties most teams figure out the hard way.
This guide walks through how to design idempotency keys that hold up across retries, deduplication windows, and the edge cases that real systems produce.
Why At-Least-Once Delivery Is the Norm
The shape of the bug is general. Any system where the client retries a write request, and the server cannot tell duplicate retries from new attempts, can apply the same write twice. The conditions that trigger it include:
- Network timeouts where the request succeeded server-side but the client never got the response
- Connection drops mid-response
- Load balancer failovers that retry the request transparently
- Client-side retry libraries that retry on any 5xx
- Application-level retry logic for transient failures
Without idempotency keys, the server has no way to tell that the second request is a retry rather than a new action. The default in most HTTP stacks is to treat every POST as a new write.
Idempotency keys solve this by making the request itself uniquely identifiable. The client generates a unique key per logical action. The server stores the key with the result of the action. If the key arrives again, the server returns the cached result rather than reapplying the action.
Step 1: Generate the Key on the Client at Action Time
The idempotency key has to be generated once per logical action, not once per HTTP request. If the client retries, the same key gets sent on every retry. If the user clicks "Place Order" twice on purpose, each click gets a different key.
The cleanest place to generate the key is at the moment the user takes the action, before the first request is sent. A UUID is sufficient. UUIDv4 is the right default; UUIDv7 (time-ordered) is acceptable if you want chronological sorting in logs.
Do not generate the key from the request body. A hash of the body fails in both directions: it produces a different key when the body legitimately changes between retries (timestamps, signatures, etc.), and it produces the same key when the user deliberately performs two identical actions, so the second action is wrongly treated as a retry.
Do not generate the key on the server. The server cannot tell which requests came from the same client-side action and which came from different ones.
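// Generate the key once, when the user takes the action (not per HTTP attempt).
const idempotencyKey = crypto.randomUUID();

// Every attempt of this action, including retries, sends the same key.
await fetch('/api/orders', {
  method: 'POST',
  headers: {
    'Idempotency-Key': idempotencyKey,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify(order),
});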
The retry library should reuse the same key on every attempt of the same action.
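One way to arrange this is to create the key together with the action and hand both to the retry loop. The following is a minimal sketch, not the API of any particular retry library: `createAction`, `sendWithRetry`, and the injected `send` function (for example, a thin wrapper around `fetch`) are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper: bundles the payload with a key created once per
// logical action. The key parameter is injectable to make testing easy.
function createAction(payload, key = crypto.randomUUID()) {
  return { key, payload };
}

// Hypothetical retry loop. `send` is an assumed function that performs one
// HTTP attempt and resolves to a response-like object with a `status` field.
async function sendWithRetry(action, send, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // The same action.key goes out on every attempt.
      const response = await send({
        headers: { 'Idempotency-Key': action.key },
        body: JSON.stringify(action.payload),
      });
      if (response.status < 500) return response; // success or non-retryable
      lastError = new Error(`server error ${response.status}`);
    } catch (err) {
      lastError = err; // network failure or timeout: retry with the same key
    }
  }
  throw lastError;
}
```

A second deliberate click on "Place Order" would call `createAction` again and get a fresh key. Backoff between attempts is left out for brevity.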
Step 2: Store the Key Server-Side With the Result
When the server receives a request with an idempotency key, it does the following:
- Look up the key in a store (Redis, DynamoDB, Postgres).
- If the key exists with a "completed" status, return the stored result.
- If the key exists with an "in flight" status, return a 409 or wait for the in-flight request to complete.
- If the key does not exist, write it as "in flight," process the request, and update the key with the result.
The store needs to support atomic check-and-set so that two simultaneous retries cannot both find the key missing and start processing. Postgres can do this with INSERT ... ON CONFLICT DO NOTHING. Redis can do it with SET NX. DynamoDB can do it with a conditional PutItem.
The "in flight" state matters. Without it, two retries arriving within milliseconds of each other can both start processing the same action. With it, the second retry sees the in-flight key and either waits or returns an error.
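The lookup flow above can be sketched with an in-memory `Map` standing in for the real store. This is an illustration only: `handleRequest` and the record shape are assumptions, and the check-then-set here is atomic only because JavaScript runs it synchronously on a single thread. A production system would rely on the store's own atomic primitive, as described above.

```javascript
// In-memory stand-in for Redis, DynamoDB, or Postgres (assumption for the sketch).
const store = new Map();

// Hypothetical handler. `processAction` is an assumed synchronous function
// that performs the write and returns the response body.
function handleRequest(key, processAction) {
  const existing = store.get(key);
  if (existing && existing.status === 'completed') {
    return existing.response; // replay the stored response, do not reprocess
  }
  if (existing && existing.status === 'in_flight') {
    return { status: 409, body: { error: 'request with this key is still in progress' } };
  }
  // Claim the key before doing any work so a concurrent retry sees "in flight".
  store.set(key, { status: 'in_flight' });
  try {
    const response = { status: 201, body: processAction() };
    store.set(key, { status: 'completed', response });
    return response;
  } catch (err) {
    store.delete(key); // one possible policy: release the claim on failure
    throw err;
  }
}
```

Calling `handleRequest` twice with the same key runs `processAction` once and returns the same response both times.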
Step 3: Decide the Storage Window
Idempotency keys cannot be stored forever. The storage layer would grow without bound. The right window is 24 hours for most use cases, longer for high-consequence actions.
The window has to be longer than:
- The maximum retry duration the client will perform
- Any plausible delay between user retries (a user who navigates back and resubmits the form)
- The downstream system's batch processing window (if applicable)
For payment-related actions, 24 hours is a common choice. Stripe's idempotency documentation, for example, says keys become eligible for removal once they are at least 24 hours old.
For high-consequence actions like account deletion or large data exports, 72 hours or longer is reasonable.
After the window, the key expires and can be reused. This is correct: a request from a year later with the same key is a different action, not a retry of the original.
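The expiry logic can be sketched as follows. Many stores can expire keys natively (Redis, for example, accepts an expiry on `SET`), so this explicit version is only meant to show the behavior. `createKeyStore` and the injectable `now` clock are hypothetical names, and the 24-hour default mirrors the window suggested above.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: builds a key store whose records expire after `ttlMs`.
// `now` is injectable so the expiry logic can be exercised without waiting.
function createKeyStore(ttlMs = DAY_MS, now = () => Date.now()) {
  const records = new Map();
  return {
    save(key, result) {
      records.set(key, { result, expiresAt: now() + ttlMs });
    },
    lookup(key) {
      const record = records.get(key);
      if (!record) return undefined;
      if (now() >= record.expiresAt) {
        records.delete(key); // past the window: treat the key as never seen
        return undefined;
      }
      return record.result;
    },
  };
}
```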
Step 4: Match the Key Scope to the Action's Identity
A common mistake is scoping the idempotency key globally. Two different users generating the same UUID (vanishingly unlikely with UUIDv4, but conceptually possible) should not share keys. The key store should be scoped per user or per account.
The right scoping is to make the lookup key the combination of the user identifier and the idempotency key. (user_id, idempotency_key) uniquely identifies an action. The same UUID from two different users represents two different actions.
For unauthenticated requests, scope by the client's API credential or by IP address, with the caveat that IP-based scoping fails behind NAT and proxies. For most authenticated APIs, the user identifier is the right scoping dimension.
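A small sketch of the composite lookup key, assuming a hypothetical `scopedKey` helper and an in-memory `Map` as the store:

```javascript
// Hypothetical helper: builds the lookup key from the scope and the client key.
// JSON-encoding the pair avoids collisions that plain string concatenation
// could produce when either part contains the separator character.
function scopedKey(userId, idempotencyKey) {
  return JSON.stringify([userId, idempotencyKey]);
}

// The same client key from two users maps to two separate records.
const results = new Map();
results.set(scopedKey('user-1', 'abc'), { orderId: 1 });
results.set(scopedKey('user-2', 'abc'), { orderId: 2 });
```

In a relational store the same idea would be a composite unique constraint on the two columns.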
Step 5: Handle the Edge Cases That Real Systems Produce
Three edge cases show up often enough to plan for.
The first is the "key reuse with different body" case. The client sends a request with key K and body B1. Some time later, the client sends another request with the same key K but body B2. The server should reject the second request with a 422 because the key was already used for a different body. Otherwise an attacker (or a buggy client) could overwrite an existing action.
The fix is to store a hash of the request body alongside the key. On lookup, the server compares the new request's body hash to the stored hash. If they differ, return an error.
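A minimal sketch of that comparison. `sha256Hex` uses the Web Crypto API, which is available as a global in browsers and recent Node.js versions. `checkKeyReuse` and the `bodyHash` field are hypothetical names for illustration.

```javascript
// SHA-256 of the raw request body as a hex string, via the Web Crypto API.
async function sha256Hex(text) {
  const bytes = new TextEncoder().encode(text);
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('');
}

// Hypothetical check run when a key is already in the store. `record.bodyHash`
// is the hash saved with the first request that used this key.
function checkKeyReuse(record, incomingBodyHash) {
  if (record.bodyHash !== incomingBodyHash) {
    return {
      status: 422,
      body: { error: 'idempotency key reused with a different request body' },
    };
  }
  return null; // same body: safe to treat as a retry
}
```

If retries can legitimately differ in volatile fields such as timestamps, it may be safer to hash only the fields that define the action rather than the raw body.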
The second is the "in flight" case where the original request never completes. A crash mid-processing leaves the key in "in flight" state forever, and every retry sees it and either waits or errors. The fix is a timeout on the in-flight state: after N seconds with no result, treat the key as expired and allow a retry to proceed (or roll back any partial work first, depending on the action).
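The in-flight timeout can be sketched as a lease on the claim. `tryClaim`, the `startedAt` field, and the 30-second default are assumptions chosen for illustration; the right value of N depends on how long the action can legitimately run.

```javascript
// Hypothetical claim function with a lease on the in-flight state.
// Returns true if the caller may process the action, false otherwise.
function tryClaim(store, key, nowMs, leaseMs = 30000) {
  const record = store.get(key);
  if (record && record.status === 'completed') return false;
  if (record && record.status === 'in_flight' && nowMs - record.startedAt < leaseMs) {
    return false; // the original attempt may still be running
  }
  // No record, or the in-flight lease has lapsed: take (or take over) the claim.
  store.set(key, { status: 'in_flight', startedAt: nowMs });
  return true;
}
```

Taking over a lapsed claim is only safe if any partial work from the crashed attempt is rolled back first, or if the action is harmless to repeat.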
The third is the "long-running action" case where the action takes longer than the typical retry window. A retry arrives before the original completes. The server has to either wait, return a "still processing" response with a polling token, or process the retry and trust that the result will be consistent. For most actions, "still processing" with a polling token is the safest pattern.
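A sketch of the "still processing" response, under stated assumptions: `respondToKnownKey` is a hypothetical helper, `record.pollToken` is assumed to be created when the action is first claimed, and `/api/operations/` is an illustrative status endpoint. Using 202 here is one option; returning a 409 for in-flight keys is another.

```javascript
// Hypothetical response builder for a request whose key is already known.
function respondToKnownKey(record) {
  if (record.status === 'completed') {
    return record.response; // replay the stored response
  }
  // Still running: tell the client where to poll instead of reprocessing.
  return {
    status: 202,
    headers: { 'Retry-After': '2' },
    body: {
      state: 'processing',
      statusUrl: `/api/operations/${record.pollToken}`,
    },
  };
}
```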
Step 6: Surface the Key to the User When Helpful
For high-consequence actions, the idempotency key can be surfaced to the user as a transaction reference. "Your order has been placed. Reference: 7d3f-92ab-1c4e." If the user calls support unsure whether the order went through, the reference number lets support look up the action regardless of how many retries happened.
This is not strictly necessary, but it costs almost nothing and makes support lookups much easier.
How This Pairs With Optimistic UI
Idempotency keys are the server-side complement to optimistic UI on the client. The client applies the action optimistically; the request goes out with an idempotency key; the network can retry, the load balancer can retry, the application-level retry library can retry, all using the same key; the server sees one logical action no matter how many requests arrive.
The longer guide on how to implement optimistic UI updates that roll back cleanly at 137Foundry covers the client-side state model that pairs with idempotency keys. The key insight is that the client's optimistic queue stores the idempotency key alongside the apply and revert functions, so any retry of the same logical action uses the same key.
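A minimal sketch of such a queue, assuming a hypothetical `createOptimisticQueue` helper; it is not taken from the referenced guide, and the method names are illustrative.

```javascript
// Hypothetical optimistic queue. Each entry keeps the idempotency key next to
// the functions that apply and revert the local change.
function createOptimisticQueue() {
  const pending = new Map(); // idempotency key -> { apply, revert }
  return {
    enqueue(key, apply, revert) {
      apply(); // show the change immediately
      pending.set(key, { apply, revert });
      return key; // the caller sends this key on every attempt
    },
    confirm(key) {
      pending.delete(key); // server accepted: keep the local change
    },
    fail(key) {
      const entry = pending.get(key);
      if (entry) {
        entry.revert(); // server rejected or retries exhausted: undo it
        pending.delete(key);
      }
    },
  };
}
```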
For more on related backend patterns and the kind of frontend-backend coordination that holds up in production, the 137Foundry data integration service and the 137Foundry web development service both handle this kind of work.
Common Mistakes to Avoid
Three mistakes show up often enough to call out.
The first is generating a new idempotency key on every retry. If the retry library generates a new key each time, the deduplication is defeated and the server treats every retry as a new write. The key has to be generated once per logical action and reused on every retry.
The second is storing only the key without the result. If the server stores only "yes this key was used" without the response, the client cannot get the original response on a retry. The retry has to either reprocess (defeating the purpose) or return a generic "duplicate" error (frustrating for the client). Store the result.
The third is not having an expiration policy. Keys without expiration accumulate forever. The store grows, lookups slow down, and eventually the system has to be migrated. Decide the expiration policy when the key store is first provisioned, not as an afterthought.
Where to Read More
For more on the client-side state model that pairs with idempotency keys, the longer guide on how to implement optimistic UI updates that roll back cleanly covers the optimistic queue, the rollback patterns, and the reconciliation logic. The Stripe documentation on idempotency, the AWS docs on idempotent receivers, and the IETF draft on the HTTP Idempotency-Key header are the references most teams use when designing this layer. The Redis SETNX documentation is useful for the storage-layer atomicity, which is the part that is easiest to get wrong.
The Short Version
Idempotency keys make retry-prone systems safe to retry. Generate the key on the client at action time. Store it server-side with the result and a body-hash check. Scope per user. Set an expiration window of at least 24 hours, longer for high-consequence actions. Handle the "in flight" timeout, the "key reuse with different body" case, and the "long-running action" case explicitly.
Done right, an idempotency key system means every action the user takes has a unique identity, every retry is safe, and double-charges or double-deletes become a class of bug the system structurally cannot produce.