Muhammad Ahsan Farooq

The Art of Safe Retries: Implementing Idempotency in Distributed Systems

Distributed systems fail.

Networks drop packets. Services time out. Containers restart. Clients retry requests. None of this is exceptional — it’s the default operating environment.

The real danger appears when retries cause side effects to run more than once.

Imagine a client sending a request:

“Charge the customer $50.”

The server processes the charge successfully but crashes before sending the response. The client receives no confirmation, assumes failure, and retries.
Without protection, the customer gets charged twice.

This isn’t a corner case. It’s a fundamental reality of distributed systems.


The Core Problem: You Can’t Trust the Network

Two facts every backend engineer must internalize:

  • Timeouts ≠ failures
    A timeout only means we didn’t get a response — not that the work wasn’t done.

  • Missing responses ≠ unprocessed requests
    The server may have completed the operation perfectly.

Because of this ambiguity, retries are both necessary and dangerous.

This is where idempotency becomes critical. It lets a client retry over an unreliable network without repeating the side effect.


What Is Idempotency?

An operation is idempotent if performing it multiple times has the same effect as performing it once.

Simple Examples

✅ Idempotent (Safe to Retry)

SET balance = 100
DELETE /users/123
PUT /profile/update

Running these multiple times does not change the final state.

❌ Not Idempotent (Dangerous to Retry)

balance += 100     // Running twice results in +200
POST /orders       // Creates duplicate orders
sendEmail()        // Spams the user

Most real-world APIs — especially POST endpoints — are not naturally idempotent.
We must design them to be.
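
The difference is easy to see in a few lines of TypeScript. This is only an illustrative sketch: the `Account` shape and both function names are made up for the example.

```typescript
// Hypothetical account shape, used only for this illustration.
interface Account {
  balance: number;
}

// Idempotent: the final state depends only on the argument, not on prior calls.
function setBalance(account: Account, value: number): void {
  account.balance = value;
}

// Not idempotent: every call changes the state again.
function addToBalance(account: Account, delta: number): void {
  account.balance += delta;
}

const a: Account = { balance: 0 };
setBalance(a, 100);
setBalance(a, 100); // simulated retry
console.log(a.balance); // 100

const b: Account = { balance: 0 };
addToBalance(b, 100);
addToBalance(b, 100); // simulated retry
console.log(b.balance); // 200
```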


The Idempotency Key Pattern

The industry-standard solution (used by Stripe, Adyen, Shopify, and others) is the Idempotency Key pattern.

How It Works

The Client’s Responsibility

  • Generate a unique identifier (usually a UUID v4)
  • Send it with the request in a header, e.g. Idempotency-Key
  • Reuse the same key across retries

The Server’s Responsibility

  1. Check if the key has been seen before
  2. If yes → return the previously stored response immediately
  3. If no → process the request, store the result, and return it

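This is intended to make the side effect run at most once per key, even if the client retries aggressively. Note that the check and the store are separate steps, so two concurrent requests carrying the same key can still both pass the check; the race condition section later in this article closes that gap.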
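
Before bringing in Express and Redis, the three server steps can be sketched with a plain in-memory `Map`. This is a simplified illustration, not a production design: `StoredResponse`, `handleOnce`, and the `Map` standing in for durable storage are all assumptions made for the example, and it ignores concurrency and expiry.

```typescript
// Shape of a stored response; the names here are illustrative assumptions.
interface StoredResponse {
  statusCode: number;
  body: unknown;
}

// In-memory stand-in for a durable store such as Redis or a database table.
const processed = new Map<string, StoredResponse>();

function handleOnce(
  key: string,
  operation: () => StoredResponse
): StoredResponse {
  // Step 1: has this key been seen before?
  const previous = processed.get(key);
  if (previous !== undefined) {
    // Step 2: replay the stored response without re-running the side effect.
    return previous;
  }
  // Step 3: run the operation, store the result, and return it.
  const result = operation();
  processed.set(key, result);
  return result;
}

// Usage: the side effect runs once even though the request arrives twice.
let charges = 0;
const charge = (): StoredResponse => {
  charges += 1;
  return { statusCode: 201, body: { chargeNumber: charges } };
};

const first = handleOnce('key-123', charge);
const retry = handleOnce('key-123', charge);
console.log(charges); // 1
console.log(first === retry); // true
```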


Backend Implementation (Node.js / Express)

Below is a practical example using Express, TypeScript, and Redis.

Idempotency Middleware

import { Request, Response, NextFunction } from 'express';
import { createClient } from 'redis';

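const redis = createClient();

// node-redis v4+ clients must be connected explicitly before issuing commands
redis.on('error', (err) => console.error('Redis client error', err));
redis.connect().catch((err) => console.error('Redis connection failed', err));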

/**
 * Middleware to enforce idempotency
 */
export const idempotencyMiddleware = async (
  req: Request,
  res: Response,
  next: NextFunction
) => {
  // req.get() is case-insensitive and returns string | undefined
  const key = req.get('Idempotency-Key');

  if (!key) {
    return res.status(400).json({
      error: 'Idempotency-Key header is missing'
    });
  }

  const cacheKey = `idempotency:${key}`;

  try {
    // 1️⃣ Check if this request was already processed
    const cachedResponse = await redis.get(cacheKey);

    if (cachedResponse) {
      const parsed = JSON.parse(cachedResponse);
      return res.status(parsed.statusCode).json(parsed.body);
    }

    // 2️⃣ Capture the response before sending it
    const originalJson = res.json;

    res.json = function (body: any): Response {
      const responseToCache = {
        statusCode: res.statusCode,
        body
      };

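      // Store in the background; log a failed write instead of leaving the promise unhandled
      redis
        .set(cacheKey, JSON.stringify(responseToCache), {
          EX: 60 * 60 * 24 // 24 hours
        })
        .catch((err) => console.error('Failed to cache idempotent response', err));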

      return originalJson.call(this, body);
    };

    next();
  } catch (err) {
    next(err);
  }
};

The Payment Endpoint

import express from 'express';
import { v4 as uuidv4 } from 'uuid';
import { idempotencyMiddleware } from './idempotency';

const app = express();
app.use(express.json());

const db = {
  transactions: [] as any[]
};

app.post('/api/charge', idempotencyMiddleware, async (req, res) => {
  const { amount, userId } = req.body;

  const transactionId = uuidv4();

  db.transactions.push({
    transactionId,
    amount,
    userId,
    createdAt: new Date()
  });

  res.status(201).json({
    success: true,
    message: 'Charge processed successfully',
    transactionId
  });
});

app.listen(3000, () => console.log('Server running on port 3000'));

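With the middleware in place, a retry that carries the same key within the 24-hour cache window replays the stored response instead of charging again, provided it does not overlap with the original request. Concurrent duplicates are covered in the race condition section below.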


Client-Side Implementation (TypeScript / Axios)

The client plays an equally important role.

The key must be generated once per user intent, not once per request attempt.

import axios from 'axios';
import { v4 as uuidv4 } from 'uuid';

async function performSafePayment(amount: number, userId: string) {
  // Generate ONCE
  const idempotencyKey = uuidv4();

  const makeRequest = () =>
    axios.post(
      'https://api.example.com/api/charge', // same path as the server route above
      { amount, userId },
      {
        headers: {
          'Idempotency-Key': idempotencyKey
        }
      }
    );

  try {
    await makeRequest();
  } catch (error: any) {
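    if (!error.response) {
      // No response received (timeout, dropped connection): safe to retry with the same key
      await makeRequest();
    } else {
      // The server answered with an error status: surface it instead of retrying blindly
      throw error;
    }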
  }
}
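
The example above retries only once. A slightly more general version might retry several times with exponential backoff while still reusing a single key. The sketch below is one possible shape, not a recommended library API: `retryWithSameKey`, `Attempt`, `IsRetryable`, and the default attempt count and delay are all assumptions chosen for illustration.

```typescript
// Hypothetical transport function: sends one attempt with the given key.
type Attempt<T> = (idempotencyKey: string) => Promise<T>;

// Decides whether an error is worth retrying (e.g. no response was received).
type IsRetryable = (error: unknown) => boolean;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function retryWithSameKey<T>(
  idempotencyKey: string,
  attempt: Attempt<T>,
  isRetryable: IsRetryable,
  maxAttempts = 4,
  baseDelayMs = 200
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < maxAttempts; i++) {
    try {
      // Every attempt carries the same key, so the server can deduplicate.
      return await attempt(idempotencyKey);
    } catch (error) {
      lastError = error;
      // Stop on non-retryable errors or once the attempt budget is spent.
      if (!isRetryable(error) || i === maxAttempts - 1) break;
      // Exponential backoff: baseDelayMs, then 2x, then 4x, and so on.
      await sleep(baseDelayMs * 2 ** i);
    }
  }
  throw lastError;
}
```

With Axios, the `attempt` callback would wrap the `axios.post` call from the example above, and `isRetryable` could check that the error has no `response` property, mirroring the `!error.response` test used there.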

Critical Edge Case: The In-Flight Race Condition

A common mistake is ignoring concurrency.

The Failure Scenario

  • Request A checks Redis → key not found
  • Request B checks Redis → key not found
  • Both process the payment 👉 Double charge

The Solution: Atomic Locking

Use Redis SET NX to claim the key before processing. In the middleware, this belongs after the cache lookup and before the call to next().

const lockKey = `lock:${key}`;
const acquired = await redis.set(lockKey, 'LOCKED', {
  NX: true,
  EX: 10 // seconds
});

if (!acquired) {
  return res.status(409).json({
    error: 'Request currently in progress'
  });
}

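While the lock is held, only one request can execute the side effect; concurrent duplicates receive a 409 and can retry later to pick up the cached response. The 10-second expiry must be longer than the slowest expected processing time, otherwise a second request can acquire the lock while the first is still running.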
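
To see why an atomic claim closes the race, here is a small simulation that needs no Redis server. The `ClaimStore` class is a hypothetical in-memory stand-in that only imitates the "set if absent" behaviour of SET NX (without expiry); `chargeOnce` and `Outcome` are likewise invented for this sketch.

```typescript
// Minimal in-memory stand-in for Redis `SET key value NX`:
// returns true only for the first caller to claim a key.
class ClaimStore {
  private readonly claimed = new Set<string>();

  setIfAbsent(key: string): boolean {
    if (this.claimed.has(key)) return false;
    this.claimed.add(key);
    return true;
  }
}

type Outcome = { status: 201; chargeId: number } | { status: 409 };

const store = new ClaimStore();
let chargeCount = 0;

async function chargeOnce(key: string): Promise<Outcome> {
  // Claim first. The check and the write happen in one step, so there is no
  // window in which two requests can both see "key not found".
  if (!store.setIfAbsent(`lock:${key}`)) {
    return { status: 409 };
  }
  // Simulate slow payment processing.
  await new Promise((resolve) => setTimeout(resolve, 10));
  chargeCount += 1;
  return { status: 201, chargeId: chargeCount };
}

// Usage: two overlapping requests with the same key.
async function demo(): Promise<Outcome[]> {
  return Promise.all([chargeOnce('abc'), chargeOnce('abc')]);
}
```

In this sketch the claim is never released. With Redis, the EX option plays that role by expiring the lock automatically.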


When Should You Use Idempotency?

You don’t need this everywhere, but it is mandatory for:

  • Payments — never charge twice
  • Order creation — prevent duplicate shipments
  • Webhooks — providers retry aggressively
  • Mobile clients — unstable networks
  • Notifications — avoid spam

If retries are possible and side effects exist, idempotency is not optional.


Real-World Case Study: Stripe

Stripe is the gold standard here.

Their API explicitly assumes:

  • Network failures are normal
  • Clients will retry
  • Requests may be replayed hours later

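Sending an Idempotency-Key with a request lets Stripe return the saved result of the first attempt for any retry that reuses the key, so retries and timeouts do not produce duplicate side effects while the key is retained.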
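
Stripe's documentation also describes comparing the parameters of a retried request against the original and returning an error if they differ, which protects against a key being reused for a different operation by accident. A rough sketch of that idea follows. It is not Stripe's implementation: `canonicalize`, `checkKey`, and the in-memory `Map` are assumptions made for illustration, and it expects JSON-style request bodies.

```typescript
// Serialize with sorted object keys so that property order does not matter.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

type KeyCheck = 'new' | 'replay' | 'mismatch';

// Maps each idempotency key to the fingerprint of its first request body.
const fingerprints = new Map<string, string>();

function checkKey(key: string, body: unknown): KeyCheck {
  const incoming = canonicalize(body);
  const stored = fingerprints.get(key);
  if (stored === undefined) {
    fingerprints.set(key, incoming);
    return 'new'; // first time this key is seen: process the request
  }
  // Same key, same payload: replay. Same key, different payload: reject.
  return stored === incoming ? 'replay' : 'mismatch';
}

console.log(checkKey('k1', { amount: 50, userId: 'u1' })); // "new"
console.log(checkKey('k1', { userId: 'u1', amount: 50 })); // "replay"
console.log(checkKey('k1', { amount: 500, userId: 'u1' })); // "mismatch"
```

A server using this check would typically answer a `mismatch` with a 4xx error rather than replaying the stored response.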


Conclusion

Idempotency separates junior APIs from production-grade systems.

By shifting correctness from the unreliable network layer to a durable storage layer, we allow clients to retry aggressively without fear of corruption.

Next time you design a POST endpoint, ask yourself:

“What happens if this is called twice?”

If the answer is “disaster”, it’s time to implement idempotency.
