In distributed systems, failure is not an exception—it’s the default.
Network calls fail. Services time out. APIs return 500s. The real question isn’t “Will things fail?” but “How gracefully do we recover?”
Two fundamental techniques help us build resilient systems:
- Exponential Backoff (Retry Strategy)
- Idempotency (Safe Re-execution)
What is Exponential Backoff?
When a request fails, retrying immediately can make things worse—especially during outages or traffic spikes.
Instead, we wait progressively longer between retries.
Formula
tₙ = base × 2ⁿ⁻¹
Where:
- tₙ = delay before the nth retry
- base = initial delay (e.g., 100ms)
- n = retry attempt number, starting at 1
Example
| Attempt | Delay |
|---|---|
| 1 | 100ms |
| 2 | 200ms |
| 3 | 400ms |
| 4 | 800ms |
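The table can be reproduced with a short helper. This is a minimal sketch: the function name `backoffDelay` and the `maxDelayMs` cap are illustrative assumptions not taken from the text above, and attempts are counted from 1 so that the first retry waits exactly `base`.

```javascript
// Hypothetical helper: delay (ms) before retry number `attempt` (1-based).
// Attempt 1 waits baseMs; each later attempt doubles the previous delay.
function backoffDelay(attempt, baseMs = 100, maxDelayMs = 30_000) {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be a positive integer");
  }
  const raw = baseMs * 2 ** (attempt - 1);
  // Assumed safeguard: cap the delay so late retries don't wait for hours.
  return Math.min(raw, maxDelayMs);
}

for (let attempt = 1; attempt <= 4; attempt++) {
  console.log(`Attempt ${attempt}: ${backoffDelay(attempt)}ms`);
}
```

The cap matters more than it looks: without it, attempt 20 would wait 100ms × 2¹⁹, which is more than 14 hours.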
Why it works
- Reduces pressure on failing services
- Gives time for recovery (autoscaling, DB failover)
- Avoids cascading failures
Problem Without Backoff
Imagine:
- 10,000 clients hit your API
- Service goes down
- All clients retry instantly
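You’ve created a retry storm (the thundering herd problem): the synchronized retries hit the service the moment it comes back and can knock it over again.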
Backoff with Jitter
Add randomness to spread retries:
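```javascript
// base and jitter in ms; attempt counts from 0, so the first retry waits between base and base + jitter
const delay = base * Math.pow(2, attempt) + Math.random() * jitter;
```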
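To see why the randomness matters, the sketch below compares additive jitter (the one-liner in this section) with a commonly used alternative, "full jitter", which picks a delay uniformly between 0 and the exponential value. The function names, the injectable `rng` parameter, and the 10ms bucketing are assumptions made for illustration; as in the one-liner, `attempt` counts from 0.

```javascript
// Additive jitter: exponential delay plus up to jitterMs of random noise.
function additiveJitterDelay(attempt, baseMs, jitterMs, rng = Math.random) {
  return baseMs * 2 ** attempt + rng() * jitterMs;
}

// Full jitter (alternative): uniform between 0 and the exponential delay.
function fullJitterDelay(attempt, baseMs, rng = Math.random) {
  return rng() * baseMs * 2 ** attempt;
}

// Largest number of clients whose retries land in the same window.
function busiestWindow(delays, windowMs = 10) {
  const counts = new Map();
  for (const d of delays) {
    const bucket = Math.floor(d / windowMs);
    counts.set(bucket, (counts.get(bucket) ?? 0) + 1);
  }
  return Math.max(...counts.values());
}

// 10,000 clients that all failed at the same moment, first retry (attempt 0).
const clients = 10_000;
const noJitter = Array.from({ length: clients }, () => 100 * 2 ** 0);
const withJitter = Array.from({ length: clients }, () => fullJitterDelay(0, 100));
console.log("busiest 10ms window, no jitter:", busiestWindow(noJitter));
console.log("busiest 10ms window, full jitter:", busiestWindow(withJitter));
```

With no jitter, every simulated client lands in the same window; with full jitter the retries spread across the whole 0 to 100ms range, so the busiest window holds only a small fraction of them.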
What is Idempotency?
Retries are dangerous unless your operations are safe to repeat.
Idempotency means:
Performing the same operation multiple times has the same effect as performing it once.
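A tiny illustration of the difference, using hypothetical `addCredit` and `setBalance` functions on an in-memory account object:

```javascript
// Non-idempotent: every call changes the state again.
function addCredit(account, amount) {
  account.balance += amount;
}

// Idempotent: repeating the call leaves the state exactly as one call did.
function setBalance(account, amount) {
  account.balance = amount;
}

const a = { balance: 0 };
addCredit(a, 50);
addCredit(a, 50); // a retry of the "same" request
console.log(a.balance); // 100: the retry doubled the effect

const b = { balance: 0 };
setBalance(b, 50);
setBalance(b, 50); // a retry of the same request
console.log(b.balance); // 50: same outcome as a single call
```

HTTP draws the same line: PUT and DELETE are defined as idempotent methods, while POST is not, which is why a POST endpoint like `/payments` needs extra help.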
Non-idempotent API
POST /payments
• Calling twice → charges the user twice
Idempotent API
POST /payments
Idempotency-Key: 12345
• First request → processed
• Second request with the same key → returns the stored response, no second charge
Idempotency Key Pattern
Client sends:
Idempotency-Key: unique-key
Server:
• Stores key + response
• If duplicate → return stored response
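The server side of the pattern can be sketched with an in-memory map. This assumes a single process and a hypothetical `handlePayment` handler; the `Map` stands in for whatever durable store a real service would use.

```javascript
// In-memory stand-in for the server's key store (a real service would use
// a durable store, e.g. a table with a unique constraint on the key).
const responsesByKey = new Map();
let chargeCount = 0; // counts real side effects, for illustration

// Hypothetical handler: charges at most once per idempotency key.
function handlePayment(idempotencyKey, amount) {
  if (!idempotencyKey) {
    return { status: 400, body: { error: "Idempotency-Key header required" } };
  }
  // Duplicate: return the stored response without charging again.
  if (responsesByKey.has(idempotencyKey)) {
    return responsesByKey.get(idempotencyKey);
  }
  chargeCount += 1; // the side effect
  const response = { status: 201, body: { paymentId: `pay_${chargeCount}`, amount } };
  responsesByKey.set(idempotencyKey, response);
  return response;
}

const first = handlePayment("12345", 500);
const second = handlePayment("12345", 500);
console.log(first === second, chargeCount); // true 1
```

A production version would also need to handle two requests with the same key arriving concurrently (for example by reserving the key before processing) and to reject a reused key whose request body differs; this sketch does neither.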
Where it matters
• Payment systems
• Order creation
• Kafka consumers
• Distributed job processing
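For Kafka consumers the concern is redelivery: with at-least-once delivery, a consumer can receive the same record again, for example after a crash before its offset was committed. The sketch below deduplicates on a message ID; the message shape and `handleMessage` are hypothetical, and no Kafka client library is used.

```javascript
// Hypothetical message shape: { id, orderId }. The producer is assumed to
// attach a unique ID to each logical event.
const processedIds = new Set(); // in production: durable storage
const shippedOrders = [];

function handleMessage(message) {
  if (processedIds.has(message.id)) {
    return "skipped"; // redelivery: the effect was already applied
  }
  shippedOrders.push(message.orderId); // the side effect
  processedIds.add(message.id);
  return "processed";
}

// At-least-once delivery can hand the consumer the same message twice.
const msg = { id: "evt-1", orderId: "order-42" };
console.log(handleMessage(msg)); // processed
console.log(handleMessage(msg)); // skipped
console.log(shippedOrders); // [ 'order-42' ]
```

Recording the ID and applying the side effect should happen atomically (for example in one database transaction); otherwise a crash between the two steps reintroduces duplicates.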
Combining Both: The Real Power
Exponential backoff + idempotency = safe retries
Flow
1. Client sends request with idempotency key
2. The request fails (timeout or 500)
3. Client retries with exponential backoff
4. Server ensures no duplicate side effects
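On the client, the essential detail of this flow is that the idempotency key is created once and reused on every retry. A minimal sketch, assuming a caller-supplied `send` function and a hypothetical `RetryableError` type that marks timeouts and 5xx responses; in practice the key would typically be a UUID, e.g. from `crypto.randomUUID()`.

```javascript
// Hypothetical error type for failures worth retrying (timeouts, 5xx).
class RetryableError extends Error {}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls send(idempotencyKey) until it succeeds or retries run out.
// The SAME key is sent on every attempt so the server can deduplicate.
// maxRetries = 5 means at most 6 calls in total (1 initial + 5 retries).
async function sendWithRetry(send, idempotencyKey, {
  maxRetries = 5,
  baseMs = 100,
  maxDelayMs = 10_000,
  wait = sleep,
} = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await send(idempotencyKey);
    } catch (err) {
      // Give up on non-retryable errors or once the retry budget is spent.
      if (!(err instanceof RetryableError) || attempt >= maxRetries) throw err;
      // Full jitter over a capped exponential delay.
      const delay = Math.random() * Math.min(maxDelayMs, baseMs * 2 ** attempt);
      await wait(delay);
    }
  }
}

// Usage with a fake transport that times out twice, then succeeds.
let calls = 0;
const seenKeys = [];
async function fakeSend(key) {
  calls += 1;
  seenKeys.push(key);
  if (calls <= 2) throw new RetryableError("timeout");
  return { status: 201, key };
}

sendWithRetry(fakeSend, "payment-order-123", { wait: async () => {} })
  .then((res) => console.log(res.status, calls, new Set(seenKeys).size)); // 201 3 1
```

Non-retryable errors (such as a 400) are rethrown immediately: retrying a request the server has already rejected as invalid only adds load.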
Real-World Example (Payments)
• Client sends payment request
• The server processes the payment, but the response times out before reaching the client
• Client retries
Without idempotency:
User gets charged twice
With idempotency:
The original transaction is returned and the user is charged once
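The payment scenario can be simulated end to end. In this sketch (all names hypothetical) the server records the charge, the first response is lost, and the client retries; running it with and without an idempotency key shows the difference in charges. Backoff delays are left out so the demo stays synchronous.

```javascript
// Simulates: server charges the card, first reply is lost, client retries.
function simulate({ useIdempotencyKey }) {
  const charges = []; // the customer's card statement
  const responsesByKey = new Map();
  let dropNextResponse = true; // the first reply never reaches the client

  function server(request) {
    const key = request.idempotencyKey;
    let response;
    if (key && responsesByKey.has(key)) {
      response = responsesByKey.get(key); // duplicate: no new charge
    } else {
      charges.push(request.amount);
      response = { transactionId: `txn_${charges.length}` };
      if (key) responsesByKey.set(key, response);
    }
    if (dropNextResponse) {
      dropNextResponse = false;
      throw new Error("network timeout"); // charge happened, reply lost
    }
    return response;
  }

  const request = { amount: 500, idempotencyKey: useIdempotencyKey ? "pay-abc" : undefined };
  // Client: retry on timeout with the same request (and same key, if any).
  for (let attempt = 1; attempt <= 3; attempt++) {
    try {
      const response = server(request);
      return { charges: charges.length, transactionId: response.transactionId };
    } catch {
      // timed out: loop and retry
    }
  }
  throw new Error("gave up");
}

console.log(simulate({ useIdempotencyKey: false })); // { charges: 2, transactionId: 'txn_2' }
console.log(simulate({ useIdempotencyKey: true })); // { charges: 1, transactionId: 'txn_1' }
```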
Retry Strategy (Client / Worker)
• Max retries (e.g., 5)
• Exponential delay with jitter
• Circuit breaker for persistent failures
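Max retries bound a single request; a circuit breaker goes further and stops calling a dependency that keeps failing, failing fast until a cooldown has passed. The class below is a minimal, synchronous sketch of the usual closed/open/half-open model; `failureThreshold`, `cooldownMs`, and the injectable `now` clock are assumptions chosen for the example.

```javascript
// closed: calls pass through; open: calls fail fast until the cooldown
// elapses; half-open: one trial call decides whether to close or re-open.
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30_000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
    this.state = "closed";
    this.failures = 0;
    this.openedAt = 0;
  }

  call(fn) {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.cooldownMs) {
        throw new Error("circuit open: failing fast");
      }
      this.state = "half-open"; // cooldown over: allow a trial call
    }
    try {
      const result = fn();
      this.state = "closed"; // success resets the breaker
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}

// Usage with a fake clock: three failures open the circuit.
let t = 0;
const breaker = new CircuitBreaker({ failureThreshold: 3, cooldownMs: 1000, now: () => t });
const failing = () => { throw new Error("503"); };
for (let i = 0; i < 3; i++) {
  try { breaker.call(failing); } catch { /* expected failure */ }
}
console.log(breaker.state); // open
```

Wrapping async calls would need the same logic around an awaited promise; the synchronous version keeps the state transitions easy to follow.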
Reliability isn’t built by preventing failures—it’s built by handling them intelligently.
• Exponential backoff controls when to retry
• Idempotency guarantees that retrying is safe
Together, they form the backbone of resilient distributed systems.