You’ve just built your billion-dollar app.
It fetches and sends data to a web of external services. Stripe handles your payments. Twilio sends your OTPs and messages. You’ve got at least one AI model doing something magical. Maybe you’ve even thrown in a dash of crypto, because why not? You didn’t become a tech billionaire by building basic CRUD apps.
But then, it happens. A request fails.
At first, you think: “I must’ve messed up.”
And sure, during development you fixed those user-side bugs: bad requests, missing parameters, the dreaded 422 Unprocessable Entity. But now the errors you’re seeing are different.
You’re getting:
- 500 Internal Server Error - what even is that?
- 503 Service Unavailable - cool, cool, but when will it be available?
- 408 Request Timeout - like you, the API seems too busy to respond.
You double-check your code. Everything’s clean. Your app did nothing wrong.
The problem? The services you rely on just couldn’t keep up with your success.
Now, you have a choice:
You could accept the failure and move on, or…
You could build a resilient system that doesn’t just give up the moment an API gets cranky.
This is where our story begins, with retries.
But as you’ll see, retrying isn’t as simple as looping and hoping. We’ll explore how retries can go wrong, how to back off gracefully, and how adding a little randomness, jitter, can save you (and the API) from a world of pain.
Retries
Not every failed request deserves a second chance.
When a request fails, the first thing you need to do is understand why. Some failures are your fault, like sending malformed data that the server just can't process. In these cases, you'll see a 422 Unprocessable Entity, and no amount of retrying is going to magically fix bad input. Fix the payload, not the persistence.
But other times, the failure isn't on you. Maybe you're trying to charge a customer with Stripe and you get hit with a 503 Service Unavailable, or you're waiting for a response that never comes and receive a 408 Request Timeout.
You double-check your request. The headers are right. The data is valid. Everything on your end checks out.
That’s when retries come into play.
For transient errors like these, where the issue is likely temporary, it makes sense to try again. A retry is a way of saying:
Hey, I know you’re having a moment, but I’ll give you another shot.
Let’s look at how a basic retry might work in Python.
from http import HTTPStatus
from typing import Any

import requests


def collect_money(
    headers: dict[str, str], payload: dict[str, Any]
) -> requests.Response:
    url = "https://important.com/api/v1/path"
    # Send the payload as JSON so it matches the Content-Type header
    return requests.post(url, headers=headers, json=payload)


def main():
    headers = {
        "Authorization": "some-secure-api-token",
        "Content-Type": "application/json",
    }
    payload = {
        "card_number": "1234-1234-1234-1234",
        "cvv": "123",
        "expiry_date": "12/32",
        "amount": 1_000_000.00,
        "currency": "USD",
    }
    response = collect_money(headers, payload)
    if response.status_code == HTTPStatus.OK:
        # Bentley or Rolls Royce?? Nah, both
        print("💰")
        return
    # ! As we can see, if the request doesn't succeed,
    # ! we can't flex. Let's fix this


if __name__ == "__main__":
    main()
Now let's add a simple retry:
RETRYABLE_STATUS_CODES = {
    HTTPStatus.REQUEST_TIMEOUT,
    HTTPStatus.INTERNAL_SERVER_ERROR,
    HTTPStatus.BAD_GATEWAY,
    HTTPStatus.SERVICE_UNAVAILABLE,
    HTTPStatus.GATEWAY_TIMEOUT,
}


def collect_money(
    headers: dict[str, str], payload: dict[str, Any]
) -> requests.Response:
    url = "https://important.com/api/v1/path"
    while True:
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == HTTPStatus.OK:
            return response
        if response.status_code not in RETRYABLE_STATUS_CODES:
            # Non-transient failure: retrying won't help, hand it back
            return response
        print(f"Transient failure with {response.status_code}. Retrying...")
But hold up, retrying blindly opens the door to a whole new set of problems.
As we just saw, our current implementation keeps retrying forever if the service stays down. That might sound persistent, even noble, but in reality... it’s reckless.
You’ve essentially built an infinite loop that hammers a failing service with requests, over and over again. Not only is this wasteful, but if everyone using the same service does this, it can make things worse for everyone.
We need to put a leash on our enthusiasm.
Enter: max retries.
MAX_RETRIES = 3  # Can be anything, just make it reasonable


def collect_money(
    headers: dict[str, str], payload: dict[str, Any]
) -> requests.Response:
    url = "https://important.com/api/v1/path"
    for i in range(1, MAX_RETRIES + 1):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == HTTPStatus.OK:
            return response
        if response.status_code not in RETRYABLE_STATUS_CODES:
            return response
        print(
            f"[Attempt: {i}] Transient failure with {response.status_code}. Retrying..."
        )
    # Out of attempts: return the last failed response
    return response
With that change, we’ve tamed our retries.
In this example, we’ve set a limit of 3 attempts, but the exact number is up to you. It should be reasonable, based on how critical the request is and how often the downstream service is likely to recover.
Right now, we simply return the final response. But in a real-world system, you might want to do more: log every failed attempt, store intermediate responses for debugging, or return a structured error indicating that this request failed after multiple retries.
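For a taste of what that could look like, here’s a minimal, purely illustrative sketch that builds on the constants defined above. The RetryResult dataclass, the collect_money_with_history name, and the bare-bones logging are all made up for this example, not part of any library.

import logging
from dataclasses import dataclass, field

logger = logging.getLogger(__name__)


@dataclass
class RetryResult:
    """Made-up container describing how a retried request ended up."""
    response: requests.Response
    attempts: int
    succeeded: bool
    errors: list[str] = field(default_factory=list)


def collect_money_with_history(
    headers: dict[str, str], payload: dict[str, Any]
) -> RetryResult:
    url = "https://important.com/api/v1/path"
    errors: list[str] = []
    for i in range(1, MAX_RETRIES + 1):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == HTTPStatus.OK:
            return RetryResult(response, attempts=i, succeeded=True, errors=errors)
        if response.status_code not in RETRYABLE_STATUS_CODES:
            break
        # Log every failed attempt and keep a record for debugging
        logger.warning("Attempt %d failed with %s", i, response.status_code)
        errors.append(f"attempt {i}: {response.status_code}")
    return RetryResult(response, attempts=i, succeeded=False, errors=errors)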
But hey, that’s a story for another time. Let’s keep things simple for now.
Back Off
Now that we’ve added retries, things are looking better.
But there’s one big issue: our code is way too fast.
The time between retries is practically nonexistent. If the service is down, we just hammer it over and over again in rapid succession. Now imagine thousands of users doing the same thing; it starts to look a lot like a DDoS attack.
Picture this: someone’s busy, clearly overwhelmed, and you ask them for something. They say, “Wait…”, but before they even finish the word, you’re already asking again.
Do that enough times and even the calmest person would reach for… well, let’s just say you’d be testing the limits of their patience and the local laws.
We need to give services a breather between retries.
This is where we introduce back off, a deliberate pause before trying again.
Quick note:
For these examples, we’ll use seconds as our unit of time for delays, but in a real-world system, that could be milliseconds, minutes, or whatever suits your use case. The concept stays the same.
Constant Back Off
Just like the name suggests, this strategy waits the same amount of time between each retry. It’s the simplest form of backoff and incredibly easy to implement and reason about.
But simplicity has its limits.
While constant backoff works fine in development or low-traffic environments, it doesn’t scale well under load. In high-concurrency systems, it can cause a phenomenon known as a retry storm: multiple clients all retrying at the exact same intervals, bombarding the server in perfect sync.
This retry pattern creates the perfect conditions for a server meltdown.
(You might also hear it called a “thundering herd” problem in other documentation; it’s the same issue with a different name.)
That said, constant back off is still useful. It’s great for testing APIs, simulating retry behavior, or even as a baseline for tuning more advanced strategies like exponential back off (we'll get there).
import time

BACK_OFF_TIME = 2  # Seconds to wait between attempts


def collect_money(
    headers: dict[str, str], payload: dict[str, Any]
) -> requests.Response:
    url = "https://important.com/api/v1/path"
    for i in range(1, MAX_RETRIES + 1):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == HTTPStatus.OK:
            return response
        if response.status_code not in RETRYABLE_STATUS_CODES:
            return response
        print(
            f"[Attempt: {i}] Transient failure with {response.status_code}. Retrying..."
        )
        # Wait the same fixed amount of time before every retry
        time.sleep(BACK_OFF_TIME)
    return response
Linear Back Off
In linear backoff, the time between retries increases at a steady, predictable rate.
Typically, this delay grows based on the retry attempt number.
For example, if we start with a 1-second base delay, the first retry waits 1 second, the second waits 2 seconds, the third waits 3 seconds, and so on.
This approach is slightly better than constant backoff for reducing immediate contention, especially when multiple clients are retrying around the same time. The gradually increasing delays can help space out retry attempts just enough to give a struggling server a breather.
But linear backoff isn’t without flaws.
It still suffers from the same core issue as constant back off: many clients can retry in sync, especially if they all follow the same retry pattern. This makes it vulnerable to retry storms (also called thundering herd problems) under high traffic.
Another problem is that it ramps up too slowly.
If a service is seriously struggling, say it’s under heavy load or temporarily offline, your app is still retrying too often. The delays (1s, 2s, 3s…) aren’t big enough to give the service time to breathe.
In those situations, what you really need is a more aggressive approach, one that quickly increases the wait time between retries.
BASE_DELAY = 1


def collect_money(
    headers: dict[str, str], payload: dict[str, Any]
) -> requests.Response:
    url = "https://important.com/api/v1/path"
    for i in range(1, MAX_RETRIES + 1):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == HTTPStatus.OK:
            return response
        if response.status_code not in RETRYABLE_STATUS_CODES:
            return response
        print(
            f"[Attempt: {i}] Transient failure with {response.status_code}. Retrying..."
        )
        # Delay grows linearly with the attempt number: 1s, 2s, 3s, ...
        time.sleep(BASE_DELAY + i - 1)
    return response
Exponential Back Off
With exponential backoff, the delay between retries grows exponentially with each attempt.
A common pattern uses a base of 2: so you’d wait 1 second, then 2 seconds, then 4, 8, 16… and so on, usually up to a defined maximum delay.
This strategy is much better at reducing pressure on struggling systems. As the delay increases, the retry frequency slows down significantly, which gives the service more time to recover. It also greatly lowers the chances of multiple clients retrying at the same time, which helps avoid retry storms.
Because of this, exponential back off is well-suited for high-load, distributed systems, and is often the go-to strategy for handling transient errors in production environments.
However, it’s not perfect.
If the issue resolves quickly, say after a very brief glitch, you might still be stuck waiting longer than necessary for the next retry. That’s the trade-off: we gain resilience, but sometimes at the cost of speed.
def collect_money(
    headers: dict[str, str], payload: dict[str, Any]
) -> requests.Response:
    url = "https://important.com/api/v1/path"
    for i in range(1, MAX_RETRIES + 1):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == HTTPStatus.OK:
            return response
        if response.status_code not in RETRYABLE_STATUS_CODES:
            return response
        print(
            f"[Attempt: {i}] Transient failure with {response.status_code}. Retrying..."
        )
        # Delay doubles with every attempt: 1s, 2s, 4s, ...
        time.sleep(2 ** (i - 1))
    return response
Jitter
At this point, you're probably thinking:
Okay, you used the word "jitter" in the title, but what the hell is it?
And listen, there’s a reason I have the billion-dollar app and you don’t.
(Kidding. Mostly.)
Jokes aside, jitter is all about introducing randomness into your back off delays.
Up until now, all our retry strategies have been very... predictable. Constant. Linear. Exponential. If ten clients fail at the same time, they’ll all retry at the same time. Again. And again. And again. That predictability is exactly what causes retry storms (aka thundering herds).
Jitter shakes things up.
Instead of waiting exactly 4 seconds after the 3rd retry, for example, we might wait somewhere between 0 and 4 seconds. Each client gets a slightly different delay, so retry attempts are spread out rather than stacked on top of each other.
This reduces the chance of all clients hitting the server simultaneously, like traffic lights going green at staggered times instead of all at once.
It's most commonly paired with exponential back off, where the delays grow quickly and predictably.
But hey, with enough creativity (or chaos), you can slap jitter onto just about any back off strategy.
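To see why this matters, here’s a small, self-contained simulation, no real requests, just the delay math, comparing when ten clients would retry on their third attempt with plain exponential back off versus exponential back off with full jitter. The client count, attempt number, and base delay are arbitrary values chosen for illustration.

import random

CLIENTS = 10
ATTEMPT = 3  # all clients are on their 3rd retry
BASE = 1


def plain_exponential(attempt: int) -> float:
    # Every client computes exactly the same delay
    return BASE * 2 ** (attempt - 1)


def full_jitter(attempt: int) -> float:
    # Each client picks a random delay between 0 and the exponential value
    return random.uniform(0, BASE * 2 ** (attempt - 1))


print("Plain exponential:", [plain_exponential(ATTEMPT) for _ in range(CLIENTS)])
print("Full jitter:      ", [round(full_jitter(ATTEMPT), 2) for _ in range(CLIENTS)])
# Plain exponential prints ten identical delays (everyone retries together);
# full jitter prints ten different delays spread between 0 and 4 seconds.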
Full Jitter
With full jitter, we take the calculated exponential back off delay and then randomize it.
More specifically, we pick a random value between 0 and the current exponential delay. This spreads retries out nicely across time and makes it much less likely that multiple clients will retry at the same moment.
It’s easy to implement, and it significantly improves resilience in distributed systems by reducing congestion and maximizing the spread of retry attempts.
But like everything, it has its trade-offs.
- Since the delay is random between 0 and the upper bound, it’s possible for some retries to happen almost immediately.
- If you're in a situation where you need a minimum wait time, this can be a problem.
- Also, without a cap, the exponential growth can cause delays to become unreasonably long. That’s why most implementations include a cap to limit how big the delay can get.
delay = random(0, min(cap, base * 2^attempt))
Where:
- cap: the maximum delay between retries
- base: the initial delay for the first retry
- attempt: the current retry attempt number
import random

MAX_DELAY = 5


def collect_money(
    headers: dict[str, str], payload: dict[str, Any]
) -> requests.Response:
    url = "https://important.com/api/v1/path"
    for i in range(MAX_RETRIES):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == HTTPStatus.OK:
            return response
        if response.status_code not in RETRYABLE_STATUS_CODES:
            return response
        print(
            f"[Attempt: {i + 1}] Transient failure with {response.status_code}. Retrying..."
        )
        # Here our base delay is 1, so the upper bound is 2**i, clamped at MAX_DELAY
        delay = random.randint(0, min(MAX_DELAY, 2**i))
        time.sleep(delay)
    return response
Equal Jitter
This strategy guarantees a minimum delay, then adds some randomness on top. The idea is to split the calculated delay in half: use one half as the minimum wait, and the other half as the random spread. This way, we avoid retries happening almost instantly, which can still happen with full jitter.
base_delay = min(cap, base * 2^attempt) / 2
delay = base_delay + random(0, base_delay)
With this, all retries wait at least half of the calculated delay, and then add a random amount on top. This prevents very short delays and still spreads out retry attempts, reducing the chance of retry storms.
Equal jitter strikes a balance between responsiveness and stability: it ensures a consistent minimum wait time while still adding some randomness. That said, it has its downsides. Compared to full jitter, the range of delays is narrower, and average delays tend to be longer, since we avoid those short, fast retries.
MAX_DELAY = 5


def collect_money(
    headers: dict[str, str], payload: dict[str, Any]
) -> requests.Response:
    url = "https://important.com/api/v1/path"
    for i in range(MAX_RETRIES):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == HTTPStatus.OK:
            return response
        if response.status_code not in RETRYABLE_STATUS_CODES:
            return response
        print(
            f"[Attempt: {i + 1}] Transient failure with {response.status_code}. Retrying..."
        )
        # Half of the capped exponential delay is guaranteed, the other half is random
        base_delay = min(MAX_DELAY, 2**i) / 2
        delay = base_delay + random.uniform(0, base_delay)
        time.sleep(delay)
    return response
Decorrelated Jitter
This is a less common but intriguing variation. Its main goal is to further spread out retry attempts by making the delay depend not only on the current attempt, but also on the previous delay value. This creates a more dynamic and adaptive retry strategy.
delay = min(cap, random(base, previous_delay * 3))
The result is a wider and more irregular spread of retries. Like full jitter, it’s highly effective at reducing retry storms, but it takes a different approach: the randomness is influenced by the delay used in the last attempt.
Pros:
- Adapts to runtime behavior, since each delay depends on the last.
- Excellent at avoiding retry synchronization.
- Offers extreme desynchronization, which can be valuable in some high-concurrency environments.
Cons:
- Less predictable, which may be undesirable in some systems.
- Can lead to very long delays, especially if a previous delay was high.
- Introduces statefulness, since the previous delay must be tracked, unlike pure stateless strategies like full or equal jitter.
- Slightly harder to reason about and implement correctly.
In short, decorrelated jitter is powerful, but should be used sparingly, when data or experience shows that your system would benefit from maximum spread and desync in retries.
MAX_DELAY = 5
BASE_DELAY = 2


def collect_money(
    headers: dict[str, str], payload: dict[str, Any]
) -> requests.Response:
    url = "https://important.com/api/v1/path"
    previous_delay = BASE_DELAY
    for i in range(MAX_RETRIES):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == HTTPStatus.OK:
            return response
        if response.status_code not in RETRYABLE_STATUS_CODES:
            return response
        print(
            f"[Attempt: {i + 1}] Transient failure with {response.status_code}. Retrying..."
        )
        # Each delay depends on the previous one, capped at MAX_DELAY
        delay = min(MAX_DELAY, random.randint(BASE_DELAY, previous_delay * 3))
        time.sleep(delay)
        previous_delay = delay
    return response
Let’s wrap this up with a few parting thoughts:
- Not every error deserves a retry. Before you blindly try again, make sure retrying makes sense for the failure, some errors are permanent or unrecoverable.
- Always cap your retries. Set a finite number of attempts that makes sense for the system. Too few might miss recoverable errors; too many might hammer a broken service.
- Don't retry instantly. After a failure, give it a moment. Backing off gives systems a chance to recover, but don’t wait forever either.
- Add some randomness. This spreads out retry attempts across clients and helps prevent the dreaded retry storm (also called a thundering herd).
- Use the right tool for the job. You don’t need exponential back off with decorrelated jitter for a script running on your laptop. Don’t jump through invisible hoops. Keep it practical, not theatrical.
- This isn’t just for APIs. Back off strategies apply to all kinds of networked systems: message queues, database connections, RPC calls, IoT, and more. Anywhere you're making requests across a flaky boundary, these principles can help.
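If you want something to take away, here’s one way to fold these ideas into a single reusable helper. Treat it as a sketch, not a library: the function name, parameters, and defaults are arbitrary choices, and capped exponential back off with full jitter is just one reasonable combination.

import random
import time
from http import HTTPStatus
from typing import Any

import requests

RETRYABLE_STATUS_CODES = {
    HTTPStatus.REQUEST_TIMEOUT,
    HTTPStatus.INTERNAL_SERVER_ERROR,
    HTTPStatus.BAD_GATEWAY,
    HTTPStatus.SERVICE_UNAVAILABLE,
    HTTPStatus.GATEWAY_TIMEOUT,
}


def post_with_retries(
    url: str,
    headers: dict[str, str],
    payload: dict[str, Any],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 5.0,
) -> requests.Response:
    """POST with capped exponential back off and full jitter."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == HTTPStatus.OK:
            return response
        if response.status_code not in RETRYABLE_STATUS_CODES:
            return response
        # Random delay between 0 and the capped exponential value
        delay = random.uniform(0, min(max_delay, base_delay * 2**attempt))
        time.sleep(delay)
    return response

In a real production system you’d likely reach for an existing retry library rather than rolling your own, but the moving parts, retryable status codes, a retry cap, back off, and jitter, are exactly the ones we’ve walked through here.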