Imagine this: it's payday, and my system calls a payroll application programming interface (API) to process salaries. The request times out, so my client retries it. In reality, the initial request went through; I just never received the confirmation. Suddenly, every employee is double-paid, or worse, no one gets paid at all. One small glitch at the network level just turned into a very expensive mistake with serious financial and compliance consequences.
Thankfully, defensive programming can help. Instead of assuming that everything works as expected, defensive programming involves writing API clients that anticipate failures and protect against them. Networks are unreliable, services have downtimes, and bugs happen, but a resilient client treats these as the norm, not an exception. By designing with failure in mind, you can avoid critical errors, like duplicate charges, missed payments, or cascading retries that overwhelm a service.
In this guide, I'll share techniques for building resilient API clients: how to implement retries safely using exponential backoff and why idempotent requests matter. I'll illustrate these points with examples from Gusto Embedded's Payroll APIs.
Understand Why Defensive Programming Matters for API Clients
We all know APIs don't run in perfect environments. Failures are not only possible but expected. You can see everything from transient network drops and HTTP 500 errors under peak load to rate-limit responses like HTTP 429 Too Many Requests. Even DNS resolution failures or partial timeouts can cause requests to hang indefinitely. Even with robust infrastructure, no API can guarantee 100 percent uptime or consistency. If my API client design assumes success by default, one minor issue can break my integration and ripple through the rest of my system.
In domains with high stakes like payroll, defensive programming is even more important. If retries aren't handled correctly, duplicate adjustments or repeated tax withholdings can occur, leading to inaccurate payroll results. These issues can harm trust, create compliance risks, and be costly to fix.
A defensive mindset is critical to building resilient API clients. Instead of assuming every request succeeds, assume the requests may be lost, delayed, or duplicated. Design each interaction with the expectation of failure. When you plan for failure at every step, your client can keep functioning gracefully even when the API or network is unpredictable.
Build Resilient API Clients Using Defensive Programming
Resilient API clients aren't built with a single technique but through a set of practices designed to predict and contain failure.
Handle Failures Correctly with Retries
Retries are the primary tool developers reach for when dealing with unreliable APIs. The idea is this: if a request fails, send it again. However, not all failures are created equal, and retrying blindly can do more harm than good.
Manage Transient and Permanent Failures
Before retrying after an error, you need to decide whether you should retry or stop and fail fast. The distinction between transient and permanent failures helps guide that logic.
Transient errors are faults that may resolve on their own if you try again. Examples include network timeouts, service overloads, or temporary outages in downstream systems. In HTTP, you often see status codes like 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable, or even 504 Gateway Timeout. These are signals that the server isn't able to fulfill the request right now, but may be able to later. In these cases, retrying makes sense, provided you back off appropriately.
A special case worth mentioning is 429 Too Many Requests. This indicates that your client has hit a rate limit. While it's technically a transient error, you shouldn't retry immediately. Instead, check the Retry-After header in the response (if present) and wait for the specified duration before sending another request. This ensures your client respects server limits and avoids being throttled further.
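As a minimal sketch (the helper name and structure are my own, not from any particular SDK), a client can honor Retry-After when it is present and fall back to exponential backoff with jitter otherwise:

```python
import random
import time

def wait_before_retry(response, attempt):
    """Sleep before the next retry, honoring Retry-After on 429 responses.

    `response` is any object with `status_code` and `headers` attributes
    (such as a `requests.Response`), or None if the request never got a
    response at all. Returns the number of seconds waited.
    """
    if response is not None and response.status_code == 429:
        retry_after = response.headers.get("Retry-After")
        if retry_after is not None:
            try:
                # Retry-After is usually delta-seconds; it can also be an
                # HTTP date, in which case we fall back to backoff below.
                delay = float(retry_after)
                time.sleep(delay)
                return delay
            except ValueError:
                pass
    # Default: exponential backoff with jitter
    delay = (2 ** attempt) + random.uniform(0, 1)
    time.sleep(delay)
    return delay
```

Respecting the server's own hint first, and only then falling back to backoff, keeps your client a good citizen under rate limiting.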
Permanent errors reflect problems that can't be resolved with a retry. These include 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, and 422 Unprocessable Entity due to invalid parameters. Once one of these errors is returned, retrying is futile. Instead, your client should fail fast and surface the error for intervention.
Prevent the Risk of Naive Retry Strategies
Naive retry strategies can introduce new problems. If every client retries immediately after a timeout, the server can become overwhelmed by a surge of duplicate requests. Even worse, a poorly designed client can execute the same operation twice, leading to unintended side effects, like submitting the same payroll run multiple times. Error handling must go beyond simply trying again.
Combine Exponential Backoff with Jitter
To avoid these pitfalls, the industry standard is to combine exponential backoff with jitter. Exponential backoff spaces out each retry by increasing the delay each time: wait one second, then two, then four, then eight, and so on. Jitter adds a random variation to these wait times, ensuring multiple clients don't retry at the same time and overwhelm the server.
Here's a simple Python example showing exponential backoff with jitter when calling an API endpoint. This example uses a GET request for simplicity. The same retry logic applies to POST, PUT, PATCH, and DELETE operations as well. Just make sure those operations are idempotent before retrying so that multiple attempts do not cause unintended side effects:
import requests
import time
import random

def make_request_with_retries(url, max_retries=5):
    last_exception = None
    for attempt in range(max_retries):
        try:
            response = requests.get(
                url,
                headers={
                    "accept": "application/json",
                    "X-Gusto-API-Version": "2025-06-15",
                    "authorization": "Bearer <YOUR-COMPANY-API-TOKEN>"
                },
                timeout=5
            )
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException as e:
            last_exception = e
            # Don't retry on client errors like 400 or 401
            if e.response is not None and 400 <= e.response.status_code < 500:
                raise e
            # Don't sleep after the last attempt
            if attempt < max_retries - 1:
                wait_time = (2 ** attempt) + random.uniform(0, 1)
                print(f"Retry {attempt + 1}/{max_retries} failed: {e}. Retrying in {wait_time:.2f}s...")
                time.sleep(wait_time)
            else:
                print(f"Retry {attempt + 1}/{max_retries} failed: {e}. Max retries reached.")
    raise last_exception or Exception("Max retries reached. Request failed.")

# Example usage:
data = make_request_with_retries("https://api.gusto-demo.com/v1/payrolls/<PAYROLL_UUID>/receipt")
This code attempts to get a single payroll receipt with increasing wait times between retries. Each failure triggers an exponential delay (one second, two seconds, four seconds, and so on) with a bit of randomness added (the jitter) to avoid synchronized retry attempts. If all attempts fail, it raises the last exception so that the calling code can log or handle it appropriately.
Handle Idempotency: Design Safe Retries
Retries are only truly safe when they're paired with idempotency. In the context of APIs, idempotency means that a request can be executed multiple times but only produces the same outcome as the initial successful execution. It's the safeguard that prevents retries from accidentally causing duplicate side effects.
In real-world systems, idempotency protects against the unpredictable behavior of networks and clients. Imagine a user submitting a payment form or signing up for an account, and their browser freezes midway. If they refresh and the client resends the same request, idempotency ensures the user isn't charged twice or created as a duplicate record (in cases where there are no unique identifiers, such as username or email).
The most common way to achieve this is through a unique identifier that links all the attempts of a request to a single logical operation. The Gusto Embedded Payroll APIs support this pattern with the Idempotency-Key header. When a client submits a request with an idempotency key, the server stores both the request and its result. If the client retries with the same key against the same endpoint, the server doesn't run the operation again; it simply returns the previously stored result.
Version-based object management takes this a step further. For update operations, the API requires a version field that helps prevent race conditions. If the resource has changed since the client's last request, the API returns a 409 Conflict instead of silently overwriting data. This guarantees that retries don't create conflicting state changes even under concurrent operations.
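A sketch of that read-modify-write cycle might look like the following. The endpoint path and field names here are illustrative, not taken from the API reference; the point is reading the current version, including it in the update, and surfacing a 409 instead of retrying blindly:

```python
import requests

def with_version(changes, version):
    """Merge the resource's current version into an update payload."""
    return {**changes, "version": version}

def update_resource(url, token, changes):
    """Update a resource using its current version to detect conflicts."""
    headers = {
        "Authorization": f"Bearer {token}",
        "accept": "application/json",
        "Content-Type": "application/json",
    }
    # 1. Read the resource to learn its current version
    current = requests.get(url, headers=headers, timeout=5)
    current.raise_for_status()
    version = current.json()["version"]

    # 2. Send the update with that version included
    response = requests.put(url, json=with_version(changes, version),
                            headers=headers, timeout=5)
    if response.status_code == 409:
        # The resource changed since we read it. Re-read and reconcile;
        # never retry automatically with the stale version.
        raise RuntimeError("Version conflict: resource changed since last read")
    response.raise_for_status()
    return response.json()
```

On a 409, the right move is to fetch the resource again, re-apply the change against the fresh state, and only then resend, so no one's concurrent update is silently overwritten.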
This approach is especially important in payroll systems, where correctness is non-negotiable. Imagine creating a payroll run for your workforce. Without an idempotency key, a retry after a timeout may be interpreted as a second payroll run, leading to employees being double-paid and the company facing compliance issues. With idempotency in place, all retries are tied back to the same logical run, so employees are paid exactly once, no matter how many times the request is retried.
Here's an example of how you can send a payroll creation request with an idempotency key header to guarantee safe retries:
import uuid
import requests

def create_payroll(company_id, idempotency_key, payroll_data):
    headers = {
        "Authorization": "Bearer <YOUR-COMPANY-ACCESS-TOKEN>",
        "accept": "application/json",
        "X-Gusto-API-Version": "2025-06-15",
        "Content-Type": "application/json",
        "Idempotency-Key": idempotency_key,  # unique key for this operation
    }
    url = f"https://api.gusto-demo.com/v1/companies/{company_id}/payrolls"
    response = requests.post(
        url,
        json=payroll_data,
        headers=headers
    )
    response.raise_for_status()
    return response.json()

# Example usage
idempotency_key = str(uuid.uuid4())
payload = {
    "off_cycle": True,
    "off_cycle_reason": "Bonus",
    "start_date": "2025-11-11",
    "end_date": "2025-11-18"
}
result = create_payroll(company_id='123', idempotency_key=idempotency_key, payroll_data=payload)
print(result)
This code sends a create payroll request to Gusto while using an idempotency key to ensure safe retries. It generates a unique key with uuid.uuid4() and passes it in the Idempotency-Key header so the API can recognize repeat attempts as the same operation.
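Putting the two ideas together, the key must be generated once, before the first attempt, and reused on every retry. Here's a sketch that combines the earlier backoff loop with a stable idempotency key (the URL and token are placeholders):

```python
import random
import time
import uuid
import requests

def create_payroll_with_retries(url, token, payroll_data, max_retries=5):
    # Generate the key once, outside the retry loop, so the server
    # recognizes every attempt as the same logical operation
    idempotency_key = str(uuid.uuid4())
    headers = {
        "Authorization": f"Bearer {token}",
        "accept": "application/json",
        "Content-Type": "application/json",
        "Idempotency-Key": idempotency_key,
    }
    last_exception = None
    for attempt in range(max_retries):
        try:
            response = requests.post(url, json=payroll_data,
                                     headers=headers, timeout=5)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException as e:
            last_exception = e
            # Permanent client errors: fail fast instead of retrying
            if e.response is not None and 400 <= e.response.status_code < 500:
                raise
            # Exponential backoff with jitter before the next attempt
            if attempt < max_retries - 1:
                time.sleep((2 ** attempt) + random.uniform(0, 1))
    raise last_exception
```

The critical detail is that `uuid.uuid4()` is called once per logical payroll run, not once per HTTP attempt; regenerating the key inside the loop would defeat the protection entirely.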
Implement Resilient API Client Patterns
Once you understand retries and idempotency, the next step is to apply them consistently. Resilient API clients aren't just a collection of one-off fixes; they're built around a set of design patterns that treat reliability as a core concern. These patterns assume that every request can fail, and they put safeguards in place so those failures never compromise correctness.
Building resilience also means thinking beyond a single API call. In distributed systems, dependencies can go down, networks can degrade, and partial failures can ripple across services. Robust clients use techniques such as fallback services (e.g., using cached data or a secondary endpoint when the primary is unavailable), graceful degradation (serving limited functionality instead of breaking completely), and circuit breakers to prevent cascading failures. Together with retries, timeouts, and idempotency, these techniques form the foundation of reliable API integrations.
Here are the key patterns that guide the design of reliable API clients:
| Pattern | Why It Matters | Example in Practice |
|---|---|---|
| Treat every request as if it can fail. | Networks and servers are inherently unreliable. Coding as if success is guaranteed leads to brittle integrations. | Always implement timeouts, retries, and error handling logic. |
| Ensure critical operations are idempotent. | Without idempotency, retries can trigger duplicate side effects, like double payments. | Use an Idempotency-Key (or equivalent identifier) to safely retry operations. For example, Gusto supports this through the Idempotency-Key header. |
| Monitor and log retry behavior. | Silent retries hide systemic issues. Visibility is needed for debugging and capacity planning. | Record retry attempts and their outcomes in your client logs. |
| Back off intelligently. | Immediate retries amplify the load and risk further failure. Exponential backoff with jitter smooths traffic spikes. | Retry requests with increasing wait times and randomized delays. |
| Fail fast on permanent errors. | Retrying invalid requests wastes resources and delays recovery. | Stop retries when the API returns 4xx client errors like 401 Unauthorized. |
When applied together, these patterns transform an integration from fragile to dependable. Instead of breaking down under real-world conditions, the client absorbs failures, retries safely, and communicates clearly when something is wrong.
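Of the techniques mentioned above, the circuit breaker is the one with the least obvious implementation, so here's a minimal sketch (the class name and thresholds are illustrative; production systems often reach for a library instead). After a run of consecutive failures the breaker "opens" and fails fast without calling the API; after a cooldown it lets a single trial call through:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker sketch.

    After `failure_threshold` consecutive failures the circuit opens and
    calls fail immediately without hitting the API. Once `reset_timeout`
    seconds pass, one trial call is allowed through (half-open); a success
    closes the circuit again.
    """

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failure_count = 0
        self.opened_at = None  # timestamp at which the circuit opened

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("Circuit open: failing fast")
            # Cooldown elapsed: half-open, allow one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        # Success: close the circuit and reset the failure count
        self.failure_count = 0
        self.opened_at = None
        return result
```

Wrapping your API calls in `breaker.call(...)` means a struggling upstream service stops receiving traffic for a while instead of being hammered by every client at once.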
Optimistic Version Control
Another way to ensure data integrity is optimistic version control. Object-based versions, essentially snapshots of a given resource, are used to process concurrent updates correctly, ensuring data integrity in a multi-user environment. This is a form of optimistic concurrency control: the system assumes conflicts are rare and only checks for them at the time of an update.
Conclusion
Building resilient API clients means embracing the reality that things fail: networks drop, servers time out, and dependencies falter. Anticipate these failures by handling retries with exponential backoff and making requests idempotent, and your integrations will behave predictably, even under stress.