DEV Community

Flud

Building Bulletproof APIs: A Complete Guide to Error Handling and Retry Strategies

Because your users shouldn't suffer when third-party services have a bad day

Every developer has been there: your app works perfectly in development, but in production, external APIs fail, networks time out, and suddenly your users are staring at cryptic error messages. Today, we'll build a robust system that gracefully handles failures and keeps your application running smoothly.

Why This Matters More Than You Think

Consider this scenario: your e-commerce app depends on a payment processor, shipping API, and inventory service. If any of these fails without proper handling, you could lose sales, frustrate customers, or worse - corrupt data.

Take an API with 99.5% uptime. That sounds great until you realize it means about 3.65 hours of downtime per month. For a system that needs 10 such APIs, and assuming they fail independently, the combined availability drops to roughly 95% (0.995^10 ≈ 0.951), or about 36 hours per month in which at least one dependency is down.
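
To put numbers on that, here is a minimal sketch. It assumes the services fail independently and that a request needs all of them; `combinedAvailability` and `monthlyDowntimeHours` are illustrative helpers, not part of any library.

```javascript
// Assumption: failures are independent and every dependency must be up.
const HOURS_PER_MONTH = 730; // 365 days * 24 hours / 12 months

function combinedAvailability(availabilities) {
  // The chance that all services are up is the product of the individual chances
  return availabilities.reduce((product, a) => product * a, 1);
}

function monthlyDowntimeHours(availability) {
  return (1 - availability) * HOURS_PER_MONTH;
}

const single = 0.995;
const combined = combinedAvailability(Array(10).fill(single));

console.log(`One API:  ${(single * 100).toFixed(1)}% up, ~${monthlyDowntimeHours(single).toFixed(2)} h down per month`);
console.log(`Ten APIs: ${(combined * 100).toFixed(1)}% up, ~${monthlyDowntimeHours(combined).toFixed(1)} h down per month`);
```

Real outages are often correlated (shared cloud regions, shared DNS), so treat the result as a rough estimate rather than a prediction.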

The Anatomy of Resilient API Calls

Let's start with a real-world example. Here's what most developers write initially:

// ❌ Fragile approach
async function getUserProfile(userId) {
  const response = await fetch(`/api/users/${userId}`);
  const data = await response.json();
  return data;
}

This code is a disaster waiting to happen. Let's transform it step by step.

Step 1: Proper Error Handling

// ✅ Better approach with basic error handling
async function getUserProfile(userId) {
  try {
    const response = await fetch(`/api/users/${userId}`);

    if (!response.ok) {
      // Pass the status along so callers can branch on it
      throw new APIError(`HTTP ${response.status}: ${response.statusText}`, response.status);
    }

    const data = await response.json();
    return data;
  } catch (error) {
    console.error('Failed to fetch user profile:', error);
    throw error;
  }
}

class APIError extends Error {
  constructor(message, status, isRetryable = false) {
    super(message);
    this.name = 'APIError';
    this.status = status;
    this.isRetryable = isRetryable;
  }
}
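
One payoff of a typed error is that callers can turn failures into readable messages instead of the cryptic ones mentioned in the introduction. A minimal sketch follows, with `APIError` repeated so the snippet runs on its own; `toUserMessage` and its wording are assumptions for illustration.

```javascript
class APIError extends Error {
  constructor(message, status, isRetryable = false) {
    super(message);
    this.name = 'APIError';
    this.status = status;
    this.isRetryable = isRetryable;
  }
}

// Hypothetical helper: map a failure to something a user can act on
function toUserMessage(error) {
  if (error instanceof APIError) {
    if (error.status === 404) return 'We could not find that profile.';
    if (error.status === 429) return 'Too many requests. Please try again in a moment.';
    if (error.status >= 500) return 'The service is having trouble. Please try again later.';
    return 'The request could not be completed.';
  }
  // fetch rejects with a TypeError when the network request itself fails
  if (error instanceof TypeError) return 'Check your connection and try again.';
  return 'Something went wrong.';
}

console.log(toUserMessage(new APIError('HTTP 404: Not Found', 404)));
console.log(toUserMessage(new TypeError('fetch failed')));
```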

Step 2: Smart Retry Logic

Not all errors should trigger retries. Here's how to implement intelligent retry logic:

// ✅ Smart retry mechanism
async function makeResilientRequest(url, options = {}) {
  const {
    maxRetries = 3,
    baseDelay = 1000,
    maxDelay = 10000,
    timeout = 5000,
    retryCondition = (error) => error.isRetryable,
    ...fetchOptions // everything else (method, headers, body) goes to fetch
  } = options;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const controller = new AbortController();
    const timeoutId = setTimeout(() => controller.abort(), timeout);

    try {
      const response = await fetch(url, {
        ...fetchOptions,
        signal: controller.signal
      });

      if (response.ok) {
        return await response.json();
      }

      // Server errors, rate limiting, and request timeouts are worth retrying
      const isRetryable = response.status >= 500 ||
                          response.status === 429 ||
                          response.status === 408;

      throw new APIError(
        `HTTP ${response.status}: ${response.statusText}`,
        response.status,
        isRetryable
      );
    } catch (error) {
      clearTimeout(timeoutId); // stop the timer before backing off

      // fetch rejects with a TypeError on network failure and an AbortError
      // when the timeout fires; mark both as retryable
      if (!(error instanceof APIError)) {
        error.isRetryable = error instanceof TypeError || error.name === 'AbortError';
      }

      if (attempt === maxRetries || !retryCondition(error)) {
        throw error;
      }

      // Exponential backoff with jitter
      const delay = Math.min(
        baseDelay * Math.pow(2, attempt) + Math.random() * 1000,
        maxDelay
      );

      console.warn(`Attempt ${attempt + 1} failed, retrying in ${Math.round(delay)}ms`);
      await new Promise(resolve => setTimeout(resolve, delay));
    } finally {
      clearTimeout(timeoutId);
    }
  }
}
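
To see what the backoff formula produces, the sketch below pulls it out into a standalone function. `backoffDelay` and its injectable `random` parameter are assumptions, added only so the schedule can be printed reproducibly.

```javascript
// Same formula as in makeResilientRequest, extracted for inspection.
// `random` is injectable (an assumption for testability); it defaults to Math.random.
function backoffDelay(attempt, { baseDelay = 1000, maxDelay = 10000, random = Math.random } = {}) {
  const exponential = baseDelay * Math.pow(2, attempt); // 1s, 2s, 4s, 8s, ...
  const jitter = random() * 1000;                       // up to 1s of noise
  return Math.min(exponential + jitter, maxDelay);
}

// With jitter pinned to zero the schedule is easy to read
const noJitter = { random: () => 0 };
const schedule = [0, 1, 2, 3, 4].map(attempt => backoffDelay(attempt, noJitter));
console.log(schedule); // [ 1000, 2000, 4000, 8000, 10000 ]
```

The fifth value would be 16000 without the cap. In real use the jitter term matters: clients that failed at the same moment would otherwise all retry at the same moment too.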

Step 3: Circuit Breaker Pattern

Prevent cascading failures with the circuit breaker pattern:

class CircuitBreaker {
  constructor(options = {}) {
    // ?? keeps explicit values such as 0 instead of silently replacing them
    this.failureThreshold = options.failureThreshold ?? 5;
    this.resetTimeout = options.resetTimeout ?? 60000;

    this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
    this.failureCount = 0;
    this.lastFailureTime = null;
    this.successCount = 0;
  }

  async execute(operation) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailureTime < this.resetTimeout) {
        throw new Error('Circuit breaker is OPEN');
      } else {
        this.state = 'HALF_OPEN';
        this.successCount = 0;
      }
    }

    try {
      const result = await operation();

      if (this.state === 'HALF_OPEN') {
        this.successCount++;
        if (this.successCount >= 3) {
          this.reset();
        }
      } else {
        this.reset();
      }

      return result;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }

  recordFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();

    if (this.failureCount >= this.failureThreshold) {
      this.state = 'OPEN';
    }
  }

  reset() {
    this.state = 'CLOSED';
    this.failureCount = 0;
    this.successCount = 0;
    this.lastFailureTime = null;
  }
}
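
The state transitions are easier to follow when time is under your control. The sketch below is a condensed stand-in for the class above, not a replacement: `MiniBreaker` and its injectable `now` clock are assumptions for this demo, and it closes after a single successful trial instead of three.

```javascript
// Condensed stand-in for CircuitBreaker with an injectable clock,
// so state changes can be shown without actually waiting.
class MiniBreaker {
  constructor({ failureThreshold = 2, resetTimeout = 1000, now = Date.now } = {}) {
    Object.assign(this, { failureThreshold, resetTimeout, now });
    this.state = 'CLOSED';
    this.failureCount = 0;
    this.lastFailureTime = null;
  }

  async execute(operation) {
    if (this.state === 'OPEN') {
      if (this.now() - this.lastFailureTime < this.resetTimeout) {
        throw new Error('Circuit breaker is OPEN'); // fail fast, operation never runs
      }
      this.state = 'HALF_OPEN'; // let a trial request through
    }
    try {
      const result = await operation();
      this.state = 'CLOSED';
      this.failureCount = 0;
      return result;
    } catch (error) {
      this.failureCount++;
      this.lastFailureTime = this.now();
      if (this.failureCount >= this.failureThreshold) this.state = 'OPEN';
      throw error;
    }
  }
}

async function demo() {
  let clock = 0;
  const breaker = new MiniBreaker({ now: () => clock });
  const states = [];
  const failing = async () => { throw new Error('boom'); };
  const working = async () => 'ok';

  await breaker.execute(failing).catch(() => {});
  states.push(breaker.state); // CLOSED: one failure, threshold is two
  await breaker.execute(failing).catch(() => {});
  states.push(breaker.state); // OPEN: threshold reached
  await breaker.execute(working).catch(() => {});
  states.push(breaker.state); // OPEN: rejected up front, `working` was never called
  clock += 1000;              // resetTimeout elapses
  await breaker.execute(working);
  states.push(breaker.state); // CLOSED: the trial request succeeded
  return states;
}

demo().then(states => console.log(states.join(' -> ')));
// CLOSED -> OPEN -> OPEN -> CLOSED
```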

Step 4: Putting It All Together

Here's a production-ready API client that combines all these patterns:

class ResilientAPIClient {
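  constructor(baseURL, options = {}) {
    // Keep breaker settings out of the per-request defaults
    const { circuitBreaker, ...requestDefaults } = options;
    this.baseURL = baseURL;
    this.circuitBreaker = new CircuitBreaker(circuitBreaker);
    this.defaultOptions = {
      maxRetries: 3,
      timeout: 5000,
      ...requestDefaults
    };
  }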

  async request(endpoint, options = {}) {
    const url = `${this.baseURL}${endpoint}`;
    const requestOptions = { ...this.defaultOptions, ...options };

    return this.circuitBreaker.execute(async () => {
      return makeResilientRequest(url, requestOptions);
    });
  }

  // Convenience methods
  async get(endpoint, options = {}) {
    return this.request(endpoint, { ...options, method: 'GET' });
  }

  async post(endpoint, data, options = {}) {
    return this.request(endpoint, {
      ...options,
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        ...options.headers
      },
      body: JSON.stringify(data)
    });
  }
}

Usage Example

const apiClient = new ResilientAPIClient('https://api.example.com', {
  maxRetries: 3,
  timeout: 8000,
  circuitBreaker: {
    failureThreshold: 5,
    resetTimeout: 30000
  }
});

// Now your API calls are bulletproof
async function fetchUserData(userId) {
  try {
    const user = await apiClient.get(`/users/${userId}`);
    return user;
  } catch (error) {
    // Handle the error gracefully
    console.error('Failed to fetch user after all retries:', error);
    return { error: 'User data temporarily unavailable' };
  }
}

Advanced Patterns for Production

1. Rate Limiting Protection

class RateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.requests = [];
  }

  async acquire() {
    // Re-check after every wait: another caller may have taken the freed slot
    while (true) {
      const now = Date.now();
      this.requests = this.requests.filter(time => now - time < this.windowMs);

      if (this.requests.length < this.maxRequests) {
        this.requests.push(now); // record the time the slot was actually taken
        return;
      }

      const oldestRequest = Math.min(...this.requests);
      const waitTime = this.windowMs - (now - oldestRequest);
      await new Promise(resolve => setTimeout(resolve, waitTime));
    }
  }
}
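
The sliding-window bookkeeping is easier to follow with explicit timestamps. This sketch is a non-blocking variant written for illustration: `SlidingWindow`, `tryAcquire`, and the explicit `now` argument are assumptions and are not part of the `RateLimiter` above.

```javascript
// Non-blocking variant of the sliding window: returns false instead of waiting.
class SlidingWindow {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.requests = [];
  }

  tryAcquire(now) {
    // Forget requests that have slid out of the window
    this.requests = this.requests.filter(time => now - time < this.windowMs);
    if (this.requests.length >= this.maxRequests) return false;
    this.requests.push(now);
    return true;
  }
}

const limiter = new SlidingWindow(2, 1000); // 2 requests per second
const outcomes = [0, 100, 200, 1000, 1100, 1150].map(t => limiter.tryAcquire(t));
console.log(outcomes); // [ true, true, false, true, true, false ]
```

The request at 200 ms is refused because two requests are already inside the window. By 1000 ms the first one has expired, and by 1100 ms the second has too, so both of those go through.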

2. Request Deduplication

class RequestDeduplicator {
  constructor() {
    this.pendingRequests = new Map();
  }

  async dedupe(key, requestFn) {
    if (this.pendingRequests.has(key)) {
      return this.pendingRequests.get(key);
    }

    const promise = requestFn().finally(() => {
      this.pendingRequests.delete(key);
    });

    this.pendingRequests.set(key, promise);
    return promise;
  }
}
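
A short demo shows what deduplication buys you. The sketch uses a function-style equivalent of the class above so it runs on its own; `createDeduper`, `loadUser`, and the `user:42` key are hypothetical names.

```javascript
// Function-style equivalent of RequestDeduplicator, repeated so the demo runs standalone
function createDeduper() {
  const pending = new Map();
  return (key, requestFn) => {
    if (pending.has(key)) return pending.get(key);
    const promise = requestFn().finally(() => pending.delete(key));
    pending.set(key, promise);
    return promise;
  };
}

async function demo() {
  const dedupe = createDeduper();
  let calls = 0;
  // Hypothetical slow request; a real one would call fetch
  const loadUser = () => new Promise(resolve => {
    calls++;
    setTimeout(() => resolve({ id: 42 }), 20);
  });

  // Three components ask for the same user at the same time
  const [a, b, c] = await Promise.all([
    dedupe('user:42', loadUser),
    dedupe('user:42', loadUser),
    dedupe('user:42', loadUser)
  ]);
  const afterBurst = calls; // the three callers shared one request

  // The first request has settled, so this one goes out again
  await dedupe('user:42', loadUser);
  return { afterBurst, total: calls, sameObject: a === b && b === c };
}

demo().then(result => console.log(result));
// { afterBurst: 1, total: 2, sameObject: true }
```

Note that this only merges requests that are in flight at the same time. It is not a cache: once a request settles, the next call with the same key hits the network again.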

3. Health Checking

class HealthChecker {
  constructor(apiClient) {
    this.apiClient = apiClient;
    this.isHealthy = true;
    this.lastCheck = null;
  }

  async checkHealth() {
    try {
      await this.apiClient.get('/health');
      this.isHealthy = true;
    } catch (error) {
      this.isHealthy = false;
    }
    this.lastCheck = Date.now();
  }

  shouldAttemptRequest() {
    const fiveMinutesAgo = Date.now() - 5 * 60 * 1000;
    return this.isHealthy || !this.lastCheck || this.lastCheck < fiveMinutesAgo;
  }
}
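
The decision in `shouldAttemptRequest` covers three cases: the service is healthy, it has never been checked, or the last result is too old to trust. The sketch below restates it as a pure function; `shouldAttempt`, `STALE_AFTER_MS`, and the explicit `now` parameter are assumptions made so the cases can be checked without a running service.

```javascript
const STALE_AFTER_MS = 5 * 60 * 1000; // same five-minute window as above

// Pure restatement of shouldAttemptRequest; `now` is passed in to make it testable
function shouldAttempt({ isHealthy, lastCheck }, now) {
  const neverChecked = lastCheck === null;
  const stale = !neverChecked && lastCheck < now - STALE_AFTER_MS;
  return isHealthy || neverChecked || stale;
}

const now = Date.now();
console.log(shouldAttempt({ isHealthy: true, lastCheck: now }, now));               // true: healthy
console.log(shouldAttempt({ isHealthy: false, lastCheck: null }, now));             // true: never checked
console.log(shouldAttempt({ isHealthy: false, lastCheck: now - 60000 }, now));      // false: failed a minute ago
console.log(shouldAttempt({ isHealthy: false, lastCheck: now - 6 * 60000 }, now));  // true: result is stale
```

The stale case is what lets traffic resume on its own: after five minutes without a fresh check, requests are allowed through again even if the last check failed.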

Monitoring and Observability

Don't forget to track basic metrics alongside your logs:

class APIMetrics {
  constructor() {
    this.metrics = {
      requests: 0,
      failures: 0,
      retries: 0,
      circuitBreakerTrips: 0
    };
  }

  recordRequest() { this.metrics.requests++; }
  recordFailure() { this.metrics.failures++; }
  recordRetry() { this.metrics.retries++; }
  recordCircuitBreakerTrip() { this.metrics.circuitBreakerTrips++; }

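  getSuccessRate() {
    // Avoid dividing by zero (NaN) before any request has been recorded
    if (this.metrics.requests === 0) return 1;
    return (this.metrics.requests - this.metrics.failures) / this.metrics.requests;
  }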
}
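
Counters are only useful if something increments them. One way to wire that up, sketched under assumptions: `withMetrics` is a hypothetical wrapper, and it works on a plain counters object so the snippet runs on its own.

```javascript
// Hypothetical wrapper: counts every call and every failure on a plain counters object
async function withMetrics(metrics, operation) {
  metrics.requests++;
  try {
    return await operation();
  } catch (error) {
    metrics.failures++;
    throw error; // metrics must not swallow the error
  }
}

async function demo() {
  const metrics = { requests: 0, failures: 0 };
  const ok = async () => 'ok';
  const fail = async () => { throw new Error('boom'); };

  await withMetrics(metrics, ok);
  await withMetrics(metrics, ok);
  await withMetrics(metrics, ok);
  await withMetrics(metrics, fail).catch(() => {});

  const successRate = (metrics.requests - metrics.failures) / metrics.requests;
  return { ...metrics, successRate };
}

demo().then(result => console.log(result));
// { requests: 4, failures: 1, successRate: 0.75 }
```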

Key Takeaways

  1. Always handle network failures - They're inevitable, not exceptional
  2. Implement smart retries - Not all errors should trigger retries
  3. Use exponential backoff with jitter - Prevents thundering herd problems
  4. Implement circuit breakers - Fail fast when services are down
  5. Add comprehensive monitoring - You can't fix what you can't see
  6. Test failure scenarios - Use tools like Chaos Monkey in staging

Testing Your Resilient APIs

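// Mock a flaky API for testing
class FlakyAPI {
  constructor(failureRate = 0.3) {
    this.failureRate = failureRate;
  }

  // Stand-in for fetch: rejects like a network failure,
  // otherwise resolves with a minimal response object
  fetch = async () => {
    if (Math.random() < this.failureRate) {
      throw new TypeError('Network error');
    }
    return { ok: true, status: 200, json: async () => ({ data: 'success' }) };
  };
}

// Test your resilient client
async function testResilience() {
  const flakyAPI = new FlakyAPI(0.7); // 70% failure rate
  const originalFetch = globalThis.fetch;
  globalThis.fetch = flakyAPI.fetch; // makeResilientRequest calls the global fetch

  try {
    // Short baseDelay keeps the test fast
    const client = new ResilientAPIClient('https://api.example.com', { baseDelay: 10 });

    const results = await Promise.allSettled(
      Array(10).fill().map(() => client.get('/test'))
    );

    console.log('Success rate:', results.filter(r => r.status === 'fulfilled').length / 10);
  } finally {
    globalThis.fetch = originalFetch; // always restore the real fetch
  }
}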

Building resilient APIs isn't just about preventing errors—it's about creating systems that gracefully degrade and recover automatically. Your users will thank you, your on-call rotations will be quieter, and you'll sleep better knowing your applications can handle whatever the internet throws at them.

Have you implemented similar patterns in your applications? Share your experiences and additional strategies in the comments below!


Found this helpful? Follow me for more deep dives into building production-ready applications that actually work in the real world.
