Because your users shouldn't suffer when third-party services have a bad day
Every developer has been there: your app works perfectly in development, but in production, external APIs fail, networks time out, and suddenly your users are staring at cryptic error messages. Today, we'll build a robust system that gracefully handles failures and keeps your application running smoothly.
Why This Matters More Than You Think
Consider this scenario: your e-commerce app depends on a payment processor, shipping API, and inventory service. If any of these fails without proper handling, you could lose sales, frustrate customers, or worse - corrupt data.
Even an API with 99.5% uptime, which sounds great, is down for about 3.65 hours per month. For a system that needs 10 such APIs to all be up, and assuming they fail independently, the combined availability drops to roughly 95.1% (0.995 to the 10th power), or close to 36 hours of degraded service per month.
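The arithmetic behind the downtime figure, and the drop for ten APIs, can be checked with a short sketch. It assumes every dependency must be up for a request to succeed and that failures are independent, which real outages often are not; the function names are made up for this example.

```javascript
// Availability of a request path that needs every dependency to be up.
// Assumption: dependencies fail independently of each other.
function combinedAvailability(availabilities) {
  return availabilities.reduce((product, a) => product * a, 1);
}

const HOURS_PER_MONTH = 730; // 365 days * 24 hours / 12 months

function monthlyDowntimeHours(availability) {
  return (1 - availability) * HOURS_PER_MONTH;
}

const single = 0.995;
const tenApis = combinedAvailability(Array(10).fill(single));

console.log(monthlyDowntimeHours(single).toFixed(2));  // 3.65
console.log((tenApis * 100).toFixed(1));               // 95.1
console.log(monthlyDowntimeHours(tenApis).toFixed(1)); // 35.7
```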
The Anatomy of Resilient API Calls
Let's start with a real-world example. Here's what most developers write initially:
// ❌ Fragile approach
async function getUserProfile(userId) {
  const response = await fetch(`/api/users/${userId}`);
  const data = await response.json();
  return data;
}
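This code is a disaster waiting to happen: it never checks the HTTP status, so a 404 or 500 response gets parsed as if it were a user; it has no timeout; and a network failure surfaces as a raw rejection the caller has to guess at. Let's transform it step by step.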
Step 1: Proper Error Handling
// ✅ Better approach with basic error handling
async function getUserProfile(userId) {
  try {
    const response = await fetch(`/api/users/${userId}`);
    if (!response.ok) {
      // Pass the status along so callers can branch on it
      throw new APIError(
        `HTTP ${response.status}: ${response.statusText}`,
        response.status
      );
    }
    const data = await response.json();
    return data;
  } catch (error) {
    console.error('Failed to fetch user profile:', error);
    throw error;
  }
}

class APIError extends Error {
  constructor(message, status, isRetryable = false) {
    super(message);
    this.name = 'APIError';
    this.status = status;
    this.isRetryable = isRetryable;
  }
}
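Once errors carry a status, the calling code can replace cryptic messages with something a user can act on. Here is a minimal sketch of that idea; `toUserMessage` and its wording are hypothetical, and `APIError` is repeated only so the snippet runs on its own.

```javascript
// Repeated from the class above so the sketch is self-contained.
class APIError extends Error {
  constructor(message, status, isRetryable = false) {
    super(message);
    this.name = 'APIError';
    this.status = status;
    this.isRetryable = isRetryable;
  }
}

// Hypothetical helper: map a low-level failure to a user-facing message.
function toUserMessage(error) {
  if (error instanceof APIError) {
    if (error.status === 404) return 'We could not find that profile.';
    if (error.status === 429) return 'Too many requests. Please try again in a moment.';
    if (error.status >= 500) return 'The service is having trouble. Please try again shortly.';
    return 'The request could not be completed.';
  }
  // fetch rejects with a TypeError when the network request itself fails
  if (error instanceof TypeError) {
    return 'Please check your connection and try again.';
  }
  return 'Something went wrong.';
}

console.log(toUserMessage(new APIError('HTTP 404: Not Found', 404)));
// We could not find that profile.
console.log(toUserMessage(new TypeError('Failed to fetch')));
// Please check your connection and try again.
```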
Step 2: Smart Retry Logic
Not all errors should trigger retries. Here's how to implement intelligent retry logic:
// ✅ Smart retry mechanism
async function makeResilientRequest(url, options = {}) {
  // Separate the retry settings from the options that belong to fetch
  const {
    maxRetries = 3,
    baseDelay = 1000,
    maxDelay = 10000,
    timeout = 5000,
    retryCondition = (error) => error.isRetryable,
    ...fetchOptions
  } = options;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const controller = new AbortController();
    const timeoutId = setTimeout(() => controller.abort(), timeout);
    let error;

    try {
      const response = await fetch(url, {
        ...fetchOptions,
        signal: controller.signal
      });
      if (response.ok) {
        return await response.json();
      }
      // Determine if error is retryable
      const isRetryable = response.status >= 500 ||
        response.status === 429 ||
        response.status === 408;
      error = new APIError(
        `HTTP ${response.status}: ${response.statusText}`,
        response.status,
        isRetryable
      );
    } catch (cause) {
      // fetch rejects on network failures and when the timeout aborts it;
      // both are worth retrying. A body that is not valid JSON is not.
      error = new APIError(cause.message, undefined, !(cause instanceof SyntaxError));
      error.cause = cause;
    } finally {
      clearTimeout(timeoutId);
    }

    if (attempt === maxRetries || !retryCondition(error)) {
      throw error;
    }

    // Exponential backoff with jitter
    const delay = Math.min(
      baseDelay * Math.pow(2, attempt) + Math.random() * 1000,
      maxDelay
    );
    console.warn(`Attempt ${attempt + 1} failed, retrying in ${Math.round(delay)}ms`);
    await new Promise(resolve => setTimeout(resolve, delay));
  }
}
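To see what the backoff formula above actually produces, it helps to pull it into a pure function. This is a sketch using the same formula; the `jitter` and `random` parameters are additions for illustration, so the schedule can be inspected without real randomness.

```javascript
// Delay before the retry that follows attempt number `attempt` (0-based).
// `random` is injectable for testing and defaults to Math.random.
function backoffDelay(attempt, options = {}) {
  const {
    baseDelay = 1000,
    maxDelay = 10000,
    jitter = 1000,
    random = Math.random
  } = options;
  const exponential = baseDelay * Math.pow(2, attempt);
  return Math.min(exponential + random() * jitter, maxDelay);
}

// With the jitter pinned to zero, the raw schedule is visible:
const schedule = [0, 1, 2, 3, 4].map(attempt =>
  backoffDelay(attempt, { random: () => 0 })
);
console.log(schedule); // [ 1000, 2000, 4000, 8000, 10000 ]
```

The cap matters: without `maxDelay`, the fifth wait would already be 16 seconds. The jitter spreads out clients that all failed at the same moment, so they do not all retry together.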
Step 3: Circuit Breaker Pattern
Prevent cascading failures with the circuit breaker pattern:
class CircuitBreaker {
  constructor(options = {}) {
    this.failureThreshold = options.failureThreshold || 5;
    this.resetTimeout = options.resetTimeout || 60000;
    this.monitoringPeriod = options.monitoringPeriod || 10000;
    this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
    this.failureCount = 0;
    this.lastFailureTime = null;
    this.successCount = 0;
  }

  async execute(operation) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailureTime < this.resetTimeout) {
        throw new Error('Circuit breaker is OPEN');
      } else {
        this.state = 'HALF_OPEN';
        this.successCount = 0;
      }
    }
    try {
      const result = await operation();
      if (this.state === 'HALF_OPEN') {
        this.successCount++;
        if (this.successCount >= 3) {
          this.reset();
        }
      } else {
        this.reset();
      }
      return result;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }

  recordFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.failureThreshold) {
      this.state = 'OPEN';
    }
  }

  reset() {
    this.state = 'CLOSED';
    this.failureCount = 0;
    this.successCount = 0;
    this.lastFailureTime = null;
  }
}
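The state transitions are easier to follow when you can control time. Below is a compact variant written for that purpose, not a replacement for the class above: `TinyBreaker` and its `now` option are assumptions of this sketch, and unlike the class above it closes after a single successful trial call rather than three.

```javascript
// Compact breaker with an injectable clock (`now`) so transitions can be
// exercised without waiting for real timeouts.
class TinyBreaker {
  constructor({ failureThreshold = 2, resetTimeout = 1000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeout = resetTimeout;
    this.now = now;
    this.state = 'CLOSED';
    this.failureCount = 0;
    this.openedAt = null;
  }

  async execute(operation) {
    if (this.state === 'OPEN') {
      if (this.now() - this.openedAt < this.resetTimeout) {
        throw new Error('Circuit breaker is OPEN'); // fail fast, operation is not called
      }
      this.state = 'HALF_OPEN'; // let one trial call through
    }
    try {
      const result = await operation();
      this.state = 'CLOSED';
      this.failureCount = 0;
      return result;
    } catch (error) {
      this.failureCount++;
      if (this.state === 'HALF_OPEN' || this.failureCount >= this.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = this.now();
      }
      throw error;
    }
  }
}

async function demo() {
  let clock = 0;
  const breaker = new TinyBreaker({ failureThreshold: 2, resetTimeout: 1000, now: () => clock });
  const fail = async () => { throw new Error('boom'); };
  const ok = async () => 'ok';
  const seen = [];

  await breaker.execute(fail).catch(() => {});
  seen.push(breaker.state); // CLOSED: one failure, below the threshold
  await breaker.execute(fail).catch(() => {});
  seen.push(breaker.state); // OPEN: threshold reached
  seen.push(await breaker.execute(ok).catch(error => error.message)); // rejected without calling ok
  clock = 1000;             // reset timeout has elapsed
  await breaker.execute(ok);
  seen.push(breaker.state); // CLOSED: trial call succeeded
  return seen;
}

demo().then(seen => console.log(seen));
// [ 'CLOSED', 'OPEN', 'Circuit breaker is OPEN', 'CLOSED' ]
```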
Step 4: Putting It All Together
Here's a production-ready API client that combines all these patterns:
class ResilientAPIClient {
  constructor(baseURL, options = {}) {
    // Keep the breaker settings out of the per-request defaults
    const { circuitBreaker, ...requestDefaults } = options;
    this.baseURL = baseURL;
    this.circuitBreaker = new CircuitBreaker(circuitBreaker);
    this.defaultOptions = {
      maxRetries: 3,
      timeout: 5000,
      ...requestDefaults
    };
  }

  async request(endpoint, options = {}) {
    const url = `${this.baseURL}${endpoint}`;
    const requestOptions = { ...this.defaultOptions, ...options };
    return this.circuitBreaker.execute(async () => {
      return makeResilientRequest(url, requestOptions);
    });
  }

  // Convenience methods
  async get(endpoint, options = {}) {
    return this.request(endpoint, { ...options, method: 'GET' });
  }

  async post(endpoint, data, options = {}) {
    return this.request(endpoint, {
      ...options,
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        ...options.headers
      },
      body: JSON.stringify(data)
    });
  }
}
Usage Example
const apiClient = new ResilientAPIClient('https://api.example.com', {
  maxRetries: 3,
  timeout: 8000,
  circuitBreaker: {
    failureThreshold: 5,
    resetTimeout: 30000
  }
});

// Now your API calls go through the timeout, retry, and circuit breaker layers
async function fetchUserData(userId) {
  try {
    const user = await apiClient.get(`/users/${userId}`);
    return user;
  } catch (error) {
    // Handle the error gracefully
    console.error('Failed to fetch user after all retries:', error);
    return { error: 'User data temporarily unavailable' };
  }
}
Advanced Patterns for Production
1. Rate Limiting Protection
class RateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.requests = [];
  }

  async acquire() {
    // Re-check after every wait: other callers may have taken the free slot
    while (true) {
      const now = Date.now();
      this.requests = this.requests.filter(time => now - time < this.windowMs);
      if (this.requests.length < this.maxRequests) {
        this.requests.push(now);
        return;
      }
      // Timestamps are pushed in order, so the first one is the oldest
      const waitTime = this.windowMs - (now - this.requests[0]);
      await new Promise(resolve => setTimeout(resolve, waitTime));
    }
  }
}
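The sliding-window bookkeeping is the part worth understanding here, so this sketch isolates it. It reports how long a caller would have to wait instead of sleeping, and it takes the clock as a parameter; `SlidingWindow`, `tryAcquire`, and `now` are names invented for this example.

```javascript
// Sliding-window limiter that reports the wait instead of sleeping.
class SlidingWindow {
  constructor(maxRequests, windowMs, now = Date.now) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now;
    this.timestamps = [];
  }

  // Returns 0 if a slot was taken, otherwise the milliseconds until one frees up.
  tryAcquire() {
    const current = this.now();
    // Drop requests that have left the window
    this.timestamps = this.timestamps.filter(t => current - t < this.windowMs);
    if (this.timestamps.length < this.maxRequests) {
      this.timestamps.push(current);
      return 0;
    }
    return this.windowMs - (current - this.timestamps[0]);
  }
}

let clock = 0;
const limiter = new SlidingWindow(2, 1000, () => clock);

console.log(limiter.tryAcquire()); // 0   (first slot)
clock = 400;
console.log(limiter.tryAcquire()); // 0   (second slot)
clock = 700;
console.log(limiter.tryAcquire()); // 300 (the request from t=0 leaves the window at t=1000)
clock = 1000;
console.log(limiter.tryAcquire()); // 0
```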
2. Request Deduplication
class RequestDeduplicator {
  constructor() {
    this.pendingRequests = new Map();
  }

  async dedupe(key, requestFn) {
    if (this.pendingRequests.has(key)) {
      return this.pendingRequests.get(key);
    }
    const promise = requestFn().finally(() => {
      this.pendingRequests.delete(key);
    });
    this.pendingRequests.set(key, promise);
    return promise;
  }
}
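A quick way to convince yourself that deduplication works is to count how often the underlying request really runs. The sketch below uses the same pending-map idea as the class above in function form; `createDeduper` and `loadUser` are hypothetical names, and the lookup is simulated with a timer rather than a real API call.

```javascript
// In-flight deduplication: callers with the same key share one promise.
function createDeduper() {
  const pending = new Map();
  return function dedupe(key, requestFn) {
    if (pending.has(key)) {
      return pending.get(key); // join the request already in flight
    }
    const promise = requestFn().finally(() => pending.delete(key));
    pending.set(key, promise);
    return promise;
  };
}

async function demo() {
  const dedupe = createDeduper();
  let calls = 0;

  // Simulated slow lookup that counts how often it really runs
  const loadUser = async (id) => {
    calls++;
    await new Promise(resolve => setTimeout(resolve, 20));
    return { id };
  };

  const [a, b, c] = await Promise.all([
    dedupe('user:1', () => loadUser(1)),
    dedupe('user:1', () => loadUser(1)), // shares the first call
    dedupe('user:2', () => loadUser(2))
  ]);

  return { calls, shared: a === b, otherId: c.id };
}

demo().then(result => console.log(result));
// { calls: 2, shared: true, otherId: 2 }
```

Note that this only merges requests that overlap in time. Once a request settles, its key is removed, so the next call goes out again; it is not a cache.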
3. Health Checking
class HealthChecker {
  constructor(apiClient) {
    this.apiClient = apiClient;
    this.isHealthy = true;
    this.lastCheck = null;
  }

  async checkHealth() {
    try {
      await this.apiClient.get('/health');
      this.isHealthy = true;
    } catch (error) {
      this.isHealthy = false;
    }
    this.lastCheck = Date.now();
  }

  shouldAttemptRequest() {
    const fiveMinutesAgo = Date.now() - 5 * 60 * 1000;
    return this.isHealthy || !this.lastCheck || this.lastCheck < fiveMinutesAgo;
  }
}
Monitoring and Observability
Don't forget to track metrics alongside your logs:
class APIMetrics {
  constructor() {
    this.metrics = {
      requests: 0,
      failures: 0,
      retries: 0,
      circuitBreakerTrips: 0
    };
  }

  recordRequest() { this.metrics.requests++; }
  recordFailure() { this.metrics.failures++; }
  recordRetry() { this.metrics.retries++; }
  recordCircuitBreakerTrip() { this.metrics.circuitBreakerTrips++; }

  getSuccessRate() {
    // No traffic yet: report null rather than dividing by zero
    if (this.metrics.requests === 0) return null;
    return (this.metrics.requests - this.metrics.failures) / this.metrics.requests;
  }
}
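Counters are only useful if something increments them. One way to wire that up, sketched here under the assumption that every call goes through a single wrapper, is a `track` function around the operation; `createMetrics` and `track` are hypothetical names, not part of the class above.

```javascript
// Count every call and every failure of an async operation.
function createMetrics() {
  const counts = { requests: 0, failures: 0 };
  return {
    counts,
    async track(operation) {
      counts.requests++;
      try {
        return await operation();
      } catch (error) {
        counts.failures++;
        throw error; // the caller still sees the failure
      }
    },
    successRate() {
      if (counts.requests === 0) return null; // nothing measured yet
      return (counts.requests - counts.failures) / counts.requests;
    }
  };
}

async function demo() {
  const metrics = createMetrics();
  const ok = async () => 'ok';
  const boom = async () => { throw new Error('boom'); };

  await metrics.track(ok);
  await metrics.track(ok);
  await metrics.track(ok);
  await metrics.track(boom).catch(() => {});

  return { ...metrics.counts, successRate: metrics.successRate() };
}

demo().then(result => console.log(result));
// { requests: 4, failures: 1, successRate: 0.75 }
```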
Key Takeaways
- Always handle network failures - They're inevitable, not exceptional
- Implement smart retries - Not all errors should trigger retries
- Use exponential backoff with jitter - Prevents thundering herd problems
- Implement circuit breakers - Fail fast when services are down
- Add comprehensive monitoring - You can't fix what you can't see
- Test failure scenarios - Use tools like Chaos Monkey in staging
Testing Your Resilient APIs
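// Mock a flaky API for testing: a stand-in for fetch that fails at random
class FlakyAPI {
  constructor(failureRate = 0.3) {
    this.failureRate = failureRate;
  }

  async fetch(url) {
    if (Math.random() < this.failureRate) {
      throw new TypeError('Network error');
    }
    return new Response(JSON.stringify({ data: 'success' }), {
      status: 200,
      headers: { 'Content-Type': 'application/json' }
    });
  }
}

// Test your resilient client
async function testResilience() {
  const flakyAPI = new FlakyAPI(0.7); // 70% failure rate
  const originalFetch = globalThis.fetch;
  // The client calls the global fetch, so swap it for the flaky one
  globalThis.fetch = (url, options) => flakyAPI.fetch(url, options);
  try {
    const client = new ResilientAPIClient('https://api.example.com', {
      baseDelay: 10 // shorten the backoff for tests
    });
    const results = await Promise.allSettled(
      Array(10).fill().map(() => client.get('/test'))
    );
    console.log('Success rate:', results.filter(r => r.status === 'fulfilled').length / 10);
  } finally {
    globalThis.fetch = originalFetch;
  }
}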
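Random failures are good for smoke tests, but they make assertions flaky. For checks you want to run in CI, a scripted stub that fails a fixed number of times is easier to reason about. The sketch below assumes a bare-bones `retry` loop standing in for the client under test (no backoff, to keep it fast); `failNTimes` and `retry` are hypothetical helpers.

```javascript
// Stub that fails a fixed number of times, then succeeds.
function failNTimes(n) {
  let calls = 0;
  const operation = async () => {
    calls++;
    if (calls <= n) throw new Error(`scripted failure ${calls}`);
    return { data: 'success', calls };
  };
  operation.callCount = () => calls;
  return operation;
}

// Minimal retry loop: one initial attempt plus up to maxRetries retries.
async function retry(operation, maxRetries) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

async function runChecks() {
  const recovers = failNTimes(2);
  const result = await retry(recovers, 3); // succeeds on the third call

  const givesUp = failNTimes(5);
  const error = await retry(givesUp, 3).catch(e => e); // four calls, all fail

  return {
    callsUntilSuccess: result.calls,
    callsBeforeGivingUp: givesUp.callCount(),
    lastMessage: error.message
  };
}

runChecks().then(summary => console.log(summary));
// { callsUntilSuccess: 3, callsBeforeGivingUp: 4, lastMessage: 'scripted failure 4' }
```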
Building resilient APIs isn't just about preventing errors—it's about creating systems that gracefully degrade and recover automatically. Your users will thank you, your on-call rotations will be quieter, and you'll sleep better knowing your applications can handle whatever the internet throws at them.
Have you implemented similar patterns in your applications? Share your experiences and additional strategies in the comments below!
Found this helpful? Follow me for more deep dives into building production-ready applications that actually work in the real world.