Designing for Failure: Circuit Breakers and Bulkheads in Node.js
Distributed systems fail. The question isn't if — it's how gracefully.
Circuit Breaker Pattern
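A circuit breaker wraps calls to an external dependency. Once enough of those calls fail, it opens and short-circuits new calls (failing fast or serving a fallback) instead of piling more load onto a struggling service, which prevents cascading failures. After a cool-down it goes half-open and lets a trial request through to test whether the dependency has recovered: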
import CircuitBreaker from 'opossum';
// Assumes callExternalAPI is your own async function that calls the remote service
const options = {
  timeout: 3000,             // If fn takes longer than 3s, count the call as a failure
  errorThresholdPercent: 50, // Open the circuit when 50% of recent requests fail
  resetTimeout: 30000,       // After 30s open, go half-open and let a trial request through
};
const breaker = new CircuitBreaker(callExternalAPI, options);
// Returned instead of an error when a call fails, times out, or the circuit is open
breaker.fallback(() => ({ data: [], fromCache: true }));
breaker.on('open', () => console.warn('Circuit opened — external API failing'));
breaker.on('halfOpen', () => console.info('Circuit half-open — testing recovery'));
breaker.on('close', () => console.info('Circuit closed — external API recovered'));
// Usage: fire() passes its arguments through to callExternalAPI
const result = await breaker.fire(requestData);
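To make the closed, open, and half-open cycle concrete, here is a minimal hand-rolled sketch. It is not how opossum works internally: as a simplifying assumption it opens after N consecutive failures rather than a failure percentage over a rolling window, and SimpleCircuitBreaker is a hypothetical name. The clock is injectable so the cool-down can be tested without waiting.

```typescript
type BreakerState = 'closed' | 'open' | 'halfOpen';

// Hypothetical minimal circuit breaker: counts consecutive failures (not a rolling error rate).
class SimpleCircuitBreaker<A extends unknown[], R> {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly fn: (...args: A) => Promise<R>,
    private readonly failureThreshold = 5,          // consecutive failures before opening
    private readonly resetTimeoutMs = 30_000,       // how long to stay open before a trial call
    private readonly now: () => number = Date.now,  // injectable clock
  ) {}

  get currentState(): BreakerState {
    return this.state;
  }

  async fire(...args: A): Promise<R> {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        // Fail fast without touching the struggling dependency
        throw new Error('Breaker is open');
      }
      // Cool-down elapsed: let calls through to test recovery
      // (this sketch does not limit how many calls run while half-open)
      this.state = 'halfOpen';
    }
    try {
      const result = await this.fn(...args);
      // Any success, including a half-open trial, closes the circuit and resets the count
      this.state = 'closed';
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures++;
      // A failed trial reopens immediately; otherwise open once the threshold is reached
      if (this.state === 'halfOpen' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

Replacing the consecutive-failure counter with an error rate over a rolling window is the idea behind opossum's errorThresholdPercent; the state transitions stay the same.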
Bulkhead Pattern
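Bulkheads isolate failures by giving each dependency its own bounded share of capacity. Like a ship's watertight compartments, one flooded compartment doesn't sink the ship. Here each external service gets its own Bottleneck limiter, and maxConcurrent caps how many calls to that service can be in flight at once: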
import Bottleneck from 'bottleneck';
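// One limiter per external service: maxConcurrent caps in-flight calls and minTime spaces
// them out, so a spike in Stripe calls can't consume the capacity reserved for email
// Assumes stripe, resend, and anthropic are already-initialized SDK clients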
const stripeLimiter = new Bottleneck({ maxConcurrent: 10, minTime: 100 });
const emailLimiter = new Bottleneck({ maxConcurrent: 5, minTime: 200 });
const aiLimiter = new Bottleneck({ maxConcurrent: 3, minTime: 500 });
// Wrap calls with their respective limiters
const chargeCard = stripeLimiter.wrap(stripe.charges.create.bind(stripe.charges));
const sendEmail = emailLimiter.wrap(resend.emails.send.bind(resend.emails));
const callClaude = aiLimiter.wrap(anthropic.messages.create.bind(anthropic.messages));
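The Bottleneck limiters above get their isolation from maxConcurrent. The underlying mechanism can be sketched without a library: a counter of in-flight calls plus a bounded wait queue, rejecting immediately once both are full so that a flooded dependency sheds load instead of accumulating waiting requests. Bulkhead, maxConcurrent, and maxQueue here are hypothetical names for this sketch.

```typescript
// Hypothetical bulkhead: at most maxConcurrent calls run at once, at most maxQueue wait,
// and anything beyond that is rejected immediately.
class Bulkhead {
  private active = 0;
  private readonly queue: Array<() => void> = [];

  constructor(private readonly maxConcurrent: number, private readonly maxQueue: number) {}

  async run<T>(fn: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      if (this.queue.length >= this.maxQueue) {
        // Compartment is full: shed load instead of queueing without bound
        throw new Error('Bulkhead full');
      }
      // Wait until a running call hands its slot over to us
      await new Promise<void>(resolve => this.queue.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await fn();
    } finally {
      const next = this.queue.shift();
      // Transfer the slot directly to the next waiter (active count unchanged),
      // or free it if nobody is waiting
      if (next) next();
      else this.active--;
    }
  }
}
```

Handing the slot straight to the next waiter keeps the in-flight count exact: a new caller cannot slip in between one call finishing and a queued call resuming. Bottleneck has its own queue-limiting option (highWater), so treat this as an illustration of the mechanism rather than a replacement.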
Timeout + Retry Pattern
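Give every attempt its own deadline, and retry transient failures with exponential backoff (the wait doubles after each failed attempt):
interface RetryOptions {
  attempts?: number; // Total tries, including the first
  delay?: number;    // Base backoff in ms
  timeout?: number;  // Per-attempt timeout in ms
}
async function withRetry<T>(
  fn: () => Promise<T>,
  { attempts = 3, delay = 1000, timeout = 5000 }: RetryOptions = {}
): Promise<T> {
  for (let i = 0; i < attempts; i++) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    try {
      // The race stops waiting on timeout, but it does not cancel fn itself
      return await Promise.race([
        fn(),
        new Promise<never>((_, reject) => {
          timer = setTimeout(() => reject(new Error('Timeout')), timeout);
        }),
      ]);
    } catch (err) {
      if (i === attempts - 1) throw err; // Out of attempts: surface the last error
    } finally {
      clearTimeout(timer); // Don't leave a pending timer behind once the attempt settles
    }
    // Exponential backoff: delay, 2 * delay, 4 * delay, ...
    await new Promise(r => setTimeout(r, delay * Math.pow(2, i)));
  }
  throw new Error('unreachable');
}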
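Promise.race stops waiting for a slow attempt but leaves the underlying work running, so a retry can overlap with the attempt it replaced. A sketch of a cancellable per-attempt timeout follows, assuming the wrapped function accepts an AbortSignal (as Node's built-in fetch does); withAbortableTimeout is a hypothetical helper name.

```typescript
// Hypothetical helper: rejects after timeoutMs AND signals fn to abort its work.
function withAbortableTimeout<T>(
  fn: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      controller.abort();                                  // tell fn to stop its work
      reject(new Error(`Timed out after ${timeoutMs}ms`)); // and stop waiting for it
    }, timeoutMs);
    fn(controller.signal).then(
      value => { clearTimeout(timer); resolve(value); },
      err => { clearTimeout(timer); reject(err); },       // no-op if the timeout already rejected
    );
  });
}
```

Passing (signal) => fetch(url, { signal }) as fn aborts the HTTP request itself when the deadline passes. The signal only helps if fn actually listens to it; otherwise this behaves like the Promise.race version. Node also provides AbortSignal.timeout(ms) for the simpler case where you only need a signal that fires after a delay.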
Health Checks
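Expose an endpoint that load balancers and orchestrators can poll. It returns 200 only when every dependency responds and 503 otherwise, so traffic is routed away from an instance that can't reach its database or Redis:
// Assumes an Express app, a Prisma client (db), and a Redis client (redis)
app.get('/health', async (req, res) => {
  // allSettled never rejects, so one failing probe can't hide the other's result
  const checks = await Promise.allSettled([
    db.$queryRaw`SELECT 1`,
    redis.ping(),
  ]);
  const status = checks.every(c => c.status === 'fulfilled') ? 200 : 503;
  res.status(status).json({
    db: checks[0].status, // 'fulfilled' or 'rejected'
    redis: checks[1].status,
  });
});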
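One gap in the handler above: if the database or Redis hangs instead of erroring, the health check hangs with it. A sketch that gives each named check its own timeout and maps the results to a status code; runHealthChecks and the 2-second default are assumptions, and the Express wiring in the trailing comment assumes the same app, db, and redis objects as the handler above.

```typescript
type CheckStatus = 'fulfilled' | 'rejected';

// Hypothetical helper: runs named checks in parallel, each bounded by timeoutMs.
async function runHealthChecks(
  checks: Record<string, () => Promise<unknown>>,
  timeoutMs = 2000,
): Promise<{ httpStatus: number; body: Record<string, CheckStatus> }> {
  const names = Object.keys(checks);
  const results = await Promise.allSettled(
    names.map(name => {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`${name} check timed out`)), timeoutMs);
      });
      // Promise.resolve().then(...) turns a synchronous throw into a rejection;
      // a hung dependency now counts as a failure instead of hanging the endpoint
      return Promise.race([Promise.resolve().then(() => checks[name]()), timeout])
        .finally(() => clearTimeout(timer));
    }),
  );
  const body: Record<string, CheckStatus> = {};
  results.forEach((r, i) => { body[names[i]] = r.status; });
  const healthy = results.every(r => r.status === 'fulfilled');
  return { httpStatus: healthy ? 200 : 503, body };
}

// Express wiring (assumes the app, db, and redis objects from the handler above):
// app.get('/health', async (req, res) => {
//   const { httpStatus, body } = await runHealthChecks({
//     db: () => db.$queryRaw`SELECT 1`,
//     redis: () => redis.ping(),
//   });
//   res.status(httpStatus).json(body);
// });
```

Keeping the per-check timeout below the prober's own timeout means the endpoint reports 503 before the load balancer or orchestrator gives up on the request.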
Resilience patterns — circuit breakers, bulkheads, health checks — are production-ready in the AI SaaS Starter Kit.