When you call another service over HTTP, you are inheriting its worst day. If that dependency slows to a crawl, every request you make to it piles up, holds a connection, and eventually drags your service down with it. The fix is two old, unglamorous patterns: aggressive timeouts and a circuit breaker. Here is how to implement both in Node.js with no framework.
Step 1: Never make a request without a timeout
A missing timeout is one of the most common causes of production incidents. Many HTTP clients default to no timeout at all, or to one measured in minutes, so a single stalled dependency can hold connections open long enough to exhaust your connection pool.
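async function fetchWithTimeout(url, { timeoutMs = 2000, ...options } = {}) {
  // Abort the request if no response has arrived after timeoutMs.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // Note: this replaces any signal passed in via options.
    return await fetch(url, { ...options, signal: controller.signal });
  } finally {
    // Always clear the timer so it cannot fire after the request settles.
    clearTimeout(timer);
  }
}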
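AbortController gives you a hard ceiling on the time to first response. If the dependency hasn't sent response headers within 2 seconds, fetch rejects with an AbortError, so you fail fast and free the connection instead of letting it hang. Because the timer is cleared as soon as fetch resolves, the time spent reading the body afterwards is not covered by this ceiling.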
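Since fetch resolves as soon as the response headers arrive, one way to extend the ceiling to the body read is to keep the timer running until the whole operation settles. The following is a minimal sketch, assuming a hypothetical helper named withDeadline (not part of any library) that hands an abort signal to a callback; getJson is likewise only an illustration.

```javascript
// Hypothetical helper: runs task(signal) and aborts the signal if the whole
// task (request plus body read) takes longer than timeoutMs.
async function withDeadline(timeoutMs, task) {
  const controller = new AbortController();
  const timer = setTimeout(() => {
    // Per the Fetch spec, an aborted fetch() rejects with this reason.
    controller.abort(new Error(`Timed out after ${timeoutMs} ms`));
  }, timeoutMs);
  try {
    return await task(controller.signal);
  } finally {
    clearTimeout(timer); // runs on success, failure, and timeout
  }
}

// Usage sketch: the deadline covers both the request and res.json().
async function getJson(url, timeoutMs = 2000) {
  return withDeadline(timeoutMs, async (signal) => {
    const res = await fetch(url, { signal });
    return res.json();
  });
}
```

The trade-off is that a large but healthy response can now hit the deadline too, so the budget has to account for download time as well as latency.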
Step 2: Stop hammering a service that's already down
Timeouts protect a single request. But if a dependency is fully down, every call still burns the full 2 seconds before failing and keeps the pressure on. A circuit breaker counts consecutive failures and, once they cross a threshold, "opens": it rejects calls instantly without touching the network. After a cooldown it lets one probe request through to see whether the service has recovered.
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 10000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0; // consecutive failures since the last success
    this.state = 'CLOSED'; // CLOSED | OPEN | HALF_OPEN
    this.nextAttempt = 0; // earliest time (ms since epoch) a probe may run
  }
  async call(fn) {
    if (this.state === 'HALF_OPEN') {
      // A probe is already in flight; only that one request gets through.
      throw new Error('Circuit is HALF_OPEN: probe in flight');
    }
    if (this.state === 'OPEN') {
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit is OPEN: failing fast');
      }
      this.state = 'HALF_OPEN'; // cooldown elapsed: this call is the probe
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }
  onSuccess() {
    this.failures = 0;
    this.state = 'CLOSED';
  }
  onFailure() {
    this.failures += 1;
    // A failed probe re-opens at once; otherwise open at the threshold.
    if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.nextAttempt = Date.now() + this.cooldownMs;
    }
  }
}
Step 3: Wire them together
Now combine the timeout and the breaker. The breaker wraps the timed request, so a slow dependency counts as a failure and eventually trips the circuit.
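const breaker = new CircuitBreaker({ failureThreshold: 5, cooldownMs: 10000 });
async function getUser(id) {
  return breaker.call(async () => {
    // Encode the id so unexpected characters cannot alter the URL path.
    const url = `https://api.example.com/users/${encodeURIComponent(id)}`;
    const res = await fetchWithTimeout(url, { timeoutMs: 2000 });
    // Any non-2xx status is thrown, so it counts toward the breaker threshold.
    if (!res.ok) throw new Error(`Upstream returned ${res.status}`);
    return res.json();
  });
}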
The three states tell the whole story:
- CLOSED — normal operation, requests flow through and failures are counted.
- OPEN — too many failures; calls are rejected instantly for the cooldown window.
- HALF_OPEN — one probe request is allowed; success closes the circuit, failure re-opens it.
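These transitions can also be traced as a pure function, which makes them easy to check without waiting out a real cooldown. This is a sketch rather than the class from Step 2: nextBreakerState, its event names ('attempt', 'success', 'failure'), and the explicit now argument are assumptions introduced for illustration.

```javascript
// Hypothetical pure model of the breaker: given the current snapshot, an
// event, and the current time, return the next snapshot.
function nextBreakerState(s, event, now, { failureThreshold = 5, cooldownMs = 10000 } = {}) {
  switch (event) {
    case 'attempt':
      // An OPEN circuit moves to HALF_OPEN once the cooldown has elapsed.
      if (s.state === 'OPEN' && now >= s.nextAttempt) {
        return { ...s, state: 'HALF_OPEN' };
      }
      return s;
    case 'success':
      // Any success closes the circuit and clears the failure count.
      return { state: 'CLOSED', failures: 0, nextAttempt: 0 };
    case 'failure': {
      const failures = s.failures + 1;
      // A failed probe re-opens at once; otherwise wait for the threshold.
      if (s.state === 'HALF_OPEN' || failures >= failureThreshold) {
        return { state: 'OPEN', failures, nextAttempt: now + cooldownMs };
      }
      return { ...s, failures };
    }
    default:
      throw new Error(`Unknown event: ${event}`);
  }
}

// Trace: five failures at t=0 open the circuit, then a probe at t=10000
// succeeds and closes it again.
let snapshot = { state: 'CLOSED', failures: 0, nextAttempt: 0 };
for (let i = 0; i < 5; i++) snapshot = nextBreakerState(snapshot, 'failure', 0);
const afterFailures = snapshot.state; // 'OPEN'
snapshot = nextBreakerState(snapshot, 'attempt', 10000);
const duringProbe = snapshot.state; // 'HALF_OPEN'
snapshot = nextBreakerState(snapshot, 'success', 10000);
const afterProbe = snapshot.state; // 'CLOSED'
```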
Step 4: Always have a fallback
Failing fast is only useful if you have something to do with the failure. Degrade gracefully instead of returning a 500.
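async function getUserSafe(id) {
  try {
    return await getUser(id);
  } catch (err) {
    // Record why we degraded so the failure is not silently swallowed.
    console.warn(`getUser(${id}) failed, serving fallback: ${err.message}`);
    // Serve stale cache, a default, or a partial response
    return { id, name: 'Unknown', degraded: true };
  }
}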
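The "serve stale cache" option could look roughly like the sketch below. It rests on assumptions: withStaleFallback, the in-memory Map, and the degraded flag are illustrative choices, cached values are assumed to be plain objects, and the loader stands in for whatever breaker-protected call you already have.

```javascript
// Hypothetical wrapper: remembers the last good value per key and serves it
// (marked as degraded) when the underlying loader fails.
function withStaleFallback(loader, { cache = new Map(), fallback } = {}) {
  return async function load(key) {
    try {
      const fresh = await loader(key);
      cache.set(key, fresh); // remember the last known-good value
      return fresh;
    } catch (err) {
      if (cache.has(key)) {
        // Stale but real data; assumes cached values are plain objects.
        return { ...cache.get(key), degraded: true };
      }
      if (fallback) return fallback(key, err); // e.g. a default object
      throw err; // nothing to degrade to: surface the failure
    }
  };
}

// Usage sketch; loadUser is a placeholder for a call such as getUser(id).
// const getUserCached = withStaleFallback(loadUser, {
//   fallback: (id) => ({ id, name: 'Unknown', degraded: true }),
// });
```

An unbounded Map is fine for a sketch; a real cache would need a size limit and a maximum age for stale entries.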
A few rules that keep this honest
Set timeouts based on real latency percentiles, not a guess — start near the p99 of the dependency and tune. Keep one breaker per dependency, never a global one, or a single sick service will trip calls to healthy ones. And always log state transitions: an OPEN circuit is one of the highest-signal alerts you can have, because it tells you a dependency is failing before your users start complaining.
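Two of these rules are easy to sketch. The helpers below are illustrative only: percentile uses the nearest-rank method on latency samples you would collect yourself, and createBreakerRegistry keys breakers by URL host using a caller-supplied factory. Both names are hypothetical.

```javascript
// Nearest-rank percentile: the smallest sample with at least a fraction p of
// the data at or below it. p is a fraction, e.g. 0.99 for p99.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('No samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil(p * sorted.length));
  return sorted[rank - 1];
}

// One breaker per dependency, keyed by host and created lazily, so a sick
// service cannot trip calls to a healthy one.
function createBreakerRegistry(createBreaker) {
  const breakers = new Map();
  return function breakerFor(url) {
    const host = new URL(url).host;
    if (!breakers.has(host)) breakers.set(host, createBreaker(host));
    return breakers.get(host);
  };
}

// Usage sketch (recentLatenciesMs and CircuitBreaker are assumed to exist):
// const timeoutMs = percentile(recentLatenciesMs, 0.99);
// const breakerFor = createBreakerRegistry(() => new CircuitBreaker());
// breakerFor('https://api.example.com/users/1').call(/* ... */);
```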
The timeout helper is about ten lines of code and the breaker a few dozen, and together they turn a cascading outage into a localized, recoverable blip.
Testing resilience behavior by hand is tedious — you need to simulate slow responses, forced errors, and recovery. APIKumo makes it easy to mock failing and slow endpoints, script multi-step request flows, and replay them so you can watch your timeouts and breaker actually trip before you ship them to production.