Timeouts and Circuit Breakers: Stop One Slow API From Taking Down Your Whole App

#api #webdev #tutorial #node

When you call another service over HTTP, you are inheriting its worst day. If that dependency slows to a crawl, every request you make to it piles up, holds a connection, and eventually drags your service down with it. The fix is two old, unglamorous patterns: aggressive timeouts and a circuit breaker. Here is how to implement both in Node.js with no framework.

Step 1: Never make a request without a timeout

A missing timeout is one of the most common causes of production incidents. Many HTTP clients ship with no timeout at all, or with a default measured in minutes. One stalled dependency is then enough to tie up every connection in your pool.

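async function fetchWithTimeout(url, { timeoutMs = 2000, ...options } = {}) {
  const controller = new AbortController();
  // Abort the request if no response has arrived within timeoutMs
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // Note: this signal replaces any `signal` passed in through options
    return await fetch(url, { ...options, signal: controller.signal });
  } finally {
    // Runs on success and on failure, so the timer never outlives the call
    clearTimeout(timer);
  }
}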

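AbortController gives you a hard ceiling on how long you wait for a response. If the dependency hasn't answered in 2 seconds, fetch rejects with an AbortError, so you fail fast and free the connection instead of letting it hang. One caveat: the timer is cleared as soon as fetch resolves, which happens when the response headers arrive, so a body that stalls mid-stream is not covered by this helper.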
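If you want one deadline that covers both the headers and the body, a possible variant is to lean on AbortSignal.timeout, which is available in Node 17.3 and later. This is a minimal sketch, not a drop-in replacement: fetchJsonWithDeadline is a hypothetical name, and it assumes the response is JSON and that you want non-2xx statuses to throw.

```javascript
// Sketch: a single deadline for the whole exchange, headers and body.
// `fetchJsonWithDeadline` is a hypothetical helper, not part of any library.
async function fetchJsonWithDeadline(url, { timeoutMs = 2000, ...options } = {}) {
  // The signal fires timeoutMs after creation, whichever phase is in progress
  const signal = AbortSignal.timeout(timeoutMs);
  const res = await fetch(url, { ...options, signal });
  if (!res.ok) throw new Error(`Upstream returned ${res.status}`);
  // Reading the body is still covered, because the signal is never cancelled
  return await res.json();
}
```

Because nothing clears the signal after the headers arrive, a body that stalls halfway is aborted at the same deadline. The trade-off is that the helper now owns the body parsing, so it is less general than fetchWithTimeout.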

Step 2: Stop hammering a service that's already down

Timeouts protect a single request. But if a dependency is fully down, every call still burns the full 2 seconds before failing, and the steady stream of doomed requests keeps pressure on a service that is trying to recover. A circuit breaker tracks failures and, once they cross a threshold, "opens": it rejects calls instantly without touching the network. After a cooldown it lets one probe request through to see if the service has recovered. The version below counts consecutive failures, so any success resets the count.

class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 10000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.state = 'CLOSED';      // CLOSED | OPEN | HALF_OPEN
    this.nextAttempt = 0;
  }

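  async call(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit is OPEN: failing fast');
      }
      this.state = 'HALF_OPEN'; // cooldown elapsed: this call becomes the probe
    } else if (this.state === 'HALF_OPEN') {
      // A probe is already in flight; reject everything else until it settles
      throw new Error('Circuit is HALF_OPEN: probe in flight');
    }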

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }

  onSuccess() {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  onFailure() {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.nextAttempt = Date.now() + this.cooldownMs;
    }
  }
}

Step 3: Wire them together

Now combine the timeout and the breaker. The breaker wraps the timed request, so a slow dependency counts as a failure and eventually trips the circuit.

const breaker = new CircuitBreaker({ failureThreshold: 5, cooldownMs: 10000 });

async function getUser(id) {
  return breaker.call(async () => {
    const res = await fetchWithTimeout(`https://api.example.com/users/${id}`, {
      timeoutMs: 2000,
    });
    if (!res.ok) throw new Error(`Upstream returned ${res.status}`);
    return res.json();
  });
}
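One thing to watch in this wiring: every non-2xx response throws inside the breaker, so a burst of 404s for missing users would count as failures even though the upstream is healthy. A possible refinement is to classify outcomes before counting them. The helper below is a sketch under assumptions: isUpstreamFailure is a hypothetical name, and treating 429 as a failure is a judgment call you may not share.

```javascript
// Hypothetical helper: should this outcome count against the breaker?
function isUpstreamFailure({ status, error } = {}) {
  // No response at all (network error, timeout): the dependency is unreachable
  if (error) return true;
  // 5xx means the dependency itself is failing; 429 means it is overloaded
  if (status >= 500 || status === 429) return true;
  // 2xx, 3xx and other 4xx responses show the service is up and answering
  return false;
}

console.log(isUpstreamFailure({ status: 503 })); // true
console.log(isUpstreamFailure({ status: 404 })); // false
```

To use it, throw inside breaker.call only when isUpstreamFailure returns true, return the response otherwise, and turn the remaining 4xx statuses into errors outside the breaker.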

The three states tell the whole story:

CLOSED — normal operation, requests flow through and failures are counted.
OPEN — too many failures; calls are rejected instantly for the cooldown window.
HALF_OPEN — one probe request is allowed; success closes the circuit, failure re-opens it.
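If the transitions are easier to follow as data than as prose, the three states can be sketched as a pure function that takes the current breaker state, an event, and the current time. The transition function and the event names ('request', 'success', 'failure') are illustrative assumptions and are not part of the CircuitBreaker class. It also re-opens on a failed probe explicitly instead of relying on the failure count.

```javascript
// Sketch of the breaker's state machine as a pure function.
// Returns the next state object instead of mutating the current one.
function transition(breaker, event, now, { failureThreshold = 5, cooldownMs = 10000 } = {}) {
  const { state, failures, nextAttempt } = breaker;

  switch (event) {
    case 'request':
      // An OPEN circuit becomes HALF_OPEN once the cooldown has elapsed
      if (state === 'OPEN' && now >= nextAttempt) {
        return { ...breaker, state: 'HALF_OPEN' };
      }
      return breaker;

    case 'success':
      // Any success closes the circuit and clears the failure count
      return { state: 'CLOSED', failures: 0, nextAttempt: 0 };

    case 'failure': {
      const count = failures + 1;
      // A failed probe re-opens immediately; otherwise wait for the threshold
      if (state === 'HALF_OPEN' || count >= failureThreshold) {
        return { state: 'OPEN', failures: count, nextAttempt: now + cooldownMs };
      }
      return { ...breaker, failures: count };
    }

    default:
      throw new Error(`Unknown event: ${event}`);
  }
}

// Walk one full cycle with a threshold of 2 and a 1000 ms cooldown
const opts = { failureThreshold: 2, cooldownMs: 1000 };
let b = { state: 'CLOSED', failures: 0, nextAttempt: 0 };
b = transition(b, 'failure', 0, opts);    // CLOSED, 1 failure
b = transition(b, 'failure', 10, opts);   // OPEN until t = 1010
b = transition(b, 'request', 500, opts);  // still OPEN, cooldown not over
b = transition(b, 'request', 1010, opts); // HALF_OPEN
b = transition(b, 'success', 1020, opts); // CLOSED again
console.log(b.state); // CLOSED
```

Passing the time in as an argument keeps the function deterministic, which makes the cooldown logic easy to unit test without waiting on a real clock.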

Step 4: Always have a fallback

Failing fast is only useful if you have something to do with the failure. Degrade gracefully instead of returning a 500.

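async function getUserSafe(id) {
  try {
    return await getUser(id);
  } catch (err) {
    // Record why we degraded so the failure is not silently swallowed
    console.warn(`getUser(${id}) failed, serving fallback: ${err.message}`);
    // Serve stale cache, a default, or a partial response
    return { id, name: 'Unknown', degraded: true };
  }
}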
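The "serve stale cache" option can be sketched as a small wrapper that remembers the last good answer per key and falls back to it when the call fails. This is an illustration under assumptions: withStaleFallback is a hypothetical name, the cache is an in-memory Map with no expiry or size limit, and the flaky function stands in for a real dependency.

```javascript
// Sketch of a stale-cache fallback around any async lookup function.
function withStaleFallback(fetchFn, { cache = new Map() } = {}) {
  return async function cached(key) {
    try {
      const value = await fetchFn(key);
      cache.set(key, value); // remember the last good answer
      return { value, degraded: false };
    } catch (err) {
      if (cache.has(key)) {
        // Serve the previous answer and flag it as stale
        return { value: cache.get(key), degraded: true };
      }
      throw err; // nothing cached: let the caller choose a default
    }
  };
}

// Demo with a fake dependency that fails on its second call
let calls = 0;
const flaky = async (id) => {
  calls += 1;
  if (calls === 2) throw new Error('upstream down');
  return { id, name: 'Ada' };
};

const getCached = withStaleFallback(flaky);
getCached(1)
  .then(() => getCached(1))
  .then((result) => console.log(result));
// { value: { id: 1, name: 'Ada' }, degraded: true }
```

In the article's setup you would wrap getUser the same way. A production version would likely want a TTL and a bounded cache so stale data does not live forever.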

A few rules that keep this honest

Set timeouts based on real latency percentiles, not a guess — start near the p99 of the dependency and tune. Keep one breaker per dependency, never a global one, or a single sick service will trip calls to healthy ones. And always log state transitions: an OPEN circuit is one of the highest-signal alerts you can have, because it tells you a dependency is failing before your users start complaining.
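Two of these rules are easy to sketch in code. The first helper derives a starting timeout from recorded latencies using a nearest-rank p99; the second hands out one breaker per upstream host. Everything here is an assumption for illustration: the function names, the 25% headroom, the clamping bounds, and the idea of keying breakers by URL host.

```javascript
// Nearest-rank percentile over latency samples in milliseconds.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('No samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(p * sorted.length); // 1-based nearest rank
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical policy: p99 plus some headroom, clamped to sane bounds
function suggestTimeoutMs(samples, { headroom = 1.25, minMs = 100, maxMs = 10000 } = {}) {
  const p99 = percentile(samples, 0.99);
  return Math.min(maxMs, Math.max(minMs, Math.round(p99 * headroom)));
}

// One instance per upstream host, created lazily. `makeBreaker` is any
// factory, for example (host) => new CircuitBreaker() from Step 2.
function createPerHostRegistry(makeBreaker) {
  const byHost = new Map();
  return function forUrl(url) {
    const host = new URL(url).host;
    if (!byHost.has(host)) byHost.set(host, makeBreaker(host));
    return byHost.get(host);
  };
}
```

Treat the suggested timeout as a starting point to tune, not a final value. For the registry, a natural place to add the state-transition logging is inside the factory, since it knows which host each breaker belongs to.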

A timeout helper is about ten lines of code and a basic circuit breaker about forty, and together they turn a cascading outage into a localized, recoverable blip.

Testing resilience behavior by hand is tedious — you need to simulate slow responses, forced errors, and recovery. APIKumo makes it easy to mock failing and slow endpoints, script multi-step request flows, and replay them so you can watch your timeouts and breaker actually trip before you ship them to production.