When your application depends on external APIs that you don't control, failures are not a question of "if" but "when." X's API rate-limits you. Your proxy provider has an outage. The AI model endpoint returns 503s for 20 minutes.
The question is: does one failure cascade into total system failure, or does your system degrade gracefully?
We built a circuit breaker system for HelperX that keeps healthy slots running when unhealthy ones fail. Here's the implementation.
The cascade problem
Without circuit breakers, here's what happens when a proxy goes down:
- Slot A sends a request → proxy timeout (30 seconds)
- Slot A retries → another timeout (30 seconds)
- Slot A retries again → third timeout (30 seconds)
- Meanwhile, the scheduler is blocked for 90 seconds
- Other modules in Slot A queue up behind the blocked request
- If the proxy is truly dead, this loop continues indefinitely
- Node.js event loop gets congested with pending timeouts
- Other slots start experiencing delayed scheduling
One dead proxy degrades the entire system. With 200 slots, one bad proxy shouldn't affect 199 healthy ones.
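The retry loop in the list above can be sketched roughly as follows. `sendWithNaiveRetry` and `worstCaseBlockedMs` are hypothetical names used only for illustration, not HelperX code; the 30-second timeout and three attempts are taken from the example.

```javascript
// Hypothetical sketch of the retry loop described above, with no circuit breaker.
async function sendWithNaiveRetry(send, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await send();
    } catch (err) {
      // Each failed attempt has already waited out the full timeout.
      lastError = err;
    }
  }
  throw lastError;
}

// Worst-case time a single action waits on a dead dependency.
function worstCaseBlockedMs(timeoutMs, maxAttempts) {
  return timeoutMs * maxAttempts;
}

console.log(worstCaseBlockedMs(30_000, 3)); // 90000
```

Nothing in this loop remembers that the previous action hit the same dead proxy, so every new action pays the full cost again.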
The circuit breaker pattern
A circuit breaker sits between your application and an external dependency. It has three states:
┌──────────┐
│  CLOSED  │ ← Normal operation. Requests pass through.
└────┬─────┘
     │ failures >= threshold
     ▼
┌──────────┐
│   OPEN   │ ← Requests fail immediately. No network calls.
└────┬─────┘
     │ after resetTimeout
     ▼
┌───────────┐
│ HALF-OPEN │ ← Allow one test request through.
└─────┬─────┘
      │
┌─────┴──────┐
│  success?  │
├─yes────────┤──► CLOSED (resume normal)
└─no─────────┘──► OPEN (wait another resetTimeout)
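One way to read the diagram is as a small transition table. The sketch below is only an illustration of the three states; the event names (`threshold_reached`, `reset_timeout`, `test_success`, `test_failure`) are made up here and do not appear in the implementation that follows.

```javascript
// Hypothetical pure transition table for the three states in the diagram.
// Event names are illustrative, not part of the real class.
const TRANSITIONS = {
  closed: { threshold_reached: 'open' },
  open: { reset_timeout: 'half-open' },
  'half-open': { test_success: 'closed', test_failure: 'open' },
};

function nextState(state, event) {
  // Events that are not listed for a state leave it unchanged.
  return TRANSITIONS[state]?.[event] ?? state;
}

console.log(nextState('closed', 'threshold_reached')); // open
console.log(nextState('half-open', 'test_success'));   // closed
```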
Implementation
class CircuitBreaker {
  constructor(name, options = {}) {
    this.name = name;
    this.state = 'closed';
    this.failures = 0;
    this.successes = 0;
    this.halfOpenAttempts = 0;
    this.lastFailure = null;
    this.lastAttempt = null;
    this.lastError = null;

    this.threshold = options.threshold || 5;
    this.resetTimeout = options.resetTimeout || 60_000;
    this.halfOpenMax = options.halfOpenMax || 1;
    this.onStateChange = options.onStateChange || (() => {});
  }

  async execute(fn) {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailure >= this.resetTimeout) {
        this.transition('half-open');
      } else {
        throw new CircuitOpenError(
          `Circuit ${this.name} is open. ` +
          `Resets in ${this.timeUntilReset()}ms`
        );
      }
    }

    if (this.state === 'half-open') {
      // Only allow limited requests through
      if (this.halfOpenAttempts >= this.halfOpenMax) {
        throw new CircuitOpenError(
          `Circuit ${this.name} is half-open, max attempts reached`
        );
      }
      this.halfOpenAttempts++;
    }

    this.lastAttempt = Date.now();

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (err) {
      if (err && err.isCircuitOpen) {
        // A nested breaker rejected the call, so this dependency was never
        // exercised: release the half-open slot and do not count a failure.
        if (this.state === 'half-open') {
          this.halfOpenAttempts = Math.max(0, this.halfOpenAttempts - 1);
        }
      } else {
        this.onFailure(err);
      }
      throw err;
    }
  }

  onSuccess() {
    this.failures = 0;
    this.successes++;
    if (this.state === 'half-open') {
      this.transition('closed');
    }
  }

  onFailure(err) {
    this.failures++;
    this.lastFailure = Date.now();
    this.lastError = err;
    // A failed test request reopens the circuit regardless of the counter,
    // which may have been reset by a late success from an earlier request.
    if (this.state === 'half-open' || this.failures >= this.threshold) {
      this.transition('open');
    }
  }

  transition(newState) {
    const oldState = this.state;
    this.state = newState;
    if (newState === 'half-open') {
      this.halfOpenAttempts = 0;
    }
    this.onStateChange({
      name: this.name,
      from: oldState,
      to: newState,
      failures: this.failures,
      lastError: this.lastError
    });
  }

  // Milliseconds until the next test request is allowed (0 unless open).
  timeUntilReset() {
    if (this.state !== 'open') return 0;
    return Math.max(0,
      this.resetTimeout - (Date.now() - this.lastFailure)
    );
  }

  getStatus() {
    return {
      name: this.name,
      state: this.state,
      failures: this.failures,
      successes: this.successes,
      lastFailure: this.lastFailure,
      timeUntilReset: this.timeUntilReset()
    };
  }
}

// Thrown when the breaker rejects a call without running it.
class CircuitOpenError extends Error {
  constructor(message) {
    super(message);
    this.name = 'CircuitOpenError';
    this.isCircuitOpen = true;
  }
}
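To see the lifecycle end to end without waiting for real timeouts, here is a condensed sketch. `MiniBreaker` is a stripped-down stand-in for the class above, not the production code: it omits the half-open attempt limit and the state-change callback, and it adds a `now` option (an assumption introduced here) so the clock can be advanced by hand.

```javascript
// Condensed breaker for illustration only. The `now` option is an assumption
// added here so the lifecycle can be stepped through without real waiting.
class MiniBreaker {
  constructor({ threshold = 2, resetTimeout = 1000, now = Date.now } = {}) {
    Object.assign(this, { threshold, resetTimeout, now });
    this.state = 'closed';
    this.failures = 0;
    this.lastFailure = 0;
  }

  async execute(fn) {
    if (this.state === 'open') {
      if (this.now() - this.lastFailure < this.resetTimeout) {
        throw Object.assign(new Error('circuit open'), { isCircuitOpen: true });
      }
      this.state = 'half-open'; // reset timeout elapsed: allow a test request
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = 'closed';
      return result;
    } catch (err) {
      this.failures++;
      this.lastFailure = this.now();
      if (this.state === 'half-open' || this.failures >= this.threshold) {
        this.state = 'open';
      }
      throw err;
    }
  }
}

async function demoLifecycle() {
  let clock = 0;
  const breaker = new MiniBreaker({ now: () => clock });
  const states = [];
  const fail = () => Promise.reject(new Error('proxy timeout'));
  const ok = () => Promise.resolve('ok');

  await breaker.execute(fail).catch(() => {});
  await breaker.execute(fail).catch(() => {}); // second failure hits the threshold
  states.push(breaker.state);                  // 'open'

  await breaker.execute(ok).catch(() => {});   // rejected without calling ok()
  states.push(breaker.state);                  // still 'open'

  clock += 1000;                               // reset timeout elapses
  await breaker.execute(ok);                   // test request succeeds
  states.push(breaker.state);                  // 'closed'
  return states;
}

demoLifecycle().then((states) => console.log(states.join(' -> ')));
// open -> open -> closed
```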
Per-slot circuit breakers
Each slot gets its own circuit breaker for each external dependency:
// CommonJS shown; in ES modules use: import crypto from 'node:crypto';
const crypto = require('node:crypto');

class SlotDependencies {
  constructor(slotId) {
    this.slotId = slotId;

    this.proxy = new CircuitBreaker(`${slotId}:proxy`, {
      threshold: 3,
      resetTimeout: 120_000, // 2 minutes
      onStateChange: (e) => this.logStateChange(e)
    });

    this.ai = new CircuitBreaker(`${slotId}:ai`, {
      threshold: 5,
      resetTimeout: 60_000, // 1 minute
      onStateChange: (e) => this.logStateChange(e)
    });

    this.api = new CircuitBreaker(`${slotId}:api`, {
      threshold: 3,
      resetTimeout: 300_000, // 5 minutes (rate limits are longer)
      onStateChange: (e) => this.logStateChange(e)
    });
  }

  // Record every state transition in the slot's audit log.
  logStateChange(event) {
    const db = getDb(this.slotId);
    db.prepare(`
      INSERT INTO audit_log (id, module, action, status, detail, timestamp)
      VALUES (?, 'system', 'circuit_breaker', ?, ?, datetime('now'))
    `).run(
      crypto.randomUUID(),
      event.to === 'open' ? 'warning' : 'info',
      `${event.name}: ${event.from} → ${event.to} (${event.failures} failures)`
    );
  }
}
When Slot A's proxy circuit opens, Slot A stops sending requests through that proxy. Slots B through Z continue normally — they have their own circuit breakers with their own state.
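The isolation depends on each slot getting its own long-lived set of breakers. The article does not show how `getSlotDependencies` is implemented, so the following is only a guess at its shape: a hypothetical `createSlotRegistry` that lazily creates one bundle per slot and keeps it in a `Map`. The stub factory stands in for `(slotId) => new SlotDependencies(slotId)`.

```javascript
// Hypothetical registry: one dependency bundle per slot, created on first use.
// `createDeps` stands in for `(slotId) => new SlotDependencies(slotId)`.
function createSlotRegistry(createDeps) {
  const bySlot = new Map();
  return {
    get(slotId) {
      if (!bySlot.has(slotId)) bySlot.set(slotId, createDeps(slotId));
      return bySlot.get(slotId);
    },
    // Drop breaker state when a slot is removed so the Map does not grow forever.
    remove(slotId) {
      return bySlot.delete(slotId);
    },
  };
}

// Stub bundle so the sketch runs on its own.
const registry = createSlotRegistry((slotId) => ({
  slotId,
  proxy: { state: 'closed' },
}));

registry.get('slot-a').proxy.state = 'open';
console.log(registry.get('slot-a').proxy.state); // open
console.log(registry.get('slot-b').proxy.state); // closed
```

The important property is that repeated lookups for the same slot return the same object; creating fresh breakers on every call would silently reset their failure counts.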
Using circuit breakers in the scheduler
async function executeModuleAction(slotId, module) {
  const deps = getSlotDependencies(slotId);

  // Step 1: Find a tweet to reply to (uses proxy)
  let tweet;
  try {
    tweet = await deps.proxy.execute(() =>
      searchTweets(slotId, module.config.query)
    );
  } catch (err) {
    if (err.isCircuitOpen) {
      logAudit(slotId, module.name, 'skipped',
        `Proxy circuit open, resets in ${deps.proxy.timeUntilReset()}ms`);
      return;
    }
    throw err;
  }

  // Step 2: Generate AI reply (uses AI endpoint)
  let reply;
  try {
    reply = await deps.ai.execute(() =>
      generateReply(slotId, tweet, module.config.persona)
    );
  } catch (err) {
    if (err.isCircuitOpen) {
      logAudit(slotId, module.name, 'skipped',
        `AI circuit open, resets in ${deps.ai.timeUntilReset()}ms`);
      return;
    }
    throw err;
  }

  // Step 3: Send the reply (uses proxy + API)
  try {
    await deps.proxy.execute(() =>
      deps.api.execute(() =>
        sendReply(slotId, tweet.id, reply)
      )
    );
  } catch (err) {
    if (err.isCircuitOpen) {
      logAudit(slotId, module.name, 'skipped',
        `Circuit open: ${err.message}`);
      return;
    }
    throw err;
  }

  logAudit(slotId, module.name, 'success', reply);
}
Each step of the action is wrapped in its own circuit breaker. If the AI is down but the proxy is fine, the system skips AI-dependent modules but can still run non-AI modules (scheduled posts, reposts).
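The three try/catch blocks in the scheduler share one shape: run through a breaker, log and skip if the circuit is open, rethrow anything else. If the repetition bothers you, it could be folded into a helper along these lines. `runOrSkip` is a hypothetical name and is not part of HelperX; it only assumes the breaker exposes `execute()` and that rejections carry `isCircuitOpen`, as in the class shown earlier.

```javascript
// Hypothetical helper that folds the repeated try/catch into one place.
// Returns { skipped: true } when the breaker rejected the call, otherwise
// { skipped: false, value }. Any other error is rethrown unchanged.
async function runOrSkip(breaker, label, fn, onSkip) {
  try {
    return { skipped: false, value: await breaker.execute(fn) };
  } catch (err) {
    if (err.isCircuitOpen) {
      onSkip(`${label} circuit open: ${err.message}`);
      return { skipped: true };
    }
    throw err;
  }
}
```

A step would then read something like `const found = await runOrSkip(deps.proxy, 'Proxy', () => searchTweets(slotId, module.config.query), (msg) => logAudit(slotId, module.name, 'skipped', msg)); if (found.skipped) return;`. Whether that is clearer than the explicit blocks is a matter of taste.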
Monitoring circuit state
The dashboard shows circuit breaker state for each slot:
function getSystemHealth() {
  const slots = getAllActiveSlots();
  return slots.map(slot => {
    const deps = getSlotDependencies(slot.id);
    return {
      slotId: slot.id,
      proxy: deps.proxy.getStatus(),
      ai: deps.ai.getStatus(),
      api: deps.api.getStatus(),
      healthy: ['proxy', 'ai', 'api']
        .every(dep => deps[dep].state === 'closed')
    };
  });
}
An operator sees at a glance which slots are healthy, which have open circuits, and when each circuit will attempt recovery.
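For a dashboard header or an alert, the per-slot report can be rolled up into a few numbers. The sketch below assumes the array shape returned by `getSystemHealth()` above; `summarizeHealth` itself is a hypothetical helper, not something the article describes.

```javascript
// Hypothetical roll-up over the array that getSystemHealth() returns.
function summarizeHealth(report) {
  const openCircuits = [];
  for (const slot of report) {
    for (const dep of ['proxy', 'ai', 'api']) {
      if (slot[dep].state !== 'closed') {
        openCircuits.push({
          name: slot[dep].name,
          state: slot[dep].state,
          timeUntilReset: slot[dep].timeUntilReset,
        });
      }
    }
  }
  return {
    totalSlots: report.length,
    healthySlots: report.filter((slot) => slot.healthy).length,
    openCircuits,
  };
}
```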
Tuning thresholds
Default thresholds aren't universal. We tuned ours based on failure patterns:
| Dependency | Threshold | Reset timeout | Why |
|---|---|---|---|
| Proxy | 3 failures | 2 min | Proxy failures are usually transient. Quick retry. |
| AI model | 5 failures | 1 min | AI endpoints recover fast. Higher threshold to absorb occasional 503s. |
| X API | 3 failures | 5 min | Rate-limit windows last 15 min. A longer reset avoids hammering an endpoint that is still limited. |
The key insight: reset timeout should match the expected recovery time of the dependency, not an arbitrary number.
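Keeping these numbers in one place makes them easier to review and adjust. A minimal sketch, with the values copied from the table; the `BREAKER_DEFAULTS` object and the `optionsFor` helper are illustrative names, not the actual HelperX configuration.

```javascript
// Values copied from the table above; the object shape and the
// `optionsFor` helper are illustrative assumptions.
const BREAKER_DEFAULTS = Object.freeze({
  proxy: { threshold: 3, resetTimeout: 2 * 60_000 },
  ai: { threshold: 5, resetTimeout: 1 * 60_000 },
  api: { threshold: 3, resetTimeout: 5 * 60_000 },
});

function optionsFor(dependency, overrides = {}) {
  const base = BREAKER_DEFAULTS[dependency];
  if (!base) throw new Error(`Unknown dependency: ${dependency}`);
  // Per-slot overrides win over the shared defaults.
  return { ...base, ...overrides };
}

console.log(optionsFor('api').resetTimeout); // 300000
```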
What we learned
1. One circuit breaker per dependency per tenant. Global circuit breakers cause healthy tenants to suffer for unhealthy ones. Per-tenant isolation is the whole point.
2. Log state transitions. When a circuit opens, the audit log records it. This is the most valuable diagnostic information during incidents.
3. Graceful skip > hard failure. When a circuit is open, the action is skipped and logged — not retried, not errored, not queued. The scheduler moves to the next action. Queuing failures leads to thundering herds when the circuit closes.
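4. Nested circuit breakers work, with one caveat. An action that uses proxy + API goes through both breakers. If either is open, the action is skipped, which handles compound failures cleanly. The caveat: the outer breaker must not count the inner breaker's CircuitOpenError as its own failure, or an open API circuit will eventually trip the proxy circuit too.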
5. Half-open state prevents oscillation. Without half-open, a circuit that closes immediately sends a burst of requests that may re-trigger the failure. Half-open allows exactly one test request, preventing the open/close/open oscillation.
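Lesson 4 is easy to get subtly wrong, so here is a small sketch of the failure accounting for nested breakers. `makeCounter` is a deliberately minimal stand-in that only counts failures (no timeouts, no half-open state), and every name in it is illustrative.

```javascript
// Minimal failure counter, not the full breaker: it only shows which errors
// an outer breaker should count. All names here are illustrative.
function makeCounter(name) {
  return {
    name,
    failures: 0,
    open: false,
    async execute(fn) {
      if (this.open) {
        throw Object.assign(new Error(`${name} open`), { isCircuitOpen: true });
      }
      try {
        return await fn();
      } catch (err) {
        // A rejection from a nested breaker says nothing about this dependency.
        if (!err.isCircuitOpen) this.failures++;
        throw err;
      }
    },
  };
}

async function demoNested() {
  const proxy = makeCounter('proxy');
  const api = makeCounter('api');
  api.open = true; // e.g. the API is rate-limited
  await proxy.execute(() => api.execute(async () => 'sent')).catch(() => {});
  return proxy.failures;
}

demoNested().then((failures) => console.log(failures)); // 0
```

Note that a genuine error from the inner call still propagates through, and is counted by, both breakers. Whether a rate-limit response should count against the proxy is a separate design decision this sketch does not settle.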
HelperX uses per-slot circuit breakers to keep your accounts running independently — one bad proxy doesn't affect the rest. Free 30-day trial.