DEV Community

CodeWithDhanian

Circuit Breaker Pattern in System Design

In distributed systems and microservices architectures, failures are inevitable. Network latency, service overload, database slowdowns, or third-party API outages can quickly cascade into widespread system instability. The Circuit Breaker Pattern serves as a critical resilience mechanism that prevents these cascading failures by intelligently isolating faulty components. Inspired by electrical circuit breakers that interrupt current flow during overloads, the software version acts as a protective proxy around remote calls, allowing systems to fail fast, degrade gracefully, and recover automatically.

This pattern is essential for building robust, highly available applications where services depend on each other across network boundaries. By monitoring failure rates and response times, the circuit breaker stops repeated attempts to reach an unhealthy service, giving it time to recover while providing immediate feedback or fallback responses to callers.

Understanding the Circuit Breaker Pattern

The Circuit Breaker Pattern functions as a stateful wrapper around operations that interact with external services or resources. Instead of allowing every request to reach a failing downstream service—which could overwhelm it further and degrade the entire system—the circuit breaker tracks metrics such as error counts, latency, or exceptions. When failure thresholds are breached, it “trips” and redirects traffic away from the problematic service.

Key benefits include:

  • Prevention of cascading failures
  • Reduction in resource consumption on both caller and callee sides
  • Faster response times through immediate failure detection
  • Graceful degradation via fallback mechanisms
  • Automatic recovery without manual intervention

The pattern works best when combined with complementary techniques such as retries with exponential backoff, timeouts, rate limiting, and bulkhead isolation.
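As a rough illustration of how retries and a breaker can cooperate, the sketch below retries transient errors with exponential backoff but gives up immediately when the breaker has rejected the call. The names `CircuitOpenError` and `retry_with_backoff`, and the default delays, are assumptions made for this example and do not come from any particular library.

```python
import time


class CircuitOpenError(Exception):
    """Hypothetical error a breaker raises when it rejects a call."""


def retry_with_backoff(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func, retrying ordinary failures with delays of base_delay * 2**n."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            # The breaker has already decided to fail fast; retrying would
            # defeat its purpose, so propagate the rejection immediately.
            raise
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
```

Here `func` would typically be the breaker-wrapped remote call, so every retry is still counted by the breaker. The `sleep` parameter is injectable only to make the helper easy to test.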

The Three States of a Circuit Breaker

A circuit breaker maintains one of three distinct states, each dictating how incoming requests are handled. These states form a finite state machine that transitions based on observed behavior and configurable thresholds.

Closed State:

This is the normal operating state. All requests pass through to the protected service. The circuit breaker monitors outcomes, counting failures either within a sliding time window or as a run of consecutive failures. If the failure rate or count exceeds a predefined threshold (for example, 50% errors in the last 10 seconds or 5 consecutive failures), the breaker transitions to the Open state. Depending on the implementation, a success resets or decrements the failure counter.
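The time-window variant of this check can be sketched as follows. This is a minimal illustration, not a production implementation: the class name `FailureRateWindow` is invented here, and the `min_calls` guard (which avoids tripping on a handful of samples) is an added assumption that the description above does not mention.

```python
import time
from collections import deque


class FailureRateWindow:
    """Tracks call outcomes over the last `window_seconds` seconds."""

    def __init__(self, window_seconds=10.0, min_calls=5, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.min_calls = min_calls      # assumption: ignore tiny samples
        self._clock = clock
        self._events = deque()          # (timestamp, failed) pairs, oldest first

    def _evict(self):
        # Drop outcomes that have aged out of the window.
        cutoff = self._clock() - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def record(self, failed):
        self._events.append((self._clock(), failed))
        self._evict()

    def failure_rate(self):
        self._evict()
        if not self._events:
            return 0.0
        failures = sum(1 for _, failed in self._events if failed)
        return failures / len(self._events)

    def should_trip(self, threshold=0.5):
        self._evict()
        if len(self._events) < self.min_calls:
            return False
        return self.failure_rate() >= threshold
```

A breaker in the Closed state would call `record()` after every request and move to Open once `should_trip()` returns true.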

Open State:

When the circuit is open, the breaker immediately rejects all requests without forwarding them to the downstream service. This prevents further load on the failing component and avoids long timeouts or resource exhaustion. Instead, the caller receives an immediate exception or a fallback response. A reset timeout starts when the breaker opens; once it elapses, the breaker moves to the Half-Open state to test recovery.

Half-Open State:

This transitional state allows a limited number of test requests (often just one or a small configurable count) to reach the service. If these probe requests succeed, the circuit breaker assumes recovery and returns to the Closed state, resetting failure counters. If any test fails, the breaker reverts to the Open state and restarts the timeout period. This cautious probing ensures the service has truly stabilized before resuming full traffic.

These state transitions enable self-healing while protecting system stability.
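The transitions described above are compact enough to write down as a lookup table. The sketch below is only a restatement of the three states in code; the `Event` names are labels chosen for this example.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class Event(Enum):
    THRESHOLD_EXCEEDED = "threshold_exceeded"        # too many failures while closed
    RESET_TIMEOUT_ELAPSED = "reset_timeout_elapsed"  # open long enough to probe
    PROBES_SUCCEEDED = "probes_succeeded"            # test requests all passed
    PROBE_FAILED = "probe_failed"                    # a test request failed


# Every legal transition; any other (state, event) pair leaves the state unchanged.
TRANSITIONS = {
    (State.CLOSED, Event.THRESHOLD_EXCEEDED): State.OPEN,
    (State.OPEN, Event.RESET_TIMEOUT_ELAPSED): State.HALF_OPEN,
    (State.HALF_OPEN, Event.PROBES_SUCCEEDED): State.CLOSED,
    (State.HALF_OPEN, Event.PROBE_FAILED): State.OPEN,
}


def next_state(state, event):
    """Return the state that follows `state` when `event` occurs."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the table separate from the counting logic makes it easy to unit-test the state machine on its own.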

Detailed Implementation of the Circuit Breaker Pattern

Implementing a circuit breaker from scratch requires careful handling of concurrency, metrics tracking, and state management. In production, developers typically use battle-tested libraries such as Resilience4j (Java), Hystrix (Java, now in maintenance mode), Polly (.NET), or pybreaker (Python). The code below is illustrative rather than production-ready.

Pseudocode for a Generic Circuit Breaker

class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    State currentState = CLOSED;
    int failureCount = 0;
    int successCount = 0;
    long lastFailureTime = 0;
    Configuration config;  // failureThreshold, timeout, successThreshold, etc.

    Object execute(Callable operation) {
        if (currentState == OPEN) {
            if (isTimeoutExpired()) {
                transitionTo(HALF_OPEN);
            } else {
                return invokeFallback();  // or throw CircuitOpenException
            }
        }

        try {
            Object result = operation.call();
            onSuccess();
            return result;
        } catch (Exception e) {
            onFailure(e);
            return invokeFallback();
        }
    }

    private void onSuccess() {
        failureCount = 0;
        successCount++;
        if (currentState == HALF_OPEN && successCount >= config.successThreshold) {
            transitionTo(CLOSED);
        }
    }

    private void onFailure(Exception e) {
        failureCount++;
        lastFailureTime = currentTime();
        if (failureCount >= config.failureThreshold || currentState == HALF_OPEN) {
            transitionTo(OPEN);
        }
    }

    private boolean isTimeoutExpired() {
        return (currentTime() - lastFailureTime) > config.resetTimeout;
    }

    private void transitionTo(State newState) {
        currentState = newState;
        // Log state change, notify monitoring system
        if (newState == HALF_OPEN) {
            successCount = 0;
        }
    }

    private Object invokeFallback() {
        // Execute fallback logic, e.g., return cached data or default value
        return defaultResponse();
    }
}

Java Example Using Resilience4j (Conceptual Structure)

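import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.vavr.control.Try;  // Try comes from the Vavr library

import java.time.Duration;
import java.util.function.Supplier;

// Configuration
CircuitBreakerConfig config = CircuitBreakerConfig.custom()
    .failureRateThreshold(50)           // Open when the failure rate reaches 50%
    .waitDurationInOpenState(Duration.ofSeconds(30))  // Reset timeout
    .permittedNumberOfCallsInHalfOpenState(3)         // Test calls
    .slidingWindowSize(10)              // Last 10 calls (count-based window by default)
    .build();

CircuitBreaker circuitBreaker = CircuitBreaker.of("paymentService", config);

// Decorator usage; callPaymentService() stands in for your remote call
Supplier<String> decoratedSupplier = CircuitBreaker.decorateSupplier(
    circuitBreaker,
    () -> callPaymentService()
);

// With fallback; fallbackPaymentResponse() stands in for your fallback logic
String result = Try.ofSupplier(decoratedSupplier)
    .recover(throwable -> fallbackPaymentResponse())
    .get();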

Python Example Using a Simple Custom Implementation

import time
from enum import Enum
from typing import Any, Callable


class CircuitBreakerOpenException(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """Consecutive-failure circuit breaker. Not thread-safe as written."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30, success_threshold: int = 2):
        self.state = CircuitState.CLOSED
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout          # seconds to stay open before probing
        self.success_threshold = success_threshold  # probe successes needed to close
        self.failure_count = 0
        self.success_count = 0
        self.last_failure_time = 0.0

    def call(self, func: Callable, *args, **kwargs) -> Any:
        if self.state == CircuitState.OPEN:
            # time.monotonic() is unaffected by wall-clock adjustments
            if time.monotonic() - self.last_failure_time > self.reset_timeout:
                self.state = CircuitState.HALF_OPEN
                self.success_count = 0
            else:
                raise CircuitBreakerOpenException("Circuit breaker is OPEN")

        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise  # re-raise with the original traceback, or return a fallback here
        self._on_success()
        return result

    def _on_success(self):
        self.failure_count = 0
        self.success_count += 1
        if self.state == CircuitState.HALF_OPEN and self.success_count >= self.success_threshold:
            self.state = CircuitState.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.monotonic()
        # Any failure while probing reopens the circuit immediately
        if self.failure_count >= self.failure_threshold or self.state == CircuitState.HALF_OPEN:
            self.state = CircuitState.OPEN

These implementations highlight the essential elements: configurable thresholds, state management, fallback execution (in the pseudocode and Java examples), and guarded transitions. Neither hand-written version is thread-safe as shown. In real systems, protect shared state with locks or atomic operations, and export state changes and failure metrics to a monitoring tool such as Prometheus.
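One possible way to add thread-safety is to guard every state read and update with a lock while running the protected call itself outside the lock, so slow calls do not block other threads. The sketch below assumes a consecutive-failure breaker that admits a single probe at a time in the Half-Open state; `ThreadSafeBreaker` and `CircuitOpenError` are names invented for this example, and some edge cases (such as results arriving late from calls that started in an earlier state) are handled only crudely.

```python
import threading
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the protected function."""


class ThreadSafeBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self._lock = threading.Lock()
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._state = "closed"
        self._failures = 0
        self._opened_at = 0.0
        self._probe_in_flight = False

    def _before_call(self):
        with self._lock:
            if self._state == "open":
                if self._clock() - self._opened_at < self._reset_timeout:
                    raise CircuitOpenError("circuit is open")
                self._state = "half_open"
                self._probe_in_flight = False
            if self._state == "half_open":
                if self._probe_in_flight:
                    raise CircuitOpenError("probe already in flight")
                self._probe_in_flight = True

    def _after_call(self, succeeded):
        with self._lock:
            if self._state == "open":
                return  # late result from a call that began before the trip
            if succeeded:
                self._state = "closed"
                self._failures = 0
            else:
                self._failures += 1
                if self._state == "half_open" or self._failures >= self._failure_threshold:
                    self._state = "open"
                    self._opened_at = self._clock()
            self._probe_in_flight = False

    def call(self, func, *args, **kwargs):
        self._before_call()
        try:
            result = func(*args, **kwargs)  # runs without holding the lock
        except Exception:
            self._after_call(succeeded=False)
            raise
        self._after_call(succeeded=True)
        return result
```

Holding the lock only around bookkeeping keeps contention low, at the cost of the late-result cases noted above. Library implementations handle these more carefully.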

When and How to Use the Circuit Breaker Pattern

Apply the Circuit Breaker Pattern to any synchronous or asynchronous call to external services, databases, or third-party APIs where failure could propagate. Common scenarios include microservices communication, payment gateways, inventory checks, or recommendation engines.

Best practices:

  • Combine with timeouts to avoid indefinite waits.
  • Implement meaningful fallbacks—cached data, default values, or queued operations.
  • Monitor state transitions and metrics for observability.
  • Tune thresholds based on service characteristics and traffic patterns.
  • Ensure idempotency for operations that may be retried.
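The fallback advice in the list above might look like the following in practice. This is a small sketch under stated assumptions: `call_with_cached_fallback` is a hypothetical helper, the cache is a plain dictionary, and `protected_call` is expected to be the breaker-wrapped operation, which raises both on real failures and when the circuit is open.

```python
def call_with_cached_fallback(protected_call, cache, key, default=None):
    """Return a fresh value when possible, else the last cached one, else default."""
    try:
        value = protected_call()
    except Exception:
        # The call failed or the breaker rejected it: serve the last known
        # good value if there is one, otherwise fall back to the default.
        return cache.get(key, default)
    cache[key] = value  # remember the latest good response
    return value
```

Serving cached data trades freshness for availability, so it suits reads such as recommendations or catalogue data better than operations like payments, where a queued retry or an explicit error is usually the safer fallback.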

The pattern shines in high-traffic environments but adds slight overhead in normal operation due to metric collection. For extremely latency-sensitive paths, evaluate whether the protection justifies the cost.

Mastering the Circuit Breaker Pattern equips system designers with a powerful tool to build resilient, fault-tolerant distributed systems that maintain availability even when individual components fail.

Circuit breaker pattern diagram in design

System Design Handbook

For more in-depth insights and comprehensive coverage of system design topics, consider purchasing the System Design Handbook at https://codewithdhanian.gumroad.com/l/ntmcf. It will equip you with the knowledge to master complex distributed systems.

Buy me coffee to support my content at: https://ko-fi.com/codewithdhanian
