Circuit Breaker Pattern Deep Guide: Building Resilient Distributed Systems
In distributed systems, a single service failure can cascade and cause the entire system to collapse. The Circuit Breaker Pattern is a core architectural pattern designed specifically to solve this critical problem.
Why Do We Need a Circuit Breaker?
Imagine a scenario where your microservice depends on an external payment gateway that normally responds within 100ms. But one day, the payment gateway experiences a failure and the response time jumps to 30 seconds. If you do not have proper protection:
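- Resource Exhaustion: Requests pile up waiting on the slow gateway until thread pools and connections are exhausted
- Cascading Failure: With the payment service unavailable, the order service that depends on it stalls and crashes
- System Avalanche: The failure keeps spreading upstream, and the entire system can collapse within minutes
The core concept is similar to an electrical circuit breaker (or fuse): when anomalies are detected, it trips quickly to keep the failure from spreading.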
The Three States of a Circuit Breaker
1. CLOSED (Normal Operation)
- Normal state: calls pass through to the service as usual
- Failures are recorded; the breaker transitions to OPEN when the failure threshold is reached
2. OPEN (Tripped)
- The service is treated as unavailable, so calls fail fast without reaching it
- Callers receive an error or a fallback response
- The breaker transitions to HALF_OPEN after the cool-down timeout
3. HALF_OPEN (Testing)
- A limited number of test requests are allowed through
- If they succeed → CLOSED
- If they fail → OPEN
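The transitions listed above can be written down as a small lookup table, which makes it easy to check that no path was forgotten. This is only a sketch: the `State` and `Event` enums, the `TRANSITIONS` table, and `next_state` are hypothetical names chosen for illustration, and the events are an assumed way of labelling what triggers each transition.

```python
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class Event(Enum):
    THRESHOLD_REACHED = "threshold_reached"  # too many failures while CLOSED
    TIMEOUT_ELAPSED = "timeout_elapsed"      # cool-down finished while OPEN
    PROBE_SUCCEEDED = "probe_succeeded"      # test request worked in HALF_OPEN
    PROBE_FAILED = "probe_failed"            # test request failed in HALF_OPEN


# (current state, event) -> next state.
# Any pair that is not listed leaves the state unchanged.
TRANSITIONS = {
    (State.CLOSED, Event.THRESHOLD_REACHED): State.OPEN,
    (State.OPEN, Event.TIMEOUT_ELAPSED): State.HALF_OPEN,
    (State.HALF_OPEN, Event.PROBE_SUCCEEDED): State.CLOSED,
    (State.HALF_OPEN, Event.PROBE_FAILED): State.OPEN,
}


def next_state(state, event):
    """Return the state that follows `state` when `event` occurs."""
    return TRANSITIONS.get((state, event), state)
```

For example, `next_state(State.OPEN, Event.TIMEOUT_ELAPSED)` yields `State.HALF_OPEN`, while an event that does not apply to the current state leaves it unchanged.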
Core Implementation
import threading
import time
from enum import Enum


class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is OPEN."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60.0):
        self.failure_threshold = failure_threshold  # failures before tripping
        self.recovery_timeout = recovery_timeout    # seconds to stay OPEN
        self._failure_count = 0
        self._last_failure_time = None
        self._state = CircuitState.CLOSED
        self._lock = threading.Lock()

    @property
    def state(self):
        with self._lock:
            # Lazily move OPEN -> HALF_OPEN once the cool-down has elapsed.
            # time.monotonic() is used because it never jumps backwards.
            if self._state == CircuitState.OPEN:
                if time.monotonic() - self._last_failure_time >= self.recovery_timeout:
                    self._state = CircuitState.HALF_OPEN
            return self._state

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            # Fail fast without touching the protected service.
            raise CircuitOpenError("Circuit is OPEN")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        with self._lock:
            self._failure_count = 0
            self._state = CircuitState.CLOSED

    def _on_failure(self):
        with self._lock:
            self._failure_count += 1
            self._last_failure_time = time.monotonic()
            # A failed test request in HALF_OPEN reopens the circuit at once.
            if (self._state == CircuitState.HALF_OPEN
                    or self._failure_count >= self.failure_threshold):
                self._state = CircuitState.OPEN
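The implementation above admits every caller while the breaker is HALF_OPEN, so a burst of traffic can hit a service that is still recovering. One common refinement is to let only a single trial request through at a time. The sketch below shows such a gate in isolation; `ProbeGate` and its method names are hypothetical and not part of the class above.

```python
import threading


class ProbeGate:
    """Admits at most one in-flight trial request at a time (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._probe_in_flight = False

    def try_acquire(self):
        """Return True if the caller may send the trial request, False otherwise."""
        with self._lock:
            if self._probe_in_flight:
                return False
            self._probe_in_flight = True
            return True

    def release(self):
        """Mark the trial request as finished, whatever its outcome."""
        with self._lock:
            self._probe_in_flight = False
```

Inside `call`, a breaker in HALF_OPEN could invoke `try_acquire()` first, fail fast when it returns `False`, and invoke `release()` in a `finally` block once the trial request completes.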
Practical Example
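# payment_gateway and payment_queue are assumed to be provided by the application;
# charge(order_id, amount) is an assumed signature.
payment_circuit = CircuitBreaker(failure_threshold=3, recovery_timeout=30.0)


def pay_order(order_id, amount):
    try:
        return payment_circuit.call(payment_gateway.charge, order_id, amount)
    except Exception:
        # Fallback: queue for later retry (covers an open circuit and a failed charge)
        payment_queue.enqueue({"order_id": order_id, "amount": amount})
        return {"status": "pending"}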
Integration With Other Patterns
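1. Retry Pattern
Retries absorb transient failures while the circuit is closed. Once the circuit opens, retrying should stop until the breaker moves to the half-open state. Exponential backoff between attempts avoids hammering a service that is already struggling.
2. Bulkhead Pattern
A bulkhead caps the resources (threads, connections, concurrent calls) that any one dependency may consume, so a slow dependency cannot starve the rest of the application. The circuit breaker complements it by cutting off calls to a dependency that keeps failing.
3. Fallback Pattern
Return a degraded response (cached data, a default value, or a queued request) when the circuit is open.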
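The three patterns above can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not a production implementation: it assumes the breaker signals a rejected call with a dedicated exception, here called `CircuitOpenError`, and the helpers `call_with_retry`, `call_with_fallback`, `Bulkhead`, and `BulkheadFullError` are hypothetical names, as are the default attempt count and delay.

```python
import threading
import time


class CircuitOpenError(Exception):
    """Assumed error a breaker raises when it rejects a call."""


class BulkheadFullError(Exception):
    """Raised when a bulkhead has no free slot for another call."""


def call_with_retry(func, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func, retrying failures with exponential backoff.

    A CircuitOpenError is never retried: the breaker has already decided
    the service is unavailable, so another attempt only adds latency.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except CircuitOpenError:
            raise
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * (2 ** attempt))  # delay doubles after each failure


def call_with_fallback(func, fallback, **retry_options):
    """Return func's result, or fallback() if retries fail or the circuit is open."""
    try:
        return call_with_retry(func, **retry_options)
    except Exception:
        return fallback()


class Bulkhead:
    """Caps the number of concurrent calls to one dependency."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Reject immediately instead of queueing behind a slow dependency.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("no free slot")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()
```

A caller might wrap a breaker-protected call in `Bulkhead.call`, pass that to `call_with_fallback`, and supply a fallback that returns cached or default data. The `sleep` parameter exists so the backoff can be replaced in tests.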
Framework Support
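- Java: Resilience4j (including its @CircuitBreaker annotation for Spring Boot), Spring Cloud Circuit Breaker
- Python: PyBreaker
- Go: sony/gobreaker, hystrix-go (a Go port of Netflix Hystrix, which is itself a Java library)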
Best Practices
- Set thresholds above the failure rate the service shows in normal operation, so routine errors do not trip the breaker
- Monitor key metrics: breaker state, failure rate, response time
- Set a reasonable cool-down time (30s to 5min)
- Always implement fallback handling
- Use distributed tracing for debugging
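To base a threshold on failure rate rather than a raw failure count, the breaker needs a rolling view of recent calls. The sketch below tracks the outcomes of the last few calls and reports their failure rate, which can also be exported as a monitoring metric. `FailureRateWindow`, its parameters, and the default sizes are hypothetical, and the class is not thread-safe as written.

```python
from collections import deque


class FailureRateWindow:
    """Tracks the failure rate over the most recent `size` calls (hypothetical helper)."""

    def __init__(self, size=20, min_calls=10):
        # deque(maxlen=...) silently drops the oldest outcome when full.
        self._outcomes = deque(maxlen=size)  # True = failure, False = success
        self._min_calls = min_calls

    def record(self, failed):
        """Record the outcome of one call."""
        self._outcomes.append(bool(failed))

    def failure_rate(self):
        """Fraction of failed calls in the window, or 0.0 if it is empty."""
        if not self._outcomes:
            return 0.0
        return sum(self._outcomes) / len(self._outcomes)

    def should_trip(self, threshold):
        """True once enough calls were seen and the failure rate reaches threshold."""
        return (len(self._outcomes) >= self._min_calls
                and self.failure_rate() >= threshold)
```

The `min_calls` guard keeps a single early failure from counting as a 100% failure rate. If a service normally fails about 1% of its calls, a threshold well above that, such as 0.5, would leave room for routine errors.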
Summary
The Circuit Breaker pattern is a foundation for building resilient distributed systems. By tripping quickly when a dependency fails, it prevents cascading failures and supports a high-availability microservices architecture.