The LLM provider is having an incident. Your agent keeps making requests, and every one fails with a 503. Your retry logic waits 2 seconds and tries again. Fails. Waits 4 seconds. Fails. Waits 8 seconds. Meanwhile you are wasting API quota, piling up backoff delays, and possibly queueing more requests behind the failing ones.
The circuit breaker pattern solves this. After N consecutive failures, the circuit opens. An open circuit fails immediately without contacting the provider. After a recovery timeout, the circuit half-opens and lets one test request through. If it succeeds, the circuit closes. If it fails, the circuit re-opens.
llm-circuit-breaker-py implements the circuit breaker pattern for LLM provider calls.
The Shape of the Fix
```python
import anthropic
from anthropic.types import Message
from llm_circuit_breaker_py import CircuitBreaker, CircuitOpenError

anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

breaker = CircuitBreaker(
    failure_threshold=5,  # Open after 5 consecutive failures
    recovery_timeout=60,  # Try again after 60 seconds
)


def call_llm_safe(**kwargs) -> Message:
    try:
        return breaker.call(anthropic_client.messages.create, **kwargs)
    except CircuitOpenError as e:
        raise RuntimeError(
            "Anthropic circuit is open. "
            f"Retry after {e.retry_after:.0f}s. "
            "Use fallback provider."
        ) from e
```
5 failures in a row: the circuit opens. For the next 60 seconds, all calls raise CircuitOpenError immediately, with no network calls. After 60 seconds, one test call goes through. If it succeeds, the circuit closes and normal operation resumes. If it fails, the circuit re-opens for another 60 seconds.
What It Does NOT Do
llm-circuit-breaker-py does not automatically fail over to another provider. When the circuit is open, calls raise CircuitOpenError. Your code decides what to do: fail with an error, use a cached response, or route to a different provider via llm-fallback-chain.
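Here is a minimal sketch of one of those options, serving a cached response. It is an illustration only and not part of the library. `answer`, `_last_good`, and `call_provider` are hypothetical names, and `CircuitOpenError` is redefined locally as a stand-in so the snippet runs on its own.

```python
class CircuitOpenError(Exception):
    """Local stand-in for llm_circuit_breaker_py.CircuitOpenError (assumption for this sketch)."""


# Hypothetical in-memory cache: prompt -> last successful reply
_last_good: dict[str, str] = {}


def answer(prompt: str, call_provider) -> str:
    """Call the provider; if its circuit is open, fall back to the last good reply."""
    try:
        reply = call_provider(prompt)
    except CircuitOpenError:
        if prompt in _last_good:
            return _last_good[prompt]  # stale but known-good output, no network call
        raise  # nothing cached: let the caller handle the open circuit
    _last_good[prompt] = reply
    return reply
```

In real code `call_provider` would wrap `breaker.call(...)`. A production cache would also need a size bound and an expiry policy.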
It does not differentiate between error types by default: a network timeout and a 400 Bad Request both count as failures. For production use, you likely want to open the circuit only on server-side errors (5xx) and rate limits (429). Other 4xx client errors, such as 400 or 401, point to a problem with your request, not a provider outage. Pass count_error_fn to filter which exceptions count.
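A possible `count_error_fn` is sketched below. `count_provider_errors` is a hypothetical helper. It assumes that HTTP errors raised by the SDK expose a `status_code` attribute, which `APIStatusError` does in the official Anthropic and OpenAI Python SDKs. It also treats any exception without one as a network-level failure, such as a timeout or dropped connection.

```python
def count_provider_errors(e: BaseException) -> bool:
    """Hypothetical count_error_fn: trip the breaker on outages, not on our own bad requests."""
    # Assumption: SDK HTTP errors carry .status_code; anything else had no HTTP response
    status = getattr(e, "status_code", None)
    if status is None:
        # No HTTP response at all, e.g. a timeout or dropped connection: count it
        return True
    # 429 rate limits and 5xx server errors count; other 4xx client errors do not
    return status == 429 or status >= 500
```

Pass it as `CircuitBreaker(failure_threshold=5, recovery_timeout=60, count_error_fn=count_provider_errors)`. This sketch also counts bugs in your own code, such as a stray `KeyError`, because they lack a status code. If that matters, narrow the `None` branch with an `isinstance` check against your SDK's connection-error class.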
It tracks consecutive failures, not a failure rate. With the default threshold of 5, a pattern of 4 failures, 1 success, and 4 more failures never opens the circuit. The success resets the counter, even though 8 of the 9 calls failed. For a percentage-based threshold, you need a sliding-window implementation.
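A sliding-window tracker can be sketched as follows. It is not part of llm-circuit-breaker-py, and `SlidingWindowFailureRate`, `min_calls`, and the injectable `clock` are hypothetical. It stores each call's outcome with a timestamp, drops outcomes older than `window` seconds, and reports when the failure fraction reaches `threshold`.

```python
import time
from collections import deque


class SlidingWindowFailureRate:
    """Hypothetical tracker: failure rate of calls made in the last `window` seconds."""

    def __init__(self, window=60.0, threshold=0.5, min_calls=10, clock=time.monotonic):
        self.window = window
        self.threshold = threshold
        self.min_calls = min_calls  # avoid tripping on tiny samples (1 failure out of 1 call)
        self._clock = clock
        self._events = deque()  # (timestamp, failed) pairs, oldest first
        self._failures = 0

    def record(self, failed: bool) -> None:
        self._events.append((self._clock(), failed))
        self._failures += int(failed)
        self._evict()

    def _evict(self) -> None:
        # Drop outcomes that have aged out of the window
        cutoff = self._clock() - self.window
        while self._events and self._events[0][0] <= cutoff:
            _, failed = self._events.popleft()
            self._failures -= int(failed)

    def should_open(self) -> bool:
        self._evict()
        total = len(self._events)
        return total >= self.min_calls and self._failures / total >= self.threshold
```

Feeding it the 4-failure, 1-success, 4-failure pattern gives 8 failures out of 9 calls, which crosses a 50% threshold where the consecutive counter never does. To use it with a breaker, you would call `record()` from the success and failure paths and open the circuit when `should_open()` returns True.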
Inside the Library
The three states and transitions:
```python
from enum import Enum
import threading
import time


class State(Enum):
    CLOSED = "CLOSED"        # Normal: requests pass through
    OPEN = "OPEN"            # Failing: requests blocked
    HALF_OPEN = "HALF_OPEN"  # Testing: one request allowed


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 60.0,
        count_error_fn=None,
    ):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        # By default every exception counts as a failure
        self._count_error = count_error_fn or (lambda e: True)
        self.state = State.CLOSED
        self.failure_count = 0
        self._opened_at: float | None = None
        self._trial_in_flight = False  # True while the single HALF_OPEN test request runs
        self._lock = threading.Lock()

    def call(self, fn, *args, **kwargs):
        with self._lock:
            if self.state == State.OPEN:
                elapsed = time.monotonic() - self._opened_at
                if elapsed < self.recovery_timeout:
                    raise CircuitOpenError(
                        state=self.state,
                        retry_after=self.recovery_timeout - elapsed,
                        failure_count=self.failure_count,
                    )
                self.state = State.HALF_OPEN
            if self.state == State.HALF_OPEN:
                if self._trial_in_flight:
                    # Another caller already holds the single test slot
                    raise CircuitOpenError(
                        state=self.state,
                        retry_after=0.0,
                        failure_count=self.failure_count,
                    )
                self._trial_in_flight = True
        # The provider call runs outside the lock so slow requests don't serialize callers
        try:
            result = fn(*args, **kwargs)
        except Exception as e:
            self._on_failure(e)
            raise
        self._on_success()
        return result

    def _on_success(self):
        with self._lock:
            self.failure_count = 0
            self.state = State.CLOSED
            self._opened_at = None
            self._trial_in_flight = False

    def _on_failure(self, error):
        with self._lock:
            self._trial_in_flight = False
            if not self._count_error(error):
                return  # Don't count this error type
            self.failure_count += 1
            if self.state == State.HALF_OPEN or self.failure_count >= self.failure_threshold:
                # Test request failed, or threshold reached: (re)open
                self.state = State.OPEN
                self._opened_at = time.monotonic()
```
The CircuitOpenError carries actionable information:
```python
from dataclasses import dataclass


@dataclass
class CircuitOpenError(Exception):
    state: State
    retry_after: float  # seconds until next test request
    failure_count: int
```
When to Use It
Use it on every external LLM API call. Provider incidents happen. Without a circuit breaker, your agent burns retries and delays against a provider that is not going to respond for the next 10 minutes.
Use one breaker per provider in a multi-provider setup. Anthropic gets its own breaker and OpenAI gets its own, so an open Anthropic circuit has no effect on OpenAI calls.
Use it with llm-fallback-chain. When the Anthropic circuit opens, CircuitOpenError tells the chain to skip Anthropic and try the next provider. The chain handles the routing; the breaker manages the state.
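The skip-on-open behavior can be sketched without llm-fallback-chain. This is not that library's API: `first_available` is a hypothetical helper, and `CircuitOpenError` is a local stand-in so the snippet runs standalone.

```python
from typing import Callable, Sequence


class CircuitOpenError(Exception):
    """Local stand-in for llm_circuit_breaker_py.CircuitOpenError (assumption for this sketch)."""


def first_available(providers: Sequence[tuple[str, Callable[[], str]]]) -> tuple[str, str]:
    """Try providers in order and return (provider_name, reply) from the first that succeeds."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call()
        except CircuitOpenError:
            # Open circuit: skipped instantly, no network call was made
            errors.append(f"{name}: circuit open")
        except Exception as e:
            # A real call failure; the provider's breaker (if any) has already recorded it
            errors.append(f"{name}: {e!r}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Keeping "circuit open" separate from real failures in the error list shows at a glance whether a provider was skipped or actually tried.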
Use it with agent-event-bus to broadcast circuit state changes. Subscribe to llm.circuit_opened events to alert on-call when a provider circuit opens, and to llm.circuit_closed events to learn when recovery completes.
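The sketch below shows how those events could be produced. It does not use agent-event-bus's actual API. Instead it takes a generic `publish(event_name, payload)` callable, and `call_and_publish` and `EVENT_FOR_STATE` are hypothetical. It detects a transition by comparing the breaker's state before and after each call. Under heavy concurrency another thread can change the state in between, so treat the events as best-effort.

```python
from typing import Any, Callable

# Hypothetical mapping from breaker state to the event names used above
EVENT_FOR_STATE = {"OPEN": "llm.circuit_opened", "CLOSED": "llm.circuit_closed"}


def call_and_publish(
    breaker: Any,
    provider: str,
    publish: Callable[[str, dict], None],
    fn: Callable[..., Any],
    *args: Any,
    **kwargs: Any,
) -> Any:
    """Run breaker.call(fn, ...) and publish an event if the call moved the circuit."""
    before = breaker.state.value
    try:
        return breaker.call(fn, *args, **kwargs)
    finally:
        # Runs on success and on failure, so both opening and closing are caught
        after = breaker.state.value
        if after != before and after in EVENT_FOR_STATE:
            publish(EVENT_FOR_STATE[after], {"provider": provider, "from": before, "to": after})
```

HALF_OPEN has no entry in the mapping, so only the transitions on-call cares about (into OPEN and back to CLOSED) produce events.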
Install
```bash
pip install git+https://github.com/MukundaKatta/llm-circuit-breaker-py
# Or from PyPI (429 cooldown pending)
pip install llm-circuit-breaker-py
```
A full setup with per-provider breakers and llm-fallback-chain:

```python
import logging

import anthropic
import openai
from llm_circuit_breaker_py import CircuitBreaker
from llm_fallback_chain import FallbackChain, Provider

logger = logging.getLogger(__name__)
anthropic_client = anthropic.Anthropic()
openai_client = openai.OpenAI()

# Our own malformed requests (400 Bad Request) should not trip the Anthropic circuit
anthropic_breaker = CircuitBreaker(
    failure_threshold=5,
    recovery_timeout=60,
    count_error_fn=lambda e: not isinstance(e, anthropic.BadRequestError),
)
openai_breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=60)

chain = FallbackChain([
    Provider(
        "anthropic",
        fn=lambda **kw: anthropic_breaker.call(anthropic_client.messages.create, **kw),
        model="claude-sonnet-4-6",
    ),
    Provider(
        "openai",
        fn=lambda **kw: openai_breaker.call(openai_client.chat.completions.create, **kw),
        model="gpt-4o",
    ),
])


def call_resilient(**kwargs):
    result, trace = chain.call(**kwargs)
    # Log any circuit that is not CLOSED, for monitoring
    for provider_name, breaker in [("anthropic", anthropic_breaker), ("openai", openai_breaker)]:
        if breaker.state.value != "CLOSED":
            logger.warning("circuit_state provider=%s state=%s", provider_name, breaker.state.value)
    return result
```
Sibling Libraries
| Library | What it solves |
|---|---|
| llm-fallback-chain | Route to next provider when circuit opens |
| llm-retry | Backoff retry before the circuit opens |
| llm-rate-limit-bucket | Rate limiting to prevent triggering the circuit |
| llm-pretty-error | Normalize provider errors for circuit error counting |
| agent-event-bus | Publish circuit state changes as events |
The failure handling stack: llm-rate-limit-bucket prevents overload, llm-retry handles transient errors, llm-circuit-breaker-py stops hammering on sustained outages, llm-fallback-chain routes to alternatives.
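The order of that stack matters where retry and breaker meet. Here is a minimal sketch, assuming retries wrap `breaker.call` so the breaker sees each attempt, and that an open circuit should end the retry loop at once instead of sleeping. `call_with_retry` is hypothetical and is not llm-retry's API. `CircuitOpenError` is a local stand-in.

```python
import time


class CircuitOpenError(Exception):
    """Local stand-in for llm_circuit_breaker_py.CircuitOpenError (assumption for this sketch)."""


def call_with_retry(fn, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Retry fn with exponential backoff (2s, 4s, ...), but never retry an open circuit."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # the breaker already knows the provider is down; waiting won't help
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

Usage would look like `call_with_retry(lambda: breaker.call(client.messages.create, **kw))`. In this layering every retry attempt counts toward `failure_threshold`, so with 3 attempts and a threshold of 5, the second failing request opens the circuit. Its remaining attempt then fails fast with CircuitOpenError.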
What's Next
Sliding window failure rate: instead of consecutive failure counting, track failure rate over a time window (e.g., 50% failure rate in the last 60 seconds). This is more nuanced than a consecutive count and handles partial outages where every other request fails.
Metrics emission: on state transitions (CLOSED → OPEN, OPEN → HALF_OPEN, HALF_OPEN → CLOSED), emit metrics to your observability platform. Circuit state is a key SRE signal for provider health.
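Async native: CircuitBreaker.async_call() for async agent loops. The current call() method only works for sync callables. Passing it an async function returns an un-awaited coroutine, so the call always counts as a success and any failure surfaces later, outside the breaker.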
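A possible shape for that async support is sketched below as a standalone class, since `async_call()` does not exist yet. `AsyncCircuitBreaker`, its string states, and the injectable `clock` are hypothetical. The sketch is simplified: it has no count_error_fn, and `CircuitOpenError` is a local stand-in. It needs no lock because the state check-and-set contains no `await`, so it cannot interleave with other coroutines on the same event loop. A `finally` block frees the half-open test slot even if the trial is cancelled.

```python
import time


class CircuitOpenError(Exception):
    """Local stand-in for llm_circuit_breaker_py.CircuitOpenError (assumption for this sketch)."""

    def __init__(self, retry_after: float):
        super().__init__(f"circuit open, retry after {retry_after:.1f}s")
        self.retry_after = retry_after


class AsyncCircuitBreaker:
    """Hypothetical async variant: consecutive-failure counting with a single HALF_OPEN trial."""

    def __init__(self, failure_threshold=5, recovery_timeout=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self.state = "CLOSED"
        self.failure_count = 0
        self._opened_at = 0.0
        self._trial_in_flight = False

    async def async_call(self, fn, *args, **kwargs):
        # No await between these checks, so no lock is needed on a single event loop
        if self.state == "OPEN":
            elapsed = self._clock() - self._opened_at
            if elapsed < self.recovery_timeout:
                raise CircuitOpenError(self.recovery_timeout - elapsed)
            self.state = "HALF_OPEN"
        is_trial = self.state == "HALF_OPEN"
        if is_trial:
            if self._trial_in_flight:
                raise CircuitOpenError(0.0)  # another coroutine holds the single test slot
            self._trial_in_flight = True
        try:
            result = await fn(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            if self.state == "HALF_OPEN" or self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
                self._opened_at = self._clock()
            raise
        finally:
            if is_trial:
                self._trial_in_flight = False  # also runs on asyncio cancellation
        self.failure_count = 0
        self.state = "CLOSED"
        return result
```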
Built as part of the agent-stack family: composable Python primitives for production LLM agents.