Circuit Breaker for LLM Calls: Stop Hammering a Provider That Is Already Down

#hermeschallenge #ai #python #agents

The LLM provider is having an incident. Your agent keeps making requests, and every one fails with a 503. Your retry logic waits 2 seconds and tries again. Fails. Waits 4 seconds. Fails. Waits 8 seconds. Meanwhile, you are wasting API quota, piling up retry latency, and potentially queueing more requests behind the failing ones.

The circuit breaker pattern solves this. After N consecutive failures, the circuit opens. An open circuit fails immediately without calling the provider. After a recovery timeout, the circuit half-opens and lets one test request through. If it succeeds, the circuit closes. If it fails, the circuit re-opens.
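The four transitions in that description fit in a small lookup table. This is a minimal sketch, not the library's code; the event names (`threshold_reached`, `timeout_elapsed`, `trial_succeeded`, `trial_failed`) are hypothetical labels for the conditions described above.

```python
from enum import Enum


class State(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF_OPEN"


# (current state, event) -> next state; event names are illustrative only
TRANSITIONS = {
    (State.CLOSED, "threshold_reached"): State.OPEN,     # N consecutive failures
    (State.OPEN, "timeout_elapsed"): State.HALF_OPEN,    # recovery timeout passed
    (State.HALF_OPEN, "trial_succeeded"): State.CLOSED,  # test request worked
    (State.HALF_OPEN, "trial_failed"): State.OPEN,       # test request failed
}


def next_state(state: State, event: str) -> State:
    # Any (state, event) pair not in the table leaves the state unchanged
    return TRANSITIONS.get((state, event), state)


state = State.CLOSED
for event in ["threshold_reached", "timeout_elapsed", "trial_failed",
              "timeout_elapsed", "trial_succeeded"]:
    state = next_state(state, event)
    print(event, "->", state.value)
```

Keeping the transitions in data makes it easy to check that every state has a way out, and that nothing moves directly from OPEN to CLOSED without a successful trial.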

llm-circuit-breaker-py implements the circuit breaker pattern for LLM provider calls.

The Shape of the Fix

import anthropic
from anthropic.types import Message
from llm_circuit_breaker_py import CircuitBreaker, CircuitOpenError

anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

breaker = CircuitBreaker(
    failure_threshold=5,    # Open after 5 consecutive failures
    recovery_timeout=60,    # Try again after 60 seconds
)

def call_llm_safe(**kwargs) -> Message:
    try:
        return breaker.call(anthropic_client.messages.create, **kwargs)
    except CircuitOpenError as e:
        # Fail fast: no network call was made while the circuit is open
        raise RuntimeError(
            f"Anthropic circuit is open. "
            f"Retry after {e.retry_after:.0f}s. "
            "Use fallback provider."
        ) from e

Five consecutive failures open the circuit. For the next 60 seconds, every call raises CircuitOpenError immediately, with no network call. After 60 seconds, one test call goes through: if it succeeds, the circuit closes and normal operation resumes; if it fails, the circuit opens for another 60 seconds.

What It Does NOT Do

llm-circuit-breaker-py does not automatically fail over to another provider. When the circuit is open, calls raise CircuitOpenError. Your code decides what to do: fail with an error, use a cached response, or route to a different provider via llm-fallback-chain.

By default, it does not differentiate between error types: a network timeout and a 400 Bad Request both count as failures. For production use, you likely want to open the circuit only on server-side errors (5xx) and rate limits (429), not on other client errors (4xx), which signal a problem with your request rather than with the provider. Pass count_error_fn to filter which exceptions count.
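A `count_error_fn` that follows that advice can key off the HTTP status code. This is a sketch, assuming that errors carrying an HTTP response expose an integer `status_code` attribute (the official `anthropic` and `openai` Python SDKs do this on `APIStatusError`); `should_count` and `NETWORK_ERRORS` are hypothetical names.

```python
# Extend this tuple with SDK-specific network errors, e.g. anthropic.APIConnectionError,
# which is not a subclass of the builtin ConnectionError.
NETWORK_ERRORS: tuple[type[BaseException], ...] = (TimeoutError, ConnectionError)


def should_count(error: BaseException) -> bool:
    """Return True if this error should count toward opening the circuit."""
    status = getattr(error, "status_code", None)
    if isinstance(status, int):
        # Server-side errors and rate limits point at the provider;
        # other 4xx codes (400, 401, 404, ...) point at our own request.
        return status >= 500 or status == 429
    # No HTTP status: count only network-level failures
    return isinstance(error, NETWORK_ERRORS)


# Wiring it up (library constructor as shown in this article):
# CircuitBreaker(failure_threshold=5, recovery_timeout=60, count_error_fn=should_count)
```

Unknown exception types fall through to `False` here, so a bug in your own code (a `KeyError` while building the prompt, say) never trips the circuit.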

It tracks consecutive failures, not failure rate. With a threshold of 5, a run of 4 failures, 1 success, and 4 more failures never opens the circuit: the success resets the counter, even though 8 of 9 calls failed. For a percentage-based threshold, you need a sliding-window implementation.
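The arithmetic for that sequence, as a quick sketch in plain Python (not part of the library):

```python
outcomes = [False] * 4 + [True] + [False] * 4  # False = failed call


def longest_failure_streak(results: list[bool]) -> int:
    """Longest run of consecutive failures, which is all a consecutive counter sees."""
    streak = longest = 0
    for ok in results:
        streak = 0 if ok else streak + 1  # any success resets the counter
        longest = max(longest, streak)
    return longest


failure_rate = outcomes.count(False) / len(outcomes)
print(longest_failure_streak(outcomes))  # 4: never reaches a threshold of 5
print(f"{failure_rate:.0%}")             # 89% of calls failed
```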

Inside the Library

The three states and transitions:

from enum import Enum
import threading
import time

class State(Enum):
    CLOSED = "CLOSED"           # Normal: requests pass through
    OPEN = "OPEN"               # Failing: requests blocked
    HALF_OPEN = "HALF_OPEN"     # Testing: one request allowed

class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 60.0,
        count_error_fn=None,
    ):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        # By default, every exception counts as a failure
        self._count_error = count_error_fn or (lambda e: True)

        self.state = State.CLOSED
        self.failure_count = 0
        self._opened_at: float | None = None
        self._lock = threading.Lock()

    def call(self, fn, *args, **kwargs):
        with self._lock:
            if self.state == State.OPEN:
                elapsed = time.monotonic() - self._opened_at
                if elapsed >= self.recovery_timeout:
                    # Recovery timeout passed: this call becomes the test request.
                    # Simplification: callers that arrive while it is still in
                    # flight see HALF_OPEN and also pass through.
                    self.state = State.HALF_OPEN
                else:
                    retry_after = self.recovery_timeout - elapsed
                    raise CircuitOpenError(
                        state=self.state,
                        retry_after=retry_after,
                        failure_count=self.failure_count,
                    )

        # The provider call runs outside the lock so slow requests don't serialize
        try:
            result = fn(*args, **kwargs)
            self._on_success()
            return result
        except Exception as e:
            self._on_failure(e)
            raise

    def _on_success(self):
        with self._lock:
            self.failure_count = 0
            self.state = State.CLOSED
            self._opened_at = None

    def _on_failure(self, error):
        with self._lock:
            if not self._count_error(error):
                return  # Don't count this error type

            self.failure_count += 1

            if self.state == State.HALF_OPEN:
                # Test request failed: reopen
                self.state = State.OPEN
                self._opened_at = time.monotonic()
            elif self.failure_count >= self.failure_threshold:
                # Too many consecutive failures: open
                self.state = State.OPEN
                self._opened_at = time.monotonic()

The CircuitOpenError carries actionable information:

from dataclasses import dataclass

@dataclass
class CircuitOpenError(Exception):
    state: State
    retry_after: float  # seconds until next test request
    failure_count: int
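One gap between the comments and the code: the `HALF_OPEN` comment says one request is allowed, but once `call()` flips the state to `HALF_OPEN`, every concurrent caller that arrives before the test call finishes also passes through. If you need a strict single trial, one approach is a flag set under the lock. This is a minimal sketch, not the library's implementation; `SingleTrialBreaker`, `admit()`, `record()`, and the injectable `clock` are hypothetical names.

```python
import threading
import time
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF_OPEN"


class SingleTrialBreaker:
    """Hypothetical variant that admits exactly one trial request while HALF_OPEN."""

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self.state = State.CLOSED
        self.failure_count = 0
        self._opened_at = 0.0
        self._trial_in_flight = False
        self._lock = threading.Lock()

    def admit(self) -> bool:
        """Return True if the caller may send a request right now."""
        with self._lock:
            if self.state == State.CLOSED:
                return True
            if self.state == State.OPEN:
                if self._clock() - self._opened_at < self.recovery_timeout:
                    return False
                self.state = State.HALF_OPEN
            # HALF_OPEN: only the first caller gets the trial slot
            if self._trial_in_flight:
                return False
            self._trial_in_flight = True
            return True

    def record(self, success: bool) -> None:
        """Report the outcome of a request that admit() let through."""
        # Simplification: a late result from a request admitted while CLOSED is
        # not told apart from the trial; a stricter version would return a token from admit().
        with self._lock:
            self._trial_in_flight = False
            if success:
                self.state = State.CLOSED
                self.failure_count = 0
                return
            self.failure_count += 1
            if self.state == State.HALF_OPEN or self.failure_count >= self.failure_threshold:
                self.state = State.OPEN
                self._opened_at = self._clock()
```

Callers check `admit()` before the request, fail fast when it returns False, and call `record(ok)` afterwards. The injectable clock lets tests advance time past `recovery_timeout` instead of sleeping.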

When to Use It

Use it on every external LLM API call. Provider incidents happen, and without a circuit breaker your agent keeps burning retries and adding latency against a provider that may not respond for the next 10 minutes.

Use one breaker per provider in a multi-provider setup. Anthropic gets its own breaker and OpenAI gets its own, so an open Anthropic circuit never blocks OpenAI calls.

Use it with llm-fallback-chain. When the Anthropic circuit opens, CircuitOpenError tells the chain to skip Anthropic and try the next provider. The chain handles the routing; the breaker manages the state.
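What that routing amounts to can be sketched without the chain library. This hand-rolled loop is a hypothetical stand-in, not llm-fallback-chain's API; `TinyBreaker` is a deliberately minimal breaker with no recovery timeout, and `call_with_fallback` is an illustrative name. It also shows why each provider needs its own breaker: Anthropic's failures never touch OpenAI's counter.

```python
from typing import Any, Callable


class CircuitOpenError(Exception):
    """Stand-in for the library's exception."""


class TinyBreaker:
    """Minimal consecutive-failure breaker with no recovery timeout (sketch only)."""

    def __init__(self, failure_threshold: int = 3):
        self.failure_threshold = failure_threshold
        self.failures = 0

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.failures >= self.failure_threshold:
            raise CircuitOpenError()  # fail fast: fn is never called
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            raise
        self.failures = 0
        return result


def call_with_fallback(providers: list[tuple[str, Callable[..., Any]]],
                       breakers: dict[str, TinyBreaker], **kwargs: Any) -> tuple[str, Any]:
    """Try providers in order, skipping any whose circuit is open."""
    errors: dict[str, str] = {}
    for name, fn in providers:
        try:
            return name, breakers[name].call(fn, **kwargs)
        except CircuitOpenError:
            errors[name] = "circuit open"  # skipped without a network call
        except Exception as e:
            errors[name] = repr(e)         # real failure, already counted by its breaker
    raise RuntimeError(f"all providers failed: {errors}")
```

The `except CircuitOpenError` clause must come before `except Exception`, since the open-circuit error is itself an `Exception` and would otherwise be logged as a provider failure.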

Use it with agent-event-bus to broadcast circuit state changes. Subscribe to llm.circuit_opened events to alert on-call when a provider circuit opens, and to llm.circuit_closed events to learn when recovery completes.
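The `CircuitBreaker` internals shown above have no transition hook, so publishing state changes means calling one wherever the state is assigned. A sketch, assuming you pass in any `publish(event_name, payload)` callable and wire it to agent-event-bus or a metrics client yourself (that library's actual API is not assumed here). `TransitionNotifier` is a hypothetical helper; only the `llm.circuit_opened` and `llm.circuit_closed` names come from the text above.

```python
from enum import Enum
from typing import Any, Callable


class State(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF_OPEN"


# Event names from the text; HALF_OPEN is deliberately not published here
EVENT_NAMES = {
    State.OPEN: "llm.circuit_opened",
    State.CLOSED: "llm.circuit_closed",
}


class TransitionNotifier:
    """Tracks one provider's breaker state and publishes an event when it changes."""

    def __init__(self, provider: str, publish: Callable[[str, dict[str, Any]], None]):
        self.provider = provider
        self._publish = publish
        self.state = State.CLOSED

    def transition(self, new_state: State) -> None:
        old_state, self.state = self.state, new_state
        if old_state is new_state:
            return  # no change, nothing to announce
        event = EVENT_NAMES.get(new_state)
        if event is not None:
            self._publish(event, {
                "provider": self.provider,
                "from": old_state.value,
                "to": new_state.value,
            })
```

Including both the old and new state in the payload lets an alert distinguish a fresh outage (CLOSED to OPEN) from a failed recovery attempt (HALF_OPEN to OPEN).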

Install

pip install git+https://github.com/MukundaKatta/llm-circuit-breaker-py

# Or from PyPI, once the pending upload clears a 429 rate-limit cooldown
pip install llm-circuit-breaker-py

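A fuller setup wraps each provider in its own breaker and lets llm-fallback-chain handle the routing:

import logging

import anthropic
import openai
from llm_circuit_breaker_py import CircuitBreaker
from llm_fallback_chain import FallbackChain, Provider

logger = logging.getLogger(__name__)
anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
openai_client = openai.OpenAI()           # reads OPENAI_API_KEY

def counts_as_outage(e: Exception) -> bool:
    # Count 5xx and 429; other 4xx errors (e.g. 400 Bad Request) are our bug, not an outage
    if isinstance(e, (anthropic.APIStatusError, openai.APIStatusError)):
        return e.status_code >= 500 or e.status_code == 429
    return True  # timeouts, connection errors, anything unexpected

anthropic_breaker = CircuitBreaker(
    failure_threshold=5,
    recovery_timeout=60,
    count_error_fn=counts_as_outage,
)

openai_breaker = CircuitBreaker(
    failure_threshold=5,
    recovery_timeout=60,
    count_error_fn=counts_as_outage,
)

chain = FallbackChain([
    Provider(
        "anthropic",
        fn=lambda **kw: anthropic_breaker.call(anthropic_client.messages.create, **kw),
        model="claude-sonnet-4-6",
    ),
    Provider(
        "openai",
        fn=lambda **kw: openai_breaker.call(openai_client.chat.completions.create, **kw),
        model="gpt-4o",
    ),
])

def call_resilient(**kwargs):
    result, _trace = chain.call(**kwargs)

    # Log any circuit that is not CLOSED, for monitoring
    for provider_name, breaker in [("anthropic", anthropic_breaker), ("openai", openai_breaker)]:
        if breaker.state.value != "CLOSED":
            logger.warning("circuit_state provider=%s state=%s", provider_name, breaker.state.value)

    return result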

Sibling Libraries

| Library | What it solves |
|---|---|
| `llm-fallback-chain` | Route to the next provider when the circuit opens |
| `llm-retry` | Retry with backoff before the circuit opens |
| `llm-rate-limit-bucket` | Rate limiting to avoid tripping the circuit |
| `llm-pretty-error` | Normalize provider errors for circuit error counting |
| `agent-event-bus` | Publish circuit state changes as events |

The failure handling stack: llm-rate-limit-bucket prevents overload, llm-retry handles transient errors, llm-circuit-breaker-py stops hammering on sustained outages, llm-fallback-chain routes to alternatives.

What's Next

Sliding window failure rate: instead of counting consecutive failures, track the failure rate over a time window (e.g., a 50% failure rate in the last 60 seconds). This is more nuanced than a consecutive count and catches partial outages where every other request fails.
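A sliding-window check could look like this sketch. It is not part of the library today; `FailureRateWindow` and its `min_calls` guard (so that a single failed call does not read as a 100% failure rate) are assumptions.

```python
from collections import deque


class FailureRateWindow:
    """Failure rate over the last `window_seconds`; a sketch, not the library's API."""

    def __init__(self, window_seconds: float = 60.0, threshold: float = 0.5, min_calls: int = 10):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.min_calls = min_calls
        self._events: deque[tuple[float, bool]] = deque()  # (timestamp, failed)

    def record(self, failed: bool, now: float) -> None:
        self._events.append((now, failed))
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop outcomes older than the window
        while self._events and now - self._events[0][0] > self.window_seconds:
            self._events.popleft()

    def failure_rate(self, now: float) -> float:
        self._evict(now)
        if not self._events:
            return 0.0
        return sum(failed for _, failed in self._events) / len(self._events)

    def should_open(self, now: float) -> bool:
        self._evict(now)
        # Require a minimum sample so one failure is not a "100% failure rate"
        return len(self._events) >= self.min_calls and self.failure_rate(now) >= self.threshold
```

With every other request failing, this window reports 50% and trips a 0.5 threshold, while a consecutive counter on the same traffic never gets past 1.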

Metrics emission: on every state transition (CLOSED → OPEN, OPEN → HALF_OPEN, HALF_OPEN → CLOSED, HALF_OPEN → OPEN), emit metrics to your observability platform. Circuit state is a key SRE signal for provider health.

Async native: CircuitBreaker.async_call() for async agent loops. The current call() method only works with sync callables: passing it an async function returns an un-awaited coroutine and records a success before the request has even run.
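An async variant mainly needs to record the outcome after the `await`, not when the coroutine object is created. A minimal sketch, assuming a single event loop; `AsyncCircuitBreaker` is a hypothetical name, not the planned `async_call()` API, and it uses plain strings for the three states to stay short.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable


class CircuitOpenError(Exception):
    """Stand-in for the library's exception."""


class AsyncCircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 60.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.state = "CLOSED"
        self.failure_count = 0
        self._opened_at = 0.0

    async def call(self, fn: Callable[..., Awaitable[Any]], *args: Any, **kwargs: Any) -> Any:
        # No await between reading and updating state, so no lock is needed
        # within a single event loop.
        if self.state == "OPEN":
            elapsed = time.monotonic() - self._opened_at
            if elapsed < self.recovery_timeout:
                raise CircuitOpenError(f"retry after {self.recovery_timeout - elapsed:.0f}s")
            self.state = "HALF_OPEN"
        try:
            result = await fn(*args, **kwargs)  # outcome is known only after the await
        except Exception:
            # asyncio.CancelledError is a BaseException, so cancellation is not counted
            self.failure_count += 1
            if self.state == "HALF_OPEN" or self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
                self._opened_at = time.monotonic()
            raise
        self.failure_count = 0
        self.state = "CLOSED"
        return result


async def main() -> None:
    breaker = AsyncCircuitBreaker(failure_threshold=2, recovery_timeout=60)

    async def failing_request() -> str:
        raise RuntimeError("503 Service Unavailable")

    for _ in range(3):
        try:
            await breaker.call(failing_request)
        except CircuitOpenError as e:
            print("fast fail:", e)
        except RuntimeError as e:
            print("provider error:", e)


asyncio.run(main())
```

The first two calls reach the (simulated) provider and fail; the third fails fast without awaiting it.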

Built as part of the agent-stack family: composable Python primitives for production LLM agents.