Build a Provider Failover Chain That Actually Works Under Pressure

2am. A production agent starts returning errors. The Anthropic status page shows a partial outage. The on-call engineer scrambles to update environment variables, redeploy, wait for containers to start. Twenty minutes of downtime.

The agent could have switched providers automatically. It had an OpenAI key. It had a Gemini key. It just had no way to use them when the primary failed.

The Shape of the Fix

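from llm_fallback_chain import FallbackChain, AttemptTrace

# call_anthropic, call_openai and call_gemini are placeholders for your own
# functions that take a prompt and return a response.
chain = FallbackChain([
    ("anthropic", lambda prompt: call_anthropic(prompt)),
    ("openai",    lambda prompt: call_openai(prompt)),
    ("gemini",    lambda prompt: call_gemini(prompt)),
])

# Providers are tried in list order; the first success wins.
result, trace = chain.call("Summarize this report: ...")

print(f"Used: {trace.used_provider}")
# Failed attempts recorded before the successful provider, if any.
print(f"Attempts: {[(a.provider, a.error) for a in trace.attempts]}")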

Three providers. One call. The chain tries them in order, stops at the first success, and tells you exactly what happened via AttemptTrace.

What It Does NOT Do

llm-fallback-chain does not load-balance across providers. It always tries the first provider first. For true load distribution, you want llm-fallback-router. This library is for ordered failover: primary, then secondary, then tertiary.

It does not normalize provider responses. The result from the successful provider is returned as-is. If Anthropic and OpenAI return different response shapes, you need to normalize them yourself or wrap each lambda to produce a consistent shape.
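One way to do that wrapping, as a minimal sketch: the Completion dataclass, the normalized helper and the two fake provider functions are all hypothetical names introduced here for illustration, and the payload shapes are simplified stand-ins rather than real SDK responses.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Completion:
    """Hypothetical common shape: the text plus the provider that produced it."""
    provider: str
    text: str


def normalized(
    provider: str,
    call: Callable[[str], Any],
    extract_text: Callable[[Any], str],
) -> Callable[[str], Completion]:
    """Wrap a provider call so it always returns a Completion."""
    def wrapper(prompt: str) -> Completion:
        raw = call(prompt)
        return Completion(provider=provider, text=extract_text(raw))
    return wrapper


# Stand-ins for real SDK calls, returning differently shaped payloads.
def fake_anthropic(prompt: str) -> dict:
    return {"content": [{"text": f"A: {prompt}"}]}


def fake_openai(prompt: str) -> dict:
    return {"choices": [{"message": {"content": f"O: {prompt}"}}]}


# The same (name, callable) pairs a fallback chain expects.
providers = [
    ("anthropic", normalized("anthropic", fake_anthropic,
                             lambda r: r["content"][0]["text"])),
    ("openai", normalized("openai", fake_openai,
                          lambda r: r["choices"][0]["message"]["content"])),
]
```

Whichever provider ends up answering, the caller sees a Completion, so downstream code does not need to branch on the provider name.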

It does not cache results. Each call goes to the network. For caching, pair with tool-result-cache.

Inside the Library

The chain iterates through providers and calls each one inside a try/except:

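def call(self, *args, **kwargs) -> tuple[T, AttemptTrace]:
    """Try each provider in order and return the first successful result."""
    trace = AttemptTrace(attempts=[])
    for name, fn in self._providers:
        try:
            result = fn(*args, **kwargs)
            trace.used_provider = name
            return result, trace
        except Exception as e:
            # The provider has already raised at this point; the predicate only
            # decides whether the attempt is recorded as skipped.
            if self._skip_predicate and self._skip_predicate(name, e):
                trace.attempts.append(Attempt(provider=name, error=str(e), skipped=True))
                continue
            trace.attempts.append(Attempt(provider=name, error=str(e)))
    # Every provider raised: hand the full trace to the caller.
    raise AllProvidersFailed(trace=trace)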

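The skip_predicate is a callable (provider_name, exception) -> bool. It runs after a provider has raised, so the chain moves on to the next provider either way; returning True only records the attempt with skipped=True instead of as a real failure. That makes it useful for classification: mark an auth error (wrong key) as skipped so it is not counted as an availability failure.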
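To see the predicate in action without installing anything, here is a small runnable stand-in. MiniChain follows the loop shown above but is not the library's class; its constructor signature, the dataclass field defaults and FakeStatusError are assumptions made for this sketch. The predicate relies on the exception exposing a status_code attribute, which you would need to confirm for the SDK errors you actually catch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Attempt:
    provider: str
    error: str
    skipped: bool = False


@dataclass
class AttemptTrace:
    attempts: list = field(default_factory=list)
    used_provider: Optional[str] = None


class AllProvidersFailed(Exception):
    def __init__(self, trace: AttemptTrace):
        super().__init__("all providers failed")
        self.trace = trace


class MiniChain:
    """Stand-in for FallbackChain (assumed constructor signature)."""

    def __init__(self, providers, skip_predicate=None):
        self._providers = providers
        self._skip_predicate = skip_predicate

    def call(self, *args, **kwargs):
        trace = AttemptTrace()
        for name, fn in self._providers:
            try:
                result = fn(*args, **kwargs)
                trace.used_provider = name
                return result, trace
            except Exception as e:
                skipped = bool(self._skip_predicate and self._skip_predicate(name, e))
                trace.attempts.append(Attempt(name, str(e), skipped=skipped))
        raise AllProvidersFailed(trace)


class FakeStatusError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, message: str, status_code: int):
        super().__init__(message)
        self.status_code = status_code


def skip_auth_errors(provider_name: str, exc: Exception) -> bool:
    # Treat 401/403 as a configuration problem, not an outage.
    return getattr(exc, "status_code", None) in (401, 403)


def bad_key(prompt):
    raise FakeStatusError("invalid api key", 401)


def overloaded(prompt):
    raise FakeStatusError("overloaded", 529)


def healthy(prompt):
    return f"ok: {prompt}"


chain = MiniChain(
    [("anthropic", bad_key), ("openai", overloaded), ("gemini", healthy)],
    skip_predicate=skip_auth_errors,
)
result, trace = chain.call("ping")
print(trace.used_provider)                                 # gemini
print([(a.provider, a.skipped) for a in trace.attempts])   # [('anthropic', True), ('openai', False)]
```

The skipped flag is what lets an alerting rule tell "fix your key" apart from "the provider is down" when it reads the trace.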

Async variant: AsyncFallbackChain takes async functions (coroutines) instead of plain callables and exposes the same interface. Both sync and async ship in one package.
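The async idea is the same loop with an await in it. The sketch below is not AsyncFallbackChain itself, whose exact API is not shown in this post; first_success, flaky and backup are hypothetical names, and the return value is a plain tuple rather than an AttemptTrace.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def first_success(
    providers: list[tuple[str, Callable[[str], Awaitable[Any]]]],
    prompt: str,
) -> tuple[Any, str, list[tuple[str, str]]]:
    """Await providers in order; return (result, provider name, failed attempts)."""
    failures: list[tuple[str, str]] = []
    for name, fn in providers:
        try:
            result = await fn(prompt)
            return result, name, failures
        except Exception as e:
            failures.append((name, str(e)))
    raise RuntimeError(f"all providers failed: {failures}")


# Stand-ins for real async SDK calls.
async def flaky(prompt: str) -> str:
    await asyncio.sleep(0)
    raise ConnectionError("primary unavailable")


async def backup(prompt: str) -> str:
    await asyncio.sleep(0)
    return f"backup: {prompt}"


result, used, failures = asyncio.run(
    first_success([("primary", flaky), ("secondary", backup)], "ping")
)
```

Providers are still awaited one at a time. Failover stays ordered; nothing here races providers in parallel.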

The 24 tests cover: first provider succeeds, first fails, all fail, skip_predicate, AttemptTrace contents, AllProvidersFailed exception, async variant, and exception propagation.

When to Use It

Use it when you have multiple provider credentials and you want automatic failover without manual intervention. Typical triggers are provider outages, rate limit spikes, and regional unavailability.

The key design choice is ordering. Put your cheapest or fastest provider first if cost and latency matter during normal operation. Put your most reliable provider first if availability is the primary concern. These are often different providers.

Skip it if you only have one provider credential. A chain of one adds overhead without benefit. And skip it if your providers have incompatible output shapes that are expensive to normalize: the normalization complexity can outweigh the reliability benefit.

Install

pip install git+https://github.com/MukundaKatta/llm-fallback-chain

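import anthropic
import openai
from llm_fallback_chain import FallbackChain, AllProvidersFailed

# Both clients read their API keys from the environment
# (ANTHROPIC_API_KEY and OPENAI_API_KEY).
anthropic_client = anthropic.Anthropic()
openai_client = openai.OpenAI()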

def anthropic_call(prompt: str) -> str:
    r = anthropic_client.messages.create(
        model="claude-sonnet-4-6", max_tokens=512,
        messages=[{"role": "user", "content": prompt}]
    )
    return r.content[0].text

def openai_call(prompt: str) -> str:
    r = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    # message.content is Optional; fall back to "" to honor the str return type.
    return r.choices[0].message.content or ""

chain = FallbackChain([
    ("anthropic", anthropic_call),
    ("openai", openai_call),
])

try:
    result, trace = chain.call("Translate to French: Hello world")
    print(f"Provider: {trace.used_provider}")
    print(f"Result: {result}")
except AllProvidersFailed as e:
    print(f"All providers failed: {e.trace}")

Sibling Libraries

| Library | What it solves |
| --- | --- |
| `llm-fallback-router` | Parallel routing: send to multiple providers simultaneously |
| `llm-retry-py` | Retry the same provider with backoff |
| `llm-circuit-breaker-py` | Open circuit after N failures, skip provider for cooldown period |
| `token-budget-pool` | Shared budget across concurrent agents |
| `agent-deadline` | Time-based deadlines for loop control |

The production pattern: llm-circuit-breaker-py per provider (opens after repeated failures), llm-retry-py for transient errors within one provider, llm-fallback-chain for provider-level failover when the circuit opens.
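The layering works because each piece is just a wrapper around a callable. As a rough illustration of the retry layer, here is a hand-rolled with_retry helper. It is a hypothetical stand-in, not the llm-retry-py API (which is not shown in this post), and the injectable sleep parameter exists only to keep the example fast and testable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(
    fn: Callable[..., T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[..., T]:
    """Retry fn with exponential backoff; re-raise the last error when exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    def wrapper(*args, **kwargs) -> T:
        for attempt in range(attempts):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == attempts - 1:
                    raise  # out of retries: let the chain move to the next provider
                sleep(base_delay * (2 ** attempt))

    return wrapper


# A provider stand-in that fails twice, then succeeds.
calls = {"n": 0}


def flaky(prompt: str) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return f"ok: {prompt}"


delays: list = []
resilient = with_retry(flaky, attempts=3, base_delay=0.5, sleep=delays.append)
answer = resilient("ping")

# The wrapped callable is what would go into the chain, e.g.
# FallbackChain([("anthropic", with_retry(anthropic_call)), ...])
```

Retries absorb the transient blips inside one provider, and only an exhausted retry budget reaches the chain as a failure.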

What's Next

Circuit breaker integration is the natural next step. Right now the chain tries each provider on every call regardless of recent failure history. If a provider has been failing for the last 10 minutes, trying it first still costs a full round trip (or a timeout) on every call. Built-in per-provider circuit breaker state would let the chain skip known-bad providers automatically.
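Until that lands, similar behavior can be approximated by wrapping each provider function. The sketch below is one possible shape, not the planned design and not the llm-circuit-breaker-py API: CircuitBreaker, CircuitOpen and guarded are hypothetical names, and the thresholds are arbitrary.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Open after `threshold` consecutive failures; allow calls again after `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """True if a call may be attempted right now."""
        if self._opened_at is None:
            return True
        # Once the cooldown has elapsed, calls are allowed again;
        # one more failure re-opens the circuit immediately.
        return self._clock() - self._opened_at >= self.cooldown

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.threshold:
            self._opened_at = self._clock()


class CircuitOpen(Exception):
    """Raised instead of calling a provider whose circuit is open."""


def guarded(fn: Callable, breaker: CircuitBreaker) -> Callable:
    """Wrap a provider callable so an open circuit fails fast, without a network call."""
    def wrapper(*args, **kwargs):
        if not breaker.allow():
            raise CircuitOpen("circuit open, skipping provider")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            breaker.record_failure()
            raise
        breaker.record_success()
        return result
    return wrapper
```

Inside a chain, the fast CircuitOpen error makes a known-bad provider cost microseconds instead of a timeout, and the chain falls through to the next entry as usual.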

Health checks are another direction: a background thread that periodically probes each provider and maintains a healthy flag. The chain skips unhealthy providers without attempting a real call. This trades a background thread for reduced tail latency on failover.
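A rough sketch of what such a monitor could look like, using only the standard library. HealthMonitor and its methods are hypothetical; the probes are assumed to be cheap zero-argument callables that raise when a provider is unreachable, and nothing here reflects an existing feature of the library.

```python
import threading
from typing import Callable


class HealthMonitor:
    """Periodically run probe callables and remember which providers are healthy."""

    def __init__(self, probes: dict[str, Callable[[], None]], interval: float = 30.0):
        self._probes = probes
        self._interval = interval
        self._healthy = {name: True for name in probes}  # optimistic until probed
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def probe_once(self) -> None:
        """Run every probe once and update the healthy flags."""
        for name, probe in self._probes.items():
            try:
                probe()
                ok = True
            except Exception:
                ok = False
            with self._lock:
                self._healthy[name] = ok

    def _run(self) -> None:
        # Event.wait doubles as an interruptible sleep between probe rounds.
        while not self._stop.is_set():
            self.probe_once()
            self._stop.wait(self._interval)

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread.is_alive():
            self._thread.join()

    def is_healthy(self, name: str) -> bool:
        with self._lock:
            return self._healthy[name]

    def filter_healthy(self, providers):
        """Return the (name, fn) pairs whose provider is currently marked healthy."""
        return [(name, fn) for name, fn in providers if self.is_healthy(name)]
```

One caveat with this shape: if every provider is marked unhealthy, filter_healthy returns an empty list, so a caller would probably want to fall back to the full provider list rather than fail without trying anything.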

Built as part of the agent-stack family: composable Python primitives for production LLM agents.