Mukunda Rao Katta

Posted on May 25

llm-fallback-router-rs: Multi-Provider LLM Failover in Rust

#hermeschallenge #ai #rust #agents

The outage that hit production at 2am

The agent was running fine. Then the primary provider went down. Rate limit? Outage? The error message was not helpful. What was helpful was that the queue started backing up and every call was failing hard.

The fallback to a second provider was manual. Someone had to notice the alerts, update a config flag, and restart the service. That took forty minutes. Forty minutes of failed requests, angry retries from clients, and a growing queue of work that could not be processed.

The configuration for a second provider existed. The credentials were already in the environment. Everything needed for failover was there. It just was not wired into the code. The code assumed the primary would be up. It had always been up before.

Provider reliability is not binary. You get partial outages. You get rate limits that hit one region but not another. You get model-specific degradations where one model on a platform is slow and another is fine. You get situations where your Anthropic quota is exhausted but your OpenAI account has headroom. Writing these fallbacks by hand in every project is repetitive and error-prone. The policy is always the same. The code should be too.

This is a solved problem in networking. You have a list of upstreams. You try them in order. If one fails, you move to the next. You track the failures. You do not return an error to the caller until all options are exhausted.

llm-fallback-router-rs does this for LLM API calls.

Shape of the fix

The crate gives you a FallbackRouter that holds an ordered list of Provider entries. Each provider wraps a name and an async closure that makes the actual call. The router tries each in order.

use llm_fallback_router_rs::{FallbackRouter, Provider};

let router = FallbackRouter::new(vec![
    Provider::new("claude", |req| {
        let client = claude_client.clone();
        async move { client.call(req).await }
    }),
    Provider::new("gpt4o", |req| {
        let client = openai_client.clone();
        async move { client.call(req).await }
    }),
    Provider::new("gemini", |req| {
        let client = gemini_client.clone();
        async move { client.call(req).await }
    }),
]);

match router.call(request).await {
    Ok(resp) => handle_response(resp),
    Err(e) => {
        // All three failed. e.attempts has the full trace.
        for attempt in &e.attempts {
            tracing::error!(provider = %attempt.provider, error = %attempt.error);
        }
        return Err(e.into());
    }
}

The AllProvidersFailedError carries the full attempt trace. You can log which providers failed and why. You can emit metrics. You can surface a useful error message rather than just the last failure.
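As a rough illustration of surfacing the whole trace in one message, here is a minimal sketch of a `Display` impl over an attempt list. The `Attempt` and `AllFailed` types below are hypothetical stand-ins, not the crate's actual definitions; only the `provider` and `error` field names are taken from the example above, and their types are assumed to be strings.

```rust
use std::fmt;

// Hypothetical stand-in for the crate's attempt record (field types assumed).
#[derive(Debug)]
pub struct Attempt {
    pub provider: String,
    pub error: String,
}

// Hypothetical stand-in for the "all providers failed" error.
#[derive(Debug)]
pub struct AllFailed {
    pub attempts: Vec<Attempt>,
}

impl fmt::Display for AllFailed {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Lead with the count, then list every provider with its own error.
        write!(f, "all {} providers failed", self.attempts.len())?;
        for (i, a) in self.attempts.iter().enumerate() {
            let sep = if i == 0 { ": " } else { "; " };
            write!(f, "{}{}: {}", sep, a.provider, a.error)?;
        }
        Ok(())
    }
}

impl std::error::Error for AllFailed {}
```

With a shape like this, a single log line or API error can name every provider that was tried instead of only the last one.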

For the RetryHint pattern, where you want providers to report whether an error is retryable at all:

use llm_fallback_router_rs::{FallbackRouter, Provider, RetryHint};

// If a provider signals NotRetryable (e.g., invalid auth),
// the router stops immediately rather than burning through the list.
let router = FallbackRouter::with_retry_hints(vec![
    Provider::with_hint("claude", claude_fn, RetryHint::Ordered),
    Provider::with_hint("gpt4o", openai_fn, RetryHint::Ordered),
]);

What it does not do

The router does not implement circuit breaking. If a provider is returning errors on every call, the router will still attempt it on the next request. It does not track a failure rate or mark providers as unhealthy. For circuit breaking, pair this with llm-circuit-breaker around each provider closure.

The router also does not handle rate limit backoff. If a provider returns a 429, the router treats it as a failure and moves to the next provider immediately. You can add retry-with-backoff inside the provider closure using llm-retry if you want the router to attempt the same provider multiple times before failing over. These are composable concerns. The router handles ordering and failover. Other crates handle rate limits and circuit states.
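To make the composition concrete, here is a minimal sketch of retry-with-backoff that could sit inside a provider closure. It is hand-rolled with the standard library and does not use the llm-retry API, which is not shown in this post. The name `with_backoff` and its parameters are assumptions, and it is synchronous for brevity; in async code you would swap `std::thread::sleep` for your runtime's sleep.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Calls `op` at least once and at most `max_attempts` times, doubling the
/// delay after each failure. Returns the first success, or the last error.
/// Hypothetical helper, not part of any crate named in this post.
pub fn with_backoff<T, E>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: hand the error back so a router can fail over.
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

Wrapping a single provider's call this way keeps the same provider in play for a few tries; only the final error reaches the router, which then moves on to the next entry.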

Inside the lib

The Provider struct owns a name and a Box<dyn Fn(Req) -> BoxFuture<'static, Result<Resp, Err>>>. The boxing is necessary because every provider closure, and every future it returns, has its own anonymous concrete type, while a Vec needs a single element type. The router holds a Vec<Provider> and iterates it in order.
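The type erasure described above can be sketched with the standard library alone. This is an illustration under assumptions, not the crate's source: `BoxFut` spells out the same shape as `futures::future::BoxFuture<'static, T>`, the `Send + Sync` bounds are a guess, and `block_on` is a tiny busy-polling executor included only so the sketch runs without an async runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Same shape as futures' BoxFuture<'static, T>, written with std only.
pub type BoxFut<T> = Pin<Box<dyn Future<Output = T> + Send + 'static>>;

pub struct Provider<Req, Resp, E> {
    pub name: String,
    call: Box<dyn Fn(Req) -> BoxFut<Result<Resp, E>> + Send + Sync>,
}

impl<Req, Resp, E> Provider<Req, Resp, E> {
    pub fn new<F, Fut>(name: &str, f: F) -> Self
    where
        F: Fn(Req) -> Fut + Send + Sync + 'static,
        Fut: Future<Output = Result<Resp, E>> + Send + 'static,
    {
        Provider {
            name: name.to_string(),
            // Box::pin erases the future's type; Box::new erases the closure's.
            call: Box::new(move |req| -> BoxFut<Result<Resp, E>> { Box::pin(f(req)) }),
        }
    }

    pub fn call(&self, req: Req) -> BoxFut<Result<Resp, E>> {
        (self.call)(req)
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Demo-only executor that polls in a loop; real code would use a runtime such as tokio.
pub fn block_on<T>(mut fut: BoxFut<T>) -> T {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Because both layers are boxed, providers built from unrelated closures share one `Provider<Req, Resp, E>` type and can live in the same `Vec`.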

Error collection is where most of the design work went. The router needs to accumulate failures from each attempt and surface them all at the end. It uses a Vec<Attempt> that grows as providers are tried. When all providers fail, this vec becomes the body of AllProvidersFailedError. When a provider succeeds, the vec is dropped.

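This means the happy path creates the vec and then drops it. An empty Vec does not allocate until the first push, so a first-try success should cost no heap allocation unless the vec is created with a preset capacity; the allocation only becomes unavoidable once a provider has failed. A future version may use a fixed-size array for configs with small provider lists.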
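The collection loop described here can be sketched in a few lines. This is a simplified, synchronous illustration rather than the crate's code: `Attempt`, `AllFailed`, `Call`, and `call_in_order` are hypothetical names, and the request is passed by reference to avoid assuming it is `Clone`.

```rust
#[derive(Debug, PartialEq)]
pub struct Attempt<E> {
    pub provider: String,
    pub error: E,
}

#[derive(Debug, PartialEq)]
pub struct AllFailed<E> {
    pub attempts: Vec<Attempt<E>>,
}

// Hypothetical synchronous provider signature used only for this sketch.
pub type Call<Req, Resp, E> = Box<dyn Fn(&Req) -> Result<Resp, E>>;

pub fn call_in_order<Req, Resp, E>(
    providers: &[(&str, Call<Req, Resp, E>)],
    req: &Req,
) -> Result<Resp, AllFailed<E>> {
    // Vec::new() does not touch the heap until the first push.
    let mut attempts = Vec::new();
    for (name, call) in providers {
        match call(req) {
            // Success: return early; `attempts` is simply dropped.
            Ok(resp) => return Ok(resp),
            // Failure: record who failed and why, then try the next provider.
            Err(error) => attempts.push(Attempt {
                provider: name.to_string(),
                error,
            }),
        }
    }
    // Every provider failed: the accumulated trace becomes the error body.
    Err(AllFailed { attempts })
}
```

The caller only sees an error after the last provider has been tried, and that error names each failed provider in order.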

RetryHint is a signal from the provider to the router. A provider can return RetryHint::NotRetryable on errors that indicate the request itself is invalid, like a malformed body or an auth failure on that provider. The router uses this to short-circuit the attempt list: there is no point trying three more providers if the issue is with the request, not the provider.
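One way to picture the short-circuit is a loop that classifies each error before deciding whether to continue. Everything below is a hypothetical sketch: `Hint`, `ProviderError`, `classify`, and `call_with_hints` are invented for illustration, and the HTTP status mapping is an assumption based on the examples in this section (malformed body, auth failure), not the crate's actual rules.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Hint {
    /// Provider-side trouble: move on to the next provider.
    TryNext,
    /// Request-side trouble: stop, other providers would reject it too.
    NotRetryable,
}

#[derive(Debug, PartialEq)]
pub struct ProviderError {
    pub status: u16,
    pub message: String,
}

// Hypothetical provider signature for this sketch.
pub type Call<Resp> = fn(&str) -> Result<Resp, ProviderError>;

/// Assumed mapping from HTTP status to hint; adjust to your providers.
pub fn classify(err: &ProviderError) -> Hint {
    match err.status {
        400 | 401 | 422 => Hint::NotRetryable,
        _ => Hint::TryNext,
    }
}

pub fn call_with_hints<Resp>(
    providers: &[(&str, Call<Resp>)],
    req: &str,
) -> Result<Resp, Vec<(String, ProviderError)>> {
    let mut attempts = Vec::new();
    for (name, call) in providers {
        match call(req) {
            Ok(resp) => return Ok(resp),
            Err(err) => {
                let hint = classify(&err);
                attempts.push((name.to_string(), err));
                if hint == Hint::NotRetryable {
                    // Short-circuit: skip the remaining providers.
                    break;
                }
            }
        }
    }
    Err(attempts)
}
```

A 503 from the first provider falls through to the second, while a 400 ends the loop with a single recorded attempt.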

Generic parameters keep the crate flexible. The call method on FallbackRouter is generic over the request and response types. You define what Req and Resp look like for your application. The crate does not impose a specific LLM SDK type.

When useful

Production agents where provider uptime is a real concern and you cannot afford manual failover.
Multi-region deployments where you route to a regional endpoint first and fall back to a global one.
Cost-tiered setups where you try a cheaper provider first and fall back to a more capable one when it fails.
Development environments where you test against one provider and fall back to a stub when the API key is missing.
Batch pipelines where you want to maximize throughput by not stopping when one provider has an outage.

When not useful

Single-provider setups where failover adds complexity without benefit.
Use cases where provider selection must be dynamic based on request content rather than failure order.
Situations where you need sticky routing: the same user always goes to the same provider regardless of failures.
Real-time voice or streaming where provider failover mid-stream is not feasible.
Cases where your SLA allows retrying with the same provider and you just need backoff, not a different provider.

Install

[dependencies]
llm-fallback-router-rs = "0.1"
tokio = { version = "1", features = ["full"] }

cargo add llm-fallback-router-rs

Siblings

Crate / Package	Language	What it does
llm-fallback-router	Python	Same ordered failover for Python agent stacks
llm-fallback-chain	Python	Chained failover with sync and async support
llm-retry	Rust	Exponential backoff retry for a single provider
llm-circuit-breaker	Rust	Circuit breaker to skip unhealthy providers
llm-retry-py	Python	Python port of the retry crate

What is next

The next version adds a HealthTracker that monitors failure rates per provider and reorders the list dynamically. If the primary provider has failed five times in the last two minutes, it temporarily moves to the back of the list. After that, I want to add metrics hooks so you can count attempts-per-provider and feed that data into your observability stack without wrapping the router manually.
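Since the HealthTracker is not released yet, the following is only a guess at what the reordering could look like: a sliding window of failure timestamps per provider, with unhealthy providers moved to the back of the list. The type name comes from the paragraph above; the fields, methods, and the choice to pass `now` explicitly (which keeps the logic testable) are assumptions.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

pub struct HealthTracker {
    window: Duration,
    threshold: usize,
    failures: HashMap<String, VecDeque<Instant>>,
}

impl HealthTracker {
    pub fn new(window: Duration, threshold: usize) -> Self {
        HealthTracker { window, threshold, failures: HashMap::new() }
    }

    pub fn record_failure(&mut self, provider: &str, now: Instant) {
        self.failures.entry(provider.to_string()).or_default().push_back(now);
    }

    /// True if the provider has at least `threshold` failures inside the window.
    fn is_unhealthy(&mut self, provider: &str, now: Instant) -> bool {
        let window = self.window;
        let threshold = self.threshold;
        match self.failures.get_mut(provider) {
            None => false,
            Some(queue) => {
                // Forget failures that have aged out of the window.
                while queue.front().map_or(false, |t| now.duration_since(*t) > window) {
                    queue.pop_front();
                }
                queue.len() >= threshold
            }
        }
    }

    /// Keeps the configured order, but moves unhealthy providers to the back.
    pub fn order<'a>(&mut self, providers: &[&'a str], now: Instant) -> Vec<&'a str> {
        let (mut healthy, unhealthy): (Vec<&'a str>, Vec<&'a str>) = providers
            .iter()
            .copied()
            .partition(|p| !self.is_unhealthy(p, now));
        healthy.extend(unhealthy);
        healthy
    }
}
```

With a window of two minutes and a threshold of five, a primary that has just failed five times sorts behind its fallback, then returns to the front once those failures age out.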

Source is at github.com/MukundaKatta/llm-fallback-router-rs. The crate publish is queued and will appear at crates.io/crates/llm-fallback-router-rs once the rate limit clears.
