DEV Community

Bonnie for CrossPostr

Implementing Automatic LLM Provider Fallback In AI Agents Using an LLM Gateway (OpenAI, Anthropic, Gemini & Bifrost)

Every major LLM provider, including OpenAI, Anthropic, and Gemini, has experienced outages or rate-limiting incidents in the last twelve months.

As a developer, shipping AI-powered applications or agents that depend on a single LLM provider is a production risk you cannot afford.

For that reason, you need to implement automatic LLM provider fallback in your app, so that AI requests are routed to backup LLM providers (e.g., Anthropic or Gemini) the moment your primary provider (e.g., OpenAI) hits a rate limit, outage, or network error.

In this guide, you will learn how to implement automatic LLM provider fallback using the Bifrost LLM Gateway.

Before we jump in, here is what we will be covering:

  • What is LLM provider fallback (and why it matters in production)?

  • How to set up Bifrost LLM gateway with multiple providers

  • How to configure automatic LLM provider failover

  • Testing LLM fallback with the Bifrost Mocker plugin

What is automatic LLM provider fallback?

LLM provider fallback (also called LLM provider failover) is the practice of automatically routing AI requests to backup providers when your primary provider encounters issues such as rate limiting, network errors, or outages.

Without a fallback strategy, a single provider incident can bring down your entire AI app, frustrating users and eroding trust. With a fallback strategy in place, your application keeps working transparently, as if nothing went wrong.

To implement automatic LLM provider failover, you can use an LLM gateway such as Bifrost. When your primary AI provider fails, Bifrost automatically tries backup providers in the order you specify using this process:

  • Primary attempt — Bifrost tries your configured primary provider and model first.

  • Automatic detection — If the primary fails, Bifrost detects the failure immediately.

  • Sequential fallbacks — Bifrost tries each fallback provider in order until one succeeds.

  • Success response — The response from the first successful provider is returned.

  • Complete failure — If all providers fail, Bifrost returns the original error from the primary provider.

You can read more about how fallbacks work in the Bifrost docs.
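The fallback flow described above can be sketched in a few lines of Python. This is an illustrative model of the behavior, not Bifrost's actual implementation:

```python
def complete_with_fallback(providers, prompt):
    """Try each provider in order; return the first successful response.

    `providers` is a list of callables that either return a response
    string or raise an exception on failure (rate limit, outage, etc.).
    """
    first_error = None
    for call in providers:
        try:
            return call(prompt)  # success: return immediately
        except Exception as exc:
            if first_error is None:
                first_error = exc  # remember the primary provider's error
    # All providers failed: surface the original (primary) error,
    # matching the "complete failure" step above.
    raise first_error
```

The key design choice mirrored here is that when every provider fails, the caller sees the primary provider's original error, not the last fallback's.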

Prerequisites

Before we start, make sure you have the following:

  • Node.js 18+ (for running Bifrost via npx) or Docker (for containerized deployment)

  • API keys for the providers you want to use (e.g., OpenAI, Anthropic, and Gemini)

  • Python 3.9+ or Node.js for writing the agent code

  • Go 1.21+ for the automatic LLM provider failover demo using the Bifrost Mocker plugin

Setting up and configuring LLM providers with Bifrost

In this section, you will learn how to set up the Bifrost LLM gateway with multiple AI model providers.

Let’s get started.

Step 1: Install Bifrost using NPX binary

To install Bifrost, first create a project folder named automatic-failover-demo and open it in your preferred code editor, such as VS Code or Cursor.

Then run the command below; the -app-dir flag determines where Bifrost stores all its data:

npx -y @maximhq/bifrost -app-dir ./my-bifrost-data

Step 2: Create a config.json file

Once Bifrost is installed, create a config.json file in the ./my-bifrost-data folder. Then add the configuration below, which defines multiple LLM providers and database persistence.

{
  "$schema": "https://www.getbifrost.ai/schema",
  "client": {
    "enable_logging": true,
    "disable_content_logging": false,
    "drop_excess_requests": false,
    "initial_pool_size": 300,
    "allow_direct_keys": false
  },
  "providers": {
    "openai": {
      "keys": [
        {
          "name": "openai-primary",
          "value": "env.OPENAI_API_KEY",
          "models": [],
          "weight": 1.0
        }
      ]
    },
    "anthropic": {
      "keys": [
        {
          "name": "anthropic-primary",
          "value": "env.ANTHROPIC_API_KEY",
          "models": [],
          "weight": 1.0
        }
      ]
    },
    "gemini": {
      "keys": [
        {
          "name": "gemini-primary",
          "value": "env.GEMINI_API_KEY",
          "models": [],
          "weight": 1.0
        }
      ]
    }
  },
  "config_store": {
    "enabled": true,
    "type": "sqlite",
    "config": {
      "path": "./config.db"
    }
  },
  "logs_store": {
    "enabled": true,
    "type": "sqlite",
    "config": {
      "path": "./logs.db"
    }
  }
}

Step 3: Set up your API keys

After creating the config.json file, set your API keys as environment variables using the commands below so they are never hardcoded.

export OPENAI_API_KEY="your-openai-api-key"
export ANTHROPIC_API_KEY="your-anthropic-api-key"
export GEMINI_API_KEY="your-gemini-api-key"

Step 4: Start Bifrost Gateway server

Once you have set up your API keys, start the Bifrost LLM gateway server by running the command below again (make sure to stop the initial server instance first).

npx -y @maximhq/bifrost -app-dir ./my-bifrost-data

Bifrost will then listen on port 8080.

Finally, navigate to the gateway dashboard at http://localhost:8080 to confirm it is running.

Step 5: Verify the Setup

After installing and starting Bifrost, you can verify that it's working by running the curl command below in the terminal.

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

You should then get a JSON response from the Bifrost LLM gateway.

How to configure automatic LLM provider failover

To implement automatic LLM provider failover, add a fallbacks array to the request payload, as shown below:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Explain what is an LLM gateway in simple terms"
      }
    ],
    "fallbacks": [
      "anthropic/claude-3-5-sonnet-20241022",
      "openrouter/anthropic.claude-3-sonnet-20240229-v1:0"
    ],
    "max_tokens": 1000,
    "temperature": 0.7
  }'

Bifrost will attempt OpenAI first. If that fails, it will try Anthropic. If Anthropic also fails, it will try OpenRouter. The first successful response is returned.
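If you are calling the gateway from application code instead of curl, you can build the same payload programmatically. Below is a minimal Python sketch; the function name and parameters are illustrative, and the payload shape mirrors the curl request above:

```python
import json

# Local Bifrost gateway endpoint (default port from the setup steps above)
BIFROST_URL = "http://localhost:8080/v1/chat/completions"

def build_fallback_payload(primary, fallbacks, user_message, **params):
    """Build an OpenAI-compatible chat payload with a Bifrost fallback chain."""
    payload = {
        "model": primary,
        "messages": [{"role": "user", "content": user_message}],
        # Bifrost tries these in order if the primary model fails
        "fallbacks": list(fallbacks),
    }
    payload.update(params)  # e.g. max_tokens, temperature
    return payload

payload = build_fallback_payload(
    "openai/gpt-4o-mini",
    ["anthropic/claude-3-5-sonnet-20241022"],
    "Explain what is an LLM gateway in simple terms",
    max_tokens=1000,
    temperature=0.7,
)
# POST this body to BIFROST_URL with Content-Type: application/json
body = json.dumps(payload)
```

Because Bifrost exposes an OpenAI-compatible endpoint, any OpenAI-style HTTP client pointed at BIFROST_URL can send this body unchanged.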

After sending the request, the response returned is standard OpenAI-compatible JSON with an extra_fields object that tells you which provider actually handled the request, as shown below.

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "An LLM gateway is a proxy layer that sits between your application and multiple AI providers..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 42,
    "total_tokens": 60
  },
  "extra_fields": {
    "provider": "anthropic",
    "latency": 1.34
  }
}

If extra_fields.provider is "anthropic" and you sent to "openai/gpt-4o-mini", you know a fallback was triggered.
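In application code, you can use extra_fields to detect (and log or alert on) fallbacks. Here is a small helper, assuming the provider/model naming convention and the response shape shown above:

```python
def fallback_triggered(response, requested_model):
    """Return True if a provider other than the requested one handled the request."""
    # "openai/gpt-4o-mini" -> "openai"
    requested_provider = requested_model.split("/", 1)[0]
    actual_provider = response.get("extra_fields", {}).get("provider")
    return actual_provider is not None and actual_provider != requested_provider

response = {"extra_fields": {"provider": "anthropic", "latency": 1.34}}
if fallback_triggered(response, "openai/gpt-4o-mini"):
    print("fallback was triggered: request served by a backup provider")
```

Wiring a check like this into your logging makes silent failovers visible, which is useful for tracking how often your primary provider degrades.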

Implementing automatic LLM provider fallback using Bifrost Mocker plugin

Bifrost triggers LLM provider fallback when it detects any of the following:

  • Network connectivity issues

  • Provider API errors (HTTP 500, 502, 503, 504)

  • Rate limiting (HTTP 429 errors)

  • Model unavailability

  • Request timeouts

  • Authentication failures
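Mapped to HTTP status codes, the trigger conditions above roughly correspond to a check like the following. This is a simplified sketch; Bifrost's real detection also inspects transport-level failures, not just status codes:

```python
# Status codes from the trigger list above that warrant trying a backup provider
FALLBACK_STATUS_CODES = {
    401, 403,            # authentication failures
    429,                 # rate limiting
    500, 502, 503, 504,  # provider API errors
}

def should_fallback(status_code=None, network_error=False, timed_out=False):
    """Decide whether a failed request should be retried on a backup provider."""
    # Network connectivity issues and request timeouts always trigger fallback
    if network_error or timed_out:
        return True
    return status_code in FALLBACK_STATUS_CODES
```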

Rather than waiting for a real outage, you can use the Bifrost Mocker plugin to simulate these failure conditions and verify that your fallback chain works as expected.

In this section, we will mock a rate-limiting error to confirm that Bifrost automatically falls back from OpenAI to Anthropic.

Let’s get started!

Step 1: Initialize the Go Project

Start by creating a new directory for your project and initializing a Go module:

mkdir bifrost-fallback-demo
cd bifrost-fallback-demo
go mod init bifrost-fallback-demo

Step 2: Install Dependencies

After initializing the Go project, install the core Bifrost SDK and the Mocker plugin using the commands below:

go get github.com/maximhq/bifrost/core
go get github.com/maximhq/bifrost/plugins/mocker

Step 3: Set your API keys

Once you have installed the dependencies, set your LLM provider API keys as environment variables:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

Step 4: Configure the Bifrost Account Interface

After setting your API keys, create a file named account.go that implements the schemas.Account interface, which Bifrost uses to know which providers are available and how to authenticate with them, as shown below.

package main

import (
    "context"
    "fmt"
    "os"

    "github.com/maximhq/bifrost/core/schemas"
)

// MyAccount implements the schemas.Account interface with support
// for OpenAI (primary) and Anthropic (fallback) providers.
type MyAccount struct{}

// GetConfiguredProviders returns the list of AI providers this account supports.
// Step 1: Define the providers we want to configure for fallback logic.
func (a *MyAccount) GetConfiguredProviders() ([]schemas.ModelProvider, error) {
    // The order doesn't dictate fallback order here, but it lists all available providers.
    return []schemas.ModelProvider{
        schemas.OpenAI,
        schemas.Anthropic,
    }, nil
}

// GetKeysForProvider returns API keys for the given provider from env vars.
// Step 2: Implement key retrieval for each configured provider.
func (a *MyAccount) GetKeysForProvider(ctx context.Context, provider schemas.ModelProvider) ([]schemas.Key, error) {
    // Step 2a: Setup routing based on the requested provider.
    switch provider {
    case schemas.OpenAI:
        // Step 2b: Retrieve the OpenAI key from the environment.
        key := os.Getenv("OPENAI_API_KEY")
        if key == "" {
            return nil, fmt.Errorf("OPENAI_API_KEY environment variable is not set")
        }
        // Step 2c: Return the configured key.
        // The Weight controls how traffic is routed if multiple keys are present.
        return []schemas.Key{{
            Value:  *schemas.NewEnvVar(key),
            Models: []string{}, // Empty = all models
            Weight: 1.0,
        }}, nil

    case schemas.Anthropic:
        // Step 2d: Retrieve the Anthropic key from the environment.
        key := os.Getenv("ANTHROPIC_API_KEY")
        if key == "" {
            return nil, fmt.Errorf("ANTHROPIC_API_KEY environment variable is not set")
        }
        // Step 2e: Return the Anthropic key with identical configuration format.
        return []schemas.Key{{
            Value:  *schemas.NewEnvVar(key),
            Models: []string{}, // Empty = all models
            Weight: 1.0,
        }}, nil

    default:
        // Step 2f: Handle unconfigured providers gracefully with an error.
        return nil, fmt.Errorf("provider %s not supported", provider)
    }
}

// GetConfigForProvider returns provider-specific network and concurrency config.
// Step 3: Define connection parameters (like rate limits and timeouts) for each provider.
func (a *MyAccount) GetConfigForProvider(provider schemas.ModelProvider) (*schemas.ProviderConfig, error) {
    // Step 3a: Apply configuration to supported providers.
    switch provider {
    case schemas.OpenAI, schemas.Anthropic:
        // Step 3b: Return the default Bifrost network and concurrency settings 
        // for both OpenAI and Anthropic to simplify the setup.
        return &schemas.ProviderConfig{
            NetworkConfig:            schemas.DefaultNetworkConfig,
            ConcurrencyAndBufferSize: schemas.DefaultConcurrencyAndBufferSize,
        }, nil
    default:
        // Step 3c: Reject unsupported providers.
        return nil, fmt.Errorf("provider %s not supported", provider)
    }
}

Step 5: Configure Bifrost Chat Request

Once you have configured the Bifrost account interface, create a main.go file with a helper function that builds a schemas.BifrostChatRequest. The helper targets your primary provider (OpenAI) and explicitly defines the fallback chain (Anthropic), as shown below:

package main

import (
    "context"
    "fmt"
    "log"
    "time"

    bifrost "github.com/maximhq/bifrost/core"
    "github.com/maximhq/bifrost/core/schemas"
    mocker "github.com/maximhq/bifrost/plugins/mocker"
)

// --- Pointer helpers ---

func stringPtr(s string) *string { return &s }
func boolPtr(b bool) *bool       { return &b }
func intPtr(i int) *int           { return &i }

// buildChatRequest creates a standard chat request targeting OpenAI
// with Anthropic as the fallback provider.
// Step 1: Define the chat request with OpenAI as primary and Anthropic as a fallback.
func buildChatRequest(message string) *schemas.BifrostChatRequest {
    return &schemas.BifrostChatRequest{
        Provider: schemas.OpenAI,
        Model:    "gpt-4o-mini",
        Input: []schemas.ChatMessage{
            {
                Role: schemas.ChatMessageRoleUser,
                Content: &schemas.ChatMessageContent{
                    ContentStr: schemas.Ptr(message),
                },
            },
        },
        // Fallback chain: OpenAI → Anthropic
        Fallbacks: []schemas.Fallback{
            {
                Provider: schemas.Anthropic,
                Model:    "claude-sonnet-4-5-20250929",
            },
        },
        Params: &schemas.ChatParameters{
            MaxCompletionTokens: bifrost.Ptr(200),
            Temperature:         bifrost.Ptr(0.7),
        },
    }
}

Step 6: Configure the Mocker Plugin and Initialize the Bifrost Client

After defining the chat request helper, configure the Mocker plugin and initialize the Bifrost client in the same main.go file, as shown below. The Mocker rule simulates a rate-limiting (429) error from OpenAI so that Bifrost falls back to Anthropic:

// main.go (continued): reuses the package declaration, imports, and
// pointer helpers defined in Step 5.

// =====================================================================
// Scenario 1: Mock Error → Fallback Succeeds
// =====================================================================
// Mocker always returns a 429 rate-limit error for OpenAI,
// but AllowFallbacks=true lets Bifrost try Anthropic.

func runScenario1() {
    fmt.Println("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
    fmt.Println("SCENARIO 1: Mock Error → Fallback Succeeds")
    fmt.Println("━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━")
    fmt.Println("  Primary:  OpenAI   (will be mocked with 429 error)")
    fmt.Println("  Fallback: Anthropic (real provider)")
    fmt.Println("  AllowFallbacks: true")
    fmt.Println()

    // Step 2: Configure the Mocker plugin. We will set up a rule that always
    // triggers a 429 rate limit error for OpenAI requests.
    plugin, err := mocker.Init(mocker.MockerConfig{
        Enabled: true,
        Rules: []mocker.MockRule{
            {
                Name:        "openai-rate-limit",
                Enabled:     true,
                Priority:    100,
                Probability: 1.0, // Always trigger
                Conditions: mocker.Conditions{
                    Providers: []string{"openai"},
                },
                Responses: []mocker.Response{
                    {
                        Type:           mocker.ResponseTypeError,
                        AllowFallbacks: boolPtr(true), // ← KEY: allow Bifrost to try fallbacks
                        Error: &mocker.ErrorResponse{
                            Message:    "Rate limit exceeded",
                            Type:       stringPtr("rate_limit"),
                            Code:       stringPtr("429"),
                            StatusCode: intPtr(429),
                        },
                    },
                },
                Latency: &mocker.Latency{
                    Type: mocker.LatencyTypeFixed,
                    Min:  100 * time.Millisecond, // Simulate slight network delay
                },
            },
        },
    })
    if err != nil {
        log.Fatalf("Scenario 1 — plugin creation failed: %v", err)
    }

    // Step 3: Initialize the Bifrost client, attaching our Account definition and the plugin.
    client, initErr := bifrost.Init(context.Background(), schemas.BifrostConfig{
        Account:    &MyAccount{},
        LLMPlugins: []schemas.LLMPlugin{plugin},
        Logger:     bifrost.NewDefaultLogger(schemas.LogLevelInfo),
    })
    if initErr != nil {
        log.Fatalf("Scenario 1 — Bifrost init failed: %v", initErr)
    }
    defer client.Shutdown()

    // Step 4: Execute the ChatCompletionRequest. Bifrost will attempt OpenAI first,
    // encounter the rate limit error from the mocker, and automatically fallback to Anthropic.
    response, bifrostErr := client.ChatCompletionRequest(
        schemas.NewBifrostContext(context.Background(), schemas.NoDeadline),
        buildChatRequest("What is the capital of France? Reply in one sentence."),
    )
    if bifrostErr != nil {
        // Step 5: Handle the final error (if both primary and fallback fail).
        fmt.Printf("  ❌ Request failed: %v\n\n", bifrostErr.Error.Message)
        return
    }

    // Step 6: Process the successful response.
    fmt.Printf("  ✅ Fallback succeeded!\n")
    fmt.Printf("  Provider: %s\n", response.ExtraFields.Provider)
    fmt.Printf("  Response: %s\n\n", *response.Choices[0].Message.Content.ContentStr)
}

// =====================================================================
// Main
// =====================================================================

func main() {
    fmt.Println()
    fmt.Println("╔══════════════════════════════════════════════════╗")
    fmt.Println("║  Bifrost Mocker Plugin — Fallback Demo           ║")
    fmt.Println("╚══════════════════════════════════════════════════╝")
    fmt.Println()

    runScenario1()

    fmt.Println("Done!")
}

Step 7: Run the demo

Once you have configured the Mocker plugin and initialized the Bifrost client, run the demo using the command below:

go run .

After running the demo, you should see that the mocked rate-limiting error from the OpenAI provider triggered an automatic fallback to the Anthropic provider.

Step 8: Mock other fallback triggers

If you want to mock other fallback triggers, such as network connectivity issues, provider API errors (500, 502, 503, 504), request timeouts, and authentication failures, clone the demo repo from GitHub using the command below:

git clone https://github.com/TheGreatBonnie/automatic-llm-provider-fallback-demo.git

Then set your LLM provider API keys as environment variables:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

Finally, run the demos:

go run .

Congratulations! You have successfully implemented automatic LLM provider failover.

Conclusion

Automatic LLM provider fallback is one of the most effective ways to harden your AI agents against the outages, rate limits, and network issues that are an inevitable part of production.

In this guide, you:

  • Set up Bifrost as an OpenAI-compatible LLM gateway

  • Configured multiple AI providers (OpenAI, Anthropic, Gemini)

  • Implemented a fallback chain so requests automatically move from a failing primary model to backup providers

  • Tested fallback behavior locally using the Bifrost Mocker plugin without waiting for a real outage

As a next step, explore Bifrost's other reliability features to build an even more resilient AI stack.

