Debby McKinney

Add Observability, Routing, and Failover to Your LLM Stack With One URL Change

If your LLM application already works, you shouldn’t have to refactor it just to add observability, routing, or failover.


The Problem

You’ve built your LLM application.

It’s live. It works.

Now you want things like:

  • Observability
  • Load balancing
  • Caching
  • Provider failover

Most solutions require you to:

  • Rewrite API calls
  • Learn a new SDK
  • Refactor stable code
  • Re-test everything

That’s risky and expensive.

Bifrost was built to avoid this entirely.

You drop it in, change one URL, and you’re done.

GitHub: maximhq/bifrost

Fastest LLM gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

The fastest way to build AI applications that never go down

Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.

Quick Start

Go from zero to production-ready AI gateway in under a minute.

Step 1: Start Bifrost Gateway

# Install and run locally
npx -y @maximhq/bifrost

# Or use Docker
docker run -p 8080:8080 maximhq/bifrost

Step 2: Configure via Web UI

# Open the built-in web interface
open http://localhost:8080

Step 3: Make your first API call

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'

That's it! Your AI gateway is running with a web interface for visual configuration, real-time monitoring…


OpenAI-Compatible API

Bifrost speaks the OpenAI API format.

If your code works with OpenAI, it will work with Bifrost.

Before

from openai import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

After

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",  # the only change: point the client at Bifrost
    api_key="sk-..."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

One line changed.

Everything else stays the same.

Works With Major Frameworks

Because Bifrost is OpenAI-compatible, it works with any framework that already supports OpenAI.

LangChain

from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(
    openai_api_base="http://localhost:8080/langchain",
    openai_api_key="sk-..."
)

LlamaIndex

from llama_index.llms import OpenAI

llm = OpenAI(
    api_base="http://localhost:8080/openai",
    api_key="sk-..."
)

LiteLLM

import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    base_url="http://localhost:8080/litellm"
)

Anthropic SDK

import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8080/anthropic",
    api_key="sk-ant-..."
)

Same pattern everywhere:

update the base URL, keep the rest of your code unchanged.
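
The same swap covers streaming as well. Here is a rough sketch, assuming the current openai Python SDK and the /openai route shown above, and that streamed responses pass through the gateway like any other OpenAI-compatible call:

from openai import OpenAI

# Same client as the "After" example above; only the base_url differs
# from a direct OpenAI setup.
client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="sk-..."
)

# Streaming uses the standard OpenAI chunk format; the gateway sits
# in the middle as a pass-through.
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a haiku about gateways"}],
    stream=True
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)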


One Interface, Multiple Providers

Bifrost routes requests to multiple providers using the same API.

Configuration

{
  "providers": [
    {
      "name": "openai",
      "api_key": "sk-...",
      "models": ["gpt-4", "gpt-4o-mini"]
    },
    {
      "name": "anthropic",
      "api_key": "sk-ant-...",
      "models": ["claude-sonnet-4", "claude-opus-4"]
    },
    {
      "name": "azure",
      "api_key": "...",
      "endpoint": "https://your-resource.openai.azure.com"
    }
  ]
}

Application Code

# Routes to OpenAI
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...]
)

# Routes to Anthropic
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[...]
)

Switch providers by changing the model name.

No refactoring required.
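
To make the routing concrete, here is a small sketch that sends the same prompt to two providers purely by changing the model string. It assumes the provider config above and the /openai route from the earlier examples:

from openai import OpenAI

# One client pointed at Bifrost; the provider prefix in the model name
# decides where each request is routed (per the config above).
client = OpenAI(base_url="http://localhost:8080/openai", api_key="sk-...")

def ask(model: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Same code path, two providers. Only the model name changes.
print(ask("gpt-4", "Summarize Bifrost in one sentence."))
print(ask("anthropic/claude-sonnet-4", "Summarize Bifrost in one sentence."))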


Built-In Observability (No Instrumentation)

Bifrost ships with observability integrations out of the box.

Maxim AI

{
  "plugins": [
    {
      "name": "maxim",
      "config": {
        "api_key": "your-maxim-key",
        "repo_id": "your-repo-id"
      }
    }
  ]
}

Every request is automatically traced in the Maxim dashboard.

No instrumentation code needed.

Prometheus

{
  "metrics": {
    "enabled": true,
    "port": 9090
  }
}

Metrics are exposed at /metrics and can be scraped by Prometheus.
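
Before wiring up a scrape job, a quick sanity check helps. This is a sketch using the Python requests library; the port comes from the metrics config above, so adjust it if your setup exposes /metrics elsewhere:

import requests

# Fetch the Prometheus exposition text from the metrics port configured above.
resp = requests.get("http://localhost:9090/metrics", timeout=5)
resp.raise_for_status()

# Print the first few metric lines to confirm the endpoint is serving data.
for line in resp.text.splitlines()[:10]:
    print(line)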

OpenTelemetry

{
  "otel": {
    "enabled": true,
    "endpoint": "http://your-collector:4318"
  }
}

Standard OTLP export to any OpenTelemetry-compatible collector.


Framework-Specific Integrations

Claude Code

{
  "baseURL": "http://localhost:8080/openai",
  "provider": "anthropic"
}

All Claude Code requests now flow through Bifrost, enabling cost tracking, token usage monitoring, and caching.

LibreChat

custom:
  - name: "Bifrost"
    apiKey: "dummy"
    baseURL: "http://localhost:8080/v1"
    models:
      default: ["openai/gpt-4o"]

Universal model access across all configured providers.


MCP (Model Context Protocol) Support

Bifrost supports MCP for tool calling and shared context.

{
  "mcp": {
    "servers": [
      {
        "name": "filesystem",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem"]
      },
      {
        "name": "brave-search",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-brave-search"],
        "env": {
          "BRAVE_API_KEY": "your-key"
        }
      }
    ]
  }
}

Once configured, your LLM calls automatically gain access to MCP tools.
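
How tool calls surface to your application depends on the model and your setup. As a rough sketch, and assuming the tools come back in the standard OpenAI tool_calls field (an assumption on my part, not something confirmed above), you would inspect the response the same way as with native function calling:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/openai", api_key="sk-...")

# Assumption: with the MCP servers above configured, requested tools show up
# in the standard OpenAI tool_calls field of the assistant message.
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "List the files in the current directory."}]
)

message = response.choices[0].message
if message.tool_calls:
    for call in message.tool_calls:
        print("Tool requested:", call.function.name, call.function.arguments)
else:
    print(message.content)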


Deployment Options

Docker

docker run -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  maximhq/bifrost:latest

Docker Compose

services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    environment:
      - OPENAI_API_KEY=sk-...
    volumes:
      - ./data:/app/data

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: bifrost
          image: maximhq/bifrost:latest
          ports:
            - containerPort: 8080

Terraform examples are available in the documentation.


Real Integration Example

Before (Direct OpenAI)

  • No observability
  • No caching
  • No load balancing
  • No provider failover

After (Through Bifrost)

llm = ChatOpenAI(
    model="gpt-4",
    openai_api_base="http://localhost:8080/langchain"
)

Automatically enabled:

  • Observability
  • Semantic caching
  • Multi-key load balancing
  • Provider failover

One line changed. All features enabled.


Migration Checklist

  • Run Bifrost
  • Add provider API keys
  • Update the base URL
  • Test one request (see the smoke-test sketch below)
  • Deploy

Total migration time: ~10 minutes.
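
For the "Test one request" step, a minimal smoke test could look like this (a sketch assuming the /openai route and an OpenAI key already configured in Bifrost):

from openai import OpenAI

# Point the existing client at Bifrost and send one cheap request.
client = OpenAI(base_url="http://localhost:8080/openai", api_key="sk-...")

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}]
)

# A printed reply means the gateway is routing correctly and you can deploy.
print(response.choices[0].message.content)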


The Bottom Line

Bifrost integrates into existing LLM stacks in minutes:

  • OpenAI-compatible API
  • One URL change
  • Multi-provider routing
  • Built-in observability

No refactoring required.

No new SDKs.

No code rewrites.

Just drop it in.

Built by the team at Maxim AI.
