Debby McKinney

Add Observability, Routing, and Failover to Your LLM Stack With One URL Change

If your LLM application already works, you shouldn’t have to refactor it just to add observability, routing, or failover.


The Problem

You’ve built your LLM application.

It’s live. It works.

Now you want things like:

  • Observability
  • Load balancing
  • Caching
  • Provider failover

Most solutions require you to:

  • Rewrite API calls
  • Learn a new SDK
  • Refactor stable code
  • Re-test everything

That’s risky and expensive.

Bifrost was built to avoid this entirely.

You drop it in, change one URL, and you’re done.

GitHub: maximhq/bifrost

Fastest LLM gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

The fastest way to build AI applications that never go down

Bifrost is a high-performance AI gateway that unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. Deploy in seconds with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise-grade features.

Quick Start

Go from zero to production-ready AI gateway in under a minute.

Step 1: Start Bifrost Gateway

# Install and run locally
npx -y @maximhq/bifrost

# Or use Docker
docker run -p 8080:8080 maximhq/bifrost

Step 2: Configure via Web UI

# Open the built-in web interface
open http://localhost:8080

Step 3: Make your first API call

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'

That's it! Your AI gateway is running with a web interface for visual configuration, real-time monitoring…


OpenAI-Compatible API

Bifrost speaks the OpenAI API format.

If your code works with OpenAI, it will work with Bifrost.

Before

from openai import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

After

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",  # the only change: point the client at Bifrost
    api_key="sk-..."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

One line changed.

Everything else stays the same.

Works With Major Frameworks

Because Bifrost is OpenAI-compatible, it works with any framework that already supports OpenAI.

LangChain

from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(
    openai_api_base="http://localhost:8080/langchain",
    openai_api_key="sk-..."
)

LlamaIndex

from llama_index.llms import OpenAI

llm = OpenAI(
    api_base="http://localhost:8080/openai",
    api_key="sk-..."
)

LiteLLM

import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    base_url="http://localhost:8080/litellm"
)

Anthropic SDK

import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8080/anthropic",
    api_key="sk-ant-..."
)

Same pattern everywhere:

update the base URL, keep the rest of your code unchanged.
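
The same swap covers streaming as well. Here is a rough sketch, assuming the current openai Python SDK and the /openai route shown above, and that streamed responses pass through the gateway like any other OpenAI-compatible call:

from openai import OpenAI

# Same client as the "After" example above; only the base_url differs
# from a direct OpenAI setup.
client = OpenAI(
    base_url="http://localhost:8080/openai",
    api_key="sk-..."
)

# Streaming uses the standard OpenAI chunk format; the gateway sits
# in the middle as a pass-through.
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Write a haiku about gateways"}],
    stream=True
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)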


One Interface, Multiple Providers

Bifrost routes requests to multiple providers using the same API.

Configuration

{
  "providers": [
    {
      "name": "openai",
      "api_key": "sk-...",
      "models": ["gpt-4", "gpt-4o-mini"]
    },
    {
      "name": "anthropic",
      "api_key": "sk-ant-...",
      "models": ["claude-sonnet-4", "claude-opus-4"]
    },
    {
      "name": "azure",
      "api_key": "...",
      "endpoint": "https://your-resource.openai.azure.com"
    }
  ]
}

Application Code

# Routes to OpenAI
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...]
)

# Routes to Anthropic
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[...]
)

Switch providers by changing the model name.

No refactoring required.
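
To make the routing concrete, here is a small sketch that sends the same prompt to two providers purely by changing the model string. It assumes the provider config above and the /openai route from the earlier examples:

from openai import OpenAI

# One client pointed at Bifrost; the provider prefix in the model name
# decides where each request is routed (per the config above).
client = OpenAI(base_url="http://localhost:8080/openai", api_key="sk-...")

def ask(model: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Same code path, two providers. Only the model name changes.
print(ask("gpt-4", "Summarize Bifrost in one sentence."))
print(ask("anthropic/claude-sonnet-4", "Summarize Bifrost in one sentence."))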


Built-In Observability (No Instrumentation)

Bifrost ships with observability integrations out of the box.

Maxim AI

{
  "plugins": [
    {
      "name": "maxim",
      "config": {
        "api_key": "your-maxim-key",
        "repo_id": "your-repo-id"
      }
    }
  ]
}

Every request is automatically traced in the Maxim dashboard.

No instrumentation code needed.

Prometheus

{
  "metrics": {
    "enabled": true,
    "port": 9090
  }
}

Metrics are exposed at /metrics and can be scraped by Prometheus.
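
Before wiring up a scrape job, a quick sanity check helps. This is a sketch using the Python requests library; the port comes from the metrics config above, so adjust it if your setup exposes /metrics elsewhere:

import requests

# Fetch the Prometheus exposition text from the metrics port configured above.
resp = requests.get("http://localhost:9090/metrics", timeout=5)
resp.raise_for_status()

# Print the first few metric lines to confirm the endpoint is serving data.
for line in resp.text.splitlines()[:10]:
    print(line)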

OpenTelemetry

{
  "otel": {
    "enabled": true,
    "endpoint": "http://your-collector:4318"
  }
}

Standard OTLP export to any OpenTelemetry-compatible collector.


Framework-Specific Integrations

Claude Code

{
  "baseURL": "http://localhost:8080/openai",
  "provider": "anthropic"
}

All Claude Code requests now flow through Bifrost, enabling cost tracking, token usage monitoring, and caching.

LibreChat

custom:
  - name: "Bifrost"
    apiKey: "dummy"
    baseURL: "http://localhost:8080/v1"
    models:
      default: ["openai/gpt-4o"]

Universal model access across all configured providers.


MCP (Model Context Protocol) Support

Bifrost supports MCP for tool calling and shared context.

{
  "mcp": {
    "servers": [
      {
        "name": "filesystem",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem"]
      },
      {
        "name": "brave-search",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-brave-search"],
        "env": {
          "BRAVE_API_KEY": "your-key"
        }
      }
    ]
  }
}

Once configured, your LLM calls automatically gain access to MCP tools.
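
How tool calls surface to your application depends on the model and your setup. As a rough sketch, and assuming the tools come back in the standard OpenAI tool_calls field (an assumption on my part, not something confirmed above), you would inspect the response the same way as with native function calling:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/openai", api_key="sk-...")

# Assumption: with the MCP servers above configured, requested tools show up
# in the standard OpenAI tool_calls field of the assistant message.
response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "List the files in the current directory."}]
)

message = response.choices[0].message
if message.tool_calls:
    for call in message.tool_calls:
        print("Tool requested:", call.function.name, call.function.arguments)
else:
    print(message.content)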


Deployment Options

Docker

docker run -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  maximhq/bifrost:latest

Docker Compose

services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    environment:
      - OPENAI_API_KEY=sk-...
    volumes:
      - ./data:/app/data

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: bifrost
          image: maximhq/bifrost:latest
          ports:
            - containerPort: 8080

Terraform examples are available in the documentation.


Real Integration Example

Before (Direct OpenAI)

  • No observability
  • No caching
  • No load balancing
  • No provider failover

After (Through Bifrost)

llm = ChatOpenAI(
    model="gpt-4",
    openai_api_base="http://localhost:8080/langchain"
)

Automatically enabled:

  • Observability
  • Semantic caching
  • Multi-key load balancing
  • Provider failover

One line changed. All features enabled.


Migration Checklist

  • Run Bifrost
  • Add provider API keys
  • Update the base URL
  • Test one request (see the smoke-test sketch below)
  • Deploy

Total migration time: ~10 minutes.
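
For the "Test one request" step, a minimal smoke test could look like this (a sketch assuming the /openai route and an OpenAI key already configured in Bifrost):

from openai import OpenAI

# Point the existing client at Bifrost and send one cheap request.
client = OpenAI(base_url="http://localhost:8080/openai", api_key="sk-...")

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}]
)

# A printed reply means the gateway is routing correctly and you can deploy.
print(response.choices[0].message.content)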


The Bottom Line

Bifrost integrates into existing LLM stacks in minutes:

  • OpenAI-compatible API
  • One URL change
  • Multi-provider routing
  • Built-in observability

No refactoring required.

No new SDKs.

No code rewrites.

Just drop it in.

Built by the team at Maxim AI.
