Kuldeep Paul

How Bifrost Integrates With Your Existing LLM Stack (No Refactoring Required)

The Problem

You’ve built your LLM application. It works.

Now you want better observability, load balancing, or caching.

Most solutions require:

  • Rewriting your API calls
  • Learning new SDKs
  • Refactoring working code
  • Testing everything again

We built Bifrost to be different: drop it in, change one URL, done.


OpenAI-Compatible API

Bifrost speaks OpenAI’s API format.

If your code works with OpenAI, it works with Bifrost.

Before

from openai import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)


After

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",  # Only change
    api_key="sk-..."  # Your actual API key
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)


One parameter added. That’s it.


Works With Every Major Framework

Because Bifrost is OpenAI-compatible, it works with any framework that supports OpenAI.

LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8080/langchain",
    api_key="sk-..."
)


LlamaIndex

from llama_index.llms.openai import OpenAI

llm = OpenAI(
    api_base="http://localhost:8080/openai",
    api_key="sk-..."
)


LiteLLM

import litellm

response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    base_url="http://localhost:8080/litellm"
)


Anthropic SDK

import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:8080/anthropic",
    api_key="sk-ant-..."
)


Same pattern everywhere: change the base URL, keep everything else.


Multiple Providers, One Interface

Bifrost routes to multiple providers through the same API.

Configuration

{
  "providers": [
    {
      "name": "openai",
      "api_key": "sk-...",
      "models": ["gpt-4", "gpt-4o-mini"]
    },
    {
      "name": "anthropic",
      "api_key": "sk-ant-...",
      "models": ["claude-sonnet-4", "claude-opus-4"]
    },
    {
      "name": "azure",
      "api_key": "...",
      "endpoint": "https://your-resource.openai.azure.com"
    }
  ]
}


Your code

# OpenAI
response = client.chat.completions.create(
    model="gpt-4",  # Routes to OpenAI
    messages=[...]
)

# Anthropic (same code structure)
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",  # Routes to Anthropic
    messages=[...]
)


Switch providers by changing the model name.

No refactoring required.
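
For example, assuming the OpenAI client from earlier is pointed at Bifrost, the same helper can reach either provider just by swapping the model string (a sketch; the anthropic/ prefix follows the routing convention shown above):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/openai", api_key="sk-...")

def ask(model: str, prompt: str) -> str:
    """Send the same prompt through Bifrost to whichever provider owns `model`."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask("gpt-4", "Hello"))                      # routed to OpenAI
print(ask("anthropic/claude-sonnet-4", "Hello"))  # routed to Anthropic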


Built-In Observability Integration

Bifrost integrates with observability platforms out of the box.

Maxim AI

{
  "plugins": [
    {
      "name": "maxim",
      "config": {
        "api_key": "your-maxim-key",
        "repo_id": "your-repo-id"
      }
    }
  ]
}


Every request is automatically traced to the Maxim dashboard.

Zero instrumentation code.
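
In practice that means the application side is just a normal OpenAI-style call; the tracing happens inside the gateway. A minimal sketch:

from openai import OpenAI

# No Maxim SDK, no decorators -- the gateway records the trace.
client = OpenAI(base_url="http://localhost:8080/openai", api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)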


Prometheus

{
  "metrics": {
    "enabled": true,
    "port": 9090
  }
}


Metrics exposed at /metrics.

Plug into your existing Prometheus setup.
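
A minimal scrape job for that setup might look like this (a sketch; it assumes Bifrost’s metrics listener is reachable at localhost:9090, per the config above):

scrape_configs:
  - job_name: "bifrost"
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:9090"]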


OpenTelemetry

{
  "otel": {
    "enabled": true,
    "endpoint": "http://your-collector:4318"
  }
}


Standard OTLP export to any OpenTelemetry collector.
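
On the collector side, nothing Bifrost-specific is needed; a stock OTLP/HTTP receiver is enough. A minimal OpenTelemetry Collector config sketch:

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

exporters:
  debug: {}

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]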


Framework-Specific Integrations

Claude Code

Update your Claude Code config:

{
  "baseURL": "http://localhost:8080/openai",
  "provider": "anthropic"
}


All Claude Code requests now flow through Bifrost.

Track token usage, costs, and cache responses automatically.


LibreChat

Add to librechat.yaml:

endpoints:
  custom:
    - name: "Bifrost"
      apiKey: "dummy"
      baseURL: "http://localhost:8080/v1"
      models:
        default: ["openai/gpt-4o"]


Universal model access across all configured providers.


MCP (Model Context Protocol) Support

Bifrost supports MCP for tool calling and context management.

Configure MCP servers

{
  "mcp": {
    "servers": [
      {
        "name": "filesystem",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem"]
      },
      {
        "name": "brave-search",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-brave-search"],
        "env": {
          "BRAVE_API_KEY": "your-key"
        }
      }
    ]
  }
}


Your LLM calls automatically gain access to MCP tools.

No manual tool definitions required.
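
From the application side, the call stays a plain chat completion (a sketch that assumes the gateway injects the configured MCP tools into the request, per the behavior described above):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/openai", api_key="sk-...")

# No tools=[...] parameter: the gateway exposes the configured
# MCP servers (filesystem, brave-search) to the model on its own.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What files are in the project root?"}]
)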


Deployment Options

Docker

docker run -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  maximhq/bifrost:latest


Docker Compose

services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    environment:
      - OPENAI_API_KEY=sk-...
    volumes:
      - ./data:/app/data


Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
    spec:
      containers:
        - name: bifrost
          image: maximhq/bifrost:latest
          ports:
            - containerPort: 8080
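
To give clients a stable address for those replicas, the usual companion is a Service (a sketch; the app: bifrost selector matches the Deployment labels above):

apiVersion: v1
kind: Service
metadata:
  name: bifrost
spec:
  selector:
    app: bifrost
  ports:
    - port: 8080
      targetPort: 8080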

Terraform examples are available in the docs.


Real Integration Example

Before (Direct OpenAI)

from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent

llm = ChatOpenAI(model="gpt-4", api_key="sk-...")
agent = initialize_agent(tools, llm)

# No observability
# No caching
# No load balancing
# No failover


After (Through Bifrost)

from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent

llm = ChatOpenAI(
    model="gpt-4",
    base_url="http://localhost:8080/langchain",
    api_key="sk-..."
)

agent = initialize_agent(tools, llm)

# Automatic observability ✓
# Semantic caching ✓
# Multi-key load balancing ✓
# Provider failover ✓


One parameter added. All features enabled.


Migration Checklist

1. Install Bifrost

docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost


2. Add API keys

  • Visit http://localhost:8080
  • Add your provider keys

3. Update base URL

client = OpenAI(base_url="http://localhost:8080/openai", api_key="sk-...")


LangChain:

llm = ChatOpenAI(base_url="http://localhost:8080/langchain", api_key="sk-...")


4. Test one request

Verify it works and check the dashboard.
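
A quick way to check is a raw request against the OpenAI-compatible route (a sketch; it assumes the /openai route shown earlier and a provider key added in step 2):

curl http://localhost:8080/openai/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "ping"}]}'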

5. Deploy

Everything else stays the same.

Total migration time: ~10 minutes.


Try It Yourself

git clone https://github.com/maximhq/bifrost
cd bifrost
docker compose up


Full integration examples for LangChain, LiteLLM, and more are available in the GitHub repo.


The Bottom Line

Bifrost integrates with your existing stack in minutes:

  • OpenAI-compatible API (works everywhere)
  • Change one URL, keep all your code
  • Multi-provider support through one interface
  • Built-in observability with zero instrumentation

No refactoring. No new SDKs. Just drop it in.

Built by the team at Maxim AI — we also build evaluation and observability tools for production AI agents.
