Pranay Batta

Migrating from LiteLLM to Bifrost: A Step-by-Step Guide

TL;DR: I migrated a production LLM workload from LiteLLM to Bifrost and the swap took about 30 minutes for the gateway, plus a few config translations. The OpenAI-compatible endpoint means application code did not change. This post walks through the full migration: config mapping, virtual key translation, semantic cache porting, and the gotchas I hit.

This post assumes familiarity with LiteLLM proxy mode, OpenAI-compatible APIs, and basic Docker or Node.js operations.

Why Teams Are Looking at the Migration

LiteLLM is the most widely adopted open-source LLM gateway and covers the breadth case well. The reasons I see teams move to Bifrost usually come down to one of three:

  1. Latency overhead. LiteLLM proxy adds roughly 8 milliseconds per request. Bifrost adds about 11 microseconds of overhead at P99 under a 5k RPS load, which is several hundred times lower. For high-throughput agent workloads that difference matters.
  2. MCP support. LiteLLM does not have MCP gateway functionality. If you are running Claude Code or building agentic workflows that hit dozens of tool servers, that gap shows up fast.
  3. Dual-layer semantic caching. Bifrost ships exact match plus vector similarity caching with Weaviate, Redis, or Qdrant. LiteLLM has request-level caching but not the same dual-layer model.

I will walk through the migration assuming you have LiteLLM running today and want to switch without rewriting application code.

Step 1: Run Bifrost Side by Side

Before touching any application config, get Bifrost running alongside LiteLLM. Bifrost defaults to port 8080, which does not collide with LiteLLM's usual port, so I leave both defaults alone and run the two gateways side by side.

npx -y @maximhq/bifrost

That is the entire setup command for local testing. For production, the Docker option works:

docker run -p 8080:8080 maximhq/bifrost:latest

The Bifrost setup docs cover persistent volumes and configuration mounting if you need them.
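
Before moving on, it is worth confirming the process is actually listening. This is nothing more than a TCP check against the default port from the commands above; it assumes nothing about Bifrost's routes:

import socket

# Confirm Bifrost is listening on the default port before continuing.
# This only checks reachability; provider setup happens in Step 2.
with socket.create_connection(("localhost", 8080), timeout=2):
    print("Bifrost is up on :8080")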

Step 2: Translate Provider Configs

LiteLLM uses a config.yaml with model_list entries. Bifrost is configured at the provider level: you list providers, not individual models.

LiteLLM config:

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet-4-6
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY

Bifrost equivalent:

providers:
  - name: openai
    api_key: ${OPENAI_API_KEY}
    allowed_models: ["gpt-4o", "gpt-4o-mini"]
    weight: 1.0
  - name: anthropic
    api_key: ${ANTHROPIC_API_KEY}
    allowed_models: ["claude-sonnet-4-6", "claude-opus-4-7"]
    weight: 1.0

The mental shift: in LiteLLM, you list every model. In Bifrost, you list providers and use allowed_models to filter. The provider configuration docs cover the full schema.

One thing to watch: Bifrost is deny-by-default. If you forget to add a provider, every request to that provider returns a clear error. With LiteLLM, missing model entries return 404, which is harder to debug.
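
If your model_list is long, a small script can do the grouping for you. This is a rough sketch based only on the two examples above (the field names are copied from those examples, not generated from a published schema), so treat it as a starting point rather than a finished tool:

import yaml
from collections import defaultdict

# Rough translation of a LiteLLM model_list into Bifrost provider entries.
# Assumes every litellm_params.model is "<provider>/<model>" and every
# api_key uses the "os.environ/VAR_NAME" convention from the example above.
with open("litellm_config.yaml") as f:
    litellm_cfg = yaml.safe_load(f)

providers = defaultdict(lambda: {"api_key": None, "allowed_models": []})

for entry in litellm_cfg.get("model_list", []):
    params = entry["litellm_params"]
    provider, _, model = params["model"].partition("/")
    providers[provider]["allowed_models"].append(model)
    env_var = params["api_key"].removeprefix("os.environ/")
    providers[provider]["api_key"] = "${" + env_var + "}"

bifrost_cfg = {
    "providers": [
        {
            "name": name,
            "api_key": cfg["api_key"],
            "allowed_models": sorted(set(cfg["allowed_models"])),
            "weight": 1.0,
        }
        for name, cfg in providers.items()
    ]
}

print(yaml.safe_dump(bifrost_cfg, sort_keys=False))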

Step 3: Translate Virtual Keys and Budgets

LiteLLM virtual keys map to Bifrost virtual keys, but the budget model is different.

LiteLLM:

virtual_keys:
  - key_name: customer-acme
    key: sk-acme-abc
    max_budget: 100.00
    rpm_limit: 200

Bifrost:

virtual_keys:
  - key_name: customer-acme
    key: vk-acme-abc
    rate_limit:
      request_limit: 200
      request_limit_duration: "1h"
      token_limit: 500000
      token_limit_duration: "1d"
    budget_limit: 100.00
    budget_duration: "1M"
    allowed_models: ["gpt-4o", "claude-sonnet-4-6"]

The big addition is the four-tier budget hierarchy. Customer, Team, Virtual Key, and Provider Config limits all apply independently. A request must pass all four. Reset durations are calendar-aligned for 1d, 1w, 1M, 1Y in UTC, which matters if your billing aligns to calendar months. The budget and limits docs cover this in detail.
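
Calendar alignment is the part I had to double-check. The sketch below shows what I understand "calendar-aligned in UTC" to mean for the duration strings used above; the logic is my reading of the docs, not Bifrost source code:

from datetime import datetime, timedelta, timezone

def next_reset(now: datetime, duration: str) -> datetime:
    """Next calendar-aligned reset boundary in UTC for 1d / 1w / 1M / 1Y."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if duration == "1d":
        return midnight + timedelta(days=1)
    if duration == "1w":
        # Roll forward to the next Monday, 00:00 UTC.
        return midnight + timedelta(days=7 - now.weekday())
    if duration == "1M":
        year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
        return midnight.replace(year=year, month=month, day=1)
    if duration == "1Y":
        return midnight.replace(year=now.year + 1, month=1, day=1)
    raise ValueError(f"unsupported duration: {duration}")

# A budget_duration of "1M" resets at the start of the next calendar month,
# not 30 days after the first request.
print(next_reset(datetime.now(timezone.utc), "1M"))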

Step 4: Update Application Endpoints

Both gateways expose OpenAI-compatible endpoints, so most application code does not change. Only the base URL switches.

from openai import OpenAI

# Before (LiteLLM)
client = OpenAI(
    base_url="http://litellm:4000/v1",
    api_key="sk-acme-abc"
)

# After (Bifrost)
client = OpenAI(
    base_url="http://bifrost:8080/openai/v1",
    api_key="vk-acme-abc"
)
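
To keep the cutover instantly reversible, I prefer to drive the base URL and key from environment variables instead of hard-coding either gateway. The variable names here are made up for illustration:

import os
from openai import OpenAI

# Flip LLM_GATEWAY_BASE_URL (and the matching key) to move traffic between
# LiteLLM and Bifrost without a code deploy. Variable names are illustrative.
client = OpenAI(
    base_url=os.environ.get("LLM_GATEWAY_BASE_URL", "http://litellm:4000/v1"),
    api_key=os.environ["LLM_GATEWAY_API_KEY"],
)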

Bifrost also exposes /anthropic and /genai endpoints if you want to keep using the native Anthropic or Gemini SDKs. The drop-in replacement docs cover the full endpoint matrix.
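
For code that already uses the native Anthropic SDK, the change is the same shape. This is a sketch assuming the /anthropic route hangs off the gateway root; check the drop-in replacement docs for the exact path:

import anthropic

# Point the native Anthropic SDK at Bifrost's Anthropic-compatible endpoint.
# The exact base URL path is an assumption based on the docs linked above.
client = anthropic.Anthropic(
    base_url="http://bifrost:8080/anthropic",
    api_key="vk-acme-abc",
)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "ping"}],
)
print(response.content[0].text)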

Step 5: Port Semantic Caching

LiteLLM has request-level caching with Redis. Bifrost has dual-layer caching with vector similarity. The migration is not 1:1; you are upgrading the cache model.

semantic_cache:
  enabled: true
  vector_store: weaviate
  weaviate_url: ${WEAVIATE_URL}
  similarity_threshold: 0.92
  conversation_history_threshold: 3
  ttl_seconds: 86400

Every request needs the x-bf-cache-key header to participate in caching. In application code:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    extra_headers={"x-bf-cache-key": f"customer-{tenant_id}"}
)

The semantic caching docs cover threshold tuning and per-request overrides.
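
A quick way to convince yourself the semantic layer is working: send two paraphrases of the same question under one cache key and compare latency. This assumes the config above is live; the timing difference is a rough signal, not a proper test:

import time
from openai import OpenAI

client = OpenAI(base_url="http://bifrost:8080/openai/v1", api_key="vk-acme-abc")

def timed_ask(prompt: str) -> float:
    # Every call carries the cache key so both requests share a cache scope.
    start = time.perf_counter()
    client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        extra_headers={"x-bf-cache-key": "customer-acme"},
    )
    return time.perf_counter() - start

# The second prompt is a paraphrase; it should be served from the
# vector-similarity layer if it clears the 0.92 threshold set above.
print("first :", timed_ask("What is our refund policy for annual plans?"))
print("second:", timed_ask("How do refunds work on yearly subscriptions?"))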

Comparison

Capability | LiteLLM | Bifrost
Latency overhead | ~8ms | 11 microseconds
Throughput | Python-bound | 5,000 RPS single instance
Virtual keys | Yes | Yes
Budget tiers | Single | Four-tier (Customer/Team/VK/Provider)
Semantic caching | Request-level | Dual-layer with vector similarity
MCP gateway | No | Yes
Provider count | 100+ | Major providers + custom
Managed cloud | Yes | No (self-hosted only)

Trade-offs and Limitations

Bifrost is self-hosted only. If you are on LiteLLM Cloud today, migrating means taking on infrastructure operations yourself.

The provider catalog is smaller. LiteLLM supports 100+ providers out of the box. Bifrost covers the major ones (OpenAI, Anthropic, Gemini, Bedrock, Vertex, Azure OpenAI, Ollama, Together, Groq, Cohere) and lets you add custom providers, but if you depend on a niche provider, check the list before migrating.

The community is smaller and the project is newer. Documentation is solid but Stack Overflow answers and community plugins are still building up.

OpenRouter compatibility is broken because of a tool call streaming issue. If your stack routes through OpenRouter today, you cannot keep that path through Bifrost.

Quick Recap

  • Application code does not change because Bifrost is OpenAI-compatible
  • Provider configs replace LiteLLM model lists, with provider-level filtering via allowed_models
  • Virtual keys translate directly, but the four-tier budget hierarchy is new
  • Semantic caching upgrades from request-level to dual-layer with vector similarity
  • Run side by side first, cutover by changing the base URL, roll back instantly if needed
