Pranay Batta

Migrating from LiteLLM to Bifrost: A Step-by-Step Guide

TL;DR: I migrated a production LLM workload from LiteLLM to Bifrost and the swap took about 30 minutes for the gateway, plus a few config translations. The OpenAI-compatible endpoint means application code did not change. This post walks through the full migration: config mapping, virtual key translation, semantic cache porting, and the gotchas I hit.

This post assumes familiarity with LiteLLM proxy mode, OpenAI-compatible APIs, and basic Docker or Node.js operations.

Why Teams Are Looking at the Migration

LiteLLM is the most widely adopted open-source LLM gateway and covers the breadth case well. The reasons I see teams move to Bifrost usually come down to one of three:

  1. Latency overhead. LiteLLM proxy adds roughly 8 milliseconds per request. Bifrost adds about 11 microseconds of overhead at P99 under a 5k RPS load, which is several hundred times lower. For high-throughput agent workloads that difference matters.
  2. MCP support. LiteLLM does not have MCP gateway functionality. If you are running Claude Code or building agentic workflows that hit dozens of tool servers, that gap shows up fast.
  3. Dual-layer semantic caching. Bifrost ships exact match plus vector similarity caching with Weaviate, Redis, or Qdrant. LiteLLM has request-level caching but not the same dual-layer model.

I will walk through the migration assuming you have LiteLLM running today and want to switch without rewriting application code.

Step 1: Run Bifrost Side by Side

Before touching any application config, get Bifrost running alongside LiteLLM. Bifrost defaults to port 8080, which does not collide with LiteLLM's usual port, so I leave both defaults alone and run the two gateways side by side.

npx -y @maximhq/bifrost

That is the entire setup command for local testing. For production, the Docker option works:

docker run -p 8080:8080 maximhq/bifrost:latest

The Bifrost setup docs cover persistent volumes and configuration mounting if you need them.
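
Before moving on, it is worth confirming the process is actually listening. This is nothing more than a TCP check against the default port from the commands above; it assumes nothing about Bifrost's routes:

import socket

# Confirm Bifrost is listening on the default port before continuing.
# This only checks reachability; provider setup happens in Step 2.
with socket.create_connection(("localhost", 8080), timeout=2):
    print("Bifrost is up on :8080")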

Step 2: Translate Provider Configs

LiteLLM uses a config.yaml with model_list entries. Bifrost is configured at the provider level: you list providers, not individual models.

LiteLLM config:

model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet-4-6
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY

Bifrost equivalent:

providers:
  - name: openai
    api_key: ${OPENAI_API_KEY}
    allowed_models: ["gpt-4o", "gpt-4o-mini"]
    weight: 1.0
  - name: anthropic
    api_key: ${ANTHROPIC_API_KEY}
    allowed_models: ["claude-sonnet-4-6", "claude-opus-4-7"]
    weight: 1.0

The mental shift: in LiteLLM, you list every model. In Bifrost, you list providers and use allowed_models to filter. The provider configuration docs cover the full schema.

One thing to watch: Bifrost is deny-by-default. If you forget to add a provider, every request to that provider returns a clear error. With LiteLLM, missing model entries return 404, which is harder to debug.
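
If your model_list is long, a small script can do the grouping for you. This is a rough sketch based only on the two examples above (the field names are copied from those examples, not generated from a published schema), so treat it as a starting point rather than a finished tool:

import yaml
from collections import defaultdict

# Rough translation of a LiteLLM model_list into Bifrost provider entries.
# Assumes every litellm_params.model is "<provider>/<model>" and every
# api_key uses the "os.environ/VAR_NAME" convention from the example above.
with open("litellm_config.yaml") as f:
    litellm_cfg = yaml.safe_load(f)

providers = defaultdict(lambda: {"api_key": None, "allowed_models": []})

for entry in litellm_cfg.get("model_list", []):
    params = entry["litellm_params"]
    provider, _, model = params["model"].partition("/")
    providers[provider]["allowed_models"].append(model)
    env_var = params["api_key"].removeprefix("os.environ/")
    providers[provider]["api_key"] = "${" + env_var + "}"

bifrost_cfg = {
    "providers": [
        {
            "name": name,
            "api_key": cfg["api_key"],
            "allowed_models": sorted(set(cfg["allowed_models"])),
            "weight": 1.0,
        }
        for name, cfg in providers.items()
    ]
}

print(yaml.safe_dump(bifrost_cfg, sort_keys=False))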

Step 3: Translate Virtual Keys and Budgets

LiteLLM virtual keys map to Bifrost virtual keys, but the budget model is different.

LiteLLM:

virtual_keys:
  - key_name: customer-acme
    key: sk-acme-abc
    max_budget: 100.00
    rpm_limit: 200

Bifrost:

virtual_keys:
  - key_name: customer-acme
    key: vk-acme-abc
    rate_limit:
      request_limit: 200
      request_limit_duration: "1h"
      token_limit: 500000
      token_limit_duration: "1d"
    budget_limit: 100.00
    budget_duration: "1M"
    allowed_models: ["gpt-4o", "claude-sonnet-4-6"]

The big addition is the four-tier budget hierarchy. Customer, Team, Virtual Key, and Provider Config limits all apply independently. A request must pass all four. Reset durations are calendar-aligned for 1d, 1w, 1M, 1Y in UTC, which matters if your billing aligns to calendar months. The budget and limits docs cover this in detail.
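
Calendar alignment is the part I had to double-check. The sketch below shows what I understand "calendar-aligned in UTC" to mean for the duration strings used above; the logic is my reading of the docs, not Bifrost source code:

from datetime import datetime, timedelta, timezone

def next_reset(now: datetime, duration: str) -> datetime:
    """Next calendar-aligned reset boundary in UTC for 1d / 1w / 1M / 1Y."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if duration == "1d":
        return midnight + timedelta(days=1)
    if duration == "1w":
        # Roll forward to the next Monday, 00:00 UTC.
        return midnight + timedelta(days=7 - now.weekday())
    if duration == "1M":
        year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
        return midnight.replace(year=year, month=month, day=1)
    if duration == "1Y":
        return midnight.replace(year=now.year + 1, month=1, day=1)
    raise ValueError(f"unsupported duration: {duration}")

# A budget_duration of "1M" resets at the start of the next calendar month,
# not 30 days after the first request.
print(next_reset(datetime.now(timezone.utc), "1M"))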

Step 4: Update Application Endpoints

Both gateways expose OpenAI-compatible endpoints, so most application code does not change. Only the base URL switches.

from openai import OpenAI

# Before (LiteLLM)
client = OpenAI(
    base_url="http://litellm:4000/v1",
    api_key="sk-acme-abc"
)

# After (Bifrost)
client = OpenAI(
    base_url="http://bifrost:8080/openai/v1",
    api_key="vk-acme-abc"
)
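
To keep the cutover instantly reversible, I prefer to drive the base URL and key from environment variables instead of hard-coding either gateway. The variable names here are made up for illustration:

import os
from openai import OpenAI

# Flip LLM_GATEWAY_BASE_URL (and the matching key) to move traffic between
# LiteLLM and Bifrost without a code deploy. Variable names are illustrative.
client = OpenAI(
    base_url=os.environ.get("LLM_GATEWAY_BASE_URL", "http://litellm:4000/v1"),
    api_key=os.environ["LLM_GATEWAY_API_KEY"],
)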

Bifrost also exposes /anthropic and /genai endpoints if you want to keep using the native Anthropic or Gemini SDKs. The drop-in replacement docs cover the full endpoint matrix.
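
For code that already uses the native Anthropic SDK, the change is the same shape. This is a sketch assuming the /anthropic route hangs off the gateway root; check the drop-in replacement docs for the exact path:

import anthropic

# Point the native Anthropic SDK at Bifrost's Anthropic-compatible endpoint.
# The exact base URL path is an assumption based on the docs linked above.
client = anthropic.Anthropic(
    base_url="http://bifrost:8080/anthropic",
    api_key="vk-acme-abc",
)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "ping"}],
)
print(response.content[0].text)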

Step 5: Port Semantic Caching

LiteLLM has request-level caching with Redis. Bifrost has dual-layer caching with vector similarity. The migration is not 1:1; you are upgrading the cache model.

semantic_cache:
  enabled: true
  vector_store: weaviate
  weaviate_url: ${WEAVIATE_URL}
  similarity_threshold: 0.92
  conversation_history_threshold: 3
  ttl_seconds: 86400

Every request needs the x-bf-cache-key header to participate in caching. In application code:

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    extra_headers={"x-bf-cache-key": f"customer-{tenant_id}"}
)

The semantic caching docs cover threshold tuning and per-request overrides.
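
A quick way to convince yourself the semantic layer is working: send two paraphrases of the same question under one cache key and compare latency. This assumes the config above is live; the timing difference is a rough signal, not a proper test:

import time
from openai import OpenAI

client = OpenAI(base_url="http://bifrost:8080/openai/v1", api_key="vk-acme-abc")

def timed_ask(prompt: str) -> float:
    # Every call carries the cache key so both requests share a cache scope.
    start = time.perf_counter()
    client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        extra_headers={"x-bf-cache-key": "customer-acme"},
    )
    return time.perf_counter() - start

# The second prompt is a paraphrase; it should be served from the
# vector-similarity layer if it clears the 0.92 threshold set above.
print("first :", timed_ask("What is our refund policy for annual plans?"))
print("second:", timed_ask("How do refunds work on yearly subscriptions?"))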

Comparison

Capability | LiteLLM | Bifrost
Latency overhead | ~8ms | 11 microseconds
Throughput | Python-bound | 5,000 RPS single instance
Virtual keys | Yes | Yes
Budget tiers | Single | Four-tier (Customer/Team/VK/Provider)
Semantic caching | Request-level | Dual-layer with vector similarity
MCP gateway | No | Yes
Provider count | 100+ | Major providers + custom
Managed cloud | Yes | No (self-hosted only)

Trade-offs and Limitations

Bifrost is self-hosted only. If you are on LiteLLM Cloud today, migrating means taking on infrastructure operations yourself.

The provider catalog is smaller. LiteLLM supports 100+ providers out of the box. Bifrost covers the major ones (OpenAI, Anthropic, Gemini, Bedrock, Vertex, Azure OpenAI, Ollama, Together, Groq, Cohere) and lets you add custom providers, but if you depend on a niche provider, check the list before migrating.

The community is smaller and the project is newer. Documentation is solid but Stack Overflow answers and community plugins are still building up.

OpenRouter compatibility is broken because of a tool call streaming issue. If your stack routes through OpenRouter today, you cannot keep that path through Bifrost.

Quick Recap

  • Application code does not change because Bifrost is OpenAI-compatible
  • Provider configs replace LiteLLM model lists, with provider-level filtering via allowed_models
  • Virtual keys translate directly, but the four-tier budget hierarchy is new
  • Semantic caching upgrades from request-level to dual-layer with vector similarity
  • Run side by side first, cutover by changing the base URL, roll back instantly if needed
