TL;DR: I migrated a production LLM workload from LiteLLM to Bifrost and the swap took about 30 minutes for the gateway, plus a few config translations. The OpenAI-compatible endpoint means application code did not change. This post walks through the full migration: config mapping, virtual key translation, semantic cache porting, and the gotchas I hit.
This post assumes familiarity with LiteLLM proxy mode, OpenAI-compatible APIs, and basic Docker or Node.js operations.
Why Teams Are Looking at the Migration
LiteLLM is the most widely adopted open-source LLM gateway and covers the breadth case well. The reasons I see teams move to Bifrost usually come down to one of three things:
- Latency overhead. LiteLLM proxy adds roughly 8 milliseconds per request. Bifrost adds about 11 microseconds at P99 under 5k RPS, which is orders of magnitude lower. For high-throughput agent workloads that matters.
- MCP support. LiteLLM does not have MCP gateway functionality. If you are running Claude Code or building agentic workflows that hit dozens of tool servers, that gap shows up fast.
- Dual-layer semantic caching. Bifrost ships exact match plus vector similarity caching with Weaviate, Redis, or Qdrant. LiteLLM has request-level caching but not the same dual-layer model.
I will walk through the migration assuming you have LiteLLM running today and want to switch without rewriting application code.
Step 1: Run Bifrost Side by Side
Before touching any application config, get Bifrost running on a different port. Bifrost defaults to 8080, so I leave that alone and keep LiteLLM on its existing port (4000 by default).
npx -y @maximhq/bifrost
That is the entire setup command for local testing. For production, the Docker option works:
docker run -p 8080:8080 maximhq/bifrost:latest
The Bifrost setup docs cover persistent volumes and configuration mounting if you need them.
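Before routing any real traffic, it is worth confirming the new gateway actually answers a request. Here is a minimal smoke test, assuming Bifrost is on localhost:8080 with an OpenAI provider configured; the model and the vk-test key are placeholders, swap in whatever you set up:

# Quick smoke test against the side-by-side Bifrost instance before any cutover.
from openai import OpenAI

bifrost = OpenAI(
    base_url="http://localhost:8080/openai/v1",
    api_key="vk-test",  # placeholder: any virtual key you created for testing
)

resp = bifrost.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)

If this returns a completion, the gateway and provider keys are wired up and you can move on to translating configs.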
Step 2: Translate Provider Configs
LiteLLM uses a config.yaml with model_list entries; Bifrost's config is organized around providers rather than individual models.
LiteLLM config:
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet-4-6
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY
Bifrost equivalent:
providers:
  - name: openai
    api_key: ${OPENAI_API_KEY}
    allowed_models: ["gpt-4o", "gpt-4o-mini"]
    weight: 1.0
  - name: anthropic
    api_key: ${ANTHROPIC_API_KEY}
    allowed_models: ["claude-sonnet-4-6", "claude-opus-4-7"]
    weight: 1.0
The mental shift: in LiteLLM, you list every model. In Bifrost, you list providers and use allowed_models to filter. The provider configuration docs cover the full schema.
One thing to watch: Bifrost is deny-by-default. If you forget to add a provider, every request to that provider returns a clear error. With LiteLLM, missing model entries return 404, which is harder to debug.
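If you have a long model_list, the translation is mechanical enough to script. Here is a rough sketch that groups LiteLLM entries by provider prefix and emits a Bifrost-style providers list. The litellm_config.yaml filename is an assumption, and the output fields just mirror the examples above, so treat it as a starting point rather than a full schema mapping:

# Rough sketch: group LiteLLM model_list entries by provider prefix and emit a
# Bifrost-style providers block. Adjust field names to match your actual config.
from collections import defaultdict

import yaml  # pip install pyyaml

with open("litellm_config.yaml") as f:
    litellm_cfg = yaml.safe_load(f)

providers = defaultdict(lambda: {"allowed_models": []})
for entry in litellm_cfg["model_list"]:
    provider, _, model = entry["litellm_params"]["model"].partition("/")
    providers[provider]["name"] = provider
    providers[provider]["api_key"] = f"${{{provider.upper()}_API_KEY}}"
    providers[provider]["allowed_models"].append(model)
    providers[provider]["weight"] = 1.0

print(yaml.safe_dump({"providers": list(providers.values())}, sort_keys=False))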
Step 3: Translate Virtual Keys and Budgets
LiteLLM virtual keys map to Bifrost virtual keys, but the budget model is different.
LiteLLM:
virtual_keys:
  - key_name: customer-acme
    key: sk-acme-abc
    max_budget: 100.00
    rpm_limit: 200
Bifrost:
virtual_keys:
  - key_name: customer-acme
    key: vk-acme-abc
    rate_limit:
      request_limit: 200
      request_limit_duration: "1h"
      token_limit: 500000
      token_limit_duration: "1d"
    budget_limit: 100.00
    budget_duration: "1M"
    allowed_models: ["gpt-4o", "claude-sonnet-4-6"]
The big addition is the four-tier budget hierarchy. Customer, Team, Virtual Key, and Provider Config limits all apply independently. A request must pass all four. Reset durations are calendar-aligned for 1d, 1w, 1M, 1Y in UTC, which matters if your billing aligns to calendar months. The budget and limits docs cover this in detail.
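To make the hierarchy concrete, here is a small illustrative sketch of the check, not Bifrost's actual code: a request has to fit inside the remaining budget of every tier, so a single exhausted tier blocks it even when the others have headroom.

# Illustrative only: the gateway enforces this internally, but this is the shape
# of the four-tier check.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    spent: float
    budget_limit: float

def request_allowed(tiers: list[Tier], estimated_cost: float) -> bool:
    # Customer, Team, Virtual Key, and Provider Config limits apply independently.
    return all(t.spent + estimated_cost <= t.budget_limit for t in tiers)

tiers = [
    Tier("customer", 92.10, 100.00),
    Tier("team", 410.00, 500.00),
    Tier("virtual_key", 99.95, 100.00),
    Tier("provider", 1200.00, 2000.00),
]
print(request_allowed(tiers, estimated_cost=0.12))  # False: the virtual key is nearly spent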
Step 4: Update Application Endpoints
Both gateways expose OpenAI-compatible endpoints, so most application code does not change. Only the base URL switches.
# Before (LiteLLM)
from openai import OpenAI

client = OpenAI(
    base_url="http://litellm:4000/v1",
    api_key="sk-acme-abc"
)

# After (Bifrost)
client = OpenAI(
    base_url="http://bifrost:8080/openai/v1",
    api_key="vk-acme-abc"
)
Bifrost also exposes /anthropic and /genai endpoints if you want to keep using the native Anthropic or Gemini SDKs. The drop-in replacement docs cover the full endpoint matrix.
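Since the base URL is the only thing that changes, I keep it in the environment so the cutover (and any rollback) is a config change rather than a deploy. A minimal sketch; GATEWAY_BASE_URL and GATEWAY_API_KEY are names I made up for this example:

# Read the gateway base URL from the environment so switching between LiteLLM
# and Bifrost never touches application code.
import os

from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("GATEWAY_BASE_URL", "http://litellm:4000/v1"),
    api_key=os.environ["GATEWAY_API_KEY"],
)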
Step 5: Port Semantic Caching
LiteLLM has request-level caching with Redis. Bifrost has dual-layer caching with vector similarity. The migration is not 1:1; you are upgrading the cache model.
semantic_cache:
  enabled: true
  vector_store: weaviate
  weaviate_url: ${WEAVIATE_URL}
  similarity_threshold: 0.92
  conversation_history_threshold: 3
  ttl_seconds: 86400
Every request needs the x-bf-cache-key header to participate in caching. In application code:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    extra_headers={"x-bf-cache-key": f"customer-{tenant_id}"}
)
The semantic caching docs cover threshold tuning and per-request overrides.
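A rough way to confirm caching is actually working after the cutover is to send the same prompt twice under the same cache key and compare wall-clock latency; the second call should come back dramatically faster on a hit. This reuses the client from Step 4, and the prompt and cache key are just examples:

# Send the same prompt twice with the same x-bf-cache-key and compare timings.
import time

def timed_completion(client, prompt, cache_key):
    start = time.perf_counter()
    client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        extra_headers={"x-bf-cache-key": cache_key},
    )
    return time.perf_counter() - start

cold = timed_completion(client, "Summarize our refund policy.", "customer-acme")
warm = timed_completion(client, "Summarize our refund policy.", "customer-acme")
print(f"cold: {cold:.2f}s, warm: {warm:.2f}s")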
Comparison
| Capability | LiteLLM | Bifrost |
|---|---|---|
| Latency overhead | ~8 ms | ~11 µs (P99 at 5k RPS) |
| Throughput | Python-bound | 5,000 RPS single instance |
| Virtual keys | Yes | Yes |
| Budget tiers | Single | Four-tier (Customer/Team/VK/Provider) |
| Semantic caching | Request-level | Dual-layer with vector similarity |
| MCP gateway | No | Yes |
| Provider count | 100+ | Major providers + custom |
| Managed cloud | Yes | No (self-hosted only) |
Trade-offs and Limitations
Bifrost is self-hosted only. If you are using LiteLLM Cloud, you have to take on infrastructure operations yourself.
The provider catalog is smaller. LiteLLM supports 100+ providers out of the box. Bifrost covers the major ones (OpenAI, Anthropic, Gemini, Bedrock, Vertex, Azure OpenAI, Ollama, Together, Groq, Cohere) and lets you add custom providers, but if you depend on a niche provider, check the list before migrating.
The community is smaller and the project is newer. Documentation is solid but Stack Overflow answers and community plugins are still building up.
OpenRouter compatibility is broken because of a tool call streaming issue. If your stack routes through OpenRouter today, you cannot keep that path through Bifrost.
Quick Recap
- Application code does not change because Bifrost is OpenAI-compatible
- Provider configs replace LiteLLM model lists, with provider-level filtering via allowed_models
- Virtual keys translate directly, but the four-tier budget hierarchy is new
- Semantic caching upgrades from request-level to dual-layer with vector similarity
- Run side by side first, cut over by changing the base URL, and roll back instantly if needed
Links
- Bifrost on GitHub: https://git.new/bifrost
- Bifrost Website: https://getmax.im/bifrost-home
- Bifrost Docs: https://getmax.im/bifrostdocs
Further Reading
- https://docs.getbifrost.ai/quickstart/gateway/setting-up
- https://docs.getbifrost.ai/quickstart/gateway/provider-configuration
- https://docs.getbifrost.ai/features/governance/virtual-keys
- https://docs.getbifrost.ai/features/governance/budget-and-limits
- https://docs.getbifrost.ai/features/semantic-caching