Paul Twist

Posted on Jun 23

LiteLLM vs Bifrost: I Tested Both in Production. Here's What Actually Matters.

#ai #webdev #programming #discuss

I spent two weeks running LiteLLM and Bifrost side by side. Same traffic, same models, same infra. I needed to pick one gateway for our team and I wanted real numbers, not marketing pages.

This is what I found.

The Setup

Both gateways sat behind the same load balancer. Traffic split 50/50. Backend was a mix of OpenAI, Anthropic, and Bedrock calls. Nothing synthetic. Real user-facing requests from our agent platform, roughly 200-400 RPS during business hours.

I tested on c5.xlarge instances (4 vCPUs, 8GB RAM). Not the t3.medium you see in most benchmarks. If you're choosing a production gateway, you should test on production hardware.

Providers: 100+ vs 23

This was the first filter. LiteLLM supports 100+ providers. Bifrost supports around 23.

For most teams running OpenAI and Anthropic, 23 is enough. But we also route to Bedrock, Vertex, Groq, DeepSeek, and a few custom OpenAI-compatible endpoints. LiteLLM handled all of them with the same config pattern:

model_list:
  - model_name: fast-chat
    litellm_params:
      model: groq/llama-3.1-70b-versatile
  - model_name: fast-chat
    litellm_params:
      model: deepseek/deepseek-chat
  - model_name: fast-chat
    litellm_params:
      model: openai/gpt-4o-mini

Three providers, one model name, automatic load balancing. Adding a new provider is one YAML block. With Bifrost, some of our providers simply weren't supported. That was a dealbreaker before we even got to performance.
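To make "one model name, several deployments" concrete, here is a minimal Python sketch of random selection across deployments that share a name. It is not LiteLLM's actual router code; `MODEL_LIST` and `pick_deployment` are hypothetical names that only mirror the shape of the YAML above.

```python
import random
from collections import Counter

# Hypothetical in-memory mirror of the model_list config shown above.
MODEL_LIST = [
    {"model_name": "fast-chat", "litellm_params": {"model": "groq/llama-3.1-70b-versatile"}},
    {"model_name": "fast-chat", "litellm_params": {"model": "deepseek/deepseek-chat"}},
    {"model_name": "fast-chat", "litellm_params": {"model": "openai/gpt-4o-mini"}},
]


def pick_deployment(model_list, model_name, rng=random):
    """Pick one deployment uniformly at random among those sharing model_name."""
    candidates = [d for d in model_list if d["model_name"] == model_name]
    if not candidates:
        raise ValueError(f"no deployment registered for {model_name!r}")
    return rng.choice(candidates)["litellm_params"]["model"]


# Seeded generator so repeated runs give the same distribution.
rng = random.Random(0)
counts = Counter(pick_deployment(MODEL_LIST, "fast-chat", rng) for _ in range(3000))
for model, n in counts.most_common():
    print(f"{model}: {n}")
```

The caller only ever asks for `fast-chat`; which provider answers is decided per request.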

Performance: The Honest Version

Bifrost is faster on raw gateway overhead. That's not marketing, it's just Go vs Python. Their benchmark claims 11µs overhead at 5K RPS. I measured around 0.08ms (80µs) on my hardware, which is still excellent.

LiteLLM's Python proxy added roughly 7-8ms overhead per request. On a single instance at 1K RPS, Bifrost is measurably faster.

But here's what every Bifrost benchmark leaves out: the actual LLM call takes 500ms to 30 seconds. That 7ms overhead is roughly 1.4% of total latency on a 500ms call, about 0.3% on a 2-second call, and effectively invisible on a 30-second one. I wrote about this in my latency post.
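The arithmetic is easy to check. A small sketch, assuming total latency is simply gateway overhead plus the LLM call time (`overhead_share` is a name made up for this example):

```python
def overhead_share(overhead_ms: float, llm_ms: float) -> float:
    """Fraction of end-to-end latency spent inside the gateway."""
    return overhead_ms / (overhead_ms + llm_ms)


# 7 ms of gateway overhead against LLM calls of different lengths.
for llm_ms in (500, 2_000, 30_000):
    share = overhead_share(7, llm_ms)
    print(f"{llm_ms:>6} ms call: {share:.2%} of total latency is gateway overhead")
```

Plug in your own measured overhead and typical call durations before deciding whether the gap matters for your workload.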

And then there's LiteLLM-Rust. The team just shipped a Rust-based gateway path that brings overhead down to 0.05ms, with roughly 15x the throughput of the Python proxy at about one-eleventh of the memory. The single-instance performance gap that Bifrost's entire pitch depends on is closing fast.

# LiteLLM-Rust benchmarks (same workload)
Rust gateway:  6,782 RPS | 32MB RAM  | 0.05ms overhead
Python proxy:    453 RPS | 359MB RAM | 7.5ms overhead
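The headline ratios follow directly from those two rows. A quick check in Python, using only the numbers in the table (the dictionary names are just labels for this example):

```python
# Figures copied from the benchmark table above.
rust = {"rps": 6782, "ram_mb": 32, "overhead_ms": 0.05}
python_proxy = {"rps": 453, "ram_mb": 359, "overhead_ms": 7.5}

throughput_ratio = rust["rps"] / python_proxy["rps"]
memory_ratio = python_proxy["ram_mb"] / rust["ram_mb"]
overhead_ratio = python_proxy["overhead_ms"] / rust["overhead_ms"]

print(f"throughput: {throughput_ratio:.1f}x higher")
print(f"memory:     {memory_ratio:.1f}x lower")
print(f"overhead:   {overhead_ratio:.0f}x lower")
```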

If raw gateway latency is your only criterion, wait three months and re-evaluate.

Spend Tracking: Where It Gets Real

This is where the comparison stops being close. LiteLLM tracks spend automatically across every provider, every key, every team. You get per-key budgets, per-team budgets, daily spend reports, and a UI that shows it all without extra config.

# List spend per key
curl http://localhost:4000/spend/keys \
  -H "Authorization: Bearer sk-admin-key"

# Set a hard budget on a virtual key (resets every 30 days)
curl -X POST http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-admin-key" \
  -H "Content-Type: application/json" \
  -d '{"max_budget": 100.0, "budget_duration": "30d"}'
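If you'd rather script key creation than shell out to curl, the same request can be built with Python's standard library. This is a sketch: the URL and admin key are the placeholders from the curl examples, `build_key_request` is a hypothetical helper, and the `"30d"` duration string is an assumption about the accepted format, so check it against your proxy version.

```python
import json
import urllib.request

PROXY_URL = "http://localhost:4000"  # placeholder proxy address from the curl examples
ADMIN_KEY = "sk-admin-key"           # placeholder admin key from the curl examples


def build_key_request(max_budget: float, budget_duration: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /key/generate with a budget attached."""
    body = json.dumps(
        {"max_budget": max_budget, "budget_duration": budget_duration}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{PROXY_URL}/key/generate",
        data=body,
        headers={
            "Authorization": f"Bearer {ADMIN_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_key_request(100.0, "30d")
print(req.get_method(), req.full_url)
# To actually send it against a running proxy:
#     with urllib.request.urlopen(req) as resp:
#         print(json.load(resp))
```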

Bifrost has virtual keys with budget limits and rate limiting at the key, team, and customer level. It's functional. But LiteLLM's spend tracking goes deeper. You get cost attribution per model, per provider, per deployment. The /global/spend/report endpoint gives you a breakdown your finance team can actually use.

When you're running 10M+ calls a month across 6 providers, "which team spent how much on which model" is not a nice-to-have. It's the question your CTO asks every Monday.
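For a sense of what that Monday question boils down to, here is a small sketch that rolls per-request cost records up by team and model. The records, team names, and field names are all made up for illustration; they are not LiteLLM's actual response schema.

```python
from collections import defaultdict

# Hypothetical per-request spend records (illustrative values only).
records = [
    {"team": "coding-agent", "model": "openai/gpt-4o-mini", "cost_usd": 12.5},
    {"team": "coding-agent", "model": "openai/gpt-4o-mini", "cost_usd": 7.25},
    {"team": "coding-agent", "model": "deepseek/deepseek-chat", "cost_usd": 3.0},
    {"team": "support-bot", "model": "openai/gpt-4o-mini", "cost_usd": 1.5},
]


def spend_by_team_and_model(rows):
    """Sum cost per (team, model) pair."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["team"], row["model"])] += row["cost_usd"]
    return dict(totals)


totals = spend_by_team_and_model(records)
for (team, model), cost in sorted(totals.items()):
    print(f"{team:<14} {model:<24} ${cost:.2f}")
```

The point of built-in spend tracking is that you get this breakdown without having to collect and join the records yourself.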

Routing: More Strategies, More Control

LiteLLM ships five routing strategies out of the box: simple-shuffle, least-busy, latency-based, cost-based, and usage-based. You pick one in your config:

router_settings:
  routing_strategy: latency-based-routing
  routing_strategy_args:
    ttl: 60
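Conceptually, latency-based routing with a TTL means "prefer the deployment with the lowest recent latency, and forget old samples." The sketch below illustrates that idea only. It is not LiteLLM's implementation, it assumes `ttl` behaves like a sliding window in seconds, and `LatencyRouter` is a hypothetical class.

```python
import time
from collections import defaultdict, deque


class LatencyRouter:
    """Route to the deployment with the lowest average latency in the last ttl seconds."""

    def __init__(self, deployments, ttl_seconds=60.0, clock=time.monotonic):
        self.deployments = list(deployments)
        self.ttl = ttl_seconds
        self.clock = clock
        # deployment name -> deque of (timestamp, latency_ms) samples
        self.samples = defaultdict(deque)

    def record(self, deployment, latency_ms):
        """Store one observed latency sample for a deployment."""
        self.samples[deployment].append((self.clock(), latency_ms))

    def _average(self, deployment):
        window = self.samples[deployment]
        cutoff = self.clock() - self.ttl
        # Drop samples older than the TTL window.
        while window and window[0][0] < cutoff:
            window.popleft()
        if not window:
            return 0.0  # no fresh data: rank first so the deployment gets probed
        return sum(latency for _, latency in window) / len(window)

    def pick(self):
        """Return the deployment with the lowest recent average latency."""
        return min(self.deployments, key=self._average)
```

Treating a deployment with no fresh samples as the fastest is one possible design choice: it guarantees every deployment is retried once its stats expire, at the cost of occasionally sending a request to a slow one.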

Bifrost has weighted load balancing and adaptive routing. Solid for distributing traffic across keys and providers. But I couldn't find a cost-based routing option. If you want "always pick the cheapest model that can handle this request," LiteLLM does that natively.
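The "cheapest model that can handle this request" rule is simple to state in code. A minimal sketch, assuming "can handle" means "has a large enough context window"; the deployment names, prices, and limits are invented placeholders, and `pick_cheapest` is not a LiteLLM function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    name: str
    cost_per_1k_tokens: float  # hypothetical price, for illustration only
    max_context_tokens: int    # hypothetical limit, for illustration only


DEPLOYMENTS = [
    Deployment("small-cheap", cost_per_1k_tokens=0.10, max_context_tokens=8_000),
    Deployment("medium", cost_per_1k_tokens=0.50, max_context_tokens=32_000),
    Deployment("large-pricey", cost_per_1k_tokens=2.00, max_context_tokens=128_000),
]


def pick_cheapest(deployments, required_context_tokens):
    """Return the lowest-cost deployment whose context window fits the request."""
    eligible = [d for d in deployments if d.max_context_tokens >= required_context_tokens]
    if not eligible:
        raise ValueError(f"no deployment fits {required_context_tokens} tokens")
    return min(eligible, key=lambda d: d.cost_per_1k_tokens)


print(pick_cheapest(DEPLOYMENTS, 4_000).name)
print(pick_cheapest(DEPLOYMENTS, 100_000).name)
```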

Observability

Bifrost ships with built-in Prometheus metrics, OpenTelemetry, Datadog integration, and their own Maxim observability platform. The built-in logging to SQLite or Postgres is nice for smaller setups.

LiteLLM integrates with Langfuse, Arize Phoenix, LangSmith, Datadog, and generic OpenTelemetry. It's more of a "bring your own observability" approach, which means you're not locked into anyone's dashboard.

Both are solid here. Bifrost has a slightly better out-of-the-box experience. LiteLLM has more integration options.

Community and Ecosystem

LiteLLM: 45K+ GitHub stars. Massive community. Weekly releases. AWS just made it a first-class provider in Bedrock AgentCore. Adobe, Netflix, Spotify run it in production.

Bifrost: ~5.9K stars. Backed by Maxim AI. Active development but smaller community. Last commit was June 8 as of this writing, with a two-week quiet stretch.

The community gap matters when you hit an edge case at 2 AM and need to search GitHub issues.

Where Bifrost Wins

Raw single-instance gateway overhead. If you need absolute minimum latency added per request and your provider list is under 23, Bifrost is genuinely fast. Their MCP Code Mode that reduces token usage for multi-tool agents is also clever engineering. And the zero-config startup experience is clean.

Where LiteLLM Wins

Provider coverage (100+ vs 23). Spend tracking depth. Routing strategy options. Community size and maturity. Enterprise adoption at scale. And LiteLLM-Rust is about to eliminate the performance argument entirely.

My Pick

I went with LiteLLM. The provider coverage was the first filter, the spend tracking was the closer. When your CFO asks "how much did the coding agent team spend on Claude last month," you need a real answer, not a Prometheus query you have to build yourself.

Bifrost is solid engineering. For a team running only OpenAI and Anthropic at moderate scale, it's a legitimate option. But for anything beyond that, the provider breadth and enterprise features in LiteLLM make it the more practical choice.

The "50x faster" benchmark? Run your own test on real hardware with real traffic. The gateway overhead disappears into noise the moment an actual LLM responds.
