Why We Needed This
Our team builds AI evaluation and observability tools at Maxim.
We work with companies running production AI systems, and the same question kept coming up:
“Which LLM gateway should we use?”
So we decided to actually test them.
Not just read docs.
Not just check GitHub stars.
We ran real production workloads through 13 different LLM gateways and measured what actually happens.
What We Tested
We evaluated gateways across five categories:
Performance — latency, throughput, memory usage
Features — routing, caching, observability, failover
Integration — how easy it is to drop into existing code
Cost — pricing model and hidden costs
Production-readiness — stability, monitoring, enterprise features
Test workload:
500 RPS sustained traffic
Mix of GPT-4 and Claude requests
Real customer support queries
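For concreteness, the harness boils down to a rate-limited request loop against each gateway's chat-completions endpoint, recording per-request latency. Here is a minimal Go sketch of that kind of load generator; the endpoint URL, payload, and GATEWAY_API_KEY env var are placeholders rather than any specific gateway's API, and it illustrates the approach, not our exact harness.
// Minimal load generator sketch: sustain a fixed request rate against a
// gateway endpoint and report latency percentiles. URL, payload, and the
// GATEWAY_API_KEY env var are placeholders, not a real gateway's API.
package main

import (
    "bytes"
    "fmt"
    "net/http"
    "os"
    "sort"
    "sync"
    "time"
)

func main() {
    const (
        rps      = 500              // target sustained request rate
        duration = 60 * time.Second // test length
        endpoint = "http://localhost:8080/v1/chat/completions" // placeholder
    )
    payload := []byte(`{"model":"gpt-4","messages":[{"role":"user","content":"Where is my order?"}]}`)

    var (
        mu        sync.Mutex
        latencies []time.Duration
        wg        sync.WaitGroup
    )

    ticker := time.NewTicker(time.Second / rps) // one tick per request slot
    defer ticker.Stop()
    deadline := time.Now().Add(duration)

    for time.Now().Before(deadline) {
        <-ticker.C
        wg.Add(1)
        go func() {
            defer wg.Done()
            start := time.Now()
            req, err := http.NewRequest("POST", endpoint, bytes.NewReader(payload))
            if err != nil {
                return
            }
            req.Header.Set("Content-Type", "application/json")
            req.Header.Set("Authorization", "Bearer "+os.Getenv("GATEWAY_API_KEY"))
            resp, err := http.DefaultClient.Do(req)
            if err != nil {
                return // count only completed requests
            }
            resp.Body.Close()
            mu.Lock()
            latencies = append(latencies, time.Since(start))
            mu.Unlock()
        }()
    }
    wg.Wait()

    if len(latencies) == 0 {
        fmt.Println("no successful requests")
        return
    }
    sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
    pct := func(q float64) time.Duration { return latencies[int(q*float64(len(latencies)-1))] }
    fmt.Printf("requests=%d p50=%v p99=%v\n", len(latencies), pct(0.50), pct(0.99))
}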
The Results (Honest Take)
Tier 1: Production-Ready at Scale
1. Bifrost (Ours — but hear us out)
We built Bifrost because nothing else met our scale requirements.
Pros
Fastest in our tests (~11 μs overhead at 5K RPS)
Rock-solid memory usage (~1.4 GB stable under load)
Semantic caching actually works (see the sketch below)
Adaptive load balancing automatically downweights degraded keys
Open source (MIT)
Cons
Smaller community than LiteLLM
Go-based (great for performance, harder for Python-only teams)
Fewer provider integrations than older tools
Best for:
High-throughput production (500+ RPS), teams prioritizing performance and cost efficiency
Repo: https://github.com/maximhq/bifrost
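Since "semantic caching" can mean different things, here is the idea in miniature: embed each prompt, and when a new prompt lands close enough in embedding space to one already answered, return the cached response instead of calling the provider. The Go sketch below is a toy illustration, not Bifrost's implementation; embed is a stand-in for a real embedding model and the threshold is arbitrary.
// Toy sketch of semantic caching: reuse a cached response when a new
// prompt is semantically close to one we've already answered.
// Not Bifrost's implementation; embed() stands in for a real embedding model.
package main

import (
    "fmt"
    "math"
)

// cacheEntry pairs a prompt embedding with the response it produced.
type cacheEntry struct {
    embedding []float64
    response  string
}

// embed is a toy stand-in for an embedding model call.
func embed(text string) []float64 {
    vec := make([]float64, 16)
    for i, r := range text {
        vec[i%16] += float64(r)
    }
    return vec
}

// cosine returns the cosine similarity of two vectors.
func cosine(a, b []float64) float64 {
    var dot, na, nb float64
    for i := range a {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    return dot / (math.Sqrt(na)*math.Sqrt(nb) + 1e-9)
}

// lookup returns a cached response when a prior prompt is similar enough.
func lookup(cache []cacheEntry, prompt string, threshold float64) (string, bool) {
    q := embed(prompt)
    for _, e := range cache {
        if cosine(q, e.embedding) >= threshold {
            return e.response, true // semantic hit: skip the LLM call
        }
    }
    return "", false // miss: forward to the provider, then cache the result
}

func main() {
    cache := []cacheEntry{
        {embed("How do I reset my password?"), "Go to Settings > Security > Reset password."},
    }
    if resp, ok := lookup(cache, "How can I reset my password?", 0.90); ok {
        fmt.Println("cache hit:", resp)
    } else {
        fmt.Println("cache miss, calling provider")
    }
}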
2. Portkey
Strong commercial offering with solid enterprise features.
Pros
Excellent observability UI
Good multi-provider support
Reliability features (fallbacks, retries)
Enterprise support
Cons
Pricing scales up quickly at volume
Platform lock-in
Some latency overhead vs open source tools
Best for:
Enterprises that want a fully managed solution
3. Kong
API gateway giant with an LLM plugin.
Pros
Battle-tested infrastructure
Massive plugin ecosystem
Enterprise features (auth, rate limiting)
Multi-cloud support
Cons
Complex setup for LLM-specific workflows
Overkill if you just need LLM routing
Steep learning curve
Best for:
Teams already using Kong that want LLM support
Tier 2: Good for Most Use Cases
4. LiteLLM
The most popular open-source option. We used this before Bifrost.
Pros
Huge community
Supports almost every provider
Python-friendly
Easy to get started
Cons
Performance issues above ~300 RPS (we hit this)
Memory usage grows over time
P99 latency spikes under load
Best for:
Prototyping, low-traffic apps (<200 RPS), Python teams
5. Unify
A unified API approach.
Pros
Single API for all providers
Benchmark-driven routing
Good developer experience
Cons
Relatively new
Limited enterprise features
High-scale performance unproven
Best for:
Developers prioritizing simplicity over control
6. Martian
Focused on prompt management and observability.
Pros
Strong prompt versioning
Good observability features
Decent multi-provider support
Cons
Smaller user base
Limited documentation
Pricing unclear at scale
Best for:
Teams prioritizing prompt workflows
Tier 3: Specialized Use Cases
7. OpenRouter
Pay-as-you-go access to many models.
Pros
No API key management
Instant access to many models
Simple pricing
Cons
Markup on model costs
Less routing control
Not ideal for high-volume production
Best for:
Rapid prototyping, model experimentation
8. AI Gateway (Cloudflare)
Part of Cloudflare’s edge platform.
Pros
Runs at the edge
Built-in caching
Familiar Cloudflare dashboard
Cons
Locked into Cloudflare ecosystem
Limited LLM-specific features
Basic routing
Best for:
Teams already heavily using Cloudflare
9. KeyWorthy
Newer entrant focused on cost optimization.
Pros
Cost analytics focus
Multi-provider routing
Usage tracking
Cons
Limited production track record
Smaller feature set
Unknown scaling behavior
Best for:
Cost-conscious teams and early adopters
Tier 4: Niche or Limited
10. Langfuse
More of an observability tool than a gateway.
Pros
Excellent tracing and analytics
Open source
Strong LangChain integration
Cons
Not a true gateway
No routing or caching
Separate deployment
Best for:
Deep observability alongside another gateway
11. MLflow AI Gateway
Part of the MLflow ecosystem.
Pros
Integrates with MLflow workflows
Useful if already using MLflow
Cons
Limited LLM-specific features
Heavy for simple routing
Better alternatives exist
Best for:
ML teams deeply invested in MLflow
12. BricksLLM
Basic open-source gateway.
Pros
Simple setup
Cost tracking
Open source
Cons
Limited feature set
Small community
Performance not battle-tested
Best for:
Very basic gateway needs
13. Helicone
Observability-first with light gateway features.
Pros
Good logging and monitoring
Easy integration
Generous free tier
Cons
More observability than gateway
Limited routing logic
Not built for high throughput
Best for:
Observability-first teams
Our Real Production Stack
We run Bifrost in production for our own infrastructure.
Requirements
Handle 2,000+ RPS during peaks
P99 latency < 500 ms
Predictable costs
Zero manual intervention
What we tried
Direct OpenAI calls → no observability
LiteLLM → broke around 300 RPS
Portkey → great features, higher cost
Bifrost → met all requirements
Current setup
Bifrost (single t3.large)
├─ 3 OpenAI keys (adaptive load balancing)
├─ 2 Anthropic keys (automatic failover)
├─ Semantic caching (40% hit rate)
├─ Maxim observability plugin
└─ Prometheus metrics
Results
2,500 RPS peak, stable
P99: 380 ms
Cost: ~$60/month infra + LLM usage
Uptime: 99.97% (30+ days, no restart)
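To unpack the "adaptive load balancing" across those API keys: each key carries a health score that drops when the key errors or gets rate limited and recovers on success, and requests are distributed in proportion to that score, so a degraded key is automatically downweighted. The Go sketch below is a rough illustration of that idea, not Bifrost's internals; the key names and decay constants are invented.
// Rough sketch of adaptive key weighting: pick keys in proportion to a
// health score that decays on errors and recovers on successes.
// Not Bifrost's internals; names and constants are invented for illustration.
package main

import (
    "fmt"
    "math"
    "math/rand"
)

type apiKey struct {
    name   string
    health float64 // 0..1, doubles as the selection weight
}

// pick chooses a key with probability proportional to its health score.
func pick(keys []apiKey) *apiKey {
    total := 0.0
    for _, k := range keys {
        total += k.health
    }
    r := rand.Float64() * total
    for i := range keys {
        r -= keys[i].health
        if r <= 0 {
            return &keys[i]
        }
    }
    return &keys[len(keys)-1]
}

// report updates a key's health: errors halve it, successes let it recover.
func report(k *apiKey, ok bool) {
    if ok {
        k.health = math.Min(1.0, k.health+0.05)
    } else {
        k.health *= 0.5
    }
}

func main() {
    keys := []apiKey{{"openai-1", 1}, {"openai-2", 1}, {"openai-3", 1}}

    // Simulate traffic where openai-2 keeps failing (e.g. constant 429s):
    // its health collapses, so it stops receiving most of the traffic.
    for i := 0; i < 1000; i++ {
        k := pick(keys)
        report(k, k.name != "openai-2")
    }
    for _, k := range keys {
        fmt.Printf("%s health=%.2f\n", k.name, k.health)
    }
}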
Decision Framework
Under 100 RPS
LiteLLM
Helicone (if observability matters)
OpenRouter
100–500 RPS
Bifrost
Portkey
LiteLLM (watch performance)
500+ RPS
Bifrost
Portkey (if budget allows)
Kong (enterprise needs)
Specialized Needs
Prompt management → Martian
Cloudflare stack → AI Gateway
MLflow ecosystem → MLflow AI Gateway
Observability focus → Langfuse + separate gateway
What Actually Matters
After testing 13 gateways, these matter most:
Performance under your load
Benchmarks lie. Test real traffic. P99 matters more than P50.
Total cost (not list pricing)
Infra + LLM usage + engineering time + lock-in.
Observability
Can you debug failures, latency, and cost?
Reliability
Failover, rate limits, auto-recovery.
Migration path
Can you leave later? Can you self-host?
Our Recommendations
Most teams starting out: LiteLLM → migrate later
High-growth startups: Bifrost or Portkey from day one
Enterprises: Portkey or Kong
Cost-sensitive teams: Bifrost + good monitoring
Try Bifrost
It’s open source (MIT), so you can verify everything:
git clone https://github.com/maximhq/bifrost
cd bifrost
docker compose up
Run benchmarks yourself:
cd benchmarks
./benchmark -provider bifrost -rate 500 -duration 60
Compare with your current setup.
The Honest Truth
There’s no perfect LLM gateway:
LiteLLM: Easy, but doesn’t scale well
Portkey: Feature-rich, expensive at scale
Bifrost: Fast, smaller ecosystem
Kong: Enterprise-grade, complex
Pick based on where you are now, not where you might be.
We went through three gateways before building our own.
Most teams won’t need to.
Links
Bifrost repo: https://github.com/maximhq/bifrost
We’re the team at Maxim AI, building evaluation and observability tools for production AI systems.
Bifrost is our open-source LLM gateway, alongside our testing and monitoring platforms.
