Why We Needed This
Our team builds AI evaluation and observability tools at Maxim.
We work with companies running production AI systems, and the same question kept coming up:
“Which LLM gateway should we use?”
So we decided to actually test them.
Not just read docs.
Not just check GitHub stars.
We ran real production workloads through 13 different LLM gateways and measured what actually happens.
What We Tested
We evaluated gateways across five categories:
Performance — latency, throughput, memory usage
Features — routing, caching, observability, failover
Integration — how easy it is to drop into existing code
Cost — pricing model and hidden costs
Production-readiness — stability, monitoring, enterprise features
Test workload:
500 RPS sustained traffic
Mix of GPT-4 and Claude requests
Real customer support queries
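For concreteness, the harness boils down to a rate-limited request loop against each gateway's chat-completions endpoint, recording per-request latency. Here is a minimal Go sketch of that kind of load generator; the endpoint URL, payload, and GATEWAY_API_KEY env var are placeholders rather than any specific gateway's API, and it illustrates the approach, not our exact harness.
// Minimal load generator sketch: sustain a fixed request rate against a
// gateway endpoint and report latency percentiles. URL, payload, and the
// GATEWAY_API_KEY env var are placeholders, not a real gateway's API.
package main

import (
    "bytes"
    "fmt"
    "net/http"
    "os"
    "sort"
    "sync"
    "time"
)

func main() {
    const (
        rps      = 500              // target sustained request rate
        duration = 60 * time.Second // test length
        endpoint = "http://localhost:8080/v1/chat/completions" // placeholder
    )
    payload := []byte(`{"model":"gpt-4","messages":[{"role":"user","content":"Where is my order?"}]}`)

    var (
        mu        sync.Mutex
        latencies []time.Duration
        wg        sync.WaitGroup
    )

    ticker := time.NewTicker(time.Second / rps) // one tick per request slot
    defer ticker.Stop()
    deadline := time.Now().Add(duration)

    for time.Now().Before(deadline) {
        <-ticker.C
        wg.Add(1)
        go func() {
            defer wg.Done()
            start := time.Now()
            req, err := http.NewRequest("POST", endpoint, bytes.NewReader(payload))
            if err != nil {
                return
            }
            req.Header.Set("Content-Type", "application/json")
            req.Header.Set("Authorization", "Bearer "+os.Getenv("GATEWAY_API_KEY"))
            resp, err := http.DefaultClient.Do(req)
            if err != nil {
                return // count only completed requests
            }
            resp.Body.Close()
            mu.Lock()
            latencies = append(latencies, time.Since(start))
            mu.Unlock()
        }()
    }
    wg.Wait()

    if len(latencies) == 0 {
        fmt.Println("no successful requests")
        return
    }
    sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
    pct := func(q float64) time.Duration { return latencies[int(q*float64(len(latencies)-1))] }
    fmt.Printf("requests=%d p50=%v p99=%v\n", len(latencies), pct(0.50), pct(0.99))
}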
The Results (Honest Take)
Tier 1: Production-Ready at Scale
1. Bifrost (Ours — but hear us out)
We built Bifrost because nothing else met our scale requirements.
Pros
Fastest in our tests (~11 μs overhead at 5K RPS)
Rock-solid memory usage (~1.4 GB stable under load)
Semantic caching actually works (see the sketch below)
Adaptive load balancing automatically downweights degraded keys
Open source (MIT)
Cons
Smaller community than LiteLLM
Go-based (great for performance, harder for Python-only teams)
Fewer provider integrations than older tools
Best for:
High-throughput production (500+ RPS), teams prioritizing performance and cost efficiency
Repo: https://github.com/maximhq/bifrost
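Since "semantic caching" can mean different things, here is the idea in miniature: embed each prompt, and when a new prompt lands close enough in embedding space to one already answered, return the cached response instead of calling the provider. The Go sketch below is a toy illustration, not Bifrost's implementation; embed is a stand-in for a real embedding model and the threshold is arbitrary.
// Toy sketch of semantic caching: reuse a cached response when a new
// prompt is semantically close to one we've already answered.
// Not Bifrost's implementation; embed() stands in for a real embedding model.
package main

import (
    "fmt"
    "math"
)

// cacheEntry pairs a prompt embedding with the response it produced.
type cacheEntry struct {
    embedding []float64
    response  string
}

// embed is a toy stand-in for an embedding model call.
func embed(text string) []float64 {
    vec := make([]float64, 16)
    for i, r := range text {
        vec[i%16] += float64(r)
    }
    return vec
}

// cosine returns the cosine similarity of two vectors.
func cosine(a, b []float64) float64 {
    var dot, na, nb float64
    for i := range a {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    return dot / (math.Sqrt(na)*math.Sqrt(nb) + 1e-9)
}

// lookup returns a cached response when a prior prompt is similar enough.
func lookup(cache []cacheEntry, prompt string, threshold float64) (string, bool) {
    q := embed(prompt)
    for _, e := range cache {
        if cosine(q, e.embedding) >= threshold {
            return e.response, true // semantic hit: skip the LLM call
        }
    }
    return "", false // miss: forward to the provider, then cache the result
}

func main() {
    cache := []cacheEntry{
        {embed("How do I reset my password?"), "Go to Settings > Security > Reset password."},
    }
    if resp, ok := lookup(cache, "How can I reset my password?", 0.90); ok {
        fmt.Println("cache hit:", resp)
    } else {
        fmt.Println("cache miss, calling provider")
    }
}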
2. Portkey
Strong commercial offering with solid enterprise features.
Pros
Excellent observability UI
Good multi-provider support
Reliability features (fallbacks, retries)
Enterprise support
Cons
Pricing scales up quickly at volume
Platform lock-in
Some latency overhead vs open source tools
Best for:
Enterprises that want a fully managed solution
3. Kong
API gateway giant with an LLM plugin.
Pros
Battle-tested infrastructure
Massive plugin ecosystem
Enterprise features (auth, rate limiting)
Multi-cloud support
Cons
Complex setup for LLM-specific workflows
Overkill if you just need LLM routing
Steep learning curve
Best for:
Teams already using Kong that want LLM support
Tier 2: Good for Most Use Cases
4. LiteLLM
The most popular open-source option. We used this before Bifrost.
Pros
Huge community
Supports almost every provider
Python-friendly
Easy to get started
Cons
Performance issues above ~300 RPS (we hit this)
Memory usage grows over time
P99 latency spikes under load
Best for:
Prototyping, low-traffic apps (<200 RPS), Python teams
5. Unify
A unified API approach.
Pros
Single API for all providers
Benchmark-driven routing
Good developer experience
Cons
Relatively new
Limited enterprise features
High-scale performance unproven
Best for:
Developers prioritizing simplicity over control
6. Martian
Focused on prompt management and observability.
Pros
Strong prompt versioning
Good observability features
Decent multi-provider support
Cons
Smaller user base
Limited documentation
Pricing unclear at scale
Best for:
Teams prioritizing prompt workflows
Tier 3: Specialized Use Cases
7. OpenRouter
Pay-as-you-go access to many models.
Pros
No API key management
Instant access to many models
Simple pricing
Cons
Markup on model costs
Less routing control
Not ideal for high-volume production
Best for:
Rapid prototyping, model experimentation
8. AI Gateway (Cloudflare)
Part of Cloudflare’s edge platform.
Pros
Runs at the edge
Built-in caching
Familiar Cloudflare dashboard
Cons
Locked into Cloudflare ecosystem
Limited LLM-specific features
Basic routing
Best for:
Teams already heavily using Cloudflare
9. KeyWorthy
Newer entrant focused on cost optimization.
Pros
Cost analytics focus
Multi-provider routing
Usage tracking
Cons
Limited production track record
Smaller feature set
Unknown scaling behavior
Best for:
Cost-conscious teams and early adopters
Tier 4: Niche or Limited
10. Langfuse
More of an observability tool than a gateway.
Pros
Excellent tracing and analytics
Open source
Strong LangChain integration
Cons
Not a true gateway
No routing or caching
Separate deployment
Best for:
Deep observability alongside another gateway
11. MLflow AI Gateway
Part of the MLflow ecosystem.
Pros
Integrates with MLflow workflows
Useful if already using MLflow
Cons
Limited LLM-specific features
Heavy for simple routing
Better alternatives exist
Best for:
ML teams deeply invested in MLflow
12. BricksLLM
Basic open-source gateway.
Pros
Simple setup
Cost tracking
Open source
Cons
Limited feature set
Small community
Performance not battle-tested
Best for:
Very basic gateway needs
13. Helicone
Observability-first with light gateway features.
Pros
Good logging and monitoring
Easy integration
Generous free tier
Cons
More observability than gateway
Limited routing logic
Not built for high throughput
Best for:
Observability-first teams
Our Real Production Stack
We run Bifrost in production for our own infrastructure.
Requirements
Handle 2,000+ RPS during peaks
P99 latency < 500 ms
Predictable costs
Zero manual intervention
What we tried
Direct OpenAI calls → no observability
LiteLLM → broke around 300 RPS
Portkey → great features, higher cost
Bifrost → met all requirements
Current setup
Bifrost (single t3.large)
├─ 3 OpenAI keys (adaptive load balancing)
├─ 2 Anthropic keys (automatic failover)
├─ Semantic caching (40% hit rate)
├─ Maxim observability plugin
└─ Prometheus metrics
Results
2,500 RPS peak, stable
P99: 380 ms
Cost: ~$60/month infra + LLM usage
Uptime: 99.97% (30+ days, no restart)
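To unpack the "adaptive load balancing" across those API keys: each key carries a health score that drops when the key errors or gets rate limited and recovers on success, and requests are distributed in proportion to that score, so a degraded key is automatically downweighted. The Go sketch below is a rough illustration of that idea, not Bifrost's internals; the key names and decay constants are invented.
// Rough sketch of adaptive key weighting: pick keys in proportion to a
// health score that decays on errors and recovers on successes.
// Not Bifrost's internals; names and constants are invented for illustration.
package main

import (
    "fmt"
    "math"
    "math/rand"
)

type apiKey struct {
    name   string
    health float64 // 0..1, doubles as the selection weight
}

// pick chooses a key with probability proportional to its health score.
func pick(keys []apiKey) *apiKey {
    total := 0.0
    for _, k := range keys {
        total += k.health
    }
    r := rand.Float64() * total
    for i := range keys {
        r -= keys[i].health
        if r <= 0 {
            return &keys[i]
        }
    }
    return &keys[len(keys)-1]
}

// report updates a key's health: errors halve it, successes let it recover.
func report(k *apiKey, ok bool) {
    if ok {
        k.health = math.Min(1.0, k.health+0.05)
    } else {
        k.health *= 0.5
    }
}

func main() {
    keys := []apiKey{{"openai-1", 1}, {"openai-2", 1}, {"openai-3", 1}}

    // Simulate traffic where openai-2 keeps failing (e.g. constant 429s):
    // its health collapses, so it stops receiving most of the traffic.
    for i := 0; i < 1000; i++ {
        k := pick(keys)
        report(k, k.name != "openai-2")
    }
    for _, k := range keys {
        fmt.Printf("%s health=%.2f\n", k.name, k.health)
    }
}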
Decision Framework
Under 100 RPS
LiteLLM
Helicone (if observability matters)
OpenRouter
100–500 RPS
Bifrost
Portkey
LiteLLM (watch performance)
500+ RPS
Bifrost
Portkey (if budget allows)
Kong (enterprise needs)
Specialized Needs
Prompt management → Martian
Cloudflare stack → AI Gateway
MLflow ecosystem → MLflow AI Gateway
Observability focus → Langfuse + separate gateway
What Actually Matters
After testing 13 gateways, these matter most:
Performance under your load
Benchmarks lie. Test real traffic. P99 matters more than P50.
Total cost (not list pricing)
Infra + LLM usage + engineering time + lock-in.
Observability
Can you debug failures, latency, and cost?
Reliability
Failover, rate limits, auto-recovery.
Migration path
Can you leave later? Can you self-host?
Our Recommendations
Most teams starting out: LiteLLM → migrate later
High-growth startups: Bifrost or Portkey from day one
Enterprises: Portkey or Kong
Cost-sensitive teams: Bifrost + good monitoring
Try Bifrost
It’s open source (MIT), so you can verify everything:
git clone https://github.com/maximhq/bifrost
cd bifrost
docker compose up
Run benchmarks yourself:
cd benchmarks
./benchmark -provider bifrost -rate 500 -duration 60
Compare with your current setup.
The Honest Truth
There’s no perfect LLM gateway:
LiteLLM: Easy, but doesn’t scale well
Portkey: Feature-rich, expensive at scale
Bifrost: Fast, smaller ecosystem
Kong: Enterprise-grade, complex
Pick based on where you are now, not where you might be.
We went through three gateways before building our own.
Most teams won’t need to.
Links
Bifrost repo: https://github.com/maximhq/bifrost
We’re the team at Maxim AI, building evaluation and observability tools for production AI systems.
Bifrost is our open-source LLM gateway, alongside our testing and monitoring platforms.
