Debby McKinney

We Evaluated 13 LLM Gateways for Production. Here's What We Found

Why We Needed This

Our team builds AI evaluation and observability tools at Maxim.

We work with companies running production AI systems, and the same question kept coming up:

“Which LLM gateway should we use?”

So we decided to actually test them.

Not just read docs.

Not just check GitHub stars.

We ran real production workloads through 13 different LLM gateways and measured what actually happens.

What We Tested

We evaluated gateways across five categories:

  1. Performance — latency, throughput, memory usage

  2. Features — routing, caching, observability, failover

  3. Integration — how easy it is to drop into existing code

  4. Cost — pricing model and hidden costs

  5. Production-readiness — stability, monitoring, enterprise features

Test workload:

  • 500 RPS sustained traffic

  • Mix of GPT-4 and Claude requests

  • Real customer support queries
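
For reference, here is a minimal sketch of the kind of open-loop load generator we used. The gateway URL, model names, and queries are illustrative placeholders, not our exact harness.

# Minimal open-loop load generator: fires `rate` requests per second for
# `duration` seconds against an OpenAI-compatible gateway endpoint.
# URL, models, and queries are placeholders, not our exact test harness.
import asyncio
import random
import time

import httpx

GATEWAY_URL = "http://localhost:8080/v1/chat/completions"  # placeholder endpoint
MODELS = ["gpt-4", "claude-3-sonnet"]                       # mixed GPT-4 / Claude traffic
QUERIES = ["How do I reset my password?", "Where is my refund?"]  # sample support queries

latencies: list[float] = []

async def one_request(client: httpx.AsyncClient) -> None:
    payload = {
        "model": random.choice(MODELS),
        "messages": [{"role": "user", "content": random.choice(QUERIES)}],
    }
    start = time.perf_counter()
    try:
        await client.post(GATEWAY_URL, json=payload, timeout=30)
    finally:
        latencies.append(time.perf_counter() - start)

async def run(rate: int = 500, duration: int = 60) -> None:
    async with httpx.AsyncClient() as client:
        tasks = []
        for _ in range(duration):
            # launch `rate` requests, then wait out the rest of the second
            tasks += [asyncio.create_task(one_request(client)) for _ in range(rate)]
            await asyncio.sleep(1)
        await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(run())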


The Results (Honest Take)

Tier 1: Production-Ready at Scale

1. Bifrost (Ours — but hear us out)

We built Bifrost because nothing else met our scale requirements.

Pros

  • Fastest in our tests (~11 μs overhead at 5K RPS)

  • Rock-solid memory usage (~1.4 GB stable under load)

  • Semantic caching actually works

  • Adaptive load balancing automatically downweights degraded keys

  • Open source (MIT)

Cons

  • Smaller community than LiteLLM

  • Go-based (great for performance, harder for Python-only teams)

  • Fewer provider integrations than older tools

Best for:

High-throughput production (500+ RPS), teams prioritizing performance and cost efficiency

Repo: https://github.com/maximhq/bifrost
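
If semantic caching is new to you, the idea in a nutshell (a conceptual sketch, not Bifrost's actual implementation): embed each prompt, and when a new prompt is close enough to one you have already answered, return the cached response instead of calling the provider.

# Conceptual sketch of semantic caching (not Bifrost's implementation):
# embed each prompt and serve a cached response when a new prompt is
# "close enough" to one already answered.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class SemanticCache:
    def __init__(self, embed, threshold: float = 0.92):
        self.embed = embed          # any embedding function: str -> list[float]
        self.threshold = threshold  # similarity required for a cache hit
        self.entries: list[tuple[list[float], str]] = []

    def get(self, prompt: str) -> str | None:
        vec = self.embed(prompt)
        best = max(self.entries, key=lambda e: cosine(e[0], vec), default=None)
        if best and cosine(best[0], vec) >= self.threshold:
            return best[1]
        return None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((self.embed(prompt), response))

Cache hits skip the provider call entirely, which is where the latency and cost savings come from.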


2. Portkey

Strong commercial offering with solid enterprise features.

Pros

  • Excellent observability UI

  • Good multi-provider support

  • Reliability features (fallbacks, retries)

  • Enterprise support

Cons

  • Pricing scales up quickly at volume

  • Platform lock-in

  • Some latency overhead vs open source tools

Best for:

Enterprises that want a fully managed solution


3. Kong

API gateway giant with an LLM plugin.

Pros

  • Battle-tested infrastructure

  • Massive plugin ecosystem

  • Enterprise features (auth, rate limiting)

  • Multi-cloud support

Cons

  • Complex setup for LLM-specific workflows

  • Overkill if you just need LLM routing

  • Steep learning curve

Best for:

Teams already using Kong that want LLM support


Tier 2: Good for Most Use Cases

4. LiteLLM

The most popular open-source option. We used this before Bifrost.

Pros

  • Huge community

  • Supports almost every provider

  • Python-friendly

  • Easy to get started

Cons

  • Performance issues above ~300 RPS (we hit this)

  • Memory usage grows over time

  • P99 latency spikes under load

Best for:

Prototyping, low-traffic apps (<200 RPS), Python teams


5. Unify

A unified API approach.

Pros

  • Single API for all providers

  • Benchmark-driven routing

  • Good developer experience

Cons

  • Relatively new

  • Limited enterprise features

  • High-scale performance unproven

Best for:

Developers prioritizing simplicity over control


6. Martian

Focused on prompt management and observability.

Pros

  • Strong prompt versioning

  • Good observability features

  • Decent multi-provider support

Cons

  • Smaller user base

  • Limited documentation

  • Pricing unclear at scale

Best for:

Teams prioritizing prompt workflows


Tier 3: Specialized Use Cases

7. OpenRouter

Pay-as-you-go access to many models.

Pros

  • No API key management

  • Instant access to many models

  • Simple pricing

Cons

  • Markup on model costs

  • Less routing control

  • Not ideal for high-volume production

Best for:

Rapid prototyping, model experimentation


8. AI Gateway (Cloudflare)

Part of Cloudflare’s edge platform.

Pros

  • Runs at the edge

  • Built-in caching

  • Familiar Cloudflare dashboard

Cons

  • Locked into Cloudflare ecosystem

  • Limited LLM-specific features

  • Basic routing

Best for:

Teams already heavily using Cloudflare


9. KeyWorthy

Newer entrant focused on cost optimization.

Pros

  • Cost analytics focus

  • Multi-provider routing

  • Usage tracking

Cons

  • Limited production track record

  • Smaller feature set

  • Unknown scaling behavior

Best for:

Cost-conscious teams and early adopters


Tier 4: Niche or Limited

10. Langfuse

More observability than gateway.

Pros

  • Excellent tracing and analytics

  • Open source

  • Strong LangChain integration

Cons

  • Not a true gateway

  • No routing or caching

  • Separate deployment

Best for:

Deep observability alongside another gateway


11. MLflow AI Gateway

Part of the MLflow ecosystem.

Pros

  • Integrates with MLflow workflows

  • Useful if already using MLflow

Cons

  • Limited LLM-specific features

  • Heavy for simple routing

  • Better alternatives exist

Best for:

ML teams deeply invested in MLflow


12. BricksLLM

Basic open-source gateway.

Pros

  • Simple setup

  • Cost tracking

  • Open source

Cons

  • Limited feature set

  • Small community

  • Performance not battle-tested

Best for:

Very basic gateway needs


13. Helicone

Observability-first with light gateway features.

Pros

  • Good logging and monitoring

  • Easy integration

  • Generous free tier

Cons

  • More observability than gateway

  • Limited routing logic

  • Not built for high throughput

Best for:

Observability-first teams


Our Real Production Stack

We run Bifrost in production for our own infrastructure.

Requirements

  • Handle 2,000+ RPS during peaks

  • P99 latency < 500 ms

  • Predictable costs

  • Zero manual intervention

What we tried

  • Direct OpenAI calls → no observability

  • LiteLLM → broke around 300 RPS

  • Portkey → great features, higher cost

  • Bifrost → met all requirements

Current setup

Bifrost (single t3.large)
├─ 3 OpenAI keys (adaptive load balancing)
├─ 2 Anthropic keys (automatic failover)
├─ Semantic caching (40% hit rate)
├─ Maxim observability plugin
└─ Prometheus metrics
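
"Adaptive load balancing" in that diagram means roughly this (a conceptual sketch, not Bifrost's actual algorithm): each key carries a weight, traffic is spread in proportion to those weights, and a key that starts erroring or hitting rate limits gets downweighted until it recovers.

# Conceptual sketch of adaptive key load balancing (not Bifrost's algorithm):
# pick keys in proportion to a weight, and shrink the weight of keys that fail.
import random

class KeyPool:
    def __init__(self, keys: list[str]):
        self.weights = {k: 1.0 for k in keys}

    def pick(self) -> str:
        keys = list(self.weights)
        return random.choices(keys, weights=[self.weights[k] for k in keys])[0]

    def report(self, key: str, ok: bool) -> None:
        if ok:
            # slowly restore a key that is behaving again
            self.weights[key] = min(1.0, self.weights[key] * 1.05)
        else:
            # sharply downweight a key that errored or hit a rate limit
            self.weights[key] = max(0.05, self.weights[key] * 0.5)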

Results

  • 2,500 RPS peak, stable

  • P99: 380 ms

  • Cost: ~$60/month infra + LLM usage

  • Uptime: 99.97% (30+ days, no restart)


Decision Framework

Under 100 RPS

  • LiteLLM

  • Helicone (if observability matters)

  • OpenRouter

100–500 RPS

  • Bifrost

  • Portkey

  • LiteLLM (watch performance)

500+ RPS

  • Bifrost

  • Portkey (if budget allows)

  • Kong (enterprise needs)

Specialized Needs

  • Prompt management → Martian

  • Cloudflare stack → AI Gateway

  • MLflow ecosystem → MLflow AI Gateway

  • Observability focus → Langfuse + separate gateway


What Actually Matters

After testing 13 gateways, these matter most:

  1. Performance under your load

    Benchmarks lie. Test with your real traffic, and watch P99, not just P50 (see the sketch after this list).

  2. Total cost (not list pricing)

    Infra + LLM usage + engineering time + lock-in.

  3. Observability

    Can you debug failures, latency, and cost?

  4. Reliability

    Failover, rate limits, auto-recovery.

  5. Migration path

    Can you leave later? Can you self-host?
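
On point 1, the measurement is cheap once you log per-request latencies (for example, from the load generator sketched earlier): sort them and read off the tail, not the average.

# Given per-request latencies in seconds, report the percentiles that matter.
def percentile(samples: list[float], p: float) -> float:
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1)))
    return ordered[idx]

def report(latencies: list[float]) -> None:
    print(f"p50: {percentile(latencies, 50) * 1000:.0f} ms")
    print(f"p99: {percentile(latencies, 99) * 1000:.0f} ms")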


Our Recommendations

  • Most teams starting out: LiteLLM → migrate later

  • High-growth startups: Bifrost or Portkey from day one

  • Enterprises: Portkey or Kong

  • Cost-sensitive teams: Bifrost + good monitoring


Try Bifrost

It’s open source (MIT), so you can verify everything:

git clone https://github.com/maximhq/bifrost
cd bifrost
docker compose up

Run benchmarks yourself:

cd benchmarks
./benchmark -provider bifrost -rate 500 -duration 60

Compare with your current setup.
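
If your gateway exposes an OpenAI-compatible endpoint (check the Bifrost docs for the exact path and port; the base URL below is a placeholder), trying it from existing code is mostly a base-URL change:

# Sketch: point the OpenAI Python SDK at a locally running gateway instead of
# api.openai.com. The base_url is a placeholder; use whatever endpoint your
# gateway deployment actually exposes.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # placeholder gateway endpoint
    api_key="not-used-directly",          # provider keys live in the gateway config
)

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello from behind the gateway"}],
)
print(resp.choices[0].message.content)

The nice side effect: once traffic flows through the gateway, swapping providers or keys becomes a gateway config change rather than an application change.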


The Honest Truth

There’s no perfect LLM gateway:

  • LiteLLM: Easy, but doesn’t scale well

  • Portkey: Feature-rich, expensive at scale

  • Bifrost: Fast, smaller ecosystem

  • Kong: Enterprise-grade, complex

Pick based on where you are now, not where you might be.

We went through three gateways before building our own.

Most teams won’t need to.


Links

We’re the team at Maxim AI, building evaluation and observability tools for production AI systems.

Bifrost is our open-source LLM gateway, alongside our testing and monitoring platforms.
