Kuldeep Paul
We built an LLM gateway 50x faster than LiteLLM (and it's open source)

The Problem

If you've ever built production AI applications, you know the pain: you need a gateway to route requests across multiple LLM providers, handle failover, manage rate limits, and get observability. Most teams reach for LiteLLM.

But here's what nobody talks about: LiteLLM becomes a bottleneck at scale.

We hit this wall hard. At 500 RPS, LiteLLM's P99 latency spiked to multiple seconds. In some cases, requests took 4 minutes. For a component that's supposed to be invisible infrastructure, that's unacceptable.

So we built Bifrost.

The Results

After months of engineering, here's what we shipped:

  • 50x faster than LiteLLM (P99 latency)
  • 11μs overhead at 5,000 RPS
  • 68% less memory consumption
  • 100% open source (MIT license)

Not marketing numbers. Real benchmarks on identical hardware, published on GitHub.

Why Go?

We wrote Bifrost in Go because latency matters. Every microsecond your gateway adds is latency your users feel. Python's async overhead and GIL limitations make it fundamentally unsuitable for high-throughput proxy workloads.

Go gives us:

  • True concurrency with goroutines
  • Predictable memory management
  • Native HTTP/2 support
  • Battle-tested standard library

Architecture Highlights

1. Asynchronous everything
Logging, metrics, plugin execution - all non-blocking. Your LLM requests never wait.
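Conceptually, the hot path hands work to a background goroutine instead of doing it inline. Here's a minimal sketch of that pattern in plain Go (illustrative only, not Bifrost's actual internals):

```go
// A non-blocking logging pattern: events go onto a buffered channel and a
// background goroutine drains it, so the request path never waits on I/O.
package main

import (
	"fmt"
	"time"
)

type logEvent struct {
	Provider string
	Latency  time.Duration
}

type asyncLogger struct {
	events chan logEvent
}

func newAsyncLogger() *asyncLogger {
	l := &asyncLogger{events: make(chan logEvent, 1024)}
	go func() {
		for ev := range l.events {
			// Flush off the hot path (stdout here; a metrics backend in practice).
			fmt.Printf("provider=%s latency=%s\n", ev.Provider, ev.Latency)
		}
	}()
	return l
}

// Log never blocks the caller: if the buffer is full, the event is dropped.
func (l *asyncLogger) Log(ev logEvent) {
	select {
	case l.events <- ev:
	default:
	}
}

func main() {
	logger := newAsyncLogger()
	logger.Log(logEvent{Provider: "openai", Latency: 42 * time.Millisecond})
	time.Sleep(10 * time.Millisecond) // give the background goroutine time to flush
}
```

When the buffer fills, dropping an event is cheaper than blocking a request - that trade-off is what keeps the gateway overhead flat.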

2. Plugin system
Extend Bifrost without forking. Pre-hooks and post-hooks for custom logic:

```go
func PreHook(ctx context.Context, req *schemas.CompletionRequest) error {
	// Custom auth, rate limiting, request modification
	return nil
}
```
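The post-hook mirrors this shape on the response side. The signature below is a guess for illustration only (the `schemas.CompletionResponse` type is assumed); check the plugin docs for the real interface:

```go
// Illustrative post-hook shape; the actual interface is defined in the Bifrost plugin docs.
func PostHook(ctx context.Context, req *schemas.CompletionRequest, resp *schemas.CompletionResponse) error {
	// Custom logging, cost attribution, response redaction
	return nil
}
```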

3. Built-in observability
Every request traced automatically. Latency, tokens, costs - captured with zero performance impact.
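In practice, each request yields a structured record roughly along these lines (field names are hypothetical, not Bifrost's actual schema):

```go
// Hypothetical shape of a per-request trace record, for illustration only.
type TraceRecord struct {
	RequestID    string
	Provider     string
	Model        string
	LatencyMs    float64
	InputTokens  int
	OutputTokens int
	CostUSD      float64
}
```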

4. Adaptive load balancing
Automatically adjusts traffic based on API key performance. Degraded keys get less traffic, healthy ones get more.
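The idea is weighted selection: each key carries a health weight and receives traffic in proportion to it. Here's a minimal sketch of that technique (not Bifrost's actual algorithm):

```go
// Weight-based key selection: selection probability is proportional to each
// key's health weight, so degraded keys receive less traffic.
package main

import (
	"fmt"
	"math/rand"
)

type apiKey struct {
	Name   string
	Weight float64 // e.g. derived from recent error rate and latency
}

func pickKey(keys []apiKey) apiKey {
	total := 0.0
	for _, k := range keys {
		total += k.Weight
	}
	r := rand.Float64() * total
	for _, k := range keys {
		r -= k.Weight
		if r <= 0 {
			return k
		}
	}
	return keys[len(keys)-1]
}

func main() {
	keys := []apiKey{
		{Name: "key-healthy", Weight: 1.0},
		{Name: "key-degraded", Weight: 0.2}, // recently rate-limited or slow
	}
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[pickKey(keys).Name]++
	}
	fmt.Println(counts) // the healthy key gets roughly 5x the traffic
}
```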

What Makes It Production-Ready

  • In-VPC deployment: Run entirely in your private cloud
  • Enterprise guardrails: Integrate AWS Bedrock, Azure Content Safety
  • Real-time monitoring: Prometheus metrics out of the box
  • Audit logs: SOC 2 / GDPR compliant logging
  • MCP support: Model Context Protocol for tool calling

Open Source from Day One

We didn't build Bifrost to sell licenses. We built it because we needed it, and we're open-sourcing it because every AI team hits this problem.

⭐ Star Bifrost on GitHub

The repo includes:

  • Complete source code
  • Benchmark suite (run it yourself)
  • Docker compose setup
  • Production deployment guides
  • Comparison tests vs LiteLLM

Getting Started (60 seconds)

```bash
git clone https://github.com/maximhq/bifrost
cd bifrost
docker compose up
```

Add your API keys via the UI at localhost:8080, and you're routing requests.
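From there you can smoke-test the gateway from code. The snippet below assumes an OpenAI-compatible chat completions route at `/v1/chat/completions` on port 8080; confirm the exact path in the docs for your version:

```go
// Quick smoke test against the running gateway. The endpoint path and request
// shape assume an OpenAI-compatible API; adjust to what your Bifrost version exposes.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body := []byte(`{
		"model": "gpt-4o-mini",
		"messages": [{"role": "user", "content": "Say hello"}]
	}`)

	resp, err := http.Post("http://localhost:8080/v1/chat/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}
```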

When to Use Bifrost

Use Bifrost if you:

  • Route requests across multiple LLM providers
  • Need <50ms P99 latency for your gateway
  • Run high-throughput production workloads (1k+ RPS)
  • Want observability without performance cost
  • Need enterprise compliance (SOC 2, HIPAA, GDPR)

Stick with direct API calls if you:

  • Have a simple app with one provider
  • Process <100 requests/day
  • Don't need failover or load balancing

The Benchmark Details

We published full benchmarks you can reproduce:

| Metric | Bifrost | LiteLLM |
|---|---|---|
| P99 latency (500 RPS) | 520ms | 28,000ms |
| Memory (5k RPS) | 1.4GB | 4.3GB |
| Overhead | 11μs | ~600μs |
| Max stable RPS | 5,000+ | <1,000 |

At 500 RPS on identical t3.xlarge instances, LiteLLM breaks. Bifrost keeps running.

Contributing

Bifrost is MIT licensed and we welcome contributions:

  • Plugin development: Share your custom plugins
  • Provider integrations: Add new LLM providers
  • Performance improvements: Help us go faster
  • Documentation: Improve guides and examples

Check out the repo →

What's Next

We're actively working on:

  • Streaming optimization: Even lower latency for streaming responses
  • More guardrails: Additional safety provider integrations
  • Enhanced caching: Semantic caching for cost reduction
  • Community plugins: Plugin marketplace

Try It Today

The fastest way to see the difference:

```bash
# Clone and run
git clone https://github.com/maximhq/bifrost
cd bifrost && docker compose up

# Visit localhost:8080
# Add API keys
# Start routing
```

If Bifrost saves you time (or money on LLM gateway costs), give us a star on GitHub. It helps other teams discover the project.

⭐ Star Bifrost | Read the docs | Join Discord
