Kuldeep Paul
We built an LLM gateway 50x faster than LiteLLM (and it's open source)

The Problem

If you've ever built production AI applications, you know the pain: you need a gateway to route requests across multiple LLM providers, handle failover, manage rate limits, and get observability. Most teams reach for LiteLLM.

But here's what nobody talks about: LiteLLM becomes a bottleneck at scale.

We hit this wall hard. At 500 RPS, LiteLLM's P99 latency spiked to multiple seconds. In some cases, requests took 4 minutes. For a component that's supposed to be invisible infrastructure, that's unacceptable.

So we built Bifrost.

The Results

After months of engineering, here's what we shipped:

  • 50x faster than LiteLLM (P99 latency)
  • 11μs overhead at 5,000 RPS
  • 68% less memory consumption
  • 100% open source (MIT license)

Not marketing numbers. Real benchmarks on identical hardware, published on GitHub.

Why Go?

We wrote Bifrost in Go because latency matters. Every microsecond your gateway adds is latency your users feel. Python's async overhead and GIL limitations make it fundamentally unsuitable for high-throughput proxy workloads.

Go gives us:

  • True concurrency with goroutines
  • Predictable memory management
  • Native HTTP/2 support
  • Battle-tested standard library

Architecture Highlights

1. Asynchronous everything
Logging, metrics, plugin execution - all non-blocking. Your LLM requests never wait.
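Conceptually, the hot path hands work to a background goroutine instead of doing it inline. Here's a minimal sketch of that pattern in plain Go (illustrative only, not Bifrost's actual internals):

```go
// A non-blocking logging pattern: events go onto a buffered channel and a
// background goroutine drains it, so the request path never waits on I/O.
package main

import (
	"fmt"
	"time"
)

type logEvent struct {
	Provider string
	Latency  time.Duration
}

type asyncLogger struct {
	events chan logEvent
}

func newAsyncLogger() *asyncLogger {
	l := &asyncLogger{events: make(chan logEvent, 1024)}
	go func() {
		for ev := range l.events {
			// Flush off the hot path (stdout here; a metrics backend in practice).
			fmt.Printf("provider=%s latency=%s\n", ev.Provider, ev.Latency)
		}
	}()
	return l
}

// Log never blocks the caller: if the buffer is full, the event is dropped.
func (l *asyncLogger) Log(ev logEvent) {
	select {
	case l.events <- ev:
	default:
	}
}

func main() {
	logger := newAsyncLogger()
	logger.Log(logEvent{Provider: "openai", Latency: 42 * time.Millisecond})
	time.Sleep(10 * time.Millisecond) // give the background goroutine time to flush
}
```

When the buffer fills, dropping an event is cheaper than blocking a request - that trade-off is what keeps the gateway overhead flat.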

2. Plugin system
Extend Bifrost without forking. Pre-hooks and post-hooks for custom logic:

```go
func PreHook(ctx context.Context, req *schemas.CompletionRequest) error {
	// Custom auth, rate limiting, request modification
	return nil
}
```
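The post-hook mirrors this shape on the response side. The signature below is a guess for illustration only (the `schemas.CompletionResponse` type is assumed); check the plugin docs for the real interface:

```go
// Illustrative post-hook shape; the actual interface is defined in the Bifrost plugin docs.
func PostHook(ctx context.Context, req *schemas.CompletionRequest, resp *schemas.CompletionResponse) error {
	// Custom logging, cost attribution, response redaction
	return nil
}
```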

3. Built-in observability
Every request traced automatically. Latency, tokens, costs - captured with zero performance impact.
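In practice, each request yields a structured record roughly along these lines (field names are hypothetical, not Bifrost's actual schema):

```go
// Hypothetical shape of a per-request trace record, for illustration only.
type TraceRecord struct {
	RequestID    string
	Provider     string
	Model        string
	LatencyMs    float64
	InputTokens  int
	OutputTokens int
	CostUSD      float64
}
```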

4. Adaptive load balancing
Automatically adjusts traffic based on API key performance. Degraded keys get less traffic, healthy ones get more.
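The idea is weighted selection: each key carries a health weight and receives traffic in proportion to it. Here's a minimal sketch of that technique (not Bifrost's actual algorithm):

```go
// Weight-based key selection: selection probability is proportional to each
// key's health weight, so degraded keys receive less traffic.
package main

import (
	"fmt"
	"math/rand"
)

type apiKey struct {
	Name   string
	Weight float64 // e.g. derived from recent error rate and latency
}

func pickKey(keys []apiKey) apiKey {
	total := 0.0
	for _, k := range keys {
		total += k.Weight
	}
	r := rand.Float64() * total
	for _, k := range keys {
		r -= k.Weight
		if r <= 0 {
			return k
		}
	}
	return keys[len(keys)-1]
}

func main() {
	keys := []apiKey{
		{Name: "key-healthy", Weight: 1.0},
		{Name: "key-degraded", Weight: 0.2}, // recently rate-limited or slow
	}
	counts := map[string]int{}
	for i := 0; i < 10000; i++ {
		counts[pickKey(keys).Name]++
	}
	fmt.Println(counts) // the healthy key gets roughly 5x the traffic
}
```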

What Makes It Production-Ready

  • In-VPC deployment: Run entirely in your private cloud
  • Enterprise guardrails: Integrate AWS Bedrock, Azure Content Safety
  • Real-time monitoring: Prometheus metrics out of the box
  • Audit logs: SOC 2 / GDPR compliant logging
  • MCP support: Model Context Protocol for tool calling

Open Source from Day One

We didn't build Bifrost to sell licenses. We built it because we needed it, and we're open-sourcing it because every AI team hits this problem.

⭐ Star Bifrost on GitHub

The repo includes:

  • Complete source code
  • Benchmark suite (run it yourself)
  • Docker compose setup
  • Production deployment guides
  • Comparison tests vs LiteLLM

Getting Started (60 seconds)

```bash
git clone https://github.com/maximhq/bifrost
cd bifrost
docker compose up
```

Add your API keys via the UI at localhost:8080, and you're routing requests.
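From there you can smoke-test the gateway from code. The snippet below assumes an OpenAI-compatible chat completions route at `/v1/chat/completions` on port 8080; confirm the exact path in the docs for your version:

```go
// Quick smoke test against the running gateway. The endpoint path and request
// shape assume an OpenAI-compatible API; adjust to what your Bifrost version exposes.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body := []byte(`{
		"model": "gpt-4o-mini",
		"messages": [{"role": "user", "content": "Say hello"}]
	}`)

	resp, err := http.Post("http://localhost:8080/v1/chat/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}
```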

When to Use Bifrost

Use Bifrost if you:

  • Route requests across multiple LLM providers
  • Need <50ms P99 latency for your gateway
  • Run high-throughput production workloads (1k+ RPS)
  • Want observability without performance cost
  • Need enterprise compliance (SOC 2, HIPAA, GDPR)

Stick with direct API calls if you:

  • Have a simple app with one provider
  • Process <100 requests/day
  • Don't need failover or load balancing

The Benchmark Details

We published full benchmarks you can reproduce:

| Metric | Bifrost | LiteLLM |
|---|---|---|
| P99 latency (500 RPS) | 520ms | 28,000ms |
| Memory (5k RPS) | 1.4GB | 4.3GB |
| Overhead | 11μs | ~600μs |
| Max stable RPS | 5,000+ | <1,000 |

At 500 RPS on identical t3.xlarge instances, LiteLLM breaks. Bifrost keeps running.

Contributing

Bifrost is MIT licensed and we welcome contributions:

  • Plugin development: Share your custom plugins
  • Provider integrations: Add new LLM providers
  • Performance improvements: Help us go faster
  • Documentation: Improve guides and examples

Check out the repo →

What's Next

We're actively working on:

  • Streaming optimization: Even lower latency for streaming responses
  • More guardrails: Additional safety provider integrations
  • Enhanced caching: Semantic caching for cost reduction
  • Community plugins: Plugin marketplace

Try It Today

The fastest way to see the difference:

```bash
# Clone and run
git clone https://github.com/maximhq/bifrost
cd bifrost && docker compose up

# Visit localhost:8080
# Add API keys
# Start routing
```

If Bifrost saves you time (or money on LLM gateway costs), give us a star on GitHub. It helps other teams discover the project.

⭐ Star Bifrost | Read the docs | Join Discord
