On March 24, 2026, two backdoored versions of LiteLLM (1.82.7 and 1.82.8) were published to PyPI using stolen maintainer credentials. The malware stole SSH keys, AWS/GCP/Azure credentials, and Kubernetes secrets. It deployed persistent backdoors through .pth files. DSPy, MLflow, CrewAI, and OpenHands all pulled the compromised versions as a downstream dependency.
If you're running LiteLLM in production right now, this post is for you.
## TL;DR - Five Alternatives Worth Evaluating
- Bifrost (Go, open-source) - Compiled binary, zero Python supply chain surface
- TensorZero (Rust) - Sub-millisecond overhead, compiled, inference-focused
- Cloudflare AI Gateway - Managed service, no self-hosting
- Kong AI Gateway - Enterprise API gateway with AI routing plugin
- Direct Provider SDKs - Sometimes you don't need a gateway at all
## What Actually Happened
Snyk's detailed writeup covers the full timeline, but here's the short version:
An attacker used stolen PyPI credentials to publish two malicious versions of LiteLLM. The backdoor harvested environment variables, cloud credentials, SSH keys, and K8s secrets from any machine that installed or updated the package. On top of that, 73 compromised GitHub accounts were used to spam the disclosure issue with noise, and the attackers used stolen maintainer credentials to close the issue entirely.
LiteLLM gets over 3.4 million daily downloads. That's a massive blast radius.
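Since the backdoor persisted through `.pth` files, which Python executes on interpreter startup, it's worth knowing what `.pth` files are sitting in your environments. Here's a minimal sketch that enumerates them for manual review. It's a lead generator, not a scanner: legitimate tools (setuptools, editable installs) use `.pth` files too, so a hit means "look at this," not "you're compromised."

```python
# Enumerate .pth files across site-packages for manual review.
# Legitimate packages use .pth files too; treat results as leads.
import site
import sysconfig
from pathlib import Path

def find_pth_files() -> list[Path]:
    """Return every .pth file in the known site-packages directories."""
    # getsitepackages can be absent in some older virtualenvs, so hedge.
    dirs = set(getattr(site, "getsitepackages", lambda: [])())
    dirs.add(sysconfig.get_paths()["purelib"])
    found: list[Path] = []
    for d in dirs:
        p = Path(d)
        if p.is_dir():
            found.extend(sorted(p.glob("*.pth")))
    return found

if __name__ == "__main__":
    for pth in find_pth_files():
        print(pth)
```

Anything you don't recognize, open it: a `.pth` file that contains an `import` line runs arbitrary code every time Python starts.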
## What to Look for in an Alternative
Before jumping into the list, here's what you should be evaluating:
Supply chain risk. Is the tool written in a language with a dependency ecosystem prone to these attacks? Python's PyPI has seen a pattern of supply chain compromises. Compiled languages like Go and Rust produce a single binary with no runtime dependency resolution.
Overhead. If you're routing every LLM call through a proxy, that proxy's latency matters. Anything above 1-2ms per request starts adding up at scale.
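To make "adding up" concrete: agent workloads chain LLM calls sequentially, so per-hop proxy latency multiplies. A back-of-envelope sketch, with illustrative figures rather than benchmarks:

```python
# Back-of-envelope: gateway overhead compounds when LLM calls run
# sequentially, as they do in most agent loops. Figures are
# illustrative, not measured benchmarks.

def pipeline_overhead_ms(per_hop_ms: float, llm_calls: int) -> float:
    """Total gateway-added latency for a sequential call chain."""
    return per_hop_ms * llm_calls

# A 10-step agent loop behind a 2 ms proxy vs an 11 µs one:
python_proxy = pipeline_overhead_ms(2.0, 10)      # 20.0 ms added
compiled_proxy = pipeline_overhead_ms(0.011, 10)  # ~0.11 ms added
```

Twenty milliseconds won't sink a chatbot, but multiply it across thousands of concurrent agent loops and the proxy starts showing up in your p99.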
Open source. Can you audit the code? Can you self-host? Can you vendor the binary?
Multi-provider support. The whole point of a gateway is routing across OpenAI, Anthropic, Bedrock, Azure, and others through a single API.
## The 5 Alternatives

### 1. Bifrost (Go, Open-Source)
This is the one I'd look at first if supply chain security is your primary concern.
Bifrost is written in Go. You get a compiled binary. There's no Python runtime, no pip install, no transitive dependency tree that could get poisoned overnight. You run it via npx -y @maximhq/bifrost or pull the Docker image and you're live in 30 seconds.
From the maximhq/bifrost README: "Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS."

Bifrost unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. You deploy with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise features out of the gate.

The quick start really is under a minute:

Step 1: Start the gateway.

```sh
# Install and run locally
npx -y @maximhq/bifrost

# Or use Docker
docker run -p 8080:8080 maximhq/bifrost
```

Step 2: Configure via the web UI.

```sh
# Open the built-in web interface
open http://localhost:8080
```

Step 3: Make your first API call.

```sh
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'
```

That's it: the gateway is running, with a web interface for visual configuration.
Here's what this means in practice: your LLM gateway has zero exposure to PyPI supply chain attacks. Period.
Performance numbers: 11µs routing overhead per request, benchmarked at 5,000 RPS. That's roughly 50x faster than Python-based proxies.
What you get out of the box:
- Zero-config setup. NPX or Docker, no config files needed. A web UI at localhost:8080 handles provider setup, API keys, and monitoring visually.
- OpenAI-compatible API. Every request follows OpenAI's format. Point your existing code at Bifrost and swap `openai/gpt-4o-mini` for `anthropic/claude-sonnet-4-20250514`; your integration code doesn't change.
- Four-tier budget hierarchy. Set spending limits at the organization, team, project, and individual key level. Useful when multiple teams share one gateway.
- Semantic caching via Weaviate. Cache similar queries so repeated or near-duplicate prompts don't burn tokens.
- MCP support. If you're building agents with Model Context Protocol, Bifrost handles it natively.
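Here's what the "swap the model string" claim looks like in practice. This sketch builds the request by hand with the standard library so you can see the contract directly; it assumes a Bifrost instance at `localhost:8080` and doesn't send anything until you pass the request to `urlopen` yourself.

```python
# Sketch of the OpenAI-compatible contract: the request shape is
# OpenAI's, and switching providers only changes the model string.
# Assumes a local Bifrost at localhost:8080 (nothing is sent here).
import json
from urllib.request import Request

BIFROST_URL = "http://localhost:8080/v1/chat/completions"

def chat_request(model: str, prompt: str) -> Request:
    """Build an OpenAI-format chat completion request for Bifrost."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return Request(
        BIFROST_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Switching providers is a string change, not a code change:
openai_req = chat_request("openai/gpt-4o-mini", "Hello, Bifrost!")
claude_req = chat_request("anthropic/claude-sonnet-4-20250514", "Hello, Bifrost!")
```

The same holds if you use the official `openai` SDK: point its base URL at the gateway and keep everything else.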
Pros:
- Compiled Go binary, immune to Python/PyPI supply chain attacks
- 11µs overhead, 5,000 RPS throughput
- Web UI for visual configuration and monitoring
- Zero-config start (NPX or Docker)
- Open-source, fully auditable
Cons:
- Younger project compared to Kong or Cloudflare
- Smaller community (though growing fast)
### 2. TensorZero (Rust)
TensorZero takes the same "compiled language, no Python runtime" approach but from the Rust side. Sub-millisecond overhead per request. The focus is more on inference optimization and structured generation than on being a full-featured gateway.
If you're specifically looking for inference-time optimizations (prompt caching strategies, structured output enforcement) and want the supply chain benefits of a compiled binary, TensorZero is worth evaluating.
Pros:
- Rust binary, same supply chain safety as Go
- Sub-millisecond latency overhead
- Strong inference optimization features
Cons:
- Narrower scope than a full API gateway
- Smaller ecosystem and fewer integrations
- Less focus on multi-team governance features
### 3. Cloudflare AI Gateway
If you don't want to self-host anything, Cloudflare AI Gateway is the path of least resistance. It runs on Cloudflare's edge network. You get caching, rate limiting, and logging without deploying a single container.
The supply chain question is different here. You're not installing a package. You're calling a managed service. The attack surface shifts from "can someone poison my dependency" to "do I trust Cloudflare's infrastructure." For most teams, that's a trade they're comfortable making.
Pros:
- Zero operational overhead, fully managed
- Global edge network, low latency worldwide
- No self-hosting, no packages to install
- Built-in analytics and logging
Cons:
- Not open-source, you can't audit the routing logic
- Vendor lock-in to Cloudflare's ecosystem
- Limited customization compared to self-hosted options
- Pricing can scale unpredictably at high volume
### 4. Kong AI Gateway
Kong has been doing API gateway management for years. Their AI Gateway plugin adds LLM routing, rate limiting, and auth on top of an already battle-tested platform.
If your team already runs Kong for other API traffic, adding AI routing as a plugin makes sense. You get one gateway for everything. But if you're starting from scratch, Kong's operational complexity is significant. This is an enterprise tool with enterprise setup requirements.
Pros:
- Mature, battle-tested API management platform
- Rich plugin ecosystem for auth, rate limiting, transforms
- Strong enterprise support and documentation
- Single gateway for AI and non-AI API traffic
Cons:
- Heavy operational footprint for a team that just needs LLM routing
- Configuration complexity is high for simple use cases
- Enterprise pricing can be steep
- Overkill if LLM routing is your only need
### 5. Direct Provider SDKs
Sometimes the honest answer is: you don't need a gateway.
If you're calling one provider (say, OpenAI) from one service, adding a proxy in the middle adds latency, complexity, and another thing to monitor. Just use the provider's SDK directly.
You lose multi-provider routing, unified logging, and budget controls. But you gain simplicity and eliminate an entire infrastructure component.
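If you go this route, one cheap hedge is to keep provider calls behind a single seam so adopting a gateway later is a one-function change, not a refactor. A minimal sketch; the provider functions are stand-ins, not real SDK calls:

```python
# One seam between your app and the provider. If you later adopt a
# gateway, only `complete` changes; every call site stays put.
# The provider functions below are stand-ins, not real SDK calls.
from typing import Callable

def openai_complete(prompt: str) -> str:
    # Stand-in for a real openai SDK call.
    return f"[openai] {prompt}"

def anthropic_complete(prompt: str) -> str:
    # Stand-in for a real anthropic SDK call.
    return f"[anthropic] {prompt}"

PROVIDERS: dict[str, Callable[[str], str]] = {
    "openai": openai_complete,
    "anthropic": anthropic_complete,
}

def complete(prompt: str, provider: str = "openai") -> str:
    """The single entry point every caller uses."""
    return PROVIDERS[provider](prompt)
```

It's not failover and it's not budget control, but it keeps the "provider switching requires code changes" cost down to one file.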
Pros:
- Zero added latency or infrastructure
- No additional attack surface
- Simplest possible architecture
- Direct access to provider-specific features
Cons:
- No multi-provider failover
- No unified logging or cost tracking
- Provider switching requires code changes
- No centralized rate limiting or budget controls
## Comparison Table
| Feature | Bifrost | TensorZero | Cloudflare AI GW | Kong AI GW | Direct SDKs |
|---|---|---|---|---|---|
| Language | Go | Rust | Managed | Lua/Go | N/A |
| Supply chain risk | None (binary) | None (binary) | None (managed) | Low | Depends on SDK |
| Latency overhead | 11µs | Sub-ms | Edge-dependent | Plugin-dependent | None |
| Self-hostable | Yes | Yes | No | Yes | N/A |
| Open-source | Yes | Yes | No | Partial | N/A |
| Web UI | Yes | No | Yes | Yes | No |
| Multi-provider | Yes | Yes | Yes | Yes | No |
| Budget controls | Yes (4-tier) | No | Basic | Via plugins | No |
| Setup time | ~30 seconds | Minutes | Minutes | Hours | N/A |
## What You Should Do This Week
Today: Check if you're running LiteLLM versions 1.82.7 or 1.82.8. If yes, rotate every credential that machine had access to. SSH keys, cloud IAM keys, K8s secrets, database passwords. All of it.
This week: Audit your LLM infrastructure dependency tree. Run pip list and check what's pulling in what. Transitive dependencies are where supply chain attacks hide.
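The reason `pip list` alone isn't enough: the dangerous packages are the ones you never asked for, pulled in by the ones you did. A toy transitive-closure walk over a hypothetical requirements map shows the idea (a real audit would read metadata via `importlib.metadata.requires` or parse your lockfile):

```python
# Toy transitive-dependency walk. The `requires` map is hypothetical;
# a real audit would build it from importlib.metadata or a lockfile.

def transitive_deps(package: str, requires: dict[str, list[str]]) -> set[str]:
    """Everything `package` pulls in, directly or indirectly."""
    seen: set[str] = set()
    stack = [package]
    while stack:
        for dep in requires.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

# Hypothetical tree: your app never imports litellm, but gets it anyway.
requires = {
    "my-app": ["crewai"],
    "crewai": ["litellm", "pydantic"],
    "litellm": ["httpx"],
}
deps = transitive_deps("my-app", requires)
# deps includes "litellm" even though my-app never declared it
```

That's exactly how DSPy, MLflow, CrewAI, and OpenHands users got exposed: one hop down the tree.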
This month: Evaluate whether your LLM gateway needs to be a Python package at all. If you're routing production traffic, a compiled binary like Bifrost removes an entire category of risk. It takes 30 seconds to set up and test.
The LiteLLM incident wasn't a theoretical risk. It happened, it hit production systems, and the attackers actively suppressed disclosure. Your LLM gateway sits between your application and every API key you own. Pick one you can trust.