On March 24, 2026, two backdoored versions of LiteLLM (1.82.7 and 1.82.8) were published to PyPI using stolen maintainer credentials. The malware stole SSH keys, AWS/GCP/Azure credentials, and Kubernetes secrets. It deployed persistent backdoors through .pth files. DSPy, MLflow, CrewAI, and OpenHands all pulled the compromised versions as a downstream dependency.
If you're running LiteLLM in production right now, this post is for you.
## TL;DR - Five Alternatives Worth Evaluating
- Bifrost (Go, open-source) - Compiled binary, zero Python supply chain surface
- TensorZero (Rust) - Sub-millisecond overhead, compiled, inference-focused
- Cloudflare AI Gateway - Managed service, no self-hosting
- Kong AI Gateway - Enterprise API gateway with AI routing plugin
- Direct Provider SDKs - Sometimes you don't need a gateway at all
## What Actually Happened
Snyk's detailed writeup covers the full timeline, but here's the short version:
An attacker used stolen PyPI credentials to publish two malicious versions of LiteLLM. The backdoor harvested environment variables, cloud credentials, SSH keys, and K8s secrets from any machine that installed or updated the package. On top of that, 73 compromised GitHub accounts were used to spam the disclosure issue with noise, and the attackers used stolen maintainer credentials to close the issue entirely.
LiteLLM gets over 3.4 million daily downloads. That's a massive blast radius.
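Since the backdoor persisted through `.pth` files, which Python executes on interpreter startup, it's worth knowing what `.pth` files are sitting in your environments. Here's a minimal sketch that enumerates them for manual review. It's a lead generator, not a scanner: legitimate tools (setuptools, editable installs) use `.pth` files too, so a hit means "look at this," not "you're compromised."

```python
# Enumerate .pth files across site-packages for manual review.
# Legitimate packages use .pth files too; treat results as leads.
import site
import sysconfig
from pathlib import Path

def find_pth_files() -> list[Path]:
    """Return every .pth file in the known site-packages directories."""
    # getsitepackages can be absent in some older virtualenvs, so hedge.
    dirs = set(getattr(site, "getsitepackages", lambda: [])())
    dirs.add(sysconfig.get_paths()["purelib"])
    found: list[Path] = []
    for d in dirs:
        p = Path(d)
        if p.is_dir():
            found.extend(sorted(p.glob("*.pth")))
    return found

if __name__ == "__main__":
    for pth in find_pth_files():
        print(pth)
```

Anything you don't recognize, open it: a `.pth` file that contains an `import` line runs arbitrary code every time Python starts.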
## What to Look for in an Alternative
Before jumping into the list, here's what you should be evaluating:
Supply chain risk. Is the tool written in a language with a dependency ecosystem prone to these attacks? Python's PyPI has seen a pattern of supply chain compromises. Compiled languages like Go and Rust produce a single binary with no runtime dependency resolution.
Overhead. If you're routing every LLM call through a proxy, that proxy's latency matters. Anything above 1-2ms per request starts adding up at scale.
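To make "adding up" concrete: agent workloads chain LLM calls sequentially, so per-hop proxy latency multiplies. A back-of-envelope sketch, with illustrative figures rather than benchmarks:

```python
# Back-of-envelope: gateway overhead compounds when LLM calls run
# sequentially, as they do in most agent loops. Figures are
# illustrative, not measured benchmarks.

def pipeline_overhead_ms(per_hop_ms: float, llm_calls: int) -> float:
    """Total gateway-added latency for a sequential call chain."""
    return per_hop_ms * llm_calls

# A 10-step agent loop behind a 2 ms proxy vs an 11 µs one:
python_proxy = pipeline_overhead_ms(2.0, 10)      # 20.0 ms added
compiled_proxy = pipeline_overhead_ms(0.011, 10)  # ~0.11 ms added
```

Twenty milliseconds won't sink a chatbot, but multiply it across thousands of concurrent agent loops and the proxy starts showing up in your p99.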
Open source. Can you audit the code? Can you self-host? Can you vendor the binary?
Multi-provider support. The whole point of a gateway is routing across OpenAI, Anthropic, Bedrock, Azure, and others through a single API.
## The 5 Alternatives

### 1. Bifrost (Go, Open-Source)
This is the one I'd look at first if supply chain security is your primary concern.
Bifrost is written in Go. You get a compiled binary. There's no Python runtime, no pip install, no transitive dependency tree that could get poisoned overnight. You run it via npx -y @maximhq/bifrost or pull the Docker image and you're live in 30 seconds.
From the maximhq/bifrost README: "Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS."

Bifrost unifies access to 15+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, and more) through a single OpenAI-compatible API. You deploy with zero configuration and get automatic failover, load balancing, semantic caching, and enterprise features out of the gate.

The quick start really is under a minute:

Step 1: Start the gateway.

```sh
# Install and run locally
npx -y @maximhq/bifrost

# Or use Docker
docker run -p 8080:8080 maximhq/bifrost
```

Step 2: Configure via the web UI.

```sh
# Open the built-in web interface
open http://localhost:8080
```

Step 3: Make your first API call.

```sh
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'
```

That's it: the gateway is running, with a web interface for visual configuration.
Here's what this means in practice: your LLM gateway has zero exposure to PyPI supply chain attacks. Period.
Performance numbers: 11µs routing overhead per request, benchmarked at 5,000 RPS. That's roughly 50x faster than Python-based proxies.
What you get out of the box:
- Zero-config setup. NPX or Docker, no config files needed. A web UI at localhost:8080 handles provider setup, API keys, and monitoring visually.
- OpenAI-compatible API. Every request follows OpenAI's format. Point your existing code at Bifrost and swap `openai/gpt-4o-mini` for `anthropic/claude-sonnet-4-20250514`; your integration code doesn't change.
- Four-tier budget hierarchy. Set spending limits at the organization, team, project, and individual key level. Useful when multiple teams share one gateway.
- Semantic caching via Weaviate. Cache similar queries so repeated or near-duplicate prompts don't burn tokens.
- MCP support. If you're building agents with Model Context Protocol, Bifrost handles it natively.
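Here's what the "swap the model string" claim looks like in practice. This sketch builds the request by hand with the standard library so you can see the contract directly; it assumes a Bifrost instance at `localhost:8080` and doesn't send anything until you pass the request to `urlopen` yourself.

```python
# Sketch of the OpenAI-compatible contract: the request shape is
# OpenAI's, and switching providers only changes the model string.
# Assumes a local Bifrost at localhost:8080 (nothing is sent here).
import json
from urllib.request import Request

BIFROST_URL = "http://localhost:8080/v1/chat/completions"

def chat_request(model: str, prompt: str) -> Request:
    """Build an OpenAI-format chat completion request for Bifrost."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return Request(
        BIFROST_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Switching providers is a string change, not a code change:
openai_req = chat_request("openai/gpt-4o-mini", "Hello, Bifrost!")
claude_req = chat_request("anthropic/claude-sonnet-4-20250514", "Hello, Bifrost!")
```

The same holds if you use the official `openai` SDK: point its base URL at the gateway and keep everything else.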
Pros:
- Compiled Go binary, immune to Python/PyPI supply chain attacks
- 11µs overhead, 5,000 RPS throughput
- Web UI for visual configuration and monitoring
- Zero-config start (NPX or Docker)
- Open-source, fully auditable
Cons:
- Younger project compared to Kong or Cloudflare
- Smaller community (though growing fast)
### 2. TensorZero (Rust)
TensorZero takes the same "compiled language, no Python runtime" approach but from the Rust side. Sub-millisecond overhead per request. The focus is more on inference optimization and structured generation than on being a full-featured gateway.
If you're specifically looking for inference-time optimizations (prompt caching strategies, structured output enforcement) and want the supply chain benefits of a compiled binary, TensorZero is worth evaluating.
Pros:
- Rust binary, same supply chain safety as Go
- Sub-millisecond latency overhead
- Strong inference optimization features
Cons:
- Narrower scope than a full API gateway
- Smaller ecosystem and fewer integrations
- Less focus on multi-team governance features
### 3. Cloudflare AI Gateway
If you don't want to self-host anything, Cloudflare AI Gateway is the path of least resistance. It runs on Cloudflare's edge network. You get caching, rate limiting, and logging without deploying a single container.
The supply chain question is different here. You're not installing a package. You're calling a managed service. The attack surface shifts from "can someone poison my dependency" to "do I trust Cloudflare's infrastructure." For most teams, that's a trade they're comfortable making.
Pros:
- Zero operational overhead, fully managed
- Global edge network, low latency worldwide
- No self-hosting, no packages to install
- Built-in analytics and logging
Cons:
- Not open-source, you can't audit the routing logic
- Vendor lock-in to Cloudflare's ecosystem
- Limited customization compared to self-hosted options
- Pricing can scale unpredictably at high volume
### 4. Kong AI Gateway
Kong has been doing API gateway management for years. Their AI Gateway plugin adds LLM routing, rate limiting, and auth on top of an already battle-tested platform.
If your team already runs Kong for other API traffic, adding AI routing as a plugin makes sense. You get one gateway for everything. But if you're starting from scratch, Kong's operational complexity is significant. This is an enterprise tool with enterprise setup requirements.
Pros:
- Mature, battle-tested API management platform
- Rich plugin ecosystem for auth, rate limiting, transforms
- Strong enterprise support and documentation
- Single gateway for AI and non-AI API traffic
Cons:
- Heavy operational footprint for a team that just needs LLM routing
- Configuration complexity is high for simple use cases
- Enterprise pricing can be steep
- Overkill if LLM routing is your only need
### 5. Direct Provider SDKs
Sometimes the honest answer is: you don't need a gateway.
If you're calling one provider (say, OpenAI) from one service, adding a proxy in the middle adds latency, complexity, and another thing to monitor. Just use the provider's SDK directly.
You lose multi-provider routing, unified logging, and budget controls. But you gain simplicity and eliminate an entire infrastructure component.
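If you go this route, one cheap hedge is to keep provider calls behind a single seam so adopting a gateway later is a one-function change, not a refactor. A minimal sketch; the provider functions are stand-ins, not real SDK calls:

```python
# One seam between your app and the provider. If you later adopt a
# gateway, only `complete` changes; every call site stays put.
# The provider functions below are stand-ins, not real SDK calls.
from typing import Callable

def openai_complete(prompt: str) -> str:
    # Stand-in for a real openai SDK call.
    return f"[openai] {prompt}"

def anthropic_complete(prompt: str) -> str:
    # Stand-in for a real anthropic SDK call.
    return f"[anthropic] {prompt}"

PROVIDERS: dict[str, Callable[[str], str]] = {
    "openai": openai_complete,
    "anthropic": anthropic_complete,
}

def complete(prompt: str, provider: str = "openai") -> str:
    """The single entry point every caller uses."""
    return PROVIDERS[provider](prompt)
```

It's not failover and it's not budget control, but it keeps the "provider switching requires code changes" cost down to one file.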
Pros:
- Zero added latency or infrastructure
- No additional attack surface
- Simplest possible architecture
- Direct access to provider-specific features
Cons:
- No multi-provider failover
- No unified logging or cost tracking
- Provider switching requires code changes
- No centralized rate limiting or budget controls
## Comparison Table
| Feature | Bifrost | TensorZero | Cloudflare AI GW | Kong AI GW | Direct SDKs |
|---|---|---|---|---|---|
| Language | Go | Rust | Managed | Lua/Go | N/A |
| Supply chain risk | None (binary) | None (binary) | None (managed) | Low | Depends on SDK |
| Latency overhead | 11µs | Sub-ms | Edge-dependent | Plugin-dependent | None |
| Self-hostable | Yes | Yes | No | Yes | N/A |
| Open-source | Yes | Yes | No | Partial | N/A |
| Web UI | Yes | No | Yes | Yes | No |
| Multi-provider | Yes | Yes | Yes | Yes | No |
| Budget controls | Yes (4-tier) | No | Basic | Via plugins | No |
| Setup time | ~30 seconds | Minutes | Minutes | Hours | N/A |
## What You Should Do This Week
Today: Check if you're running LiteLLM versions 1.82.7 or 1.82.8. If yes, rotate every credential that machine had access to. SSH keys, cloud IAM keys, K8s secrets, database passwords. All of it.
This week: Audit your LLM infrastructure dependency tree. Run pip list and check what's pulling in what. Transitive dependencies are where supply chain attacks hide.
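The reason `pip list` alone isn't enough: the dangerous packages are the ones you never asked for, pulled in by the ones you did. A toy transitive-closure walk over a hypothetical requirements map shows the idea (a real audit would read metadata via `importlib.metadata.requires` or parse your lockfile):

```python
# Toy transitive-dependency walk. The `requires` map is hypothetical;
# a real audit would build it from importlib.metadata or a lockfile.

def transitive_deps(package: str, requires: dict[str, list[str]]) -> set[str]:
    """Everything `package` pulls in, directly or indirectly."""
    seen: set[str] = set()
    stack = [package]
    while stack:
        for dep in requires.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

# Hypothetical tree: your app never imports litellm, but gets it anyway.
requires = {
    "my-app": ["crewai"],
    "crewai": ["litellm", "pydantic"],
    "litellm": ["httpx"],
}
deps = transitive_deps("my-app", requires)
# deps includes "litellm" even though my-app never declared it
```

That's exactly how DSPy, MLflow, CrewAI, and OpenHands users got exposed: one hop down the tree.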
This month: Evaluate whether your LLM gateway needs to be a Python package at all. If you're routing production traffic, a compiled binary like Bifrost removes an entire category of risk. It takes 30 seconds to set up and test.
The LiteLLM incident wasn't a theoretical risk. It happened, it hit production systems, and the attackers actively suppressed disclosure. Your LLM gateway sits between your application and every API key you own. Pick one you can trust.