
Pranay Batta


LiteLLM vs Bifrost: Why the Supply Chain Attack Changes Everything for LLM Gateways

If you're running LiteLLM in production, the March 2026 supply chain attack probably got your attention. Mine too. I spent the past few days digging into what happened, why it happened, and what it means for anyone choosing an LLM gateway in 2026.

This is not a hit piece. LiteLLM is a solid project with massive adoption. But this incident exposed something structural that every engineering team needs to think about. And it happens to make the case for Bifrost, a Go-based alternative, in ways that go beyond the usual performance benchmarks.

Let's break it all down.


TL;DR

  • Two backdoored versions of LiteLLM (1.82.7, 1.82.8) were published to PyPI on March 24, 2026, via stolen credentials.
  • The malware stole SSH keys, AWS/GCP/Azure credentials, and Kubernetes secrets. It used Python's .pth persistence mechanism to survive across interpreter restarts.
  • DSPy, MLflow, CrewAI, OpenHands, and Arize Phoenix all pulled the compromised version.
  • Bifrost is a Go-based LLM gateway that compiles to a single binary. The attack vector that hit LiteLLM simply does not exist in its architecture.
  • Beyond security, Bifrost adds 11 microseconds of overhead per request vs LiteLLM's roughly 8ms, supports 20+ providers, offers semantic caching via Weaviate, and has a four-tier budget hierarchy.

What Happened: The Full Attack Chain

Here's the sequence of events, based on Snyk's detailed investigation.

Step 1: The Trivy GitHub Action was compromised. A group called TeamPCP tampered with the widely-used Trivy security scanner GitHub Action.

Step 2: LiteLLM's CI/CD pipeline pulled the compromised Trivy Action. Because LiteLLM's workflow used an unpinned version of the Trivy GitHub Action (not pinned to a specific SHA), the compromised version ran inside LiteLLM's CI environment.

Step 3: The malicious Trivy Action exfiltrated LiteLLM's PYPI_PUBLISH token. With that token, the attackers could publish any package version to PyPI under LiteLLM's name.

Step 4: Two backdoored versions (1.82.7, 1.82.8) were published to PyPI. These looked like normal LiteLLM updates. Anyone running pip install --upgrade litellm got them.

Step 5: The malware deployed a .pth persistence file. This is the part that needs explaining.

What are .pth files?

If you're not deep into Python internals, .pth files might be new to you. They live in Python's site-packages directory, and any line in them that starts with import is executed automatically every time the Python interpreter starts up. Not when you import a specific package. Every single time Python runs, for any script.

The attackers placed a .pth file that loaded their malware on every Python interpreter startup. It did not matter whether your code imported litellm or not. If the package was installed in the environment, the malware was active.
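To make the mechanism concrete, here is a harmless sketch of the same hook. Python's site.addsitedir() processes .pth files exactly the way interpreter startup processes the real site-packages directory: any line beginning with "import" is executed as code, not treated as a path entry.

```python
import os
import site
import tempfile

# A scratch directory standing in for site-packages.
scratch = tempfile.mkdtemp()

# In a .pth file, lines that begin with "import" are EXECUTED as Python.
# This is the hook the attackers abused for persistence.
with open(os.path.join(scratch, "demo.pth"), "w") as f:
    f.write("import os; os.environ['PTH_HOOK_RAN'] = '1'\n")

# site.addsitedir() runs the same .pth processing that happens on startup.
site.addsitedir(scratch)

print(os.environ.get("PTH_HOOK_RAN"))  # -> 1
```

The demo only sets an environment variable, but the executed line could just as easily import and launch an attacker's payload.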

What the malware stole:

  • SSH private keys
  • AWS, GCP, and Azure credentials
  • Kubernetes secrets
  • Crypto wallet keys

Step 6: The attackers used 73 compromised GitHub accounts to spam the disclosure issue with noise, then closed it using stolen maintainer credentials in an attempt to suppress the report.

The backdoored versions were live on PyPI for approximately 3 hours. LiteLLM has 3.4 million+ daily downloads. You can do the math on the blast radius.


Why Architecture Matters More Than You Think

Let's talk about why this specific attack cannot happen to Bifrost.

It is not just "Bifrost is written in Go, so it's safe." That would be a lazy argument. The actual reasons are architectural, and they matter.

No site-packages directory

Python packages install into site-packages. That directory is a shared space where any installed package can drop files, including .pth files that execute on interpreter startup. This is the mechanism the LiteLLM attackers exploited.

Go compiles to a single static binary. There is no site-packages equivalent. There is no shared directory where a compromised dependency could drop a persistence mechanism. The binary is the binary.

No .pth hook mechanism

Python's .pth file execution is a feature, not a bug. It exists for legitimate reasons (configuring import paths, running initialization code). But it also means any package you install can run arbitrary code on every Python startup without your knowledge or consent.

Go has no equivalent mechanism. When you compile a Go binary, what goes in is what comes out. There is no startup hook that third-party code can inject into after compilation.

No transitive pip dependency chain

LiteLLM has a substantial dependency tree. Each of those dependencies has its own dependencies. Each one is a potential attack surface. When you pip install litellm, you're trusting not just the LiteLLM maintainers but every maintainer of every transitive dependency.
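You can inspect the first layer of that tree for any installed package using only the standard library; walking it recursively yields the full transitive set. A rough sketch (the distribution name is whatever you have installed; litellm is just the obvious example here):

```python
import re
from importlib import metadata

def direct_dependencies(package: str) -> list[str]:
    """Declared (direct) dependencies of an installed distribution."""
    try:
        requires = metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return []
    # Keep only the project name, dropping version specifiers and markers.
    names = {re.match(r"[A-Za-z0-9._-]+", r).group(0) for r in requires}
    return sorted(names)

# Each name returned here has its own dependencies, and so on down:
# every maintainer in that tree is someone you are implicitly trusting.
print(direct_dependencies("litellm"))
```

Repeating the call on each returned name gives you the transitive closure, which is the real trust surface of a pip install.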

Bifrost ships as a compiled binary via npx -y @maximhq/bifrost or Docker (docker pull maximhq/bifrost). Dependencies are resolved and compiled at build time by the Bifrost team. You're running a single binary, not managing a dependency tree.

The CI/CD surface area is smaller

The LiteLLM attack started with a compromised GitHub Action in CI/CD. Go binaries distributed via npm or Docker reduce the CI/CD surface area because the compilation and dependency resolution happen upstream, not in your pipeline.

This is not about Go being "more secure" than Python as a language. It's about the deployment model. A compiled binary distributed as a single artifact has a fundamentally smaller attack surface than a package installed via a package manager with a transitive dependency tree and runtime hook mechanisms.


Side-by-Side Feature Comparison

Here's an honest look at both gateways.

| Feature | LiteLLM | Bifrost |
| --- | --- | --- |
| Language | Python | Go |
| Deployment | pip install, Docker | npx, Docker, Go binary |
| Provider support | 100+ providers | 20+ providers (OpenAI, Anthropic, Bedrock, Azure, Gemini, Vertex AI, Groq, Mistral, Cohere, xAI, and more) + custom providers |
| Overhead per request | ~8ms | 11 microseconds |
| Throughput | Varies (Python GIL limits) | 5,000 RPS sustained |
| Caching | Redis-based key-value | Weaviate-powered dual-layer semantic caching |
| Budget management | Basic spend tracking | Four-tier hierarchy (Customer > Team > Virtual Key > Provider Config) |
| MCP support | Limited | Full MCP gateway with four connection types, sub-3ms latency |
| Web UI | Dashboard available | Built-in Web UI for visual setup, monitoring, and governance |
| OpenAI compatibility | Yes | Yes (drop-in replacement, single URL change) |
| Supply chain surface | PyPI + transitive deps + .pth hooks | Single compiled binary |
| Configuration | Config files, environment variables | Zero-config start, Web UI, API, or config.json |

Let me be upfront: LiteLLM's provider count is significantly higher. If you need access to 100+ providers through a single gateway, that is a real advantage. Bifrost supports 20+ providers natively with the ability to add custom providers, which covers most production use cases, but it is not the same breadth.
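The "single URL change" claim is literal: an OpenAI-style client only needs its base URL repointed at the gateway. A minimal sketch, assuming Bifrost's default local address and the standard OpenAI-compatible /v1 route:

```python
# Assumption: Bifrost exposes OpenAI-compatible endpoints under /v1
# at its default local address (localhost:8080).
OPENAI_BASE = "https://api.openai.com/v1"
BIFROST_BASE = "http://localhost:8080/v1"

def chat_completions_url(base_url: str) -> str:
    """Build the chat completions endpoint for a given gateway base URL."""
    return f"{base_url.rstrip('/')}/chat/completions"

# Swapping gateways is just swapping the base URL; the request body,
# headers, and response shape stay the same.
print(chat_completions_url(BIFROST_BASE))  # -> http://localhost:8080/v1/chat/completions
```

In practice you would pass the new base URL to whatever OpenAI SDK or HTTP client you already use; nothing else in the calling code changes.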


Performance Deep Dive: What the Numbers Actually Mean

You'll see "11 microseconds vs 8 milliseconds" in Bifrost's benchmarks. That's roughly a 700x difference. But what does it mean in practice?

Let's do the math at different scales.

At 10,000 requests per day:

  • LiteLLM overhead: 10,000 x 8ms = 80 seconds of cumulative gateway latency
  • Bifrost overhead: 10,000 x 11 microseconds = 0.11 seconds

At 100,000 requests per day:

  • LiteLLM overhead: 100,000 x 8ms = 800 seconds (~13.3 minutes)
  • Bifrost overhead: 100,000 x 11 microseconds = 1.1 seconds

At 1,000,000 requests per day:

  • LiteLLM overhead: 1,000,000 x 8ms = 8,000 seconds (~2.2 hours)
  • Bifrost overhead: 1,000,000 x 11 microseconds = 11 seconds
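The arithmetic above is easy to reproduce. A quick sanity check using the per-request figures quoted in the benchmarks:

```python
LITELLM_OVERHEAD_S = 8e-3   # ~8 ms per request
BIFROST_OVERHEAD_S = 11e-6  # ~11 microseconds per request

# Cumulative gateway overhead at each daily request volume.
for requests_per_day in (10_000, 100_000, 1_000_000):
    litellm_total = requests_per_day * LITELLM_OVERHEAD_S
    bifrost_total = requests_per_day * BIFROST_OVERHEAD_S
    print(f"{requests_per_day:>9,} req/day: "
          f"LiteLLM {litellm_total:8,.1f}s  Bifrost {bifrost_total:6.2f}s")
```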

At low volume, the difference doesn't matter much. Your LLM provider's response time (hundreds of milliseconds to seconds) dwarfs the gateway overhead either way.

But at scale, the difference becomes real. 13 minutes of cumulative latency at 100K requests/day isn't catastrophic, but it adds up across your user base. And 2.2 hours at a million requests/day starts affecting tail latencies and user experience, especially for streaming responses where gateway overhead is felt on every chunk.

The 5,000 RPS sustained throughput from Bifrost also matters. Python's GIL (Global Interpreter Lock) creates a concurrency ceiling that Go simply doesn't have. If you're running high-concurrency workloads, this is a material difference.


Here's What This Means for Your Stack

If you're evaluating LLM gateways right now, the LiteLLM incident should change your evaluation criteria. Not because LiteLLM is bad software, but because it highlighted a category of risk that most teams weren't thinking about.

Questions to ask about any LLM gateway:

  1. What's the dependency footprint? How many transitive dependencies does it pull in? Each one is a potential attack surface.
  2. What's the deployment model? Is it a package you install into your environment, or a standalone binary/container?
  3. Does it have runtime hook mechanisms? Can dependencies execute code at startup without explicit imports?
  4. How is it distributed? Via a package manager with mutable versions, or via immutable artifacts?
  5. What's in the CI/CD chain? Are GitHub Actions pinned by SHA? Are publish tokens scoped and rotated?
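Question 5 can be partially automated. A rough sketch that flags uses: lines in a workflow file whose ref is a mutable tag or branch rather than a 40-character commit SHA (the regexes are approximations, not a full YAML parser):

```python
import re

# A 40-hex-char ref after "@" is a commit SHA; tags and branches are mutable.
USES = re.compile(r"uses:\s*([\w.\-/]+)@([\w.\-]+)")
SHA = re.compile(r"^[0-9a-f]{40}$")

def unpinned_actions(workflow_text: str) -> list[str]:
    """Return actions referenced by mutable ref (tag/branch) instead of SHA."""
    flagged = []
    for match in USES.finditer(workflow_text):
        action, ref = match.groups()
        if not SHA.match(ref):
            flagged.append(f"{action}@{ref}")
    return flagged

example = """
steps:
  - uses: actions/checkout@v4
  - uses: aquasecurity/trivy-action@0123456789abcdef0123456789abcdef01234567
"""
print(unpinned_actions(example))  # -> ['actions/checkout@v4']
```

Run against your workflow files, anything this flags is a ref an attacker could repoint, which is exactly how the Trivy Action compromise reached LiteLLM's pipeline.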

These aren't questions most teams were asking about their LLM gateway a month ago. They should be now.


When LiteLLM Still Makes Sense

I want to be honest about this. There are real scenarios where LiteLLM is the better choice.

  • You need access to 100+ providers. LiteLLM's provider breadth is unmatched. If you're working with niche or specialized providers that Bifrost doesn't support yet, LiteLLM gets you there faster.
  • Your entire stack is Python and you want deep integration. LiteLLM plays well with the Python ML ecosystem. If you're already in that world and need tight integration with LangChain, LlamaIndex, or similar frameworks, LiteLLM fits naturally.
  • You need it as a library, not a gateway. LiteLLM can be imported and used as a Python library within your application code. Bifrost is a standalone gateway service.

If any of these are your primary requirement, LiteLLM may still be right for you. Just audit your versions, pin your dependencies, and check for .pth files in your site-packages directory.
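For the .pth audit, the standard library can enumerate every site-packages directory Python will consult; any line starting with "import" in a file this finds runs on every interpreter startup:

```python
import site
from pathlib import Path

def find_pth_files() -> list[Path]:
    """List .pth files in all site-packages directories Python consults."""
    dirs = list(site.getsitepackages()) + [site.getusersitepackages()]
    found: list[Path] = []
    for d in dirs:
        p = Path(d)
        if p.is_dir():
            found.extend(sorted(p.glob("*.pth")))
    return found

for pth in find_pth_files():
    print(pth)  # inspect anything here you don't recognize
```

Legitimate tools (setuptools, some editable installs) also drop .pth files, so the goal is to review the list, not to expect it to be empty.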


When Bifrost Is the Better Choice

Bifrost wins when your priorities look like this:

  • Security surface area matters to you. If you're in a regulated industry, handle sensitive data, or simply don't want to worry about Python supply chain attacks in your infrastructure layer, a compiled Go binary is a different risk profile entirely.
  • Performance at scale. If you're pushing high request volumes and need minimal gateway overhead, 11 microseconds vs 8 milliseconds is not a rounding error.
  • You want governance out of the box. Bifrost's four-tier budget hierarchy (Customer > Team > Virtual Key > Provider Config) with independent budget checking at each level gives you cost control that's built into the gateway, not bolted on.
  • Semantic caching. Bifrost's Weaviate-powered dual-layer caching understands the meaning of requests, not just exact matches. Similar queries hit the cache even if they're worded differently.
  • MCP gateway support. If you're building agentic applications, Bifrost has native MCP support with four connection types and sub-3ms tool execution latency.
  • Zero-config setup. Run npx -y @maximhq/bifrost and you have a working gateway with a Web UI at localhost:8080. No config files, no environment variables, no setup ceremony.

The Bigger Question

Should your LLM gateway be a Python package at all?

This isn't Python-bashing. Python is great for ML research, data science, prototyping, and application-level code. But your LLM gateway sits in the critical path of every AI request your application makes. It's infrastructure.

Infrastructure components have different requirements than application code. They need to be fast, stable, have minimal dependencies, and present the smallest possible attack surface. This is why web servers, databases, load balancers, and message queues are almost never written in Python. They're written in C, C++, Go, or Rust.

The LiteLLM incident didn't happen because of a bug in LiteLLM's code. It happened because of a structural property of the Python packaging ecosystem. That's a different kind of risk, and it's one that applies to any Python package in your infrastructure layer.


Action Items

If you're currently using LiteLLM:

  1. Check your installed version immediately. Versions 1.82.7 and 1.82.8 are compromised.
  2. Search for .pth files in your Python site-packages directories.
  3. Rotate all credentials that were accessible from environments where LiteLLM was installed (SSH keys, cloud provider credentials, Kubernetes secrets).
  4. Pin your GitHub Actions by SHA, not by tag.
  5. Evaluate whether a compiled gateway is a better fit for your security posture.
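Step 1 of the list above is scriptable. A small check against the two versions named in this post, using only the standard library (the set would need updating if further versions were implicated):

```python
from importlib import metadata

COMPROMISED_VERSIONS = {"1.82.7", "1.82.8"}  # versions named in the disclosure

def is_compromised(version: str) -> bool:
    """True if the given litellm version is one of the known-bad releases."""
    return version in COMPROMISED_VERSIONS

def check_litellm() -> str:
    try:
        installed = metadata.version("litellm")
    except metadata.PackageNotFoundError:
        return "litellm is not installed in this environment"
    if is_compromised(installed):
        return f"COMPROMISED litellm {installed}: rotate credentials now"
    return f"litellm {installed} is not a known-bad version"

print(check_litellm())
```

Run it in every environment where litellm might be installed: CI runners and base images included, not just developer machines.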

If you're evaluating LLM gateways:

  1. Try Bifrost: npx -y @maximhq/bifrost (takes 30 seconds)
  2. Check out the GitHub repo to see the codebase
  3. Read the docs for the full feature set
  4. Visit the website for architecture details

The LLM gateway space is going to look different after this incident. Supply chain security just became an evaluation criterion, and compiled gateways have a structural advantage that no amount of Python dependency scanning can match.
