The LiteLLM supply chain attack on March 24, 2026 was the trigger for this post, but not the only reason I wrote it.
Two backdoored versions (1.82.7 and 1.82.8) were published to PyPI using stolen credentials. The malware stole SSH keys, cloud credentials, and K8s secrets. DSPy, MLflow, CrewAI, and OpenHands all pulled the compromised package. If you missed it, Snyk's full breakdown is worth reading.
I've been using LiteLLM in various projects for over a year. After the incident, I spent a few days auditing my own stack and evaluating alternatives. What I found wasn't just a security problem. It was a pattern of issues that compound at scale.
If you're evaluating LLM gateways right now, Bifrost is worth putting on your shortlist. It's a Go-based open-source LLM gateway that sidesteps most of the problems I'm about to describe.
TL;DR
- The supply chain attack exploited a Python-specific persistence mechanism that doesn't exist in compiled binaries.
- LiteLLM adds ~8ms of latency per request. At scale, that's significant.
- The transitive dependency tree is massive. Every `pip install` is a trust decision.
- Multi-provider configuration gets complex fast.
- Python-based proxies behave differently under production load than in dev.
1. The Supply Chain Attack Wasn't Just a Credential Leak
Let me be clear: credential theft can happen to any project. The attackers (TeamPCP) compromised the Trivy GitHub Action, which LiteLLM's CI/CD pulled without pinning. That gave them LiteLLM's PYPI_PUBLISH token. From there, they published two backdoored versions.
But the persistence mechanism is what's worth paying attention to.
The malware deployed a .pth file. In Python, .pth files in site-packages are processed on every interpreter startup, and any line in them that begins with import is executed verbatim. This is classified as MITRE ATT&CK T1546.018 (Event Triggered Execution). Even if you rolled back to a clean version of LiteLLM, the .pth file stayed. It survived package upgrades. It ran silently every time Python started.
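To make the mechanism concrete, here's a benign sketch of the same trick using only the standard library. It writes a .pth file into a temp directory and triggers the scan the site module normally performs automatically at startup; nothing here touches your real site-packages.

```python
import os
import site
import tempfile

# Benign stand-in for the persistence trick: any line in a .pth file
# that starts with "import" is exec'd when the site module scans the
# directory, which normally happens at every interpreter startup.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "demo.pth"), "w") as f:
    f.write('import os; os.environ["PTH_DEMO_RAN"] = "1"\n')

# Trigger the same scan site-packages gets at startup, but on our temp dir.
site.addsitedir(tmp)

print(os.environ.get("PTH_DEMO_RAN"))  # -> 1: code ran just from scanning the dir
```

Swap the environment-variable line for a credential stealer and you have the attack: no function call, no import of the package itself, just the interpreter starting up.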
The attackers also used 73 compromised GitHub accounts to spam the disclosure issue, then closed it using stolen maintainer credentials, actively suppressing community awareness of the breach.
This wasn't a theoretical risk. LiteLLM has 3.4M+ daily downloads. For roughly 3 hours, every pip install litellm pulled malware that exfiltrated AWS, GCP, Azure credentials, SSH keys, and K8s secrets.
The structural takeaway: .pth persistence is a Python-specific vector. A compiled Go binary has no site-packages directory, no .pth files, no interpreter startup hooks. The attack surface simply doesn't exist.
2. Python Overhead at Scale
LiteLLM adds approximately 8ms of latency per request. For a prototype or a low-traffic internal tool, that's fine. Nobody notices 8ms on a single API call.
At 1,000 requests per second, that's 8 cumulative seconds of added latency per second across your fleet. At 5,000 RPS, you're looking at 40 seconds of overhead per second. In latency-sensitive pipelines (real-time agents, streaming chat, batch evaluation), this compounds into real costs: longer tail latencies, more timeouts, more retries.
For comparison, Bifrost adds 11µs of latency overhead per request. That's roughly a 700x difference. At 5,000 RPS, the total overhead is 0.055 seconds per second.
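The arithmetic behind those numbers is simple enough to sanity-check yourself (the figures come from the comparison above):

```python
def overhead_per_second(rps, per_request_seconds):
    """Cumulative gateway-added latency across all requests in one second."""
    return rps * per_request_seconds

print(round(overhead_per_second(1000, 8e-3), 3))   # 1,000 RPS at 8 ms each: 8.0 s/s
print(round(overhead_per_second(5000, 8e-3), 3))   # 5,000 RPS at 8 ms each: 40.0 s/s
print(round(overhead_per_second(5000, 11e-6), 3))  # 5,000 RPS at 11 us each: 0.055 s/s
```

The per-request number looks harmless; the fleet-wide number is what shows up in your tail latencies.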
This isn't a knock on Python as a language. It's a statement about what happens when you put an interpreted runtime in the critical path of every LLM call your system makes.
3. Transitive Dependency Bloat
Run pip install litellm and check what actually lands in your environment. The dependency tree is deep. Every package in that tree is a trust decision you're making implicitly. Every maintainer of every transitive dependency has potential write access to code that runs in your infrastructure.
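If you want to size that trust surface yourself, here's a rough sketch using only the standard library's importlib.metadata. It walks the declared, non-optional requirements of an installed package recursively; it's an approximation (it reads declared metadata, not what actually imports at runtime), but it makes the tree visible.

```python
import re
from importlib.metadata import requires, PackageNotFoundError

def transitive_deps(pkg, seen=None):
    """Recursively collect declared, non-optional requirements of an
    installed package: a rough map of the implicit trust surface."""
    seen = set() if seen is None else seen
    try:
        reqs = requires(pkg) or []
    except PackageNotFoundError:
        return seen  # not installed, stop walking this branch
    for req in reqs:
        if "extra ==" in req:  # skip optional extras
            continue
        name = re.split(r"[ ;<>=!\[]", req, maxsplit=1)[0]
        if name.lower() not in seen:
            seen.add(name.lower())
            transitive_deps(name, seen)
    return seen

# e.g. len(transitive_deps("litellm")) in an environment where it's installed
print(len(transitive_deps("pip")))
```

Every name that function returns is a maintainer (or several) you're implicitly trusting with code execution in your infrastructure.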
The March 2026 attack proved this isn't theoretical. The compromise didn't start with LiteLLM's own code. It started with a transitive dependency in the CI/CD pipeline (an unpinned GitHub Action). The blast radius spread because the Python packaging ecosystem doesn't provide the kind of isolation that compiled binaries do.
Bifrost ships as a single compiled Go binary. You can also run it via npx or Docker. There's no site-packages directory. No transitive pip dependencies to audit. No .pth files to inspect after an incident.
```shell
# Bifrost: one binary, one trust decision
npx -y @maximhq/bifrost
# or
docker pull maximhq/bifrost
```
This matters most in regulated environments. If your security team needs to audit every dependency in your LLM gateway, the conversation is very different when the answer is "it's one binary" versus "here are 47 transitive Python packages."
4. Configuration Complexity
LiteLLM uses YAML-based configuration for provider routing. For a simple setup with one or two providers, it works. Once you're managing multiple providers with fallbacks, load balancing, and per-model routing, the config files grow fast.
Here's a simplified example of what a multi-provider LiteLLM config looks like:
```yaml
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4-deployment
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE

router_settings:
  routing_strategy: least-busy
  num_retries: 3
  fallbacks:
    - gpt-4: [claude-3-opus]
```
Now multiply this by 10 providers and 30 models. Add custom headers, timeouts per provider, and conditional routing. It becomes a maintenance burden.
Bifrost takes a different approach. It has a web UI for visual provider configuration and supports zero-config deployment. You set environment variables for your provider keys and the gateway auto-discovers available models.
I'm not saying YAML config is inherently bad. But when the config layer becomes the most fragile part of your LLM infrastructure, that's a design signal worth paying attention to.
5. The "It Works in Dev" Problem
This one is subtler, but I've seen it repeatedly. A Python-based proxy that performs fine in development behaves very differently at production concurrency.
Python's GIL (Global Interpreter Lock) means that CPU-bound work in the proxy layer serializes across threads. LiteLLM works around this with async patterns, and to their credit, the async implementation is solid for moderate loads. But under heavy concurrent load, the runtime characteristics shift. Garbage collection pauses, thread scheduling overhead, and memory fragmentation all behave differently at 1,000+ concurrent connections versus 10.
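You can observe the GIL's effect on CPU-bound work directly. The sketch below times a pure-Python loop run twice serially versus on two threads; exact numbers vary by machine and Python version (free-threaded 3.13 builds change the picture), but on a standard CPython build the threaded run typically lands near the serial time rather than half of it.

```python
import threading
import time

def burn(n):
    # Pure-Python CPU-bound loop: under the GIL, only one thread at a
    # time executes this bytecode, so threads add no parallelism here.
    total = 0
    for _ in range(n):
        total += 1
    return total

N = 2_000_000

start = time.perf_counter()
burn(N)
burn(N)
serial = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=burn, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# On a GIL-free runtime you'd expect threaded to be about serial / 2.
print(f"serial={serial:.3f}s threaded={threaded:.3f}s")
```

I/O-bound proxying dodges this (the GIL releases during network waits), which is why async LiteLLM holds up at moderate load; it's the CPU-side work (serialization, routing logic, GC) that serializes as concurrency climbs.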
Go was designed for this. Goroutines are lightweight (a few KB of stack each), the scheduler is preemptive, and there's no GIL equivalent. Bifrost handles 5,000 RPS with 11µs overhead not because of clever optimization but because the language runtime handles concurrency natively.
If your LLM gateway only needs to handle dev-scale traffic, this doesn't matter. If you're routing production traffic for an application with real users, the runtime choice matters a lot.
Quick Comparison
| | LiteLLM | Bifrost |
|---|---|---|
| Language | Python | Go |
| Latency overhead | ~8ms/request | 11µs/request |
| Throughput tested | Varies | 5,000 RPS |
| Deployment | pip install + config | Single binary, npx, or Docker |
| Dependency surface | Large transitive tree | Single compiled binary |
| .pth persistence risk | Yes (Python-specific) | Not applicable |
| Config approach | YAML files | Web UI + env vars |
| License | Open source | Open source |
What I'm Doing About It
After the March 24 incident, I audited every Python dependency in my LLM pipeline. That process alone took the better part of a day.
I've started migrating my LLM routing to Bifrost. The migration wasn't difficult. Bifrost uses the OpenAI-compatible API format, so most client code doesn't change. The main work was moving provider configurations out of YAML files and into Bifrost's setup.
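In practice the client-side change is a one-line base-URL swap. Here's a sketch assuming the gateway exposes an OpenAI-compatible /v1/chat/completions endpoint on localhost:8080 (the host and port are illustrative; check your deployment). The request is built but never sent:

```python
import json
import urllib.request

# Only the base URL changes in the migration; the request body is unchanged.
BASE_URL = "http://localhost:8080/v1"  # previously "https://api.openai.com/v1"

req = urllib.request.Request(
    BASE_URL + "/chat/completions",
    data=json.dumps({
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "ping"}],
    }).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer unused",  # the gateway holds the real provider keys
    },
)
print(req.full_url)  # -> http://localhost:8080/v1/chat/completions
```

Because the wire format is the same, SDKs that accept a custom base URL work without code changes beyond that one setting.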
I still use LiteLLM for some non-critical internal tooling where the dependency risk is acceptable. But for anything in the critical path of production traffic, having a compiled binary with a minimal attack surface is a different category of trade-off.
If you're re-evaluating your LLM infrastructure after this incident, here are the links that helped me:
- Bifrost GitHub: https://git.new/bifrost
- Bifrost Docs: https://getmax.im/bifrostdocs
- Bifrost Homepage: https://getmax.im/bifrost-home
- Snyk Incident Report: https://snyk.io/articles/poisoned-security-scanner-backdooring-litellm/
- MITRE ATT&CK T1546.018: https://attack.mitre.org/techniques/T1546/018/
The numbers and claims in this post are sourced from public documentation, the Snyk incident report, and Bifrost's GitHub repository. If anything here is inaccurate, call it out in the comments and I'll correct it.