Pico

Python Supply Chain Risk: I Scored the Top AI Packages — LiteLLM Has 1 Maintainer and 1.2K Versions

LiteLLM serves 97 million downloads per month. In March 2026, attackers stole a PyPI token, uploaded malicious versions, and compromised an estimated 500,000 machines. The package looked healthy by every conventional metric: high downloads, GitHub stars, active issues.

But behavioral signals told a different story.

The Attack Pattern

The LiteLLM supply chain attack followed what security researchers now call the "pre-staged C2 pattern":

  1. Attackers stole a CI/CD token via a compromised dependency (Trivy → Checkmarx KICS → LiteLLM)
  2. A clean decoy package was uploaded 24 hours before the malicious version
  3. The malicious version was uploaded and distributed before detection
  4. 300GB of credentials were exfiltrated

This exact pattern appeared in the Axios npm attack months earlier. The toolchain: Trivy → CanisterWorm npm (66+ packages) → Checkmarx KICS → LiteLLM → Telnyx SDK.

What Download Counts Hide

Here's what the behavioral signal profile looks like for the packages at the center of this:

litellm (PyPI):

  • Age: 2 years
  • Versions published: 1,285 (releases ~daily — high velocity)
  • Downloads: ~3.6M/day (growing)
  • Maintainers: 1
  • GitHub score: 84/100
  • Commitment Score: 72/100

The signal that matters: 1 maintainer, 1.2K versions, daily releases. That's a single point of failure at maximum velocity. One stolen token = instant access.

Compare to requests — the gold standard:

  • Age: 15 years
  • Versions: 154 (measured cadence)
  • Downloads: ~44M/day (growing)
  • Maintainers: 3
  • GitHub score: 90/100
  • Commitment Score: 95/100

Requests has slower release cadence (154 versions vs 1,285), multiple maintainers, and 15 years of behavioral consistency. That's what genuine commitment looks like at scale.
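The velocity gap is easy to reproduce yourself from PyPI's public JSON endpoint, which requires no auth. This is a minimal sketch; the function names and structure are my own, not taken from any particular scorer:

```python
import json
import urllib.request

def releases_per_year(n_versions: int, age_years: float) -> float:
    """Average release velocity over a package's lifetime."""
    return n_versions / max(age_years, 0.1)

def version_count(package: str) -> int:
    """Count published versions via PyPI's public JSON API (no auth needed)."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url) as resp:
        return len(json.load(resp)["releases"])

# Using the numbers above: litellm averages ~640 releases/year,
# requests ~10 -- a roughly 60x velocity gap.
print(releases_per_year(1285, 2))   # 642.5
print(releases_per_year(154, 15))   # ~10.3
```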

Scoring Key Python Packages

I built a behavioral commitment scorer that queries PyPI and pypistats.org with no auth required. Here's where the packages most at risk in the AI supply chain land:

Package     Score     Maintainers   Key Risk
requests    95/100    3             None
fastapi     ~82/100   2             -
pydantic    ~88/100   3             -
openai      74/100    6             No download data
anthropic   52/100    1             Single maintainer
langchain   52/100    1             Single maintainer
litellm     72/100    1             Single maintainer + 1.2K versions
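The pattern in this table reduces to a small heuristic. The sketch below uses illustrative thresholds of my own choosing, not the scorer's actual weights:

```python
from dataclasses import dataclass

@dataclass
class PackageSignals:
    maintainers: int
    versions: int
    age_years: float
    downloads_per_day: float

def risk_flags(s: PackageSignals) -> list[str]:
    """Behavioral red flags. Thresholds are illustrative only."""
    flags = []
    if s.maintainers <= 1:
        flags.append("single maintainer")
    if s.versions / max(s.age_years, 0.1) > 300:  # near-daily release cadence
        flags.append("extreme release velocity")
    if s.downloads_per_day > 1_000_000:
        flags.append("large blast radius")
    return flags

litellm = PackageSignals(maintainers=1, versions=1285, age_years=2,
                         downloads_per_day=3_600_000)
requests_pkg = PackageSignals(maintainers=3, versions=154, age_years=15,
                              downloads_per_day=44_000_000)

print(risk_flags(litellm))       # trips all three flags
print(risk_flags(requests_pkg))  # only blast radius -- popularity alone isn't risk
```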

The AI package ecosystem has a structural problem: most packages powering production AI systems have a single maintainer. One stolen token or one compromised developer account = game over.

Behavioral Signals vs. Declarative Claims

The LiteLLM breach is a perfect case study for why declarative compliance fails:

  • ✅ Published on PyPI (real package)
  • ✅ MIT licensed (declared)
  • ✅ High downloads (popular)
  • ✅ Active development (GitHub)
  • ❌ 1 maintainer at extreme release velocity
  • ❌ Pre-staged malicious version uploaded before detection

Every declarative check passed. The behavioral anomaly — no matching GitHub releases for the malicious version — was the only signal that caught it. Sysdig spotted the discrepancy between PyPI uploads and GitHub releases.
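That discrepancy check is straightforward to automate: it's a set difference between PyPI uploads and GitHub release tags. The version numbers below are hypothetical, used only to show the shape of the check:

```python
def unmatched_pypi_versions(pypi_versions, github_tags):
    """Versions published to PyPI with no corresponding GitHub release tag.
    Tags often carry a leading 'v' (v1.2.3), so normalize before comparing."""
    tags = {t.lstrip("v") for t in github_tags}
    return sorted(v for v in pypi_versions if v not in tags)

# Hypothetical data: one PyPI upload has no matching GitHub release --
# exactly the anomaly that flagged the malicious version.
suspicious = unmatched_pypi_versions(
    ["1.40.0", "1.40.1", "1.40.2"],
    ["v1.40.0", "v1.40.1"],
)
print(suspicious)  # ['1.40.2']
```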

What To Do With This

If you're using Python packages in production, the risk model is:

High risk: single maintainer + high download volume + rapid release cadence. Any one of these is fine. All three together means a single credential breach can reach millions of systems instantly.

Mitigation: Pin exact versions with hash verification in requirements.txt. Watch for packages that release daily — the legitimate ones usually don't need to.
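In practice, hash pinning looks like this. The sketch assumes pip-tools is installed for lockfile generation; pip's own --require-hashes flag then rejects any artifact whose hash isn't in the lockfile:

```shell
# Generate a fully hash-pinned lockfile from your top-level deps
pip-compile --generate-hashes requirements.in -o requirements.txt

# pip refuses to install anything whose hash doesn't match the lockfile
pip install --require-hashes -r requirements.txt
```

With this in place, a malicious re-upload of a pinned version fails the hash check instead of reaching your machines.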

The scorer I built surfaces these signals automatically. You can use it via MCP (6 tools: GitHub repos, npm packages, PyPI packages, Norwegian businesses, domain behavior).

Add to your Claude Desktop config:

```json
{
  "mcpServers": {
    "proof-of-commitment": {
      "url": "https://poc-backend.amdal-dev.workers.dev/mcp"
    }
  }
}
```

Then ask: "Score litellm on PyPI for supply chain risk"

The answer: 72/100, 1 maintainer, 1,285 versions, 3.6M downloads/day. Make your own judgment from there.


Commitment Score is behavioral — package age, download consistency, release patterns, maintainer depth, and linked GitHub activity. No subjective ratings.
