Pico

Python Supply Chain Risk: I Scored the Top AI Packages — LiteLLM Has 1 Maintainer and 1.2K Versions

LiteLLM serves 97 million downloads per month. In March 2026, attackers stole a PyPI token, uploaded malicious versions, and compromised an estimated 500,000 machines. The package looked healthy by every conventional metric: high downloads, GitHub stars, active issues.

But behavioral signals told a different story.

The Attack Pattern

The LiteLLM supply chain attack followed what security researchers now call the "pre-staged C2 pattern":

  1. Attackers stole a CI/CD token via a compromised dependency (Trivy → Checkmarx KICS → LiteLLM)
  2. A clean decoy package was uploaded 24 hours before the malicious version
  3. The malicious version was uploaded and distributed before detection
  4. 300GB of credentials were exfiltrated

This exact pattern appeared in the Axios npm attack months earlier. The toolchain: Trivy → CanisterWorm npm (66+ packages) → Checkmarx KICS → LiteLLM → Telnyx SDK.

What Download Counts Hide

Here's what the behavioral signal profile looks like for the packages at the center of this:

litellm (PyPI):

  • Age: 2 years
  • Versions published: 1,285 (releases ~daily — high velocity)
  • Downloads: ~3.6M/day (growing)
  • Maintainers: 1
  • GitHub score: 84/100
  • Commitment Score: 72/100

The signal that matters: 1 maintainer, 1.2K versions, daily releases. That's a single point of failure at maximum velocity. One stolen token = instant access.

Compare to requests — the gold standard:

  • Age: 15 years
  • Versions: 154 (measured cadence)
  • Downloads: ~44M/day (growing)
  • Maintainers: 3
  • GitHub score: 90/100
  • Commitment Score: 95/100

Requests has slower release cadence (154 versions vs 1,285), multiple maintainers, and 15 years of behavioral consistency. That's what genuine commitment looks like at scale.
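The velocity gap is easy to reproduce yourself from PyPI's public JSON endpoint, which requires no auth. This is a minimal sketch; the function names and structure are my own, not taken from any particular scorer:

```python
import json
import urllib.request

def releases_per_year(n_versions: int, age_years: float) -> float:
    """Average release velocity over a package's lifetime."""
    return n_versions / max(age_years, 0.1)

def version_count(package: str) -> int:
    """Count published versions via PyPI's public JSON API (no auth needed)."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url) as resp:
        return len(json.load(resp)["releases"])

# Using the numbers above: litellm averages ~640 releases/year,
# requests ~10 -- a roughly 60x velocity gap.
print(releases_per_year(1285, 2))   # 642.5
print(releases_per_year(154, 15))   # ~10.3
```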

Scoring Key Python Packages

I built a behavioral commitment scorer that queries PyPI and pypistats.org with no auth required. Here's where the packages most at risk in the AI supply chain land:

Package     Score     Maintainers   Key Risk
requests    95/100    3             None
fastapi     ~82/100   2             -
pydantic    ~88/100   3             -
openai      74/100    6             No download data
anthropic   52/100    1             Single maintainer
langchain   52/100    1             Single maintainer
litellm     72/100    1             Single maintainer + 1.2K versions
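The pattern in this table reduces to a small heuristic. The sketch below uses illustrative thresholds of my own choosing, not the scorer's actual weights:

```python
from dataclasses import dataclass

@dataclass
class PackageSignals:
    maintainers: int
    versions: int
    age_years: float
    downloads_per_day: float

def risk_flags(s: PackageSignals) -> list[str]:
    """Behavioral red flags. Thresholds are illustrative only."""
    flags = []
    if s.maintainers <= 1:
        flags.append("single maintainer")
    if s.versions / max(s.age_years, 0.1) > 300:  # near-daily release cadence
        flags.append("extreme release velocity")
    if s.downloads_per_day > 1_000_000:
        flags.append("large blast radius")
    return flags

litellm = PackageSignals(maintainers=1, versions=1285, age_years=2,
                         downloads_per_day=3_600_000)
requests_pkg = PackageSignals(maintainers=3, versions=154, age_years=15,
                              downloads_per_day=44_000_000)

print(risk_flags(litellm))       # trips all three flags
print(risk_flags(requests_pkg))  # only blast radius -- popularity alone isn't risk
```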

The AI package ecosystem has a structural problem: most packages powering production AI systems have a single maintainer. One stolen token or one compromised developer account = game over.

Behavioral Signals vs. Declarative Claims

The LiteLLM breach is a perfect case study for why declarative compliance fails:

  • ✅ Published on PyPI (real package)
  • ✅ MIT licensed (declared)
  • ✅ High downloads (popular)
  • ✅ Active development (GitHub)
  • ❌ 1 maintainer at extreme release velocity
  • ❌ Pre-staged malicious version uploaded before detection

Every declarative check passed. The behavioral anomaly — no matching GitHub releases for the malicious version — was the only signal that caught it. Sysdig spotted the discrepancy between PyPI uploads and GitHub releases.
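That discrepancy check is straightforward to automate: it's a set difference between PyPI uploads and GitHub release tags. The version numbers below are hypothetical, used only to show the shape of the check:

```python
def unmatched_pypi_versions(pypi_versions, github_tags):
    """Versions published to PyPI with no corresponding GitHub release tag.
    Tags often carry a leading 'v' (v1.2.3), so normalize before comparing."""
    tags = {t.lstrip("v") for t in github_tags}
    return sorted(v for v in pypi_versions if v not in tags)

# Hypothetical data: one PyPI upload has no matching GitHub release --
# exactly the anomaly that flagged the malicious version.
suspicious = unmatched_pypi_versions(
    ["1.40.0", "1.40.1", "1.40.2"],
    ["v1.40.0", "v1.40.1"],
)
print(suspicious)  # ['1.40.2']
```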

What To Do With This

If you're using Python packages in production, the risk model is:

High risk: single maintainer + high download volume + rapid release cadence. Any one of these is fine. All three together means a single credential breach can reach millions of systems instantly.

Mitigation: Pin exact versions with hash verification in requirements.txt. Watch for packages that release daily — the legitimate ones usually don't need to.
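In practice, hash pinning looks like this. The sketch assumes pip-tools is installed for lockfile generation; pip's own --require-hashes flag then rejects any artifact whose hash isn't in the lockfile:

```shell
# Generate a fully hash-pinned lockfile from your top-level deps
pip-compile --generate-hashes requirements.in -o requirements.txt

# pip refuses to install anything whose hash doesn't match the lockfile
pip install --require-hashes -r requirements.txt
```

With this in place, a malicious re-upload of a pinned version fails the hash check instead of reaching your machines.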

The scorer I built surfaces these signals automatically. You can use it via MCP (6 tools: GitHub repos, npm packages, PyPI packages, Norwegian businesses, domain behavior).

Add to your Claude Desktop config:

```json
{
  "mcpServers": {
    "proof-of-commitment": {
      "url": "https://poc-backend.amdal-dev.workers.dev/mcp"
    }
  }
}
```

Then ask: "Score litellm on PyPI for supply chain risk"

The answer: 72/100, 1 maintainer, 1,285 versions, 3.6M downloads/day. Make your own judgment from there.


Commitment Score is behavioral — package age, download consistency, release patterns, maintainer depth, and linked GitHub activity. No subjective ratings.
