DEV Community

Pico

Posted on • Originally published at getcommit.dev

Why I Think axios Is the Next Supply Chain Attack Target

Update — March 31, 2026: The Attack Happened
On March 30–31, 2026, axios was compromised. An attacker used a stolen npm token belonging to maintainer jasonsaayman to publish malicious versions axios@1.14.1 and 0.30.4 containing a cross-platform RAT. npm audit returned zero vulnerabilities for both versions during the attack window.
This post documents the structural case that was visible from public data before the compromise. See the full post-mortem analysis →

I run proof-of-commitment against axios periodically. The output:

npx proof-of-commitment axios
{
  "name": "axios",
  "score": 86,
  "riskFlags": ["CRITICAL"],
  "scoreBreakdown": {
    "longevity": 25,
    "downloadMomentum": 22,
    "releaseConsistency": 20,
    "maintainerDepth": 4,
    "githubBacking": 15
  }
}

CRITICAL. Same as last week. Same as last month.

Then I run npm audit:

found 0 vulnerabilities

One tool is flagging a structural risk. The other sees nothing. That gap is the difference between leading and lagging indicators, and it's the reason I built Commit.


The question that started this

A year and a half earlier, I was sitting in my apartment in Stavanger trying to understand why AI gives bad recommendations.

Not wrong-in-a-funny-way. Wrong in a consequential way. A friend had asked ChatGPT for a restaurant recommendation for his anniversary dinner. It recommended a place that had been closed for two years and had three health code violations in its last operating year. The AI pulled from reviews, ratings, Google listings — the declarative layer. Everything self-reported or user-generated. None of it reflected reality.

I started thinking about what a trustworthy signal actually looks like. Not what people say about a restaurant, but what they do. Do they go back? Did the health inspector find violations? Is the business financially stable?

That question led me down a rabbit hole. The entire trust infrastructure of the internet is built on declarations. Star ratings are declarations. CVE databases are declarations. SOC2 certifications are declarations. Even npm audit — which the entire JavaScript ecosystem treats as a security tool — is a declaration lookup. It checks whether someone has already discovered, documented, and filed a vulnerability report.

What’s the behavioral equivalent? What would it look like to measure trust by observing what entities actually do, rather than what they or others claim?

For npm packages, I realized the answer was almost embarrassingly simple.


Building the signal

I didn’t start with a grand theory. I started with a spreadsheet.

I pulled the top 200 npm packages and began listing every data point I could extract from the npm registry API and GitHub API without any special access. Creation date. Download counts. Release history. Number of maintainers. Stars, forks, contributors. All public data.
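The extraction step is simple enough to sketch. This assumes the packument JSON returned by `https://registry.npmjs.org/<name>` (fields like `time.created`, `maintainers`, and `versions` are part of the registry's shape; the helper name and the hand-written sample object below are mine for illustration, not proof-of-commitment's actual code):

```javascript
// Sketch: pulling the raw data points from an npm registry "packument"
// (the JSON document at https://registry.npmjs.org/<name>). Weekly
// downloads come from a separate endpoint (api.npmjs.org), so they are
// passed in here as a plain number.
function extractSignals(packument, weeklyDownloads) {
  return {
    name: packument.name,
    createdAt: packument.time.created,            // longevity input
    maintainerCount: packument.maintainers.length, // maintainer depth input
    releaseCount: Object.keys(packument.versions).length, // release history input
    weeklyDownloads,                               // blast radius input
  };
}

// Hand-written sample shaped like a registry response, for illustration only.
const sample = {
  name: "example-pkg",
  time: { created: "2014-08-29T00:08:36.810Z" },
  maintainers: [{ name: "solo-dev" }],
  versions: { "1.0.0": {}, "1.1.0": {} },
};

console.log(extractSignals(sample, 82_000_000));
```

All of this is unauthenticated public data; the GitHub side (stars, forks, contributors) comes from the equally public `api.github.com/repos/<owner>/<repo>` endpoint.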

Then I asked: which of these data points would have been predictive for past supply chain attacks?

event-stream in 2018: sole maintainer, abandoned for years, 2 million weekly downloads. ua-parser-js in 2021: sole maintainer, 7 million weekly downloads. The LiteLLM compromise in March 2026: attackers backdoored the Trivy GitHub Action in LiteLLM's CI pipeline — the structural weakness (single maintainer, thin oversight) made the supply chain the attack surface.

The pattern was neither sophisticated nor subtle. One person holding the keys to a package that millions of systems blindly execute every day.

So I wrote a scoring system. Five dimensions, all derived from public data:

  1. Longevity — How long has this package existed? Age as a commitment signal.
  2. Download momentum — How widely is it used? Blast radius measurement.
  3. Release consistency — Is development active and regular? Maintenance signal.
  4. Maintainer depth — How many people can publish releases? Single point of failure.
  5. GitHub backing — Is there genuine community engagement? Social proof of investment.

The score runs from 0 to 100. High is good. But the interesting part isn’t the composite score. It’s the risk flag:

if (maintainerCount === 1 && weeklyDownloads > 10_000_000)
  riskFlags.push("CRITICAL");

One line of code. If you’re the kind of person who reads Hacker News, your instinct right now is that this is too simple to be useful. I shared that instinct initially. Then I ran it against every major npm supply chain attack in history.

Every single one matched.
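Putting the five dimensions and the flag together, here is a minimal sketch, not the real proof-of-commitment implementation. The per-dimension caps (25/25/20/15/15) are inferred from the score breakdown shown earlier, the weighting formulas are invented for illustration, and the example inputs are approximate; only the CRITICAL condition is the one quoted above.

```javascript
// Sketch of a five-dimension composite score plus the structural risk flag.
// Caps and formulas are illustrative assumptions, not the published tool.
function scorePackage({ ageYears, weeklyDownloads, releasesLastYear, maintainerCount, githubStars }) {
  const breakdown = {
    longevity: Math.min(25, ageYears * 2.5),
    downloadMomentum: Math.min(25, Math.log10(Math.max(weeklyDownloads, 1)) * 3),
    releaseConsistency: Math.min(20, releasesLastYear * 2),
    maintainerDepth: Math.min(15, maintainerCount * 4),
    githubBacking: Math.min(15, Math.log10(Math.max(githubStars, 1)) * 3),
  };
  const score = Math.round(Object.values(breakdown).reduce((a, b) => a + b, 0));

  // The one-line structural check, independent of the composite score.
  const riskFlags = [];
  if (maintainerCount === 1 && weeklyDownloads > 10_000_000) riskFlags.push("CRITICAL");

  return { score, riskFlags, breakdown };
}

// Roughly axios-shaped inputs (illustrative numbers).
const axiosLike = scorePackage({
  ageYears: 12, weeklyDownloads: 82_000_000, releasesLastYear: 10,
  maintainerCount: 1, githubStars: 105_000,
});
console.log(axiosLike.riskFlags); // → [ 'CRITICAL' ]
```

Note that a high composite score and a CRITICAL flag coexist: the flag is a gate on one structural condition, not a term in the average.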


The number that wouldn’t go away

I released proof-of-commitment as an open-source CLI and GitHub Action in early 2026. It was a side project attached to a larger thesis about behavioral trust data. I figured developers would find it useful as a lightweight CI check — a complement to npm audit, not a replacement.

The first time I scored axios, the output stopped me cold.

Score: 89 out of 100. By any “package health” metric, that’s excellent. Twelve years old. Massive download base. Regular releases. Strong GitHub community. Everything you’d want to see.

Except: one maintainer. One person with publish access to a package installed a hundred million times a week.

maintainerDepth: 4 out of 15 possible points. That single dimension broke the curve. CRITICAL risk flag.

I stared at that number for a while. Axios is the default HTTP client for most JavaScript developers. It’s in nearly every enterprise codebase I’ve touched. And one stolen credential — one phishing email, one leaked token, one compromised machine — would give an attacker code execution on millions of systems.

I wrote about it. I published the data. The structural risk was visible to anyone who looked. npm audit showed 0 vulnerabilities. The package was “healthy” by every conventional metric.

The CRITICAL flag stayed on. Weeks passed. I scored the top 50 npm packages and found that 15 of them — 30% — triggered the same condition. Together, those 15 packages accounted for 2.5 billion weekly downloads. minimatch at 562 million. chalk at 413 million. glob at 332 million. All single-maintainer. All CRITICAL.

Nobody seemed alarmed.


The template: ua-parser-js 2021

In October 2021, Faisal Salman — sole maintainer of ua-parser-js with ~7 million weekly downloads — had his npm account compromised via credential theft. The attacker published three malicious versions containing a cryptominer and a credential-stealing trojan. The malicious versions ran for four hours. Facebook, Microsoft, Amazon, and Google were downstream consumers.

Here’s the ua-parser-js timeline:

        [See full table at article link]

The structural flag was constant. npm audit caught up only after the ua-parser-js attack. Today, axios has the same profile: single maintainer, one credential, 82 million weekly downloads. Not 7 million. 82 million. Roughly 12x the blast radius of the ua-parser-js attack.


Leading vs. lagging

Here’s what I keep coming back to.

npm audit is not a bad tool. It does exactly what it claims: it checks your dependencies against a database of known vulnerabilities. But “known vulnerability” means someone already found it, documented it, assigned a CVE, and submitted it to the advisory database. By definition, npm audit can only tell you about attacks that have already happened.

That makes it a lagging indicator. A rear-view mirror. The most popular security tool in the JavaScript ecosystem can only confirm what’s already gone wrong.

Behavioral scoring is a leading indicator. It doesn’t predict which package will be attacked, or when. That would be fortune-telling, and I’m not going to pretend otherwise. What it does is identify the structural conditions that attackers select for — the packages where one compromised credential yields maximum blast radius. These conditions are visible months or years before any attack.

The difference matters for the same reason it matters in every other domain. You don’t wait for a building to collapse before checking whether it can survive an earthquake. You assess the structural properties — foundation, materials, load distribution — and flag the ones that are vulnerable. The assessment doesn’t predict earthquakes. It tells you which buildings to worry about when one arrives.

In software supply chains, the structural assessment is computable from public data in milliseconds. The fact that the entire industry wasn’t computing it before event-stream — or ua-parser-js, or colors.js — says something uncomfortable about how we think about security.


What I got wrong

I want to be honest about the limitations, because this is a story about what worked, and the things that worked are embarrassingly simple. The things that didn’t work are more interesting.

The scoring system can’t detect code-level threats. It wouldn’t have flagged the encrypted payload in flatmap-stream (the event-stream attack vehicle). It can’t distinguish between a legitimate maintainer and an attacker using stolen credentials — the package metadata looks identical before and after compromise.

And the simplicity of the CRITICAL flag means it casts a wide net. Fifteen of the top 50 packages are flagged. That’s useful for prioritization — you know where to focus your attention. But if you’re looking for “which package will be compromised next Tuesday,” this isn’t that tool. Nobody has that tool. Anybody claiming otherwise is selling you something.

What the scoring does well is separate the structural question from the vulnerability question. “Does this package have known bugs?” is useful, and npm audit answers it. “Is this package a single point of failure at enormous scale?” is a different question that nobody was answering systematically. Both matter. Only one gives you advance warning.


Why this matters beyond npm

The ua-parser-js attack wasn’t a supply chain security failure. It was a supply chain security architecture failure. The tools we built assume threats are known before they’re exploited. That assumption works for most classes of software bugs — CVEs get filed, patches get shipped, scanners pick them up. But supply chain attacks aren’t software bugs. They’re trust failures. They exploit the gap between what the declarative layer tells you about a package (0 vulnerabilities) and what the structural reality looks like (one credential away from catastrophe).

The same pattern exists everywhere. SOC2 certifications are declarations — Delve fabricated them for 494 companies. Restaurant reviews are declarations — and AI trained on them recommends closed restaurants with health violations. Benchmark scores are declarations — Berkeley RDI proved 8 out of 8 major AI benchmarks are fully exploitable.

We keep building trust systems on what entities say about themselves, and then acting surprised when the claims don’t match reality. The alternative is measuring what entities do — behavioral signals that require real cost to produce and are structurally harder to fake.

For npm packages, the behavioral signals are five public data points and one if-statement. For the broader trust infrastructure that AI and autonomous agents need, the signals are different but the principle is identical: commitment — costly, observable, verifiable action — is the unfakeable layer.

That’s what we’re building at Commit. Not just for npm. For every domain where the gap between declarations and behavior creates exploitable trust failures.

The ua-parser-js signal was visible for years. For axios, it’s visible right now. So is minimatch. chalk. glob. zod. esbuild. The data is public. The computation is trivial.

npm audit will catch the next attack after it happens. The signal is there for anyone willing to look.


Full dataset: State of npm Supply Chain Trust — Q2 2026 (100 packages, 14 CRITICAL)

Audit your own dependencies: npx proof-of-commitment --file package.json | Web interface | GitHub Action

Related: The Axios Signal · Declarations Are Gameable · Dependency Autopsy: event-stream
