mckeane mcbrearty

Posted on Apr 11

I watched Shai-Hulud steal credentials from teams running npm audit. Here's the gap nobody talks about.

September 2025. PostHog, Zapier, Postman, ENS Domains. Over 500 packages compromised. Credentials pulled from developer machines and CI agents. When the malware found an npm token, it automatically published backdoored versions of every package that token had access to. No human needed after the first infection.

Every team I watched get hit was running npm audit. Some had Snyk. One had Dependabot. All of it passed clean.

I spent the next few months figuring out why, then built something to fix it.

The problem with npm audit

npm audit checks your dependency versions against a database of known vulnerabilities. If there's no advisory filed, it returns zero findings.

With supply chain attacks, there's no advisory to file. The package does exactly what the attacker intended. A preinstall hook that reads ~/.npmrc and posts it to a webhook is working as designed. npm audit has nothing to flag.
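To make that concrete, here is a minimal sketch of the one check an advisory lookup never performs: does this package declare a script that npm will run automatically at install time? The manifest below is hypothetical, written only for illustration, and the function name is my own.

```python
import json

# npm runs these lifecycle scripts automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_hooks(package_json_text: str) -> dict:
    """Return the install-time lifecycle scripts declared in a package.json."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    return {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}


# Hypothetical manifest for illustration; not taken from a real package.
example = json.dumps({
    "name": "example-padding-lib",
    "version": "1.0.1",
    "scripts": {"test": "node test.js", "preinstall": "node collect.js"},
})

print(install_hooks(example))  # {'preinstall': 'node collect.js'}
```

A hit here is not proof of malice, since plenty of legitimate packages compile native code at install time. It only tells you which packages get to execute code on your machine before you have imported anything.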

Recent attacks, all of which passed npm audit cleanly at the time of compromise:

axios (~100M weekly downloads): maintainer account compromised by a North Korean state actor on March 31, 2026. A fake dependency was injected with a postinstall hook that dropped a cross-platform RAT on Windows, macOS, and Linux. Live for about two hours before removal. Dependency Guardian flagged it.
xrpl (~135K weekly downloads): backdoored through a compromised maintainer account
eslint-config-prettier (~30M downloads): hijacked via phishing
chalk and debug (~299M downloads): compromised in a coordinated attack
Shai-Hulud packages (~2.6B combined downloads affected): self-propagating worm

None of these had CVEs when they were actively stealing credentials. The CVE came later.

What actually catches these

Every major supply chain attack in the last year used the same pattern: credential theft, network exfiltration, install script abuse. A string padding library doesn't need child_process. A color utility doesn't need to make outbound HTTP calls. A patch release shouldn't add obfuscated network calls that weren't in the previous version.

You don't need a database to spot any of that. You need something that reads the published tarball and checks what the code does before it runs in your pipeline.

Dependency Guardian does that.
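As a rough illustration of the general idea (not Dependency Guardian's actual implementation), the sketch below compares two versions of a package and reports capabilities that appear only in the newer one. The pattern table and function names are assumptions of mine, and matching on text like this is far cruder than real analysis of the parsed code.

```python
import re

# Hypothetical, deliberately small capability patterns. A real scanner would
# analyze parsed code rather than pattern-match on raw text.
CAPABILITY_PATTERNS = {
    "shell": re.compile(r"""require\(\s*['"](?:node:)?child_process['"]\s*\)"""),
    "network": re.compile(
        r"""require\(\s*['"](?:node:)?(?:http|https|net)['"]\s*\)|\bfetch\("""
    ),
    "env": re.compile(r"\bprocess\.env\b"),
}


def capabilities(source: str) -> set:
    """Names of the capabilities whose pattern appears in the source text."""
    return {name for name, pat in CAPABILITY_PATTERNS.items() if pat.search(source)}


def new_capabilities(old_source: str, new_source: str) -> set:
    """Capabilities present in the new version but absent from the old one."""
    return capabilities(new_source) - capabilities(old_source)


# Hypothetical before/after sources for a string padding library.
old = "module.exports = (s, n) => s.padStart(n);"
new = (
    "const cp = require('child_process');\n"
    "const https = require('https');\n"
    "module.exports = (s, n) => s.padStart(n);"
)

print(sorted(new_capabilities(old, new)))  # ['network', 'shell']
```

The useful part is the diff, not the pattern list. A package that has always shelled out is a different story from one that starts doing it in a patch release.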

How it works

The scanner is written in Rust. It runs behavioral analysis on the published tarball before anything installs. No CVE lookup, no LLM, fully deterministic. Same package gets the same result every time.

The detection covers credential theft, shell execution, network exfiltration, obfuscation, time bombs, and CI secret access. Findings don't just get flagged individually. They get correlated. A child_process import by itself is a weak signal. That same package also making outbound network calls and reading environment variables in a patch release is a confirmed exfiltration pattern, and it gets auto-blocked. No triage queue.
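Here is a minimal sketch of what correlating signals can look like. The signal names, weights, threshold, and the single rule are all hypothetical and chosen for illustration; they are not the product's real rule set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Verdict:
    action: str            # "allow", "warn", or "block"
    score: int             # summed weight of all signals seen
    rule: Optional[str]    # name of the correlation rule that fired, if any


# Hypothetical weights and rule, for illustration only.
SIGNAL_WEIGHTS = {"shell_exec": 2, "network": 2, "env_read": 1, "obfuscation": 3}
EXFIL_RULE = ("exfiltration-in-patch", frozenset({"shell_exec", "network", "env_read"}))
WARN_THRESHOLD = 3


def evaluate(signals: set, is_patch_release: bool) -> Verdict:
    """Block only when a correlated pattern matches; otherwise just score."""
    score = sum(SIGNAL_WEIGHTS.get(s, 0) for s in signals)
    name, required = EXFIL_RULE
    if is_patch_release and required <= signals:
        return Verdict("block", score, name)
    return Verdict("warn" if score >= WARN_THRESHOLD else "allow", score, None)


print(evaluate({"shell_exec"}, is_patch_release=True).action)
# allow
print(evaluate({"shell_exec", "network", "env_read"}, is_patch_release=True))
# Verdict(action='block', score=5, rule='exfiltration-in-patch')
```

Because the function is pure, the same inputs always give the same verdict, and a block carries the name of the rule that produced it.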

There's also a behavioral sandbox that runs packages in an isolated container to catch things static analysis misses, like payloads that only activate after a time delay.

Validated against 133,516 real packages (53,119 malicious, sourced from Datadog's malicious packages dataset, OpenSSF, and the GitHub Advisory Database):

99.5% detection rate on npm, 99.2% on PyPI
0.6% false positive rate
F1 score 99.79%

Full methodology at westbayberry.com/benchmark.
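For readers who want the definitions behind those three numbers, the sketch below computes them from a confusion matrix. I'm assuming "detection rate" means recall here; the benchmark page is the authority on how the published figures were actually computed. The counts in the example are hypothetical and are not the benchmark's data.

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)        # share of malicious packages caught
    fpr = fp / (fp + tn)           # share of benign packages wrongly flagged
    precision = tp / (tp + fp)     # share of flagged packages that are malicious
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "false_positive_rate": fpr,
            "precision": precision, "f1": f1}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
m = detection_metrics(tp=990, fp=5, tn=995, fn=10)
print(f"recall={m['recall']:.3f} fpr={m['false_positive_rate']:.3f} f1={m['f1']:.4f}")
# recall=0.990 fpr=0.005 f1=0.9925
```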

Using it in CI

The GitHub App scans the lockfile diff on every PR and posts findings as a check. A package with a credential-stealing install script gets blocked before the merge.
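A lockfile diff is simple to reason about. This sketch, which assumes a package-lock.json with lockfileVersion 2 or 3 (the format with a top-level "packages" map), lists every entry that is new or changed between the base branch and the PR head. The function names and the example lockfiles are hypothetical; this is not the App's code.

```python
import json


def locked_versions(lockfile_text: str) -> dict:
    """Map install path -> version from a package-lock.json (v2 or v3)."""
    packages = json.loads(lockfile_text).get("packages", {})
    # The "" key is the root project itself, not a dependency.
    return {path: meta["version"]
            for path, meta in packages.items()
            if path and "version" in meta}


def lockfile_diff(base_text: str, head_text: str) -> list:
    """Return (path, old_version, new_version) for added or changed entries."""
    base, head = locked_versions(base_text), locked_versions(head_text)
    return sorted((path, base.get(path), version)
                  for path, version in head.items()
                  if base.get(path) != version)


# Hypothetical lockfiles for illustration.
base = json.dumps({"lockfileVersion": 3, "packages": {
    "": {"name": "my-app", "version": "1.0.0"},
    "node_modules/color-util": {"version": "1.2.0"},
}})
head = json.dumps({"lockfileVersion": 3, "packages": {
    "": {"name": "my-app", "version": "1.0.0"},
    "node_modules/color-util": {"version": "1.2.1"},
    "node_modules/new-helper": {"version": "0.0.1"},
}})

for change in lockfile_diff(base, head):
    print(change)
# ('node_modules/color-util', '1.2.0', '1.2.1')
# ('node_modules/new-helper', None, '0.0.1')
```

Only the entries in that list need scanning on a given PR, which keeps the check scoped to what the PR actually changes.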

CLI for one-off checks or non-GitHub CI:

npm install -g @westbayberry/dg
dg scan

It reads your lockfile, sends package names and versions to the detection API, and returns findings. Your source code never leaves your machine.

If you want to verify that's actually all it sends, the client is open source. You can read exactly what goes over the wire before you run anything: npmjs.com/package/@westbayberry/dg.
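To give a sense of how little a lockfile needs to reveal, here is a sketch that reduces a package-lock.json (lockfileVersion 2 or 3) to name and version pairs and nothing else. The payload shape and function name are my assumptions for illustration; the real wire format is whatever the open source client does, so read that rather than trusting this.

```python
import json


def scan_payload(lockfile_text: str) -> list:
    """Reduce a package-lock.json (v2 or v3) to unique name/version pairs."""
    packages = json.loads(lockfile_text).get("packages", {})
    seen = set()
    for path, meta in packages.items():
        if not path or "version" not in meta:
            continue  # skip the root project and entries without a version
        # "node_modules/a/node_modules/@scope/b" -> "@scope/b"
        name = path.rsplit("node_modules/", 1)[-1]
        seen.add((name, meta["version"]))
    return [{"name": n, "version": v} for n, v in sorted(seen)]


# Hypothetical lockfile for illustration.
lockfile = json.dumps({"lockfileVersion": 3, "packages": {
    "": {"name": "my-app", "version": "1.0.0"},
    "node_modules/foo": {"version": "1.0.0", "resolved": "https://registry.example/foo.tgz"},
    "node_modules/foo/node_modules/@scope/b": {"version": "2.0.0"},
}})

print(scan_payload(lockfile))
# [{'name': '@scope/b', 'version': '2.0.0'}, {'name': 'foo', 'version': '1.0.0'}]
```

Note that the root project entry, resolved URLs, and everything else in the lockfile are dropped before anything would leave the machine.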

Free tier is 1,000 scans per month. No account needed for the CLI.

On false positives

A tool that blocks PRs on weak signals gets turned off.

Socket, the main competitor here, introduced three alert severity tiers because of noise. At scale, triaging medium-risk alerts becomes someone's actual job. We handle that at the detection layer. Signals correlate into high-confidence patterns before anything blocks. Single weak signals don't stop your pipeline; they feed a risk score.

Socket also uses LLM inference on each scan, which means results vary between runs and detection depends on a third-party API staying up. Our scanner is deterministic. When a block fires, you can trace it to the exact rule that triggered it.

The thing that doesn't get said

Most attackers aren't trying to evade behavioral detection. They're using the same preinstall script playbook they've used for years because most teams have zero visibility into what their packages actually do at install time.

Going from nothing to behavioral analysis blocks an entire category of attack and forces the rest into increasingly constrained patterns. The teams that move first on this have a real advantage right now, because most of their peers are still relying on a database that only knows about yesterday's attacks.

Try it:

npm install -g @westbayberry/dg && dg scan

Or install the GitHub App for automatic PR scanning with no config.

DEV Community