Pico

Posted on • Originally published at getcommit.dev
# Proof-of-Commitment Internals: How the Scoring Algorithm Works

npm audit is a CVE scanner. It queries a database of known vulnerabilities
against the package versions in your lockfile. When a CVE is filed, catalogued,
and propagated, it will appear. Before that: silence.

The problem with that model is that supply chain attacks don't announce themselves.
Before the ua-parser-js attack (October 2021, 8M weekly downloads), before the
LiteLLM attack (March 2026), before the Bitwarden CLI incident — every tool
returned clean. The structural preconditions for compromise were visible in public
registry data. The tools just weren't looking.

Proof-of-commitment measures those preconditions. Here's how the scoring works,
from registry data to CRITICAL flag.

## Five Dimensions, All Public Data

Every package gets scored across five behavioral dimensions. No proprietary data
sources, no scraping, no access required beyond the public npm registry and GitHub API.

### 1. Longevity (25 pts)

Package age in years, from `pkg.time["created"]` in the registry response.
Time is an unfakeable signal — you can't buy 10 years of operational history.

```ts
function scoreLongevity(ageYears: number): number {
  if (ageYears >= 6) return 25;
  if (ageYears >= 4) return 20;
  if (ageYears >= 2) return 14;
  if (ageYears >= 1) return 8;
  if (ageYears >= 0.5) return 4;
  return 1;
}
```

The scoring is deliberately nonlinear. The difference between 1 and 6 years
isn't linear in risk — a 6-year-old package is embedded in thousands of production
systems. A 1-year-old package mostly isn't.
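The `ageYears` input is derived from the registry's `created` timestamp. A minimal sketch of that derivation (the helper name and the 365.25-day year are mine, not from the codebase):

```typescript
// Age in years from the ISO-8601 string in pkg.time["created"].
// Uses 365.25-day years; the real implementation may round differently.
function ageInYears(createdIso: string, now: Date = new Date()): number {
  const createdMs = new Date(createdIso).getTime();
  return (now.getTime() - createdMs) / (1000 * 60 * 60 * 24 * 365.25);
}
```

A package created in January 2020 and scored in January 2026 comes out at roughly 6.0 years — just over the 25-point threshold.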

### 2. Download Momentum (25 pts)

Weekly download volume sets the blast radius. Trend direction (growing vs.
declining) captures attacker attention — growing packages attract more eyes,
malicious and otherwise.

```ts
function scoreDownloads(
  weeklyAvg: number,
  trend: "growing" | "stable" | "declining" | null
): number {
  let base = 0;
  if (weeklyAvg >= 1_000_000) base = 22;
  else if (weeklyAvg >= 100_000) base = 18;
  else if (weeklyAvg >= 10_000) base = 14;
  else if (weeklyAvg >= 1_000) base = 10;
  else if (weeklyAvg >= 100) base = 6;
  else base = 3;

  const trendMod = trend === "growing" ? 3 : trend === "declining" ? -3 : 0;
  return Math.max(0, Math.min(25, base + trendMod));
}
```

Trend is derived from the last 90 days of daily download data: first half vs.
second half, with >15% change as the threshold. Growing = >1.15×, declining =
<0.85×.

### 3. Release Consistency (20 pts)

Version count establishes the publishing track record; days since the last
publish captures whether that record is current.

```ts
function scoreReleases(versionCount: number, daysSincePublish: number): number {
  let base = 0;
  if (versionCount >= 100) base = 15;
  else if (versionCount >= 30) base = 12;
  else if (versionCount >= 10) base = 9;
  else if (versionCount >= 3) base = 6;
  else base = 3;

  const recency = …; // 0–5 pts, derived from daysSincePublish
  return base + recency;
}
```

### 4. Maintainer Depth (15 pts)

Maintainer count, from the registry's `maintainers` array.

```ts
function scoreMaintainers(count: number): number {
  if (count >= 5) return 15;
  if (count >= 3) return 11;
  if (count >= 2) return 7;
  if (count === 1) return 4;
  return 0;
}
```



A sole maintainer scores 4/15 — the lowest non-zero value. The reasoning is
structural: one compromised npm token means a malicious version goes to every
downstream install. One phishing email, one credential leak, one maintainer
account selling their package — and the blast radius is the full weekly download
count.


### 5. GitHub Backing (15 pts)


If the package has a linked GitHub repository, we score that repository independently
across five sub-dimensions: longevity, recent commit activity (last 30 days),
contributor count, release cadence, and stars. The resulting 0–100 GitHub score
maps linearly to 0–15 points.
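The linear map itself is one line — a sketch, assuming the implementation rounds to the nearest point (the function name and the rounding choice are mine):

```typescript
// Map a 0–100 GitHub repo score onto the 0–15 githubBacking dimension.
function githubBackingPoints(githubScore: number): number {
  const clamped = Math.max(0, Math.min(100, githubScore));
  return Math.round((clamped / 100) * 15);
}
```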


Organization-backed repos score higher than personal repos for the same reason:
organizational accounts have multiple people with access, institutional continuity,
and usually internal security practices. A personal repo is one account away from
full control by whoever compromises it.


## The CRITICAL Flag


The flag has two conditions; both must be true:


1. Single maintainer (`maintainerDepth = 4/15`)
2. Weekly downloads > 10M


```ts
const riskFlags: string[] = [];
if (profile.maintainerCount === 1 && weeklyDownloads > 10_000_000) {
  riskFlags.push("CRITICAL");
}
```



The threshold is explicit and deterministic. You can reproduce it from the npm
registry and downloads API with no proprietary data. The reasoning behind 10M:
below that volume, a compromised package causes real damage but doesn't constitute
an infrastructure-level event. Above it, the blast radius is broad enough that
a well-resourced attacker has a meaningful incentive.
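As an illustration of that reproducibility, here's a sketch that rebuilds the check from the two public endpoints (the helper names are mine; the response shapes follow npm's public registry and downloads APIs):

```typescript
// The threshold predicate — the same condition as the production check.
function isCritical(maintainerCount: number, weeklyDownloads: number): boolean {
  return maintainerCount === 1 && weeklyDownloads > 10_000_000;
}

// Feed it from public data: registry metadata for the maintainers array,
// the downloads API for last week's count.
async function checkCritical(pkg: string): Promise<boolean> {
  const [meta, dl] = await Promise.all([
    fetch(`https://registry.npmjs.org/${pkg}`).then((r) => r.json()),
    fetch(`https://api.npmjs.org/downloads/point/last-week/${pkg}`).then((r) => r.json()),
  ]);
  return isCritical(meta.maintainers?.length ?? 0, dl.downloads ?? 0);
}
```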


Two additional flags exist for adjacent risk profiles:


- **HIGH**: young package (short registry age) with >1M weekly downloads — rapid adoption without an operational track record.
- **WARN**: package hasn't published in >365 days — likely unmaintained but still receiving traffic.
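Putting the three flags together — a sketch; the field names and the HIGH age cutoff are assumptions on my part, since only the CRITICAL and WARN thresholds are pinned down exactly:

```typescript
interface Profile {
  maintainerCount: number;
  weeklyDownloads: number;
  ageYears: number;
  daysSincePublish: number;
}

function riskFlags(p: Profile): string[] {
  const flags: string[] = [];
  if (p.maintainerCount === 1 && p.weeklyDownloads > 10_000_000) flags.push("CRITICAL");
  // Assumed cutoff for "young": under 2 years of registry history.
  if (p.ageYears < 2 && p.weeklyDownloads > 1_000_000) flags.push("HIGH");
  if (p.daysSincePublish > 365) flags.push("WARN");
  return flags;
}
```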


## How the Audit API Works


The `POST /api/audit` endpoint takes a list of package names and returns
scored profiles. The naive implementation — firing one download API request per package
concurrently — hits a documented npm rate limit: parallel requests to the downloads
API return zeros. This was the first failure mode we hit in production.


The fix: batch all unscoped packages into a single bulk request before processing.


```ts
// One bulk HTTP request for all unscoped npm packages
const bulkWeekly = await bulkFetchNpmWeeklyDownloads(unscopedNpm);

// All packages processed concurrently — downloads already resolved
const allResults = await Promise.all(
  packages.map(async (pkg) => {
    const preloadedWeekly = pkg.startsWith("@")
      ? undefined // scoped: fetch individually
      : bulkWeekly.get(pkg); // unscoped: use bulk result
    const profile = await buildNpmCommitmentProfile(pkg, preloadedWeekly);
    // ...
  })
);
```



The bulk API (`/downloads/point/last-week`) accepts up to 128 packages
per request and returns a map of package → weekly count. Scoped packages
(`@scope/name`) are not supported by the bulk endpoint — they fall back
to individual fetches.
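A sketch of what `bulkFetchNpmWeeklyDownloads` has to do, assuming only the documented endpoint behavior (the chunk helper, function names, and error handling are mine):

```typescript
// Split a package list into bulk-request chunks of at most 128 names.
function chunkForBulk(names: string[], size = 128): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < names.length; i += size) chunks.push(names.slice(i, i + size));
  return chunks;
}

// One GET per chunk; the bulk endpoint keys its response by package name.
// (A chunk of exactly one name returns a flat object instead of a map,
// which a real implementation has to special-case.)
async function bulkFetchWeekly(names: string[]): Promise<Map<string, number>> {
  const out = new Map<string, number>();
  for (const chunk of chunkForBulk(names)) {
    const res = await fetch(
      `https://api.npmjs.org/downloads/point/last-week/${chunk.join(",")}`
    );
    const data: Record<string, { downloads: number } | null> = await res.json();
    for (const [name, entry] of Object.entries(data)) {
      if (entry) out.set(name, entry.downloads);
    }
  }
  return out;
}
```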


The registry metadata fetch and GitHub scoring run concurrently per package.
For a 20-package audit, this is roughly: 1 bulk download request + 20 registry
requests + up to 20 GitHub requests, all in parallel. Wall-clock time on cold
CF Workers: typically 800ms–1.5s depending on GitHub API latency.


## Benchmarks: Real Packages, Live Data


Actual response from the live API (April 29, 2026):


```sh
curl -X POST https://poc-backend.amdal-dev.workers.dev/api/audit \
  -H "Content-Type: application/json" \
  -d '{"packages": ["chalk", "express", "hono"]}'
```



| Package | Score | Maintainers | Weekly DL | Age | Last Publish | Risk |
| --- | --- | --- | --- | --- | --- | --- |
| `chalk` | 75 | 1 | 418M | 12.7 yr | 181 days ago | 🔴 CRITICAL |
| `hono` | 79 | 1 | 34.5M | 4.4 yr | 4 days ago | 🔴 CRITICAL |
| `express` | 94 | 5 | 95M | 15.3 yr | 12 days ago | ✅ No flag |


Score breakdowns:


```text
chalk:
  longevity:          25/25 (12.7 years)
  downloadMomentum:   22/25 (418M/week)
  releaseConsistency: 13/20 (181 days since last publish)
  maintainerDepth:     4/15 (1 maintainer)
  githubBacking:      11/15

hono:
  longevity:          20/25 (4.4 years)
  downloadMomentum:   22/25 (34.5M/week)
  releaseConsistency: 20/20 (4 days since last publish)
  maintainerDepth:     4/15 (1 maintainer)
  githubBacking:      13/15

express:
  longevity:          25/25 (15.3 years)
  downloadMomentum:   22/25 (95M/week)
  releaseConsistency: 20/20 (12 days since last publish)
  maintainerDepth:    15/15 (5 maintainers)
  githubBacking:      12/15
```



chalk and hono both score in the 75–80 range and look reasonably healthy by
conventional package health metrics. Both are CRITICAL. The only thing
separating them from express is one number: 5 maintainers vs. 1.


## What This Doesn't Cover


**Code analysis.** Proof-of-commitment doesn't inspect package
contents. Socket.dev does this — static analysis of published code for
suspicious patterns. These are complementary layers, not competitors.
If you want to catch malicious code after it's been published, Socket.dev
is the right tool. If you want to identify packages structurally positioned
for compromise before any malicious code exists, use Commit.


**CVE tracking.** npm audit, Snyk, and Dependabot scan for known
vulnerabilities. Commit doesn't duplicate this. Both layers serve different
threat models: known CVEs (catalogued, after-the-fact) vs. structural exposure
(predictive, before-the-fact).


**Trajectory, not snapshot.** The current implementation scores
the current state of a package. A package that had 5 maintainers for 10 years
and just dropped to 1 gets the same maintainerDepth score as one that's always
had a single maintainer. This is a known gap — the score should track transitions,
not just current state.


**CRITICAL packages that never get attacked will always outnumber
the ones that do.** The flag identifies exposure, not certainty.
Most sole-maintained packages with 100M weekly downloads are run by talented,
security-conscious people. The risk is structural, not behavioral.


## Run It On Your Stack


CLI:


```sh
npx proof-of-commitment --file package.json
```



API (replace with your packages):


```sh
curl -X POST https://poc-backend.amdal-dev.workers.dev/api/audit \
  -H "Content-Type: application/json" \
  -d '{"packages": ["chalk", "zod", "axios", "express", "hono"]}'
```





Browser: [getcommit.dev/audit](https://getcommit.dev/audit)


The source is at [github.com/piiiico/proof-of-commitment](https://github.com/piiiico/proof-of-commitment).
The scoring functions are in `src/backend/npm.ts`. The audit endpoint
is in `src/backend/worker.ts`.


---


*Related: How Commit Scores npm Packages: The Methodology — the reasoning behind the weights. Why npm audit Returns Zero Vulnerabilities for the Most Dangerous Packages — where each tool fits in the stack. Two Types of npm Supply Chain Attack — what each tool actually covers.*
