DEV Community

NextGenRails

We Built a Cryptographic Archive of the Entire Software Supply Chain — Before the Next Attack Happens By NextGenRails™

On May 21, 2026, the Megalodon attack pushed 5,718 malicious commits to 5,561 GitHub repositories in six hours. The next day, the Qilin ransomware group published details of their Semgrep campaign. That same day, 700+ historical versions of laravel-lang/lang were found backdoored with a remote code execution payload, inserted quietly through a compromised GitHub account across years of version history.

In every one of these incidents, investigators asked the same question within hours of discovery:

"What did this package look like before it was compromised?"

Before Prechained, there was no reliable answer.


The Gap Nobody Talks About

Package registries don't maintain pre-compromise snapshots. npm, PyPI, and crates.io let versions be yanked, unpublished, or removed, and let metadata such as maintainer lists change, without preserving a forensic record of the earlier state. By the time an attack surfaces, often days or weeks after the initial compromise, the original clean version may no longer exist anywhere in authoritative form.

This isn't a theoretical problem. It's the first practical problem every incident responder hits:

  • XZ Utils (2024): Backdoor inserted into a compression library used by SSH across Linux distributions. The question wasn't just what was inserted — it was when, and what did it look like before?
  • SolarWinds (2020): Build pipeline compromised, malicious updates pushed to 18,000 organizations. Forensics required reconstructing what the software looked like before the attackers touched it.
  • Polyfill.io (2024): A CDN domain was sold. The JavaScript payload changed overnight. Millions of sites were serving malware from a URL they'd trusted for years. No one had a snapshot of what the original code produced.

The supply chain attack playbook works precisely because trust is established before the attack. By the time you know something is wrong, the evidence of what "right" looked like may be gone.


What We Built

Prechained is a free, public, open source cryptographic archive of the global software supply chain.

Every 10 minutes, it automatically crawls 8 major package ecosystems, captures every package and every version it finds, computes a SHA-384 cryptographic fingerprint, stores the complete forensic manifest permanently in a public GitHub repository, and records the Bitcoin block height at the exact moment of capture.

The result: a tamper-evident, independently verifiable, time-anchored record of what every tracked package looked like before any attack occurred.

License: AGPL-3.0

Source: github.com/ngr-dev1/prechained

Archive: github.com/ngr-dev1/prechained-archive


The Technical Architecture

The Crawler

A Netlify serverless function (crawler-all.js) runs on two triggers:

  • Scheduled: Every 10 minutes via Netlify cron (*/10 * * * *)
  • On-demand: Triggered on every page load via frontend JavaScript

Eight ecosystem-specific crawlers run in parallel via Promise.allSettled(). Each one:

  1. Fetches the current top packages dynamically from the ecosystem's own popularity API
  2. Falls back to a comprehensive hardcoded seed list if the dynamic fetch fails
  3. For each package, fetches all known versions
  4. Checks Supabase for already-captured versions (deduplication)
  5. For each new version, builds a complete forensic manifest
  6. Stores the manifest permanently in the prechained-archive GitHub repo
  7. Computes the SHA-384 fingerprint
  8. Inserts the record into Supabase with the current Bitcoin block height

No crawl run ever fails hard. If a discovery API is down, the seed list takes over. If an individual ecosystem times out, the others continue.
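
To make the "never fails hard" behavior concrete, here is a minimal sketch of the orchestration pattern. It is not the actual crawler-all.js: the three crawlers and the shape of the summary object are hypothetical stand-ins, and only the use of Promise.allSettled() comes from the description above.

```javascript
// Hypothetical stand-ins for the real ecosystem crawlers. Each resolves
// with the number of newly captured versions, or throws on failure.
const crawlers = {
  npm: async () => 12,
  pypi: async () => { throw new Error('discovery API timed out'); },
  cargo: async () => 3,
};

async function runAllCrawlers(crawlerMap) {
  const names = Object.keys(crawlerMap);
  // Promise.allSettled waits for every crawler and never rejects, so one
  // failing ecosystem cannot abort the rest of the run.
  const results = await Promise.allSettled(
    names.map((name) => Promise.resolve().then(() => crawlerMap[name]()))
  );

  const summary = {};
  results.forEach((result, i) => {
    summary[names[i]] =
      result.status === 'fulfilled'
        ? { ok: true, captured: result.value }
        : { ok: false, error: String(result.reason?.message ?? result.reason) };
  });
  return summary;
}

runAllCrawlers(crawlers).then((summary) => console.log(summary));
```

In this sketch the failing pypi crawler shows up as an error entry in the summary while npm and cargo still report their results.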

Dynamic Package Discovery

Rather than a fixed list, the crawler dynamically fetches the most popular packages on every run:

| Ecosystem | Discovery Method | Coverage |
|-----------|------------------|----------|
| npm | registry.npmjs.org search by popularity | Top 250 + seed list |
| PyPI | hugovk.github.io/top-pypi-packages | Top 300 by monthly downloads |
| Cargo | crates.io/api/v1/crates?sort=downloads | Top 100 by all-time downloads |
| RubyGems | rubygems.org/api/v1/search | Most downloaded gems |
| Packagist | packagist.org/explore/popular.json | ~250+ packages across 5 pages |
| NuGet | azuresearch-usnc.nuget.org/query | Top 250 by total downloads |
| GitHub | github.com/search API | Top security/supply-chain repos |
| Maven | Extended seed list | Full POM parsing |

The list expands automatically as new packages rise in popularity. We also add packages manually in response to active incidents — laravel-lang was added within hours of the RCE disclosure.
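
The discovery-with-fallback step might look roughly like the sketch below. The function name, the seed list contents, and the choice to merge dynamic results with the seed list (rather than use the seed list only on failure) are all assumptions made for illustration.

```javascript
// Hypothetical seed list; the real crawler's list is far longer.
const NPM_SEED = ['express', 'lodash', 'react'];

// Try the dynamic popularity source first. On any failure the seed list
// alone is used; on success the two are merged so that manually pinned
// packages are always crawled.
async function discoverPackages(fetchTopPackages, seedList) {
  let dynamic = [];
  try {
    dynamic = await fetchTopPackages();
  } catch (err) {
    console.warn(`discovery failed, using seed list: ${err.message}`);
  }
  if (!Array.isArray(dynamic)) dynamic = [];
  // A Set removes duplicates while preserving first-seen order.
  return [...new Set([...dynamic, ...seedList])];
}

// Usage with a fake popularity source standing in for a registry API call.
discoverPackages(async () => ['react', 'vite'], NPM_SEED)
  .then((names) => console.log(names));
```

Passing the fetcher in as a function keeps the fallback logic independent of any one registry's API.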

What Gets Captured — The Forensic Manifest

This is where most supply chain tools stop short. We don't just capture a version number and a checksum. We capture everything forensically relevant.

For npm:

{
  "name": "express",
  "version": "4.4.1",
  "ecosystem": "npm",
  "scripts": {
    "install": null,
    "preinstall": null,
    "postinstall": null,
    "prepare": null,
    "prepublish": "npm prune",
    "prepublishOnly": null
  },
  "maintainers": [
    { "name": "dougwilson", "email": "doug@somethingdoug.com" }
  ],
  "_npmUser": { "name": "dougwilson", "email": "doug@somethingdoug.com" },
  "publishedAt": "2014-06-03T01:27:48.550Z",
  "dist": {
    "integrity": "sha512-...",
    "shasum": "...",
    "tarball": "https://registry.npmjs.org/express/-/express-4.4.1.tgz",
    "fileCount": 69,
    "unpackedSize": 210432
  },
  "dependencies": { ... },
  "devDependencies": { ... },
  "captured_at": "2026-05-23T13:40:07.488Z",
  "captured_by": "prechained.com",
  "crawler_sha384": "100cea91..."
}

Every field is deliberate:

  • scripts: install, postinstall, and preinstall hooks are among the most abused vectors in npm attacks. A compromised package often differs from a clean one only in a single postinstall line. We capture these verbatim.
  • maintainers: everyone who had publish rights at this exact version
  • _npmUser: the specific account that pushed this version (it may differ from the maintainers list, and that discrepancy is itself a forensic signal)
  • publishedAt: the registry's own timestamp, not our capture time
  • dist.fileCount + dist.unpackedSize: a sudden size increase between versions is a common early indicator of payload injection
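
As an illustration of how these fields can be used, the sketch below compares two captured manifests of the same package and lists the differences described above. The function name, the sample data, and the 2x size threshold are hypothetical; only the field names come from the manifest shown earlier.

```javascript
const INSTALL_HOOKS = ['preinstall', 'install', 'postinstall'];

// Compare two manifests and return human-readable signals worth a closer
// look. The default 2x size threshold is an arbitrary illustration.
function diffManifests(before, after, sizeFactor = 2) {
  const signals = [];

  for (const hook of INSTALL_HOOKS) {
    const prev = before.scripts?.[hook] ?? null;
    const next = after.scripts?.[hook] ?? null;
    if (prev !== next) signals.push(`scripts.${hook} changed: ${prev} -> ${next}`);
  }

  const maintainers = new Set((after.maintainers ?? []).map((m) => m.name));
  const publisher = after._npmUser?.name;
  if (publisher && !maintainers.has(publisher)) {
    signals.push(`publisher ${publisher} is not in the maintainers list`);
  }

  const prevSize = before.dist?.unpackedSize;
  const nextSize = after.dist?.unpackedSize;
  if (prevSize > 0 && nextSize > prevSize * sizeFactor) {
    signals.push(`unpackedSize grew from ${prevSize} to ${nextSize}`);
  }
  return signals;
}

// Hypothetical before/after captures of a made-up package.
const clean = {
  scripts: { postinstall: null },
  maintainers: [{ name: 'alice' }],
  _npmUser: { name: 'alice' },
  dist: { unpackedSize: 210432 },
};
const suspect = {
  scripts: { postinstall: 'node setup.js' },
  maintainers: [{ name: 'alice' }],
  _npmUser: { name: 'mallory' },
  dist: { unpackedSize: 900000 },
};
console.log(diffManifests(clean, suspect));
```

None of these signals proves a compromise. They only narrow down where a responder should look first.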

For GitHub repos, we go further:

  • commit_sha: full 40-char commit hash
  • commit_author vs commit_committer: these also differ after ordinary rebases and cherry-picks, but an unexpected mismatch is a signal in attack scenarios (force pushes, rebase attacks)
  • commit_verified: whether GitHub verified the commit's signature (GPG, SSH, or S/MIME)
  • commit_verification_reason: why it was or wasn't verified
  • commit_parents: merge commit detection
  • is_fork + parent_repo: typosquatting and fork-based attack detection
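
A small sketch of how these GitHub fields could be turned into review signals follows. The field names match the list above, but their exact types (strings, a boolean, an array of parent hashes) and the sample record are assumptions.

```javascript
// Turn a captured commit record into a list of things worth reviewing.
// Assumes string author/committer fields, a boolean commit_verified,
// and commit_parents as an array of parent hashes.
function commitSignals(record) {
  const signals = [];
  if (record.commit_author !== record.commit_committer) {
    signals.push('author and committer differ');
  }
  if (!record.commit_verified) {
    const reason = record.commit_verification_reason ?? 'no reason recorded';
    signals.push(`commit not verified (${reason})`);
  }
  if ((record.commit_parents ?? []).length > 1) {
    signals.push('merge commit');
  }
  if (record.is_fork) {
    signals.push(`fork of ${record.parent_repo ?? 'unknown parent'}`);
  }
  return signals;
}

// Hypothetical record for illustration only.
const sampleCommit = {
  commit_author: 'alice',
  commit_committer: 'mallory',
  commit_verified: false,
  commit_verification_reason: 'unsigned',
  commit_parents: ['a1b2c3'],
  is_fork: true,
  parent_repo: 'example-org/example-repo',
};
console.log(commitSignals(sampleCommit));
```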

For PyPI, we capture every individual file in the release with its own SHA-256 and upload timestamp, not just the package-level checksum.

For Cargo, dependencies are fetched from a separate endpoint (crates.io/api/v1/crates/{name}/{version}/dependencies) and stored alongside published_by — the specific crates.io account that pushed the version.

The full manifest specification for all 8 ecosystems is documented in the Prechained Technical Overview.

Cryptographic Fingerprinting

For every captured version:

  1. The package payload is serialized to a canonical JSON string
  2. SHA-384 is computed: crypto.createHash('sha384').update(payload).digest('hex')
  3. The fingerprint is stored in Supabase
  4. The fingerprint is displayed publicly on every package page
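
Steps 1 and 2 can be sketched as follows. The exact canonicalization Prechained uses is not specified in this post, so the recursive key-sorting below is an assumption, and it is simpler than a standard such as RFC 8785. The hashing call is the one quoted in step 2.

```javascript
const crypto = require('node:crypto');

// Serialize with object keys sorted recursively, so the same data always
// yields the same string regardless of key insertion order.
// Assumes JSON-safe input (no functions, symbols, or cyclic references).
function canonicalize(value) {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const body = Object.keys(value)
      .filter((key) => value[key] !== undefined)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(value[key])}`)
      .join(',');
    return `{${body}}`;
  }
  return JSON.stringify(value ?? null);
}

// SHA-384 over the canonical string, returned as 96 hex characters.
function fingerprint(manifest) {
  const payload = canonicalize(manifest);
  return crypto.createHash('sha384').update(payload, 'utf8').digest('hex');
}

// Key order does not affect the result.
const a = fingerprint({ name: 'express', version: '4.4.1' });
const b = fingerprint({ version: '4.4.1', name: 'express' });
console.log(a === b, a.length);
```

Without a canonical form, two semantically identical manifests could hash differently, which would make independent verification unreliable.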

We use SHA-384, not SHA-256, for three specific reasons:

  • 192-bit security against collision attacks, versus 128-bit for SHA-256
  • Not vulnerable to length-extension attacks, because its output is a truncation of the internal hash state
  • Still in the SHA-2 family standardized by NIST and accepted under CMMC and FedRAMP, which is relevant for the compliance use cases this data feeds into

Bitcoin Anchoring

At the moment of each capture, we fetch the current Bitcoin block height from blockstream.info/api/blocks/tip/height. This block number is stored with every snapshot.

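What this establishes: each capture is tied to a specific block on Bitcoin's public chain. Bitcoin blocks are immutable and globally timestamped by the network itself, so no central authority can alter them retroactively. If a package was captured at block #950,607 and an attack was discovered at block #950,800, the Prechained record places the capture 193 blocks (roughly 32 hours at the average 10-minute block interval) before the attack. A stored height is a reference point rather than a proof on its own, so anyone checking a record should confirm that the block's timestamp and the manifest's commit date in the public archive agree with the recorded capture time.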

This isn't blockchain hype. It's a specific, practical mechanism for establishing a tamper-evident chronology without requiring anyone to trust us.
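
Here is a rough sketch of the height lookup and of the block arithmetic in the example above. The endpoint is the one named in this section. The function names are hypothetical, the fetch call assumes Node 18+ (global fetch), and the 10-minute figure is Bitcoin's target average, so the elapsed time is only an estimate.

```javascript
// Endpoint named in the article; it returns the tip height as plain text.
const TIP_HEIGHT_URL = 'https://blockstream.info/api/blocks/tip/height';

// Fetch the current block height (not called in this demo, to keep it offline).
async function fetchTipHeight(url = TIP_HEIGHT_URL) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`height lookup failed: HTTP ${res.status}`);
  const height = Number.parseInt(await res.text(), 10);
  if (!Number.isInteger(height)) throw new Error('unexpected response body');
  return height;
}

// Number of blocks separating a capture from a later event.
function blocksBetween(captureHeight, laterHeight) {
  return laterHeight - captureHeight;
}

// Blocks arrive about every 10 minutes on average, so this is an estimate.
function approxHoursBetween(captureHeight, laterHeight) {
  return (blocksBetween(captureHeight, laterHeight) * 10) / 60;
}

console.log(blocksBetween(950607, 950800));
console.log(approxHoursBetween(950607, 950800).toFixed(1));
```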

The Receipt System

Every snapshot is issued a unique Receipt ID: NGR-PC-XXXXXXXXXXXXXXXX

Receipts include:

  • Package name, version, ecosystem
  • SHA-384 fingerprint
  • Capture timestamp
  • Bitcoin block number and confirmation status
  • Crawler SHA-384 — a fingerprint of the crawler code itself, proving what code produced the receipt

That last field is the one most people miss. We don't just fingerprint the data — we fingerprint the code that produced it. Anyone can independently verify that the crawler running today is the same crawler that produced a historical receipt.
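
To show the idea of fingerprinting the code alongside the data, here is a hedged sketch of assembling a receipt. The ID generator is purely hypothetical (this post does not document how NGR-PC IDs are produced), and the receipt field names are illustrative rather than the real schema.

```javascript
const crypto = require('node:crypto');

// SHA-384 of arbitrary bytes, for example the crawler's own source file.
function sha384Hex(bytes) {
  return crypto.createHash('sha384').update(bytes).digest('hex');
}

// Hypothetical ID scheme: random uppercase base-36 characters.
function newReceiptId(length = 16) {
  const alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
  let id = '';
  for (let i = 0; i < length; i++) {
    id += alphabet[crypto.randomInt(alphabet.length)];
  }
  return `NGR-PC-${id}`;
}

// Assemble a receipt; field names here are illustrative only.
function buildReceipt({ name, version, ecosystem, payload, btcBlock, crawlerSource }) {
  return {
    receipt_id: newReceiptId(),
    name,
    version,
    ecosystem,
    sha384_fingerprint: sha384Hex(payload),
    captured_at: new Date().toISOString(),
    btc_block: btcBlock,
    crawler_sha384: sha384Hex(crawlerSource),
  };
}

const receipt = buildReceipt({
  name: 'express',
  version: '4.4.1',
  ecosystem: 'npm',
  payload: '{"name":"express","version":"4.4.1"}',
  btcBlock: 950607,
  crawlerSource: '// crawler source bytes would go here',
});
console.log(receipt);
```

In practice crawlerSource would be the bytes of the deployed crawler file, so a change to the crawler changes crawler_sha384 on every receipt it issues afterwards.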


Independent Verification — No Trust Required

Prechained is designed so no one needs to trust us. Here's how to verify any record independently:

  1. Go to the package page, copy the SHA-384 fingerprint
  2. Find the manifest JSON at github.com/ngr-dev1/prechained-archive/{ecosystem}/{package}/{version}/manifest.json
  3. Compute sha384(manifest_payload) yourself
  4. Compare with the stored fingerprint
  5. Look up the Bitcoin block number at blockstream.info
  6. Confirm the block timestamp predates any known attack disclosure

The crawler source is public and fingerprinted. You can audit exactly what code produced any receipt.
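
Steps 3, 4, and 6 can be scripted in a few lines. This is a sketch under assumptions: it treats the manifest payload as the exact bytes that were hashed (the post does not say whether that is the file as stored or a re-canonicalized string), and it assumes the block timestamp you look up is in Unix seconds. The function names are hypothetical.

```javascript
const crypto = require('node:crypto');

// Steps 3 and 4: recompute SHA-384 over the payload and compare it with
// the fingerprint copied from the package page.
function verifyFingerprint(manifestPayload, publishedHex) {
  const computed = crypto
    .createHash('sha384')
    .update(manifestPayload)
    .digest('hex');
  return computed === publishedHex.trim().toLowerCase();
}

// Step 6: check that the block's timestamp (assumed Unix seconds) is
// earlier than the disclosure time (an ISO 8601 string).
function blockPredates(blockTimestampSeconds, disclosureIso) {
  const disclosureMs = Date.parse(disclosureIso);
  if (Number.isNaN(disclosureMs)) throw new Error('invalid disclosure date');
  return blockTimestampSeconds * 1000 < disclosureMs;
}

// Demo with a made-up payload; a real check would read manifest.json.
const payload = '{"name":"example"}';
const published = crypto.createHash('sha384').update(payload).digest('hex');
console.log(verifyFingerprint(payload, published));
console.log(verifyFingerprint(payload + ' ', published));
```

Even a single extra byte of whitespace changes the digest, so the payload must be hashed exactly as it was captured.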


Real-World Incident Coverage

This isn't theoretical. Here's how Prechained has already responded to active incidents, including one capture that predates the public disclosure:

Semgrep / Qilin Ransomware (May 22, 2026)

The Qilin ransomware group published details of their Semgrep attack. Prechained had already captured semgrep/semgrep at commit v238ad257ba97 on May 21 at 11:23 PM — 9 hours before publication. Bitcoin Block #950,477 confirmed. Receipt: NGR-PC-MP6J9YB08PDYQI.

Laravel-Lang RCE Backdoor (May 22, 2026)

The makowskid GitHub account inserted RCE backdoors across 700+ historical versions of laravel-lang/lang, laravel-lang/http-statuses, and laravel-lang/attributes. Prechained added all laravel-lang/* packages to the active crawl list the same day the attack was disclosed.

Megalodon Attack (May 21, 2026)

5,718 malicious commits pushed to 5,561 GitHub repositories in 6 hours. Prechained launched the same day, with GitHub repo tracking active from day one.


What Prechained Does Not Do

We want to be precise about scope:

  • Does not scan for vulnerabilities in real time — we capture and fingerprint. CVE correlation is tracked in a vuln_states table but active scanning is not the primary function.
  • Does not cover private packages — only public registries. Private package monitoring is what cbomcompliance.com handles.
  • Does not guarantee 100% coverage — we cover the top packages by popularity. Low-download packages may not be in the archive yet.
  • Does not store binaries — manifests and metadata only. The actual .tgz, .whl, .gem files are not stored.
  • Does not alert in real time — there is no push notification system yet. That is a planned feature.
  • Does not retroactively capture — if a package was never crawled before an attack, there is no pre-attack record. This is exactly why broad dynamic coverage is critical.

The Database Schema

CREATE TABLE packages (
  id UUID PRIMARY KEY,
  name TEXT,
  ecosystem TEXT,
  description TEXT,
  latest_version TEXT,
  total_versions INT,
  first_captured_at TIMESTAMPTZ,
  last_captured_at TIMESTAMPTZ
);

CREATE TABLE snapshots (
  id UUID PRIMARY KEY,
  package_id UUID REFERENCES packages(id),
  version TEXT,
  ecosystem TEXT,
  sha384_fingerprint TEXT,
  merkle_root TEXT,
  receipt_id TEXT UNIQUE,
  jws_receipt TEXT,
  btc_anchored BOOLEAN,
  btc_block BIGINT,        -- Bitcoin block height at capture
  captured_at TIMESTAMPTZ,
  raw_metadata JSONB,      -- complete registry API response
  ots_proof TEXT,
  manifest_path TEXT,      -- location in the prechained-archive repo
  xrpl_ledger TEXT,        -- reserved for XRPL anchoring
  xrpl_txid TEXT           -- reserved for XRPL anchoring
);

raw_metadata stores the complete registry API response as JSONB for full fidelity. manifest_path points to the permanent GitHub archive location. The schema is designed to grow — XRPL anchoring fields are already reserved for when we activate the XRP Ledger as a secondary timestamp layer.


Live Stats (May 23, 2026)

  • 37,000+ snapshots captured and Bitcoin anchored
  • 1,400+ packages tracked across 8 ecosystems
  • 10-minute crawl cadence
  • $0 cost to users — no ads, no tracking, no login required
  • Bitcoin Block ~#950,611 — current anchor height

The Relationship to SBOM and Compliance

Prechained is the free public layer of a broader cryptographic trust infrastructure. Every package page links to cbomcompliance.com — the paid, compliance-grade layer that processes private SBOMs into formally signed JWS receipts accepted by C3PAOs and auditors under CMMC Level 2, EU CRA, ISO 27001, and NIST SP 800-171, with zero data retention.

Prechained itself has been receipted by cbomcompliance: Receipt NGR-CBOM-8ED22D90DD7D — CLEAN, 0 issues, Bitcoin anchored.

The relationship is simple: Prechained covers public packages. cbomcompliance covers private software. Together they cover the full supply chain — before the attack and after.


Try It

If you maintain a package in any of the 8 ecosystems we cover, your package is probably already in the archive. Look it up.

If you're doing incident response and need a pre-compromise baseline for a package you can't find elsewhere — check Prechained first.

Trust is not declared. It is computed.


Built by NextGenRails™ · AGPL-3.0 · Free forever
