In March 2025, a GitHub account compromise triggered one
of the most damaging software supply chain attacks of the
year. In September, the Shai-Hulud worm tore through 800
npm packages via self-propagation — the first known
self-replicating open source malware. In October, F5
Networks' development environment was breached by a
China-linked group who stole BIG-IP source code containing
encryption keys and configuration files. In November,
trojanized versions of packages from PostHog, Zapier and
Postman were pushed to npm via compromised maintainer
accounts.
And in almost every post-incident analysis, the same
question surfaced: When exactly did the clean version
become the compromised version, and how would anyone know?
That question doesn't have a good answer today. This
article explains why, and what answering it properly
requires.
The scale of the problem
Software supply chain attacks more than doubled globally
in 2025. Over 70% of organisations reported experiencing
at least one third-party or software supply chain-related
security incident. Global losses from software supply
chain attacks are projected to reach $60 billion.
More importantly, 35% of attacks originated through
compromised software dependencies, 22% targeted CI/CD
pipelines and build environments, and 20% involved
poisoned or unverified container images. Together,
dependencies, build pipelines and containers now account
for 77% of all supply chain attack entry points.
Fewer than half of enterprises currently monitor even
half of their extended software supply chain, and
runtime security controls consistently detect threats
too late.
The attack surface has fundamentally changed. Threat
actors are no longer targeting deployed applications —
they are compromising software at the point of creation.
What auditors actually need to see
When a breach happens (and statistically, it will),
the first question is not "what was compromised" but
"when did this start?" Investigators need to establish
a timeline. That requires being able to point to a
specific version of a specific codebase and say:
"This version was clean. Here is the proof. Here is
the exact point where the record breaks."
Without that, you're doing forensics in the dark.
Consider what an auditor actually needs:
A timestamped record tied to a specific commit.
Not a report generated today about yesterday's code.
A cryptographic record that proves a specific commit
hash was reviewed by a specific process at a specific
moment in time.
A chain of custody across versions. If something
broke between version 1.4.2 and 1.4.3, the auditor
needs to be able to see that version 1.4.2 passed
verification and version 1.4.3 either wasn't reviewed
or failed. Without a linked chain of verification
records, there is no way to establish when clean
became compromised.
Independence from the organisation being audited.
If the only entity that can verify the review record
is the organisation that produced it, the record is
not evidence — it is an assertion. A regulator or
insurer needs to be able to verify the record without
trusting the party that generated it.
Explicit scope. The record must state what was
reviewed and, critically, what was not. A review
that checked static patterns but not runtime behaviour
must say so. A review claiming to cover everything
when it covered half the surface is misleading evidence,
not useful evidence.
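The four requirements above can be sketched in a few dozen lines. The following Python is purely illustrative: it uses stdlib HMAC-SHA256 as a dependency-free stand-in for the Ed25519 signing a real system would use, and every field and function name is a hypothetical, not an actual product API.

```python
import hashlib
import hmac
import json

# Hypothetical signing key held by the review process. In production this
# would be an Ed25519 private key, not a shared HMAC secret.
SIGNING_KEY = b"review-process-key"


def sign_review_record(commit_hash, timestamp, findings):
    """Produce a timestamped record cryptographically tied to one commit."""
    record = {
        "commit_hash": commit_hash,
        "scan_timestamp": timestamp,
        "findings": findings,
        # Explicit scope: what this review did NOT cover.
        "not_verified": ["runtime behaviour", "dynamic testing"],
    }
    # Canonical serialisation so the same record always hashes identically.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    record["result_hash"] = "sha256:" + hashlib.sha256(payload).hexdigest()
    record["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return record


def verify_review_record(record):
    """A third party holding the key can confirm the record was not altered."""
    body = {k: v for k, v in record.items()
            if k not in ("result_hash", "signature")}
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(record["signature"], expected)


rec = sign_review_record("f4a91b3e", "2026-01-15T09:14:00Z", [])
assert verify_review_record(rec)
rec["commit_hash"] = "a7c3d2f1"   # any tampering breaks verification
assert not verify_review_record(rec)
```

The point of the sketch is the shape, not the primitives: the signature covers the commit hash, the timestamp, and the scope statement together, so none of them can be quietly edited after the fact.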
None of the mainstream SAST tools produce all four of
these. Most produce none of them.
What developers and security teams actually face
The gap between what teams think they have and what
they actually have is significant. Here's what the
real technical landscape looks like:
Scan results are not evidence. When you run Snyk,
Semgrep, or CodeQL, you get a report. That report
lives in a dashboard or a PDF. It has a timestamp on
it — but that timestamp can be changed. The report
itself is not signed. There is no mechanism to prove
the scan ran against the commit it claims to have
scanned, rather than a different version. An auditor
has to trust you. They cannot verify.
Median time to remediate leaked secrets is 94 days.
Analysis of over 400,000 public GitHub repositories
found that the median time to remediate leaked secrets
discovered in a repository was 94 days. That's 94 days
of exposure after discovery, not after introduction.
And that's just for known leaks in public repos.
Private repos and unknown leaks are a different
problem entirely.
CI/CD pipelines are a primary attack vector.
22% of supply chain attacks targeted CI/CD pipelines
directly. When your build pipeline is compromised,
code that looked clean when reviewed can be modified
before it ships. A verification system that only
checks source code and not the build process provides
a false sense of security.
LLM-assisted attacks are generating convincing
cover commits. Security researchers found that
"the malicious injections don't arrive in obviously
suspicious commits — the surrounding changes are
realistic: documentation tweaks, version bumps,
small refactors, and bug fixes that are stylistically
consistent with each target project. This level of
project-specific tailoring strongly suggests the
attackers are using large language models to generate
convincing cover commits."
This is the most technically challenging problem.
A backdoor that looks exactly like a legitimate
commit, stylistically consistent and contextually
appropriate, will pass any review that relies on
human inspection. The only defence is a cryptographic
record that ties a specific hash to a verified state,
so any subsequent modification is detectable
regardless of how convincing it looks.
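The defence described above rests on one property of hash functions: any change to the content, however plausible the diff reads, changes the hash. A minimal illustration (the file paths and contents are invented for the example; a real Git commit hash covers trees, parents, and metadata, which this deliberately simplifies):

```python
import hashlib


def tree_hash(files):
    """Hash a set of files deterministically: sorted paths, then contents.
    A simplified stand-in for a Git commit/tree hash."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode())
        h.update(files[path])
    return h.hexdigest()


clean = {"src/app.py": b"def handler():\n    return ok()\n"}
verified_hash = tree_hash(clean)

# An LLM-crafted "cover commit": stylistically plausible, one line changed.
tampered = {"src/app.py": b"def handler():\n    return ok() or backdoor()\n"}

# However convincing the diff looks to a human reviewer,
# the hash no longer matches the verified record.
assert tree_hash(tampered) != verified_hash
```

This is why the record must bind to a hash rather than to a human judgment: stylistic consistency fools reviewers, but it cannot preserve a digest.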
JPMorgan Chase's CISO said it publicly.
JPMorgan Chase CISO Patrick Opet published an open
letter urging software providers to treat supply chain
risk as systemic rather than a niche AppSec issue,
arguing that downstream enterprises cannot practically
absorb the compounding risk from their many insecure
software suppliers. When the CISO of the world's
largest bank is writing open letters about this, it
is no longer a niche concern.
The regulatory pressure is now real
DORA, the EU Digital Operational Resilience Act,
came into force in January 2025. It applies to every
financial entity operating in the EU. Article 9
requires demonstrable evidence of secure development
processes. Not descriptions. Not policies. Evidence
that can be presented to regulators and independently
verified.
"We use Snyk" is not DORA evidence. A signed,
independently verifiable record of what was reviewed,
when, against which commit, and what was found is.
The FCA's operational resilience requirements in the
UK impose similar expectations. So does PCI-DSS v4.0,
which came into full effect in 2024. SOC 2 Type II
auditors are increasingly asking for evidence of
continuous verification rather than point-in-time
reports.
The direction of travel is clear: regulators want
proof, not assertions.
What proof actually requires
For a code review to be genuinely verifiable, the
output needs to satisfy four properties:
Tamper-evident. If the result was modified after
the fact, that modification must be detectable. This
requires cryptographic signing — an Ed25519 or similar
signature that ties the result to a specific key
holder and a specific point in time.
Deterministic. The same code reviewed by the same
process must always produce the same result. If two
independent parties run the same review on the same
commit and get different outputs, neither output is
trustworthy. Determinism is what makes independent
verification possible.
Independently replayable. A third party must be
able to re-run the review themselves and confirm the
result, without access to your systems or needing to
trust the party that generated the original record.
Honest about scope. The record must explicitly
state what was NOT checked. Runtime behaviour, dynamic
testing, fuzzing, business logic — if these weren't
covered, the record must say so. A review that implies
complete coverage when it only checked static patterns
is misleading evidence.
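In practice, the determinism property above usually comes down to canonical serialisation: two independent parties must encode the same findings byte-identically before hashing, or their "same" results will hash differently. A sketch, with hypothetical field names:

```python
import hashlib
import json


def result_hash(findings):
    """Canonical JSON: sorted keys, fixed separators, UTF-8. Independent
    runs over the same data produce the same bytes, hence the same hash."""
    canonical = json.dumps(
        findings, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()


# Two parties assemble the same findings in different key orders.
party_a = {"commit": "f4a91b3e", "verdict": "VERIFIED", "gates_passed": 5}
party_b = {"gates_passed": 5, "commit": "f4a91b3e", "verdict": "VERIFIED"}

assert result_hash(party_a) == result_hash(party_b)
```

Without a canonical form, even whitespace or key ordering breaks independent replay, and with it the third-party verification the whole scheme depends on.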
What a genuine audit trail looks like
Here is what a verifiable review record actually
needs to contain:
{
  "proof_id": "sha256:a3f8c2d1...",
  "repo_url": "github.com/org/repo",
  "commit_hash": "f4a91b3e...",
  "scan_timestamp": "2026-01-15T09:14:00Z",
  "verdict": "VERIFIED",
  "gates_passed": 5,
  "result_hash": "sha256:9e72d1...",
  "supersedes": "sha256:prev_proof_id...",
  "signature": "Ed25519:AMT-SIG-v1:...",
  "NOT_verified": [
    "runtime behaviour under production load",
    "dynamic security testing (fuzzing, DAST)",
    "semantic correctness of business logic",
    "concurrency correctness validation"
  ]
}
The supersedes field is the critical one for
incident response. It chains every verification
record to the previous one. If something was clean
at commit f4a91b3e on January 15th and compromised
at commit a7c3d2f1 on February 3rd, the chain shows
exactly where the record breaks. An investigator
doesn't need to reconstruct history — it's already
there, cryptographically linked and tamper-evident.
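Checking such a chain is mechanical: walk the records in order and confirm each one's supersedes link matches the identity of the record before it. A stdlib Python sketch, assuming (as a simplification) that a record's proof_id is the hash of its canonical content:

```python
import hashlib
import json


def proof_id(record):
    """A record's identity is the hash of its canonical content."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def make_record(commit_hash, verdict, supersedes):
    return {"commit_hash": commit_hash, "verdict": verdict,
            "supersedes": supersedes}


def find_break(records):
    """Given records oldest-first, return the index of the first record
    whose supersedes link does not match its predecessor, or None."""
    for i in range(1, len(records)):
        if records[i]["supersedes"] != proof_id(records[i - 1]):
            return i
    return None


r1 = make_record("f4a91b3e", "VERIFIED", None)          # genesis record
r2 = make_record("a7c3d2f1", "VERIFIED", proof_id(r1))  # valid link
r3 = make_record("0c9d8e7f", "VERIFIED", "sha256:forged")  # broken link

assert find_break([r1, r2]) is None
assert find_break([r1, r2, r3]) == 2
```

The returned index is exactly the investigator's answer: the last record before the break is the last provably clean state.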
The NOT_verified field is equally important. In a
regulatory context, overstating the scope of a review
is worse than admitting its limits. A record that
honestly discloses what it didn't check is more
credible evidence than one that implies it checked
everything.
The honest benchmark problem
One more thing worth saying directly.
A verification system that claims to verify 90% of
real-world repositories is not being honest.
Real code has real problems. Repositories that lack
detectable structure, that claim to implement features
not present in the code, that fail build checks —
these exist in large numbers in the wild.
As one recent industry report put it: "2025 marked
the point when software supply chain risk became
measurable — highlighting the growing need for
verifiable foundations over unchecked delivery speed."
A system that verifies everything is telling you
what you want to hear. A system that verifies 41%,
marks 38% as partial, and refuses to verify 20% is
telling you something true — and giving you
information you can actually act on.
Honesty about what a tool cannot verify is itself
a trust signal. It is also a regulatory requirement —
DORA explicitly requires that evidence of review
accurately represents the scope of what was done.
Where this goes
Supply chain security is increasingly influencing
procurement, audit and insurance decisions, with
software provenance and SBOM disclosures emerging
as commercial requirements rather than best practices.
The gap between "we ran a scan" and "here is
independently verifiable proof that this specific
code was reviewed by this process at this point in
time" is going to close. Regulatory pressure,
supply chain attacks, and the proliferation of
AI-assisted malware are all pushing in the same
direction.
The tools that survive the next five years will be
the ones that produce evidence, not assertions.
Nucleus Verify produces cryptographically signed,
independently replayable verification certificates
for any codebase — with a tamper-evident chain
linking every scan to the previous. Free tier at
altermenta.com. A GitHub Action is available on the
GitHub Marketplace.