JinHyuk Sung

Posted on Jun 21

CI gates for AI-generated PRs need re-derivable evidence

#github #cicd #security #ai

When a CI gate flags an AI-generated PR, the important question is not only "what did it flag?"

It is also:

"Could someone else come back later and re-derive why this finding fired?"

That is the reason I added evidence snapshots to Agent Gate v0.2.1.

What Agent Gate is

Agent Gate is a GitHub Action for AI-generated pull requests.

It does not review code with an LLM. It checks deterministic merge evidence in CI:

PR scope escapes
GitHub Actions permission escalation
AGENTS.md / .mcp.json drift
missing test-file evidence
high-risk path changes

The Action does not check out PR code, call LLMs at runtime, or execute repository scripts.

Why finding IDs were not enough

In v0.2.0, Agent Gate added stable finding IDs.

That gave every finding a short audit handle, for example:

agf_987ab9ddb8c1b299

That is useful for references, comments, future override workflows, and log-based debugging.

But an ID by itself is not proof. If someone sees the ID later, they still need to know what recorded material produced it.

What v0.2.1 adds

v0.2.1 adds evidenceSnapshot to public findings.

The split is:

findingId = short audit handle
evidenceSnapshot = canonical material used to derive that handle

The snapshot is intentionally boring. It contains stable rule material such as:

rule id
severity
path or line when present
normalized evidence label/value pairs

It deliberately excludes timestamps, report order, risk score, version, commit SHA, and mutable display text, so the same rule input always yields the same snapshot.
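
One way to picture the handle/material split, as a minimal sketch: serialize the snapshot canonically, hash it, and keep a short prefix as the handle. The SHA-256 choice, the sorted-key JSON serialization, and the name derive_finding_id are assumptions for illustration. This post does not specify how Agent Gate computes its IDs, so the sketch is not expected to reproduce the real agf_ values.

```python
import hashlib
import json


def derive_finding_id(snapshot: dict) -> str:
    """Hypothetical derivation: hash a canonical serialization of the snapshot."""
    # Sorted keys and fixed separators make the serialization independent of
    # dict key order and whitespace.
    canonical = json.dumps(
        snapshot, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # 16 hex characters matches the length of the IDs shown in this post.
    return "agf_" + digest[:16]


snapshot = {
    "ruleId": "risk/high-risk-path",
    "severity": "error",
    "path": ".github/workflows/agent-gate.yml",
    "evidence": [
        {"label": "changed_file", "value": ".github/workflows/agent-gate.yml"}
    ],
}

# The same material written in a different key order yields the same handle.
reordered = {
    "evidence": [
        {"value": ".github/workflows/agent-gate.yml", "label": "changed_file"}
    ],
    "path": ".github/workflows/agent-gate.yml",
    "severity": "error",
    "ruleId": "risk/high-risk-path",
}
same_id = derive_finding_id(snapshot) == derive_finding_id(reordered)

# Changing the recorded material changes the handle.
changed = dict(snapshot, path=".github/workflows/other.yml")
different_id = derive_finding_id(snapshot) != derive_finding_id(changed)
```

Because nothing volatile goes into the hashed material, anyone holding the snapshot can recompute the handle and compare it with the one in the report.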

A real report shape

Example compact log output:

Agent Gate: NEEDS HUMAN DECISION
Decision: warn
Risk score: 49 / 100
Why: Agent-generated PRs must include an agent-gate contract.
Recommended next step: Add a PR contract before relying on scope checks.
Policy status: warning today; eligible to become a merge gate after tuning.

Findings:
- error agf_be0c2c2a66312aff contract/missing
- error agf_987ab9ddb8c1b299 risk/high-risk-path .github/workflows/agent-gate.yml
- warn agf_6016e753491255d7 workflow/dangerous-pattern .github/workflows/agent-gate.yml

The compact log stays short, but the JSON and Markdown reports carry the fuller evidence.
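
For log-based debugging, the IDs can be pulled out of the compact log and then looked up in the fuller reports. The following is a rough sketch. The line shape is inferred only from the sample log above, and parse_findings and FINDING_LINE are hypothetical names, not part of Agent Gate.

```python
import re

# Assumed line shape, inferred only from the sample log:
# "- <severity> <findingId> <ruleId> [<path>]"
FINDING_LINE = re.compile(
    r"^- (?P<severity>\w+) (?P<finding_id>agf_[0-9a-f]{16}) "
    r"(?P<rule_id>\S+)(?: (?P<path>.+))?$"
)

log = """\
Findings:
- error agf_be0c2c2a66312aff contract/missing
- error agf_987ab9ddb8c1b299 risk/high-risk-path .github/workflows/agent-gate.yml
- warn agf_6016e753491255d7 workflow/dangerous-pattern .github/workflows/agent-gate.yml
"""


def parse_findings(text: str) -> list[dict]:
    """Return one dict per finding line; lines that do not match are skipped."""
    findings = []
    for line in text.splitlines():
        match = FINDING_LINE.match(line.strip())
        if match:
            findings.append(match.groupdict())
    return findings


parsed = parse_findings(log)
ids = [finding["finding_id"] for finding in parsed]
```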

Example JSON shape:

{
  "findingId": "agf_987ab9ddb8c1b299",
  "ruleId": "risk/high-risk-path",
  "severity": "error",
  "path": ".github/workflows/agent-gate.yml",
  "evidenceSnapshot": {
    "ruleId": "risk/high-risk-path",
    "severity": "error",
    "path": ".github/workflows/agent-gate.yml",
    "evidence": [
      {
        "label": "changed_file",
        "value": ".github/workflows/agent-gate.yml"
      }
    ]
  }
}
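
In the sample above, the top-level ruleId, severity, and path repeat what the snapshot records. A consumer of the report could check that the two stay in agreement before trusting a finding. This is a sketch under that assumption. The helper name snapshot_mismatches is hypothetical, and the post does not say Agent Gate ships such a check.

```python
import json

finding = json.loads("""
{
  "findingId": "agf_987ab9ddb8c1b299",
  "ruleId": "risk/high-risk-path",
  "severity": "error",
  "path": ".github/workflows/agent-gate.yml",
  "evidenceSnapshot": {
    "ruleId": "risk/high-risk-path",
    "severity": "error",
    "path": ".github/workflows/agent-gate.yml",
    "evidence": [
      {"label": "changed_file", "value": ".github/workflows/agent-gate.yml"}
    ]
  }
}
""")


def snapshot_mismatches(finding: dict) -> list[str]:
    """List top-level fields that disagree with the evidence snapshot."""
    snapshot = finding.get("evidenceSnapshot", {})
    mismatches = []
    for field in ("ruleId", "severity", "path"):
        # Compare only fields present on both sides; path is optional.
        if field in finding and field in snapshot:
            if finding[field] != snapshot[field]:
                mismatches.append(field)
    return mismatches


mismatches = snapshot_mismatches(finding)

# A report whose display fields were edited after the fact no longer agrees
# with its own snapshot.
tampered = dict(finding, severity="warn")
tampered_mismatches = snapshot_mismatches(tampered)
```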

Why this matters

For me, the bar for promoting a finding from warning to blocking is:

A third party should be able to re-derive the finding from recorded evidence.

That does not mean the check is magically correct.

It means the failure mode is visible, reproducible, and tunable.

A repo can start in warn mode, observe which findings are useful, and only later promote low-noise findings into merge gates.
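
The warn-first rollout can be sketched as a per-rule promotion list. Everything here is hypothetical: the Finding class, the decide function, and the "block" and "pass" decision values are illustrations of the idea, not Agent Gate's actual configuration or output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    finding_id: str
    rule_id: str
    severity: str


def decide(findings: list[Finding], blocking_rules: frozenset[str]) -> str:
    """Return 'block' only when a finding comes from a promoted rule."""
    if any(finding.rule_id in blocking_rules for finding in findings):
        return "block"
    return "warn" if findings else "pass"


findings = [
    Finding("agf_be0c2c2a66312aff", "contract/missing", "error"),
    Finding("agf_987ab9ddb8c1b299", "risk/high-risk-path", "error"),
]

# Observation phase: nothing is promoted, so the gate only warns.
observe_decision = decide(findings, frozenset())

# Later: one rule has proven low-noise and is promoted to a merge gate.
promoted_decision = decide(findings, frozenset({"risk/high-risk-path"}))
```

Keeping promotion per rule rather than global means a noisy rule can stay in warn mode without holding back the rules that have earned blocking status.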

What this does not solve yet

Agent Gate still does not prove semantic correctness.

Matching test-file evidence is not proof that the tests cover the behavior. It is change evidence, a self-consistency check on the PR, not a coverage claim.

Maintainer override storage is also not implemented yet. That is probably the next hard design question: if someone bypasses a finding, where should that override live so it is durable enough to inspect later?

CODEOWNERS / reviewer evidence and package dependency drift are also future work.

Try it

If you maintain a repo where coding agents open PRs, I would love feedback on whether this kind of evidence is useful or too noisy in observe mode.

Repo:

https://github.com/sjh9714/Agent-Gate

Disclosure: I maintain Agent Gate. v0.2.1 is still a prerelease; I would start in warn mode before treating any finding as a merge gate.

Top comments (2)

Pete Miloravac • Jun 29

Good framing. The ID-alone problem is something I've bumped into — you can have a perfectly stable finding identifier, but if the runner state underneath isn't reproducible, someone trying to re-derive the finding later is still guessing.

Shared/mutable runners make this sneaky. At Semaphore we run each job in a fresh ephemeral VM so "same code → same CI state" is actually true rather than just assumed. The structured test reports help too — they carry stable IDs per test across runs, so the "missing test evidence" check has something real to anchor against rather than whatever happened to show up in that build's log.

Also curious — is evidenceSnapshot meant to be portable across CI platforms, or is it GHA-specific? The permission escalation check in particular seems like it'd look pretty different outside of GHA.

(Disclosure: I work at Semaphore.)

JinHyuk Sung • Jun 30

Thanks — this is a really good distinction.

I agree that a stable findingId plus an evidenceSnapshot does not magically make the whole runner state reproducible.

The way I’m thinking about it is:

findingId is just the audit handle
evidenceSnapshot is the deterministic preimage that the rule actually consumed
it is not a full claim that the entire CI world state can be recreated later

So for the current GHA rules, the snapshot is mostly trying to answer:

“Given the rule input Agent Gate saw, can someone re-derive why this finding fired?”

not:

“Can someone recreate the exact runner environment that produced the whole CI run?”

That boundary matters.

For the portability question: I’d like the evidence model to stay portable, but individual rules can absolutely be platform-specific.

workflow/permission-escalation is GHA-specific by design. Outside GitHub Actions, I’d expect the evidence preimage to look different — maybe job identity, runner image, pipeline config diffs, permission/capability changes, or whatever that CI system exposes as stable structured input.

Your point about fresh ephemeral VMs and structured test reports is especially useful for the “missing test evidence” side. Today Agent Gate’s missing-test check is intentionally conservative: it is change/self-consistency evidence, not proof of behavioral coverage. Stable test IDs from structured reports would make that rule much stronger than trying to infer too much from logs.

So I think the clearer framing is:

Agent Gate evidence should be re-derivable from the material the rule records
runner/platform reproducibility is a separate layer
better CI substrates can make the recorded evidence much stronger
some rules are portable in shape, but not in their concrete evidence source

This is probably worth spelling out in the evidence model docs. Appreciate the Semaphore perspective.