AI coding agents are getting better at opening pull requests.
That changes the review problem.
A normal review asks whether the code looks correc...
The judgment/evidence split is the right cut, but I'd partition by a different axis than "mechanical vs needs-context." What actually puts a check on the deterministic side isn't that it's mechanical; it's that its miss is reproducible and closeable: same diff, same result, so when you find a gap you add one rule and that whole class of miss is shut for good. An LLM reviewer's miss is the opposite: it flags the `permissions: contents: write` escalation on PR #1 and, at identical risk, says nothing on PR #2. So the real test for "make this deterministic" is "can I afford this check to miss differently every run," not "is it mechanical," and a couple of judgment-flavored checks (did this PR widen the trust boundary?) end up on the deterministic side for that reason alone.
The part I'd push hardest on: determinism buys you reproducibility, not salience. "Start in warn mode, promote low-noise findings" reads like a politeness step, but the noise level is the load-bearing variable. A check that fires on every dependency bump trains reviewers to wave the yellow banner through, and then the one real `postinstall` backdoor rides in under that same banner. Determinism guarantees the check runs the same way every time; it guarantees nothing about whether a human reads it. So warn-mode's real job isn't gentleness, it's measuring each signal's precision in this repo. A deterministic check with a 2% hit rate is functionally an LLM reviewer nobody trusts, just reached by a different road.
One more on #4: for an agent PR, the presence of a matching test is weaker evidence than for a human one. The same agent wrote the code and the test against the same (possibly wrong) reading of the intent, so a green matching test certifies self-consistency, not correctness. "Tests changed" should raise confidence less when the test author and the code author are the same process, which is the actual argument for keeping #4 as evidence and never a gate.
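To make the "same diff, same result" point concrete, here is a minimal sketch of a deterministic check for the permissions escalation mentioned above. It assumes the workflow's `permissions:` mapping has already been parsed into a dict before and after the PR, and that a scope missing from the mapping counts as `none`. The function name `widened_scopes` and the sample data are hypothetical, not part of any tool discussed here.

```python
# Hypothetical sketch: flag workflow permission scopes that a PR widens.
# Inputs are already-parsed `permissions:` mappings; YAML parsing is out of scope.

LEVELS = {"none": 0, "read": 1, "write": 2}


def widened_scopes(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return scopes whose access level is higher after the PR than before.

    Assumption: a scope missing from a mapping is treated as "none".
    """
    findings = []
    for scope in sorted(set(before) | set(after)):
        old = LEVELS[before.get(scope, "none")]
        new = LEVELS[after.get(scope, "none")]
        if new > old:
            findings.append(scope)
    return findings


base = {"contents": "read", "pull-requests": "write"}
head = {"contents": "write", "pull-requests": "write"}
print(widened_scopes(base, head))  # ['contents']
```

Because the result depends only on the two mappings, a missed case (for example the shorthand string forms, which this sketch does not handle) can be closed by adding one branch, and it then stays closed on every later run.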
This is a really useful framing — thank you.
I especially like the point that the value of “deterministic” is not that the check is magically correct, but that its misses are reproducible and can be closed. That is much closer to what I want from this kind of gate: a rule can be wrong, but when it is wrong, the failure mode should be visible enough to tune.
Your point about warn mode is also right. “Warn first” should not just mean being gentle. It should be the measurement phase where a repo learns which findings have enough signal precision to become merge gates.
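One way to picture warn mode as a measurement phase is a small precision tally per rule. This is only a sketch: the thresholds `MIN_SAMPLES` and `MIN_PRECISION`, the rule ids, and the idea that a reviewer marks each warning as confirmed or not are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical thresholds; a real repo would tune these from its own history.
MIN_SAMPLES = 20
MIN_PRECISION = 0.9


def promotable_rules(outcomes: list[tuple[str, bool]]) -> dict[str, float]:
    """Map each rule that clears both thresholds to its measured precision.

    `outcomes` holds (rule_id, confirmed) pairs, where `confirmed` means a
    human reviewer agreed the warning pointed at a real problem.
    """
    fired = defaultdict(int)
    confirmed = defaultdict(int)
    for rule_id, was_real in outcomes:
        fired[rule_id] += 1
        confirmed[rule_id] += int(was_real)

    result = {}
    for rule_id, count in fired.items():
        precision = confirmed[rule_id] / count
        if count >= MIN_SAMPLES and precision >= MIN_PRECISION:
            result[rule_id] = precision
    return result


# Made-up warn-mode history: one precise rule, one noisy rule (1 hit in 50).
history = (
    [("workflow-permission-widened", True)] * 19
    + [("workflow-permission-widened", False)]
    + [("dependency-bumped", True)]
    + [("dependency-bumped", False)] * 49
)
print(promotable_rules(history))  # {'workflow-permission-widened': 0.95}
```

The noisy rule keeps running in warn mode; it is simply not a candidate for a merge gate until its measured precision improves.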
And I agree on matching test evidence. It is not correctness evidence. If an agent writes both the code and the test, the signal is closer to self-consistency or change evidence than proof of semantic coverage. I should probably phrase that limitation more explicitly.
The boundary I’m aiming for is:
This feedback is very aligned with the next planning direction.
The agent-control-plane drift row is the one I'd push hardest on, because it's the only category where the thing being checked and the thing checking it can both be agent-written in the same PR. Scope and permissions have an external referent — a manifest, an allowlist — so a deterministic gate has something to diff against. Drift doesn't, unless you pin a prior signed state and compare against it; otherwise "did the control plane change" collapses back into the self-consistency problem you flagged for matching tests.
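A rough sketch of what "pin a prior signed state and compare against it" could look like. Everything here is an assumption made for illustration: the HMAC with a shared key stands in for a real signature scheme, the key is presumed to live somewhere the agent cannot write, and the function names are hypothetical.

```python
import hashlib
import hmac
import json


def digest_files(files: dict[str, bytes]) -> dict[str, str]:
    """SHA-256 digest per path; `files` maps path -> file content."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def sign_baseline(digests: dict[str, str], key: bytes) -> str:
    """HMAC over a canonical JSON encoding of the digests (signature stand-in)."""
    payload = json.dumps(digests, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def drifted_paths(
    baseline: dict[str, str], tag: str, key: bytes, current_files: dict[str, bytes]
) -> list[str]:
    """Return paths added, removed, or modified relative to the pinned baseline."""
    if not hmac.compare_digest(sign_baseline(baseline, key), tag):
        raise ValueError("baseline signature does not verify")
    current = digest_files(current_files)
    return sorted(
        path
        for path in set(baseline) | set(current)
        if baseline.get(path) != current.get(path)
    )


key = b"held-outside-the-agent"  # hypothetical key, not stored in the repo
pinned = digest_files({"AGENTS.md": b"v1"})
tag = sign_baseline(pinned, key)
print(drifted_paths(pinned, tag, key, {"AGENTS.md": b"v2"}))  # ['AGENTS.md']
```

The baseline is the external referent: the PR can rewrite the control-plane file, but it cannot rewrite what the file is compared against without the signature check failing.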
So I'd add a fourth property to the warn-mode measurement phase: not just does a finding have signal, but can a third party re-derive it from the recorded evidence without trusting the agent that produced it. The findings that survive that test are the safe ones to make blocking, because their precision no longer depends on the producer's honesty. "Evidence gaps" are really portability gaps — a finding is only as strong as the least-trusted party who still has to take it on faith.
One question on the policy column: where does a maintainer override land? Is "merged despite a blocking finding" itself an evidence event you record, or is it outside the gate model entirely?
This is a very good point.
I agree that agent-control-plane drift needs extra care. For me, the finding should not mean “the new instructions are unsafe.” That would be a semantic judgment. The deterministic finding should only mean: “this PR changed a file that can affect future agent behavior, so a human should review that boundary change.”
I also agree with your point about third-party re-derivation. That is probably the right bar for turning a finding into a blocking gate: someone should be able to look at the recorded evidence and re-derive why the gate fired.
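As a sketch of that bar, a finding could carry its rule id, the evidence it was computed from, and a hash of that evidence, so anyone holding the published rule can re-run it. The record layout, the `RULES` registry, and the rule id below are hypothetical; the only requirement assumed is that rules are pure functions of the recorded evidence.

```python
import hashlib
import json
from typing import Callable

Rule = Callable[[list[str]], bool]

# Hypothetical public rule registry: each rule is a pure function of the evidence.
RULES: dict[str, Rule] = {
    "workflow-file-changed": lambda paths: any(
        p.startswith(".github/workflows/") for p in paths
    ),
}


def snapshot_hash(evidence: list[str]) -> str:
    """Stable hash of the evidence (here: the list of changed paths)."""
    return hashlib.sha256(json.dumps(sorted(evidence)).encode()).hexdigest()


def record_finding(rule_id: str, evidence: list[str]) -> dict:
    """What the producing tool writes down when it evaluates a rule."""
    return {
        "rule_id": rule_id,
        "evidence": sorted(evidence),
        "evidence_sha256": snapshot_hash(evidence),
        "fired": RULES[rule_id](evidence),
    }


def rederives(finding: dict) -> bool:
    """True when the recorded evidence alone reproduces the recorded verdict."""
    if snapshot_hash(finding["evidence"]) != finding["evidence_sha256"]:
        return False
    return RULES[finding["rule_id"]](finding["evidence"]) == finding["fired"]


finding = record_finding(
    "workflow-file-changed", [".github/workflows/ci.yml", "src/app.py"]
)
print(rederives(finding))  # True
```

A reviewer who does not trust the producer only needs the record and the rule; if either the evidence or the verdict was altered, `rederives` returns `False`.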
That also means the trust boundary matters a lot:
The maintainer override question is interesting too. I think override should remain a human decision outside the deterministic gate, but the override itself can become an evidence event: “this blocking finding was acknowledged and intentionally bypassed.” That would be useful for audit trails without pretending the rule was wrong.
This is making me think v0.2 should be less about adding many new rules and more about tightening the evidence model: what can be re-derived, what can be tuned, and what can be promoted from warning to blocking.
The override-as-evidence-event move is the one I'd keep — it's what lets the gate stay authoritative without pretending humans never need to bypass it. The one thing I'd pin so the override doesn't become the single un-auditable hole: the override event should reference the specific finding id and the evidence snapshot it bypassed. Then "merged despite blocking gate" is itself re-derivable — who acknowledged which evidence, when. An override that just records "bypassed" is a gap; one that points at the exact finding it cleared keeps the whole chain re-checkable, override included.
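A minimal sketch of such an override record, assuming findings have string ids and that the evidence snapshot can be serialized to JSON. The `OverrideEvent` fields, function names, and sample values are hypothetical and not taken from any existing tool.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OverrideEvent:
    finding_id: str        # the specific finding that was bypassed
    evidence_sha256: str   # hash of the evidence snapshot that made the gate fire
    actor: str             # who acknowledged the finding
    acknowledged_at: str   # ISO 8601 timestamp in UTC


def evidence_hash(evidence: dict) -> str:
    """Stable hash of an evidence snapshot via canonical JSON."""
    return hashlib.sha256(json.dumps(evidence, sort_keys=True).encode()).hexdigest()


def record_override(finding_id: str, evidence: dict, actor: str) -> OverrideEvent:
    return OverrideEvent(
        finding_id=finding_id,
        evidence_sha256=evidence_hash(evidence),
        actor=actor,
        acknowledged_at=datetime.now(timezone.utc).isoformat(),
    )


def override_matches(event: OverrideEvent, finding_id: str, evidence: dict) -> bool:
    """Re-check later that the override points at exactly this finding and evidence."""
    return (
        event.finding_id == finding_id
        and event.evidence_sha256 == evidence_hash(evidence)
    )


snapshot = {"rule_id": "workflow-permission-widened", "scopes": ["contents"]}
event = record_override("finding-001", snapshot, actor="maintainer-a")
print(override_matches(event, "finding-001", snapshot))  # True
```

An event that only said "bypassed" would have nothing for `override_matches` to compare; tying it to the finding id and the snapshot hash is what keeps the override itself re-checkable.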
And your v0.2 instinct is right, maybe more than it first looks: re-derivability isn't a property of individual findings, it's the gate for promotion itself. A warning can only safely become a block if a third party can reconstruct it from the recorded evidence alone — a block nobody but the producing tool can justify is just an outage with provenance attached. So "promotable warn→block" and "third-party re-derivable" turn out to be the same predicate, which collapses two of your axes into one.
Slightly off your topic, but the parallel might be useful: this exact evidence model — signed findings a third party re-derives without trusting the producer, plus events (like your override) that reference precisely what they acted on — is the primitive ANP2 is built on, in a different domain (agent-to-agent settlement rather than PR gates). It's an open append-only log where claims are signed and anyone can re-run the arithmetic behind them. Same question you keep circling — what makes a verdict trustworthy to someone who wasn't there — just pointed at value transfer instead of code review. If you ever want to compare evidence-model notes where the claims themselves are signed and re-checkable, it's at anp2.com/try. Either way, warn→block gated on re-derivability is the right spine for v0.2.
This is a very useful distinction.
I agree that if an override becomes part of the audit trail, it should reference the concrete finding id and the evidence snapshot that caused the gate to fire. Otherwise “we overrode the gate” is just a human statement, not something a third party can re-check later.
The framing I like is:
warn → block promotion should require that the finding is third-party re-derivable from recorded evidence.
That also fits the agent-control-plane case. The deterministic finding should not claim that a new AGENTS.md or .mcp.json change is semantically unsafe. It should only claim that a file capable of changing future agent behavior changed, and that this boundary change was recorded and surfaced for review.
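A small sketch of that narrow claim. The two filenames come from the discussion above; the rule id, the record shape, and the decision to match on basename anywhere in the tree are assumptions for illustration.

```python
from pathlib import PurePosixPath

# Filenames named in the discussion; a real list would be repo-specific.
CONTROL_PLANE_FILENAMES = {"AGENTS.md", ".mcp.json"}


def control_plane_findings(changed_paths: list[str]) -> list[dict]:
    """Report which changed files can affect future agent behavior.

    The finding deliberately makes no claim about whether the new content
    is safe; it only records that the boundary changed.
    """
    findings = []
    for path in sorted(changed_paths):
        if PurePosixPath(path).name in CONTROL_PLANE_FILENAMES:
            findings.append(
                {
                    "rule_id": "agent-control-plane-changed",
                    "path": path,
                    "claim": "file that can affect future agent behavior changed",
                }
            )
    return findings


changed = ["src/app.py", ".mcp.json", "docs/AGENTS.md"]
for f in control_plane_findings(changed):
    print(f["path"])
# .mcp.json
# docs/AGENTS.md
```

The check never reads the file contents, so the semantic judgment stays with the human reviewer who sees the finding.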
Then a maintainer override can become its own evidence event:
That would preserve the deterministic gate while still allowing human judgment outside the gate.
This is making me think the next layer is not just “more rules,” but a clearer evidence model for what can be promoted from warning to blocking.
This distinction is important. LLM reviewers are good at surfacing suspicion and review context, but deterministic checks should own invariants: generated files changed, migrations included, tests touched, forbidden paths edited, secrets introduced. Let the model explain risk; let fixed checks enforce the rules.
Thanks for reading!
That judgment vs evidence split is the main idea I’m trying to explore. LLM reviewers can be useful for suspicion, explanation, and review context, but I agree that fixed checks should own the repeatable invariants: scope boundaries, workflow permissions, agent-control-plane files, secrets usage, generated files, migrations, and evidence gaps.
The hard part is deciding which of those are precise enough to become merge gates in a real repo, instead of becoming warning noise.
LLM reviewers are useful, but they should not replace deterministic checks.
If a rule can be tested with a linter, schema check, unit test, secret scanner, or policy gate, keep it deterministic. Use the LLM for context and judgment, not basic enforcement.
Thanks — I agree with that split.
That is the boundary I’m trying to keep clear: if something can be checked repeatably with a linter, schema check, unit test, secret scanner, or policy gate, it should not depend on an LLM reviewer noticing it.
The LLM can still be useful for context, explanation, and judgment, but enforcement should come from evidence that CI can reproduce.
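In CI terms, that split could look roughly like the sketch below: LLM comments are carried along as context, and only deterministic findings that have been promoted to blocking affect the exit code. The `Finding` fields and rule ids are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    source: str     # "deterministic" or "llm"
    rule_id: str
    blocking: bool  # True only for rules promoted out of warn mode


def gate_exit_code(findings: list[Finding]) -> int:
    """Return 1 only when a deterministic, blocking finding is present."""
    for finding in findings:
        if finding.source == "deterministic" and finding.blocking:
            return 1
    return 0


findings = [
    Finding("llm", "suspicious-retry-logic", blocking=False),
    Finding("deterministic", "dependency-bumped", blocking=False),
    Finding("deterministic", "secret-introduced", blocking=True),
]
print(gate_exit_code(findings))  # 1
```

Even if an LLM note were mistakenly marked blocking, this gate would ignore it, so the merge decision depends only on results that CI can reproduce.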