withusin

Posted on Jul 3

An agent can't approve its own change: binding PR approval to the exact diff

#ai #security #devops #programming

An agent opened a PR against our repo last month. It touched the file that holds our payment rates. The diff was small and read fine. What stopped me was the review trail: through a chain of bots and default approvals, the change had moved most of the way to merge, and nobody with a name and a paycheck had read the two lines that changed a number customers get billed on.

That is the problem I care about. Not "AI writes bad code" — it often writes fine code. The problem is accountability on the files where being wrong is expensive: payment rates, auth config, permission tables, data-path plumbing. When an agent authors a change to one of those, "an approval happened" is not the same as "a named human is on the hook for these exact lines."

What GitHub gives you, and where it stops for this case

GitHub already ships real controls here, and I use them. Each one answers a slightly different question than the one I need answered.

CODEOWNERS auto-requests review from the right people:

# CODEOWNERS
/config/payment_rates.yaml   @payments-leads
/auth/                        @security-team

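A touch to those paths pings the owners, which is useful. But the owner review only blocks merge when "Require review from Code Owners" is turned on in branch protection or a ruleset, and even then CODEOWNERS says nothing about whether the diff was read: an approving review from a required owner satisfies it, and nothing verifies the owner looked at the diff. Approve from the mobile app on the train and the check goes green.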

Required reviewers and rulesets enforce that an approval exists — N approvals from the right group. That is a real gate. It is a count, not a statement that a specific person is accountable for a specific set of lines.

Branch protection can require N approvals and can dismiss stale pull request approvals when new commits are pushed. This is the closest control to what I want, so it is worth being precise about. It is opt-in and off by default; you turn it on per branch or ruleset. When it is on, a new commit dismisses the prior approval and someone approves again. That is good hygiene. What it does not produce is a record proving which diff a named person signed off on. It re-requires an approval; it does not bind the approval to particular bytes in a way you can hand an auditor later.

None of this is a knock on GitHub. These controls do what they say. They answer "did enough approvals happen?" On a locked file I need to answer "is a named human accountable for exactly these lines, and can I prove it six months from now?"

The failure case that actually bites

The gap above has a concrete shape.

1. Agent opens a PR to payment_rates.yaml. The diff is clean.
2. A human owner reviews it and approves, in good faith.
3. A follow-up commit slips in a different edit: a rate bumped, a flag flipped.

If dismiss-stale is off, the earlier approval still stands and the changed code rides in under it. If it is on, the approval is dismissed and someone re-approves — but re-approval is still a count, and on a busy PR the second click is a lot cheaper than the first read. Either way, the approval that merges is not provably tied to the bytes that merge. The person on record approved a version. Nobody can point to the diff they were accountable for.

Approval bound to the code fingerprint

DevHive is a PR governance gate that runs as a required GitHub status check. You lock specific files. Any PR touching a locked file is blocked at merge until a named human approves, and the approval is bound to a fingerprint of the exact diff (the commit/diff hash).

The binding is the point. The record is not "Jaewon approved PR #412." It is "Jaewon approved this hash." Change one line after the approval and the hash changes, so the approval no longer matches the code and is void. It has to be re-given against the new fingerprint. There is no "approve the clean one, ship a different one" path that survives, because the thing that was approved is the bytes, not the pull request.

A lock rule is small and explicit:

# devhive lock
- path: config/payment_rates.yaml
  require: named-human-approval
  bind_to: diff-hash

Touch that file and merge stays blocked until an approval exists whose fingerprint equals the current diff's fingerprint. Slip an edit in behind the approval and you are back to blocked, on the new hash.
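To make the binding concrete, here is a minimal sketch of the idea in Python. It is not DevHive's implementation: the function names, the `Approval` record, the choice of SHA-256 over the normalized diff text, and the sample rate values are all assumptions for illustration.

```python
import hashlib
from dataclasses import dataclass


def diff_fingerprint(diff_text: str) -> str:
    """SHA-256 over the diff text, with line endings normalized (assumed scheme)."""
    normalized = diff_text.replace("\r\n", "\n").encode("utf-8")
    return hashlib.sha256(normalized).hexdigest()


@dataclass(frozen=True)
class Approval:
    approver: str      # a named human, not a bot account
    fingerprint: str   # the diff hash the approver signed off on


def merge_allowed(current_diff: str, approvals: list[Approval]) -> bool:
    """Clear only if some approval's fingerprint equals the current diff's."""
    current = diff_fingerprint(current_diff)
    return any(a.fingerprint == current for a in approvals)


# Hypothetical diffs: the approved one, and the same diff with an extra edit.
clean_diff = "-  card_rate: 0.029\n+  card_rate: 0.027\n"
swapped_diff = clean_diff + "-  wire_fee: 15\n+  wire_fee: 0\n"

approval = Approval("jaewon", diff_fingerprint(clean_diff))
print(merge_allowed(clean_diff, [approval]))    # True
print(merge_allowed(swapped_diff, [approval]))  # False
```

The approval object never mentions a PR number. It carries only a person and a hash, so any edit after the approval produces a hash with no matching approval.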

Deterministic rules, not a model reading the diff

The verdict — blocked or clear — comes from rules you write. It is not a model reading the diff and forming an opinion. That is deliberate. A rule engine returns the same verdict for the same input every time, so an auditor can replay it: feed the same PR state months later, get the same block, see which rule fired and why.

An LLM judging the diff cannot promise that. "Why was this blocked in March" should not depend on model weights or temperature. Deterministic rule in, reproducible verdict out.
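A sketch of what "deterministic rule in, reproducible verdict out" can look like, assuming a rule is just a locked path and the verdict is a pure function of the PR state. `LockRule`, `Verdict`, and `evaluate` are hypothetical names, not DevHive's API, and the fingerprints are treated as opaque strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockRule:
    path: str  # exact repo-relative path of a locked file


@dataclass(frozen=True)
class Verdict:
    status: str             # "blocked" or "clear"
    fired: tuple[str, ...]  # locked paths this PR touched, sorted


def evaluate(rules, changed_files, approved_fingerprints, current_fingerprint):
    """Pure function: no clock, no randomness, no network, no model call."""
    fired = tuple(sorted(r.path for r in rules if r.path in changed_files))
    if fired and current_fingerprint not in approved_fingerprints:
        return Verdict("blocked", fired)
    return Verdict("clear", fired)


rules = [LockRule("config/payment_rates.yaml")]
changed = {"config/payment_rates.yaml", "README.md"}

march = evaluate(rules, changed, approved_fingerprints={"abc"}, current_fingerprint="def")
replay = evaluate(rules, changed, approved_fingerprints={"abc"}, current_fingerprint="def")
print(march)            # blocked, with the rule that fired
print(march == replay)  # True: same input, same verdict
```

Because the verdict depends only on its arguments, replaying a stored PR state is enough to reproduce the decision and see which rule fired.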

Every verdict and every approval is hash-linked to the previous record: append-only and tamper-evident. You get an ordered chain of "this rule fired, this human signed this hash, in this order," and you cannot quietly rewrite an earlier entry without breaking the links after it.
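The hash-linking can be sketched in a few lines. This is a generic hash chain under my own assumptions (SHA-256, JSON-serialized payloads, an all-zero genesis value), not DevHive's actual record format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous record"


def record_hash(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the previous record's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> None:
    """Add a record whose hash covers both its payload and its predecessor."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": record_hash(prev, payload)})


def verify(log: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != record_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"type": "verdict", "status": "blocked", "rule": "config/payment_rates.yaml"})
append(log, {"type": "approval", "approver": "jaewon", "fingerprint": "abc"})
print(verify(log))  # True

log[0]["payload"]["status"] = "clear"  # quietly rewrite an earlier entry
print(verify(log))  # False
```

A chain like this is tamper-evident rather than tamper-proof: someone who can rewrite every later record can rebuild it, so the latest hash usually needs to be anchored somewhere the writer does not control.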

One boundary, stated plainly: DevHive never writes to your code. It blocks and it records. Fixing a blocked change is your own tooling's job. The gate's role is to make "who is accountable for these exact lines" a provable fact, not to author the fix.

What this does not do yet

I would rather list the edges than have senior readers find them.

The lock gate and hash-bound approval are real and covered by automated tests. That is the part that works today.
Staged rule rollout (introducing a rule to a slice of repos before enforcing everywhere): roadmap, not built.
Green-lane evidence reporting and a weekly scorecard: roadmap.
A webhook + HMAC fix loop to route a blocked change back into your remediation tooling: roadmap.
No production or pilot numbers. We are pre-pilot, and I am not going to quote a metric I do not have.

If your threat model is a malicious owner who reads a diff and knowingly approves harmful code, a fingerprint binding does not save you. That is a people problem, not a hash problem. What the binding closes is the accountability gap and the approve-then-swap gap, which is where agent-authored changes leak through today.

Where this is

Pre-pilot. The gate works; what it does not have yet is your numbers. I am taking a few teams into early pilots — teams whose code is expensive to get wrong: payments, auth, permissions, data paths. The pilots produce the first live results. If you are the engineering leader, platform engineer, or security engineer who owns that risk, get in touch.

Details and contact: https://getdevhive.com/en/ — WITHUSIN.

Top comments (1)

withusin • Jul 3

I built the tool this post describes. Happy to get into where CODEOWNERS / branch protection actually stop short — that's the part I'm least sure I got right.