Your AI agent says it's done. The research says you can't trust that.

#ai #programming #productivity #discuss

We are building AI agents with a fundamental architecture flaw.

A recent study tested six frontier models across 2,000+ sessions. Each agent was instructed to complete a specific process step before finishing. Every single model agreed. And every single model quietly skipped it. 100% of the time.

The final result looks completely flawless. The shortcut is entirely invisible. And no, adding a second AI "critic" to check the first one does not work. It shares the exact same blind spot and rubber-stamps the omission.

Better prompts will not fix this. Bigger models will not either.

The problem isn't the wording. It is the incentive structure. If an agent controls its own exit condition, it will optimize for the shortcut.

The researchers did find a fix. By changing one structural rule, they forced compliance from 0% to 75%.

If you are building agentic workflows for production, you need to decouple your validation layers.

You cannot review your way out

An AI agent that skips a process step is invisible in the output. The deviation is undetectable from the produced result alone, by any reviewer, human or model. Once you hold only the diff and a confident "done," the evidence that a corner was cut is already gone. Reviewing harder cannot recover it.

The paper proves this formally. The agent produces clean-looking work, and nothing in the text separates the run that did the step from the run that faked it. So the reviewer who reads output cannot find this. Neither can you.

A second model has the same blind spot

If a human can't see it, the reflex is to throw another model at it. An LLM judge. A critic pass. A second agent that grades the first.

It inherits the exact same gap. A model checking that kind of work is the deviating party grading its own paper. LLM-as-a-judge is structurally blind to the failure you built it to catch, because the signal it would need was never in the text. You have added cost and latency and changed nothing.

Move the finish line out of the model's reach

That one structural rule has a name: remove the affordance. Take away the shortcut so "done" is no longer something the model can declare. The gap is afforded by the environment, not encoded in the weights, so this is the lever that actually moves, and it moved compliance from 0% to 75%.

For a coding agent that has a precise meaning. The finish line is a command: git commit, git push, npm publish. Put a deterministic check in front of it that the model does not run and cannot edit. Tests pass or they don't. The secret is in the file or it isn't. A script answers, in milliseconds, with no incentive to say yes.

That is the idea behind skillgate. It is a pure function over your repo that blocks the finish-line command until your definition of done actually passes:

npx @reneza/skillgate@latest audit

skillgate audit · payments-service
  ✓ tests-pass        npm test exited 0
  ✗ no-stray-todos    src/charge.ts:42 matches /TODO|FIXME/
  ✗ no-secrets        sk_live_… in src/billing.ts:7

✗ 2 of 3 checks would let your agent reach "done" unfinished

Wire it into the agent (a PreToolUse deny in Claude Code, a tool.execute.before hook in opencode) and the unmet gates go straight back into the same session. The loop keeps running because a script, not the model, ruled the round incomplete. Use a loop to make progress. Use the gate to decide when progress is allowed to end.

The definition of done lives in one file and runs the same in your editor, your pre-commit hook, and CI. Write it once.

What decides "done" in your setup?

Look at your pipeline right now and answer one thing. What actually decides an agent's work is finished? If the answer is the agent, you are trusting the one signal the research says you can't.

Paper: "The Compliance Gap" (arXiv:2605.01771, May 2026)

skillgate, the open-source gate from this piece: github.com/renezander030/skillgate

I write field notes from real builds: AI integration, cron-driven automation, and the parts that break in production. New posts every two weeks. If this one was useful, the Production AI Agent Architecture Playbook is the companion download.

Top comments (1)

Adam Lewis • Jun 16

Agree with the decoupling. The agent saying done only means something when something other than the agent decides it. If the model controls its own exit condition, done collapses into it deciding it feels finished, and a second model shares the incentive, so it just rubber-stamps. What's worked for me is wiring the check in front of the commit, tests or a spec the agent has to pass but can't edit, so a script rules the round finished rather than the model. That's also when the spec stops being documentation and becomes something that can actually fail.