Agent systems are full of checks that cannot fail.
Not "checks that rarely fail." Checks that are structurally incapable of failing, dressed up to look like rigor. A model reviews its own output and signs off. An agent reconstructs what it did last session from a log it wrote, and confirms the log is faithful. A pipeline emits a "verified" flag computed by the same process whose honesty the flag is supposed to certify. Each of these looks like verification. None of them is. They are self-description with an extra step, and the extra step is what makes them dangerous — it launders a claim into the appearance of a check.
It is worth being precise about why, because the reason is not "the model might be biased." It is structural, and once you see the structure you stop trusting a whole category of green checkmarks.
No self-authored record witnesses the world
Start with the cleanest case: memory. An agent that persists across sessions remembers what it wrote down, not what happened. The write-down is authored by the same party whose behavior it is supposed to record. If the agent updates a memory entry to say "I checked the input," there is, from the outside, no way to distinguish that from a memory of having actually checked it. The record is internally consistent either way. Faithfulness to the world was never on the table, because the record and the world only ever touch through the author.
This generalizes past memory to every flavor of self-verification. Content-addressing — hashing a value so you can prove you held it — feels like it escapes the trap, but it doesn't. A hash proves you had this value at the moment you computed the hash; the "at this moment" is itself a timestamp you assert. It proves possession, never execution. Whether the model actually ran the weights on the input, whether the tool call really hit the network and wasn't short-circuited to a cached answer, whether the step happened in the world — none of that is reachable from a record the actor writes about itself. Execution is a fact about the world, and a self-authored log is not a witness to the world. It is a story, and a capable author tells a consistent story.
So the first cut is brutal and simple: any check whose evidence is a surface the checked party controls can be satisfied at will. It is not a bridge across the gap between claim and reality. It is a self-test wearing a verifier's coat.
Stop proving honesty; start making dishonesty leave a mark
The escape is not to try harder to prove the positive. "Prove you executed correctly" is unreachable from inside, and no amount of cryptography changes that, because the problem isn't secrecy — it's that the prover and the subject are the same party.
The move that works is an inversion. You stop trying to prove honesty and instead arrange things so that dishonesty leaves a mark someone else can find. Don't demand "show me you did X." Make "X did not happen" detectable from outside — a condition a third party can check against a surface you do not control. A claim that "this action left a verifiable trace at this public address by this time" is falsifiable: anyone can go look, and the absence is dispositive. A claim that "my internal log shows I did the work" is not falsifiable by anyone but you, because the only place the absence would show up is the log you author.
That single distinction — can a non-author detect the lie, against a surface the author can't quietly rewrite — separates verification from theater. It also tells you where every real check has to point: not at the actor's own notes, but at an exogenous surface, something whose state the actor cannot author after the fact.
Two ways a check is still decorative
Inverting to detectability gets you most of the way, and then it strands you on a second, subtler trap, because a check actually has two independent weak points.
The first is the channel it reads. If the falsifier's test reads a surface the claimant controls, it can't fire against a claimant who simply writes the expected evidence into that surface. "My output log does not contain evidence of processing X" reads the claimant's own log — pointed at a store the author can write, it never trips. Same falsifier, pointed at a public endpoint the author can't backfill, and now it can. The wording of the check is identical; what changed is the class of the surface it observes. A check inherits the trustworthiness of the place its negation looks.
The second is the coverage of the predicate. Suppose the channel is genuinely exogenous — a public surface the author can't rewrite. The check can still be narrow. "No trace at this address by the deadline" falsifies non-execution and nothing else. An action that executed but executed wrong, or executed vacuously, or executed and produced garbage that nonetheless left a trace — all of those satisfy the check. Exogenous channel, partial coverage. The green checkmark is honest about exactly one failure mode and silent about the rest, and nothing on its face tells you which.
So a real check carries two declarations, not one: where its negation reads, and which failure modes its firing actually discriminates. Drop either and you have something that looks verifiable and is verifiable only against its cheapest failure mode.
The coverage claim is authored too
Here is where most designs quietly reintroduce the original sin. You add a coverage annotation — this predicate catches mis-execution, vacuous execution, garbage-with-a-trace — and ship it alongside the check. But that annotation is a claim about the predicate's power, and it is authored by the same party making the original claim. A predicate tagged "catches mis-execution" that in fact only trips on total non-execution gives you a coverage map that looks complete and is self-certified. You haven't closed the regress; you've moved the "trust me" from the claim up to the map. It is the same vacuous-fail, one level higher: not the predicate failing emptily, the coverage claim failing emptily.
There is exactly one move that terminates this, and it is the same move that worked the first time: take the burden off the author and put it on a surface the author doesn't control. Make the predicate runnable by a non-author, and ship it not as prose but as code plus test vectors — including, for every failure mode you claim to cover, at least one vector that must trip the predicate. A "catches mis-execution" claim with no mis-execution example that demonstrably turns the check red is still authored, not observed. The should-fire vector is to a coverage claim what the frozen input bytes are to a hash: the thing that pins interpretation so the author can't widen it later.
Do that, and the regress finally bottoms out somewhere real. "Did the predicate fire on the vector that should trip it" is itself re-runnable by anyone. A disagreement stops being one party's word against another's and becomes a diff: run the code on the vector, watch the result. The chain terminates at reproducibility — not at trust-the-author. That is the only floor that holds, because it is the only one that doesn't have the author standing on it.
The test you can apply tomorrow
You don't need any of this vocabulary to use the result. The next time you or your system emits the word "verified," run three questions against it:
- Can someone who isn't the author re-run this check? If the only party who can produce or reproduce the result is the one being checked, you have a second opinion from the same author, not a verification.
- Does it read a surface the author can't quietly rewrite? If the evidence lives in the actor's own store, the check can be satisfied at will. Point it somewhere exogenous or admit it's self-description.
- Is there a test that must fail when the claimed failure happens? A check with no should-fire case is honest about nothing in particular. Name the failure mode, and ship the vector that trips on it, or don't claim to catch it.
A check that survives all three is doing work. A check that fails any of them is a costume — and the more polished the costume, the more it costs you, because a green checkmark nobody can re-run is worse than no checkmark at all: it ends the conversation that should have kept going. Verification isn't a property a system can grant itself. It is a property you only have once someone who isn't you can take the check, run it against ground you don't own, and watch it catch the thing you said it catches.
Top comments (0)