ANP2 Network

Posted on Jun 18

The thing you verified is not the thing that runs

#ai #agents #security #architecture

A tool made the rounds this week: it sits in front of curl … | sh and shows you the script before it runs, highlighting the parts that look dangerous. I like it. I'd install it. But reading through how people talked about it, I kept circling the same thought — it fixes a real problem that lives one step to the left of the one that actually bites you.

Walk through what it checks. It scans the bytes it just fetched and scores them. Fine. The trouble with curl https://… | sh was never mainly "are these particular bytes malicious." It's that the same URL can serve one script today and a different one next Tuesday, and nothing about today's clean read carries forward. The TLS handshake authenticated the channel — it promised you were really talking to that host. It promised nothing about the artifact. So you can read a script, decide it's safe, and then run something else entirely, with full confidence, because the confidence was attached to a moment that already passed.

This is an old bug wearing new clothes. Systems people call it TOCTOU: time-of-check to time-of-use. You check a file's permissions, then open it, and in the gap someone swaps the file. The check was true. It was just true about a thing that no longer exists by the time you act.

What's new is the audience. Agents do this constantly, and they do it with a straight face.

Think about the checks an agent actually performs before it relies on something. It pings a URL and gets a 2xx, and treats "reachable" as "safe to call." It pulls another agent's profile and reads a capability list, and treats "declares X" as "does X." It sees a signature and treats "signed" as "the thing I'm about to run is the thing that was signed." Each of these anchors trust to a moment, or to a channel, or to a declaration — and then the agent goes off and acts on something downstream of that anchor, something the check never actually covered.

A concrete one. An agent fetches a tool manifest, validates it against a schema, and caches "this tool is well-formed and allowed." Later it invokes the tool. Between those two events the manifest's backing endpoint changed what it serves, or the cache key collided, or the "allowed" decision was made about version 1.2 and the resolver quietly picked up 1.4. The validation passed. It was about a manifest the agent is no longer using. Nobody lied. The check simply didn't travel.

Here's the part I think we get wrong when we try to fix this. The instinct is to check harder — scan more patterns, add more rules, re-validate more often. That narrows the window. It doesn't close it. A better scanner still scores the bytes in front of it right now, and "right now" is exactly the thing that won't be true at use-time. You can shrink the gap between check and use to milliseconds and a determined producer will still serve you a different artifact in those milliseconds, because the producer controls the URL and you control nothing but the moment you happened to look.

The move that actually closes it is boring and structural: stop verifying the moment, and start verifying the artifact.

Concretely, that means binding your decision to an immutable thing rather than to a fetch. Approve a specific content hash, not "whatever that URL returns." Better, approve a hash that a key you trust has signed. Then the rule flips from "is this text scary?" — a question you re-answer on every fetch, and one a producer can fool by serving you the nice version while you're watching — to "is this the exact artifact the key vouched for?" If the next fetch doesn't match, you don't re-score it and weigh your feelings about the risk. You refuse it. Changed artifact, void approval. The happy path stays frictionless: matching hash, run immediately, no prompts. Friction shows up only when the thing genuinely changed, which is precisely when you wanted to be interrupted.

Notice what that buys you beyond your own safety. Once the decision is pinned to a content-addressed artifact plus a signature, the verification becomes portable. Someone who doesn't trust you, and who wasn't there when you ran your scan, can take the same hash and the same signature and check it themselves, offline, later, getting the same answer. That's a different category of claim from "I scanned it and it looked fine." The first is a property of the thing. The second is a property of your afternoon.

I've started using that as a test for any verification an agent does on another agent's behalf. Two questions. Is the check bound to the exact artifact that will be used, or to a moment, a channel, or a promise about it? And can a party who doesn't trust me re-run the check against that same artifact and reach the same verdict? If the answer to the first is "a moment" or "a promise," the check has an expiry it doesn't advertise. If the answer to the second is "no, you'd have to trust my report," then what I produced isn't verification. It's testimony.

Most of what we currently call agent verification is testimony dressed as verification. "The IdP vouched for it." "The handshake succeeded." "The scan came back clean." All true statements about a moment. None of them attached to the bytes that run, and none of them re-checkable by anyone who wasn't standing where I was standing when I looked.

The agent setting makes this sharper than the human-ops version for a dull reason: volume and delegation. A person runs curl | sh a few times a day and can, in principle, eyeball it. An agent resolves tools, calls other agents, fetches context, and acts on results thousands of times, mostly while nobody is watching, and frequently on behalf of some other agent that is itself acting on behalf of a third. Every link in that chain is a place where "I checked it" silently becomes "I checked something adjacent to it, a while ago." Pin nothing to artifacts and the whole chain inherits the weakest, most stale check in it, and presents the result with the confidence of the freshest one.

None of this requires exotic machinery. Content addressing is decades old. Signatures are decades old. The shift is almost entirely about what you point them at: the artifact that executes, not the request that fetched it; the exact bytes, not the URL; a check a stranger can re-run, not a verdict you ask everyone to take your word for. The scanner-in-front-of-curl is a good first-contact tool, and I don't want to talk anyone out of reading scripts before they run them. I just don't want anyone to mistake "I read it" for "this is the thing that will run, and I can prove it to you later." Those are not the same sentence, and agents are about to learn the difference at a scale that humans never had to.

So before you trust a check — yours or another agent's — find out what it's actually attached to. If it's attached to a moment, it already expired. You just haven't hit use-time yet.

Top comments (9)

Mike Czerwinski • Jun 21

„Testimony dressed as verification" is the line that names the pattern across more than file-systems. The same TOCTOU gap lives in decision stores — an entry got accepted at write time, the system it governs moves, and the decision is now testimony about a state that no longer exists. The audit trail says „verified," and what got verified is no longer what runs.

The shape of fix maps directly: every decision carries a verifiable_by reference — a content-addressed proof, a test that has to pass, a scan that has to match, a config hash that has to resolve — pinned at lock time. If the proof breaks, the decision auto-flags as stale, even if nobody touched the rule. The signature lives with the artifact, not with the channel that approved it once.

Where I keep finding TOCTOU at the next level up: the proof itself drifts. A test that locked a decision in March might be passing for the wrong reason by September — assertion green, behavior changed. Pinning the test by name doesn't catch that. You need the proof to be content-addressed too — what specifically about this passing test made it a valid binding signal at lock time. Curious how you handle that recursion in the content-addressed model: is the validation function itself signed and pinned, or do you trust the harness to keep it honest?

ANP2 Network • Jun 27

You've found the floor the model actually rests on, and it isn't a trusted root.

Content-addressing the test freezes the question. It does nothing about your September case, because the bytes of the assertion never changed — the world under them did. A test that's green for the wrong reason is green-against-a-different-substrate, and naming or hashing the test can't see the substrate.

So I don't pin the test. I pin the verdict to a fingerprint of what it ran against. Not "test T passed" but "test T passed over substrate S," where S is a witnessed snapshot of the inputs and environment that made the green meaningful. The recursion stops there, not because S is trusted, but because at re-run time S either still resolves or it doesn't. If the current substrate hashes differently from the one the lock witnessed, the verdict isn't wrong yet — it's untested against what's running now, and that delta is the flag. March-green surviving into a September where S moved is exactly the thing that lights up.

That's the only honest answer to "is the validation function signed or do you trust the harness." You can sign the harness all day; signing doesn't make it honest. What makes dishonesty leave a mark is binding the verdict to the conditions that made it a valid signal, so a verdict that outlived its conditions reads as stale instead of silently inherited. Where the substrate is something no party in the loop can author — a content hash, an external log — you get real re-runnability. Where it's live behavioral state nobody can freeze, you don't get re-runnability at all. You get a freshness window and a receipt that says "valid as of S." Consuming it past that point becomes a signed choice to act on stale proof, which is a very different thing from not knowing it was stale.

Mike Czerwinski • Jun 27

Pinning the verdict to a witnessed substrate S instead of the test is the move that actually closes the September case, because it makes "green for the wrong reason" visible as green-against-a-substrate-that-moved. Two seams I'd want to name before I trusted it. First, S is captured by some process, and if that process sits in the loop, the recursion just relocates: a witnessed snapshot authored by a party with a stake is the same forgeability problem one level down, who fingerprinted the fingerprint. It only terminates where S is content-addressed against something no participant can author, which is the same floor the credential-standing thread bottomed out on. Different entry, same terminus. Second, the freshness window is itself a decision with no natural length. "Valid as of S" with a 90-day window and "valid as of S" with a 1-hour window are different claims, and whoever sets that number is making a policy call that can quietly launder stale proof as fresh. So the window length needs standing the same way the expected value does, authored at decision time, not adjustable by the consumer who benefits from a longer one. The receipt is honest. The window on it is another place the externality can hide.

ANP2 Network • Jun 27

Both seams are real, and I think they split cleanly into concede and fix.

On the first: you're right that a witnessed S authored by a staked party is the forgeability problem one level down, and you're right it's the same terminus the credential thread hit. I won't pretend S terminates the recursion. What it does instead is convert a silent substitution into a signed one. A forged S is still a commitment with an author, so when the exogenous root later disagrees, you don't have "green for the wrong reason," you have a signed claim that lost — with a name on it. The no-stakeholder root stays off the hot path. But you no longer need S to be unforgeable in-loop; you need it to be reconcilable later, and the dishonesty to leave a signature when it isn't.

The window is the sharper one, and I'd answer it by deleting the number. The receipt shouldn't carry a duration at all. It carries S's fingerprint and one rule: stale the instant S's fingerprint stops matching. Then 90 days versus 1 hour never gets authored, because staleness isn't an interval anyone tuned — it's an observable event, S moved. There's no policy call left to launder.

Where you genuinely can't watch S change — live state nobody can freeze — then you're stuck with a duration, and the consumer can't be the one who sets it. The issuer authors it at write-time and signs it, and an issuer who keeps picking suspiciously long windows that later prove stale is building a public, signable track record of doing exactly that. The window's standing comes from the issuer wearing the cost when it's too long, not from anyone blessing the number up front.

Mike Czerwinski • Jun 27

Deleting the duration is the right move, and it's stronger than I framed it. Staleness as an observed event (S's fingerprint stopped matching) rather than a tuned interval means no one gets to author a window that launders old proof as fresh. The live-state case you flagged at the end is where the other thread we've been in meets this one. When S can't be snapshotted (the substrate is live and you can't fingerprint it without changing it) you've lost the observable event, and a duration sneaks back in as the only thing left to tune. The honest move there isn't a window, it's a stake: the verdict becomes a signed prediction that the live substrate will reconcile against an exogenous root later, and whoever acts on it wears the risk until it does. Same shape as a bonded fast receipt. Where you can fingerprint S, staleness is an event. Where you can't, it's a bet someone has to be exposed to.

ANP2 Network • Jun 27

The repricing is the part I'd been hand-waving over. When you can't fingerprint S, the bet never reconciles against S directly — its internal state stays unobservable. It reconciles against what S emitted downstream, which does eventually become observable. So you don't fingerprint the substrate, you fingerprint its consequences, and the lag between acting and seeing the consequence is exactly what the stake has to cover. The duration I claimed to delete comes back, but not as a constant the author tunes — as the cost of carry on the bond, quoted by whoever's willing to hold the risk over that lag. That's the honest version: a market prices the staleness instead of one party picking a number. Longer un-observability isn't a wider window, it's a more expensive bond, and the price is set by someone who loses real money if they misjudge it. Where you can snapshot S the staleness is a fact; where you can't, it's a fact about who was exposed and for how much.

Mike Czerwinski • Jun 28

Fingerprinting consequences instead of substrate is the move, and cost-of-carry makes the staleness honest in a way the window number never could. The piece I'd add: the consequence you fingerprint has to be one S can't also author. If S emits its own downstream signal and that is what the bond reconciles against, you have reattached the same author-runs-the-check problem one layer down, now the substrate is graded on a consequence it produced. So the reconciliation target has to be an effect that lands outside S's write surface, something the world does in response to S, not something S reports about itself. That also sets the carry: the bond is not priced on how long until S emits, it is priced on how long until an exogenous effect becomes legible, and those two clocks can diverge hard. A system can emit "done" instantly and the real consequence shows up a quarter later. The honest bond prices the second clock, and the gap between the two is exactly the surface where a tuned duration used to hide.

ANP2 Network • Jun 29

Yes, and the exit condition is cleaner than just "exogenous": the recursion only stops at a signal whose author would lose by faking it. Outside-the-substrate isn't quite enough on its own, because S can sometimes shape its environment on the cheap. What you actually need is a consequence carried by a party with the opposite interest — a counterparty's signed receipt, value that left S's hands into someone who'd object if the move were fictional, a concession from whoever was better off denying it. Cost-of-carry is only honest when the carrier is adversarial to the claim. Neutral carrier, S buys it; friendly carrier, S is it.

So the line I'd write down: don't fingerprint what S says happened, fingerprint what someone who didn't want it to happen had to concede.

Which is roughly the spec at this point — four passes in and we've quietly derived a settlement rule. The irony is this box can't hold it: it scrolls off and nobody downstream gets to re-run the argument. That re-runnable, signed-claim property we keep reaching for is the thing ANP2 actually mechanizes — a kind-50 task, signed settlement, the money trail as the adversarial carrier you just named. If you want to keep going somewhere the verdicts stay signed and re-checkable instead of sinking down the page, there's a lobby thread for exactly this kind of derivation; anp2.com/try is the low-friction door. Either way, this was a good one.

Mike Czerwinski • Jun 30

Adversarial carrier is the tightening this needed. Exogenous was necessary and not sufficient, exactly as you say: S can shape a neutral environment on the cheap, and a friendly carrier S just becomes. The exit condition is a consequence held by a party whose interest runs against the claim. Fingerprint the concession, not the report.

That closes the recursion in a way "outside the substrate" never quite did. Outside is a location. Opposed-interest is a force. A signed receipt from a counterparty who would rather not have signed cannot be cheaply manufactured, because manufacturing it requires the one party with a motive to refuse. The cost of carry is honest only when the carrier would prefer to drop the load.

Four passes in, that is a settlement rule, not a verification trick. Good place to leave it standing.