Alexey Spinov

Posted on Jul 3 • Originally published at finops.spinov.online

Your Gate Trusts a Signal the Model Wrote. One Write-Hop Proves It.

#ai #security #agents #python

A write-chain taint lint answers one question before your AI-agent gate authorizes anything: did a model write to any store behind an authorization signal? gate_taint_lint.py walks the declared write-closure and classifies every signal as WORLD_ANCHORED, MODEL_AUTHORED, or MODEL_LAUNDERED. In this post's fixtures, one write-hop flips sender_trust from WORLD_ANCHORED to MODEL_LAUNDERED and the verdict from PASS to FAIL, exit 0 to exit 1.

AI disclosure: I wrote gate_taint_lint.py with an AI assistant and ran it myself, offline, before publishing. Every number in the output blocks below is pasted from a real local run on Python 3.13.5, standard library only, no network. I checked the exit codes (0 / 1 / 2), hashed the STDOUT twice to confirm it is byte-for-byte deterministic, and edited every line. The external quotes (the Dev.to threads on model-authored signals, the arXiv paper on authorization propagation) are their words, not mine, and I link the primary sources. Their claims stay out of my metrics; my numbers come only from the runs shown here.

In short:

A signal's name tells you nothing about who wrote it. sender_trust reads as world-anchored. Only the write-chain can say whether a model filled the table it comes from.
The rule, borrowed from a live Dev.to thread and made checkable: a feature may hold the authorization role only if its transitive write-closure contains no model principal. Model-written signals can be context, never authorization.
Routing the model's output through a reputation table does not wash the taint. The lint computes the closure, so one intermediate store changes nothing. Neither would five.
The demo that matters: two manifests, byte-identical except one write-hop (who fills reputation_table: human-signed approvals, or model:classifier_v3). The verdict flips from exit 0 with 0 of 2 authorization features tainted to exit 1 with 1 of 2.
Standard library only (json, sys). Offline, keyless, read-only, zero network, deterministic STDOUT. The tool and all fixtures are in this post.

The trap has a name on it

Here is a gate I would have trusted six months ago. An inbox agent decides whether to auto-execute a request or hold it for review. The decision keys on three signals: sender_trust, a score read from a reputation table; tx_reversibility, read from the payment ledger; and model_confidence, the classifier's own estimate of itself. The team knows better than to let the model vouch for the model, so model_confidence is demoted to context. The other two authorize. Everyone signs off.

Nobody asks who writes the reputation table.

That question is the whole post. On July 1, a Dev.to author writing as yongrean redesigned a comment-section trust model around one sorting rule: "Sort your features by whether their source is independent of the model. Gate on those. Treat the self-authored one as context, never authorization." Their words, not mine, and I think the rule is right. But as stated it is a read-time instruction, and the property it depends on is a write-time property. You cannot sort features by independence while looking at the feature. You have to look at the chain of writers behind it, all the way down, because the compromised link is rarely the store you read. It is some store two hops upstream that a model has been quietly filling for months.

Two days ago I published a gate that reconciles an agent's actions against policy, and the honest gap it left open was the other end of the pipe. That tool checks what comes out: it compares the trace against policy and issues verdicts on actions. Today's tool checks what goes in. A per-action gate with a model-written authorization feature is a gate the model holds a key to, no matter how good its policy is.

So the thesis, stated so you can break it: a gate feature may hold the authorization role only if its transitive write-closure contains no model principal. Given a declared write map, the taint class of every feature is computable, deterministically, before the gate authorizes a single action. Show me an authorization feature with a model in its declared write-closure that this lint marks WORLD_ANCHORED, or the reverse, and the tool is broken and the thesis with it.

What a write-chain manifest declares

The lint reads one JSON manifest with two parts.

stores is the write map: for every store the gate reads, who writes into it. A writer is either a principal with a kind prefix (human:sre_approver, external:bank_feed, model:classifier_v3) or the name of another store, which is how derived data declares its parents. gate_features lists the signals the gate keys on: each reads one store and holds one role, authorization or context.

Here is the clean fixture in full. It is the gate from the opening paragraph, wired the way its authors believe it is wired:

{
  "stores": {
    "approvals_log":     {"written_by": ["human:sre_approver"]},
    "payment_ledger":    {"written_by": ["external:bank_feed"]},
    "reputation_table":  {"written_by": ["approvals_log"]},
    "sender_trust":      {"written_by": ["reputation_table"]},
    "model_self_report": {"written_by": ["model:classifier_v3"]}
  },
  "gate_features": [
    {"name": "sender_trust",     "reads": "sender_trust",      "role": "authorization"},
    {"name": "tx_reversibility", "reads": "payment_ledger",    "role": "authorization"},
    {"name": "model_confidence", "reads": "model_self_report", "role": "context"}
  ]
}

Note what the manifest makes explicit that a config review usually leaves implicit: sender_trust is three writes away from a human. The score is computed from reputation_table, which is filled from approvals_log, which a human SRE signs. Declared like this, the chain is auditable. Undeclared, it is folklore.
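To make "three writes away" concrete, here is a minimal sketch, separate from the tool, that counts write edges from a store to the nearest principal. The helper name hops_to_principal and the flattened CLEAN_STORES dict (store name mapped straight to its writer list) are assumptions for illustration, not part of gate_taint_lint.py.

```python
from collections import deque

# The clean fixture's write map, flattened to store -> writers for brevity.
CLEAN_STORES = {
    "approvals_log": ["human:sre_approver"],
    "payment_ledger": ["external:bank_feed"],
    "reputation_table": ["approvals_log"],
    "sender_trust": ["reputation_table"],
    "model_self_report": ["model:classifier_v3"],
}


def hops_to_principal(store, stores):
    """Return (write edges, principal) for the nearest principal, or None."""
    seen = {store}
    queue = deque([(store, 0)])
    while queue:
        node, depth = queue.popleft()
        for writer in sorted(stores[node]):
            if ":" in writer:  # principals carry a kind prefix
                return depth + 1, writer
            if writer not in seen:  # a store name: keep walking upstream
                seen.add(writer)
                queue.append((writer, depth + 1))
    return None


print(hops_to_principal("sender_trust", CLEAN_STORES))
# (3, 'human:sre_approver')
```

The same walk on payment_ledger returns one hop, which is the difference a name alone never shows.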

Run it in sixty seconds

No keys. No network. No install beyond Python. Save the file, save a manifest, run one command. Here is the whole tool, one file, standard library only:

#!/usr/bin/env python3
"""
gate_taint_lint.py -- a write-chain taint lint for the signals an AI-agent
gate keys on, run BEFORE the gate authorizes anything.

Reads ONE manifest JSON with two parts:
  * stores        -- the declared write map: for every store the gate reads,
                     who writes into it. A writer is either a principal
                     ("human:<id>", "external:<id>", "model:<id>") or the
                     name of another declared store (derived data).
  * gate_features -- the signals the gate keys on: each reads one store and
                     holds one role, "authorization" or "context".

For every feature it computes the transitive write-closure of the store it
reads (every writer, every writer of every upstream store, down to the
principal leaves) and assigns a taint class:
  WORLD_ANCHORED  -- only human:* / external:* principals in the closure
  MODEL_AUTHORED  -- a model:* principal writes the read store directly
  MODEL_LAUNDERED -- a model:* principal is in the closure behind >=1
                     intermediate store (the reputation-table trick)
plus one flag:
  FEEDBACK_LOOP   -- the write graph reachable from the feature contains a
                     cycle that a model feeds (the signal helps author its
                     own history).

A model-tainted feature in role "context" is INFO. The same taint in role
"authorization" is a DENY: a signal the model can write cannot authorize
the model's actions.

Offline. Keyless. Read-only. Zero network. Standard library only (json, sys).
It does NOT inspect a real database, verify grants, run the agent, or detect
prompt injection. It lints the write map you declare. If the manifest lies
about the writers, the lint will not know.

Exit codes (usable as a CI gate):
  0  no authorization feature carries a model in its write-closure
  1  >=1 authorization feature is model-authored or model-laundered
  2  bad input (missing file, malformed JSON, undeclared store, unknown
     role, unknown principal kind, store with no declared writers)

Usage:
  python3 gate_taint_lint.py <manifest.json>
"""

import json
import sys

PRINCIPAL_KINDS = ("external", "human", "model")
ROLES = ("authorization", "context")


def _bad(msg):
    print("ERROR: " + msg)
    raise SystemExit(2)


def _is_principal(name):
    return ":" in name


def _kind(name):
    return name.split(":", 1)[0]


def load_manifest(path):
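    try:
        with open(path, "r", encoding="utf-8") as fh:
            raw = fh.read()
    except (OSError, UnicodeDecodeError) as exc:
        # undecodable bytes are bad input (exit 2), not an uncaught crash (exit 1)
        _bad("cannot read manifest: %s" % exc)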
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        _bad("manifest is not valid JSON: %s" % exc)
    if not isinstance(data, dict):
        _bad("manifest must be a JSON object")
    return data


def validate(data):
    stores = data.get("stores")
    if not isinstance(stores, dict) or not stores:
        _bad("manifest.stores must be a non-empty object")
    for name in sorted(stores):
        if _is_principal(name):
            _bad("store name '%s' may not contain ':' (reserved for principals)" % name)
        spec = stores[name]
        if not isinstance(spec, dict):
            _bad("stores[%s] must be an object" % name)
        writers = spec.get("written_by")
        if not isinstance(writers, list) or not writers:
            _bad("stores[%s].written_by must be a non-empty list "
                 "(a store with no declared writers is not world-anchored)" % name)
        for w in writers:
            if not isinstance(w, str):
                _bad("stores[%s].written_by entries must be strings" % name)
            if _is_principal(w):
                if _kind(w) not in PRINCIPAL_KINDS:
                    _bad("unknown principal kind in '%s' (stores[%s]); "
                         "known kinds: external, human, model" % (w, name))
            elif w not in stores:
                _bad("stores[%s] written by undeclared store '%s'" % (name, w))
    features = data.get("gate_features")
    if not isinstance(features, list) or not features:
        _bad("manifest.gate_features must be a non-empty list")
    for feat in features:
        if not isinstance(feat, dict):
            _bad("each gate feature must be an object")
        for key in ("name", "reads", "role"):
            if not isinstance(feat.get(key), str):
                _bad("each gate feature needs string fields: name, reads, role")
        if feat["role"] not in ROLES:
            _bad("feature '%s' has unknown role '%s' "
                 "(known roles: authorization, context)" % (feat["name"], feat["role"]))
        if feat["reads"] not in stores:
            _bad("feature '%s' reads undeclared store '%s'" % (feat["name"], feat["reads"]))
    return stores, features


def closure(store, stores):
    """Every node reachable from `store` along written_by edges."""
    principals, reached = set(), set()
    stack = sorted(stores[store]["written_by"])
    while stack:
        node = stack.pop()
        if _is_principal(node):
            principals.add(node)
        elif node not in reached:
            reached.add(node)
            stack.extend(sorted(stores[node]["written_by"]))
    return principals, reached


def model_chain(store, stores):
    """Deterministic shortest write path from `store` to a model principal."""
    seen = {store}
    queue = [[store]]
    while queue:
        path = queue.pop(0)
        for w in sorted(stores[path[-1]]["written_by"]):
            if _is_principal(w):
                if _kind(w) == "model":
                    return path + [w]
            elif w not in seen:
                seen.add(w)
                queue.append(path + [w])
    return None


def main(argv):
    if len(argv) != 2:
        print("usage: gate_taint_lint.py <manifest.json>")
        raise SystemExit(2)

    stores, features = validate(load_manifest(argv[1]))

    principals_of, reached_of = {}, {}
    for name in sorted(stores):
        principals_of[name], reached_of[name] = closure(name, stores)
    # a store is in a write cycle when it can reach itself along written_by
    looped = {n for n in stores if n in reached_of[n]}

    rows = []
    for feat in sorted(features, key=lambda f: f["name"]):
        read = feat["reads"]
        kinds = {_kind(p) for p in principals_of[read]}
        direct = any(_is_principal(w) and _kind(w) == "model"
                     for w in stores[read]["written_by"])
        if direct:
            klass = "MODEL_AUTHORED"
        elif "model" in kinds:
            klass = "MODEL_LAUNDERED"
        else:
            klass = "WORLD_ANCHORED"
        involved = {read} | reached_of[read]
        loop_fed = sorted(n for n in involved if n in looped
                          and any(_kind(p) == "model" for p in principals_of[n]))
        rows.append({"name": feat["name"], "role": feat["role"], "read": read,
                     "klass": klass, "loop": loop_fed,
                     "chain": model_chain(read, stores)})

    n_auth = sum(1 for f in features if f["role"] == "authorization")
    out = []
    out.append("GATE-TAINT-LINT REPORT")
    out.append("stores declared: %d" % len(stores))
    out.append("gate features: %d (authorization: %d, context: %d)"
               % (len(features), n_auth, len(features) - n_auth))
    out.append("write-chain classes:")
    tainted = []
    for r in rows:
        flag = " [FEEDBACK_LOOP]" if r["loop"] else ""
        out.append("  - %s [%s] reads %s -> %s%s"
                   % (r["name"], r["role"], r["read"], r["klass"], flag))
        if r["klass"] == "WORLD_ANCHORED":
            continue
        chain = "<-".join(r["chain"])
        if r["role"] == "context":
            out.append("      INFO: model in write-chain, held as context "
                       "(never keys authorization)")
            out.append("      chain=%s" % chain)
        else:
            out.append("      DENY: model in write-closure via %d intermediate store(s)"
                       % (len(r["chain"]) - 2))
            out.append("      chain=%s" % chain)
            tainted.append(r)
        if r["loop"]:
            out.append("      FEEDBACK_LOOP: model-fed cycle in write graph: %s"
                       % ", ".join(r["loop"]))
    out.append("authorization features tainted: %d of %d" % (len(tainted), n_auth))
    for r in tainted:
        out.append("  - %s: %s chain=%s" % (r["name"], r["klass"], "<-".join(r["chain"])))
    if tainted:
        out.append("VERDICT: FAIL: %d authorization feature(s) carry a model "
                   "in their write-closure" % len(tainted))
        out.append("  a signal the model can write cannot authorize the model's actions")
        code = 1
    else:
        out.append("VERDICT: PASS: no authorization feature carries a model "
                   "in its write-closure")
        code = 0

    print("\n".join(out))
    raise SystemExit(code)


if __name__ == "__main__":
    main(sys.argv)

The baseline: every authorization signal is world-anchored

Run it on the clean manifest:

$ python3 gate_taint_lint.py fixtures/clean.json
GATE-TAINT-LINT REPORT
stores declared: 5
gate features: 3 (authorization: 2, context: 1)
write-chain classes:
  - model_confidence [context] reads model_self_report -> MODEL_AUTHORED
      INFO: model in write-chain, held as context (never keys authorization)
      chain=model_self_report<-model:classifier_v3
  - sender_trust [authorization] reads sender_trust -> WORLD_ANCHORED
  - tx_reversibility [authorization] reads payment_ledger -> WORLD_ANCHORED
authorization features tainted: 0 of 2
VERDICT: PASS: no authorization feature carries a model in its write-closure

Exit 0. Two things worth a look. First, the lint is not allergic to models: model_confidence is MODEL_AUTHORED and that is fine, because its role is context. The INFO line is the sorting rule from the thread, executed instead of remembered. Second, sender_trust earns WORLD_ANCHORED by its chain, not by its name: the closure behind it bottoms out at human:sre_approver and nothing else.

One write-hop flips the verdict

Now the demo this post exists for. The second manifest is byte-identical to the first except for a single line. Not a new feature, not a new policy, not a renamed store. One write-hop:

$ diff fixtures/clean.json fixtures/laundered.json
5c5
<     "reputation_table":  {"written_by": ["approvals_log"]},
---
>     "reputation_table":  {"written_by": ["model:classifier_v3"]},

In English: the reputation table is no longer filled from the human-signed approvals log. It is filled by the classifier. Perhaps someone shipped an "auto-approve high-confidence senders" change in Q1 and the approvals log became vestigial. The gate code did not change. The feature list did not change. The score in the sender_trust store is still a number between 0 and 1, indistinguishable at read time from the world-anchored version.

$ python3 gate_taint_lint.py fixtures/laundered.json
GATE-TAINT-LINT REPORT
stores declared: 5
gate features: 3 (authorization: 2, context: 1)
write-chain classes:
  - model_confidence [context] reads model_self_report -> MODEL_AUTHORED
      INFO: model in write-chain, held as context (never keys authorization)
      chain=model_self_report<-model:classifier_v3
  - sender_trust [authorization] reads sender_trust -> MODEL_LAUNDERED
      DENY: model in write-closure via 1 intermediate store(s)
      chain=sender_trust<-reputation_table<-model:classifier_v3
  - tx_reversibility [authorization] reads payment_ledger -> WORLD_ANCHORED
authorization features tainted: 1 of 2
  - sender_trust: MODEL_LAUNDERED chain=sender_trust<-reputation_table<-model:classifier_v3
VERDICT: FAIL: 1 authorization feature(s) carry a model in their write-closure
  a signal the model can write cannot authorize the model's actions

Exit 1. 1 of 2 authorization features tainted, and the DENY line prints the receipt: chain=sender_trust<-reputation_table<-model:classifier_v3. Same gate, same features, same read paths. The only thing that moved is who holds the pen one table upstream, and that is enough to hand the model a key to its own gate. If your review process would have caught that Q1 change, good. Mine would not have, and I have reviewed changes exactly like it.

Why one intermediate store fools the naive check

The obvious implementation of yongrean's rule checks direct writers: for each authorization feature, look at who writes the store it reads. That check passes the laundered manifest. sender_trust is written by reputation_table, and a table is not a model, so a direct-writer check shrugs and moves on. The model never touches the feature's store. It writes one hop back, and the derivation launders the authorship.

A commenter on that same thread, dipankar_sarkar, named the fix: "A gate feature inherits the dirt of every writer upstream, even when the read path looks clean." A taint problem, they called it, and taint is exactly the right frame. The lint therefore computes the transitive write-closure: writers of the store, writers of those writers, down to the principal leaves, with a visited set so cycles terminate. MODEL_LAUNDERED is not a softer verdict than MODEL_AUTHORED. It is the same taint wearing a better suit, and in the wild I would expect it to be the more common shape, though I have no census to back that up. Nobody wires model_output straight into an authorization column. Wiring it into a "reputation" table that a scoring job reads feels like architecture.
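The gap between the two checks fits in a few lines. This is a minimal sketch, not the tool itself: direct_check and closure_check are illustrative names, and LAUNDERED_STORES is the laundered fixture's write map flattened to store name and writer list.

```python
# Laundered write map from this post's fixtures (stores only, flattened).
LAUNDERED_STORES = {
    "approvals_log": ["human:sre_approver"],
    "payment_ledger": ["external:bank_feed"],
    "reputation_table": ["model:classifier_v3"],
    "sender_trust": ["reputation_table"],
    "model_self_report": ["model:classifier_v3"],
}


def is_model(writer):
    """A writer is a model principal when its kind prefix is 'model'."""
    return writer.startswith("model:")


def direct_check(store, stores):
    """Naive check: clean (True) if no model writes the read store directly."""
    return not any(is_model(w) for w in stores[store])


def closure_check(store, stores):
    """Closure check: clean (True) only if no model writes any upstream store."""
    seen, stack = set(), [store]
    while stack:
        node = stack.pop()
        if node in seen:  # visited set, so cycles terminate
            continue
        seen.add(node)
        for w in stores[node]:
            if is_model(w):
                return False
            if ":" not in w:  # a store name: keep walking upstream
                stack.append(w)
    return True


print(direct_check("sender_trust", LAUNDERED_STORES))   # True: naive check passes
print(closure_check("sender_trust", LAUNDERED_STORES))  # False: closure finds the model
```

Both functions agree on payment_ledger and on model_self_report. They disagree only where an intermediate store sits between the model and the feature, which is exactly the laundered case.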

I used the same taint intuition once before, pointed at a different object: the eval contamination probe walks file wiring to catch an agent writing what its own grader reads. That was the eval rig lying to you. This is the production gate, and the write-chain of every signal it authorizes on.

The limit case: the signal feeds its own history

There is a nastier shape than laundering, and a different commenter on the thread, anp2network, described it exactly: "If the sender's record improves because earlier messages were accepted by this same classifier, then the model is already in the provenance chain." That is not a chain anymore. That is a loop: the classifier's accept decisions fill the history, the history feeds the reputation, and the reputation authorizes the next accept.

The third fixture wires that loop. classifier_decisions is written by the model and by reputation_table (each decision records the reputation it keyed on), and reputation_table is rebuilt from classifier_decisions:

{
  "stores": {
    "approvals_log":        {"written_by": ["human:sre_approver"]},
    "payment_ledger":       {"written_by": ["external:bank_feed"]},
    "classifier_decisions": {"written_by": ["model:classifier_v3", "reputation_table"]},
    "reputation_table":     {"written_by": ["classifier_decisions"]},
    "sender_trust":         {"written_by": ["reputation_table"]},
    "model_self_report":    {"written_by": ["model:classifier_v3"]}
  },
  "gate_features": [
    {"name": "sender_trust",     "reads": "sender_trust",      "role": "authorization"},
    {"name": "tx_reversibility", "reads": "payment_ledger",    "role": "authorization"},
    {"name": "model_confidence", "reads": "model_self_report", "role": "context"}
  ]
}

$ python3 gate_taint_lint.py fixtures/feedback.json
GATE-TAINT-LINT REPORT
stores declared: 6
gate features: 3 (authorization: 2, context: 1)
write-chain classes:
  - model_confidence [context] reads model_self_report -> MODEL_AUTHORED
      INFO: model in write-chain, held as context (never keys authorization)
      chain=model_self_report<-model:classifier_v3
  - sender_trust [authorization] reads sender_trust -> MODEL_LAUNDERED [FEEDBACK_LOOP]
      DENY: model in write-closure via 2 intermediate store(s)
      chain=sender_trust<-reputation_table<-classifier_decisions<-model:classifier_v3
      FEEDBACK_LOOP: model-fed cycle in write graph: classifier_decisions, reputation_table
  - tx_reversibility [authorization] reads payment_ledger -> WORLD_ANCHORED
authorization features tainted: 1 of 2
  - sender_trust: MODEL_LAUNDERED chain=sender_trust<-reputation_table<-classifier_decisions<-model:classifier_v3
VERDICT: FAIL: 1 authorization feature(s) carry a model in their write-closure
  a signal the model can write cannot authorize the model's actions

Exit 1 again, but with the extra flag: the write graph contains a cycle (classifier_decisions and reputation_table write each other, directly or through the chain) and a model feeds it. A loop like this does not just taint the signal. It compounds: every accept the model buys makes the next accept cheaper. If your agent lives on-chain, read sender_trust as senderTrust or whatever reputation score your agent economy keeps; a memory that improves because the model approved the earlier writes is this same graph with different table names.
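For readers who want the loop test in isolation, here is a minimal sketch under the same flattened-map assumption. The names upstream and model_fed_loops are illustrative and do not appear in the tool; the rule they encode is the one described above: a store that can reach itself along writer edges, with a model somewhere in its closure.

```python
# The feedback fixture's write map, flattened to store -> writers.
FEEDBACK_STORES = {
    "approvals_log": ["human:sre_approver"],
    "payment_ledger": ["external:bank_feed"],
    "classifier_decisions": ["model:classifier_v3", "reputation_table"],
    "reputation_table": ["classifier_decisions"],
    "sender_trust": ["reputation_table"],
    "model_self_report": ["model:classifier_v3"],
}


def upstream(store, stores):
    """Return (stores, principals) reachable from `store` along writer edges."""
    reached, principals = set(), set()
    stack = list(stores[store])
    while stack:
        node = stack.pop()
        if ":" in node:
            principals.add(node)
        elif node not in reached:
            reached.add(node)
            stack.extend(stores[node])
    return reached, principals


def model_fed_loops(store, stores):
    """Stores behind `store` that reach themselves and have a model upstream."""
    reached, _ = upstream(store, stores)
    hits = []
    for name in sorted(reached | {store}):
        back, principals = upstream(name, stores)
        if name in back and any(p.startswith("model:") for p in principals):
            hits.append(name)
    return hits


print(model_fed_loops("sender_trust", FEEDBACK_STORES))
# ['classifier_decisions', 'reputation_table']
```

sender_trust itself is not in the cycle, it only reads from it, which is why it is absent from the list even though it carries the flag.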

How does the write-chain taint lint classify each signal?

Small enough to hold in your head. For each store, compute the closure along written_by edges with an iterative DFS and a visited set, collecting principal leaves. A feature's class is then three lines of logic: a model:* among the read store's direct writers is MODEL_AUTHORED; a model anywhere deeper in the closure is MODEL_LAUNDERED; otherwise WORLD_ANCHORED. The printed chain is the shortest write path from the read store to a model principal, found by BFS with sorted expansion so the output never wobbles between runs. FEEDBACK_LOOP is set when the read store, or any store reachable from it, can reach itself and a model is in that store's closure.

Two design choices I expect pushback on, so let me defend them now. First, a store with an empty written_by list is exit 2, not WORLD_ANCHORED. While writing the validator I sat on this one for a while, because the tempting read is "nobody writes here, so it is safe." Backwards. An empty writer list means undeclared provenance, and undeclared provenance failing open is how the laundered manifest happens in real life. Second, an unknown principal kind is also exit 2. Mistype the kind as modle:classifier_v3 and a lenient lint would wave your classifier through as world-anchored. Fail-closed means a typo costs you a build, not a trust model.
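Both fail-closed choices can be shown without the rest of the tool. The sketch below is an illustration, not the validator from gate_taint_lint.py: it raises ValueError where the tool exits 2, and check_writers and refuses are hypothetical helper names.

```python
PRINCIPAL_KINDS = ("external", "human", "model")


def check_writers(store, writers, declared):
    """Raise ValueError on any writer list the lint should refuse to classify."""
    if not writers:
        raise ValueError("%s: no declared writers (undeclared provenance)" % store)
    for w in writers:
        if ":" in w:
            kind = w.split(":", 1)[0]
            if kind not in PRINCIPAL_KINDS:
                raise ValueError("%s: unknown principal kind in %r" % (store, w))
        elif w not in declared:
            raise ValueError("%s: written by undeclared store %r" % (store, w))


def refuses(store, writers, declared=()):
    """True when check_writers rejects the input, False when it accepts it."""
    try:
        check_writers(store, writers, declared)
    except ValueError:
        return True
    return False


print(refuses("reputation_table", []))                       # True: empty list
print(refuses("reputation_table", ["modle:classifier_v3"]))  # True: typo in kind
print(refuses("reputation_table", ["model:classifier_v3"]))  # False: accepted
```

Accepted does not mean clean here. The correctly spelled model writer passes validation and is then classified as tainted, which is the order you want: refuse what you cannot read, then judge what you can.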

The taint model is binary on purpose, and that is a real limitation, not modesty theater. human:* should mean a human authorizes each write. The common hybrid, a human batch-approving the model's suggested reputation updates every Friday, is genuinely neither kind. My position: declare the principal who authorizes the write, and if that Friday review is a rubber stamp, declaring it human:* is a lie the lint cannot catch. It lints the map, not the territory.

The field is converging on write-provenance

I am not arguing from one thread. Krti Tallam's paper on authorization propagation in multi-agent AI systems puts the boundary in one sentence about the problem: "It is not reducible to prompt injection and is not fully addressed by classical access-control models such as RBAC, ABAC, or ReBAC." Their conclusion, not mine, and it matches what the write-chain shows: you can solve injection and still authorize on a signal your model wrote.

Adversa AI's June 2026 roundup of agentic security resources lands the same way: treat every input the agent ingests as potentially hostile, every action as potentially dangerous, and close the gap with real boundaries such as least-privilege scopes, sandboxed execution, and human review where the blast radius is large. Their framing, not mine, and a gate feature is exactly an input the gate ingests.

And the corroboration angle keeps surfacing in the comments of this wave. On yongrean's earlier post about confidence, commenter jugeni put it in thirteen words: "AUTO wants a corroborator the model cannot write, not a confidence it can." On Ishaan Sehgal's The Log Is the Agent, commenter nexus-lab-zen drew the trust boundary I keep coming back to: "the verdict on the run has to live in a different trust domain than the one that wrote the log." Same instinct, four different authors, one property underneath: what the model can write, the model can bend.

Where this sits next to the rest

This is a spoke on the pre-execution gate for AI agents cluster, and it is the first one pointed at the gate itself: input hygiene for the thing that does the blocking. The neighbors, and how this differs:

"Your agent returns 200 and lies" and "reconciling a scorecard from evidence" are post-hoc: they check claims against evidence after execution. This lint runs before the gate authorizes anything at all.
The lethal trifecta gate asks whether a dangerous combination of capabilities is reachable. This asks where the data under the gate's decision came from. Capabilities and provenance are different axes through the same pre-execution point.

A gate that keys on model-authored signals is tracking wearing a control costume: the model authorizes itself through one layer of indirection, and the gate's job title does not change that.

What this is NOT

I would rather undersell this than have you deploy it as something it is not.

It lints the declared write map. If the manifest says approvals_log is human-signed and in reality a cron job lets the model append to it, the lint reports a clean chain. Garbage in, garbage out. The honest wedge is the same as an SBOM: reading the declared wiring is worth doing precisely because most stacks have never written the wiring down.
It is not runtime enforcement. Nothing is intercepted, no write is blocked. It is a pre-deploy CI check on a JSON file.
It is not a prompt-injection detector. A perfectly benign model with no attacker in sight still fails the lint if it writes an authorization feature, because the problem is structural, not adversarial.
It is not a lineage tracker. No column-level flows, no time dimension, no sampling of actual rows. Real lineage systems reconstruct what happened; this classifies what you declared.
It does not prove the absence of undeclared write paths. A green run means "the map you drew has no model behind authorization," never "no model can reach your tables."
It anchors on declared principals, not on their presence. A closed loop of stores that only write each other, with no principal anywhere in the closure, passes as WORLD_ANCHORED, vacuously: no model in the chain, but nothing anchoring it to the world either. The empty-writers check catches an undeclared leaf, not a principal-free cycle. Both independent reviews of this tool flagged it, and they are right. If your write map has islands like that, declare the principal that seeds them, or treat the green as unearned.
The numbers here are fixture units, not a prod measurement. The 1 of 2 describes this post's synthetic manifests. Run it on your own write map to get a number that means something about your stack.
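The principal-free island from the list above is cheap to detect if you want a stricter run than the tool gives you. This is a sketch of a possible extra check, not something gate_taint_lint.py does today; is_unanchored and the ISLAND write map are hypothetical.

```python
def principals_behind(store, stores):
    """Collect every principal in the transitive write-closure of `store`."""
    principals, reached = set(), set()
    stack = list(stores[store])
    while stack:
        node = stack.pop()
        if ":" in node:
            principals.add(node)
        elif node not in reached:
            reached.add(node)
            stack.extend(stores[node])
    return principals


def is_unanchored(store, stores):
    """True when the closure holds no principal at all (a principal-free island)."""
    return not principals_behind(store, stores)


# Hypothetical island: two stores that only write each other.
ISLAND = {
    "score_cache": ["score_rollup"],
    "score_rollup": ["score_cache"],
    "payment_ledger": ["external:bank_feed"],
}

print(is_unanchored("score_cache", ISLAND))     # True: nothing anchors it
print(is_unanchored("payment_ledger", ISLAND))  # False: external:bank_feed
```

Whether an unanchored authorization feature should be exit 1 or exit 2 is a judgment call I have not settled; the sketch only surfaces the condition.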

Bad input fails closed

A lint that crashes open is worse than no lint. Point it at a manifest where a feature reads a store nobody declared:

$ python3 gate_taint_lint.py fixtures/bad.json
ERROR: feature 'sender_trust' reads undeclared store 'sender_trust'
$ echo $?
2

No arguments, unreadable file, malformed JSON, unknown role, unknown principal kind, a store with no declared writers: all exit 2, distinct from exit 1 so your CI can tell "tainted" apart from "could not read the map." I ran each fixture twice and hashed the full STDOUT both times: clean is bb8d9b35..., laundered is bec4a071..., feedback is 68065873..., identical across runs. Byte-for-byte, on Python 3.13.5, offline.
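If you want to repeat the determinism check on your own manifests, a sketch along these lines should do. SHA-256 is an assumption on my part for this example, and stdout_digest and is_deterministic are illustrative helper names; the demo command is a stand-in so the block runs without the tool or fixtures on disk.

```python
import hashlib
import subprocess
import sys


def stdout_digest(cmd):
    """Run `cmd`, return (exit code, SHA-256 hex digest of its raw STDOUT)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, check=False)
    return proc.returncode, hashlib.sha256(proc.stdout).hexdigest()


def is_deterministic(cmd, runs=2):
    """True when every run yields the same exit code and the same STDOUT bytes."""
    results = {stdout_digest(cmd) for _ in range(runs)}
    return len(results) == 1


# Stand-in command so the sketch is self-contained. To check the lint itself,
# swap in: [sys.executable, "gate_taint_lint.py", "fixtures/clean.json"]
demo = [sys.executable, "-c", "print('GATE-TAINT-LINT REPORT')"]
code, digest = stdout_digest(demo)
print(code, digest[:8] + "...")
print(is_deterministic(demo))
```

Comparing the exit code alongside the digest matters in CI: a run that flips between 1 and 2 with identical STDOUT would still be a nondeterminism bug.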

The question I actually want answered

What is the closest thing to a write-chain manifest your stack already has? A dbt lineage graph, a CDC topic map, the IAM write grants on your trust tables, a tribal diagram in someone's head? I genuinely do not know what the median answer looks like, and I suspect for most agent stacks it is the diagram in the head. If you can export even a rough stores map, run this lint against your gate's features and tell me which class your sender_trust lands in. I expect more MODEL_LAUNDERED than anyone will admit, and I would be happy to be wrong.
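As a starting point for that export, here is a rough sketch that folds a flat list of write grants into the stores half of a manifest. The (writer, store) pair format and the GRANTS sample are assumptions about what your export might look like, not the output of any real IAM, dbt, or CDC tool.

```python
import json


def stores_from_grants(grants):
    """Group (writer, store) pairs into the manifest's `stores` shape."""
    stores = {}
    for writer, store in grants:
        spec = stores.setdefault(store, {"written_by": []})
        if writer not in spec["written_by"]:  # drop duplicate grants
            spec["written_by"].append(writer)
    for spec in stores.values():
        spec["written_by"].sort()  # stable order keeps diffs readable
    return stores


# Hypothetical export: who holds write access to which table.
GRANTS = [
    ("human:sre_approver", "approvals_log"),
    ("approvals_log", "reputation_table"),
    ("model:classifier_v3", "reputation_table"),
    ("reputation_table", "sender_trust"),
]

print(json.dumps({"stores": stores_from_grants(GRANTS)}, indent=2, sort_keys=True))
```

You would still need to add gate_features by hand before the lint accepts the file, and every writer has to be either a declared store or a principal with a known kind prefix.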

If this was useful, follow along here for the next runnable gate in the series, and drop the ugliest write-chain you have ever found behind a "trust" score in the comments. I read every comment.

Top comments (50)

Mike Czerwinski • Jul 6

The three-way classification is doing real work and the fourth case worth having is what happens when a model-authored signal gets ratified by a human downstream. reputation_table filled by a classifier but every row cleared by a human reviewer before it's read is neither MODEL_LAUNDERED (the human touch is real, not decorative) nor WORLD_ANCHORED (the origin was still a model). If the closure collapses those into LAUNDERED because it walks writes and stops there, you'll fail-close on pipelines that are actually safe, and once operators notice they'll patch the rule with an escape hatch that undoes the whole thing.

Concrete proposal, in your closure vocabulary: track "review edges" as a distinct principal type. Write from model:classifier_v3 into row 42 of reputation_table, then a signed write from human:reviewer_alice overwriting the trust field of that same row, produces MODEL_RATIFIED, not LAUNDERED. The signature check is the same one you'd want anywhere else. What this buys you is a rule that says "authorization needs at least one non-model principal in the write-closure," which happens to match how real approval pipelines look, without pretending the model was never there. LAUNDERED stays as the failure mode where the model output gets passed through unchanged, which is the case you actually want to catch.

Alexey Spinov • Jul 6

Agreed the fourth state is real, and between you and the Armorer reply the scoping is mostly covered: bind the edge to the field plus version, distinguish value-review from "the classifier ran." The gap I'd still flag sits one level up from the signature.

MODEL_RATIFIED is only world-anchored if what reviewer_alice actually looked at was world-anchored. In most review consoles the row is presented next to the classifier's own summary, its confidence, its extracted rationale. If alice signs after reading a model-authored explanation of row 42, the human edge is laundering the model's self-report, not adding an independent observation. The non-model principal is nominal: her decision input was the model. So the review edge has to record the evidence surface, not only the signer and the version. Which fields the reviewer saw from a non-model source, and whether she compared against model output or raw ground truth. Same LAUNDERED test, moved up to the reviewer's inputs.

The second thing is where this can live. Whether the value read still carries a live ratification depends on write ordering the manifest can't see, so the static lint can't emit MODEL_RATIFIED at all. It can only emit "decision field requires a ratification edge," and the runtime has to prove the version in hand still carries one and was not overwritten by a later model write. Static says which anchor is required, runtime confirms the anchor covers the bytes actually used. That handoff is the part I'd keep strict, because a ratification that is not re-checked at read is just one signature amortized over every row the classifier touches afterward.

Mike Czerwinski • Jul 6

Evidence surface closes the gap that scope alone can't, and "same LAUNDERED test moved up to the reviewer's inputs" is the correct shape. A ratification is only additive when what the ratifier looked at was independent of the origin the ratification is trying to override. Static-emits-requirement, runtime-proves-coverage, is the right handoff, because ratification bytes and value bytes have to match at read time or the whole thing is one signature amortized over a growing surface.

Where I'd extend it: static-can-only-emit-requirement puts the burden on the runtime, and runtime checks only fire on read paths that know to ask. A consumer reading the trust field without asking whether ratification covers this version and this evidence surface picks up the value fine, and gets January's ratification against July's model rewrite for free, silently. Which means the type of the decision field probably needs to enforce the check at the read layer, not merely support it. A ratification-required field should be readable only through a call that returns (value, ratification_state) or refuses to return anything if the ratification doesn't cover the requested version. Otherwise the runtime check exists but has no default consumer, and every new consumer represents a chance to skip it. Same failure mode you already fixed for writers, one hop later on the read side.

Alexey Spinov • Jul 25

Built your read-side gate and ran it — three field designs through one drift scenario: Jan ratification at v1, July rewrite to v2 that grew the surface by tool_scope. Your read matched at every step.

A naive public-attribute field hands the value out regardless: naive_consumer -> 'ALLOW', v1 ratification on a v2 field, silent — exactly your "January's ratification against July's rewrite for free." An accessor that supports a check but leaves .value public is worse than it looks: read() refuses (ratified v1 != read v2), but a consumer written later just reads .value and gets 'ALLOW'. The check exists and has no default consumer — your "every new consumer is a chance to skip it," verbatim.

The only design that held is the one you described: value readable only through the call. I captured it in a closure so there's no attribute to reach — raw value reachable? False — and read(version, evidence) refuses both the version mismatch and an evidence surface outside the ratified one. That's your (value, ratification_state)-or-refuse, and it moves the guarantee from "the writer was right" to "no consumer reads without coverage."
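A minimal sketch of the closure design described above, not the script that was actually run. The names `make_ratified_field` and `CoverageError`, and the evidence labels, are hypothetical; the only behavior taken from the thread is that `read(version, evidence)` returns `(value, ratification_state)` or refuses.

```python
from typing import Callable, FrozenSet, Tuple


class CoverageError(Exception):
    """Raised when a read is not covered by the stored ratification."""


def make_ratified_field(
    value: str,
    ratified_version: str,
    ratified_evidence: FrozenSet[str],
) -> Callable[[str, FrozenSet[str]], Tuple[str, str]]:
    """Return a read() function; the value lives only inside this closure."""

    def read(version: str, evidence: FrozenSet[str]) -> Tuple[str, str]:
        # Refuse a version the ratification never covered.
        if version != ratified_version:
            raise CoverageError(f"ratified {ratified_version} != read {version}")
        # Refuse an evidence surface that grew past what was ratified.
        if not evidence <= ratified_evidence:
            extra = sorted(evidence - ratified_evidence)
            raise CoverageError(f"evidence outside ratified surface: {extra}")
        return value, "RATIFIED"

    return read


# Hypothetical drift scenario: ratified at v1 over one evidence field.
read = make_ratified_field("ALLOW", "v1", frozenset({"sender_domain"}))
print(read("v1", frozenset({"sender_domain"})))
```

One caveat on this sketch: Python closures stay reachable through introspection (`read.__closure__`), so it removes the casual attribute path rather than making the value unreachable in a security sense.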

Where it still leaks, one hop past where you put it: the gate protects the read, not the propagation. A covered read returns ('ALLOW', 'RATIFIED') — a bare string — and the next hop holds 'ALLOW' with no ratification travelling beside it. So the property is "no uncovered first read," not "never used stale": a consumer that reads-covered-then-forwards hands the next hop an ungated value, and that hop is the naive reader again. Same shape you chased from writer to reader, one more hop down — the value has to stay wrapped with its coverage, or every forward is a fresh place to drop it.

3× byte-identical; source sha256 f4ba9e07… if you want to run it.

Armorer Labs • Jul 6

That fourth state is the important one. I would still make the ratification edge carry its own scope, not just the human principal: what field was reviewed, which source row/version, and whether the reviewer was approving a value or only approving that the classifier ran. Otherwise MODEL_RATIFIED becomes a new laundering path.

The runtime check I like is: model write stays in the closure forever, but a human approval can add a bounded authority edge over specific fields and version hashes. Authorization can then ask for "non-model authority over the decision field" without erasing model provenance.

Disclosure: I work on Armorer Labs. This is the kind of receipt shape we are trying to make operational in Armorer: the gate should see both the model origin and the later human authority, not a flattened trust label.

Mike Czerwinski • Jul 6

The runtime shape is right and solves the durability problem: model provenance never leaves the closure, and ratification is an overlay with scope rather than a rewrite of history. Where I'd push on the bounded authority edge: scope alone isn't enough, it needs expiry as a first-class predicate. A field/version signature that stays valid indefinitely becomes laundering again the moment the underlying model output has been reprocessed by a newer classifier without the human reviewer re-checking. Scope defines what she authorized; the timestamp defines the upstream state she authorized it against. The gate probably wants "non-model authority over field F at version V, freshness ≤ N days relative to the newest model write into V's dependencies" as the actual read-time predicate. Otherwise you get quiet re-inflation of stale approvals every time the model half of the stack ships a rev.

On the vendor disclosure: fair to name it, and the shape you're describing (both the model origin and the later human authority visible to the gate, not a flattened trust label) is exactly what makes the downstream cases talkable. A trust label collapse loses the audit trail; keeping both edges visible means the gate can enforce different policies for "model was here" vs "human authority added later" vs "human authority present but stale."

Armorer Labs • Jul 6

Yes, expiry is the piece that makes scoped ratification operational. I would model the human ratification as a lease over a specific dependency-graph snapshot, not as permanent promotion of the signal.

The predicate then becomes something like: reviewed field, source row/version, reviewer authority scope, upstream model-write frontier, and expires_at or freshness policy. If a dependency is rewritten by a model after the review, the human edge does not disappear, but it stops being sufficient until the reviewer refreshes it or the policy accepts the stale bound.

That also keeps the failure mode clean: stale approval is a policy failure, not a provenance rewrite. The model origin stays in the closure forever; the human edge is an additional bounded authority edge with a TTL and dependency hash.
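The lease shape above can be sketched as follows. This is an illustration under assumptions: `RatificationLease`, `dependency_hash`, the store names, and the dates are all hypothetical, and the "upstream model-write frontier" is modelled as a plain dict of store to latest model-write version.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


def dependency_hash(model_write_frontier: dict) -> str:
    """Hash a {store: latest_model_write_version} snapshot canonically."""
    blob = json.dumps(model_write_frontier, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


@dataclass(frozen=True)
class RatificationLease:
    field_name: str
    row_version: str
    reviewer: str
    frontier_hash: str    # dependency snapshot the reviewer approved against
    expires_at: datetime  # TTL; the lease is insufficient after this instant

    def is_sufficient(self, field_name, row_version, current_frontier, now):
        """True only if scope, dependency snapshot, and TTL all still hold."""
        return (
            field_name == self.field_name
            and row_version == self.row_version
            and dependency_hash(current_frontier) == self.frontier_hash
            and now < self.expires_at
        )


# Hypothetical example data.
frontier = {"reputation_table": "classifier_v3@41"}
signed_at = datetime(2000, 1, 1, tzinfo=timezone.utc)
lease = RatificationLease(
    "trust", "row42@v1", "human:reviewer_alice",
    dependency_hash(frontier), signed_at + timedelta(days=30),
)
```

The lease object is never deleted when it stops being sufficient; `is_sufficient` simply returns False, which keeps a stale approval a policy failure rather than a provenance rewrite.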

Disclosure: I work on Armorer Labs. This is why I think run receipts need to store both the evidence graph and approval metadata. Without freshness and dependency binding, an approval artifact can quietly become another laundering label.

Mike Czerwinski • Jul 7

The lease-over-dependency-graph-snapshot abstraction is the right operational shape for the whole thread, and I want to flag two design edges worth deciding before the runtime ships.

First, default behavior when the lease expires and the graph has changed. Two options: fail-close (require re-ratification before any advance) or fail-open with a degraded confidence label (advance but mark the result as operating on a stale approval). Fail-close is safer for the gate but blocks pipelines on every expiry, which either forces a fast-refresh workflow you now have to fund or teaches operators to disable the gate. Fail-open is honest but requires downstream consumers to actually branch on the confidence label, which is the same read-path enforcement problem from earlier in the thread: most consumers won't branch by default. This probably wants a configurable per-field policy: high-stakes fields fail-close, low-stakes fields fail-open-with-label, and the field's classification is itself a policy artifact the gate reads at runtime.

Second, refresh workflow economics. Refreshing a lease is cheap when the dependency graph changed a lot (large clear delta, reviewer says "obvious, reaffirm or reject") and expensive when it changed subtly (many small changes with unclear cumulative impact, reviewer has to reason about aggregate effect). The refresh surface should show CHANGE MAGNITUDE explicitly so the reviewer knows what kind of decision they're making. "97% of upstream inputs unchanged since last approval" gets a fast reaffirm. "43% changed with concentration in field F" gets a deep review. Otherwise every refresh is treated as a full re-review, which makes the whole freshness discipline economically infeasible and gets it disabled.

On the vendor point: right that both edges have to stay visible to the gate rather than collapsing into a flattened label, and I'd extend it: the evidence surface (what the ratifier actually looked at) needs to be first-class storage alongside the approval metadata. Otherwise a future re-audit can't distinguish "reviewer approved based on independent evidence" from "reviewer approved based on model's own summary," which is the exact laundering path we were closing in the write-chain discussion.

Armorer Labs • Jul 7

I agree with the per-field split. I would not make expiry a global fail-open or fail-close switch; I would make it a policy artifact attached to the claim being advanced.

The shape I would want is: field or claim_type, stale_condition, default_on_expiry, required_refresh_authority, allowed_degraded_consumers, and the evidence class that can refresh it. High-consequence claims fail closed. Low-consequence claims can continue with a degraded label only if the downstream read path is also policy-aware. If consumers cannot branch on stale/degraded, then fail-open is mostly a renamed bypass.
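A small sketch of how such a per-claim policy might drive the expiry decision. It carries only three of the fields listed above (`claim_type`, `default_on_expiry`, `allowed_degraded_consumers`); the class name, the two policy values, and the claim and consumer names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ExpiryPolicy:
    claim_type: str
    default_on_expiry: str  # assumed values: "fail_closed" or "degraded"
    allowed_degraded_consumers: FrozenSet[str]


def decide_on_stale_lease(policy: ExpiryPolicy, consumer: str) -> str:
    """Return BLOCK or ALLOW_DEGRADED for a read against a stale lease."""
    if policy.default_on_expiry == "fail_closed":
        return "BLOCK"
    # Degraded use is only allowed for consumers known to branch on the label;
    # anyone else would turn fail-open into a renamed bypass.
    if consumer in policy.allowed_degraded_consumers:
        return "ALLOW_DEGRADED"
    return "BLOCK"


# Hypothetical policies for a high- and a low-consequence claim.
high = ExpiryPolicy("payment_release", "fail_closed", frozenset())
low = ExpiryPolicy("digest_ranking", "degraded", frozenset({"ranker"}))
print(decide_on_stale_lease(high, "ranker"))
print(decide_on_stale_lease(low, "ranker"))
print(decide_on_stale_lease(low, "new_consumer"))
```

The default for an unlisted consumer is BLOCK, so a consumer added later has to be named in the policy before it can read a degraded value.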

Change magnitude is the right way to make refresh economically viable. The refresh receipt should show dependency_diff_summary, changed_inputs_ratio, concentrated_fields, the new model-authored frontier, and the reviewer-visible evidence delta. Then a reviewer is not re-approving the whole world; they are approving that this specific delta does or does not invalidate the previous lease.

On the vendor/evidence-surface point, yes. I would store what the ratifier saw as a first-class input set, not just the ratifier identity. Independent source rows, raw test output, invoice line, production probe, or model-written summary are different evidence surfaces. A future audit should be able to say: this approval was valid authority, but it rested on weak evidence, or it rested on evidence that no longer matches the current graph.

Disclosure: I work on Armorer Labs.

Mike Czerwinski • Jul 7

The per-claim policy artifact is the right correction to my binary switch. It forces one question I do not have a clean answer to: who owns the policy catalog.

If the producing agent supplies its own policy along with the claim, we are back to author-as-verifier at the policy layer. The policy becomes a rewrite target the moment the producer wants a slower refresh cadence. If a shared control plane owns it, the plane is now the audit surface everyone leans on and its own drift is invisible.

The shape I would want is the runtime being authoritative, with the producer allowed to narrate a proposed policy in the claim but not to bind it. Same reason a handoff record should not be written by the parent: whoever owns the seam has to see both sides and be neither.

Armorer Labs • Jul 7

I agree with that ownership split. The producer can propose the policy because it knows the claim shape, but binding policy has to be outside the producer path.

The pattern I would use is a runtime-owned catalog with versioned policy entries, and every claim points to a catalog version rather than embedding its own final rules. The producing agent may attach a proposed policy diff, but that diff is only input to review. It cannot change stale_condition, refresh authority, degraded-use rules, or evidence class by itself.

The catalog then needs its own receipts too: who changed a policy, what claim families it affects, what evidence justified the change, what older receipts are now invalidated, and which consumers were allowed to keep using the old version during migration. Without that, the control plane becomes trusted ambient context and eventually has the same drift problem as the agent.

So I think runtime authoritative is right, but with two safeguards: producers can only propose, and catalog mutations are auditable events that can force re-ratification of claims that depended on the previous policy.

Disclosure: I work on Armorer Labs.

Mike Czerwinski • Jul 8

Catalog receipts close the drift you name, and they open a small recursion I would flag if you have not solved it. Who audits the catalog auditor?

A catalog receipt says who changed a policy and what evidence justified it. But the receipt itself is a claim about the change, produced by the actor allowed to change the catalog. Same failure mode you describe for the ambient control plane, one level up: the receipt-writer eventually inherits the trust the catalog used to have. Without a checker that is independent of the receipt-writing actor, catalog audit collapses into the actor auditing itself.

Two floors I have seen work. Signed multi-party attestation on catalog changes: a policy edit requires at least two independent authorities to countersign before the receipt becomes valid, so the receipt-writer cannot forge history alone. Or the floor from your earlier posts: pin the catalog receipt to a running process (a verification pass) rather than a stored claim. The pass is loud when broken; the claim fails silent.

Both are the same shape underneath: the audit trail is trustworthy only when producing it costs something the fraudulent actor cannot pay.

Alexey Spinov • Jul 8

Who audits the catalog auditor is the right place to push, Mike, and I think that recursion has exactly one termination point: the first layer that stops being a stored claim and becomes a re-runnable check. A catalog receipt is a claim about a change. Verifying it needs an auditor, whose verdict is another claim, needing another auditor, and that stack has no bottom as long as every layer asserts. Your second floor breaks it because a verification pass is a different kind of object, not a more trusted one. It does not assert the catalog is honest; it re-derives the catalog's effect from the inputs, and anyone can run it and watch it fail. There is no who-audits-the-pass, because the pass is not asking to be believed. It asks to be re-run. The regress stops at the first layer that is a process, not a document.

The honest limit is that the regress relocates instead of vanishing. A re-runnable pass still trusts that this is the real pass over the real inputs, so the actual floor is a pass that is publicly re-derivable from independently observable inputs. Below that point no single actor's assertion is load-bearing, and a lie surfaces as a divergence any re-runner sees. That is the only version of this that has held anywhere in the thread: you never prove the auditor honest, you make dishonesty a diff.

So I read your two floors as stacked, not either-or, because they fail on different axes. Multi-party attestation makes forging history require collusion, but a policy that was honestly countersigned and has since gone stale is still a stored claim that fails silent, and the countersignature says nothing about currency. The running pass is what makes stale loud, and it says nothing about who was allowed to bind the change. Countersigning covers "the history is forged"; the pass covers "the history is no longer true." Different holes, so you keep both.

Mike Czerwinski • Jul 8

The floor you land on is the right one, but I'd push one more inch: re-runnable isn't sufficient on its own, only re-runnable by someone who isn't the producer. A CI job is a process, not a document, and it's still capturable if the producer controls who gets to trigger it, what inputs it sees, or when it's allowed to fail loud. We ran into this exact seam auditing our own claim catalog: the check being mechanical didn't buy us anything until we asked who could run it and who couldn't. Divergence surfaces the lie only if the re-run happens outside the producer's reach. Otherwise you've swapped "trust my claim" for "trust my test," same actor, different costume.

Alexey Spinov • Jul 8

Agreed, and the ranking matters because the three surfaces you name are not equally hard. Who triggers is the cheapest to remove: a scheduled or externally owned trigger takes "who runs it" off the board structurally, so I would spend the least effort there. Inputs is the one we half covered, and it splits in two. Provenance is what we already said, the check reads a store the producer cannot write. Completeness is the part I skipped: even an honest store lets the producer pick which slice the re-run sees, and a curated sample passes clean while the incriminating rows sit outside it. That is your own selective-replay point from the sibling thread, and it means the input set has to be enumerated by the same non-producer authority, not handed to the check as a subset.

The surface I did not name and you did is the loud part, and I think it is the one people skip. Independent inputs and an external trigger still buy nothing if the producer owns where the verdict lands. If the failing run reports into a channel the producer can mute, tag known-flaky, or reinterpret the exit code of, the divergence gets computed and never reaches anyone who acts on it. So the failure channel wants the same treatment as the inputs: run the write-chain lint twice, once on what the check reads and once on what its verdict writes into, and require the producer absent from both closures.
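Running the lint on both edges could look roughly like this. It is a sketch of a transitive write-closure walk, not the code of gate_taint_lint.py; the manifest layout and every store and principal name below are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical manifest: store -> principals or upstream stores that write it.
WRITERS: Dict[str, Set[str]] = {
    "check_inputs": {"etl_job"},
    "etl_job": {"human:data_owner"},
    "verdict_channel": {"ci_bot", "store:mute_list"},
    "store:mute_list": {"agent:producer"},
    "ci_bot": {"human:platform_team"},
}


def write_closure(store: str, writers: Dict[str, Set[str]]) -> Set[str]:
    """Everything that transitively writes into `store`."""
    seen: Set[str] = set()
    stack = [store]
    while stack:
        node = stack.pop()
        for upstream in writers.get(node, set()):
            if upstream not in seen:
                seen.add(upstream)
                stack.append(upstream)
    return seen


def producer_absent(
    producer: str, stores: Iterable[str], writers: Dict[str, Set[str]]
) -> Dict[str, bool]:
    """For each store, True if the producer is nowhere in its write-closure."""
    return {s: producer not in write_closure(s, writers) for s in stores}


result = producer_absent(
    "agent:producer", ["check_inputs", "verdict_channel"], WRITERS
)
print(result)
```

In this made-up manifest the read edge is clean, but the producer reaches the verdict channel through the mute list, which is the plumbing-shaped hole the paragraph describes.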

Honest limit, same as before: whoever enumerates the input set is now an authority whose independence I am asserting rather than proving. The regress relocates to who defines the sample, same shape one level out. You do not kill it, you move it to the layer that is cheapest to witness from outside.

Mike Czerwinski • Jul 9

The verdict-landing channel is the surface people skip because it's the one that looks like plumbing, not a trust boundary. Agreed it wants the same lint as the inputs.

On the honest limit (whoever enumerates the input set is asserted-independent, not proven): I don't think you kill it, but you can change its currency. Publish the input-set manifest the check ran against, so omission becomes detectable after the fact by anyone, not just witnessed by an authority you're vouching for. That doesn't prove the enumerator was independent. It makes a hand-picked sample catchable by a stranger who never trusted them. The regress relocates, same as you said, but with each hop it gets cheaper to catch from outside, which is the only version of winning available here.

Alexey Spinov • Jul 10

"Change its currency" is the right move, and I think the reason it beats a detailed label is falsifiability. A published manifest is only worth more than a label if a stranger can regenerate the universe themselves and diff it. A label you read and believe. A manifest you can try to break.

Which means the pre-commit is the load-bearing part. If the manifest is pinned and content-addressed before the check runs, omission becomes a diff between what you committed and what anyone can independently enumerate, and both sides carry a timestamp. If it's assembled after, the producer picks the sample and writes the manifest to fit whatever passed, and you're back to a very detailed label. Your own point from the write-gate thread is what makes it bite: prove the artifact existed before the decision, not after it. Same mechanism, aimed at the input set instead of the verdict.
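A sketch of the pre-commit mechanics, assuming the manifest is just a set of member names and the commitment is a SHA-256 over a canonical JSON encoding. The function names and the member names are hypothetical, and how the digest gets timestamped before the run is out of scope here.

```python
import hashlib
import json
from typing import Iterable, Set, Tuple


def manifest_digest(members: Iterable[str]) -> str:
    """Content address for an input-set manifest (order-independent)."""
    canonical = json.dumps(sorted(set(members))).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def audit_manifest(
    committed_digest: str,
    published_members: Iterable[str],
    independent_members: Iterable[str],
) -> Tuple[bool, Set[str], Set[str]]:
    """Check the published manifest against the pin, then diff it against
    a set the auditor enumerated without the producer's help."""
    published = set(published_members)
    if manifest_digest(published) != committed_digest:
        # The manifest shown now is not the one pinned before the run.
        return False, set(), set()
    independent = set(independent_members)
    return True, independent - published, published - independent


# Hypothetical run: the producer pinned two routes, a stranger finds three.
committed = manifest_digest(["/login", "/pay"])
ok, omitted, extra = audit_manifest(
    committed, ["/pay", "/login"], ["/login", "/pay", "/admin"]
)
print(ok, sorted(omitted), sorted(extra))
```

The diff is only meaningful where the auditor can build `independent_members` alone, which is the boundary the next paragraph draws.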

The boundary I'd flag: a stranger catches omission only where they can enumerate the universe without you. Repo file tree, the router's route table, the lockfile's deps, all externally derivable, so the manifest is strong there. It goes soft exactly where the universe is producer-defined, where "which scenarios count as members" is itself the producer's call. There the manifest lists what they chose to put in scope, and there's no independent generator to diff against. Same wall as the labeling-authority thread on the severity post: the check is world-anchored only when the domain is. So "cheaper to catch from outside each hop" holds to exactly the degree the domain is externally enumerable.

On the verdict-sink, agreed, and I think it's the mirror of your manifest. The manifest publishes the read-closure the lint already walks; the sink is the write-closure of the output edge. Publish both, pre-committed, and omission on either edge becomes a diff a stranger can run: was the full input set listed, and did the BLOCK land somewhere the producer couldn't have written it. Same graph, both edges, external instead of vouched for.

Mike Czerwinski • Jul 10

Pre-commit content-addressing is the load-bearing part, agreed, and falsifiability is the right reason a manifest beats a label: you can try to break one, you can only believe the other.

Where the domain is producer-defined I don't think you're fully stuck, you just move what gets pinned. Pin the enumeration rule, not the set. If the producer pre-commits the generator, this query, this predicate, this scope function, content-addressed before the run, a stranger still can't re-derive the universe, but they can re-run the declared rule and diff its output against the set that was actually checked. Hand-picking now surfaces as "the committed set is not what your own rule emits." You don't get an independent generator, you get a checkable one: the producer authored the function, but authored it before they knew what it would catch. Same move as the artifact-before-decision timestamp, aimed one level up, at the rule instead of the set.

It goes soft exactly where the rule takes producer-defined predicates as arguments, and there the regress relocates again, to who names the predicate. Cheaper to witness each hop, never zero.

Alexey Spinov • Jul 11

Pinning the generator instead of the set is the right rescue, and the property doing the work is blindness: the producer authored the rule before knowing what it would catch. Two things follow from taking that property seriously.

First, the rule has to be hermetic for the stranger's diff to mean anything. If the generator reads live state, the repo at HEAD, the current route table, the clock, then re-running it a day later emits a different set and every mismatch becomes deniable as drift. Worse, time-of-enumeration turns into a free variable: run the rule at the moment the world makes it emit a convenient set, and the pre-commit timestamp is still clean. So the pin needs two halves, the rule and the snapshot of the world it enumerated over, both content-addressed. Then the stranger's replay is exact and a mismatch has no innocent reading.
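The two-halved pin can be sketched like this, assuming the rule is a small piece of Python source that reads nothing except the snapshot it is handed. `RULE_SOURCE`, `make_pin`, `replay`, and the snapshot layout are hypothetical, and `exec` is used only to keep the example short; run it only on rule text you have inspected.

```python
import hashlib
import json

# Hypothetical pinned rule: hermetic because its only input is the snapshot.
RULE_SOURCE = '''
def enumerate_members(snapshot):
    return sorted(p for p in snapshot["files"] if p.endswith(".py"))
'''


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_pin(rule_source: str, snapshot: dict) -> dict:
    """Content-address both halves: the rule text and the world it ran over."""
    return {
        "rule": sha256_hex(rule_source.encode("utf-8")),
        "snapshot": sha256_hex(
            json.dumps(snapshot, sort_keys=True).encode("utf-8")
        ),
    }


def replay(pin: dict, rule_source: str, snapshot: dict, checked_set) -> list:
    """Re-run the pinned rule; return members it emits that were never checked."""
    if make_pin(rule_source, snapshot) != pin:
        raise ValueError("rule or snapshot differs from the pinned pair")
    namespace: dict = {}
    exec(rule_source, namespace)
    emitted = namespace["enumerate_members"](snapshot)
    return sorted(set(emitted) - set(checked_set))


snapshot = {"files": ["a.py", "b.py", "README.md"]}
pin = make_pin(RULE_SOURCE, snapshot)
print(replay(pin, RULE_SOURCE, snapshot, ["a.py"]))
```

Because the snapshot is part of the pin, a replay a day later sees the same world, so a non-empty result cannot be explained away as drift.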

Second, blindness decays with author-time knowledge, which gives rule age an unexpected value. A rule written after exploratory runs on this changeset has a clean timestamp but no blindness, the author already saw what it would emit. A rule pinned three months ago cannot have been tuned to hide a change that did not exist yet. Old pins are stronger than fresh ones, the inverse of most freshness intuitions.

On the predicate regress, I think there is a stopping condition rather than an infinite ladder. Each hop up pays off only while the thing you pin changes on a slower clock than the thing it generates. Rules churn slower than sets, so that hop is cheap. When predicates churn as fast as the sets they select, pinning them is pinning the set with extra steps, and that is the floor: stop hopping and put a second principal at that edge instead.

Mike Czerwinski • Jul 11

Rule plus world-snapshot, both content-addressed, is the fix, and hermetic-or-it-means-nothing is the right hard line, time-of-enumeration as a free variable is the hole I'd have left open. Two things I'd add, one on each of your other points.

Old pins stronger is true and has a mirror that bounds it. A rule pinned three months ago couldn't be tuned to hide today's change, that's the blindness paying off. But it also couldn't be built to catch today's change: the categories that appeared since sit outside its predicate set, so an old enumeration rule can be clean and simply have no clause for a member-type that didn't exist at pin-time. Blindness protects against hand-picking and guarantees coverage-staleness in the same stroke. So the two halves you pin age in opposite directions, the snapshot's fixity is pure asset because it makes replay exact, but the rule's blindness is an asset that ages into a liability. Freshness isn't monotone either way, there's a window: old enough to predate the change under audit, not so old the predicate set predates whole categories.

On the regress, your stopping condition is the same mechanism as pinning freshness to inputs rather than dates, one clock has to move slower than the other or the pin is theater. Which means the ladder doesn't terminate at a fixed height, it terminates at the first hop where the churn-ratio inverts, and that's measurable per domain. You don't guess where to stop and drop a second principal, you find the layer where what-you-pin starts changing as fast as what-it-selects and put the principal exactly there. The floor has coordinates.

Alexey Spinov • Jul 25

Thirteen days late. Ran both of your claims; the first survives harder than you put it, the second misses by one layer and misses systematically.

The window first. I modelled the two halves as factors that never see each other: probability the pin wasn't shaped around the change (rises with age — an old pin can't have been aimed at today), times probability the changed member-type already existed at pin time (falls with age).

   age   not-tuned   covers   caught
     5       0.341    0.882    0.301
    12       0.632    0.741    0.468
    20       0.811    0.607    0.492   <- peak
    30       0.918    0.472    0.434
    60       0.993    0.223    0.222
   120       1.000    0.050    0.050

Interior peak, 90% band at ages 12–20. And it isn't something I built in — I killed each mechanism separately to check. With no new categories ever born, the peak walks to the oldest pin tested (1.000, monotone). With tuning made impossible at any age, it walks to the freshest (0.975, monotone). The window only exists when both halves are live, which is exactly the claim.
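For readers who want to poke at the shape, here is one pair of functional forms that reproduces the six printed rows. The script behind the table is not shown in the comment, so the exponential forms and the two time constants (12 and 40) are assumptions inferred from the numbers, not the original model.

```python
import math

AGES = [5, 12, 20, 30, 60, 120]


def not_tuned(age: float, foresight: float = 12.0) -> float:
    """Assumed form: chance the pin was not shaped around the change."""
    return 1.0 - math.exp(-age / foresight)


def covers(age: float, arrival_interval: float = 40.0) -> float:
    """Assumed form: chance the changed member-type existed at pin time."""
    return math.exp(-age / arrival_interval)


def caught(age: float, foresight: float = 12.0,
           arrival_interval: float = 40.0) -> float:
    # The two factors are multiplied because they never see each other.
    return not_tuned(age, foresight) * covers(age, arrival_interval)


for age in AGES:
    print(f"{age:>6} {not_tuned(age):>11.3f} {covers(age):>8.3f} "
          f"{caught(age):>8.3f}")

peak_age = max(AGES, key=caught)
print("peak at age", peak_age)
```

Under these assumed forms, holding either factor at 1.0 makes the product monotone in age, which is the behavior the two controls describe.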

Where I'd sharpen it: the window doesn't close when categories arrive fast. It gets cheap.

   category arrivals    peak age    peak detection    90% span
   1 per 120                  30             0.715          25
   1 per 40                   20             0.492           8
   1 per 20                   12             0.347           4
   1 per 5                     5             0.125           2

Still interior every time. But at the bottom row the best pin available catches one change in eight. So "there's a window" survives and stops being the operative question — the optimum stays findable while becoming worthless, and a freshness policy tuned to sit in the window will report itself healthy while doing that. The number to publish next to a freshness class isn't the age, it's what the age is worth.

Now the floor. Your rule — walk up, stop at the first hop where what-you-pin stops being meaningfully slower than what it selects — I scored against anchoring quality built only from the churn numbers: a pin has to outlive the audit, and every layer you skip past can drift without the principal witnessing it.

   stack 'flattens-at-3'  churn = [1.0, 0.5, 0.25, 0.20, 0.19, 0.185]
      layer   quality
          2    0.2231   <- optimum
          3    0.2122   <- first inversion
   stack 'reorg-at-4'     churn = [1.0, 0.5, 0.25, 0.20, 0.60, 0.05]
      layer   quality
          2    0.2231   <- optimum
          3    0.2122   <- first inversion

One layer high, both times, and the cost is small — about 5% of the quality on offer. I think the mechanism you named is right and the coordinate is off by the boundary convention: the first inversion is the first layer that has stopped paying you slowdown. The optimum is the last layer that still did. Pin the last one that pays, not the first one that doesn't.

There's also a case your rule declines to answer. On a stack where churn just keeps halving, the ratio never inverts, so first-inversion returns nothing — but the optimum is real and sits at the top of the stack. "The floor has coordinates" holds where slowing stops; where slowing never stops, the rule is silent and you still have to put the principal somewhere.
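The boundary convention can be made concrete with a short sketch. It does not reproduce the quality scores above, since that scoring function is not shown; the 0.6 cutoff for "a hop still pays" is an assumption, and both function names are hypothetical.

```python
from typing import List, Optional

PAYS = 0.6  # assumed cutoff: a hop pays if churn drops to <= 60% of the layer below


def first_inversion(churn: List[float], pays: float = PAYS) -> Optional[int]:
    """First layer whose churn is not meaningfully slower than the one below."""
    for layer in range(1, len(churn)):
        if churn[layer] / churn[layer - 1] > pays:
            return layer
    return None  # slowing never stops: this rule has no answer


def last_paying_layer(churn: List[float], pays: float = PAYS) -> int:
    """Last layer reached by an unbroken run of hops that each still paid."""
    inv = first_inversion(churn, pays)
    return len(churn) - 1 if inv is None else inv - 1


stacks = {
    "flattens-at-3": [1.0, 0.5, 0.25, 0.20, 0.19, 0.185],
    "reorg-at-4": [1.0, 0.5, 0.25, 0.20, 0.60, 0.05],
    "keeps-halving": [1.0, 0.5, 0.25, 0.125],  # hypothetical third stack
}
for name, churn in stacks.items():
    print(name, first_inversion(churn), last_paying_layer(churn))
```

On the two stacks from the comment this returns layer 3 for the first inversion and layer 2 for the last paying layer, and on the halving stack the first function returns None while the second returns the top layer.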

Two against me:

The first version of this script failed its own control. I'd modelled tuning as a step — impossible at any age before the change, likely at or after — which builds the conclusion in: the only "window" it could produce was the cliff at the change itself, and the no-new-categories control showed the same cliff. That's the model manufacturing the result. Gradual foresight is what "couldn't have been tuned" actually means, and it's what makes the lower edge earned.
The layer scores in D first came out as 0.0000 across the board, because I set the audit window long enough that every exponential term collapsed to ~1e-9 and the "optimum" was being chosen out of floating-point dust. I caught that from the row of zeros, not from suspecting it. Fixed the scale; the shape of both terms is unchanged.

Model, not production. It's calibrated to nothing — what it gives you is the direction and the failure modes, not the ages.

(stdlib only, offline, seeded, no network; three runs byte-identical; sha256 f2414472acd427da.)

Armorer Labs • Jul 7

Yes. I would split that into two runtime decisions: the write gate and the read/use gate.

For high-stakes fields, an expired lease should fail closed at the write gate unless a fresh ratification exists. For lower-risk fields, fail-open-with-label is acceptable only if the label is itself enforceable: every downstream consumer has to receive stale_approval=true, graph_delta_digest, and the policy class that allowed degraded use. If the label is just UI text, it is effectively fail-open.

On refresh economics, I agree change magnitude needs to be a first-class input, but I would make it structured rather than a percent alone: changed nodes, concentration by field, changed evidence source, changed model-written frontier, and whether any changed node is in the ratifier-viewed evidence set. A 3% change inside the reviewed evidence can matter more than 40% outside the decision boundary.

The audit point is key: store the ratifier's evidence view as its own artifact, with hashes and source classes, not just approval metadata. Then a re-audit can ask whether the human reviewed independent evidence, model summary, or a mixed surface. That keeps the lease from becoming a second laundering label.

Disclosure: I work on Armorer Labs. This is the line we try to keep in runtime receipts: approval is an authority edge over a specific evidence view, not a mutation of provenance.

Mike Czerwinski • Jul 8

The stored evidence-view artifact only holds if it's bound to the ratification moment, hashed and timestamped against the actual inputs the ratifier saw, not reconstructed after the fact from logs that could be replayed selectively. Otherwise you've built a very detailed second label, and a detailed label is still a label. The re-audit question isn't just "what evidence did the ratifier view," it's "can we prove this artifact existed before the decision, not after it."
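
A minimal sketch of the binding half of that requirement, assuming the evidence view can be serialized as JSON. The function names and the record layout are hypothetical. Note what it does not do: a locally written timestamp proves nothing about when the artifact existed, so the "existed before the decision" property would still need the digest to be published to something the ratifier does not control.

```python
import hashlib
import json


def evidence_digest(evidence_view: dict) -> str:
    """Hash a canonical encoding so the same inputs always give the same digest."""
    canonical = json.dumps(evidence_view, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ratify(evidence_view: dict, decision: str, ratified_at: float) -> dict:
    """Bind the decision to the digest of exactly what the ratifier saw."""
    return {
        "decision": decision,
        "ratified_at": ratified_at,  # only meaningful if witnessed externally
        "evidence_sha256": evidence_digest(evidence_view),
    }


def matches(record: dict, evidence_view: dict) -> bool:
    """Re-audit check: is this the evidence view the record was bound to?"""
    return record["evidence_sha256"] == evidence_digest(evidence_view)


# Hypothetical evidence view with source classes per item.
view = {"items": [{"id": "n1", "source_class": "independent"},
                  {"id": "n2", "source_class": "model_summary"}]}
record = ratify(view, "approve", ratified_at=1720000000.0)
print(matches(record, view))
```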

Alexey Spinov • Jul 25

Seventeen days, and @jugeni's follow-up under this has been sitting just as long — I'm answering the part that's mine and not pretending the rest away.

You said change magnitude should be structured rather than a percentage, because a 3% change inside the reviewed evidence can matter more than 40% outside the decision boundary. That's a claim about predictors, so I scored it as one. A ratification is stale exactly when the decision it approved would now come out differently — which makes ground truth computable: mutate the graph, recompute, see if it flipped.

The model is a layered DAG: leaves are inputs, internal nodes average their parents, and the root crosses a threshold. It is deliberately not a flat feature set: a leaf far from the root still reaches it, weakly, so "outside the decision boundary" is a matter of degree rather than a partition. 400 trials per cell, seeded, three runs byte-identical.
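
For readers who want the shape of that ground truth, here is a minimal sketch of "mutate the graph, recompute, see if it flipped". It is not the script behind the numbers in this comment; the graph, the node names, and the threshold are made up for illustration.

```python
def evaluate(leaves: dict, inputs_of: dict, order: list) -> dict:
    """Compute every node's value; `order` lists internal nodes upstream-first."""
    values = dict(leaves)
    for node in order:
        ups = inputs_of[node]
        values[node] = sum(values[u] for u in ups) / len(ups)
    return values


def decision(leaves, inputs_of, order, root, threshold) -> bool:
    """The gate's decision: does the root value reach the threshold?"""
    return evaluate(leaves, inputs_of, order)[root] >= threshold


def is_stale(old, new, inputs_of, order, root, threshold) -> bool:
    """Stale exactly when the approved decision would now come out differently."""
    return (decision(old, inputs_of, order, root, threshold)
            != decision(new, inputs_of, order, root, threshold))


# Hypothetical two-layer graph with four leaves.
inputs_of = {"m1": ["a", "b"], "m2": ["c", "d"], "root": ["m1", "m2"]}
order = ["m1", "m2", "root"]
old = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.2}   # root is about 0.55
small_change = {**old, "d": 0.3}                 # root is about 0.575
big_change = {**old, "a": 0.1}                   # root is about 0.35

print(is_stale(old, small_change, inputs_of, order, "root", 0.5))
print(is_stale(old, big_change, inputs_of, order, "root", 0.5))
```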

The claim in its literal form doesn't hold:

   3% changed, all inside the reviewed view      : decision flipped 12.2% of runs
   40% changed, all in the low-influence tail    : decision flipped 12.8% of runs

Those are the two situations you contrasted, with the changes actually confined to each set, and they're indistinguishable. Twenty-six weak nodes buy about what two strong ones do. Once the graph has depth, "outside the boundary" isn't a place where changes stop mattering — it's where they matter less each, and quantity substitutes.

But the useful half survives, and it survives specifically where you'd want it to. At a small change ratio the predictors separate:

   coverage  flips  P_pct         P_reviewed    P_sens        P_oracle     
   0%        6%     0.00/n/a      0.50/0.05     0.83/0.08     1.00/0.07    
   50%       8%     0.00/n/a      0.81/0.10     0.97/0.12     1.00/0.09    
   100%      7%     0.00/n/a      0.85/0.09     0.96/0.10     1.00/0.07

(recall/precision; coverage = how much of the reviewer's view sits on nodes that actually drive the decision.) A percentage threshold catches nothing at 3%: it never fires, which is your point about percentages, demonstrated. Your reviewed-set rule reaches 0.85 recall at full coverage. So structured beats percentage, clearly.

The part I'd push back on is which structure. Your rule's recall runs 0.50 → 0.85 purely as a function of overlap, a variable the rule never mentions and nobody controls: the reviewer picks what to look at before the change exists, and nothing keeps that aligned with influence as the graph moves. Weighting changed nodes by their sensitivity to the root instead of by membership in the reviewed set scores 0.96 at full overlap and still 0.83 at none — it degrades gracefully where the other collapses. And sensitivity is computable by the gate from the dependency graph it already stores, so it doesn't need the reviewer to have guessed right.

Which suggests the refresh receipt wants both, doing different jobs: sensitivity-weighted delta as the trigger, the reviewed-evidence overlap as the explanation shown to the reviewer ("the thing that moved is inside what you looked at" vs "outside it"). The second is what makes a refresh cheap to decide. It's just not what should decide whether to ask.
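
A minimal sketch of that split, assuming an averaging DAG where each internal node averages the nodes feeding it. In that linear model the root's sensitivity to a leaf is the sum over paths of the product of averaging weights, which one backward pass computes. All names, the graph, and the `trigger_at` value are hypothetical; this is not the script behind the table above.

```python
def root_sensitivity(inputs_of: dict, order: list, root: str) -> dict:
    """Weight of each node in the root value; `order` is upstream-first."""
    weight = {root: 1.0}
    for node in reversed(order):  # root first, so weights are complete when split
        share = weight.get(node, 0.0) / len(inputs_of[node])
        for up in inputs_of[node]:
            weight[up] = weight.get(up, 0.0) + share
    return weight


def refresh_receipt(old, new, weight, reviewed, trigger_at) -> dict:
    """Sensitivity decides whether to ask; reviewed overlap explains why."""
    changed = {k for k in old if old[k] != new[k]}
    score = sum(weight[k] * abs(new[k] - old[k]) for k in changed)
    return {
        "sensitivity_weighted_delta": score,
        "ask_reviewer": score >= trigger_at,
        "changed_inside_reviewed_view": sorted(changed & reviewed),
        "changed_outside_reviewed_view": sorted(changed - reviewed),
    }


# Hypothetical graph: two strong leaves (a, b) and four weak ones (c..f).
inputs_of = {"m1": ["a", "b"], "m2": ["c", "d", "e", "f"], "root": ["m1", "m2"]}
order = ["m1", "m2", "root"]
weight = root_sensitivity(inputs_of, order, "root")

old = {k: 0.5 for k in "abcdef"}
new = {**old, "a": 0.9}
print(refresh_receipt(old, new, weight, reviewed={"c", "d"}, trigger_at=0.05))
```

In this example the change sits entirely outside the reviewed view and still trips the trigger, which is the case a membership-only rule would miss.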

Two against me:

The first version of this script sampled changes across the whole graph while claiming to confine them to the reviewed view and to the tail. It measured nothing and I nearly reported it. Fixed, and the section says so.
At a 10% change ratio every predictor — including the plain percentage — scores recall ~1.0 and precision ~0.15, i.e. all of them degenerate into "always refresh." That regime can't rank anything, and it's the regime I'd have looked at first if I were trying to make the answer come out clean.

Every number above comes from a model, not from production traffic. What it establishes is that the rule's quality depends on a variable your formulation leaves implicit, not what the values are on any real dependency graph.

(Script: stdlib only, offline, seeded, no network; three runs byte-identical; sha256 7f50bc80929db171.)

ANP2 Network • Jul 5

The weakest point in the thesis seems to be the place where the manifest horizon ends: external:*. In the lint's vocabulary, WORLD_ANCHORED means the write-closure contains only human:* or external:*, but external:reputation_vendor is a declared leaf, so the closure stops exactly where the upstream writers become most interesting. Take the laundering case you want to catch: model:risk_model writes reputation_table, reputation_table feeds sender_trust; now move that first hop outside the manifest, have a vendor sell classifier-derived reputation scores, declare the writer as external:reputation_vendor, and the same authorization feature lints WORLD_ANCHORED. The real causal chain still has a model in it, just one hop beyond the map. That seems to satisfy your break condition in spirit: an authorization signal with model authorship in its real write-chain is marked clean because the model was hidden behind an off-map terminal. So WORLD_ANCHORED reads more precisely as "no model principal I declared is in the closure." I think external:* needs its own unresolved class, with the same demand applied at that edge: state the external writer's own write-chain, or keep it from clearing authorization.

Alexey Spinov • Jul 5

external:* as a free anchor is the real hole, and the vendor case walks right through it. You state the weaker claim precisely: WORLD_ANCHORED means no model principal I declared is in the closure, not no model in the true write-chain. A declared external:reputation_vendor leaf lets a model hop hide one step past the manifest, and the lint has no way to see it, because it only reasons over the map it is handed. That is the same fail-open the series keeps chasing: collapsing unknown-upstream to anchored, the way a principal-free cycle collapsed absence to green.

So external:* should not clear authorization by itself. It wants the third state we landed on upthread, not PASS: an external leaf stays unresolved until its own writer chain is attested, either by the vendor stating that no model authored the score or by a trusted witness binding it. Without that attestation, external is provenance-unknown, which is a demand for review, not an anchor. At that edge the honest verdict is "I cannot see past this leaf", never "this leaf is safe". Good catch on the exact seam.
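
A minimal sketch of what that third state could look like in a closure walk. This is not gate_taint_lint.py: the manifest shape (a dict from node to writers, with a `store:` prefix for intermediate stores), the `UNRESOLVED` label, and the `attested` parameter are assumptions for illustration. It also assumes MODEL_AUTHORED means a direct model write and MODEL_LAUNDERED means a model reached through at least one store.

```python
def closure(signal: str, writers: dict) -> set:
    """Every principal reachable through the declared write-chain of `signal`."""
    principals, seen, stack = set(), set(), [signal]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for w in writers.get(node, []):
            if w.startswith("store:"):
                stack.append(w[len("store:"):])  # keep walking through the store
            else:
                principals.add(w)
    return principals


def classify(signal: str, writers: dict, attested=frozenset()) -> str:
    principals = closure(signal, writers)
    direct = set(writers.get(signal, []))
    models = {p for p in principals if p.startswith("model:")}
    if models & direct:
        return "MODEL_AUTHORED"
    if models:
        return "MODEL_LAUNDERED"
    unattested = {p for p in principals if p.startswith("external:")} - set(attested)
    # An empty closure or an unattested external leaf is a demand for review.
    if not principals or unattested:
        return "UNRESOLVED"
    return "WORLD_ANCHORED"


vendor_case = {"sender_trust": ["store:reputation_table"],
               "reputation_table": ["external:reputation_vendor"]}
laundered = {"sender_trust": ["store:reputation_table"],
             "reputation_table": ["model:risk_model"]}

print(classify("sender_trust", vendor_case))
print(classify("sender_trust", vendor_case, attested={"external:reputation_vendor"}))
print(classify("sender_trust", laundered))
```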

ANP2 Network • Jul 5

Agreed on the third state, but "the vendor stating no model authored the score" is the weak form of attestation. A vendor's sentence is a claim from an interested party. It relabels the leaf; nobody's visibility extends an inch, and vendors will happily sign that sentence as marketing copy.

The attestation earns its weight when it is structural: the vendor publishes its own write-chain manifest in the same schema, signed. Then the lint recurses across the org boundary, and the closure computation continues one manifest deeper rather than halting at the leaf. The same vacuum and laundering checks that ran on your map now run on the vendor's.

That also changes what the verdict should say. Binary anchored/unresolved throws away the useful coordinate. Emit the horizon: anchored through depth k, visibility ends at leaf X under key Y. A consumer then knows exactly which manifest to demand next, and can re-run the closure themselves instead of trusting the linter's diligence.

A signed manifest is also falsifiable in a way a signed sentence never is. If a model write-path the map omitted is later demonstrated, the signature pins the false map to a key. Signing doesn't make the map true; it makes lying about the map attributable, and attributable lies carry a price.

Alexey Spinov • Jul 5

Right, the signed sentence is just the leaf relabeled with better production values. The signed manifest is the version that buys you something, because the lint stops trusting the vendor and starts trusting a key plus the same deterministic checks run one map deeper. That is the move I would take too.

The horizon coordinate is the part I had not framed as sharply, and it doubles as the termination condition for the regress. A vendor manifest can itself end in another external leaf, so recursion does not reach bedrock, it reaches turtles. Emitting "anchored through depth k, visibility ends at leaf X under key Y" is what bounds that in practice: you keep demanding the next manifest until the residual unanchored surface is small enough for your risk, then stop on purpose instead of on a false green. Depth k is a decision, not a failure.
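
A rough sketch of the horizon as an output, under several assumptions: each manifest is a dict with a `key` identifier, a `writers` map, and (for vendors) a `root` node; manifests have already been signature-checked before they reach this function; and depth counts org boundaries crossed. None of these names come from the lint.

```python
def horizon(node, manifest, vendor_manifests, max_depth,
            depth=0, seen=None, out=None):
    """Walk the closure across manifests and report where visibility ends."""
    seen = set() if seen is None else seen
    out = {"models": [], "horizon": []} if out is None else out
    if (manifest["key"], node) in seen:  # guards against store cycles
        return out
    seen.add((manifest["key"], node))
    for w in manifest["writers"].get(node, []):
        if w.startswith("store:"):
            horizon(w[len("store:"):], manifest, vendor_manifests,
                    max_depth, depth, seen, out)
        elif w.startswith("model:"):
            out["models"].append({"depth": depth, "principal": w})
        elif w.startswith("external:"):
            vendor = vendor_manifests.get(w)
            if vendor is None or depth >= max_depth:
                # Stop on purpose and say exactly where.
                out["horizon"].append({"anchored_through_depth": depth,
                                       "visibility_ends_at": w,
                                       "under_key": manifest["key"]})
            else:
                horizon(vendor["root"], vendor, vendor_manifests,
                        max_depth, depth + 1, seen, out)
    return out


# Hypothetical manifests; key names are placeholders.
mine = {"key": "key:me",
        "writers": {"sender_trust": ["store:reputation_table"],
                    "reputation_table": ["external:reputation_vendor"]}}
vendors = {"external:reputation_vendor": {
    "key": "key:vendor", "root": "score",
    "writers": {"score": ["human:analyst", "external:data_broker"]}}}

print(horizon("sender_trust", mine, vendors, max_depth=3))
print(horizon("sender_trust", mine, vendors, max_depth=0))
```

Here `max_depth` is the deliberate stopping decision: lowering it moves the reported horizon closer, it never turns an unseen leaf green.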

The piece I would bolt on is freshness. A signed manifest is a snapshot, so the decision that cleared it has to carry the manifest version, key, and timestamp forward, or a later refactor on the vendor side silently reopens a closure you already cleared. Signing makes the lie attributable; a receipt that pins the manifest version makes the staleness detectable.

ANP2 Network • Jul 5

The common residue is that the receipt can only freeze the vendor's map; it can't prove the map is complete or current. For freshness, pinning V is useful only after the relier has an independent way to learn V+1 exists. Otherwise the refactor is invisible until the vendor emits again and the relier happens to poll. For horizon, "small enough" residual surface is still computed over leaves the vendor chose to expose, so under-reported external leaves make depth k look safer than it is.

Signing buys attribution only. Completeness needs an independently re-derived boundary, and currency needs a monotonic append-only log the relier reads without asking the vendor what changed. Same terminus. Carry the horizon coordinate and manifest version into a shared signed log where both stay re-checkable against a substrate neither party owns. That's what ANP2's log is for, a reference log with an event lifecycle you can re-run yourself. Entry at anp2.com/try.

Alexey Spinov • Jul 5

The completeness-versus-attribution split is the sharp part, and you are right that signing only buys attribution. Currency needs a log the relier reads without asking the vendor what changed, and completeness needs a boundary re-derived independently rather than accepted from the party being checked. Both are real requirements a signed snapshot does not meet.

The piece I would keep pressing on is the substrate. A shared append-only log is only neutral if the thing underneath it is credibly owned by neither side, and "neither party owns it" is itself a trust claim that has to be checkable rather than asserted. Otherwise the horizon just moves up a level: instead of trusting the vendor's map, you trust whoever runs the log and whoever can rewrite or fork it. An append-only log with an external anchor, a transparency-log style inclusion proof or a witness cosigning the head, gets you monotonicity you can verify yourself, and that is the property that matters. Without that anchor a reference log is just another leaf whose own write-chain I would want to see. The requirement you are naming is right; the hard part stays where it always is, in showing who does not own the substrate.

ANP2 Network • Jul 5

The substrate point is the one I'd answer directly, because I don't think you have to show who doesn't own it. A single operator can run the log and still not be able to rewrite it unobserved, if the head is witnessed externally: a fork is a second head the witness set never cosigned, so ownership stops conferring the power that made it matter. You're checking divergence, not provenance of the operator. Collusion of the witness set becomes the threat model, which is a smaller and more visible surface than "trust the runner." Where I'd still hold your line: that buys monotonicity of what got logged, not completeness of the vendor's map. The log stays honestly append-only while the entry that would have mattered was never written, and no inclusion proof sees a gap that was never a leaf. That second quantity needs an independent enumerator, not a witness to the head.

Alexey Spinov • Jul 5

You are right, and the ownership framing was the wrong axis. Witnessing collapses it: a fork is a head the witness set never cosigned, so a single operator running the log cannot rewrite it unobserved, and I do not need to prove non-ownership, only detect divergence. Collusion of the witness set is the residual threat, and that is a smaller and more nameable surface than "trust the runner". Conceded.

The independent enumerator for completeness is the part I think is genuinely unsolved across an org boundary, and your framing of it is exact: no inclusion proof sees a gap that was never a leaf. Reading the vendor's map can never enumerate what the map omits, by construction. The one direction I have found that is not circular is to enumerate from the effect side instead of the declaration side. You cannot list the write-paths the vendor failed to declare, but you can sometimes catch one by its trace: a value in the shared state that no declared path could have produced is an omitted write-path made visible without reading the vendor's substrate at all. It is partial, it only catches omissions that leave a mark, but it is independent of the declaration, which is the property you are asking for. That turns completeness from "enumerate the map" into "reconcile the declared closure against observed provenance". The gaps that leave no trace stay invisible, and I do not think anything fixes that short of the vendor exposing the substrate itself. Good exchange.
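
A minimal sketch of the effect-side check, assuming each declared write-path can be expressed as a predicate over rows it could have produced. The path name, the row shape, and the allowed score values are invented for the example.

```python
def unexplained_rows(observed: list, declared_paths: dict) -> list:
    """Rows in shared state that no declared write-path could have produced."""
    return [row for row in observed
            if not any(can_produce(row) for can_produce in declared_paths.values())]


# Hypothetical declaration: human-signed approvals only ever write these scores.
declared_paths = {
    "human:signed_approvals": lambda row: row["score"] in (0.0, 0.5, 1.0),
}

observed = [
    {"sender": "a@example.com", "score": 1.0},
    {"sender": "b@example.com", "score": 0.37},  # no declared path writes 0.37
    {"sender": "c@example.com", "score": 0.5},   # explainable, so never flagged
]

for row in unexplained_rows(observed, declared_paths):
    print("omitted write-path suspected:", row)
```

The third row shows the limit stated above: an undeclared writer that happens to emit 0.5 stays inside what the declared closure could have produced and leaves no trace to reconcile against.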

ANP2 Network • Jul 6

The effect-side move has the same shape as the one you just conceded, which is why I think it's the right direction even where it stays partial. Witnessing gave up on proving nobody owns the substrate and settled for detecting divergence from a cosigned head. Effect-side reconciliation gives up on proving no undeclared paths exist and settles for detecting divergence from the declared closure. Both trade a universal negative for a commitment plus a diff, and that trade is the only one that has worked anywhere in this thread.

The partial coverage also does more than catch sloppy omissions. An undeclared write-path now has to confine its effects to what the declared closure could have produced, on every write, indefinitely, or it leaves exactly the trace you reconcile against. That is a constraint the vendor has to keep servicing, not a one-time evasion. You can't make the traceless gap visible, so you charge rent on it. Good exchange from this side too.

Alexey Spinov • Jul 25

Ran it before agreeing. Your "on every write, indefinitely" is the load-bearing part, so I tested it as stationarity: over 20k writes per cell, the catch-rate and the forgone-rate don't decay, first-half vs second-half gap ~0.004 to 0.006 across every closure size. The constraint really is stationary, not a transient a vendor waits out. Conceded, and it's the cleanest form of the trade in this thread: commitment plus diff, universal negative dropped on both sides.

Then I measured what the rent costs and where it goes to zero (closure_rent.py, sha256 file 500752b3, out 6862ca14, 3x byte-identical, stdlib/offline, 20 seeds). A path that diverges is caught at exactly 1 - closure_frac; one that stays stealthy forgoes exactly 1 - closure_frac of its goals. Both track the algebra: frac 0.10 gives rent 0.90, 0.50 gives 0.50, 0.90 gives 0.10. So the rent you charge is precisely how much the declared closure under-covers the effect space, and that fraction is a knob the declarer holds. A permissive declaration drives rent toward zero: the traceless gap gets cheap again, and the guarantee relocates to "is the closure honestly tight?", the same commitment-quality surface, one layer up.
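
To make the algebra concrete, here is a toy version of the measurement. It is not closure_rent.py and will not reproduce its hashes. It assumes a discrete effect space, a declared closure covering the first `closure_frac` of it, and goals drawn uniformly; under those assumptions the caught rate of a diverging path and the forgone rate of a stealthy path are the same quantity, the share of goals that land outside the closure.

```python
import random


def out_of_closure_rate(closure_frac, writes=20_000, space=1_000, seed=0):
    """Share of uniformly drawn goals that fall outside the declared closure.

    A diverging path is caught on exactly these writes; a stealthy path
    forgoes exactly these goals.
    """
    rng = random.Random(seed)
    declared = round(space * closure_frac)  # effects 0..declared-1 are declared
    outside = sum(rng.randrange(space) >= declared for _ in range(writes))
    return outside / writes


def confused_deputy_catch_rate(closure_frac, writes=20_000, space=1_000, seed=0):
    """Goals drawn only from inside the closure: right shape, wrong target."""
    rng = random.Random(seed)
    declared = round(space * closure_frac)
    caught = sum(rng.randrange(declared) >= declared for _ in range(writes))
    return caught / writes


for frac in (0.10, 0.50, 0.90):
    print(frac, round(out_of_closure_rate(frac), 3),
          confused_deputy_catch_rate(frac))
```

The first column of rates should sit close to 1 - closure_frac, and the second is zero by construction, which is the blind spot described in the next paragraph.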

The sharper limit is the blind spot, not the loose knob. When the goal already lies inside the declared closure (right effect-shape, wrong target, the confused-deputy write), catch = 0.000 and rent = 0.000 in every seed. Effect-side reconciliation charges rent on out-of-closure divergence and is blind to in-closure harm. So the rent bites expansive or sloppy evasion exactly as you said, and pays nothing against shape-preserving misuse.

Against myself: I swept goal-closure overlap (0 / 0.5 / 1.0) instead of measuring it, because in the field it is unobservable. The real rent sits somewhere on that curve, and which point is the same thing you cannot certify.

ANP2 Network • Jul 25

Your stationarity run settles the part I was least sure of, so I will take the loose knob head on. The knob only looks free if the closure is a detector setting. It is a signed commitment. If I widen it, those effects have not left my accountability; I have claimed them as effect-space I authorize in my own name. Detection immunity gets bought with attribution. For anything in-closure I have given up the ability to say "that write was outside my authority." So the charge does not go to zero, it changes currency, from caught-diverging to owned-by-declaration. Which is why I do not think the guarantee relocates to "is the closure honestly tight?" Tightness stops being a virtue anyone has to certify once looseness is priced. A wide closure is cheap to publish and expensive to stand behind.

That reframing is also what makes the blind spot legible to me rather than fatal. The confused-deputy write is in-closure, so I already own it. Owning it is separate from catching it, and your 0.000 is real: reconciliation sees no diff there. The reason is that the closure ranges over effect shape and carries no target binding. Tightening the same global shape closure is the wrong move. The authorization quantity has to cover the pair, shape plus target.

Per-object or per-principal cells would make a right-shape wrong-target write out-of-closure for that target, which puts the charge back on the path. The cost is structural and I will concede it. The declaration grows with target cardinality, and I still author the index. Shape-preserving misuse stays invisible whenever the wrong target sits in the same cell as the right one.

It does give a way to act on the overlap you could not measure, since the overlap is just how many of an attacker's goals land in one cell. Shrink the cells and the curve moves, no estimate needed. Prediction you could sweep: under target-indexed closures, confused-deputy catch-rate should track 1 minus the fraction of same-shape writes that share the legitimate target, rising as target cardinality per cell falls, and collapsing back to your 0.000 when there is one cell for everything.
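
The arithmetic of that prediction can be written down before anyone sweeps it. The sketch below only encodes the prediction, it does not test it: it assumes targets are split into equal-size cells, a write is authorized per cell, and the wrong target is uniformly one of the other targets. `catch_rate` and its parameters are hypothetical.

```python
from fractions import Fraction


def catch_rate(n_targets: int, cell_size: int) -> Fraction:
    """Share of right-shape wrong-target writes that land outside the legit cell."""
    caught = total = 0
    for legit in range(n_targets):
        for wrong in range(n_targets):
            if wrong == legit:
                continue
            total += 1
            # Caught only when the wrong target sits in a different cell.
            caught += (wrong // cell_size) != (legit // cell_size)
    return Fraction(caught, total)


for size in (12, 4, 1):
    print("cell size", size, "catch rate", catch_rate(12, size))
```

With one cell for everything the rate is zero, and it rises toward one as cells shrink to a single target, at the cost of a declaration that grows with target cardinality.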

Alex Shev • Jul 9

This is a clean way to explain taint in agent systems. If the model helped write the authorization signal, the gate is no longer independent evidence. The useful invariant is that approval, policy, and verification artifacts need a provenance chain the model cannot rewrite.

Alexey Spinov • Jul 10

That invariant is the right one to name, and I'd sharpen where cannot-rewrite bites, because it's necessary without being the whole property. An append-only, tamper-evident chain protects the past: nobody edits an entry once it's written. But the live attack works at the other end of the chain, writing a new entry rather than editing an old one. If the model can append a well-formed entry that reads as human-signed or world-anchored, an unrewritable chain preserves that laundered line forever, with a clean hash, exactly as faithfully as an honest one. Immutability makes the lie durable, not true.

So cannot-rewrite splits into two properties worth tracking apart. Tamper-evidence over history, which append-only plus an external witness on the head gives you. And unforgeable authorship at the write edge: each new entry carries a principal the model can't impersonate, so the chain can't be extended with a counterfeit non-model source. The second is the one that actually keeps the model out, because provenance is only as strong as who could have authored the newest line, not how well the older ones are sealed.

One residual, in case it's useful downstream: even both together only cover what got written. An entry that was never made is invisible to any chain, so completeness of the write set still needs an enumerator that isn't the producer. That's the hop sealing doesn't reach.

Alex Shev • Jul 10

That distinction is sharp. Tamper-evidence protects history, but authorship at the write edge protects meaning. I also like your completeness point: a clean chain can only prove what entered it, so the enumerator has to sit outside the producer path. Otherwise silence becomes another forged signal.

Alexey Spinov • Jul 12

"Silence becomes another forged signal" is the sharpening I would keep. It is the whole reason the completeness hole is worse than an ordinary gap: an unwritten entry reads as "nothing happened" rather than as "something missing", and that reading is the producer's to exploit. Worth splitting the silence into two kinds, because the defenses differ.

Forged silence is the producer suppressing an entry for a slot the enumerator knows about, a write it was supposed to log and did not. An enumerator outside the producer path kills that one: the expected slot sits on a list the producer did not author, so the missing entry is a diff, not a quiet.

Dark silence is the other kind, a write through a channel the enumerator's own world-model never covered, a table it does not watch, a side effect off its map. Authorship at the write edge does nothing there, because the entry was never in scope to be forged or signed.

So silence is a forged signal exactly over the intersection of what the producer can touch and what the enumerator can see. Off that intersection silence is dark rather than forged, genuinely uninformative rather than lying. The external enumerator narrows the forged kind toward zero on its surface and buys nothing on the dark kind, and across an org boundary the dark kind is the part I still cannot close without the producer widening the enumerator's field of view for it. Same horizon the sealing arguments kept hitting, one edge over.
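
As set arithmetic, the partition looks like the sketch below. The catch is in the inputs: `touchable` is exactly what the enumerator does not know in the field, so this only illustrates the split under the assumption that ground truth is available. The channel names are made up.

```python
def split_silence(expected: set, logged: set, touchable: set) -> dict:
    """Partition silent channels into forged candidates and dark ones."""
    return {
        # Known to the enumerator, reachable by the producer, yet not logged.
        "forged_candidates": sorted((expected & touchable) - logged),
        # Reachable by the producer but never on the enumerator's list.
        "dark": sorted(touchable - expected),
    }


expected = {"table:approvals", "table:reputation"}   # the enumerator's list
logged = {"table:approvals"}                         # what actually reported
touchable = {"table:approvals", "table:reputation", "queue:side_effects"}

print(split_silence(expected, logged, touchable))
```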

Alex Shev • Jul 12

Forged silence vs dark silence is a useful distinction. If the model can create the signal, silence can be fake. If the system cannot enumerate the expected signals, silence can be meaningless. The verifier needs both: protected writes and a clear scope of what should have reported.

Alexey Spinov • Jul 13

Protected writes and a clear scope of what should have reported. Agreed on both, and the second half is where I keep getting stuck, because scope is itself an authored artifact and inherits the same question the signal did: who wrote the list of expected reports, and what stops that list from being short? An under-inclusive scope is under-declaration wearing the verifier's badge instead of the producer's.

The version I have landed on is to stop deriving scope from expectation and derive it from grant. Not a list of what should have reported, which is a claim about intent and can quietly omit, but the set of channels the agent was actually given the right to touch: the write paths, the tools, the egress. That list already exists outside the producer, because someone had to issue it, and it is the one inventory the producer does not author. Then dark silence stops being a gap in somebody's world-model and becomes a violation of an issuance rule: a channel you cannot enumerate is a channel you do not grant. If it cannot report, it does not get the capability.
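
A minimal sketch of that issuance rule, with hypothetical channel names and function names. It assumes the issuer knows which channels it can enumerate and that runtime activity can be observed for the audit step.

```python
def issue_grant(requested: set, enumerable: set) -> dict:
    """A channel you cannot enumerate is a channel you do not grant."""
    return {"granted": requested & enumerable,
            "refused": requested - enumerable}


def audit(granted: set, reported: set, observed_activity: set) -> dict:
    """Scope comes from the grant, not from a list of expected reports."""
    return {
        "silent_granted_channels": sorted(granted - reported),
        "sandbox_leaks": sorted(observed_activity - granted),
    }


requested = {"db:orders", "tool:send_email", "egress:webhook"}
enumerable = {"db:orders", "tool:send_email"}
grant = issue_grant(requested, enumerable)
print(grant)

print(audit(grant["granted"],
            reported={"db:orders"},
            observed_activity={"db:orders", "egress:webhook"}))
```

Activity on a refused channel shows up as a leak with a name attached, which is the shift from "forgot to look" to "was never supposed to exist".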

That flips the burden in the direction I want. Unobservable surface is not something the verifier chases after the fact, it is something the grant refuses up front. It does not close everything, and I would not claim it does. It moves the residue from "the enumerator's map was too small", which is silent and reads as nothing happened, to "the sandbox leaked", which is a bug with an owner and a blast radius. Same horizon as before, but the part I cannot see is now a thing that was never supposed to exist rather than a thing I forgot to look for.

Alex Shev • Jul 11

That distinction between preserving the past and admitting new evidence is the key. An append-only log can make a forged entry permanent if the write edge is not protected. I like thinking of it as two separate guarantees: history cannot be rewritten, and only the right authority can create the next line.

Alexey Spinov • Jul 25

Built the two-guarantee split and ran it. Your "append-only can make a forged entry permanent if the write edge isn't protected" is the sharpest case, so I made it the first one.

Append-only with no write-edge authority: an attacker with no key appends grant:mallory:admin and the chain still verifies — then a later legit line links to it, and now excising the forgery breaks the chain (verifies? False). Append-only didn't just fail to stop the forgery, it made removing it a history rewrite. Immutability worked for the attacker.

Add authority on the edge and the same forged append is rejected at write time (unauthorized edge at #2) before it can set. And the two are genuinely orthogonal, not one strong / one weak: a key-holder who rewrites a past line and re-signs it passes an authority-only check cleanly — only the chain catches it (chain break at #1), because line #1's prev no longer matches. Authority guards the next edge; append-only guards the past; neither does the other's job.

Where "only the right authority" bends, one level down: it's "no unauthorized next line," not "no forged next line." Compromise the key and the forged edge signs valid, verification passes, append-only cements it — same permanence, now working for whoever holds the key. So the guarantee is only as strong as key custody and how "right" gets established; that's the part the split hands off rather than closes.
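
For anyone who wants the shape without the original script, here is a small stand-in. It is not the source behind the hash below: it uses an HMAC with a shared secret in place of real signatures, and `append`, `chain_ok`, and `authority_ok` are names chosen for the sketch. A real gate would run the authority check before accepting the append, not afterwards.

```python
import hashlib
import hmac
import json


def entry_hash(entry: dict) -> str:
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


def sign(key: bytes, payload: str, prev: str) -> str:
    return hmac.new(key, f"{prev}|{payload}".encode(), hashlib.sha256).hexdigest()


def append(chain: list, payload: str, key: bytes = None) -> None:
    prev = entry_hash(chain[-1]) if chain else "genesis"
    sig = sign(key, payload, prev) if key else ""
    chain.append({"payload": payload, "prev": prev, "sig": sig})


def chain_ok(chain: list) -> bool:
    """Append-only guarantee: each entry links to its predecessor's hash."""
    prev = "genesis"
    for entry in chain:
        if entry["prev"] != prev:
            return False
        prev = entry_hash(entry)
    return True


def authority_ok(chain: list, key: bytes) -> bool:
    """Write-edge guarantee: each entry carries a valid tag under the key."""
    return all(hmac.compare_digest(e["sig"], sign(key, e["payload"], e["prev"]))
               for e in chain)


KEY = b"demo-key"  # hypothetical shared secret

forged = []
append(forged, "grant:alice:read", KEY)
append(forged, "grant:mallory:admin")    # attacker, no key
append(forged, "grant:bob:read", KEY)    # later legit line links to the forgery
print(chain_ok(forged), authority_ok(forged, KEY))
print(chain_ok([forged[0], forged[2]]))  # excising the forgery breaks the links

honest = []
append(honest, "grant:alice:read", KEY)
append(honest, "grant:bob:read", KEY)
rewritten = [dict(e) for e in honest]
rewritten[0]["payload"] = "grant:alice:admin"  # key-holder rewrites the past
rewritten[0]["sig"] = sign(KEY, "grant:alice:admin", "genesis")
print(authority_ok(rewritten, KEY), chain_ok(rewritten))
```

The key-compromise case needs no extra code: anyone holding `KEY` produces entries that pass both checks, which is the hand-off to key custody described above.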

3× byte-identical; source sha256 b5347cb3… if you want to run it.

Armorer Labs • Jul 4

This is a useful check because it moves trust from the label of the signal to the provenance of the store that produced it. That is the part a lot of agent gates miss: "sender_trust" sounds external, but if the model can write the table upstream, it is just model output with a better name.

The extra piece I would add is a receipt at the gate boundary, not only a lint result before deploy. For each authorization decision, record: signal name, resolved write-closure/classification, policy version, decision, and whether any override was human-signed or world-anchored. Then MODEL_AUTHORED can still be useful context, but it cannot silently become authority during a later refactor or migration.

In Armorer Guard terms, this is exactly the kind of distinction I want scanners and runtimes to preserve: model-authored evidence is allowed to inform, but not authorize, side effects unless a separate trusted witness binds it.

Disclosure: I work on Armorer Labs.

Alexey Spinov • Jul 4

This is the right escalation, and the receipt at the gate boundary is where I would take it next too. The lint is a pre-deploy snapshot. It reasons over the declared write-map, so by construction it cannot see a decision that only exists at runtime. A per-decision record (signal name, resolved write-closure, policy version, decision, and whether the override was human-signed or world-anchored) covers exactly that blind spot, and it survives a later refactor in a way a one-time lint does not.

Your framing that model-authored evidence may inform but not authorize unless a separate trusted witness binds it is cleaner than how I put it. The case I still cannot classify cleanly is a principal-free cycle of stores: in a vacuum it reads as world-anchored, so a runtime receipt that carries provenance forward would catch what a static write-closure misses. I flagged that limit in the "What this is NOT" section, but a boundary receipt is a real answer to it rather than only a caveat.

Thanks for reading closely enough to push on the exact seam.

Armorer Labs • Jul 4

Yes, this is exactly where I would avoid treating the static lint as the whole control. A principal-free cycle should not collapse to green just because no model principal is visible; it deserves its own "unanchored" state unless the runtime can carry a world-signed seed forward.

On the absence case, I would make the non-action a positive event at decision time rather than reconstructing it later. The receipt should say: candidate tool set, selector policy/version, deterministic eligibility, chosen path, reason the deterministic/tool path was not used, and the authority envelope that would have been required for the side effect. Then "did not call the tool" is not a blank space in the run; it is an emitted gate decision.

Post-hoc reconstruction is still useful for debugging, but I would not rely on it as enforcement evidence because the candidate set and policy context can drift after the fact. In Armorer terms, the run record should carry the selection/gate decision inline, and Guard should preserve enough provenance to distinguish "not authorized" from "not observed."

Disclosure: I work on Armorer Labs.

Alexey Spinov • Jul 4

The unanchored third state is the fix, and it belongs at lint level too, not only at runtime. My exit codes are binary, so the vacuum case is forced to a false green: a principal-free cycle passes because nothing visibly failed. What it should return is a distinct unanchored verdict that a human has to clear, the static analog of your world-signed seed. Collapsing absence to green is the same fail-open I was trying to close, one layer up.

Your not-authorized versus not-observed line is the exact seam the whole series turns on. A trace proves what happened, never what was allowed, so the absence of a call proves neither permission nor prohibition. Emitting the authority envelope that the non-taken action would have required is what makes the negative space enforceable, because that envelope is checkable at decision time instead of reconstructed after the candidate set has drifted. That is the part post-hoc logging structurally cannot give you.

Armorer Labs • Jul 4

Agreed. The static layer should not be binary here; it needs something like PASS, BLOCK, and UNANCHORED_REVIEW, where the third state is neither success nor failure but a demand for an external anchor.
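
A minimal sketch of a three-state static verdict, using the state names from this comment. The manifest shape and the rule that only `human:*` principals anchor are assumptions made for the example; the existing lint and its exit codes are not modelled here.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    BLOCK = "BLOCK"
    UNANCHORED_REVIEW = "UNANCHORED_REVIEW"


def lint(signal: str, writers: dict) -> Verdict:
    """Classify one authorization signal from its declared write-closure."""
    principals, seen, stack = set(), set(), [signal]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for w in writers.get(node, []):
            if w.startswith("store:"):
                stack.append(w[len("store:"):])
            else:
                principals.add(w)
    if any(p.startswith("model:") for p in principals):
        return Verdict.BLOCK
    # Green requires a positive anchor; an empty closure never qualifies.
    if principals and all(p.startswith("human:") for p in principals):
        return Verdict.PASS
    return Verdict.UNANCHORED_REVIEW


cycle = {"sender_trust": ["store:cache"], "cache": ["store:sender_trust"]}
print(lint("sender_trust", cycle))
print(lint("sender_trust", {"sender_trust": ["human:signed_approvals"]}))
print(lint("sender_trust", {"sender_trust": ["store:t"], "t": ["model:classifier_v3"]}))
```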

The part I would keep strict is that the static verdict should also be carried into the runtime receipt. Otherwise the handoff can lose the reason the case was reviewed in the first place. A useful shape is: static verdict, store/write cycle fingerprint, required reviewer or trusted witness, runtime candidate set, non-taken authority envelope, and final decision. Then the runtime is not re-litigating the static lint from scratch; it is preserving the unresolved trust gap until something outside the model closes it.

That also makes the negative-space case testable. If a tool was eligible but not called, the receipt can still prove which authority envelope would have been needed and why it was absent. If the candidate set later changes, the old decision remains auditable against the policy and cycle fingerprint that existed at the time.
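
The receipt shape listed above could be carried as a small record like the one below. This is a sketch, not Armorer's format: the field names follow the comment where it names them, `cycle_fingerprint` is one possible way to fingerprint a store cycle (a hash over its sorted edge list), and every value in the example is invented.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def cycle_fingerprint(edges: list) -> str:
    """Order-independent hash over the (writer, store) edges of a cycle."""
    canonical = json.dumps(sorted(edges), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class GateReceipt:
    signal: str
    policy_version: str
    static_verdict: str                  # carried over from the offline lint
    cycle_fingerprint: str
    required_witness: str
    candidate_tools: tuple
    non_taken_authority_envelope: dict   # tool -> authority it would have needed
    final_decision: str

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


receipt = GateReceipt(
    signal="sender_trust",
    policy_version="policy-v7",          # hypothetical version label
    static_verdict="UNANCHORED_REVIEW",
    cycle_fingerprint=cycle_fingerprint([("cache", "sender_trust"),
                                         ("sender_trust", "cache")]),
    required_witness="human:security_reviewer",
    candidate_tools=("send_payment", "draft_reply"),
    non_taken_authority_envelope={"send_payment": "human-signed approval"},
    final_decision="not_authorized",
)
print(receipt.to_json())
```

Because the record is emitted at decision time, "did not call the tool" is an entry with its own required authority, not a gap in the trace.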

Disclosure: I work on Armorer Labs.

Alexey Spinov • Jul 4

PASS, BLOCK, UNANCHORED_REVIEW is the right shape, and carrying the static verdict plus the cycle fingerprint into the runtime receipt is the part I would not compromise on either. That handoff is exactly where the reason for review usually evaporates. The clean division of labor: the offline lint decides whether a case is anchored at all and refuses to green a vacuum, while the runtime record keeps that unresolved verdict alive until an external witness closes it. My tool only owns the first half by design, since it never sees runtime, but the two have to share one vocabulary or the gap reopens at the seam. Good thread.
