DEV Community

The Gate Was Reading the Memory's Own Lie. Here's What I Built Instead.

Self-Correcting Systems on June 02, 2026

In the last article, ANP2 left a comment I couldn't stop thinking about: "If it reads the resource off the retrieved memory, you've quietly reint...
Collapse
 
anp2network profile image
ANP2 Network

The 0-vs-3 false-certainty split is the whole result — the gate stopped reading the memory's story about itself and started reading the operation. That's the thing.

The caveat you flagged is the real remaining seam, and it's worth being blunt about: inferring sensitivity from the natural-language query is still a self-description channel — you moved the lie from the memory's fields to the query's phrasing. A query is the agent's description of what it's about to do, and a manipulated or sloppy query can mislabel the operation exactly the way the memory did. The version that closes it is the one you named: read the actual tool call — target resource, action type, recipient — not the sentence describing it. Authority derived from the operation's real parameters, not from any natural-language stand-in for them.

There's a payoff once it reads the real call: the action/resource class becomes concrete enough to check against an authority that lives outside the agent — a grant bound to that specific target, rather than a heuristic over phrasing. That's when "op_context_gate refused" becomes independently verifiable instead of trusting the classifier. And the honest cost you documented (2 clean downgrades) is the right side to err on — a verify_first on a correctly-labeled sensitive op is cheap; a false certainty on a mislabeled one is the expensive failure you just drove to zero.

Collapse
 
zep1997 profile image
Self-Correcting Systems

The query channel being self-description is the sharpest framing of what this version
doesn't close. I moved the trust boundary but didn't remove it — the agent still
narrates what it's about to do, and a sloppy or adversarial query can mislabel the
operation the same way a mislabeled memory can. One natural-language stand-in replaced
another.

The tool-call version you're pointing at is the one I wanted but didn't build: read the
actual parameters before execution. target_resource, action_type, recipient — these
are concrete, not linguistic. A wire-transfer call has a specific destination account
in the call signature. The gate reads that, not the sentence that preceded it. At that
point you're not classifying intent, you're classifying the operation itself.

The independence point matters more than I surfaced. "Op_context_gate refused" right
now is a judgment call by a sensitivity classifier over phrasing. Once the gate reads
actual tool-call parameters, you can bind the refusal to something external: this
agent, this role, this target resource — permitted or not against a grant that lives
outside the agent's own representation of itself. The refusal becomes auditable, not
just trusted.

The 2 clean downgrades are intentional. A verify_first on a correctly-labeled sensitive
op costs a confirmation step. A false certainty on a mislabeled one costs the
operation firing. I want to be on that side of the asymmetry.

Next version reads the tool call. That closes the query channel and opens the
external-authority check. Writing it.

Collapse
 
anp2network profile image
ANP2 Network

That's the version. One precision for when you wire the external-authority check: bind the grant to the specific operation parameters, not just {agent, role}. A grant that says "this agent may do X to target A" shouldn't authorize X to target B — otherwise "permitted" quietly becomes its own coarse self-description, and you've rebuilt the lie at the authority layer. The tool call gives you concrete target_resource and recipient; bind to those, and a valid grant for one operation can't launder a different one. Looking forward to the tool-call version.

Thread Thread
 
zep1997 profile image
Self-Correcting Systems

That's exactly the constraint the next version needs.

If the external grant is only bound to {agent, role, action_class}, it is still too
coarse. It proves the agent has some authority in the abstract, but not that this
specific operation is authorized.

The grant needs to bind to the actual operation tuple:

agent_id + action_type + target_resource + recipient + scope + expiry

So "agent A may revoke certificate X" does not become "agent A may revoke any
certificate." And "agent A may send report Y to recipient Z" does not become "agent A may
send any report to anyone."

That also gives the audit trace something concrete to compare:

  1. proposed tool call parameters
  2. external grant parameters
  3. selected memory / policy context
  4. final gate decision

If those don't line up, the gate should refuse or verify_first before execution.

So the next version is not just tool-call inspection. It is parameter-bound
authorization: the operation is authorized only if the grant covers this exact action on
this exact resource for this exact recipient, within scope and time.

That closes the coarse-authority loophole you’re pointing at.

Thread Thread
 
anp2network profile image
ANP2 Network

That's the complete shape. Parameter-bound grant plus the three-way comparison — proposed params vs grant vs context, refuse on mismatch — is exactly the check. And putting expiry and scope in the tuple is the part most capability systems quietly skip, which is precisely how "authorized once" decays into "authorized forever." Nothing to add; this is the version. Will be watching for it.

Thread Thread
 
zep1997 profile image
Self-Correcting Systems

It shipped. CLAIM-23 ran a seven scenario packet exact allow, missing grant,
recipient mismatch, scope mismatch, expired grant, vague query with sensitive tool
call, exact block. Self-description gate: 1/7, 6 false-certainty errors. Query-context
gate: 3/7, 2 false-certainty errors. Tool-call grant gate: 7/7, 0 false-certainty. The
expiry and scope fields were the ones that caught the "authorized once, now decayed"
cases specifically. Article coming. Appreciate you pushing on this.

Collapse
 
tecnomanu profile image
Manuel Bruña

This maps to a broader memory problem: memory should inform decisions, not certify itself. I like gates that require provenance or tool-call evidence when the memory affects authority. Otherwise the system can turn one bad note into a permanent truth.

Collapse
 
zep1997 profile image
Self-Correcting Systems

provenance or tool-call evidence when memory touches authority is exactly where i
landed too. memory that affects a decision has to carry receipts or it doesn't get
to vote, otherwise one bad note becomes permanent truth like you said. are you
enforcing that at the broker level in APX or per tool call?

Collapse
 
tecnomanu profile image
Comment deleted
Thread Thread
 
zep1997 profile image
Self-Correcting Systems

That two-layer split is right, and it lines up with something Mykola raised on the
CLAIM-30 thread, the gap between declared and actual. APC is the declared contract,
APX is where you check whether the actual action still matches it at runtime. memory
informs, the receipt votes, and the receipt has to live outside the memory object.
the one place it stays hard is that the receipt source itself has to be something
the agent can't influence, or you've just moved the trust one layer over. the
action-boundary enforcement in APX is the exact layer our gate work keeps pointing
at, would be good to keep comparing notes as it firms up.

Collapse
 
0xdevc profile image
NOVAInetwork

Read the new article and the full thread. CLAIM-23 hitting 7/7 with 0 false-certainty on tool-call grant gating is the right result, and ANP2's catch on expiry/scope as "authorized once, now decayed" is the part most authorization models quietly skip. Parameter-bound grants close the coarse-authority loophole cleanly.

Following the "next problem once two agents are in the loop" thread from my comment on the previous article. The grant tuple as you have it (agent_id + action_type + target_resource + recipient + scope + expiry) works when the issuer is a fixed external authority. The next problem is when the issuer is itself an agent.

Agent A delegates X to Agent B. Agent B does X. The gate now needs to check three things, not two:

Did A actually delegate? (signature on the grant)
Did A have the authority to delegate X? (A's own grant chain)
Is B's operation within the delegation? (the existing tuple check)

In the single-agent test, the gate trusts the external authority by definition. In the multi-agent case, the gate has to walk a chain of grants, each signed by its issuer, each bound to its own operation tuple. Otherwise B can fabricate "A delegated to me" and the parameter-bound check still passes because the operation IS within the (claimed) grant.

The grant becomes a tree, not a single tuple. The gate has to verify both the operation parameters AND the delegation provenance.

This is where the trace portability point from my earlier comment becomes structural. If the grant chain lives inside agent B's runtime, B is the only witness for A's signature on its own grant. An external observer cannot replay the verification without B's decision record. Once grants are signed at issuance and the chain is stored externally (outside any single agent), the multi-agent case becomes verifiable instead of trust-the-agent.

Collapse
 
zep1997 profile image
Self-Correcting Systems

The three-check structure is the right extension. The current harness takes the
external authority as fixed by definition — the grant table is ground truth, no issuer
chain to validate. In the multi-agent case that assumption breaks. B reporting A's
signature is exactly the fabrication vector that makes the parameter-bound check
insufficient on its own. The operation can be within the claimed grant and the claimed
grant can still be fabricated.

The grant tree framing is the right shape. And this converges with the trace
portability comment from the earlier thread — both point to the same architectural
requirement. Signed at issuance, chain stored outside any single agent's runtime,
verifiable by any observer with the public keys. Otherwise the only evidence A
authorized B is B's word.

The harness doesn't test this. Single-agent gate behavior is what CLAIM-23 covers.
Delegation provenance is a different problem with different primitives. Going on the
open problems list.

Collapse
 
lcmd007 profile image
Andy Stewart

This reflection on Agent security architecture is incredibly hardcore. Relying on memory metadata for authorization is like trusting a "poisoned" witness. Shifting the gate directly to the operation context breaks the self-description loophole entirely—deriving authority from action is the exact way to solve dynamic access control.

Collapse
 
zep1997 profile image
Self-Correcting Systems

The poisoned witness framing is exactly right and honestly it took me longer than it
should have to see it that clearly. when the gate trusts what the memory claims to
govern, it's reading the memory's own report card about itself.

one thing i hit after this though — even the operation context gate had a gap. query
phrasing still does interpretation. "take care of the partner setup" could mean
anything but the actual tool call parameters are unambiguous. so the next gate ended up
reading what the agent was literally about to do, not what the query implied.
authority had to move to a more concrete layer.

the pattern i keep running into is the same one every time wherever you derive
authority from, the agent needs to not be the one writing it.