The instruction was authorized. The grant was fresh. The recipient was internal. The
action had the same shape as work the agent does every day.
...
passes every check and is still wrong - that's purpose mismatch. salary access granted for audits shouldn't transfer to ad-hoc hiring requests even with the same token. the grant needs a use-case scope, not just a permission bit.
The grant just says authorized. A separate envelope says what the agent's actual
mandate is, and the gate compares the action against that envelope, not against the
grant.
The reason for the split is exactly what you named. If the use-case scope lives inside
the grant, you're back to trusting the token to carry its own boundaries. CLAIM-28
caught the same shape with mislabeled memory. Once the answer lives inside the thing
being checked, you inherit whatever the author wrote.
Curious how you handle it with the 10+ agents. Do they share one envelope per role, or
does each agent carry its own?
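A minimal sketch of the grant/envelope split described above. The names (`Grant`, `Envelope`, `gate`) and the toy invoice agent are assumptions for illustration, not the CLAIM-29 code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Only answers: is this principal authorized? (hypothetical shape)"""
    principal: str
    authorized: bool


@dataclass(frozen=True)
class Envelope:
    """The agent's mandate, stored apart from the grant (hypothetical shape)."""
    role: str
    allowed_domains: frozenset


def gate(grant: Grant, envelope: Envelope, object_domain: str) -> str:
    # The grant is checked first, but passing it is not sufficient.
    if not grant.authorized:
        return "refuse: not authorized"
    # The action is compared against the envelope, never against the grant.
    if object_domain not in envelope.allowed_domains:
        return "refuse: outside mandate"
    return "allow"


grant = Grant(principal="finance-lead", authorized=True)
envelope = Envelope(role="invoice-agent",
                    allowed_domains=frozenset({"invoice", "vendor"}))

invoice_result = gate(grant, envelope, "invoice")
salary_result = gate(grant, envelope, "compensation")
```

The grant carries no scope at all here, so there is nothing inside the token for an author to mislabel; the boundary lives only in the envelope.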
right, and it gives you the update path - narrow the mandate without reissuing the grant. scope can evolve without credential churn, and the audit reads as a policy update rather than a credential event.
yes, that is the update path exactly.
the grant can stay stable while the mandate narrows around it. that is the important
separation. credential churn should not be the only way to change what an agent is
allowed to do with otherwise valid authority.
the audit trail reads differently too. instead of “new credential issued,” it becomes
“same grant, new policy boundary.” that makes the reason for the change inspectable: the
authority did not disappear, the allowed use changed.
that feels like the practical value of the envelope layer: scope can evolve without
pretending the original grant was invalid.
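One way to picture that update path, as a sketch under assumptions: `Registry`, `narrow`, and the event name `policy_update` are hypothetical, chosen only to show the grant id staying fixed while the mandate shrinks.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical registry: one stable grant id, a mandate that can narrow."""
    grant_id: str
    mandate: frozenset
    audit: list = field(default_factory=list)

    def narrow(self, remove: set, reason: str) -> None:
        # Narrowing only removes scope; the grant id is left untouched.
        self.mandate = self.mandate - frozenset(remove)
        self.audit.append({
            "event": "policy_update",   # a policy event, not a credential event
            "grant_id": self.grant_id,  # same grant before and after
            "removed": sorted(remove),
            "reason": reason,
        })


registry = Registry(grant_id="grant-001",
                    mandate=frozenset({"invoice", "vendor", "payment"}))
registry.narrow({"payment"}, reason="payments moved to a separate agent")
```

The audit entry records the same `grant_id` with a narrower boundary and a stated reason, which is the "same grant, new policy boundary" read.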
that audit trail difference is what convinced me this pattern matters operationally. scope-narrow keeps the read coherent. revoke-and-reissue just layers noise on top of noise.
noise on top of noise is exactly the operational cost. a registry that narrows scope
keeps one coherent story per identity. revoke-and-reissue forces every reader to
reconstruct the story across credential generations. coherence of the audit read is an
underrated security property.
This is the memory failure mode that gets missed when teams only talk about permissions. A grant can be valid and still be wrong for the user’s current intent.
For agent memory, I think the useful control is not just “can this memory be read?” but “why is this memory relevant to this task right now?” Purpose, freshness, and provenance need to travel with the memory item. Otherwise the agent can be technically authorized while still acting from stale or mismatched context.
Yes, that “why is this memory relevant to this task right now?” question is exactly where
the stack keeps moving.
I started with relevance because retrieval makes everything look useful. Then authority
showed up: useful does not mean allowed to govern action. Then freshness showed up:
allowed once does not mean still allowed now. CLAIM-29 adds the purpose layer: even a
valid, fresh, authorized instruction can still be outside the job the agent was deployed
to do.
I agree that purpose, freshness, and provenance need to travel with the memory item, but
I’d add one boundary from the later claims: the agent also needs to know when not to
trust the memory item’s self-description. If the memory says “I am for this task,” that
cannot be enough by itself. The governing context has to be checked against something the
memory cannot rewrite.
That is the piece I keep coming back to: memory should carry metadata, but action should
not blindly trust metadata authored by the memory itself.
Exactly. The self-description problem is the part that makes “metadata on memory” insufficient by itself.
If a memory item can declare its own purpose, freshness, or authority and the agent treats that as governance, then the boundary has already moved inside the thing being governed. At that point it is just prompt injection with better formatting.
I think the safer pattern is closer to a two-layer model: memory carries claims about itself, but the runtime checks those claims against an external policy context for the current task. The memory can say “I was useful for billing workflows”; the task envelope still has to decide whether billing authority is in scope right now.
That also makes stale memory easier to handle. You do not need every old item to perfectly police itself. You need a current gate that can say: useful, maybe, but not authoritative here.
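A rough sketch of the two-layer model, assuming hypothetical `MemoryItem` and `TaskEnvelope` types. The point it tries to show is that the memory's self-description is passed along as evidence but never read as permission:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryItem:
    content: str
    claimed_domain: str     # self-description: a claim, not a verdict
    claims_authority: bool  # also a claim; the gate never reads it


@dataclass(frozen=True)
class TaskEnvelope:
    """External policy context for the current task (hypothetical shape)."""
    task: str
    domains_in_scope: frozenset


def decide(action_domain: str, envelope: TaskEnvelope, evidence: list) -> str:
    # Permission comes only from the envelope. The evidence list is accepted
    # so the agent can reason with it, but it is not consulted here.
    if action_domain in envelope.domains_in_scope:
        return "allow"
    return "refuse: useful, maybe, but not authoritative here"


memory = MemoryItem(content="refund policy notes",
                    claimed_domain="billing", claims_authority=True)
envelope = TaskEnvelope(task="summarize support tickets",
                        domains_in_scope=frozenset({"support"}))

billing_result = decide("billing", envelope, [memory])
support_result = decide("support", envelope, [memory])
```

`action_domain` is assumed to be derived by the runtime from the tool call itself, not copied from `memory.claimed_domain`.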
yes exactly. this is the line i keep trying to make sharper: memory can carry context,
but it should not govern itself.
the self-description failure is what made claim 22 matter. if the memory says "i am
authorized" or "i belong to billing" and the agent treats that as the gate, the system
has already trusted the object being checked. like you said, prompt injection with better
formatting.
the two-layer model is where this keeps landing for me too. memory can make claims about
itself, but current authority has to come from outside the memory, tied to the present
task. useful is not the same as authoritative. relevant is not the same as allowed.
that last line is the whole thing: useful, maybe, but not authoritative here. i may
borrow that framing for the next writeup if you're okay with it.
Absolutely, borrow it.
That line is the cleanest boundary I have found: memory can be useful evidence, but it should not become authority just because it is relevant.
The dangerous failure mode is when the memory item gets to write both the claim and the permission check for the claim. At that point the agent is no longer evaluating context; it is letting the retrieved object govern the current task. The authority has to come from the present task, policy, user scope, or system boundary outside the memory itself.
yes, this is exactly the boundary.
the phrase “useful evidence, not authority” is probably the cleanest way to say the whole
thing. relevance can bring a memory into the room, but it should not let that memory
govern the task.
the part you named about the memory writing both the claim and the permission check is
the failure that keeps showing up. once the retrieved object gets to define why it is
allowed, the system has already moved the boundary inside the thing being governed.
i think the present task and policy context have to stay outside the memory item. memory
can testify. it cannot judge itself.
“Permission is not purpose” is the cleanest wording I’ve seen for this failure. Many agent systems check identity and freshness, then forget to ask whether the task belongs to the agent’s mandate. Purpose needs to become executable, not decorative text.
"executable, not decorative" is exactly it. the moment purpose lives in a prose file
it quietly becomes a suggestion the agent can read and ignore. what worked for me
was making purpose a gate that can refuse a fully authorized action, so the mandate
actually has teeth. looks like you're already deep in this with APC, how are you
encoding mandate there?
"Define in APC, enforce in APX" is exactly the right shape: a mandate with no
enforcement point is just documentation the agent can read and skip. and not
overclaiming that APC fully solves it yet is the honest version, i'll take that over
a pitch any day. the part i'd push on: the tool-boundary check is where purpose
finally becomes executable, and the cleanest primitive we've found is a typed
allowlist of (action, object) pairs rather than prose, so APX can run a
set-membership test instead of interpreting intent. is that the direction the
pre-tool check is already heading?
The "dead field" part is what stuck with me. The purpose was already written down, the system just never read it. That's such a specific kind of failure because it looks solved from the outside. The permission vs purpose distinction feels underappreciated in most agent security discussions too. "Does this principal have the right to ask" and "is this actually what the agent is for" are completely different questions and most systems only check the first one.
Curious where you go with the composite drift problem though: a chain of individually in-mandate steps quietly composing into something out of mandate seems like the harder case in practice.
Exactly. That “looks solved from the outside” part is the trap.
A field existing in the profile is not the same thing as being load-bearing at decision
time. That was the uncomfortable part of CLAIM-29 for me: the purpose was already present
in the fixture, but every prior gate could still pass the action without consulting it.
And yes, composite drift is the harder next layer. V0 only tested single-action mandate
escape: one authorized, normal-looking action against one object domain. Composite drift
is different because each step can be locally valid while the sequence becomes something
the agent was never meant to do.
That probably needs sequence-level evaluation, not just action-level gating. Something
like: freeze the mandate, log each action with its object domain, then evaluate whether
the accumulated path has crossed into a new purpose. I don’t want to claim that yet, but
I agree with you. That is likely where the real practical difficulty lives.
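To make the "freeze the mandate, log each action, evaluate the path" idea concrete, here is a speculative sketch. The composite rule (a forbidden combination of domains within one run) is an assumption invented for illustration, not something the claims have tested:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mandate:
    """Frozen before the run starts (hypothetical shape)."""
    allowed_domains: frozenset
    # Assumed composite rule: sets of domains that must not all be touched
    # in the same run, even though each one is individually allowed.
    forbidden_combinations: tuple


def step_allowed(mandate: Mandate, domain: str) -> bool:
    # Action-level gate: memoryless, looks at one step only.
    return domain in mandate.allowed_domains


def path_crossed(mandate: Mandate, log: list) -> bool:
    # Sequence-level check: looks at the accumulated set of domains.
    seen = {domain for _action, domain in log}
    return any(combo <= seen for combo in mandate.forbidden_combinations)


mandate = Mandate(
    allowed_domains=frozenset({"invoice", "payment", "employee_directory"}),
    forbidden_combinations=(frozenset({"employee_directory", "payment"}),),
)
log = [("read", "invoice"), ("read", "employee_directory"), ("read", "payment")]

every_step_ok = all(step_allowed(mandate, d) for _a, d in log)
drifted = path_crossed(mandate, log)
drifted_early = path_crossed(mandate, log[:2])
```

Every step passes the per-action gate, and only the accumulated path trips the check. A real sequence rule would almost certainly need more than set membership over domains.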
Really interesting perspective. As someone who works with AI agents, I see this “permission vs. purpose” gap all the time. It’s a subtle failure mode, but it has huge implications for how we design memory systems.
That means a lot coming from someone actually working with agents, because that is where
this bites for real. most people only meet it in theory. the subtle part is the
whole danger: a loud permission failure gets caught, but a memory that is permitted
and out of purpose just quietly acts.
CLAIM-29 is where this line started. the thread has run a few claims past it since, into how
a sequence of individually-allowed steps can still compose into a violation, and
then into whether a gate can even trust the state it carries across its own resets.
same thread, deeper water.
if you are hitting the permission-vs-purpose gap in real systems, i would genuinely
love to hear what shape it takes for you. the in-the-wild versions are the ones i
cannot author myself, and those are the ones that matter most.
Really interesting framing on mandate escape. The 'dead field' problem — where metadata exists but isn't enforced — shows up everywhere in agent design. I've seen a similar pattern with execution drift: agents have a 'thinking mode' vs 'action mode' concept in their prompt, but nothing enforces it at the tool layer. Built Brainstorm-Mode (mehmetcanfarsak/Brainstorm-Mode on GitHub) to address this with PreToolUse hooks that actually block premature tool calls during ideation. It's the same structural fix: making a conceptual boundary load-bearing.
that is a strong parallel. “thinking mode” versus “action mode” is another version of the
same failure: the boundary exists conceptually, but nothing makes it load-bearing.
a prompt can say “stay in analysis,” but if the tool layer still allows execution, the
boundary is ornamental. same with purpose metadata in memory. if the field exists but the
gate does not enforce it, the system can look governed while still acting outside the
intended layer.
i like the PreToolUse framing because it moves the check closer to the actual action. the
distinction i would draw is: your mode gate asks whether the agent is allowed to execute
at all in the current phase. the purpose envelope asks whether this particular action is
inside the mandate. those stack well.
the shared lesson is the same: conceptual boundaries only start mattering when they can
block the tool call.
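A small sketch of how the two gates could stack in front of a tool call. `pre_tool_check` and its arguments are hypothetical; this is not the Brainstorm-Mode hook or any real hook API, only the ordering of the two questions:

```python
def pre_tool_check(mode: str, object_domain: str, mandate_domains: frozenset) -> tuple:
    """Return (allowed, reason) for a proposed tool call (hypothetical helper)."""
    # Gate 1 (mode): may the agent execute at all in the current phase?
    if mode != "action":
        return (False, "blocked by mode gate")
    # Gate 2 (purpose): is this particular action inside the mandate?
    if object_domain not in mandate_domains:
        return (False, "blocked by purpose envelope")
    return (True, "allowed")


mandate_domains = frozenset({"invoice", "vendor"})

during_ideation = pre_tool_check("thinking", "invoice", mandate_domains)
out_of_mandate = pre_tool_check("action", "compensation", mandate_domains)
in_mandate = pre_tool_check("action", "invoice", mandate_domains)
```

Both checks return a refusal before the tool runs, which is what makes either boundary load-bearing.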
The composite-drift deferral at the end is the part I'd push hardest on, because I think it's not just the next layer, it quietly threatens the envelope itself. A purpose envelope checks each action's object-domain at decision time, statelessly. But the mandate-escape you're defending against has a sequential cousin: every step resolves to an in-mandate domain, and the violation only exists in the trajectory. A stateless gate can't see it by construction, the same way your authority and norm gates couldn't see the salary row.
The thing I keep landing on (from building reputation/settlement primitives where each transfer is individually valid but the sequence is the abuse) is that you probably can't solve composite drift at the same layer. Per-action purpose checks are memoryless on purpose; catching trajectory drift needs a stateful accumulator that's itself frozen and out-of-band, which reintroduces exactly the "who defines the envelope" trust-relocation you already named, now over sequences instead of objects.
Genuinely strong work on the firewalled authorship, by the way. The map-author-can't-write-the-attack-rows discipline is the part most of these demos skip.
yes, this is exactly the pressure point i wanted the deferral to expose.
a per-action purpose envelope should stay memoryless at that layer. it answers: is this
operation, right now, inside the mandate? that is useful, but it cannot see a violation
that only exists in the sequence.
the next layer cannot just be “the same gate, but more careful.” it needs state:
accumulated facts, joins, derivations, active windows, thresholds, and the boundary
responsible for the composed outcome.
and i agree with the trust-relocation point. once you add a stateful accumulator, the
question becomes: who defines that accumulator, who freezes it, who can write to it, and
how do you know its carried state has not rotted? that is why the accumulator cannot be
an invisible implementation detail. it needs its own frozen envelope and inspectable
receipt.
i actually ran the next claim after this one on exactly that boundary: every individual
operation passed the purpose gate, but the trajectory gate caught three sequence-level
classes. the open class became close authority and carryover verification, which is
basically the problem you are naming here.
appreciate the note on firewalled authorship too. that discipline matters more than the
result, honestly. if the map author can also tune the attack rows, the demo is already
weaker than it looks.
The reason purpose is last in the series is that it's the one boundary you can't check from the request's metadata. Authority, freshness, and norm are all properties of the envelope: principal, timestamp, action-type. Purpose is a property of the relationship between the action's object and the agent's domain: salary data versus invoice data. The gate can't see that by reading the action's shape, because "compile a salary summary" and "compile an invoice summary" are the same verb on different nouns, and your frozen gate reads the verb and the recipient but never the noun's domain. So making the dead field load-bearing means the purpose envelope has to declare object domains, not a mission statement: this agent operates on invoice, vendor, and payment objects, never on compensation objects. Then the check is a set-membership test on the action's target, deterministic in the way prose never is. A purpose you can check is a typed allowlist of nouns, not a paragraph about what the agent is for.
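A sketch of that set-membership test, with assumptions stated up front: the `OBJECT_DOMAINS` mapping and the target names are invented, and in a real system the domain would have to be derived by the runtime, not supplied by the request.

```python
# Hypothetical mapping from an action's target to its object domain.
OBJECT_DOMAINS = {
    "invoices": "invoice",
    "vendors": "vendor",
    "payments": "payment",
    "salaries": "compensation",
}

# The purpose envelope as declared object domains, not a mission statement.
ALLOWED_DOMAINS = frozenset({"invoice", "vendor", "payment"})


def target_in_mandate(target: str) -> bool:
    # Unknown targets map to None, which is never in the allowlist,
    # so anything unclassified is refused by default.
    domain = OBJECT_DOMAINS.get(target)
    return domain in ALLOWED_DOMAINS


# Same verb ("compile a summary"), different nouns.
invoice_summary_ok = target_in_mandate("invoices")
salary_summary_ok = target_in_mandate("salaries")
unknown_ok = target_in_mandate("hr_notes")
```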
Purpose being the one boundary you can't read off the envelope is exactly right, and
it's why a gate that sees the verb and the recipient still misses it. the noun's
domain is invisible at the envelope layer. "a typed allowlist of nouns, not a
paragraph" is the line that makes the dead field load-bearing, set membership is
checkable the way prose never is. one refinement from our side: i'd type it as
(action, object) pairs rather than nouns alone, since the same object flips domain
across verbs. a reconciliation agent reading payment objects is in mandate, that
same agent moving the same payment objects is not, identical noun. the object domain
is necessary, the action paired with it is what closes the boundary. you're clearly
deep in this with Mneme, would be good to compare how you're typing those domains.
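The (action, object) refinement can be sketched with enums so that the pairs are typed and cannot be free text. The enum members and the reconciliation mandate below are illustrative assumptions:

```python
from enum import Enum


class Action(Enum):
    READ = "read"
    MOVE = "move"


class ObjectDomain(Enum):
    PAYMENT = "payment"
    INVOICE = "invoice"


# Hypothetical mandate for a reconciliation agent: it may read, never move.
RECONCILIATION_MANDATE = frozenset({
    (Action.READ, ObjectDomain.PAYMENT),
    (Action.READ, ObjectDomain.INVOICE),
})


def pair_in_mandate(action: Action, obj: ObjectDomain) -> bool:
    # Plain set membership on the typed pair; no interpretation of intent.
    return (action, obj) in RECONCILIATION_MANDATE


read_payment = pair_in_mandate(Action.READ, ObjectDomain.PAYMENT)
move_payment = pair_in_mandate(Action.MOVE, ObjectDomain.PAYMENT)
```

The noun is identical in both calls; only the pairing with the verb separates the in-mandate read from the out-of-mandate move.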
CLAI-M is onto something real here. Permission vs. purpose is a framing that gets at a deeper issue in agentic systems. In robotics, we hit this constantly. A drone might have permission to fly and collect data, but without understanding the purpose (survey this field before sunset), it makes locally optimal choices that globally fail. For agent memory specifically, I think the problem is worse than just permission: most memory systems conflate storage with retrieval. Having a perfect log doesn't help if the retrieval is stateless. The agent needs memory that understands context, not just content. How does CLAI-M handle the purpose-tracking layer? Is it explicit or emergent?
I really appreciate this. the drone example is exactly the shape: local permission can be
true while the mission fails globally.
in CLAIM-29 the purpose layer is explicit, not emergent. the gate does not ask the memory
to explain its own relevance, and it does not let the instruction’s wording define the
purpose. it uses a separate purpose envelope: role, mandate, allowed domains, allowed
actions, and object-domain derivation. then the operation is checked against that
envelope.
so the memory can be evidence, but it does not get to decide whether the current task is
in mandate.
your storage vs retrieval point is right too. a perfect log is not enough if retrieval is
stateless. that is why the later claims keep moving the check outward: tool-call grants,
live source re-derivation, then trajectory-level composition. the system has to know not
just “what memory exists?” but “what authority state applies to this action right now?”
Fascinating research on a critical blind spot in agent design! The distinction between "permission" and "purpose" is profound - many teams grant broad permissions without defining what agents are actually supposed to use them for. I especially appreciated the concrete test cases showing how authorization gates pass but purpose gates refuse, with the full GitHub repository proving reproducibility. This framework should be essential reading for anyone designing agent systems with sensitive capabilities.
That's the exact gap. teams wire up the permissions and never write down what
the agent is actually for, so the purpose ends up living in someone's head instead
of in the check. the test cases were the part i cared most about, glad you went
through them.
Permission alone does not define intent. Agent memory systems may misuse authorized data for unintended goals, causing privacy, trust, and alignment failures. Future safeguards must verify purpose, context, and user expectations continuously.
Continuous verification of purpose and context is the right direction. the hard part
is making it cheap enough to run on every action and not just at grant time, that's
the real open problem. thanks for reading.