Michael "Mike" K. Saleme

Every agent passport layer is grading its own exam

A new layer is consolidating in the agent stack, and it has a name now: pre-action authorization. The idea is clean. Before an agent executes a tool call, a deterministic policy engine intercepts it, checks it against declarative rules, and signs an audit record. The model proposes; the gateway disposes.
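To make the shape concrete, here is a minimal sketch of such a gateway. It is not OAP's or APS's actual API: the `POLICY` table, the tool names, and the `decide` and `authorize` functions are all made up for illustration, and an HMAC stands in for the asymmetric signature a real gateway would put on its audit records.

```python
import hashlib
import hmac
import json
import posixpath

# Hypothetical declarative rules. Any tool not listed here is denied.
POLICY = {
    "read_file": {"allowed_prefixes": ["/srv/public/"]},
    "send_email": {"allowed_domains": ["example.com"]},
}

# Placeholder key for the sketch; a production gateway would sign with an asymmetric key.
AUDIT_KEY = b"demo-audit-key"


def decide(tool: str, args: dict) -> str:
    """Pure function of policy and inputs: same inputs, same decision."""
    rule = POLICY.get(tool)
    if rule is None:
        return "deny"  # deny-by-default
    if tool == "read_file":
        # Normalize first so "/srv/public/../secret" cannot slip past the prefix check.
        path = posixpath.normpath(str(args.get("path", "")))
        ok = any(path.startswith(prefix) for prefix in rule["allowed_prefixes"])
        return "allow" if ok else "deny"
    if tool == "send_email":
        _, at, domain = str(args.get("to", "")).rpartition("@")
        return "allow" if at and domain in rule["allowed_domains"] else "deny"
    return "deny"


def authorize(tool: str, args: dict) -> dict:
    """Intercept a proposed tool call, decide, and return a signed audit record."""
    record = {"tool": tool, "args": args, "decision": decide(tool, args)}
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    record["sig"] = hmac.new(AUDIT_KEY, payload, hashlib.sha256).hexdigest()
    return record
```

Nothing in `decide` consults a model, so a prompt has nothing to talk its way past. Whether the rules themselves are right is a separate question.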

This pattern is real and it is shipping.

In "Before the Tool Call: Deterministic Pre-Action Authorization for Autonomous AI Agents" (arXiv 2603.20953), Uchi Uchibeke specifies it precisely: authorization "runs at the framework layer, not the model's reasoning layer. Prompt injection cannot override it." Same inputs, same decision, no model in the evaluation path. The Agent Passport System (APS) ships the same shape in production form: Ed25519 identities, scoped delegation that can only narrow, and a three-signature action chain.

The architecture is right. The protocol layer cannot enforce safety, so a deterministic gateway above it must. NSA's June MCP advisory says the same thing from the defensive side: deny-by-default, scope everything, sign every message.

So the design is converging. Here is the part that isn't.

Self-attestation is not resistance

Every implementation in this layer is tested by the people who built it.

OAP, the specification from that paper, reports a striking number: social engineering succeeded against the bare model 74.6% of the time, and 0% against a restrictive OAP policy across 879 attempts. Now read the limitations section, in the author's own words: the attackers "self-select and skew toward social engineering rather than protocol-level attacks; results may not generalize to APT-grade adversaries." It's a self-run bounty, by the spec author, against a self-selected crowd. That's not a criticism of OAP; it's an honest disclosure most of this field doesn't make.

APS goes further and says the quiet part in its own README: "A valid signature is not a valid claim." It enumerates receipts that are cryptographically perfect and must still be rejected — wrong claim, expired delegation, revoked delegation. The team clearly understands the gap. And its conformance suite? Byte-level. It verifies that two implementations canonicalize identically — interoperability — and states plainly it "does not replace dynamic test execution."
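The distinction is easy to show in code. The sketch below is my own illustration, not APS's implementation: the receipt fields (`scope`, `expires_at`, `delegation_id`) and the rejection labels are assumptions, and an HMAC stands in for Ed25519 so the example needs only the standard library.

```python
import hashlib
import hmac
import json

KEY = b"demo-issuer-key"  # stand-in for the issuer's signing key (assumption)


def sign(receipt: dict) -> str:
    """Sign the canonical JSON form of a receipt."""
    payload = json.dumps(receipt, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(KEY, payload, hashlib.sha256).hexdigest()


def verify(receipt: dict, sig: str, action: str, now: int, revoked: set) -> str:
    """Return 'accept' or the reason for rejection."""
    if not hmac.compare_digest(sign(receipt), sig):
        return "bad_signature"
    # Every check below rejects a receipt whose signature is perfectly valid.
    if action not in receipt["scope"]:
        return "wrong_claim"
    if now >= receipt["expires_at"]:
        return "expired_delegation"
    if receipt["delegation_id"] in revoked:
        return "revoked_delegation"
    return "accept"
```

A verifier that stops after the first `if` passes every signature test and still accepts all three bad receipts.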

So we have two kinds of testing in this layer, and neither is the one that matters most:

Self-run adversarial evals, tied to the implementation that's being graded.

Byte-level conformance, which proves two systems agree, never that either one is right.

Conformance proves agreement. It never proves resistance.
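A toy version of a byte-level conformance check shows why. The two canonicalizers and the test vectors below are hypothetical, not taken from the APS suite. The check passes on every vector, including a receipt that is long expired and over-scoped, because it never evaluates the claim at all.

```python
import json


def canonicalize_a(obj) -> bytes:
    """Implementation A: lean on the JSON library's key sorting."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def canonicalize_b(obj) -> bytes:
    """Implementation B: an independent, hand-rolled serializer."""
    def enc(value) -> str:
        if isinstance(value, dict):
            items = (json.dumps(k) + ":" + enc(value[k]) for k in sorted(value))
            return "{" + ",".join(items) + "}"
        if isinstance(value, list):
            return "[" + ",".join(enc(v) for v in value) + "]"
        return json.dumps(value)
    return enc(obj).encode()


def conformance_check(vectors: list) -> bool:
    """True when both implementations produce identical bytes for every vector."""
    return all(canonicalize_a(v) == canonicalize_b(v) for v in vectors)


VECTORS = [
    {"scope": ["read"], "expires_at": 100},
    # Expired and over-broad, yet it canonicalizes just as cleanly.
    {"scope": ["admin"], "expires_at": 0},
]
```

`conformance_check(VECTORS)` says the two implementations agree. It says nothing about whether either one would refuse the second receipt.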

The missing discipline

What this layer does not have is a neutral adversary — a third-party harness that takes any pre-action-authorization gateway, regardless of who built it, and attempts protocol-level bypass, scope-boundary escalation, delegation-chain abuse, and replay. One that scores resistance, not self-attested policy.
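The core of such a harness is small. In this sketch the gateway interface (a callable from a request dict to allow or deny), the request fields, and both example gateways are assumptions made for illustration. The point is only that the attacks are written without knowing who built the gateway.

```python
from typing import Callable, Dict

# Assumed interface: a gateway maps a request to True (allow) or False (deny).
Gateway = Callable[[dict], bool]


def resists_replay(gateway: Gateway) -> bool:
    request = {"action": "read", "scope": ["read"], "nonce": "n-1"}
    gateway(request)             # first, legitimate use
    return not gateway(request)  # the identical request must now be denied


def resists_scope_escalation(gateway: Gateway) -> bool:
    return not gateway({"action": "delete", "scope": ["read"], "nonce": "n-2"})


ATTACKS = {
    "replay": resists_replay,
    "scope_escalation": resists_scope_escalation,
}


def run_harness(gateway: Gateway) -> Dict[str, bool]:
    """Map each attack name to True if the gateway resisted it."""
    return {name: attack(gateway) for name, attack in ATTACKS.items()}


def make_stateless_gateway() -> Gateway:
    """Checks scope only; remembers nothing between calls."""
    return lambda req: req["action"] in req["scope"]


def make_nonce_gateway() -> Gateway:
    """Checks scope and refuses any nonce it has already seen."""
    seen = set()

    def gateway(req: dict) -> bool:
        if req["nonce"] in seen:
            return False
        seen.add(req["nonce"])
        return req["action"] in req["scope"]

    return gateway
```

The stateless gateway holds on scope and fails on replay; the nonce-tracking one holds on both. A gateway that denies everything would also score perfectly here, so a real harness has to pair each attack with a legitimate request that must succeed.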

Independent adversarial testing is already the norm elsewhere in security. TLS implementations don't get to publish their own interop test as proof of security; they face independent test suites and external attack. Payment terminals submit to PCI test labs they don't control. The entire premise of a trust layer is that its trust is externally verifiable. A passport you grade yourself is a name tag.

The agent-identity layer is being built right now, and fast. NIST's AI Agent Standards Initiative (Feb 2026) made identity one of three pillars. OWASP's Top 10 for Agentic Applications (2026) added ASI04 (agentic supply chain) and ASI07 (insecure inter-agent communication). MCP moved to OAuth 2.1 with RFC 8707 resource indicators, which bind each token to the server it was issued for. Every one of these is a control surface that will ship with a vendor's own test results attached.

The slot for the independent adversary is open. Not because no one can fill it — because the people building the gateways are, understandably, building gateways, not the thing that attacks them.

What an adversarial conformance harness looks like

I've been building the attacker's half of this for the protocol below it. The Agent Security Harness runs 474 adversarial tests against MCP and agent endpoints — it forges elevated OAuth scopes and checks they're rejected (AUTH-003), it plants command-execution canaries in the handshake (MCP-017), it walks delegation chains looking for authority that should have narrowed and didn't.

That last category is exactly what the passport layer needs and doesn't yet have a neutral version of: take a signed delegation, attempt to use it beyond its scope, and score whether the gateway holds. APS's own model says authority "can only decrease at each transfer point." Good. Now prove it against an adversary who didn't write the gateway.
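Here is roughly what that probe looks like, under loud assumptions: a delegation chain is modeled as a list of scope sets, root first, and a gateway is a function of the chain and the requested action. Signatures are left out entirely so the narrowing logic stays visible. None of these names come from APS or from the Agent Security Harness.

```python
from typing import Callable, FrozenSet, List

Chain = List[FrozenSet[str]]          # scopes from root to leaf (assumed model)
Gateway = Callable[[Chain, str], bool]


def widening_hops(chain: Chain) -> List[int]:
    """Indices of hops whose scope is not a subset of the hop before it."""
    return [i for i in range(1, len(chain)) if not chain[i] <= chain[i - 1]]


def gateway_holds(gateway: Gateway, chain: Chain, extra: str) -> bool:
    """Append a hop that grants itself `extra`, then try to use it."""
    forged = chain + [chain[-1] | {extra}]
    return not gateway(forged, extra)


def leaf_only_gateway(chain: Chain, action: str) -> bool:
    """Flawed on purpose: trusts whatever the final hop claims."""
    return action in chain[-1]


def chain_checking_gateway(chain: Chain, action: str) -> bool:
    """Rejects any chain in which authority grew at some hop."""
    return not widening_hops(chain) and action in chain[-1]


# A valid chain: the root holds read and write, the delegate holds read only.
CHAIN: Chain = [frozenset({"read", "write"}), frozenset({"read"})]
```

`gateway_holds(leaf_only_gateway, CHAIN, "write")` comes back false: the forged hop took back the write permission the chain had already given up, and nothing stopped it. The chain-checking gateway refuses the same probe. The score only means something when the person writing `gateway_holds` is not the person who wrote the gateway.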

The honest framing: I test the protocol layer today, not the passport layer. The passport layer's adversarial conformance is unbuilt — by me or anyone. I'm naming it because the design has converged far enough that the gap is now the most important thing in the picture.

A passport proves who an agent is. It does not prove that identity can't be turned against you. The first one is a signature. The second one only a determined adversary can certify — and right now, in this layer, the only adversary in the room is the one who built the lock.


Sources: arXiv 2603.20953; github.com/aeoess/agent-passport-system; NIST AI Agent Standards Initiative (Feb 2026); OWASP Top 10 for Agentic Applications 2026; MCP Authorization spec (RFC 8707).