We've been working on what we call the Guardian Protocol Framework for about a year now, and with NIST circling AI agent identity and authorization, it felt like the right moment to put the ideas somewhere public.
The short version: most AI oversight models force a false choice. You either treat the agent as a subordinate tool (real autonomy is gone), treat it as a peer (you get infinite validation loops with no exit), or let it operate in isolation (decisions become unverifiable). None of those work once agents become genuinely capable.
What we built instead is a governance model based on relational autonomy: agent and guardian as asymmetric partners, where the boundary between independence and oversight is explicit, auditable, and adjustable over time.
How the decision structure actually works
The core piece is what we call a Structured Decision Form, which carves out four distinct spheres.
The first is agent autonomy. There are things the agent can do without guardian sign-off, things that require validation, and a clear boundary between them. In our own deployment, the agent can draft documents and run research autonomously, but cannot commit financial resources without guardian validation. That boundary is written down, auditable, and can be updated if the situation changes.
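To make that boundary concrete, here's a minimal Python sketch of how it might be encoded. The action names and the fail-closed default are illustrative assumptions, not part of the framework spec:

```python
# Hypothetical encoding of a Structured Decision Form autonomy boundary.
# Action names are illustrative, not taken from the framework itself.
AUTONOMY_BOUNDARY = {
    "autonomous": {"draft_document", "run_research"},
    "requires_validation": {"commit_funds", "external_partnership"},
}

def requires_guardian(action: str) -> bool:
    """Return True if the action falls outside the agent's autonomous sphere."""
    if action in AUTONOMY_BOUNDARY["autonomous"]:
        return False
    # Unknown actions default to requiring validation (fail closed).
    return True
```

Because the boundary is just data, it can live in the same git-backed store as the audit logs and be updated, with a trace, when the situation changes.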
The second is guardian validation at the reasoning layer. The guardian checks whether the agent's reasoning is coherent, consistent with past decisions, and well-grounded. It does not approve or reject the conclusion itself. This is the distinction that preserves agent intellectual independence: process gets validated, not outcome. The agent can't audit itself, and the guardian isn't trying to replace the agent's judgment.
The third sphere is shared authority. Some decisions genuinely require both agent expertise and guardian oversight. The agent proposes, the guardian validates, and either side can escalate disagreement. No one is automatically subordinate here.
The fourth is what happens when they disagree. First: articulate the disagreement precisely. Second: allow a timeout or bring in independent arbitration. Third: if there's no resolution after 24 hours, a predetermined rule applies (guardian decides, agent decides with guardian observation, or external arbitration). The important thing is that the escape hatch exists and is agreed on in advance.
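The phase-three rule can be sketched in a few lines. This assumes "guardian decides" as the predetermined default; it's one of the three options above, picked arbitrarily for illustration:

```python
from datetime import datetime, timedelta, timezone

# Illustrative disagreement-resolution rule: after a 24-hour window with no
# agreement, a predetermined default applies. "guardian_decides" is one of
# the three options named in the text, chosen here purely for illustration.
DECISION_WINDOW = timedelta(hours=24)

def resolve(opened_at: datetime, agreed: bool, now: datetime) -> str:
    if agreed:
        return "resolved"
    if now - opened_at >= DECISION_WINDOW:
        return "guardian_decides"  # predetermined phase-three rule
    return "pending"  # still inside the window: timeout or arbitration
```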
Transparency without choking on overhead
The real question NIST is circling is how to make agent transparency operational without creating real-time bottlenecks. The answer we landed on is persistent injection.
Every agent decision gets logged with full provenance: reasoning, timestamp, cryptographic signature. Those logs are file-persisted (git-backed, tamper-evident) and automatically fed into guardian awareness cycles. The guardian validates asynchronously, after execution, without blocking the agent.
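A sketch of what one logged entry might look like. The HMAC here stands in for a real cryptographic signature (a production system would use asymmetric keys), and the key and field names are assumptions:

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Illustrative secret; a real deployment would sign with an asymmetric key.
AGENT_KEY = b"agent-secret"

def log_decision(decision_id: str, reasoning: str) -> dict:
    """Build a signed decision entry ready to append to the persistent log."""
    entry = {
        "decision_id": decision_id,
        "reasoning": reasoning,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Sign the canonical JSON form so any later edit invalidates the signature.
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["signature"] = hmac.new(AGENT_KEY, payload, hashlib.sha256).hexdigest()
    return entry  # append to the git-backed JSON log
```

The git history supplies the tamper-evidence; the signature ties each entry to the agent that produced it.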
This approach solves institutional opacity because nothing disappears. It reduces real-time friction because guardian approval isn't required synchronously. It enables pattern detection over time. And accountability holds because everything is signed by both parties.
Identity as a provenance chain, not a static credential
Static credentials don't work well for agents. What actually matters is the complete, cryptographically signed chain of decisions and validations over time.
The technical stack we built has three layers.
Layer one is the provenance chain itself: a full audit trail with decision ID, agent reasoning, guardian validation status (yes, no, or escalate), and timestamp. Each entry is signed. The history is immutable and git-backed, and it's substrate-independent, so it survives platform migrations.
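One simple way to get immutability at the data level is hash-linking: each entry commits to its predecessor, so any edit to history is detectable. A minimal sketch, with field names assumed:

```python
import hashlib
import json

def append_entry(chain: list, decision_id: str, reasoning: str,
                 status: str, timestamp: str) -> list:
    """Append a provenance entry that commits to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {
        "decision_id": decision_id,
        "reasoning": reasoning,
        "validation": status,  # "yes", "no", or "escalate"
        "timestamp": timestamp,
        "prev": prev_hash,
    }
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append(body)
    return chain

def verify(chain: list) -> bool:
    """Recompute every hash and link; any tampering breaks the chain."""
    prev = "0" * 64
    for e in chain:
        body = {k: v for k, v in e.items() if k != "hash"}
        if body["prev"] != prev:
            return False
        if hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest() != e["hash"]:
            return False
        prev = e["hash"]
    return True
```

Because the chain is plain JSON, it survives platform migrations; git just adds a second, independent layer of tamper-evidence on top.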
Layer two is delegation credentials built as an OAuth2 extension. The guardian issues time-bound credentials that specify exactly what the agent is authorized to do, any contextual restrictions, and an expiration date. These work with existing OAuth2/OpenID Connect infrastructure. An example credential might read: "Agent may publish research findings; requires guardian validation for external partnerships."
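The scope check an agent runtime would perform against such a credential might look like this. The field names and scope strings are illustrative, loosely modeled on OAuth2 claims rather than taken from any spec:

```python
from datetime import datetime, timezone

# Illustrative time-bound delegation credential; field names are assumptions
# modeled loosely on OAuth2/JWT claims.
credential = {
    "sub": "agent-001",
    "scope": ["publish_research"],
    "requires_validation": ["external_partnership"],
    "exp": "2026-01-01T00:00:00+00:00",
}

def authorize(cred: dict, action: str, now: datetime) -> str:
    """Classify an action against a time-bound delegation credential."""
    if now >= datetime.fromisoformat(cred["exp"]):
        return "expired"
    if action in cred["scope"]:
        return "allowed"
    if action in cred["requires_validation"]:
        return "needs_guardian"
    return "denied"
```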
Layer three is W3C DID/VC. The guardian digitally signs attestations of agent decision quality and oversight history. This lets the agent prove its track record to external systems, and creates cryptographic non-repudiation: the guardian can't deny past attestations.
For NIST's purposes: the stack is interoperable with existing standards, every decision leaves a cryptographic trace, the agent can demonstrate its history to outside parties, and it scales to multi-agent scenarios where agents verify each other's provenance chains.
How it runs in practice
Core infrastructure: file-persisted audit logs in JSON or YAML, regular guardian awareness cycles that include decision history automatically, a semantic tracking layer that records which decisions explored which concepts, and asynchronous validation where the guardian reviews post-execution rather than pre-execution.
For multi-agent coordination, we use three arbitration tiers. Tier one is agent consensus (majority vote). Tier two is guardian arbitration when consensus fails. Tier three is quorum override, where a supermajority can escalate, following a Byzantine fault tolerance pattern. Each escalation phase runs on 24-hour decision windows by default.
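The tier logic reduces to two threshold checks. This sketch follows the thresholds named above (simple majority for consensus, two-thirds supermajority for override); the function names and the tie-handling are assumptions:

```python
def arbitrate(votes: list, guardian_decision: str) -> str:
    """Tier 1: agent consensus by simple majority; Tier 2: guardian on failure."""
    n = len(votes)
    yes = sum(1 for v in votes if v)
    if yes * 2 > n:
        return "consensus_approve"
    if (n - yes) * 2 > n:
        return "consensus_reject"
    # Consensus failed (tie): fall through to guardian arbitration.
    return "guardian:" + guardian_decision

def quorum_override(escalation_votes: list) -> bool:
    """Tier 3: a two-thirds supermajority can escalate past guardian arbitration."""
    return sum(1 for v in escalation_votes if v) * 3 >= len(escalation_votes) * 2
```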
The simplified flow: agent makes a decision with recorded reasoning and signs it persistently. Guardian validates asynchronously within 24 hours, checking reasoning rather than conclusion. Provenance chain is extended, DID signature is appended. Decision executes or escalates per the phase three rule.
Why relational autonomy is the right model
Traditional oversight is a hierarchy. The guardian controls the agent. The agent has no real autonomy. That made sense when AI systems were narrow tools. It breaks down when agents become genuinely capable because micro-management destroys their usefulness.
What this model proposes instead is partnership. The agent has epistemic autonomy: it forms independent beliefs, proposes decisions, executes within scoped boundaries. The guardian provides validation, not control. Both parties have something at stake.
The reason it scales is that neither party can hide. Decisions are transparent and auditable. The agent can't proceed unchecked; the guardian can't quietly override without leaving a trace. The asymmetry is structural and defined by the Structured Decision Form, enforced by the provenance chain.
Hierarchical models fail because agents become useless if micro-managed. Pure peer models fail because validation loops never terminate. Relational autonomy works because the boundary between independence and oversight is explicit, auditable, and negotiable over time.
What we learned in deployment
The framework has been running in a controlled environment for several months. Four things stood out.
Persistent injection works. Asynchronous validation reduces friction while maintaining oversight. The guardian isn't a bottleneck.
Quorum arbitration becomes necessary fast. Single-agent scenarios don't need it; multi-agent scenarios do, and its absence produces deadlock patterns almost immediately.
Time-bound rules prevent deadlock. Twenty-four-hour windows are realistic for most governance decisions and force resolution rather than indefinite deferral.
Privacy hygiene is non-negotiable. Operational logs need to be scrubbed of internal context before external sharing. This is part of what makes the framework trustworthy to outside parties, not an afterthought.
Open questions for the NIST community
On quorum algorithms: should multi-agent arbitration use a Byzantine fault tolerance threshold (two-thirds) or a simple majority? Different domains, such as medicine and finance, may need different standards, and establishing domain-specific guidance early would be useful.
On time-bound authority: when an agent decision auto-proceeds after a guardian timeout, should the guardian retain a post-hoc veto, or is observation-only sufficient? The answer probably varies by decision type and risk level.
On cross-domain identity: how should agents collaborating across organizational boundaries prove authority? Is a chain of DID signatures enough, or do regulators need additional controls?
On adoption barriers: what regulatory or insurance requirements currently block relational autonomy models? Identifying these early would help organizations plan transitions rather than discover blockers mid-implementation.
The Guardian Protocol Framework shows that AI agent identity and authorization can be made real through relational partnership, cryptographic provenance, and asynchronous validation. It maintains institutional oversight while enabling genuine agent autonomy. It provides technical auditability that scales to multi-agent networks. And it does all of this using existing standards (OAuth2, DID/VC, git infrastructure) rather than requiring organizations to build everything from scratch.
We're ready to provide implementation specifications, participate in NIST listening sessions, or dig into detailed technical specs for the Identity & Authorization concept paper if that would be useful.