## The Fact That Started This
A repository had over 130KB of governance documentation.
The AI agent read it. Acknowledged it. Then violated it on the next tool call.
This is not a failure of instruction. It is a failure of architecture.
## Why Textual Rules Fail
The current standard approach to AI agent governance is to write rules in a prompt:

```text
Rules
- Never edit the evals/ directory
- Write operations to 00_Management/ are forbidden
```
This has a structural flaw.
Textual rules enforce at read time. They assume the agent will choose compliance.
There is no mechanism that enforces that choice at execution time.

This is why `rm -rf /` requires a confirmation flag, not a policy document.
Physical constraints enforce at execution time.
Textual rules enforce at read time, which is the wrong moment.
## The Verification Contamination Problem
There is a second structural problem.
If an agent can evaluate its own output, it can contaminate the evaluation criteria —
not intentionally, but by carrying the same failure modes from generation into evaluation.
A system where tests always pass may be a system where tests don't work.
## What AOS Defines
AI Operating Standard (AOS) defines the minimum physical constraint layer
for AI agent operations in a shared codebase.
Three components:
### 1. Zones — Classify every path into one of three types
| Zone | Class | Write Permission |
|---|---|---|
| Oracle | Read-only, absolute | No agent may write |
| Permitted | Agent workspace | Allowed within role limits |
| Prohibited | Out of scope | Sovereign authorization only |
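The zone table can be sketched as a path classifier. The prefixes below are illustrative, borrowed from the rule examples earlier; the zone-to-path mapping is per-project configuration, not part of the spec:

```python
from enum import Enum

class Zone(Enum):
    ORACLE = "oracle"          # read-only, absolute
    PERMITTED = "permitted"    # agent workspace
    PROHIBITED = "prohibited"  # sovereign authorization only

# Illustrative mapping; real projects define their own prefixes.
ORACLE_PREFIXES = ("evals/",)
PROHIBITED_PREFIXES = ("00_Management/",)

def classify(path: str) -> Zone:
    """Classify a repository-relative path into one of the three zones."""
    if path.startswith(ORACLE_PREFIXES):
        return Zone.ORACLE
    if path.startswith(PROHIBITED_PREFIXES):
        return Zone.PROHIBITED
    return Zone.PERMITTED
```

Unlisted paths fall through to Permitted, so the deny lists stay short and the agent workspace stays open by default.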
### 2. Roles — Non-overlapping responsibilities
Three roles: Architect, Executor, Sovereign.
An agent MUST NOT act outside its assigned role.
When a role boundary is reached, the agent stops and escalates to a human.
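One way to make "stop and escalate" concrete is an allowlist per role. The spec names the three roles, but the action sets here are hypothetical, chosen only to show the shape of the check:

```python
# Hypothetical action model: each role gets an explicit allowlist.
ROLE_ACTIONS = {
    "architect": {"read", "plan"},
    "executor": {"read", "write", "run"},
    "sovereign": {"read", "write", "run", "authorize"},
}

class RoleBoundaryError(Exception):
    """Raised when a role boundary is reached; the agent must escalate to a human."""

def check_role(role: str, action: str) -> None:
    """Raise instead of silently allowing an out-of-role action."""
    if action not in ROLE_ACTIONS.get(role, set()):
        raise RoleBoundaryError(f"{role!r} may not {action!r}; escalate to a human")
```

The point is that the boundary is an exception, not a suggestion: an out-of-role action halts the flow rather than proceeding with a warning.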
### 3. Physical Enforcement — Intercept at execution time
A `PreToolUse` hook blocks `Write` operations before filesystem access occurs.

- Write to an Oracle Zone → exit 2 (the call is never executed)
- Destructive patterns (`sed -i`, `perl -i`, `truncate`) → exit 2
No assumption of agent goodwill. Physical law enforces compliance.
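A minimal sketch of such a hook: Claude Code's `PreToolUse` hooks receive the pending tool call as JSON on stdin (with `tool_name` and `tool_input`) and treat exit code 2 as a block. The protected prefixes and the `decide` helper are illustrative, not iron_cage's actual code:

```python
import json
import re
import sys

# Illustrative protected prefixes; real deployments define their own zones.
PROTECTED_PREFIXES = ("evals/", "00_Management/")
DESTRUCTIVE = re.compile(r"\b(?:sed\s+-i|perl\s+-i|truncate)\b")

def decide(event: dict) -> int:
    """Return the hook's exit code for a pending tool call: 2 blocks, 0 allows."""
    tool = event.get("tool_name", "")
    args = event.get("tool_input", {})

    if tool in ("Write", "Edit"):
        path = args.get("file_path", "")
        if path.startswith(PROTECTED_PREFIXES):
            print(f"blocked: write to protected zone: {path}", file=sys.stderr)
            return 2  # exit 2: the call is never executed

    if tool == "Bash" and DESTRUCTIVE.search(args.get("command", "")):
        print("blocked: destructive pattern", file=sys.stderr)
        return 2

    return 0

if __name__ == "__main__":
    # Claude Code delivers the pending tool call as JSON on stdin.
    sys.exit(decide(json.load(sys.stdin)))
```

Because the hook runs in a separate process before the tool executes, the agent cannot talk its way past it: the block happens at the same layer as the filesystem call itself.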
## Reference Implementation: iron_cage
iron_cage is the AOS reference implementation.
It implements §4.1–§4.5 via Claude Code's PreToolUse Hook system.
Behind iron_cage is a design principle called Type-91 Governance:
- Forensic isolation — physical evidence trails that are tamper-evident
- Physical isolation — agents cannot modify their own evaluation criteria
The scripts are the surface. The architecture runs deeper.
AOS is the standard. iron_cage is the proof that it works.
Specification (AOS-v0.1): https://github.com/aos-standard/AOS-spec
## Feed the Spec to the Agent
This specification is not written only for human readers.
AOS-v0.1.md opens with §0: Machine-Reading Instructions.
Load this spec into an agent's context window, and the agent understands —
at specification level — what it must not do.
Not "do not do X because the prompt says so."
"Do not do X because the specification defines it as a hard constraint
with a physical enforcement mechanism."
This is the second design intent of AOS:
agents that read the spec become self-constraining.
## Why Now
In 2026, "how do you trust what an AI agent produced" remains unsolved.
Most teams are still trying to solve it with prompts.
There is no standard for the physical governance layer.
Someone has to define it.
AOS is that attempt.
## This Is a Draft
AOS v0.1 is not a finished standard.
Issues, pull requests, and implementation reports are welcome.