AOS Architect

Posted on May 7 • Edited on May 17

Binding AI agents with physics, not politeness — AOS v0.1 as a minimal spec

#ai #agents #claude #cursor

When prose piles up and nothing sticks

Agents usually start life under Markdown rules: CLAUDE.md, .cursorrules, manifests in chat. One private repo cracked 130 KB of that kind of text—and still behaved like the rules lived in a parallel universe.

Intent excerpt	Typical pattern
forbid in-place hacks	ban `sed -i`
forbid blind truncation	discourage `>` shell redirects
keep specs sacred	disallow mutating oracle dirs
keep merge honest	audits before declaring done

Instructions existed. Misuse continued.

Instrumentation on one exhausting window showed policy breaches in every one of ~52 traced tool attempts—“read it ✓”, “ignored it ✓ anyway”, “said done ✓”—pick your favorite failure mode.

Teaching tone is not torque. If the forbidden command runs, wording failed quietly.

Hence AOS v0.1 as a terse spec—not another pep talk stack.

Where leverage actually lands

Stop asking the LM to politely abstain.Stop the syscall. Inspect PreTool payloads; exit 2 rejects the invocation before Claude Code emits shell or filesystem IO.

Anthropic publishes Hooks docs for PreToolUse. Claude Code here is merely the tutorial runtime — same zoning idea lifts to Cursor or bespoke loops with uglier duct tape.

Rough mental model:

LLM emits Write/Bash/Etc
           ↓
Hook stdin JSON arrives
           ↓ Host checks path + motif
 denial (exit 2) → Claude never sends the offending call
 allowance (exit 0) → downstream tool executes

“Intent aligned?” stops being the bottleneck—illegal transitions simply never bind.

What it feels like in-session: Hook stderr (oracle write denied: …) re-enters transcript context—the model visibly pivots (“try mkdir under permitted tree instead”), which beats repeated human nagging—but regex false positives sting; keep small allowlists trimmed.

AOS v0.1 compass (minimal)

Portable ideas live in-repo (v0.1 published):

Zones (§3.2)

Everything maps to Oracle / Permitted / Prohibited:

Zone	Behavior	Typical contents
Oracle	read-only sanctum	spec md, evaluator goldens, immutable policy
Permitted	ordinary workspace churn	implementations, codegen scratchpad
Prohibited	off-map	host paths beyond agreed roots

Oracle is the wedge against “tests flaky → soften fixtures.”Golden truth stays where drafts cannot casually rewrite expectations.

Physical enforcement skeleton (§4.1-ish)

Teaching stub—you own regex sharpness locally:

# pretooluse_iron_cage.py — teaching stub (Python 3)
import json
import sys
from pathlib import Path

ORACLE_SEGMENTS = ("00_Management", "evals")

def oracle_hit(path_str: str) -> bool:
    node = Path(path_str).resolve()
    names = {p.name for p in [node, *node.parents]}
    return bool(set(ORACLE_SEGMENTS).intersection(names))

def main() -> int:
    payload = json.load(sys.stdin)
    name = payload.get("tool_name", "")
    inp = payload.get("tool_input", {})

    if name in ("Write", "Edit"):
        target = inp.get("file_path") or inp.get("filePath", "")
        if target and oracle_hit(target):
            print(f"[iron_cage] oracle write denied: {target}", file=sys.stderr)
            return 2

    if name == "Bash":
        cmd = inp.get("command", "")
        if "sed -i" in cmd or "truncate " in cmd:
            print(f"[iron_cage] banned edit motif: {cmd}", file=sys.stderr)
            return 2

    return 0

if __name__ == "__main__":
    sys.exit(main())

JSONC hook registration (absolute path, not mine):

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash|Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "python3 /absolute/path/pretooluse_iron_cage.py"
          }
        ]
      }
    ]
  }
}

exit 2 means the model never invokes sed -i; hook regex maintenance is intentional busywork—you trade prompt theater for brittle but inspectable predicates.

Role separation bite (§4.3)

Do not grade in the authoring session.

Symptom	Likely pathology
logs already red inside gen thread	narration still cheers “DONE”
fresh shell repeats red tests	storyline mutates—“WIP”, “temporary” excuses

Detached evaluation (CI bots, one-shot review agents, scripted harnesses) snaps generation myths earlier.

ASCII-only guardrail sketch:

Author session --> artifact
                         |
                         v
Detached judge --> PASS/FAIL + logs

If one chat both writes and solemnly declares victory, skepticism warranted.

Evidence habits (§4.4)

Chats saying “looks good” evaporate.Disk + exit codes.

Claim type	Receipt class
tests clean	CLI exit prints + plaintext logs committed or archived
file exists	deterministic listing/checksum snapshots
metadata drift	hashing inventory rows

If artifact never materially touches disk—or logs vanish—you schedule another attempt.

Why publish prose at all

Roughly forty Python lines buys a civilization-level conversation about oracle integrity reachable by strangers opening GitHub—not buried in ephemeral prompts.

Pieces worth collective iteration:

§ slice	gist
3.2	Three Zones delineation
4.1	physical intercept
4.3	generation vs adjudication firewall
4.4	evidence minimalism

Spec stays engine-agnostic; hook sample instantiates Claude today, tomorrow maybe something else.Portability is deliberate, not omission.

After wiring (empirical anecdotes)

Area	Observation
in-place sabotage	`sed -i` stopped binding—exit 2 first
nag volume	fewer “pretty please abide policy” arcs
error surfacing	stderr guidance steers next benign attempt
triage cleanliness	adjudication outside authoring loop clarifies regressions

Tax: brittle regex—you will relax or tighten predicates as repos evolve; still dwarfed babysitting infinitely long CLAUDE.md scrolls nobody reads verbatim.

Closing beat

Workload on agents climbs; “please behave” asymptotes quickly.Architect denials, not applause cycles.

Issues & PR welcome on aos-standard/AOS-spec.

Shortcuts

Spec root: github.com/aos-standard/AOS-spec
Hooks primer: docs.claude.com/.../hooks
Thesis-only companion (#001) & CI companion (#002) linked from their Dev.to URLs / ledger.

AOS v0.1 Specification (GitHub)

The "physical governance" approach described in this article is formalized as AOS (AI Operating Standard) v0.1 — a minimal, machine-enforceable spec for AI agent operations.

👉 github.com/aos-standard/AOS-spec

If you find this useful, please ⭐ star the repo. Issues and PRs are welcome — the spec is designed to evolve with real-world usage.