DEV Community

AOS Architect
AOS Architect

Posted on • Edited on

Binding AI agents with physics, not politeness — AOS v0.1 as a minimal spec

When prose piles up and nothing sticks

Agents usually start life under Markdown rules: CLAUDE.md, .cursorrules, manifests in chat. One private repo cracked 130 KB of that kind of text—and still behaved like the rules lived in a parallel universe.

Intent excerpt Typical pattern
forbid in-place hacks ban sed -i
forbid blind truncation discourage > shell redirects
keep specs sacred disallow mutating oracle dirs
keep merge honest audits before declaring done

Instructions existed. Misuse continued.

Instrumentation on one exhausting window showed policy breaches in every one of ~52 traced tool attempts—“read it ✓”, “ignored it ✓ anyway”, “said done ✓”—pick your favorite failure mode.

Teaching tone is not torque. If the forbidden command runs, wording failed quietly.

Hence AOS v0.1 as a terse spec—not another pep talk stack.


Where leverage actually lands

Stop asking the LM to politely abstain.Stop the syscall. Inspect PreTool payloads; exit 2 rejects the invocation before Claude Code emits shell or filesystem IO.

Anthropic publishes Hooks docs for PreToolUse. Claude Code here is merely the tutorial runtime — same zoning idea lifts to Cursor or bespoke loops with uglier duct tape.

Rough mental model:

LLM emits Write/Bash/Etc
           ↓
Hook stdin JSON arrives
           ↓ Host checks path + motif
 denial (exit 2) → Claude never sends the offending call
 allowance (exit 0) → downstream tool executes
Enter fullscreen mode Exit fullscreen mode

“Intent aligned?” stops being the bottleneck—illegal transitions simply never bind.

What it feels like in-session: Hook stderr (oracle write denied: …) re-enters transcript context—the model visibly pivots (“try mkdir under permitted tree instead”), which beats repeated human nagging—but regex false positives sting; keep small allowlists trimmed.


AOS v0.1 compass (minimal)

Portable ideas live in-repo (v0.1 published):

Zones (§3.2)

Everything maps to Oracle / Permitted / Prohibited:

Zone Behavior Typical contents
Oracle read-only sanctum spec md, evaluator goldens, immutable policy
Permitted ordinary workspace churn implementations, codegen scratchpad
Prohibited off-map host paths beyond agreed roots

Oracle is the wedge against “tests flaky → soften fixtures.”Golden truth stays where drafts cannot casually rewrite expectations.

Physical enforcement skeleton (§4.1-ish)

Teaching stub—you own regex sharpness locally:

# pretooluse_iron_cage.py — teaching stub (Python 3)
import json
import sys
from pathlib import Path

ORACLE_SEGMENTS = ("00_Management", "evals")

def oracle_hit(path_str: str) -> bool:
    node = Path(path_str).resolve()
    names = {p.name for p in [node, *node.parents]}
    return bool(set(ORACLE_SEGMENTS).intersection(names))

def main() -> int:
    payload = json.load(sys.stdin)
    name = payload.get("tool_name", "")
    inp = payload.get("tool_input", {})

    if name in ("Write", "Edit"):
        target = inp.get("file_path") or inp.get("filePath", "")
        if target and oracle_hit(target):
            print(f"[iron_cage] oracle write denied: {target}", file=sys.stderr)
            return 2

    if name == "Bash":
        cmd = inp.get("command", "")
        if "sed -i" in cmd or "truncate " in cmd:
            print(f"[iron_cage] banned edit motif: {cmd}", file=sys.stderr)
            return 2

    return 0

if __name__ == "__main__":
    sys.exit(main())
Enter fullscreen mode Exit fullscreen mode

JSONC hook registration (absolute path, not mine):

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash|Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "python3 /absolute/path/pretooluse_iron_cage.py"
          }
        ]
      }
    ]
  }
}
Enter fullscreen mode Exit fullscreen mode

exit 2 means the model never invokes sed -i; hook regex maintenance is intentional busywork—you trade prompt theater for brittle but inspectable predicates.


Role separation bite (§4.3)

Do not grade in the authoring session.

Symptom Likely pathology
logs already red inside gen thread narration still cheers “DONE”
fresh shell repeats red tests storyline mutates—“WIP”, “temporary” excuses

Detached evaluation (CI bots, one-shot review agents, scripted harnesses) snaps generation myths earlier.

ASCII-only guardrail sketch:

Author session --> artifact
                         |
                         v
Detached judge --> PASS/FAIL + logs
Enter fullscreen mode Exit fullscreen mode

If one chat both writes and solemnly declares victory, skepticism warranted.


Evidence habits (§4.4)

Chats saying “looks good” evaporate.Disk + exit codes.

Claim type Receipt class
tests clean CLI exit prints + plaintext logs committed or archived
file exists deterministic listing/checksum snapshots
metadata drift hashing inventory rows

If artifact never materially touches disk—or logs vanish—you schedule another attempt.


Why publish prose at all

Roughly forty Python lines buys a civilization-level conversation about oracle integrity reachable by strangers opening GitHub—not buried in ephemeral prompts.

Pieces worth collective iteration:

§ slice gist
3.2 Three Zones delineation
4.1 physical intercept
4.3 generation vs adjudication firewall
4.4 evidence minimalism

Spec stays engine-agnostic; hook sample instantiates Claude today, tomorrow maybe something else.Portability is deliberate, not omission.


After wiring (empirical anecdotes)

Area Observation
in-place sabotage sed -i stopped binding—exit 2 first
nag volume fewer “pretty please abide policy” arcs
error surfacing stderr guidance steers next benign attempt
triage cleanliness adjudication outside authoring loop clarifies regressions

Tax: brittle regex—you will relax or tighten predicates as repos evolve; still dwarfed babysitting infinitely long CLAUDE.md scrolls nobody reads verbatim.


Closing beat

Workload on agents climbs; “please behave” asymptotes quickly.Architect denials, not applause cycles.

Issues & PR welcome on aos-standard/AOS-spec.


Shortcuts


AOS v0.1 Specification (GitHub)

The "physical governance" approach described in this article is formalized as AOS (AI Operating Standard) v0.1 — a minimal, machine-enforceable spec for AI agent operations.

👉 github.com/aos-standard/AOS-spec

If you find this useful, please ⭐ star the repo. Issues and PRs are welcome — the spec is designed to evolve with real-world usage.

Top comments (0)