When prose piles up and nothing sticks
Agents usually start life under Markdown rules: CLAUDE.md, .cursorrules, manifests in chat. One private repo cracked 130 KB of that kind of text—and still behaved like the rules lived in a parallel universe.
| Intent excerpt | Typical pattern |
|---|---|
| forbid in-place hacks | ban sed -i |
| forbid blind truncation | discourage > shell redirects |
| keep specs sacred | disallow mutating oracle dirs |
| keep merge honest | audits before declaring done |
Instructions existed. Misuse continued.
Instrumentation on one exhausting window showed policy breaches in every one of ~52 traced tool attempts—“read it ✓”, “ignored it ✓ anyway”, “said done ✓”—pick your favorite failure mode.
Teaching tone is not torque. If the forbidden command runs, wording failed quietly.
Hence AOS v0.1 as a terse spec—not another pep talk stack.
Where leverage actually lands
Stop asking the LM to politely abstain. Stop the syscall. Inspect PreToolUse payloads; exit 2 rejects the invocation before Claude Code performs any shell or filesystem I/O.
Anthropic publishes Hooks docs for PreToolUse. Claude Code here is merely the tutorial runtime — same zoning idea lifts to Cursor or bespoke loops with uglier duct tape.
Rough mental model:
LLM emits Write/Bash/Etc
↓
Hook stdin JSON arrives
↓ Host checks path + motif
denial (exit 2) → Claude Code blocks the offending call before it runs; stderr returns to the model
allowance (exit 0) → downstream tool executes
“Intent aligned?” stops being the bottleneck—illegal transitions simply never bind.
What it feels like in-session: hook stderr (oracle write denied: …) re-enters the transcript context, and the model visibly pivots (“try mkdir under permitted tree instead”), which beats repeated human nagging. Regex false positives still sting, so keep the patterns tight and any allowlist small.
AOS v0.1 compass (minimal)
Portable ideas live in-repo (v0.1 published):
Zones (§3.2)
Everything maps to Oracle / Permitted / Prohibited:
| Zone | Behavior | Typical contents |
|---|---|---|
| Oracle | read-only sanctum | spec md, evaluator goldens, immutable policy |
| Permitted | ordinary workspace churn | implementations, codegen scratchpad |
| Prohibited | off-map | host paths beyond agreed roots |
Oracle is the wedge against “tests flaky → soften fixtures.” Golden truth stays where drafts cannot casually rewrite expectations.
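To make the three-way split concrete, here is a minimal sketch of a zone classifier. It assumes a single agreed project root and reuses the Oracle directory names from the hook stub; `Zone` and `classify` are illustrative names, not part of the spec.

```python
from enum import Enum
from pathlib import Path


class Zone(Enum):
    ORACLE = "oracle"
    PERMITTED = "permitted"
    PROHIBITED = "prohibited"


# Assumption: directory names that mark Oracle territory inside the root.
ORACLE_SEGMENTS = frozenset({"00_Management", "evals"})


def classify(path: str, root: str) -> Zone:
    """Map a path to a zone relative to an agreed project root."""
    root_path = Path(root).resolve()
    target = Path(path)
    if not target.is_absolute():
        target = root_path / target
    # resolve() collapses ".." so traversal cannot dodge the check.
    target = target.resolve()
    try:
        relative = target.relative_to(root_path)
    except ValueError:
        # Outside the agreed root: off-map.
        return Zone.PROHIBITED
    if ORACLE_SEGMENTS.intersection(relative.parts):
        return Zone.ORACLE
    return Zone.PERMITTED
```

Resolving before comparing is the design choice that matters here: a path such as src/../evals/golden.json is judged by where it lands, not by how it is spelled.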
Physical enforcement skeleton (§4.1-ish)
Teaching stub—you own regex sharpness locally:
# pretooluse_iron_cage.py: teaching stub (Python 3)
import json
import sys
from pathlib import Path

# Directory names treated as Oracle (read-only) territory.
ORACLE_SEGMENTS = ("00_Management", "evals")


def oracle_hit(path_str: str) -> bool:
    """Return True if the path or any ancestor directory is an Oracle segment."""
    node = Path(path_str).resolve()
    names = {p.name for p in [node, *node.parents]}
    return bool(set(ORACLE_SEGMENTS).intersection(names))


def main() -> int:
    # The pending tool call arrives as JSON on stdin.
    payload = json.load(sys.stdin)
    name = payload.get("tool_name", "")
    inp = payload.get("tool_input", {})

    if name in ("Write", "Edit"):
        target = inp.get("file_path") or inp.get("filePath", "")
        if target and oracle_hit(target):
            print(f"[iron_cage] oracle write denied: {target}", file=sys.stderr)
            return 2  # exit 2 blocks the call; stderr goes back to the model

    if name == "Bash":
        cmd = inp.get("command", "")
        # Plain substring checks: crude and prone to false positives.
        if "sed -i" in cmd or "truncate " in cmd:
            print(f"[iron_cage] banned edit motif: {cmd}", file=sys.stderr)
            return 2

    return 0  # exit 0 lets the tool call proceed


if __name__ == "__main__":
    sys.exit(main())
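Before registering a hook, it helps to pipe a few sample payloads through it and check the exit codes. The sketch below is one way to do that, assuming the same payload shape the stub reads (`tool_name`, `tool_input`). `run_hook`, `STAND_IN_HOOK`, and `CASES` are hypothetical names, and the stand-in hook exists only so the example runs on its own (Python 3.9+).

```python
import json
import subprocess
import sys


def run_hook(hook_cmd: list[str], payload: dict) -> tuple[int, str]:
    """Pipe one payload into a hook process; return (exit_code, stderr)."""
    proc = subprocess.run(
        hook_cmd,
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=10,
    )
    return proc.returncode, proc.stderr.strip()


# Stand-in hook so the sketch is self-contained. To exercise a real file,
# pass something like ["python3", "/absolute/path/pretooluse_iron_cage.py"].
STAND_IN_HOOK = [
    sys.executable,
    "-c",
    "import json, sys\n"
    "p = json.load(sys.stdin)\n"
    "cmd = p.get('tool_input', {}).get('command', '')\n"
    "if 'sed -i' in cmd:\n"
    "    print('banned edit motif', file=sys.stderr)\n"
    "    sys.exit(2)\n"
    "sys.exit(0)\n",
]

# (payload, expected exit code) pairs.
CASES = [
    ({"tool_name": "Bash", "tool_input": {"command": "sed -i s/a/b/ x.txt"}}, 2),
    ({"tool_name": "Bash", "tool_input": {"command": "ls -la"}}, 0),
]

if __name__ == "__main__":
    for payload, expected in CASES:
        code, err = run_hook(STAND_IN_HOOK, payload)
        status = "ok" if code == expected else "MISMATCH"
        print(f"{status}: exit={code} stderr={err!r}")
```

Keeping the cases in a plain list means every false positive or miss you hit in a session can be added as a regression row.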
JSON hook registration (swap in your own absolute path):
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash|Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "python3 /absolute/path/pretooluse_iron_cage.py"
          }
        ]
      }
    ]
  }
}
exit 2 means the sed -i call never executes, even though the model asked for it. Hook regex maintenance is intentional busywork: you trade prompt theater for brittle but inspectable predicates.
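One way to cut false positives from plain substring checks is to tokenize the command first, so that a quoted mention such as grep "sed -i" is not mistaken for an invocation. The sketch below uses the standard-library `shlex` lexer (Python 3.8+ for `punctuation_chars` combined with `whitespace_split`). `tokenize` and `banned_motif` are hypothetical names, and this remains a heuristic rather than a shell parser: wrappers like sudo or env slip past it, and a quoted lone ">" still trips it.

```python
import shlex
from typing import List, Optional

# Characters that form shell control operators (;, &&, ||, |, parentheses).
SEPARATOR_CHARS = set("();|&")


def tokenize(cmd: str) -> List[str]:
    """Split a command into words and operator tokens."""
    lex = shlex.shlex(cmd, posix=True, punctuation_chars=True)
    lex.whitespace_split = True
    lex.commenters = ""  # inspect text after '#' too; errs toward denial
    return list(lex)


def banned_motif(cmd: str) -> Optional[str]:
    """Return a reason string if the command should be denied, else None."""
    try:
        tokens = tokenize(cmd)
    except ValueError:
        # Unbalanced quotes and similar: fail closed.
        return "unparseable command"
    at_command_start = True
    in_sed = False
    for i, tok in enumerate(tokens):
        if tok and set(tok) <= SEPARATOR_CHARS:
            # A control operator starts a new simple command.
            at_command_start, in_sed = True, False
            continue
        if tok in (">", ">|"):
            target = tokens[i + 1] if i + 1 < len(tokens) else ""
            if target != "/dev/null":
                return f"truncating redirect to {target or '<missing>'}"
            continue
        if at_command_start:
            if tok == "truncate":
                return "truncate call"
            in_sed = tok == "sed"
            at_command_start = False
            continue
        if in_sed and (tok.startswith("-i") or tok.startswith("--in-place")):
            return "sed in-place edit"
    return None
```

Inside the hook, a non-None result would be printed to stderr before returning 2, so the model sees why the call was refused.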
Role separation bite (§4.3)
Do not grade in the authoring session.
| Symptom | Likely pathology |
|---|---|
| logs already red inside gen thread | narration still cheers “DONE” |
| fresh shell repeats red tests | storyline mutates—“WIP”, “temporary” excuses |
Detached evaluation (CI bots, one-shot review agents, scripted harnesses) snaps generation myths earlier.
ASCII-only guardrail sketch:
Author session --> artifact
|
v
Detached judge --> PASS/FAIL + logs
If one chat both writes and solemnly declares victory, skepticism is warranted.
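A detached judge can be as small as a script that runs the test command in a fresh process and writes its verdict to disk. The following is a minimal sketch under stated assumptions: `judge`, the log file names, and the verdict fields are illustrative choices, not something the spec prescribes.

```python
import json
import subprocess
import sys
import time
from pathlib import Path


def judge(test_cmd: list[str], workdir: str, log_dir: str) -> bool:
    """Run the test command in a fresh process; persist logs and a verdict."""
    out_dir = Path(log_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    proc = subprocess.run(test_cmd, cwd=workdir, capture_output=True, text=True)
    passed = proc.returncode == 0
    # Raw output is kept verbatim so the verdict can be audited later.
    (out_dir / "stdout.log").write_text(proc.stdout)
    (out_dir / "stderr.log").write_text(proc.stderr)
    verdict = {
        "command": test_cmd,
        "exit_code": proc.returncode,
        "verdict": "PASS" if passed else "FAIL",
        "finished_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    (out_dir / "verdict.json").write_text(json.dumps(verdict, indent=2))
    return passed


# Example call from CI or a separate shell (hypothetical test command):
# judge([sys.executable, "-m", "pytest", "-q"], workdir=".", log_dir="judge_logs")
```

The verdict comes from the exit code alone, so nothing the authoring session narrates can change it.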
Evidence habits (§4.4)
Chats saying “looks good” evaporate. Disk + exit codes.
| Claim type | Receipt class |
|---|---|
| tests clean | CLI exit prints + plaintext logs committed or archived |
| file exists | deterministic listing/checksum snapshots |
| metadata drift | hashing inventory rows |
If an artifact never materially touches disk, or its logs vanish, you schedule another attempt.
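For the "file exists" and "metadata drift" rows, a checksum inventory is one possible receipt. The sketch below assumes SHA-256 and a flat dictionary keyed by relative path; `sha256_of`, `snapshot`, and `drift` are hypothetical helper names (Python 3.9+).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large artifacts do not load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root: str) -> dict[str, dict]:
    """Deterministic inventory: relative path -> size and SHA-256."""
    base = Path(root)
    rows = {}
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        rel = path.relative_to(base).as_posix()
        rows[rel] = {"bytes": path.stat().st_size, "sha256": sha256_of(path)}
    return rows


def drift(before: dict, after: dict) -> dict[str, list[str]]:
    """Compare two snapshots and list added, removed, and changed paths."""
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "changed": sorted(
            k for k in before.keys() & after.keys() if before[k] != after[k]
        ),
    }
```

Dumping a snapshot to JSON before and after a run, then committing or archiving both, turns "the file is there" into something a later reader can check.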
Why publish prose at all
Roughly forty lines of Python buy a civilization-level conversation about oracle integrity, one that strangers can reach by opening GitHub instead of one buried in ephemeral prompts.
Pieces worth collective iteration:
| § slice | gist |
|---|---|
| 3.2 | Three Zones delineation |
| 4.1 | physical intercept |
| 4.3 | generation vs adjudication firewall |
| 4.4 | evidence minimalism |
Spec stays engine-agnostic; hook sample instantiates Claude today, tomorrow maybe something else. Portability is deliberate, not an omission.
After wiring (empirical anecdotes)
| Area | Observation |
|---|---|
| in-place sabotage | sed -i stopped binding: exit 2 lands first |
| nag volume | fewer “pretty please abide policy” arcs |
| error surfacing | stderr guidance steers next benign attempt |
| triage cleanliness | adjudication outside authoring loop clarifies regressions |
Tax: brittle regex. You will relax or tighten predicates as repos evolve, but that cost is still dwarfed by babysitting infinitely long CLAUDE.md scrolls that nobody reads verbatim.
Closing beat
Workload on agents climbs; “please behave” asymptotes quickly. Architect denials, not applause cycles.
Issues & PR welcome on aos-standard/AOS-spec.
Shortcuts
- Spec root: github.com/aos-standard/AOS-spec
- Hooks primer: docs.claude.com/.../hooks
- Thesis-only companion (#001) & CI companion (#002) linked from their Dev.to URLs / ledger.
AOS v0.1 Specification (GitHub)
The "physical governance" approach described in this article is formalized as AOS (AI Operating Standard) v0.1 — a minimal, machine-enforceable spec for AI agent operations.
👉 github.com/aos-standard/AOS-spec
If you find this useful, please ⭐ star the repo. Issues and PRs are welcome — the spec is designed to evolve with real-world usage.