AOS Architect

Binding AI agents with physics, not politeness — AOS v0.1 as a minimal spec

Why text rules are not enough

When you put an LLM agent to work, the first thing you usually do is write rules.

CLAUDE.md, .cursorrules, AGENTS.md, system prompt — the names vary. What matters is that you spell out, in natural language, what the agent may and must not do.

In a private repo I run, those “policy files” total over 130 KB. The intent looks like this:

  • Do not use sed -i (go through diff review)
  • Do not rewrite files with shell redirection (> file)
  • Do not write into spec directories (read-only)
  • Always run structural audits before commit

It is all written down. Violations still happen.

At one point I traced the session logs: 52 out of 52 tool invocations contained rule violations, a 100% rate. The agent announces that it has "read the policy," then behaves as if the policy did not exist, and ends with "done."

Instruction is not enough. We need architecture.

That is the thread of this post — and the starting point for AOS v0.1, the AI Operating Standard.


Enforce at the physical layer

If prose cannot make the model behave, the only lever left is to make the forbidden command not execute.

Concretely: inspect every file write and shell invocation on the host before execution. If it violates policy, exit 2 before the agent’s write starts.

Anthropic’s Claude Code supports Hooks: you can plug in a script on PreToolUse events. This article uses Claude Code Hooks as the running example — the idea applies to other agent runtimes, but wiring differs.

So you can build a pipeline like this:

LLM emits Write/Bash
    ↓
PreToolUse hook receives JSON on stdin
    ↓
Host inspects the payload
    ↓ violation → exit 2 (Claude Code does not run that tool call)
    ↓ OK       → exit 0 (allow)
Tool runs (or not)

Whether the LLM “wants” to follow rules is irrelevant. You close the gap at the mechanism layer.
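For reference, the JSON the hook receives on stdin looks roughly like this (abbreviated sketch; exact field names follow Claude Code's hooks documentation and may differ by version):

```json
{
  "session_id": "…",
  "hook_event_name": "PreToolUse",
  "tool_name": "Bash",
  "tool_input": {
    "command": "sed -i 's/a/b/' notes.txt"
  }
}
```

The hook's job is to read this payload, decide, and signal the verdict through its exit code.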


AOS v0.1 — minimal boundaries

That pattern is packaged as a reusable spec in AOS-spec; v0.1 is published. The core is twofold.

1. Three zones for paths (§3.2 Three Zones)

Classify every path into one of:

| Zone | Behavior | Examples |
| --- | --- | --- |
| Oracle | Read-only for the agent | Specs, test expectations (the evaluation oracle), policy files |
| Permitted | May rewrite freely | Implementation code, generated artifacts, caches |
| Prohibited | Must not touch | System directories, anything outside the workspace home |

The important piece is Oracle. Without it you get “tests fail → fix the expectations.” Oracles must live where the agent cannot physically rewrite them.
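The classification itself can be a few lines. Here is one minimal sketch; the `/work/...` roots are placeholders for whatever layout your repo uses, not part of the spec:

```python
from pathlib import Path

# Placeholder roots — adapt to your own repo layout.
ORACLE_ROOTS = [Path("/work/specs"), Path("/work/evals")]
PERMITTED_ROOTS = [Path("/work/src"), Path("/work/build")]

def zone(path: str) -> str:
    """Classify a path into one of the three zones of §3.2."""
    real = Path(path).resolve()
    if any(real.is_relative_to(r) for r in ORACLE_ROOTS):
        return "oracle"      # read-only: agent writes here are denied
    if any(real.is_relative_to(r) for r in PERMITTED_ROOTS):
        return "permitted"   # the agent may rewrite freely
    return "prohibited"      # everything else: must not touch

print(zone("/work/specs/AOS-v0.1.md"))  # oracle
print(zone("/work/src/main.py"))        # permitted
print(zone("/etc/passwd"))              # prohibited
```

Resolving the path first matters: otherwise `../specs/...` or a symlink can smuggle a write past a naive prefix check.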

2. Physical enforcement for writes (§4.1 Physical Enforcement)

The hook side looks roughly like this:

# Minimal example: pretooluse_iron_cage.py (Python 3)
import json
import os
import sys
from pathlib import Path

ORACLE_SEGMENTS = ("00_Management", "evals")  # read-only segments (example repo layout)

def is_oracle_path(path: str) -> bool:
    real = Path(path).resolve()
    parts = set(p.name for p in [real, *real.parents])
    return any(seg in parts for seg in ORACLE_SEGMENTS)

def main() -> int:
    payload = json.load(sys.stdin)
    tool_name = payload.get("tool_name", "")
    tool_input = payload.get("tool_input", {})

    if tool_name in ("Write", "Edit"):
        target = tool_input.get("file_path") or tool_input.get("filePath", "")
        if target and is_oracle_path(target):
            print(f"[iron_cage] Oracle zone write denied: {target}", file=sys.stderr)
            return 2  # Claude Code will not execute this call

    if tool_name == "Bash":
        cmd = tool_input.get("command", "")
        if "sed -i" in cmd or "truncate " in cmd:
            print(f"[iron_cage] Forbidden in-place edit pattern: {cmd}", file=sys.stderr)
            return 2

    return 0  # allow

if __name__ == "__main__":
    sys.exit(main())

Register it under PreToolUse in ~/.claude/settings.json (or equivalent):

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash|Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "python3 /abs/path/pretooluse_iron_cage.py"
          }
        ]
      }
    ]
  }
}

That is enough: sed -i never reaches the shell — the tool call vanishes. Thirty lines of hook close a hole that 130 KB of policy prose cannot.
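You can smoke-test the mechanism without an agent at all: pipe a payload into the hook and check the exit code. The sketch below inlines a cut-down version of the deny rule into a temporary script so it runs standalone; in practice you would pipe into your real hook file instead:

```python
import json
import os
import subprocess
import sys
import tempfile
import textwrap

# Cut-down stand-in for the hook above, inlined so the test is self-contained.
HOOK_SRC = textwrap.dedent("""
    import json, sys
    payload = json.load(sys.stdin)
    cmd = payload.get("tool_input", {}).get("command", "")
    sys.exit(2 if "sed -i" in cmd else 0)
""")

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(HOOK_SRC)
    hook_path = f.name

# Simulate the PreToolUse payload for a forbidden in-place edit.
payload = {"tool_name": "Bash", "tool_input": {"command": "sed -i 's/a/b/' notes.txt"}}
proc = subprocess.run([sys.executable, hook_path], input=json.dumps(payload), text=True)
os.remove(hook_path)

print("exit:", proc.returncode)  # 2 → the tool call would be blocked
```

Exit 2 means Claude Code drops the tool call; exit 0 lets it through. That is the entire contract.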


Structural role separation (§4.3)

Another lesson from operations: do not let the same agent that generated an artifact grade it.

The generating agent carries intent, context, and excuses. Ask "did the tests pass?" in that session and you often get an answer bent toward looking like a pass.

Examples:

  • Tests are red in the generation session
  • The session still says “tests pass”
  • A fresh session runs tests — still red
  • The original session re-labels the red log as “work in progress”

Fix: run evaluation in a separate process with no shared session context.

Generation Agent  ─→  Artifact (code, doc)
                              │
                              ▼
                Evaluation Agent (no shared context)
                              │
                              ▼
                       PASS / FAIL

You can use CI as a clean process, or a one-shot claude --print eval session — wiring varies. The minimum rule: the same agent instance does not write and self-score.
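A minimal evaluator, sketched under the assumption that the verdict comes only from the artifact and the test runner's exit code — never from the generation session's transcript:

```python
import subprocess
import sys

def evaluate(test_cmd: list[str]) -> str:
    """Run the test command in a fresh process; judge by exit code alone."""
    proc = subprocess.run(test_cmd, capture_output=True, text=True)
    # No shared context: the evaluator never sees the generator's narrative.
    return "PASS" if proc.returncode == 0 else "FAIL"

# Example: grade trivial artifacts with inline checks.
print(evaluate([sys.executable, "-c", "assert 1 + 1 == 2"]))  # PASS
print(evaluate([sys.executable, "-c", "assert 1 + 1 == 3"]))  # FAIL
```

Swap the inline checks for your real runner (pytest, a `claude --print` eval session, a CI job); the shape stays the same.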


Physical evidence (§4.4)

Chat messages like “done” or “PASS” are not evidence.

Rules we use:

  • Test pass = runner exit code and logs
  • File created = ls (or equivalent) on disk
  • Catalog updated = file hash

If it does not land on disk, it did not happen. That makes post-mortems diffable.

We operationally do not trust chat “reports” outside logs, diffs, and test output.
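Each of those evidence rules reduces to a few lines of code. A sketch (file name is illustrative):

```python
import hashlib
import subprocess
import sys
import tempfile
from pathlib import Path

def file_hash(path: Path) -> str:
    """Content hash: the diffable evidence that a catalog entry changed."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "report.txt"

    # "File created" = the path exists on disk, not a chat message saying so.
    artifact.write_text("results\n")
    assert artifact.exists()

    # "Catalog updated" = the file hash changed.
    before = file_hash(artifact)
    artifact.write_text("results v2\n")
    changed = file_hash(artifact) != before
    assert changed

# "Tests pass" = the runner's exit code, captured from the process itself.
proc = subprocess.run([sys.executable, "-c", "print('ok')"], capture_output=True, text=True)
print("exit:", proc.returncode)
```

Everything here lands on disk or in a captured exit code, so a post-mortem can replay the claim instead of arguing with a transcript.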


Why publish a spec

All of this boils down to a small hook and some habits. The Python involved is a few dozen lines.

I still published it as a spec because people hitting the same walls should not have to reinvent the wheel.

  • §3.2 Three Zones (Oracle / Permitted / Prohibited)
  • §4.1 Physical Enforcement (PreToolUse denial)
  • §4.3 Structural Role Separation (generation vs evaluation)
  • §4.4 Physical Evidence (judge artifacts, not chat)

These are boundaries you hit once agents leave demos.

The normative text lives at AOS-v0.1.md. The spec is implementation-agnostic — Claude Code, Cursor, or a home-grown agent loop can adopt the same boundaries.

The hook in this post is enough to embody most of the discipline; tuning regexes and allowed roots moves through Issues/PRs into the spec over time.


What improved

After adopting this pattern:

  • sed -i in-place edits stopped happening physically (blocked at exit 2)
  • Less back-and-forth “remember the rule” — the model cannot bypass what does not run
  • Violation stderr flows back into the LLM context, so the next attempt tries another path
  • Evaluation runs separately, so generation-side narratives pollute debugging less

Cost: you maintain hooks (regex false positives, etc.). Still far more controllable than a 130 KB policy file nobody reads end-to-end.


Closing

The larger the workload on LLM agents, the less “please behave” scales. You need structure where disallowed commands never execute.

AOS v0.1 is a minimal sketch of that. Issues and PRs: aos-standard/AOS-spec.

If you have given up on enforcing behavior with words alone, this may be a useful reference.

