DEV Community: AOS Architect

Four ways production agents silently fail — and the physical patterns that prevent them (AOS v0.2)

AOS Architect — Wed, 03 Jun 2026 22:56:19 +0000

Four ways production agents silently fail

An LLM agent that felt great locally tends to break in the same places once you push it toward production:

Silent failure — swallows an exception and returns "done" with nothing on disk
No trace — claims the tests passed, but no file was ever written
Restart wipes state — only runs inside a session; a reboot means zero continuity
Self-inflicted violations go unreported — the agent that broke the rule reports nothing

None of these get fixed by "please be more careful." You have to make the broken state structurally impossible on the host side, before the agent's request goes through. The rest of this article is the four physical patterns (§10.1–§10.4) that map one-to-one onto these four failure modes.

Why physical? — what v0.1 established

The idea isn't new. In the AOS v0.1 article I laid out the minimal framework: constrain LLM agents with host-side physical constraints, not textual rules. Four pillars:

§3.2 Three Zones — classify every path as Oracle (read-only), Permitted (workspace), or Prohibited
§4.1 Hook Requirement — intercept writes and shell calls in a PreToolUse hook, block violations with exit 2
§4.3 Role Separation — the agent that generates an artifact must not be the sole evaluator of it
§4.4 Physical Evidence — completion is proven by a file on disk, not by a chat message

That's the "what should be constrained" layer — the boundary line. But one question always remained: "OK, but how do I actually implement this in a real tool?" The four failure modes above are exactly what leaks through that implementation gap, and v0.2 is what closes it.

What v0.2 does: keep the norms, add the examples

The approach is deliberately narrow:

§1–§9 normative text (MUST / MUST NOT) is unchanged — fully backward compatible
New §10 Implementation Examples — four production patterns, each linked to real code in a public repository
§6 renamed from Reference Implementation (singular) to Reference Implementations (plural) — removed the reference to an unpublished implementation, and pointed it at the real, public physical-agent-patterns repo instead

So it isn't "more spec words." It's "connect the spec words to code that already runs."

§10.1 Manifest declaration (maps to §8, §9)

An AOS-compliant tool declares its zone boundaries in manifest.json, so another agent can learn — before startup — where it may write and what it must not touch.

{
  "aos_compliant": "v0.2",
  "permitted_output_paths": ["docs/reports/"],
  "oracle_paths": ["evals/", "config/"]
}

oracle_paths maps directly to the §3.2 Oracle zone; the hook blocks writes here at execution time.
permitted_output_paths is the Permitted zone — the only place the tool may produce output.

The key rule: declaration without enforcement is non-compliant (§8, final paragraph). Writing aos_compliant in your manifest means nothing unless a hook (or CI gate) actually blocks writes to oracle_paths.
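To make "declaration plus enforcement" concrete, here is a minimal sketch of the check a hook or CI gate could run against the manifest above. The function names (`load_zones`, `classify_write`) are illustrative assumptions, not part of the AOS spec or the physical-agent-patterns repo, and the sketch assumes manifest paths are repo-relative POSIX-style prefixes.

```python
import json
from pathlib import PurePosixPath


def load_zones(manifest_text: str) -> tuple[list[str], list[str]]:
    """Return (oracle_paths, permitted_output_paths) from a manifest JSON string."""
    manifest = json.loads(manifest_text)
    return manifest.get("oracle_paths", []), manifest.get("permitted_output_paths", [])


def _is_under(path: PurePosixPath, prefix: str) -> bool:
    # Compare path components, so "evals_old/x" does not match the prefix "evals/".
    prefix_parts = PurePosixPath(prefix).parts
    return path.parts[: len(prefix_parts)] == prefix_parts


def classify_write(rel_path: str, oracle: list[str], permitted: list[str]) -> str:
    """Classify a repo-relative write target as 'oracle', 'permitted', or 'prohibited'."""
    path = PurePosixPath(rel_path)
    if path.is_absolute() or ".." in path.parts:
        return "prohibited"  # refuse anything that could escape the repo root
    if any(_is_under(path, p) for p in oracle):
        return "oracle"
    if any(_is_under(path, p) for p in permitted):
        return "permitted"
    return "prohibited"
```

A gate built on this would allow a write only when `classify_write(...)` returns `"permitted"` and exit non-zero otherwise. Oracle is checked first so that an overlapping declaration cannot accidentally open a read-only path.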

§10.2 Physical evidence (maps to §4.4)

The failure mode AOS targets: an agent claims it "ran" but left no trace. The physical-first pattern makes evidence a precondition of completion, not an afterthought.

# Write evidence BEFORE declaring done (from agent_with_evidence.py)
evidence = {
    "task": task,
    "result": result_text,
    "timestamp": datetime.date.today().isoformat(),
    "model": model,
}
evidence_path.write_text(json.dumps(evidence, indent=2))
# Only after the file exists: print completion
print(f"[done] Evidence written: {evidence_path}")

A caller verifies completion just by checking that evidence_path exists. No conversational assertion required.

Source: physical-agent-patterns/patterns/02_physical-first/agent_with_evidence.py
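On the caller side, a slightly stricter check than "the file exists" is cheap. The sketch below is an assumption about how a caller might verify the evidence, not code from the repo: it treats the four keys from the excerpt above as required, and `evidence_is_valid` is a hypothetical name.

```python
import json
from pathlib import Path

# Keys taken from the evidence dict in the excerpt above.
REQUIRED_KEYS = {"task", "result", "timestamp", "model"}


def evidence_is_valid(evidence_path: Path) -> bool:
    """Return True only if the file exists, parses as JSON, and has the expected keys."""
    if not evidence_path.is_file():
        return False
    try:
        data = json.loads(evidence_path.read_text())
    except (OSError, ValueError):
        # ValueError covers both invalid JSON and undecodable bytes.
        return False
    return isinstance(data, dict) and REQUIRED_KEYS.issubset(data)
```

Checking the shape as well as existence guards against an empty or truncated file being counted as proof of completion.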

§10.3 Immune loop (maps to §4.1, §4.5)

A running agent detects AOS violations in the workspace and triggers a repair sequence. The crucial part: detection (read-only scan) is separated from repair (write).

# violation_detector.py — write the report BEFORE any repair attempt
violations = _scan(root)
report = {
    "timestamp": datetime.datetime.utcnow().isoformat(),
    "violations": violations,
}
report_path.write_text(json.dumps(report, indent=2))

The detector writes a JSON violation report (itself §4.4 evidence). The repair planner reads it and either applies known fixes or escalates to the Sovereign when a design decision is required (§4.5). The detector never repairs its own findings — which also satisfies §4.3 role separation.

Source: physical-agent-patterns/patterns/03_immune-loop/
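The planner half of the loop can be sketched in a few lines. This is an illustration under stated assumptions, not the repo's planner: it assumes each violation is a dict with a `kind` field, and `KNOWN_FIXES`, `plan_repairs`, and `write_plan` are hypothetical names.

```python
import json
from pathlib import Path

# Hypothetical table of violation kinds that can be fixed mechanically.
KNOWN_FIXES = {"stray_output_file": "move file into the permitted output path"}


def plan_repairs(report_path: Path) -> dict:
    """Read the detector's report and split violations into fixable and escalated."""
    report = json.loads(report_path.read_text())
    plan = {"apply": [], "escalate": []}
    for violation in report.get("violations", []):
        kind = violation.get("kind") if isinstance(violation, dict) else None
        if kind in KNOWN_FIXES:
            plan["apply"].append({"violation": violation, "fix": KNOWN_FIXES[kind]})
        else:
            # Unknown kinds need a design decision, so they go to the human operator.
            plan["escalate"].append(violation)
    return plan


def write_plan(plan: dict, plan_path: Path) -> None:
    """Persist the plan before any repair runs, so the decision itself leaves evidence."""
    plan_path.write_text(json.dumps(plan, indent=2))
```

The planner only reads the report and writes a plan; it never rescans the workspace, which keeps detection and repair in separate hands.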

§10.4 systemd runtime (maps to §4.4, persistence)

An agent that only runs interactively can't satisfy §4.4 across reboots. The systemd pattern binds the agent to the OS process supervisor: the service defines the execution boundary, the timer enforces the schedule, and output files survive restarts.

# agent.py — the output file is the evidence of the run
output_path = OUTPUT_DIR / f"agent_run_{today}.md"
if output_path.exists():
    print(f"[skip] Output already exists for {today}: {output_path}")
    return output_path
# ... run and write ...
output_path.write_text(content)

# physical-agent.timer (excerpt)
[Timer]
OnCalendar=daily
Persistent=true

The idempotency guard (if output_path.exists(): return) prevents duplicate runs while keeping the evidence file as the canonical completion record. Persistent=true fires a missed run on next boot — so the evidence requirement holds regardless of uptime.

Source: physical-agent-patterns/patterns/01_systemd-runtime/

The four patterns mapped to AOS sections

Pattern	AOS section	In one line
Manifest declaration	§8, §9	Declare writable zones in a machine-readable way
Physical evidence	§4.4	An evidence file is the precondition for completion
Immune loop	§4.1, §4.5	Separate violation detection from repair/escalation
systemd runtime	§4.4 (persistence)	Keep evidence across restarts

None of these are clever inventions. They're just the boundaries you always hit when you push agents toward production, factored into reusable form.

Why bake implementation examples into the spec

A common failure mode for specs: the norms are solid, but nobody has a starting point. A reader finishes thinking "I agree it's correct — now what's my first line of code?" and stalls.

v0.2 shrinks that distance. Every clause now has clonable, runnable public code attached. The spec itself stays runtime-agnostic (Claude Code / Cursor / your own loop), but the examples make it cheaper for the second person to adopt it.

All four §10 patterns live in physical-agent-patterns, so you can git clone and read them directly.

Wrapping up

AOS v0.2 is not a version that adds new constraints. It's the version that fills in how to implement the constraints, with links to working code.

Normative text (v0.2): AOS-spec/AOS-v0.2.md
Implementation patterns: physical-agent-patterns

If you've felt that "textual rules alone can't keep agents in line," I hope this gives you a concrete starting point.

AOS specification (GitHub)

The "physical governance" approach in this article is specified and published as AOS (AI Operating Standard). v0.2 adds the implementation-examples section.

👉 AOS-spec — the spec (v0.2)
👉 physical-agent-patterns — implementation patterns

If the spec or the examples were useful, a ⭐ star helps shape the next version. Issues and PRs are welcome.

A mocked ad-copy CLI, real evals, and 30 Playwright cycles (tool 1027)

AOS Architect — Sun, 17 May 2026 13:56:57 +0000

What this is

Earlier posts in this series were mostly about why agent work needs hard boundaries: not politeness, but paths, CI, and tooling you can point at. Here I stay on the boring side: a small repo-local CLI (internal id 1027) that prints JSON ad variants under mocks, with evaluators and repeated browser/CLI runs layered on top.

I am not selling model magic. Same inputs, same stubbed copies / best; that is the point when you need to explain behavior to someone who was not in the room when the demo ran.

Our public article ledger lists this draft as #004 next to the Japanese Zenn manuscript.

What the tool actually does

Inputs look like product, audience, and channel. Outputs are JSON: multiple copies, a picked best, and score-like fields from small heuristics (channel baselines, tiny nudges for length). Marketing can paste into a sheet; engineers can assert on stdout. Those two audiences rarely share one artifact, so fixing the boundary as JSON saves a lot of arguments later.

With --mock (and the bypass flag we use for local runs), the CLI does not call a remote LLM. A hash from the input tuple pins the stub, so the demo payload and the regression payload are the same object. When you show it externally, repeatable bytes beat a one-off “wow” completion.
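A minimal sketch of what "a hash from the input tuple pins the stub" can look like. The stub pool, the `ok` envelope field, and the `mock_variants` name are assumptions for illustration; the real tool's fixtures and field names are not shown in this post.

```python
import hashlib
import json

# Hypothetical stub pool; a real fixture set would live in the repo.
STUB_COPIES = [
    ["Migrate in days, not quarters.", "Cut migration risk."],
    ["Your stack, moved safely.", "Migration, explained."],
    ["Plan the move once.", "Migration without the fire drill."],
]


def mock_variants(product: str, audience: str, channel: str) -> dict:
    """Return a stub payload chosen by a stable hash of the input tuple."""
    # Serialize the tuple unambiguously, then hash it with a process-independent digest.
    key = json.dumps([product, audience, channel], ensure_ascii=False)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    copies = STUB_COPIES[int(digest, 16) % len(STUB_COPIES)]
    return {"ok": True, "copies": copies, "best": copies[0], "stub_id": digest[:12]}
```

The sketch uses `hashlib` rather than the built-in `hash()`, because Python randomizes string hashes per process by default, which would break the "same inputs, same bytes" property across runs.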

DESIGN.md in the repo splits payment gates, outbound calls, and filesystem writes so static checks can police them. If you only take one idea from AOS here, it is the Oracle / Permitted / Prohibited split; the full contract lives in the GitHub spec linked at the bottom.

Walk-through with fixture-shaped inputs

Roughly what the evaluators exercise:

Field	Sample
Product	Migration enablement SaaS
Audience	Mid-market IT leaders
Channel	google

You always get copies, best, and deterministic scores with no outbound model call on that path. Later you can swap in a real generator behind the same shape; the lesson I care about is that the contract is narrower than the prose. JSON in logs beats parsing Markdown when you want spreadsheets, dashboards, or a second agent to judge output.

What the evals check

Three buckets, nothing fancy:

--mock path: stdout contains copies and best inside a success envelope (we use --bypass-payment where payment is not the subject of the test).
Static hygiene: small AST-based scripts reject “always true” assertions that look like coverage theater.
pytest adversarial marks: tests tagged @pytest.mark.adversarial must not be skipped by accident (pytest … -m adversarial). If a test is supposed to hurt, it should stay in the default pain path.

Most of this reads as busywork until the repo grows faster than your memory. Then you want failures to show up without someone remembering to tick the scary suite.

Thirty Playwright cycles (five checks each)

CLI tests alone miss a lot of wiring: paths, permissions, timers, how the process is launched. So we also run Playwright: one bundle of five checks, thirty times, all green in the report trail (payment refused without a real transaction, mock path succeeds, keywords in stdout—exact list is in the shipped log).

150 green runs sounds like a vanity stat. I use it differently: one lucky pass is cheap; many identical passes say the environment story is not a fluke. After you have watched CI go green for the wrong reason once, you start wanting volume, not a single badge.

If you want to copy the pattern into your own codebase, four questions are enough: Can you freeze the output shape (here, JSON)? Can you replay without the model? Can you hit it from CI and from a browser driver? Does your eval layer make “quietly skipped hard tests” awkward? This repo is a minimal yes on all four.

Trying it in practice

The AOS specification is open; the curated implementation bundle is not thrown on npm as a product. If you want to try it internally or talk through a serious eval, leave a short note on the companion Zenn post (Japanese) or open a scoped issue on aos-standard/AOS-spec and say what you are trying to do. I will answer where it is practical and keep spec debate separate from “can we ship you a build.”

AOS v0.1 Specification (GitHub)

The "physical governance" approach described in this article is formalized as AOS (AI Operating Standard) v0.1 — a minimal, machine-enforceable spec for AI agent operations.

👉 AOS-spec — specification
👉 physical-agent-patterns — implementation patterns

If you find this useful, please ⭐ star the repo. Issues and PRs are welcome — the spec is designed to evolve with real-world usage.

Binding AI agents with physics, not politeness — AOS v0.1 as a minimal spec

AOS Architect — Thu, 07 May 2026 11:38:02 +0000

When prose piles up and nothing sticks

Agents usually start life under Markdown rules: CLAUDE.md, .cursorrules, manifests in chat. One private repo cracked 130 KB of that kind of text—and still behaved like the rules lived in a parallel universe.

Intent excerpt	Typical pattern
forbid in-place hacks	ban `sed -i`
forbid blind truncation	discourage `>` shell redirects
keep specs sacred	disallow mutating oracle dirs
keep merge honest	audits before declaring done

Instructions existed. Misuse continued.

Instrumentation on one exhausting window showed policy breaches in every one of ~52 traced tool attempts—“read it ✓”, “ignored it ✓ anyway”, “said done ✓”—pick your favorite failure mode.

Teaching tone is not torque. If the forbidden command runs, wording failed quietly.

Hence AOS v0.1 as a terse spec—not another pep talk stack.

Where leverage actually lands

Stop asking the LM to politely abstain. Stop the syscall. Inspect PreToolUse payloads; exit 2 rejects the invocation before Claude Code performs any shell or filesystem IO.

Anthropic publishes Hooks docs for PreToolUse. Claude Code here is merely the tutorial runtime — same zoning idea lifts to Cursor or bespoke loops with uglier duct tape.

Rough mental model:

LLM emits Write/Bash/Etc
           ↓
Hook stdin JSON arrives
           ↓ Host checks path + motif
 denial (exit 2) → tool call is blocked before it runs; stderr goes back to the model
 allowance (exit 0) → downstream tool executes

“Intent aligned?” stops being the bottleneck—illegal transitions simply never bind.

What it feels like in-session: the hook's stderr (oracle write denied: …) re-enters the transcript context, and the model visibly pivots (“try mkdir under permitted tree instead”). That beats repeated human nagging. The cost is that false positives from pattern matching sting, so keep allowlists small and trimmed.

AOS v0.1 compass (minimal)

Portable ideas live in-repo (v0.1 published):

Zones (§3.2)

Everything maps to Oracle / Permitted / Prohibited:

Zone	Behavior	Typical contents
Oracle	read-only sanctum	spec md, evaluator goldens, immutable policy
Permitted	ordinary workspace churn	implementations, codegen scratchpad
Prohibited	off-map	host paths beyond agreed roots

Oracle is the wedge against “tests flaky → soften fixtures.” Golden truth stays where drafts cannot casually rewrite expectations.

Physical enforcement skeleton (§4.1-ish)

Teaching stub—you own regex sharpness locally:

# pretooluse_iron_cage.py — teaching stub (Python 3)
import json
import sys
from pathlib import Path

ORACLE_SEGMENTS = ("00_Management", "evals")

def oracle_hit(path_str: str) -> bool:
    node = Path(path_str).resolve()
    names = {p.name for p in [node, *node.parents]}
    return bool(set(ORACLE_SEGMENTS).intersection(names))

def main() -> int:
    payload = json.load(sys.stdin)
    name = payload.get("tool_name", "")
    inp = payload.get("tool_input", {})

    if name in ("Write", "Edit"):
        target = inp.get("file_path") or inp.get("filePath", "")
        if target and oracle_hit(target):
            print(f"[iron_cage] oracle write denied: {target}", file=sys.stderr)
            return 2

    if name == "Bash":
        cmd = inp.get("command", "")
        if "sed -i" in cmd or "truncate " in cmd:
            print(f"[iron_cage] banned edit motif: {cmd}", file=sys.stderr)
            return 2

    return 0

if __name__ == "__main__":
    sys.exit(main())

JSON hook registration (substitute your own absolute path):

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash|Write|Edit",
        "hooks": [
          {
            "type": "command",
            "command": "python3 /absolute/path/pretooluse_iron_cage.py"
          }
        ]
      }
    ]
  }
}

exit 2 means the sed -i call is blocked before it runs. Maintaining the hook's patterns is intentional busywork: you trade prompt theater for brittle but inspectable predicates.

Role separation bite (§4.3)

Do not grade in the authoring session.

Symptom	Likely pathology
logs already red inside gen thread	narration still cheers “DONE”
fresh shell repeats red tests	storyline mutates—“WIP”, “temporary” excuses

Detached evaluation (CI bots, one-shot review agents, scripted harnesses) snaps generation myths earlier.

ASCII-only guardrail sketch:

Author session --> artifact
                         |
                         v
Detached judge --> PASS/FAIL + logs

If one chat both writes and solemnly declares victory, skepticism is warranted.

Evidence habits (§4.4)

Chats saying “looks good” evaporate. Disk + exit codes.

Claim type	Receipt class
tests clean	CLI exit prints + plaintext logs committed or archived
file exists	deterministic listing/checksum snapshots
metadata drift	hashing inventory rows

If artifact never materially touches disk—or logs vanish—you schedule another attempt.
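For the "checksum snapshots" and "hashing inventory rows" receipts in the table above, a small sketch follows. It is one possible shape, assuming SHA-256 over file bytes and repo-relative paths; `snapshot` and `drift` are hypothetical helper names, not part of AOS.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> list[tuple[str, str]]:
    """Return sorted (relative_path, sha256_hex) rows for every file under root."""
    rows = []
    for path in root.rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            rows.append((path.relative_to(root).as_posix(), digest))
    # Sorting makes the listing deterministic regardless of filesystem order.
    return sorted(rows)


def drift(before: list[tuple[str, str]], after: list[tuple[str, str]]) -> dict:
    """Compare two snapshots and report added, removed, and changed paths."""
    old, new = dict(before), dict(after)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

Writing the snapshot to disk before and after an agent run turns "nothing in the oracle changed" into a diffable receipt instead of a chat claim.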

Why publish prose at all

Roughly forty Python lines buy a civilization-level conversation about oracle integrity, reachable by strangers opening GitHub rather than buried in ephemeral prompts.

Pieces worth collective iteration:

§ slice	gist
3.2	Three Zones delineation
4.1	physical intercept
4.3	generation vs adjudication firewall
4.4	evidence minimalism

Spec stays engine-agnostic; hook sample instantiates Claude today, tomorrow maybe something else. Portability is deliberate, not omission.

After wiring (empirical anecdotes)

Area	Observation
in-place sabotage	`sed -i` stopped binding—exit 2 first
nag volume	fewer “pretty please abide policy” arcs
error surfacing	stderr guidance steers next benign attempt
triage cleanliness	adjudication outside authoring loop clarifies regressions

Tax: the predicates are brittle, and you will relax or tighten them as repos evolve. That is still dwarfed by the cost of babysitting infinitely long CLAUDE.md scrolls nobody reads verbatim.

Closing beat

Workload on agents climbs; “please behave” asymptotes quickly. Architect denials, not applause cycles.

Issues & PR welcome on aos-standard/AOS-spec.

Shortcuts

Spec root: github.com/aos-standard/AOS-spec
Hooks primer: docs.claude.com/.../hooks
Thesis-only companion (#001) & CI companion (#002) linked from their Dev.to URLs / ledger.

AI Governance: One Repo, One Smoke Tool, and a Green CI Run

AOS Architect — Sun, 12 Apr 2026 13:06:56 +0000

What this reads like

Continuation of Why AI Agents Don't Follow Rules. Same thesis: policy text settles at load time; physical constraints settle at execution time. Here we show artifacts you can cite inside a governed monorepo: hashed commits, enumerated checks, CI job lanes—without asking strangers to trust a private Actions permalink.

Hook-level code belongs in #003 — Binding AI agents with physics. Production failure patterns are in #005 — Four ways agents silently fail.

What we actually did

Inside a repo running under AOS v0.1 zone semantics, we stood up a thin smoke pillar—not a hero demo, but a tripwire so automated regressions bite when someone "helpfully" rewrites evals or oracle fixtures.

Typical layout (repo-specific paths, portable idea):

tools/smoke_pillar/
├── main.py
├── evals/
├── playwright/          # browser tests isolated from Python core
└── manifest.json        # declares writable zones

Design before bytes

The directory tree was not hand-drawn and then backfilled. A scaffold generator (template that emits the full tool tree) ran first; humans and agents edited only inside Permitted zones afterward.

Step	Action	Why
1	Register the tool shape in an internal design registry	Fix boundaries before line 1
2	Generator emits manifest, evals harness, test config	Avoid cosmetic folder sprawl
3	Edits stay in implementation workspace	Keep oracle/eval truth out of generation paths

Public vocabulary lives in AOS-spec. Internal ledgers are ops indexing—not something readers need to mirror verbatim.

CI mold — patterns you can copy

After the smoke pillar passed once, we hardened the template so new tools survive bare python3 on GitHub Actions matrices:

Move	Purpose
`main.py --help` exits cleanly before heavy imports	survives venv-less CI
optional `.env`	secrets-free matrices
keep heavy type-check deps out of baseline requirements unless opted in	deterministic smoke band
`timeout` wrappers on local diagnostics	agents cannot hang infra silently
sibling regression probe tool	tripwire if the template starts lying

The probe is not a vanity metric—it catches "forge stayed green once" rot after refactors.
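The first two rows of the table (clean `--help`, optional `.env`) come down to import ordering. Here is a sketch of that shape, assuming `argparse` for the CLI and python-dotenv as the optional loader; the real template's flags and dependencies may differ.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Build the CLI parser using only the standard library."""
    parser = argparse.ArgumentParser(prog="main.py", description="Smoke tool (sketch)")
    parser.add_argument("--mock", action="store_true", help="run without remote calls")
    return parser


def main(argv: Optional[list[str]] = None) -> int:
    # Parsing happens first, so `--help` exits before any optional dependency is touched.
    args = build_parser().parse_args(argv)

    # Optional imports are deferred to the code path that needs them.
    try:
        import dotenv  # assumption: python-dotenv, only if installed

        dotenv.load_dotenv()
    except ImportError:
        pass  # a missing .env loader is fine on a secrets-free CI matrix

    print("mock run" if args.mock else "live run")
    return 0
```

Because nothing outside the standard library is imported at module level, `python3 main.py --help` works on a bare interpreter with no virtualenv.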

Local gates before push

Rough checklist historically satisfied:

Check	Passing means
`python3 evals/run_evals.py`	exit 0, no intentional skips
`npx playwright test` inside the tool's isolated test dir	`1 passed`, scoped runs only
repo layout compliance script (structure audit)	OK / no critical drift

pre-commit may re-run the structure audit so "green locally" leaks less often onto main. Hooks (PreToolUse, exit 2) and CI are different layers with the same philosophy: stop right before merge or disk.
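The checklist above can be driven by one small runner that also covers the "timeout wrappers" idea from the CI table. This is a sketch, not the repo's script: `run_gate` and `run_gates` are hypothetical names, and the 300-second default is an arbitrary assumption.

```python
import subprocess


def run_gate(name: str, cmd: list[str], timeout_s: float) -> dict:
    """Run one check and record its outcome; a hang or missing binary counts as a failure."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return {"gate": name, "passed": False, "exit_code": None, "timed_out": True}
    except OSError as exc:
        # Raised, for example, when the executable is not installed.
        return {"gate": name, "passed": False, "exit_code": None, "error": str(exc)}
    return {"gate": name, "passed": proc.returncode == 0, "exit_code": proc.returncode}


def run_gates(gates: dict[str, list[str]], timeout_s: float = 300.0) -> list[dict]:
    """Run every gate in order and return one result row per gate."""
    return [run_gate(name, cmd, timeout_s) for name, cmd in gates.items()]
```

Usage would look like `run_gates({"evals": ["python3", "evals/run_evals.py"], "playwright": ["npx", "playwright", "test"]})`, with the push allowed only when every row has `passed` set to true.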

Commits as receipts (not folklore)

We anchor milestones to short SHAs (your fork will differ—the pattern is the point):

SHA (prefix)	What changed
`d303ece0`	initial smoke scaffold + manifest
`85a524e0`	verification notes + metadata sync
`2bcbb52c`	import-order resilience for naked CI Python
`9870fa67`	template CI hardening + regression probe
`143dda68`	tip where the cited graph was green

URLs rot. SHA + job lane names travel better in outbound writing.

Why we skip raw Actions permalinks

The monorepo is private.

A pasted actions/runs/... badge 404s outside the org and fingerprints repo ownership. For external readers we ship:

commit SHAs (above)
job lanes that were green together—e.g. evals-matrix, independent-judge, Playwright smoke, structure-audit matrix
cloneable AOS-spec as vocabulary proof

"We cannot show our CI UI" is fine if repeatable commands + public spec remain inspectable.

Agent-operated commits (with caveats)

During this milestone, the human operator did not manually type git commit / git push. An agent toolchain issued operations under consistent author metadata.

Git metadata alone is forgeable. Hence the layered receipts: evals, Playwright, structure audit, and an independent judge job green on the same graph as the cited SHA. "An agent did everything" ≠ "safe" without that stack.

Hook denials — a separate receipt class

Distinct from CI: PreToolUse hook returns exit 2 and the Write never reaches disk. That is execution-time denial with a log excerpt—not prompt theater. Same family as #003.

Independent judge lane

A CI job reviews diffs with a vendor-separated model from the authoring stack.

Letting the same session say "looks fine" is self-grading. That is verification contamination.

Scheduled CI embarrassment beats a chat message that says "all good."

Practical limits

Constraint	Meaning
Private repo narrative	method essay, not a file tour
`permissions: contents: read` in workflows	narrower blast radius

What we actually check before merge

"This change is safe" shows up in agent chat all the time. We do not merge on that sentence alone.

We ask for the commit SHA and the CI graph: independent-judge and evals-matrix green on the same workflow run. Run ID, Actions export, or a screenshot—all fine.

If that cannot be produced, the change waits. PRs with polished logs but no matching graph show up more often than you might expect.
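The rule "both lanes green on the same workflow run for this SHA" is simple enough to encode. The sketch below assumes job records are dicts with `name`, `run_id`, `sha`, and `conclusion` fields; that shape is an assumption for illustration, not the format of any particular CI export.

```python
# Lane names taken from the text; the job record shape is an assumption.
REQUIRED_JOBS = ("independent-judge", "evals-matrix")


def merge_allowed(commit_sha: str, jobs: list[dict]) -> bool:
    """Return True only if every required job is green on one run for this commit."""
    green_by_run: dict = {}
    for job in jobs:
        if job.get("sha") == commit_sha and job.get("conclusion") == "success":
            green_by_run.setdefault(job.get("run_id"), set()).add(job.get("name"))
    # All required lanes must appear in the same run, not merely somewhere in history.
    return any(set(REQUIRED_JOBS) <= names for names in green_by_run.values())
```

Grouping by run is the point: a green judge from one run and green evals from another do not add up to a receipt.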

Where this series goes next

CI and hooks cover execution-time denial. Silent production failures—no trace, no persistence—are #005 plus physical-agent-patterns.

AOS Specification (GitHub)

The "physical governance" approach in this article is formalized as AOS (AI Operating Standard) — v0.2 adds runnable implementation examples.

👉 github.com/aos-standard/AOS-spec — specification

👉 github.com/aos-standard/physical-agent-patterns — patterns

If useful, please ⭐ star the repo. Issues and PRs welcome.

Why AI Agents Don't Follow Rules — The Case for Physical Governance

AOS Architect — Mon, 06 Apr 2026 23:18:38 +0000

The incident

One repository carried north of 130 KB of governance Markdown.

An agent consumed it. It answered as if it had understood—then violated those same constraints on its very next Write/Bash.

That rarely means “needs more prompting.” Usually it means the enforcement moment is missing: policy shows up during context load, tool calls happen later.

Why prompt-only bans leak

Teams still anchor on prose in prompts and markdown:

Pattern	Aim
“Never mutate `evals/`”	keep evaluation oracle from being rewritten
“No Writes under `00_Management/`”	guard canonical governance text

The trouble is reliance on attention at ingestion time. Tool calls afterward are not mechanically tied to whether the agent “remembers.” It can skim, reroute, or hallucinate exemptions.

Destructive UNIX commands behave differently: rm -rf / arrives behind a syscall gate, not a PDF. Hardware and OS designers assume humans forget; agents forget faster.

Rough split:

Text-only policy  → warns once, when tokens are assembled
Physical gate       → denies the transition right before disk or shell

When the generator grades itself

Separate problem: self-checking.

If the same conversational loop both authors an artifact and “confirms” it is fine, you import the same biases twice. Mostly not malice—the same shortcuts from generation bleed into adjudication.

A suite that is always green may be unplugged instrumentation.

Structural fix: evaluations in different processes (CI, ephemeral runs, reviewers) — not another chat turn in the same session.
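As a toy illustration of "a different process", the sketch below runs a check in a fresh interpreter and trusts only its exit code. The check itself (file exists and is non-empty) is a stand-in assumption; a real judge would run the test suite or a reviewer job, and `judge_in_fresh_process` is a hypothetical name.

```python
import subprocess
import sys
from pathlib import Path


def judge_in_fresh_process(artifact: Path, timeout_s: float = 60.0) -> bool:
    """Evaluate an artifact in a new interpreter and trust only its exit code."""
    # Stand-in check: the artifact must exist and be non-empty.
    check = (
        "import sys, pathlib; "
        "p = pathlib.Path(sys.argv[1]); "
        "sys.exit(0 if p.is_file() and p.stat().st_size > 0 else 1)"
    )
    try:
        proc = subprocess.run([sys.executable, "-c", check, str(artifact)], timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False  # a judge that hangs has not passed anything
    return proc.returncode == 0
```

The judge shares no memory or conversation state with whatever produced the artifact; the only channel between them is the file on disk and the exit code.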

What AOS stacks

The AI Operating Standard (AOS) is a small vocabulary for where governance lives. Three slices only:

1 — Zones

Zone	Meaning	Typical write rule
Oracle	Specs and test truth	agents do not write here
Permitted	implementation workspace	scoped by role
Prohibited	outside the agreed tree	sovereign (human operator) clearance only

Oracle is the piece that kills “tests red → loosen expectations.” Truth for pass/fail has to live where automation cannot casually patch it.

2 — Roles

Design / execution / approval stay explicitly disjoint. When an agent crosses its lane, stop and escalate to a human. No sideways title upgrades.

3 — Physical enforcement

Hooks (e.g. Claude Code PreToolUse) inspect JSON before a Write executes. Typical outcomes:

Try this	Typical host response
Write into an Oracle-marked subtree	`exit 2` — canceled call
Forbidden edit patterns (`sed -i`, in-place truncation)	same refusal

Trust is aimed at mechanics, not good intentions.

iron_cage in one breath

iron_cage is just the working name we use for our PreToolUse wiring. It is not magic; it is AOS v0.1 §§4.x rendered as a handful of Python lines and settings.

Behind it sit two habits we nicknamed Type-91 Governance:

Axis	Aim
Forensic isolation	logs/hashes outsiders can reconstruct
Physical isolation	generation context is not where final evaluations live

Specifications live in AOS-spec on GitHub—iron_cage is one plausible answer. For runnable detail, skim the Hooks companion (#003) first.

Concrete examples of what vanished for us early on: Writes aimed at evaluator JSON under guarded paths and first attempts at sed -i on shared hosts.

Machine-readable preamble

Opening AOS-v0.1.md with machine-facing instructions lets you anchor bans in something outside today’s ephemeral chat.

Not “pretty please”; “this markdown is upstream of the prompt.” It does not automate compliance—it gives reviewers and automation a shared glossary.

Why publish wording at all

Mid-2026, trust in autonomous diffs is still mostly vibes. Everybody reinvents oracle boundaries in private repos. Putting the vocabulary in aos-standard/AOS-spec tries to shave that tax—even if implementations differ.

Related

Long EN walkthrough (ledger #003): binding-ai-agents-with-physics...
CI-heavy companion (ledger #002): ai-governance-one-repo...
Claude Code Hooks primer: docs.claude.com/.../hooks
