Four ways production agents silently fail
An LLM agent that felt great locally tends to break in the same places once you push it toward production:
- Silent failure — swallows an exception and returns "done" with nothing on disk
- No trace — claims the tests passed, but no file was ever written
- Restart wipes state — only runs inside a session; a reboot means zero continuity
- Self-inflicted violations go unreported — the agent that broke the rule reports nothing
None of these get fixed by "please be more careful." You have to make the broken state structurally impossible on the host side, before the agent's request goes through. The rest of this article is the four physical patterns (§10.1–§10.4) that map one-to-one onto these four failure modes.
Why physical? — what v0.1 established
The idea isn't new. In the AOS v0.1 article I laid out the minimal framework: constrain LLM agents with host-side physical constraints, not textual rules. Four pillars:
- §3.2 Three Zones: classify every path as Oracle (read-only), Permitted (workspace), or Prohibited
- §4.1 Hook Requirement: intercept writes and shell calls in a `PreToolUse` hook, block violations with `exit 2`
- §4.3 Role Separation: the agent that generates an artifact must not be the sole evaluator of it
- §4.4 Physical Evidence: completion is proven by a file on disk, not by a chat message
That's the "what should be constrained" layer — the boundary line. But one question always remained: "OK, but how do I actually implement this in a real tool?" The four failure modes above are exactly what leaks through that implementation gap, and v0.2 is what closes it.
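To make the boundary line concrete, here is a minimal sketch of how the Three Zones classification and the `exit 2` decision might fit together. The `Zone` enum, `classify`, and `hook_exit_code` are hypothetical names chosen for illustration; they are not taken from the AOS spec or from any tool's hook API, and the wiring into a real `PreToolUse` hook (reading the tool call, calling `sys.exit`) is left out.

```python
import posixpath
from enum import Enum


class Zone(Enum):
    ORACLE = "oracle"          # read-only
    PERMITTED = "permitted"    # workspace, writable
    PROHIBITED = "prohibited"  # everything else


def _is_under(path: str, prefix: str) -> bool:
    """True if path equals prefix or lies beneath it, after normalizing '..'."""
    path, prefix = posixpath.normpath(path), posixpath.normpath(prefix)
    return path == prefix or path.startswith(prefix + "/")


def classify(path: str, oracle_paths: list, permitted_paths: list) -> Zone:
    """Oracle wins over Permitted; anything unlisted is Prohibited."""
    if any(_is_under(path, p) for p in oracle_paths):
        return Zone.ORACLE
    if any(_is_under(path, p) for p in permitted_paths):
        return Zone.PERMITTED
    return Zone.PROHIBITED


def hook_exit_code(write_target: str, oracle_paths: list, permitted_paths: list) -> int:
    """Exit code for a write attempt: 0 allows it, 2 blocks it."""
    zone = classify(write_target, oracle_paths, permitted_paths)
    return 0 if zone is Zone.PERMITTED else 2
```

Normalizing before comparing matters: without it, a target such as `docs/reports/../../evals/x` would look like it sits inside the workspace.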
What v0.2 does: keep the norms, add the examples
The approach is deliberately narrow:
- §1–§9 normative text (MUST / MUST NOT) is unchanged: fully backward compatible
- New §10 Implementation Examples: four production patterns, each linked to real code in a public repository
- §6 renamed from `Reference Implementation` (singular) to `Reference Implementations` (plural): removed the reference to an unpublished implementation, and pointed it at the real, public physical-agent-patterns repo instead
So it isn't "more spec words." It's "connect the spec words to code that already runs."
§10.1 Manifest declaration (maps to §8, §9)
An AOS-compliant tool declares its zone boundaries in manifest.json, so another agent can learn — before startup — where it may write and what it must not touch.
{
  "aos_compliant": "v0.2",
  "permitted_output_paths": ["docs/reports/"],
  "oracle_paths": ["evals/", "config/"]
}
- `oracle_paths` maps directly to the §3.2 Oracle zone; the hook blocks writes here at execution time.
- `permitted_output_paths` is the Permitted zone, the only place the tool may produce output.
The key rule: declaration without enforcement is non-compliant (§8, final paragraph). Writing aos_compliant in your manifest means nothing unless a hook (or CI gate) actually blocks writes to oracle_paths.
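As a rough illustration of the "CI gate" side of that rule, the sketch below reads a manifest like the one above and reports every written path that breaks it. `find_violations` and the reason strings are hypothetical, and the list of written paths is assumed to come from elsewhere (for example, a diff of the working tree); only the two manifest keys come from the article.

```python
import json
import posixpath


def _is_under(path: str, prefix: str) -> bool:
    """True if path equals prefix or lies beneath it, after normalizing '..'."""
    path, prefix = posixpath.normpath(path), posixpath.normpath(prefix)
    return path == prefix or path.startswith(prefix + "/")


def find_violations(manifest_text: str, written_paths: list) -> list:
    """Return (path, reason) pairs for writes the manifest does not allow."""
    manifest = json.loads(manifest_text)
    oracle = manifest["oracle_paths"]
    permitted = manifest["permitted_output_paths"]

    violations = []
    for path in written_paths:
        if any(_is_under(path, p) for p in oracle):
            violations.append((path, "write to oracle_paths"))
        elif not any(_is_under(path, p) for p in permitted):
            violations.append((path, "outside permitted_output_paths"))
    return violations
```

A gate built on this would fail the job whenever the returned list is non-empty, which is what turns the declaration into enforcement.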
§10.2 Physical evidence (maps to §4.4)
The failure mode AOS targets: an agent claims it "ran" but left no trace. The physical-first pattern makes evidence a precondition of completion, not an afterthought.
# Write evidence BEFORE declaring done (from agent_with_evidence.py)
evidence = {
    "task": task,
    "result": result_text,
    "timestamp": datetime.date.today().isoformat(),
    "model": model,
}
evidence_path.write_text(json.dumps(evidence, indent=2))

# Only after the file exists: print completion
print(f"[done] Evidence written: {evidence_path}")
A caller verifies completion just by checking that evidence_path exists. No conversational assertion required.
Source: physical-agent-patterns/patterns/02_physical-first/agent_with_evidence.py
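On the caller's side, the check could look something like the sketch below. It goes one step past a bare existence test, which is my own hardening and not something the referenced repository is known to do: the writer renames a temporary file into place so a half-written file is never visible under the final name, and the verifier also confirms the JSON parses and carries the four keys from the excerpt above. `write_evidence`, `is_complete`, and `REQUIRED_KEYS` are hypothetical names.

```python
import json
import os
import tempfile
from pathlib import Path

# Keys taken from the evidence dict in the excerpt above.
REQUIRED_KEYS = {"task", "result", "timestamp", "model"}


def write_evidence(evidence_path: Path, evidence: dict) -> None:
    """Write to a temp file in the same directory, then rename into place."""
    evidence_path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=evidence_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(evidence, tmp, indent=2)
    os.replace(tmp_name, evidence_path)


def is_complete(evidence_path: Path) -> bool:
    """True only if the evidence file exists, parses, and has every key."""
    if not evidence_path.is_file():
        return False
    try:
        data = json.loads(evidence_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False
    return isinstance(data, dict) and REQUIRED_KEYS.issubset(data)
```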
§10.3 Immune loop (maps to §4.1, §4.5)
A running agent detects AOS violations in the workspace and triggers a repair sequence. The crucial part: detection (read-only scan) is separated from repair (write).
# violation_detector.py: write the report BEFORE any repair attempt
violations = _scan(root)
report = {
    "timestamp": datetime.datetime.utcnow().isoformat(),
    "violations": violations,
}
report_path.write_text(json.dumps(report, indent=2))
The detector writes a JSON violation report (itself §4.4 evidence). The repair planner reads it and either applies known fixes or escalates to the Sovereign when a design decision is required (§4.5). The detector never repairs its own findings — which also satisfies §4.3 role separation.
Source: physical-agent-patterns/patterns/03_immune-loop/
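The planner half of the loop might be sketched as follows. This is an illustration under assumptions, not the repository's code: the article does not show what `_scan` returns, so each violation is assumed to be a dict with a `kind` field, and the `KNOWN_FIXES` table and its entries are invented for the example. The planner only reads the report and returns a plan; it never scans, which keeps detection and repair in separate hands.

```python
import json

# Hypothetical table: violation kinds that have a known, mechanical fix.
KNOWN_FIXES = {
    "missing_evidence": "rerun the task and write its evidence file",
    "stale_manifest": "regenerate manifest.json",
}


def plan_repairs(report_text: str) -> dict:
    """Split reported violations into known fixes and escalations."""
    report = json.loads(report_text)
    plan = {"apply": [], "escalate": []}
    for violation in report["violations"]:
        fix = KNOWN_FIXES.get(violation.get("kind"))
        if fix is None:
            # No known fix: a design decision is needed, so escalate.
            plan["escalate"].append(violation)
        else:
            plan["apply"].append({"violation": violation, "fix": fix})
    return plan
```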
§10.4 systemd runtime (maps to §4.4, persistence)
An agent that only runs interactively can't satisfy §4.4 across reboots. The systemd pattern binds the agent to the OS process supervisor: the service defines the execution boundary, the timer enforces the schedule, and output files survive restarts.
# agent.py: the output file is the evidence of the run
output_path = OUTPUT_DIR / f"agent_run_{today}.md"
if output_path.exists():
    print(f"[skip] Output already exists for {today}: {output_path}")
    return output_path
# ... run and write ...
output_path.write_text(content)

# physical-agent.timer (excerpt)
[Timer]
OnCalendar=daily
Persistent=true
The idempotency guard (if output_path.exists(): return) prevents duplicate runs while keeping the evidence file as the canonical completion record. Persistent=true fires a missed run on next boot — so the evidence requirement holds regardless of uptime.
Source: physical-agent-patterns/patterns/01_systemd-runtime/
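For readers who want to try the guard in isolation, here is a self-contained sketch of the same idea as a function. `run_once_per_day` and its `generate` callback are hypothetical stand-ins for whatever the agent actually does; only the file-name pattern and the skip-if-exists check come from the excerpt above.

```python
import datetime
from pathlib import Path
from typing import Callable, Optional


def run_once_per_day(
    output_dir: Path,
    generate: Callable[[], str],
    today: Optional[datetime.date] = None,
) -> Path:
    """Run generate() at most once per date; the output file is the record."""
    today = today or datetime.date.today()
    output_path = output_dir / f"agent_run_{today}.md"
    if output_path.exists():
        # A previous run for this date already left its evidence: skip.
        return output_path
    output_dir.mkdir(parents=True, exist_ok=True)
    output_path.write_text(generate(), encoding="utf-8")
    return output_path
```

Because the guard keys on the dated file, a catch-up run triggered after boot and a regularly scheduled run on the same day cannot both do the work.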
The four patterns mapped to AOS sections
| Pattern | AOS section | In one line |
|---|---|---|
| Manifest declaration | §8, §9 | Declare writable zones in a machine-readable way |
| Physical evidence | §4.4 | An evidence file is the precondition for completion |
| Immune loop | §4.1, §4.5 | Separate violation detection from repair/escalation |
| systemd runtime | §4.4 (persistence) | Keep evidence across restarts |
None of these are clever inventions. They're just the boundaries you always hit when you push agents toward production, factored into reusable form.
Why bake implementation examples into the spec
A common failure mode for specs: the norms are solid, but nobody has a starting point. A reader finishes thinking "I agree it's correct — now what's my first line of code?" and stalls.
v0.2 shrinks that distance. Each of the four §10 patterns now has clonable, runnable public code attached. The spec itself stays runtime-agnostic (Claude Code, Cursor, or your own loop), but the examples make it cheaper for the second person to adopt it.
All four §10 patterns live in physical-agent-patterns, so you can git clone and read them directly.
Wrapping up
AOS v0.2 is not a version that adds new constraints. It's the version that fills in how to implement the constraints, with links to working code.
- Normative text (v0.2): AOS-spec/AOS-v0.2.md
- Implementation patterns: physical-agent-patterns
If you've felt that "textual rules alone can't keep agents in line," I hope this gives you a concrete starting point.
AOS specification (GitHub)
The "physical governance" approach in this article is specified and published as AOS (AI Operating Standard). v0.2 adds the implementation-examples section.
👉 AOS-spec — the spec (v0.2)
👉 physical-agent-patterns — implementation patterns
If the spec or the examples were useful, a ⭐ star helps shape the next version. Issues and PRs are welcome.