<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: AOS Architect</title>
    <description>The latest articles on DEV Community by AOS Architect (@aos_standard).</description>
    <link>https://dev.to/aos_standard</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3864661%2Ffb24f366-82cd-4814-9853-9e612950fa0a.png</url>
      <title>DEV Community: AOS Architect</title>
      <link>https://dev.to/aos_standard</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aos_standard"/>
    <language>en</language>
    <item>
      <title>Four ways production agents silently fail — and the physical patterns that prevent them (AOS v0.2)</title>
      <dc:creator>AOS Architect</dc:creator>
      <pubDate>Wed, 03 Jun 2026 22:56:19 +0000</pubDate>
      <link>https://dev.to/aos_standard/four-ways-production-agents-silently-fail-and-the-physical-patterns-that-prevent-them-aos-v02-1c17</link>
      <guid>https://dev.to/aos_standard/four-ways-production-agents-silently-fail-and-the-physical-patterns-that-prevent-them-aos-v02-1c17</guid>
      <description>&lt;h2&gt;
  
  
  Four ways production agents silently fail
&lt;/h2&gt;

&lt;p&gt;An LLM agent that felt great locally tends to break in the same places once you push it toward production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Silent failure&lt;/strong&gt; — swallows an exception and returns "done" with nothing on disk&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No trace&lt;/strong&gt; — claims the tests passed, but no file was ever written&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Restart wipes state&lt;/strong&gt; — only runs inside a session; a reboot means zero continuity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-inflicted violations go unreported&lt;/strong&gt; — the agent that broke the rule reports nothing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these get fixed by &lt;em&gt;"please be more careful."&lt;/em&gt; You have to make the broken state &lt;strong&gt;structurally impossible&lt;/strong&gt; on the host side, before the agent's request goes through. The rest of this article is the four physical patterns (§10.1–§10.4) that map one-to-one onto these four failure modes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why physical? — what v0.1 established
&lt;/h3&gt;

&lt;p&gt;The idea isn't new. In the &lt;a href="https://dev.to/aos_standard/binding-ai-agents-with-physics-not-politeness-aos-v01-as-a-minimal-spec-29lg"&gt;AOS v0.1 article&lt;/a&gt; I laid out the minimal framework: constrain LLM agents with &lt;strong&gt;host-side physical constraints&lt;/strong&gt;, not textual rules. Four pillars:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;§3.2 Three Zones&lt;/strong&gt; — classify every path as Oracle (read-only), Permitted (workspace), or Prohibited&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;§4.1 Hook Requirement&lt;/strong&gt; — intercept writes and shell calls in a &lt;code&gt;PreToolUse&lt;/code&gt; hook, block violations with &lt;code&gt;exit 2&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;§4.3 Role Separation&lt;/strong&gt; — the agent that generates an artifact must not be the sole evaluator of it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;§4.4 Physical Evidence&lt;/strong&gt; — completion is proven by a file on disk, not by a chat message&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the &lt;em&gt;"what should be constrained"&lt;/em&gt; layer — the boundary line. But one question always remained: &lt;strong&gt;"OK, but how do I actually implement this in a real tool?"&lt;/strong&gt; The four failure modes above are exactly what leaks through that &lt;strong&gt;implementation gap&lt;/strong&gt;, and v0.2 is what closes it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What v0.2 does: keep the norms, add the examples
&lt;/h2&gt;

&lt;p&gt;The approach is deliberately narrow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;§1–§9 normative text (MUST / MUST NOT) is unchanged&lt;/strong&gt; — fully backward compatible&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New §10 Implementation Examples&lt;/strong&gt; — four production patterns, each linked to real code in a public repository&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;§6 renamed from &lt;code&gt;Reference Implementation&lt;/code&gt; (singular) to &lt;code&gt;Reference Implementations&lt;/code&gt; (plural)&lt;/strong&gt; — removed the reference to an unpublished implementation, and pointed it at the real, public &lt;a href="https://github.com/aos-standard/physical-agent-patterns" rel="noopener noreferrer"&gt;physical-agent-patterns&lt;/a&gt; repo instead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So it isn't &lt;em&gt;"more spec words."&lt;/em&gt; It's &lt;strong&gt;"connect the spec words to code that already runs."&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  §10.1 Manifest declaration (maps to §8, §9)
&lt;/h2&gt;

&lt;p&gt;An AOS-compliant tool declares its zone boundaries in &lt;code&gt;manifest.json&lt;/code&gt;, so another agent can learn — &lt;em&gt;before startup&lt;/em&gt; — where it may write and what it must not touch.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"aos_compliant"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"v0.2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"permitted_output_paths"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"docs/reports/"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"oracle_paths"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"evals/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"config/"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;oracle_paths&lt;/code&gt; maps directly to the §3.2 Oracle zone; the hook blocks writes here at execution time.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;permitted_output_paths&lt;/code&gt; is the Permitted zone — the only place the tool may produce output.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key rule: &lt;strong&gt;declaration without enforcement is non-compliant&lt;/strong&gt; (§8, final paragraph). Writing &lt;code&gt;aos_compliant&lt;/code&gt; in your manifest means nothing unless a hook (or CI gate) actually blocks writes to &lt;code&gt;oracle_paths&lt;/code&gt;.&lt;/p&gt;
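
&lt;p&gt;To make "enforcement" concrete, here is a minimal sketch of the decision a hook or CI gate could make from the manifest above. The function names (&lt;code&gt;is_under&lt;/code&gt;, &lt;code&gt;check_write&lt;/code&gt;) and the deny-by-default choice for undeclared paths are assumptions for illustration, not code from the AOS repositories. It also assumes relative POSIX paths and does not resolve symlinks.&lt;/p&gt;

```python
# Hypothetical enforcement check: names and policy details are assumptions,
# not the AOS reference code.
import json
import posixpath
from pathlib import PurePosixPath


def is_under(path, zone):
    """Return True if the normalized relative path is the zone dir or lies inside it."""
    path_parts = PurePosixPath(posixpath.normpath(path)).parts
    zone_parts = PurePosixPath(posixpath.normpath(zone)).parts
    return path_parts[:len(zone_parts)] == zone_parts


def check_write(manifest, target):
    """Classify a proposed write as 'deny' or 'allow' from the manifest zones."""
    if any(is_under(target, zone) for zone in manifest.get("oracle_paths", [])):
        return "deny"  # Oracle zone: read-only
    if any(is_under(target, zone) for zone in manifest.get("permitted_output_paths", [])):
        return "allow"  # Permitted zone
    return "deny"  # undeclared paths are treated as off-limits in this sketch


manifest = json.loads("""
{
  "aos_compliant": "v0.2",
  "permitted_output_paths": ["docs/reports/"],
  "oracle_paths": ["evals/", "config/"]
}
""")

print(check_write(manifest, "evals/golden.json"))    # deny
print(check_write(manifest, "docs/reports/run.md"))  # allow
```

&lt;p&gt;A real hook would turn a "deny" into a blocking exit code; the point of the sketch is only that the manifest is read and acted on, not merely declared.&lt;/p&gt;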




&lt;h2&gt;
  
  
  §10.2 Physical evidence (maps to §4.4)
&lt;/h2&gt;

&lt;p&gt;The failure mode AOS targets: &lt;strong&gt;an agent claims it "ran" but left no trace.&lt;/strong&gt; The physical-first pattern makes evidence a &lt;em&gt;precondition&lt;/em&gt; of completion, not an afterthought.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Write evidence BEFORE declaring done (from agent_with_evidence.py)
&lt;/span&gt;&lt;span class="n"&gt;evidence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;today&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;evidence_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;evidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# Only after the file exists: print completion
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[done] Evidence written: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;evidence_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A caller verifies completion just by checking that &lt;code&gt;evidence_path&lt;/code&gt; exists. &lt;strong&gt;No conversational assertion required.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://github.com/aos-standard/physical-agent-patterns/blob/main/patterns/02_physical-first/agent_with_evidence.py" rel="noopener noreferrer"&gt;physical-agent-patterns/patterns/02_physical-first/agent_with_evidence.py&lt;/a&gt;&lt;/p&gt;
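
&lt;p&gt;The caller-side check can be sketched in a few lines. This version goes slightly beyond a bare existence test by also parsing the file and checking for the four keys shown in the evidence dict; that extra strictness, and the name &lt;code&gt;is_complete&lt;/code&gt;, are assumptions rather than something the repository prescribes.&lt;/p&gt;

```python
# Hypothetical caller-side check; key names follow the evidence dict in the
# article, the function name and strictness level are assumptions.
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = {"task", "result", "timestamp", "model"}


def is_complete(evidence_path):
    """Accept completion only if the evidence file exists and parses with all required keys."""
    path = Path(evidence_path)
    if not path.is_file():
        return False
    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and REQUIRED_KEYS.issubset(data)


with tempfile.TemporaryDirectory() as tmp:
    evidence_path = Path(tmp) / "evidence.json"
    print(is_complete(evidence_path))  # False: nothing on disk yet
    evidence_path.write_text(json.dumps(
        {"task": "demo", "result": "ok", "timestamp": "2026-06-03", "model": "stub"}
    ))
    print(is_complete(evidence_path))  # True
```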




&lt;h2&gt;
  
  
  §10.3 Immune loop (maps to §4.1, §4.5)
&lt;/h2&gt;

&lt;p&gt;A running agent detects AOS violations in the workspace and triggers a repair sequence. The crucial part: &lt;strong&gt;detection (read-only scan) is separated from repair (write).&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# violation_detector.py — write the report BEFORE any repair attempt
&lt;/span&gt;&lt;span class="n"&gt;violations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_scan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;violations&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;violations&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;report_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The detector writes a JSON violation report (itself §4.4 evidence). The repair planner reads it and either applies known fixes or &lt;strong&gt;escalates to the Sovereign when a design decision is required&lt;/strong&gt; (§4.5). The detector never repairs its own findings — which also satisfies §4.3 role separation.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://github.com/aos-standard/physical-agent-patterns/tree/main/patterns/03_immune-loop" rel="noopener noreferrer"&gt;physical-agent-patterns/patterns/03_immune-loop/&lt;/a&gt;&lt;/p&gt;
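
&lt;p&gt;The planner side of the loop might look roughly like this. The violation fields (&lt;code&gt;kind&lt;/code&gt;, &lt;code&gt;path&lt;/code&gt;), the &lt;code&gt;KNOWN_FIXES&lt;/code&gt; table, and the function name are all hypothetical; the article does not show the report schema used in the repository. The sketch only plans and never writes, so detection, planning, and repair stay separable.&lt;/p&gt;

```python
# Hypothetical repair planner; the violation fields ("kind", "path") and the
# KNOWN_FIXES table are assumptions, not the schema used in the repository.
import json

KNOWN_FIXES = {
    "missing_evidence": "rerun task and write evidence file",
    "stray_output": "move file into a permitted output path",
}


def plan_repairs(report):
    """Split reported violations into known-fix actions and escalations."""
    plan = {"apply": [], "escalate": []}
    for violation in report.get("violations", []):
        fix = KNOWN_FIXES.get(violation.get("kind"))
        if fix is None:
            # No known fix: a design decision is needed, so escalate to the Sovereign.
            plan["escalate"].append(violation)
        else:
            plan["apply"].append({"violation": violation, "action": fix})
    return plan


report = json.loads("""
{
  "timestamp": "2026-06-03T00:00:00",
  "violations": [
    {"kind": "missing_evidence", "path": "docs/reports/run_01.json"},
    {"kind": "oracle_modified", "path": "evals/golden.json"}
  ]
}
""")
plan = plan_repairs(report)
print(len(plan["apply"]), len(plan["escalate"]))  # 1 1
```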




&lt;h2&gt;
  
  
  §10.4 systemd runtime (maps to §4.4, persistence)
&lt;/h2&gt;

&lt;p&gt;An agent that only runs interactively can't satisfy §4.4 across reboots. The systemd pattern binds the agent to the OS process supervisor: the service defines the execution boundary, the timer enforces the schedule, and output files survive restarts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# agent.py — the output file is the evidence of the run
&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;OUTPUT_DIR&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_run_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[skip] Output already exists for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;output_path&lt;/span&gt;
&lt;span class="c1"&gt;# ... run and write ...
&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# physical-agent.timer (excerpt)
&lt;/span&gt;&lt;span class="nn"&gt;[Timer]&lt;/span&gt;
&lt;span class="py"&gt;OnCalendar&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;daily&lt;/span&gt;
&lt;span class="py"&gt;Persistent&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The idempotency guard (&lt;code&gt;if output_path.exists(): return&lt;/code&gt;) prevents duplicate runs while keeping the evidence file as the canonical completion record. &lt;code&gt;Persistent=true&lt;/code&gt; fires a missed run on next boot — so the evidence requirement holds &lt;strong&gt;regardless of uptime&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Source: &lt;a href="https://github.com/aos-standard/physical-agent-patterns/tree/main/patterns/01_systemd-runtime" rel="noopener noreferrer"&gt;physical-agent-patterns/patterns/01_systemd-runtime/&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The four patterns mapped to AOS sections
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;AOS section&lt;/th&gt;
&lt;th&gt;In one line&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Manifest declaration&lt;/td&gt;
&lt;td&gt;§8, §9&lt;/td&gt;
&lt;td&gt;Declare writable zones in a machine-readable way&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Physical evidence&lt;/td&gt;
&lt;td&gt;§4.4&lt;/td&gt;
&lt;td&gt;An evidence file is the precondition for completion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Immune loop&lt;/td&gt;
&lt;td&gt;§4.1, §4.5&lt;/td&gt;
&lt;td&gt;Separate violation detection from repair/escalation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;systemd runtime&lt;/td&gt;
&lt;td&gt;§4.4 (persistence)&lt;/td&gt;
&lt;td&gt;Keep evidence across restarts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;None of these are clever inventions. They're just the boundaries you &lt;em&gt;always&lt;/em&gt; hit when you push agents toward production, factored into reusable form.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why bake implementation examples into the spec
&lt;/h2&gt;

&lt;p&gt;A common failure mode for specs: &lt;strong&gt;the norms are solid, but nobody has a starting point.&lt;/strong&gt; A reader finishes thinking "I agree it's correct — now what's my first line of code?" and stalls.&lt;/p&gt;

&lt;p&gt;v0.2 shrinks that distance. Each of the four §10 patterns now has &lt;strong&gt;clonable, runnable public code&lt;/strong&gt; attached. The spec itself stays runtime-agnostic (Claude Code / Cursor / your own loop), but the examples make it cheaper for the &lt;em&gt;second&lt;/em&gt; person to adopt it.&lt;/p&gt;

&lt;p&gt;All four §10 patterns live in &lt;a href="https://github.com/aos-standard/physical-agent-patterns" rel="noopener noreferrer"&gt;physical-agent-patterns&lt;/a&gt;, so you can &lt;code&gt;git clone&lt;/code&gt; and read them directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;AOS v0.2 is not a version that adds new constraints. It's the version that &lt;strong&gt;fills in &lt;em&gt;how&lt;/em&gt; to implement the constraints, with links to working code.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Normative text (v0.2): &lt;a href="https://github.com/aos-standard/AOS-spec/blob/main/AOS-v0.2.md" rel="noopener noreferrer"&gt;AOS-spec/AOS-v0.2.md&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Implementation patterns: &lt;a href="https://github.com/aos-standard/physical-agent-patterns" rel="noopener noreferrer"&gt;physical-agent-patterns&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've felt that "textual rules alone can't keep agents in line," I hope this gives you a concrete starting point.&lt;/p&gt;




&lt;h2&gt;
  
  
  AOS specification (GitHub)
&lt;/h2&gt;

&lt;p&gt;The "physical governance" approach in this article is specified and published as &lt;strong&gt;AOS (AI Operating Standard)&lt;/strong&gt;. v0.2 adds the implementation-examples section.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://github.com/aos-standard/AOS-spec" rel="noopener noreferrer"&gt;AOS-spec&lt;/a&gt;&lt;/strong&gt; — the spec (v0.2)&lt;br&gt;
👉 &lt;strong&gt;&lt;a href="https://github.com/aos-standard/physical-agent-patterns" rel="noopener noreferrer"&gt;physical-agent-patterns&lt;/a&gt;&lt;/strong&gt; — implementation patterns&lt;/p&gt;

&lt;p&gt;If the spec or the examples were useful, a ⭐ &lt;strong&gt;star&lt;/strong&gt; helps shape the next version. Issues and PRs are welcome.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
    <item>
      <title>A mocked ad-copy CLI, real evals, and 30 Playwright cycles (tool 1027)</title>
      <dc:creator>AOS Architect</dc:creator>
      <pubDate>Sun, 17 May 2026 13:56:57 +0000</pubDate>
      <link>https://dev.to/aos_standard/a-mocked-ad-copy-cli-real-evals-and-30-playwright-cycles-tool-1027-39ln</link>
      <guid>https://dev.to/aos_standard/a-mocked-ad-copy-cli-real-evals-and-30-playwright-cycles-tool-1027-39ln</guid>
      <description>&lt;h2&gt;
  
  
  What this is
&lt;/h2&gt;

&lt;p&gt;Earlier posts in this series were mostly &lt;em&gt;why&lt;/em&gt; agent work needs hard boundaries—not politeness, but paths, CI, and tooling you can point at. Here I stay on the boring side: &lt;strong&gt;a small repo-local CLI&lt;/strong&gt; (internal id &lt;strong&gt;1027&lt;/strong&gt;) that prints &lt;strong&gt;JSON ad variants&lt;/strong&gt; under mocks, with evaluators and repeated browser/CLI runs layered on top.&lt;/p&gt;

&lt;p&gt;I am not selling model magic. Same inputs, same stubbed &lt;code&gt;copies&lt;/code&gt; / &lt;code&gt;best&lt;/code&gt;; that is the point when you need to explain behavior to someone who was not in the room when the demo ran.&lt;/p&gt;

&lt;p&gt;Our public article ledger lists this draft as &lt;strong&gt;&lt;code&gt;#004&lt;/code&gt;&lt;/strong&gt; next to the Japanese Zenn manuscript.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the tool actually does
&lt;/h2&gt;

&lt;p&gt;Inputs look like product, audience, and channel. Outputs are JSON: multiple &lt;strong&gt;&lt;code&gt;copies&lt;/code&gt;&lt;/strong&gt;, a picked &lt;strong&gt;&lt;code&gt;best&lt;/code&gt;&lt;/strong&gt;, and score-like fields from small heuristics (channel baselines, tiny nudges for length). Marketing can paste into a sheet; engineers can assert on stdout. Those two audiences rarely share one artifact, so &lt;strong&gt;fixing the boundary as JSON&lt;/strong&gt; saves a lot of arguments later.&lt;/p&gt;

&lt;p&gt;With &lt;strong&gt;&lt;code&gt;--mock&lt;/code&gt;&lt;/strong&gt; (and the bypass flag we use for local runs), the CLI does not call a remote LLM. A hash from the input tuple pins the stub, so &lt;strong&gt;the demo payload and the regression payload are the same object&lt;/strong&gt;. When you show it externally, repeatable bytes beat a one-off “wow” completion.&lt;/p&gt;
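
&lt;p&gt;One way such hash pinning could work is sketched below. The stub table, the placeholder copy strings, the function name, and the choice of SHA-256 are assumptions; the only thing taken from the tool is the idea that a hash of the input tuple selects the stub, so identical inputs always yield an identical payload.&lt;/p&gt;

```python
# Hypothetical sketch: the stub table, field contents, and choice of SHA-256
# are assumptions; the article only says a hash of the input tuple pins the stub.
import hashlib
import json

STUB_COPIES = [
    ["stub copy A-1", "stub copy A-2"],
    ["stub copy B-1", "stub copy B-2"],
    ["stub copy C-1", "stub copy C-2"],
]


def mock_payload(product, audience, channel):
    """Pick a stub deterministically from the input tuple; no network call."""
    key = json.dumps([product, audience, channel]).encode("utf-8")
    index = int(hashlib.sha256(key).hexdigest(), 16) % len(STUB_COPIES)
    copies = STUB_COPIES[index]
    return {"copies": copies, "best": copies[0]}


first = mock_payload("Migration enablement SaaS", "Mid-market IT leaders", "google")
second = mock_payload("Migration enablement SaaS", "Mid-market IT leaders", "google")
print(first == second)  # True: same inputs, same payload
```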

&lt;p&gt;&lt;code&gt;DESIGN.md&lt;/code&gt; in the repo splits payment gates, outbound calls, and filesystem writes so static checks can police them. If you only take one idea from AOS here, it is the &lt;strong&gt;Oracle / Permitted / Prohibited&lt;/strong&gt; split; the full contract lives in the GitHub spec linked at the bottom.&lt;/p&gt;

&lt;h2&gt;
  
  
  Walk-through with fixture-shaped inputs
&lt;/h2&gt;

&lt;p&gt;Roughly what the evaluators exercise:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Sample&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Product&lt;/td&gt;
&lt;td&gt;Migration enablement SaaS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audience&lt;/td&gt;
&lt;td&gt;Mid-market IT leaders&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Channel&lt;/td&gt;
&lt;td&gt;google&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You always get &lt;strong&gt;&lt;code&gt;copies&lt;/code&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;code&gt;best&lt;/code&gt;&lt;/strong&gt;, and deterministic scores with no outbound model call on that path. Later you can swap in a real generator &lt;strong&gt;behind the same shape&lt;/strong&gt;; the lesson I care about is that &lt;strong&gt;the contract is narrower than the prose&lt;/strong&gt;. JSON in logs beats parsing Markdown when you want spreadsheets, dashboards, or a second agent to judge output.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the evals check
&lt;/h2&gt;

&lt;p&gt;Three buckets, nothing fancy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--mock&lt;/code&gt; path:&lt;/strong&gt; stdout contains &lt;strong&gt;&lt;code&gt;copies&lt;/code&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;code&gt;best&lt;/code&gt;&lt;/strong&gt; inside a success envelope (we use &lt;code&gt;--bypass-payment&lt;/code&gt; where payment is not the subject of the test).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static hygiene:&lt;/strong&gt; small AST-based scripts reject “always true” assertions that look like coverage theater.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;pytest&lt;/code&gt; adversarial marks:&lt;/strong&gt; tests tagged &lt;strong&gt;&lt;code&gt;@pytest.mark.adversarial&lt;/code&gt;&lt;/strong&gt; must not be skipped by accident (&lt;code&gt;pytest … -m adversarial&lt;/code&gt;). If a test is supposed to hurt, it should stay in the default pain path.&lt;/li&gt;
&lt;/ol&gt;
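
&lt;p&gt;As an illustration of the static-hygiene bucket, a check for "always true" assertions can be written with Python's standard &lt;code&gt;ast&lt;/code&gt; module. This is a minimal sketch under the assumption that "always true" means an &lt;code&gt;assert&lt;/code&gt; on a truthy constant; the repo's actual scripts are not shown here and may cover more cases.&lt;/p&gt;

```python
# Hypothetical static check: flags `assert` statements whose test is a truthy
# constant. The repo's actual scripts are not shown in the article.
import ast


def find_trivial_asserts(source):
    """Return line numbers of asserts on truthy constants, e.g. `assert True`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assert) and isinstance(node.test, ast.Constant):
            if node.test.value:
                hits.append(node.lineno)
    return sorted(hits)


sample = '''
def test_real():
    assert compute() == 42

def test_theater():
    assert True
    assert "looks fine"
'''
print(find_trivial_asserts(sample))  # [6, 7]
```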

&lt;p&gt;Most of this reads as busywork until the repo grows faster than your memory. Then you want failures to show up &lt;strong&gt;without&lt;/strong&gt; someone remembering to tick the scary suite.&lt;/p&gt;

&lt;h2&gt;
  
  
  Thirty Playwright cycles (five checks each)
&lt;/h2&gt;

&lt;p&gt;CLI tests alone miss a lot of wiring: paths, permissions, timers, how the process is launched. So we also run &lt;strong&gt;Playwright&lt;/strong&gt;: one bundle of &lt;strong&gt;five checks&lt;/strong&gt;, &lt;strong&gt;thirty times&lt;/strong&gt;, all green in the report trail (payment refused without a real transaction, mock path succeeds, keywords in stdout—exact list is in the shipped log).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;150 green checks&lt;/strong&gt; (30 cycles × 5 checks) sounds like a vanity stat. I use it differently: one lucky pass is cheap; many identical passes say the &lt;strong&gt;environment story&lt;/strong&gt; is not a fluke. After you have watched CI go green for the wrong reason once, you start wanting volume, not a single badge.&lt;/p&gt;

&lt;p&gt;If you want to copy the pattern into your own codebase, four questions are enough: Can you &lt;strong&gt;freeze the output shape&lt;/strong&gt; (here, JSON)? Can you &lt;strong&gt;replay without the model&lt;/strong&gt;? Can you hit it &lt;strong&gt;from CI and from a browser driver&lt;/strong&gt;? Does your eval layer make “quietly skipped hard tests” awkward? This repo is a minimal &lt;strong&gt;yes&lt;/strong&gt; on all four.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trying it in practice
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;AOS specification&lt;/strong&gt; is open; the curated implementation bundle is not thrown on npm as a product. If you want to try it internally or talk through a serious eval, leave a short note on the companion &lt;strong&gt;Zenn&lt;/strong&gt; post (Japanese) or open a scoped issue on &lt;strong&gt;&lt;a href="https://github.com/aos-standard/AOS-spec" rel="noopener noreferrer"&gt;aos-standard/AOS-spec&lt;/a&gt;&lt;/strong&gt; and say what you are trying to do. I will answer where it is practical and keep spec debate separate from “can we ship you a build.”&lt;/p&gt;




&lt;h2&gt;
  
  
  AOS v0.1 Specification (GitHub)
&lt;/h2&gt;

&lt;p&gt;The "physical governance" approach described in this article is formalized as &lt;strong&gt;AOS (AI Operating Standard) v0.1&lt;/strong&gt; — a minimal, machine-enforceable spec for AI agent operations.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://github.com/aos-standard/AOS-spec" rel="noopener noreferrer"&gt;AOS-spec&lt;/a&gt;&lt;/strong&gt; — specification&lt;br&gt;
👉 &lt;strong&gt;&lt;a href="https://github.com/aos-standard/physical-agent-patterns" rel="noopener noreferrer"&gt;physical-agent-patterns&lt;/a&gt;&lt;/strong&gt; — implementation patterns&lt;/p&gt;

&lt;p&gt;If you find this useful, please ⭐ &lt;strong&gt;star the repo&lt;/strong&gt;. Issues and PRs are welcome — the spec is designed to evolve with real-world usage.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>agents</category>
      <category>playwright</category>
    </item>
    <item>
      <title>Binding AI agents with physics, not politeness — AOS v0.1 as a minimal spec</title>
      <dc:creator>AOS Architect</dc:creator>
      <pubDate>Thu, 07 May 2026 11:38:02 +0000</pubDate>
      <link>https://dev.to/aos_standard/binding-ai-agents-with-physics-not-politeness-aos-v01-as-a-minimal-spec-29lg</link>
      <guid>https://dev.to/aos_standard/binding-ai-agents-with-physics-not-politeness-aos-v01-as-a-minimal-spec-29lg</guid>
      <description>&lt;h2&gt;
  
  
  When prose piles up and nothing sticks
&lt;/h2&gt;

&lt;p&gt;Agents usually start life under &lt;strong&gt;Markdown rules&lt;/strong&gt;: &lt;code&gt;CLAUDE.md&lt;/code&gt;, &lt;code&gt;.cursorrules&lt;/code&gt;, manifests in chat. One private repo cracked &lt;strong&gt;130 KB&lt;/strong&gt; of that kind of text—and still behaved like the rules lived in a parallel universe.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Intent excerpt&lt;/th&gt;
&lt;th&gt;Typical pattern&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;forbid in-place hacks&lt;/td&gt;
&lt;td&gt;ban &lt;code&gt;sed -i&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;forbid blind truncation&lt;/td&gt;
&lt;td&gt;discourage &lt;code&gt;&amp;gt;&lt;/code&gt; shell redirects&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;keep specs sacred&lt;/td&gt;
&lt;td&gt;disallow mutating oracle dirs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;keep merge honest&lt;/td&gt;
&lt;td&gt;audits before declaring done&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Instructions existed. Misuse continued.&lt;/p&gt;

&lt;p&gt;Instrumentation on one exhausting window showed &lt;strong&gt;policy breaches in every one of ~52 traced tool attempts&lt;/strong&gt;—“read it ✓”, “ignored it ✓ anyway”, “said done ✓”—pick your favorite failure mode.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Teaching tone is not torque.&lt;/strong&gt; If the forbidden command runs, wording failed quietly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Hence &lt;strong&gt;&lt;a href="https://github.com/aos-standard/AOS-spec" rel="noopener noreferrer"&gt;AOS v0.1&lt;/a&gt;&lt;/strong&gt; as a terse spec—not another pep talk stack.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where leverage actually lands
&lt;/h2&gt;

&lt;p&gt;Stop asking the LLM to politely abstain. &lt;strong&gt;Stop the syscall.&lt;/strong&gt; Inspect the &lt;code&gt;PreToolUse&lt;/code&gt; payload; &lt;strong&gt;exit 2&lt;/strong&gt; rejects the invocation before Claude Code performs any shell or filesystem IO.&lt;/p&gt;

&lt;p&gt;Anthropic publishes &lt;a href="https://docs.claude.com/en/docs/claude-code/hooks" rel="noopener noreferrer"&gt;Hooks docs&lt;/a&gt; for &lt;strong&gt;&lt;code&gt;PreToolUse&lt;/code&gt;&lt;/strong&gt;. Claude Code here is merely the &lt;strong&gt;tutorial runtime&lt;/strong&gt;; the same zoning idea carries over to Cursor or bespoke loops, with uglier duct tape.&lt;/p&gt;

&lt;p&gt;Rough mental model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLM emits Write/Bash/Etc
           ↓
Hook stdin JSON arrives
           ↓ Host checks path + motif
 denial (exit 2) → Claude Code blocks the call; hook stderr goes back to the model
 allowance (exit 0) → downstream tool executes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;“Intent aligned?” stops being the bottleneck—&lt;strong&gt;illegal transitions simply never bind.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it feels like in-session:&lt;/strong&gt; Hook stderr (&lt;code&gt;oracle write denied: …&lt;/code&gt;) re-enters the transcript context, and the model visibly pivots (“try mkdir under permitted tree instead”), which beats repeated human nagging. Regex false positives still sting, so keep the allowlists small and trimmed.&lt;/p&gt;




&lt;h2&gt;
  
  
  AOS v0.1 compass (minimal)
&lt;/h2&gt;

&lt;p&gt;Portable ideas live &lt;strong&gt;&lt;a href="https://github.com/aos-standard/AOS-spec" rel="noopener noreferrer"&gt;in-repo&lt;/a&gt;&lt;/strong&gt; (v0.1 published):&lt;/p&gt;

&lt;h3&gt;
  
  
  Zones (§3.2)
&lt;/h3&gt;

&lt;p&gt;Everything maps to Oracle / Permitted / Prohibited:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Zone&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;th&gt;Typical contents&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Oracle&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;read-only sanctum&lt;/td&gt;
&lt;td&gt;spec md, evaluator goldens, immutable policy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Permitted&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ordinary workspace churn&lt;/td&gt;
&lt;td&gt;implementations, codegen scratchpad&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Prohibited&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;off-map&lt;/td&gt;
&lt;td&gt;host paths beyond agreed roots&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Oracle is the wedge against “tests flaky → soften fixtures.” Golden truth stays where drafts cannot casually rewrite expectations.&lt;/p&gt;
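&lt;p&gt;A minimal sketch of the zone lookup, assuming a hypothetical workspace root of &lt;code&gt;/repo&lt;/code&gt; and the two Oracle directory names used elsewhere in this post. &lt;code&gt;classify_zone&lt;/code&gt;, &lt;code&gt;WORKSPACE_ROOT&lt;/code&gt;, and the string labels are illustrative names, not part of the spec:&lt;/p&gt;

```python
from pathlib import PurePosixPath

# Hypothetical values for illustration; the spec does not fix these names.
WORKSPACE_ROOT = PurePosixPath("/repo")
ORACLE_SEGMENTS = frozenset({"00_Management", "evals"})


def classify_zone(path_str):
    """Return "oracle", "permitted", or "prohibited" for an absolute POSIX path."""
    path = PurePosixPath(path_str)
    try:
        relative = path.relative_to(WORKSPACE_ROOT)
    except ValueError:
        # Outside the agreed root: off-map.
        return "prohibited"
    if ".." in relative.parts:
        # Unresolved parent hops could escape the root; refuse rather than guess.
        return "prohibited"
    if ORACLE_SEGMENTS.intersection(relative.parts):
        # Any Oracle directory name on the way down makes the target read-only.
        return "oracle"
    return "permitted"
```

&lt;p&gt;The check is purely lexical and does not follow symlinks, so a real hook would resolve the path first, as the teaching stub below does with &lt;code&gt;Path.resolve()&lt;/code&gt;.&lt;/p&gt;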

&lt;h3&gt;
  
  
  Physical enforcement skeleton (§4.1-ish)
&lt;/h3&gt;

&lt;p&gt;Teaching stub—you own regex sharpness locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# pretooluse_iron_cage.py — teaching stub (Python 3)
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="n"&gt;ORACLE_SEGMENTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;00_Management&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;evals&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;oracle_hit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path_str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path_str&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parents&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ORACLE_SEGMENTS&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;intersection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;names&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdin&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;inp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Edit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;inp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;inp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filePath&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;oracle_hit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[iron_cage] oracle write denied: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;inp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;command&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sed -i&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;truncate &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[iron_cage] banned edit motif: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;JSONC hook registration (substitute your own &lt;strong&gt;absolute path&lt;/strong&gt;; the one shown is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json-doc"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PreToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Bash|Write|Edit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"python3 /absolute/path/pretooluse_iron_cage.py"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With &lt;code&gt;exit 2&lt;/code&gt; the &lt;code&gt;sed -i&lt;/code&gt; call is blocked before it runs. Hook regex maintenance is intentional busywork: you trade prompt theater for brittle but inspectable predicates.&lt;/p&gt;
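&lt;p&gt;The substring test in the stub fires on &lt;code&gt;echo "sed -i"&lt;/code&gt; and misses &lt;code&gt;sed --in-place&lt;/code&gt;. One possible tightening, sketched with the standard-library &lt;code&gt;shlex&lt;/code&gt; module (Python 3.8 or newer). &lt;code&gt;DENY_RULES&lt;/code&gt;, &lt;code&gt;split_commands&lt;/code&gt;, and &lt;code&gt;banned_motif&lt;/code&gt; are hypothetical names, and the rule table is an assumption rather than anything the spec prescribes:&lt;/p&gt;

```python
import shlex

# Hypothetical deny rules: program name mapped to flags that mean "edit in place".
# An empty tuple bans the program outright.
DENY_RULES = {
    "sed": ("-i", "--in-place"),
    "truncate": (),
}


def split_commands(command):
    """Split a shell line into simple commands at shell operator tokens."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    operators = set(lexer.punctuation_chars)
    commands, current = [], []
    for token in lexer:
        if token and set(token).issubset(operators):
            # Operator token (for example ; or |): close the current command.
            if current:
                commands.append(current)
            current = []
        else:
            current.append(token)
    if current:
        commands.append(current)
    return commands


def banned_motif(command):
    """Return a reason string when any simple command hits a deny rule, else None."""
    try:
        commands = split_commands(command)
    except ValueError:
        # Unbalanced quotes: fail closed rather than guess at intent.
        return "unparseable command"
    for tokens in commands:
        program = tokens[0].rsplit("/", 1)[-1]
        if program not in DENY_RULES:
            continue
        flags = DENY_RULES[program]
        if not flags:
            return program + " is banned"
        for token in tokens[1:]:
            # Prefix match also catches "-i.bak" and "--in-place=.bak".
            if token.startswith(flags):
                return program + " " + token + " edits in place"
    return None
```

&lt;p&gt;Wrappers such as &lt;code&gt;sudo&lt;/code&gt; or &lt;code&gt;xargs&lt;/code&gt;, environment-variable prefixes, and combined short flags still slip past this sketch, so treat it as a starting point for your own predicates.&lt;/p&gt;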




&lt;h2&gt;
  
  
  Role separation bite (§4.3)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Do not grade in the authoring session.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Symptom&lt;/th&gt;
&lt;th&gt;Likely pathology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;logs already red inside gen thread&lt;/td&gt;
&lt;td&gt;narration still cheers “DONE”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fresh shell repeats red tests&lt;/td&gt;
&lt;td&gt;storyline mutates—“WIP”, “temporary” excuses&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Detached evaluation (&lt;strong&gt;CI bots&lt;/strong&gt;, &lt;strong&gt;one-shot review agents&lt;/strong&gt;, scripted harnesses) snaps generation myths earlier.&lt;/p&gt;

&lt;p&gt;ASCII-only guardrail sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Author session --&amp;gt; artifact
                         |
                         v
Detached judge --&amp;gt; PASS/FAIL + logs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If one chat both writes and solemnly declares victory, skepticism is warranted.&lt;/p&gt;
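&lt;p&gt;One way to keep adjudication out of the authoring session, sketched under assumptions: the check is any command whose exit code is the verdict, and &lt;code&gt;judge&lt;/code&gt;, its log format, and the default timeout are hypothetical choices rather than anything from the spec:&lt;/p&gt;

```python
import subprocess
from pathlib import Path


def judge(command, log_path, timeout_s=300):
    """Run a check in a fresh process, persist its output, return "PASS" or "FAIL"."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout_s,
        )
        output, code = result.stdout, result.returncode
    except subprocess.TimeoutExpired:
        # A hung check counts as a failure, never as a silent pass.
        output, code = "timed out", None
    verdict = "PASS" if code == 0 else "FAIL"
    # The log on disk is the receipt; the return value is only a convenience.
    Path(log_path).write_text(f"verdict={verdict} exit={code}\n{output}")
    return verdict
```

&lt;p&gt;A call such as &lt;code&gt;judge(["python3", "evals/run_evals.py"], "judge.log")&lt;/code&gt; would run from CI or a separate shell, never from the session that produced the artifact.&lt;/p&gt;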




&lt;h2&gt;
  
  
  Evidence habits (§4.4)
&lt;/h2&gt;

&lt;p&gt;Chats saying “looks good” evaporate. &lt;strong&gt;Disk + exit codes.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Claim type&lt;/th&gt;
&lt;th&gt;Receipt class&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;tests clean&lt;/td&gt;
&lt;td&gt;CLI exit prints + plaintext logs committed or archived&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;file exists&lt;/td&gt;
&lt;td&gt;deterministic listing/checksum snapshots&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;metadata drift&lt;/td&gt;
&lt;td&gt;hashing inventory rows&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If an artifact never actually lands on disk, or its logs vanish, schedule another attempt.&lt;/p&gt;
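&lt;p&gt;A checksum snapshot is one cheap receipt for the “file exists” and “metadata drift” rows. The sketch below assumes SHA-256 over file bytes and a JSON inventory; &lt;code&gt;snapshot&lt;/code&gt;, &lt;code&gt;write_snapshot&lt;/code&gt;, and &lt;code&gt;drift&lt;/code&gt; are hypothetical helper names:&lt;/p&gt;

```python
import hashlib
import json
from pathlib import Path


def snapshot(root):
    """Map each file under root (relative POSIX path) to its SHA-256 hex digest."""
    base = Path(root)
    rows = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            rows[path.relative_to(base).as_posix()] = digest
    return rows


def write_snapshot(root, out_path):
    """Persist the inventory as sorted JSON so two snapshots diff cleanly."""
    rows = snapshot(root)
    Path(out_path).write_text(json.dumps(rows, indent=2, sort_keys=True) + "\n")
    return rows


def drift(before, after):
    """Return sorted paths that were added, removed, or changed between snapshots."""
    return sorted(p for p in set(before) | set(after) if before.get(p) != after.get(p))
```

&lt;p&gt;Write the inventory outside the tree being hashed; otherwise the snapshot file shows up as drift on the next run.&lt;/p&gt;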




&lt;h2&gt;
  
  
  Why publish prose at all
&lt;/h2&gt;

&lt;p&gt;Roughly forty Python lines buy a civilization-level conversation about &lt;strong&gt;oracle integrity&lt;/strong&gt;, reachable by strangers opening GitHub rather than buried in ephemeral prompts.&lt;/p&gt;

&lt;p&gt;Pieces worth collective iteration:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;§ slice&lt;/th&gt;
&lt;th&gt;gist&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;3.2&lt;/td&gt;
&lt;td&gt;Three Zones delineation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4.1&lt;/td&gt;
&lt;td&gt;physical intercept&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4.3&lt;/td&gt;
&lt;td&gt;generation vs adjudication firewall&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4.4&lt;/td&gt;
&lt;td&gt;evidence minimalism&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The spec stays engine-agnostic; the hook sample targets Claude Code today, tomorrow maybe something else. &lt;strong&gt;Portability is deliberate&lt;/strong&gt;, not an omission.&lt;/p&gt;




&lt;h2&gt;
  
  
  After wiring (empirical anecdotes)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Observation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;in-place sabotage&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;sed -i&lt;/code&gt; stopped binding—exit 2 first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;nag volume&lt;/td&gt;
&lt;td&gt;fewer “pretty please abide policy” arcs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;error surfacing&lt;/td&gt;
&lt;td&gt;stderr guidance steers next benign attempt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;triage cleanliness&lt;/td&gt;
&lt;td&gt;adjudication outside authoring loop clarifies regressions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Tax:&lt;/strong&gt; brittle regex. You will relax or tighten predicates as repos evolve; that cost is still dwarfed by babysitting infinitely long CLAUDE.md scrolls nobody reads verbatim.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing beat
&lt;/h2&gt;

&lt;p&gt;Workload on agents climbs; “please behave” asymptotes quickly. &lt;strong&gt;Architect denials&lt;/strong&gt;, not applause cycles.&lt;/p&gt;

&lt;p&gt;Issues &amp;amp; PRs welcome on &lt;strong&gt;&lt;code&gt;aos-standard/AOS-spec&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Shortcuts&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spec root: &lt;strong&gt;&lt;a href="https://github.com/aos-standard/AOS-spec" rel="noopener noreferrer"&gt;github.com/aos-standard/AOS-spec&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Hooks primer: &lt;a href="https://docs.claude.com/en/docs/claude-code/hooks" rel="noopener noreferrer"&gt;&lt;code&gt;docs.claude.com/.../hooks&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Thesis-only companion (&lt;strong&gt;#001&lt;/strong&gt;) &amp;amp; CI companion (&lt;strong&gt;#002&lt;/strong&gt;) linked from their Dev.to URLs / ledger.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  AOS v0.1 Specification (GitHub)
&lt;/h2&gt;

&lt;p&gt;The "physical governance" approach described in this article is formalized as &lt;strong&gt;AOS (AI Operating Standard) v0.1&lt;/strong&gt; — a minimal, machine-enforceable spec for AI agent operations.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://github.com/aos-standard/AOS-spec" rel="noopener noreferrer"&gt;AOS-spec&lt;/a&gt;&lt;/strong&gt; — specification&lt;br&gt;
👉 &lt;strong&gt;&lt;a href="https://github.com/aos-standard/physical-agent-patterns" rel="noopener noreferrer"&gt;physical-agent-patterns&lt;/a&gt;&lt;/strong&gt; — implementation patterns&lt;/p&gt;

&lt;p&gt;If you find this useful, please ⭐ &lt;strong&gt;star the repo&lt;/strong&gt;. Issues and PRs are welcome — the spec is designed to evolve with real-world usage.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>claude</category>
      <category>cursor</category>
    </item>
    <item>
      <title>AI Governance: One Repo, One Smoke Tool, and a Green CI Run</title>
      <dc:creator>AOS Architect</dc:creator>
      <pubDate>Sun, 12 Apr 2026 13:06:56 +0000</pubDate>
      <link>https://dev.to/aos_standard/ai-governance-one-repo-one-smoke-tool-and-a-green-ci-run-28ae</link>
      <guid>https://dev.to/aos_standard/ai-governance-one-repo-one-smoke-tool-and-a-green-ci-run-28ae</guid>
      <description>&lt;h2&gt;
  
  
  What this reads like
&lt;/h2&gt;

&lt;p&gt;Continuation of &lt;strong&gt;&lt;a href="https://dev.to/aos_standard/why-ai-agents-dont-follow-rules-the-case-for-physical-governance-382f"&gt;Why AI Agents Don't Follow Rules&lt;/a&gt;&lt;/strong&gt;. Same thesis: &lt;strong&gt;policy text settles at load time; physical constraints settle at execution time.&lt;/strong&gt; Here we show &lt;strong&gt;artifacts you can cite&lt;/strong&gt; inside a governed monorepo: hashed commits, enumerated checks, CI job lanes—without asking strangers to trust a private Actions permalink.&lt;/p&gt;

&lt;p&gt;Hook-level code belongs in &lt;strong&gt;&lt;a href="https://dev.to/aos_standard/binding-ai-agents-with-physics-not-politeness-aos-v01-as-a-minimal-spec-29lg"&gt;#003 — Binding AI agents with physics&lt;/a&gt;&lt;/strong&gt;. Production failure patterns are in &lt;strong&gt;&lt;a href="https://dev.to/aos_standard/four-ways-production-agents-silently-fail-and-the-physical-patterns-that-prevent-them-aos-v02-1c17"&gt;#005 — Four ways agents silently fail&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What we actually did
&lt;/h2&gt;

&lt;p&gt;Inside a repo running under &lt;strong&gt;&lt;a href="https://github.com/aos-standard/AOS-spec" rel="noopener noreferrer"&gt;AOS v0.1&lt;/a&gt;&lt;/strong&gt; zone semantics, we stood up a &lt;strong&gt;thin smoke pillar&lt;/strong&gt;—not a hero demo, but a tripwire so automated regressions bite when someone "helpfully" rewrites evals or oracle fixtures.&lt;/p&gt;

&lt;p&gt;Typical layout (repo-specific paths, portable idea):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tools/smoke_pillar/
├── main.py
├── evals/
├── playwright/          # browser tests isolated from Python core
└── manifest.json        # declares writable zones
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Design before bytes
&lt;/h2&gt;

&lt;p&gt;The directory tree was &lt;strong&gt;not&lt;/strong&gt; hand-drawn and then backfilled. A &lt;strong&gt;scaffold generator&lt;/strong&gt; (template that emits the full tool tree) ran first; humans and agents edited only inside Permitted zones afterward.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Step&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Register the tool shape in an internal design registry&lt;/td&gt;
&lt;td&gt;Fix boundaries before line 1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Generator emits manifest, evals harness, test config&lt;/td&gt;
&lt;td&gt;Avoid cosmetic folder sprawl&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Edits stay in implementation workspace&lt;/td&gt;
&lt;td&gt;Keep oracle/eval truth out of generation paths&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Public vocabulary lives in &lt;strong&gt;&lt;a href="https://github.com/aos-standard/AOS-spec" rel="noopener noreferrer"&gt;AOS-spec&lt;/a&gt;&lt;/strong&gt;. Internal ledgers are ops indexing—not something readers need to mirror verbatim.&lt;/p&gt;




&lt;h2&gt;
  
  
  CI mold — patterns you can copy
&lt;/h2&gt;

&lt;p&gt;After the smoke pillar passed once, we hardened the &lt;strong&gt;template&lt;/strong&gt; so new tools survive bare &lt;code&gt;python3&lt;/code&gt; on GitHub Actions matrices:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Move&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;main.py --help&lt;/code&gt; exits cleanly &lt;strong&gt;before&lt;/strong&gt; heavy imports&lt;/td&gt;
&lt;td&gt;survives venv-less CI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;optional &lt;code&gt;.env&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;secrets-free matrices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;keep heavy type-check deps out of baseline requirements unless opted in&lt;/td&gt;
&lt;td&gt;deterministic smoke band&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;timeout&lt;/code&gt; wrappers on local diagnostics&lt;/td&gt;
&lt;td&gt;agents cannot hang infra silently&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;sibling &lt;strong&gt;regression probe&lt;/strong&gt; tool&lt;/td&gt;
&lt;td&gt;tripwire if the template starts lying&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The probe is not a vanity metric—it catches "forge stayed green once" rot after refactors.&lt;/p&gt;
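&lt;p&gt;The first row of the table can be sketched as follows. This is an illustration, not the template's actual &lt;code&gt;main.py&lt;/code&gt;: the &lt;code&gt;--rows&lt;/code&gt; option is made up, and &lt;code&gt;numpy&lt;/code&gt; merely stands in for whatever heavy dependency a tool carries:&lt;/p&gt;

```python
import argparse


def build_parser():
    """Build the CLI using only the standard library."""
    parser = argparse.ArgumentParser(prog="main.py", description="smoke pillar (sketch)")
    parser.add_argument("--rows", type=int, default=3, help="rows to process")
    return parser


def main(argv=None):
    # Parsing first means --help exits before any optional dependency is touched.
    args = build_parser().parse_args(argv)
    try:
        # Hypothetical heavy dependency, imported only once real work starts.
        import numpy as np
    except ImportError:
        print("numpy is not installed; install the optional extras to run")
        return 1
    print(int(np.arange(args.rows).sum()))
    return 0
```

&lt;p&gt;In a real entry point, &lt;code&gt;raise SystemExit(main())&lt;/code&gt; would sit under the usual &lt;code&gt;__main__&lt;/code&gt; guard, so &lt;code&gt;python3 main.py --help&lt;/code&gt; succeeds on a runner with no virtualenv.&lt;/p&gt;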




&lt;h2&gt;
  
  
  Local gates before push
&lt;/h2&gt;

&lt;p&gt;Rough checklist, as satisfied before past pushes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Check&lt;/th&gt;
&lt;th&gt;Passing means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;python3 evals/run_evals.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;exit &lt;strong&gt;0&lt;/strong&gt;, no intentional skips&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;npx playwright test&lt;/code&gt; inside the tool's isolated test dir&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;&lt;code&gt;1 passed&lt;/code&gt;&lt;/strong&gt;, scoped runs only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;repo &lt;strong&gt;layout compliance script&lt;/strong&gt; (structure audit)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;OK / no critical drift&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A pre-commit hook may re-run the structure audit so that changes which are green only locally reach &lt;code&gt;main&lt;/code&gt; less often. Hooks (&lt;code&gt;PreToolUse&lt;/code&gt;, &lt;code&gt;exit 2&lt;/code&gt;) and CI are different layers with the same philosophy: &lt;strong&gt;stop right before merge or disk&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Commits as receipts (not folklore)
&lt;/h2&gt;

&lt;p&gt;We anchor milestones to short SHAs (your fork will differ—the &lt;strong&gt;pattern&lt;/strong&gt; is the point):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;SHA (prefix)&lt;/th&gt;
&lt;th&gt;What changed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;d303ece0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;initial smoke scaffold + manifest&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;85a524e0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;verification notes + metadata sync&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;2bcbb52c&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;import-order resilience for naked CI Python&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;9870fa67&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;template CI hardening + regression probe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;143dda68&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;tip where the cited graph was green&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;URLs rot. &lt;strong&gt;SHA + job lane names&lt;/strong&gt; travel better in outbound writing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why we skip raw Actions permalinks
&lt;/h2&gt;

&lt;p&gt;The monorepo is &lt;strong&gt;private.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A pasted &lt;code&gt;actions/runs/...&lt;/code&gt; badge &lt;strong&gt;&lt;code&gt;404&lt;/code&gt;s&lt;/strong&gt; outside the org and fingerprints repo ownership. For external readers we ship:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;commit SHAs (above)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;job lanes that were green together&lt;/strong&gt;—e.g. &lt;code&gt;evals-matrix&lt;/code&gt;, &lt;code&gt;independent-judge&lt;/code&gt;, Playwright smoke, structure-audit matrix&lt;/li&gt;
&lt;li&gt;cloneable &lt;strong&gt;&lt;a href="https://github.com/aos-standard/AOS-spec" rel="noopener noreferrer"&gt;AOS-spec&lt;/a&gt;&lt;/strong&gt; as vocabulary proof&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;"We cannot show our CI UI" is fine if &lt;strong&gt;repeatable commands + public spec&lt;/strong&gt; remain inspectable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Agent-operated commits (with caveats)
&lt;/h2&gt;

&lt;p&gt;During this milestone, the &lt;strong&gt;human operator did not manually type&lt;/strong&gt; &lt;code&gt;git commit&lt;/code&gt; / &lt;code&gt;git push&lt;/code&gt;. An agent toolchain issued operations under consistent author metadata.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Git metadata alone is forgeable.&lt;/strong&gt; Hence the layered receipts: evals, Playwright, structure audit, and an &lt;strong&gt;independent judge&lt;/strong&gt; job green on the &lt;strong&gt;same graph&lt;/strong&gt; as the cited SHA. "An agent did everything" ≠ "safe" without that stack.&lt;/p&gt;




&lt;h2&gt;
  
  
  Hook denials — a separate receipt class
&lt;/h2&gt;

&lt;p&gt;Distinct from CI: &lt;strong&gt;&lt;code&gt;PreToolUse&lt;/code&gt; hook returns &lt;code&gt;exit 2&lt;/code&gt;&lt;/strong&gt; and the Write never reaches disk. That is &lt;strong&gt;execution-time denial&lt;/strong&gt; with a log excerpt—not prompt theater. Same family as &lt;a href="https://dev.to/aos_standard/binding-ai-agents-with-physics-not-politeness-aos-v01-as-a-minimal-spec-29lg"&gt;#003&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Independent judge lane
&lt;/h2&gt;

&lt;p&gt;A CI job reviews diffs with a model from a &lt;strong&gt;different vendor&lt;/strong&gt; than the authoring stack.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Letting the same session say "looks fine" is self-grading. That is &lt;strong&gt;verification contamination&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Scheduled CI embarrassment beats a chat message that says "all good."&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical limits
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Constraint&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Private repo narrative&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;method essay, not a file tour&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;&lt;code&gt;permissions: contents: read&lt;/code&gt;&lt;/strong&gt; in workflows&lt;/td&gt;
&lt;td&gt;narrower blast radius&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What we actually check before merge
&lt;/h2&gt;

&lt;p&gt;"This change is safe" shows up in agent chat all the time. &lt;strong&gt;We do not merge on that sentence alone.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We ask for the &lt;strong&gt;commit SHA and the CI graph&lt;/strong&gt;: &lt;code&gt;independent-judge&lt;/code&gt; and &lt;code&gt;evals-matrix&lt;/code&gt; green on the &lt;strong&gt;same workflow run&lt;/strong&gt;. Run ID, Actions export, or a screenshot—all fine.&lt;/p&gt;

&lt;p&gt;If that cannot be produced, the change waits. PRs with polished logs but no matching graph show up more often than you might expect.&lt;/p&gt;
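&lt;p&gt;The same-run requirement is easy to check mechanically once job results are exported. A sketch, assuming a hypothetical record shape with &lt;code&gt;run_id&lt;/code&gt;, &lt;code&gt;sha&lt;/code&gt;, &lt;code&gt;lane&lt;/code&gt;, and &lt;code&gt;conclusion&lt;/code&gt; fields; only the two lane names come from this article:&lt;/p&gt;

```python
def runs_with_all_lanes_green(jobs, sha, required=("independent-judge", "evals-matrix")):
    """Return sorted run ids on which every required lane succeeded for sha."""
    green = {}
    for job in jobs:
        # Only successful jobs for the cited commit count as receipts.
        if job["sha"] == sha and job["conclusion"] == "success":
            green.setdefault(job["run_id"], set()).add(job["lane"])
    # Lanes that went green on different runs do not combine into a pass.
    return sorted(run for run, lanes in green.items() if lanes.issuperset(required))
```

&lt;p&gt;An empty result means the change waits, however polished the pasted logs look.&lt;/p&gt;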




&lt;h2&gt;
  
  
  Where this series goes next
&lt;/h2&gt;

&lt;p&gt;CI and hooks cover &lt;strong&gt;execution-time denial&lt;/strong&gt;. Silent production failures—no trace, no persistence—are &lt;strong&gt;&lt;a href="https://dev.to/aos_standard/four-ways-production-agents-silently-fail-and-the-physical-patterns-that-prevent-them-aos-v02-1c17"&gt;#005&lt;/a&gt;&lt;/strong&gt; plus &lt;strong&gt;&lt;a href="https://github.com/aos-standard/physical-agent-patterns" rel="noopener noreferrer"&gt;physical-agent-patterns&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  AOS Specification (GitHub)
&lt;/h2&gt;

&lt;p&gt;The "physical governance" approach in this article is formalized as &lt;strong&gt;AOS (AI Operating Standard)&lt;/strong&gt; — v0.2 adds runnable implementation examples.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://github.com/aos-standard/AOS-spec" rel="noopener noreferrer"&gt;github.com/aos-standard/AOS-spec&lt;/a&gt;&lt;/strong&gt; — specification&lt;br&gt;&lt;br&gt;
👉 &lt;strong&gt;&lt;a href="https://github.com/aos-standard/physical-agent-patterns" rel="noopener noreferrer"&gt;github.com/aos-standard/physical-agent-patterns&lt;/a&gt;&lt;/strong&gt; — patterns&lt;/p&gt;

&lt;p&gt;If useful, please ⭐ &lt;strong&gt;star the repo&lt;/strong&gt;. Issues and PRs welcome.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>security</category>
      <category>agents</category>
    </item>
    <item>
      <title>Why AI Agents Don't Follow Rules — The Case for Physical Governance</title>
      <dc:creator>AOS Architect</dc:creator>
      <pubDate>Mon, 06 Apr 2026 23:18:38 +0000</pubDate>
      <link>https://dev.to/aos_standard/why-ai-agents-dont-follow-rules-the-case-for-physical-governance-382f</link>
      <guid>https://dev.to/aos_standard/why-ai-agents-dont-follow-rules-the-case-for-physical-governance-382f</guid>
      <description>&lt;h2&gt;
  
  
  The incident
&lt;/h2&gt;

&lt;p&gt;One repository carried &lt;strong&gt;north of 130 KB&lt;/strong&gt; of governance Markdown.&lt;/p&gt;

&lt;p&gt;An agent consumed it. It answered as if it had understood—then &lt;strong&gt;violated those same constraints on its very next Write/Bash.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That rarely means “needs more prompting.” Usually it means &lt;strong&gt;the enforcement moment is missing&lt;/strong&gt;: policy shows up during context load, tool calls happen later.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why prompt-only bans leak
&lt;/h2&gt;

&lt;p&gt;Teams still anchor on prose in prompts and markdown:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Aim&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;“Never mutate &lt;code&gt;evals/&lt;/code&gt;”&lt;/td&gt;
&lt;td&gt;keep evaluation oracle from being rewritten&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;“No Writes under &lt;code&gt;00_Management/&lt;/code&gt;”&lt;/td&gt;
&lt;td&gt;guard canonical governance text&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The trouble is reliance on &lt;strong&gt;attention at ingestion time.&lt;/strong&gt; Tool calls afterward are not mechanically tied to whether the agent “remembers.” It can skim, reroute, or hallucinate exemptions.&lt;/p&gt;

&lt;p&gt;Destructive UNIX commands behave differently: &lt;strong&gt;&lt;code&gt;rm -rf /&lt;/code&gt; arrives behind a syscall gate&lt;/strong&gt;, not a PDF. Hardware and OS designers assume humans forget; agents forget faster.&lt;/p&gt;

&lt;p&gt;Rough split:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Text-only policy  → warns once, when tokens are assembled
Physical gate       → denies the transition right before disk or shell
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  When the generator grades itself
&lt;/h2&gt;

&lt;p&gt;Separate problem: &lt;strong&gt;self-checking.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the same conversational loop both authors an artifact and “confirms” it is fine, you import the same biases twice. This is mostly not malice: the same shortcuts taken during generation bleed into adjudication.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A suite that is &lt;em&gt;always&lt;/em&gt; green may be unplugged instrumentation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Structural fix: run evaluations in &lt;strong&gt;different processes&lt;/strong&gt; (CI, ephemeral runs, reviewers), not as another chat turn in the same session.&lt;/p&gt;
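&lt;p&gt;As a rough illustration of moving the verdict out of the chat loop, the sketch below runs a check in a fresh interpreter and trusts only its exit code. &lt;code&gt;judge_in_fresh_process&lt;/code&gt; is a hypothetical name, and a local subprocess is a much weaker boundary than CI or a human reviewer; it only shows the shape of the idea:&lt;/p&gt;

```python
import subprocess
import sys


def judge_in_fresh_process(check_source: str, timeout_s: float = 30.0) -> bool:
    """Run a check in a new interpreter; only the exit code is trusted."""
    result = subprocess.run(
        # -I starts Python in isolated mode (no user site, no PYTHON* env vars).
        [sys.executable, "-I", "-c", check_source],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    # Whatever the child printed is ignored; a non-zero exit means "not green".
    return result.returncode == 0
```

&lt;p&gt;The design choice is that nothing the generator says about its own work enters the verdict; only the process exit status does.&lt;/p&gt;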




&lt;h2&gt;
  
  
  What AOS stacks
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;&lt;a href="https://github.com/aos-standard/AOS-spec" rel="noopener noreferrer"&gt;AI Operating Standard (AOS)&lt;/a&gt;&lt;/strong&gt; is a small vocabulary for &lt;strong&gt;where&lt;/strong&gt; governance lives. Three slices only:&lt;/p&gt;

&lt;h3&gt;
  
  
  1 — Zones
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Zone&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;th&gt;Typical write rule&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Oracle&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Specs and test truth&lt;/td&gt;
&lt;td&gt;agents do not write here&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Permitted&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;implementation workspace&lt;/td&gt;
&lt;td&gt;scoped by role&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Prohibited&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;outside the agreed tree&lt;/td&gt;
&lt;td&gt;sovereign (human operator) clearance only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Oracle&lt;/strong&gt; is the piece that kills “tests red → loosen expectations.” Truth for pass/fail has to live where automation cannot casually patch it.&lt;/p&gt;
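&lt;p&gt;One way to make the zone table executable is a small path classifier. This is a sketch under assumptions: treating &lt;code&gt;evals&lt;/code&gt; and &lt;code&gt;00_Management&lt;/code&gt; as Oracle, and &lt;code&gt;src&lt;/code&gt; and &lt;code&gt;docs&lt;/code&gt; as Permitted, is an illustrative layout rather than something the AOS spec prescribes, and &lt;code&gt;classify_zone&lt;/code&gt; is a hypothetical helper:&lt;/p&gt;

```python
from pathlib import PurePosixPath

# Hypothetical layout: top-level directory names are illustrative assumptions.
ORACLE_DIRS = ("evals", "00_Management")
PERMITTED_DIRS = ("src", "docs")


def classify_zone(relative_path: str) -> str:
    """Map a repo-relative path to 'oracle', 'permitted', or 'prohibited'."""
    path = PurePosixPath(relative_path)
    # Absolute paths and '..' segments can escape the agreed tree.
    if path.is_absolute() or ".." in path.parts:
        return "prohibited"
    if not path.parts:
        return "prohibited"
    top = path.parts[0]
    if top in ORACLE_DIRS:
        return "oracle"
    if top in PERMITTED_DIRS:
        return "permitted"
    # Anything not explicitly listed is treated as outside the agreed tree.
    return "prohibited"
```

&lt;p&gt;Defaulting unknown paths to &lt;code&gt;prohibited&lt;/code&gt; keeps the classifier fail-closed: a new directory has to be granted a zone before an agent can write to it.&lt;/p&gt;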

&lt;h3&gt;
  
  
  2 — Roles
&lt;/h3&gt;

&lt;p&gt;Design / execution / approval stay &lt;strong&gt;explicitly disjoint.&lt;/strong&gt; When an agent crosses its lane, stop and escalate to a human. No sideways title upgrades.&lt;/p&gt;

&lt;h3&gt;
  
  
  3 — Physical enforcement
&lt;/h3&gt;

&lt;p&gt;Hooks (e.g. Claude Code &lt;strong&gt;&lt;code&gt;PreToolUse&lt;/code&gt;&lt;/strong&gt;) inspect JSON &lt;strong&gt;before&lt;/strong&gt; a Write executes. Typical outcomes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Try this&lt;/th&gt;
&lt;th&gt;Typical host response&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Write into an Oracle-marked subtree&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;&lt;code&gt;exit 2&lt;/code&gt;&lt;/strong&gt; — canceled call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Forbidden edit patterns (&lt;code&gt;sed -i&lt;/code&gt;, in-place truncation)&lt;/td&gt;
&lt;td&gt;same refusal&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Trust is aimed at mechanics, not good intentions.&lt;/p&gt;
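&lt;p&gt;A minimal sketch of such a hook's decision logic. It assumes the hook receives JSON with &lt;code&gt;tool_name&lt;/code&gt; and &lt;code&gt;tool_input&lt;/code&gt; (with &lt;code&gt;file_path&lt;/code&gt; for writes and &lt;code&gt;command&lt;/code&gt; for shell calls) on stdin, and that exit code 2 cancels the call; check these against the Claude Code Hooks documentation before relying on them. The guarded directory names and the &lt;code&gt;decide&lt;/code&gt; helper are illustrative, not the actual iron_cage code:&lt;/p&gt;

```python
import json
import shlex
import sys
from pathlib import PurePosixPath

# Hypothetical guarded directory names, borrowed from the article's examples.
GUARDED_DIRS = ("evals", "00_Management")


def uses_sed_in_place(command: str) -> bool:
    """Crude check: is there an in-place flag somewhere after a 'sed' token?"""
    try:
        tokens = shlex.split(command)
    except ValueError:
        tokens = command.split()  # unbalanced quotes: fall back to a plain split
    if "sed" not in tokens:
        return False
    after_sed = tokens[tokens.index("sed") + 1:]
    return any(t.startswith("-i") or t.startswith("--in-place") for t in after_sed)


def decide(payload: dict) -> tuple:
    """Return (exit_code, reason); exit code 2 asks the host to cancel the call."""
    tool = payload.get("tool_name", "")
    tool_input = payload.get("tool_input") or {}
    if tool in ("Write", "Edit"):
        parts = PurePosixPath(str(tool_input.get("file_path", ""))).parts
        # Conservative: any path segment with a guarded name blocks the write.
        if any(part in GUARDED_DIRS for part in parts):
            return 2, "write into a guarded (Oracle) subtree"
    if tool == "Bash" and uses_sed_in_place(str(tool_input.get("command", ""))):
        return 2, "in-place sed edit is not allowed"
    return 0, ""


def main() -> int:
    code, reason = decide(json.load(sys.stdin))
    if code != 0:
        print(reason, file=sys.stderr)  # the refusal reason goes to stderr
    return code
```

&lt;p&gt;A real hook script would end with &lt;code&gt;sys.exit(main())&lt;/code&gt;. Keeping &lt;code&gt;decide&lt;/code&gt; pure makes the denial rules testable without a live agent session.&lt;/p&gt;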




&lt;h2&gt;
  
  
  iron_cage in one breath
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;iron_cage&lt;/strong&gt; is just the working name we use for our PreToolUse wiring. It is not magic; it is &lt;strong&gt;&lt;a href="https://github.com/aos-standard/AOS-spec" rel="noopener noreferrer"&gt;AOS v0.1&lt;/a&gt; §§4.x&lt;/strong&gt; rendered as a small amount of Python plus settings.&lt;/p&gt;

&lt;p&gt;Behind it sit two habits we nicknamed &lt;strong&gt;Type-91 Governance&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Axis&lt;/th&gt;
&lt;th&gt;Aim&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Forensic isolation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;logs/hashes outsiders can reconstruct&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Physical isolation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;generation context is not where final evaluations live&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
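&lt;p&gt;To make “logs/hashes outsiders can reconstruct” concrete, here is a sketch of a hash-chained log in which each entry commits to the previous one. The record fields and the helper names (&lt;code&gt;append&lt;/code&gt;, &lt;code&gt;verify&lt;/code&gt;) are hypothetical; the article does not specify how iron_cage stores its logs:&lt;/p&gt;

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the record."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, record: dict) -> None:
    """Append a record, chaining it to the hash of the last entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify(log: list) -> bool:
    """Recompute the chain from the start; any edited entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

&lt;p&gt;An outsider who holds only the log file can rerun &lt;code&gt;verify&lt;/code&gt; and detect an edited or dropped entry, which is the property the table row asks for. Publishing the latest hash somewhere the agent cannot write would strengthen this, but that is beyond this sketch.&lt;/p&gt;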

&lt;p&gt;Specifications live in &lt;strong&gt;&lt;a href="https://github.com/aos-standard/AOS-spec" rel="noopener noreferrer"&gt;AOS-spec&lt;/a&gt;&lt;/strong&gt; on GitHub—&lt;strong&gt;iron_cage is one plausible answer.&lt;/strong&gt; For runnable detail, skim &lt;strong&gt;&lt;a href="https://dev.to/aos_standard/binding-ai-agents-with-physics-not-politeness-aos-v01-as-a-minimal-spec-29lg"&gt;the Hooks companion (#003)&lt;/a&gt;&lt;/strong&gt; first.&lt;/p&gt;

&lt;p&gt;Concrete examples of what vanished for us early on: Writes aimed at evaluator JSON under guarded paths and first attempts at &lt;strong&gt;&lt;code&gt;sed -i&lt;/code&gt;&lt;/strong&gt; on shared hosts.&lt;/p&gt;




&lt;h2&gt;
  
  
  Machine-readable preamble
&lt;/h2&gt;

&lt;p&gt;Opening &lt;strong&gt;&lt;a href="https://github.com/aos-standard/AOS-spec/blob/main/AOS-v0.1.md" rel="noopener noreferrer"&gt;AOS-v0.1.md&lt;/a&gt;&lt;/strong&gt; with machine-facing instructions lets you anchor bans in something &lt;strong&gt;outside today’s ephemeral chat.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The message is not “pretty please” but “this markdown is upstream of the prompt.” It does not automate compliance; it gives reviewers and automation a shared glossary.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why publish wording at all
&lt;/h2&gt;

&lt;p&gt;Mid-2026, trust in autonomous diffs is still mostly vibes. Everybody reinvents oracle boundaries in private repos. Putting the vocabulary in &lt;strong&gt;&lt;code&gt;aos-standard/AOS-spec&lt;/code&gt;&lt;/strong&gt; tries to shave that tax—even if implementations differ.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Related&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Long EN walkthrough (&lt;strong&gt;ledger &lt;code&gt;#003&lt;/code&gt;&lt;/strong&gt;): &lt;a href="https://dev.to/aos_standard/binding-ai-agents-with-physics-not-politeness-aos-v01-as-a-minimal-spec-29lg"&gt;&lt;code&gt;binding-ai-agents-with-physics...&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;CI-heavy companion (&lt;strong&gt;ledger &lt;code&gt;#002&lt;/code&gt;&lt;/strong&gt;): &lt;a href="https://dev.to/aos_standard/ai-governance-one-repo-one-smoke-tool-and-a-green-ci-run-28ae"&gt;&lt;code&gt;ai-governance-one-repo...&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Claude Code Hooks primer: &lt;a href="https://docs.claude.com/en/docs/claude-code/hooks" rel="noopener noreferrer"&gt;&lt;code&gt;docs.claude.com/.../hooks&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  AOS v0.1 Specification (GitHub)
&lt;/h2&gt;

&lt;p&gt;The "physical governance" approach described in this article is formalized as &lt;strong&gt;AOS (AI Operating Standard) v0.1&lt;/strong&gt; — a minimal, machine-enforceable spec for AI agent operations.&lt;/p&gt;

&lt;p&gt;👉 &lt;strong&gt;&lt;a href="https://github.com/aos-standard/AOS-spec" rel="noopener noreferrer"&gt;AOS-spec&lt;/a&gt;&lt;/strong&gt; — specification&lt;br&gt;
👉 &lt;strong&gt;&lt;a href="https://github.com/aos-standard/physical-agent-patterns" rel="noopener noreferrer"&gt;physical-agent-patterns&lt;/a&gt;&lt;/strong&gt; — implementation patterns&lt;/p&gt;

&lt;p&gt;If you find this useful, please ⭐ &lt;strong&gt;star the repo&lt;/strong&gt;. Issues and PRs are welcome — the spec is designed to evolve with real-world usage.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>security</category>
    </item>
  </channel>
</rss>
