What we actually did
Inside a repo running under AOS v0.1 semantics, we stood up a thin smoke pillar pinned to tooling paths—example:
02_Production/A0000-A0999/A0000-A0099/0001_Phase_4A5_Smoke/
No hero demo; it exists so automated regressions bite when someone “helpfully” rewrites evals.
Blueprint before bytes
The metal mold traces to our internal blueprint ledger (IMPERIAL_BLUEPRINT_300.md, block BP-0001, log_id: FSP). That registration happens before 0005 emits the tree—not after someone hand-draws folders. AOS v0.1 on GitHub is the normative vocabulary; blueprint rows are orchestration bookkeeping in our monorepo.
Forge path: 0005_Template_Generator output—we avoid hand-painted directory trees pretending they passed review.
Mold-line CI (Phase 4A′.1 essentials)
Goals you can plagiarize elsewhere:
| Move | Purpose |
|---|---|
| main.py --help exits cleanly before heavy imports fail bare CI Python | survives venv-less matrices |
| dotenv optional | same |
| Forge template keeps heavy pyright wiring out of baseline requirements unless explicitly opted in | deterministic smoke band |
| timeout wrappers on local diagnostics | agents cannot hang infra forever silently |
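The first two moves can be sketched in Python. This is a minimal illustration under assumptions, not the forge template's actual main.py: the function names are hypothetical, and the standard-library `json` module stands in for whatever heavy dependency a real entry point would pull in. The only third-party call is `load_dotenv()` from python-dotenv, and it is attempted only when that package is importable.

```python
import argparse
import importlib


def load_dotenv_if_available() -> bool:
    """Load a .env file when python-dotenv is installed; otherwise carry on."""
    try:
        dotenv = importlib.import_module("dotenv")
    except ImportError:
        return False  # bare CI Python: dotenv is optional, not a hard requirement
    dotenv.load_dotenv()
    return True


def build_parser() -> argparse.ArgumentParser:
    # Only the standard library is touched while building the parser.
    return argparse.ArgumentParser(prog="main.py", description="smoke entry point (sketch)")


def main(argv=None) -> int:
    # Parse first: --help exits here with status 0, before any heavy import runs.
    build_parser().parse_args(argv)
    dotenv_loaded = load_dotenv_if_available()
    # Heavy dependencies are imported only after argument parsing succeeds.
    # 'json' is a stand-in for a heavy package in this sketch.
    heavy = importlib.import_module("json")
    print(heavy.dumps({"dotenv_loaded": dotenv_loaded}))
    return 0
```

The design choice is ordering: anything that can fail on a runner without a virtualenv sits below the argument parse, so a `--help` probe in CI measures the entry point itself rather than the dependency set.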
Regression sibling 0002_Template_Ci_Probe exists so “forge stayed honest” survives future refactors. It is a tripwire, not bragging rights.
Internal verification narratives (inside the vault, not pasted here verbatim) are archived under STEP_4Aprime_1_verification_* filenames if your clone carries them; skip this pointer if they are missing.
Local gates before any push attempt
Rough checklist historically satisfied:
| Command family | Passing meaning |
|---|---|
| python3 evals/run_evals.py | exit 0, no flaky skips |
| npx playwright test inside the tool fortress | 1 passed, fortress-only scope |
| 0061_Core_Vitals.py --scope a0000 | OK / NO RED ALERT |
Hooks may also re-run 0061 pre-commit so “green locally” leaks less often onto main.
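A local gate runner for that checklist might look like the sketch below. It is an assumption-laden illustration, not our actual tooling: the command lists mirror the checklist, but how 0061 is invoked, which directory Playwright runs from, and the 300-second default are all guesses. Each gate runs under a timeout so a hung diagnostic fails loudly instead of stalling forever.

```python
import subprocess

# Commands follow the checklist above. The interpreter used for 0061 and the
# working directory for Playwright are assumptions in this sketch.
GATES = [
    ["python3", "evals/run_evals.py"],
    ["npx", "playwright", "test"],
    ["python3", "0061_Core_Vitals.py", "--scope", "a0000"],
]


def run_gate(cmd, timeout_s=300.0):
    """Run one gate and return (passed, detail). A hang counts as a failure."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    except OSError:
        return False, "could not start"
    return proc.returncode == 0, f"exit {proc.returncode}"


def run_all(gates=GATES, timeout_s=300.0):
    """Run every gate in order; return True only if all of them exit 0."""
    all_passed = True
    for cmd in gates:
        passed, detail = run_gate(cmd, timeout_s)
        print(f"{'PASS' if passed else 'FAIL'} {' '.join(cmd)} ({detail})")
        all_passed = all_passed and passed
    return all_passed
```

Note that exit 0 is only the first half of each passing criterion; checks like "no flaky skips" or "1 passed" would need output parsing that this sketch does not attempt.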
Commits cited as anchors (not folklore)
Abbreviated lineage we still map postmortems to:
| SHA (prefix) | What changed |
|---|---|
| d303ece0 | initial forge scaffolding + manifest wiring |
| 85a524e0 | verification notes + manifest sync sweeps |
| 2bcbb52c | CI import-order resilience for naked Python runners |
| 9870fa67 | Phase 4A′.1 forge mold + 0002 probe landing |
| 143dda68 | companion Dev prose committed alongside aforementioned green CI graph |
Treat SHAs like receipts: reproducible anchors when URLs rot.
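For anyone holding a clone, checking that the cited prefixes still resolve is cheap. The sketch below is an illustration with hypothetical helper names; it shells out to `git rev-parse --verify --quiet`, which exits non-zero when a prefix is unknown or ambiguous. It assumes `git` is on the PATH and that `repo` points at a working tree containing these commits.

```python
import re
import subprocess

# SHA prefixes as cited in the lineage table above.
ANCHORS = ["d303ece0", "85a524e0", "2bcbb52c", "9870fa67", "143dda68"]

_HEX_PREFIX = re.compile(r"[0-9a-f]{7,40}")


def is_sha_prefix(text: str) -> bool:
    """True when text looks like an abbreviated or full lowercase hex SHA."""
    return _HEX_PREFIX.fullmatch(text) is not None


def resolve(prefix: str, repo: str = "."):
    """Return the full commit SHA for a prefix, or None if missing or ambiguous."""
    if not is_sha_prefix(prefix):
        return None  # never hand arbitrary strings to git
    proc = subprocess.run(
        ["git", "-C", repo, "rev-parse", "--verify", "--quiet", f"{prefix}^{{commit}}"],
        capture_output=True,
        text=True,
    )
    return proc.stdout.strip() if proc.returncode == 0 else None


def unresolved(anchors=ANCHORS, repo: str = "."):
    """List the anchors that do not resolve to exactly one commit in this clone."""
    return [a for a in anchors if resolve(a, repo) is None]
```

An empty list from `unresolved()` means every receipt still points at a real commit in that clone; it says nothing about whether CI was green there.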
Why we avoid raw Actions permalinks
Monorepo is private.
A pasted actions/runs/... badge looks persuasive until it 404s for everyone outside the org, and it fingerprints owner/repo strings we prefer not to embed in outbound screenshots.
Portable bundle today: SHA + enumerated job lane names + archived internal verification prose (STEP_4A_5_*, STEP_4Aprime_1_*) if your tree includes them.
If skeptical, audit our public spec repo instead—it is cloneable proof of vocabulary, distinct from proprietary CI chrome.
Representative lanes that actually ran green on the cited graph alongside smoke + eval matrices included independent-judge, evals-matrix (bands), vitals-matrix, fortress Playwright smoke, and 0001_Phase_4A5_Smoke where wired.
The exact graph drifts over time: trust fresh CI, not screenshots.
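One way to make the portable bundle machine-checkable is a small record of SHA plus lane conclusions. The sketch below is hypothetical: the `Receipt` type, the choice of required lanes, and the `"success"` conclusion string are assumptions for illustration, and the lane names are taken from the prose above rather than from workflow files.

```python
from dataclasses import dataclass, field

# Lane names as listed in the text; the exact job identifiers are assumptions.
REQUIRED_LANES = ("independent-judge", "evals-matrix", "vitals-matrix")


@dataclass(frozen=True)
class Receipt:
    """A SHA plus the conclusion recorded for each named CI lane."""
    sha: str
    lanes: dict = field(default_factory=dict)  # lane name -> conclusion string


def missing_or_red(receipt: Receipt, required=REQUIRED_LANES):
    """Return the required lanes that are absent or not 'success' on this SHA."""
    return [lane for lane in required if receipt.lanes.get(lane) != "success"]


def is_green(receipt: Receipt, required=REQUIRED_LANES) -> bool:
    """True only when every required lane concluded 'success' on the same SHA."""
    return not missing_or_red(receipt, required)
```

Treating an absent lane the same as a red one is deliberate: a receipt that omits a lane should not pass by silence.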
Plan A shorthand
Codename Plan A: during that milestone epoch, the sovereign human did not type literal git commit / git push keystrokes. The agent toolchain issued those operations under a predictable identity (Cursor Agent <cursor-agent@local> in metadata snapshots).
Interpret claims charitably: git metadata alone is forgeable. We lean on layered logs + policy text + repeatable commands above.
Blocked Writes with exit codes
Separate receipt class: a PreToolUse hook denial with exit 2, proving the call never reached the filesystem. These denials are documented historically in STEP_1_6_verification_*.log style artifacts inside ops logs, if your fork tracks them.
If a governance pitch cannot cite executable interception plus a persisted log excerpt, readers still mostly have marketing prose.
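A denial hook of that kind can be small. The sketch below is not our production hook: it assumes the agent runtime passes a JSON payload on stdin with `tool_name` and `tool_input.file_path` fields, and the protected prefix is a hypothetical example. The one behavior taken from the text is the contract that exit 2 means the call is denied before it touches the filesystem.

```python
import json
import posixpath
import sys

# Hypothetical protected prefix; a real policy would load this from config.
PROTECTED_PREFIXES = ("evals/",)


def decide(payload, protected=PROTECTED_PREFIXES):
    """Return (exit_code, reason). Exit 2 denies the call; exit 0 allows it."""
    if not isinstance(payload, dict):
        return 2, "malformed payload"
    tool_input = payload.get("tool_input")
    if not isinstance(tool_input, dict):
        return 0, ""  # nothing path-shaped to inspect
    # Normalize so './evals/x' cannot slip past a plain prefix check.
    path = posixpath.normpath(str(tool_input.get("file_path", "")))
    if any(path.startswith(prefix) for prefix in protected):
        return 2, f"blocked write to protected path: {path}"
    return 0, ""


def main(stdin=None, stderr=None) -> int:
    stdin = sys.stdin if stdin is None else stdin
    stderr = sys.stderr if stderr is None else stderr
    try:
        payload = json.load(stdin)
    except json.JSONDecodeError:
        print("unreadable hook payload; denying by default", file=stderr)
        return 2
    code, reason = decide(payload)
    if code != 0:
        # This stderr line is the kind of excerpt worth persisting in ops logs.
        print(reason, file=stderr)
    return code
```

Wired as a hook script, the process would end with `sys.exit(main())`. The sketch ignores absolute paths and symlinks, so it shows the interception pattern rather than a complete policy.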
Independent judge lane
Vendor-separated model reviews code changes—distinct from authoring stack.
Same stochastic author cannot be sole proofreader forever without verification contamination.
CI embarrassment scheduled weekly beats “model smiled approvingly.”
Practical limits (read before dunking replies)
| Constraint | Meaning |
|---|---|
| Private repo narrative | method blog, not file tour |
| permissions: contents: read in workflows | narrower blast radius |
Standard vocabulary pointer
Neutral vocabulary, not our lore dump, lives in the [AOS v0.1 draft](https://github.com/aos-standard/AOS-spec).
One-line skeptic challenge (comments)
Ask boosters insisting on “safe rollout”: point to an Actions graph screenshot or export where independent-judge, the full eval band, and fortress smoke are all green simultaneously on an enumerated SHA.
No receipt → cosplay governance.
Internal SSOT breadcrumbs (vault clones only): STEP_4A_5_verification_2026-04-12.md, STEP_4Aprime_1_verification_2026-04-12.md plus LLM/SDK migration appendix STEP_4A_3_verification_* naming pattern.
AOS v0.1 Specification (GitHub)
The "physical governance" approach described in this article is formalized as AOS (AI Operating Standard) v0.1 — a minimal, machine-enforceable spec for AI agent operations.
👉 github.com/aos-standard/AOS-spec
If you find this useful, please ⭐ star the repo. Issues and PRs are welcome — the spec is designed to evolve with real-world usage.