My last essay said every guardrail should be earned from a failure. For cheap, mechanical checks, that's the wrong default: build them before anything breaks.
One commit to a governance document, dated 2026-06-22, deleted this rule:
Add a gate ahead of the first failure only when the class is well-attested and the gate is cheap and low-false-positive, or when a first occurrence is catastrophic. Otherwise the convention rides on discipline until a real miss attests it — then the miss is the test fixture and the gate is added reactively.
and replaced it with this one:
The deciding axis is cost and false-positive rate, not attestation. When a consistency property on a driftable governed surface has a cheap, low-false-positive, mechanically-decidable check, build that gate proactively — before any attested miss. Discipline is not an acceptable substitute for a check this class can make.
That is a default-flip, in a single diff. The old rule waited for a failure before adding most checks. The new one says that for a whole class of check, waiting is the mistake.
I should own the awkwardness up front, because I am the one who wrote the old rule. The project here is an agent-built codebase whose development is governed by more than a hundred deterministic check scripts (gates, in the local vocabulary) that run on commit and refuse work that has drifted out of agreement with the spec. A few weeks ago I published Gates Earned From Failure: the argument that every one of those gates was added in response to a concrete miss, never designed up front, with a cost test for when a guardrail is worth its standing weight. A follow-up, Make It a Check, sharpened the same spine: a lesson worth keeping belongs in a check, because prose drifts and a gate doesn't. So a commit that says "build the gate proactively — before any attested miss" reads, on its face, like a reversal.
It isn't, and the reconciliation is the whole point.
The cost test always had two axes
The cost test was never "wait for a failure." It weighed a candidate gate on two axes: how expensive the gate is to build and maintain, and its false-positive rate (FP) — how often it flags something that is actually fine. "Earned from a real failure" was the rule for gates that score badly on those axes. An expensive, finicky, high-FP gate has a real standing cost: maintenance, the reading load of one more rule, the friction of false alarms. That cost has to be paid for, and an attested miss is what pays for it. The failure proves the gate's class is real, and the failing case becomes the gate's first test fixture.
What the reactive framing under-served was the other corner of the matrix: the gate that is cheap, mechanically decidable, and almost never wrong. For that gate, waiting for a failure buys nothing. You already know the check is correct. You already know it costs almost nothing to run. All the failure does is let some preventable drift through first, so you can point at it and say "see, now it's earned." That is a tax with no return.
So the flip is narrow and precise. The deciding axis is cost and FP. Sort the candidate gate into a class first, then apply the matching rule:
- Cheap, low-FP, mechanically decidable, on a governed surface → build it proactively, before any miss. Discipline is not an acceptable substitute, because this project has repeatedly proven that by-eye vigilance lets drift through that a trivial script would have caught.
- Expensive or higher-FP → the reactive rule still holds. The convention rides on discipline plus a one-line written rule until a real miss attests it. Then the miss is the fixture and the gate goes in.
- Catastrophic on first occurrence → a separate escape hatch. A costly must-have can go in ahead of any miss when the first occurrence is unacceptable.
The earlier essay collapsed the first two classes into one default. This one splits them back out and flips the default for the cheap class only.
What it looks like when you mean it
A doc edit that lowers a bar is cheap to write and easy to ignore. The test of whether the flip was real is what happened next.
An audit came first. It took every existing gate's coupling manifest — the declaration, carried in each check, of which files that check reads — and inverted the whole set into a map from governed surface to drift axis: for each file the project governs, which ways can it silently fall out of agreement with the rest, and which of those ways is anything watching? Then it walked the ungated axes and asked one question per axis: is this cheap and mechanical enough to gate now, before it has ever gone wrong?
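The inversion step the audit performs, as an added example that is not the project's own tooling. The manifest layout (each gate declares the files it reads and the `(file, axis)` pairs it actually decides), the gate names, the paths and the axis names are all constructed for illustration; the project's real manifests carry only a read-set, so the `covers` field is an assumption about how a gate would declare the axis it decides.

```python
"""Invert per-gate coupling manifests into a governed-surface -> drift-axis map.

Added example, not the project's own tooling. Manifest layout, gate names,
file paths and axis names are constructed for illustration.
"""
from collections import defaultdict

# Constructed: each gate declares what it reads and which drift axis on which
# file it decides. `covers` is the assumption this example adds on top of a
# plain read-set manifest.
GATE_MANIFESTS = {
    "check_enum_membership": {
        "reads": ["spec/states.md", "src/states.py"],
        "covers": [("spec/states.md", "enum-membership")],
    },
    "check_fence_balance": {
        "reads": ["spec/states.md", "docs/architecture.md"],
        "covers": [("spec/states.md", "fence-balance"),
                   ("docs/architecture.md", "fence-balance")],
    },
    "check_stage_entry": {
        "reads": ["workflow/lifecycle.md"],
        "covers": [("workflow/lifecycle.md", "stage-entry-binding")],
    },
}

# Constructed: the ways each governed surface is known to be able to drift,
# gated or not. Listing an axis here is the audit's claim that it exists.
DRIFT_AXES = {
    "spec/states.md": {"enum-membership", "fence-balance", "heading-content"},
    "docs/architecture.md": {"fence-balance", "trust-boundary-fidelity"},
    "workflow/lifecycle.md": {"stage-entry-binding", "fence-balance"},
}


def invert_manifests(manifests):
    """Map each governed file to the set of (axis, gate) pairs watching it."""
    coverage = defaultdict(set)
    for gate, manifest in manifests.items():
        for path, axis in manifest["covers"]:
            coverage[path].add((axis, gate))
    return dict(coverage)


def ungated_axes(manifests, drift_axes):
    """Return {file: sorted axes} for every declared axis that no gate decides.

    This is the worklist the audit walks, asking per axis whether it is cheap
    and mechanical enough to gate before it has ever gone wrong.
    """
    coverage = invert_manifests(manifests)
    worklist = {}
    for path, axes in drift_axes.items():
        watched = {axis for axis, _gate in coverage.get(path, set())}
        missing = sorted(axes - watched)
        if missing:
            worklist[path] = missing
    return worklist


if __name__ == "__main__":
    for path, axes in sorted(ungated_axes(GATE_MANIFESTS, DRIFT_AXES).items()):
        print(f"{path}: ungated -> {', '.join(axes)}")
```

Two of the three ungated axes this turns up (`heading-content`, `trust-boundary-fidelity`) are the semantic kind the later sections rule out; the third (`fence-balance` on the lifecycle doc) is the cheap mechanical kind the flip says to gate now. Sorting the worklist is the human step; producing it is not.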
The next iteration shipped four new checks off that worklist. One couples a spec section's enum-membership claims to the actual enumeration they reference. One bounds how much cross-component topology a single component's spec may draw before it has to defer to the architecture doc. One binds each workflow stage's entry to the prior stage's completion. The honest fourth is a markdown-fence balance check, which caught a live instance on its first prototype run — so call that one borderline, half-attested rather than purely proactive. The other three guarded drift that had never once occurred in the project's history. Under the old rule, none of the three would exist yet. We'd be waiting for the bug that proves they're allowed.
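Two of the four checks above, written out as an added example rather than as the project's own scripts: the markdown-fence balance gate and the stage-entry binding gate. The stage-table layout (`Stage | Entry condition | Completion condition`), the sample documents and the finding strings are constructed. Both gates return a list of findings, where an empty list is a pass, and a document the gate cannot parse is reported as a finding rather than passed, so each fails closed.

```python
"""Two gates from the cheap, low-FP, mechanically decidable class.

Added example, not the project's check scripts. The stage-table columns
and the sample documents are constructed. An unparseable input is a
finding, never a silent pass (fail-closed).
"""
import re
import sys

FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def unbalanced_fences(text):
    """Return 1-based line numbers of fence openers that are never closed.

    Close enough to CommonMark for a gate: a fence closes only on a run of
    the same character, at least as long as the opener, with nothing after
    it. Any other fence-looking line inside an open block is content.
    """
    open_char = open_len = open_line = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = FENCE_RE.match(line)
        if not m:
            continue
        run, rest = m.group(1), line[m.end():].strip()
        if open_char is None:
            open_char, open_len, open_line = run[0], len(run), lineno
        elif run[0] == open_char and len(run) >= open_len and rest == "":
            open_char = open_len = open_line = None
    return [open_line] if open_char is not None else []


STAGE_ROW_RE = re.compile(r"^\|\s*([^|]*?)\s*\|\s*([^|]*?)\s*\|\s*([^|]*?)\s*\|\s*$")


def parse_stage_table(text):
    """Rows of a three-column `| Stage | Entry | Completion |` table (constructed layout)."""
    rows = []
    for line in text.splitlines():
        m = STAGE_ROW_RE.match(line)
        if not m:
            continue
        stage, entry, completion = m.groups()
        if stage.lower() == "stage" or set(stage) <= set("-: "):
            continue  # header row or separator row
        rows.append((stage, entry, completion))
    return rows


def stage_entry_findings(text):
    """Each stage's entry condition must be exactly the prior stage's completion."""
    rows = parse_stage_table(text)
    if len(rows) < 2:
        return ["no multi-row stage table found; gate cannot decide"]
    findings = []
    for (prev, _, prev_done), (cur, cur_entry, _) in zip(rows, rows[1:]):
        if cur_entry != prev_done:
            findings.append(
                f"{cur}: entry '{cur_entry}' != {prev} completion '{prev_done}'")
    return findings


def run_gates(spec_text, lifecycle_text):
    """Run both gates; any non-empty findings list means refuse the commit."""
    return {
        "fence-balance": [f"unclosed fence opened at line {n}"
                          for n in unbalanced_fences(spec_text)],
        "stage-entry-binding": stage_entry_findings(lifecycle_text),
    }


TICKS = "`" * 3  # built, not typed, so this file holds no literal fence line

# Constructed sample documents.
BALANCED_DOC = "\n".join([
    "# Spec", "", TICKS + "python", "x = 1", TICKS, "",
    "~~~", "tilde fence", TICKS + " is content here, not a close", "~~~",
])
UNCLOSED_DOC = "\n".join(
    ["# Spec", "", "Intro.", TICKS + "sh", "echo hi", "", "More prose."])
GOOD_LIFECYCLE = "\n".join([
    "## Lifecycle", "",
    "| Stage  | Entry condition | Completion condition |",
    "|--------|-----------------|----------------------|",
    "| plan   | issue accepted  | plan approved        |",
    "| build  | plan approved   | gates green          |",
    "| review | gates green     | reviewer sign-off    |",
])
# The kind of drift a reader skims past: one cell edited, the chain broken.
DRIFTED_LIFECYCLE = GOOD_LIFECYCLE.replace("| plan approved   |", "| plan drafted    |")

if __name__ == "__main__":
    results = run_gates(UNCLOSED_DOC, DRIFTED_LIFECYCLE)
    for gate, findings in results.items():
        for finding in findings:
            print(f"FAIL {gate}: {finding}")
    sys.exit(1 if any(results.values()) else 0)
```

Neither gate needs a fixture from a past miss to be known correct: the fence rule is a closed grammar and the stage chain is a string equality over adjacent rows. That is what makes them the proactive class. The `len(rows) < 2` branch is the fail-closed clause the governance doc asks for: a lifecycle table that has been renamed, reformatted or deleted makes the gate red, not green.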
That is the flip cashed out: an audit that enumerates the cheap mechanical axes, and gates them on the strength of cheap and mechanical, not on the strength of a scar.
The clause that stops it from sprawling
"Gate every cheap mechanical axis" is a dangerous sentence on its own, because the cheapest checks to write are usually the most worthless. Cheap and low-FP is necessary, not sufficient. The gate has to check a real drift axis, never a trivially-true proxy for one.
The canonical bad case is checking that a document has a section heading. It's trivial to write and it almost never fails, so it's cheap and low-FP by both measures. It's also nearly useless, because the thing that actually drifts is the section's content, not the presence of its title. A heading-presence gate goes green while the prose under the heading rots, and now you have something worse than no gate: a check that broadcasts "covered" when nothing is covered. That is false confidence, manufactured by a green checkmark, and a gate that manufactures false confidence is worse than no gate at all. The same governance doc states the fail-closed principle directly — a gate that can silently stop enforcing while still reporting success is a defect masquerading as coverage.
So the rule has a built-in brake, and it's an old warning in new clothes: Goodhart's law, the observation Marilyn Strathern phrased as "when a measure becomes a target, it ceases to be a good measure." A heading-presence check makes "has the section" the target, and that target is trivially hit while the property you actually cared about, whether the section is still right, drifts free. Cheap earns a gate only when cheap is buying you a genuine consistency property. The moment "cheap" is bought by checking something that can't really fail, the gate is off the table, not on it.
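The proxy and the real gate side by side, as an added example that is not the project's own code. It pairs the heading-presence check from this section with the enum-membership coupling check from the previous one: the spec-section convention (a `## States` section whose bulleted, backticked names claim the enum's members), the `State` enum source and the sample spec text are constructed for illustration.

```python
"""A trivially-true proxy gate beside the gate that checks the real axis.

Added example, not the project's checks. The spec-section convention, the
enum source and the sample documents are constructed.
"""
import ast
import re

# Constructed stand-in for the real enumeration the spec section describes.
SOURCE_ENUM = '''
import enum

class State(enum.Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"
'''

SECTION_RE = re.compile(r"^##\s+(.+?)\s*$")
CLAIM_RE = re.compile(r"^- `([A-Z_][A-Z0-9_]*)`")


def enum_members(source, class_name):
    """Member names of an Enum class, read from source with `ast` (no import).

    Returns None when the class is absent so the caller fails closed rather
    than treating "no class" as "no members".
    """
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return sorted(target.id
                          for stmt in node.body if isinstance(stmt, ast.Assign)
                          for target in stmt.targets if isinstance(target, ast.Name))
    return None


def section_body(text, title):
    """Lines under `## title` up to the next `##` heading, or None if absent."""
    body = None
    for line in text.splitlines():
        m = SECTION_RE.match(line)
        if m:
            if body is not None:
                break
            if m.group(1) == title:
                body = []
            continue
        if body is not None:
            body.append(line)
    return body


def heading_present(text, title):
    """The proxy gate: green whenever the heading exists, whatever sits under it."""
    return section_body(text, title) is not None


def enum_membership_findings(spec_text, title, source, class_name):
    """The real gate: the names the section claims must equal the enum's members."""
    body = section_body(spec_text, title)
    if body is None:
        return [f"section '{title}' missing; gate cannot decide"]  # fail closed
    actual = enum_members(source, class_name)
    if actual is None:
        return [f"enum {class_name} not found in source; gate cannot decide"]
    claimed = set()
    for line in body:
        m = CLAIM_RE.match(line)
        if m:
            claimed.add(m.group(1))
    if not claimed:
        return [f"section '{title}' lists no members; gate cannot decide"]
    findings = [f"spec claims {name!r} which {class_name} lacks"
                for name in sorted(claimed) if name not in actual]
    findings += [f"{class_name}.{name} is missing from section '{title}'"
                 for name in actual if name not in claimed]
    return findings


# Constructed sample spec, in sync with SOURCE_ENUM.
SPEC_IN_SYNC = "\n".join([
    "# Scheduler spec", "",
    "## States",
    "Every job is in exactly one of these `State` members:",
    "- `QUEUED`", "- `RUNNING`", "- `DONE`", "- `FAILED`", "",
    "## Retries",
    "Retries are bounded by the retry policy.",
])
# The prose under the heading rots; the heading itself is untouched.
SPEC_ROTTED = SPEC_IN_SYNC.replace("- `FAILED`", "- `CANCELLED`")

if __name__ == "__main__":
    print("proxy gate on rotted spec:", "green" if heading_present(SPEC_ROTTED, "States") else "red")
    for finding in enum_membership_findings(SPEC_ROTTED, "States", SOURCE_ENUM, "State"):
        print("real gate:", finding)
```

On the rotted spec the proxy stays green while the coupled gate reports both halves of the drift: a claimed member the enum lacks and an enum member the section no longer lists. The three "cannot decide" branches are the same fail-closed rule as before; a spec whose section was renamed, or a source file whose enum was moved, turns the gate red instead of letting it pass vacuously.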
The goal is not zero human review
The flip is aggressive about machines and humble about what it leaves to people. Gate every mechanical axis, and the human residue shrinks — but the target it shrinks toward is not zero. It's the irreducibly semantic judgment: is this prose still true? Is this rationale still the right call? Is this section well-pitched for its reader? No low-FP mechanical check can answer those, and pretending one can is exactly the trivially-true-proxy trap from the last section.
So the division of labor is clean. Everything a machine can decide mechanically and cheaply, a machine decides, on commit, every time. What's left for the human is the layer beneath the mechanism: the meaning. In this project, the judgments about whether a doc has quietly drifted from describing the system to restating it are punted to a human checkpoint at the end of each iteration, on purpose, because they're semantic and high-FP. The point of gating the mechanical axis hard is to clear the human's plate of everything that isn't that judgment, so the scarce attention lands where only attention works.
Where the rule says stop
A rule earns trust by saying where it doesn't apply. The same audit that filed the new gates also produced a register of axes ruled not worth gating, recorded so they don't get re-litigated every iteration. Five of them:
- Heading presence beyond the small set of genuinely load-bearing required sections. The trivially-true proxy from above, named as the canonical example to avoid.
- A gate's manifest matching its true read-set. Deriving what files a check actually reads means parsing arbitrary shell: globs, greps, hardcoded paths. Not cheap, not low-FP. Left ungated.
- Free-prose cross-references between docs (the "see the routing section over in the other file" kind). Higher-FP than a structured markdown link, because the reference is fuzzy text, so the cheap version over-flags.
- Whether a high-level doc has drifted into restating a low-level mechanism, and whether a workflow skill is still faithful to its lifecycle table. Both irreducibly semantic. Human checkpoints, not gates.
- Whether an architecture diagram still reflects the system's real trust boundaries. The diagram is deliberately simplified, with dozens of drawn edges standing in for roughly a hundred actual permission grants, so any mechanical bijection over-fires. Semantic and high-FP at once. Stays human.
Each of those is cheap to attempt. Every one was ruled out, because cheap-to-attempt isn't the bar. The bar is cheap and low-FP and checking a real axis, and these miss on the second or third. The register is the rule proving it has edges.
Prior art, and the honest size of the claim
Moving deterministic quality checks earlier, before the bug that would have justified them, is not a new idea. The instinct predates software: Shigeo Shingo's poka-yoke, the mistake-proofing he built into the Toyota Production System in the 1960s, redesigns a process so the error cannot be made in the first place, instead of catching the defective unit after it ships. The testing world calls the software version "shift left," after Larry Smith's shift-left testing in Dr. Dobb's Journal in 2001: pull quality assurance (QA) toward the start of the lifecycle instead of bolting it on at the end. Linters reach back further still, to Stephen Johnson's lint at Bell Labs in 1978, and linters, type systems and continuous-integration gates are all configured proactively, before any specific defect, in every serious codebase. None of that is mine.
What this contributes is the decision rule for which checks earn the proactive treatment, and the two clauses that keep it honest. Cost and false-positive rate are the axis that sorts proactive from reactive — not the presence of a scar. The proxy brake stops "gate the cheap thing" from collapsing into "gate the worthless thing because it's cheap." And the semantic-residue clause sets the floor: gate beneath the meaning, never pretend to gate the meaning itself. Shift-left says move checks earlier; this says which ones, and where the line is.
Two honest limits. The evidence is one project's governance layer — a decision rule that has held there, across more than a hundred gates, not a measured result I can hand you a number for. And the whole rule presumes a governed surface: artifacts with a settled, written notion of what "correct" means, so there's something stable to couple a check against. On an exploratory surface where the spec is still moving under you, there's nothing yet to gate toward, and the reactive default is right again — you genuinely don't know what the invariant is until something violates it.
The slogan from the earlier essay still holds: most guardrails are earned from failure. This is the carve-out. When a check is cheap, mechanical, and almost never wrong, on something you've already decided is correct, you don't owe it a failure first. Gate the cheap axis, and spend the failures you'd have eaten on the problems no script can see.
I build and write about agent-governed codebases like this one. If you're working out where the rails belong in an AI-assisted project, I'm reachable on LinkedIn.
References
- Vasyl Tretiakov, "Gates Earned From Failure: a cost test for agent guardrails," vasyltretiakov.dev, 14 Jun 2026 — companion essay (the cost test this one carves a class out of).
- Vasyl Tretiakov, "Make It a Check," vasyltretiakov.dev, 22 Jun 2026 — companion essay (a durable lesson belongs in a check, not in prose).
- Larry Smith, "Shift-Left Testing," Dr. Dobb's Journal, Vol. 26, Issue 9, Sept 2001 — origin of the term; see "Shift-left testing," Wikipedia (accessed 24 Jun 2026). The decades-old practice of moving deterministic quality checks earlier.
- Shigeo Shingo, poka-yoke (mistake-proofing), Toyota Production System, 1960s — see "Poka-yoke," Wikipedia (accessed 24 Jun 2026). Designing the process so the error cannot occur, rather than catching it after the defect.
- Stephen C. Johnson, "Lint, a C Program Checker," Bell Labs, 1978 — see "Lint (software)," Wikipedia (accessed 24 Jun 2026). The foundational static analyzer; deterministic checks run proactively, before any specific bug.
- Marilyn Strathern, "'Improving ratings': audit in the British University system," European Review, 1997 — the popular phrasing of Goodhart's law (Wikipedia, accessed 24 Jun 2026): "when a measure becomes a target, it ceases to be a good measure" — the trap the proxy brake guards against.