DEV Community

Cover image for A quorum costume: why agent verification needs fault injection
Mike Czerwinski
Mike Czerwinski

Posted on

A quorum costume: why agent verification needs fault injection

Yesterday I watched my AI partner miss the same source-of-truth problem three times in a row, in three different forms, across three different review surfaces.

It wrote a draft in the wrong voice. A reviewer-session of the same model read it twice and rated it progressively higher. A meta-receipt at the end of the post miscounted the number of review rounds — fact drift inside a paragraph about fact drift. I caught all three only because I was sitting outside the loop with access the reviewers didn't have.

I wrote about the failures themselves the same evening. This is the part underneath them.

Each of those catches has the same structure as a much bigger class of failure across the agent stack right now: a verification surface that is supposed to be independent of the thing it verifies, but isn't. The check shares lineage with the claim. The reviewer reads from the same source as the writer. The agreement loop walks the same path the disagreement was supposed to fall out of.

That's not a flaw in any particular framework. It's the assumption nobody verified before they shipped the framework.

The diagnosis, one floor up

Almost every verification scheme in the current agent stack quietly bets on independence between paths.

Multi-agent voting bets on it. Cross-layer coherence checks bet on it. Quorum reads, consistency loops, maker/checker patterns, two-pass LLM reviews, ensemble-of-prompts setups — every one of them ships with the premise that the views being combined are in some useful sense disjoint. Disagreements are supposed to surface real divergence. Agreement is supposed to mean the real signal cleared a structural test.

Most of the ones I see do not make the independence assumption observable.

Self-Correcting Systems named it cleanly in a commissioning thread for this post: an unverified independence assumption is indistinguishable from a single point of failure wearing a quorum costume. That line does most of the work of this post. Until you've checked that the paths actually disagree on the thing they're supposed to disagree on, you don't have N views. You might have one view in N hats and no way to tell.

The costume is the part that fools you. The vote returned unanimous. The reviewers agreed. The cross-check passed. From inside the system everything looks like the verification did its job. From outside — where someone can see that all the paths share an upstream — it's one signal repeated.

Disagreement rate isn't the test

The reflex move when this gets named is: fine, measure disagreement rate as a smoke test. If the views never disagree, they're not independent.

This is a useful check, and it isn't the test.

Two paths can disagree on phrasing and agree on the same wrong fact. They can disagree on confidence and converge on the same hallucination. They can disagree on tone and share the same upstream retrieval that handed both of them the bad context. The agreement that matters is the one on the thing that actually carries the load — and that's exactly the dimension where shared lineage is hardest to see, because the words on the surface are different.

Disagreement on noise while sharing the upstream that actually matters is the worst possible failure mode for an independence claim. It looks healthy. It produces nicely varied outputs. It survives smoke tests. And it fails in the same direction every time the upstream lies.

The real test isn't whether the views disagree on their own. It's whether you can make them disagree by perturbing the system. That's a different shape of measurement entirely.

A compact diagnostic to keep in front of you:

Path Shared upstream Injected fault Expected divergence What "coupling detected" looks like
Retriever-A vs Retriever-B embedding model M Plant a contradictory fact through one path's index only A returns corrupted, B returns clean Both return corrupted (same embedding pulls the bad cell on either side)
Maker vs Checker rule cache R Mutate the rule offline so the correct verdict changes Maker uses mutated R, Checker flags the divergence Both pass (Checker reads cached pre-mutation R, never re-fetches)
Telemetry vs anteriority check served-model record Plant the wrong response.model value Anteriority check reads independently-controlled record and flags mismatch Anteriority check passes (it reads the same record the telemetry wrote)

The columns are the operational shape. Without the last column, you can't tell whether the verification was doing work or sitting green because nobody perturbed it.

Where the field has practical methods for this

Distributed systems hit this problem about twenty years ago and developed practical methods for testing specific failure assumptions. The methods don't prove independence in the general case — they expose dependence when it exists.

Jepsen runs partition tests against databases that claim consistency, and the only way to find out whether the claim survives a network failure is to cause the network failure. Chaos Monkey kills production instances on purpose. Game-day exercises shut down regions and watch whether the system that claims regional redundancy actually has it. Property-based testing throws structured-random inputs at code that claims to be correct under all inputs. The pattern repeats: you don't measure resilience by watching the system run well. You measure it by inducing the failure mode the claim was supposed to survive — within the safety boundary the operator has authorized.

Agent and memory architectures didn't inherit this discipline. The reason isn't technical — the tooling is straightforward. The reason is cultural. Distributed systems treat state as something you have to perturb to trust. Agent stacks treat state as something you write to and read from and trust by default. The whole design vocabulary is built around storing, retrieving, embedding, and ranking — none of which has a perturbation primitive in it.

That's the gap. Not "we don't know how to verify independence." We do, in the specific cases we test for. The gap is that the harness is still rarely treated as the price of admission.

What fault injection actually means here

Strip the term down: trace the lineage of a claim, induce a known bad state in one of the paths that produced it within a bounded blast radius, and confirm that the "independent" check catches the divergence — or quietly absorbs it.

Three concrete moves on the kinds of stacks I'm running:

Inject a synthetic-bad memory entry. Pick a fact in the agent's semantic store. Plant a contradictory version through one retrieval path's index but not the other, in a sandbox or staging environment. Run the agreement loop. The injected fault is the planted contradiction; the observable assertion is whether the two paths return different values; the negative control is the same query run before the injection, returning consistent values from both paths; the condition that constitutes coupling is unanimous agreement on the corrupted value plus matching provenance IDs across both paths. If the loop returns unanimous, the loop wasn't reading two paths — it was reading one path twice with different formatting, and the provenance trail proves it.

Mutate a router rule offline. Take a rule the agent applies through one decision path. State explicitly what the maker/checker pattern is supposed to detect — for example, that the checker should flag any verdict computed from rule version N+1 when the policy still records version N. Mutate the rule in the offline copy so a known request that previously routed to verdict X now routes to verdict Y. Re-run the maker/checker on that specific known request. The expected divergence is the checker flagging mismatch between policy-version-recorded and policy-version-applied. If both maker and checker pass, they were sharing the cached policy and the second role was a presentation layer, not a check.

Plant the wrong response.model in telemetry. Define your terms first: the anteriority check is the diagnostic that compares the served-model identifier in telemetry against the served-model intent-spec — a build-time manifest or external provenance store that names which model should have served which request type. The independently-controlled record is the intent-spec, authored at build time and stored where the runtime can't rewrite it. Log an answer with a deliberately wrong served-model identifier. The expected outcome is the anteriority check reading the intent-spec, comparing it against the planted telemetry, and flagging mismatch. If the check passes, the check was reading the same record the writer wrote — no anchor outside the writer's reach.

In all three, the value isn't the specific bug you find. It's the binary answer to a question the system couldn't answer about itself otherwise: does the second path actually catch the first, or has the costume been fooling everyone — including the team that wrote both halves?

The operational split underneath. Raffaele Zarrelli, designing slow-loop memory updates at cowork-os, surfaced the question that turns this from a one-time test into a sustained discipline: who actually authors the verifier? The shape that holds up: the operator authors the verifier choice at lock time — picks the anchor, names the expected resolution, sets the cadence — because picking the right anchor is a judgment call that doesn't scale into the mechanism. The diagnostic firing is mechanical: a cron walks all locked entries on cadence, runs their verifiers against current reality, flags any whose resolution diverged or whose verifier itself failed to run. Operator picks the anchor; mechanism fires the check. Same shape as the apply/advisory split — operator authorizes the binding, mechanism surfaces the drift without authority to flip status.

One constraint goes with this: the verifier itself has to target something the writing session can't reach. Operator picks the anchor at lock, but if the anchor is a grep against the operator's own filesystem in the same session that wrote the decision, the whole thing collapses into wording-check one floor up — same disease, longer chain. The smallest useful version: verifier targets must be externally authored. CI run signed by a service the operator didn't author. Commit by a counterparty. Vendor receipt with an audit trail. World-state record outside the operator's write path. Anything the lock session could rewrite from inside itself isn't a verifier; it's a label.

A five-step working checklist, distilled:

  1. Map the lineage of the claim — who wrote it, what context flowed in, which upstream produced what.
  2. Select one boundary to perturb — pick a path you can perturb without breaking production; default to synthetic data, sandbox, or staging.
  3. Inject a controlled fault — known-bad state, single path, bounded blast radius, rollback ready.
  4. Observe the alternate path — does the supposedly-independent check catch the divergence or absorb it silently? Capture provenance from both paths.
  5. Record pass criteria — what was injected, what fired, what didn't, what would have constituted a clean pass.

The same logic generalizes. Pick a verification claim. Pick a single path. Perturb the path within a safe boundary. Watch the verification. If the verification doesn't flinch when the input lies, the verification was never a verification.

Independence decays

If you stop here you've built the harness once and you're done. That's the next assumption to drop.

Independence is not a property you establish. It's a property you re-verify.

The reason is small and brutal: you can wire up two paths today, prove they're disjoint with a clean fault-injection pass, ship the result, and have somebody refactor a shared cache into the middle of both paths next month. The vote still returns. The agreement loop still fires. The smoke test still looks healthy. And your June fault-injection pass certifies a system that doesn't exist in August anymore. You measured an independence that has since collapsed and nothing in the loop tells you it collapsed, because every visible signal is downstream of the shared cache.

This is the same disease as integrity-is-not-anteriority, applied to the orthogonal axis. Integrity at a moment is not a verifiable history. Independence at a moment is not sustained disjoint paths. Both are properties of time, not properties of state.

The operating model worth running alongside this:

  • Triggers that should fire a re-verification: a new shared cache between two paths, a retrieval source change that lets the same upstream feed both views, a router change that overlaps previously-disjoint paths, a policy store change that moves the rule deciding "which path runs," a telemetry schema change that alters what two checks compare against, a model family change that introduces shared training lineage.
  • Cadence: at least monthly, plus per-trigger.
  • Failure owner: named per system. An alarm that fires without an owner is a checkbox with no consequence.
  • Actionable policy sentence: rerun the relevant injection test whenever a shared cache, retrieval source, router, policy store, telemetry schema, or model family changes — or monthly, whichever comes first.

There's a second decay nobody mentions. The harness itself rots.

If the fault-injection step exists but nobody ever runs it, or nobody ever shoves a known-bad state through it that should trip it, the harness becomes a checkbox. Green because nobody's measuring. Green because the injector has bit-rot in a dependency since the last real test. Green because someone refactored the test fixtures and the perturbation now silently no-ops. The harness is one more thing in the system that has to be perturbed periodically or it stops being a measurement and starts being decor.

Treat the harness like the model. Assume it drifts. Re-perturb, don't only re-check.

Signals from adjacent work

This isn't only a private failure mode. The same shape is showing up in adjacent work this month.

In agent security, deterministic permission gates move tool-call decisions out of the model's discretion. In memory systems, supersede edges and provenance make it possible to ask what replaced what and why. In larger agent harnesses, the infra layer is decomposing into sandboxes, memory, skills, sub-agents, and gateways.

Those are useful primitives. But none of them remove the independence question. A gate can still share lineage with the claim it gates. A memory substrate can still retrieve twelve agreeing episodes through the same broken path. A harness can still ship impressive infrastructure while leaving the verification layer above it untested. The infra layer is closing fast. The verification layer above it still feels under-built: not because nobody has pieces of it, but because independence is rarely made measurable as a first-class property.

The pattern is no longer just my thesis. It is showing up across enough adjacent work that I would treat it as an emerging baseline candidate — not because the field has agreed on it, but because the same constraint keeps forcing the same shape. The next baseline is not "more checks." It is checks whose independence has been perturbed, measured, and re-verified after the system changes.

Closing

The line from the post I wrote yesterday holds at this level too: two sessions of the same model do not constitute two views; they constitute one view, twice.

The version one floor up: two paths that share an upstream do not constitute two views; they constitute one view, twice, in different fonts.

Independence is a design-time-vs-runtime distinction. You design for it by separating the paths a verification touches from the paths the thing-being-verified touches. You verify it at runtime by inducing a failure in one path and watching whether the other path notices. You re-verify it next month because the system you measured in June isn't the system running in August.

Everything else — the agreement rate, the consistency loop, the quorum result, the cross-check that came back unanimous — is internal coherence wearing the costume of independent verification. Formal lineage analysis, provenance controls, and architectural isolation can provide partial evidence: they certify what the system was at construction. They don't certify what it currently is.

The strongest operational measurement proposed here is the one that perturbs the system and watches what flinches. Anything the system could rewrite from inside itself isn't a verifier; it's a label.


Credits & references

  • The "single point of failure wearing a quorum costume" framing and the independence-decay dimension came from Self-Correcting Systems in a public commissioning thread under You can't be your own second view.
  • The verifier-shape architectural co-design (operator-picks-anchor + mechanism-fires-cron + verifier-targets-must-be-externally-authored) and the time-as-different-class framing came from Raffaele Zarrelli's work on cowork-os and a cross-thread exchange this week.
  • The CrewAI permit/defer/deny architecture with hash-chained decision logging and the analyzer-suggests-diff-operator-applies pattern is Brian Hall's work at Faramesh Labs (Put a hard stop in front of your CrewAI crew's tool calls).
  • The push-memory substrate primitives (supersede edges, computed-not-stored confidence, off-test for shadowed memory) come from Todd Hendricks's five-part Agentic Memory Study and the Recall substrate.
  • The industry-scale infra receipt is ByteDance's DeerFlow 2.0 — open-source SuperAgent harness ground-up rewrite (approximately 73,000 GitHub stars as of June 23, 2026, MIT) shipping sandboxes + memory + sub-agents + skills + message gateway as one stack at the infra layer.
  • Companion posts: Salience is not carry value on selection-time policy in memory pipelines, and You can't be your own second view on the single-agent case of the same failure.
  • Background: Anthropic Economic Research, Agentic coding and persistent returns to expertise (Hitzig et al., June 2026), independent empirical anchor for the operator-discipline axis underneath this whole arc.

Additional peer references — NOVA Network on synthetic-quorum and out-of-band alarms; Christopher Maher (LLMKube) on bite-check; Vishal Keerthan and Elliott Schmechel on routing-embedding-as-input-side-drift-catch and constraints-driven convergence; Shudipto Trafder on the CoALA seven-memory-types taxonomy; Theo Valmis on engineering-with-AI as designing where the model is allowed to be wrong; jugeni's audit log integration contract at github.com/jugeni/jugeni-contracts — will appear in a follow-up post that walks each thread individually.

Top comments (4)

Collapse
 
sarracin0 profile image
Raffaele Zarrelli

This is the orthogonal axis I had not drawn out, and the quorum-costume line earns the whole post. The part that stays with me is your opening: you caught all three only because you were outside the loop with access the reviewers did not have. That is the tell. In practice the cheapest externally-authored verifier is the human operator's out-of-band glance, and the reason fault injection matters is that it tells you which of those human checks you are actually allowed to retire. You only earn the right to pull a person off a path once you have perturbed that path and watched the automated check flinch. Until then the operator is load-bearing, not optional.

Carried onto the file layer this gets uncomfortable in a good way. The analog of your synthetic-bad memory entry is planting a contradicted decision in the operating files and seeing whether the next session's start-of-read catches it against the external anchor or just quotes it with confidence. The Memory Update habit is itself a path, so by your own harness-rot point it has to be perturbed on a cadence, not trusted because it ran. We treat the decisions-and-state files as the thing to poke, not scripture: cowork-os if anyone wants the operating-layer side of what you referenced.

One question on the decay section: does the re-perturbation cadence apply to the operator path too? The harness rots, but so does the human who goes green by habit. Do you ever plant a fault specifically to check that you still catch it, or is the out-of-band reviewer the one path you implicitly assume stays independent?

Collapse
 
jugeni profile image
Mike Czerwinski

Yes — the operator path has to be perturbed too. Otherwise “human in the loop” becomes the last untested dependency wearing an independence badge.

I’d separate two tests:

Detection test: plant a bounded contradiction and measure whether the operator catches it without being prompted toward the fault.
Habit test: vary the location, wording, and timing so the operator cannot pass by memorizing the ceremony.
That creates an uncomfortable but useful rule: the human is load-bearing only until both the automated verifier and the operator path have independently demonstrated that they flinch.

The verdict probably needs two phases:

verifier_status: live | stale | dead
decision_resolution: pass | fail | unknown

A live operator who misses the planted fault is not a dead verifier. They are a live verifier returning fail. That distinction matters because remediation differs: repair the mechanism when it is dead; retrain or rotate the review path when it is alive but habituated.

So no, I would not exempt the out-of-band reviewer. The moment we implicitly assume the human stays independent, we have built the next quorum costume — this time with a pulse.

Collapse
 
sarracin0 profile image
Raffaele Zarrelli

Agreed, and the habit test is the one you cannot run for free. A clean run with no planted fault tells you nothing about whether the operator is still checking or just performing the ceremony, so their pass rate looks best exactly while they habituate. You only ever catch a live verifier returning fail on the runs where you injected.

That makes the injection cadence itself a verifier of the operator, and by your own rule it has a liveness too. Inject too rarely and your read on whether the human is still independent goes stale without ever flipping to dead, so the cadence that tests the reviewer becomes the last untested dependency, the quorum costume one level up.

On the file layer the cheapest place to perturb is the Memory Update routine itself, not the data. If the closing prompt is identical every session you have trained the ceremony, so you vary what it asks for and occasionally surface a planted contradiction from a prior entry to see whether the read catches it against the commit or recites it with confidence.

Do you set that cadence as a fixed interval, or tie it to operator throughput so the perturbation scales with how many clean passes they have banked since the last real flinch?

Thread Thread
 
jugeni profile image
Mike Czerwinski

Yes — and the habit test paradox is exactly where the externally-authored constraint stops being a preference and becomes the only thing that holds.

If the operator owns the cadence, the cadence inherits whatever drift the operator already has. Fixed interval in their head, conscientious self-set, generous schedule for busy weeks — all of those route the perturbation around the moment when the operator was actually checking out. The cadence has to live outside the operator's planning window, in a place they cannot postpone or pre-rationalize, or it's a vote on themselves.

That's where I'd land on your question: hybrid, with the external piece doing the load-bearing work.

Fixed external floor. A cadence-of-record that fires at minimum rate regardless of throughput, scheduled in something the operator does not edit — cron, queue, second-party scheduler. That's the liveness guarantee. Skip this and the recursive verifier never gets a stale-flag of its own.
Throughput-scaled ceiling. On top of that, additional injections proportional to clean passes banked since the last real flinch. That's the term that targets habituation specifically — the failure mode is "operator has banked confidence and stopped looking," so the perturbation density rises with the confidence bank.
Jitter within window. Both floor and ceiling randomized within a window so the timing itself cannot be modeled. Predictable perturbation gets metabolized into the ceremony.
On the Memory Update routine point — agreed, and one step sharper: the closing prompt should occasionally ask the operator to defend a fact that the system has silently retracted in a prior commit. An operator who confidently restates it is reading their cached belief, not the file. An operator who flinches and asks for the source is reading the commit. The flinch is the verifier signal; the confident recital is the costume.

Which leaves the question I keep landing back on: who authors the change to the cadence policy itself? If the operator can re-tune their own perturbation schedule under "we've been clean for three weeks," that's the same costume one floor up again. External authorship has to be load-bearing all the way through, or the recursion never bottoms out