Couple Both Ways: bidirectional checks against silent drift

#ai #architecture #llm #programming

The coupling-check pattern: catching the drift between any two artifacts that are supposed to agree.

A class diagram in my project claimed that two repository finders were each backed by a remote procedure call (RPC). Neither call existed. The diagram had been generated, reviewed, and committed, and it sailed through every check I had aimed at my diagrams, of which there were already three. It described a system slightly richer than the one I had actually built, and nothing flagged the difference.

It passed because each of those three checks looked at a different axis. One validated that every type named in the diagram resolved to a defined class. One read the architecture diagram's sequence operations. One forced a specific glyph to carry a citation. Each was a perfectly good check. But none of them coupled this claim, a finder asserting a backing call, to the place that claim was supposed to resolve: the actual service definition in the component's wire contract. The drift wasn't inside any one check. It lived in the seam between them, in the one direction nobody had wired up. The fix was a fourth check that closes the seam: every finder in the diagram must name a backing RPC that resolves to a real declaration in that component's protobuf, qualified by component so a pointer to the wrong service still fails, or else be tagged explicitly as proposed-but-unbuilt work. Run against the existing diagram, it found the two finders pointing at nothing. They had been asserting un-built calls, quietly, through every prior gate.

I've come to think that small episode is the most transferable thing in the project, more so than the terminology linter it grew out of, which I wrote up in a companion piece, Rails, Not Rules. If the story stopped at "I wrote a checker for my diagrams," it would be a tip. What made it generalize was noticing that the check was an instance of a shape, and the shape was reusable almost everywhere.

The shape

Call it bidirectional manifest coupling. By manifest I mean any artifact that declares an intent some other artifact is supposed to honor: a glossary entry, a spec section, an edge in an architecture diagram, a service in a wire contract, an item in a work queue, even a document's own table of contents. The defining trait is that a manifest makes a claim about something else. The glossary claims the code uses these words. The diagram claims these components talk over these protocols. The finder claims a call backs it.

A coupling check picks two artifacts that are supposed to stay in agreement, and that drift apart silently because nothing connects them, and it fails when they disagree, in both directions. Often one endpoint is the manifest and the other is plain reality: the on-disk code, the filesystem, the set of services that actually exist. Just as often both endpoints are documents, and the check couples one declaration to another. The glossary and the code are one pair. The diagram and the spec are another. Once you have the move in hand, the pairs are everywhere.

Why both directions

Bidirectionality is the load-bearing word, and it's exactly what the opening story was missing. A one-way check lets the drift relocate to the direction you didn't cover. "Every component named in the diagram exists on disk" is a fine rule, and it would have caught a diagram that referenced a deleted module. It would have done nothing about my two phantom finders, because that drift ran the other way: the diagram asserted a call the code didn't provide. Coverage in one direction quietly grants a license to drift in the other.

So the cleanest checks in the project assert closure both ways at once. The architecture-coverage check is the tidy example. The diagram carries an explicit manifest of its components and edges, and the script asserts that the component set matches the on-disk components exactly, every listed component has a spec and every spec'd component is listed, and that every edge names endpoints the manifest knows about, with the wire-protocol edges further required to resolve to a real service declaration. When I built it, twenty-two components and forty-two edges agreed; a fault-injection test confirmed it goes red on a bogus edge. There is nowhere for a disagreement to hide, because both the presence and the absence of a thing are checked from both sides. That is the actual goal of the whole exercise. Not preventing the agent from ever being wrong, which is hopeless, but guaranteeing that when it is wrong, something turns red before I have to notice by eye.

The first run is a free audit

Building the check is also the first time you find out whether the two artifacts agreed in the first place. Authoring a coupling gate quietly assumes the surfaces it joins are already consistent, and they usually aren't; that latent disagreement is most of why the gate was worth writing. I added a membership check between two restated state-transition tables only after clearing them by eye, reading both and judging them consistent. Its first run failed anyway: three other surfaces agreed that one state could transition to a terminal one, and the table I'd eyeballed as canonical was the single place that had silently dropped the edge. The by-eye clearance slid right over a missing row; the mechanical set-difference did not. So I treat a new coupling check's first-run failures as findings, not as setup noise to quiet down, and I budget the time to chase them. The check earns its keep twice: once as the backfill audit that runs the day you write it, and again as the guard that holds the line afterward.

The pairs are everywhere

Once the shape is in hand it stops being a clever trick and becomes the default move. The commit where I mechanized a completely different pair names it outright: coupling a document's section index to its sections, it describes itself as "the same manifest-coupling pattern as the other check scripts." By then I wasn't inventing anything. I was reaching for a stamp I already owned.

The pairs accumulated fast. The glossary and the code, the original pair, where a retired term must not reappear anywhere the compiler can't see. An architecture diagram and the spec it claims to depict. A class diagram's types and their definitions. The wire contract and the spec that documents its fields, checked for field-level congruence so a renamed field can't drift between the two. A TODO left in the code and the entry that's supposed to track it in the work queue. The tool surface an agent is allowed to call and the underlying RPCs, checked for symmetry so neither grows an orphan. A document's table of contents and the document. Each pair gets one check that says: a claim on this side must resolve to something on that side, and the reverse. None of these is clever in isolation. The leverage is in recognizing that they are all the same check wearing different nouns.

The discipline that keeps it honest

Three things stop this from degrading into a wall of red noise, and all three are worth more than the pattern itself.

The first is that a blocking check has to be false-positive-free, and the reliable way to get there is a single unambiguous syntactic trigger plus opt-in for everything else. In the behavioral-coupling check, the trigger is one diagram glyph that marks a claim as "derived." A claim wearing that glyph must carry a backing citation to a spec section or a proposed task; any other claim may opt in to the same discipline but isn't forced to. Because the glyph is the one thing that fires the rule, the check cannot misfire on prose it was never meant to read. This is the same constraint I'd put behind every gate I trust: if you can't define the trigger crisply enough to be false-positive-free, the check will get bypassed, and a bypassed gate is worse than none. A coupling check that flags honest work teaches everyone to ignore it.

The second is an escape hatch for work that is legitimately ahead of itself. A check that demands two artifacts agree exactly will go red the moment you deliberately let one lead the other, and design-ahead-of-code is a normal, healthy state: a diagram or a spec describing where the system is going before the code catches up. A strict bijection with no relief for that would fire on every honest in-progress diff, which is the red-noise failure arriving by a second route. So each coupling carries a release valve. A claim that hasn't landed yet gets tagged as proposed, with the tag naming the open work item that will close the gap, and the check accepts the divergence as long as the tag is there. That turns a hard "these must match" into "these must match, or the gap is tracked and intentional." The work queue becomes the ledger of what is allowed to be out of sync and why; when the tagged work lands and the tag comes off, the check snaps back to demanding exact agreement and catches anything that merged still mismatched. A bijection you can't ever postpone isn't enforceable on a moving project. A bijection with a tracked exception is.

The third is knowing precisely what the check does and doesn't prove, and the honest answer is narrower than it looks. A coupling check verifies resolution, not correctness. It confirms that a claim points at something real, and it breaks the instant that something is renamed or deleted. It does not, and cannot, tell you the claim points at the right thing. The check's own documentation says this in as many words: the forced citation breaks on rename or removal but does not detect semantic contradiction. My finder check proves each finder names a call that exists; it has no opinion on whether that's the call the finder ought to be using. That sounds like a weakness, and it is a real limit, but it's also most of the value. The overwhelming majority of drift in a fast-moving, agent-built codebase is exactly this: a referent that quietly stopped existing, a name that moved, a manifest describing a system one refactor out of date. Catching all of that mechanically leaves your actual judgment for the question a script was never going to answer.

Prior art, and where this sits

Coupling code to a declared rule is old and well-trodden. ArchUnit lets you write tests that assert layering and dependency direction against compiled Java; dependency-cruiser does the equivalent for JavaScript and TypeScript; the whole tradition of fitness functions, from Ford, Parsons, and Kua's Building Evolutionary Architectures (O'Reilly, 2017), is about automated checks that protect an architectural property as a system evolves. In Birgitta Böckeler's harness-engineering vocabulary, every one of these, mine included, is a sensor: something that observes after the agent acts and lets the work self-correct. I'm not claiming the mechanism.

What I'd add is a direction and a generalization. The established tools aim almost entirely at architecture, couple code to a single rule set, and mostly run one way: the code must obey the manifest. The shape turns out to be more general than that. The second endpoint is frequently not code at all but another document, and the artifacts most prone to silent drift in an agent-built project are precisely the ones the compiler never opens: the diagrams, the glossary that carries the Domain-Driven Design (DDD) ubiquitous language, the wire contract, the work queue, a handbook's index. Those are where an agent's output and its own description of that output come apart, because nothing downstream forces them back together. Pointing the same bidirectional coupling at those pairs, and insisting it run in both directions, is the part I had to work out for myself.

Honest limits

This is one solid personal project, roughly 150,000 lines of Rust across thirty-odd crates and twenty-two components, with one person owning every artifact end to end. Treat it as a worked example, not a benchmark. The economics that make these checks cheap, a single mind deciding what each manifest means, are exactly what I haven't tested under shared ownership.

The pattern also has real edges. The annotations that make a check resolvable, the citation on a diagram glyph, the backing tag on a finder, are themselves a surface that can rot, and they cost something to keep current. False-positive-freeness depends on a clean syntactic trigger existing; where no such trigger is available, you fall back to opt-in or to plain discipline, with all the looseness that implies. And the resolution-not-correctness limit is permanent, not a bug to be fixed later: these checks guarantee your manifests describe a system that exists, never that it's the system you should have built. What they buy back is the attention you'd otherwise spend chasing stale references by eye, and in a codebase that an agent is reshaping several times a week, that turns out to be most of the attention you have.

There is a subtler failure than red noise, and it's the one I'd warn a reader about hardest: a coupling check is only as good as the channel it actually joins. I once wrote a check coupling alerting rules to the code that emits the metrics they name, and it passed clean: every rule named a real metric. But those values traveled two pipes that happened to share one namespace, and five rules named metrics emitted only on the pipe the alert engine never reads. The strings matched perfectly while the feature underneath was dead. The check was green over a coupling that didn't physically connect. Picking the hub, the specific producer the consumer truly reads from rather than every place the string appears, is its own design step, and getting it wrong buys you false confidence, which is worse than the false noise the discipline above is built to avoid.

I build domain-dense systems by directing coding agents, where the vocabulary and the contracts have to stay honest as the code moves under them. If that's a seam you're working too, I'm reachable on LinkedIn.

Written by directing an AI agent, the same way the platform it describes was built. The editing and the judgment are mine.

References

Vasyl Tretiakov, "Rails, Not Rules: Enforcing a Coding Agent's Domain Vocabulary with Checks," vasyltretiakov.dev, 1 Jun 2026 — companion essay.
Neal Ford, Rebecca Parsons, and Patrick Kua, Building Evolutionary Architectures: Support Constant Change. O'Reilly, 2017. ISBN 978-1491986356.
Birgitta Böckeler, "Harness engineering for coding agent users," martinfowler.com, 2 Apr 2026 (accessed 9 Jun 2026).
ArchUnit — architecture unit-testing for Java (accessed 9 Jun 2026).
dependency-cruiser — validate and visualize dependencies for JavaScript and TypeScript (accessed 9 Jun 2026).
Eric Evans, Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley, 2003. ISBN 978-0321125217.

Published at vasyltretiakov.dev.

Top comments (1)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.