Two architectural decisions overlap. An engineer in Cursor follows one. The async PR bot in CI follows the other. A reviewer signs off on a diff that flatly contradicts a decision that landed last week. Nobody is wrong. Nobody has the authority to be right. This is what AI coding governance looks like without a precedence layer — and it is the gap every system in the category is currently failing to close.
Most of the conversation about "AI coding governance" in 2026 is still about the wrong layer. Prompt rules. CLAUDE.md. .cursor/rules. RAG over an ADR folder. A reviewer agent on PRs. Policy docs in a wiki. Every one of these answers the question "how do we tell the model what we want?" — and not one of them answers the prior question: "when two of the things we want disagree, which one wins?"
That second question is not a corner case. It is the central question of any governance system that has to operate at the scale of a real codebase, with real exceptions, written by real teams over real years. And it is the question almost every tool in the category is currently dodging.
The missing layer has a name. It is precedence semantics. It is what makes governance deterministic, reviewable, and durable. Without it, "AI coding governance" is just a polite name for whichever rule the model happened to attend to this morning.
## The collision nobody owns
Consider a perfectly ordinary situation. A team's data layer is governed by ADR-014: "All persistent data access goes through the repository pattern. No service may call an ORM session directly." The decision is sound. It has been the law of the codebase for a year.
Eight months later, the payments team writes ADR-022: "The payments service may invoke the Stripe SDK directly inside an idempotency-key boundary. The repository abstraction does not compose with Stripe's at-most-once semantics, and the ledger requires raw call ordering." Also sound. Also reviewed. Also accepted.
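Written down as structured metadata instead of prose, the two decisions might look something like the records below. The field names, dates, and glob scopes are illustrative assumptions, not any particular tool's schema:

```python
# Hypothetical metadata for the two decisions. Field names (status,
# scope, priority, accepted) and the dates are illustrative only.
ADR_014 = {
    "id": "ADR-014",
    "status": "accepted",
    "scope": "services/**",            # applies to every service
    "priority": "normal",
    "accepted": "2025-02-10",
    "rule": "All persistent data access goes through the repository pattern.",
}

ADR_022 = {
    "id": "ADR-022",
    "status": "accepted",
    "scope": "services/payments/**",   # narrower: payments only
    "priority": "normal",
    "accepted": "2025-10-03",
    "rule": "Payments may invoke the Stripe SDK directly inside an "
            "idempotency-key boundary.",
}
```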
These two decisions overlap. Now four things happen at once:
- An engineer in Cursor opens the payments service, sees ADR-014 because it was indexed first, and proposes a repository-based implementation.
- An engineer in Claude Code, with a different CLAUDE.md, sees ADR-022 highlighted and writes a direct-SDK implementation. Both diffs get approved.
- An async PR bot refactors an adjacent file and rewrites the call site to obey ADR-014, because that is what its system prompt encodes.
- Six months later, a new engineer reads the codebase, sees both patterns in production, and asks: "which one are we supposed to follow?" Nobody can answer without re-litigating the conflict from scratch.
None of those four actors are misbehaving. The codebase has two correct decisions and no mechanism to say which one wins where. That is the precedence problem — and it is exactly the problem that no prompt rule, no retrieval system, and no review process will ever resolve, because none of them are designed to resolve anything. They are designed to retrieve.
## Why current systems can't resolve it
It is tempting to think that the conflict above is a content problem — that if the team had written ADR-022 more carefully, or pasted it higher in CLAUDE.md, the right answer would emerge. It is not a content problem. It is a structural one. Every governance substrate currently in widespread use is fundamentally a retrieval substrate, and retrieval has no opinion on conflict.
01. Prompt rules resolve by attention.
CLAUDE.md, .cursor/rules, .github/copilot-instructions.md all hand the model a block of text. When two rules disagree, the model picks one based on whichever it attended to more strongly under this temperature, this context length, this surrounding code. Reorder the file and the answer changes. That is not governance — it is a coin flip with extra steps.
02. RAG resolves by retrieval score.
Indexing the ADR folder and retrieving the top-k chunks per query feels rigorous. It is not. When two ADRs both score highly for "payments writes a charge," whichever the embedder happens to rank higher gets injected. Re-embed the corpus and the resolution can flip silently. The architecture is now a function of the vector index.
03. PR review resolves by whoever was looking.
If the reviewer who happens to be assigned remembers ADR-022, the diff lands correctly. If a different reviewer is assigned next time, the next diff lands the other way. The codebase ends up with both patterns and no record of which decision governed which file. The conflict is resolved by social process, and social process does not scale across async agents.
04. Policy docs resolve nothing at all.
The wiki page is updated, sometimes. The Notion entry is written, occasionally. Neither of them is on the path between a model and a diff. They are referenced when a human looks them up — which is exactly when the resolution is already most expensive.
The common failure underneath all four is the same: none of them have an opinion on which rule applies when several do. They surface rules. They do not resolve between rules.
## What precedence semantics actually is
Precedence semantics is the small body of rules a governance system uses to answer one question, every time, the same way: "given the current task, the current file, the current scope, and the full set of architectural decisions, which decision actually applies here, and which loses?"
The answer has to be deterministic. Not "the model probably picks the right one." Not "the reviewer usually catches it." Deterministic, in the engineering sense: same inputs in, same answer out, every time, regardless of which agent or model or temperature is on the other end.
It also has to be reviewable. The resolution has to be explainable as a chain of named facts — this scope was more specific, that decision supersedes the earlier one, this status retired the override — not as an opaque embedding score.
Governance is not memory retrieval. Governance is deterministic conflict resolution over architectural constraints. Retrieval is one input to that resolution — not a substitute for it.
## The five axes of resolution
A useful precedence engine resolves over a small, finite set of axes. Five of them carry almost all the weight in real codebases.
| Axis | What it answers | Example |
|---|---|---|
| Status | Is the decision in force at all? | A deprecated ADR loses to any accepted ADR touching the same scope, even if the deprecated one was more specific. |
| Supersedes | Does this decision explicitly retire an older one? | ADR-031 carries supersedes: ADR-014. Wherever they overlap, ADR-014 is treated as deprecated. |
| Scope specificity | Whose scope is narrower? | ADR-022 (services/payments/**) beats ADR-014 (services/**) inside its narrower scope. |
| Priority | When scopes are equal, who is authoritative? | A security-class ADR (priority: critical) wins over an ergonomics-class ADR (priority: normal) at the same scope. |
| Temporal | If everything else ties, the newer decision wins. | Two equal-priority, equal-scope, neither-supersedes-the-other decisions resolve by acceptance date. Tiebreaker, not primary driver. |
Two things matter about that table more than the contents of any one row. First: the axes are evaluated in a declared order, not improvised per query. Second: each axis is a fact carried by the decision itself — status, supersedes, scope, priority, accepted_at are properties an ADR declares, not inferences a model has to make. Once declared, resolution is a finite computation, not a guess.
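To make "finite computation" concrete, here is a minimal sketch of such a resolver. The field encodings, the glob-based scope matching, and the exact tiebreak keys are assumptions for illustration, not any product's implementation:

```python
from fnmatch import fnmatch

# Illustrative decision records (see the ADR-014 / ADR-022 example above).
DECISIONS = [
    {"id": "ADR-014", "status": "accepted", "supersedes": [],
     "scope": "services/**", "priority": 1, "accepted": "2025-02-10"},
    {"id": "ADR-022", "status": "accepted", "supersedes": [],
     "scope": "services/payments/**", "priority": 1, "accepted": "2025-10-03"},
]

def in_force(decisions):
    """Axes 1 and 2 as filters: drop non-accepted decisions, then drop
    anything an accepted decision explicitly supersedes."""
    accepted = [d for d in decisions if d["status"] == "accepted"]
    retired = {old for d in accepted for old in d["supersedes"]}
    return [d for d in accepted if d["id"] not in retired]

def resolve(decisions, path):
    """Axes 3-5 as an ordered sort: scope specificity, then priority,
    then acceptance date. Same inputs, same winner, every time."""
    matching = [d for d in in_force(decisions) if fnmatch(path, d["scope"])]
    if not matching:
        return None, "no decision governs this path"
    # Deeper scope (more path segments) beats shallower; higher priority
    # beats lower; newer ISO date beats older. Evaluated in that order.
    matching.sort(key=lambda d: (d["scope"].count("/"), d["priority"],
                                 d["accepted"]), reverse=True)
    winner = matching[0]
    why = (f"{winner['id']} wins at {path}: scope {winner['scope']} "
           f"is the narrowest decision in force")
    return winner, why

winner, why = resolve(DECISIONS, "services/payments/charge.py")
print(why)  # ADR-022 wins at services/payments/charge.py: scope ...
```

Note that the output doubles as the explanation: the winner arrives with a sentence of named facts a reviewer can check, not an embedding score.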
This is why precedence semantics is the missing layer. Every term in the table is already familiar to anyone who has written ADRs for more than a year. What is new is the claim that resolving over them is a system responsibility, not a reviewer's job to do in their head.
## Governance is deterministic conflict resolution
Once precedence semantics is named, the whole category reframes.
The retrieval framing: Governance is about getting the relevant constraints into context. The model takes it from there. Conflict resolution is whatever the model happens to produce, which is hopefully the same thing on most runs.
The precedence framing: Governance is about computing, deterministically, the single constraint that governs this scope. The model receives that constraint, not the conflict. The same inputs produce the same answer in every agent.
The retrieval framing makes the model the conflict-resolution engine. That is exactly the wrong place to put it. Models are excellent at code synthesis under a constraint and unreliable at choosing between constraints. The precedence framing keeps the model out of the resolution and lets it do what it is good at.
Said differently: the architectural truth of a codebase is not allowed to depend on which agent ran the query. If it does, the codebase does not have an architecture — it has a sample.
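At the prompt boundary the difference is concrete. A precedence-first harness injects the one resolved constraint and its explanation; the conflict never reaches the model at all. A sketch, reusing the resolve function from the earlier example (the harness shape is hypothetical):

```python
def governing_context(path, decisions):
    """Pre-generation injection: the model receives the single winning
    constraint for this path, never the unresolved pair of ADRs."""
    winner, why = resolve(decisions, path)  # assumes resolve() from above
    if winner is None:
        return "No architectural constraint governs this file."
    return (f"Constraint ({winner['id']}, scope {winner['scope']})\n"
            f"Why it governs here: {why}")

# Every agent -- Cursor, Claude Code, the CI bot -- asks the same question
# and receives the same block of text for services/payments/charge.py.
```

There is nothing left for attention to flip on: inside services/payments/, ADR-014 is simply absent from the context.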
## Governance is a compiler problem
The frame that follows from all of this is that an AI-era governance system is shaped like a compiler.
The input is the same kind of thing teams already write: ADRs, design documents, exceptions, and the relationships between them. The output is a single, queryable, scope-aware representation that every agent — interactive, async, in CI — can ask the same question of and get the same answer. Between the two is a pipeline:
01. Normalize — ADRs to canonical facts
02. Resolve — precedence over the five axes [the missing layer]
03. Compile — one constraint per scope
04. Enforce — pre-gen inject · post-gen check
05. Trace — "this diff applied ADR-022 at ..."
Each stage already has a recognizable analog in software engineering. Normalization is what a parser does. Resolution is what a type checker or constraint solver does. Compilation is what an intermediate representation is for. Enforcement is what a pre-commit hook or CI gate is for. Traceability is what build provenance is for.
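In code terms, the five stages might compose as below. The stage names mirror the list above; the signatures and types are a sketch of the shape, not a real tool's API:

```python
from typing import Iterable

Fact = dict        # a normalized decision record (illustrative alias)
Constraint = dict  # the single resolved rule for one scope

def normalize(adr_sources: Iterable[str]) -> list[Fact]:
    """Parse ADR documents into canonical facts:
    id, status, supersedes, scope, priority, accepted."""
    ...

def resolve(facts: list[Fact], scope: str) -> Constraint:
    """Apply the five precedence axes in declared order; return the
    single winning decision for this scope, with its trace."""
    ...

def compile_constraints(facts: list[Fact],
                        scopes: list[str]) -> dict[str, Constraint]:
    """One constraint per scope: the queryable artifact every agent hits."""
    return {scope: resolve(facts, scope) for scope in scopes}

def enforce(constraint: Constraint, diff: str) -> bool:
    """Pre-gen: inject the constraint into context.
    Post-gen: check the produced diff against the same constraint."""
    ...

def trace(constraint: Constraint, diff_id: str) -> str:
    """Provenance: which decision governed which diff, at which scope."""
    ...
```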
What is new in 2026 is that all five together describe a layer above the agent — not a feature inside any single one.
Compilers are deterministic by construction. Their outputs are reproducible. Their decisions are explainable. Their failures are localizable. Every property AI coding governance currently lacks is a property a compiler-shaped governance layer would have by default.
## What this changes for engineering leaders
For an engineering leader looking at the category in 2026, the practical implication is that the question to ask of any "AI governance" pitch is no longer "does it read our rules?" It is: "what does it do when two of our rules disagree?"
If the answer is "the model figures it out," the system is a retrieval layer with a governance label on it. If the answer involves embeddings, ranking scores, or "we tune the prompt," the system is doing statistics, not governance. If the answer involves a declared resolution order over status, supersedes, scope, priority, and time — it is doing the actual job.
That distinction matters more than the surface-level tool category. A team can run Claude Code, Cursor, Copilot, Windsurf, and three in-house SDK agents on the same codebase — as most engineering orgs already do — and still have a coherent architecture, but only if the layer underneath has a deterministic answer to "which decision applies here."
The engineering teams that win the next cycle will not be the ones that picked the best assistant. They will be the ones whose architecture survived having a portfolio of assistants — because the layer that decided what the rules were did not live inside any of them.
Related: Why RAG Fails for Architectural Governance · Memory Is Not Governance
Originally published at mnemehq.com