Bala Paranj

Posted on Jun 20 • Edited on Jun 25

The Viability Test Every AI-Dev Architecture Fails

#ai #softwaredevelopment #architecture #engineering

✓ Human-authored analysis; AI used for formatting and proofreading.

This article builds on the harness response article The Harness is Half the Architecture. The five-element framework draws from TRIZ's Law of System Completeness (Altshuller, extended by Savransky and Mann), applied here to AI-assisted development.

The law underneath the diagnosis

A previous article, The Harness is Half the Architecture, argued that the dominant agent architecture, "Agent = Model + Harness," is missing its other half: independent verification, declared intent, coordination protocols, and subtraction discipline.

That article diagnosed the gaps. This one prescribes the architecture, using a structural law about what a working system requires, drawn from a framework that analyzed millions of patents to identify what makes engineered systems viable.

A working system has five essential elements. If any element is missing, the system does not work. If any element fails, the system does not survive. You cannot trim an essential element to ship faster. You can only replace it with something that serves the same function. Deleting it produces a non-viable system, which is the state of the current agent architecture.

The five elements are not new ideas. They've been identified independently across three separate fields — engineering (TRIZ), cybernetics (Beer's Viable System Model), and economics (Nalebuff and Brandenburger's value-net model). Three frameworks built from different data, arriving at the same five-part structure. That convergence is evidence the decomposition is real, not an artifact of one school's vocabulary.

Elements of a Viable System

The five-element template is not an architecture. It is a completeness law — a domain-neutral blueprint of the elements any viable system must contain. TRIZ calls it a Law, Beer calls it a Viable System Model, Nalebuff and Brandenburger call them the elements of a game. Aviation calls the same structure envelope protection. Nuclear calls it defense in depth. Control theory calls it a closed-loop regulator. None calls it an architecture.

A specific realization — fly-by-wire, reactor interlocks, pre-trade risk checks, or the AI-dev stack below — is an architecture: one way to instantiate the five elements. So the claim here is not "here is the right architecture" but the stronger, testable one:

The current AI-dev architecture is incomplete because — tested against the same viability blueprint that aviation, nuclear, trading, and control systems independently satisfy — it is missing two of five essential elements.

This matters because "our architecture is better" invites "says who?" But "the viability invariant that seven unrelated domains already satisfy, and your architecture doesn't" is a different kind of claim. It's falsifiable: show the two missing elements are present, and the claim falls. Nobody has shown that, because they aren't present. The comparison covers the current agent architectures described by OpenAI, Anthropic, and LangChain.

The five elements of a complete AI-dev system

Every working engineered system has these five elements. Let's take a look at these elements for AI-assisted development.

1. The Tool — the model

The model is the working tool. It's the value-adding primary activity. The thing that produces the artifact. Code, infrastructure templates, documentation, tests. This is the focus of the harness article and the industry invests in it most aggressively.

In the current architecture, the model is treated as the system. Everything else is "harness": support infrastructure for the primary component. The law says otherwise: the tool is one of five. The tool is not the system. It is not even the most important element. A working tool without a control unit is a machine that produces output nobody can verify. A working tool without an engine is a machine that runs without direction. The model is essential. It is not sufficient.

2. The Engine — declared intent

The engine is the driving force. In AI-assisted development, the engine is the specification — the human-authored declaration of what correct means, what the system must satisfy and the intent behind the work.

Without a specification, generation is undriven. The model produces output, but toward what? A prompt is a proto-specification. It has intent in it, but it's ephemeral, ambiguous, and discarded after use. A specification is durable, precise, versioned, and mechanically checkable. It's the difference between telling someone what you want once in conversation and writing it down in a contract both parties can refer to.

The engine powers implementation. The spec drives generation. Without it, the model runs but goes nowhere — which is the failure mode of "vibe coding," where generation feels productive but the output doesn't converge toward a defined target because no target was defined.
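To make the prompt-versus-specification contrast concrete, here is a minimal sketch of what "versioned and mechanically checkable" could look like. The `Spec` and `Invariant` types, the field names, and the bucket example are all hypothetical, chosen for illustration; they are not taken from any particular spec-driven tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Invariant:
    """One named, machine-checkable statement of what 'correct' means."""
    name: str
    holds: Callable[[dict], bool]


@dataclass(frozen=True)
class Spec:
    """A durable, versioned declaration of intent (hypothetical shape)."""
    version: str
    invariants: tuple

    def check(self, artifact: dict) -> list:
        """Return the names of every invariant the artifact violates."""
        return [inv.name for inv in self.invariants if not inv.holds(artifact)]


# Illustrative spec for a generated storage-bucket config.
bucket_spec = Spec(
    version="1.2.0",
    invariants=(
        Invariant("encryption_enabled", lambda a: a.get("encrypted") is True),
        Invariant("not_public", lambda a: a.get("public") is False),
    ),
)

# Stand-in for model output: encrypted, but publicly readable.
generated = {"encrypted": True, "public": True}
violations = bucket_spec.check(generated)
print(violations)  # ['not_public']
```

Unlike a prompt, this object can be committed, diffed, and re-run against every future generation attempt, which is what lets it drive generation rather than merely suggest a direction.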

3. The Transmission — delivery and contracts

The transmission carries the engine's energy to the tool's output. In AI-assisted development, the transmission has two layers:

The delivery rail — CI/CD, GitOps, Infrastructure as Code. The mechanism that moves a change from commit through build to deploy. This is the part most teams have, and have well. It's fast, automated, and reliable.

The contract layer — machine-readable contracts, schemas, interface definitions, and structured exports between agents. When multiple agents work on a shared codebase, the transmission between them isn't a shared filesystem — it's declared constraints each agent's output must satisfy. Coordination through specifications, not shared state.
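A contract layer of this kind can be sketched in a few lines. The example below assumes a hypothetical export format (a dict with `module`, `version`, and `endpoints`) and checks an agent's structured output against it; a real system would more likely use a schema language, but the idea is the same.

```python
# Hypothetical contract for one agent's structured export:
# field name -> required Python type.
EXPORT_CONTRACT = {
    "module": str,
    "version": str,
    "endpoints": list,
}


def validate_export(payload: dict, contract: dict) -> list:
    """Return a list of contract violations; an empty list means the export conforms."""
    errors = []
    for field, expected in contract.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    # Anything the contract does not declare is rejected, not silently passed along.
    for field in payload:
        if field not in contract:
            errors.append(f"undeclared field: {field}")
    return errors


good = {"module": "billing", "version": "0.3.1", "endpoints": ["/invoices"]}
bad = {"module": "billing", "endpoints": "/invoices", "debug": True}

print(validate_export(good, EXPORT_CONTRACT))  # []
print(validate_export(bad, EXPORT_CONTRACT))
```

The receiving agent never inspects how the export was produced. It only needs the declared shape, which is the point of coordinating through specifications rather than shared state.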

The transmission is present in most current architectures. It works. That's the problem — a fast, reliable transmission with no control unit faithfully delivers unverified changes at machine speed. CI/CD without a verification gate is the postal service delivering bombs with perfect reliability. The transmission isn't broken. The control unit that should ride on it is absent.

4. The Control Unit — the independent oracle

The control unit measures output against the specification, issues a deterministic verdict, and feeds corrections back into generation. This is the element the current architecture is missing.

The control unit is not the model reviewing its own output. Self-verification is correlated with the generator's blind spots: Huang et al. (ICLR 2024) found that, without external feedback, self-correction often degrades reasoning rather than improving it. The control unit must be independent of the generator and must operate at a higher level of logical certainty than the tool it verifies. You don't verify a bridge with a poem; you verify it with physics. You don't verify probabilistic output with another probabilistic system; you verify it with a deterministic one: a type checker, a property-based test suite, a contract enforcer, a CEL predicate evaluator, a Z3 proof, or anything else that is mechanical, deterministic, and uncorrelated with the model's failure modes.

It's bidirectional, not just a gate. The oracle doesn't merely pass or fail output. It feeds corrections back into generation — the generate → verify → fix loop. A failed assertion tells the model what failed and why, steering the next generation attempt toward the specification. The control unit governs the tool; it doesn't just inspect the output.
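The generate → verify → fix loop can be sketched as follows. This is an illustration under loud assumptions: `stub_generator` stands in for a model call and simply returns canned attempts (it receives the oracle's feedback but does not use it), and the "spec" is a tiny `clamp` function with three fixed cases.

```python
from typing import Callable, Optional

Candidate = Callable[[int, int, int], int]


def oracle(clamp: Candidate) -> Optional[str]:
    """Deterministic verdict: None on pass, else a message saying what failed and why."""
    cases = [(5, 0, 10, 5), (-3, 0, 10, 0), (42, 0, 10, 10)]
    for x, lo, hi, expected in cases:
        got = clamp(x, lo, hi)
        if got != expected:
            return f"clamp({x}, {lo}, {hi}) returned {got}, expected {expected}"
    return None


def stub_generator(feedback: Optional[str], attempt: int) -> Candidate:
    """Stand-in for a model call. A real generator would condition on `feedback`;
    this stub ignores it and returns canned attempts in order."""
    attempts = [
        lambda x, lo, hi: x,                    # ignores both bounds
        lambda x, lo, hi: max(x, lo),           # respects the lower bound only
        lambda x, lo, hi: min(max(x, lo), hi),  # satisfies every case
    ]
    return attempts[min(attempt, len(attempts) - 1)]


def generate_verify_fix(max_rounds: int = 5):
    """Loop until the oracle passes or the round budget is spent."""
    feedback, history = None, []
    for attempt in range(max_rounds):
        candidate = stub_generator(feedback, attempt)
        feedback = oracle(candidate)  # the verdict steers the next round
        history.append(feedback)
        if feedback is None:
            return candidate, history
    return None, history


accepted, history = generate_verify_fix()
for verdict in history:
    print(verdict or "PASS")
```

The verdict is the same every time the same candidate is checked, and the failure message names the exact input that broke. That is what distinguishes this loop from asking the generator whether it thinks its own output is right.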

It operates on the transmission, not beside it. Verification isn't a separate pipeline. It's a gate that attaches to the CI/CD rail. Every change that travels the transmission passes through the oracle. Strip the gate and the transmission faithfully delivers unverified changes. The oracle rides the rail.
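A toy model of "the oracle rides the rail": the delivery pipeline is a list of stages, and verification is one of them. The stage names, the `open_ports` field, and the allowed-port rule are assumptions made up for this sketch.

```python
from typing import Callable, List

Change = dict
Stage = Callable[[Change], bool]  # True lets the change continue down the rail

ALLOWED_PORTS = {443}  # hypothetical declared constraint


def build(change: Change) -> bool:
    change["built"] = True
    return True


def verify(change: Change) -> bool:
    """The oracle as a pipeline stage: a deterministic check against the declaration."""
    return change["open_ports"] <= ALLOWED_PORTS  # subset test


def deploy(change: Change) -> bool:
    change["deployed"] = True
    return True


def run_rail(change: Change, stages: List[Stage]) -> str:
    """Move a change through each stage in order, stopping at the first refusal."""
    for stage in stages:
        if not stage(change):
            return f"blocked at {stage.__name__}"
    return "delivered"


risky = {"open_ports": {22, 443}}
print(run_rail(dict(risky), [build, verify, deploy]))  # blocked at verify
print(run_rail(dict(risky), [build, deploy]))          # delivered
```

The second call is the failure mode described above: with the gate stripped out, the same rail delivers the same risky change without complaint.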

5. The Casing — boundaries and continuity

The casing is the connection between the system and its environment. In AI-assisted development, it has two faces:

Boundaries — Parnas-style information hiding, hexagonal architecture, module isolation. The casing around each component that prevents agents from reaching across boundaries into modules they don't own. Without boundaries, agents cross them at the first opportunity, and the system loses the architectural integrity that makes independent verification possible.

Continuity — the stance that the system assumes its own incompleteness and compensates through continuous monitoring and update. Vassilev's NIST proof (June 9, 2026) formalized this: no finite set of rules is universally robust. The casing holds the system against an adversarial, changing environment — not by claiming completeness, but by continuously expanding coverage and accepting that gaps will always exist.

The casing also includes the subtraction discipline — the boundary between what belongs in the system and what dilutes it. Scope management. Named essence. The force that prevents accumulation from becoming bloat. This isn't trimming an essential element (the law forbids that). It's trimming scope and features to keep the system coherent — the difference between removing a brake system (non-viable) and removing a feature that blurs the product's identity (increased ideality).

In practice, this is the power to say no — and in 2026, it's a cost-of-survival issue, not a design preference. Agentic bloat — where agents add ten thousand lines to solve a ten-line problem — is a major driver of spiraling inference bills. Every unnecessary line generated is tokens burned, surface area expanded, and verification load increased. A casing without subtraction is a system that accumulates indefinitely, and the bill for that accumulation arrives monthly in the inference invoice. The subtraction discipline is the element that bends the cost curve: less generated, less verified, less maintained, less attacked.

The diagnosis using the law

Map the current "Model + Harness" architecture against these five elements:

For diagrams, see https://gist.github.com/sufield/3d511936af9f80914f7e8622e0e0b1cb

Strong and Present:

Tool (the model) — heavily invested, rapidly improving, the focus of the industry.
Transmission (CI/CD) — mature, fast, reliable. Most teams have this.

Strong and Absent:

Control Unit (independent verification) — the system strongly needs it but it's completely absent. Replaced by self-verification, which is not verification. The most critical missing element.

Weak and Absent:

Casing (boundaries + continuity) — both poorly understood and absent. No enforced module boundaries for agents, no continuous-monitor stance, no subtraction discipline. The industry doesn't yet recognize this element needs to exist.

Weak and Present:

Engine (intent/specification) — present as prompts and AGENTS.md files, but ephemeral and not mechanically checkable. A proto-engine, not a real one.

Per the law: if any element is missing, the system does not work. The current architecture has a powerful tool, a fast transmission, and a weak engine — but no control unit and no casing. By the law's definition, it is an incomplete system. It produces output. It cannot verify that output is correct. It cannot maintain architectural integrity over time. It cannot distinguish what belongs from what dilutes.
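The diagnosis is mechanical enough to write down as a check. The sketch below encodes the quadrant assessment above as plain data and applies the completeness rule to it; the label strings and function names are illustrative choices, not part of TRIZ.

```python
ELEMENTS = ("tool", "engine", "transmission", "control_unit", "casing")

# The article's assessment of "Model + Harness", one quadrant label per element.
MODEL_PLUS_HARNESS = {
    "tool": "strong/present",
    "transmission": "strong/present",
    "engine": "weak/present",
    "control_unit": "strong/absent",
    "casing": "weak/absent",
}


def missing(assessment: dict) -> list:
    """Elements that are unlisted or marked absent, in canonical order."""
    return [
        e for e in ELEMENTS
        if e not in assessment or assessment[e].endswith("/absent")
    ]


def is_complete(assessment: dict) -> bool:
    """The completeness law: viable only if no essential element is missing."""
    return not missing(assessment)


print(missing(MODEL_PLUS_HARNESS))      # ['control_unit', 'casing']
print(is_complete(MODEL_PLUS_HARNESS))  # False
```

Note that a weak-but-present engine passes this check. The law as stated tests for presence; strength is what the Law of Non-Uniform Evolution addresses next.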

The Law of Non-Uniform Evolution names which element lags and why: subsystems evolve at different rates. The tool (generation) evolved fastest: models improved dramatically in capability, speed, and cost. The transmission (CI/CD) was already mature. The engine (specification) is emerging but weak. The control unit (verification) barely exists. The casing (boundaries) is absent. The system's capability is constrained by its weakest element, not its strongest. This is why a more powerful model makes the problem worse: it generates more unverified output faster.

The complete architecture

  Developer                                                            Deployed
   Intent ──────► ENGINE ────────► TRANSMISSION ────────► TOOL ──────► Software
  (spec,          (spec-            (CI/CD,              (model,
   contracts,     driven             GitOps,              generation)
   invariants)     dev)              contracts)
                                        │
                                        │ ◄── CONTROL UNIT rides the rail
                                        │     (oracle: verify, gate, steer)
                                        │
                          ┌─────────────┴─────────────┐
                          │ CASING                    │
                          │ boundaries, continuity,   │
                          │ subtraction, drift guards │
                          └───────────────────────────┘

The model generates. The specification drives generation toward a declared target. The transmission delivers changes. The oracle verifies each change against the specification before it passes — and feeds corrections back when it doesn't. The casing holds the system's architectural integrity against the environment and against its own tendency to accumulate.

Every element is a peer. None serves the model. Each serves the system. A failure in any one is a failure of the system's primary function — producing correct, intent-aligned software — regardless of the strength of the other elements. The model's excellence cannot compensate for an absent oracle, any more than a powerful engine compensates for failed brakes.

Multiple agents at once — the concurrency test

The five-element law asserts completeness for one agent. The case that matters in practice is N agents editing the same system simultaneously — the scenario the harness article lists as "an open research problem." The completeness law answers it, and the answer is simpler than the framing of "orchestrating hundreds of agents" suggests.

The core principle: agents don't overwrite each other's progress because they never work on the same thing. The problem is decomposed so each agent works independently on a bounded task, owns a bounded module, and communicates with other agents only through interfaces. This is Parnas's information hiding applied to agents: program to an interface, not to an implementation. Each module has a contract. Agents produce output that satisfies their module's contract. New functionality composes through the interfaces. Existing features are reused, not rewritten.

This is not a new idea. It's the same decomposition discipline that makes microservices, Unix pipelines, and library ecosystems work: bounded units, declared interfaces, composition through contracts. The reason it looks like an "open research problem" in the harness frame is that the harness frame has no specification layer — and without declared interfaces, agents have no boundaries, so they inevitably step on each other. The solution isn't a smarter orchestrator. It's the boundaries and contracts that make orchestration unnecessary.

Every web developer has already experienced it at work. A frontend team and a backend team start a project on the same day. Nothing exists yet — no frontend, no backend. They work in parallel from day one, independently, without overwriting each other's progress. How? They agree on the REST API contract first. The frontend team builds against the contract. The backend team builds behind it. Neither needs to know how the other works. Neither can corrupt the other's module. They compose through the interface. When both sides are done, the system works — not because someone orchestrated their keystrokes, but because both sides satisfied the same contract.

Replace teams with agents and the architecture is identical. Agent A builds the backend behind the API contract. Agent B builds the frontend against it. They don't coordinate through a shared filesystem. They don't need an orchestrator watching both. They compose through the declared interface. The "open research problem" of multi-agent coordination is a problem every engineering organization solved the moment they split work across teams with a contract between them. The only reason it reappears as unsolved for agents is that the harness frame has no concept of a contract.
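The frontend/backend arrangement above can be sketched in-process. This example uses a Python `Protocol` as a stand-in for the REST API contract; `InvoiceApi`, `Backend`, `FakeBackend`, and `render_invoice` are hypothetical names invented for the illustration.

```python
from typing import Protocol


class InvoiceApi(Protocol):
    """The agreed contract: both sides build to this, not to each other."""

    def get_total(self, invoice_id: str) -> int:
        """Return the invoice total in cents."""
        ...


class Backend:
    """Agent A's work: an implementation behind the contract."""

    def __init__(self) -> None:
        self._totals = {"inv-1": 1250}

    def get_total(self, invoice_id: str) -> int:
        return self._totals[invoice_id]


class FakeBackend:
    """What Agent B builds against before the real backend exists."""

    def get_total(self, invoice_id: str) -> int:
        return 100


def render_invoice(api: InvoiceApi, invoice_id: str) -> str:
    """Agent B's work: a frontend that knows only the contract."""
    cents = api.get_total(invoice_id)
    return f"Invoice {invoice_id}: ${cents // 100}.{cents % 100:02d}"


print(render_invoice(FakeBackend(), "inv-1"))  # Invoice inv-1: $1.00
print(render_invoice(Backend(), "inv-1"))      # Invoice inv-1: $12.50
```

Swapping `FakeBackend` for `Backend` requires no change to the frontend code. Neither side imports the other; both depend only on `InvoiceApi`.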

The five elements map onto this directly:

Shared Engine = the coordination protocol. All agents target the same declared spec. The spec defines the interfaces, the module boundaries, and the contracts. Agents conform to one shared set of declarations rather than negotiating with each other.

Deterministic Transmission = coordination through interfaces, not shared state. Agents communicate by producing structured output through declared interfaces — never by writing to a mutable shared blackboard where concurrent writes race. Same input produces byte-identical output. Every contribution is provenance-tracked and diffable. Parallel agents are safe because they're separated by interfaces, not competing on shared files.

Control Unit = per-agent control plus a meta-level oracle on the composed result. Each agent verifies its own output against its module's contract. The meta-level oracle is the safety net: it verifies the composed result (the assembled system after all agents' outputs are integrated through their interfaces) against the cross-cutting invariants that no single module's contract can express. Checking those invariants takes compound reasoning, transitive closure, and cross-resource path analysis. This catches the one thing decomposition alone can't: two modules each satisfying their own contract but composing into a system that violates a cross-cutting property.

Boundaries = mechanical isolation. Enforced at build time, module boundaries mean agents on different modules physically cannot corrupt each other. Drift guards stop agent A from re-introducing what agent B deleted.
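One way the build-time boundary check could look, as a sketch: each agent owns a directory, and any change that touches a path outside it is rejected. The `OWNERSHIP` table and agent names are hypothetical, and the check assumes normalized, repo-relative POSIX paths.

```python
from pathlib import PurePosixPath

# Hypothetical ownership map: agent -> the one directory it may modify.
OWNERSHIP = {
    "agent-a": "services/backend",
    "agent-b": "web/frontend",
}


def out_of_bounds(agent: str, changed_paths: list) -> list:
    """Return every changed path that falls outside the agent's owned module.

    Compares path components rather than string prefixes, so
    'services/backend-old' is not mistaken for 'services/backend'.
    """
    owned = PurePosixPath(OWNERSHIP[agent])
    return [p for p in changed_paths if owned not in PurePosixPath(p).parents]


changes = [
    "services/backend/api.py",    # inside agent-a's module
    "web/frontend/app.ts",        # agent-b's module
    "services/backend-old/x.py",  # similar name, different module
]
print(out_of_bounds("agent-a", changes))
# ['web/frontend/app.ts', 'services/backend-old/x.py']
```

Run as a required build step, a non-empty result fails the change before it reaches review, which is what makes the isolation mechanical rather than a convention agents are asked to follow.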

Why it scales. N agents = N viable five-element sub-systems, each owning a bounded module behind an interface. The system of agents is itself a viable system one recursion level up, held together by the shared spec and composed through the declared interfaces. You don't orchestrate them imperatively. You decompose the problem, declare the interfaces, and let each agent self-regulate against its own contract.

What it guarantees and what it doesn't. It guarantees consistency: no globally-invalid composed state ships, because agents can't overwrite each other (decomposition) and the composed result is verified against cross-cutting invariants (meta-level oracle). It does not guarantee throughput: agents may contend on interface definitions or serialize at the composition gate. The law yields a viable multi-agent system, not an optimal-throughput one — throughput is a separate optimization layered on top. And there's one hard dependency: the meta-level oracle must reason across the composed output (compound, transitive), and the spec must be expressive enough to encode the cross-module invariants.
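The case the meta-level oracle exists for, two modules that each pass their own contract but compose into a violation, can be shown with a small reachability check. The network-rule framing, the module contents, and the invariant are all assumptions invented for this sketch.

```python
from collections import deque

# Each module declares the connections it allows (hypothetical network rules).
MODULE_A = {"internet": {"web"}}  # agent A: public traffic may reach the web tier
MODULE_B = {"web": {"db"}}        # agent B: the web tier may reach the database


def module_ok(edges: dict) -> bool:
    """Per-module contract: no direct edge from the internet to the database."""
    return "db" not in edges.get("internet", set())


def compose(*modules: dict) -> dict:
    """Merge every module's declared edges into one graph."""
    merged = {}
    for module in modules:
        for src, dsts in module.items():
            merged.setdefault(src, set()).update(dsts)
    return merged


def reachable(edges: dict, start: str) -> set:
    """Breadth-first search: every node reachable from `start` in one or more hops."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def composed_ok(edges: dict) -> bool:
    """Cross-cutting invariant: no path of any length from the internet to the database."""
    return "db" not in reachable(edges, "internet")


print(module_ok(MODULE_A), module_ok(MODULE_B))  # True True
print(composed_ok(compose(MODULE_A, MODULE_B)))  # False
```

Both per-module checks pass, yet the composed graph contains the path internet → web → db. No single module's contract can see that path, which is why the invariant has to be checked on the assembled result.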

Agents don't need orchestration. They need decomposition, interfaces, and contracts — the same things that make any modular system work. The missing piece was never a smarter coordinator. It was the specification layer that defines the boundaries agents work within.

Why the harness frame can't discover this

"Agent = Model + Harness" is "the tool plus everything else." That frame was a transitionary metaphor from an era that assumed the model was the center of the universe and everything else existed to serve it. In 2026 we know better: the contract is the center of the universe. The model is one element — the tool — in a system whose viability is determined by the spec it builds against, the oracle that verifies it, and the boundaries that contain it. The harness frame collapses four distinct elements — engine, transmission, control unit, casing — into one subordinate category. It guarantees under-investment in three of them, because the frame only has vocabulary for "things that serve the model." The control unit doesn't serve the model — it governs it. The casing doesn't serve the model — it contains it. The engine doesn't serve the model — it drives it. None of these are discoverable from inside a frame that asks "what does the model need?" They're only discoverable from a frame that asks "what does the system require to produce correct software?"

The law answers that question: five elements, each present, each a peer, each essential. That's the complete architecture. Everything else is an incomplete system generating output it cannot verify, toward a target it hasn't declared, inside boundaries it doesn't enforce — at machine speed.

You might ask: the architecture is clear, but can it work at the scale the industry is building toward — hundreds of agents, billions of tokens, thousands of changes a day? The honest answer: the architecture is correct, and performance is an optimization layered on top of a correct architecture. That's the right order. Optimizing an incorrect architecture — self-verification with no independent oracle, shared filesystem as coordination, no specification to verify against — is optimizing the wrong thing faster. The teams treating multi-agent coordination as an "open research problem" aren't stuck on performance. They're stuck on the architecture — because the frame they're working in has no vocabulary for the two elements that solve it. The engineering challenge of making a compound-reasoning oracle fast enough for production scale is real work. It is not research. It is the kind of engineering that happens after the architecture is right, not before.

The system does not care who sits in the Tool slot. It does not care if a human is typing code at a keyboard or an agent is generating it without touching one. The five elements are the same. The interfaces are the same. The contracts are the same. The verification is the same. The boundaries are the same. A viable system requires an Engine, a Transmission, a Tool, a Control Unit, and a Casing — regardless of whether the Tool is a person or a model. That is why the solution was already known: every organization that successfully coordinates human teams through contracts, interfaces, and independent review has already built a complete system. Replacing the human with an agent doesn't change the architecture. It changes the speed. And speed without the other four elements is the definition of the problem, not the solution.

The completeness law: Altshuller's Law of System Completeness (TRIZ), extended by Savransky (fifth element: casing) and Mann (comparison with Beer's Viable System Model and Nalebuff/Brandenburger's Co-opetition). The independent convergence of three frameworks on the same five elements is the evidence the decomposition is structural. The recursion property (every viable system contains viable sub-systems) is Beer's contribution and grounds the multi-agent concurrency treatment. Applied here to AI-assisted development; applied separately to cloud security posture in Cloud Computing is Missing One Component. Everyone Builds the Wrong Five.

DEV Community