DEV Community

Bala Paranj


The Harness is Half the Architecture. Here's the Half That's Missing.

Vivek Trivedy's "The Anatomy of an Agent Harness" is one of the better-written articles on AI agent infrastructure this year. It defines the harness clearly ("Agent = Model + Harness; if you're not the model, you're the harness") and systematically derives each component: filesystem for durable storage, bash for general-purpose tool use, sandboxes for safe execution, memory for continual learning, compaction for context management, the Ralph Loop for long-horizon work.

The infrastructure engineering is real and useful. The derivation from model limitations to harness features is clear. If you're building an agent, you need most of what the article describes.

The problem isn't what the article covers. It's the frame the article thinks in, and it's worth naming precisely because it's the same frame the entire industry uses.

The engine is not the car

"Agent = Model + Harness" is "Car = Engine + Harness." That equation tells you what the builder thinks is the system and what's just support infrastructure.

But a car is not an engine with support infrastructure. A car is a system composed of distinct subsystems: powertrain, chassis and frame, braking system, electrical and electronic systems, fuel intake and exhaust, thermal management and HVAC, safety and body systems. Each subsystem has its own function, its own engineering constraints, its own rate of evolution. The braking system doesn't serve the engine — it governs it. The electrical system doesn't make the engine faster — it coordinates the signals between everything else. The safety systems don't improve horsepower — they protect the occupants from what happens when other subsystems fail. No automotive engineer would collapse all seven into "not-engine." Each one is designed, tested, and evolved independently, because each one solves a different problem under different constraints.

"Agent = Model + Harness" makes that collapse. Everything that isn't the model — filesystem, bash, sandbox, memory, compaction, the Ralph Loop, verification, intent, coordination — gets lumped into one subordinate category called "harness." But these are not one thing, and they are not subordinate. The word "harness" itself encodes the error: it means something that wraps and serves the primary component. In a car, brakes don't serve the engine. The electrical system doesn't serve the engine. The safety systems don't serve the engine. Each is a peer subsystem of the vehicle, as important as the powertrain, designed and tested independently because each solves a different problem under different constraints. A car without brakes is not a car with a missing accessory. It's not a car. And a failure in any subsystem — braking, electrical, fuel, safety — is a failure of the vehicle's primary function. The car doesn't degrade gracefully when the brakes fail; it fails as a car. The vehicle's ability to do its job is constrained by its weakest subsystem, not its strongest. The engine's excellence is irrelevant if the brakes don't work.

The same is true here. Independent verification is not a harness around the model; it's a peer subsystem of the agent, as essential as the model itself. Declared intent is not support infrastructure for generation; it's the subsystem that makes generation directed. Coordination protocols are not model wrappers; they're the subsystem that makes multi-agent work coherent. None of these serve the model. Each serves the system. Each is as important as the model. And just like the car, the agent system's primary function, producing correct software, fails at its weakest subsystem. If verification is absent, the system produces unverified output regardless of the capability of the model. If intent is undeclared, the system generates in the wrong direction regardless of how fluently it does so. If coordination protocols don't exist, multiple agents contradict each other regardless of how good each one is individually. The model's excellence cannot compensate for the absence of a peer subsystem, any more than a powerful engine compensates for failed brakes. Calling these subsystems "harness" buries them under the engine and guarantees they'll be under-designed, under-resourced, and treated as secondary, which is what the industry does now.

The harness article builds a better chassis and a better fuel system (real, necessary work) and calls it the whole car. The peer subsystems that don't serve the model but govern it (verification), direct it (intent), and coordinate it (protocols) are absent. The frame "Model + Harness" has no vocabulary for subsystems that are peers of the model rather than servants of it. That's engine-centric thinking. Systems thinking starts from a different question: what does the whole vehicle require to operate safely? Asking it reveals subsystems the engine-centric frame structurally cannot see.

The guardrail model has a proven ceiling — and it's not an opinion

In June 2026, NIST senior scientist Apostol Vassilev published a mathematical proof in the peer-reviewed journal IEEE Security and Privacy (released June 9, 2026), extending Gödel's incompleteness theorems to AI systems. The finding: no finite set of guardrails placed on an AI system is universally robust against adversarial prompts. You can add more rules to address contradictions you discover, but you're back where you started — the new rule set is still finite, still incomplete. This is not a conjecture. It is a theorem, building on logic that has stood since 1931.

The harness architecture is the kind of system Vassilev's proof addresses. System prompts, constraints, self-verification loops, tool restrictions, safety checks — every guardrail the harness wraps around the model is a finite set of rules. The proof says that set will always have gaps, and the gaps are findable. Making the guardrails better, more numerous, more carefully tuned — all of which the harness article advocates — does not escape the ceiling. It moves it higher. The ceiling remains.

Vassilev's own prescribed solution is telling, because it's the opposite of the harness roadmap. He doesn't prescribe better guardrails. He prescribes a continuous-monitor-and-update model: red teams constantly probing for new weaknesses, continuous updates hardening the system against discovered failures, and operational resilience that accepts exploitation will occur and prioritizes containment and recovery. The goal, in his words, is "to reach a state where the cost of finding new exploits exceeds attackers' resources" — not to achieve universal robustness, which is formally unattainable, but to make the economics of attack prohibitive through relentless, continuous effort.

That is a fundamentally different architecture than "build the right harness and the agent works." It's a system that assumes its own incompleteness and compensates through continuous expansion and independent verification — not through better wrapping of the model. The harness frame assumes the constraints can be made sufficient. The proof says they can't. The question is what you build once you accept that.

You might be thinking: Vassilev's proof is about prompt security — adversarial jailbreaking of AI guardrails. How does it apply to agent code generation?

The mathematical structure is the same, and that is what makes the extension legitimate. Vassilev's proof isn't about prompts specifically; it's about any system that uses a finite set of rules to constrain an unbounded input or output space. Gödel's incompleteness doesn't care whether the finite rule set is a safety guardrail filtering prompts or a self-verification loop checking generated code. The structure is identical: a finite set of rules (the harness's constraints, system prompts, self-checks) attempting to govern an unbounded space (all possible code the model might generate, all possible states the system might reach). The proof says that structure is incomplete: there will always be outputs the finite rule set fails to catch. Vassilev himself identifies the mechanism: natural language's richness makes the space of inputs (and by the same logic, outputs) effectively limitless, so compliance-checking built on finite rules is infinitely ambiguous. An agent generating code from natural-language prompts operates in this space. The ways a subtle architectural error, a misunderstood requirement, or an internally-consistent-but-globally-wrong pattern can appear in generated code are unbounded, and no finite set of self-checks will catch all of them. The domain is different. The math is the same.

Even a perfect engine doesn't eliminate the other subsystems

The bet behind the harness frame is: make the model good enough and these problems dissolve. Trivedy's own closing says it — "as models get more capable, some of what lives in the harness today will get absorbed into the model." The companion article makes the bet explicit: "these guardrails will almost surely dissolve over time, but to build robust agent applications today, they're useful tools." The roadmap is: invest in the engine, and eventually the engine handles everything. The guardrails are temporary scaffolding, not permanent subsystems.

Vassilev's proof already forecloses part of this bet: no improvement in the model eliminates the incompleteness of its guardrails, because the incompleteness is a property of any finite rule system, not of any particular model's capability. But the bet fails for additional reasons that have nothing to do with Gödel and everything to do with how systems work.

Computer hardware taught us where this reasoning breaks. CPUs got exponentially faster — and the system didn't get proportionally faster, because memory bandwidth didn't keep up (the memory wall), storage I/O didn't keep up, network latency didn't keep up. Each subsystem evolved at its own rate, governed by its own physics, and the system's actual performance was always constrained by the slowest subsystem, not the fastest. Making the CPU perfect didn't eliminate the need for cache hierarchies, memory controllers, or I/O schedulers — it made them more critical, because a faster CPU waiting on a slow subsystem wastes more cycles per second.
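One way to put numbers on this bottleneck effect is Amdahl's law, which the argument above does not cite but which captures the same idea: if only a fraction of the work is accelerated, the untouched remainder caps the overall gain. The sketch below is an illustration under an assumed workload (half CPU-bound, half waiting on memory and I/O); the function name and the 50/50 split are hypothetical.

```python
def overall_speedup(accelerated_fraction: float, factor: float) -> float:
    """Amdahl's law: speedup of the whole when only part of it gets faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / factor)


# Assumed workload: half the time is CPU-bound, half waits on memory and I/O.
for cpu_factor in (2, 10, 1000):
    print(cpu_factor, round(overall_speedup(0.5, cpu_factor), 3))
```

Under that assumption the overall speedup stays below 2x however fast the CPU becomes, because the half that waits on other subsystems is never accelerated.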

The same structure applies here, and it's the same systems principle. Even if the model becomes perfect — flawless generation, infinite context, zero hallucination — the system still needs:

  • Independent verification, because a perfect generator still can't be its own oracle. This is both a logical requirement (Vassilev/Gödel: no finite self-checking rule set is complete) and a structural one (correlated failure modes don't cancel out as the model becomes more capable). A brilliant student who grades her own exam still has a conflict of interest, no matter how smart she is.
  • Declared intent, because the model doesn't know your business context regardless of its capability. "Make this bucket public" is correct for your CDN and a breach for your customer data. No amount of model intelligence resolves that ambiguity — only a human stating the intent does. Intent is not in the configuration; it's in the business decision, and no model will ever have been in the room where that decision was made.
  • System coherence protocols, because multiple agents producing independently correct output that contradicts each other is a coordination problem, not an intelligence problem. Distributed systems proved this decades ago: you don't solve consensus by making the nodes smarter. You solve it with protocols. Smarter nodes without protocols just disagree faster.
  • A subtraction discipline, because deciding the essence of the product (what belongs and what dilutes) is a judgment the model has no basis for making, since it doesn't know what your product is trying to be. Only the builder knows that, and only if they've named it.

These subsystems don't serve the model. They govern it. They evolve at different rates, driven by different constraints — human judgment, organizational clarity, coordination theory — that have nothing to do with model capability. Just as a faster CPU made the memory wall more visible, a more capable model makes the verification gap, the intent gap, and the coherence gap more consequential, because a faster generator producing unverified, undirected, uncoordinated output at higher volume is a bigger problem, not a smaller one.
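The bucket example from the list above can be made concrete with a minimal sketch, assuming intent is written down as a small human-authored allowlist. The bucket names, the `BucketChange` type, and `violates_intent` are all hypothetical; nothing here is a real cloud provider API.

```python
from dataclasses import dataclass

# Human-authored declaration of intent. The bucket names are hypothetical.
DECLARED_PUBLIC_BUCKETS = frozenset({"cdn-assets"})


@dataclass(frozen=True)
class BucketChange:
    bucket: str
    make_public: bool


def violates_intent(change: BucketChange) -> bool:
    """True when a change makes public a bucket nobody declared as public."""
    return change.make_public and change.bucket not in DECLARED_PUBLIC_BUCKETS


proposed = [
    BucketChange("cdn-assets", make_public=True),
    BucketChange("customer-data", make_public=True),
]
rejected = [c.bucket for c in proposed if violates_intent(c)]
print(rejected)
```

The two proposed changes are syntactically identical. Only the declaration distinguishes the correct one from the breach, and the declaration comes from a person, not from the generator.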

The bet that model improvement will absorb these problems is the hardware industry betting in 2003 that faster clock speeds would eliminate the need for multi-core architecture, cache coherence protocols, and memory hierarchy design. They didn't. The subsystems had their own physics.

The specific gaps, section by section

With that frame in mind, here's what the harness architecture is missing in each area it covers.

Self-verification is not verification

The article's approach to correctness is "self-verification" — the model reviews its own output, runs tests it wrote, and decides whether it did well. Huang et al. (ICLR 2024) established that large language models cannot self-correct reasoning without external feedback — and that performance can degrade after self-correction. This is not a contested finding. A generator checking its own output has correlated failure modes by construction: the same blind spots that produced the error will evaluate the error as correct. The research is settled; the industry hasn't absorbed it.

In a companion article on improving their coding agent, the same team describes self-verification as their single biggest performance improvement, calling models "exceptional self-improvement machines." They then describe the exact failure mode Huang et al. predicted: "the agent wrote a solution, re-read its own code, confirmed it looks ok, and stopped." Their fix is to prompt the model more aggressively to self-verify — intercepting the model before it exits and forcing another verification pass. At no point is an independent oracle introduced. The most common failure of self-checking is addressed by more self-checking.

What the harness architecture needs and doesn't have is an independent oracle — a declared property, contract, or invariant authored separately from the model, checked mechanically by something with no correlated blindness. That layer is absent. Vassilev's research makes this worse than a generic self-checking problem: traditional software zero-day exploits are hard to find, often requiring nation-state resources. But AI systems accept natural language as input, and the complexity and richness of language makes compliance-checking built on a finite rule set infinitely ambiguous — the number of ways intent can be hidden in plain language is effectively limitless. A model policing its own language-based input is trying to enforce a finite rule set against an infinite ambiguity surface. At the scale the article aspires to — hundreds of agents on a shared codebase — self-verification is no verification, and the system produces volume with no independent quality signal.

There is no declared intent anywhere

The harness architecture has prompts, memory files, tool descriptions, and context injection. What it does not have is a specification — a human-authored declaration of what correct means for a given task, against which the output is mechanically checked.

The article's frame is "working backwards from desired agent behavior to harness engineering." That means: observe what we want the agent to do, then build infrastructure to help it do that. The intent is inferred from behavior we hope to produce, not declared as a constraint the output must satisfy. The model guesses what you want from the prompt. The harness helps it guess better, longer, with more context. At no point does anyone state, formally, what the output must be — in a form a machine can check.

This is the difference between inference and declaration, and it's not a semantic quibble. When intent is inferred, every generation is a fresh guess, and there's no ground truth to check it against. When intent is declared — as a type, a contract, a property, an invariant — the guess becomes a claim that can be verified. The harness has elaborate machinery for helping the model make better guesses. It has nothing for checking whether the guess was right, because there's nothing to check it against.
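As a rough illustration of a declared property, the sketch below writes down what "sorted correctly" means as two properties and checks candidate implementations against them on random inputs. The helper names (`satisfies_spec`, `dedup_sort`) are hypothetical, and random sampling is itself a finite check: it illustrates a checker that is independent of the generator, not one that is complete.

```python
import random
from collections import Counter
from typing import Callable, List

Sorter = Callable[[List[int]], List[int]]


def is_ordered(xs: List[int]) -> bool:
    return all(a <= b for a, b in zip(xs, xs[1:]))


def satisfies_spec(candidate: Sorter, trials: int = 200, seed: int = 0) -> bool:
    """Check a candidate against two declared properties on random inputs.

    Property 1: the output is in non-decreasing order.
    Property 2: the output is a permutation of the input (nothing lost or added).
    """
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-5, 5) for _ in range(rng.randint(0, 8))]
        result = candidate(list(data))
        if not is_ordered(result) or Counter(result) != Counter(data):
            return False
    return True


def dedup_sort(xs: List[int]) -> List[int]:
    # Plausible-looking output that silently drops duplicates.
    return sorted(set(xs))


print(satisfies_spec(sorted), satisfies_spec(dedup_sort))
```

The second candidate always returns an ordered list, so a reviewer who only rereads the output would likely accept it. The permutation property, stated in advance, is what exposes it.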

Accumulation without subtraction

Every mechanism in the harness adds. The filesystem grows. The memory file grows. The Ralph Loop intercepts exit and forces more generation. Compaction summarizes context to make room for more. Planning decomposes goals into more steps. Subagents spawn more agents. The measure of progress throughout is: did the agent produce more work?

Nothing in the architecture asks "does this output belong?" Nothing subtracts. There's no mechanism for identifying code that dilutes the product's essence and removing it. No concept of named essence against which accumulated output is measured. No principle equivalent to "simplicity — the art of maximizing the amount of work not generated."

When generation cost is near zero, subtraction becomes the scarcer and more valuable discipline. A system that only accumulates — more code, more files, more memory, more agents — will bloat, because there's no force in the architecture pushing the other direction. The harness optimizes for throughput; nobody optimizes for "does what we have still cohere into a product that is this thing and not a diffuse collection of generated output."
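A subtraction check can be sketched mechanically once the essence has been named. The example below assumes the builder has listed the capabilities the product exists to provide and that each module is tagged with the capabilities it serves; the tagging is itself a human judgment, and every name here is hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Named essence, written by the builder. Capability names are hypothetical.
ESSENCE: FrozenSet[str] = frozenset({"upload_file", "share_link"})


@dataclass(frozen=True)
class Module:
    path: str
    serves: FrozenSet[str]  # capabilities this module claims to support


def removal_candidates(modules: List[Module]) -> List[str]:
    """Paths of modules that serve no capability in the named essence."""
    return [m.path for m in modules if not (m.serves & ESSENCE)]


codebase = [
    Module("upload.py", frozenset({"upload_file"})),
    Module("links.py", frozenset({"share_link"})),
    Module("emoji_reactions.py", frozenset({"reactions"})),
    Module("helpers_v2.py", frozenset()),
]
print(removal_candidates(codebase))
```

The check only flags candidates; whether to delete them remains the builder's decision. The point is that the question "does this belong?" has something written down to be answered against.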

There's a direct connection to Vassilev's prescribed solution here that's easy to miss. He frames the goal as reaching an economic equilibrium where the cost of finding new exploits exceeds attackers' resources. Code that doesn't exist has zero attack surface. Every line of unnecessarily generated code is a line an attacker can probe, a line that must be verified, a line that can harbor the gap Vassilev proved will always exist in a finite rule set. Subtraction — maximizing the work not generated — isn't a simplicity aesthetic. It directly shrinks the surface area the attacker must search, raising the per-exploit cost toward the equilibrium Vassilev describes. The harness architecture's accumulation-only design pushes in the opposite direction: more generated surface, more gaps to find, lower cost per exploit. A system that only adds code is a system that makes the attacker's job cheaper with every iteration.

System coherence is a listed research problem, not a shipped feature

The article's "future of harnesses" section lists "orchestrating hundreds of agents working in parallel on a shared codebase" as an open research problem. That means: the shipped architecture does not handle multiple agents producing mutually contradictory output, because it's an unsolved problem for them.

But this isn't a frontier research challenge. It's a well-understood problem. Distributed systems engineers spent forty years on it. Multiple independent actors producing output that must be globally consistent is what consensus protocols, ordering guarantees, and interface contracts solve. The mechanisms exist: specifications as coordination protocols, contract enforcement on merge, architectural boundaries between agents. What they require is a specification layer — the layer the harness architecture doesn't have.

Without specifications, adding more agents means more invisible decisions, more inconsistency, more contradiction — not more productivity. The article acknowledges this implicitly by listing it as unsolved. But the reason it's unsolved within the harness frame is that the frame has no concept of declared constraints that all agents must satisfy. You can't coordinate agents through a filesystem alone, any more than you can coordinate microservices through a shared database. The coordination mechanism is the specification. Remove it and coordination is impossible regardless of the plumbing quality.
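Contract enforcement on merge can be sketched in a few lines, assuming the shared contract is a human-authored mapping from function names to required parameters and that each agent submits Python source. The contract contents and the `contract_violations` helper are hypothetical, and a real contract would cover far more than parameter names.

```python
import ast
from typing import Dict, List, Tuple

# Human-authored shared contract: function name -> required parameter names.
# The names are hypothetical.
CONTRACT: Dict[str, Tuple[str, ...]] = {
    "create_user": ("email", "display_name"),
}


def contract_violations(source: str) -> List[str]:
    """List contract functions in `source` whose parameters differ from the contract."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name in CONTRACT:
            params = tuple(arg.arg for arg in node.args.args)
            if params != CONTRACT[node.name]:
                problems.append(f"{node.name}{params} != declared {CONTRACT[node.name]}")
    return problems


agent_a = "def create_user(email, display_name):\n    return {'email': email}\n"
agent_b = "def create_user(username):\n    return {'name': username}\n"

for name, patch in (("agent_a", agent_a), ("agent_b", agent_b)):
    print(name, "merge" if not contract_violations(patch) else "reject")
```

Each agent's patch may be internally correct. The disagreement only becomes visible, and mechanically rejectable, because both are checked against the same declaration rather than against each other.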

The Ralph Loop — the revealing design

The Ralph Loop — Trivedy's term from the same article — deserves specific attention because it reveals the architecture's assumptions clearly. It's a pattern that intercepts the model's attempt to stop, reinjects the original prompt in a fresh context window, and forces the agent to continue generating until a completion goal is met. The filesystem carries state across iterations.

Notice what's being optimized: duration of generation. The problem being solved is "the agent stopped too early." The solution is to force it to keep going.

Now notice what's not being checked: whether what was generated in the last iteration was correct before starting the next one. The loop continues from the filesystem state, but there's no gate between iterations that verifies the accumulated output against a declared property. Each iteration assumes the prior iteration's output is sound. If it isn't — if iteration 3 introduced a subtle architectural mistake — iterations 4 through 20 build on that mistake, and the system diverges further from correctness with each loop.

A long-horizon execution loop without an independent verification gate between iterations is a mechanism for generating more wrong output, faster, with more confidence. The longer the horizon, the more the verification gap matters, because errors compound across iterations with no corrective signal except the model reviewing its own work — which, as discussed, is not independent verification.
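A gate between iterations might look like the sketch below. It is not the Ralph Loop as Trivedy describes it: the agent step and the invariant are hypothetical stand-ins, and the state is a small dictionary rather than a filesystem. The only point illustrated is that an iteration's output is kept when an independent check accepts it and discarded otherwise.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]


def run_gated_loop(
    step: Callable[[State, int], State],
    gate: Callable[[State], bool],
    initial: State,
    max_iterations: int,
) -> Tuple[State, List[int]]:
    """Run `step` repeatedly, keeping an iteration's output only if `gate` accepts it.

    Returns the last accepted state and the iteration numbers that were rejected.
    """
    accepted = dict(initial)
    rejected: List[int] = []
    for i in range(1, max_iterations + 1):
        candidate = step(dict(accepted), i)  # the step works on a copy
        if gate(candidate):
            accepted = candidate
        else:
            rejected.append(i)  # discard; later iterations never build on it
    return accepted, rejected


# Hypothetical stand-ins: a step that breaks an invariant on iteration 3,
# and a gate that checks the declared invariant "tests_failing == 0".
def fake_agent_step(state: State, iteration: int) -> State:
    state["features"] += 1
    if iteration == 3:
        state["tests_failing"] += 1
    return state


def invariant_holds(state: State) -> bool:
    return state["tests_failing"] == 0


final, rejected = run_gated_loop(
    fake_agent_step, invariant_holds, {"features": 0, "tests_failing": 0}, 5
)
print(final, rejected)
```

In this run the faulty third iteration is rejected and iterations 4 and 5 continue from the last accepted state, so the mistake never becomes the foundation for later work.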

The missing layer

What follows describes a required layer that the dominant architecture lacks. It has four parts:

A specification layer. Human-authored declarations of what correct means — properties, contracts, invariants, types — that exist independently of any model's prompt or output. These serve two functions: they give the agent a target that isn't a guess, and they give the verification gate something to check against.

An independent verification gate. A mechanism that checks the agent's output against the specification, where the checker is not the model that generated the output. This can be a type checker, a property-based test suite, a contract enforcer, a CI pipeline — anything that is mechanical, deterministic, and uncorrelated with the generator's blind spots.

A subtraction discipline. Something in the architecture that asks "does this belong" and removes what doesn't — measured against a named essence of the product. Without it, the system only accumulates, and accumulation without curation is bloat.

Coordination through specifications, not filesystems. For multi-agent work, the shared artifact isn't a filesystem — it's a set of declared constraints all agents must satisfy. The filesystem is the storage layer; the specification is the coordination layer. Conflating them is why multi-agent coherence is still listed as an unsolved research problem.

Each of these is the same move: introduce a human-authored declaration of intent that the system checks mechanically, independent of the model. The harness makes the model a better producer. The specification layer makes the system accountable for what it produces. You need both. Right now the industry is building the first and calling it the whole architecture.

Why this matters beyond one article

The harness frame — "Agent = Model + Harness, and the harness is everything that isn't the model" — is not Trivedy's invention. It's the consensus view. It's how LangChain builds, how most agent frameworks think, and how investor dollars flow. The article is the clearest articulation of a position the whole industry holds.

That's why engaging with it directly matters. The argument here is not one programmer's opinion against another's. Vassilev's NIST proof formally demonstrates that the guardrail model (wrapping a generative system in a finite set of constraints) has a mathematical ceiling that no amount of engineering removes. Huang et al.'s ICLR research demonstrates that self-correction without external feedback does not reliably improve a model's reasoning and can degrade it. Decades of distributed systems research demonstrate that coordination requires protocols, not shared storage. These aren't opinions. They're findings.

If the consensus architecture has no independent verification, no declared intent, no subtraction, and no specification-based coordination — and if hundreds of teams are building production systems on that architecture — then the industry is building toward a ceiling that has been formally proven to exist. The failures will come when agent-built systems hit production at scale and nobody can explain why they're wrong, because nobody ever declared what right was, and the system that generated the output was the same system that certified it as correct.

The harness article ends with "the model contains the intelligence and the harness makes that intelligence useful." Here's the question that frame never asks: useful for producing what, and how do you know it's right — using something other than the system that produced it?

Vassilev's proof answers that question with mathematical finality: you don't know, and you can't, within the constraints of the model-plus-harness frame. The way forward is not a better harness. It's a system that assumes its own incompleteness and compensates through continuous independent verification against declared intent — which is the architecture the current consensus hasn't built.


This is a direct response to Vivek Trivedy's "The Anatomy of an Agent Harness" (March 10, 2026) and "Improving Deep Agents with Harness Engineering" (February 17, 2026). The infrastructure engineering in those pieces is sound; the gap is the layer above it. The formal basis for this argument: Vassilev, "NIST Mathematical Proof Supports Transition to a Continuous-Monitor-and-Update Security Model for AI Systems" (IEEE Security and Privacy, June 9, 2026); Huang et al., "Large Language Models Cannot Self-Correct Reasoning Yet" (ICLR 2024). If you have a specific counter-argument to either finding, that's the conversation worth having.
