DEV Community

synthaicode

My Harness Is Not a Cage. It's an Org Chart.

Your AI agent did not fail because the model was weak.

It failed because it made a decision no one had authorized it to make.

Maybe it skipped an escalation.
Maybe it treated a missing requirement as obvious.
Maybe it chose one tradeoff over another because a threshold told it to.

The dangerous part is not that the AI made a mistake.
The dangerous part is that the system allowed the decision to happen invisibly.

This is not a tooling problem. It is a definition problem.


What AI Actually Is

Before designing any harness, we need to agree on what we are harnessing.

My working definition:

AI is a machine that executes the work of structuring information according to a given purpose.

Two constraints follow immediately from this definition:

  1. Purpose is supplied externally. AI does not generate its own goals. A car does not decide where to go. AI does not decide what to optimize for.
  2. Structuring information is not the same as making judgments. A car can move faster than a human. That does not mean it decides the route.

This is not a limitation to be engineered around. It is the definition itself.


Where Harness Engineering Goes Wrong

The harness engineering movement — which crystallized in early 2026 — defines the harness as everything except the model: tools, memory, guardrails, feedback loops, retry mechanisms, confidence thresholds.

The formula is clean: Agent = Model + Harness.

But there is a category error embedded in it.

When AI agents were not yet capable of chaining actions, humans performed the orchestration manually. They connected outputs, prioritized next steps, and filled in the gaps when something was unclear. That human orchestration contained two things mixed together:

  • Execution work — connecting outputs, sequencing steps, formatting results
  • Judgment work — resolving tradeoffs, filling in unknowns, deciding priorities

Harness engineering took this human orchestration and delegated it to the harness — without separating execution from judgment first.

The result: the harness now contains judgment calls that were never made explicit. They are buried in threshold values, fallback rules, and priority weights that someone configured without realizing they were making decisions on behalf of the system.

If the definition is wrong, refining the methodology only embeds the error deeper.
You cannot harness your way out of a category mistake.
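To make the buried-judgment problem concrete, here is a minimal sketch of the pattern. The names (`confidence`, `handle_step`, the 0.8 cutoff) are illustrative, not taken from any particular framework; the point is where the decision lives, not the specific API:

```python
class NeedsHumanDecision(Exception):
    """Raised when the harness reaches a judgment it is not authorized to make."""

# A buried judgment: the 0.8 threshold silently decides the tradeoff
# between proceeding on a shaky result and falling back.
def handle_step(result: dict) -> str:
    if result["confidence"] > 0.8:   # who chose 0.8, and on what authority?
        return "proceed"
    return "fallback"                # silent downgrade; no human is informed

# The same moment made explicit: below the threshold is not a branch
# to take automatically, but a decision to surface.
def handle_step_explicit(result: dict) -> str:
    if result["confidence"] > 0.8:
        return "proceed"
    raise NeedsHumanDecision(
        f"confidence={result['confidence']}: the proceed-vs-fallback "
        "tradeoff belongs to a human"
    )
```

Both functions contain the same threshold. The difference is that the second one treats the below-threshold case as an escalation, not a configuration default.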


The Two Points That Belong to Humans

Information structuring work always contains two types of unresolvable moments:

1. Tradeoffs — situations where two valid paths exist and the choice depends on values, priorities, or context that the AI was not given.

2. Unknowns — gaps in information that cannot be filled by inference without risk of fabrication.

These are not edge cases. They are structurally guaranteed to appear in any non-trivial task. Project managers have known this for decades. Every project begins with a risk register. Unknowns are logged on day one, not discovered in production.

The design question is not whether these moments will occur. It is where authority goes when they do.

Confidence thresholds and risk scores do not answer this question. They are themselves tradeoff decisions — and tradeoff decisions belong to humans by definition, not by preference.

The threshold is not a parameter. It is a judgment.
And judgments, by definition, belong to humans.


The Same Principle Already Exists Everywhere

This is not a new idea. We have solved it before, in two adjacent domains.

Software engineering: well-designed systems do not suppress exceptions. They surface them to the caller. A try-catch that swallows every error and continues execution is not robust engineering — it is a liability. Harness engineering that handles every unknown internally, without escalating to a human, is structurally identical.
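The two patterns look like this in code. The function names are mine; the contrast is the standard one between swallowing an exception and re-raising it with context:

```python
# Swallowing: the caller never learns anything went wrong, and
# execution continues on a silently broken state.
def robust_looking(step):
    try:
        return step()
    except Exception:
        return None

# Surfacing: the error crosses the boundary to whoever owns the
# decision, with context attached.
def actually_robust(step):
    try:
        return step()
    except ValueError as exc:
        raise RuntimeError("step failed; needs owner decision") from exc
```

The first version is the code-level equivalent of a harness that resolves every unknown internally.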

Organizational design: every role in a functioning organization operates within a defined scope of authority. When a situation exceeds that scope, it escalates. Not because the person is incapable, but because the decision belongs to a different level of authority. This is not failure. It is the system working as designed.

AI organization design needs the same structure. The escalation path is not a fallback. It is a first-class design element.


My Harness

Everything except tradeoffs and unknowns belongs in the AI. Those two points belong to humans — by definition.

My harness enforces exactly two constraints:

No speculation. When the AI encounters an unknown, it does not infer, guess, or fill the gap. It surfaces the unknown to the human who owns the decision. This forces the escalation path to activate rather than allowing silent fabrication.

Separate the executor from the checker. The AI that performs a task does not verify its own output. A separate agent — with a different role, different context, different prompt — checks the work. This is not redundancy. It is the same principle behind code review, audit functions, and quality control in any mature organization. A single agent checking its own work is equivalent to a developer reviewing their own pull request the moment after writing it.
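A minimal sketch of the executor/checker split, with plain functions standing in for separately prompted agents (the roles and names are illustrative; real agents would carry different prompts, contexts, and possibly different models):

```python
def executor(task: str) -> str:
    # Performs the work. Never verifies its own output.
    return task.upper()  # stand-in for real model output

def checker(task: str, output: str) -> bool:
    # Separate role, separate context: verifies the output against
    # the task, not against the executor's reasoning.
    return output == task.upper()

def run_with_review(task: str) -> str:
    output = executor(task)
    if not checker(task, output):
        raise RuntimeError(f"checker rejected output for task: {task!r}")
    return output
```

The structural point is that `checker` receives only the task and the output — it shares no state with `executor`, the same way a reviewer reads the diff, not the author's intentions.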

These two constraints did not come from observing AI failures and patching them. They came from asking what an AI organization needs to look like, given what AI is by definition.

The harness is not a cage built around an unpredictable system. It is an org chart built around a well-defined one.


The Design Sequence

Most teams build in this order:

  1. Deploy the agent
  2. Observe failures
  3. Add guardrails to prevent recurrence

This embeds the failure mode into the design. Each guardrail is a patch over an undefined boundary.

The sequence should be:

  1. Define what the AI is (information structuring machine, externally purposed)
  2. Define what it cannot do (resolve tradeoffs, fill unknowns)
  3. Design the escalation path for those two cases
  4. Deploy the agent within that structure

The intelligence layer comes after the organizational layer. Not before.
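The four-step sequence can be encoded as a precondition: the agent does not deploy unless its boundaries and escalation owners exist first. A hypothetical sketch (`OrgStructure` and `deploy_agent` are my names, not an existing API):

```python
from dataclasses import dataclass

@dataclass
class OrgStructure:
    purpose: str                      # step 1: purpose is supplied externally
    out_of_scope: list[str]           # step 2: what the AI cannot resolve
    escalation_owner: dict[str, str]  # step 3: where authority goes

    def validated(self) -> "OrgStructure":
        missing = [c for c in self.out_of_scope if c not in self.escalation_owner]
        if missing:
            raise ValueError(f"no escalation owner for: {missing}")
        return self

def deploy_agent(structure: OrgStructure) -> str:
    # Step 4: deployment happens only inside a validated structure.
    structure.validated()
    return f"agent deployed for purpose: {structure.purpose}"
```

Deployment failing on a missing escalation owner is the sequence enforced mechanically: the organizational layer must be complete before the intelligence layer runs.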


Conclusion

Harness engineering asks: how do we make AI agents reliable?

That is the right question with the wrong starting point.

The problem is not how to control AI. It is how to handle the events that inevitably occur while AI structures information toward a given purpose: unknowns, tradeoffs, verification points, authority boundaries, and handoffs.

A harness is not a mechanism for controlling AI. It is a structure for handling that work.

You do not put guardrails on a car to prevent it from flying. The definition already draws that boundary.

Design the organization first. The harness follows from that.


The organizational structure described in this article — explicit role boundaries, judgment delegation, and cross-reference traceability between work units — is implemented in XRefKit:

https://github.com/synthaicode/XRefKit
