TL;DR
- An agent doesn’t truly work because of the model, but because of the harness controlling it.
- Moving from demo to production requires handling errors, state, memory, and observability.
- A well-designed harness reduces model unpredictability and shifts complexity into code, making the system reliable and usable in real-world scenarios.
In the article *Harness Engineering: The Most Important Part of AI Agents* we saw a fundamental point: the problem with agents isn't (only) the model, but the system around it.
But what does it really mean to build that system?
The moment everything breaks
There's a fairly universal phase: you've built a demo and it works well. The model responds, uses a tool, maybe even completes multi-step tasks. Everything looks promising.
Then you try to use it in a real-world context, and the problems emerge:
- invalid outputs
- incorrect API calls
- infinite loops
- loss of context
It's not that the model got worse; the system's complexity increased without a harness solid enough to manage it.
The harness as a control system
It becomes clear that the harness isn't just a "container". It's closer to a control system, designed to guide the model along a precise path, reducing its freedom when necessary and allowing it when useful.
This is a delicate balance: too much control means loss of flexibility; too little control means loss of reliability.
And this is where the real design work begins.
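One way to make that balance concrete is a gate between the model's proposals and their execution. The sketch below is illustrative: `ALLOWED_ACTIONS`, `harness_step`, and the `strict` flag are all hypothetical names, not part of any real framework.

```python
# The harness, not the model, decides which actions are available.
ALLOWED_ACTIONS = {"search", "summarize"}


def harness_step(proposed_action: str, strict: bool) -> str:
    """Guide the model along a precise path: reject unknown actions when
    strict, let them through when flexibility is more useful."""
    if proposed_action in ALLOWED_ACTIONS:
        return proposed_action
    if strict:
        raise ValueError(f"action {proposed_action!r} rejected by harness")
    return proposed_action  # flexible mode: the model may try something new


harness_step("search", strict=True)  # an allowed action passes through
```

The dial here is a single boolean for clarity; in a real system it would likely be per-task policy: tight control on irreversible actions, looser control on exploratory ones.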
Error handling becomes the main case
In traditional software, errors are edge cases. Anyone with experience in agent-based systems knows that errors are the norm.
The key idea, however, is that a well-designed harness does not assume everything will go well; quite the opposite, it is built on the assumption that something will fail.
It therefore introduces mechanisms such as:
- validating outputs before using them
- retrying when something goes wrong
- falling back to alternative paths
- controlled interruption of loops
This is what makes the system usable.
State and memory: the invisible problem
Another issue that emerges very early is state management. An agent without memory is little more than a stateless function, but adding memory introduces complexity:
- what to store
- for how long
- how to update the state
- what happens when it becomes inconsistent
These decisions must be made when structuring the harness.
And it's precisely here that many subtle bugs tend to arise.
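To see where those decisions live in code, here is a deliberately small memory store. `MemoryStore` and its TTL policy are assumptions for illustration; real agents might use a vector store, a database, or conversation summaries instead.

```python
import time


class MemoryStore:
    """Illustrative agent memory: each design question gets an explicit answer.
    What to store: key/value entries. For how long: a TTL per store."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._entries: dict[str, tuple[float, object]] = {}

    def put(self, key: str, value: object) -> None:
        # How to update the state: an explicit, timestamped write.
        self._entries[key] = (time.monotonic(), value)

    def get(self, key: str):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            # What happens when it becomes inconsistent: expired entries
            # are dropped rather than served as stale state.
            del self._entries[key]
            return None
        return value


memory = MemoryStore(ttl_seconds=60.0)
memory.put("user_goal", "book a flight")
```

The point isn't this particular structure; it's that every one of those questions gets answered somewhere, and if you don't answer them in the harness, they get answered by accident.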
Observability: knowing what's happening
When something goes wrong (and sooner or later it will), the important question is:
"Can I understand what happened?"
Without logging and tracing, working with agents becomes almost impossible, because you need to see:
- every step of the reasoning
- every tool call
- every output transformation
And not just for debugging, but to evolve the system.
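A lightweight way to capture all three is to wrap each step in a tracing decorator. The sketch below is one possible approach, using only the standard library; `traced`, `trace_log`, and the `lookup` tool are hypothetical names.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("agent")

# In-memory trace of every step, so a run can be replayed and inspected.
trace_log: list[dict] = []


def traced(step_name: str):
    """Record each step's name, inputs, and output before returning it."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            trace_log.append({"step": step_name, "args": args, "result": result})
            log.info("step=%s args=%s result=%s", step_name, args, result)
            return result
        return wrapper
    return decorator


@traced("tool:lookup")
def lookup(city: str) -> str:
    return f"weather in {city}: sunny"  # stand-in for a real tool call


lookup("Rome")
```

Once every reasoning step, tool call, and output transformation lands in the trace, "can I understand what happened?" becomes a query over data instead of guesswork.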
Moving complexity to the right place
An interesting aspect is that, as you improve the harness, the system becomes more predictable—even without changing the model.
This happens because complexity is being moved out of an "opaque" component (the model) and into code that can actually be controlled.
It's a shift in strategy:
- less blind trust in the model
- more explicit control in the system
Which, ultimately, is software engineering.
In fact, we can say that building agents today is much closer to traditional software engineering than it might seem.
There are flows, states, error handling, integrations, observability…
The only difference is that instead of deterministic functions, there's a probabilistic model.
The harness is what holds everything together, and it's what makes the difference between something that only works in a demo and something that truly works in production.