'Verified' changed meaning: what agentic engineering demands from development teams

#devops #ai #programming #webdev

In the April 29 Fragments, Martin Fowler linked three articles on agentic programming that arrive at the same conclusion: the investment that matters is not code generation speed but verification, the harness, and code that is legible to agents.

What Chris Parsons says about 'verified'

The most direct contribution is from Chris Parsons, in his third update of 'Coding with AI'. The core argument: the concept of "verified" had to evolve alongside the throughput of AI agents.

Parsons writes: "'Verified' used to mean 'read by you'. With modern agent throughput, it has to mean 'checked by tests, by type checkers, by automated gates, or by you where your judgement matters'. The check still happens; it just does not always happen in your head."

Verification did not disappear. It migrated from human reading to automated tooling when the volume generated by agents outpaced individual review capacity. This is not a degradation of the process: it is a necessary adaptation.
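One way to picture Parsons' definition is as a small gate runner: a change counts as verified only when every automated check passes and, for changes flagged as needing judgement, a person has also signed off. The sketch below is illustrative only. `GateResult`, `run_gates`, `is_verified`, and the stand-in lambdas are hypothetical names, not something taken from Parsons' article or from a real CI tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GateResult:
    name: str
    passed: bool


def run_gates(gates: dict[str, Callable[[], bool]]) -> list[GateResult]:
    """Run every automated gate and record whether it passed."""
    return [GateResult(name, bool(check())) for name, check in gates.items()]


def is_verified(
    results: list[GateResult],
    needs_human_judgement: bool,
    human_approved: bool = False,
) -> bool:
    """Verified = all gates pass, plus a human sign-off where judgement matters."""
    if not all(result.passed for result in results):
        return False
    return human_approved if needs_human_judgement else True


# Hypothetical stand-ins for real checks (a test run, a type checker, a linter).
example_gates = {
    "unit_tests": lambda: True,
    "type_check": lambda: True,
    "lint": lambda: True,
}
example_results = run_gates(example_gates)
```

In this sketch a routine change with green gates is verified without anyone reading it, while a change marked `needs_human_judgement=True` stays unverified until `human_approved` is set. The check still happens in both cases; only its location differs.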

Parsons also distinguishes two modes of AI use in development. Vibe coding means generating code without reading it and without taking responsibility for its structure. In agentic engineering, the engineer remains responsible for quality, and the AI is a tool in the process, not a replacement for the engineer.

On the senior engineer's role in this context, Parsons writes: "The way out is to train the AI so the diffs are right the first time, to make yourself the person on the team who shapes the harness, and to make that work the visible thing you are measured on."

Harness Engineering as a verification layer

Birgitta Böckeler, in an article published on martinfowler.com in discussion with Chris Ford, develops the concept of Harness Engineering. The central idea: computational sensors (tests, static analysis, type checkers) improve AI-generated code quality more reliably than human review.

Böckeler writes: "LLMs are great for exploratory and fuzzy rules, but once you have something objective, converting it to formal, unambiguous, deterministic format can give more assurance." When a quality rule becomes objective and deterministic, formalizing it in code (as a test or a type) provides more assurance than relying on human review to catch it.
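As a minimal sketch of what that conversion can look like, take an objective rule such as "a discount is a percentage between 0 and 100". Instead of hoping a reviewer notices a violation, the rule can live in a type that refuses invalid values, backed by a test. The names `Percentage` and `apply_discount` and the rule itself are assumptions made for illustration; they do not come from Böckeler's article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Percentage:
    """A value guaranteed to lie in [0, 100] once constructed."""

    value: float

    def __post_init__(self) -> None:
        # The rule is enforced at construction time, so no caller can skip it.
        if not 0 <= self.value <= 100:
            raise ValueError(f"percentage out of range: {self.value}")


def apply_discount(price_cents: int, discount: Percentage) -> int:
    """Return the discounted price as a whole number of cents."""
    return round(price_cents * (100 - discount.value) / 100)


def test_discount_never_increases_price() -> None:
    # A deterministic check on the same rule, runnable by any test runner.
    for percent in (0, 25, 100):
        assert apply_discount(1000, Percentage(percent)) <= 1000
```

An agent that generates `Percentage(120)` gets an immediate `ValueError` rather than a review comment that may or may not be written.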

The practical point: before optimizing prompts for the agent, check whether the harness can actually detect the problems the agent might introduce. With a weak harness, better prompts just produce more sophisticated bugs.
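One rough way to probe a harness is to seed known faults and see which ones it lets through, in the spirit of mutation testing. The sketch below does this by hand for a toy `clamp` function. Every name in it (`clamp_correct`, `SEEDED_FAULTS`, `surviving_faults`, the two case lists) is hypothetical, and a real project would more likely use a mutation testing tool than hand-written variants.

```python
from typing import Callable

ClampImpl = Callable[[int, int], int]
Case = tuple[tuple[int, int], int]  # ((value, upper), expected result)


def clamp_correct(value: int, upper: int) -> int:
    """Reference behaviour: limit value to the range [0, upper]."""
    return max(0, min(value, upper))


def clamp_no_lower_bound(value: int, upper: int) -> int:
    """Seeded fault: forgets the lower bound."""
    return min(value, upper)


def clamp_off_by_one(value: int, upper: int) -> int:
    """Seeded fault: treats the upper bound as exclusive."""
    return max(0, min(value, upper - 1))


SEEDED_FAULTS: dict[str, ClampImpl] = {
    "no_lower_bound": clamp_no_lower_bound,
    "off_by_one": clamp_off_by_one,
}


def passes(impl: ClampImpl, cases: list[Case]) -> bool:
    """True when the implementation satisfies every case in the harness."""
    return all(impl(*args) == expected for args, expected in cases)


def surviving_faults(cases: list[Case], faults: dict[str, ClampImpl]) -> list[str]:
    """Names of seeded faults that the harness fails to detect."""
    return [name for name, impl in faults.items() if passes(impl, cases)]


WEAK_HARNESS: list[Case] = [((5, 10), 5)]
STRONG_HARNESS: list[Case] = [((5, 10), 5), ((15, 10), 10), ((-3, 10), 0)]
```

Here `surviving_faults(WEAK_HARNESS, SEEDED_FAULTS)` reports both faults as undetected, while the version with boundary cases reports none. Both harnesses pass the correct implementation, which is exactly why a green run alone says little about how strong the harness is.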

Identifiers and LLM performance

The third article, by Adam Tornhill, looks at the impact of function length on LLM-generated code quality. The relevant point: LLMs rely heavily on names, structure, and local context to infer meaning. When meaningful identifiers are replaced with arbitrary names, model performance drops significantly.

This has a direct implication for teams using agents to work on legacy code with inconsistent naming. The quality of the agent's output is partly a function of the quality of identifiers in the input code.
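To make the point concrete, here is the same logic written twice. The example is purely illustrative and is not taken from Tornhill's article; the function names and dictionary keys are invented. Both functions have the same structure and behave the same way on equivalent data, but only the second tells a reader, human or model, what the code is for.

```python
def f(a: list[dict], b: int) -> list[str]:
    # Arbitrary names: the intent has to be guessed from structure alone.
    c = []
    for d in a:
        if d["x"] > b:
            c.append(d["y"])
    return c


def emails_of_overdue_customers(
    customers: list[dict], max_days_overdue: int
) -> list[str]:
    # Meaningful names: the same loop now states its own purpose.
    overdue_emails = []
    for customer in customers:
        if customer["days_overdue"] > max_days_overdue:
            overdue_emails.append(customer["email"])
    return overdue_emails
```

Asked to change the first version (say, to include the threshold day itself), an agent has nothing to anchor on except `x`, `y`, and `b`. In the second, the names carry the domain meaning the agent needs.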

What the three articles have in common

All three converge on the same thesis: in agentic programming, the competitive edge is not in generating faster. It is in review surfaces that scale, a robust harness, and code structured so the agent can reason about it accurately.

Where to invest depends on where the actual bottleneck is in your process. If generation is the bottleneck, improving prompts makes sense. If verification is the bottleneck, investing in the harness has higher returns.

Source: Fragments: April 29, Martin Fowler