There's a moment in every team's adoption of coding agents where the question shifts from "can it write the code?" to "can I trust what it wrote?" Throughput is no longer the bottleneck. Verification is.
This is not a new problem. We've been here before, with humans.
A note on terminology
Before going further, it's worth being precise about what Continuous Delivery means, because the term is often conflated with Continuous Deployment, and the difference matters here.
Continuous Deployment means every change that passes your automated checks goes straight to production. No human in the loop, no held-back releases, no batching.
Continuous Delivery, as Dave Farley and Jez Humble defined it in their 2010 book of the same name, is the discipline of keeping software in a state where it could be released at any time. The decision of whether to actually deploy is a business decision, made by humans, on a cadence that suits the product. What CD guarantees is not that you ship constantly — it's that you could.
That distinction is the whole point. Continuous Delivery is about readiness. It is the system that proves, on every change, that the software still works the way you said it would.
The throughput problem
Agents produce code faster than humans can read it. This is not a flaw; it's the entire premise. But it changes the economics of a team. The cost of producing a change collapses. The cost of verifying a change does not.
If your verification process depends on a human carefully reading every diff before it lands, you've built a system where the agent must wait for the human. The agent's throughput is capped at the human's reading speed. You bought a sports car and parked it in traffic.
The alternative is to invest in a verification process that scales with code volume rather than with human attention. This is what Continuous Delivery has been quietly building for fifteen years.
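To make the economics concrete, here is a toy model of that bottleneck. It is only a sketch: the function name `simulate_backlog` and every number in it are illustrative assumptions, not measurements from a real team. It tracks how many changes land and how many pile up unverified when production outpaces verification.

```python
from typing import Tuple


def simulate_backlog(produced_per_day: int, verified_per_day: int, days: int) -> Tuple[int, int]:
    """Return (changes landed, changes still waiting) after `days` days.

    Hypothetical model: every produced change joins a queue, and at most
    `verified_per_day` changes leave the queue each day.
    """
    backlog = 0
    landed = 0
    for _ in range(days):
        backlog += produced_per_day
        verified_today = min(backlog, verified_per_day)
        landed += verified_today
        backlog -= verified_today
    return landed, backlog


# Illustrative numbers only: an agent producing 100 changes a day.
human_gated = simulate_backlog(produced_per_day=100, verified_per_day=20, days=5)
pipeline_gated = simulate_backlog(produced_per_day=100, verified_per_day=100, days=5)
print(human_gated)     # (100, 400)
print(pipeline_gated)  # (500, 0)
```

In this model, what lands is set by the verification rate, not the production rate. Raising `produced_per_day` on its own only grows the queue.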
What CD actually provides
A mature deployment pipeline is a series of progressively more expensive, progressively more confident checks: compile, unit tests, integration tests, contract tests, performance tests, security scans, deployment to staging, smoke tests in staging. Each stage is automated. Each stage runs on every change. No change moves forward without passing the prior stage.
The pipeline is the source of truth about whether a change is safe to release. It doesn't care who wrote the change. It doesn't ask whether the author was tired, or junior, or in a hurry, or in fact a human at all. It applies the same gates to every change.
This is exactly the property you want when an agent is one of your contributors.
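The shape of such a pipeline can be sketched in a few lines. This is a minimal illustration, not a real CI system: `Change`, `Stage`, `PipelineResult`, and `run_pipeline` are hypothetical names, and the checks in the usage example are stand-ins for real compile and test steps. The sketch shows two things: stages run in order and stop at the first failure, and the author field is never consulted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Change:
    change_id: str
    author: str  # kept for audit only; no gate reads it
    diff: str


@dataclass(frozen=True)
class Stage:
    name: str
    check: Callable[[Change], bool]  # True means the change passed this gate


@dataclass(frozen=True)
class PipelineResult:
    releasable: bool
    passed: List[str]
    failed_at: Optional[str]


def run_pipeline(change: Change, stages: List[Stage]) -> PipelineResult:
    """Run stages cheapest-first and stop at the first failing gate."""
    passed: List[str] = []
    for stage in stages:
        if not stage.check(change):
            return PipelineResult(False, passed, stage.name)
        passed.append(stage.name)
    return PipelineResult(True, passed, None)


# Toy stages, assumed for illustration; real ones would invoke build and test tools.
stages = [
    Stage("compile", lambda c: len(c.diff) > 0),
    Stage("unit_tests", lambda c: "bug" not in c.diff),
    Stage("integration_tests", lambda c: True),
]

from_agent = run_pipeline(Change("c1", "agent", "introduces a bug"), stages)
from_human = run_pipeline(Change("c2", "human", "introduces a bug"), stages)
print(from_agent.failed_at, from_human.failed_at)  # unit_tests unit_tests
```

Ordering the stages from cheapest to most expensive means most bad changes are rejected before the slow checks ever run.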
Farley's deeper point
In Modern Software Engineering, Farley argues that the real practice of software engineering is the management of two things: complexity and uncertainty. The techniques he advocates (working in small batches, automating feedback, separating concerns, treating engineering as an empirical discipline) are not arbitrary best practices. They are the techniques that let humans make confident decisions in the face of systems they cannot fully hold in their heads.
Agents are subject to the same constraints, for the same reason. An agent does not "understand" your codebase in some privileged way that exempts it from needing feedback. It is making local decisions under uncertainty, and it benefits enormously from a system that tells it quickly whether those decisions held up.
Small batches help: an agent given one well-scoped task produces one well-scoped change that the pipeline can verify in isolation. Fast feedback helps: an agent that gets a failing test back in two minutes can fix it; an agent that gets it back after an overnight build has already moved on to something else. Automated quality gates help: the agent doesn't need to be trusted in some absolute sense, because the pipeline doesn't trust anyone.
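The fast-feedback loop described above might look something like the following. It is a sketch under stated assumptions: `propose` stands in for whatever call asks the agent for a change, `verify` stands in for a pipeline run that returns a failure message or `None`, and `feedback_loop` is a hypothetical name. None of these correspond to a specific agent framework or CI API.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces, assumed for this sketch:
#   propose(task, last_failure) -> candidate change
#   verify(candidate) -> None if every gate passed, else a failure message
Propose = Callable[[str, Optional[str]], str]
Verify = Callable[[str], Optional[str]]


def feedback_loop(
    task: str, propose: Propose, verify: Verify, max_attempts: int = 3
) -> Tuple[Optional[str], int]:
    """Feed each pipeline failure back to the agent until a change passes.

    Returns (accepted change or None, attempts used).
    """
    failure: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = propose(task, failure)
        failure = verify(candidate)
        if failure is None:
            return candidate, attempt
    return None, max_attempts


# Stand-in agent and pipeline so the sketch runs on its own.
def fake_agent(task: str, last_failure: Optional[str]) -> str:
    return "patch-v1" if last_failure is None else "patch-v2"


def fake_pipeline(candidate: str) -> Optional[str]:
    return None if candidate == "patch-v2" else "unit_tests: 1 failed"


print(feedback_loop("fix the parser", fake_agent, fake_pipeline))  # ('patch-v2', 2)
```

The attempt cap matters as much as the loop: a change that cannot get through the gates in a few tries is better escalated to a human than retried indefinitely.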
Trust through accountability
The instinct, when you start using a coding agent, is to ask how much you can trust it. This is the wrong question, because the answer is "it depends, and you won't know until something breaks."
The better question is: what does it take to hold any contributor accountable for the changes they introduce? Once you have a real answer to that, the agent is just another contributor running through the same gates. You stop relying on faith in the model and start relying on the evidence the pipeline produces.
This is the same shift CD asked of human teams. The argument against CD was rarely that the practices were bad. It was that they felt unnecessary if your developers were good enough. Farley's response, repeated patiently across both Continuous Delivery and Modern Software Engineering, was that "good enough" is not a strategy. Systems that depend on the unbroken vigilance of skilled humans fail the moment vigilance lapses, which it always eventually does. Systems that depend on automated, repeatable verification do not.
Agents make this argument harder to dodge. Whatever residual confidence you had in human carefulness as a safety net, the agent is going to outrun it. The pipeline is now the safety net. If you don't have one, you're about to find out why you needed it.
Where to start
If you want agents to do more of your work, the highest-leverage thing you can build is not a better prompt or a more capable model. It's the boring infrastructure Farley has been describing since 2010: a pipeline that runs on every change, tests you trust, environments that mirror production, feedback that arrives in minutes rather than days.
Continuous Delivery was always about giving teams the confidence to move faster without breaking things. Agents are the case where that confidence becomes load-bearing.