You've adopted agentic coding tools, and you can now output a complete feature in an afternoon. The speed is unlike anything you've experienced before, and it's addictive. You just want to keep building, keep pushing new features out into your customers' hands.
This is the reality for a lot of us right now. But as adoption of AI coding tools continues, I've noticed a concerning trend. Day by day, teams are abandoning established, known-good practices that have helped us deliver software well for years. Practices like Agile, code reviews, customer validation, and automated testing are being left by the wayside in favour of a "speed at all costs" approach. I've seen sentiments including:
- Human code reviews are irrelevant for AI code. As long as it works, reviews just slow things down.
- There's no point doing Agile development. The overhead takes away from the speed of AI development.
- Automated testing is unnecessary. If the feature works, why bother?
The common theme is that best practices are being dismissed because they are perceived to slow down the otherwise fast pace of agentic coding. This misses the point entirely. Yes, writing code used to be the bottleneck. AI has compressed or removed it. But writing code was also where a lot of understanding happened. Developers discovered edge cases during implementation, challenged assumptions while debugging, and built mental models line by line. AI didn't just speed up the bottleneck, it moved it. The constraint is now understanding: knowing what to build, knowing that what you built is right, and having the confidence to ship it. None of that got faster.
Agile Was About Validation, Not Velocity
Agile has been the industry's best answer to building quality software for decades. It put a focus on sustainable development, working software, quality practices, and crucially, early iteration and continuous delivery.
But here's what people forget: Agile was never about building faster. Speed was a side effect of breaking work into smaller chunks and validating as you built. The real value was in compressing learning loops. Frequent contact with reality, so you could course-correct before investing too much in a wrong direction.
Small sprints, small PRs, daily standups - these were all mechanisms to force feedback. The chunk size was small because building was slow. Small chunks were the only way to learn fast enough to avoid wasting months on the wrong thing.
Now, people are applying AI speed to skip the learning, not accelerate it. You can build a feature in an afternoon, but you still don't know if users want it until they touch it. That hasn't changed. The 2025 DORA Report makes this tangible: AI adoption actually correlates with a 7.2% reduction in delivery stability. DORA's core finding is that AI is an "amplifier." It magnifies the strengths of high-performing organisations and the dysfunctions of struggling ones in equal measure.
By regressing from Agile back to waterfall, and then applying the speed gains from AI, you're now building your misconceptions perfectly, efficiently, and potentially irreversibly.
The Double Abdication
Something special happens when you combine the resurgence of waterfall-like practices with the adoption of AI coding tools. You lose understanding from two directions at once.
The first abdication is validation. Moving from Agile back to waterfall removes continuous validation. You plan, then you build, but you don't understand what you're building the way you do with iterative delivery. In Agile, you continuously challenge and validate assumptions through iteration. In waterfall, you assume your assumptions are correct and find out whether they were at the end.
The second abdication is authorship. AI coding has a subtle but significant side effect: you are no longer writing code, you are reviewing someone else's. Your understanding of the code is fundamentally more superficial than when you write it yourself, think through it as you go, and build a mental model line by line.
When you combine these, the worst case scenario looks like this: assume how the feature should work, generate it, review it superficially, ship it. You end up with code that no one on your team actually understands at an ownership level, based on assumptions that nobody properly validated.
And the code you generate is still your code. It has your name on it. You own the consequences. When the system goes down at 3am on a Saturday, you are the one who has to wake up and fix it; there is no one better placed to delegate to. If you have no clue how it works, you are an owner in name only. If you own the planning and know how the feature should work, you might be able to figure it out at the eleventh hour. But if you double-abdicated and can't explain the feature end to end, let alone the code, you're flying blind.
The data backs this up. CodeRabbit's study of 470 pull requests found that AI-authored code contains 1.7x more issues than human-written code, with 75% more logic errors and 3x more readability issues. And Graphite's data shows that only 24% of PRs exceeding 1,000 lines of code receive any review comments at all. Reviewers don't carefully evaluate large AI-generated changesets. They rubber-stamp them.
You've neither designed nor learned. You've orchestrated. Maybe this gets you the outcome you want. But you're trading a competitive edge that was scarce (genuine Agile validation and deep understanding) for one that everyone can buy for $200 per month (rapid build time). When everyone has access to something, that's not a competitive advantage - that's table stakes.
Was Agile a Stepping Stone?
Let's consider the devil's advocate position. Perhaps Agile was the best we could do given slow build times, but it was never the ideal end state. Maybe waterfall was always conceptually right. Plan comprehensively, build once, ship. If building is now near-instant, doesn't comprehensive upfront planning make sense again?
No. Because the argument rests on a false promise: that when something is wrong, you can "just regenerate."
You can't. You have two options, and both are bad. Option one: partial regeneration. You patch the parts that were wrong while keeping the parts that weren't. This leaves echoes. It may fix the immediate problem, but it will never be truly clean. Remnants of invalid architecture accumulate as tech debt, and over time your architecture reflects every wrong assumption you ever made, partially corrected but never resolved. Option two: full regeneration. Throw it all away and start again. Assuming some of it was fine, you've now discarded working code and spun the wheel of chance again, hoping the AI produces something better this time. This isn't engineering. This is gambling.
As Eisenhower put it, "Plans are worthless, but planning is everything." Understanding doesn't come from the plan itself. It comes from the process of planning and then confronting reality. The plan is a hypothesis. Iteration is how you test it. AI-accelerated waterfall risks skipping the thinking entirely, because doing is so cheap that thinking feels like a waste of time.
An InfoQ experimental study found that in unstructured AI coding sessions, 80% of tokens were spent after the agent declared the task complete. All of that effort went into debugging, resolving incomplete implementation, and correcting assumptions. The "build fast, fix later" approach doesn't even save time. The thinking you skip upfront just comes back as rework downstream.
Theory, Meet Practice
Even with those problems, you might still feel like the logic holds. If we just plan well enough, AI-accelerated waterfall should work. It makes intuitive sense.
But in practice, reality continues to surprise us. We're just surprised faster, and more expensively, because we built more before discovering we were wrong.
When building is cheap, it feels wasteful to spend time planning and validating. Why spec it out when you can just build it and see? Why write acceptance criteria when the AI can generate the feature in twenty minutes? Because the cost was never in the building. The cost is in being wrong at scale. Wrong assumptions baked into architecture. Wrong UX shipped to customers. Wrong patterns replicated across the codebase by an AI that doesn't know they're wrong, and will happily propagate them into every file it touches.
Augment Code found that multi-file AI coding tasks succeed at only 19.36% accuracy, while single-function tasks achieve 87.2%. The difference is specification quality. The less you think before generating, the worse the output, and this scales non-linearly with complexity. A vague prompt for a small function might produce something close enough. A vague prompt for a multi-service feature produces something that looks correct, passes a cursory review, and fails in production in ways you never anticipated because you never thought through the edge cases.
There's a deeper problem here too. When you tell an AI "give me all the users born after a certain date who like cheese," you get working code. But you have no idea how it works. You didn't think about what data structure to use, how to filter efficiently, or what happens when the dataset is large. You skipped all of that, and the code runs, so it feels like it doesn't matter.
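To make that concrete, here's a minimal sketch (the names and data shape are hypothetical) of the kind of code an AI might hand back for that request, with comments marking the decisions you never consciously made:

```python
from datetime import date

# Hypothetical in-memory user records. In a real system this would likely
# be a database table, and the decisions flagged below would matter far more.
users = [
    {"name": "Asha", "born": date(1995, 3, 2), "likes": {"cheese", "tea"}},
    {"name": "Ben", "born": date(1978, 7, 14), "likes": {"wine"}},
    {"name": "Caro", "born": date(2001, 11, 30), "likes": {"cheese"}},
]

def cheese_lovers_born_after(users, cutoff):
    # Decision skipped: a linear scan is O(n) per query. Fine for a small
    # list, wrong for millions of rows (you'd want an index or a WHERE clause).
    # Decision skipped: membership test assumes "likes" is a set, not a
    # comma-separated string or a join table in another system.
    return [u for u in users if u["born"] > cutoff and "cheese" in u["likes"]]

result = cheese_lovers_born_after(users, date(1990, 1, 1))
print([u["name"] for u in result])  # prints ['Asha', 'Caro']
```

The code works, which is exactly why skipping those decisions feels harmless at the time.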
But it does matter, because that thinking is where understanding comes from. When a developer solves a problem by hand, they spend far more time thinking about it than writing code. They need to understand the API they're calling, the concept of the operations they're performing, and they need to sequence the solution in their head. This compounds over time and becomes knowledge. AI lets you skip straight to a working result without building that mental model. You can use the output without understanding the output. Which is fine until something breaks, or until you need to make a judgment call the AI can't make for you.
Agile With Larger Brushstrokes
So what actually works? Not abandoning discipline for speed. But not clinging to pre-AI chunk sizes either. The old model of sub-200 line PRs and two-week sprints was optimised for a world where building was slow. That constraint has genuinely changed.
Think of it this way. In waterfall, you might plan six months of work, build it, and find out at the end whether your assumptions were right. Agile compressed that. You plan a task that might be sixty minutes of work, build it, validate, and iterate. What AI enables is somewhere in between: you plan six hours of work, generate it, and validate through the stack. The planning horizon grew because AI made implementation faster, but it didn't grow to six months. The feedback loops are still short. You're still validating continuously. The chunks got bigger, but the discipline didn't disappear.
AI changed the cost of building, but it didn't change the cost of being wrong. So the chunk size can grow, but only if the feedback loops are preserved at the right points. Concretely, this looks like three shifts.
Specs as collaborative hypotheses, not contracts. Product, design, and engineering converge on a shared specification before AI generates anything. Not Big Design Up Front. A focused conversation, fifteen to sixty minutes, that surfaces assumptions, defines edge cases, and gives the AI coherent direction. Pre-AI, developers accumulated this understanding incidentally during slow implementation. AI removes that incidental learning, so you need to replace it with something deliberate. The spec is that replacement. It's a hypothesis about what to build and how, not a guarantee that you've thought of everything.
Stacked PRs as iterative delivery. AI can generate a large changeset in hours, but it shouldn't land as a single monolithic pull request. Instead, it lands as three to five sequential PRs, each validated independently. Preview deployments per PR mean design and product see real output continuously, not at a staging gate at the end. Each PR in the stack is a learning checkpoint where assumptions meet reality. A monolithic AI-generated PR is waterfall at the code level, while a stack of validated PRs is Agile at the code level.
Layered review that matches the new reality. AI review tools handle the mechanical checks: style, common bugs, security patterns. Humans focus on what AI genuinely can't judge. Does this architecture make sense for where we're headed? Does this match what the customer actually needs? The review bottleneck isn't removed. It's redirected to where human judgment is irreplaceable.
The DORA Report is explicit about this: "working in small batches amplifies AI's positive effects." Teams with strong engineering discipline see dramatically better outcomes from AI adoption. Teams without it see negative ROI, producing technical debt faster than they ever could by hand. The discipline isn't the obstacle. It's the multiplier.
The Bottleneck Moved. Fix It, Don't Remove It.
The teams struggling with AI adoption are the ones treating every source of friction as an obstacle to remove. Code review is slow? Skip it. Sprint planning takes time? Drop it. Writing tests feels redundant when the AI code works? Don't bother.
The teams succeeding are the ones recognising that the bottleneck has shifted from building to understanding, and investing accordingly.
Pre-AI, understanding came partly for free through the act of writing code. AI removed that incidental learning, and nothing automatically replaced it. The practices that work in an AI-native world (spec-driven development, stacked validation, layered review) aren't new overhead bolted onto a fast process. They're the deliberate replacement of understanding that used to come for free during slow implementation. Without them, you're building fast but building blind.
You can have the speed. You can have the quality. But only if you invest in the thinking that the speed demands. The competitive advantage was never building fast. Everyone can do that now. The competitive advantage is understanding what to build, and knowing that what you built is right. That was always the hard part. It still is.