When I began using Claude to develop a fully custom application for a customer use case, I was truly amazed at how fast LLMs can move. But with that amazement came a small noise in the back of my brain. How was I to review and understand all of this code? I was fairly new to React and did not have the expertise needed to truly evaluate the quality of the code being written, beyond UI validation. I tried to go slow, understanding the code and the reasoning behind each decision as I went.
Over the last few months, as I moved to a React-based application and accelerated my usage of Copilot and Claude, that noise only got louder.
The 100 Miles Per Hour Problem
Since then, agentic tools have climbed to the top of every company’s objectives, and the sight of entire applications built in a matter of hours has taken every CEO by storm. On the projects I worked on recently, we delivered three features per sprint. But, having been the careful child growing up, I kept coming back to one question. Just because we can go at 100 miles per hour, should we? Do we truly need to build full applications in a matter of hours? What are we missing, and what is the cost of moving that fast?
That cost became apparent very quickly. Each PR now came with dozens of changed files, since the time to develop an end-to-end feature had shrunk to a couple of days. How is anyone supposed to review all of that code in one go? One solution was to use an agent to review PRs as well. That worked well. The suggestions sometimes caught things a human reviewer might have missed. But it did not solve the deeper problem.
Two Layers of the Same Problem
Eventually, after a month of developing this way, one thing became very clear. I did not truly understand the depth of any feature I had not built myself. When speed is the priority, this gap is easy to overlook. And this problem had two layers to it.
The first layer was expertise. On concepts I was already familiar with, I could catch what the agent missed. Incorrect assumptions, questionable design decisions, edge cases. But when I worked on new concepts, I did not have the context to know what I was missing. The agent could be missing edge cases or making wrong assumptions and I would have no way of seeing it.
The second layer was learning. Earlier, I built knowledge by doing. Filing bugs across all features, spending time understanding a repo deeply, that was how real understanding was developed. With agents, that time disappeared. But here is what I missed. I was so focused on building fast that I overlooked the fact that I could have been learning faster too. The same LLMs accelerating my output could have been accelerating my understanding. That advantage was sitting right there and I walked past it.
The Mental Note That Never Gets Revisited
With every feature shipped in a hurry to meet shorter delivery expectations, tech debt kept accumulating in my brain. A mental note to review the design later, to understand it more deeply. In customer interactions, more and more questions were met with some version of “let me get back to you.”
Turning the Tables
And it was those customer interactions that pointed to the solution. Working in enterprise software, we are constantly asked to explain features and decisions to non-technical stakeholders. We write extensive design documents walking customers through our entire approach, the alternatives we considered, and the reasoning behind every choice. The goal is always to help a skeptical stakeholder understand and trust the output. I realized this was exactly what I needed to do with the coding agent. Not just accept what it produced, but become that skeptical stakeholder myself.
For every design decision Claude returned, I turned the tables and asked it to explain the approach as if I were a new engineer on the project:
“Explain this code change to me as if I am a new engineer joining this project today. What problem does it solve, what assumptions did you make, and what alternative approaches did you consider and reject?”
Every bug fixed with the help of a coding agent required the agent to explain its reasoning before we moved on:
“Before we proceed, walk me through how you identified the root cause of this bug. What assumptions are you making about the system state? What other parts of the codebase could this fix affect?”
Every design document created with a coding agent had to be written in plain, non-technical language. Sometimes I pushed further, asking what other approaches could have been used to solve the same problem.
The Multi-Agent Review Framework
With the advent of multi-agent workflows, this practice became even more rigorous. But before any agent writes a single line of code, I now start with a master prompt that sets the standard for the entire implementation:
“Before writing any code, create or update the design document for this feature in plain, non-technical language. Include the problem being solved, the assumptions being made, the approach chosen, and the alternatives considered and rejected. Once the design document is reviewed and approved, implement the changes. After implementation, the following agents will review the output in sequence: a solution architect who will validate the design decisions and flag scalability or maintainability concerns, an end user who will validate that the feature behaves as a non-technical stakeholder would expect, and a senior engineer with no prior context to this codebase who will review the diff for correctness, edge cases, and anything undocumented. No code should be considered complete until all three review passes are done and their findings addressed.”
At first, I did not expect this multi-pass review to surface anything significant. But every single time, it found something. A missing edge case. An assumption that was never validated. A design decision that made sense to the agent that built it but was completely opaque to the agent reviewing it with fresh eyes.
The three review agents each bring a different lens:
The solution architect validates whether design decisions will hold up at scale and whether assumptions are explicitly supported by the code:
“You are a solution architect reviewing this feature. Identify any design decisions that could create scalability or maintainability problems. Flag assumptions that are not explicitly validated in the code.”
The end user reads the design document and the code and describes what the feature does in plain, non-technical language, flagging anything that does not match what a real user would expect:
“You are a non-technical end user of this feature. Based on the code and design document, describe what this feature does in plain language. Flag anything that seems inconsistent with what a user would expect.”
The senior engineer reviews the diff as someone who has never seen this codebase before, flagging anything unclear, undocumented, or likely to cause problems down the line:
“You are a senior engineer who has never seen this codebase before. Review this diff for correctness, edge cases, and anything that seems unclear or undocumented. Do not assume context that is not visible in the diff itself.”
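The sequential review pass above can be sketched as a small orchestration loop. This is a minimal sketch, not a real integration: `call_llm` is a placeholder for whatever agent API you actually use, and only the role prompts come from the workflow described here.

```python
from typing import Callable, Dict, List, Tuple

# The three review lenses, run in sequence. Prompts are the ones quoted above.
REVIEWERS: List[Tuple[str, str]] = [
    ("solution_architect",
     "You are a solution architect reviewing this feature. Identify any "
     "design decisions that could create scalability or maintainability "
     "problems. Flag assumptions that are not explicitly validated in the code."),
    ("end_user",
     "You are a non-technical end user of this feature. Based on the code "
     "and design document, describe what this feature does in plain language. "
     "Flag anything that seems inconsistent with what a user would expect."),
    ("senior_engineer",
     "You are a senior engineer who has never seen this codebase before. "
     "Review this diff for correctness, edge cases, and anything that seems "
     "unclear or undocumented. Do not assume context that is not visible in "
     "the diff itself."),
]

def run_review_passes(diff: str, call_llm: Callable[[str, str], str]) -> Dict[str, str]:
    """Run each review pass in order and collect the findings per role.

    `call_llm(role_prompt, diff)` is a stand-in for a real agent call;
    it should return the reviewer's findings as text. No code is considered
    complete until every role has produced findings and they are addressed.
    """
    findings: Dict[str, str] = {}
    for role, prompt in REVIEWERS:
        findings[role] = call_llm(prompt, diff)
    return findings

if __name__ == "__main__":
    # Stub agent for demonstration; in practice this would hit a real model.
    stub = lambda prompt, diff: "no blocking issues found"
    report = run_review_passes("diff --git a/app.py b/app.py ...", stub)
    for role, finding in report.items():
        print(f"{role}: {finding}")
```

The important design choice is that the passes are sequential and independent: each reviewer sees only the diff and its own role prompt, which is what gives the "fresh eyes" reviewer its value.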
This takes a little time upfront. But it is essential to ensure that we not only go fast, but also go far.
From Black Box AI to Black Box Codebases
Previously, many non-technical customers struggled to trust AI results due to the inherent black-box nature of the output. That problem was addressed by building in explainability and evidence packages. As software engineers, we now need the same discipline. Our codebases should not become black boxes that only coding agents can navigate and debug. The reasoning behind every code change needs to be encoded alongside the PR, for every line added or modified. That is what will ensure we develop fast and expand our understanding at the same rate.
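One lightweight way to encode that reasoning alongside every PR is a pull request template. As a sketch, a `.github/PULL_REQUEST_TEMPLATE.md` (GitHub’s standard template location; the section names here are my own suggestion) might look like:

```markdown
## Problem
<!-- What problem does this change solve? -->

## Assumptions
<!-- What assumptions were made about system state or user behavior? -->

## Approach
<!-- The approach chosen, in plain, non-technical language. -->

## Alternatives considered and rejected
<!-- And the reasoning behind rejecting them. -->

## Review passes
- [ ] Solution architect findings addressed
- [ ] End user findings addressed
- [ ] Fresh-eyes senior engineer findings addressed
```

With a template like this, the explanation the agent is asked to produce has a permanent home in the PR itself, rather than living only in a chat transcript.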