Recently, Amazon convened a mandatory engineering meeting after a string of outages tied to AI-assisted code changes. One outage took down the shopping experience for six hours. Another cost AWS hours of downtime after engineers let an AI coding tool make changes without adequate review.
Amazon already had an approval process in place. It was either bypassed or not enforced. The fix was not to add a committee or freeze AI tool usage. It was targeted: developers now explicitly mark code as "AI-assisted" in commits, and that code gets a dedicated senior engineer review before merging, scoped to the highest-risk systems like checkout, payments, and inventory.
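One lightweight way such a policy can be enforced is a pre-merge check rather than a committee. The sketch below assumes a hypothetical convention: AI-assisted commits carry an `AI-Assisted: true` git trailer, and a short list of path prefixes stands in for the highest-risk systems. Neither detail comes from Amazon's actual implementation.

```python
# Hypothetical convention: AI-assisted commits carry an "AI-Assisted: true"
# git trailer. These path prefixes are illustrative stand-ins for the
# highest-risk systems named in the post.
HIGH_RISK_PREFIXES = ("services/checkout/", "services/payments/", "services/inventory/")

def needs_senior_review(trailers: dict[str, str], changed_files: list[str]) -> bool:
    """A commit requires a dedicated senior engineer review only when it is
    both marked as AI-assisted and touches a high-risk path."""
    ai_assisted = trailers.get("AI-Assisted", "").lower() == "true"
    touches_high_risk = any(f.startswith(HIGH_RISK_PREFIXES) for f in changed_files)
    return ai_assisted and touches_high_risk
```

The scoping is the point: an AI-assisted docs change or a human-written checkout change passes through the normal review path, so the extra gate stays narrow enough to actually be enforced.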
Process that exists but is not followed is not process. And the fix was not more bureaucracy. It was visibility into what AI produced and accountability for what ships. That lines up with what a decade of DORA research tells us: heavyweight approval processes are one of the strongest predictors of poor delivery performance. The answer to AI-related risk is not more gates. It is the right gates, in the right places, enforced consistently.
Addy Osmani put it simply in his AI coding workflow post: "I remain the accountable engineer." Review the code, understand it, never merge blindly. Trust but verify. The human stays in the loop not to slow things down, but because someone has to be accountable for what ships.
The question is where human involvement creates the most value. I think it comes down to three things.
Capture intent before code. If the spec does not define what "done" looks like in concrete, verifiable terms, no amount of review downstream will catch a feature that works but misses the point. This matters more when AI writes the code, because AI will build exactly what you ask for, including the wrong thing if you asked poorly. Structured acceptance criteria and clear scope before any code gets written are the highest-leverage investment you can make.
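What "structured and verifiable" means can be made concrete. A minimal sketch, with illustrative class and field names not taken from any real tool: each criterion follows a given/when/then shape, out-of-scope items are explicit non-goals, and a spec is ready for implementation only when every criterion is fully stated.

```python
from dataclasses import dataclass, field

@dataclass
class AcceptanceCriterion:
    given: str  # precondition
    when: str   # action taken
    then: str   # observable, testable outcome

@dataclass
class Spec:
    feature: str
    out_of_scope: list[str]  # explicit non-goals keep the AI from over-building
    criteria: list[AcceptanceCriterion] = field(default_factory=list)

    def ready_for_implementation(self) -> bool:
        # "Done" is defined only when at least one criterion exists and
        # every criterion states a complete given/when/then.
        return bool(self.criteria) and all(
            c.given and c.when and c.then for c in self.criteria
        )
```

A spec like this is something an AI can be pointed at directly, and something a reviewer can check the diff against.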
Automate the guardrails, not the gates. AI makes writing tests dramatically faster. TDD, integration testing, performance testing, security scanning: these are practices teams long claimed they could not invest in because of feature delivery pressure. That barrier is gone. The practices that define high-performing teams should become table stakes, not aspirational.
Keep as few human gates as humanly possible. The merge approval is where human judgment earns its keep. Not a committee. Not a change advisory board. An experienced engineer who understands the change, confirms it matches the intent, and approves it. If the upstream process and automated guardrails are doing their jobs, that should be enough.
The Accelerate principles still hold: small changes, fast feedback, automated pipelines, deploy frequently. What changes when AI writes the code is that the process around intent and spec quality has to be more disciplined than it was before. The pipeline stays fast. The inputs get tighter.
What does your process look like for ensuring intent before AI-generated code enters the pipeline?