Hello Devs 👋
We talk a lot about AI speeding up coding.
You've probably seen numbers like:
"AI increases developer productivity by 25 to 35 percent."
Honestly, that feels true. Multiple surveys and reports show developers claiming significant productivity boosts and time saved on routine tasks from generative AI tooling: large industry surveys find more than 80 percent of developers saying AI improved their individual productivity, and real-world telemetry shows task completion rising significantly with AI assistance.
Autocomplete feels better. Boilerplate goes faster. Refactoring can take fewer keystrokes.
But here's something I've been thinking about lately:
If we are writing code faster, are we reviewing it faster too?
Because if review becomes the bottleneck, DORA metrics won't improve. They might even get worse.
Letās break this down in simple terms.
First, What Are DORA Metrics?
The DevOps Research and Assessment program defined four key engineering metrics:
- Deployment Frequency
- Lead Time for Changes
- Change Failure Rate
- Time to Restore Service
These metrics basically answer:
- How often do you ship?
- How long does it take to ship?
- How often does it break?
- How fast do you fix it?
If AI code review truly helps engineering performance, we should see positive movement across these.
Read more about DORA here 👉 https://getdx.com/blog/dora-metrics
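To make the four metrics concrete, here is a minimal sketch of how they could be computed from deployment records. The data structure and every number in it are invented for illustration; real pipelines would pull this from your deploy and incident tooling.

```python
from datetime import datetime

# Hypothetical deployment records:
# (deployed_at, first_commit_at, caused_failure, hours_to_restore)
deploys = [
    (datetime(2025, 1, 6),  datetime(2025, 1, 3),  False, 0.0),
    (datetime(2025, 1, 8),  datetime(2025, 1, 7),  True,  2.5),
    (datetime(2025, 1, 13), datetime(2025, 1, 10), False, 0.0),
    (datetime(2025, 1, 15), datetime(2025, 1, 14), False, 0.0),
]

window_days = 14

# Deployment Frequency: deploys per week over the window
deploy_frequency = len(deploys) / (window_days / 7)

# Lead Time for Changes: median commit -> production delay, in hours
lead_times = sorted((d - c).total_seconds() / 3600 for d, c, _, _ in deploys)
median_lead_hours = lead_times[len(lead_times) // 2]

# Change Failure Rate: share of deploys that caused a failure
cfr = sum(1 for *_, failed, _ in deploys if failed) / len(deploys)

# Time to Restore Service: mean restore time across failed deploys only
restores = [r for *_, failed, r in deploys if failed]
mttr_hours = sum(restores) / len(restores)

print(deploy_frequency, median_lead_hours, cfr, mttr_hours)
```

With the toy data above this prints a frequency of 2 deploys/week, a 72-hour median lead time, a 25 percent change failure rate, and a 2.5-hour mean time to restore.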
The AI Code Review Paradox
Recent industry data suggests something surprising: teams with high AI adoption often see code review time increase sharply, even as code output rises. In Faros AI's 2025 data, code review time grew by roughly 91 percent as the volume and size of pull requests (PRs) increased.
That raises a question:
Why would review take longer when code is written faster?
Possibilities:
- More PRs are being created
- Larger diff sizes requiring deeper review
- "AI-generated but needs cleanup" code is common
- Reviewers double-check AI output carefully given trust concerns
So while coding accelerates, review capacity does not automatically scale to match.
That's the paradox.
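The mismatch can be shown with a toy queueing sketch. All the numbers here are assumptions for illustration: if AI lifts PR creation by 30 percent while the team's weekly review capacity stays flat, unreviewed PRs simply accumulate.

```python
# Toy model: PR arrivals vs. fixed review capacity (all numbers invented).
prs_per_week_before = 40
prs_per_week_after = int(prs_per_week_before * 1.3)  # 52 PRs/week with AI assistance
review_capacity = 45  # PRs the team can thoroughly review per week

backlog = 0
for week in range(4):
    # Each week, anything beyond capacity rolls over into the backlog
    backlog += max(0, prs_per_week_after - review_capacity)

print(backlog)
```

Under these assumptions, 28 PRs pile up unreviewed in a single month, even though review capacity was "almost enough". Coding got faster; merging did not.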
Let's Connect AI Review to Each DORA Metric
Instead of looking at tools, let's think logically.
1️⃣ Deployment Frequency
If reviews are faster, PRs merge faster.
If PRs merge faster, deployments can happen more often.
AI review can help by:
- Giving instant feedback on PR creation
- Catching obvious issues before a human reviewer steps in
- Reducing back-and-forth cycles
But this only works if:
- The feedback is accurate
- Developers trust it
If it's noisy, it slows things down instead.
2️⃣ Lead Time for Changes
Lead time = commit → production.
Where does most delay happen?
Usually:
- Waiting for review
- Fixing issues after review
- CI failures
If AI review catches issues immediately inside the PR, it can reduce:
- Rework cycles
- Manual review load
- Idle waiting time
Some organizations estimate meaningful time savings per PR when review insights trigger faster fixes, and those savings compound across a team.
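As a back-of-the-envelope illustration (the per-PR saving here is an assumption, not a measured figure), even a modest saving compounds quickly at team scale:

```python
# Illustrative compounding of per-PR lead-time savings (assumed numbers).
hours_saved_per_pr = 1.5  # assumed: instant AI feedback removes one review round-trip
prs_per_week = 50
weeks = 4

hours_saved = hours_saved_per_pr * prs_per_week * weeks
print(hours_saved)
```

Under these assumptions, that is 300 engineer-hours of waiting and rework removed per month, which is where lead-time improvements would actually come from.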
3️⃣ Change Failure Rate
This measures how often deployments lead to failures.
Quality matters here.
Data from several sources shows that larger PRs and heavier reliance on AI-generated content can increase bug rates unless issues are caught before merge. In the Faros dataset, bug rates rose slightly even as output increased.
If review catches high-severity issues earlier:
- Fewer production incidents
- Fewer hotfixes
- Less rollback stress
But false positives or poor insight quality don't help. High-signal detection does.
4️⃣ Time to Restore Service
This one is indirect.
Better code quality means fewer incidents, and thus less firefighting.
AI alone doesn't magically accelerate incident resolution, but it can prevent some incidents in the first place, easing the load on on-call teams.
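The indirect effect is easiest to see as avoided restore work. The numbers below are illustrative: even with per-incident MTTR unchanged, fewer incidents means less total firefighting.

```python
# Illustrative on-call load reduction (assumed numbers).
incidents_per_month_before = 6
incidents_prevented = 2  # assumed: bugs caught at review time instead of in production
mttr_hours = 3.0         # per-incident time to restore, unchanged by AI

oncall_hours_before = incidents_per_month_before * mttr_hours
oncall_hours_after = (incidents_per_month_before - incidents_prevented) * mttr_hours
print(oncall_hours_before, oncall_hours_after)
```

Same MTTR per incident, but a third less restore work overall in this sketch: prevention, not faster firefighting, is the mechanism.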
Where Different Tools Fit
Now let's look at the main tool categories:
Diff-First AI Review Tools
These look mainly at what changed in the PR.
Examples include lightweight PR assistants and AI summaries.
Pros:
- Fast feedback
- Easy setup
- Helpful for small changes
Cons:
- Limited context
- Can miss architectural issues
- Sometimes noisy
If developers start ignoring comments, DORA does not improve.
Static Analysis Tools
Rule-based analyzers like SonarQube catch security bugs, code smells, and other structural issues.
Pros:
- Strong baseline quality checks
- Good compliance signals
Cons:
- Rule-based rather than contextual
- Doesn't inherently reduce review friction unless integrated deeply
Context-Aware AI Review
More advanced tools, like Qodo, try to analyze broader context beyond just the diff.
The idea is:
- Understand surrounding files
- Recognize patterns in your codebase
- Provide more precise suggestions
If precision is high and developers trust it, this can meaningfully reduce review cycles.
If not, it becomes another notification stream to ignore.
What the Data Suggests
Here's what current reports indicate:
- High AI adoption can increase review time if processes donāt adapt, because PR volume and size grow faster than review capacity.
- Developers widely report productivity improvements from AI while expressing mixed confidence in AI-generated code.
- Survey data shows many developers remain cautious about trusting AI output and often verify it manually.
The key insight for me:
AI code generation alone does not improve DORA metrics.
AI review maturity does.
My Honest Take
If you are:
- Writing more code because of AI
- Creating more PRs
- Increasing review load
Then you need to scale review alongside generation.
Otherwise:
- Deployment frequency stalls
- Lead time increases
- Review fatigue grows
AI code review can improve DORA metrics.
But only if:
- It integrates directly into your PR workflow
- It provides high-signal feedback
- Developers actually trust it
Without trust, metrics don't move.
And that's the part most discussions skip.
Thank You! 🙏
Thank you for reading this far. If you found this article useful, please like and share it. Someone else might find it useful too. 🙂
Top comments
The review paradox you describe is real and I'd add a dimension specific to AI-heavy teams: review complexity grows faster than review volume. When a human writes a data transformation, reviewers roughly know where the bugs hide. When an AI writes the same function, reviewers have to verify correctness from first principles because they can't rely on "I know this person's typical mistake patterns."
Building fintech data pipelines, we've noticed this specifically in Change Failure Rate. We actually saw CFR increase for the first 3 months after integrating AI-assisted coding, not because the AI wrote bad code, but because reviewers trusted the confident-looking output more than they should have and merged faster. The fix wasn't better AI review tooling, it was changing the human process: requiring a domain expert sign-off on any AI-touched financial calculation, regardless of how clean the AI review looked.
Your point about trust being the core variable is the right frame. The DORA improvements don't come from the AI review tool itself; they come from teams developing calibrated trust in it over time. That takes months of deliberate feedback loops, not just tool adoption.
Thanks Vic!
Your fintech example makes sense too. Trusting confident-looking AI output too quickly can definitely raise Change Failure Rate at first. I like that you fixed it with a process change, not just more tooling 🫡
Exactly - the tooling instinct is strong because it feels like progress. But the real fix was making the human review step mandatory before any AI-suggested change went to staging. Sometimes the most effective intervention is organizational, not technical. Great article and discussion!
Great read! The point about trust in AI feedback really resonates: speeding up code generation doesn't automatically boost DORA metrics unless review quality keeps pace. Worth thinking about how review capacity scales with AI usage, not just output.
Thanks Andy!