The Numbers Don't Add Up
Every vendor pitch says the same thing: AI will make your developers 50% faster. GitHub claims Copilot users complete tasks 55% faster. Sounds great on a slide deck.
Here's the reality check.
METR's randomized controlled trial — actual experienced open-source developers, on their own repos, doing real work — found that AI tools made developers 19% slower. Not faster. Slower. And the kicker: developers believed they were 24% faster while being measurably slower.
Faros AI's study across 10,000+ developers and 1,255 teams tells a similar story. Individual throughput goes up. Developers merge 98% more pull requests. But PR review time balloons by 91%. Bug rates increase 9% per developer. PR sizes grow 154%.
At the company level? No measurable productivity improvement.
That's not a tooling problem. That's a systems problem.
Where AI Actually Helps (and Where It Doesn't)
AI coding assistants are genuinely good at:
- Boilerplate and repetition. Config files, test scaffolding, CRUD endpoints. The stuff nobody wants to write.
- Exploration and prototyping. Trying three approaches in the time it took to try one.
- Translation between languages and frameworks. Porting patterns from Go to TypeScript, or vice versa.
- Documentation first drafts. Getting from blank page to decent starting point.
AI coding assistants are bad at:
- Architecture decisions. LLMs don't understand your system's constraints, trade-offs, or history.
- Debugging production issues. They lack context about your infrastructure, traffic patterns, and failure modes.
- Code review. Automated review catches style issues. It misses the "this will cause a race condition under load" kind of problems.
- Knowing when not to write code. The best engineering decisions are often about what you don't build.
The pattern is clear: AI accelerates the cheapest part of software development (writing code) while doing nothing for the expensive parts (design, review, debugging, deployment, maintenance).
The Bottleneck Shift Nobody Talks About
When developers produce more code faster, the bottleneck moves downstream. Every team we've worked with that adopted AI coding tools aggressively saw the same sequence:
- Weeks 1-4: Developers feel faster. PR volume spikes.
- Month 2: Review queues back up. Senior engineers spend all day reviewing AI-generated PRs.
- Month 3: Bug reports climb. AI-generated code passes CI but fails in edge cases nobody tested.
- Month 4: Lead times are longer than before because everything is stuck in review.
Faros AI's data confirms this: developers on high-AI-adoption teams touch 47% more PRs per day, but the review bottleneck absorbs all the gains. It's Amdahl's Law in action: speeding up one stage of the pipeline only helps in proportion to that stage's share of total delivery time.
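The arithmetic is worth making concrete. Suppose writing code accounts for a quarter of end-to-end delivery time and AI doubles that stage's speed (these numbers are illustrative, not from either study); Amdahl's Law gives the overall gain:

```python
def overall_speedup(fraction_accelerated: float, stage_speedup: float) -> float:
    """Amdahl's Law: overall speedup when only one stage gets faster.

    fraction_accelerated: share of total delivery time spent in the
        accelerated stage (e.g. 0.25 if coding is 25% of lead time).
    stage_speedup: how much faster that stage becomes (e.g. 2.0 for 2x).
    """
    return 1 / ((1 - fraction_accelerated) + fraction_accelerated / stage_speedup)

# Illustrative: coding is 25% of lead time and AI makes it 2x faster.
print(round(overall_speedup(0.25, 2.0), 3))  # 1.143 -> only ~14% faster overall

# Even infinitely fast coding caps out at 1 / 0.75, about 1.33x.
print(round(overall_speedup(0.25, 1e9), 3))  # 1.333
```

Double the typing speed, get a 14% improvement end to end. And if review slows down at the same time, even that evaporates.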
What Engineering Leaders Should Actually Do
1. Fix the Pipeline, Not the Typing Speed
If your deployment pipeline takes 45 minutes and your code review queue is 3 days deep, making developers type faster changes nothing. Invest in:
- Automated review gates that catch the obvious stuff before a human looks at it
- Smaller PR culture — AI makes it easy to generate massive changesets, but massive PRs are review killers
- Deployment confidence — feature flags, canary releases, automated rollbacks
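The PR-size gate is the cheapest of these to start with. A minimal sketch of one, run as a CI step (the 400-line threshold and the `origin/main` base branch are assumptions; tune both for your repo):

```python
import subprocess

MAX_CHANGED_LINES = 400  # illustrative threshold; tune for your team

def count_changed(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added != "-":  # binary files report "-" instead of line counts
            total += int(added) + int(deleted)
    return total

def check_pr_size(base: str = "origin/main") -> bool:
    """Return True if the current branch's diff against base is under the limit."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    n = count_changed(out)
    if n > MAX_CHANGED_LINES:
        print(f"PR touches {n} lines (limit {MAX_CHANGED_LINES}); split it up.")
        return False
    print(f"PR size OK: {n} changed lines.")
    return True
```

Wire `check_pr_size` into CI as a required status check and oversized AI-generated changesets get bounced before they ever hit a reviewer's queue.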
2. Measure What Matters
Stop measuring lines of code or PRs merged. Start measuring:
- Lead time from commit to production
- Change failure rate — are you shipping more bugs?
- Review turnaround — is your review queue growing?
- Time to recovery when something breaks
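These four map directly onto the DORA metrics, and two of them fall out of data you already have. A minimal sketch, assuming hypothetical deployment records with `committed`, `deployed`, and `caused_incident` fields (adapt the field names to whatever your CI/CD system actually emits):

```python
from datetime import datetime
from statistics import median

# Hypothetical deployment records; real ones come from your CI/CD system.
deploys = [
    {"committed": datetime(2024, 5, 1, 9), "deployed": datetime(2024, 5, 2, 15),
     "caused_incident": False},   # 30h lead time
    {"committed": datetime(2024, 5, 3, 10), "deployed": datetime(2024, 5, 3, 18),
     "caused_incident": True},    # 8h lead time
    {"committed": datetime(2024, 5, 4, 8), "deployed": datetime(2024, 5, 6, 8),
     "caused_incident": False},   # 48h lead time
]

def lead_time_hours(records) -> float:
    """Median commit-to-production lead time, in hours."""
    return median(
        (r["deployed"] - r["committed"]).total_seconds() / 3600 for r in records
    )

def change_failure_rate(records) -> float:
    """Fraction of deployments that caused an incident."""
    return sum(r["caused_incident"] for r in records) / len(records)

print(lead_time_hours(deploys))              # 30.0
print(round(change_failure_rate(deploys), 2))  # 0.33
```

Track both over time, segmented by AI-heavy versus AI-light teams, and you can see directly whether the extra PR volume is translating into delivery or into rework.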
If AI adoption increases PRs merged but also increases your change failure rate, you're not winning. You're just failing faster.
3. Use AI for the Boring Stuff, Humans for the Hard Stuff
The best teams we see treat AI as a force multiplier for the tedious work:
- Generating test cases from specifications
- Writing migration scripts
- Producing first-draft documentation
- Scaffolding new services from templates
And they keep humans firmly in control of:
- System design and architecture
- Security-critical code paths
- Performance-sensitive implementations
- Cross-team integration points
4. Don't Skip the Learning Phase
The METR study found one reason AI slowed developers down: context switching between their own thinking and the AI's output. Experienced developers had deep mental models of their codebases. AI suggestions often didn't match those models, requiring extra time to evaluate and adapt.
This means junior developers might get bigger speed gains (they have fewer established mental models to conflict with), but they also need more supervision, not less. AI doesn't replace mentorship. If anything, it makes mentorship more important — someone needs to catch the plausible-looking-but-wrong code that juniors accept uncritically.
The Honest Assessment
AI-assisted engineering is here to stay. 82% of developers use AI tools weekly. About 27% of production code is now AI-authored. That's not going backwards.
But the productivity gains are real only when you redesign your entire delivery pipeline around higher code volume — not when you bolt AI onto an unchanged process and hope for the best.
The teams getting actual value from AI are the ones that:
- Invested in review automation before increasing code output
- Set up quality gates that catch AI-specific failure modes
- Trained their developers on when to use AI and when to think for themselves
- Measured end-to-end delivery metrics, not just coding speed
The rest are just generating more code to review, more bugs to fix, and more PRs to merge — while wondering why the roadmap still slips.
Need help integrating AI tools into your engineering workflow without creating new bottlenecks? Let's talk.