Developers using AI coding tools believe they're 24% faster. Measured objectively, they're actually 19% slower. That's a 43-point perception gap, and it explains a lot about why AI tool adoption keeps rising while trust keeps falling.
The data comes from the METR study, which measured experienced open-source developers working on their own repositories with tasks they selected themselves. This wasn't a lab experiment with toy problems. These were real developers doing real work, and the AI made them measurably less productive while making them feel more productive.
Here's what's actually happening, and why it matters for how you use these tools.
## The perception gap is the finding
The headline "AI makes you slower" is catchy but misses the point. The real finding is that developers can't tell whether AI is helping them or not.
The study found that developers consistently overestimated the time savings from AI assistance. They reported feeling faster, more confident, and more productive. The clock said otherwise. The gap came from time spent on activities that feel productive but aren't: reviewing AI suggestions, debugging AI-generated code, re-prompting after bad outputs, and context-switching between their own thinking and the AI's suggestions.
This matches what the Stack Overflow 2025 Developer Survey found at scale: trust in AI accuracy dropped to 29%, down from 40% the year before. 46% of developers actively distrust AI tool accuracy. Developers are using the tools more while trusting them less, which is a strange place to be.
## Where AI actually helps (and where it doesn't)
The METR study doesn't say AI is useless. It says AI on familiar codebases with experienced developers doesn't save time on average. That's a specific finding about a specific context.
Other data tells a more nuanced story. The CodeRabbit analysis of 470 GitHub repositories found AI-authored code has 1.7x more issues than human-written code. But the distribution matters:
| Task type | AI advantage | AI disadvantage |
|---|---|---|
| Boilerplate generation | High (repetitive, low-context) | Low |
| Bug fixing on unfamiliar code | Medium (broad knowledge) | Medium (misses local context) |
| Architecture decisions | Low (needs full project context) | High (hallucinates patterns) |
| Security-sensitive code | Very low | Very high (2.74x more XSS vulns) |
| Code explanation | High (good at summarizing) | Low |
| Refactoring existing code | Low (METR finding) | High (19% slower) |
The pattern: AI excels at tasks where broad training data matters more than local project context. It struggles where deep understanding of a specific codebase is required. Refactoring your own code on a project you know well is the worst case for AI assistance because you already have the context the AI lacks.
## Why experienced developers get less benefit
This is counterintuitive. You'd expect experienced developers to get more from AI tools because they can better evaluate and direct the output. Google Chrome engineering lead Addy Osmani made this argument and it's partly right: experts do use AI more effectively than beginners.
But the METR study suggests a ceiling effect. When you already know a codebase well, the AI's contribution is marginal. You can write the code yourself about as fast as you can prompt, review, and correct the AI's attempt. The overhead of the human-AI collaboration loop (prompt, wait, read, evaluate, accept/reject, fix) eats the time savings from not typing the code yourself.
For beginners or developers working on unfamiliar codebases, the calculus is different. The AI brings knowledge the developer doesn't have. But the GDC 2026 report found that among game developers who use AI, 81% use it for research and brainstorming, 47% for code assistance, and only 35% for prototyping. Most developers have already figured out that AI works better for exploration than execution.
## What to actually do with this information
The METR study doesn't mean you should stop using AI tools. It means you should be more intentional about when you use them.
Use AI for code you don't know. Exploring a new library, understanding an unfamiliar codebase, generating boilerplate for a framework you've used twice. These are contexts where the AI's broad training data genuinely helps.
Don't use AI for code you know well. If you've written similar code 50 times, your fingers are faster than the prompt-review-fix loop. The METR study specifically found that experienced developers on familiar repositories were the ones who got slower.
Watch for the perception trap. If you feel like AI is making you faster, check the data. Track how long tasks actually take with and without AI. The 43-point perception gap means your intuition about AI productivity is probably wrong.
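The simplest way to check your own perception is to log task durations and compare. Here is a minimal sketch of that kind of self-measurement; the CSV file name, function names, and column layout are all illustrative, not from any specific tool:

```python
import csv
from datetime import datetime
from pathlib import Path

LOG = Path("task_times.csv")  # hypothetical log file; pick any location you like

def log_task(description: str, used_ai: bool, seconds: float) -> None:
    """Append one completed task to the CSV log, writing a header on first use."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["timestamp", "task", "used_ai", "seconds"])
        writer.writerow([datetime.now().isoformat(), description,
                         used_ai, round(seconds, 1)])

def average_seconds(used_ai: bool) -> float:
    """Mean duration of logged tasks, split by whether AI was used."""
    with LOG.open() as f:
        rows = [r for r in csv.DictReader(f) if r["used_ai"] == str(used_ai)]
    return sum(float(r["seconds"]) for r in rows) / len(rows) if rows else 0.0
```

After a few weeks of honest logging, comparing `average_seconds(True)` against `average_seconds(False)` for similar task types gives you a number instead of a feeling, which is exactly what the perception gap says you need.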
Treat AI output like a junior developer's PR. Review it with the same scrutiny. The CodeRabbit data shows AI code has 1.75x more logic errors and 8x more excessive I/O operations. A quick skim is not sufficient review.
Use AI for understanding, not generation. "Explain this function" and "What does this error mean" are consistently the highest-value AI use cases across every study and survey. The generation use case gets all the marketing attention, but the comprehension use case delivers more actual value.
## The productivity question is the wrong question
The deeper issue with the METR study is that "does AI make developers faster?" might be the wrong question. Speed is one metric. Code quality, security, maintainability, and developer satisfaction are others.
The Cortex 2026 Engineering Benchmark found incidents per pull request increased 23.5% and change failure rates rose ~30% across engineering teams adopting AI tools. Developers are shipping more code, but the code is causing more problems.
If AI makes you feel 24% faster while actually making you 19% slower and producing 1.7x more bugs, the net effect on your project is significantly negative. The teams that figure this out and use AI selectively (for comprehension and exploration) rather than universally (for all code generation) will outperform those that don't.
The tools will get better. The models will improve. But the perception gap will persist as long as developers rely on how AI-assisted coding feels rather than measuring what it produces.