DEV Community

Matthew Hou

Developers Think AI Makes Them 24% Faster. The Data Says 19% Slower.

Last month, METR published a study that should make every developer uncomfortable.

They took 16 experienced open-source developers — people who knew their codebases inside out — and randomly assigned tasks to be done with or without AI tools.

|              | Predicted   | Measured   | Post-study belief |
|--------------|-------------|------------|-------------------|
| Speed impact | +24% faster | 19% slower | "It helped me"    |

I've been using AI coding tools daily for the better part of a year. When I read that study, my first reaction was "well, those developers must have been doing it wrong." My second reaction was: that's exactly the kind of thinking the study warns about.

The perception gap is the real finding

The speed numbers get all the attention, but I think the important finding is the perception gap. We feel faster because AI handles the boring parts — boilerplate, syntax, the stuff that feels like work but isn't where the actual difficulty lives. Meanwhile, the hard parts get harder: understanding what AI changed, verifying it's correct, keeping a mental model of code you didn't write.

Simon Willison — the guy behind Datasette and one of the most prolific AI-assisted developers I know of — wrote something that stuck with me:

"I no longer have a solid mental model of what my projects can do and how they work."

This is a developer who's built 80+ tools with AI assistance. If he's struggling with mental models, maybe the issue isn't experience level.

Attention is the actual bottleneck

Here's how I think about it now:

Before AI:  Think → Write → Test → Debug
With AI:    Describe → Review → Verify → Debug AI → Debug your understanding

The writing step got cheaper. Everything else got more expensive. And "reviewing code you didn't write" is cognitively harder than "writing code you understand" — anyone who's done code review knows this.

"AI turned us all into Jeff Bezos — automated the easy work, left all the hard decisions." — Steve Yegge

The METR study essentially confirmed what a lot of us have been feeling but didn't want to admit: AI coding tools don't save time. At best, they redistribute where your attention goes. At worst, they create an illusion of productivity while the cognitive load actually increases.

What I actually changed

I stopped optimizing for speed. Instead, I started asking: "where is my attention going?"


1. I front-load the thinking, not the prompting.

Before I touch any AI tool, I write down — in plain text — what I want, why I want it, and what "done" looks like. Not for the AI. For me. This takes 5-10 minutes and it's the most impactful thing I do all day, because it forces me to think before generating.

Kent Beck calls this the distinction between "augmented coding" and "vibe coding." The latter is hoping the AI gives you working code. The former is knowing what working code looks like before the AI writes it.
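To make the front-loading concrete: here's a minimal sketch of what one of my pre-prompt notes looks like (the task and bullet points are hypothetical, and the WHAT/WHY/DONE format is just my own habit, not a prescribed template):

```text
WHAT: Add retry logic to the payment webhook handler
WHY:  Transient 502s from the provider are dropping events
DONE: - Failed deliveries retried 3x with exponential backoff
      - Retries exhausted -> event lands in the dead-letter queue
      - Existing webhook tests still pass
```

If I can't fill in the DONE section, I'm not ready to prompt anything.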


2. I treat verification as the actual job.

I used to think of code review as a chore you do after the real work. Now it IS the real work. StrongDM's team took this to the extreme — their "Dark Factory" setup has zero human code review. All investment goes into tests, tools, and simulations. The humans define what correct looks like. The machines do everything else.

I'm not there yet, but the direction is clear: my value isn't in writing code. It's in defining what "correct" means for my specific context.
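One way to make "defining correct" concrete is to write the acceptance checks before anything is generated and treat them as the definition of done. A minimal sketch in Python — the `parse_duration` helper and its cases are hypothetical examples of mine, not StrongDM's actual setup; in practice, the function body is the part you'd ask the AI to fill in:

```python
# Acceptance tests written *before* asking an AI for an implementation.
# The tests, not the generated code, are the definition of "done".

def parse_duration(text: str) -> int:
    """Parse strings like "90s", "2m", "1h" into seconds.
    (Stand-in implementation; this is the part the AI would write.)"""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(text[:-1]) * units[text[-1]]

# The human-authored contract the output must satisfy:
assert parse_duration("90s") == 90
assert parse_duration("2m") == 120
assert parse_duration("1h") == 3600
```

The point of the exercise isn't the tests themselves — it's that writing them first forces you to decide what correct means before any plausible-looking code exists to anchor you.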


3. I stopped measuring productivity in output.

More lines of code is not more productivity. More PRs is not more productivity. The Harness 2025 survey found that 67% of developers spend more time debugging AI-generated code than they would have spent writing it themselves. If that's you, generating more code faster is making things worse, not better.

The metric I care about now: how much of my attention went to decisions only I can make? Architecture choices, user-facing trade-offs, "should we even build this" — that's the stuff AI can't do. Everything else, I want to automate not because it's faster, but because it frees up mental bandwidth for the hard problems.

The uncomfortable implication

If the METR study is right — if AI tools don't actually save time for experienced developers on familiar codebases — then the value proposition of AI coding isn't "10x productivity." It's something more subtle:

The ability to spend your attention on higher-impact work, if you're disciplined enough to actually do it.

That's a much harder sell than "write code faster." It requires you to know what high-impact work looks like, and to resist the dopamine hit of watching AI generate 200 lines in 3 seconds.

I don't have this figured out. Some days I still catch myself vibe coding and pretending the output is good because it compiled. The METR study's perception gap isn't just about their participants — it's about all of us.

But at least now, when I feel productive with AI, I stop and ask: am I actually productive, or does it just feel that way?


Top comments (52)

leob • Edited

Maybe we should move away a bit from the idea of using AI tools for "coding" only, and use it more in an 'advisory' role instead, as virtual brainstorming buddies to sound ideas off of - to generate ideas ...

Coding, yes, but only for the "boring" stuff, setting up the nitty gritty of a project (tooling etc), pure boilerplate etc - not the parts where writing the code actually feels like a worthwhile thing to do!

Matthew Hou

Yeah, the "advisory role" framing resonates. I've actually been shifting toward that myself — using AI more as a thinking partner than a code generator. The best sessions I have are when I describe a problem and go back and forth on approaches before writing anything.

And you're right about the boilerplate distinction. There's a real difference between "code I need to exist" and "code I need to understand." AI is great at the first category. For the second, I'd rather write it myself and have AI poke holes in it afterward.

leob

Wholly agree! This also reminds me of another recent article on dev.to, where the author argues that actual coding, even before AI arrived, was never more than 20-25% of the work anyway (I think he mentioned an even lower percentage) - the rest is thinking, planning, testing, debugging, deploying, etc. ... so we're now using AI to automate part of that 20% - maybe we should see how it can help more with the other 80%!

Matthew Hou

That's the reframe I keep coming back to. We're optimizing the part of the job that was already the smallest slice. The 80% — understanding requirements, debugging across system boundaries, figuring out what to build in the first place — that's where the real leverage is. I've been getting more value from using AI as a thinking partner during design than as a code generator during implementation. The code part is almost the easy part.

leob • Edited

Totally, and the advantage is also that it's low risk - you ask for advice or ideas, and then you use them or you don't - but when AI spits out a few hundred lines of code, the onus is on you to check/review it, and make sure there are no bugs or security holes in it ... I do think the whole "AI for coding" debate might need a bit of a rethink as to what 'strategies' are in fact most productive (smallest pains, biggest gains) ... keep an open mind!

Mahima From HeyDev

This matches what I’ve seen on real codebases - the “speedup” from AI shows up in greenfield work, but it often flips negative once you’re debugging across boundaries (tests, CI, infra, prod data). One thing that helped our team was treating AI output like a junior PR: tight diff size, explicit acceptance criteria, and running the full test suite before you trust the change. I’m curious if METR broke down the slowdown by task type (feature work vs refactors vs bugfixes), because the variance there is huge.

Matthew Hou

The "treat AI output like a junior PR" framing is exactly right — tight diff size is the key part people miss. I've noticed the moment a single AI-generated change touches more than ~200 lines, my ability to catch subtle bugs drops off a cliff. On the METR task type breakdown — they didn't publish granular splits, but from what I've seen in my own work, refactors are where AI hurts the most. The existing code has implicit constraints that AI doesn't see, so you end up debugging context it never had.
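To make that budget enforceable rather than a vibe, here's a sketch of a pre-review gate — a hypothetical helper of mine, not a standard tool — that counts changed lines from `git diff --numstat` output:

```python
# Sketch of a pre-review gate for the "~200 changed lines" budget.
# Feed it the text output of `git diff --numstat`; the function name
# and the default threshold are illustrative.

def diff_too_large(numstat: str, budget: int = 200) -> bool:
    """Return True if total added+deleted lines exceed the review budget."""
    total = 0
    for line in numstat.strip().splitlines():
        added, deleted, _path = line.split("\t")
        # Binary files show up as "-" and can't be line-counted.
        if added != "-":
            total += int(added) + int(deleted)
    return total > budget

sample = "120\t30\tsrc/app.py\n80\t10\tsrc/util.py"
print(diff_too_large(sample))  # True: 240 changed lines > budget of 200
```

Wired into CI or a pre-push hook, a check like this turns "keep diffs small" from advice into a forcing function: the AI-generated change gets split up before it ever reaches a reviewer.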

Matthew Hou

"Movement vs review tax" — that's a really useful framing. I've started thinking about it as an attention budget. AI is great at generating movement, but every line it writes draws from your review budget. The net gain depends entirely on whether the movement was in a direction you actually needed to go.

The teams I've seen handle this well aren't trying to make AI write more code — they're investing in making the review step cheaper. Better types, better tests, better module isolation. If you can glance at a diff and know whether it's right in 10 seconds instead of 10 minutes, that's where the real productivity gain lives.

Mahima From HeyDev

This matches what I’ve seen in teams adopting AI tooling - the easy path is cranking out more code, but the real constraint becomes attention and review bandwidth.

The “treat verification as the job” point is huge: you need fast feedback loops (tests, linters, tracing) so the human can spend time on intent, not spelunking.

One thing that helped us is front-loading constraints in a short checklist before prompting, then requiring the AI to propose tests and failure cases first.

Curious if METR broke down the slowdown by codebase familiarity or just raw experience level?

Matthew Hou

The "propose tests and failure cases first" pattern is something I've been converging on too. It flips the dynamic — instead of generating code and then figuring out if it's right, you're establishing what "right" means upfront. Curious how detailed your checklists get. Mine started as 3-4 items and have grown to about 10, which makes me wonder if I'm over-engineering the prompt instead of the code.

Hilton Fernandes

I think AI is useful for developing code in codebases one is not acquainted with. Because it learns from existing code, it usually brings in fragments that are up to date with newer versions of APIs and techniques. It's also useful for routine tasks that are already very well established -- that is, boilerplate code. It doesn't particularly excel at new tasks. In that case the generated code should be seen as prototype code: it exposes problems and possible solutions, but it isn't ripe, and it should be used to inspire the writing of the real code.

Ingo Steinke, web developer

Adapting boilerplate code is fine and valid, like create-react-app, only more generic. Our industry shouldn't have needed expensive LLMs to do that, though. Debugging? AI can understand Tailwind and TypeScript, but a legacy web project from 2016? No chance, unless it's just boilerplate from ten years ago.

Matthew Hou

"Prototype coding" is a great way to put it. That's pretty much how I treat AI output now — it's a first draft that shows me the shape of a solution, not the solution itself. Especially useful when you're working with an unfamiliar API and need to see what the integration surface looks like before committing to an approach.

The key shift for me was stopping to expect production-ready code and starting to expect "good enough to learn from." Once you adjust that expectation, the frustration drops significantly.

signalstack

The 'attention redistribution' framing is the right diagnosis. Generation got cheap. Verification didn't.

I run a few AI models in production — parallel workloads, different models handling different tasks. The pull is always toward more: more agents, more parallelism, more throughput. But the real constraint doesn't change: how much cognitive load does it take a human to audit what came out?

A setup with three models producing clean, auditable outputs beats ten models producing plausible-but-questionable ones. Every time. The overhead compounds.

The point about expertise interfering with AI output is underappreciated. When you already have a strong mental model, a confident-but-wrong suggestion doesn't just waste time — it has to be actively rejected. That rejection costs more than silence would have. For a junior dev with weak priors, AI fills gaps. For someone who already knows the answer, it often adds noise you have to fight through.

The Dark Factory direction is the honest conclusion. You don't eliminate the human verification cost. You push it earlier, into test design and spec writing. Which is basically just the old TDD argument wearing new clothes.

Matthew Hou

The cognitive load point is the one most people skip over. "Just add more agents" sounds great until you're spending more time reviewing outputs than you saved generating them. I've hit that wall — at some point you realize the bottleneck was never typing speed.

And yeah, the TDD parallel is real. Writing good specs and test cases upfront is basically the same discipline, just reframed for a world where the machine writes the first draft. The skill shifts from "can I write this" to "can I define what correct looks like before anything runs."

Ingo Steinke, web developer

Where's the "dopamine hit" when AI generates 200 lines of code that should have been 20, hides at least one subtle bug inside, adds five paragraphs of text and a desperate call to action, and then, when you pinpoint the error, utters verbose excuses, fixes that error, and introduces new ones? This is just bullshit that makes me even more disappointed and angry when fellow coworkers insist that AI makes them "more productive". Hope this study will open their eyes!

Matthew Hou

Ha, you're describing a very real pattern. The verbosity is genuinely one of the most annoying things — you ask for a 5-line fix and get 80 lines of refactored code plus an essay explaining why.

I think the frustration your coworkers cause is actually a separate problem from the tool itself. The tool has real limitations. But "AI makes me more productive" and "AI makes me feel more productive" can both be true for different tasks and different people. The METR data just makes it harder to hand-wave away the gap between perception and measurement.

david duymelinck

I read the Next.js rebuild post from Cloudflare yesterday, and the part that struck me is their way of working: they define small tasks and let AI work on those.
This is a concrete example of the "AI is good at doing small things" line I keep hearing in presentations.

So I guess spec driven AI is out and issue driven AI is in. Like you would do if you had a team of developers.

Matthew Hou

That Cloudflare post is a great example. "Small well-defined tasks" is exactly where AI shines — it's basically the same conclusion the METR study points to, just from the other direction.

"Spec driven AI is out, issue driven AI is in" — I like that framing. Treat AI like a junior dev who's great at executing clearly scoped tickets but terrible at interpreting a vague spec. The better your issue description, the better the output. Which is, like you said, the same workflow you'd use with a human team.

gass

Don't get trapped in the weeds. Use AI as an assistant, not for writing code. Every issue related to skills degrading comes from letting AI code for you. If you are a programmer, program, you lazy bastard. It will give you everything you need: understanding of the project, context, practice, typing speed, mental gymnastics. In every discipline professionals need to practice to improve or maintain their skills, so don't hand that practice to the machine. It's simple, really.

Matthew Hou

The skills degradation angle is underrated. I've caught myself reaching for AI on things I used to just... do. And every time I did, the understanding got a little shallower.

That said, I don't think it's all-or-nothing. There are parts of coding where the practice builds understanding (architecture decisions, debugging, core logic) and parts where it's just mechanical repetition (config files, boilerplate wiring). I'm trying to be more deliberate about which category something falls into before deciding whether to hand it off.

cognix-dev

The "redistribution" framing is exactly the right diagnosis. But I'd argue it's a symptom of a design problem: most AI coding tools are optimized for generation speed, not for reducing the human verification cost that follows.
That's what we tried to address with Cognix. Instead of asking "how fast can we generate code?", the design question was "how much human attention does verifying this output require?" Multi-stage validation, quality gates before the code reaches you — the goal is minimizing the attention tax, not just moving it somewhere else.
If the bottleneck is always human verification, the tool should be designed around that bottleneck.

Matthew Hou

"How much human attention does verifying this output require" is a better design question than most AI tool companies are asking. The generation speed race feels like it's hitting diminishing returns — the bottleneck moved downstream months ago. I haven't tried Cognix yet but the framing is right. The tools that win long-term will be the ones that make review faster, not generation faster.

cognix-dev

Thanks for your reply. Your feedback has given me encouragement. I'll work on the approach of improving human review speed more carefully!

Vasu Ghanta

Insightful take on the METR study—eye-opening how developers perceived a 24% speed boost but measured 19% slower.
Prioritizing attention on verification over raw output makes total sense for real productivity.

Matthew Hou

Thanks! That perception gap was the thing that stuck with me too. 24% faster in your head, 19% slower on the clock — it's a pretty humbling data point. Makes you wonder how many other "productivity gains" are just vibes.
