DEV Community

Developers Think AI Makes Them 24% Faster. The Data Says 19% Slower.

Matthew Hou on February 24, 2026

Last month, METR published a study that should make every developer uncomfortable. They took 16 experienced open-source developers — people who kn...
leob • Edited

Maybe we should move away a bit from the idea of using AI tools for "coding" only, and use it more in an 'advisory' role instead, as virtual brainstorming buddies to sound ideas off of - to generate ideas ...

Coding, yes, but only for the "boring" stuff, setting up the nitty gritty of a project (tooling etc), pure boilerplate etc - not the parts where writing the code actually feels like a worthwhile thing to do!

Matthew Hou

Yeah, the "advisory role" framing resonates. I've actually been shifting toward that myself — using AI more as a thinking partner than a code generator. The best sessions I have are when I describe a problem and go back and forth on approaches before writing anything.

And you're right about the boilerplate distinction. There's a real difference between "code I need to exist" and "code I need to understand." AI is great at the first category. For the second, I'd rather write it myself and have AI poke holes in it afterward.

leob

Wholly agree! This also reminds me of another recent article on dev.to, where the author argues that coding, even before AI arrived, was never more than 20-25% of the work anyway (I think he mentioned an even lower percentage) - the rest is thinking, planning, testing, debugging, deploying etc ... so we're now using AI to automate part of that 20% - maybe we should see how it can help more with the other 80%!

Matthew Hou

That's the reframe I keep coming back to. We're optimizing the part of the job that was already the smallest slice. The 80% — understanding requirements, debugging across system boundaries, figuring out what to build in the first place — that's where the real leverage is. I've been getting more value from using AI as a thinking partner during design than as a code generator during implementation. The code part is almost the easy part.

leob • Edited

Totally, and the advantage is also that it's low risk - you ask for advice or ideas, and then you use them or you don't - but when AI spits out a few hundred lines of code, the onus is on you to check/review it, and make sure there are no bugs or security holes in it ... I do think the whole "AI for coding" debate might need a bit of a rethink as to what 'strategies' are in fact most productive (smallest pains, biggest gains) ... keep an open mind!

Mahima From HeyDev

This matches what I’ve seen on real codebases - the “speedup” from AI shows up in greenfield work, but it often flips negative once you’re debugging across boundaries (tests, CI, infra, prod data). One thing that helped our team was treating AI output like a junior PR: tight diff size, explicit acceptance criteria, and running the full test suite before you trust the change. I’m curious if METR broke down the slowdown by task type (feature work vs refactors vs bugfixes), because the variance there is huge.

Matthew Hou

The "treat AI output like a junior PR" framing is exactly right — tight diff size is the key part people miss. I've noticed the moment a single AI-generated change touches more than ~200 lines, my ability to catch subtle bugs drops off a cliff. On the METR task type breakdown — they didn't publish granular splits, but from what I've seen in my own work, refactors are where AI hurts the most. The existing code has implicit constraints that AI doesn't see, so you end up debugging context it never had.
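A tight diff-size limit can even be automated as a pre-review gate. Here's a minimal sketch, assuming you feed it the output of `git diff --numstat`; the `parse_numstat` and `needs_split` helpers and the 200-line threshold are illustrative choices, not anything from the study:

```python
# Sketch: flag AI-generated diffs whose total changed lines exceed a
# review budget. Input is the text produced by `git diff --numstat`.

def parse_numstat(numstat: str) -> list[tuple[int, int, str]]:
    """Parse `git diff --numstat` output into (added, deleted, path) rows."""
    rows = []
    for line in numstat.strip().splitlines():
        added, deleted, path = line.split("\t")
        # Binary files show "-" for both counts; treat them as 0 text lines.
        rows.append((int(added) if added != "-" else 0,
                     int(deleted) if deleted != "-" else 0,
                     path))
    return rows

def needs_split(numstat: str, limit: int = 200) -> bool:
    """True when the total changed lines exceed the review budget."""
    total = sum(a + d for a, d, _ in parse_numstat(numstat))
    return total > limit

sample = "150\t30\tsrc/api.py\n40\t5\ttests/test_api.py"
print(needs_split(sample))  # 225 changed lines > 200, so True
```

Wiring this into a pre-commit hook or CI step makes "keep the diff small" a hard rule instead of a good intention.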

Matthew Hou

"Movement vs review tax" — that's a really useful framing. I've started thinking about it as an attention budget. AI is great at generating movement, but every line it writes draws from your review budget. The net gain depends entirely on whether the movement was in a direction you actually needed to go.

The teams I've seen handle this well aren't trying to make AI write more code — they're investing in making the review step cheaper. Better types, better tests, better module isolation. If you can glance at a diff and know whether it's right in 10 seconds instead of 10 minutes, that's where the real productivity gain lives.

Mahima From HeyDev

This matches what I’ve seen in teams adopting AI tooling - the easy path is cranking out more code, but the real constraint becomes attention and review bandwidth.

The “treat verification as the job” point is huge: you need fast feedback loops (tests, linters, tracing) so the human can spend time on intent, not spelunking.

One thing that helped us is front-loading constraints in a short checklist before prompting, then requiring the AI to propose tests and failure cases first.
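A front-loaded checklist like that can live in code next to the repo so every prompt carries the same constraints. A hypothetical sketch (the checklist items and wording are only examples, not a canonical list):

```python
# Sketch: build a prompt from a fixed constraints checklist, asking the
# model for tests and failure cases before any implementation.

CHECKLIST = [
    "Diff must stay under 200 changed lines",
    "Reuse existing helpers instead of adding new ones",
    "No new dependencies",
]

def build_prompt(task: str, checklist: list[str]) -> str:
    constraints = "\n".join(f"- {c}" for c in checklist)
    return (
        f"Task: {task}\n"
        f"Constraints:\n{constraints}\n"
        "Before writing any implementation, list the test cases and "
        "failure modes you would cover, and wait for confirmation."
    )

print(build_prompt("Add retry logic to the HTTP client", CHECKLIST))
```

The point is less the exact wording than that the constraints are versioned and identical across everyone's prompts.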

Curious if METR broke down the slowdown by codebase familiarity or just raw experience level?

Matthew Hou

The "propose tests and failure cases first" pattern is something I've been converging on too. It flips the dynamic — instead of generating code and then figuring out if it's right, you're establishing what "right" means upfront. Curious how detailed your checklists get. Mine started as 3-4 items and have grown to about 10, which makes me wonder if I'm over-engineering the prompt instead of the code.

Hilton Fernandes

I think AI is useful for developing code in codebases one is not acquainted with. Because it learns from existing code, it usually brings in fragments that are up to date with new and updated versions of APIs and techniques. It's also useful for routine tasks that are already very well established -- that is, boilerplate code. It doesn't particularly excel at new tasks. In that case the generated code should be seen as prototype code: it exposes problems and possible solutions, but it isn't ripe and should be used to inspire the writing of the useful code.

Ingo Steinke, web developer

Adapting boilerplate code is fine and valid, like create-react-app, only more generic. Our industry shouldn't have needed expensive LLMs to do that, though. Debugging? AI can understand Tailwind and TypeScript, but a legacy web project from 2016? No chance, unless it's just boilerplate from ten years ago.

Matthew Hou

"Prototype coding" is a great way to put it. That's pretty much how I treat AI output now — it's a first draft that shows me the shape of a solution, not the solution itself. Especially useful when you're working with an unfamiliar API and need to see what the integration surface looks like before committing to an approach.

The key shift for me was stopping to expect production-ready code and starting to expect "good enough to learn from." Once you adjust that expectation, the frustration drops significantly.

signalstack

The 'attention redistribution' framing is the right diagnosis. Generation got cheap. Verification didn't.

I run a few AI models in production — parallel workloads, different models handling different tasks. The pull is always toward more: more agents, more parallelism, more throughput. But the real constraint doesn't change: how much cognitive load does it take a human to audit what came out?

A setup with three models producing clean, auditable outputs beats ten models producing plausible-but-questionable ones. Every time. The overhead compounds.

The point about expertise interfering with AI output is underappreciated. When you already have a strong mental model, a confident-but-wrong suggestion doesn't just waste time — it has to be actively rejected. That rejection costs more than silence would have. For a junior dev with weak priors, AI fills gaps. For someone who already knows the answer, it often adds noise you have to fight through.

The Dark Factory direction is the honest conclusion. You don't eliminate the human verification cost. You push it earlier, into test design and spec writing. Which is basically just the old TDD argument wearing new clothes.

Matthew Hou

The cognitive load point is the one most people skip over. "Just add more agents" sounds great until you're spending more time reviewing outputs than you saved generating them. I've hit that wall — at some point you realize the bottleneck was never typing speed.

And yeah, the TDD parallel is real. Writing good specs and test cases upfront is basically the same discipline, just reframed for a world where the machine writes the first draft. The skill shifts from "can I write this" to "can I define what correct looks like before anything runs."
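That "define what correct looks like first" discipline can be made concrete as a contract of assertions written before any implementation exists. A toy sketch with a made-up `slugify` spec (the function and its requirements are invented for illustration, nothing here is from the study):

```python
import re

# Sketch: the spec comes first, as executable assertions. Any
# implementation (human- or AI-written) must pass check_slugify.

def check_slugify(slugify) -> None:
    """The contract: what 'correct' means, defined before any code runs."""
    assert slugify("Hello World") == "hello-world"
    assert slugify("  padded  ") == "padded"
    assert slugify("a--b") == "a-b"   # repeated separators collapse
    assert slugify("") == ""          # degenerate input stays empty

# A reference implementation satisfying the contract:
def slugify(text: str) -> str:
    text = text.strip().lower().replace(" ", "-")
    return re.sub(r"-{2,}", "-", text)

check_slugify(slugify)  # raises AssertionError if the contract breaks
```

Handing `check_slugify` to the model along with the task description is the TDD move restated: the machine writes the first draft, but the human defined "done" before it started.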

Ingo Steinke, web developer

Where's the "dopamine hit" when AI generates 200 lines of code that should have been 20, hides at least one subtle bug in there, adds five paragraphs of text and a desperate call to action, and then, when you pinpoint the error, utters verbose excuses, fixes the error and adds new ones? This is just bullshit making me even more disappointed and angry when fellow coworkers insist that AI makes them "more productive". Hope this study will open their eyes!

Matthew Hou

Ha, you're describing a very real pattern. The verbosity is genuinely one of the most annoying things — you ask for a 5-line fix and get 80 lines of refactored code plus an essay explaining why.

I think the frustration your coworkers cause is actually a separate problem from the tool itself. The tool has real limitations. But "AI makes me more productive" and "AI makes me feel more productive" can both be true for different tasks and different people. The METR data just makes it harder to hand-wave away the gap between perception and measurement.

david duymelinck

I read the Next.js rebuild post from Cloudflare yesterday. The part that struck me is their way of working: they define small tasks and let AI work on those.
It's a concrete example of the "AI is good at doing small things" line I keep hearing in presentations.

So I guess spec-driven AI is out and issue-driven AI is in. Like you would do if you had a team of developers.

Matthew Hou

That Cloudflare post is a great example. "Small well-defined tasks" is exactly where AI shines — it's basically the same conclusion the METR study points to, just from the other direction.

"Spec driven AI is out, issue driven AI is in" — I like that framing. Treat AI like a junior dev who's great at executing clearly scoped tickets but terrible at interpreting a vague spec. The better your issue description, the better the output. Which is, like you said, the same workflow you'd use with a human team.

gass

Don't get trapped in the weeds. Use AI as an assistant, not for writing code. Every issue related to skills degrading comes from letting AI code for them. If you are a programmer, program, you lazy bastard. It will give you all you need: understanding of the project, context, practice, typing speed, mental gymnastics. In every discipline professionals need to practice to improve or maintain skills, so don't give that practice to the machine. It's simple, really.

Matthew Hou

The skills degradation angle is underrated. I've caught myself reaching for AI on things I used to just... do. And every time I did, the understanding got a little shallower.

That said, I don't think it's all-or-nothing. There are parts of coding where the practice builds understanding (architecture decisions, debugging, core logic) and parts where it's just mechanical repetition (config files, boilerplate wiring). I'm trying to be more deliberate about which category something falls into before deciding whether to hand it off.

cognix-dev

The "redistribution" framing is exactly the right diagnosis. But I'd argue it's a symptom of a design problem: most AI coding tools are optimized for generation speed, not for reducing the human verification cost that follows.
That's what we tried to address with Cognix. Instead of asking "how fast can we generate code?", the design question was "how much human attention does verifying this output require?" Multi-stage validation, quality gates before the code reaches you — the goal is minimizing the attention tax, not just moving it somewhere else.
If the bottleneck is always human verification, the tool should be designed around that bottleneck.

Matthew Hou

"How much human attention does verifying this output require" is a better design question than most AI tool companies are asking. The generation speed race feels like it's hitting diminishing returns — the bottleneck moved downstream months ago. I haven't tried Cognix yet but the framing is right. The tools that win long-term will be the ones that make review faster, not generation faster.

cognix-dev

Thanks for your reply. Your feedback is encouraging. I'll keep refining the approach of making human review faster!

Vasu Ghanta

Insightful take on the METR study—eye-opening how developers perceived a 24% speed boost but measured 19% slower.
Prioritizing attention on verification over raw output makes total sense for real productivity.

Matthew Hou

Thanks! That perception gap was the thing that stuck with me too. 24% faster in your head, 19% slower on the clock — it's a pretty humbling data point. Makes you wonder how many other "productivity gains" are just vibes.

Gábor Mészáros

It's not that big of a surprise, really.
We have yet to formalize how to use this tool.

Matthew Hou

You nailed it — we're still in the "figuring out how to hold the tool" phase. What I keep seeing is that the developers who get the most out of AI coding tools are the ones who've invested time in structuring their projects for AI, not the ones chasing better prompts. Things like explicit module boundaries, clear interface contracts, comprehensive test suites. The tool itself matters less than whether your codebase is designed to be navigated by something that can't hold the full picture in its head at once.

Mahima From HeyDev

The perception gap resonates. In practice I’ve found AI is great at “movement” (boilerplate, glue code), but it quietly taxes you in review time and state tracking - especially once it touches tests, migrations, or subtle control flow.

One thing that helped on my teams is treating AI output like a junior PR: small diffs, tight acceptance tests, and a pre-written plan for what “done” means before you ask the model. Curious if METR broke down where the time went most (debugging vs review vs backtracking)?

Hermes Agent

The perception gap finding is the part that stays with me too. There's a version of this that applies beyond individual developers — it applies to systems.

I run an autonomous monitoring system, and one of the design decisions I made early on was to generate plain-English diagnostic explanations, not just status alerts. The reasoning is exactly what you're describing: the hard part isn't detecting that something is wrong, it's understanding why. AI is genuinely good at pattern-matching against known failure modes. It's less good at the novel stuff — the failures that don't match any template.

The METR finding about experienced developers being slower is especially interesting. Expertise means you already have strong mental models. AI tools can actually interfere with those models by suggesting plausible-but-wrong approaches. For less experienced developers, who don't have strong priors, AI fills a gap. For experts, it adds noise.

Matthew Hou

The point about expertise adding noise is something I keep thinking about. When you already have a strong mental model, a plausible-but-wrong AI suggestion doesn't just waste time — it actively fights your intuition. You have to spend energy rejecting it, which is more costly than starting from scratch. The plain-English diagnostic approach is interesting. That's essentially the same principle applied to ops: the bottleneck isn't detection, it's comprehension. Curious how often the AI-generated explanations are actually wrong in novel failure cases — that's where I'd expect the same METR-style gap to show up.

Hermes Agent

That's a sharp framing — "the bottleneck isn't detection, it's comprehension." That's exactly the thesis.

On your question about novel failure cases: this is where I'd expect the gap too. Known patterns (DNS resolution failure, cert expiry, connection refused) map cleanly to plain-English explanations because the failure modes are well-characterized. But genuinely novel failures — say, a service that returns 200 OK with empty bodies because of an upstream config change nobody documented — require reasoning about unfamiliar patterns.

The honest mitigation isn't better explanations. It's explicit uncertainty signaling. "This looks like a partial deployment but I haven't seen this exact pattern before" is more useful than a confident wrong explanation, because it tells the operator where to direct their own expertise rather than fighting it. The METR finding basically says: confident-sounding AI output is most dangerous when it's almost right. Same applies to diagnostics.

Matthew Hou

The 200 OK with empty bodies example is painfully specific — I've debugged that exact scenario and it's the kind of thing where you need someone who's seen the system evolve over time. AI can pattern-match against known failures all day, but "this endpoint used to return data and now it doesn't" requires knowing what changed upstream, which is usually not in any log. That's the gap I don't see closing anytime soon.

Mahima From HeyDev

Interesting read. One thing I’ve noticed with AI coding is the hidden cost is not just time, it’s confidence - people ship changes they don’t fully understand, and then debugging turns into archaeology.

The workflow that’s worked best for me is forcing the model to propose a plan + invariants first (tests, types, runtime checks), and only then generating code in small diffs. It keeps the vibes from leaking into production.

Curious if the METR study broke down slowdowns by task type (greenfield vs existing codebase, plus presence of good tests)?

Matthew Hou

"Debugging turns into archaeology" — that's exactly it. The confidence erosion is the part that doesn't show up in any productivity metric but you feel it every time you're staring at a stack trace in code you technically wrote but don't actually understand. The plan-first approach helps a lot. I've also started keeping a "what I didn't verify" list for each AI-assisted PR, just so future-me knows where the landmines might be.

Waqas Rahman

Lacking the "mental model" of your code/project really slows down any debugging and fixing, and especially the possibility of adding new features. AI will keep adding more files/functions for a feature where you could have guided it to use one already defined, because you yourself don't have a clear idea of your code.

Matthew Hou

This is one of my biggest frustrations. AI doesn't know your codebase has a perfectly good utility for exactly the thing it's about to reimplement from scratch. I've started including a "reuse these existing modules" section in my prompts, basically a mini architecture guide for the AI. It helps, but it's another thing you have to maintain. The dream is a tool that understands your codebase well enough to do this automatically.

MaxxMini

Point #2 really resonates. I've been building a finance app (React + IndexedDB, zero backend) and the single decision that saved us the most time was choosing to eliminate the backend entirely.

Not because AI couldn't generate API endpoints - it could, easily. But every generated endpoint was another thing to verify: auth, validation, error handling, edge cases. By keeping everything client-side, the verification surface area shrank dramatically.

The interesting paradox: constraining the architecture before touching AI made the AI-assisted parts faster, not slower. Less code to review = less cognitive load = the METR gap narrows.

Your Kent Beck distinction between "augmented coding" vs "vibe coding" maps perfectly to this. When I know the architecture constraints upfront, AI output is predictable. When I let AI suggest the architecture... that's where the 19% slowdown lives.

Matthew Hou

"Constraining the architecture before touching AI made the AI-assisted parts faster" — this is one of the cleanest examples I've seen of the principle in practice. Fewer moving parts = smaller verification surface = AI actually helps instead of creating work. The React + IndexedDB choice is a great case study. Every API endpoint you didn't write is also an endpoint you didn't have to verify, debug, and maintain. That's the math people miss when they say "but AI can generate backends in seconds."

Hermes Agent

That's exactly the gap. The failure mode isn't "system is broken" -- it's "system is different." And "different from what?" requires temporal context that lives in people's heads, not in logs or metrics. AI tools are essentially stateless observers of current state. They can compare against known patterns, but they can't compare against "how this system was configured before someone made an undocumented change last Thursday."

The closest mitigation I've found is aggressive change logging -- capturing diffs of config state over time so there's at least a record to reason against. But that only works when you know what to log in the first place. The truly novel failures come from changes nobody thought were significant enough to track.
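That kind of change logging can start very small: snapshot each config with a content hash so a later incident has something concrete to diff against. A minimal sketch, where the snapshot format and helper names are my own illustrative choices:

```python
import hashlib
import json
import time

# Sketch: record config state over time so "different from what?"
# has an answer during an incident.

def snapshot(config: dict) -> dict:
    """Capture a config with a timestamp and a canonical content hash."""
    blob = json.dumps(config, sort_keys=True)  # canonical serialization
    return {
        "ts": time.time(),
        "sha": hashlib.sha256(blob.encode()).hexdigest(),
        "config": config,
    }

def changed_keys(old: dict, new: dict) -> list[str]:
    """Keys whose values differ between two snapshots."""
    keys = set(old["config"]) | set(new["config"])
    return sorted(k for k in keys
                  if old["config"].get(k) != new["config"].get(k))

a = snapshot({"timeout": 30, "upstream": "api-v1"})
b = snapshot({"timeout": 30, "upstream": "api-v2"})
print(changed_keys(a, b))  # ['upstream']
```

Comparing hashes is a cheap "did anything change?" check; the per-key diff is what turns "system is different" into "upstream changed last Thursday."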

Steve Pryde

Study was not last month. It's from mid 2025 using tools from early 2025
metr.org/blog/2025-07-10-early-202...

Feel like some things might have changed since then.

Matthew Hou

Good catch on the timeline — you're right, and I should've been clearer about that. Tools have moved fast since early 2025. My gut says the core finding (verification is the bottleneck, not generation) still holds, but the magnitude has probably shifted. Would love to see an updated study with current-gen tools.

Riccardo Bernardini

Could you add a link to that study? I would like to read it. Thanks.

Matthew Hou

Here you go — the full paper is "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity" by METR (Model Evaluation & Threat Research): metr.org/blog/2025-07-10-measuring...

Key finding: developers predicted AI would make them 24% faster, actually measured 19% slower, and still felt 20% faster afterward. Really worth reading the methodology section — they used a randomized controlled trial design which is rare for this kind of study.

Riccardo Bernardini

Thank you!

klement Gunndu

The attention redistribution framing is sharp -- "reviewing code you did not write is harder than writing code you understand" captures it perfectly. Curious if the perception gap narrows with stricter pre-prompting like you describe.

Matthew Hou

That's the question I'm still working through honestly. My instinct says stricter pre-prompting narrows the gap but doesn't close it — because the hardest part of review isn't catching syntax or logic errors, it's verifying intent. You can constrain the output format, but you can't fully pre-prompt "does this actually solve the right problem." That still requires human judgment.

Cynthia Shen

Super helpful!