In my post a month ago, I referenced a couple of studies on AI-assisted coding productivity - Anthropic's skill formation research and the METR developer speed study. I used them to make a broader point and moved on fairly quickly.
A month later, both studies have follow-ups. The picture's shifted.
The original finding
Quick recap if you missed it. METR ran a study in early 2025 - 16 experienced open-source developers, 246 tasks, each averaging about two hours. All screen-recorded. The developers could use whatever AI tools they wanted, most going with Cursor Pro and Claude Sonnet.
The result: developers worked 19% slower with AI assistance, while believing they were 20% faster.
A 39-point perception gap between what happened and what people thought happened. Hmm.
The time saved generating code was getting eaten by context switching, verifying AI suggestions, and integrating outputs with existing codebases. The AI was fast at producing code - everything around that code was slower.
The update nobody shared
In February 2026, METR published a follow-up. An expanded study - 57 developers now, over 800 tasks, across 143 repositories. The headline numbers looked similar on the surface: -18% for returning developers, -4% for new ones.
But here's where it gets interesting. The researchers themselves think those numbers are probably wrong.
Between 30% and 50% of developers told METR they were declining to submit tasks they didn't want to tackle without AI. Let that land for a second. Developers were actively filtering out the tasks where AI would help most, which means the study was systematically missing the highest-uplift work.
Not just that though. METR struggled to even recruit participants, because developers increasingly refused to work without AI access at all - even for paid research. The people most bullish on AI's value were self-selecting out of the study entirely.
METR's own conclusion: developers are likely more sped up from AI tools now than their early-2025 estimates suggest. But their data is - their words - "only very weak evidence for the size of this increase." They're pivoting to six alternative research methodologies to try to get cleaner signal.
So the original "19% slower" headline was real, but incomplete. The updated picture is messier and more honest - which, in research, usually means closer to the truth.
It's not just about speed
While METR was wrestling with measurement, Anthropic published something that I think matters more.
They ran a controlled trial with 52 junior developers learning Python's Trio library - an async programming library none of them had used before. Half got AI assistance, half didn't.
The AI group scored 17 percentage points lower on comprehension assessments - 50% on average versus 67%. The biggest gaps showed up in debugging - understanding when and why code is incorrect.
That alone is worth sitting with. But the more useful finding was buried in the interaction patterns.
Developers who delegated code generation entirely to AI - "write this function for me" - scored below 40%. Developers who used AI for conceptual inquiry - "why does this work this way?", "what's the difference between these approaches?" - scored 65% or higher.
Same tool. Same study. Dramatically different outcomes based on how people used it.
The researchers put it pretty bluntly: "cognitive effort - and even getting painfully stuck - is important for fostering mastery." Not a comfortable finding for anyone selling AI as a shortcut to competence. But an incredibly useful one for anyone trying to use these tools well.
The mode you're in matters
This is the bit I keep coming back to. The research isn't saying "AI bad" or "AI good." It's saying the way you engage with it determines what you get out of it.
Delegation mode - generate this, fix that, write the test - saves time on tasks you already understand. Comprehension mode - explain this, why would I choose this pattern, what am I missing - builds the understanding that makes you better at the delegation later.
The developers who scored highest weren't avoiding AI. They were using it differently. Asking it to explain rather than just produce. Generating code and then interrogating it - "walk me through what this does and why" - rather than shipping it straight into the codebase.
It's the difference between using a calculator because you understand the maths and using one because you don't. (For reference, I did an art degree, so please don't ask me to back that statement up).
Where this is heading
If the trajectory is engineers moving from hands-on coding toward supervision, architecture, and review - agents handling multi-hour tasks end-to-end, the role shifting rather than disappearing - then raw coding speed becomes less important over time. What becomes more important is the stuff that's harder to measure: judgment, systems thinking, knowing when something's wrong before you can articulate why.
The perception gap isn't just about speed. It's about what we think we're getting from these tools versus what we're actually developing. And that question's only going to get more relevant.
More on that next time.