DEV Community

pickuma

Originally published at pickuma.com

Why Some Developers Still Refuse AI Coding Assistants in 2026

By now the narrative around AI coding tools is almost entirely written by adopters. Productivity numbers — some measured, many not — circulate constantly. Conference talks describe the 10× developer. The default assumption inside most engineering orgs is that if you're not using Copilot, Cursor, or something equivalent, you're leaving performance on the table and you'll eventually be forced to catch up.

That assumption deserves pressure. A meaningful fraction of experienced engineers — senior ICs, architects, principal engineers with decades of production code behind them — have tried these tools and walked away. Some never tried them at all and have articulate reasons why. These aren't Luddites. They're the people whose judgment your organization trusts for its hardest problems. Treating their skepticism as irrational makes it harder to understand the real tradeoffs at stake.

The Skill Atrophy Case Is Not Hypothetical

The most structurally interesting objection isn't about the tools themselves — it's about what extended use does to you.

Anthropic published research in early 2026 studying how AI assistance affects the formation of coding skills. The result was stark: participants who used AI assistance during a programming task scored 17% lower on a subsequent knowledge assessment than those who coded by hand, roughly the gap between a B+ and a C. The sharpest degradation showed up in debugging questions, which makes sense. Debugging requires maintaining a mental model of what the code is actually doing, and when an AI generates the code, the developer skips the step of building that mental model.

The researchers also found that how you use the tool determines almost everything. Developers who asked follow-up questions, requested explanations, and used AI as a comprehension scaffold rather than a code dispenser performed substantially better than those who delegated the thinking entirely. But here's the honest version of that finding: the tool's default affordance is delegation. The path of least resistance — accept the suggestion, move on — is the path that degrades your skills fastest. You have to actively resist the tool's intended use to get the learning benefit.

For a junior developer still building mental models of data structures, concurrency, and failure modes, this is a genuine concern. For a senior engineer who already has those models locked in, the calculus is different: the tool has less to teach them, and they can see exactly which hard-won skills passive use would erode. That is a large part of why the opt-out crowd skews heavily toward experience.

The Review Burden That Doesn't Show Up in Productivity Charts

Here's a finding that gets less attention than the headline productivity numbers. A 2025 paper on arXiv (arXiv:2510.10165) tracked what happened to core developers, the experienced engineers who own the hard parts of a codebase, after their teams adopted Copilot. Peripheral contributors became measurably more productive. Core developers reviewed 6.5% more code and saw their own output drop 19%.

The mechanism is intuitive once you see it. AI tools lower the cost of generating code faster than they lower the cost of understanding and verifying code. A junior engineer who might have spent two hours writing a feature now produces it in 30 minutes — but the output needs to be reviewed by someone who can catch subtle regressions, security issues, and architectural drift. That someone is the senior engineer. So the senior engineer's load increases even as the team's raw output rises.

The same pattern appears in a separate 2025 study cited by MIT Technology Review: teams with high AI adoption merged 98% more pull requests, but review time rose 91% and bug counts rose 9%. Organizational DORA metrics, which track software delivery performance rather than raw activity, stayed flat. The coding step got faster. The integration and verification step got slower.

If you're the person carrying the review burden, "AI makes the team more productive" isn't wrong exactly — it's just describing someone else's experience.

The 2025 Stack Overflow Developer Survey found that 46% of developers actively distrust the accuracy of AI tool output, and only 29% reported trusting it — an 11-point drop from 2024. Among the top reasons developers said they'd still consult a human in a future with capable AI: "when I don't trust AI's answers" (75%). Widespread adoption and widespread trust are not the same thing.

Licensing and Provenance: Still Unresolved

The copyright status of AI-generated code remains genuinely unclear, and experienced developers working in regulated industries or building software with significant IP exposure are right to pay attention.

GitHub acknowledges that a small fraction of Copilot output may be reproduced verbatim from training data. Whether that's legally actionable is a question no appellate court has yet resolved. Under most copyright frameworks, only humans can hold copyright, which means AI-generated code may be unownable — but if it's substantially similar to a licensed original, the original author might still have a claim. The middle ground between "clearly original" and "clearly infringing" is wide and legally unexplored.

For developers building proprietary software at companies that take IP risk seriously, or for open-source maintainers who care about license compatibility, this ambiguity is a concrete operational problem. It's not paranoia. It's due diligence.

There's a related concern that doesn't get a legal name but matters practically: provenance. If you accept a Copilot suggestion and it's wrong in a subtle way, tracing why it's wrong — what pattern in what training data produced this output — is essentially impossible. With your own code or code you can link to a known source, you can reconstruct the reasoning. With AI-generated code, you can't. For systems where failure modes matter (financial, medical, infrastructure), that auditability gap is real.

Flow Disruption Is Context-Dependent, Not Universal

The productivity case for AI coding tools often rests on removing friction: autocomplete extends what you can accomplish without breaking concentration. The productivity case against them makes the opposite claim: the tools introduce a different kind of friction that interrupts the deeper, harder cognitive work.

Research on developer flow finds that optimal programming performance requires sustained attention without the kind of cognitive mode-switching that prompting imposes. A 2025 study found that 68.81% of AI model recommendations disrupt a developer's ongoing mental flow — including 8.83% of suggestions that were technically correct but poorly timed. You're three levels deep in a recursive problem, holding the full call stack in working memory, and the autocomplete surface serves you something plausible but wrong for your context. Now you're evaluating the suggestion, rejecting it, and trying to recover the mental state you were in. The interruption cost isn't zero.

This is highly individual and task-dependent. For routine work (writing tests, scaffolding CRUD endpoints, generating boilerplate), the tool's suggestions are close enough to what you'd write that accepting them is low-cost. For the kind of work that defines senior engineering (performance-critical code, subtle concurrency, novel architectural patterns, debugging production incidents), the suggestions are less accurate precisely because those problems are less common in training data, and the cost of a misfire is higher.

A METR study cited by MIT Technology Review measured this gap directly: experienced developers believed they were 20% faster with AI tools, but objective timing showed they were 19% slower. The subjective experience of productivity is real; it just didn't match the measurement.
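To make the size of that gap concrete, here is a small worked calculation. It assumes "20% faster" means 20% less completion time and "19% slower" means 19% more completion time, and it uses a hypothetical 60-minute baseline task; only the two percentages come from the study as reported above.

```python
# Hypothetical baseline: a task that takes 60 minutes without AI assistance.
BASELINE_MIN = 60.0

# Percentages from the METR study as reported in this article.
# Assumption: "20% faster" = 20% less time; "19% slower" = 19% more time.
PERCEIVED_REDUCTION = 0.20
MEASURED_INCREASE = 0.19


def perception_gap(baseline: float) -> tuple[float, float, float]:
    """Return (believed minutes, measured minutes, measured / believed)."""
    believed = baseline * (1 - PERCEIVED_REDUCTION)
    measured = baseline * (1 + MEASURED_INCREASE)
    return believed, measured, measured / believed


believed, measured, ratio = perception_gap(BASELINE_MIN)
print(f"believed: {believed:.1f} min, measured: {measured:.1f} min, ratio: {ratio:.2f}")
```

Under those assumptions, the task the developer remembers as a 48-minute job actually took about 71 minutes, so the real time was roughly 1.5 times what they believed it was.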

Where Each Side Is Right

The refusers are right about specific things. Skill atrophy is a real risk if you use these tools passively, and the research now says so explicitly. The review burden on experienced engineers is a real cost that productivity statistics tend to obscure. Licensing provenance is a genuine unresolved concern for anyone building commercially significant software. And for tasks requiring deep, sustained concentration, the interruption dynamic is a legitimate performance consideration, not a preference.

The adopters are also right about specific things. For large classes of common programming tasks — especially tasks where the problem is well-defined, the codebase is well-structured, and the output is easy to verify — AI tools reduce friction and move work faster. For developers who use them deliberately, asking for explanations and maintaining comprehension alongside generation, the productivity gains are real. For early-career engineers working on well-understood problems, the acceleration is substantial.

The honest synthesis isn't that one camp is wrong. It's that the tools have uneven value across task types, experience levels, and usage patterns, and that the industry's marketing materials are written for the best-case scenario. The experienced developer who declines to use Copilot for core logic while using it for tests and boilerplate is not being irrational. They're applying the same analytical rigor to their tools that they apply to everything else.

The more interesting question for 2026 is what happens as the cohort of developers who learned primarily through AI assistance reaches senior roles. The Anthropic skill research suggests that how you build your mental models early determines how robust they are later. If the default usage pattern is high cognitive offloading, the population of engineers capable of doing the review work — the work that currently catches what AI generates — may shrink. That's a structural concern worth tracking, and it's one the refusers have been pointing at for two years.


Originally published at pickuma.com. Subscribe to the RSS or follow @pickuma.bsky.social for new reviews.
