AI Over-Reliance in Software Engineering: Signs, Risks, and How to Measure It

You paste a stack trace into Claude, get a fix, ship it. The next day you do it again. Three months later an outage hits and you notice you can't reason about the failure without a chatbot open in another tab. That isn't laziness. It's a gradual shift in how you work, and the hard part is that you are the worst-positioned person to notice it in yourself.

This is a piece about that shift: what AI over-reliance looks like in practice, why it resists self-detection, and how to put a number on it before it costs you in a high-stakes moment.

What over-reliance actually looks like

Using an LLM is not the problem. Over-reliance is the quiet removal of a step you used to perform yourself. A few patterns worth watching for in your own week:

You open a prompt before you have read the error message.
You accept generated code you could not have written unaided — and could not debug if it failed at 2 a.m.
You stop forming your own hypothesis. The model's first suggestion becomes the hypothesis by default.
You have lost your feel for whether a task is hard, because the internal gauge that used to estimate difficulty has been outsourced.

Not every shortcut is decay. Generating a migration file, scaffolding a test, or recalling an API signature you have written a hundred times is just leverage. The signal that matters is narrower: could you still do the thing if the tool went away for a day? If the honest answer has quietly moved from "yes, slower" to "not really," that is the pattern this article is about.

The risk is the feedback loop, not the tool

Over-reliance is dangerous because AI assistance feels like productivity even in the cases where it delivers none.

A 2025 randomized controlled trial from METR put numbers on that gap. Experienced open-source developers working in codebases they knew well took 19% longer to finish tasks when they used AI tools — yet they expected a speedup going in, and still believed they had been faster afterward. Their perception ran in the opposite direction of the measurement.

That gap is the whole problem. If your own sense of "this is going well" can be wrong by that margin on a controlled task, you cannot self-report your way to an accurate picture of your dependency. You need a signal from outside your own head.

One term worth handling carefully here is "AI psychosis." It is a loose, non-clinical phrase, used mostly for heavy chatbot users developing distorted beliefs through long unbroken conversations. For engineers the relevant concern is narrower and far less dramatic: skill atrophy and eroded judgment. Don't pathologize a normal workflow — but do take the judgment erosion seriously, because it is real and it can be measured.

Skill atrophy is invisible by design. The tool covers for the gap every working day, so the deficit never surfaces in normal work — only in the moment the tool is wrong, unavailable, or out of its depth. By then it is an incident, not a code review comment.

How to measure your dependency

You cannot manage what you cannot see, and "I feel fine" is exactly the reading the METR result tells you to distrust. A few ways to get a real signal:

Periodic self-assessment. Structured check-ins beat vibes. The Atrophy iOS app is one example built specifically around this question — a recurring self-assessment aimed at surfacing dependency patterns rather than a one-time quiz. Treat any tool like this as a prompt to reflect, not a verdict; a scheduled prompt still beats waiting for an outage to grade you.

An AI-free day. Pick one working day a month and turn the assistants off. Track the friction. Mild friction is healthy. If routine work becomes genuinely difficult — not slower, difficult — that is your measurement.
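
If you want the AI-free day to leave a record rather than an impression, a small log is enough. The sketch below is one possible shape, not a prescribed method: the three friction levels and the function name `summarize_ai_free_day` are assumptions made for illustration, chosen to keep "slower" and "stuck" apart as the paragraph above does.

```python
from collections import Counter
from enum import Enum


class Friction(Enum):
    """Hypothetical friction levels for a task done without AI assistance."""
    NONE = "none"      # no noticeable difference
    SLOWER = "slower"  # finished, but it took longer (healthy friction)
    STUCK = "stuck"    # could not make real progress unaided


def summarize_ai_free_day(log):
    """Count tasks per friction level and return the share that got stuck.

    `log` is a list of (task_description, Friction) pairs.
    """
    counts = Counter(level for _task, level in log)
    total = sum(counts.values())
    stuck_share = counts[Friction.STUCK] / total if total else 0.0
    return counts, stuck_share


log = [
    ("review PR", Friction.NONE),
    ("write SQL migration", Friction.SLOWER),
    ("debug failing test", Friction.STUCK),
    ("rename module", Friction.SLOWER),
]
counts, stuck_share = summarize_ai_free_day(log)
print(f"stuck on {stuck_share:.0%} of tasks")
```

The number to watch is the stuck share on routine work. A day full of "slower" entries is the mild friction you were expecting.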

The explain-it-back test. After you accept generated code, close the chat and explain the change in your own words, out loud or in a comment. Code you cannot explain is code you are renting, not code you own.

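The accept-without-reading ratio. For one week, mark every AI suggestion you accepted without reading it line by line, and divide that count by the total number of suggestions you accepted. Repeat the exercise every month or so. A ratio that rises from one sample to the next is the most honest leading indicator you have.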
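
The arithmetic behind the ratio is simple enough to sketch. This is a minimal illustration under assumptions: the `Suggestion` record, the function names, and the definition of "rising" as strictly increasing across sampled weeks are hypothetical choices, not part of any existing tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Suggestion:
    """One accepted AI suggestion (hypothetical record shape)."""
    day: date
    read_line_by_line: bool


def accept_without_reading_ratio(accepted):
    """Fraction of accepted suggestions that were not read line by line."""
    if not accepted:
        return 0.0
    unread = sum(1 for s in accepted if not s.read_line_by_line)
    return unread / len(accepted)


def is_rising(weekly_ratios):
    """True when every sampled week's ratio is higher than the one before."""
    pairs = zip(weekly_ratios, weekly_ratios[1:])
    return len(weekly_ratios) >= 2 and all(later > earlier for earlier, later in pairs)


week = [
    Suggestion(date(2025, 1, 6), read_line_by_line=True),
    Suggestion(date(2025, 1, 6), read_line_by_line=False),
    Suggestion(date(2025, 1, 7), read_line_by_line=False),
    Suggestion(date(2025, 1, 8), read_line_by_line=True),
]
print(accept_without_reading_ratio(week))
```

A single week gives you a baseline, not a trend. The trend only appears once you have several samples to compare, which is why `is_rising` refuses to answer with fewer than two.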

Staying sharp without putting the tools down

The goal is not abstinence. It is keeping AI as leverage instead of letting it become the load-bearing wall. Habits that hold that line:

Read before you prompt. Give yourself a fixed window — five minutes is plenty — on the raw error and the relevant code before you ask anything.
Write the hypothesis first. One sentence on what you think is wrong, written down before the model speaks. Then let the AI confirm or challenge it.
Use AI to interrogate, not to answer. Ask it to find holes in your approach rather than hand you one. You stay the author of the decision.
Keep a decision journal. Record the technical calls you make and the reasoning behind them, in your own words. Articulating a decision is the part of the work AI most quietly absorbs, and writing it down forces the thinking back into your hands.

A decision journal only helps if it lives somewhere you will actually revisit. A single searchable workspace — separate from the codebase, holding your prompts, the options you rejected, and the "why" behind each call — turns scattered notes into a record you can audit later.
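
What that record holds can be sketched as a small data structure. Treat the following as an assumption-laden illustration rather than a recommended tool: the `JournalEntry` fields mirror the items named above (prompts, rejected options, the reasoning), and the `search` helper is a hypothetical stand-in for whatever search your workspace already provides.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class JournalEntry:
    """One technical decision, recorded in your own words (hypothetical shape)."""
    day: date
    decision: str
    reasoning: str
    rejected_options: list[str] = field(default_factory=list)
    prompts: list[str] = field(default_factory=list)


def search(entries, term):
    """Return entries whose text mentions `term`, case-insensitively."""
    needle = term.lower()

    def haystack(entry):
        parts = [entry.decision, entry.reasoning, *entry.rejected_options, *entry.prompts]
        return " ".join(parts).lower()

    return [entry for entry in entries if needle in haystack(entry)]


journal = [
    JournalEntry(
        day=date(2025, 3, 4),
        decision="Retry the payment webhook with exponential backoff",
        reasoning="Failures cluster during provider deploys and clear within minutes.",
        rejected_options=["Dead-letter queue only", "Fixed 30-second retry"],
        prompts=["Find holes in a backoff-based retry for webhooks"],
    ),
]
for entry in search(journal, "backoff"):
    print(entry.day, entry.decision)
```

The `reasoning` field is the one that matters. If you find yourself pasting the model's explanation into it, the journal has stopped doing its job.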

None of this asks you to be a purist. It asks you to stay the engineer in the loop — the one who still forms the hypothesis, still owns the decision, and could still do the work if the tab closed.

Originally published at pickuma.com.