Rafael Costa

The Mental Model Problem: Why AI-Generated Code Is More Expensive Than It Looks

In physics, you never trust a result just because the math produced it.

You take the output and attack it: check limiting cases — does the equation reduce to something known when you push a parameter to zero or infinity? You plug in extreme values, look for dimensional inconsistencies, and compare it against independent derivations. The computation is merely a tool; the verification is the methodology.
Then, if and only if you can't break the result, you can start to believe it. And it's win-win because, even if you do break it, that means you learned something specific about where the original reasoning went wrong — which is, sometimes, equally or more valuable than the result itself.

I trained as a physicist — years of condensed-matter theory, all the way through a PhD. Now I build and ship software products. The career changed; the verification instinct didn't. And somewhere along the way, I noticed that the discipline that's second nature in physics is almost perfectly inverted in how most developers use AI coding tools.

Much of the industry is converging on one workflow: AI generates code, you review it (often, even the review step itself is automated). And the response to the quality problems this creates is to bolt guardrails on top — better review tools, AI-on-AI review chains, automated quality gates. All of that addresses a real problem. But it's addressing it from the wrong end.

There's a different workflow that I've found consistently more effective for nontrivial work, and far fewer people center it as their default:

You write the code. AI tries to break it.

This isn't about who types the first draft, but a cognitive fact: for nontrivial work, AI is often more useful as a critic than as a first author. And once you internalize that, your entire workflow changes.

Why AI Often Works Better as Critic Than First Author

When an AI model generates code, it's predicting plausible token sequences given your prompt. It doesn't have intent. Hell, it frequently doesn't even know your system's history, nor does it understand why you picked one data structure over another six months ago, or what edge case took your team a week to discover. It produces something that looks like a solution. Sometimes it is one, but often it's a sophisticated guess that drifts from your constraints in ways that are expensive to find.

When the same model critiques code, the dynamic is fundamentally different. You've given it a concrete artifact to reason about. Now it can trace logic paths, check boundary conditions, ask "what happens if this input is null, or negative, or enormous?", and compare your implementation against known patterns to spot deviations. The central point is this: critique is a constrained task — the model operates within the boundaries of something that already exists. Generation is largely unconstrained — the model makes architectural decisions it has no basis for making.
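A toy illustration of that asymmetry. The `paginate` helper below is hypothetical, not from any real codebase: it looks like a solution, and the constrained questions a critic asks — what about page 0, a negative page, a page past the end? — are exactly what expose its edges.

```python
# A plausible-looking pagination helper, and the boundary probes a
# critic would aim at it. (Illustrative code, not from a real system.)

def paginate(items, page, per_page):
    """Return the slice of `items` for a 1-indexed page."""
    start = (page - 1) * per_page
    return items[start:start + per_page]

data = [1, 2, 3, 4, 5]

# Happy path looks fine:
assert paginate(data, 1, 2) == [1, 2]
assert paginate(data, 3, 2) == [5]          # last, partial page

# Boundary probes — the constrained questions a critic asks:
assert paginate(data, 9, 2) == []           # past the end: silently empty
assert paginate(data, 0, 2) == []           # page 0: silently empty too
assert paginate(data, -1, 2) == [2, 3]      # negative page: Python's negative
                                            # slicing returns mid-list data!
```

None of those last three probes require inventing architecture; they only require interrogating something that already exists. That is the constrained task the model is good at.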

This isn't just a practical observation. There's a cognitive mechanism underneath it that explains why the difference is so large.

The Mental Model Problem

In nontrivial systems, the most expensive bottleneck is usually the mental model someone holds of how the system works.

When you write code yourself — even rough, incomplete, first-draft code — you're building that mental model as you go. Every decision, even the ones you make quickly, leaves a trace in your understanding. You know why the function is structured this way. You know which constraints you're encoding and which you're deferring. You know where you cut corners and where you were careful.

When AI generates code and you review it, nobody holds the mental model. The AI never had one in the first place, and you're trying to reconstruct one by reading the output — reverse-engineering intent from an artifact that was produced without any. This is possible for trivial code. For nontrivial systems, it's where time goes to die. Even moderately seasoned developers know how expensive this is. It's why code reviews are so much more draining than pair programming — in the latter, the mental model is shared in real time; in review, particularly of code you feel far removed from, it has to be reconstructed from the artifact alone.

I think of it as the difference between navigating a city you've walked through and navigating a city from a map someone else drew. Both get you places. But when something unexpected happens — a road closure, a detour, a constraint that wasn't on the map — the person who walked the city knows six alternatives. The person with someone else's map is lost.

This is why AI-generated code that "works" can be more dangerous than AI-generated code that breaks. Broken code surfaces the gap immediately, or at least it should. Working code that you don't fully understand creates what I call orphaned architecture — a system with no mental model owner. A couple of months later, when something downstream fails, you'll debug a design whose rationale exists only in a conversation history you've long since closed.

What "You Generate" Actually Means

I don't mean "you handwrite every line."

I mean you author the first intent-bearing artifact. This is the thing that gives the system its center of gravity before AI starts expanding it.

That might be the domain model, the core function, whatever invariants must hold, or a couple of key test cases that define correct behavior. It could be the state machine, the architectural skeleton, or a short ADR explaining which tradeoff you're accepting and why.

I'm not coming from a manual purity perspective. The crucial detail is that someone — you — has made the decisions that carry judgment, and those decisions exist in a form the model can now reason about. Once that exists, AI becomes dramatically more powerful, because it's critiquing your structure instead of silently inventing one.
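As a concrete sketch of such an artifact — entirely hypothetical names and rules, chosen only to show the shape — here is an order lifecycle where the judgment calls (which state transitions are legal, and why) are written down before any AI touches the code:

```python
# A hypothetical intent-bearing skeleton. The domain, states, and rules
# are illustrative; the point is that every judgment call is explicit.

ALLOWED = {
    "draft":     {"submitted", "cancelled"},
    "submitted": {"approved", "rejected"},
    "approved":  set(),        # terminal: approval is irreversible by design
    "rejected":  {"draft"},    # a rejected order may be revised and retried
    "cancelled": set(),
}

def transition(state, target):
    """Move to `target`, enforcing the invariant encoded in ALLOWED."""
    if target not in ALLOWED[state]:
        raise ValueError(f"illegal transition {state} -> {target}")
    return target

# One key test case that defines correct behavior: approval is final.
def test_approved_is_terminal():
    for target in ALLOWED:
        try:
            transition("approved", target)
            assert False, "approved must be terminal"
        except ValueError:
            pass

test_approved_is_terminal()
```

Twenty-odd lines, but they give the system its center of gravity: AI can now expand, critique, or test against this, instead of silently inventing the rules itself.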

False Velocity and the Missing Mental Model

The evidence is piling up, and it's converging on a pattern that should worry anyone paying attention.

A CMU study accepted at MSR '26, analyzing 807 Cursor-adopting repositories against matched controls, found that velocity gains were real but transient — they faded within months — while code complexity increases were persistent, creating a self-reinforcing debt cycle. An IEEE Spectrum piece from January documented something worse: newer models producing code that doesn't crash but silently fails to do what was intended — avoiding errors by removing safety checks or generating fake output that matches the expected format. And METR's own follow-up revealed that they had to redesign the study because developers increasingly refused to participate if it meant working without AI on half their tasks. The tool that makes you slower has become the tool you can't imagine working without.

The industry's reaction is reasonable: add more review layers. AI reviewers reviewing AI-generated code, quality gates, automated scanning.

This addresses but a symptom of the disease: nobody holds the mental model.

When AI generates code and a human reviews it, the human is doing the most cognitively expensive possible version of review: building a mental model from scratch by reading someone else's output. There's no reasoning to reconstruct. There's no intent to discover. There's just an artifact that looks plausible, and you have to determine whether plausible is correct.

When AI generates code and another AI reviews it, you may catch surface defects — style violations, common security patterns, obvious bugs. But you still haven't solved the real problem: nobody owns the reasoning that gave the system its shape. That's fine for boilerplate, but potentially fatal for code that encodes judgment.

"But My AI Has Full Repo Context Now"

The obvious counterargument: tools have gotten better. Cursor indexes your repo. Claude Code reads your file tree and does the beautiful /init thing. You can inject conventions via agents.md or .cursorrules. Copilot has repo-wide context. Some teams — mine included — have experimented with architectures where a large-context model ingests the entire codebase and compresses it for downstream agents. If AI can see your system, doesn't the mental model problem go away?

That narrows the issue, but doesn't quite close it.

Context-aware tools can see what your code looks like. They can match conventions, follow existing patterns, stay stylistically consistent. That's a real upgrade over a blank-slate chat prompt, and I'm not pretending otherwise. Generated code from a context-aware tool is substantially better than what's produced with a model that's never seen your repo.

But context is not intent. The tool can see that you use a specific pattern for error handling across your codebase. What it can't see is whether that pattern actually represents a deliberate architectural choice or legacy debt you haven't cleaned up yet. It can see your data model, but not which constraints are load-bearing and which are accidental — which fields exist because of a product decision and which because of a migration you never finished. It can see what you decided. It can't see what you considered and rejected, which is often the more important half of understanding a system. Context windows capture artifacts, not decision trees.

And here's the part that actually strengthens the inverted workflow: context-aware AI is an even better critic than it is a generator. A model that can see your full codebase, your conventions, your patterns — and then reviews your new code against all of that — catches things a context-free critic never would. "This function doesn't follow the error handling pattern you use everywhere else." "This data flow is inconsistent with how the rest of the system handles state." "Your naming here deviates from the convention in these twelve other files."

Context makes AI-as-critic dramatically more powerful. It makes AI-as-generator incrementally better. That asymmetry is exactly the point.

When AI-First Generation Is the Right Call

I want to be precise about the boundary, because overselling the inverted workflow would be exactly the kind of false clarity I'm arguing against.

AI generation is the right default when the mental model doesn't need an owner, or the path is so well-trodden that the decisions are obvious. Config files, project scaffolding, CORS setup, CI pipeline boilerplate — nobody needs to deeply understand why the YAML looks the way it does. The code doesn't encode intent; it encodes convention. Let the model handle that.

AI generation is also great as a research tool: "show me three different approaches to X" is not asking the model to build your system. It's asking it to widen your field of view before you make a decision. Same with translation ("rewrite this Python function in Go") — intent is fully specified; the generation is mechanical.

The workflow flips when the code should embody your actual product decisions. Core logic, business rules, architectural boundaries. Anything where the reason for a design choice is as important as the choice itself. Anything where, if someone asked you "why is it structured this way?", the answer matters.

The Workflow

Here's the concrete version, if you want to try it on one real feature:

Write the skeleton. Not the whole feature — just the parts that carry intent. Module boundaries, data model, core function, invariants, key test cases. Don't optimize for completeness. Optimize for decisions. Every line should reflect a choice you made for a reason you could articulate if pressed. There's no problem with using AI to refine this, brainstorm alternatives, or even generate a thoroughly guided first draft — as long as you understand that the mental model is yours, and the AI is just a tool to help you build it.

Have AI attack it. Not "review this" — that's too passive. Ask for adversarial input: "What inputs would break this? What assumption am I making that might not hold? Write tests that target the riskiest parts of this design. Argue against my architectural choice — under what conditions is it the wrong call? Am I overlooking any established patterns that would solve this more robustly?" The goal is to find the holes in your design, not just surface-level defects.

Fix what the critique reveals. Because you designed the system, you'll know exactly where each fix goes. No reverse-engineering required. This is where the convergence advantage is most obvious.

Then let AI expand. Once the core is solid and yours, hand AI the periphery: documentation, error messages, logging, additional test cases, boilerplate around the edges. This code is easy to verify because you have a clear architectural spine to compare it against.
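One cheap mechanical stand-in for the "attack" step — useful alongside the AI critique, not instead of it — is a stdlib fuzz loop aimed squarely at an invariant you authored. Everything here (`clamp`, its invariant, the input ranges) is illustrative:

```python
# A tiny fuzz harness that hammers a core function with hostile inputs.
# The attack targets the invariant you wrote down, not the code's surface.

import random

def clamp(value, lo, hi):
    """Core function. Invariant: lo <= result <= hi whenever lo <= hi."""
    return max(lo, min(value, hi))

def attack(fn, trials=10_000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        lo = rng.randint(-10**6, 10**6)
        hi = rng.randint(lo, 10**6)       # respect the precondition lo <= hi
        v = rng.randint(-10**9, 10**9)    # but feed values far outside range
        result = fn(v, lo, hi)
        assert lo <= result <= hi, (v, lo, hi, result)

attack(clamp)  # completes silently: the invariant survived the attack
```

Because you designed the invariant, a failing tuple from `attack` tells you exactly where the fix goes — which is the convergence advantage the third step describes.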

The first time you try this, it'll feel slower. You'll miss the rush of watching code appear.
Give it one full feature cycle, though.
Then compare not just time, but confidence in what you shipped and speed of the next change in that module.

The Discipline

Every conversation about AI coding eventually arrives at the same question: how much can AI do?

I think the more useful question is: where is the mental model, and who owns it?

For boilerplate, nobody needs to own the mental model. Let AI generate. For the core of your system — the logic that encodes why your product exists — the mental model is the most valuable artifact you produce. More valuable than the code itself, because code can be rewritten but understanding can't be downloaded that easily.

Physics taught me this before software did. You don't trust a result because the computation produced it. You trust it because you attacked it and it survived. The computation is cheap. The verification is where understanding lives.

The question is not how much code AI can write. The question is whether your workflow preserves a human owner of the system's mental model.

Write the structure, let AI break it, and even use it to explore alternatives or cut corners... but understand the core of what you deliver.
The mental model is the most expensive thing in your system. Don't let it become an orphan.
