Paulo Victor Leite Lima Gomes

Posted on May 21

cognitive debt is the ai code smell nobody wants to measure

#ai #productivity #softwareengineering #career

Thoughtworks called out "cognitive debt" in the latest Technology Radar, and I think that phrase is going to age annoyingly well.

Not because it is a cute new label. We have enough cute labels.

Because it names the thing many teams are quietly feeling with AI-assisted development: the codebase is growing faster than the team's understanding of it.

That is the uncomfortable part.

AI tools can make a team ship more code. Sometimes much more code. They can draft tests, fill in boilerplate, wire APIs, translate old modules, generate migration plans, and do the boring first pass that nobody wanted to do at 5 PM.

I use these tools. I like these tools.

But there is a version of AI-assisted development where the repo becomes full of technically working code that fewer humans can explain. That is not productivity. That is moving the bottleneck from typing to comprehension.

And comprehension is where software actually lives.

technical debt was never only about messy code

When people talk about technical debt, they usually point at visible ugliness: the weird helper function, the endpoint with seven flags, the service that needs three deploys to change one behavior.

That debt is real.

But the more expensive debt is cognitive: the gap between what the system does and what the team can reason about.

You feel it when a simple change requires three senior engineers in a meeting because nobody trusts the local code path. You feel it when everyone agrees a module is important, but nobody wants to touch it because "the last person who understood it left."

AI can create that situation faster.

Not because generated code is always bad. Sometimes it is clean, boring, and useful.

The risk is that AI can produce code at a speed that outruns the team's ability to build a mental model. You get more files, abstractions, edge cases, tests, integration points, and confidence-shaped text around all of it.

The repo looks healthier than the team feels.

That gap is cognitive debt.

the dangerous phrase is "it works"

"It works" is a useful sentence during a spike.

It is a dangerous sentence during a review.

When an AI assistant generates a change, the first temptation is to verify the outcome and move on. The test passes. The API returns the right shape. The deployment is green. Nice.

But software teams do not maintain outcomes in isolation. They maintain decisions.

Why is this cache invalidated here? Why does this retry happen before the transaction boundary? Why is this field optional in the API but required in the database? Why did we choose this migration path instead of the obvious one?

If nobody knows, the team has accepted code without accepting ownership of the reasoning.

AI-generated code can pass review while still weakening the system, because code review often checks correctness more than transfer of understanding. We look for bugs, style issues, security risks, and test coverage. Those matter. But we also need to ask whether the reviewer can explain the change after the tool is gone.

If the answer is no, the team did not really review it. They supervised it.

There is a difference.

explanations are not documentation

One easy answer is: "Ask the AI to explain the code."

Yes. Do that.

But do not confuse explanation with durable understanding.

Generated explanations are useful as a starting point. They are not a substitute for the team deciding what it believes.

The model can explain the code it just wrote in a way that sounds coherent. That does not mean the architecture is good, the tradeoff was intentional, or the explanation captures the actual production constraint.

The useful artifact is not "the AI explained it."

The useful artifact is a human-owned decision.

That can be a short PR note, an ADR, a test name that encodes the business rule, or a comment near a genuinely non-obvious boundary. It does not need to become a bureaucracy museum.

But some decision needs to survive.

Otherwise the codebase slowly fills with orphaned choices.

tests should encode intent, not just coverage

AI is pretty good at generating tests that increase the coverage number.

That is not the same thing as increasing confidence.

The tests I want in an AI-heavy codebase are the ones that make intent harder to lose. A test called returns_400_for_invalid_input is fine, but it tells a future maintainer very little. A test that says does_not_recalculate_settled_interest_after_statement_close carries a business rule.

That matters because cognitive debt often appears when the code still works locally but the meaning has drifted.

Generated tests can lock in implementation details nobody cares about and miss the weird domain rule everyone assumes is obvious until the new code violates it.

The job is to ask: what would a future maintainer need to know about this behavior to change it safely?

Then write that down as executable pressure.
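Here is a minimal sketch of what that can look like, using the settled-interest example from above. The Statement class, the recalculate_interest function, and the rule itself are hypothetical, invented only to show the shape of an intent-carrying test. They are not taken from any real system.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Statement:
    """Hypothetical billing statement; the field names are illustrative only."""
    balance: Decimal
    settled_interest: Decimal
    closed: bool = False


def recalculate_interest(statement: Statement, rate: Decimal) -> Decimal:
    """Return the interest to report for a statement.

    Assumed business rule: once a statement is closed, its settled
    interest is final and must not change when the rate changes.
    """
    if statement.closed:
        return statement.settled_interest
    return (statement.balance * rate).quantize(Decimal("0.01"))


def test_does_not_recalculate_settled_interest_after_statement_close():
    closed = Statement(Decimal("1000.00"), Decimal("12.50"), closed=True)
    # A later rate change must not rewrite interest that was already settled.
    assert recalculate_interest(closed, Decimal("0.02")) == Decimal("12.50")


def test_recalculates_interest_while_statement_is_open():
    open_statement = Statement(Decimal("1000.00"), Decimal("12.50"))
    # Open statements still follow the current rate: 1000.00 * 0.02 = 20.00.
    assert recalculate_interest(open_statement, Decimal("0.02")) == Decimal("20.00")
```

If someone later "simplifies" the closed-statement branch away, the first test fails with a name that tells them which rule they just broke.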

review the prompt-shaped diff

One practical habit I like is reviewing AI-assisted code as a prompt-shaped diff.

Do not only ask "is this diff correct?"

Ask:

- What instruction probably produced this shape?
- Did the tool optimize for speed, symmetry, generic best practice, or actual system constraints?
- Did it introduce an abstraction because the problem needed one, or because generated code loves tidy patterns?
- Did it preserve the naming and architecture of the existing codebase?
- Did it explain uncertainty, or only present the final answer?

This is not about being suspicious for fun. AI tools have a style: they smooth over awkward local history, generalize, create reasonable-looking helpers, and sound confident about code paths they have not lived with.

That can be useful.

It can also sand away the weird but important parts of your system.

Good reviewers protect those parts.

the career angle is obvious

The durable engineering skill in the AI era is preserving understanding while output increases.

That sounds less exciting than "10x developer with agents," but it is much closer to the job. Senior engineers know which constraints are real, which shortcuts are acceptable, and which beautiful refactor will make next quarter miserable.

AI makes that judgment more important, not less.

If your value was mostly producing boilerplate, the tools are coming for that. If your value is understanding systems deeply enough to change them safely, the tools can make you more powerful.

But only if you refuse to become a passive merge button.

The best engineers I know use AI like a fast junior engineer with no production memory. Helpful, occasionally brilliant, and not someone you let redefine the architecture alone.

what i would measure

If a team is serious about avoiding cognitive debt, I would not start with a giant policy.

I would start with a few boring signals:

- How often do reviewers ask for rationale, not just code changes?
- How many AI-assisted PRs include the tradeoff in the description?
- How often do generated tests encode business rules instead of implementation details?
- How many modules have only one person who can safely explain them?
- How often does a team revert code because nobody understood the edge case?
- How often does an incident reveal that a "small generated change" crossed a hidden boundary?
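The "only one person" signal is probably the easiest one to approximate. Below is a rough sketch, assuming you can already export (module, author) pairs from your version-control history. The function name, the min_commits threshold, and the sample data are all made up for illustration. Commit authorship is also only a weak proxy for "can safely explain": someone can understand a module without committing to it, and the reverse.

```python
from collections import defaultdict
from typing import Iterable


def single_owner_modules(
    commits: Iterable[tuple[str, str]], min_commits: int = 3
) -> dict[str, str]:
    """Map each module touched by exactly one author to that author.

    Modules with fewer than min_commits commits are skipped, so that
    brand-new or rarely touched files do not dominate the report.
    """
    authors_by_module: dict[str, set[str]] = defaultdict(set)
    commit_counts: dict[str, int] = defaultdict(int)
    for module, author in commits:
        authors_by_module[module].add(author)
        commit_counts[module] += 1
    return {
        module: next(iter(authors))
        for module, authors in authors_by_module.items()
        if len(authors) == 1 and commit_counts[module] >= min_commits
    }


# Made-up sample history: (module path, commit author).
SAMPLE_COMMITS = [
    ("billing/interest.py", "ana"),
    ("billing/interest.py", "ana"),
    ("billing/interest.py", "ana"),
    ("api/routes.py", "ana"),
    ("api/routes.py", "bo"),
    ("api/routes.py", "bo"),
    ("utils/dates.py", "bo"),
]

print(single_owner_modules(SAMPLE_COMMITS))
# prints {'billing/interest.py': 'ana'}
```

I would treat the output as a list of conversations to have, not a score to optimize.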

None of this is perfect. Measurement can get silly quickly. But the absence of measurement means the only thing you see is output: PR count, line count, ticket throughput, cycle time.

Those metrics can all improve while comprehension gets worse.

the practical rule

My rule is simple:

AI can write the first draft, but the team must own the final mental model.

That means generated code should come with enough human-shaped context to maintain it. Why this design. What invariant matters. What should not be "cleaned up" later. What test captures the intent. What rollback exists if the tool was confidently wrong.
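If you want a nudge rather than a policy, that list can become a small check on the PR description. This is only a sketch under my own assumptions: the section labels and the missing_context function are hypothetical, and matching on a "Label:" prefix is deliberately crude. It can tell you a section exists, not whether a human actually thought about it.

```python
# Hypothetical section labels, one per piece of context listed above.
REQUIRED_SECTIONS = (
    "Why this design",
    "Invariant",
    "Do not clean up",
    "Intent test",
    "Rollback",
)


def missing_context(description: str) -> list[str]:
    """Return the required section labels that the PR description lacks.

    A section counts as present if "<label>:" appears anywhere in the
    description, ignoring case.
    """
    lowered = description.lower()
    return [
        label
        for label in REQUIRED_SECTIONS
        if f"{label.lower()}:" not in lowered
    ]


# Made-up PR description for illustration.
EXAMPLE_DESCRIPTION = """\
Why this design: reuse the existing retry wrapper instead of adding a new one.
Invariant: settled interest never changes after statement close.
Rollback: revert this PR; no schema changes involved.
"""

print(missing_context(EXAMPLE_DESCRIPTION))
# prints ['Do not clean up', 'Intent test']
```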

This does not need to slow everything down. Often, it is a few PR sentences and one better test name.

But it changes the posture.

You are no longer asking, "Can the model produce working code?"

You are asking, "Can our team still understand the system after accepting this?"

That is the question worth keeping.

AI-assisted development is going to keep getting faster. The code will keep coming. The demos will keep looking magical. The pricing pages will keep promising more output per engineer.

Fine.

Just remember that the codebase is not the asset by itself. The asset is the codebase plus the team's ability to reason about it.

When those two drift apart, you are not moving faster.

You are borrowing understanding from the future.

And the future always sends an invoice.