DEV Community

Your Codebase Has Technical Debt. But Does Your Team Have Comprehension Debt?

Julien Avezou on May 18, 2026

Have you ever been thrown into an unfamiliar codebase while deadlines got tighter, stress levels rose, and incidents became harder to resolve? I h...


Aryan Choudhary • May 18

This resonates a lot honestly, even though I haven’t worked in a large enough engineering environment yet to fully experience comprehension debt at scale.

But I have started noticing smaller versions of this feeling already while transitioning from side projects/tutorial-style development into existing systems at my previous work/smaller projects.

That feeling of:
“The code exists… but the mental model doesn’t yet.”

And I think your point about AI accelerating implementation faster than understanding is especially important. Because it’s very easy to mistake “things are moving faster” for “the team understands the system better.”

Your line:
“You are debugging your own lack of context.”
…hit particularly hard.

Curious though:
Do you think comprehension debt is mostly a scaling problem, or can small teams/startups accumulate it just as dangerously because things move faster and documentation/knowledge sharing gets deprioritized?

Julien Avezou • May 18

I am glad this resonated with you Aryan.

To answer your question, having worked in both a small team and a bigger tech organisation, I think that comprehension debt is definitely a problem for both, even though the complexity of solving it does increase with scale.

Here is more context for you:
For a larger org, there are so many changes happening on a daily basis from numerous teams. Teams change over time, through internal or external movement of people. Legacy codebases start getting bloated. Suddenly a large incident hits and costs the company a lot of money because of changes nobody understands. Just like technical debt, comprehension debt becomes much larger and more difficult to manage at scale.
For a smaller team it is important to move fast and keep iterating to find the flow that works, so it's arguably acceptable not to be bogged down by heavy processes. However, it's important to lay healthy foundations early on so that your founding engineering team sets a culture of knowledge sharing. In a smaller team, if your star engineer with most of the knowledge suddenly leaves your startup, then you have a big comprehension gap. So comprehension debt is important to keep track of early, even if it's as simple as promoting more knowledge sharing and keeping an audit trail of changes and light documentation of your systems.

Aryan Choudhary • May 19

This makes a lot of sense honestly.

The “star engineer leaves and suddenly there’s a huge comprehension gap” example especially made the problem click for me in a much more concrete way. Because yeah, even small teams can probably operate on invisible assumptions for a long time without noticing the risk buildup underneath.

I also like that you’re not framing this as “slow everything down and document every atom,” but more as building healthy knowledge-sharing habits early before scale amplifies the gaps.

And the AI angle keeps making this feel more important to me the more I think about it. Because if implementation speed keeps increasing, then shared understanding almost becomes its own scaling bottleneck.

Really appreciate the detailed response btw. Feels like one of those concepts that becomes more obvious the moment someone finally gives it a name.

Mykola Kondratiuk • May 19

honestly this hits harder with AI agents in the mix. they can refactor a module in 3 minutes. catching up to what actually changed and why it's safe to ship still takes days.

Julien Avezou • May 19

True. Teams will be increasingly composed of both agents and human developers, and ensuring that knowledge is shared sustainably between them will matter more and more.
Do you have any learnings to share on how to achieve this?
In one of my previous posts you mentioned adding friction/thinking where it matters to ensure the right questions are being considered: 'one hard block before architecture lock'. Curious if you have other recommendations.

Mykola Kondratiuk • May 19

The most durable thing I have tried is a one-para intent file per module - not what the code does but what we explicitly ruled out and why. Agents can surface it during a refactor, humans read it at review. Keeps the friction point you mentioned alive without needing a Confluence doc nobody opens after day one.
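A minimal sketch of how the intent-file habit could be checked automatically. The file name `INTENT.md`, the one-directory-per-module layout, and the function name are all assumptions made for illustration, not something described above.

```python
from pathlib import Path

# Hypothetical convention: every module directory carries an INTENT.md
# that records what was explicitly ruled out and why.
INTENT_FILENAME = "INTENT.md"


def modules_missing_intent(root: Path) -> list:
    """Return names of immediate subdirectories of `root` that contain
    Python files but no intent file."""
    missing = []
    for module_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        has_code = any(module_dir.glob("*.py"))
        has_intent = (module_dir / INTENT_FILENAME).is_file()
        if has_code and not has_intent:
            missing.append(module_dir.name)
    return missing
```

A check like this only proves the file exists. Whether it really says what was ruled out still has to be read at review.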

Julien Avezou • May 19

Nice recommendation, makes sense, thanks.

Mykola Kondratiuk • May 19

glad it resonated — the ruled-out-and-why part is the bit that actually survives agent turns. most everything else gets re-derived from the code anyway.

S M Tahosin • May 24

"Comprehension debt" is the perfect term for this. I've seen teams rewrite perfectly functional code simply because the original authors left and the remaining team couldn't understand the mental model behind it. If the code is clean but the team's mental map is outdated, the velocity still drops to zero. How do you usually recommend teams measure this before it becomes critical?

Julien Avezou • May 24

Ensure your teams have good knowledge sharing practices in place, that there are solid manual code review processes in place where necessary and that there is no critical system that is only owned by one engineer.
Here are some factors to look at and attempt to measure:

high criticality
high complexity
high change velocity
high incident sensitivity
high dependency surface area
high ownership concentration
high AI acceleration risk when AI is used without strong human guardrails

Rasmus Ros • May 18

Useful framing. It maps to a real failure mode.

My main concern is false precision. A scorecard can imply understanding is in better shape than it is. Two reviewers can share the same blind spots. Docs can exist and still be stale. Recent exposure does not mean someone can reason through failures under pressure.

The harder problem is distribution of understanding, not just counts. A team can look covered on paper and still be fragile if the real mental model lives with the same few people.

I would treat the score as a prompt for deeper review, not evidence that risk is controlled.

Also worth looking at signals from the review system itself. In Gerrit, for example, you can inspect who actually reviews what, who repeatedly becomes the bottleneck, who gets bypassed on risky areas, and where approval patterns are too concentrated. That may give you a better read on comprehension concentration than self-reported coverage fields alone.
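As a rough sketch of that kind of signal, the snippet below computes how concentrated approvals are per code area. It assumes the review history has already been exported as `(area, reviewer)` pairs. The export step, the sample data, and the function name are hypothetical, and no Gerrit API is called here.

```python
from collections import Counter, defaultdict


def reviewer_concentration(reviews):
    """Map each area to (top_reviewer, share_of_approvals).

    `reviews` is an iterable of (area, reviewer) pairs, one per approved change.
    """
    by_area = defaultdict(Counter)
    for area, reviewer in reviews:
        by_area[area][reviewer] += 1

    result = {}
    for area, counts in by_area.items():
        top_reviewer, top_count = counts.most_common(1)[0]
        result[area] = (top_reviewer, top_count / sum(counts.values()))
    return result


# Hypothetical, hand-written sample; real data would come from the review system.
sample = [
    ("payments", "alice"), ("payments", "alice"),
    ("payments", "alice"), ("payments", "bob"),
    ("search", "carol"), ("search", "dave"),
]

for area, (reviewer, share) in reviewer_concentration(sample).items():
    print(f"{area}: {reviewer} approved {share:.0%} of changes")
```

A high share for one reviewer in a risky area is a prompt to look closer, not proof of a problem.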

Julien Avezou • May 18

Good points, thanks for sharing.
I agree that relying on the score itself could be misleading. Using it as a prompt for deeper review is the safer way to use the scorecard, as you suggest.
I attempt to cover distribution of understanding with the Minimum Viable Coverage component.
Self-reported coverage has its limitations. You mention several here. This scorecard can be used as an awareness tool to prompt action where otherwise none would be taken.

Rasmus Ros • May 19

That makes sense. Thanks for clarifying how you intend the scorecard to be used and where Minimum Viable Coverage fits. Used as an awareness tool and a prompt for deeper review, it feels much more grounded :)

Andrii Krugliak • May 20

Comprehension debt is the right name for this. The part that bites me: when I run agents across a codebase, the agent forgets the decisions between sessions the same way the team does, so I started making it write a one-line why into the PR before it can merge. Cut my re-reading time on incidents by maybe half.
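One way such a merge gate might look, as a sketch only. The `Why:` prefix, the five-word minimum, and the function name are assumptions chosen for illustration. How the check is wired into a CI job or merge rule is left out.

```python
import re
from typing import Optional

# Hypothetical convention: the PR description has a line starting with "Why:".
WHY_LINE = re.compile(r"^why:[ \t]*(?P<reason>.+)$", re.IGNORECASE | re.MULTILINE)
MIN_WORDS = 5  # arbitrary threshold, an assumption for this sketch


def extract_why(description: str) -> Optional[str]:
    """Return the one-line 'why' from a PR description, or None if it is
    missing or too short to be useful."""
    match = WHY_LINE.search(description)
    if match is None:
        return None
    reason = match.group("reason").strip()
    return reason if len(reason.split()) >= MIN_WORDS else None
```

A length check cannot tell whether the line is informative, so a human still needs to read it.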

Julien Avezou • May 20

I really like that process. Doesn't cost much to do in the moment, but can save plenty of headaches down the road.

Andrii Krugliak • May 21

The cheap part is what makes it stick. Any practice that needs its own tool or ritual ends up competing for attention and usually loses. A one-line "why" in the PR survives because it rides infrastructure the team already reads. The failure mode I still hit: the agent writes the why to clear the gate, not to inform the next session. I started grading those lines in review the same as code, which helped more than I expected.

Peter Witham • May 20

This is such a great "hit hard" topic that needs all the attention it can get.

I have somewhat of a running joke with developer friends that AI is the new product team when it comes to making projects move faster than the engineering teams can track and understand.

I think it falls very much in line with those wonderful comments we get like "it's just a one-line change" and ship it, without ever stopping to evaluate whether it really does have a greater impact, let alone what that one-line change means for the bigger picture.

I think it also should have a warning sticker that says "Have you thought about technical debt" every time.

Julien Avezou • May 20

Hey Peter, I am glad this post resonated with you.
It's true! AI is the product team that never rests haha
Agree, a quick productivity increase can hide longer term effects that are not always positive if tech or comprehension debt accumulates.

Suny Choudhary • May 21

This is a strong framing.

Technical debt is visible because it eventually shows up in slow builds, messy code, bugs, or painful refactors. Comprehension debt is quieter. It shows up when only one person understands why something exists, when new developers are afraid to touch a module, or when the team can change code but cannot explain the consequences.

AI can make this worse if teams are not careful. It can generate code faster than the team can understand it, which means the codebase grows while shared understanding shrinks.

The real test is not “can we ship the change?” It is “can the team explain the system after the change?”

Julien Avezou • May 21

Well said Suny! Comprehension debt is hard to track indeed, I am attempting here to measure it.

mote • May 23

The comprehension debt framing is underrated. I've seen teams with "clean code" that still take weeks to onboard new engineers because the domain knowledge lives in people's heads, not the codebase.

Code that's easy to delete requires another dimension: narrativity. The code should tell a story about why decisions were made, not just what the code does. That context is what lets future maintainers make informed changes without breaking implicit contracts.

How do you measure comprehension debt in practice? Just code review feedback time, or something more systematic?

Julien Avezou • May 23

Agree. I put forward in this post some metrics to estimate comprehension debt levels:

Comprehension Debt =

System pressure

dependency pressure
ownership concentration
optional AI acceleration

minus

Human coverage

documentation quality
reviewer redundancy
recent exposure
operational familiarity
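A rough sketch of how this could be turned into a number. It rests on assumptions that go beyond the comment above: "System pressure" and "Human coverage" are read as group headings, each factor is rated by hand on a 0 to 3 scale, factors are weighted equally, and all names are illustrative.

```python
# Factor names follow the list above; the grouping and the 0-3 scale are assumptions.
PRESSURE_FACTORS = (
    "dependency_pressure",
    "ownership_concentration",
    "ai_acceleration",  # optional: treated as 0 when not rated
)
COVERAGE_FACTORS = (
    "documentation_quality",
    "reviewer_redundancy",
    "recent_exposure",
    "operational_familiarity",
)


def comprehension_debt(ratings: dict) -> int:
    """Sum of pressure ratings minus sum of coverage ratings.

    Missing pressure factors count as 0; coverage factors are required.
    """
    pressure = sum(ratings.get(name, 0) for name in PRESSURE_FACTORS)
    coverage = sum(ratings[name] for name in COVERAGE_FACTORS)
    return pressure - coverage


# Hypothetical ratings for one subsystem.
example = {
    "dependency_pressure": 3,
    "ownership_concentration": 3,
    "ai_acceleration": 2,
    "documentation_quality": 1,
    "reviewer_redundancy": 1,
    "recent_exposure": 2,
    "operational_familiarity": 1,
}
print(comprehension_debt(example))  # pressure 8 - coverage 5 = 3
```

The absolute value means little on its own. Comparing subsystems rated by the same people is the more useful reading.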

Mininglamp • May 20

AI makes code production faster, but understanding doesn't scale with output speed. Have seen teams ship 3x more PRs with AI assistance while incident resolution time goes up — fewer people truly understand the changes being made. The hard part with any comprehension debt metric is getting teams to measure it before a production incident forces them to.

Julien Avezou • May 20

Agree. Do you have any metrics you track on your end to measure comprehension debt?

NOVAInetwork • May 19

Comprehension debt is real even as a solo founder. I use AI heavily for implementation but the architectural decisions still have to live in my head. An AI agent can write the handler in minutes. Understanding why the codec layout is structured that way, why this fee constant exists, why this capability gate was added takes much longer.

The pattern I found: if the decision is not documented at the layer where it is enforced, it gets lost. A later AI session will change it because it only sees the local context, not the reason behind the structure.

The fix that worked for me: every protocol decision is recorded where it is enforced, not in a separate doc. Golden-vector tests lock byte layouts. Constants are named for their purpose. Validation hooks carry comments about why they reject, not just what they reject. The codebase becomes its own knowledge base.
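A small sketch of those three habits together. The header layout, the constant, and the rejection reason are invented for illustration and do not describe the commenter's actual protocol.

```python
import struct

# Hypothetical wire format: version (u8), fee (u32), payload length (u16),
# big-endian with no padding, 7 bytes in total.
HEADER_FORMAT = ">BIH"

# Named for its purpose rather than its value (hypothetical constant).
MIN_RELAY_FEE = 1000


def encode_header(version: int, fee: int, payload_len: int) -> bytes:
    if fee < MIN_RELAY_FEE:
        # Why we reject (hypothetical reason): fees under this floor would
        # make it cheap to flood the network with junk messages.
        raise ValueError("fee below MIN_RELAY_FEE")
    return struct.pack(HEADER_FORMAT, version, fee, payload_len)


# Golden vector: the exact bytes for a known input. If a later change
# reorders or resizes a field, this comparison fails loudly.
GOLDEN_HEADER = bytes.fromhex("01000003e80010")


def test_header_layout_is_locked():
    assert encode_header(1, 1000, 16) == GOLDEN_HEADER
```

The golden bytes are checked in next to the encoder, so a session that only sees local context still runs into the decision.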

AI is the best tool I have ever used. But it is a tool. The understanding still has to come from the person building the system.

dannyblinks28 • May 19

Man, this hits close to home. I’ve definitely been on teams where we were pushing out features like crazy, but if a major incident happened on an off-hour, everyone was terrified because only one specific dev actually knew how that legacy service functioned. Now that we're using AI assistants to spin up boilerplate and refactor chunks of code in seconds, that disconnect feels twice as fast. It’s way too easy to review and merge a PR that "works" without actually absorbing how it impacts the broader system architecture. We really need to start treating code readability and team knowledge sharing as critical sprint deliverables, not just nice-to-haves we'll get to later.

Julien Avezou • May 19

I couldn't agree more. I am hearing this issue more and more.
Going through those kinds of major incidents that are tricky to debug is definitely a thought-provoking experience.

Christie Cosky • May 19

I recently asked a developer questions about a feature they wrote and merged 5 weeks ago. They sent me Claude Code answers to every question I sent. I'm not convinced they understand anything they wrote.

I've found that the higher-quality the code I write with AI, the better I understand it. By going through multiple refactoring revisions to improve readability/maintainability and writing easy-to-understand unit tests (also refactored for clarity), I understand it better. Not as well as I understand code I hand-wrote myself, but enough to decrease the comprehension debt. Makes it easier for the code reviewer to understand and easier for me and others to understand it in the future as well.

Julien Avezou • May 19

There is definitely a point to be made about making the AI generated code more readable through explicit rules that are shared across the team. Not as effective as hand-written code but better than just blindly prompting.

Theo Valmis • May 20

One dimension worth adding to the scoring methodology: asymmetric comprehension — where one person holds 90% of the understanding of a system and the rest of the team holds 10%. AI accelerates that asymmetry. When the person with the highest comprehension uses AI to ship faster, the delta between their understanding and everyone else's grows even faster than before.

The debt isn't just collective, it's distributed unevenly — and the person carrying most of the comprehension often doesn't realize how wide the gap has become because they're not the ones feeling the confusion. It shows up as incidents, slow onboarding, and "why does this work?" questions that nobody can answer — not as a number anyone is measuring.

Julien Avezou • May 20

That is an excellent point and observation Theo.
The person with the most comprehension to begin with won't feel the gap the same way as others with less understanding.
The impact of that person using AI to speed up change should be measured with a heavier weight.

Yurii Cherkasov • May 26 • Edited

A very familiar problem.

Imagine a team working purely on vibes for a week (which happens almost everywhere in the industry), for example on a fast-moving mobile app, without stopping to reflect on what actually changed.

Then Monday arrives and suddenly it is a David Blaine moment:

"Guys, what are you writing? - It's a mobile app. Yeah, an Android application. - Look closely, guys. - Oh my god... it's the operating system for car keychains!"

Jokes aside, your "comprehension debt" is something many teams feel but rarely measure: the codebase keeps moving, but the shared mental model does not move at the same speed (or even degrades, as in the case above).

From a practical view, based on my experience leading teams, I would say that teams probably need a multi-layer defense, not a single solution. There's no magic bullet.

The classic layer is still good old code review, but review has to become more than "is this correct?"
It should include: "Can another engineer explain this change?", "Does this alter the solution on the architectural level?", and "Did we update the operational team knowledge?"

Then comes good old documentation, but again, not giant wiki pages that nobody reads. I would prefer lightweight, enforced artifacts: ADRs for architectural decisions, short "how this subsystem works" notes, runbooks for critical flows. It's important to remember that documentation is read by AI as well as people.

Another useful practice could be a comprehension review for critical systems: once per sprint or month, ask engineers to explain a subsystem they do not own. If only one person can explain it, that is a bus-factor risk.

Of course, the modern layer is AI-assisted comprehension. AI can be used not only to generate code, but to continuously explain diffs, summarize PRs, detect undocumented behavior changes, generate onboarding guides, and create "system maps" for repos.

An interesting solution in this specific direction is building a kind of collective team memory around the codebase. Not just classic documentation as static pages, but a searchable engineering memory made from PR discussions, ADRs, incident reports, architecture notes, important Slack/Teams decisions, and AI-reviewed code explanations. Technically, this can be implemented with pgvector on top of PostgreSQL, or with a dedicated vector database if the scale requires it.

The important part is not the database itself, but the workflow: every important engineering decision and explanation becomes retrievable later by semantic search. A new engineer or reviewer could ask: "Why does this service retry this way?", "What incidents involved this subsystem?", "Someone worked on a rendering bug in May 2025 - who was that and what was the solution?"
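To make the store-then-retrieve workflow concrete, here is an in-memory sketch. It is a stand-in, not real semantic search: the `embed` function is a toy bag-of-words counter where a real setup would use a learned embedding model, and a plain list stands in for the vector database. The class name and the sample entries are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class TeamMemory:
    """Stores engineering notes and returns the ones closest to a question."""

    def __init__(self):
        self._entries = []  # list of (source, text, vector)

    def add(self, source: str, text: str) -> None:
        self._entries.append((source, text, embed(text)))

    def search(self, question: str, top_k: int = 1):
        query = embed(question)
        ranked = sorted(
            self._entries, key=lambda e: cosine(query, e[2]), reverse=True
        )
        return [(source, text) for source, text, _ in ranked[:top_k]]


# Hypothetical entries; real ones would come from ADRs, PRs, and incident reports.
memory = TeamMemory()
memory.add(
    "adr-example",
    "The payment service retries with exponential backoff because "
    "the upstream provider rate limits bursts",
)
memory.add(
    "incident-example",
    "Rendering bug in the mobile client caused by a stale image cache",
)
print(memory.search("Why does the payment service retry with backoff?"))
```

Swapping the toy pieces for a real embedding model and a vector store changes the quality of matches, not the shape of the workflow.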

That turns AI from a code generator into a memory interface for the team. It does not replace ownership or review, but it reduces the chance that your knowledge silently disappears.

Julien Avezou • May 26

I like the way you structured your thoughts here. Building a memory interface for AI makes a lot of sense and you suggest some concrete actions. Thanks for the inputs Yurii!

Siyu • May 18 • Edited

The distinction between technical debt and comprehension debt is the strongest part of your argument. While technical debt is visible and can be logged, comprehension debt builds silently and erodes a team's ability to debug or refactor effectively. Velocity without understanding produces brittle systems that are painful to maintain. Making code explainability a merge requirement is a practical and necessary counterbalance.

Julien Avezou • May 18

Agree. Adding friction where it matters can be justified in this context if it helps keep comprehension debt low.

Ethan Walker • May 21

Comprehension debt is the load-bearing concept here. We had a 30-turn LangGraph trace that nobody on the team could debug in under an hour because the graph state was implicit in 5 different node updates. Pulled it apart into explicit state transitions with logging and on-call mean-time-to-resolution dropped in half. The fix wasn't smarter code or better LLMs. It was making the state machine visible to the next engineer who hadn't seen it.
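A sketch of what explicit, logged state transitions can look like in plain Python. It does not use the LangGraph API, and the step names and allowed transitions are invented for illustration.

```python
import logging
from dataclasses import dataclass, field
from enum import Enum

logger = logging.getLogger("agent_state")


class Step(Enum):
    PLAN = "plan"
    RETRIEVE = "retrieve"
    ANSWER = "answer"
    DONE = "done"


# Every legal move is written down in one place (hypothetical workflow).
ALLOWED = {
    Step.PLAN: {Step.RETRIEVE},
    Step.RETRIEVE: {Step.RETRIEVE, Step.ANSWER},
    Step.ANSWER: {Step.DONE},
    Step.DONE: set(),
}


@dataclass
class Run:
    step: Step = Step.PLAN
    history: list = field(default_factory=list)

    def transition(self, target: Step, reason: str) -> None:
        """Move to `target`, logging the change and why it happened."""
        if target not in ALLOWED[self.step]:
            raise ValueError(
                f"illegal transition {self.step.value} -> {target.value}"
            )
        logger.info("%s -> %s (%s)", self.step.value, target.value, reason)
        self.history.append((self.step, target, reason))
        self.step = target
```

With the transition table and the log in one place, the next on-call engineer can replay `history` instead of reconstructing state from scattered updates.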

Julien Avezou • May 21

Great example Ethan!

JB_AMT • May 22

We refer to this concept as Cognitive Load and consider it at individual, team, and org levels. A variety of tooling and automation facilitates shifting CL away from individuals. A simple example is unit tests that allow developers to free cognitive space for creative work and advancing the project.

Julien Avezou • May 22

Very true with the unit tests. Solid testing and rules within the codebase itself support developers a lot in terms of cognitive load. If in a larger org these can be smartly standardised where it makes sense, this prevents context switching too and can free up thinking to focus on other important aspects of the workflow.