DEV Community

Julien Avezou

Your Codebase Has Technical Debt. But Does Your Team Have Comprehension Debt?

How AI accelerates hidden knowledge gaps

Have you ever been thrown into an unfamiliar codebase while deadlines got tighter, stress levels rose, and incidents became harder to resolve?

I have, and it's not a pleasant experience.

This got me thinking about a kind of debt engineering teams rarely measure.

Not technical debt.

Something more subtle:

comprehension debt.

I think of comprehension debt as the gap between how fast a system changes and how well the team understands it.

And AI is making this gap more important.

AI didn't create the problem.

This problem existed long before AI.

Teams have always struggled with knowledge silos, undocumented systems, fragile ownership, and “only one person knows how this works” situations.

But AI can accelerate the problem.

When AI helps us write, refactor, and ship code faster, the codebase can evolve faster than the team’s shared understanding.

That is useful when paired with strong review, explanation, documentation, and ownership.

But dangerous when it turns into:

“The code changed, but nobody really understands the system better.”

That is the kind of risk I wanted to make more visible.

What if I could quantify comprehension debt somehow? At least to a certain degree of approximation.

I started exploring the different variables and components that would impact comprehension, for better or worse.

Based on my experience, conversations with other engineers, and patterns I’ve seen across teams, I started building a scoring methodology to approximate comprehension debt.

My goal here is to help engineering teams spot where critical or highly connected systems are changing faster than the team understands them.


What is comprehension debt?

Let's start with a more formal definition:

Comprehension debt rises when system impact, complexity, dependency surface area, change velocity, and AI-assisted change speed outpace team understanding, coverage, redundancy, documentation, and human ownership.

I am not basing this just on theory.

I experienced the negative effects of high comprehension debt recently in one of my past teams.

It was a stressful and demoralizing experience.

I constantly felt behind and was scrambling to keep up.

Incidents were getting worse and harder to resolve over time because the level of system understanding across the team was too low.

The code existed.

The services existed.

The tickets kept moving.

But the shared mental model of the system was not keeping up.

That is a hard place to work from.

You feel reactive all the time.

You are not just debugging the incident.

You are debugging your own lack of context.


Why AI makes this worth measuring now

AI-assisted development can be incredibly useful.

I use AI myself, quite frequently.

But I keep coming back to this question:

Are we increasing shipping velocity without increasing understanding velocity?

Because those are not the same thing.

A team can ship more code and still understand less of the system over time.

AI can help generate implementation options, refactor code, explain files, write tests, and speed up repetitive work.

But if AI-assisted changes are merged without enough human explanation, review, documentation, or ownership, comprehension debt can accumulate faster.

The issue is not:

“Did AI write this?”

The better question is:

“Can the team still explain, review, modify, deploy, and recover this system safely?”


How the debt score is calculated

This is not meant to be a perfect mathematical model.

It is an attempt to make an invisible engineering risk visible enough to discuss, compare, and improve.

At a high level, the score combines two sides:

1) System pressure

These factors increase comprehension debt:

  • high criticality
  • high complexity
  • high change velocity
  • high incident sensitivity
  • high dependency surface area
  • high ownership concentration
  • high AI acceleration risk when AI is used without strong human guardrails

2) Team comprehension coverage

These factors reduce comprehension debt:

  • more safe modifiers (engineers who can safely change the system)
  • more clear explainers (engineers who can clearly explain how it works)
  • stronger reviewer redundancy
  • better documentation
  • more recent hands-on exposure
  • stronger on-call familiarity
  • better engineer-system capability scores

So the rough model is:

Comprehension Debt =
    (system pressure
     + dependency pressure
     + ownership concentration
     + optional AI acceleration)
  minus
    (human coverage
     + documentation quality
     + reviewer redundancy
     + recent exposure
     + operational familiarity)
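As a rough illustration, the model above could be sketched in Python like this. The 1-5 input scale, the equal weighting of every term, and all names (`SystemScores`, `comprehension_debt`) are assumptions made for this example; they are not the formulas used in the template.

```python
from dataclasses import dataclass


@dataclass
class SystemScores:
    """Hypothetical per-system inputs, each assumed to be rated 1-5."""
    # Pressure side: higher values raise comprehension debt.
    system_pressure: float
    dependency_pressure: float
    ownership_concentration: float
    # Coverage side: higher values lower comprehension debt.
    human_coverage: float
    documentation_quality: float
    reviewer_redundancy: float
    recent_exposure: float
    operational_familiarity: float
    # Optional term: 0 means AI acceleration is not being scored.
    ai_acceleration: float = 0.0


def comprehension_debt(s: SystemScores) -> float:
    """Sum of pressure terms minus sum of coverage terms (equal weights assumed)."""
    pressure = (
        s.system_pressure
        + s.dependency_pressure
        + s.ownership_concentration
        + s.ai_acceleration
    )
    coverage = (
        s.human_coverage
        + s.documentation_quality
        + s.reviewer_redundancy
        + s.recent_exposure
        + s.operational_familiarity
    )
    return pressure - coverage


# Made-up example: a critical service known well by only one engineer.
payments = SystemScores(
    system_pressure=5,
    dependency_pressure=4,
    ownership_concentration=5,
    human_coverage=2,
    documentation_quality=2,
    reviewer_redundancy=1,
    recent_exposure=3,
    operational_familiarity=2,
    ai_acceleration=3,
)
print(comprehension_debt(payments))  # 17 pressure - 10 coverage = 7
```

The absolute number means little on its own. It is more useful for ranking systems against each other and for watching how one system's score moves over time.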

To detect dangerous gaps, the scoring sheet also runs a Minimum Viable Coverage (MVC) check on critical systems.

For a critical system, the sheet checks whether it has:

  • at least 2 safe modifiers
  • at least 2 capable reviewers
  • documentation quality of 3 or higher
  • recent hands-on exposure of 3 or higher
  • on-call familiarity of 3 or higher

If any of these is missing, the system gets an MVC gap.

A critical system with an MVC gap should be flagged even if the overall debt score looks moderate.
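The check above can be sketched as a small function. This is a minimal illustration, not the spreadsheet's actual logic: the `Coverage` record, its field names, and `mvc_gaps` are hypothetical, and only the thresholds come from the list above.

```python
from dataclasses import dataclass


@dataclass
class Coverage:
    """Hypothetical coverage record for one system."""
    critical: bool
    safe_modifiers: int         # engineers who can safely change the system
    capable_reviewers: int      # engineers who can meaningfully review changes
    documentation_quality: int  # rating; the check expects 3 or higher
    recent_exposure: int        # rating; the check expects 3 or higher
    oncall_familiarity: int     # rating; the check expects 3 or higher


def mvc_gaps(c: Coverage) -> list[str]:
    """Return the names of unmet thresholds; an empty list means no MVC gap."""
    if not c.critical:
        return []  # the check only applies to critical systems
    checks = {
        "safe_modifiers": c.safe_modifiers >= 2,
        "capable_reviewers": c.capable_reviewers >= 2,
        "documentation_quality": c.documentation_quality >= 3,
        "recent_exposure": c.recent_exposure >= 3,
        "oncall_familiarity": c.oncall_familiarity >= 3,
    }
    return [name for name, ok in checks.items() if not ok]


# Made-up example: decent docs and reviewers, but a single safe modifier
# and weak on-call familiarity.
billing = Coverage(
    critical=True,
    safe_modifiers=1,
    capable_reviewers=2,
    documentation_quality=3,
    recent_exposure=4,
    oncall_familiarity=2,
)
print(mvc_gaps(billing))  # ['safe_modifiers', 'oncall_familiarity']
```

Returning the names of the failed checks, rather than a single true/false flag, keeps the result actionable: it says which kind of coverage to build first.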


Discussion

I am very open to feedback on how the methodology could be improved.

Also curious:

How does your team maintain understanding of the codebase?

What signals tell you that your team is starting to lose understanding of a system?

And have you noticed AI changing the speed at which your systems evolve compared to the speed at which your team understands them?

I think this is going to become a much bigger engineering leadership problem as AI generation and automation accelerate.

We are getting better at generating code.

But we still need to get better at preserving shared understanding.

Get the template

To make this easier to reason about, I turned the methodology into a spreadsheet-first template:
System Comprehension Heatmap

It includes:

  • System Inventory
  • Engineer-System Matrix
  • Overview Dashboard
  • Risk Recommendations
  • optional AI acceleration scoring
  • Minimum Viable Coverage checks
  • quick-start PDF guide

You can get it for free here

I would love feedback.

Top comments (45)

Aryan Choudhary

This resonates a lot honestly, even though I haven’t worked in a large enough engineering environment yet to fully experience comprehension debt at scale.

But I have started noticing smaller versions of this feeling already while transitioning from side projects/tutorial-style development into existing systems at my previous work/smaller projects.

That feeling of:
“The code exists… but the mental model doesn’t yet.”

And I think your point about AI accelerating implementation faster than understanding is especially important. Because it’s very easy to mistake “things are moving faster” for “the team understands the system better.”

Your line:
“You are debugging your own lack of context.”
…hit particularly hard.

Curious though:
Do you think comprehension debt is mostly a scaling problem, or can small teams/startups accumulate it just as dangerously because things move faster and documentation/knowledge sharing gets deprioritized?

Julien Avezou

I am glad this resonated with you Aryan.

To answer your question, having worked in both a small team and a bigger tech organisation, I think that comprehension debt is definitely a problem for both, even though the complexity of solving it does increase with scale.

Here is more context for you:
For a larger org, there are so many changes happening on a daily basis from numerous teams. Teams change over time, whether through internal moves or people joining and leaving. Legacy codebases start getting bloated. Suddenly a large incident hits and costs the company a lot of money because of changes nobody understands. Just like technical debt, comprehension debt becomes exponentially larger and more difficult to manage at scale.
For a smaller team, it is important to move fast and keep iterating to find the flow that works, so it's arguably acceptable not to be bogged down by heavy processes. However, it's important to lay healthy foundations early on so that your founding engineering team sets a culture of knowledge sharing. In a smaller team, if your star engineer with most of the knowledge suddenly leaves your startup, you have a big comprehension gap. So comprehension debt is worth tracking early, even if that is as simple as promoting more knowledge sharing and keeping an audit trail of changes and light documentation of your systems.

Aryan Choudhary

This makes a lot of sense honestly.

The “star engineer leaves and suddenly there’s a huge comprehension gap” example especially made the problem click for me in a much more concrete way. Because yeah, even small teams can probably operate on invisible assumptions for a long time without noticing the risk buildup underneath.

I also like that you’re not framing this as “slow everything down and document every atom,” but more as building healthy knowledge-sharing habits early before scale amplifies the gaps.

And the AI angle keeps making this feel more important to me the more I think about it. Because if implementation speed keeps increasing, then shared understanding almost becomes its own scaling bottleneck.

Really appreciate the detailed response btw. Feels like one of those concepts that becomes more obvious the moment someone finally gives it a name.

Mykola Kondratiuk

honestly this hits harder with AI agents in the mix. they can refactor a module in 3 minutes. catching up to what actually changed and why it's safe to ship still takes days.

Julien Avezou

True. Teams will increasingly be composed of both agents and human developers, so ensuring that knowledge is shared sustainably becomes essential.
Do you have any learnings to share on how to achieve this?
In one of my previous posts you mentioned adding friction/thinking where it matters to ensure the right questions are being considered: 'one hard block before architecture lock'. Curious if you have other recommendations.

Mykola Kondratiuk

The most durable thing I have tried is a one-para intent file per module - not what the code does but what we explicitly ruled out and why. Agents can surface it during a refactor, humans read it at review. Keeps the friction point you mentioned alive without needing a Confluence doc nobody opens after day one.

Julien Avezou

Nice recommendation, makes sense, thanks.

Mykola Kondratiuk

glad it resonated — the ruled-out-and-why part is the bit that actually survives agent turns. most everything else gets re-derived from the code anyway.

Rasmus Ros

Useful framing. It maps to a real failure mode.

My main concern is false precision. A scorecard can imply understanding is in better shape than it is. Two reviewers can share the same blind spots. Docs can exist and still be stale. Recent exposure does not mean someone can reason through failures under pressure.

The harder problem is distribution of understanding, not just counts. A team can look covered on paper and still be fragile if the real mental model lives with the same few people.

I would treat the score as a prompt for deeper review, not evidence that risk is controlled.

Also worth looking at signals from the review system itself. In Gerrit, for example, you can inspect who actually reviews what, who repeatedly becomes the bottleneck, who gets bypassed on risky areas, and where approval patterns are too concentrated. That may give you a better read on comprehension concentration than self-reported coverage fields alone.

Julien Avezou

Good points, thanks for sharing.
I agree that relying on the score itself could be misleading. Using it as a prompt for deeper review, as you suggest, is the safer way to use the scorecard.
I attempt to cover distribution of understanding with the Minimum Viable Coverage component.
Self-reported coverage has its limitations. You mention several here. This scorecard can be used as an awareness tool to prompt action where otherwise none would be taken.

Rasmus Ros

That makes sense. Thanks for clarifying how you intend the scorecard to be used and where Minimum Viable Coverage fits. Used as an awareness tool and a prompt for deeper review, it feels much more grounded :)

Andrii Krugliak

Comprehension debt is the right name for this. The part that bites me: when I run agents across a codebase, the agent forgets the decisions between sessions the same way the team does, so I started making it write a one-line why into the PR before it can merge. Cut my re-reading time on incidents by maybe half.

Julien Avezou

I really like that process. It doesn't cost much to do in the moment, but can save plenty of headaches down the road.

Andrii Krugliak

The cheap part is what makes it stick. Any practice that needs its own tool or ritual ends up competing for attention and usually loses. A one-line "why" in the PR survives because it rides infrastructure the team already reads. The failure mode I still hit: the agent writes the why to clear the gate, not to inform the next session. I started grading those lines in review the same as code, which helped more than I expected.

Peter Witham

This is such a great "hit hard" topic that needs all the attention it can get.

I have somewhat of a running joke with developer friends that AI is the new product team when it comes to making projects move faster than the engineering teams can track and understand.

I think it falls very much in line with those wonderful comments we get like "it's just a one line change" and ship it, without ever stopping to evaluate whether it really does have a greater impact, let alone what that one-line change means for the bigger picture.

I think it also should have a warning sticker that says "Have you thought about technical debt" every time.

Julien Avezou

Hey Peter, I am glad this post resonated with you.
It's true! AI is the product team that never rests haha
Agree, a quick productivity increase can hide longer term effects that are not always positive if tech or comprehension debt accumulates.

Suny Choudhary

This is a strong framing.

Technical debt is visible because it eventually shows up in slow builds, messy code, bugs, or painful refactors. Comprehension debt is quieter. It shows up when only one person understands why something exists, when new developers are afraid to touch a module, or when the team can change code but cannot explain the consequences.

AI can make this worse if teams are not careful. It can generate code faster than the team can understand it, which means the codebase grows while shared understanding shrinks.

The real test is not “can we ship the change?” It is “can the team explain the system after the change?”

Julien Avezou

Well said, Suny! Comprehension debt is indeed hard to track; this post is my attempt to measure it.

Siyu

The distinction between technical debt and comprehension debt is the strongest part of your argument. While technical debt is visible and can be logged, comprehension debt builds silently and erodes a team's ability to debug or refactor effectively. Velocity without understanding produces brittle systems that are painful to maintain. Making code explainability a merge requirement is a practical and necessary counterbalance.

Julien Avezou

Agree. Adding friction where it matters can be justified in this context if it helps keep comprehension debt low.

mote

The comprehension debt framing is underrated. I've seen teams with "clean code" that still take weeks to onboard new engineers because the domain knowledge lives in people's heads, not the codebase.

Code that's easy to delete requires another dimension: narrativity. The code should tell a story about why decisions were made, not just what the code does. That context is what lets future maintainers make informed changes without breaking implicit contracts.

How do you measure comprehension debt in practice? Just code review feedback time, or something more systematic?

Julien Avezou

Agree. I put forward in this post some metrics to estimate comprehension debt levels:

Comprehension Debt =
    (System pressure
     + dependency pressure
     + ownership concentration
     + optional AI acceleration)
  minus
    (Human coverage
     + documentation quality
     + reviewer redundancy
     + recent exposure
     + operational familiarity)
 
Mininglamp

AI makes code production faster, but understanding doesn't scale with output speed. Have seen teams ship 3x more PRs with AI assistance while incident resolution time goes up — fewer people truly understand the changes being made. The hard part with any comprehension debt metric is getting teams to measure it before a production incident forces them to.

Julien Avezou

Agree. Do you have any metrics you track on your end to measure comprehension debt?

S M Tahosin

"Comprehension debt" is the perfect term for this. I've seen teams rewrite perfectly functional code simply because the original authors left and the remaining team couldn't understand the mental model behind it. If the code is clean but the team's mental map is outdated, the velocity still drops to zero. How do you usually recommend teams measure this before it becomes critical?

Julien Avezou

Ensure your teams have good knowledge-sharing practices, solid manual code review where necessary, and no critical system that is owned by only one engineer.
Here are some factors to look at and attempt to measure:

  • high criticality
  • high complexity
  • high change velocity
  • high incident sensitivity
  • high dependency surface area
  • high ownership concentration
  • high AI acceleration risk when AI is used without strong human guardrails