Julien Avezou

Posted on May 18

Your Codebase Has Technical Debt. But Does Your Team Have Comprehension Debt?

#ai #softwareengineering #engineeringmanagement #productivity

How AI accelerates hidden knowledge gaps

Have you ever been thrown into an unfamiliar codebase while deadlines got tighter, stress levels rose, and incidents became harder to resolve?

I have, and it's not a pleasant experience.

This got me thinking about a kind of debt engineering teams rarely measure.

Not technical debt.

Something more subtle:

comprehension debt.

I think of comprehension debt as the gap between how fast a system changes and how well the team understands it.

And AI is making this gap more important.

AI didn't create the problem.

This problem existed long before AI.

Teams have always struggled with knowledge silos, undocumented systems, fragile ownership, and “only one person knows how this works” situations.

But AI can accelerate the problem.

When AI helps us write, refactor, and ship code faster, the codebase can evolve faster than the team’s shared understanding.

That is useful when paired with strong review, explanation, documentation, and ownership.

But dangerous when it turns into:

“The code changed, but nobody really understands the system better.”

That is the kind of risk I wanted to make more visible.

What if I could quantify comprehension debt somehow? At least to a certain degree of approximation.

I started exploring the different variables and components that would impact comprehension, for better or worse.

Based on my experience, conversations with other engineers, and patterns I’ve seen across teams, I started building a scoring methodology to approximate comprehension debt.

My goal here is to help engineering teams spot where critical or highly connected systems are changing faster than the team understands them.

What is comprehension debt?

Let's start with a more formal definition:

Comprehension debt rises when system impact, complexity, dependency surface area, change velocity, and AI-assisted change speed outpace team understanding, coverage, redundancy, documentation, and human ownership.

I am not basing this just on theory.

I experienced the negative effects of high comprehension debt recently in one of my past teams.

It was a stressful and demoralizing experience.

I constantly felt behind and was always scrambling to keep up.

Incidents were getting worse and harder to resolve over time because the level of system understanding across the team was too low.

The code existed.

The services existed.

The tickets kept moving.

But the shared mental model of the system was not keeping up.

That is a hard place to work from.

You feel reactive all the time.

You are not just debugging the incident.

You are debugging your own lack of context.

Why AI makes this worth measuring now

AI-assisted development can be incredibly useful.

I use AI myself, quite frequently.

But I keep coming back to this question:

Are we increasing shipping velocity without increasing understanding velocity?

Because those are not the same thing.

A team can ship more code and still understand less of the system over time.

AI can help generate implementation options, refactor code, explain files, write tests, and speed up repetitive work.

But if AI-assisted changes are merged without enough human explanation, review, documentation, or ownership, comprehension debt can accumulate faster.

The issue is not:

“Did AI write this?”

The better question is:

“Can the team still explain, review, modify, deploy, and recover this system safely?”

How the debt score is calculated

This is not meant to be a perfect mathematical model.

It is an attempt to make an invisible engineering risk visible enough to discuss, compare, and improve.

At a high level, the score combines two sides:

1) System pressure

These factors increase comprehension debt:

high criticality
high complexity
high change velocity
high incident sensitivity
high dependency surface area
high ownership concentration
high AI acceleration risk when AI is used without strong human guardrails

2) Team comprehension coverage

These factors reduce comprehension debt:

more safe modifiers
more clear explainers
stronger reviewer redundancy
better documentation
more recent hands-on exposure
stronger on-call familiarity
better engineer-system capability scores

So the rough model is:

Comprehension Debt =
(system pressure
+ dependency pressure
+ ownership concentration
+ optional AI acceleration)

minus

(human coverage
+ documentation quality
+ reviewer redundancy
+ recent exposure
+ operational familiarity)
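
The rough model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every factor is rated on a 1-5 scale and all factors carry equal weight, neither of which is confirmed by the template, and the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SystemRatings:
    # Pressure factors (assumed 1-5 scale, higher = more pressure)
    system_pressure: int
    dependency_pressure: int
    ownership_concentration: int
    ai_acceleration: int = 0  # optional; 0 means "not scored"
    # Coverage factors (assumed 1-5 scale, higher = better coverage)
    human_coverage: int = 1
    documentation_quality: int = 1
    reviewer_redundancy: int = 1
    recent_exposure: int = 1
    operational_familiarity: int = 1


def comprehension_debt(r: SystemRatings) -> int:
    """Sum of pressure factors minus sum of coverage factors (equal weights assumed)."""
    pressure = (r.system_pressure + r.dependency_pressure
                + r.ownership_concentration + r.ai_acceleration)
    coverage = (r.human_coverage + r.documentation_quality
                + r.reviewer_redundancy + r.recent_exposure
                + r.operational_familiarity)
    return pressure - coverage


# Hypothetical example: a critical service that mostly one person understands.
billing = SystemRatings(system_pressure=5, dependency_pressure=4,
                        ownership_concentration=5, ai_acceleration=3,
                        human_coverage=2, documentation_quality=2,
                        reviewer_redundancy=1, recent_exposure=2,
                        operational_familiarity=2)
print(comprehension_debt(billing))  # pressure 17 - coverage 9 = 8
```

The absolute number means little on its own; the value is in comparing systems scored the same way and watching how a system's score moves over time.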

To detect dangerous gaps, the sheet also runs a Minimum Viable Coverage (MVC) check on critical systems.

For a critical system, the sheet checks whether it has:

at least 2 safe modifiers
at least 2 capable reviewers
documentation quality of 3 or higher
recent hands-on exposure of 3 or higher
on-call familiarity of 3 or higher

If any one of these is missing, the system gets an MVC gap.

A critical system with an MVC gap should be flagged even if the overall debt score looks moderate.
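
As a minimal sketch, the Minimum Viable Coverage check could be expressed like this in Python. The thresholds come from the list above; the class, field, and function names are hypothetical, and the rating scale behind "3 or higher" is assumed to be a small integer scale.

```python
from dataclasses import dataclass


@dataclass
class Coverage:
    safe_modifiers: int         # number of engineers who can safely modify the system
    capable_reviewers: int      # number of engineers who can review changes to it
    documentation_quality: int  # rating (assumed integer scale)
    recent_exposure: int        # rating (assumed integer scale)
    oncall_familiarity: int     # rating (assumed integer scale)


def mvc_gaps(c: Coverage) -> list[str]:
    """Return the names of the Minimum Viable Coverage checks that fail."""
    checks = {
        "safe modifiers": c.safe_modifiers >= 2,
        "capable reviewers": c.capable_reviewers >= 2,
        "documentation quality": c.documentation_quality >= 3,
        "recent exposure": c.recent_exposure >= 3,
        "on-call familiarity": c.oncall_familiarity >= 3,
    }
    return [name for name, ok in checks.items() if not ok]


def should_flag(is_critical: bool, c: Coverage) -> bool:
    """A critical system with any MVC gap is flagged, whatever its debt score."""
    return is_critical and bool(mvc_gaps(c))


# Hypothetical example: only one safe modifier and weak on-call familiarity.
payments = Coverage(safe_modifiers=1, capable_reviewers=2,
                    documentation_quality=3, recent_exposure=4,
                    oncall_familiarity=2)
print(mvc_gaps(payments))           # ['safe modifiers', 'on-call familiarity']
print(should_flag(True, payments))  # True
```

Keeping this check separate from the overall score is what lets a single missing safeguard surface even when the aggregate number looks moderate.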

Discussion

I am very open to feedback on how the methodology could be improved.

Also curious:

How does your team maintain understanding of the codebase?

What signals tell you that your team is starting to lose understanding of a system?

And have you noticed AI changing the speed at which your systems evolve compared to the speed at which your team understands them?

I think this is going to become a much bigger engineering leadership problem as AI generation and automation accelerate.

We are getting better at generating code.

But we still need to get better at preserving shared understanding.

Get the template

To make this easier to reason about, I turned the methodology into a spreadsheet-first template:
System Comprehension Heatmap

It includes:

System Inventory
Engineer-System Matrix
Overview Dashboard
Risk Recommendations
optional AI acceleration scoring
Minimum Viable Coverage checks
quick-start PDF guide

You can get it for free here

I would love feedback.

Top comments (45)

Aryan Choudhary • May 18

This resonates a lot honestly, even though I haven’t worked in a large enough engineering environment yet to fully experience comprehension debt at scale.

But I have started noticing smaller versions of this feeling already while transitioning from side projects/tutorial-style development into existing systems at my previous work/smaller projects.

That feeling of:
“The code exists… but the mental model doesn’t yet.”

And I think your point about AI accelerating implementation faster than understanding is especially important. Because it’s very easy to mistake “things are moving faster” for “the team understands the system better.”

Your line:
“You are debugging your own lack of context.”
…hit particularly hard.

Curious though:
Do you think comprehension debt is mostly a scaling problem, or can small teams/startups accumulate it just as dangerously because things move faster and documentation/knowledge sharing gets deprioritized?

Julien Avezou • May 18

I am glad this resonated with you Aryan.

To answer your question, having experienced working in both a small team and a bigger tech organisation, I think that comprehension debt is definitely a problem for both even though the complexity for solving it does increase with scale.

Here is more context for you:
For a larger org, numerous teams make many changes on a daily basis. Teams change over time, through both internal moves and people joining or leaving. Legacy codebases start getting bloated. Suddenly a large incident hits and costs the company a lot of money because of changes nobody understands. Just like technical debt, comprehension debt grows much larger and becomes harder to manage at scale.
For a smaller team, it is important to move fast and keep iterating to find the flow that works, so it's arguably acceptable not to be bogged down by heavy processes. However, it's important to lay healthy foundations early on so that your founding engineering team sets a culture of knowledge sharing. In a smaller team, if your star engineer with most of the knowledge suddenly leaves your startup, then you have a big comprehension gap. So comprehension debt is important to keep track of early, even if it's as simple as promoting more knowledge sharing, keeping an audit trail of changes, and maintaining light documentation of your systems.

Aryan Choudhary • May 19

This makes a lot of sense honestly.

The “star engineer leaves and suddenly there’s a huge comprehension gap” example especially made the problem click for me in a much more concrete way. Because yeah, even small teams can probably operate on invisible assumptions for a long time without noticing the risk buildup underneath.

I also like that you’re not framing this as “slow everything down and document every atom,” but more as building healthy knowledge-sharing habits early before scale amplifies the gaps.

And the AI angle keeps making this feel more important to me the more I think about it. Because if implementation speed keeps increasing, then shared understanding almost becomes its own scaling bottleneck.

Really appreciate the detailed response btw. Feels like one of those concepts that becomes more obvious the moment someone finally gives it a name.

Mykola Kondratiuk • May 19

honestly this hits harder with AI agents in the mix. they can refactor a module in 3 minutes. catching up to what actually changed and why it's safe to ship still takes days.

Julien Avezou • May 19

True. Teams will increasingly be composed of both agents and human developers, and ensuring that knowledge is shared sustainably between them will be a challenge.
Do you have any learnings to share on how to achieve this?
In one of my previous posts you mentioned adding friction/thinking where it matters to ensure the right questions are being considered: 'one hard block before architecture lock'. Curious if you have other recommendations.

Mykola Kondratiuk • May 19

The most durable thing I have tried is a one-para intent file per module - not what the code does but what we explicitly ruled out and why. Agents can surface it during a refactor, humans read it at review. Keeps the friction point you mentioned alive without needing a Confluence doc nobody opens after day one.

Julien Avezou • May 19

Nice recommendation, makes sense, thanks.

Mykola Kondratiuk • May 19

glad it resonated — the ruled-out-and-why part is the bit that actually survives agent turns. most everything else gets re-derived from the code anyway.

S M Tahosin • May 24

"Comprehension debt" is the perfect term for this. I've seen teams rewrite perfectly functional code simply because the original authors left and the remaining team couldn't understand the mental model behind it. If the code is clean but the team's mental map is outdated, the velocity still drops to zero. How do you usually recommend teams measure this before it becomes critical?

Julien Avezou • May 24

Ensure your teams have good knowledge sharing practices in place, that there are solid manual code review processes in place where necessary and that there is no critical system that is only owned by one engineer.
Here are some factors to look at and attempt to measure:

high criticality
high complexity
high change velocity
high incident sensitivity
high dependency surface area
high ownership concentration
high AI acceleration risk when AI is used without strong human guardrails

Rasmus Ros • May 18

Useful framing. It maps to a real failure mode.

My main concern is false precision. A scorecard can imply understanding is in better shape than it is. Two reviewers can share the same blind spots. Docs can exist and still be stale. Recent exposure does not mean someone can reason through failures under pressure.

The harder problem is distribution of understanding, not just counts. A team can look covered on paper and still be fragile if the real mental model lives with the same few people.

I would treat the score as a prompt for deeper review, not evidence that risk is controlled.

Also worth looking at signals from the review system itself. In Gerrit, for example, you can inspect who actually reviews what, who repeatedly becomes the bottleneck, who gets bypassed on risky areas, and where approval patterns are too concentrated. That may give you a better read on comprehension concentration than self-reported coverage fields alone.

Julien Avezou • May 18

Good points, thanks for sharing.
I agree that relying on the score itself could be misleading. Using it as a prompt for deeper review, as you suggest, is the safer way to use the scorecard.
I attempt to cover distribution of understanding with the Minimum Viable Coverage component.
Self-reported coverage has its limitations. You mention several here. This scorecard can be used as an awareness tool to prompt action where otherwise none would be taken.

Rasmus Ros • May 19

That makes sense. Thanks for clarifying how you intend the scorecard to be used and where Minimum Viable Coverage fits. Used as an awareness tool and a prompt for deeper review, it feels much more grounded :)

Andrii Krugliak • May 20

Comprehension debt is the right name for this. The part that bites me: when I run agents across a codebase, the agent forgets the decisions between sessions the same way the team does, so I started making it write a one-line why into the PR before it can merge. Cut my re-reading time on incidents by maybe half.

Julien Avezou • May 20

I really like that process. Doesn't cost much to do in the moment, but can save plenty of headaches down the road.

Andrii Krugliak • May 21

The cheap part is what makes it stick. Any practice that needs its own tool or ritual ends up competing for attention and usually loses. A one-line "why" in the PR survives because it rides infrastructure the team already reads. The failure mode I still hit: the agent writes the why to clear the gate, not to inform the next session. I started grading those lines in review the same as code, which helped more than I expected.

Peter Witham • May 20

This is such a great "hit hard" topic that needs all the attention it can get.

I have somewhat of a running joke with developer friends that AI is the new product team when it comes to making projects move faster than the engineering teams can track and understand.

I think it falls very much in line with those wonderful comments we get like "it's just a one line change" and ship it. Without ever stopping to evaluate if it really does have a greater impact or not, let alone what that one line change means to the bigger picture.

I think it also should have a warning sticker that says "Have you thought about technical debt" every time.

Julien Avezou • May 20

Hey Peter, I am glad this post resonated with you.
It's true! AI is the product team that never rests haha
Agree, a quick productivity increase can hide longer term effects that are not always positive if tech or comprehension debt accumulates.

Suny Choudhary • May 21

This is a strong framing.

Technical debt is visible because it eventually shows up in slow builds, messy code, bugs, or painful refactors. Comprehension debt is quieter. It shows up when only one person understands why something exists, when new developers are afraid to touch a module, or when the team can change code but cannot explain the consequences.

AI can make this worse if teams are not careful. It can generate code faster than the team can understand it, which means the codebase grows while shared understanding shrinks.

The real test is not “can we ship the change?” It is “can the team explain the system after the change?”

Julien Avezou • May 21

Well said Suny! Comprehension debt is indeed hard to track. I am attempting to measure it here.

mote • May 23

The comprehension debt framing is underrated. I've seen teams with "clean code" that still take weeks to onboard new engineers because the domain knowledge lives in people's heads, not the codebase.

Code that's easy to delete requires another dimension: narrativity. The code should tell a story about why decisions were made, not just what the code does. That context is what lets future maintainers make informed changes without breaking implicit contracts.

How do you measure comprehension debt in practice? Just code review feedback time, or something more systematic?

Julien Avezou • May 23

Agree. I put forward in this post some metrics to estimate comprehension debt levels:

Comprehension Debt =

System pressure
+ dependency pressure
+ ownership concentration
+ optional AI acceleration

minus

Human coverage
+ documentation quality
+ reviewer redundancy
+ recent exposure
+ operational familiarity

Mininglamp • May 20

AI makes code production faster, but understanding doesn't scale with output speed. Have seen teams ship 3x more PRs with AI assistance while incident resolution time goes up — fewer people truly understand the changes being made. The hard part with any comprehension debt metric is getting teams to measure it before a production incident forces them to.

Julien Avezou • May 20

Agree. Do you have any metrics you track on your end to measure comprehension debt?

NOVAInetwork • May 19

Comprehension debt is real even as a solo founder. I use AI heavily for implementation but the architectural decisions still have to live in my head. An AI agent can write the handler in minutes. Understanding why the codec layout is structured that way, why this fee constant exists, why this capability gate was added takes much longer.

The pattern I found: if the decision is not documented at the layer where it is enforced, it gets lost. A later AI session will change it because it only sees the local context, not the reason behind the structure.

The fix that worked for me: every protocol decision is recorded where it is enforced, not in a separate doc. Golden-vector tests lock byte layouts. Constants are named for their purpose. Validation hooks carry comments about why they reject, not just what they reject. The codebase becomes its own knowledge base.

AI is the best tool I have ever used. But it is a tool. The understanding still has to come from the person building the system.
