Have you ever been thrown into an unfamiliar codebase while deadlines got tighter, stress levels rose, and incidents became harder to resolve?
I have, and it's not a pleasant experience.
This got me thinking about a kind of debt engineering teams rarely measure.
Not technical debt.
Something more subtle:
comprehension debt.
I think of comprehension debt as the gap between how fast a system changes and how well the team understands it.
And AI is making this gap more important.
AI didn't create the problem.
This problem existed long before AI.
Teams have always struggled with knowledge silos, undocumented systems, fragile ownership, and “only one person knows how this works” situations.
But AI can accelerate the problem.
When AI helps us write, refactor, and ship code faster, the codebase can evolve faster than the team’s shared understanding.
That is useful when paired with strong review, explanation, documentation, and ownership.
But dangerous when it turns into:
“The code changed, but nobody really understands the system better.”
That is the kind of risk I wanted to make more visible.
What if I could quantify comprehension debt somehow? At least to a certain degree of approximation.
I started exploring the different variables and components that would impact comprehension, for better or worse.
Based on my experience, conversations with other engineers, and patterns I’ve seen across teams, I started building a scoring methodology to approximate comprehension debt.
My goal here is to help engineering teams spot where critical or highly connected systems are changing faster than the team understands them.
What is comprehension debt?
Let's start with a more formal definition:
Comprehension debt rises when system impact, complexity, dependency surface area, change velocity, and AI-assisted change speed outpace team understanding, coverage, redundancy, documentation, and human ownership.
I am not basing this just on theory.
I experienced the negative effects of high comprehension debt recently on one of my past teams.
It was a stressful and demoralizing experience.
I constantly felt behind and was always scrambling to keep up.
Incidents were getting worse and harder to resolve over time because the level of system understanding across the team was too low.
The code existed.
The services existed.
The tickets kept moving.
But the shared mental model of the system was not keeping up.
That is a hard place to work from.
You feel reactive all the time.
You are not just debugging the incident.
You are debugging your own lack of context.
Why AI makes this worth measuring now
AI-assisted development can be incredibly useful.
I use AI myself, quite frequently.
But I keep coming back to this question:
Are we increasing shipping velocity without increasing understanding velocity?
Because those are not the same thing.
A team can ship more code and still understand less of the system over time.
AI can help generate implementation options, refactor code, explain files, write tests, and speed up repetitive work.
But if AI-assisted changes are merged without enough human explanation, review, documentation, or ownership, comprehension debt can accumulate faster.
The issue is not:
“Did AI write this?”
The better question is:
“Can the team still explain, review, modify, deploy, and recover this system safely?”
How the debt score is calculated
This is not meant to be a perfect mathematical model.
It is an attempt to make an invisible engineering risk visible enough to discuss, compare, and improve.
At a high level, the score combines two sides:
1) System pressure
These factors increase comprehension debt:
- high criticality
- high complexity
- high change velocity
- high incident sensitivity
- high dependency surface area
- high ownership concentration
- high AI acceleration risk when AI is used without strong human guardrails
2) Team comprehension coverage
These factors reduce comprehension debt:
- more safe modifiers
- more clear explainers
- stronger reviewer redundancy
- better documentation
- more recent hands-on exposure
- stronger on-call familiarity
- better engineer-system capability scores
So the rough model is:
Comprehension Debt =
(system pressure
+ dependency pressure
+ ownership concentration
+ optional AI acceleration)
minus
(human coverage
+ documentation quality
+ reviewer redundancy
+ recent exposure
+ operational familiarity)
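As a rough illustration of the two-sided model above, here is a minimal Python sketch. It is not the spreadsheet's actual formula: the 1-to-5 rating scale, the equal weighting, the use of a simple mean on each side, and every name in the code (`PRESSURE_FACTORS`, `COVERAGE_FACTORS`, `comprehension_debt`) are assumptions made for the example.

```python
from statistics import mean

# Factor names taken from the two lists in the post.
# Assumption: every factor is already expressed as a 1 (low) to 5 (high) rating.
PRESSURE_FACTORS = (
    "criticality",
    "complexity",
    "change_velocity",
    "incident_sensitivity",
    "dependency_surface",
    "ownership_concentration",
)
COVERAGE_FACTORS = (
    "safe_modifiers",
    "clear_explainers",
    "reviewer_redundancy",
    "documentation",
    "recent_exposure",
    "on_call_familiarity",
    "capability_score",
)


def comprehension_debt(pressure: dict[str, float], coverage: dict[str, float]) -> float:
    """Return mean pressure minus mean coverage (equal weights are an assumption).

    A positive result means system pressure outpaces team coverage.
    """
    pressure_scores = [pressure[name] for name in PRESSURE_FACTORS]
    # AI acceleration is optional: include it only when it was scored.
    if "ai_acceleration" in pressure:
        pressure_scores.append(pressure["ai_acceleration"])
    coverage_scores = [coverage[name] for name in COVERAGE_FACTORS]
    return mean(pressure_scores) - mean(coverage_scores)


# Hypothetical system: high pressure everywhere, thin coverage everywhere.
example_pressure = {name: 4 for name in PRESSURE_FACTORS}
example_coverage = {name: 2 for name in COVERAGE_FACTORS}
example_score = comprehension_debt(example_pressure, example_coverage)
```

Averaging each side keeps the result on the same scale as the inputs, which makes systems easier to compare. A real sheet would likely weight factors differently, for example by giving criticality more influence than complexity.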
To detect dangerous gaps, the sheet also runs a Minimum Viable Coverage (MVC) check on critical systems.
For a critical system, the sheet checks whether it has:
- at least 2 safe modifiers
- at least 2 capable reviewers
- documentation quality of 3 or higher
- recent hands-on exposure of 3 or higher
- on-call familiarity of 3 or higher
If any one of these is missing, the system gets an MVC gap.
A critical system with an MVC gap should be flagged even if the overall debt score looks moderate.
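The Minimum Viable Coverage check can be sketched in a few lines of Python. The thresholds come from the list above. The class and function names (`Coverage`, `mvc_gaps`, `needs_flag`) are hypothetical, and so is the assumption that the three quality fields are integer ratings on a shared scale, which the post does not specify.

```python
from dataclasses import dataclass


@dataclass
class Coverage:
    safe_modifiers: int         # number of engineers who can safely change the system
    capable_reviewers: int      # number of engineers who can meaningfully review changes
    documentation_quality: int  # rating (scale is an assumption)
    recent_exposure: int        # rating (scale is an assumption)
    on_call_familiarity: int    # rating (scale is an assumption)


# Minimums as stated in the post.
MVC_MINIMUMS = {
    "safe_modifiers": 2,
    "capable_reviewers": 2,
    "documentation_quality": 3,
    "recent_exposure": 3,
    "on_call_familiarity": 3,
}


def mvc_gaps(coverage: Coverage) -> list[str]:
    """Return the names of every coverage field that falls below its minimum."""
    return [
        name
        for name, minimum in MVC_MINIMUMS.items()
        if getattr(coverage, name) < minimum
    ]


def needs_flag(is_critical: bool, coverage: Coverage) -> bool:
    """A critical system is flagged on any gap, whatever its overall debt score."""
    return is_critical and bool(mvc_gaps(coverage))


# Hypothetical critical system with one reviewer and weak on-call familiarity.
payments = Coverage(
    safe_modifiers=2,
    capable_reviewers=1,
    documentation_quality=3,
    recent_exposure=4,
    on_call_familiarity=2,
)
gaps = mvc_gaps(payments)
```

Returning the list of failing fields, instead of a single boolean, tells the team which coverage dimension to fix first.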
Discussion
I am very open to feedback on how the methodology could be improved.
Also curious:
How does your team maintain understanding of the codebase?
What signals tell you that your team is starting to lose understanding of a system?
And have you noticed AI changing the speed at which your systems evolve compared to the speed at which your team understands them?
I think this is going to become a much bigger engineering leadership problem as AI generation and automation accelerate.
We are getting better at generating code.
But we still need to get better at preserving shared understanding.
Get the template
To make this easier to reason about, I turned the methodology into a spreadsheet-first template:
System Comprehension Heatmap
It includes:
- System Inventory
- Engineer-System Matrix
- Overview Dashboard
- Risk Recommendations
- optional AI acceleration scoring
- Minimum Viable Coverage checks
- quick-start PDF guide
I would love feedback.
Top comments (45)
This resonates a lot honestly, even though I haven’t worked in a large enough engineering environment yet to fully experience comprehension debt at scale.
But I have started noticing smaller versions of this feeling already while transitioning from side projects/tutorial-style development into existing systems at my previous work/smaller projects.
That feeling of:
“The code exists… but the mental model doesn’t yet.”
And I think your point about AI accelerating implementation faster than understanding is especially important. Because it’s very easy to mistake “things are moving faster” for “the team understands the system better.”
Your line:
“You are debugging your own lack of context.”
…hit particularly hard.
Curious though:
Do you think comprehension debt is mostly a scaling problem, or can small teams/startups accumulate it just as dangerously because things move faster and documentation/knowledge sharing gets deprioritized?
I am glad this resonated with you, Aryan.
To answer your question: having worked in both a small team and a bigger tech organisation, I think comprehension debt is definitely a problem for both, even though solving it gets more complex with scale.
Here is more context for you:
For a larger org, there are so many changes happening on a daily basis from numerous teams. Teams change over time as people move internally or leave. Legacy codebases start getting bloated. Suddenly a large incident hits and costs the company a lot of money because of changes nobody understands. Just like technical debt, comprehension debt grows much larger and becomes harder to manage at scale.
For a smaller team it is important to move fast and keep iterating to find the flow that works, so it's arguably acceptable not to be bogged down by heavy processes. However, it's important to lay healthy foundations early on so that your founding engineering team sets a culture of knowledge sharing. In a smaller team, if your star engineer with most of the knowledge suddenly leaves your startup, then you have a big comprehension gap. So comprehension debt is important to keep track of early, even if it's as simple as promoting more knowledge sharing and keeping an audit trail of changes and light documentation of your systems.
This makes a lot of sense honestly.
The “star engineer leaves and suddenly there’s a huge comprehension gap” example especially made the problem click for me in a much more concrete way. Because yeah, even small teams can probably operate on invisible assumptions for a long time without noticing the risk buildup underneath.
I also like that you’re not framing this as “slow everything down and document every atom,” but more as building healthy knowledge-sharing habits early before scale amplifies the gaps.
And the AI angle keeps making this feel more important to me the more I think about it. Because if implementation speed keeps increasing, then shared understanding almost becomes its own scaling bottleneck.
Really appreciate the detailed response btw. Feels like one of those concepts that becomes more obvious the moment someone finally gives it a name.
honestly this hits harder with AI agents in the mix. they can refactor a module in 3 minutes. catching up to what actually changed and why it's safe to ship still takes days.
True. Teams will increasingly be composed of both agents and human developers, and they will need to ensure that knowledge is shared sustainably.
Do you have any learnings to share on how to achieve this?
In one of my previous posts you mentioned adding friction/thinking where it matters to ensure the right questions are being considered: 'one hard block before architecture lock'. Curious if you have other recommendations.
The most durable thing I have tried is a one-para intent file per module - not what the code does but what we explicitly ruled out and why. Agents can surface it during a refactor, humans read it at review. Keeps the friction point you mentioned alive without needing a Confluence doc nobody opens after day one.
Nice recommendation, makes sense, thanks.
glad it resonated — the ruled-out-and-why part is the bit that actually survives agent turns. most everything else gets re-derived from the code anyway.
Useful framing. It maps to a real failure mode.
My main concern is false precision. A scorecard can imply understanding is in better shape than it is. Two reviewers can share the same blind spots. Docs can exist and still be stale. Recent exposure does not mean someone can reason through failures under pressure.
The harder problem is distribution of understanding, not just counts. A team can look covered on paper and still be fragile if the real mental model lives with the same few people.
I would treat the score as a prompt for deeper review, not evidence that risk is controlled.
Also worth looking at signals from the review system itself. In Gerrit, for example, you can inspect who actually reviews what, who repeatedly becomes the bottleneck, who gets bypassed on risky areas, and where approval patterns are too concentrated. That may give you a better read on comprehension concentration than self-reported coverage fields alone.
Good points, thanks for sharing.
I agree that relying on the score itself could be misleading. Using it as a prompt for deeper review, as you suggest, is the safer way to use the scorecard.
I attempt to cover distribution of understanding with the Minimum Viable Coverage component.
Self-reported coverage has its limitations. You mention several here. This scorecard can be used as an awareness tool to prompt action where otherwise none would be taken.
That makes sense. Thanks for clarifying how you intend the scorecard to be used and where Minimum Viable Coverage fits. Used as an awareness tool and a prompt for deeper review, it feels much more grounded :)
Comprehension debt is the right name for this. The part that bites me: when I run agents across a codebase, the agent forgets the decisions between sessions the same way the team does, so I started making it write a one-line why into the PR before it can merge. Cut my re-reading time on incidents by maybe half.
I really like that process. It doesn't cost much to do in the moment, but can save plenty of headaches down the road.
The cheap part is what makes it stick. Any practice that needs its own tool or ritual ends up competing for attention and usually loses. A one-line "why" in the PR survives because it rides infrastructure the team already reads. The failure mode I still hit: the agent writes the why to clear the gate, not to inform the next session. I started grading those lines in review the same as code, which helped more than I expected.
This is such a great "hit hard" topic that needs all the attention it can get.
I have somewhat of a running joke with developer friends that AI is the new product team when it comes to making projects move faster than the engineering teams can track and understand.
I think it falls very much in line with those wonderful comments we get like "it's just a one line change" and ship it, without ever stopping to evaluate whether it really does have a greater impact, let alone what that one-line change means for the bigger picture.
I think it also should have a warning sticker that says "Have you thought about technical debt" every time.
Hey Peter, I am glad this post resonated with you.
It's true! AI is the product team that never rests haha
Agree, a quick productivity increase can hide longer term effects that are not always positive if tech or comprehension debt accumulates.
This is a strong framing.
Technical debt is visible because it eventually shows up in slow builds, messy code, bugs, or painful refactors. Comprehension debt is quieter. It shows up when only one person understands why something exists, when new developers are afraid to touch a module, or when the team can change code but cannot explain the consequences.
AI can make this worse if teams are not careful. It can generate code faster than the team can understand it, which means the codebase grows while shared understanding shrinks.
The real test is not “can we ship the change?” It is “can the team explain the system after the change?”
Well said, Suny! Comprehension debt is indeed hard to track, and this post is my attempt to measure it.
The distinction between technical debt and comprehension debt is the strongest part of your argument. While technical debt is visible and can be logged, comprehension debt builds silently and erodes a team's ability to debug or refactor effectively. Velocity without understanding produces brittle systems that are painful to maintain. Making code explainability a merge requirement is a practical and necessary counterbalance.
Agree. Adding friction where it matters can be justified in this context if it helps keep comprehension debt low.
The comprehension debt framing is underrated. I've seen teams with "clean code" that still take weeks to onboard new engineers because the domain knowledge lives in people's heads, not the codebase.
Code that's easy to delete requires another dimension: narrative quality. The code should tell a story about why decisions were made, not just what the code does. That context is what lets future maintainers make informed changes without breaking implicit contracts.
How do you measure comprehension debt in practice? Just code review feedback time, or something more systematic?
Agree. I put forward in this post some metrics to estimate comprehension debt levels:
Comprehension Debt =
System pressure
minus
Human coverage
AI makes code production faster, but understanding doesn't scale with output speed. Have seen teams ship 3x more PRs with AI assistance while incident resolution time goes up — fewer people truly understand the changes being made. The hard part with any comprehension debt metric is getting teams to measure it before a production incident forces them to.
Agree. Do you have any metrics you track on your end to measure comprehension debt?
"Comprehension debt" is the perfect term for this. I've seen teams rewrite perfectly functional code simply because the original authors left and the remaining team couldn't understand the mental model behind it. If the code is clean but the team's mental map is outdated, the velocity still drops to zero. How do you usually recommend teams measure this before it becomes critical?
Ensure your teams have good knowledge-sharing practices, solid manual code review processes where necessary, and no critical system that is owned by only one engineer.
Here are some factors to look at and attempt to measure: