Keith MacKay

Posted on Jun 1 • Originally published at tlcmentor.substack.com

Your AI Team Is Building Debt Your CFO Can't See. Here's the Ledger.

The same tools driving your team's productivity are generating an entirely new class of technical debt. Here's what it costs -- and what you can do about it.

Your AI pilot worked. Velocity is up. Ticket counts are down. The team is shipping faster than they have in years. The business case for expansion is obvious.

And somewhere in the code your team just generated, a debt clock is running that your current metrics can't see.

Technical debt has a new face. It accumulates faster than the original, compounds across categories, and doesn't show up in your quality dashboard until something breaks. Executives who understand what's accumulating won't be blindsided when it comes due. The ones who don't will be surprised -- twice: once by the incident, and again by the cleanup cost.

The Original Technical Debt, in Thirty Seconds

Ward Cunningham coined the term in 1992 [1]. The metaphor: taking coding shortcuts to ship fast is like taking a financial loan. You gain velocity now, pay maintenance interest later, and eventually pay down the principal through refactoring. The debt was in the code itself -- messy logic, missing documentation, duplicated functions. A developer could read the code and find it. A team could plan to eliminate it.

That framework worked because the debt was legible. You could see it. You could measure it. You could schedule it away.

AI breaks that. The new debt isn't legible. It doesn't live in code you can read. It lives in your team's cognition, in your agents' memory states, and in the interaction complexity between dozens of AI processes running simultaneously. And it accumulates in six distinct categories that require six distinct responses.

Six Flavors of AI-Era Technical Debt

1. Cognitive Debt: Your Team Is Losing Their Mental Map

The MIT Media Lab monitored participants' brain activity while they wrote with AI assistance. The result: AI-assisted users showed the weakest neural engagement of any group. When they switched back to working without AI, they underperformed -- including struggling to recall their own recently-generated work [2].

For executives, the translation is this: teams that rely heavily on AI to generate code (or other work where the thinking was outsourced to the AI) are building work they don't fully understand. This isn't a skill or motivation problem. It's structural. The brain disengages when the machine does the thinking. Skipping code review is deceptively easy -- and AI-based code review may or (my colleagues and I would argue) may NOT be good enough.

The business consequence arrives slowly, then suddenly. Velocity looks like 2-3x -- until something breaks in code nobody fully understands, or a feature needs to change in a module the team is now afraid to touch. The productivity gain becomes a productivity trap. An early example of this consequence appearing at scale: Amazon's internal "deep dive" review of a "trend of incidents" with "high blast radius" attributed to "Gen-AI assisted changes," which led to a new policy requiring senior engineer sign-off on all AI-assisted code before production deployment [3].

2. Intent Debt: The "Why" Is Evaporating

When a human engineer makes an architectural decision, they usually know why. When an AI generates code, the reasoning behind every choice goes undocumented -- because nobody wrote it down and the AI doesn't keep notes unless explicitly instructed to do so.

Future developers (or future AI agents) trying to modify that code have to guess. When AI agents guess, they guess statistically: the most plausible answer from their training, not the most accurate answer for your specific system (here's my drumbeat: "plausibility does not equal correctness"). If your system has unusual constraints, rejected conventions, or specific business rules that aren't obvious from the code, the agent rediscovers the "obvious" approach your team already rejected -- and nobody remembers why it was rejected.

Intent debt is the gap between what the code does and why it does it [4]. AI development widens that gap faster than any development methodology that came before it.

3. Agentic Debt: Your Agents Are Running Up Bills You Can't See

This is the new operational risk that most AI governance frameworks haven't addressed yet.

AI agents -- software that takes autonomous action on your behalf -- can accumulate three kinds of problems without any visible failure signal:

Prompt Drift: Updating one agent's configuration inadvertently degrades related agents that share elements of that configuration. The coupling is invisible. You changed something. Something else broke. The connection isn't obvious.

State Rot: An agent designed to maintain memory over time gradually fills that memory with stale, irrelevant data. It was built to remember. Nobody told it what to forget. Its past corrupts its present.

Zombie Loops: An agent stuck attempting an impossible task keeps consuming API credits without failing visibly. It just runs. The invoice arrives. The work doesn't.
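Of the three, state rot is perhaps the easiest to picture in code. What follows is a minimal sketch, not any particular framework's API: the `AgentMemory` and `MemoryEntry` names and the age-based retention rule are assumptions made for illustration. The point is only that "what to forget" becomes an explicit, reviewable policy.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    text: str
    stored_at: float  # timestamp in seconds, supplied by the caller


@dataclass
class AgentMemory:
    """Hypothetical agent memory with an explicit forgetting policy."""

    max_age_seconds: float
    entries: list[MemoryEntry] = field(default_factory=list)

    def remember(self, text: str, now: float) -> None:
        self.entries.append(MemoryEntry(text, now))

    def recall(self, now: float) -> list[str]:
        # Drop anything older than the retention window before answering,
        # so stale entries cannot leak into the agent's next decision.
        self.entries = [
            e for e in self.entries if now - e.stored_at <= self.max_age_seconds
        ]
        return [e.text for e in self.entries]


memory = AgentMemory(max_age_seconds=60)
memory.remember("customer prefers email", now=0)
memory.remember("ticket reopened", now=50)
fresh = memory.recall(now=70)  # only the entry stored at t=50 is still in the window
```

Age is the simplest possible rule. A real deployment would likely also need relevance or source-based eviction, which this sketch does not attempt.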

Unlike the technical debt Cunningham described, agentic debt doesn't fail suddenly. It degrades: 95% of expected output becomes 85%, then 70%, while the system continues to run and look superficially healthy. By the time the pattern is visible, significant cost and quality loss have already accumulated [5].
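Gradual degradation of this kind only becomes visible if something is tracking a trend rather than waiting for a failure. Here is a rough sketch of that idea, assuming each completed task already gets a quality score between 0 and 1 from some evaluation step (where that score comes from is outside the sketch). The `QualityMonitor` name, window size, and threshold are all hypothetical.

```python
from collections import deque


class QualityMonitor:
    """Hypothetical rolling-average check on per-task quality scores."""

    def __init__(self, window: int, alert_below: float) -> None:
        self.scores: deque[float] = deque(maxlen=window)
        self.alert_below = alert_below

    def record(self, score: float) -> bool:
        """Record a 0..1 score; return True if the rolling mean is below the alert line."""
        self.scores.append(score)
        mean = sum(self.scores) / len(self.scores)
        return mean < self.alert_below


monitor = QualityMonitor(window=3, alert_below=0.9)
alerts = [monitor.record(score) for score in (1.0, 1.0, 0.5)]
# The third score pulls the rolling mean under 0.9, so only the last entry is True.
```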

4. Orchestration Debt: Networks of Agents Are Networks of Risk

If your team is running more than a handful of AI agents, this is already accumulating. You might also think about it as "complexity debt".

Two agents interacting have one relationship. Ten agents have 45 potential interaction pathways. Twenty have 190. Those interactions produce emergent behaviors that weren't designed, that don't appear in any specification, and that resist debugging because they depend on the sequence and content of prior agent interactions.
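The figures above are the number of distinct pairs among n agents, n(n-1)/2. A short check, counting only undirected one-to-one pairs (directed messages or group interactions would push the count higher); the function name is just for illustration:

```python
from math import comb


def interaction_pairs(agent_count: int) -> int:
    """Number of distinct agent-to-agent pairs: n choose 2."""
    return comb(agent_count, 2)


for n in (2, 10, 20):
    print(n, "agents ->", interaction_pairs(n), "pairs")
```

This prints 1, 45, and 190 pairs respectively. Doubling the agents from 10 to 20 roughly quadruples the pathways to reason about.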

This is the multi-agent version of microservices debt: a system too complex to reason about fully, with failures that cascade across boundaries, and ownership that becomes diffuse as the network grows. And, just like microservices debt, if you don't explicitly log and record what happens in interactions, no record may exist later for debugging. The individual components are non-deterministic, which makes tracing failures even harder than in traditional distributed systems. The system looks fine until it doesn't -- and then it's difficult to understand why.

5. Context Debt: Your AI Tools Are Leaking Productivity

Every AI agent has a finite working memory -- a "context window" that holds the conversation, the files it reads, the instructions it follows. When sessions run long, early decisions get displaced by newer information. The agent contradicts itself. It ignores patterns it established an hour ago. It re-reads files it already processed [6].

Teams that don't understand this constraint blame the tool for inconsistency that's actually a usage pattern problem. An AI assistant that felt impressive in the demo degrades noticeably in three-hour production sessions. Nothing changed about the technology. The usage model is wrong.

The operational version is worse: organizations that enable every available AI integration "just in case" can consume 30-40% of their agents' working memory before anyone starts a task [6]. More capability, counterintuitively, produces worse results.
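The arithmetic is worth running against your own setup. In the sketch below, every number is made up for illustration: the 200,000-token window, the integration names, and the per-integration token costs are assumptions, not measurements of any real tool.

```python
def overhead_fraction(window_tokens: int, integration_tokens: dict[str, int]) -> float:
    """Share of the context window used by integrations before any task starts."""
    return sum(integration_tokens.values()) / window_tokens


# Hypothetical token cost of each enabled integration's definitions and instructions.
integrations = {
    "issue_tracker": 18_000,
    "browser": 22_000,
    "database": 15_000,
    "cloud_console": 15_000,
}

fraction = overhead_fraction(200_000, integrations)
print(f"{fraction:.0%} of the window is spent before the first prompt")
```

With these illustrative numbers the overhead comes to 35 percent, inside the range cited above. Disabling the integrations a given task doesn't need is the cheapest way to get that memory back.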

There ARE strategies (Recursive Language Models or RLM [7], many individual sessions launched with subagents, frequent context clearing) that can combat this, but new models with 1M token context windows apparently DO NOT solve the problem -- context rot begins in those systems after just a few hundred thousand tokens, just like in their predecessors.

6. Perfectionism Debt: AI Makes Over-Engineering Free

Gold plating -- building features and abstractions nobody asked for -- predates AI. But AI makes it fast and cheap. Iterating on AI output costs almost nothing. Running one more refinement loop is a five-second decision. So teams iterate past "done" into "elegant," past "elegant" into "over-architected."

The result: codebases with impressive abstractions built for requirements that don't exist, maintained by teams who don't fully understand what was generated or why. The technical debt here isn't messiness. It's unnecessary complexity, added at a rate and scale that manual coding never permitted.

(As an engineer with perfectionistic tendencies, I find the siren song of "free to just try it" near-impossible to ignore!)

How These Debts Interact

These six categories don't accumulate independently. They compound.

Cognitive debt (the team doesn't understand the code) makes intent debt worse (nobody can document what they don't understand). Intent debt makes context debt more expensive (agents fill gaps with plausibility rather than accuracy). Agentic and orchestration debt multiply as agent networks grow. Perfectionism debt makes cognitive debt harder to pay down (more code, more complexity, more to understand).

The compounding dynamic is why teams can look like they're winning on velocity while quietly accumulating a cleanup bill that will arrive without warning.

What Leaders Can Do

The good news: each debt type has a corresponding organizational lever. None of them require stopping AI adoption. They require governing it.

For Cognitive and Intent Debt: invest in practices, not just tools

The fix here is documentation norms and specification-first development. Every significant AI-generated architectural decision should have a human-authored record: why this approach, what was rejected, what constraints apply. This is a practice change, not a purchase. It requires management to set the expectation, make time for it, and hold teams accountable for it.
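One lightweight way to make that record enforceable is to give it a fixed shape that can be checked in review or CI. The following is a sketch only: the `DecisionRecord` structure and its field names are assumptions, loosely modeled on the three questions above, not an established standard.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionRecord:
    """Hypothetical human-authored record for an AI-generated design decision."""

    title: str
    why: str
    rejected: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        # Report which of the three required answers are still blank.
        missing = []
        if not self.why.strip():
            missing.append("why")
        if not self.rejected:
            missing.append("rejected")
        if not self.constraints:
            missing.append("constraints")
        return missing


record = DecisionRecord(
    title="Queue-based retry for payment webhooks",
    why="Upstream provider delivers duplicates under load",
)
gaps = record.missing_fields()  # the rejected alternatives and constraints are not filled in yet
```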

Ask your engineering leadership: do we have a standard for documenting intent when AI generates significant code? If the answer is no, the debt is accumulating.

For Agentic Debt: require governance before production deployment

No AI agent should enter production without version control for its configuration, cost quotas per task, timeout limits, and observable audit trails. This is DevOps discipline, applied to agents. The tooling exists. What's usually missing is the requirement to use it.

Ask: what would a "zombie loop" look like in our agent deployments? How would we detect it? How long would it run before anyone noticed? Can we set budgets at the agent, session, process, task, or project level? Do we have any sort of real-time monitoring?
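To make the cost quota and timeout concrete, here is a minimal sketch of a guard around an agent's retry loop. It assumes each attempt can report whether it finished and what it cost. The `run_with_limits` function, the `BudgetExceeded` error, and the `(done, cost)` return shape are hypothetical, not part of any specific agent framework.

```python
import time
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when an agent task hits its cost, time, or attempt limit."""


def run_with_limits(
    step: Callable[[], tuple[bool, float]],
    max_cost: float,
    max_seconds: float,
    max_attempts: int,
) -> int:
    """Call step() until it reports success; fail loudly once a limit is hit.

    step() returns (done, cost_of_this_attempt). Returns the attempt count on success.
    """
    spent = 0.0
    started = time.monotonic()
    for attempt in range(1, max_attempts + 1):
        done, cost = step()
        spent += cost
        if done:
            return attempt
        if spent >= max_cost:
            raise BudgetExceeded(f"spent {spent:.2f} after {attempt} attempts")
        if time.monotonic() - started >= max_seconds:
            raise BudgetExceeded(f"timed out after {attempt} attempts")
    raise BudgetExceeded(f"no success after {max_attempts} attempts")


def stuck_step() -> tuple[bool, float]:
    # Simulates an impossible task: never done, costs 0.5 units per attempt.
    return (False, 0.5)


try:
    run_with_limits(stuck_step, max_cost=2.0, max_seconds=60, max_attempts=100)
except BudgetExceeded as error:
    print("stopped:", error)
```

The design choice that matters is the visible failure: a zombie loop becomes an exception someone has to handle instead of a line item on next month's invoice.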

For Orchestration Debt: design before you scale

Multi-agent systems should be architected before they're deployed. Who owns each agent? What are the contracts between them? What happens when one agent's output becomes another agent's input -- and that output is wrong? These questions are much cheaper to answer before the network exists than after it fails.
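A contract between two agents can start as nothing more than a named owner pair and a list of fields the consumer is entitled to expect. The sketch below assumes agents exchange plain dictionaries; the `Contract` type, the agent names, and the field names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    """Hypothetical handoff contract between a producing and a consuming agent."""

    producer: str
    consumer: str
    required_fields: tuple[str, ...]


def violations(contract: Contract, payload: dict) -> list[str]:
    # A field counts as violated if it is absent, None, or an empty string.
    return [
        name
        for name in contract.required_fields
        if payload.get(name) in (None, "")
    ]


handoff = Contract(
    producer="invoice_extractor",
    consumer="payment_scheduler",
    required_fields=("customer_id", "amount"),
)

problems = violations(handoff, {"customer_id": "c-102"})
if problems:
    print(f"{handoff.producer} -> {handoff.consumer}: missing {problems}")
```

Checking presence does not catch an output that is well-formed but wrong. It does give each handoff an owner and a place to log what crossed the boundary.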

Ask: what is our current map of agent-to-agent interactions? Does anyone own that map? Are interactions logged? Traceable?

For Context Debt: provide training and tooling

Developers who understand context management extract dramatically more value from AI tools than those who don't -- using identical software. This is a training investment with a measurable productivity return.

Ask: do we train developers on context management? Does our tooling show context consumption? Can teams see when they're approaching the limit?

For Perfectionism Debt: scope discipline up front

Define "done" before starting AI-assisted development cycles. Timebox refinement loops. Require human approval before AI generates anything the spec didn't request. This is a product management norm, not a technical one.
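In code terms, a stopping criterion is a loop with two exits: the definition of done, and a hard cap on rounds. A minimal sketch, where `improve` stands in for one AI refinement pass and `is_done` for whatever acceptance check the spec defines (both are placeholders supplied by the caller, not real APIs):

```python
from typing import Callable


def refine(
    draft: str,
    improve: Callable[[str], str],
    is_done: Callable[[str], bool],
    max_rounds: int,
) -> tuple[str, int]:
    """Refine until the draft meets the definition of done or the timebox runs out."""
    rounds = 0
    while not is_done(draft) and rounds < max_rounds:
        draft = improve(draft)
        rounds += 1
    return draft, rounds


# Toy stand-ins: "done" means at least three characters; each pass appends one.
result, rounds_used = refine(
    "a",
    improve=lambda text: text + "!",
    is_done=lambda text: len(text) >= 3,
    max_rounds=10,
)
```

Note that the loop checks `is_done` before it improves anything, so a draft that already meets the spec gets zero refinement rounds.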

Ask: do our AI-assisted development cycles have explicit stopping criteria? Or does "good enough" get replaced by "even better" by default?

The Leadership Question Worth Asking

Most executives have been asking: "Are we moving fast enough with AI?"

The better question is: "Are we building debt faster than we can pay it back?"

Speed without governance accumulates all six categories simultaneously. The teams shipping fastest right now may be the ones with the largest future cleanup bills -- except these bills don't arrive as refactoring sprints. They arrive as incidents, outages, talent frustration, and "nobody knows how that works anymore."

The organizations that win long-term will have moved quickly AND built the governance frameworks that keep the debt manageable. Those aren't mutually exclusive goals. But they require explicit leadership, not just ambitious pilots.

The Bottom Line

Technical debt has a new face. Six new faces, actually. Your current metrics probably can't see any of them.

The tools your team is using to move faster are borrowing against future productivity, code quality, and system reliability. How much depends on the practices and governance you build around them. Ward Cunningham said debt is acceptable if you plan to pay it down. These new categories are harder to spot and harder to pay. Start managing them before they manage you.

How is your organization thinking about governance for AI-generated code and AI agent deployments? Have you seen any of these debt patterns emerge yet?

References

If this resonated, here are some related articles:

For why context debt is so expensive in practice, and what the most effective teams do about it: Context in Context: Why AI Tools Degrade Over Longer Work Sessions | Substack
For the mental model argument behind cognitive and intent debt: why AI coding requires three mental models simultaneously, and what happens when developers skip building them: The Typing Was Never the Job. Neither Is the Prompting.
For the governance framing: how evolving AI capabilities require evolving leadership strategy, from human-in-the-loop to human-before-the-loop: An Evolving Strategy for Knowledge Work | Substack
For why specification-first development -- the antidote to intent debt -- is also pushing AI teams back toward waterfall workflows: The Irony of AI Development: How Context Engineering Is Taking Us Back to Waterfall | Substack

Keith MacKay is a technology strategy consultant and CTO in EY-Parthenon's Software Strategy Group (SSG), specializing in AI disruption and technology diligence for private equity and corporate clients. SSG's AI Disruption Lab conducts rapid assessments of how AI transforms and threatens existing business models and value chains. Keith teaches at Northeastern University and writes about strategy, management, and AI/technology.
