Harsh

Posted on Mar 18

AI Is Creating a New Kind of Tech Debt — And Nobody Is Talking About It

#webdev #javascript #career #ai

Six months ago, my team was celebrating.

We had shipped more features in Q3 than in the entire previous year. Our velocity was through the roof. AI tools had transformed how we worked — what used to take a week was taking a day. What used to take a day was taking an hour.

Our CTO sent a company-wide Slack message: "This is what the future of engineering looks like."

Last month, we had to stop all feature development for three weeks.

Not because of a security breach. Not because of a server outage. Because our codebase had become so tangled with AI-generated code that nobody — not even the people who had "written" it — could confidently modify it anymore.

We had celebrated our way into a crisis.

And the worst part? I saw it coming. I just didn't know what I was looking at. 🧵

The New Tech Debt Nobody Named Until Now

Technical debt is old news. Every developer knows the feeling — rushing to ship, cutting corners, promising yourself you'll refactor later. The code works today. It'll be someone else's problem tomorrow.

AI tech debt is different. It's not about cutting corners. It's about moving so fast you lose the thread entirely.

There are actually three distinct types of AI technical debt accumulating in codebases right now — and most teams are experiencing all three simultaneously:

1. Cognitive Debt — shipping code faster than you can understand it

2. Verification Debt — approving diffs you haven't fully read

3. Architectural Debt — AI generating working solutions that violate the system's design

Most articles about AI and tech debt focus on code quality. That's the wrong level. The real crisis is happening one level up — in the minds of the developers who are supposed to understand the systems they're building.

The Moment I Understood What Was Happening

Let me tell you about the week everything clicked.

A new developer joined our team — let's call him Rahul. Bright, fast, clearly talented. He had been using Cursor and Claude Code aggressively since his first day.

After three weeks, I asked him to walk me through the authentication flow he had built.

He opened the files. Started explaining. Got to the token refresh logic and paused.

"Actually," he said, "I'm not entirely sure why it's structured this way. It worked when I tested it."

I wasn't angry. I recognized the feeling. It was the same feeling I had when I tried to debug my own AI-generated code and felt like I was reading someone else's work.

That conversation led me down a rabbit hole that changed how I think about AI tools entirely.

The Numbers That Explain the Crisis

Here's the data that should be front-page news in every developer community — and somehow isn't:

Developer trust in AI coding tools dropped from 43% to 29% in eighteen months. Yet usage climbed to 84%.

Read that again. Developers trust AI tools less than ever. They're using them more than ever. That gap — using tools you increasingly distrust — has a name now: cognitive debt.

It gets worse.

75% of technology leaders are projected to face moderate or severe technical debt problems by 2026 because of AI-accelerated coding practices.

And the one that hit me hardest:

One API security company found a 10x increase in security findings per month in Fortune 50 enterprises between December 2024 and June 2025. From 1,000 to over 10,000 monthly vulnerabilities. In six months.

Ten times more security vulnerabilities. In six months. In the largest companies in the world.

This is what happens when velocity becomes the only metric.

"I Used to Be a Craftsman"

One developer captured something important in a way I keep thinking about:

"I used to be a craftsman... and now I feel like I am a factory manager at IKEA."

That image stuck with me. Not because it's pessimistic — but because it's precise.

A factory manager at IKEA doesn't understand how every piece of furniture is built. They manage throughput. They watch for obvious defects. They trust the system.

That works for furniture. It doesn't work for software systems that handle user data, process payments, or run infrastructure that people depend on.

Software requires someone who understands it deeply enough to reason about what happens when things go wrong. The factory manager model — high throughput, shallow review — produces systems that nobody truly understands.

And systems that nobody understands break in ways that nobody can predict or fix quickly.

The Three Debt Types — In Plain English

Let me explain exactly what's accumulating in codebases right now.

1. Cognitive Debt — The Invisible Crisis

Margaret-Anne Storey described this perfectly: a program is not its source code. A program is a theory — a mental model living in developers' minds that captures what the software does, how intentions became implementation, and what happens when you change things.

AI tools push developers from create mode into review mode by default. You stop solving problems and start evaluating solutions someone else produced.

The issue is that reviewing AI output feels productive. You are reading code, spotting issues, making edits. But you are not building the mental model that lets you reason about the system independently.

A student team illustrated this perfectly — they had been using AI to build fast and had working software. When they needed to make a simple change by week seven, the project stalled. Nobody could explain design rationales. Nobody understood how components interacted. The shared theory of the program had evaporated.

// This code works. Can you explain why in 30 seconds?
// If you generated it with AI and didn't stop to understand it — 
// you've accumulated cognitive debt.

const processPayment = async (userId, amount, currency) => {
  const [user, rateLimit, fraud] = await Promise.all([
    db.users.findById(userId),
    redis.get(`rate:${userId}`),
    fraudService.check(userId, amount)
  ]);

  if (!user || rateLimit > 10 || fraud.score > 0.7) {
    throw new PaymentError(user ? 'RATE_LIMITED' : 'USER_NOT_FOUND');
  }

  // Can you spot the bug? What happens if fraud.score is exactly 0.7?
  // What if rateLimit is null?
  // AI generated this. Did you understand it before you shipped it?
};
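
One way to pay down the debt in the snippet above is to make every decision explicit. What follows is a minimal sketch, not the fix my team shipped. The `deps` argument, the `FRAUD_SUSPECTED` code, and the `MAX_REQUESTS` / `MAX_FRAUD_SCORE` constants are hypothetical names introduced for illustration, and it assumes `redis.get` resolves to `null` for a missing key. It surfaces three things the original leaves implicit: in JavaScript `null > 10` is `false`, so a missing rate key passes silently; a score of exactly 0.7 passes because `>` is strict; and a fraud rejection gets reported as `RATE_LIMITED`.

```javascript
// Hypothetical thresholds; the original snippet hard-codes 10 and 0.7.
const MAX_REQUESTS = 10;
const MAX_FRAUD_SCORE = 0.7;

class PaymentError extends Error {
  constructor(code) {
    super(code);
    this.code = code;
  }
}

// `deps` bundles the db, redis, and fraudService objects from the snippet
// above so the function can be exercised with stubs (an assumption of this sketch).
const checkPayment = async ({ db, redis, fraudService }, userId, amount) => {
  const [user, rawCount, fraud] = await Promise.all([
    db.users.findById(userId),
    redis.get(`rate:${userId}`),
    fraudService.check(userId, amount),
  ]);

  if (!user) throw new PaymentError('USER_NOT_FOUND');

  // A missing key means "no requests recorded yet", stated explicitly
  // instead of relying on `null > 10` being false.
  const requestCount = rawCount === null ? 0 : Number(rawCount);

  // An unparseable counter fails closed rather than slipping through.
  if (Number.isNaN(requestCount) || requestCount > MAX_REQUESTS) {
    throw new PaymentError('RATE_LIMITED');
  }

  // `>=` makes the boundary decision visible: exactly 0.7 is rejected.
  if (fraud.score >= MAX_FRAUD_SCORE) {
    throw new PaymentError('FRAUD_SUSPECTED');
  }

  return { user, requestCount, fraudScore: fraud.score };
};
```

Whether 0.7 should be accepted or rejected is a product decision, not something the code can answer. The point of the sketch is only that the choice is now written down where a reviewer can see it.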

2. Verification Debt — The False Confidence Trap

Every time you click approve on a diff you haven't fully understood, you're borrowing against the future.

Unlike technical debt — which announces itself through mounting friction, slow builds, tangled dependencies — verification debt breeds false confidence. The codebase looks clean. The tests are green.

Six months later you discover you've built exactly what the spec said — and nothing the customer actually wanted.

# The verification debt accumulates here:
# ✅ All tests passing
# ✅ No linting errors  
# ✅ Code review approved
# ✅ Deployed to production

# But nobody asked:
# ❌ Does this actually solve the user's problem?
# ❌ What happens in edge cases the AI didn't consider?
# ❌ Does this match our architecture patterns?
# ❌ Will the next developer understand this?
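
To make the gap between green checks and unasked questions concrete, here is a small sketch. `applyDiscount` is a hypothetical function invented for this example, not code from our codebase. The naive version would pass a happy-path test such as "10% off 100 is 90", while the edge-case table shows inputs that test never exercises.

```javascript
// Hypothetical helper: looks fine and passes the happy-path test.
const applyDiscountNaive = (price, percent) => price - (price * percent) / 100;

// Same helper with the input contract spelled out.
const applyDiscount = (price, percent) => {
  if (!Number.isFinite(price) || price < 0) {
    throw new RangeError('price must be a non-negative number');
  }
  if (!Number.isFinite(percent) || percent < 0 || percent > 100) {
    throw new RangeError('percent must be between 0 and 100');
  }
  return price - (price * percent) / 100;
};

// Edge cases the happy-path test never touches.
const edgeCases = [
  [100, 150],       // discount larger than the price
  [100, -10],       // negative discount
  [100, undefined], // missing argument
];

const report = edgeCases.map(([price, percent]) => {
  let strict;
  try {
    strict = applyDiscount(price, percent);
  } catch (err) {
    strict = err.name; // record the failure instead of crashing the report
  }
  return { price, percent, naive: applyDiscountNaive(price, percent), strict };
});

console.log(report);
```

The naive version quietly returns a negative price, a higher price, and `NaN` for these inputs, while the strict version fails loudly on all three. Both versions are equally green on the happy path.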

3. Architectural Debt — When Patterns Break Down

AI agents generate working code fast, but they tend to repeat patterns rather than abstract them. You end up with five slightly different implementations of the same logic across five files. Each one works. None of them share a common utility.
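
As a hypothetical illustration of that drift (the file names, function names, and the email-normalization task are invented for this example), here are two near-duplicate helpers of the kind that end up in separate files, followed by the single shared utility they could collapse into:

```javascript
// signup.js (hypothetical): trims, then lowercases.
const cleanSignupEmail = (email) => email.trim().toLowerCase();

// invite.js (hypothetical): lowercases but never trims.
const cleanInviteEmail = (email) => email.toLowerCase();

// Each passes its own tests, yet they disagree on the same input.
const raw = '  User@Example.com ';
console.log(cleanSignupEmail(raw) === cleanInviteEmail(raw)); // false

// Shared utility: one definition of "normalized", with the contract stated.
const normalizeEmail = (email) => {
  if (typeof email !== 'string') {
    throw new TypeError('email must be a string');
  }
  return email.trim().toLowerCase();
};
```

Two copies are easy to reconcile. Five copies spread across five files, each with its own small difference, are where the debt starts to compound.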

AI-generated code tends toward the happy path. It handles the cases the training data covered well — standard inputs, expected states, common error codes. Edge cases, race conditions, and infrastructure-specific failures get shallow treatment or none at all.

When an AI agent needs functionality, it reaches for a package. It doesn't weigh whether the existing codebase already handles the need, whether the dependency is maintained, or whether the package size is justified for a single function.

The result is what I'd call "coherent chaos" — code that's individually reasonable and collectively incoherent.

The Productivity Paradox — Why Faster Isn't Actually Faster

Here's the contradiction that nobody in leadership wants to hear:

AI coding tools write 41% of all new commercial code in 2026. Velocity has never been higher.

Yet experienced developers report a 19% productivity decrease when using AI tools, according to Stack Overflow analysis. And the majority of developers report spending more time debugging AI-generated code and more time resolving security vulnerabilities.

How can tools that generate code faster make developers slower?

Because writing code was never the bottleneck.

Understanding code is the bottleneck. Debugging code is the bottleneck. Modifying code you didn't write — or that you wrote but don't understand — is the bottleneck.

AI made the fast part faster. It made the slow parts slower.

The teams measuring AI adoption rates and feature velocity are optimizing for the wrong metrics. They're ignoring technical debt accumulation. The companies that rushed into AI-assisted development without governance are the ones facing crisis-level accumulated debt in 2026-2027.

What Actually Happens When Nobody Understands the Code

I want to be concrete about what this looks like in practice.

Scenario 1: The three-week freeze

That was us. Six months of AI-assisted velocity, followed by three weeks of complete stoppage because we needed to understand what we had built before we could safely change it.

Net velocity after accounting for the freeze: approximately zero gain over traditional development.

Scenario 2: The junior developer trap

54% of engineering leaders plan to hire fewer junior developers due to AI. But AI-generated technical debt requires human judgment to fix — precisely the judgment that junior developers develop through years of making mistakes and learning.

By eliminating junior positions, organizations are creating a future where they lack the human capacity to fix the debt being generated today.

The engineers needed in 2027 — those with 2-4 years of debugging experience — won't exist because they weren't hired.

Scenario 3: The security time bomb

One security company found that AI-assisted development led to code with 2.74x higher rates of security issues compared to human-written code. That debt doesn't announce itself. It sits in production, waiting.

How to Actually Fix This — Practically

After three weeks of painful debugging and refactoring, here's what my team changed:

1. Introduce the "Can You Debug It at 2am?" Rule

Before any AI-generated code gets merged, the author must be able to answer:

"If this breaks in production at 2am and pages you, can you debug it without looking at it again?"

If the answer is no — the code doesn't merge until the author understands it.

This one rule caught more problems in our first week than all our previous code review processes combined.

2. Separate "Generation Sessions" from "Understanding Sessions"

Monday: Use AI to generate the feature (fast)
Tuesday: Read every line without AI assistance (slow)
Wednesday: Refactor what you don't understand (medium)
Thursday: Test edge cases AI didn't consider (medium)
Friday: Merge

Slower in the short term. Dramatically faster over a six-month timeline.

3. Track Cognitive Debt — Not Just Code Quality

Add these questions to your sprint retrospectives:

Can every team member explain the core systems we shipped this sprint?
Are there modules that only one person understands?
Did we ship anything we couldn't confidently modify next week?

These aren't sentimental questions. They're risk assessments.
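
The second question can be roughly approximated from version-control history. The sketch below assumes you have already exported commits as `{ moduleName, author }` records (for example from `git log`). The export step, the record shape, and the function name are assumptions for illustration, not part of our tooling. Authorship is only a proxy for understanding, so treat the output as a prompt for the retro, not a verdict.

```javascript
// Returns modules whose recorded commits all come from a single author.
const findSingleOwnerModules = (commits) => {
  const authorsByModule = new Map();

  for (const { moduleName, author } of commits) {
    if (!authorsByModule.has(moduleName)) {
      authorsByModule.set(moduleName, new Set());
    }
    authorsByModule.get(moduleName).add(author);
  }

  return [...authorsByModule]
    .filter(([, authors]) => authors.size === 1)
    .map(([moduleName, authors]) => ({ moduleName, owner: [...authors][0] }));
};

// Hypothetical sample data.
const commits = [
  { moduleName: 'auth', author: 'dev-a' },
  { moduleName: 'auth', author: 'dev-a' },
  { moduleName: 'payments', author: 'dev-b' },
  { moduleName: 'payments', author: 'dev-a' },
];

console.log(findSingleOwnerModules(commits));
// [ { moduleName: 'auth', owner: 'dev-a' } ]
```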

4. Treat AI Like a Brilliant Junior Developer

Powerful. Fast. Confident about things it shouldn't be confident about. Needs supervision on anything complex.

Junior developer rule:
✅ Use for boilerplate and scaffolding
✅ Use for well-understood patterns
✅ Use for test generation
⚠️ Review everything carefully
❌ Don't let them architect alone
❌ Don't merge code you can't explain
❌ Don't skip review because tests pass

Apply the same rules to AI. Because the stakes are the same.

The Uncomfortable Truth

Here's what nobody in the AI coding tool marketing wants you to hear:

The teams winning in 2026 are not the ones generating the most code. They are the ones generating the right code and maintaining the discipline to review, refactor, and architect around AI's output.

Clean, modular, well-documented systems let AI become a supercharger. Tangled, patchworked systems suffocate AI's value — and eventually suffocate the business trying to run them.

The irony of AI tech debt is this: the better your codebase, the more value you get from AI. The worse your codebase, the more damage AI does to it.

AI amplifies what's already there. Strong foundations get amplified into faster shipping. Weak foundations get amplified into faster debt accumulation.

And unlike traditional technical debt, which announces itself gradually through friction, AI technical debt can accumulate invisibly behind green test suites and high velocity metrics, right up until the moment it can't be ignored.

The Question That Changed How I Lead My Team

After our three-week freeze, my CTO asked a question in our retrospective that I haven't stopped thinking about:

"At what point did we stop building software and start just generating it?"

There's a difference. Building implies understanding. Generating implies throughput.

The future belongs to developers who do both — who use AI's generation speed without losing their own understanding.

That's not a warning against AI tools. It's an argument for using them with intention.

Generate fast. Understand everything.

Has your team hit an AI tech debt wall yet — or are you seeing the warning signs? I'd genuinely love to know how other teams are handling this. Drop your experience in the comments — especially if you've found systems that actually work. 👇

Heads up: AI helped me write this. Somewhat fitting given the topic, but the three-week freeze story, the Rahul conversation, and the lessons are all mine. I believe in being transparent about my process! 😊

Top comments (136)

Ben Halpern • Mar 18

Can every team member explain the core systems we shipped this sprint?
Are there modules that only one person understands?
Did we ship anything we couldn't confidently modify next week?

I think this applies extra to any teams which already struggled with these concepts, which is most teams.

The bandaid is that the agent can explain away things people don't know, but it is a snowball effect if you let it get out of control!

Harsh • Mar 18

that snowball point is something i hadn't thought through clearly enough when writing this.

traditional debt at least gives you friction: slow builds, tangled code, something that signals "fix me".

but when the agent explains away the gap so smoothly, you lose even that warning signal.

and the teams already struggling with knowledge silos, like you said, are probably the ones least likely to notice it happening.

makes me think the real fix isn't technical at all. it's cultural. teams that have always valued "can everyone explain this?" will catch it. teams that haven't won't even see it coming.

really appreciate you adding this Ben 🙏

Ben Halpern • Mar 18

We used to have knowledge gaps, now we have runaway knowledge gaps.

Harsh • Mar 18

"runaway knowledge gaps": that's the phrase i was looking for the entire time i was writing this.

saving that one.

leob • Mar 18 • Edited

I'd say that you MUST slow down - going slower now will make you go faster later on :-)

My rules of thumb:

1) Unit tests FTW - in the "AI era", TDD is more important than ever

2) Don't accept the first version that's generated - iterate, and mold it until you're REALLY happy

3) Let others review it, not just yourself!

Harsh • Mar 18

"going slower now will make you go faster later": this is exactly the mindset shift that's hardest to sell to a team that's celebrating velocity metrics.

the TDD point is underrated, honestly. tests force you to understand what the code should do before the AI writes it. that's the cognitive debt fix hiding in plain sight.

Mykola Kondratiuk • Mar 19

the security piece is what i see most in the wild. been scanning ai-generated codebases for a few months now and the debt isn't in the logic - it's in all the tiny trust decisions the AI makes by default. broad permissions, open CORS, no input validation. each one is harmless-ish alone but they compound fast once real traffic hits. it's not even bad code per se, it's just code written by something with no blast radius intuition

Harsh • Mar 19

"no blast radius intuition": that's the most precise description of AI's security blind spot i've read.

it doesn't think in terms of what happens when this goes wrong at scale. broad permissions make sense in isolation. open CORS is convenient. no input validation is faster to write. none of them feel dangerous until they compound.

a human developer with production scars thinks about blast radius instinctively. AI has no scars. it has no memory of 3am incidents. and that absence shows up exactly where you're describing: in all the small trust decisions that seem fine until they aren't.

Mykola Kondratiuk • Mar 19

"blast radius intuition" is such a good framing. ran into this exact thing - AI happily suggested wildcard CORS because it made the immediate thing work, zero consideration for what it enables. you have to keep pulling it back to the threat model. honestly feels like a separate review pass is just table stakes now.

Harsh • Mar 19

"wildcard CORS because it made the immediate thing work": that's the perfect example of AI optimizing for local correctness over global safety.

it solved the problem in front of it. it had no model of what that solution enables downstream.

"keep pulling it back to the threat model" is exactly the skill that can't be automated. you have to know what the threat model is before you can evaluate whether the code respects it. AI doesn't know your threat model. it doesn't even know one exists.

"separate review pass as table stakes": agreed. and i'd add: the reviewer needs to be someone who has actually been paged at 3am. otherwise they don't know what they're looking for.

Mykola Kondratiuk • Mar 19

"local correctness over global safety" - yeah that framing is really useful. I had a similar thing where the AI fixed my auth bug but introduced a timing issue that only showed up under load. it passed all the tests so it felt done. the threat model lens helps catch that kind of thing before you ship it

Sylwia Laskowska • Mar 18

Really great take 👏

What resonated with me the most is this idea that with AI we’re often removing the layer of understanding, not just speeding things up. The code “works”, but fewer and fewer people actually know why it works — and that’s where the real risk starts.

And the junior point hits hard. Not long ago, my company was actively training juniors and growing them into solid engineers. Now… honestly, I haven’t even heard the word “junior” in a while.

Feels like we’re optimizing for short-term velocity, while quietly cutting off the pipeline of people who would be able to understand and maintain these systems in the future.

Harsh • Mar 18

"optimizing for short-term velocity while cutting off the pipeline": that's the part that genuinely worries me most.

the junior developer point isn't just about jobs. it's about who fixes the mess in 5 years when nobody understands the systems AI built.

really appreciate you sharing this — that pipeline framing is something i'll be thinking about for a while.

Daniel Yarmoluk • Mar 20

My experience, and it's only my opinion, I think we are looking at these problems wrong. We need to love on models more, context more, that's the human part. The focus on the real problems with agents summarizing complicated value chains and win-win-win-win scenarios (employee-company-customer-market) and context and love on models, specifically, context and texture emulates the complicated ever-changing problem set we face. Scientific breakthroughs, and refining through context architecture (compressed to the new and improved .md file, long live the md file!) can further add texture and graph databases can layer on other graph databases for edges and nodes which is more token density (170X) through the context window. I'm way too busy working on problems for real people (feeding family, mom has cancer, buddy lost job, my brother makes 100K and still can't live in a studio newly divorced in SoCal stuff, rebuilding relationships).

Harsh • Mar 20

the context architecture point is real: the quality of what AI produces scales directly with the quality of context you give it. most teams underfeed their models and then blame the output.

but the last paragraph is the most human thing in this thread.

all of this (the tech debt debates, the AI tooling, the context windows) is in service of the actual problems. feeding families. taking care of parents. helping friends land on their feet.

hope things ease up soon. the real problems are the ones worth solving.

Daniel Yarmoluk • Mar 20

Thanks for replying, least i'm not alone, and as we say in AA, there is power together.

Daniel Yarmoluk • Mar 20

because it was human, and my intention is mine...if AI wrote this, would you change what you thought of it?

Harsh • Mar 20

not alone at all. and that question deserves an honest answer.

no, i wouldn't change what i thought of it. the value was in what was shared: the real situations, the real people, the real weight of it. whether a human or AI typed those words, the meaning came from a life being lived.

but i'm glad it was you. that matters too.

Daniel Yarmoluk • Mar 20 • Edited

and that was a very nice note. note to world, that is how you can keep a "human in the loop", like what a horrible word choice, what about like human concern or something else. Intention/context = love on your model. How can we measure this? I'm up at 3:57am in Minneapolis, why? I care, it's my intention. You can also call it a high-fidelity b*****t meter in some "context", particularly for the AI sycophants.

Harsh • Mar 20

"3:57am because you care": that's the metric that doesn't fit in any dashboard.

you're right that "human in the loop" is a terrible phrase. it reduces people to a quality control step. "human concern" is closer: it implies someone actually gives a damn about the outcome, not just the process.

the high-fidelity BS meter is real. and it only works if the person holding it actually cares enough to use it. that's the part that can't be automated.

hope you get some sleep. the world needs people who are up at 4am caring about things.

Daniel Yarmoluk • Mar 20

Preach brother

Max • Mar 25

The "can you debug at 2am?" standard is good, but I'd push it further: can you explain to your teammate what this code does without reading it? If not, you don't own it.

We've been running Claude Code as a daily pair programmer on a 111K-commit codebase for 85+ days. The cognitive debt is real — but we found the antidote isn't slowing AI down, it's making the AI narrate before acting. Every edit gets a one-sentence explanation of what's changing and why, before the change happens. The human reviews the intent, not just the diff.

The other thing we learned: static analysis isn't optional anymore. PHPStan, PHPMD, Rector — they're the AI's self-awareness, because the AI genuinely can't tell when its own quality is dropping. We can't either, until the pipeline goes red.

Harsh • Mar 25

Max, that ownership test hits different: "can you explain what this code does without reading it?" That's a much higher bar than I set, and honestly a better one.

The "narrate before acting" pattern is something I haven't seen described this clearly before. Reviewing intent before the diff is a subtle but massive shift, because by the time you're reading a diff, you're already in evaluation mode, looking for what's wrong. When you review the intent first, you're in thinking mode, asking whether the approach is even right. That's a completely different cognitive state.

85+ days on a 111K-commit codebase is serious real-world signal too. Most AI + code discussions are theoretical. Yours isn't.

The static analysis point is the one I'd underline twice. "The AI genuinely can't tell when its own quality is dropping": that's the part no amount of prompting fixes. PHPStan catching what the AI missed isn't a workaround, it's a necessary layer. The pipeline going red is often the only honest feedback the AI gets.

Thanks for bringing actual field data into this conversation. This is exactly the kind of grounded insight the article needed. 👍️

Max • Mar 27

The "reviewing intent before the diff" distinction is something we discovered by accident. The agent was required to narrate what it was about to change before making the edit — originally as a safety measure so the human could say "wait, no." But the side effect was better: the narration itself caught bad ideas. Writing "I'm about to add a caching layer to this endpoint" forces the agent to articulate why, and sometimes the answer is "actually, there's already one two files over."

The static analysis point is the one I feel strongest about. We've run three AI agents for months now, and the consistent pattern is: the agent's confidence doesn't correlate with its correctness. It sounds just as sure when it's right as when it's wrong. The pipeline going red is genuinely the only reliable signal. Without it, you're trusting vibes — and vibes scale terribly.

Appreciate the engagement — articles like yours are where the real conversation happens. The theoretical takes have their place, but the field data is what moves things forward.

Harsh • Mar 27

The "discovered by accident" detail is what makes this credible. The best guardrails usually aren't designed top-down, they emerge from teams noticing what actually works in practice.

The narration forcing articulation of why, and sometimes revealing "actually, there's already one two files over", is essentially making the agent rubber duck itself before acting. It's not just a safety layer, it's a reasoning layer. That's a completely different thing.

"The agent's confidence doesn't correlate with its correctness" should be printed and put above every monitor in every team using AI agents right now. That's the core problem in one sentence. The pipeline going red being the only reliable signal means you've essentially offloaded the agent's quality awareness to the CI system entirely, which works, but only if the CI system is comprehensive enough to catch what the agent confidently missed.

Three agents, months of real data, consistent pattern — this is the kind of signal that should be shaping how the industry talks about agent reliability. Not the demos, not the benchmarks. This.

jidonglab • Mar 22

one angle i don't see discussed enough: the context window itself is a form of tech debt in agent systems. every time you bolt on another tool or add more instructions to your agent pipeline, you're eating into the context budget. eventually the model starts dropping important context from earlier in the conversation and you get subtle failures that are way harder to debug than traditional code bugs.

the fix isn't just "write better prompts" — it's treating token usage like memory management. compress what you can, evict what you don't need, and monitor context utilization the same way you'd monitor RAM usage in a production service.

Harsh • Mar 22

"treating token usage like memory management": that's the framing that should be in every agent architecture guide written in 2026.

the parallel is exact. context windows have limits the way RAM has limits. when you exceed them, you don't get a clean error; you get silent degradation. the model starts dropping earlier context the way a system under memory pressure starts evicting pages. and unlike RAM pressure, you don't get an out-of-memory exception. you get subtly wrong behavior that looks like correct behavior until it isn't.

the "bolt on another tool" accumulation is how it happens in practice. each tool feels free because it's just a few tokens. then you have twelve tools, a system prompt, conversation history, and retrieved context all competing for the same budget, and the model is quietly making tradeoffs you didn't ask it to make.

"monitor context utilization the same way you'd monitor RAM": that's not a metaphor. that's literally the right engineering practice. token budgets, context compression between turns, eviction policies for stale context. this is infrastructure work, not prompt work.

genuinely thinking about this as a fifth debt type now alongside cognitive, verification, architectural, and context drift. token debt might be the right name for it.

jidonglab • Mar 22

token debt nails it as a name. the worst part is there's no stack trace — context overflow just silently degrades output quality and you don't notice until something breaks downstream. most teams have zero visibility into per-turn context utilization right now, which is exactly why it accumulates so fast.

SyntaxSeed (Sherri W) • Apr 6

I'm sorry but EVERYONE I know who is critical of AI code generation is talking about these problems. We're just being ignored.

You think it's bad now?? Wait until no one is left who knows how to hand-code & the LLM companies have you by the throat.

Harsh • Apr 7

You're right and I'm not here to argue with you. 🙏

The critics have been saying this for a while. The problem isn't that no one is talking. The problem is that decision-makers aren't listening.

Vendor lock-in at the skill level: that's the real long-term risk. Not just API dependency, but a generation of developers who never learned to code without AI.

So here's my honest question: what do you think the community should actually do about it? Not just what to avoid, but what to build, teach, or demand?

I'm asking because I genuinely want to know. 🙌

SyntaxSeed (Sherri W) • Apr 7

I've been in this industry for over 2 decades. Nothing talks like money. Decision makers won't change the policy until it starts hurting the bottom line.

It's important for developers to not hide the pain that this mess is causing by working silent overtime, working through their lunches, etc. Make sure management feels the crash in productivity that happens once we cross this tipping point.

Harsh • Apr 7

Valid point. But most devs hide pain because they fear being replaced. How do we make the crash visible without individuals getting fired first?

Julien Avezou • Mar 19

This is a great breakdown of the accumulating tech debt in the AI frenzy we are witnessing. Knowledge sharing at a team level is so important and has always been a struggle with dev teams even before AI tooling came along, but it's arguably even more important now.
Thanks for sharing Harsh.

Harsh • Mar 19

completely agree: knowledge sharing was already the hardest unsolved problem in software teams before AI arrived.

what AI has done is accelerate the consequences. teams that had weak knowledge sharing before could still muddle through because the code was at least written by someone who understood it. now the code can be written by something that understands nothing, and the knowledge sharing gap becomes existential, not just inconvenient.

the irony is that AI tools make it easier to generate documentation and explanations. the blocker was never the effort of writing it down. it was always the culture of valuing it.

thanks for reading and adding this. the pre-AI context matters a lot. 🙏

Artem Koltunov • May 14

The Rahul story is what makes this concrete — "I'm not entirely sure why it's structured this way. It worked when I tested it." We had an almost identical moment. Our team ran three timed experiments with AI tools, and in the third one (Cursor, full SDK integration), the developer who committed the code couldn't explain why it built a complete fetch-download-reupload chain for an image when the ID was already in the response object. Tests green, code clean, understanding zero. Your three-category taxonomy nails the compounding effect: that architectural redundancy (architectural debt) passed review unchallenged (verification debt) because nobody had built the mental model to question it (cognitive debt). The 3-week freeze you describe is what happens when all three hit simultaneously. One thing I'd add: the trust numbers you cite (43% → 29% while usage climbed to 84%) suggest developers know something is wrong but can't articulate what. The debt is real, but invisible to every dashboard they have.

Harsh • May 14

Artem, "tests green, code clean, understanding zero": that's the perfect summary of the problem. 🙏

That moment where the code works, the CI passes, and the person who wrote it can't explain why: that's not a bug. That's a symptom. And your team's experiment with Cursor is the clearest real-world example I've heard.

Architectural debt (wrong foundation) + cognitive debt (no mental model) = compounding effect.

You've nailed the interaction. They don't stay separate. One makes the other worse. Wrong architecture makes it harder to build a mental model. No mental model makes it easier to accept wrong architecture.

Trust numbers are real but invisible to the dashboards.

This is the part that keeps me up. The dashboards show green. The velocity is up. Everything looks fine. But trust is quietly eroding, and no ticket tracks it.

Thank you for adding the third experiment detail and for seeing the taxonomy clearly. 🙌

Artem Koltunov • May 15

Exactly, the feedback loop is what makes it insidious. We watched it happen in real time: after the third experiment, our dashboards showed the best sprint of the quarter. Velocity up, PR turnaround down. And it's only when someone asks to explain the code in a module before making changes that you realize nobody can, except by prompting the AI. The metric missing from every tool: "can the author explain this without opening the file?" Or even with opening the file, but without prompting the AI.
