Your team shipped more features last quarter than any quarter before. The AI coding tools are working. Everyone feels faster.
Then you look at the codebase six months later and nothing makes sense.
GitClear analyzed 211 million changed lines of code across repositories from Google, Microsoft, and Meta between 2020 and 2024. Their finding: copy-pasted code rose from 8.3% to 12.3% of all changes, while refactored code dropped from 25% to under 10%. Duplicated code blocks increased eightfold. The codebase is growing, but the architecture is rotting.
This is not traditional tech debt. Traditional debt comes from shortcuts under deadline pressure. AI-generated tech debt comes from code that works, passes tests, and reads fine — but lacks architectural judgment.
The Measurement Problem
Ox Security analyzed 300 repositories and found 10 recurring anti-patterns in 80-100% of AI-generated code. The top offenders: excessive commenting (90-100% of repos), avoidance of refactoring (80-90%), and duplicated bug patterns across files (80-90%). They called AI-generated code "highly functional but systematically lacking in architectural judgment."
The METR study made this concrete. Sixteen experienced open-source developers (averaging 22,000+ star repositories) were randomly assigned tasks with and without AI tools. The result: developers using AI took 19% longer to complete tasks. But when surveyed afterward, those same developers estimated they were 20% faster. The perception gap was 39 percentage points.
If your team cannot measure the debt, they cannot manage it. Here are five detection patterns that surface AI-generated tech debt before it compounds.
Pattern 1: Cyclomatic Complexity Drift Detection
AI-generated code tends to solve problems by adding conditions rather than abstracting patterns. A function that started at complexity 5 slowly grows to 15 as the AI adds edge case handling inline rather than extracting helper functions.
Track complexity over time, not just at a single point.
"""
complexity_tracker.py — Track cyclomatic complexity drift per function.
Requires: pip install radon
Radon docs: https://radon.readthedocs.io/
"""
import json
import subprocess
import sys
from datetime import date
from pathlib import Path
def get_complexity(source_dir: str) -> list[dict]:
"""Run radon cc and return per-function complexity scores."""
result = subprocess.run(
["radon", "cc", source_dir, "-j", "-n", "C"],
capture_output=True, text=True, check=True,
)
raw = json.loads(result.stdout)
functions = []
for filepath, blocks in raw.items():
for block in blocks:
functions.append({
"file": filepath,
"name": block["name"],
"complexity": block["complexity"],
"lineno": block["lineno"],
})
return functions
def load_baseline(path: Path) -> dict:
"""Load previous complexity snapshot."""
if path.exists():
return json.loads(path.read_text())
return {}
def detect_drift(baseline: dict, current: list[dict], threshold: int = 3) -> list[dict]:
"""Flag functions whose complexity increased beyond threshold."""
alerts = []
for func in current:
key = f"{func['file']}::{func['name']}"
prev = baseline.get(key, {}).get("complexity", func["complexity"])
delta = func["complexity"] - prev
if delta >= threshold:
alerts.append({
"function": key,
"was": prev,
"now": func["complexity"],
"delta": delta,
"line": func["lineno"],
})
return alerts
def save_snapshot(functions: list[dict], path: Path) -> None:
"""Save current complexity as the new baseline."""
snapshot = {}
for f in functions:
key = f"{f['file']}::{f['name']}"
snapshot[key] = {
"complexity": f["complexity"],
"date": str(date.today()),
}
path.write_text(json.dumps(snapshot, indent=2))
if __name__ == "__main__":
source = sys.argv[1] if len(sys.argv) > 1 else "src"
baseline_path = Path(".complexity-baseline.json")
current = get_complexity(source)
baseline = load_baseline(baseline_path)
alerts = detect_drift(baseline, current)
if alerts:
print(f"Found {len(alerts)} complexity drift alerts:")
for a in alerts:
print(f" {a['function']} line {a['line']}: "
f"{a['was']} -> {a['now']} (+{a['delta']})")
sys.exit(1)
else:
print(f"No drift detected across {len(current)} functions.")
save_snapshot(current, baseline_path)
Run this in CI on every pull request. When a function's complexity jumps by 3 or more since the last baseline, the build flags it. The developer must either justify the increase or refactor before merging.
The threshold of 3 is deliberate. A single if adds 1 point. Three conditional branches added to one function in a single PR almost always means inline logic that should be extracted.
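To make that arithmetic concrete, here is a minimal standard-library sketch of the count. Treat it as an approximation only — radon's real algorithm also scores boolean operators, comprehension filters, and other constructs:

```python
import ast

# Rough cyclomatic complexity: 1 plus the number of branching constructs.
# Simplified on purpose; radon counts more node types than this.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler)

def approx_complexity(source: str) -> int:
    tree = ast.parse(source)
    return 1 + sum(isinstance(n, BRANCH_NODES) for n in ast.walk(tree))

before = """
def handle(x):
    return x * 2
"""

after = """
def handle(x):
    if x is None:
        return 0
    if x < 0:
        return -x
    if x > 100:
        return 100
    return x * 2
"""

print(approx_complexity(before))  # 1: straight-line code
print(approx_complexity(after))   # 4: three inline branches, a delta of 3
```

Three added `if` statements in one PR is exactly the delta-of-3 jump the tracker flags.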
Pattern 2: Clone Detection With Structural Matching
AI models generate code by statistical prediction. When similar problems appear in different parts of a codebase, the model generates similar — but not identical — solutions. These near-duplicates are harder to find than exact copies.
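The intuition can be sketched with the standard library's difflib: two hypothetical AI-generated validators share structure but not identifiers, so they score high on similarity without being exact copies. (This is an illustration of the idea, not jscpd's algorithm, which tokenizes and hashes instead.)

```python
from difflib import SequenceMatcher

# Two hypothetical near-duplicates: same shape, different names.
snippet_a = """
def validate_login(payload):
    if not payload.get("email"):
        raise ValueError("email required")
    if not payload.get("password"):
        raise ValueError("password required")
    return True
"""

snippet_b = """
def validate_signup(data):
    if not data.get("email"):
        raise ValueError("email required")
    if not data.get("password"):
        raise ValueError("password required")
    return True
"""

ratio = SequenceMatcher(None, snippet_a, snippet_b).ratio()
print(f"similarity: {ratio:.2f}")  # well above 0.8, but not 1.0
```

An exact-match search finds neither; a similarity threshold catches both.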
jscpd (copy-paste detector) catches both exact and near-duplicates across 150+ languages.
# Install: npm install -g jscpd
# Docs: https://github.com/kucherenko/jscpd
# Scan your source directory for duplicates
jscpd ./src --min-lines 5 --min-tokens 50 --reporters consoleFull
# Output shows duplicate blocks with file locations:
# Clone found (Python):
# src/auth/login.py [10:25]
# src/auth/register.py [15:30]
# Lines: 15, Tokens: 89
# Set a duplication threshold for CI
# Configure in .jscpd.json: {"threshold": 5}
jscpd ./src --threshold 5 --reporters consoleFull
The --threshold flag turns this into a CI gate. GitClear's data shows the industry average crossed 12% duplication in 2024. Set your threshold at your current level and ratchet it down each quarter.
For Python-specific detection, pylint has a built-in duplicate checker:
# Uses Pylint's similarity checker across your codebase
# Docs: https://pylint.readthedocs.io/
pylint --disable=all --enable=duplicate-code src/
The two approaches complement each other. jscpd's token-based matching catches near-duplicates across languages in a single pass, while pylint's similarity checker slots into the lint configuration most Python projects already run.
Pattern 3: Dead Code Accumulation Tracking
AI assistants frequently generate utility functions, helper classes, and imports that the final implementation never uses. Over weeks of AI-assisted development, dead code accumulates silently.
vulture detects unused Python code by analyzing ASTs:
# Install: pip install vulture
# Docs: https://github.com/jendrikseipp/vulture
# Scan for dead code with 80% confidence threshold
vulture src/ --min-confidence 80
# Output:
# src/utils/helpers.py:45: unused function 'format_response' (90% confidence)
# src/models/user.py:12: unused import 'Optional' (100% confidence)
# src/api/routes.py:89: unused variable 'temp_cache' (80% confidence)
The confidence scoring matters. At 100%, vulture is certain the code is unused within the analyzed files. At 60%, there might be dynamic usage — getattr calls, string-based dispatch — that static analysis missed. Start at 80% for CI gates and 60% for manual review.
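The core idea, and the reason confidence can never be perfect, fits in a few lines of stdlib ast: compare the names a module defines against the names it ever reads. Anything dynamic is invisible to that comparison. A toy version of the approach, not vulture's implementation:

```python
import ast

# A hypothetical module: one helper is called, one never is.
source = """
import os

def used_helper():
    return os.getcwd()

def orphan_helper():
    return 42

print(used_helper())
"""

tree = ast.parse(source)

# Names defined at module level (functions only, for brevity).
defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
# Names that are ever read anywhere in the module.
used = {n.id for n in ast.walk(tree)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}

unused = defined - used
print(unused)  # → {'orphan_helper'}
```

If the module had called `globals()["orphan_helper"]()`, this check would still report it unused — which is exactly the uncertainty vulture's confidence score encodes.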
Track dead code percentage over time:
"""
dead_code_tracker.py — Track dead code accumulation over time.
Requires: pip install vulture
"""
import subprocess
import json
from datetime import date
from pathlib import Path
def count_dead_code(source_dir: str, min_confidence: int = 80) -> dict:
"""Run vulture and count findings by type."""
result = subprocess.run(
["vulture", source_dir, f"--min-confidence={min_confidence}"],
capture_output=True, text=True,
)
lines = result.stdout.strip().split("\n") if result.stdout.strip() else []
counts = {"unused_function": 0, "unused_import": 0, "unused_variable": 0, "other": 0}
for line in lines:
if "unused function" in line:
counts["unused_function"] += 1
elif "unused import" in line:
counts["unused_import"] += 1
elif "unused variable" in line:
counts["unused_variable"] += 1
else:
counts["other"] += 1
counts["total"] = len(lines)
counts["date"] = str(date.today())
return counts
def append_history(counts: dict, history_path: Path) -> None:
"""Append today's count to the tracking history."""
history = []
if history_path.exists():
history = json.loads(history_path.read_text())
history.append(counts)
history_path.write_text(json.dumps(history, indent=2))
if __name__ == "__main__":
counts = count_dead_code("src")
append_history(counts, Path(".dead-code-history.json"))
print(f"Dead code: {counts['total']} findings "
f"({counts['unused_function']} functions, "
f"{counts['unused_import']} imports, "
f"{counts['unused_variable']} variables)")
When dead code count climbs week over week, something is generating code nobody uses. That is the signal to review AI-assisted PRs more carefully.
Pattern 4: Refactoring Ratio Gate
GitClear's most striking finding was not that duplication increased — it was that refactoring collapsed. From 25% of all code changes in 2021 to under 10% in 2024. AI tools generate new code. They rarely suggest consolidating existing code.
Measure the ratio of refactoring to new code in every sprint:
"""
refactor_ratio.py — Measure refactoring vs new code ratio from git history.
Uses git log to classify commits as refactoring or feature work.
"""
import subprocess
import re
import sys
def get_recent_commits(days: int = 14) -> list[str]:
"""Get commit messages from the last N days."""
result = subprocess.run(
["git", "log", f"--since={days} days ago",
"--pretty=format:%s", "--no-merges"],
capture_output=True, text=True, check=True,
)
return [line.strip() for line in result.stdout.split("\n") if line.strip()]
def classify_commits(messages: list[str]) -> dict:
"""Classify commits as refactor, feature, fix, or other."""
refactor_patterns = re.compile(
r"refactor|extract|consolidate|simplify|rename|restructure|deduplicate|cleanup|clean up",
re.IGNORECASE,
)
feature_patterns = re.compile(
r"add|implement|create|build|introduce|new|feature",
re.IGNORECASE,
)
fix_patterns = re.compile(r"fix|bug|patch|resolve|hotfix", re.IGNORECASE)
counts = {"refactor": 0, "feature": 0, "fix": 0, "other": 0}
for msg in messages:
if refactor_patterns.search(msg):
counts["refactor"] += 1
elif feature_patterns.search(msg):
counts["feature"] += 1
elif fix_patterns.search(msg):
counts["fix"] += 1
else:
counts["other"] += 1
return counts
def compute_ratio(counts: dict) -> float:
"""Compute refactoring ratio as percentage of total commits."""
total = sum(counts.values())
if total == 0:
return 0.0
return (counts["refactor"] / total) * 100
if __name__ == "__main__":
days = int(sys.argv[1]) if len(sys.argv) > 1 else 14
commits = get_recent_commits(days)
counts = classify_commits(commits)
ratio = compute_ratio(counts)
print(f"Last {days} days: {len(commits)} commits")
print(f" Refactoring: {counts['refactor']} ({ratio:.1f}%)")
print(f" Features: {counts['feature']}")
print(f" Fixes: {counts['fix']}")
print(f" Other: {counts['other']}")
if ratio < 15:
print(f"\nRefactoring ratio ({ratio:.1f}%) is below 15% threshold.")
print("Consider scheduling dedicated refactoring time.")
This is a proxy metric. Commit messages are noisy. But the trend matters more than any single measurement. If your refactoring ratio drops below 15% for three consecutive sprints, your codebase is accumulating structural debt regardless of the source.
The fix is not to stop using AI tools. The fix is to schedule explicit refactoring time — separate from feature work, tracked separately in your sprint. AI tools generate. Humans consolidate. Both steps are necessary.
Pattern 5: Architectural Boundary Enforcement
AI-generated code does not respect module boundaries. A function in src/auth/ might import directly from src/billing/ because the model saw that pattern somewhere in its training data. Over time, the dependency graph becomes a web.
Enforce boundaries with import rules:
"""
boundary_check.py — Enforce architectural boundaries via import analysis.
Uses Python's ast module (standard library) to parse imports.
"""
import ast
import sys
from pathlib import Path
# Define allowed imports between modules.
# Each key is a module, values are modules it MAY import from.
ALLOWED_IMPORTS = {
"auth": {"models", "utils", "config"},
"billing": {"models", "utils", "config"},
"api": {"auth", "billing", "models", "utils", "config"},
"models": {"utils", "config"},
"utils": {"config"},
"config": set(),
}
def get_module_name(filepath: Path, src_root: Path) -> str:
"""Extract the top-level module name from a file path."""
relative = filepath.relative_to(src_root)
return relative.parts[0] if len(relative.parts) > 1 else ""
def check_imports(filepath: Path, src_root: Path) -> list[dict]:
"""Parse a Python file and check imports against boundary rules."""
module = get_module_name(filepath, src_root)
if module not in ALLOWED_IMPORTS:
return []
violations = []
source = filepath.read_text()
tree = ast.parse(source, filename=str(filepath))
for node in ast.walk(tree):
target = None
if isinstance(node, ast.Import):
for alias in node.names:
parts = alias.name.split(".")
if parts[0] in ALLOWED_IMPORTS and parts[0] != module:
target = parts[0]
elif isinstance(node, ast.ImportFrom):
if node.module:
parts = node.module.split(".")
if parts[0] in ALLOWED_IMPORTS and parts[0] != module:
target = parts[0]
if target and target not in ALLOWED_IMPORTS.get(module, set()):
violations.append({
"file": str(filepath),
"line": node.lineno,
"module": module,
"imports": target,
"allowed": sorted(ALLOWED_IMPORTS[module]),
})
return violations
def scan_directory(src_root: Path) -> list[dict]:
"""Scan all Python files for boundary violations."""
all_violations = []
for pyfile in src_root.rglob("*.py"):
all_violations.extend(check_imports(pyfile, src_root))
return all_violations
if __name__ == "__main__":
src = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("src")
violations = scan_directory(src)
if violations:
print(f"Found {len(violations)} boundary violations:")
for v in violations:
print(f" {v['file']}:{v['line']} — "
f"'{v['module']}' imports '{v['imports']}' "
f"(allowed: {v['allowed']})")
sys.exit(1)
else:
print("No boundary violations found.")
The ALLOWED_IMPORTS dictionary is your architecture. When the AI generates an import that crosses a boundary, the check fails. The developer must either fix the import or update the architecture — both of which force a deliberate decision.
This pattern scales. Start with top-level module boundaries. Add sub-module rules as the codebase grows. The AI does not know your architecture. This tool enforces it.
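One way to add sub-module rules without breaking the top-level check is a longest-prefix lookup: the most specific dotted key wins, and anything without its own rule inherits from its parent. The dotted keys below are hypothetical, not part of the script above:

```python
# Hypothetical nested rules: a dotted key overrides its parent module.
ALLOWED_IMPORTS = {
    "auth": {"models", "utils"},
    "auth.tokens": {"models", "utils", "crypto"},  # tighter scope, one extra dep
}

def rules_for(module_path: str) -> set[str]:
    """Walk from the most specific dotted prefix up to the top level."""
    parts = module_path.split(".")
    while parts:
        key = ".".join(parts)
        if key in ALLOWED_IMPORTS:
            return ALLOWED_IMPORTS[key]
        parts.pop()
    return set()

print(rules_for("auth.tokens.jwt"))  # matches the "auth.tokens" rule
print(rules_for("auth.sessions"))    # falls back to the "auth" rule
```

A module with no matching prefix gets an empty set, so unknown code is denied by default rather than silently allowed.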
Putting It Together: The CI Pipeline
Each pattern works independently. Together, they form a debt detection pipeline:
# .github/workflows/debt-detection.yml
name: Tech Debt Detection
on: [pull_request]

jobs:
  complexity-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install radon
      - run: python complexity_tracker.py src

  clone-detection:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm install -g jscpd
      - run: jscpd ./src --threshold 5

  dead-code:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install vulture
      - run: vulture src/ --min-confidence 80

  refactor-ratio:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so git log --since sees past commits
      - run: python refactor_ratio.py 14

  boundary-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python boundary_check.py src
None of these tools know whether code was written by a human or an AI. They measure structural quality. That is the point. The source does not matter. The architecture does.
What This Means for Your Team
The research is clear. AI coding tools increase output velocity. They also increase structural debt. The METR study showed experienced developers were 19% slower with AI tools while believing they were 20% faster — a 39 percentage point perception gap.
This does not mean AI tools are bad. It means teams need to pair generation speed with detection systems. The five patterns above give you concrete metrics: complexity drift, duplication percentage, dead code count, refactoring ratio, and boundary violations.
Track these metrics every sprint. Set thresholds. Ratchet them tighter over time. AI generates code faster than humans. Humans still need to maintain the architecture.
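The ratchet itself can be a few lines: each sprint the stored threshold only ever moves toward the target, never away from it. A minimal sketch — the file name, step size, and floor are illustrative choices, not a standard:

```python
import json
from pathlib import Path

def ratchet(metric_value: float, path: Path,
            step: float = 0.5, floor: float = 2.0) -> float:
    """Tighten a stored threshold each run, never loosen it.

    The new threshold is the lower of (previous threshold - step) and
    (today's measured value + step), bounded below by `floor`.
    """
    previous = (json.loads(path.read_text())["threshold"]
                if path.exists() else metric_value)
    new = max(floor, min(previous - step, metric_value + step))
    path.write_text(json.dumps({"threshold": new}))
    return new

# Example: duplication measured at 11.8%, stored threshold 12.0.
# Next sprint's gate becomes 11.5 — tighter, and still achievable.
```

Feed the new threshold back into the corresponding CI gate (jscpd's --threshold, the complexity drift delta, the dead-code count) so the pressure is automatic rather than a standing agenda item.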
The teams that ship fast in 2026 will not be the ones that generate the most code. They will be the ones that detect and resolve structural debt before it compounds.
Follow @klement_gunndu for more AI engineering content. We're building in public.