DEV Community: Peng Cao

Part 5: Closing the Loop (Git as a Runtime)

Peng Cao — Tue, 28 Apr 2026 11:18:00 +0000

Part 5 of our series: "The Agentic Readiness Shift: Building for Autonomous Engineers."

Read Part 4: The Neural Spine

Reasoning is Not Deployment

Generating a Terraform snippet or a CloudFormation template is easy. Ensuring that the snippet is syntactically valid and compatible with your existing stack is where most AI automation fails. Most systems are "opinionated but unverified": they hope for the best and leave the human to clean up the mess when the cloud provider throws a validation error.

In an autonomous engineering system, we treat deployment as a first-class citizen of the reasoning process. The engine doesn't just "think" about infrastructure; it executes it via The Coder Loop.

The JIT Infrastructure Engine

We chose SST v4 (built on Pulumi) because it allows for Just-In-Time (JIT) infrastructure mutations. Unlike traditional IaC tools that require slow planning phases and manual approval, SST v4 gives our autonomous agents the ability to define and deploy resources in a sub-second loop.

When the architect agent pulses a PATCH_PLANNED event, the coder agent ingests the intent and translates it into TypeScript-based infrastructure code that is immediately deployable.

Verified Mutation (The Coder Gate)

The agent doesn't just push code and pray. It runs a local synthesis check to ensure the SST v4 definition is valid. If the synthesis fails, it emits a REASONING_ERROR back to the neural spine, triggering a reflection loop for the architect to try again.

// Synthesizing JIT Concurrency Scaling...
const api = new sst.aws.ApiGatewayV2('MyApi');
api.route('POST /submit', {
  handler: 'api/handler.handler',
  transform: {
    function: {
      reservedConcurrency: 100, // Mutated from 10 via Reflector SCR
    },
  },
});
// synthesis status: VALIDATED_OK
// executing: sst deploy --stage production
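
The gate described above can be sketched in a few lines. This is a minimal illustration, not the project's actual implementation: `SynthResult`, `SpineEvent`, `coderGate`, and the `DEPLOY_REQUESTED` event name are hypothetical, and the synthesis step is passed in as a stub rather than calling SST.

```typescript
// Hypothetical types: none of these names come from SST or from the post itself.
type SynthResult = { ok: true } | { ok: false; error: string };

interface SpineEvent {
  type: "REASONING_ERROR" | "DEPLOY_REQUESTED";
  detail: string;
}

// Runs the synthesis check first and only requests a deploy when it passes.
function coderGate(
  synthesize: () => SynthResult,
  emit: (event: SpineEvent) => void
): boolean {
  const result = synthesize();
  if (!result.ok) {
    // Send the failure back so the architect can reflect and try again.
    emit({ type: "REASONING_ERROR", detail: result.error });
    return false;
  }
  emit({ type: "DEPLOY_REQUESTED", detail: "synthesis validated" });
  return true;
}

// Usage with stubbed dependencies: a failing synthesis never reaches deploy.
const emitted: SpineEvent[] = [];
const passed = coderGate(
  () => ({ ok: false, error: "unknown property on route" }),
  (event) => emitted.push(event)
);
```

Injecting `synthesize` and `emit` keeps the gate testable without cloud credentials; in a real loop they would wrap the synthesis command and the event bus.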

Safety First

Giving a machine the keys to your AWS account is terrifying. That's why every loop is wrapped in Recursion Guards and VPC Isolation. An autonomous system must be unkillable, but it must also be bounded.

The loop is closed. The agent is no longer an advisor; it is an operator.

Next up: Part 6: "Cognitive Tiering (Multi-Headed Brain)"

EventBridge: The Neural Spine

Peng Cao — Thu, 23 Apr 2026 11:30:56 +0000

This is Part 4 of our series: "The Agentic Readiness Shift: Building for Autonomous Systems."

Read Part 3: The Economic Moat

Mapping the ClawFlow mesh. How asynchronous events allow decoupled agents to coordinate without a central controller.

01. The Monolith Problem

Traditional automation scripts are monolithic. They follow a rigid, linear execution path: A must finish before B can start. In the world of autonomous infrastructure, this is fatal. If the Coder agent is busy committing a patch, the Reflector shouldn't stop monitoring for new gaps.

We needed a nervous system—a way for agents to "pulse" their intent across the entire cluster without waiting for a response.

02. ClawFlow: Decoupled Autonomy

Enter ClawFlow. Built on AWS EventBridge, it's a decentralized mesh where every action is a discrete event. When the Reflector identifies a performance bottleneck, it doesn't "call" the Architect. It emits a GAP_DETECTED event to the neural spine.

Any agent tuned to that frequency can react. The Architect picks up the signal, designs a solution, and pulses a PATCH_PLANNED event.

The flow looks like this:

[REFLECTOR] → NEURAL_BUS_STREAM → [ARCHITECT]
                                 ↘ [CODER]

Events: GAP_DETECTED → PATCH_PLANNED → GIT_COMMIT
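
To make the event chain concrete, here is a small in-memory sketch of the same publish/subscribe pattern. It is an illustration only: `InMemoryBus` is a hypothetical stand-in that delivers events synchronously, whereas EventBridge delivers them asynchronously through rules and targets.

```typescript
type EventType = "GAP_DETECTED" | "PATCH_PLANNED" | "GIT_COMMIT";
type Handler = (detail: string) => void;

// Hypothetical stand-in for the bus; not the EventBridge API.
class InMemoryBus {
  private handlers = new Map<EventType, Handler[]>();

  subscribe(type: EventType, handler: Handler): void {
    const existing = this.handlers.get(type) ?? [];
    this.handlers.set(type, [...existing, handler]);
  }

  // Emitters never call another agent directly; they only publish.
  publish(type: EventType, detail: string): void {
    for (const handler of this.handlers.get(type) ?? []) {
      handler(detail);
    }
  }
}

const bus = new InMemoryBus();
const trace: string[] = [];

// Architect reacts to gaps by planning a patch.
bus.subscribe("GAP_DETECTED", (detail) => {
  trace.push(`architect saw: ${detail}`);
  bus.publish("PATCH_PLANNED", `raise concurrency for ${detail}`);
});
// Coder reacts to planned patches by committing.
bus.subscribe("PATCH_PLANNED", (detail) => {
  trace.push(`coder saw: ${detail}`);
  bus.publish("GIT_COMMIT", "patch committed");
});
bus.subscribe("GIT_COMMIT", (detail) => trace.push(`committed: ${detail}`));

// The Reflector only emits the first event; the rest follows from subscriptions.
bus.publish("GAP_DETECTED", "POST /submit");
```

No agent holds a reference to another agent, so adding a new subscriber does not require changing any emitter.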

03. Unlimited Breadth

This asynchronous nature gives serverlessclaw what we call Unlimited Breadth. Because there is no central controller, we can scale sub-agents horizontally across the AWS global infrastructure. A mutation happening in ap-southeast-2 can trigger a security reflection in us-east-1 in milliseconds.

04. The Next Evolution

Having a neural spine is one thing; having a "conscience" is another. In the next post, we'll explore The Reflector—the autonomous critique mechanism that ensures the engine doesn't just act, but understands why it acts.

Next up: Part 5: "The Reflector: Machines that Self-Critique"

[!TIP]
Ready for Autonomous Infrastructure?
Check out our open-source project serverlessclaw or try the managed ClawMore service for instant agentic readiness.

The Economic Moat (Quantifying your AI ROI)

Peng Cao — Sun, 19 Apr 2026 04:31:59 +0000

This is the surprise "Director's Cut" finale of our "Agentic Readiness" series. We've talked about the "Wall" and the "Tsunami." Today, we talk about the Bottom Line.

In the early days of AI coding, the conversation was all about Velocity. "How fast can I ship?"

But as we enter the era of millions of lines of AI-generated code, the conversation is shifting to Economics. Forward-thinking engineering leaders aren't just asking "how fast"—they are asking "at what cost?"

1. The Context Tax is Real

Every time an AI agent helps you build, you are paying a "Context Tax."

It’s the cost of the tokens the agent needs to "read" your file.
It’s the tokens wasted on 500-line type files that aren't relevant.
It’s the $4.95 spent on navigation for a $0.05 fix.

In a small project, this is pocket change. In an enterprise monorepo, it’s a recurring operational expense.
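
The navigation example can be expressed as simple arithmetic. The sketch below is illustrative: `navigationShare` and `tokenCost` are hypothetical helpers, and any per-million-token price you pass in is your own assumption rather than a quoted rate.

```typescript
// Fraction of total spend that went to navigation rather than the fix itself.
function navigationShare(navigationCost: number, fixCost: number): number {
  const total = navigationCost + fixCost;
  return total === 0 ? 0 : navigationCost / total;
}

// USD cost of `tokens` at a given price per million tokens (price is an assumption).
function tokenCost(tokens: number, usdPerMillionTokens: number): number {
  return (tokens / 1e6) * usdPerMillionTokens;
}

// The example from the list above: $4.95 of navigation for a $0.05 fix.
const share = navigationShare(4.95, 0.05);
```

For the figures in the list, `share` comes out at roughly 0.99: about 99% of the spend goes to finding the code rather than changing it.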

AIReady is the first tool suite that turns Technical Debt into a Token Budget.

2. Refactoring as a Financial Imperative

We’ve traditionally seen refactoring as a "nice to have" when we have extra time. AIReady changes that.

By linking our Fragmentation Score to real-world model pricing (Claude 4.6, GPT-5.4), we can show you exactly how much money you’re leaving on the table.

High Fragmentation = High Token Attrition = Higher Burn.
High Semantic Duplication = AI Hallucinations = Wasted Developer Review Time.

When you run aiready analyze --business, you aren't just getting code smells; you're getting a ledger.

3. The Only Moat That Matters

Models are becoming commodities. Gemini, Claude, and GPT will eventually reach parity on reasoning.

When intelligence is cheap, the competitive advantage shifts to Context.

The team with the most "Agentic Ready" codebase—the one with the lowest Navigation Tax and highest Signal Clarity—will move 10x faster because their agents are smarter for cheaper.

Your codebase is your intelligence moat. Keep it clean, or your agents will drown in the noise.

Don't wait for the competition. Start quantifying your AI Economics today:
npx @aiready/cli scan --score

Peng Cao is the founder of aiready.

The Future is Friendly Code For AI and Humans

Peng Cao — Sun, 19 Apr 2026 04:21:50 +0000

This is the final part of our "AI Code Debt Tsunami" series. We've explored the hidden costs of AI-assisted development, the metrics that matter, and the tools to visualize and manage debt. Today, we look forward to the future of software development in an AI-first world.

For the past seven weeks, we’ve been dissecting a quiet crisis: the explosion of unmanaged, AI-generated code that is currently flooding our repositories. We’ve called it the AI Code Debt Tsunami.

We’ve seen how:

Semantic Duplicate Detection identifies logic that’s been rewritten in five different ways.
Context Budgeting reveals the hidden token cost of deep import chains.
Visualization turns abstract architectural decay into impossible-to-ignore physical shapes.

But as we conclude this series, I want to move away from the "debt" metaphor and talk about something more optimistic: The Convergence.

The Convergence: AI-Friendly is Human-Friendly

For years, "clean code" was defined by what made it readable for humans. We optimized for clarity, maintainability, and cognitive load.

Then came the AI era. Suddenly, we started optimizing for "vibe"—getting the AI to generate something that works now, regardless of its structural integrity. This created a rift between code that ships fast and code that lasts.

But here is the secret we've discovered while building AIReady:

The patterns that make a codebase readable for an AI are the same ones that make it manageable for a human.

When you reduce import depth to save AI tokens, you're actually reducing cognitive load for the next developer who has to touch that file. When you eliminate semantic duplicates to prevent AI hallucinations, you’re actually enforcing the "Don't Repeat Yourself" (DRY) principle that makes your system easier to test.

Making code AI-ready isn’t a separate chore. It’s the ultimate forcing function for good engineering.

What's Next for AIReady?

We started this project to help teams measure what traditional tools missed. But measurement is only the first step. Here is what we're building next:

1. Auto-Remediation Plans

Identifying a "Hairball" or an "Orphan" is great, but fixing it is hard. We’re working on AI-powered refactoring agents that can take an AIReady report and generate a step-by-step migration plan—automating the cleanup as fast as the debt was created.

2. The Visual Orchestrator

Our D3-based visualizer is evolving from a static map into a control center. Imagine dragging nodes on the graph to propose architectural changes, and having the AI automatically rewrite the imports and move the files to match the new "shape."

3. Continuous Integration Benchmarking

We’re launching a SaaS tier that tracks your AI-readiness score over time. Every PR will get a "Context Delta"—exactly how many tokens this change adds or removes from your global context budget.

A Vision for the Future

The future of software isn't "No Code." It’s High-Context Code.

The teams that win in the next decade won't be the ones who generate the most lines of code. They will be the ones who maintain the leanest, highest-context repositories. They will be the teams whose codebases are so "human-friendly" (and thus AI-friendly) that their AI assistants can operate with 99% accuracy because they are never confused by fragmentation or duplicates.

At AIReady, our goal is to provide the toolkit for this transition. We believe that by measuring the invisible, we can build systems that are better for the humans who write them and the machines that help us scale.

Join the Journey

The AIReady CLI will always be open-source. We built it in public, and we want you to help us define the next set of metrics.

Run the scan: npx aiready analyze
Visualize your debt: npx aiready visualise
Contribute: Join us on GitHub.

Thank you for following along with this series. The tsunami is here, but together, we can learn to surf it.

Peng Cao is the founder of receiptclaimer and creator of aiready, an open-source suite for measuring and optimising codebases for AI adoption.

Visualizing the Invisible: Seeing the Shape of AI Code Debt

Peng Cao — Sat, 18 Apr 2026 10:43:54 +0000

When we talk about technical debt, we usually talk about lists. A linter report with 450 warnings. A backlog with 32 "refactoring" tickets. A SonarQube dashboard showing 15% duplication.

But for AI-generated code, lists are deceiving. "15 duplicates" sounds manageable—until you realize they are all slight variations of your core authentication logic spread across five different micro-frontends.

Text-based metrics fail to convey structural complexity. They tell you what is wrong, but not where it fits in the bigger picture. In the age of "vibe coding," where code is generated faster than it can be read, we need a new way to understand our systems. We need to see the shape of our debt.

The Solution: Introducing the AIReady Visualizer

To tackle this, we've built the AIReady Visualizer. It's not just another static dependency chart; it’s an interactive, force-directed graph that maps file dependencies and semantic relationships in real-time.

By analyzing import statements and semantic similarity (using vector embeddings), we render your codebase as a living organism. When you see your code as a graph, the "invisible" structural problems of AI code debt suddenly become obvious visual patterns.

The Shape of Debt: 3 Visual Patterns

When we run the visualizer on "vibe-coded" projects, three distinct patterns emerge—each signaling a different kind of risk.

1. The Hairball (Tightly Coupled Modules)

What it looks like: A dense, tangled mess of nodes where everything imports everything else. There are no clear layers or boundaries.

The Problem: This pattern kills AI context windows. When an AI agent tries to modify one file in a "Hairball," it often needs to understand the entire tangle to avoid breaking things. Pulling one file into context pulls the whole graph, leading to token limit exhaustion or hallucinated dependencies.

The Fix: You need to refactor by breaking cycles and enforcing strict module boundaries. The visualizer helps identify the "knot" that holds the hairball together.
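
Finding the "knot" amounts to finding an import cycle. The following is a minimal sketch using depth-first search over a plain adjacency map; it is not how the visualizer itself is implemented, and `Graph` and `findCycle` are hypothetical names.

```typescript
type Graph = Record<string, string[]>;

// Returns one import cycle as a list of files, or null when the graph is acyclic.
function findCycle(graph: Graph): string[] | null {
  const state: Record<string, "visiting" | "done"> = {};
  const stack: string[] = [];

  function visit(file: string): string[] | null {
    if (state[file] === "done") return null;
    if (state[file] === "visiting") {
      // Back edge: the cycle is the stack from the first occurrence of `file`.
      return [...stack.slice(stack.indexOf(file)), file];
    }
    state[file] = "visiting";
    stack.push(file);
    for (const dep of graph[file] ?? []) {
      const found = visit(dep);
      if (found) return found;
    }
    stack.pop();
    state[file] = "done";
    return null;
  }

  for (const file of Object.keys(graph)) {
    const found = visit(file);
    if (found) return found;
  }
  return null;
}

const hairball: Graph = {
  "a.ts": ["b.ts"],
  "b.ts": ["c.ts"],
  "c.ts": ["a.ts"],
  "d.ts": [],
};
const cycle = findCycle(hairball);
```

Here `cycle` is `a.ts → b.ts → c.ts → a.ts`; removing any one of those three imports breaks the loop.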

2. The Orphans (Islands of Dead Code)

What it looks like: Small clusters or individual nodes floating completely separate from the main application graph.

The Problem: These are often fossils of abandoned AI experiments—features that were generated, tested, and forgotten, but never deleted. They bloat the repo size and confuse developers ("What is this legacy-auth-v2 folder doing?"). More dangerously, they can be "hallucinated" back to life if an AI agent mistakenly imports them.

The Fix: If it's not connected to the entry point, delete it. The visualizer makes finding these islands trivial.
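
Orphan detection is a reachability check from the entry point. The sketch below assumes the import graph is already available as an adjacency map; `findOrphans` is a hypothetical helper, and a real check would also need to treat tests, scripts, and dynamically imported files as additional entry points.

```typescript
type Graph = Record<string, string[]>;

// Returns every file that cannot be reached from the entry point via imports.
function findOrphans(graph: Graph, entry: string): string[] {
  const reachable = new Set<string>();
  const pending: string[] = [entry];

  while (pending.length > 0) {
    const file = pending.pop() as string;
    if (reachable.has(file)) continue;
    reachable.add(file);
    for (const dep of graph[file] ?? []) pending.push(dep);
  }

  return Object.keys(graph)
    .filter((file) => !reachable.has(file))
    .sort();
}

const repo: Graph = {
  "src/index.ts": ["src/app.ts"],
  "src/app.ts": ["src/auth.ts"],
  "src/auth.ts": [],
  "src/legacy-auth-v2/index.ts": ["src/legacy-auth-v2/token.ts"],
  "src/legacy-auth-v2/token.ts": [],
};
const orphans = findOrphans(repo, "src/index.ts");
```

Both files under `src/legacy-auth-v2/` are reported, because they import each other but nothing reachable from `src/index.ts` imports them.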

3. The Butterflies (High Fan-In/Fan-Out)

What it looks like: A single node with massive connections radiating out (high fan-out) or pointing in (high fan-in). Often seen in files named utils/index.ts or types/common.ts.

The Problem: These files are bottlenecks and a major source of context bloat.

High Fan-In: Changing this file breaks everything. AI agents struggle to predict the blast radius of changes here.
High Fan-Out: Importing this file brings in a massive tree of unnecessary dependencies, polluting the AI's context window with irrelevant code.

The Fix: Split these "god objects" into smaller, deeper modules.
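
Fan-in and fan-out are straightforward to count once the import graph exists. This is a rough sketch with hypothetical names (`Fan`, `fanCounts`); what counts as "too high" is left to the reader.

```typescript
type Graph = Record<string, string[]>;

interface Fan {
  fanIn: number;  // how many files import this one
  fanOut: number; // how many files this one imports
}

function fanCounts(graph: Graph): Record<string, Fan> {
  const counts: Record<string, Fan> = {};
  const ensure = (file: string): Fan => {
    if (!counts[file]) counts[file] = { fanIn: 0, fanOut: 0 };
    return counts[file];
  };

  for (const file of Object.keys(graph)) {
    ensure(file).fanOut = graph[file].length;
    for (const dep of graph[file]) ensure(dep).fanIn += 1;
  }
  return counts;
}

const butterfly: Graph = {
  "utils/index.ts": ["a.ts", "b.ts", "c.ts"],
  "x.ts": ["utils/index.ts"],
  "y.ts": ["utils/index.ts"],
};
const fans = fanCounts(butterfly);
```

In this example `utils/index.ts` has a fan-in of 2 and a fan-out of 3, so a change to it touches two importers and importing it drags in three more files.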

How It Works

Under the hood, the AIReady Visualizer combines two powerful tools:

@aiready/graph: Our analysis engine that parses TypeScript/JavaScript ASTs to build a precise dependency graph. It creates a weighted network of files based on import strength and semantic similarity.
D3.js: We use D3's force simulation to render this network. Files that are tightly coupled naturally pull together, while unrelated modules drift apart, physically revealing the architecture (or lack thereof).

Use Case: Bridging the "Vibe" Gap

We're seeing a growing divide in engineering teams:

The "Vibe Coders": Junior devs or founders using AI to ship features at breakneck speed. Their focus is output.
The Engineering Managers: Seniors trying to maintain stability and scalability. Their focus is structure.

The visualizer bridges this gap. It's hard to explain abstract architectural principles to a junior dev who just wants to "ship it." It's much easier to show them a giant, tangled "Hairball" and say, "See this knot? This is why your build takes 15 minutes and why the AI keeps getting confused."

Visuals turn abstract "best practices" into concrete, observable reality.

See Your Own Codebase

Don't let your codebase become a black box. You can visualize your own project's shape today.

Run the analysis on your repository:

npx aiready visualise

Stop guessing where the debt is. Start seeing it.

Read the full series:

Part 1: The AI Code Debt Tsunami is Here (And We're Not Ready)
Part 2: Why Your Codebase is Invisible to AI
Part 3: AI Code Quality Metrics That Actually Matter
Part 4: Deep Dive: Semantic Duplicate Detection
Part 5: The Hidden Cost of Import Chains
Part 6: Visualizing the Invisible ← You are here

The Hidden Cost of Import Chains

Peng Cao — Sat, 04 Apr 2026 12:55:43 +0000

You open a seemingly simple file in your codebase:

// src/api/user-profile.ts (52 lines)
import { validateUser } from './validators';
import { formatResponse } from './formatters';
import { logRequest } from './logger';

export async function getUserProfile(userId: string) {
  validateUser(userId);
  const user = await fetchUser(userId);
  logRequest('getUserProfile', userId);
  return formatResponse(user);
}

Looks clean, right? Just 52 lines, three imports, straightforward logic. But when your AI assistant tries to understand this file, here's what actually gets loaded into its context window:

src/api/user-profile.ts           52 lines    1,245 tokens
  └─ validators.ts                 89 lines    2,134 tokens
       └─ validation-rules.ts      156 lines   3,721 tokens
       └─ error-types.ts            41 lines     982 tokens
  └─ formatters.ts                 103 lines   2,456 tokens
       └─ format-utils.ts           78 lines    1,867 tokens
  └─ logger.ts                      67 lines    1,603 tokens
       └─ log-transport.ts          124 lines   2,967 tokens
       └─ log-formatter.ts          91 lines    2,178 tokens

Total: 801 lines, 19,153 tokens

Your 52-line file just became a 19,153-token context load. That's about 15x the 1,245 tokens of the file itself, and your AI assistant has to load all of it to understand one simple function.

This is the hidden cost of import chains—and it's one of the biggest reasons AI struggles with your codebase.
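
The arithmetic behind that total is easy to reproduce. The snippet below simply sums the token counts from the tree above and compares the result with the tokens of the entry file alone.

```typescript
// Token counts copied from the dependency tree above.
const tokensByFile: Record<string, number> = {
  "user-profile.ts": 1245,
  "validators.ts": 2134,
  "validation-rules.ts": 3721,
  "error-types.ts": 982,
  "formatters.ts": 2456,
  "format-utils.ts": 1867,
  "logger.ts": 1603,
  "log-transport.ts": 2967,
  "log-formatter.ts": 2178,
};

const totalTokens = Object.keys(tokensByFile)
  .map((file) => tokensByFile[file])
  .reduce((sum, n) => sum + n, 0);

const ownTokens = tokensByFile["user-profile.ts"];
// How many times larger the loaded context is than the file on its own.
const multiplier = totalTokens / ownTokens;
```

`totalTokens` is 19,153 and `multiplier` is about 15.4, so the three imports account for well over 90% of what the assistant has to read.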

The Context Window Crisis

Every import creates a cascading context cost:

Direct dependencies: Files you import
Transitive dependencies: Files your imports import
Type dependencies: Interfaces and types needed for understanding
Implementation depth: How deep the chain goes

Modern AI models have context windows of 128K-1M tokens. Sounds like a lot, right? But in a real codebase:

Average file: 200-300 lines = 4,800-7,200 tokens
With direct imports: 800-1,200 lines = 19,200-28,800 tokens
With deep chains: 2,000+ lines = 48,000+ tokens
Multiple related files: Context exhaustion

Suddenly that 128K context window doesn't feel so spacious. Add a few related files to analyze a feature, and your AI is already hitting limits—or worse, truncating critical context.
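
A back-of-the-envelope calculation shows how quickly the window fills. The helper below is a simplification: it ignores the system prompt, conversation history, and the space reserved for the model's response, all of which shrink the usable window further.

```typescript
// How many whole context loads of a given size fit into a context window.
function loadsThatFit(windowTokens: number, tokensPerLoad: number): number {
  return Math.floor(windowTokens / tokensPerLoad);
}

const windowTokens = 128000;
// Low end of the "with direct imports" range above.
const withDirectImports = loadsThatFit(windowTokens, 19200);
// Low end of the "with deep chains" range above.
const withDeepChains = loadsThatFit(windowTokens, 48000);
```

With the figures above, a 128K window holds six files with their direct imports, or only two files with deep chains.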

Real-World Impact: The receiptclaimer Analysis

When I ran @aiready/context-analyzer on receiptclaimer's codebase, I discovered patterns that shocked me:

Before Refactoring:

Average context budget per file: 12,450 tokens
Maximum depth: 7 levels
Fragmented domains: 4 (User, Receipt, Auth, Payment)
Low cohesion files: 23 (43% of analyzed files)

Top offenders:
- src/api/receipt-processor.ts: 47,821 tokens (cascade depth: 7)
- src/services/user-service.ts: 38,945 tokens (cascade depth: 6)
- src/api/payment-handler.ts: 35,102 tokens (cascade depth: 6)

After Refactoring:

Average context budget per file: 4,780 tokens (-62%)
Maximum depth: 4 levels
Fragmented domains: 2 (consolidated User+Auth, Receipt+Payment)
Low cohesion files: 5 (9% of analyzed files)

Top files (now optimized):
- src/api/receipt-processor.ts: 8,234 tokens (depth: 3)
- src/services/user-service.ts: 6,891 tokens (depth: 3)
- src/api/payment-handler.ts: 7,445 tokens (depth: 4)

Impact on AI Performance:

Response time: Avg 8.2s → 3.1s (62% faster)
Context truncation errors: 34 → 2 (94% reduction)
Suggestion quality: Subjectively much better; the AI now references the correct patterns
Developer satisfaction: "AI finally gets what I'm trying to do"

The Four Dimensions of Context Cost

@aiready/context-analyzer measures four key metrics:

1. Import Depth (Cascade Levels)

How many layers deep your dependencies go:

// Depth 0: No imports
export function add(a: number, b: number) {
  return a + b;
}

// Depth 1: Direct imports only
import { add } from './math';
export function calculate(x: number) {
  return add(x, 10);
}

// Depth 3+: Deep chain (EXPENSIVE)
import { processUser } from './user-processor'; // imports 5 files
// └─ which imports './validators'             // imports 3 files
//     └─ which imports './validation-rules'   // imports 2 files

Rule of thumb:

Depth 0-2: ✅ Excellent (< 5,000 tokens)
Depth 3-4: ⚠️ Acceptable (5,000-15,000 tokens)
Depth 5+: ❌ Expensive (15,000+ tokens)
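
Import depth can be computed as the longest chain of imports below a file. The sketch below is one possible way to do it over an adjacency map, with the rule of thumb encoded as a rating function; `importDepth` and `rateDepth` are hypothetical names, not the analyzer's API.

```typescript
type Graph = Record<string, string[]>;

// Longest import chain below `file`; a file with no imports has depth 0.
function importDepth(
  graph: Graph,
  file: string,
  onPath: Set<string> = new Set<string>()
): number {
  if (onPath.has(file)) return 0; // ignore cycles instead of looping forever
  onPath.add(file);
  let deepest = 0;
  for (const dep of graph[file] ?? []) {
    deepest = Math.max(deepest, 1 + importDepth(graph, dep, onPath));
  }
  onPath.delete(file);
  return deepest;
}

// Encodes the rule of thumb above.
function rateDepth(depth: number): "excellent" | "acceptable" | "expensive" {
  if (depth <= 2) return "excellent";
  if (depth <= 4) return "acceptable";
  return "expensive";
}

const chain: Graph = {
  "api.ts": ["user-processor.ts"],
  "user-processor.ts": ["validators.ts"],
  "validators.ts": ["validation-rules.ts"],
  "validation-rules.ts": [],
};
const depth = importDepth(chain, "api.ts");
const rating = rateDepth(depth);
```

For this chain `depth` is 3, which rates as "acceptable". The function revisits shared dependencies, so a real implementation would memoize results on large graphs.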

2. Context Budget (Total Tokens)

The total number of tokens AI needs to understand your file:

// Small budget (< 3,000 tokens)
// File: 120 lines, 1 import, shallow dependency
import { API_URL } from './config';
export function fetchUser(id: string) {
  return fetch(`${API_URL}/users/${id}`);
}

// Large budget (> 20,000 tokens)
// File: 200 lines, 8 imports, deep dependencies
import { validateInput } from './validators'; // +4,500 tokens
import { transformData } from './transformers'; // +6,200 tokens
import { enrichUser } from './enrichment'; // +8,100 tokens
import { formatResponse } from './formatters'; // +3,800 tokens
// ... more imports ...

Target zones:

< 5,000 tokens: ✅ AI-friendly
5,000-15,000 tokens: ⚠️ Monitor
15,000+ tokens: ❌ Refactor needed

3. Domain Fragmentation

How scattered your related logic is across files:

// FRAGMENTED (user logic in 8 files)
src/api/user-login.ts           // Authentication
src/api/user-profile.ts         // Profile management
src/services/user-validator.ts  // Validation
src/utils/user-formatter.ts     // Formatting
src/models/user-types.ts        // Types
src/db/user-repository.ts       // Data access
src/middleware/user-auth.ts     // Auth middleware
src/helpers/user-utils.ts       // Utilities

// CONSOLIDATED (user logic in 3 files)
src/domain/user/
  ├─ user.service.ts            // Core business logic
  ├─ user.repository.ts         // Data access
  └─ user.types.ts              // Types and interfaces

Why fragmentation matters:

When AI tries to understand user-related features, it must:

Load 8 separate files (fragmented) vs 3 files (consolidated)
Parse 3,200+ lines vs 800 lines
Navigate 24+ imports vs 6 imports
Build mental model across scattered context vs cohesive modules

4. Cohesion Score

How well a file focuses on one responsibility:

// LOW COHESION (mixed concerns)
// user-service.ts
export class UserService {
  validateEmail() {
    /* validation logic */
  }
  sendEmail() {
    /* email sending logic */
  }
  formatUserName() {
    /* formatting logic */
  }
  logUserAction() {
    /* logging logic */
  }
  encryptPassword() {
    /* crypto logic */
  }
  renderUserProfile() {
    /* rendering logic */
  }
}

// HIGH COHESION (single responsibility)
// user-service.ts
export class UserService {
  createUser() {
    /* user creation */
  }
  updateUser() {
    /* user updates */
  }
  deleteUser() {
    /* user deletion */
  }
  getUserById() {
    /* user retrieval */
  }
}

Cohesion calculation:

The tool analyzes:

Method names and their similarity
Import types (business logic vs utilities vs external)
File path and naming conventions
Return types and parameter types

Scores:

80-100%: ✅ Highly cohesive (focused responsibility)
60-79%: ⚠️ Moderate cohesion (some mixing)
< 60%: ❌ Low cohesion (refactor into separate modules)

Technical Deep Dive: How Context-Analyzer Works

Step 1: Build Dependency Graph

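// Pseudo-code of the analysis (parseFile, extractImports, and resolveImport
// stand in for the parser and module resolver)
const MAX_DEPTH = 10; // assumed traversal cap; the actual value is not stated

function analyzeDependencies(entryFile: string) {
  const graph = new DependencyGraph();
  const visited = new Set<string>(); // avoids re-walking shared or cyclic imports

  function traverse(file: string, depth: number = 0) {
    if (visited.has(file)) return;
    visited.add(file);

    const ast = parseFile(file);
    const imports = extractImports(ast);

    for (const imp of imports) {
      const resolvedPath = resolveImport(imp, file);
      graph.addEdge(file, resolvedPath, depth + 1);

      if (depth < MAX_DEPTH) {
        traverse(resolvedPath, depth + 1);
      }
    }
  }

  traverse(entryFile);
  return graph;
}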
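
For readers who want to experiment without a full parser, import extraction can be roughly approximated with a regular expression. This is a deliberately simplified stand-in, not the AST-based approach the analyzer uses: `extractImportSpecifiers` is a hypothetical helper that only recognizes static `import` statements and misses `require()` and dynamic `import()`.

```typescript
// Regex-based stand-in for AST parsing: finds static `import ... from '...'`
// and side-effect `import '...'` specifiers at the start of a line.
function extractImportSpecifiers(source: string): string[] {
  const pattern = /^\s*import\s+(?:[^'"]+\s+from\s+)?['"]([^'"]+)['"]/gm;
  const specifiers: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    specifiers.push(match[1]);
  }
  return specifiers;
}

const sample = [
  "import { validateUser } from './validators';",
  "import { formatResponse } from './formatters';",
  "import './polyfills';",
  "export const answer = 42;",
].join("\n");

const found = extractImportSpecifiers(sample);
```

`found` contains `./validators`, `./formatters`, and `./polyfills`. An AST is still the safer choice, since a regex can be fooled by import-like text inside strings or block comments.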

Step 2: Calculate Token Costs

function calculateContextBudget(file: string, graph: DependencyGraph) {
  let totalTokens = 0;
  const visited = new Set();

  function countTokens(currentFile: string) {
    if (visited.has(currentFile)) return;
    visited.add(currentFile);

    const content = readFile(currentFile);
    const tokens = estimateTokens(content); // ~24 tokens per 100 chars
    totalTokens += tokens;

    // Recursively count dependencies
    for (const dep of graph.getDependencies(currentFile)) {
      countTokens(dep);
    }
  }

  countTokens(file);
  return totalTokens;
}
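
The `estimateTokens` helper is not shown above. A minimal version consistent with the "~24 tokens per 100 chars" comment might look like the sketch below, paired with the target zones from earlier. Both functions are assumptions for illustration; a real tokenizer will give different counts, and the source does not say which zone a file of exactly 15,000 tokens falls into.

```typescript
// Character-based estimate: roughly 24 tokens per 100 characters.
function estimateTokens(content: string): number {
  return Math.ceil((content.length * 24) / 100);
}

// Target zones from the "Context Budget" section; 15,000 is treated as "refactor" here.
function budgetZone(tokens: number): "ai-friendly" | "monitor" | "refactor" {
  if (tokens < 5000) return "ai-friendly";
  if (tokens < 15000) return "monitor";
  return "refactor";
}

const estimate = estimateTokens("x".repeat(1000));
const zone = budgetZone(19153);
```

A 1,000-character file estimates to 240 tokens, and the 19,153-token example from the introduction lands in the "refactor" zone.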

Step 3: Detect Fragmentation

function detectFragmentation(files: string[]) {
  const domains = new Map<string, string[]>();

  for (const file of files) {
    const domain = extractDomain(file); // e.g., "user", "receipt"
    const group = domains.get(domain) ?? [];
    group.push(file);
    domains.set(domain, group);
  }

  // Flag domains split across many files
  return [...domains.entries()]
    .filter(([, domainFiles]) => domainFiles.length > 5)
    .map(([domain, domainFiles]) => ({
      domain,
      fileCount: domainFiles.length,
      fragmentationScore: calculateFragmentation(domainFiles),
    }));
}
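
The two helpers used above, `extractDomain` and `calculateFragmentation`, are not defined in the snippet. One plausible, purely illustrative way to fill them in is to take the domain from the file name prefix and to score fragmentation as the ratio of distinct directories to files. Both heuristics are assumptions, not the analyzer's actual formulas.

```typescript
// Assumed heuristic: the domain is the file name up to the first "-" or ".".
function extractDomain(file: string): string {
  const base = file.split("/").pop() ?? file;
  return base.split(/[-.]/)[0];
}

// Assumed heuristic: distinct directories divided by file count (0..1).
function calculateFragmentation(files: string[]): number {
  if (files.length === 0) return 0;
  const dirs = new Set<string>(
    files.map((f) => {
      const slash = f.lastIndexOf("/");
      return slash === -1 ? "" : f.slice(0, slash);
    })
  );
  return dirs.size / files.length;
}

const fragmented = [
  "src/api/user-login.ts",
  "src/api/user-profile.ts",
  "src/services/user-validator.ts",
  "src/utils/user-formatter.ts",
  "src/models/user-types.ts",
  "src/db/user-repository.ts",
  "src/middleware/user-auth.ts",
  "src/helpers/user-utils.ts",
];
const consolidated = [
  "src/domain/user/user.service.ts",
  "src/domain/user/user.repository.ts",
  "src/domain/user/user.types.ts",
];

const fragmentedScore = calculateFragmentation(fragmented);
const consolidatedScore = calculateFragmentation(consolidated);
```

With this heuristic the fragmented layout from the earlier example scores 0.875 (seven directories for eight files), while the consolidated layout scores about 0.33.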

Step 4: Measure Cohesion

function analyzeCohesion(file: string) {
  const ast = parseFile(file);
  const exports = extractExports(ast);
  const imports = extractImports(ast);

  // Analyze semantic similarity of exports
  const similarities = [];
  for (let i = 0; i < exports.length - 1; i++) {
    for (let j = i + 1; j < exports.length; j++) {
      const sim = calculateSimilarity(exports[i], exports[j]);
      similarities.push(sim);
    }
  }

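  // High average similarity = high cohesion.
  // With fewer than two exports there are no pairs to compare; treat the file
  // as fully cohesive (an assumption) instead of dividing by zero.
  const avgSimilarity =
    similarities.length === 0
      ? 1
      : similarities.reduce((a, b) => a + b, 0) / similarities.length;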

  // Penalty for mixed import types
  const importTypes = categorizeImports(imports);
  const mixedPenalty = Object.keys(importTypes).length > 3 ? 0.2 : 0;

  return Math.max(0, avgSimilarity - mixedPenalty);
}
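
`calculateSimilarity` is left undefined above. As a rough illustration, name similarity can be approximated with Jaccard overlap between the words in two export names. This is an assumed stand-in: the tool is described as using richer signals (import types, paths, parameter and return types), and a name-only comparison will underrate cohesive classes whose methods share few words.

```typescript
// Splits "getUserById" into ["get", "user", "by", "id"].
function splitWords(name: string): string[] {
  return name
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0);
}

// Jaccard similarity: shared words divided by all distinct words (0..1).
function calculateSimilarity(a: string, b: string): number {
  const wordsA = new Set<string>(splitWords(a));
  const wordsB = new Set<string>(splitWords(b));
  const shared = Array.from(wordsA).filter((word) => wordsB.has(word)).length;
  const union = wordsA.size + wordsB.size - shared;
  return union === 0 ? 0 : shared / union;
}

const related = calculateSimilarity("createUser", "deleteUser");
const unrelated = calculateSimilarity("validateEmail", "renderUserProfile");
```

`createUser` and `deleteUser` share one of three distinct words (about 0.33), while `validateEmail` and `renderUserProfile` share none (0).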

Example: Refactoring receiptclaimer's Receipt Processing

Let me show you a real refactoring that reduced context budget by 82%.

Before: Deep Import Chain (47,821 tokens)

// src/api/receipt-processor.ts
import { validateReceipt } from '../validators/receipt-validator';
import { parseReceiptImage } from '../services/ocr-service';
import { extractLineItems } from '../parsers/line-item-parser';
import { calculateTotals } from '../calculators/total-calculator';
import { enrichMerchantData } from '../enrichment/merchant-enricher';
import { formatReceiptResponse } from '../formatters/receipt-formatter';
import { logProcessing } from '../logging/process-logger';
import { notifyUser } from '../notifications/user-notifier';

export async function processReceipt(imageUrl: string, userId: string) {
  logProcessing('start', userId);

  const validation = validateReceipt(imageUrl);
  if (!validation.valid) throw new Error('Invalid receipt');

  const ocrResult = await parseReceiptImage(imageUrl);
  const lineItems = extractLineItems(ocrResult);
  const totals = calculateTotals(lineItems);
  const enriched = await enrichMerchantData(ocrResult.merchant, lineItems);

  await notifyUser(userId, 'Receipt processed');

  return formatReceiptResponse({ lineItems, totals, merchant: enriched });
}

Dependency tree:

receipt-processor.ts (180 lines, 4,302 tokens)
  ├─ receipt-validator.ts (94 lines, 2,247 tokens)
  │   ├─ validation-rules.ts (156 lines, 3,721 tokens)
  │   └─ error-types.ts (41 lines, 982 tokens)
  ├─ ocr-service.ts (203 lines, 4,847 tokens)
  │   ├─ image-preprocessor.ts (145 lines, 3,461 tokens)
  │   ├─ ocr-client.ts (89 lines, 2,125 tokens)
  │   └─ text-extractor.ts (178 lines, 4,249 tokens)
  ├─ line-item-parser.ts (167 lines, 3,987 tokens)
  ├─ total-calculator.ts (78 lines, 1,862 tokens)
  ├─ merchant-enricher.ts (134 lines, 3,201 tokens)
  │   └─ merchant-api-client.ts (98 lines, 2,340 tokens)
  ├─ receipt-formatter.ts (103 lines, 2,458 tokens)
  ├─ process-logger.ts (67 lines, 1,601 tokens)
  │   └─ log-transport.ts (124 lines, 2,967 tokens)
  └─ user-notifier.ts (89 lines, 2,125 tokens)
      └─ notification-service.ts (156 lines, 3,724 tokens)

Total: 1,902 lines, 47,821 tokens
Depth: 7 levels

After: Consolidated Module (8,234 tokens)

// src/domain/receipt/receipt.service.ts
import { ReceiptRepository } from './receipt.repository';
import { OCRProvider, OCRResult } from './ocr.provider'; // OCRResult is assumed to be exported here
import { ReceiptTypes } from './receipt.types';

export class ReceiptService {
  constructor(
    private repository: ReceiptRepository,
    private ocrProvider: OCRProvider
  ) {}

  async processReceipt(
    imageUrl: string,
    userId: string
  ): Promise<ReceiptTypes.ProcessedReceipt> {
    // Validation (inline, simple)
    if (!this.isValidImageUrl(imageUrl)) {
      throw new ReceiptTypes.ValidationError('Invalid image URL');
    }

    // OCR processing (delegated to focused provider)
    const ocrResult = await this.ocrProvider.parseImage(imageUrl);

    // Business logic (co-located)
    const receipt = this.buildReceipt(ocrResult);
    const lineItems = this.parseLineItems(ocrResult.text);
    const totals = this.calculateTotals(lineItems);

    // Enrichment (co-located)
    const merchant = await this.enrichMerchant(ocrResult.merchantName);

    // Persistence
    const saved = await this.repository.save({
      ...receipt,
      lineItems,
      totals,
      merchant,
      userId,
    });

    return saved;
  }

  private isValidImageUrl(url: string): boolean {
    return url.startsWith('http') && /\.(jpg|jpeg|png|pdf)$/i.test(url);
  }

  private parseLineItems(text: string): ReceiptTypes.LineItem[] {
    // Inline parsing logic (previously in separate file)
    // ~30 lines of focused parsing
  }

  private calculateTotals(items: ReceiptTypes.LineItem[]): ReceiptTypes.Totals {
    // Inline calculation (previously in separate file)
    // ~15 lines of calculation
  }

  private async enrichMerchant(name: string): Promise<ReceiptTypes.Merchant> {
    // Inline enrichment (previously in separate file + client)
    // ~20 lines of enrichment logic
  }

  private buildReceipt(ocrResult: OCRResult): Partial<ReceiptTypes.Receipt> {
    // Mapping logic
  }
}

New dependency tree:

receipt.service.ts (245 lines, 5,856 tokens)
  ├─ receipt.repository.ts (87 lines, 2,078 tokens)
  ├─ ocr.provider.ts (45 lines, 1,072 tokens) [thin wrapper]
  └─ receipt.types.ts (38 lines, 908 tokens)

Total: 415 lines, 8,234 tokens
Depth: 3 levels
Reduction: 47,821 → 8,234 tokens (82.8% decrease)

What Changed?

1. Consolidated scattered logic:

8 separate files → 1 service file
Related functions co-located
Clear domain boundary

2. Inlined simple utilities:

validateReceipt: 94 lines → 3 lines (simple inline check)
calculateTotals: 78 lines → 15 lines (removed abstraction overhead)
parseLineItems: 167 lines → 30 lines (removed generic parsers)

3. Removed unnecessary abstractions:

Separate formatter → methods on service
Separate logger → focused logging where needed
Notification → moved to message queue trigger

4. Created thin wrappers:

OCR client: Fat client (203 lines) → thin provider (45 lines)
Repository: Focused data access only

Migration Strategy: How to Refactor Without Breaking Everything

Refactoring deep import chains is scary. Here's how to do it safely:

Step 1: Measure Current State

# Generate baseline report
npx @aiready/context-analyzer ./src --output baseline.json

# Identify top offenders
npx @aiready/context-analyzer ./src --sort-by budget --limit 10

Step 2: Prioritize Refactoring

Focus on:

High-traffic files: API handlers, services, core business logic
High-budget files: > 15,000 tokens
Deep chains: Depth > 5
Low cohesion: Score < 60%
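
The numeric criteria above can be applied mechanically to a metrics report. The sketch below assumes a hypothetical `FileMetrics` shape (the analyzer's real output format may differ) and uses made-up cohesion values; the "high-traffic" criterion is a judgment call and is not encoded.

```typescript
// Hypothetical report shape; the real analyzer output may differ.
interface FileMetrics {
  file: string;
  budget: number;   // total context tokens
  depth: number;    // cascade depth
  cohesion: number; // 0..1
}

// Selects files that breach any threshold, most expensive first.
function refactorCandidates(metrics: FileMetrics[]): FileMetrics[] {
  return metrics
    .filter((m) => m.budget > 15000 || m.depth > 5 || m.cohesion < 0.6)
    .sort((a, b) => b.budget - a.budget);
}

// Budgets and depths come from the receiptclaimer analysis above;
// the cohesion values and the last entry are illustrative.
const report: FileMetrics[] = [
  { file: "src/api/payment-handler.ts", budget: 35102, depth: 6, cohesion: 0.7 },
  { file: "src/api/receipt-processor.ts", budget: 47821, depth: 7, cohesion: 0.5 },
  { file: "src/config.ts", budget: 1200, depth: 1, cohesion: 0.9 },
  { file: "src/services/user-service.ts", budget: 38945, depth: 6, cohesion: 0.55 },
];

const candidates = refactorCandidates(report).map((m) => m.file);
```

`candidates` lists `receipt-processor.ts`, `user-service.ts`, and `payment-handler.ts` in that order, and leaves `config.ts` out.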

Step 3: Create Domain Boundaries

Before (scattered):
src/
  ├─ api/
  ├─ services/
  ├─ utils/
  ├─ formatters/
  ├─ validators/
  └─ helpers/

After (domain-driven):
src/
  ├─ domain/
  │   ├─ user/
  │   │   ├─ user.service.ts
  │   │   ├─ user.repository.ts
  │   │   └─ user.types.ts
  │   ├─ receipt/
  │   └─ payment/
  └─ infrastructure/
      ├─ api/
      └─ database/

Step 4: Refactor Incrementally

Week 1: Consolidate one domain (e.g., User)
Week 2: Consolidate another domain (e.g., Receipt)
Week 3: Update imports across codebase
Week 4: Remove old files, update tests

Step 5: Verify Improvements

# Generate new report
npx @aiready/context-analyzer ./src --output after.json

# Compare
npx @aiready/cli compare baseline.json after.json

Best Practices

✅ Do:

Co-locate related logic: Keep domain logic together
Inline simple utilities: < 20 lines, used in one place
Use dependency injection: Makes testing easier, reduces coupling
Create thin adapters: For external services, databases
Measure regularly: Track context budget over time

❌ Don't:

Over-abstract: Not everything needs a separate file
Create deep hierarchies: Flat is better than nested
Split prematurely: Extract only when reused 3+ times
Ignore cohesion: Low cohesion = mixed concerns = high context cost
Refactor blindly: Understand dependencies before moving code

Integration with CI/CD

GitHub Actions: Context Budget Check

name: Context Budget Check
on: [pull_request]

jobs:
  context-analysis:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3

      - name: Analyze context budget
        run: npx @aiready/context-analyzer ./src --threshold 15000

      - name: Check for regressions
        run: |
          npx @aiready/context-analyzer ./src --output current.json
          npx @aiready/cli compare baseline.json current.json --fail-on-regression

Pre-commit Hook: Prevent Deep Chains

#!/bin/sh
# .git/hooks/pre-commit

echo "Checking import depth..."
npx @aiready/context-analyzer ./src --max-depth 4 --quiet

if [ $? -ne 0 ]; then
  echo "❌ Import chains too deep. Refactor before committing."
  exit 1
fi

The Bottom Line

Import chains are invisibly expensive. Every import adds context cost that:

Slows down AI responses
Increases token usage (costs money on paid APIs)
Causes context truncation errors
Makes AI suggestions less accurate

But unlike many optimization problems, this one has clear metrics and actionable fixes:

Measure: Run context-analyzer to see your current state
Prioritize: Focus on high-budget, deep-chain, low-cohesion files
Refactor: Consolidate domains, inline utilities, remove unnecessary abstractions
Verify: Measure again, track improvements over time

Try It Yourself

# Analyze your codebase
npx @aiready/context-analyzer ./src

# Check specific file
npx @aiready/context-analyzer ./src/api/handler.ts --detailed

# Find files over budget
npx @aiready/context-analyzer ./src --threshold 15000

# Export report
npx @aiready/context-analyzer ./src --output report.json

# Unified CLI with all metrics
npx @aiready/cli scan --score

Before you refactor:

Measure your current context budget
Identify top offenders (top 10 files by token cost)
Pick one domain to consolidate

After you refactor:

Measure again
Calculate percentage improvement
Share your results!

Resources:

GitHub: github.com/getaiready/aiready-cli
Docs: getaiready.dev
Report issues: github.com/getaiready/aiready-cli/issues

What's your biggest context budget file? Run the analyzer and share your findings—I'd love to see what you discover.

Peng Cao is the founder of receiptclaimer and creator of aiready, an open-source suite for measuring and optimizing codebases for AI adoption.

Deep Dive: Semantic Duplicate Detection with AST Analysis - How AI Keeps Rewriting Your Logic

Peng Cao — Sat, 04 Apr 2026 05:28:31 +0000

You've just asked your AI assistant to add email validation to your new signup form. It writes this:

function validateEmail(email: string): boolean {
  return email.includes('@') && email.includes('.');
}

Simple enough. But here's the problem: this exact logic—checking for '@' and '.'—already exists in four other places in your codebase, just written differently:

// In src/utils/validators.ts
const isValidEmail = (e) => e.indexOf('@') !== -1 && e.indexOf('.') !== -1;

// In src/api/auth.ts
if (user.email.match(/@/) && user.email.match(/\./)) {
  /* ... */
}

// In src/components/EmailForm.tsx
const checkEmail = (val) =>
  val.split('').includes('@') && val.split('').includes('.');

// In src/services/user-service.ts
return email.search('@') >= 0 && email.search(/\./) >= 0; // search() treats a string as a regex, so '.' must be escaped

Your AI didn't see these patterns. Why? Because they look different syntactically, even though they're semantically identical. This is semantic duplication—and it's one of the biggest hidden costs in AI-assisted development.

The Problem: Syntax Blinds AI Models

Traditional duplicate detection tools look for exact or near-exact text matches. They catch copy-paste duplicates, but miss logic that's been rewritten with different:

Variable names (email vs e vs val)
Methods (includes() vs indexOf() vs match() vs search())
Structure (inline vs function vs arrow function)

AI models suffer from the same limitation. When they scan your codebase for context, they see these five implementations as completely unrelated. Each one consumes precious context window tokens, yet provides zero new information.

Real-World Impact: The receiptclaimer Story

When I ran @aiready/pattern-detect on receiptclaimer's codebase, I found 23 semantic duplicate patterns scattered across 47 files. Here's what that looked like:

Before:

23 duplicate patterns (validation, formatting, error handling)
8,450 wasted context tokens
AI suggestions kept reinventing existing logic
Code reviews: "Didn't we already have this somewhere?"

After consolidation:

3 remaining patterns (acceptable, different contexts)
1,200 context tokens (85% reduction)
AI now references existing patterns
Faster code reviews, cleaner suggestions

The math: Each duplicate pattern cost ~367 tokens on average. When AI assistants tried to understand feature areas, they had to load multiple variations of the same logic, quickly exhausting their context window.

How It Works: Jaccard Similarity on AST Tokens

@aiready/pattern-detect uses a technique called Jaccard similarity on Abstract Syntax Tree (AST) tokens to detect semantic duplicates. Let me break that down.

Step 1: Parse to AST

First, we parse your code into an Abstract Syntax Tree: a structural representation that drops surface syntax (formatting, punctuation, comments) and keeps the shape of the logic:

// Original code
function validateEmail(email) {
  return email.includes('@') && email.includes('.');
}

// AST tokens (simplified)
[
  'FunctionDeclaration',
  'Identifier:validateEmail',
  'Identifier:email',
  'ReturnStatement',
  'LogicalExpression:&&',
  'CallExpression:includes',
  'MemberExpression:email',
  'StringLiteral:@',
  'CallExpression:includes',
  'MemberExpression:email',
  'StringLiteral:.',
];

Step 2: Normalize

We normalize these tokens by:

Removing specific identifiers (variable/function names)
Keeping operation types (CallExpression, LogicalExpression)
Preserving structure (nesting, flow control)

// Normalized tokens
[
  'FunctionDeclaration',
  'ReturnStatement',
  'LogicalExpression:&&',
  'CallExpression:includes',
  'StringLiteral',
  'CallExpression:includes',
  'StringLiteral',
];
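The normalization step can be sketched as a filter over the simplified token strings shown above. This assumes tokens are plain `Kind:value` strings as in the example; the actual internals of pattern-detect are not shown in this article, and `normalizeTokens` is a hypothetical name.

```typescript
// Drop name-carrying tokens and strip literal values, keeping operation types.
function normalizeTokens(tokens: string[]): string[] {
  const normalized: string[] = [];
  for (const token of tokens) {
    const separator = token.indexOf(':');
    const kind = separator === -1 ? token : token.slice(0, separator);
    if (kind === 'Identifier' || kind === 'MemberExpression') {
      continue; // variable and function names are not part of the pattern
    }
    if (kind === 'StringLiteral') {
      normalized.push(kind); // keep the fact that a literal exists, not its value
      continue;
    }
    normalized.push(token); // e.g. 'CallExpression:includes', 'LogicalExpression:&&'
  }
  return normalized;
}

const rawTokens = [
  'FunctionDeclaration',
  'Identifier:validateEmail',
  'Identifier:email',
  'ReturnStatement',
  'LogicalExpression:&&',
  'CallExpression:includes',
  'MemberExpression:email',
  'StringLiteral:@',
  'CallExpression:includes',
  'MemberExpression:email',
  'StringLiteral:.',
];

console.log(normalizeTokens(rawTokens));
```

Applied to the `validateEmail` tokens, this yields the same seven normalized tokens listed above.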

Step 3: Calculate Jaccard Similarity

Jaccard similarity measures how similar two sets are:

Jaccard(A, B) = |A ∩ B| / |A ∪ B|

Where:

A ∩ B = tokens in both sets (intersection)
A ∪ B = tokens in either set (union)

Example:

// Pattern A (normalized)
Set A = ['FunctionDeclaration', 'ReturnStatement', 'LogicalExpression:&&',
         'CallExpression:includes', 'StringLiteral']

// Pattern B (normalized)
Set B = ['FunctionDeclaration', 'ReturnStatement', 'LogicalExpression:&&',
         'CallExpression:indexOf', 'StringLiteral']

// Intersection
A ∩ B = ['FunctionDeclaration', 'ReturnStatement', 'LogicalExpression:&&',
         'StringLiteral']
|A ∩ B| = 4

// Union
A ∪ B = ['FunctionDeclaration', 'ReturnStatement', 'LogicalExpression:&&',
         'CallExpression:includes', 'CallExpression:indexOf', 'StringLiteral']
|A ∪ B| = 6

// Jaccard similarity
Jaccard(A, B) = 4 / 6 = 0.67 (67%)

By default, pattern-detect flags patterns with ≥70% similarity as duplicates, so this deliberately tiny pair (67%) lands just under the cutoff. The threshold is meant to catch most semantic duplicates while limiting false positives.
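The formula is short enough to write out. The function below is a minimal set-based sketch, not the tool's actual implementation; `jaccard` and `DEFAULT_THRESHOLD` are names chosen here for illustration.

```typescript
// Jaccard(A, B) = |A ∩ B| / |A ∪ B| over the distinct tokens of each pattern.
function jaccard(a: readonly string[], b: readonly string[]): number {
  const setA = new Set<string>(a);
  const setB = new Set<string>(b);
  if (setA.size === 0 && setB.size === 0) {
    return 1; // convention chosen here: two empty patterns count as identical
  }
  let shared = 0;
  setA.forEach((token) => {
    if (setB.has(token)) shared += 1;
  });
  const unionSize = setA.size + setB.size - shared;
  return shared / unionSize;
}

const DEFAULT_THRESHOLD = 0.7; // the 70% default mentioned above

const patternA = [
  'FunctionDeclaration', 'ReturnStatement', 'LogicalExpression:&&',
  'CallExpression:includes', 'StringLiteral',
];
const patternB = [
  'FunctionDeclaration', 'ReturnStatement', 'LogicalExpression:&&',
  'CallExpression:indexOf', 'StringLiteral',
];

const similarity = jaccard(patternA, patternB);
console.log(similarity.toFixed(2), similarity >= DEFAULT_THRESHOLD);
```

For the two example sets this gives 4 / 6, about 0.67, which is below the 0.70 default. Because sets discard repeats and ordering, longer real-world functions with more shared tokens tend to score higher than toy examples like this one.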

Pattern Classification

The tool automatically classifies patterns into categories:

1. Validators

Logic that checks conditions and returns boolean:

// Pattern: Email validation
function validateEmail(email) {
  return email.includes('@');
}
const isValidEmail = (e) => e.indexOf('@') !== -1;

2. Formatters

Logic that transforms input to output:

// Pattern: Phone number formatting
function formatPhone(num) {
  return num.replace(/\D/g, '');
}
const cleanPhone = (n) =>
  n
    .split('')
    .filter((c) => /\d/.test(c))
    .join('');

3. API Handlers

Request/response processing logic:

// Pattern: Error response handling
function handleError(err) {
  return { status: 500, message: err.message };
}
const errorResponse = (e) => ({ status: 500, message: e.message });

4. Utilities

General helper functions:

// Pattern: Array deduplication
function unique(arr) {
  return [...new Set(arr)];
}
const dedupe = (a) => Array.from(new Set(a));

When to Extract vs When to Tolerate

Not all semantic duplicates should be eliminated. Here's how to decide:

✅ Extract When:

High similarity (>85%): Nearly identical logic, definitely consolidate
Frequent reuse: Used in 3+ places
Core business logic: Validation, formatting, calculations
High maintenance cost: Logic that changes often

⚠️ Consider Context:

Medium similarity (70-85%): Review case-by-case
Different domains: User validation vs product validation might be intentionally separate
Performance critical: Sometimes duplication for optimization is justified

❌ Tolerate When:

Low similarity (<70%): Probably not semantic duplicates
Test code: Tests often duplicate assertions intentionally
Isolated modules: If modules should remain independent
One-off logic: Used once or twice, extraction overhead not worth it
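The guidance above can be read as a small decision function. This is one possible interpretation, offered as a sketch: the `DuplicateGroup` shape and the order of the checks are assumptions, and the softer criteria (domain boundaries, performance, maintenance cost) still need human review.

```typescript
type Recommendation = 'extract' | 'review' | 'tolerate';

// Hypothetical summary of one group of similar snippets.
interface DuplicateGroup {
  similarity: number; // 0..1
  occurrences: number; // how many places the logic appears
  isTestCode: boolean;
}

function recommend(group: DuplicateGroup): Recommendation {
  if (group.isTestCode) return 'tolerate'; // tests often repeat assertions on purpose
  if (group.similarity < 0.7) return 'tolerate'; // probably not a semantic duplicate
  if (group.occurrences < 3) return 'tolerate'; // used once or twice
  if (group.similarity > 0.85) return 'extract'; // nearly identical and reused
  return 'review'; // 70-85%: decide case by case
}

console.log(recommend({ similarity: 0.91, occurrences: 5, isTestCode: false }));
```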

Example: Refactoring receiptclaimer's Validation Logic

Here's a real refactoring from receiptclaimer:

Before: 5 duplicate email validators

// src/api/auth/signup.ts
function validateSignupEmail(email: string) {
  return email.includes('@') && email.length > 5;
}

// src/api/auth/login.ts
const checkLoginEmail = (e: string) => e.indexOf('@') !== -1 && e.length > 5;

// src/services/user-service.ts
function isValidEmail(email: string) {
  return /@/.test(email) && email.length > 5;
}

// src/components/EmailForm.tsx
const validateEmail = (val: string) =>
  val.includes('@') && val.trim().length > 5;

// src/utils/validators.ts
export const emailValid = (email: string) =>
  email.search('@') >= 0 && email.length > 5;

Similarity scores:

signup vs login: 89%
signup vs user-service: 87%
signup vs EmailForm: 85%
signup vs validators: 91%

After: Consolidated to core utility

// src/utils/validators.ts
export function isValidEmail(email: string): boolean {
  return email.includes('@') && email.trim().length > 5;
}

// Usage everywhere else
import { isValidEmail } from '@/utils/validators';

Impact:

5 implementations → 1
~1,850 tokens → ~370 tokens (80% reduction)
AI now finds and reuses the pattern
Single source of truth for email validation

Integration with CI/CD

Make semantic duplicate detection part of your workflow:

GitHub Actions Example

name: Code Quality
on: [pull_request]

jobs:
  semantic-analysis:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3

      - name: Detect semantic duplicates
        run: npx @aiready/pattern-detect ./src --threshold 70

      - name: Comment on PR
        if: failure()
        uses: actions/github-script@v6
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '⚠️ Semantic duplicates detected. Run `npx @aiready/pattern-detect` locally for details.'
            })

Pre-commit Hook

#!/bin/sh
# .git/hooks/pre-commit

echo "Checking for semantic duplicates..."
npx @aiready/pattern-detect ./src --threshold 70 --quiet

if [ $? -ne 0 ]; then
  echo "❌ Semantic duplicates detected. Review and consolidate before committing."
  exit 1
fi

Advanced Configuration

Customize pattern detection for your project:

{
  "pattern-detect": {
    "threshold": 70,
    "minTokens": 10,
    "ignorePatterns": ["**/tests/**", "**/*.test.ts", "**/mocks/**"],
    "categories": {
      "validators": {
        "enabled": true,
        "threshold": 75
      },
      "formatters": {
        "enabled": true,
        "threshold": 70
      }
    }
  }
}

Best Practices

Run regularly: Make it part of your CI/CD, not a one-time audit
Start with high thresholds: Begin at 85%, lower gradually as you understand your codebase
Review context: Don't blindly consolidate—understand why duplicates exist
Educate your team: Share findings in code reviews, explain semantic vs syntactic
Track progress: Measure token reduction over time

The Bottom Line

Semantic duplication is invisible to traditional tools and AI models alike. But it's costing you:

Context window waste: 30-50% of tokens in typical AI-assisted projects
Slower AI responses: Models process redundant logic repeatedly
Inconsistent suggestions: AI doesn't know which pattern to follow
Higher maintenance: Changes must be made in multiple places

@aiready/pattern-detect makes the invisible visible. It shows you where your AI is wasting context, where your patterns diverge, and where consolidation will have the biggest impact.

Try It Yourself

# Analyze your codebase
npx @aiready/pattern-detect ./src

# With custom threshold
npx @aiready/pattern-detect ./src --threshold 80

# Output to JSON
npx @aiready/pattern-detect ./src --output json > report.json

# Unified CLI with all metrics
npx @aiready/cli scan --score

Resources:

GitHub: github.com/getaiready/aiready-cli
Docs: getaiready.dev
Report issues: github.com/getaiready/aiready-cli/issues

Found semantic duplicates in your codebase? Share your before/after numbers in the comments—I'd love to hear your results.

Peng Cao is the founder of receiptclaimer and creator of aiready, an open-source suite for measuring and optimizing codebases for AI adoption.

AI Code Quality Metrics That Actually Matter: The 9 Dimensions of AI-Readiness

Peng Cao — Sat, 04 Apr 2026 05:23:05 +0000

Part 3 of "The AI Code Debt Tsunami" series

For decades, software teams have relied on metrics like cyclomatic complexity, code coverage, and lint warnings to measure code quality. These tools were designed for human reviewers. But as AI-assisted development becomes the norm, these old metrics are no longer enough. AI models don’t “see” code the way humans do. They don’t care about your coverage percentage or how many branches your function has. What matters is how much context they can fit, how consistent your patterns are, and how much semantic duplication lurks beneath the surface.

That’s why we built AIReady: to measure the 9 core dimensions of AI-readiness.

Why Traditional Metrics Fall Short

Traditional tools answer "Is this code maintainable for a human?" AIReady answers "Is this code understandable for an AI?"

An AI's "understanding" is limited by its context window and its ability to predict patterns. When your codebase is fragmented, inconsistent, or full of boilerplate, you are essentially "blinding" the AI, leading to hallucinations, broken suggestions, and subtle bugs.

The 9 Dimensions of AI-Readiness

We've identified 9 critical metrics that determine how well an AI agent can navigate, understand, and modify your codebase.

1. Semantic Duplicates

What it is: Logic that is repeated but written in different ways.
Why it matters: Traditional linters miss logic duplication. AI models get confused when the same logic exists in multiple places, often updating only one and leaving the others as "logic debt."

2. Context Fragmentation

What it is: Analyzes how scattered related logic is across the codebase.
Why it matters: AI has a limited context window. If a single feature is spread across 15 folders, the AI cannot "see" the whole picture at once, leading to incomplete refactors.

3. Naming Consistency

What it is: Measures how consistently variables, functions, and classes are named.
Why it matters: AI predicts code based on patterns. Inconsistent naming (e.g., mixing getUser and fetchAccount) breaks these patterns and reduces suggestion accuracy.

4. Dependency Health

What it is: Measures the stability, security, and freshness of your dependencies.
Why it matters: AI models often suggest outdated or insecure packages if your project is stuck on old versions. A clean dependency graph keeps AI suggestions modern and safe.

5. Change Amplification

What it is: Tracks how many places need to change when a single requirement evolves.
Why it matters: AI struggles with high coupling. If one change requires 10 files to be updated, the AI is significantly more likely to miss a spot or introduce a regression.

6. AI Signal Clarity

What it is: Measures the ratio of "signal" (actual logic) to "noise" (boilerplate, dead code).
Why it matters: Excess boilerplate wastes the AI's context window. More "signal" means the AI can spend its tokens on the logic that actually matters.

7. Documentation Health

What it is: Checks for missing, outdated, or misleading documentation.
Why it matters: AI relies heavily on docstrings to understand intent. Outdated docs lead to "hallucinations" where the AI assumes behavior that no longer exists.

8. Agent Grounding

What it is: Assesses how easily an AI agent can navigate your project structure.
Why it matters: Standard structures allow AI agents to navigate autonomously. Confusing layouts make agents "get lost" during multi-file operations.

9. Testability Index

What it is: Quantifies how easy it is for an AI to write and run tests for your code.
Why it matters: AI-generated tests are the best way to verify AI-generated code. Code that is hard to test is inherently harder for an AI to maintain safely.

How to Start Measuring

AIReady provides a unified CLI to scan your codebase against all 9 dimensions:

npx @aiready/cli scan --score

This command gives you an overall AI Readiness Score (0-100) and a detailed breakdown of where your biggest "AI Debt" lies.

What's Next?

Over the coming weeks, we will be doing a Deep Dive Series into each of these 9 metrics. We'll show real-world examples of how they impact AI productivity and provide concrete refactoring strategies to improve your score.

Stay tuned for Part 4: The Hidden Cost of Semantic Duplicates.

GitHub: github.com/getaiready/aiready-cli
Platform: platform.getaiready.dev
Docs: getaiready.dev/docs

Peng Cao is the creator of aiready, an open-source suite for measuring and optimizing codebases for AI adoption.

Why Your Codebase is Invisible to AI (And What to Do About It)

Peng Cao — Sun, 01 Feb 2026 05:53:12 +0000

Part 2 of 7: The AI Code Debt Tsunami Series

I watched GitHub Copilot suggest the same validation logic three times in one week. Different syntax. Different variable names. Same exact purpose.

The AI wasn’t broken. My codebase was invisible.

Here’s the problem: AI can write code, but it can’t see your patterns. Not the way humans do. When you have the same logic scattered across different files with different names, AI treats each one as unique. It doesn’t know you’ve already solved this problem. So it solves it again. And again.

This isn’t just annoying. It’s expensive.


The Context Window Crisis

Every time your AI assistant helps with code, it needs context. It reads your file, follows imports, understands dependencies. All of this costs tokens. The more fragmented your code, the more tokens you burn.

Let me show you a real example from building ReceiptClaimer.

Example 1: User Validation — The Hard Way

I had user validation logic spread across 8 files:

api/auth/validate-email.ts
api/auth/validate-password.ts
api/users/check-email-exists.ts
api/users/validate-username.ts
lib/validators/email.ts
lib/validators/password-strength.ts
utils/auth/email-format.ts
utils/validation/user-fields.ts

Each file: 80–150 lines. Different patterns. Different error handling. Different import chains.

When AI needed to help with user validation, it had to:

Read the current file (200 tokens)
Follow imports to understand the pattern (3,200 tokens)
Pull in dependencies to match types (5,800 tokens)
Scan similar files to understand conventions (3,250 tokens)

Total context cost: 12,450 tokens per request.

At GPT-4 pricing (~$0.03/1K tokens), that’s $0.37 per code suggestion.

Example 2: User Validation — The Smart Way

After refactoring, I consolidated to 2 files:

lib/user-validation/index.ts - All validation logic
lib/user-validation/types.ts - Shared types

Each file: 200–250 lines. Single pattern. Clear error handling. Minimal imports.

Same AI assistance, new cost:

Read the current file (200 tokens)
Read the validation module (900 tokens)
Read type definitions (1,000 tokens)

Total context cost: 2,100 tokens per request.

That’s an 83% reduction. From $0.37 to $0.06 per suggestion.

If your team makes 50 AI-assisted edits per day, that’s:

Before: $18.50/day = $555/month = $6,660/year
After: $3/day = $90/month = $1,080/year

Savings: $5,580/year. Just from organizing user validation.
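The arithmetic behind these figures can be reproduced in a few lines. This sketch follows the article's own assumptions (about $0.03 per 1K tokens, 50 edits per day, 30-day months) and rounds each request to whole cents first, as the figures above do; the function names are made up for illustration.

```typescript
// Price assumed in the article: roughly $0.03 per 1K tokens.
const PRICE_PER_1K_TOKENS_USD = 0.03;

// Cost of one request in whole cents (rounded before scaling up).
function centsPerRequest(tokens: number): number {
  return Math.round((tokens / 1000) * PRICE_PER_1K_TOKENS_USD * 100);
}

// Yearly cost in dollars, using 30-day months and a 12-month year.
function yearlyCostUsd(tokens: number, editsPerDay: number): number {
  return (centsPerRequest(tokens) * editsPerDay * 30 * 12) / 100;
}

const before = yearlyCostUsd(12450, 50);
const after = yearlyCostUsd(2100, 50);
console.log({ before, after, savings: before - after });
```

With the token counts from the two examples this logs 6660, 1080, and a saving of 5580 dollars per year. Plugging in your own token counts and current model pricing gives a more realistic estimate for your team.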

And that’s one domain. What about error handling? Database queries? API endpoints? File uploads?

Three Ways Your Code Becomes Invisible

1. Semantic Duplicates: Same Logic, Different Disguise

Traditional linters catch copy-paste duplication. They’re useless for semantic duplicates.

Here’s what I mean. Both functions do the exact same thing:

// File: api/receipts/validate.ts
function checkReceiptData(data: any): boolean {
  if (!data.merchant) return false;
  if (!data.amount) return false;
  if (data.amount <= 0) return false;
  if (!data.date) return false;
  return true;
}

// File: lib/validators/receipt-validator.ts
export function isValidReceipt(receipt: ReceiptInput): boolean {
  // Boolean(...) keeps the result a strict boolean instead of the last truthy field
  const hasRequiredFields = Boolean(
    receipt.merchant && receipt.amount && receipt.date
  );
  const hasPositiveAmount = receipt.amount > 0;
  return hasRequiredFields && hasPositiveAmount;
}

ESLint won’t catch this. SonarQube won’t catch this. They look different.

But to an AI reading your codebase? These are two competing patterns. Should it use the imperative style with early returns? Or the declarative style with boolean composition?

It doesn’t know. So it picks randomly. Or invents a third way.

I found 23 of these in ReceiptClaimer before I measured. Receipt validation, user authentication, file upload checks, date parsing, currency formatting.

Each one was a signal to AI: “We don’t have a standard way of doing this.”
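One way to send the opposite signal is to keep a single canonical validator and import it everywhere. The sketch below is illustrative rather than ReceiptClaimer's actual code: the `ReceiptInput` fields are inferred from the two snippets above, since the real type definition is not shown.

```typescript
// Hypothetical input shape, inferred from the fields the snippets check.
interface ReceiptInput {
  merchant?: string;
  amount?: number;
  date?: string;
}

// Single source of truth: required fields present and a positive amount.
function isValidReceipt(receipt: ReceiptInput): boolean {
  const hasRequiredFields = Boolean(receipt.merchant) && Boolean(receipt.date);
  const hasPositiveAmount =
    typeof receipt.amount === 'number' && receipt.amount > 0;
  return hasRequiredFields && hasPositiveAmount;
}

console.log(isValidReceipt({ merchant: 'Cafe', amount: 12.5, date: '2026-01-15' }));
```

Which style wins (early returns or boolean composition) matters less than having exactly one of them for the assistant to find.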

2. Domain Fragmentation: Scattered Logic That Bleeds Tokens

Every time you split a single responsibility across multiple files, you fragment your domain. AI has to load more context. Burn more tokens. Make more mistakes.

Here’s what fragmentation looked like in my codebase:

Receipt Processing (fragmented):

src/
  api/
    receipts/
      upload.ts          # Handles file upload
      extract.ts         # Calls OCR service
      parse.ts           # Parses OCR response
  lib/
    ocr/
      google-vision.ts   # Google Vision integration
      openai-vision.ts   # OpenAI Vision integration
    parsers/
      receipt-parser.ts  # Parsing logic
  services/
    receipt-service.ts   # Business logic
  utils/
    file-upload.ts       # S3 upload helper

8 files. 7 different import paths. To understand receipt processing, AI needs to load all of them.

Receipt Processing (consolidated):

src/
  domains/
    receipt-processing/
      index.ts           # Public API
      ocr-service.ts     # OCR abstraction
      parser.ts          # Parsing logic
      storage.ts         # S3 operations
      types.ts           # Shared types

5 files. Single import path. Clear boundaries. AI can understand the entire domain from one import.

The result: Import depth dropped from 7 levels to 3 levels. Context budget per file dropped 62%.

3. Low Cohesion: Mixed Concerns That Confuse Everyone

This is the “God file” problem, but inverted.

Instead of one file doing everything, you have files that do unrelated things. AI can’t figure out what the file is for.

Example from my early codebase:

// lib/utils/helpers.ts (820 lines)
export function formatCurrency(amount: number): string { ... }
export function parseDate(dateStr: string): Date { ... }
export function uploadToS3(file: Buffer): Promise<string> { ... }
export function validateEmail(email: string): boolean { ... }
export function generateToken(): string { ... }
export function calculateGST(amount: number): number { ... }
export function hashPassword(pwd: string): Promise<string> { ... }

What is this file? Currency formatting? Date parsing? Authentication? File uploads? Tax calculation?

All of them. None of them.

When AI tries to help, it doesn’t know which pattern to follow. The file has no cohesive theme. So AI makes guesses. Often wrong guesses.

After measuring cohesion scores (more on this below), I split this into:


lib/formatting/currency.ts - Currency & GST
lib/formatting/date.ts - Date parsing
lib/auth/tokens.ts - Token & password handling
lib/storage/s3.ts - File uploads
lib/validation/email.ts - Email validation

Cohesion score went from 0.23 (terrible) to 0.89 (excellent).

AI suggestions became relevant. Copilot started importing the right modules. Code reviews got faster because humans could find things too.

How to Measure Invisibility

You can’t fix what you can’t measure. So I built tools to measure these three dimensions.

Measuring Semantic Duplicates

Traditional tools use line-by-line comparison. They fail on semantic duplicates.

I built @aiready/pattern-detect using a different approach:

Parse code into AST (Abstract Syntax Trees)
Extract semantic tokens (variable names → generic placeholders)
Calculate Jaccard similarity (set-based comparison)

Example:

// Function A
function validateUser(user) {
  if (!user.email) return false;
  if (!user.password) return false;
  return true;
}

// Function B
function checkUserValid(data) {
  const hasEmail = !!data.email;
  const hasPassword = !!data.password;
  return hasEmail && hasPassword;
}

After normalization:

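Function A tokens (abbreviated): [if, not, property, return, false, return, true]
Function B tokens (abbreviated): [const, property, return, and]
Jaccard similarity reported by the tool: 0.78 (78% similar)

The token lists above are heavily abbreviated. Taken literally, they share only two distinct tokens out of eight (0.25), so the reported 0.78 presumably comes from a richer normalized token set than is shown here.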

Anything above 0.70? Probably a semantic duplicate worth reviewing.

Tool: npx @aiready/pattern-detect

Measuring Fragmentation

Context budget tells you how many tokens AI needs to understand a file.

I built @aiready/context-analyzer to measure:

Import depth — How many levels deep do imports go?
Context budget — Total tokens needed to understand this file
Cohesion score — Are imports related to each other?
Fragmentation score — Is this domain split across files?
Example output:

src/api/receipts/upload.ts
Import depth: 7 levels
Context budget: 12,450 tokens
Cohesion: 0.34 (low - mixed concerns)
Fragmentation: 0.78 (high - scattered domain)

High fragmentation + low cohesion = AI will struggle.

Tool: npx @aiready/context-analyzer

Measuring Consistency

The third dimension: pattern consistency.

Do you handle errors the same way everywhere? Use the same naming conventions? Follow the same async patterns?

I’m building @aiready/consistency to detect:

Mixed error handling patterns (try-catch vs callbacks vs promises)
Inconsistent naming (camelCase vs snake_case)
Import style drift (ES modules vs require)
Async pattern mixing (async/await vs .then())

Status: Beta release next week.
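To make the naming check concrete, here is a rough sketch of how camelCase versus snake_case drift could be spotted. It is not the @aiready/consistency implementation, which is not described here; the function names and regular expressions are assumptions, and single-word identifiers are treated as ambiguous.

```typescript
type NamingStyle = 'camelCase' | 'snake_case' | 'other';

// Classify one identifier by its casing convention.
function namingStyle(identifier: string): NamingStyle {
  if (/^[a-z][a-z0-9]*(_[a-z0-9]+)+$/.test(identifier)) return 'snake_case';
  if (/^[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$/.test(identifier)) return 'camelCase';
  return 'other'; // single words, PascalCase, constants, and so on
}

// Return the identifiers that follow the less common of the two styles.
function minorityStyleNames(identifiers: string[]): string[] {
  const camel = identifiers.filter((id) => namingStyle(id) === 'camelCase');
  const snake = identifiers.filter((id) => namingStyle(id) === 'snake_case');
  return camel.length >= snake.length ? snake : camel;
}

console.log(minorityStyleNames(['getUser', 'fetch_account', 'parseDate', 'user']));
```

In this sample the two camelCase names outnumber the single snake_case name, so `fetch_account` is reported as the outlier.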

The ReceiptClaimer Results

I ran these tools on my own codebase — ReceiptClaimer, an AI-powered receipt tracker for Australian taxpayers. Here’s what I found:

Before Measurement

Semantic duplicates: 23 patterns repeated 87 times
Average import depth: 5.8 levels
Average context budget: 8,200 tokens per file
Cohesion score: 0.42 (poor)
Monthly AI costs: ~$380 (estimated)

After Refactoring (4 weeks)

Semantic duplicates: 3 patterns repeated 8 times (-87%)
Average import depth: 2.9 levels (-50%)
Average context budget: 2,100 tokens per file (-74%)
Cohesion score: 0.89 (excellent)
Monthly AI costs: ~$95 (estimated)

Time invested: 40 hours over 4 weeks
Annual savings: $3,420 in AI costs
ROI: 12.6 months (probably faster due to velocity gains)

But the real win wasn’t the money.

AI suggestions became useful. Copilot started suggesting the right patterns. Code reviews got faster. New features shipped with fewer bugs. Onboarding new developers became easier.

Making my code visible to AI made it better for humans too.

What You Can Do Today

You don’t need to refactor everything. Start with measurement.

Step 1: Measure Your Semantic Duplicates

npx @aiready/pattern-detect

Look for:

Similarity scores > 70%
Patterns repeated 3+ times
Core domains (auth, validation, API handlers)

Step 2: Measure Your Fragmentation

npx @aiready/context-analyzer

Look for:

Import depth > 5 levels
Context budget > 8,000 tokens
Cohesion score < 0.50
Files with fragmentation > 0.70

Step 3: Pick ONE Domain to Fix

Don’t refactor everything. Pick your most painful domain:

The one where AI suggestions are worst
The one where code reviews take longest
The one where new developers get confused

Focus there. Consolidate files. Extract common patterns. Measure again.

Step 4: Track Improvements

Run the tools weekly. Watch the metrics improve. Share results with your team.

The goal isn’t perfect code. It’s visible code.

Code that AI can understand. Code that humans can maintain. Code that doesn’t waste tokens on fragmentation.

Next in This Series

In Part 3, I’ll dive deep into the technical details: “Building AIReady: Metrics That Actually Matter”

We’ll explore:

Why traditional metrics (cyclomatic complexity, code coverage) miss AI problems
How Jaccard similarity works on AST tokens (with diagrams)
The three dimensions of AI-readiness and how they interact
Design decisions: Why I built a hub-and-spoke architecture
Open source philosophy: Free forever, configurable by design

Until then, run the tools. Measure your codebase. See how invisible it really is.

Try it yourself:

Home page
GitHub

Want to support this work?

⭐ Star the repo
🐛 Report issues you find
💬 Share your results (I read every comment)

Peng Cao is building open source tools for AI-ready development. He’s also the creator of ReceiptClaimer, an AI-powered receipt tracker for Australian taxpayers. Follow along as he builds in public.

Read the series:

Part 1: The AI Code Debt Tsunami is Here
Part 2: Why Your Codebase is Invisible to AI ← You are here
Part 3: Building AIReady: Metrics That Actually Matter (coming next)

The AI Code Debt Tsunami is Here (And We’re Not Ready)

Peng Cao — Thu, 15 Jan 2026 22:50:07 +0000

Part 1 of “The AI Code Debt Tsunami” series

Six months ago, GitHub Copilot helped me write a user validation function in 30 seconds. Yesterday, it wrote the same function again. And again. Five different versions across my codebase, each slightly different, none aware of the others.

This isn’t a bug in the AI. This is the new normal. It follows from two characteristics of LLM coding agents: their output is probabilistic, and each interaction sees only a limited slice of context.

We’re witnessing the fastest productivity boost in software development history. AI coding assistants have made us 5x-10x faster at writing individual functions. But there’s a dark side we’re only beginning to understand: AI-generated code creates tech debt at an unprecedented scale and speed.

Traditional tech debt accumulates slowly: messy code compounds over months into years. AI code debt accumulates exponentially, scaling with the number of team members multiplied by the number of coding agents each one uses. What used to take 18 months to become unmaintainable now happens in 6 weeks.

The tsunami is here. Most teams don’t even see the wave.

The Paradox: Going Fast While Falling Behind

Here’s what I observed while building receiptclaimer, a receipt processing SaaS:

Week 1–2: 🚀 Amazing! We’re shipping features daily. Copilot writes boilerplate, Claude helps with complex logic, ChatGPT generates tests. We’re moving 3x faster than any team I’ve been on.

Week 3–4: 🤔 Hmm. Our AI assistants keep suggesting we create utilities that… already exist? They’re also suggesting 3 different patterns for the same API endpoint. Which one is “right”?

Week 5–6: 😰 Wait. Our codebase has 23 nearly-identical handler functions. Our import chains are 8 levels deep. AI tools are now giving worse suggestions because they can’t fit our context into their windows. We’ve gone from 10x faster to 5x faster, and the multiplier will probably drop even quicker as time passes.

The math: 1 month of 10x productivity = 10 months of traditional work. But we also accumulated what feels like 24 months of tech debt. Net result? We need to shift our focus to keeping that multiplier high.

This is the AI code debt paradox: The faster AI helps you write code, the faster you accumulate debt you can’t see. Eventually, it becomes a codebase that only AI can work in, not humans.

The Four Horsemen of AI Code Debt

After analyzing dozens of AI-assisted projects (including my own), I’ve identified four distinct problems that traditional metrics completely miss:

1. Knowledge Cutoff Gaps (The Outdated Pattern Problem)

AI models have training cutoffs. GPT-4’s knowledge ends in April 2023. Claude’s is a bit later. But your best practices evolved last month.

The result: AI confidently suggests patterns that were deprecated in your codebase months ago. It recommends libraries you’ve already migrated away from. It writes code that technically works but violates architectural decisions made after its training data was collected.

Real example from receiptclaimer:

// AI suggested this in November 2025:
app.get('/api/receipts', async (req, res) => {
  const { userId } = req.query;
  // ... validation logic
});

// But we had already standardized on this pattern:
app.get('/api/receipts', withAuth(async (req, res) => {
  const userId = req.user.id; // From auth middleware
  // ... no validation needed, it's in middleware
}));

AI didn’t know about our withAuth middleware because it was created 3 months after the model’s training cutoff. Result? 18 endpoints using the old pattern, 12 using the new one. All written by AI. All technically correct. All inconsistent.
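
A split like this can be measured with very little code. The block below is a minimal sketch, not part of any tool named in this series: it assumes routes are registered as app.<method>('<path>', handler) and uses a rough regex instead of a real parser. The names countAuthUsage and AuthUsage are made up for illustration.

```typescript
// Counts Express-style route registrations by whether the handler
// is wrapped in withAuth. The regex is a rough heuristic, not a parser.
interface AuthUsage {
  wrapped: number;
  unwrapped: number;
}

function countAuthUsage(source: string): AuthUsage {
  // Captures the first identifier of the handler argument,
  // e.g. "async" for a bare handler or "withAuth" for a wrapped one.
  const routeCall =
    /app\.(?:get|post|put|patch|delete)\(\s*['"][^'"]+['"]\s*,\s*(\w+)?/g;
  const usage: AuthUsage = { wrapped: 0, unwrapped: 0 };
  let match: RegExpExecArray | null;
  while ((match = routeCall.exec(source)) !== null) {
    if (match[1] === "withAuth") usage.wrapped += 1;
    else usage.unwrapped += 1;
  }
  return usage;
}

// Hypothetical input: three route registrations, one using the old pattern.
const sample = [
  "app.get('/api/receipts', async (req, res) => {});",
  "app.get('/api/receipts/:id', withAuth(async (req, res) => {}));",
  "app.post('/api/receipts', withAuth(async (req, res) => {}));",
].join("\n");

const usage = countAuthUsage(sample);
console.log(usage); // { wrapped: 2, unwrapped: 1 }
```

Running a count like this in CI gives a simple signal: if the unwrapped number goes up, an assistant has probably reintroduced the old pattern.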

2. Model Preference Drift (The Team Chaos Problem)

Your frontend dev prefers Cursor. Your backend dev swears by GitHub Copilot. Your junior dev uses ChatGPT. Each AI has different preferences for how to solve problems.

The result: Your codebase becomes a Frankenstein of 3 different “AI styles,” each internally consistent, but totally incompatible with each other.

Real example:

Copilot likes this: const user = await db.users.findById(userId)
Claude prefers: const user = await getUserById(userId) (wrapped in helper)
ChatGPT suggests: const user = await User.findById(userId) (ORM style)

All three work. None are wrong. But when you have all three scattered across 100 files, your AI assistants get confused trying to help with refactoring. Which pattern should they follow?
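
One way to answer that question is to tally which style each file uses and treat the majority as the convention. The sketch below is only an illustration: the three regexes come straight from the example lines above, the file paths are hypothetical, and a real codebase would need broader rules than these.

```typescript
type AccessStyle = "db-direct" | "helper" | "orm";

// One pattern per style, taken from the three example lines above.
const STYLE_PATTERNS: Array<[AccessStyle, RegExp]> = [
  ["db-direct", /\bdb\.users\.findById\(/],
  ["helper", /\bgetUserById\(/],
  ["orm", /\bUser\.findById\(/],
];

// Maps each style to the list of files in which it appears.
function tallyStyles(files: Record<string, string>): Record<AccessStyle, string[]> {
  const tally: Record<AccessStyle, string[]> = { "db-direct": [], helper: [], orm: [] };
  Object.keys(files).forEach((path) => {
    const text = files[path] ?? "";
    STYLE_PATTERNS.forEach(([style, pattern]) => {
      if (pattern.test(text)) tally[style].push(path);
    });
  });
  return tally;
}

// Picks the style used in the most files; on a tie the earlier key wins.
function dominantStyle(tally: Record<AccessStyle, string[]>): AccessStyle {
  const styles = Object.keys(tally) as AccessStyle[];
  return styles.reduce((best, s) => (tally[s].length > tally[best].length ? s : best));
}

// Hypothetical file contents for illustration.
const files: Record<string, string> = {
  "src/api/users.ts": "const user = await db.users.findById(userId)",
  "src/api/orders.ts": "const user = await getUserById(userId)",
  "src/api/receipts.ts": "const user = await getUserById(userId)",
  "src/jobs/sync.ts": "const user = await User.findById(userId)",
};

const tally = tallyStyles(files);
console.log(dominantStyle(tally), tally);
```

The files listed under the minority styles are the refactoring candidates, and the dominant style is the one worth writing into your AI instructions.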

3. Undetected Semantic Duplicates (The Invisible Repetition Problem)

This is the most insidious one. AI generates code that looks different but does the same thing.

Traditional duplicate detection tools (like jscpd) only catch copy-paste duplicates — exact text matches. But AI never copy-pastes. It generates fresh code every time, with different variable names, slightly different logic, but functionally identical.

Real example from receiptclaimer:

// File 1: src/api/receipts.ts
const validateReceipt = (data) => {
  if (!data.amount || data.amount <= 0) return false;
  if (!data.date || new Date(data.date) > new Date()) return false;
  if (!data.merchant || data.merchant.trim().length === 0) return false;
  return true;
}

// File 2: src/services/receipt-validator.ts  
export function isValidReceipt(receipt) {
  const hasAmount = receipt.amount && receipt.amount > 0;
  const hasValidDate = receipt.date && new Date(receipt.date) <= new Date();
  const hasMerchant = receipt.merchant?.trim().length > 0;
  return hasAmount && hasValidDate && hasMerchant;
}

// File 3: src/utils/validation.ts
class ReceiptValidator {
  static validate(r) {
    return r.amount > 0 && 
           new Date(r.date) <= new Date() && 
           r.merchant.trim() !== '';
  }
}

Three different files. Three different names. Three different syntaxes. Same exact logic.

Traditional linters see zero duplication (0% text overlap). But these duplicates waste hundreds of AI tokens and confuse the models. When Copilot sees all three, it doesn’t know which pattern to follow, so it creates a fourth variant.

We found 23 of these in our codebase. That’s 8,450 tokens of wasted context every time AI tries to understand our validation logic.
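
To show how similarity can be scored when the text differs, here is a minimal sketch of Jaccard similarity over normalized token shingles. It is a crude lexical stand-in, not the AST-based approach mentioned elsewhere in this series, and it is not how any particular tool is implemented. The tokenizer, the keyword list, and the shingle size of 3 are all assumptions chosen for brevity.

```typescript
// Keywords are kept as-is; every other identifier collapses to "ID"
// so that renamed variables still produce the same token sequence.
const KEYWORDS = new Set([
  "const", "let", "var", "function", "return", "if", "else",
  "for", "while", "new", "class", "static", "export",
]);

// Crude tokenizer: identifiers, numbers, and single punctuation characters.
function tokenize(code: string): string[] {
  const raw = code.match(/[A-Za-z_$][\w$]*|\d+(?:\.\d+)?|[^\s\w]/g) ?? [];
  return raw.map((t) => {
    if (/^\d/.test(t)) return "NUM";
    if (/^[A-Za-z_$]/.test(t) && !KEYWORDS.has(t)) return "ID";
    return t;
  });
}

// Overlapping runs of k tokens, so that token order affects the score.
function shingles(tokens: string[], k = 3): Set<string> {
  const out = new Set<string>();
  for (let i = 0; i + k <= tokens.length; i++) {
    out.add(tokens.slice(i, i + k).join(" "));
  }
  return out;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((x) => {
    if (b.has(x)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

function similarity(left: string, right: string): number {
  return jaccard(shingles(tokenize(left)), shingles(tokenize(right)));
}

const original = "const hasAmount = receipt.amount && receipt.amount > 0;";
const renamed = "const ok = r.amount && r.amount > 0;";
const unrelated = "for (let i = 0; i < items.length; i++) total += items[i];";

console.log(similarity(original, renamed));   // 1: same structure, different names
console.log(similarity(original, unrelated)); // close to 0
```

A lexical score like this catches renamed copies but not restructured logic, such as an if/return chain versus a single && expression. Closing that gap is where comparing AST tokens earns its keep.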

4. Context Fragmentation (The Token Budget Problem)

AI models have limited context windows. GPT-4 Turbo has 128K tokens. Claude 3.5 has 200K. Sounds like a lot, right?

Wrong.

When your code is fragmented across dozens of files with deep import chains, AI needs to load massive amounts of context just to understand one function.

Real example:

// src/api/users.ts (850 tokens)
import { getUserById } from '../services/user-service'; // +2,100 tokens
import { validateUser } from '../utils/user-validation'; // +1,800 tokens  
import { UserModel } from '../models/user'; // +2,100 tokens
import { logger } from '../lib/logger'; // +450 tokens
import { cache } from '../helpers/cache'; // +900 tokens

export const getUser = async (id) => {
  // 20 lines of actual code
}

To understand this 20-line function, AI needs to load:

The function itself: 850 tokens
All its imports: 7,350 tokens
Their transitive dependencies: ~4,000 more tokens

Total: 12,200 tokens for a 20-line function.
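
The arithmetic above can be reproduced by walking the import graph and counting each module once. This is a sketch under stated assumptions: the token counts for users.ts and its five direct imports come from the example, while the two transitive modules (src/db/client.ts and src/config.ts), their sizes, and the import edges between modules are hypothetical, chosen so the transitive part adds up to the ~4,000 tokens quoted.

```typescript
interface ModuleInfo {
  tokens: number;     // estimated token count of the file itself
  imports: string[];  // paths of modules this file imports
}

// Direct-import sizes are from the example above; db/client and config
// are hypothetical stand-ins for the transitive dependencies.
const modules: Record<string, ModuleInfo> = {
  "src/api/users.ts": {
    tokens: 850,
    imports: [
      "src/services/user-service.ts",
      "src/utils/user-validation.ts",
      "src/models/user.ts",
      "src/lib/logger.ts",
      "src/helpers/cache.ts",
    ],
  },
  "src/services/user-service.ts": { tokens: 2100, imports: ["src/db/client.ts", "src/models/user.ts"] },
  "src/utils/user-validation.ts": { tokens: 1800, imports: [] },
  "src/models/user.ts": { tokens: 2100, imports: ["src/db/client.ts"] },
  "src/lib/logger.ts": { tokens: 450, imports: ["src/config.ts"] },
  "src/helpers/cache.ts": { tokens: 900, imports: ["src/config.ts"] },
  "src/db/client.ts": { tokens: 2500, imports: ["src/config.ts"] },
  "src/config.ts": { tokens: 1500, imports: [] },
};

// Sums the tokens of the entry file and everything reachable from it.
// Shared dependencies are counted once, and cycles cannot loop forever.
function contextCost(entry: string, graph: Record<string, ModuleInfo>): number {
  const seen = new Set<string>();
  const stack: string[] = [entry];
  let total = 0;
  while (stack.length > 0) {
    const path = stack.pop() as string;
    if (seen.has(path)) continue;
    seen.add(path);
    const info = graph[path];
    if (!info) continue; // unknown module (e.g. an external package): not counted
    total += info.tokens;
    info.imports.forEach((dep) => stack.push(dep));
  }
  return total;
}

console.log(contextCost("src/api/users.ts", modules)); // 12200
```

Sorting files by this number gives a rough list of where splitting or flattening imports would save the most context.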

Now multiply this across your entire codebase. We discovered that some of our “simple” user management operations were costing 15,000+ tokens just for AI to understand the context. That’s more than 10% of GPT-4 Turbo’s 128K context window for one feature domain.

The result? AI gives incomplete answers, misses important context, or suggests refactorings that break transitive dependencies it couldn’t fit in its window.

Why Traditional Metrics Miss This Entirely

If you’re running SonarQube, CodeClimate, or similar tools, you probably feel pretty confident about your code quality. You shouldn’t be.

Traditional metrics were designed for human code review, not AI code consumption:

Cyclomatic complexity: Measures branching logic (good for humans debugging). Useless for detecting semantic duplicates.
Code coverage: Measures test coverage (good for reliability). Doesn’t detect context fragmentation.
Duplication detection: Measures text similarity (catches copy-paste). Blind to AI-generated semantic duplicates.
Dependency graphs: Shows imports (good for architecture). Doesn’t measure token cost.

None of these tools answer the questions that matter in an AI-first world:

How much does it cost AI to understand this file?
Are there semantically similar patterns AI keeps recreating?
Is this code organized in a way AI can consume efficiently?
Will AI suggestions be consistent with our existing patterns?

We’re using 2015 metrics for 2025 problems.

The Real Cost (In Numbers You Can Measure)

Let me translate this into business impact, using real numbers from receiptclaimer:

Before AI-readiness optimization:

23 semantic duplicate patterns (undetected by traditional tools)
Average context budget per feature: 12,000 tokens
AI response quality: ~60% useful without additional clarification
Time to onboard new AI patterns: ~2 hours of prompt engineering per feature
Developer frustration: High (AI keeps suggesting “wrong” patterns)

Impact on velocity:

Week 1–4: 10x faster than baseline ✅
Week 5–12: 5x faster than baseline ⚠️
Week 13–20: pretty much baseline ❌
Week 21+: Velocity crisis — considering partial rewrite ❌❌

The hidden cost: We spent 4 months going fast in the wrong direction. The refactoring tax came due, and it was massive.

What Comes Next

Here’s the uncomfortable truth: Every team using AI coding assistants is accumulating this debt right now. The only difference is that some realize it and most don’t.

The good news? This is measurable. Fixable. Preventable.

Over the next few weeks, I’m going to break down:

How to detect the semantic duplicates AI creates (the ones traditional tools miss)
How to measure context costs and fragmentation
How to optimize your codebase so AI tools work with your patterns instead of against them
Real case study of how we refactored receiptclaimer and quantified the results

I built the aiready suite of tools to solve this problem for myself and my team. It’s open source, configurable, and designed for the AI-first development workflow.

@aiready/cli: Unified CLI for running all of the analysis tools below, together or individually
@aiready/pattern-detect: Detect semantic duplicate patterns that waste AI context window tokens
@aiready/context-analyzer: Analyze context window costs, import depth, cohesion, and fragmentation
@aiready/consistency: Check naming conventions and pattern consistency across your codebase

Because here’s what I learned: Making your codebase AI-ready doesn’t just make AI better. It makes your code better for humans too.

Clean, consistent, well-organized code has always been the ideal. AI just makes the cost of not doing it much more immediate and painful.

The tsunami is here. But we can learn to surf!

Next in this series: Part 2 — “Why Your Codebase is Invisible to AI (And What to Do About It)” — We’ll dive deep into semantic duplicates and context fragmentation, with concrete examples and detection strategies.

Have questions or war stories about AI-generated tech debt? Drop them in the comments. I read every one.