Vuong Ngo

AI Keeps Breaking Your Architectural Patterns. Documentation Won't Fix It.

I've been using AI coding assistants across our engineering team for over a year. Working in a data department, we had the latitude to experiment with Claude, Roo-Code, and other in-house agents in our daily workflow.

The pattern emerged slowly. Junior developers were shipping features faster than before, which was great. Code reviews were taking longer, which wasn't. The code worked, tests passed, but something was consistently off: direct database imports in service layers, default exports scattered across a codebase that had standardized on named exports years ago, the repository pattern bypassed in favor of inline SQL.

These weren't bugs. The code ran fine in production. They were architectural drift—the slow erosion of patterns we'd spent years establishing. What made it frustrating was the inconsistency. A junior developer would correctly implement dependency injection in one file, then bypass it completely in the next. Same developer, same day, same codebase. The knowledge was there, but it wasn't being applied consistently.

The obvious answer was "better code review." But that doesn't scale. When you're reviewing 20+ PRs a day across a 50-package monorepo, you can't catch every architectural violation. And the ones you miss compound.

Here's what we figured out: this isn't an AI problem or a developer problem. It's a feedback timing problem.

TL;DR

  • AI-generated code violates architectural patterns because of timing and context, not capability
  • Static documentation creates a validation gap that AI can't bridge
  • Effective architecture enforcement requires runtime feedback loops, not upfront documentation
  • Path-based pattern matching provides file-specific architectural context
  • We built Architect MCP to close the feedback loop at code generation time
  • Results: 80% pattern compliance vs 30-40% with documentation alone

The Real Problem: Temporal and Spatial Context Loss

Let's be precise about what's happening here. AI coding assistants operate with ephemeral context windows. Even with project-specific documentation (CLAUDE.md, system prompts, etc.), there's a fundamental mismatch between when architectural constraints are communicated and when they need to be applied.

Consider a typical session:

  1. Claude reads your architectural guidelines at initialization (t=0)
  2. You discuss requirements, explore the codebase, iterate on design (t=0 to t=20min)
  3. Claude generates code implementing the agreed-upon logic (t=20min)

By step 3, the architectural constraints from step 1 are 20 minutes and dozens of messages removed from the working context. The AI is optimizing for correctness against the immediate requirements, not consistency against architectural patterns defined at session start.

This isn't a memory problem—it's a priority and relevance problem.

What AI Optimizes For

When generating code, LLMs are fundamentally pattern-matching against their training data. Your specific architectural conventions represent a tiny signal compared to the millions of codebases in the training set. Without active feedback, the model defaults to the strongest statistical patterns:

  • Common > Custom: Express.js patterns over your Hono.js conventions
  • Simple > Structured: Direct database calls over repository pattern
  • Familiar > Framework-specific: Default exports because they're ubiquitous in the training data

This is why you see the same violations repeatedly, even with extensive documentation.
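
To make that concrete, here's roughly what the drift looks like side by side (the types and file are hypothetical, purely for illustration): the commented-out version is what the model statistically prefers; the class below it is what our codebase actually standardizes on.

// Hypothetical domain types, defined inline so the example stands alone.
interface User { id: string; email: string }
interface IUserRepository { findById(id: string): Promise<User | null> }

// What the model statistically prefers: default export, direct database
// import, inline SQL in the service layer.
//
//   import { db } from "../db";
//
//   export default class UserService {
//     getUser(id: string) {
//       return db.query("SELECT * FROM users WHERE id = $1", [id]);
//     }
//   }
//
// What the codebase standardized on: named export, repository injected
// through the constructor, no database access in the service layer.
export class UserService {
  constructor(private readonly users: IUserRepository) {}

  getUser(id: string): Promise<User | null> {
    return this.users.findById(id);
  }
}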

Why Documentation Fails (And What That Tells Us)

Our first attempt was documentation. We already had a substantial CLAUDE.md, but we expanded it: detailed sections on dependency injection patterns, repository layer requirements, export conventions, and framework-specific architectural rules. We made it comprehensive, over 3,000 lines.

Junior developers referenced it. AI assistants had access to it. Compliance rate stayed around 40%. The failure modes are instructive:

1. The Relevance Gap

A several-thousand-line document applies to every file equally, which means it applies to no file specifically. A repository needs repository-specific guidance. A React component needs component-specific rules. Serving generic "follow clean architecture" advice to both is essentially noise.

2. The Retrieval Problem

Even with RAG systems, retrieving the right architectural context at code generation time is non-trivial. You need to know what patterns apply before you can retrieve them. If Claude is generating a new file type, there's no obvious query to pull the relevant constraints.

3. The Validation Gap

This is the critical one. Documentation describes correct patterns but provides no mechanism to verify compliance. It's teaching without testing. The feedback loop is broken.

Rethinking the Problem: Feedback Over Front-loading

Here's the architectural insight: you can't front-load all context, but you can close the feedback loop.

Instead of trying to make AI remember everything upfront, we need to provide architectural feedback at two critical moments:

  1. Before code generation: "What patterns apply to this specific file?"
  2. After code generation: "Does this implementation comply with those patterns?"

This shifts from a memory problem to a validation problem. And validation can be automated.

The Feedback Loop Architecture

The system needs three components:

1. Pattern Database
Organized by file path patterns with specific architectural requirements:

src/repositories/**/*.ts → Repository pattern rules
src/services/**/*.ts → Service layer rules
src/components/**/*.tsx → Component architecture rules

2. Pre-generation Context Injection
Before generating code, query the pattern database with the target file path. Inject specific, relevant architectural constraints into the immediate context.

3. Post-generation Validation
After code generation, validate against the same patterns. Use severity ratings to determine action (submit, flag, auto-fix).

The key insight: specificity matters more than comprehensiveness. Better to provide five highly relevant rules for a specific file than 50 generic rules that might apply.
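
To make the shape of the loop concrete, here's a minimal TypeScript sketch. The names and data shapes are mine for illustration, not Architect MCP's actual API:

// Minimal sketch of the feedback loop; names and shapes are illustrative.
type Severity = "LOW" | "MEDIUM" | "HIGH";

interface PatternRule { glob: RegExp; rules: string[] }
interface Review { severity: Severity; violations: string[] }

// 1. Pattern database: path patterns mapped to architectural rules.
//    (Regexes stand in for globs to keep the sketch dependency-free.)
const patternDb: PatternRule[] = [
  { glob: /^src\/repositories\/.*\.ts$/, rules: ["Implement IRepository<T>", "Constructor-injected DB connection"] },
  { glob: /^src\/services\/.*\.ts$/, rules: ["No direct database access", "Go through the repository layer"] },
];

// 2. Pre-generation: look up only the rules that match the target file
//    and inject them into the prompt right before code generation.
function patternsFor(filePath: string): string[] {
  return patternDb.filter((p) => p.glob.test(filePath)).flatMap((p) => p.rules);
}

// 3. Post-generation: validate the generated code against the same rules.
//    `validate` is whatever checker you plug in (an LLM call in our case).
async function reviewChange(
  filePath: string,
  code: string,
  validate: (rules: string[], code: string) => Promise<Review>,
): Promise<Review> {
  return validate(patternsFor(filePath), code);
}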

Implementation: Architect MCP

We implemented this as an MCP (Model Context Protocol) server with two primary tools:

get-file-design-pattern

Provides file-specific architectural context before code generation.

// Input: File path
get-file-design-pattern("src/repositories/userRepository.ts")

// Output: Specific patterns for this file type
{
  "template": "backend/hono-api",
  "patterns": [
    "Implement IRepository<T> interface",
    "Use constructor-injected database connection",
    "Named exports only (export class RepositoryName)",
    "No direct database imports (import from '../db' is violation)"
  ],
  "reference": "src/repositories/baseRepository.ts"
}

This runs before Claude generates code, injecting precise architectural requirements into the active context.
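
For a sense of the plumbing, registering a tool like this with the TypeScript MCP SDK looks roughly like the sketch below. The pattern lookup is reduced to a hardcoded check, and the wiring is an assumption, not the actual Architect MCP source:

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "architect-sketch", version: "0.1.0" });

// Register a tool that returns file-specific architectural patterns.
// The lookup here is a stand-in; the real server resolves patterns from YAML.
server.tool(
  "get-file-design-pattern",
  "Return architectural patterns that apply to a specific file path",
  { filePath: z.string() },
  async ({ filePath }) => {
    const patterns = filePath.includes("/repositories/")
      ? ["Implement IRepository<T>", "Named exports only"]
      : ["Use named exports"];
    return { content: [{ type: "text", text: JSON.stringify({ patterns }, null, 2) }] };
  },
);

// Expose the server over stdio so an MCP client (e.g. Claude) can call it.
await server.connect(new StdioServerTransport());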

review-code-change

Validates generated code against architectural patterns.

// Input: File path and generated code
review-code-change("src/repositories/userRepository.ts", generatedCode)

// Output: Structured validation results
{
  "severity": "LOW" | "MEDIUM" | "HIGH",
  "violations": [...],
  "compliance": "92%",
  "patterns_followed": ["✅ Implements IRepository<User>", ...],
  "recommendations": [...]
}

This runs after code generation, providing structured feedback that can drive automation (auto-submit on LOW, flag on MEDIUM, auto-fix on HIGH).

Path-Based Pattern Matching: The Critical Detail

The pattern database uses path-based matching to provide file-specific guidance. This deserves deeper explanation because it's where the system gains leverage.

Pattern Hierarchy

# Global patterns (apply to all projects)
**/*.ts:
  - No 'any' types without justification
  - Use named exports

# Template patterns (apply to projects using this template)
backend/hono-api:
  src/repositories/**/*.ts:
    - Implement IRepository<T>
    - Use dependency injection

  src/services/**/*.ts:
    - No direct database access
    - Use repository layer

# Project patterns (apply to specific project)
user-management-api:
  src/services/authService.ts:
    - Must use AuthProvider interface
    - Specific to auth domain

The system applies patterns from most general to most specific, with later patterns overriding earlier ones. This provides both consistency (global rules) and flexibility (project-specific exceptions).
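
Here's a simplified sketch of that merge in TypeScript (the data shapes and minimatch-based matching are illustrative; the real tool loads these layers from YAML):

import { minimatch } from "minimatch";

// Illustrative shape: a pattern layer maps path globs to rule lists.
type PatternLayer = Record<string, string[]>;

const globalPatterns: PatternLayer = {
  "**/*.ts": ["No 'any' types without justification", "Use named exports"],
};

const templatePatterns: Record<string, PatternLayer> = {
  "backend/hono-api": {
    "src/repositories/**/*.ts": ["Implement IRepository<T>", "Use dependency injection"],
    "src/services/**/*.ts": ["No direct database access", "Use repository layer"],
  },
};

const projectPatterns: Record<string, PatternLayer> = {
  "user-management-api": {
    "src/services/authService.ts": ["Must use AuthProvider interface"],
  },
};

// Collect rules from most general to most specific. This sketch simply
// appends in order; a real implementation would let more specific layers
// override conflicting rules from earlier layers.
export function resolvePatterns(project: string, template: string, filePath: string): string[] {
  const layers: PatternLayer[] = [
    globalPatterns,
    templatePatterns[template] ?? {},
    projectPatterns[project] ?? {},
  ];
  return layers.flatMap((layer) =>
    Object.entries(layer)
      .filter(([glob]) => minimatch(filePath, glob))
      .flatMap(([, rules]) => rules),
  );
}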

Why This Scales

New projects inherit template patterns automatically. No need to reconfigure architectural rules for every new service—just specify the template in project.json:

{
  "name": "new-api-service",
  "sourceTemplate": "backend/hono-api"
}

The service immediately inherits 50+ architectural patterns specific to Hono.js APIs.

LLM-Powered Validation: Using AI to Check AI

Here's a non-obvious design choice: we use Claude to validate Claude-generated code.

Why? Because architectural compliance isn't mechanical pattern matching. Consider:

Mechanical linter approach:

// Regex: /export\s+default/
// Violation: Uses default export
export default class UserService { }

LLM validation approach:

// Understands context and intent
export default class UserService { }
// Violation: Uses default export when named export required per repository pattern
// Recommendation: Change to 'export class UserService' for consistency with repository pattern established in architect.yaml

The LLM-based validation:

  • Understands architectural intent, not just syntax
  • Provides contextual explanations
  • Can reason about related patterns (if you're violating DI, you're probably also missing interface implementation)
  • Generates actionable recommendations

This is more expensive than static linting, but the cost is justified because it runs only on changed files and provides significantly higher signal.
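
For a rough idea of what the validation call looks like, here's a sketch using the Anthropic TypeScript SDK. The prompt, model choice, and response shape are assumptions, not Architect MCP's actual implementation:

import Anthropic from "@anthropic-ai/sdk";

interface Review {
  severity: "LOW" | "MEDIUM" | "HIGH";
  violations: string[];
  recommendations: string[];
}

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Ask the model to judge the generated code against the file-specific rules
// and answer with structured JSON we can act on programmatically.
export async function validateAgainstPatterns(
  filePath: string,
  code: string,
  rules: string[],
): Promise<Review> {
  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-5", // assumption: pick whatever model tier fits your budget
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content: [
          `You are reviewing ${filePath} for architectural compliance.`,
          `Rules:\n${rules.map((r) => `- ${r}`).join("\n")}`,
          `Code:\n${code}`,
          `Respond with JSON only: {"severity":"LOW|MEDIUM|HIGH","violations":[],"recommendations":[]}`,
        ].join("\n\n"),
      },
    ],
  });

  const text = response.content[0].type === "text" ? response.content[0].text : "{}";
  return JSON.parse(text) as Review;
}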

Layered Validation: Defense in Depth

Architect MCP isn't a replacement for existing validation layers—it's complementary. The full validation stack:

Layer 1: TypeScript Compiler

  • Catches: Type errors, syntax violations
  • Speed: < 1s
  • Coverage: Type safety

Layer 2: Biome/ESLint

  • Catches: Code style, simple rules
  • Speed: < 5s
  • Coverage: Style consistency

Layer 3: Architect MCP

  • Catches: Architectural pattern violations
  • Speed: 5-10s (LLM call)
  • Coverage: Framework-specific architecture

Layer 4: Code Review (Human/AI)

  • Catches: Business logic, complex issues
  • Speed: Minutes to hours
  • Coverage: Domain-specific concerns

Each layer has different trade-offs. TypeScript is fast but can't enforce architectural patterns. Linting handles style but not domain architecture. Architect MCP fills the gap between syntax/style and human review.
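
In practice this ordering can live in a single pre-merge script that runs the cheap layers first and only pays for the LLM layer when they pass. A sketch, with the architecture check left as a placeholder callback since your wiring will differ:

import { spawnSync } from "node:child_process";

// Run a command and report whether it exited cleanly.
function run(cmd: string, args: string[]): boolean {
  return spawnSync(cmd, args, { stdio: "inherit" }).status === 0;
}

export async function preMergeChecks(
  changedFiles: string[],
  architectReview: (file: string) => Promise<{ severity: "LOW" | "MEDIUM" | "HIGH" }>,
): Promise<boolean> {
  if (!run("npx", ["tsc", "--noEmit"])) return false;     // Layer 1: types
  if (!run("npx", ["biome", "check", "."])) return false; // Layer 2: style

  for (const file of changedFiles) {                      // Layer 3: architecture
    const { severity } = await architectReview(file);
    if (severity === "HIGH") return false;
  }
  return true;                                            // Layer 4: humans take it from here
}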

What Actually Changed

After 3 months in production across our 50+ project monorepo with a team of 8 developers:

The obvious improvement: Architectural violations became rare instead of common. Not eliminated—there are still legitimate cases where you need to break a pattern—but the unconscious drift stopped. Junior developers stopped ping-ponging between following patterns correctly and breaking them in the next file.

The unexpected improvement: Code review shifted. We thought we'd just catch violations faster. What actually happened was we stopped spending review cycles on architectural corrections. Comments like "this should use dependency injection" or "use named exports" basically disappeared. Reviews focused on design decisions, edge cases, business logic—things that actually need human judgment.

The subtle improvement: Context-switching overhead decreased. When you're working across multiple projects with different architectural patterns (Next.js app vs Hono API vs TypeScript library), you're constantly reloading mental context. Having the validation layer means you find out immediately when you've applied the wrong pattern to the wrong project, not three reviews later.

What didn't improve: We still see legitimate architectural violations. Sometimes you need to bypass a pattern for a specific reason. The difference is those are now conscious decisions documented in the PR, not unconscious mistakes that slip through review.

What This Reveals About AI-Assisted Development

The broader lesson: AI coding assistants need tight feedback loops, not extensive documentation.

This mirrors how junior developers actually learn a codebase. They don't absorb architectural patterns by reading documentation upfront. They learn by:

  1. Getting specific guidance for the task at hand
  2. Making changes
  3. Getting feedback on what they did wrong
  4. Iterating

When junior developers pair with AI, both need the same learning structure. The difference is speed. Human code review happens in hours or days. Automated feedback happens in seconds. That speed difference is what makes the approach viable.

The unexpected insight: this doesn't just help junior developers. Senior developers using AI make the same architectural mistakes—they just catch them earlier in their own review. Automated validation helps everyone maintain consistency when context-switching between projects with different architectural patterns.

Implementation Notes

If you're considering building something similar, a few non-obvious lessons:

1. Pattern Granularity Matters
Too broad (e.g., "follow clean architecture") and AI can't apply it. Too narrow (e.g., "line 47 must use Promise.all") and you've essentially hardcoded the implementation. The right level is "file-type specific patterns" (repository pattern for repositories, component pattern for components).

2. Severity Ratings Enable Automation
Without severity ratings, you can't automate responses. With them, each review result maps to an action (sketched after this list):

  • LOW → Auto-submit (pattern followed)
  • MEDIUM → Flag for attention (minor violations)
  • HIGH → Block submission (critical violations)
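
A minimal sketch of that mapping (the action names are ours; wire them to whatever your CI or agent loop actually does):

type Severity = "LOW" | "MEDIUM" | "HIGH";
type Action = "auto-submit" | "flag" | "block";

// Map a review severity to the automated response described above.
function actionFor(severity: Severity): Action {
  switch (severity) {
    case "LOW":
      return "auto-submit"; // patterns followed, no human attention needed
    case "MEDIUM":
      return "flag";        // minor violations, surface them in the PR
    case "HIGH":
      return "block";       // critical violations, stop the submission
  }
}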

3. Template Inheritance Is Critical for Scale
Defining patterns per-project doesn't scale past ~10 projects. Template-based inheritance means you define patterns once per framework/architecture, then all projects using that template inherit them.

4. LLM Validation Is Worth the Cost
We initially tried regex-based pattern matching. It caught obvious violations—literal regex matches like export default—but missed anything requiring context. Why is this a default export? Is it actually violating the pattern or is this one of the legitimate exceptions? Regex can't answer that. LLM validation understands intent and context. Yes, it costs money per validation. But the alternative is human code review catching these issues, which is orders of magnitude more expensive in terms of developer time.

Getting Started

Architect MCP is open source: github.com/AgiFlow/aicode-toolkit

The implementation is straightforward—it's an MCP server that reads YAML pattern definitions and uses Claude to validate code against them. The hard part isn't the code—it's defining your architectural patterns clearly enough to encode them. We spent more time debating what our patterns actually were than building the validation system.

If you're building something similar, start with:

  1. Identify your top 5 most-violated architectural patterns
  2. Define them as path-based rules in YAML
  3. Build the pre-generation context injection first (higher ROI than validation)
  4. Add validation once you've proven the concept

Open Questions

We're still figuring out:

1. Pattern Evolution
How do you version architectural patterns? When you update a pattern, do you auto-update all projects or let them opt-in?

2. Cross-File Patterns
Current implementation handles single-file patterns well. Cross-file architectural concerns (e.g., "services should only call repositories, never directly call other services") are harder to encode and validate.

3. Performance at Scale
LLM-based validation works well at our scale (50 projects, ~10 changes/day). What happens at 500 projects or 1000 changes/day? Do you need caching, batching, or a hybrid approach?

If you've solved these problems, I'd love to hear about it.


If you're dealing with similar problems—AI generating code that works but breaks your architectural patterns—I'd be curious to hear how you're handling it. Drop a comment or reach out.

