Shinsuke KAGAWA

Zero Context Exhaustion: Building Production-Ready AI Coding Teams with Claude Code Sub-agents

The Problem You're Facing Right Now

Ever had Claude Code stop mid-implementation with "context window exceeded"? Started a complex refactoring only to watch it fail at 200K tokens? Or worse—gotten code that completely missed your requirements because the AI lost track of the original context?

I hit these walls constantly. My breaking point came when a critical MCP server implementation ground to a halt, leaving me with half-finished code and no way forward except starting over.

The Solution That Actually Works

This boilerplate transforms those failures into success. In our proof-of-concept, sub-agents-mcp, the same implementation that died at 200K tokens was completed in just 2 days: 770K tokens processed, zero context exhaustion, and 236 tests passing.

Here's how: Instead of one overloaded AI agent, you get a team of specialized agents, each with fresh context for their specific task. The orchestrator manages them, the meta-cognition layer keeps them on track, and you get production-ready code every time.

Try It First (Takes 5 Minutes)

# Create project (30 seconds)
npx github:shinpr/ai-coding-project-boilerplate my-project
cd my-project && npm install

# Start Claude Code
claude

# Begin auto-implementation
/implement Add a help feature

Requirements analysis, design documentation, implementation, quality checks, and commits all run automatically.

Who benefits:

  • Solo developers: Small apps auto-generated with quality assurance built-in
  • Teams: Drastically reduced Pull Request review costs
  • OSS maintainers: Consistent rules applied to contributor code

What Actually Happens When You Run It

Here's what the /implement command orchestrates behind the scenes:

/implement Create help function with search capability

# Phase 1: Requirements Analysis (30s)
→ requirement-analyzer determines scale: Medium (4-5 files)
→ Asks clarifying questions about format, search type, interface

# Phase 2: Design Documentation (3-5 min)
→ technical-designer creates Design Doc with architecture
→ document-reviewer validates consistency and completeness
→ Iterative refinement until approval (typically 1-2 cycles)

# Phase 3: Work Planning (2 min)
→ work-planner creates 16 tasks across 4 phases
→ User approval required for autonomous execution

# Phase 4: Autonomous Implementation (10-20 min per task × 16 tasks)
→ task-decomposer breaks into atomic commits
→ task-executor implements with TDD (Red-Green-Refactor)
→ quality-fixer ensures lint/type/test compliance
→ Auto-commits after each quality-assured task

# Result: Production-ready code with:
✓ Full type safety (zero any types)
✓ 100% test coverage for new code
✓ Validated against acceptance criteria
✓ Ready for deployment

This entire process runs with minimal human intervention after initial requirements clarification, demonstrating how specialized agents collaborate to deliver quality code.

Background: My Journey with AI Coding

Like many of you, I've experienced Claude Code producing unexpected implementations. As an Engineering Manager who enjoys systematizing processes, I practiced "Agentic Coding" using Claude Code's Sub-agents—deliberately allowing failures and feeding them back into the system. Watching implementations fail initially felt uncomfortable, but it became a valuable learning process.

I've published the results as AI Coding Project Boilerplate. This article introduces practical methods you can use in real projects through this boilerplate.

The Core Problem: Why Traditional AI Coding Fails

When you delegate everything to a single AI agent, you encounter:

  • Context becomes too long and execution stops midway
  • Implementations deviate from your intent
  • Upon finding errors, it impulsively starts fixing, breaking overall consistency
  • Attempts large-scale changes at once and loses control

The Solution: Agentic Coding

Agentic Coding shifts from "conversing with one AI agent" to "collaborating with a team of specialized AI agents."

[Diagram: AI sub-agents collaborating in Agentic Coding]

Claude Code's Sub-agents feature lets you define specialized AI agents and delegate tasks to them. Each agent maintains an independent context, preserving the information relevant to its task while completing a single responsibility with maximum accuracy.

My boilerplate coordinates nine specialized agents, including:

  • requirement-analyzer: Requirements analysis
  • technical-designer: Architecture and design documentation
  • task-executor: TDD-based implementation
  • quality-fixer: Quality assurance and fixes
  • rule-advisor: Meta-cognition and context optimization
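
To make the delegation model concrete, here is a minimal TypeScript sketch of the idea. The agent names match the boilerplate, but the types and the runAgent/orchestrate functions are my illustration, not its actual API:

```typescript
// Illustrative only: models how an orchestrator hands each sub-agent a
// fresh, bounded context instead of its own accumulated conversation.
type AgentName =
  | "requirement-analyzer"
  | "technical-designer"
  | "task-executor"
  | "quality-fixer"
  | "rule-advisor";

interface AgentTask {
  agent: AgentName;
  instruction: string; // the single responsibility for this call
  inputs: string[]; // only the documents this agent actually needs
}

interface AgentResult {
  agent: AgentName;
  summary: string; // compact result handed back to the orchestrator
}

// Stand-in for a real sub-agent invocation: each call would start with an
// empty context window, so the orchestrator accumulates only small
// summaries rather than full transcripts.
async function runAgent(task: AgentTask): Promise<AgentResult> {
  return { agent: task.agent, summary: `[${task.agent}] ${task.instruction}` };
}

async function orchestrate(feature: string): Promise<AgentResult[]> {
  const analysis = await runAgent({
    agent: "requirement-analyzer",
    instruction: `Analyze requirements for: ${feature}`,
    inputs: [],
  });
  const design = await runAgent({
    agent: "technical-designer",
    instruction: "Create a Design Doc from the requirements",
    inputs: [analysis.summary],
  });
  return [analysis, design];
}

orchestrate("Add a help feature").then((results) =>
  results.forEach((r) => console.log(r.summary))
);
```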

Why Context Engineering Matters More Than You Think

The Paradigm Shift

"Context Engineering" (gaining attention since 2024) is about building mechanisms that provide the right context at the right time for LLMs to make appropriate decisions.

Key focus areas:

  • Preventing missing background information
  • Avoiding context blur from unnecessary information
  • Recognizing that LLMs, like humans, need proper context for quality output

Measurable Impact: Before vs After

| Metric | Before (Traditional) | After (Context Engineering) |
| --- | --- | --- |
| Rule Provision | All rules (40+ sections in 10 files) | Only necessary sections (3-5 sections) |
| Context Exhaustion | Occurs at 200K tokens | No exhaustion even at 770K tokens |
| Implementation Accuracy | Frequent deviations | 236 tests all passing |
| Development Period | 1+ week for similar scale | Completed in 2 days |

Data from sub-agents-mcp project

Single Responsibility Principle for AI Agents

Each sub-agent has one clear responsibility:

| Agent | Responsibility | Context Usage (Measured) |
| --- | --- | --- |
| prd-creator | PRD (Product Requirements Doc) creation | 30K tokens |
| technical-designer | ADR/Design Doc creation | 60K tokens |
| work-planner | Work plan creation | 30K tokens |
| task-decomposer | Task breakdown | 50K tokens |
| task-executor | TDD implementation | 5-60K tokens/task |
| quality-fixer | Quality checks and fixes | 90K tokens/task |
| rule-advisor | Meta-cognition and rule selection | 15K tokens |

In sub-agents-mcp, we processed 8 tasks totaling ~770K tokens. Each agent's independent context prevented main agent exhaustion.

The Secret Sauce: Providing Information at the Right Time

The rule-advisor sub-agent guides interactive work. Whenever TodoWrite is called, it:

  1. Analyzes task essence
  2. Selects relevant rules
  3. Returns contextualized guidance

This prevents LLMs from being "too helpful"—rushing to answer without proper context.

Evolution process:

  • Initial: Too many rules → ignored important ones
  • Iteration: Too few rules → unstable behavior
  • Current: Dynamic rule selection via meta-cognition

Design Documentation as Foundation

The boilerplate enforces design-document-driven development:

  • PRD (Product Requirements Doc): What to build and why
  • ADR (Architecture Decision Record): Why specific technologies
  • Design Doc: How to implement and design rationale

These documents preserve implementation background throughout the process.

How It Works in Practice

The /implement Command Flow

Using /implement activates the orchestrator pattern:

/implement Add a help feature

Step 1: Requirements Analysis

requirement-analyzer evaluates scope and determines the process:

Scale-Based Patterns

  • Small (1-2 files): Simple plan → Implementation → QA
  • Medium (3-5 files): Requirements → Design → Implementation → QA
  • Large (6+ files): PRD → ADR → Design → Implementation → QA
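
As a sketch of this routing logic in TypeScript (the thresholds come from the list above; the selectWorkflow function itself is illustrative, not part of the boilerplate):

```typescript
// Illustrative routing: map estimated change size to a workflow.
// The thresholds mirror the Small/Medium/Large patterns above.
type Workflow = "simple-plan" | "design-first" | "prd-first";

function selectWorkflow(estimatedFiles: number): Workflow {
  if (estimatedFiles <= 2) return "simple-plan"; // Small: plan → implement → QA
  if (estimatedFiles <= 5) return "design-first"; // Medium: requirements → design → implement → QA
  return "prd-first"; // Large: PRD → ADR → design → implement → QA
}

console.log(selectWorkflow(4)); // "design-first", the help feature's Medium path
```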

Step 2: Design Documentation with Review Loop

Real-world example from the help function implementation:

● technical-designer (Create Design Doc for help system)
└ Done (12 tool uses・27.0k tokens・2m 53.3s)

● document-reviewer (Review Design Doc consistency)
└ Done (11 tool uses・27.8k tokens・2m 41.4s)

● Design Document Review Result - APPROVED WITH CONDITIONS

Technical Feasibility: 90/100
Completeness Score: 85/100

Critical Issues Found:
1. JSON Validation Library: Need to specify Zod or AJV+TypeBox for schema validation
2. Dependency Management: Must add explicit versions and TypeScript definitions
3. Error Handling: Should implement Result<T, E> pattern per TypeScript rules

Important Issues:
1. Performance Expectations: Adjust from sub-100ms to realistic 50-200ms range
2. Integration Strategy: Define module export and dependency injection patterns

Your Decision Required:
A) Quick Fix & Proceed - Address critical issues only (~30 min)
B) Thorough Revision - Address all issues (~1-2 hours)
C) Proceed As-Is - Accept limitations, fix during implementation

> A

● technical-designer (Update Design Doc with fixes)
└ Done (10 tool uses・29.7k tokens・3m 15.5s)

Changes Applied:
✓ Added Zod for JSON schema validation with TypeScript inference
✓ Specified all dependencies with exact versions
✓ Implemented Result<T, E> pattern for error handling
✓ Adjusted performance targets to 50-200ms range
✓ Added dependency injection pattern for testability

● document-reviewer (Verify Design Doc updates)
└ Done (4 tool uses・23.1k tokens・39.9s)

● All critical issues resolved - APPROVED

The iterative review ensures technical soundness before implementation begins.
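
The Result<T, E> pattern the reviewer insisted on is worth spelling out. Here is a minimal TypeScript version, a common formulation rather than necessarily the boilerplate's exact definition:

```typescript
// Minimal Result<T, E>: errors become values, so no code path can
// silently throw past the caller.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

function parseHelpQuery(raw: string): Result<string, Error> {
  const trimmed = raw.trim();
  if (trimmed.length === 0) {
    return { ok: false, error: new Error("empty query") };
  }
  return { ok: true, value: trimmed };
}

const result = parseHelpQuery("  search tdd  ");
if (result.ok) {
  console.log(result.value); // "search tdd"
} else {
  console.error(result.error.message); // explicit handling, nothing suppressed
}
```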

Step 3: Task Breakdown and Implementation

work-planner creates phases with quality guardrails:

  • TDD (Test-Driven Development) approach enforced
  • Integration tests at meaningful boundaries
  • Acceptance criteria verification at minimum scale

task-decomposer then generates atomic tasks (one logical commit each).

Key Learning: Layer-by-layer implementation defers integration issues until the very end. We now use vertical slices so integration is validated early.

Step 4: Implementation and Quality Loop

For each task:

  1. task-executor implements using TDD (writes test → implements → refactors)
  2. quality-fixer runs comprehensive checks (lint, format, type-check, all tests)
  3. Main LLM commits upon quality assurance

This separation prevents context exhaustion during quality checks.
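
A sketch of that loop in TypeScript (the function names are illustrative; in the real boilerplate this flow is driven by Claude Code agents, not application code):

```typescript
// Illustrative task loop: implementation and quality assurance run in
// separate agents (each with a fresh context); the main agent only commits.
interface Task {
  id: string;
  description: string;
}

async function implementTask(task: Task): Promise<void> {
  // task-executor: TDD cycle (red → green → refactor) for one atomic task
  console.log(`task-executor: implementing ${task.id}`);
}

async function runQualityChecks(task: Task): Promise<boolean> {
  // quality-fixer: lint, format, type-check, and full test run;
  // it iterates on fixes until everything passes, then reports success
  console.log(`quality-fixer: checking ${task.id}`);
  return true;
}

async function executeAll(tasks: Task[]): Promise<void> {
  for (const task of tasks) {
    await implementTask(task); // fresh context per task
    if (await runQualityChecks(task)) {
      console.log(`main agent: commit ${task.id}`); // commit only after QA passes
    }
  }
}

executeAll([{ id: "T1", description: "help command skeleton" }]);
```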

The Meta-Cognition Innovation: rule-advisor

The Problem It Solves

LLMs exhibit "helpfulness bias"—rushing to answer/fix without proper context, causing:

  • Missing important project rules
  • Skipping background understanding
  • Shortsighted fixes requiring rework

How rule-advisor Works

rule-advisor implements meta-cognition by:

  1. Intercepting TodoWrite calls: Forces a pause before action
  2. Analyzing task essence: Understanding the "why" before the "how"
  3. Selecting relevant rules: From dozens of pages, extracts only what's needed

Actual Output from Production Use:

{
  "taskAnalysis": {
    "taskType": "fix",
    "estimatedFiles": 1,
    "mainFocus": "Fix TypeScript type safety and error handling issues",
    "requiredTags": ["type-safety", "implementation", "quality", "debugging", "error-handling"]
  },
  "selectedRules": [
    {
      "file": "@docs/rules/typescript.md",
      "sections": [
        {
          "title": "Type Safety",
          "content": "**Absolute Rule**: any type is completely prohibited.\n\n**any Type Alternatives (Priority Order)**\n1. **unknown Type + Type Guards**: Use for validating external input\n2. **Generics**: When type flexibility is needed\n3. **Union Types**: Combinations of multiple types\n4. **Type Assertions (Last Resort)**: Only when type is certain"
        }
      ],
      "reason": "Essential TypeScript type safety rules to fix any type usage",
      "priority": "high"
    }
  ],
  "mandatoryChecks": {
    "taskEssence": "Ensuring comprehensive type safety (not just fixing surface errors)",
    "ruleAdequacy": "Selected rules directly address any type usage and error handling",
    "pastFailures": ["quick fixes without proper typing", "ignoring null/undefined cases"],
    "firstStep": "Replace all any types with proper TypeScript types and type guards"
  },
  "metaCognitiveQuestions": [
    "What is the root cause of each type safety violation?",
    "How can we implement proper error handling without suppressing errors?",
    "Are there edge cases (null, undefined, zero division) that need handling?",
    "Should we implement input validation with type guards for robustness?"
  ],
  "criticalRules": [
    "Complete prohibition of any type - use unknown + type guards instead",
    "All errors must have proper handling - no error suppression",
    "Null/undefined checks mandatory before operations"
  ],
  "warningPatterns": [
    "any type usage → Replace with unknown + type guards",
    "Unsafe type assertions (as) → Implement proper type validation",
    "Missing null/undefined checks → Add explicit validation",
    "Error suppression in try-catch → Proper error handling with logging"
  ],
  "firstActionGuidance": {
    "action": "Start by replacing all any types with proper TypeScript types",
    "rationale": "Type safety is the foundation - fixing types first will reveal other issues"
  },
  "confidence": "high"
}

For manual control, I created a /task command that forces meta-cognition on demand.
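
To ground the advisor's top rule, "unknown + type guards instead of any", here is what it looks like in practice (my own example, not output from the tool):

```typescript
// Validating external input with unknown + a type guard,
// instead of accepting `any` and hoping for the best.
interface HelpEntry {
  topic: string;
  body: string;
}

function isHelpEntry(value: unknown): value is HelpEntry {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as Record<string, unknown>).topic === "string" &&
    typeof (value as Record<string, unknown>).body === "string"
  );
}

const raw: unknown = JSON.parse('{"topic":"search","body":"How to search"}');
if (isHelpEntry(raw)) {
  console.log(raw.topic); // narrowed to HelpEntry: type-safe access
} else {
  throw new Error("invalid help entry"); // explicit handling, no suppression
}
```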

Technical Implementation Details

Sub-agent Definition

Agents are defined in .claude/agents/ as Markdown with YAML frontmatter:

---
name: task-executor
description: Specialized agent for steadily executing individual tasks
tools: Read, Edit, Write, MultiEdit, Bash, Grep, Glob, LS, TodoWrite
---

You are a specialized AI assistant that reliably executes individual tasks.

Key responsibilities include:
- TDD implementation following Red-Green-Refactor
- Progress tracking across task files and work plans  
- Dependency analysis and incremental verification
- Structured JSON reporting upon completion

Full implementation details: https://github.com/shinpr/ai-coding-project-boilerplate/blob/main/.claude/agents/task-executor.md
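
The "structured JSON reporting" mentioned above could look something like the sketch below; the exact schema is my illustration, not the boilerplate's contract:

```typescript
// Illustrative shape for a task-executor completion report; the real
// boilerplate's schema may differ.
interface TaskReport {
  taskId: string;
  status: "completed" | "blocked";
  testsAdded: number;
  filesChanged: string[];
  notesForQualityFixer: string;
}

const report: TaskReport = {
  taskId: "help-search-01",
  status: "completed",
  testsAdded: 4,
  filesChanged: ["src/help/search.ts", "src/help/search.test.ts"],
  notesForQualityFixer: "New module; run the full type-check and test suite",
};

console.log(JSON.stringify(report, null, 2));
```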

Rule System Architecture

# rules-index.yaml
rules:
  typescript:
    file: "typescript.md"
    tags: [implementation, type-safety, async]
    typical-use: "Creating/modifying TypeScript code"
    sections:
      - "Basic Principles"
      - "Type Safety"
      - "Error Handling"
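
A sketch of how a rule-advisor-style selector might consume that index, matching task tags against rule tags (the TypeScript types and data are illustrative; the real agent reads the YAML directly):

```typescript
// Illustrative: select rule files whose tags overlap with the task's tags,
// mirroring the structure of rules-index.yaml above.
interface RuleEntry {
  file: string;
  tags: string[];
  sections: string[];
}

const rulesIndex: Record<string, RuleEntry> = {
  typescript: {
    file: "typescript.md",
    tags: ["implementation", "type-safety", "async"],
    sections: ["Basic Principles", "Type Safety", "Error Handling"],
  },
};

function selectRules(taskTags: string[]): RuleEntry[] {
  return Object.values(rulesIndex).filter((rule) =>
    rule.tags.some((tag) => taskTags.includes(tag))
  );
}

// A "fix type errors" task pulls in only the TypeScript rule file.
console.log(selectRules(["type-safety", "debugging"]).map((r) => r.file));
```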

Quality Assurance Mechanisms

Automatic triggers in CLAUDE.md:

  • 5+ file changes: Report scope and pause
  • Same error 3x: Force root cause analysis
  • Every 3 files: Update TodoWrite (checkpoint)

Real-World Validation: sub-agents-mcp Project

Project Metrics

  • Scope: MCP server implementation for Claude Code/Cursor CLI
  • Scale: 34 test files, 236 test cases
  • Duration: 2 days (vs typical 1 week)
  • Quality: All tests passing, immediately usable

Token Usage Breakdown

| Phase | Tokens | Traditional (Cumulative) | Agentic (Independent) |
| --- | --- | --- | --- |
| PRD Creation | 30K | 30K | 30K ✓ |
| ADR/Design Doc | 60K | 90K | 60K ✓ |
| Work Planning | 30K | 120K | 30K ✓ |
| Task Breakdown | 50K | 170K ⚠️ DANGER ZONE | 50K ✓ |
| Implementation ×8 | 480K | >200K ♻️ AUTO-COMPACT | 60K each ✓ |
| Quality Checks ×8 | 720K | | 90K each ✓ |
| Total | 770K | FAILED | COMPLETED |

Key Insights

  • Zero context exhaustion (traditional fails at 200K)
  • Consistent quality across all tasks
  • Immediate production readiness
  • Discovered improvements fed back to boilerplate

Getting Started with Your Project

Quick Setup (30 seconds)

# Create and enter project
npx github:shinpr/ai-coding-project-boilerplate my-project
cd my-project && npm install

# Start Claude Code
claude

# Begin implementation
/implement [Your feature description]

Customization for Real Projects

Adapt to your project by modifying:

  1. docs/rules/project-context.md
    • Project type and target users
    • Implementation characteristics
    • Domain-specific requirements
  2. docs/rules/architecture/*.md
    • Architecture patterns (layered, vertical slice, etc.)
    • Technology stack specifics
    • Design principles
  3. docs/rules/rules-index.yaml
    • Map your custom rules
    • Define tags and sections
    • Set rule priorities
  4. CLAUDE.md
    • Project-wide constraints
    • Global rules affecting all work
    • Stop triggers and checkpoints

Key Takeaways

  1. Context Engineering > More Context: Right information at the right time beats information overload

  2. Specialized Agents > One Smart Agent: Single responsibility principle applies to AI

  3. Meta-cognition > Speed: Pausing to think prevents costly rework

  4. Design First > Code First: Documentation preserves intent through implementation

  5. Fail Forward: Let it fail, learn, and systematize the lessons

Join the Movement

The process of watching AI learn from failures and improve is surprisingly satisfying.

Get involved via the GitHub repository below.

Let's build an AI coding environment that learns from failures and continuously improves.

GitHub: shinpr / ai-coding-project-boilerplate

TypeScript boilerplate optimized for Claude Code, featuring Sub-agents, rule-based development, and instant project scaffolding via npx. Sub-agent orchestration tackles the #1 problem in AI coding, context exhaustion, by letting specialized agents handle each task independently while maintaining consistent quality across large projects.

Real project built with this boilerplate: an MCP server that enables Claude Code/Cursor CLI to work as sub-agents (published on GitHub).

  • Development time: ~2 days
  • Scale: ~30 TypeScript files with a comprehensive test suite
  • Features: MCP server implementation specialized for AI CLI tools, 3-minute setup, and production-quality code (tests, type definitions, CI/CD included)