Shinsuke KAGAWA

Zero Context Exhaustion: Building Production-Ready AI Coding Teams with Claude Code Sub-agents

The Problem You're Facing Right Now

Ever had Claude Code stop mid-implementation with "context window exceeded"? Started a complex refactoring only to watch it fail at 200K tokens? Or worse—gotten code that completely missed your requirements because the AI lost track of the original context?

I hit these walls constantly. My breaking point came when a critical MCP server implementation ground to a halt, leaving me with half-finished code and no way forward except starting over.

The Solution That Actually Works

This boilerplate transforms those failures into success. In our proof-of-concept, sub-agents-mcp, the same implementation that died at 200K tokens was completed in just 2 days: 770K tokens processed, zero context exhaustion, and 236 tests passing.

Here's how: Instead of one overloaded AI agent, you get a team of specialized agents, each with fresh context for their specific task. The orchestrator manages them, the meta-cognition layer keeps them on track, and you get production-ready code every time.

Try It First (Takes 5 Minutes)

# Create project (30 seconds)
npx github:shinpr/ai-coding-project-boilerplate my-project
cd my-project && npm install

# Start Claude Code
claude

# Begin auto-implementation
/implement Add a help feature

Requirements analysis, design documentation, implementation, quality checks, and commits all run automatically.

Who benefits:

  • Solo developers: Small apps auto-generated with quality assurance built-in
  • Teams: Drastically reduced Pull Request review costs
  • OSS maintainers: Consistent rules applied to contributor code

What Actually Happens When You Run It

Here's what the /implement command orchestrates behind the scenes:

/implement Create help function with search capability

# Phase 1: Requirements Analysis (30s)
→ requirement-analyzer determines scale: Medium (4-5 files)
→ Asks clarifying questions about format, search type, interface

# Phase 2: Design Documentation (3-5 min)
→ technical-designer creates Design Doc with architecture
→ document-reviewer validates consistency and completeness
→ Iterative refinement until approval (typically 1-2 cycles)

# Phase 3: Work Planning (2 min)
→ work-planner creates 16 tasks across 4 phases
→ User approval required for autonomous execution

# Phase 4: Autonomous Implementation (10-20 min per task × 16 tasks)
→ task-decomposer breaks into atomic commits
→ task-executor implements with TDD (Red-Green-Refactor)
→ quality-fixer ensures lint/type/test compliance
→ Auto-commits after each quality-assured task

# Result: Production-ready code with:
✓ Full type safety (zero any types)
✓ 100% test coverage for new code
✓ Validated against acceptance criteria
✓ Ready for deployment

This entire process runs with minimal human intervention after initial requirements clarification, demonstrating how specialized agents collaborate to deliver quality code.

Background: My Journey with AI Coding

Like many of you, I've experienced Claude Code producing unexpected implementations. As an Engineering Manager who enjoys systematizing processes, I practiced "Agentic Coding" using Claude Code's Sub-agents—deliberately allowing failures and feeding them back into the system. Watching implementations fail initially felt uncomfortable, but it became a valuable learning process.

I've published the results as AI Coding Project Boilerplate. This article introduces practical methods you can use in real projects through this boilerplate.

The Core Problem: Why Traditional AI Coding Fails

When you delegate everything to a single AI agent, you encounter:

  • Context becomes too long and execution stops midway
  • Implementations deviate from your intent
  • Upon finding errors, it impulsively starts fixing, breaking overall consistency
  • Attempts large-scale changes at once and loses control

The Solution: Agentic Coding

Agentic Coding shifts from "conversing with one AI agent" to "collaborating with a team of specialized AI agents."

[Diagram: AI sub-agents collaborating in Agentic Coding]

Claude Code's Sub-agents feature lets you define specialized AI agents and delegate tasks to them. Each agent maintains an independent context, preserving the information relevant to its task while completing a single responsibility with maximum accuracy.

My boilerplate coordinates nine specialized agents, including:

  • requirement-analyzer: Requirements analysis
  • technical-designer: Architecture and design documentation
  • task-executor: TDD-based implementation
  • quality-fixer: Quality assurance and fixes
  • rule-advisor: Meta-cognition and context optimization
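
To make the delegation model concrete, here is a minimal TypeScript sketch of the idea. The agent names match the boilerplate, but the types and the runAgent/orchestrate functions are my illustration, not its actual API:

```typescript
// Illustrative only: models how an orchestrator hands each sub-agent a
// fresh, bounded context instead of its own accumulated conversation.
type AgentName =
  | "requirement-analyzer"
  | "technical-designer"
  | "task-executor"
  | "quality-fixer"
  | "rule-advisor";

interface AgentTask {
  agent: AgentName;
  instruction: string; // the single responsibility for this call
  inputs: string[]; // only the documents this agent actually needs
}

interface AgentResult {
  agent: AgentName;
  summary: string; // compact result handed back to the orchestrator
}

// Stand-in for a real sub-agent invocation: each call would start with an
// empty context window, so the orchestrator accumulates only small
// summaries rather than full transcripts.
async function runAgent(task: AgentTask): Promise<AgentResult> {
  return { agent: task.agent, summary: `[${task.agent}] ${task.instruction}` };
}

async function orchestrate(feature: string): Promise<AgentResult[]> {
  const analysis = await runAgent({
    agent: "requirement-analyzer",
    instruction: `Analyze requirements for: ${feature}`,
    inputs: [],
  });
  const design = await runAgent({
    agent: "technical-designer",
    instruction: "Create a Design Doc from the requirements",
    inputs: [analysis.summary],
  });
  return [analysis, design];
}

orchestrate("Add a help feature").then((results) =>
  results.forEach((r) => console.log(r.summary))
);
```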

Why Context Engineering Matters More Than You Think

The Paradigm Shift

"Context Engineering" (gaining attention since 2024) is about building mechanisms that provide the right context at the right time for LLMs to make appropriate decisions.

Key focus areas:

  • Preventing missing background information
  • Avoiding context blur from unnecessary information
  • Recognizing that LLMs, like humans, need proper context for quality output

Measurable Impact: Before vs After

| Metric | Before (Traditional) | After (Context Engineering) |
| --- | --- | --- |
| Rule Provision | All rules (40+ sections in 10 files) | Only necessary sections (3-5 sections) |
| Context Exhaustion | Occurs at 200K tokens | No exhaustion even at 770K tokens |
| Implementation Accuracy | Frequent deviations | 236 tests all passing |
| Development Period | 1+ week for similar scale | Completed in 2 days |

Data from sub-agents-mcp project

Single Responsibility Principle for AI Agents

Each sub-agent has one clear responsibility:

| Agent | Responsibility | Context Usage (Measured) |
| --- | --- | --- |
| prd-creator | PRD (Product Requirements Doc) creation | 30K tokens |
| technical-designer | ADR/Design Doc creation | 60K tokens |
| work-planner | Work plan creation | 30K tokens |
| task-decomposer | Task breakdown | 50K tokens |
| task-executor | TDD implementation | 5-60K tokens/task |
| quality-fixer | Quality checks and fixes | 90K tokens/task |
| rule-advisor | Meta-cognition and rule selection | 15K tokens |

In sub-agents-mcp, we processed 8 tasks totaling ~770K tokens. Each agent's independent context prevented main agent exhaustion.

The Secret Sauce: Providing Information at the Right Time

The rule-advisor sub-agent guides interactive work. Whenever TodoWrite is called, it:

  1. Analyzes task essence
  2. Selects relevant rules
  3. Returns contextualized guidance

This prevents LLMs from being "too helpful"—rushing to answer without proper context.

Evolution process:

  • Initial: Too many rules → ignored important ones
  • Iteration: Too few rules → unstable behavior
  • Current: Dynamic rule selection via meta-cognition

Design Documentation as Foundation

The boilerplate enforces design-document-driven development:

  • PRD (Product Requirements Doc): What to build and why
  • ADR (Architecture Decision Record): Why specific technologies
  • Design Doc: How to implement and design rationale

These documents preserve implementation background throughout the process.

How It Works in Practice

The /implement Command Flow

Using /implement activates the orchestrator pattern:

/implement Add a help feature

Step 1: Requirements Analysis

requirement-analyzer evaluates scope and determines the process:

Scale-Based Patterns

  • Small (1-2 files): Simple plan → Implementation → QA
  • Medium (3-5 files): Requirements → Design → Implementation → QA
  • Large (6+ files): PRD → ADR → Design → Implementation → QA
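
As a sketch of this routing logic in TypeScript (the thresholds come from the list above; the selectWorkflow function itself is illustrative, not part of the boilerplate):

```typescript
// Illustrative routing: map estimated change size to a workflow.
// The thresholds mirror the Small/Medium/Large patterns above.
type Workflow = "simple-plan" | "design-first" | "prd-first";

function selectWorkflow(estimatedFiles: number): Workflow {
  if (estimatedFiles <= 2) return "simple-plan"; // Small: plan → implement → QA
  if (estimatedFiles <= 5) return "design-first"; // Medium: requirements → design → implement → QA
  return "prd-first"; // Large: PRD → ADR → design → implement → QA
}

console.log(selectWorkflow(4)); // "design-first", the help feature's Medium path
```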

Step 2: Design Documentation with Review Loop

Real-world example from the help function implementation:

● technical-designer (Create Design Doc for help system)
└ Done (12 tool uses・27.0k tokens・2m 53.3s)

● document-reviewer (Review Design Doc consistency)
└ Done (11 tool uses・27.8k tokens・2m 41.4s)

● Design Document Review Result - APPROVED WITH CONDITIONS

Technical Feasibility: 90/100
Completeness Score: 85/100

Critical Issues Found:
1. JSON Validation Library: Need to specify Zod or AJV+TypeBox for schema validation
2. Dependency Management: Must add explicit versions and TypeScript definitions
3. Error Handling: Should implement Result<T, E> pattern per TypeScript rules

Important Issues:
1. Performance Expectations: Adjust from sub-100ms to realistic 50-200ms range
2. Integration Strategy: Define module export and dependency injection patterns

Your Decision Required:
A) Quick Fix & Proceed - Address critical issues only (~30 min)
B) Thorough Revision - Address all issues (~1-2 hours)
C) Proceed As-Is - Accept limitations, fix during implementation

> A

● technical-designer (Update Design Doc with fixes)
└ Done (10 tool uses・29.7k tokens・3m 15.5s)

Changes Applied:
✓ Added Zod for JSON schema validation with TypeScript inference
✓ Specified all dependencies with exact versions
✓ Implemented Result<T, E> pattern for error handling
✓ Adjusted performance targets to 50-200ms range
✓ Added dependency injection pattern for testability

● document-reviewer (Verify Design Doc updates)
└ Done (4 tool uses・23.1k tokens・39.9s)

● All critical issues resolved - APPROVED

The iterative review ensures technical soundness before implementation begins.
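
The Result<T, E> pattern the reviewer insisted on is worth spelling out. Here is a minimal TypeScript version, a common formulation rather than necessarily the boilerplate's exact definition:

```typescript
// Minimal Result<T, E>: errors become values, so no code path can
// silently throw past the caller.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

function parseHelpQuery(raw: string): Result<string, Error> {
  const trimmed = raw.trim();
  if (trimmed.length === 0) {
    return { ok: false, error: new Error("empty query") };
  }
  return { ok: true, value: trimmed };
}

const result = parseHelpQuery("  search tdd  ");
if (result.ok) {
  console.log(result.value); // "search tdd"
} else {
  console.error(result.error.message); // explicit handling, nothing suppressed
}
```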

Step 3: Task Breakdown and Implementation

work-planner creates phases with quality guardrails:

  • TDD (Test-Driven Development) approach enforced
  • Integration tests at meaningful boundaries
  • Acceptance criteria verification at minimum scale

task-decomposer then generates atomic tasks (one logical commit each).

Key Learning: Layer-by-layer implementation defers integration issues until the very end. We now use vertical slices so integration is validated early.

Step 4: Implementation and Quality Loop

For each task:

  1. task-executor implements using TDD (writes test → implements → refactors)
  2. quality-fixer runs comprehensive checks (lint, format, type-check, all tests)
  3. Main LLM commits upon quality assurance

This separation prevents context exhaustion during quality checks.
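
A sketch of that loop in TypeScript (the function names are illustrative; in the real boilerplate this flow is driven by Claude Code agents, not application code):

```typescript
// Illustrative task loop: implementation and quality assurance run in
// separate agents (each with a fresh context); the main agent only commits.
interface Task {
  id: string;
  description: string;
}

async function implementTask(task: Task): Promise<void> {
  // task-executor: TDD cycle (red → green → refactor) for one atomic task
  console.log(`task-executor: implementing ${task.id}`);
}

async function runQualityChecks(task: Task): Promise<boolean> {
  // quality-fixer: lint, format, type-check, and full test run;
  // it iterates on fixes until everything passes, then reports success
  console.log(`quality-fixer: checking ${task.id}`);
  return true;
}

async function executeAll(tasks: Task[]): Promise<void> {
  for (const task of tasks) {
    await implementTask(task); // fresh context per task
    if (await runQualityChecks(task)) {
      console.log(`main agent: commit ${task.id}`); // commit only after QA passes
    }
  }
}

executeAll([{ id: "T1", description: "help command skeleton" }]);
```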

The Meta-Cognition Innovation: rule-advisor

The Problem It Solves

LLMs exhibit "helpfulness bias"—rushing to answer/fix without proper context, causing:

  • Missing important project rules
  • Skipping background understanding
  • Shortsighted fixes requiring rework

How rule-advisor Works

rule-advisor implements meta-cognition by:

  1. Intercepting TodoWrite calls: Forces a pause before action
  2. Analyzing task essence: Understanding the "why" before the "how"
  3. Selecting relevant rules: From dozens of pages, extracts only what's needed

Actual Output from Production Use:

{
  "taskAnalysis": {
    "taskType": "fix",
    "estimatedFiles": 1,
    "mainFocus": "Fix TypeScript type safety and error handling issues",
    "requiredTags": ["type-safety", "implementation", "quality", "debugging", "error-handling"]
  },
  "selectedRules": [
    {
      "file": "@docs/rules/typescript.md",
      "sections": [
        {
          "title": "Type Safety",
          "content": "**Absolute Rule**: any type is completely prohibited.\n\n**any Type Alternatives (Priority Order)**\n1. **unknown Type + Type Guards**: Use for validating external input\n2. **Generics**: When type flexibility is needed\n3. **Union Types**: Combinations of multiple types\n4. **Type Assertions (Last Resort)**: Only when type is certain"
        }
      ],
      "reason": "Essential TypeScript type safety rules to fix any type usage",
      "priority": "high"
    }
  ],
  "mandatoryChecks": {
    "taskEssence": "Ensuring comprehensive type safety (not just fixing surface errors)",
    "ruleAdequacy": "Selected rules directly address any type usage and error handling",
    "pastFailures": ["quick fixes without proper typing", "ignoring null/undefined cases"],
    "firstStep": "Replace all any types with proper TypeScript types and type guards"
  },
  "metaCognitiveQuestions": [
    "What is the root cause of each type safety violation?",
    "How can we implement proper error handling without suppressing errors?",
    "Are there edge cases (null, undefined, zero division) that need handling?",
    "Should we implement input validation with type guards for robustness?"
  ],
  "criticalRules": [
    "Complete prohibition of any type - use unknown + type guards instead",
    "All errors must have proper handling - no error suppression",
    "Null/undefined checks mandatory before operations"
  ],
  "warningPatterns": [
    "any type usage → Replace with unknown + type guards",
    "Unsafe type assertions (as) → Implement proper type validation",
    "Missing null/undefined checks → Add explicit validation",
    "Error suppression in try-catch → Proper error handling with logging"
  ],
  "firstActionGuidance": {
    "action": "Start by replacing all any types with proper TypeScript types",
    "rationale": "Type safety is the foundation - fixing types first will reveal other issues"
  },
  "confidence": "high"
}

For manual control, I created a /task command that forces meta-cognition on demand.
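
To ground the advisor's top rule, "unknown + type guards instead of any", here is what it looks like in practice (my own example, not output from the tool):

```typescript
// Validating external input with unknown + a type guard,
// instead of accepting `any` and hoping for the best.
interface HelpEntry {
  topic: string;
  body: string;
}

function isHelpEntry(value: unknown): value is HelpEntry {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as Record<string, unknown>).topic === "string" &&
    typeof (value as Record<string, unknown>).body === "string"
  );
}

const raw: unknown = JSON.parse('{"topic":"search","body":"How to search"}');
if (isHelpEntry(raw)) {
  console.log(raw.topic); // narrowed to HelpEntry: type-safe access
} else {
  throw new Error("invalid help entry"); // explicit handling, no suppression
}
```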

Technical Implementation Details

Sub-agent Definition

Agents are defined in .claude/agents/ as Markdown with YAML frontmatter:

---
name: task-executor
description: Specialized agent for steadily executing individual tasks
tools: Read, Edit, Write, MultiEdit, Bash, Grep, Glob, LS, TodoWrite
---

You are a specialized AI assistant that reliably executes individual tasks.

Key responsibilities include:
- TDD implementation following Red-Green-Refactor
- Progress tracking across task files and work plans  
- Dependency analysis and incremental verification
- Structured JSON reporting upon completion

Full implementation details: https://github.com/shinpr/ai-coding-project-boilerplate/blob/main/.claude/agents/task-executor.md
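
The "structured JSON reporting" mentioned above could look something like the sketch below; the exact schema is my illustration, not the boilerplate's contract:

```typescript
// Illustrative shape for a task-executor completion report; the real
// boilerplate's schema may differ.
interface TaskReport {
  taskId: string;
  status: "completed" | "blocked";
  testsAdded: number;
  filesChanged: string[];
  notesForQualityFixer: string;
}

const report: TaskReport = {
  taskId: "help-search-01",
  status: "completed",
  testsAdded: 4,
  filesChanged: ["src/help/search.ts", "src/help/search.test.ts"],
  notesForQualityFixer: "New module; run the full type-check and test suite",
};

console.log(JSON.stringify(report, null, 2));
```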

Rule System Architecture

# rules-index.yaml
rules:
  typescript:
    file: "typescript.md"
    tags: [implementation, type-safety, async]
    typical-use: "Creating/modifying TypeScript code"
    sections:
      - "Basic Principles"
      - "Type Safety"
      - "Error Handling"
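
A sketch of how a rule-advisor-style selector might consume that index, matching task tags against rule tags (the TypeScript types and data are illustrative; the real agent reads the YAML directly):

```typescript
// Illustrative: select rule files whose tags overlap with the task's tags,
// mirroring the structure of rules-index.yaml above.
interface RuleEntry {
  file: string;
  tags: string[];
  sections: string[];
}

const rulesIndex: Record<string, RuleEntry> = {
  typescript: {
    file: "typescript.md",
    tags: ["implementation", "type-safety", "async"],
    sections: ["Basic Principles", "Type Safety", "Error Handling"],
  },
};

function selectRules(taskTags: string[]): RuleEntry[] {
  return Object.values(rulesIndex).filter((rule) =>
    rule.tags.some((tag) => taskTags.includes(tag))
  );
}

// A "fix type errors" task pulls in only the TypeScript rule file.
console.log(selectRules(["type-safety", "debugging"]).map((r) => r.file));
```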

Quality Assurance Mechanisms

Automatic triggers in CLAUDE.md:

  • 5+ file changes: Report scope and pause
  • Same error 3x: Force root cause analysis
  • Every 3 files: Update TodoWrite (checkpoint)

Real-World Validation: sub-agents-mcp Project

Project Metrics

  • Scope: MCP server implementation for Claude Code/Cursor CLI
  • Scale: 34 test files, 236 test cases
  • Duration: 2 days (vs typical 1 week)
  • Quality: All tests passing, immediately usable

Token Usage Breakdown

| Phase | Tokens | Traditional (Cumulative) | Agentic (Independent) |
| --- | --- | --- | --- |
| PRD Creation | 30K | 30K | 30K ✓ |
| ADR/Design Doc | 60K | 90K | 60K ✓ |
| Work Planning | 30K | 120K | 30K ✓ |
| Task Breakdown | 50K | 170K ⚠️ DANGER ZONE | 50K ✓ |
| Implementation ×8 | 480K | >200K ♻️ AUTO-COMPACT | 60K each ✓ |
| Quality Checks ×8 | 720K | | 90K each ✓ |
| Total | 770K | FAILED | COMPLETED |

Key Insights

  • Zero context exhaustion (traditional fails at 200K)
  • Consistent quality across all tasks
  • Immediate production readiness
  • Discovered improvements fed back to boilerplate

Getting Started with Your Project

Quick Setup (30 seconds)

# Create and enter project
npx github:shinpr/ai-coding-project-boilerplate my-project
cd my-project && npm install

# Start Claude Code
claude

# Begin implementation
/implement [Your feature description]

Customization for Real Projects

Adapt to your project by modifying:

  1. docs/rules/project-context.md
    • Project type and target users
    • Implementation characteristics
    • Domain-specific requirements
  2. docs/rules/architecture/*.md
    • Architecture patterns (layered, vertical slice, etc.)
    • Technology stack specifics
    • Design principles
  3. docs/rules/rules-index.yaml
    • Map your custom rules
    • Define tags and sections
    • Set rule priorities
  4. CLAUDE.md
    • Project-wide constraints
    • Global rules affecting all work
    • Stop triggers and checkpoints

Key Takeaways

  1. Context Engineering > More Context: Right information at the right time beats information overload

  2. Specialized Agents > One Smart Agent: Single responsibility principle applies to AI

  3. Meta-cognition > Speed: Pausing to think prevents costly rework

  4. Design First > Code First: Documentation preserves intent through implementation

  5. Fail Forward: Let it fail, learn, and systematize the lessons

Join the Movement

The process of watching AI learn from failures and improve is surprisingly satisfying.

Get involved via the GitHub repository below.

Let's build an AI coding environment that learns from failures and continuously improves.

GitHub: shinpr / ai-coding-project-boilerplate

TypeScript boilerplate optimized for Claude Code, featuring Sub-agents, rule-based development, and instant project scaffolding via npx. Sub-agent orchestration tackles the #1 problem in AI coding, context exhaustion, by letting specialized agents handle each task independently while maintaining consistent quality across large projects.

Real project built with this boilerplate: an MCP server that enables Claude Code/Cursor CLI to work as sub-agents (published on GitHub).

  • Development time: ~2 days
  • Scale: ~30 TypeScript files with a comprehensive test suite
  • Features: MCP server implementation specialized for AI CLI tools, 3-minute setup, and production-quality code (tests, type definitions, CI/CD included)