The Problem You're Facing Right Now
Ever had Claude Code stop mid-implementation with "context window exceeded"? Started a complex refactoring only to watch it fail at 200K tokens? Or worse—gotten code that completely missed your requirements because the AI lost track of the original context?
I hit these walls constantly. My breaking point came when a critical MCP server implementation ground to a halt, leaving me with half-finished code and no way forward except starting over.
The Solution That Actually Works
This boilerplate turns those failures into successes. In our proof-of-concept, sub-agents-mcp, the same implementation that had died at 200K tokens was completed with 770K tokens processed, zero context exhaustion, and 236 tests passing, in just two days.
Here's how: Instead of one overloaded AI agent, you get a team of specialized agents, each with fresh context for their specific task. The orchestrator manages them, the meta-cognition layer keeps them on track, and you get production-ready code every time.
Try It First (Takes 5 Minutes)
# Create project (30 seconds)
npx github:shinpr/ai-coding-project-boilerplate my-project
cd my-project && npm install
# Start Claude Code
claude
# Begin auto-implementation
/implement Add a help feature
Requirements analysis, design documentation, implementation, quality checks, and commits all run automatically.
Who benefits:
- Solo developers: Small apps auto-generated with quality assurance built-in
- Teams: Drastically reduced Pull Request review costs
- OSS maintainers: Consistent rules applied to contributor code
What Actually Happens When You Run It
Here's what the `/implement` command orchestrates behind the scenes:
/implement Create help function with search capability
# Phase 1: Requirements Analysis (30s)
→ requirement-analyzer determines scale: Medium (4-5 files)
→ Asks clarifying questions about format, search type, interface
# Phase 2: Design Documentation (3-5 min)
→ technical-designer creates Design Doc with architecture
→ document-reviewer validates consistency and completeness
→ Iterative refinement until approval (typically 1-2 cycles)
# Phase 3: Work Planning (2 min)
→ work-planner creates 16 tasks across 4 phases
→ User approval required for autonomous execution
# Phase 4: Autonomous Implementation (10-20 min per task × 16 tasks)
→ task-decomposer breaks into atomic commits
→ task-executor implements with TDD (Red-Green-Refactor)
→ quality-fixer ensures lint/type/test compliance
→ Auto-commits after each quality-assured task
# Result: Production-ready code with:
✓ Full type safety (zero any types)
✓ 100% test coverage for new code
✓ Validated against acceptance criteria
✓ Ready for deployment
This entire process runs with minimal human intervention after initial requirements clarification, demonstrating how specialized agents collaborate to deliver quality code.
Background: My Journey with AI Coding
Like many of you, I've experienced Claude Code producing unexpected implementations. As an Engineering Manager who enjoys systematizing processes, I practiced "Agentic Coding" using Claude Code's Sub-agents—deliberately allowing failures and feeding them back into the system. Watching implementations fail initially felt uncomfortable, but it became a valuable learning process.
I've published the results as AI Coding Project Boilerplate. This article introduces practical methods you can use in real projects through this boilerplate.
The Core Problem: Why Traditional AI Coding Fails
When you delegate everything to a single AI agent, you encounter:
- Context becomes too long and execution stops midway
- Implementations deviate from your intent
- Upon finding errors, it impulsively starts fixing, breaking overall consistency
- Attempts large-scale changes at once and loses control
The Solution: Agentic Coding
Agentic Coding shifts from "conversing with one AI agent" to "collaborating with a team of specialized AI agents."
Claude Code's Sub-agents feature creates specialized AI agents that delegate tasks. Each agent maintains independent context, preserving relevant information while completing single responsibilities with maximum accuracy.
My boilerplate coordinates nine specialized agents, including:
- requirement-analyzer: Requirements analysis
- technical-designer: Architecture and design documentation
- task-executor: TDD-based implementation
- quality-fixer: Quality assurance and fixes
- rule-advisor: Meta-cognition and context optimization
Why Context Engineering Matters More Than You Think
The Paradigm Shift
"Context Engineering" (gaining attention since 2024) is about building mechanisms that provide the right context at the right time for LLMs to make appropriate decisions.
Key focus areas:
- Preventing missing background information
- Avoiding context blur from unnecessary information
- Recognizing that LLMs, like humans, need proper context for quality output
Measurable Impact: Before vs After
| Metric | Before (Traditional) | After (Context Engineering) |
|---|---|---|
| Rule Provision | All rules (40+ sections in 10 files) | Only necessary sections (3-5 sections) |
| Context Exhaustion | Occurs at 200K tokens | No exhaustion even at 770K tokens |
| Implementation Accuracy | Frequent deviations | 236 tests all passing |
| Development Period | 1+ week for similar scale | Completed in 2 days |

*Data from the sub-agents-mcp project*
Single Responsibility Principle for AI Agents
Each sub-agent has one clear responsibility:
| Agent | Responsibility | Context Usage (Measured) |
|---|---|---|
| prd-creator | PRD (Product Requirements Doc) creation | 30K tokens |
| technical-designer | ADR/Design Doc creation | 60K tokens |
| work-planner | Work plan creation | 30K tokens |
| task-decomposer | Task breakdown | 50K tokens |
| task-executor | TDD implementation | 5-60K tokens/task |
| quality-fixer | Quality checks and fixes | 90K tokens/task |
| rule-advisor | Meta-cognition and rule selection | 15K tokens |
In sub-agents-mcp, we processed 8 tasks totaling ~770K tokens. Each agent's independent context prevented main agent exhaustion.
The Secret Sauce: Providing Information at the Right Time
The `rule-advisor` sub-agent handles interactive work. When TodoWrite is called, it:
- Analyzes task essence
- Selects relevant rules
- Returns contextualized guidance
This prevents LLMs from being "too helpful"—rushing to answer without proper context.
Evolution process:
- Initial: Too many rules → ignored important ones
- Iteration: Too few rules → unstable behavior
- Current: Dynamic rule selection via meta-cognition
Design Documentation as Foundation
The boilerplate enforces design-document-driven development:
- PRD (Product Requirements Doc): What to build and why
- ADR (Architecture Decision Record): Why specific technologies
- Design Doc: How to implement and design rationale
These documents preserve implementation background throughout the process.
How It Works in Practice
The `/implement` Command Flow
Using `/implement` activates the orchestrator pattern:
/implement Add a help feature
Step 1: Requirements Analysis
`requirement-analyzer` evaluates scope and determines the process:
Scale-Based Patterns
- Small (1-2 files): Simple plan → Implementation → QA
- Medium (3-5 files): Requirements → Design → Implementation → QA
- Large (6+ files): PRD → ADR → Design → Implementation → QA
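In the boilerplate this routing is driven by requirement-analyzer's prompt rather than code, but as a rough mental model, here is a hypothetical TypeScript sketch of the scale classification above (names and thresholds are illustrative):

```typescript
type Scale = 'small' | 'medium' | 'large'

// Thresholds mirror the scale-based patterns listed above
function classifyScale(estimatedFiles: number): Scale {
  if (estimatedFiles <= 2) return 'small'
  if (estimatedFiles <= 5) return 'medium'
  return 'large'
}

// Each scale maps to a different document pipeline
const pipelines: Record<Scale, string[]> = {
  small: ['plan', 'implementation', 'qa'],
  medium: ['requirements', 'design', 'implementation', 'qa'],
  large: ['prd', 'adr', 'design', 'implementation', 'qa'],
}
```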
Step 2: Design Documentation with Review Loop
Real-world example from the help function implementation:
● technical-designer (Create Design Doc for help system)
└ Done (12 tool uses・27.0k tokens・2m 53.3s)
● document-reviewer (Review Design Doc consistency)
└ Done (11 tool uses・27.8k tokens・2m 41.4s)
● Design Document Review Result - APPROVED WITH CONDITIONS
Technical Feasibility: 90/100
Completeness Score: 85/100
Critical Issues Found:
1. JSON Validation Library: Need to specify Zod or AJV+TypeBox for schema validation
2. Dependency Management: Must add explicit versions and TypeScript definitions
3. Error Handling: Should implement Result<T, E> pattern per TypeScript rules
Important Issues:
1. Performance Expectations: Adjust from sub-100ms to realistic 50-200ms range
2. Integration Strategy: Define module export and dependency injection patterns
Your Decision Required:
A) Quick Fix & Proceed - Address critical issues only (~30 min)
B) Thorough Revision - Address all issues (~1-2 hours)
C) Proceed As-Is - Accept limitations, fix during implementation
> A
● technical-designer (Update Design Doc with fixes)
└ Done (10 tool uses・29.7k tokens・3m 15.5s)
Changes Applied:
✓ Added Zod for JSON schema validation with TypeScript inference
✓ Specified all dependencies with exact versions
✓ Implemented Result<T, E> pattern for error handling
✓ Adjusted performance targets to 50-200ms range
✓ Added dependency injection pattern for testability
● document-reviewer (Verify Design Doc updates)
└ Done (4 tool uses・23.1k tokens・39.9s)
● All critical issues resolved - APPROVED
The iterative review ensures technical soundness before implementation begins.
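Two of the fixes above, Zod validation and the Result<T, E> pattern, combine naturally. Here is a minimal TypeScript sketch of what that pairing might look like (the schema and names are illustrative, not the project's actual code):

```typescript
import { z } from 'zod'

// Result<T, E>: surface failures as values instead of thrown exceptions
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E }

// Illustrative schema for a help entry, with the type inferred from it
const HelpEntrySchema = z.object({
  topic: z.string(),
  body: z.string(),
})
type HelpEntry = z.infer<typeof HelpEntrySchema>

// Validate unknown JSON input; callers must handle both branches explicitly
function parseHelpEntry(input: unknown): Result<HelpEntry, string> {
  const parsed = HelpEntrySchema.safeParse(input)
  return parsed.success
    ? { ok: true, value: parsed.data }
    : { ok: false, error: parsed.error.message }
}
```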
Step 3: Task Breakdown and Implementation
`work-planner` creates phases with quality guardrails:
- TDD (Test-Driven Development) approach enforced
- Integration tests at meaningful boundaries
- Acceptance criteria verification at minimum scale
`task-decomposer` then generates atomic tasks (one logical commit each).
Key Learning: Layer-by-layer implementation delays the discovery of integration issues. We now use vertical slices for early validation.
Step 4: Implementation and Quality Loop
For each task:
1. `task-executor` implements using TDD (writes a test → implements → refactors)
2. `quality-fixer` runs comprehensive checks (lint, format, type-check, all tests)
3. The main LLM commits once quality is assured
This separation prevents context exhaustion during quality checks.
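To make the Red-Green-Refactor loop concrete, here is a minimal sketch assuming a Vitest-style test runner; searchHelp is a hypothetical example, not code from the boilerplate:

```typescript
import { describe, expect, it } from 'vitest'

// Red: write the failing test first
describe('searchHelp', () => {
  it('returns entries whose topic matches the query', () => {
    const entries = [
      { topic: 'install', body: 'How to install' },
      { topic: 'upgrade', body: 'How to upgrade' },
    ]
    expect(searchHelp(entries, 'inst')).toEqual([entries[0]])
  })
})

type Entry = { topic: string; body: string }

// Green: the minimal implementation that makes the test pass
// (Refactor follows once the test is green)
function searchHelp(entries: Entry[], query: string): Entry[] {
  return entries.filter((entry) => entry.topic.includes(query))
}
```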
The Meta-Cognition Innovation: rule-advisor
The Problem It Solves
LLMs exhibit "helpfulness bias"—rushing to answer/fix without proper context, causing:
- Missing important project rules
- Skipping background understanding
- Shortsighted fixes requiring rework
How rule-advisor Works
`rule-advisor` implements meta-cognition by:
- Intercepting TodoWrite calls: Forces a pause before action
- Analyzing task essence: Understanding the "why" before the "how"
- Selecting relevant rules: From dozens of pages, extracts only what's needed
Actual Output from Production Use:
{
"taskAnalysis": {
"taskType": "fix",
"estimatedFiles": 1,
"mainFocus": "Fix TypeScript type safety and error handling issues",
"requiredTags": ["type-safety", "implementation", "quality", "debugging", "error-handling"]
},
"selectedRules": [
{
"file": "@docs/rules/typescript.md",
"sections": [
{
"title": "Type Safety",
"content": "**Absolute Rule**: any type is completely prohibited.\n\n**any Type Alternatives (Priority Order)**\n1. **unknown Type + Type Guards**: Use for validating external input\n2. **Generics**: When type flexibility is needed\n3. **Union Types**: Combinations of multiple types\n4. **Type Assertions (Last Resort)**: Only when type is certain"
}
],
"reason": "Essential TypeScript type safety rules to fix any type usage",
"priority": "high"
}
],
"mandatoryChecks": {
"taskEssence": "Ensuring comprehensive type safety (not just fixing surface errors)",
"ruleAdequacy": "Selected rules directly address any type usage and error handling",
"pastFailures": ["quick fixes without proper typing", "ignoring null/undefined cases"],
"firstStep": "Replace all any types with proper TypeScript types and type guards"
},
"metaCognitiveQuestions": [
"What is the root cause of each type safety violation?",
"How can we implement proper error handling without suppressing errors?",
"Are there edge cases (null, undefined, zero division) that need handling?",
"Should we implement input validation with type guards for robustness?"
],
"criticalRules": [
"Complete prohibition of any type - use unknown + type guards instead",
"All errors must have proper handling - no error suppression",
"Null/undefined checks mandatory before operations"
],
"warningPatterns": [
"any type usage → Replace with unknown + type guards",
"Unsafe type assertions (as) → Implement proper type validation",
"Missing null/undefined checks → Add explicit validation",
"Error suppression in try-catch → Proper error handling with logging"
],
"firstActionGuidance": {
"action": "Start by replacing all any types with proper TypeScript types",
"rationale": "Type safety is the foundation - fixing types first will reveal other issues"
},
"confidence": "high"
}
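The "unknown + type guards" alternative that the selected rule prescribes looks like this in practice (a generic sketch, not taken from the boilerplate):

```typescript
// Before: `any` silently disables type checking
// function handle(payload: any) { return payload.userId.trim() }

// After: accept `unknown` and narrow it with a type guard
type Payload = { userId: string }

function isPayload(value: unknown): value is Payload {
  return (
    typeof value === 'object' &&
    value !== null &&
    typeof (value as Record<string, unknown>).userId === 'string'
  )
}

function handle(payload: unknown): string {
  if (!isPayload(payload)) {
    throw new Error('Invalid payload: expected { userId: string }')
  }
  return payload.userId.trim() // safely narrowed to Payload here
}
```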
For manual override, I created the `/task` command to force meta-cognition.
Technical Implementation Details
Sub-agent Definition
Agents are defined in `.claude/agents/` as Markdown files with YAML frontmatter:
---
name: task-executor
description: Specialized agent for steadily executing individual tasks
tools: Read, Edit, Write, MultiEdit, Bash, Grep, Glob, LS, TodoWrite
---
You are a specialized AI assistant that reliably executes individual tasks.
Key responsibilities include:
- TDD implementation following Red-Green-Refactor
- Progress tracking across task files and work plans
- Dependency analysis and incremental verification
- Structured JSON reporting upon completion
Full implementation details: https://github.com/shinpr/ai-coding-project-boilerplate/blob/main/.claude/agents/task-executor.md
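To give a feel for the structured reporting, a completion report might take a shape like this (purely hypothetical; the actual format is defined in the agent file linked above):

```typescript
// Hypothetical task-executor completion report, for illustration only
const report = {
  taskId: 'task-03-help-search',
  status: 'completed',
  tdd: { red: true, green: true, refactor: true },
  filesChanged: ['src/help/search.ts', 'src/help/search.test.ts'],
  readyForQualityCheck: true,
} as const
```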
Rule System Architecture
# rules-index.yaml
rules:
typescript:
file: "typescript.md"
tags: [implementation, type-safety, async]
typical-use: "Creating/modifying TypeScript code"
sections:
- "Basic Principles"
- "Type Safety"
- "Error Handling"
Quality Assurance Mechanisms
Automatic triggers in CLAUDE.md:
- 5+ file changes: Report scope and pause
- Same error 3x: Force root cause analysis
- Every 3 files: Update TodoWrite (checkpoint)
Real-World Validation: sub-agents-mcp Project
Project Metrics
- Scope: MCP server implementation for Claude Code/Cursor CLI
- Scale: 34 test files, 236 test cases
- Duration: 2 days (vs typical 1 week)
- Quality: All tests passing, immediately usable
Token Usage Breakdown
| Phase | Tokens | Traditional (Cumulative) | Agentic (Independent) |
|---|---|---|---|
| PRD Creation | 30K | 30K | 30K ✓ |
| ADR/Design Doc | 60K | 90K | 60K ✓ |
| Work Planning | 30K | 120K | 30K ✓ |
| Task Breakdown | 50K | 170K ⚠️ DANGER ZONE | 50K ✓ |
| Implementation ×8 | 480K | >200K ♻️ AUTO-COMPACT | 60K each ✓ |
| Quality Checks ×8 | 720K | — | 90K each ✓ |
| **Total** | **770K** | **FAILED** | **COMPLETED** |
Key Insights
- Zero context exhaustion (traditional fails at 200K)
- Consistent quality across all tasks
- Immediate production readiness
- Discovered improvements fed back to boilerplate
Getting Started with Your Project
Quick Setup (30 seconds)
# Create and enter project
npx github:shinpr/ai-coding-project-boilerplate my-project
cd my-project && npm install
# Start Claude Code
claude
# Begin implementation
/implement [Your feature description]
Customization for Real Projects
Adapt to your project by modifying:
- `docs/rules/project-context.md`
  - Project type and target users
  - Implementation characteristics
  - Domain-specific requirements
- `docs/rules/architecture/*.md`
  - Architecture patterns (layered, vertical slice, etc.)
  - Technology stack specifics
  - Design principles
- `docs/rules/rules-index.yaml`
  - Map your custom rules
  - Define tags and sections
  - Set rule priorities
- `CLAUDE.md`
  - Project-wide constraints
  - Global rules affecting all work
  - Stop triggers and checkpoints
Key Takeaways
- **Context Engineering > More Context**: The right information at the right time beats information overload
- **Specialized Agents > One Smart Agent**: The single responsibility principle applies to AI
- **Meta-cognition > Speed**: Pausing to think prevents costly rework
- **Design First > Code First**: Documentation preserves intent through implementation
- **Fail Forward**: Let it fail, learn, and systematize the lessons
Join the Movement
The process of watching AI learn from failures and improve is surprisingly satisfying.
Get involved:
🐛 Report issues in GitHub Issues
🔧 Submit improvements via PRs
💬 Share your results and learnings
Let's build an AI coding environment that learns from failures and continuously improves.
Resources
- sub-agents-mcp - Proof-of-concept implementation
- Claude Code Sub-agents Documentation - Official docs
- Model Context Protocol - MCP specification
- Context Engineering: Bringing Engineering Discipline to Prompts - Foundational reading