Manufacturing-Inspired Multi-Agent Architecture
Version: 1.0
Date: 2026-04-02
Status: Design Specification
Table of Contents
- Problem Statement
- Core Philosophy
- Architecture Overview
- Task Card Schema
- Agent Specifications
- Knowledge Base System
- Quality Gates & Frameworks
- Example: Complete Flow
- Success Metrics
- Conclusion
Problem Statement
Current AI Usage Patterns (Broken)
- Context Window Bloat: Single agent handles everything → 200k tokens of mixed concerns
- Expensive Orchestration: Manual model switching (Opus for planning, Sonnet for execution)
- Poor Focus: Agent context includes requirements + code + tests + debug logs all at once
- High Cognitive Load: Human plays traffic controller, deciding which model for which task
- Subscription Fatigue: Multiple AI services, multiple models, complex pricing
The Insight
"We don't need exceptional AI - we need an exceptional system."
— Manufacturing principle applied to AI workflows
Just as Ford's assembly line didn't require master craftsmen, we don't need AGI. We need specialized agents in a robust process.
Core Philosophy
Borrowed from Manufacturing
1. Ford Assembly Line
- Each station does ONE thing well
- Clear handoffs between stations
- Parallel execution only when truly beneficial (in AI: almost never)
- Sequential = cleaner, cheaper, more reliable
2. Six Sigma (DMAIC)
- Define acceptance criteria upfront
- Measure with automated tests
- Analyze failures systematically
- Improve iteratively
- Control with quality gates
3. Kaizen (Continuous Improvement)
- After each task: what worked? what failed?
- Build institutional knowledge
- Baseline improves over time
4. Poka-Yoke (Error-Proofing)
- Make bad outputs impossible
- Gates prevent defects from propagating
- Type checking, linting, security scans = automatic
5. Andon Cord
- Agent pulls cord when stuck
- Human intervention only when needed
- Clear escalation criteria
Key Principle: Process > Individual Capability
Manufacturing doesn't ask: "Is this worker skilled enough?"
Manufacturing asks: "Does the process guarantee quality?"
AI system shouldn't ask: "Is this model smart enough?"
AI system should ask: "Do the gates catch defects?"
Architecture Overview
High-Level Flow
Human creates task → Card enters Kanban board → Agents process sequentially → Output delivered
Kanban Board:
┌─────────┬──────────────┬────────────────┬──────┬────────────┬────────────┐
│ Backlog │ Requirements │ Implementation │  QA  │ Refinement │  Complete  │
├─────────┼──────────────┼────────────────┼──────┼────────────┼────────────┤
│ TASK-1  │              │                │      │            │            │
│ TASK-2  │              │                │      │            │            │
│         │ TASK-3 ←───→ │ (can bounce)   │      │            │            │
│         │              │ TASK-4 ───→    │TASK-5│            │            │
│         │              │                │      │            │  TASK-6 ✓  │
└─────────┴──────────────┴────────────────┴──────┴────────────┴────────────┘
     ↑              ↑              ↑          ↑             ↑
  PM Agent   Architect Agent   Dev Agent   QA Agent   Cleanup Agent
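In code, the board is just a loop over ordered stages. A minimal sketch (stage names and the handler signature are ours, not a prescribed API; real routing would live in the Kanban service):

```python
# Sequential assembly-line runner (sketch). Each stage handler returns
# either "advance" or the name of a stage to bounce the card back to.

STAGES = ["Requirements", "Implementation", "QA", "Refinement", "Complete"]

def run_card(card, handlers, max_steps=20):
    """Drive one card through the board until it completes or stalls."""
    stage = card["current_stage"]
    for _ in range(max_steps):
        if stage == "Complete":
            card["current_stage"] = stage
            return card
        verdict = handlers[stage](card)  # the agent assigned to this stage
        if verdict == "advance":
            stage = STAGES[STAGES.index(stage) + 1]
        else:
            stage = verdict              # bounce, e.g. back to "Implementation"
    raise RuntimeError(f"card {card['id']} stalled at {stage}")
```

Bounces are just another verdict, so a QA failure that returns `"Implementation"` re-enters the loop without any coordination overhead.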
Why Sequential (Not Parallel)
Human teams parallelize because:
- Idle labor costs money ($60/hr sitting around)
- Delivery speed matters for business
AI agents should serialize because:
- Idle compute costs $0
- Clean handoffs > integration hell
- Smaller contexts = cheaper + faster
- No coordination overhead
Example:
Parallel (traditional):
├── BE Agent: builds API (guesses contracts)
├── FE Agent: builds UI (mocks data)
└── Integration: expensive reconciliation, context passing
Cost: ~$3.50, messy
Sequential (assembly line):
├── BE Agent: builds API + OpenAPI spec
├── FE Agent: reads spec, builds against REAL endpoints
└── Integration: trivial, already matches
Cost: ~$1.50, clean
Task Card Schema
Complete Metadata Structure
{
  // Identity
  "id": "TASK-1047",
  "title": "Build user authentication system",
  "type": "feature|bugfix|refactor|research",
  "priority": "critical|high|medium|low",

  // Routing
  "current_stage": "QA",
  "from": "Implementation",
  "to": "QA",
  "reply_to": null, // Set when bouncing back to specific agent
  "next_stage": "Deployment",
  "prev_stage": "Implementation",
  "available_stages": [
    "PM",
    "Architect",
    "Implementation",
    "QA",
    "Refinement",
    "Deployment"
  ],

  // Agent Assignment
  "stages_poc": {
    "PM": "pm-agent-001",
    "Architect": "architect-agent-001",
    "Implementation": "dev-agent-001",
    "QA": "qa-agent-001",
    "Refinement": "refine-agent-001",
    "Deployment": "deploy-agent-001"
  },

  // Knowledge Base (THE CRITICAL PART)
  "knowledge_base": {
    // Living documents (agents UPDATE these)
    "prd.md": "Product requirements...",
    "technical_spec.md": "Architecture decisions...",
    "api_contract.json": "OpenAPI spec from BE agent",
    "test_coverage.md": "What's tested, gaps",
    "decisions.md": "Why we chose X over Y",
    "known_issues.md": "Current bugs, workarounds",

    // Static references (human-provided)
    "figma_mockups": [
      "screenshot1.png",
      "screenshot2.png",
      "link: figma.com/..."
    ],
    "user_research": "Interview notes...",

    // Meta
    "glossary.md": "Project-specific terms",
    "faq.md": "Common questions answered once"
  },

  // Execution State
  "context": {
    "spec": "User auth with JWT, refresh tokens...",
    "code": "// Implementation here",
    "test_results": "87% pass, 3 failing tests",
    "issues": [
      "Login timeout inconsistent",
      "Password validation unclear"
    ],
    "metrics": {
      "code_coverage": 87,
      "security_score": 92,
      "performance_ms": 145
    }
  },

  // Audit Trail
  "history": [
    {
      "timestamp": "2026-04-02T10:00:00Z",
      "stage": "PM",
      "action": "created",
      "agent": "pm-agent-001",
      "notes": "Initial requirements gathered"
    },
    {
      "timestamp": "2026-04-02T10:15:00Z",
      "stage": "Architect",
      "action": "spec_approved",
      "agent": "architect-agent-001",
      "notes": "JWT-based auth, Redis for sessions"
    },
    {
      "timestamp": "2026-04-02T11:30:00Z",
      "stage": "Implementation",
      "action": "code_complete",
      "agent": "dev-agent-001",
      "notes": "Auth endpoints implemented"
    },
    {
      "timestamp": "2026-04-02T12:00:00Z",
      "stage": "QA",
      "action": "tests_failed",
      "agent": "qa-agent-001",
      "notes": "Password validation spec unclear, bouncing to PM"
    }
  ],

  // Quality Gates
  "gates": {
    "must_pass": [
      "all_tests_green",
      "security_scan_clean",
      "code_coverage_80_percent",
      "linter_no_errors",
      "performance_under_200ms"
    ],
    "status": {
      "all_tests_green": false,
      "security_scan_clean": true,
      "code_coverage_80_percent": true,
      "linter_no_errors": true,
      "performance_under_200ms": true
    }
  },

  // Timestamps
  "created_at": "2026-04-02T10:00:00Z",
  "updated_at": "2026-04-02T12:00:00Z",
  "completed_at": null,
  "deadline": "2026-04-05T17:00:00Z"
}
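The gates block at the end of the card is machine-checkable. A sketch of an evaluator over that structure (the helper name is ours; field names follow the schema above):

```python
def evaluate_gates(gates):
    """Return (passed, failures) for a task card's "gates" block.

    `gates` follows the task-card schema: a "must_pass" list of gate
    names and a "status" dict of booleans. A gate missing from
    "status" counts as a failure (fail-closed).
    """
    status = gates.get("status", {})
    failures = [g for g in gates["must_pass"] if not status.get(g, False)]
    return (len(failures) == 0, failures)
```

Fail-closed semantics matter here: a gate nobody measured must block the card, not slip through.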
Agent Specifications
Agent Protocol (Universal)
Every agent follows this protocol when triggered:
class Agent:
    def on_card_enters_column(self, card):
        """Triggered when card enters this agent's stage"""
        # 1. READ KNOWLEDGE BASE FIRST (critical!)
        knowledge = self.read_knowledge_base(card)

        # 2. Check if the answer already exists in the KB
        if not self.can_proceed_with_existing_info(knowledge):
            # 3. If unclear, UPDATE KB with the question and bounce back
            if self.needs_clarification():
                self.update_kb_with_question(card)
                self.bounce_to_previous_stage(card)
                return  # Wait for response
            # 4. If stuck, escalate (Andon Cord)
            if self.is_stuck():
                self.pull_andon_cord(card)
                return

        # 5. Do the work
        result = self.do_work(card, knowledge)

        # 6. UPDATE KNOWLEDGE BASE with outputs
        self.update_knowledge_base(card, result)

        # 7. Run quality gates
        if self.passes_gates(card):
            self.move_card_forward(card)
        else:
            self.bounce_card(card, reason="Gates failed")
Specific Agent Definitions
1. PM Agent (Requirements)
Agent: pm-agent-001
Stage: PM
Context Window: 10k tokens max
Responsibilities:
- Parse user requirements
- Create initial PRD
- Define acceptance criteria
- Clarify ambiguities
- Update spec based on feedback from other agents
Inputs:
- User's initial request
- Feedback from other agents (reply_to messages)
Outputs:
- knowledge_base/prd.md
- knowledge_base/acceptance_criteria.md
- knowledge_base/user_stories.md
Quality Gates:
- Acceptance criteria are measurable
- No conflicting requirements
- All ambiguities resolved
Andon Cord Triggers:
- User requirements are contradictory
- Scope is too large (>40 hour estimate)
- Missing critical information user must provide
2. Architect Agent (Technical Design)
Agent: architect-agent-001
Stage: Architect
Context Window: 15k tokens max
Responsibilities:
- Design system architecture
- Define API contracts
- Choose tech stack
- Document technical decisions
- Review implementation for architecture compliance
Inputs:
- knowledge_base/prd.md
- knowledge_base/acceptance_criteria.md
Outputs:
- knowledge_base/technical_spec.md
- knowledge_base/api_contract.json (OpenAPI spec)
- knowledge_base/decisions.md
- knowledge_base/data_models.md
Quality Gates:
- API contracts are complete (all endpoints defined)
- Data models are properly normalized
- Security considerations documented
- Performance requirements addressed
Andon Cord Triggers:
- Requirements conflict with existing architecture
- Technology choice requires new infrastructure
- Performance requirements unachievable with current stack
3. Implementation Agent (Code)
Agent: dev-agent-001
Stage: Implementation
Context Window: 20k tokens max
Responsibilities:
- Write code based on spec
- Implement API contracts exactly
- Write unit tests
- Document code
- Iterate until local tests pass
Inputs:
- knowledge_base/technical_spec.md
- knowledge_base/api_contract.json
- knowledge_base/decisions.md
Outputs:
- Source code
- Unit tests
- knowledge_base/implementation_notes.md
- knowledge_base/test_coverage.md
Quality Gates:
- All unit tests pass
- Code coverage >80%
- Linter passes (0 errors)
- Type checking passes
- API matches OpenAPI spec exactly
Iteration Loop:
1. Write code
2. Run linter → fix violations
3. Run tests → fix failures
4. Run type checker → fix errors
5. Repeat until all gates pass
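The loop above can be sketched as a small driver. The checker and fixer callables are placeholders for the real linter, test runner, and type checker; the iteration cap mirrors the "stuck for 3+ iterations" Andon Cord trigger:

```python
def implementation_loop(run_linter, run_tests, run_typecheck, fix,
                        max_iterations=3):
    """Iterate lint -> test -> typecheck until clean, or give up.

    Each checker returns a list of problems (empty when clean); `fix`
    attempts repairs. After `max_iterations` attempts that still show
    problems, the agent pulls the Andon Cord instead of looping forever.
    """
    for _ in range(max_iterations):
        problems = run_linter() + run_tests() + run_typecheck()
        if not problems:
            return "GATES_PASSED"
        fix(problems)
    return "ANDON_CORD"  # stuck: escalate to a human
```

The hard cap is the point: an agent that silently retries forever burns tokens without ever surfacing the blocker.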
Andon Cord Triggers:
- Stuck for 3+ iterations on same failing test
- API contract is ambiguous/incomplete
- Test coverage impossible to achieve (need architecture change)
4. QA Agent (Testing)
Agent: qa-agent-001
Stage: QA
Context Window: 15k tokens max
Responsibilities:
- Run integration tests
- Run security scans
- Run performance tests
- Verify acceptance criteria met
- Report defects with specificity
Inputs:
- Source code from Implementation
- knowledge_base/acceptance_criteria.md
- knowledge_base/api_contract.json
Outputs:
- Test results
- Security scan report
- Performance metrics
- knowledge_base/qa_report.md
- knowledge_base/known_issues.md (if defects found)
Quality Gates:
- All acceptance criteria pass
- Security scan: 0 HIGH vulnerabilities
- Performance: <200ms response time
- No critical bugs
Decision Logic:
if spec_unclear:
    bounce_to("PM", reason="Need clarification on X")
elif implementation_bug:
    bounce_to("Implementation", reason="Tests fail: specific error")
elif architecture_issue:
    bounce_to("Architect", reason="Design flaw: X")
else:
    move_forward()
Andon Cord Triggers:
- Cannot determine if test should pass or fail (spec ambiguous)
- Security vulnerability found but no clear fix
- Performance requirements unmet despite correct implementation
5. Cleanup Agent (Documentation Maintenance)
Agent: cleanup-agent-001
Stage: Background (not on main flow)
Trigger: Cron schedule (daily 3am) OR kb_size > 10MB
Responsibilities:
- Merge duplicate documentation
- Archive stale information
- Resolve contradictions
- Summarize verbose logs
- Rebuild search index
- Validate external links
Context Window: 30k tokens (needs to see entire KB)
Automation Rules:
archive_after: 30 days of no access
merge_duplicates: if content >95% similar
summarize_logs: if file >50KB
compress_images: if total >10MB
rebuild_index: daily
remove_broken_links: after 7 days broken
Safety Rules:
- NEVER delete, only archive
- Keep full history
- Rollback window: 7 days
Human Escalation (ONLY IF):
- Contradiction severity: CRITICAL
- Data loss risk: >10% of KB
- Otherwise: fully automated
Outputs:
- Cleaned knowledge_base/
- knowledge_base/cleanup_log.md
- Health metrics dashboard
Metrics:
- KB health score (0-100)
- Actions taken per run
- Storage saved
- Contradictions resolved
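A routine applying the automation rules might look like the sketch below. The file-metadata shape and function name are assumptions; the thresholds come from the rules listed above, and per the safety rules it only archives, never deletes:

```python
from datetime import datetime, timedelta

def plan_cleanup_actions(files, now):
    """Decide cleanup actions per KB file.

    `files` maps filename -> metadata dict with "last_access"
    (datetime) and "size_kb". Returns (action, filename) pairs.
    """
    actions = []
    for name, meta in files.items():
        if now - meta["last_access"] > timedelta(days=30):
            actions.append(("archive", name))    # archive_after: 30 days
        elif meta["size_kb"] > 50:
            actions.append(("summarize", name))  # summarize_logs: >50KB
    return actions
```

Keeping the rules as pure data-in/data-out makes the Cleanup Agent's decisions auditable in cleanup_log.md.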
Knowledge Base System
Purpose
Prevent expensive agent-to-agent questioning by maintaining shared context.
The Problem (Before KB)
QA Agent: "What's the password validation rule?"
→ Pings Implementation Agent (API call #1)
→ Implementation: "Check the spec" (API call #2)
→ Pings Architect (API call #3)
→ Architect: "Check PM's PRD" (API call #4)
→ Pings PM (API call #5)
→ PM: "Section 3.2: min 8 chars, 1 special char" (API call #6)
Cost: 6 API calls, ~$3, slow
The Solution (With KB)
QA Agent triggered:
├── Reads task.knowledge_base["prd.md"]
├── Finds password validation rule in Section 3.2
└── Proceeds with testing
Cost: 1 lookup, $0, instant
KB Structure Per Task
knowledge_base/
├── prd.md # Product requirements (PM owns)
├── technical_spec.md # Architecture (Architect owns)
├── api_contract.json # OpenAPI spec (Architect creates, Dev implements)
├── decisions.md # Why we chose X over Y (all agents contribute)
├── test_coverage.md # What's tested (Dev + QA)
├── known_issues.md # Current bugs (QA)
├── implementation_notes.md # Dev notes
├── qa_report.md # Test results (QA)
├── glossary.md # Project-specific terms
├── faq.md # Common questions
├── figma/ # Design assets (human-provided)
│ ├── mockup1.png
│ └── mockup2.png
└── archive/ # Stale docs moved here by Cleanup Agent
└── old_debug_logs/
Update Protocol
def update_knowledge_base(card, document, new_content, reason, agent):
    """Any agent can update KB, but must follow conventions"""
    kb = card.knowledge_base

    # 1. Append, don't overwrite (unless owner)
    if is_owner_of_document(agent, document):
        kb[document] = new_content  # Full control
    else:
        kb[document] += f"\n## Update from {agent.name}\n{new_content}"

    # 2. Always log the change
    kb["changelog.md"] += f"""
{timestamp()} - {agent.name}
Action: Updated {document}
Reason: {reason}
"""

    # 3. Tag for cleanup review
    if content_might_conflict(new_content):
        kb["_needs_cleanup"] = True
Search & Retrieval
# Agents use semantic search over the KB
def find_answer(question, knowledge_base):
    # Vector search over all .md files (assumes an embedding index)
    results = semantic_search(question, knowledge_base)
    # Return the top 3 most relevant sections
    return results[:3]

Example:
QA Agent asks: "What's the auth flow?"
→ Finds: technical_spec.md Section 4.2 "Authentication Flow"
→ Also finds: api_contract.json /auth/login endpoint
→ Agent has the answer without pinging anyone
Quality Gates & Frameworks
Six Sigma Applied
Six Sigma's classic target is 3.4 defects per million opportunities; here we adapt it to a target of <3.4 defects per 1000 lines of code.
DMAIC Cycle per Task:
Define:
├── Acceptance criteria (measurable)
├── Test cases
└── Performance budgets
Measure:
├── Run all tests
├── Collect metrics (coverage, performance, security)
└── Document baseline
Analyze:
├── Which tests failed?
├── What patterns in failures?
└── Root cause analysis
Improve:
├── Refactor based on analysis
├── Add missing tests
└── Optimize hotspots
Control:
├── Lock in changes only if metrics improve
├── Don't proceed if defect rate increases
└── Document what worked
Quality Gate Definitions
Gate: All Tests Pass
Gate: all_tests_green
Type: Boolean
Pass Criteria: 100% of tests passing
Fail Action: Bounce to Implementation
Owner: QA Agent
Gate: Code Coverage
Gate: code_coverage_80_percent
Type: Percentage
Pass Criteria: ≥80% line coverage
Measurement: pytest --cov
Fail Action: Bounce to Implementation with specific gaps
Owner: QA Agent
Gate: Security Scan
Gate: security_scan_clean
Type: Vulnerability Count
Pass Criteria: 0 HIGH or CRITICAL vulnerabilities
Tools: [Bandit, Snyk, OWASP ZAP]
Fail Action: Bounce to Implementation OR Architect (if design flaw)
Owner: QA Agent
Gate: Performance Budget
Gate: performance_under_200ms
Type: Latency
Pass Criteria: p95 response time <200ms
Measurement: Load test with k6
Fail Action: Bounce to Implementation OR Architect (if arch change needed)
Owner: QA Agent
Gate: Linter Clean
Gate: linter_no_errors
Type: Error Count
Pass Criteria: 0 errors (warnings allowed)
Tools: [ESLint, Pylint, Rubocop]
Fail Action: Auto-fix in Implementation iteration loop
Owner: Implementation Agent
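Taken together, the gate definitions can be encoded as data so failures route mechanically. A sketch (the predicates, metric names, and bounce targets are assumptions drawn from the definitions above; security and performance failures may route to Architect instead when the flaw is architectural):

```python
# Gate registry sketch: each gate pairs a pass predicate over the
# card's metrics with the default stage a failure bounces to.

GATES = {
    "all_tests_green":          (lambda m: m["tests_failing"] == 0,  "Implementation"),
    "code_coverage_80_percent": (lambda m: m["code_coverage"] >= 80, "Implementation"),
    "security_scan_clean":      (lambda m: m["high_vulns"] == 0,     "Implementation"),
    "performance_under_200ms":  (lambda m: m["p95_ms"] < 200,        "Implementation"),
    "linter_no_errors":         (lambda m: m["lint_errors"] == 0,    "Implementation"),
}

def route_failures(metrics):
    """Return {gate_name: bounce_stage} for every failing gate."""
    return {name: stage
            for name, (check, stage) in GATES.items()
            if not check(metrics)}
```

Because the registry is plain data, adding a gate is a one-line change rather than new agent logic.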
Andon Cord (Escalation)
When Agent Pulls Cord:
def pull_andon_cord(self, card, reason, severity="medium"):
    """Stop the line, escalate to human"""
    card.status = "BLOCKED"
    card.blocked_reason = reason
    card.blocked_severity = severity

    # Alert human
    notify_human({
        "task": card.id,
        "agent": self.name,
        "reason": reason,
        "severity": severity,
        "context": self.get_relevant_context(),
    })

    # Don't proceed until human resolves
    return "WAITING_FOR_HUMAN"
Escalation Criteria:
Severity Levels:

  low:
    - Minor ambiguity in spec
    - Non-critical external dependency
    Action: Continue work, flag for human review later

  medium:
    - Stuck for 3+ iterations
    - Test failure without clear fix
    - Performance issue needs investigation
    Action: Pause task, human review within 24h

  high:
    - Contradictory requirements
    - Security vulnerability with no known fix
    - Architecture limitation discovered
    Action: Immediate human intervention required

  critical:
    - Data loss risk
    - Security breach
    - System-wide failure
    Action: Halt all related tasks, immediate escalation
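The severity table maps directly to a lookup. A sketch (the action strings are our shorthand for the behaviors listed above):

```python
# Escalation policy mirroring the severity table.
SEVERITY_ACTIONS = {
    "low":      "continue_and_flag",       # keep working, flag for later review
    "medium":   "pause_review_24h",        # pause task, human review within 24h
    "high":     "immediate_intervention",  # stop, require a human now
    "critical": "halt_related_tasks",      # halt the whole line, escalate
}

def andon_action(severity):
    """Map a pulled-cord severity to its required response.

    Raises KeyError on an unknown severity: an unclassified escalation
    should fail loudly, not default to something permissive.
    """
    return SEVERITY_ACTIONS[severity]
```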
Example: Complete Flow
Task: "Build user login API"
┌─ Human creates task ──────────────────────────────────────┐
│ Title: "Build user login API"                             │
│ Type: feature                                             │
└───────────────────────────────────────────────────────────┘
↓
┌─ PM Agent (triggered) ────────────────────────────────────┐
│ 1. Reads task title                                       │
│ 2. Generates PRD:                                         │
│    - Endpoint: POST /auth/login                           │
│    - Input: {email, password}                             │
│    - Output: {token, user}                                │
│    - Validation: Email format, password 8+ chars          │
│ 3. Updates KB: prd.md                                     │
│ 4. Moves card to "Architect"                              │
└───────────────────────────────────────────────────────────┘
↓
┌─ Architect Agent (triggered) ─────────────────────────────┐
│ 1. Reads prd.md from KB                                   │
│ 2. Designs system:                                        │
│    - JWT-based auth                                       │
│    - bcrypt for password hashing                          │
│    - Rate limiting: 5 attempts/minute                     │
│ 3. Creates OpenAPI spec:                                  │
│      POST /auth/login                                     │
│      Request: {email: string, password: string}           │
│      Response: {token: string, user: object}              │
│ 4. Updates KB: technical_spec.md, api_contract.json       │
│ 5. Moves card to "Implementation"                         │
└───────────────────────────────────────────────────────────┘
↓
┌─ Implementation Agent (triggered) ────────────────────────┐
│ 1. Reads technical_spec.md, api_contract.json             │
│ 2. Iteration loop:                                        │
│    a. Generate code                                       │
│    b. Run linter → fixes 3 style issues                   │
│    c. Run tests → 2 tests fail                            │
│    d. Fix failing tests                                   │
│    e. Run tests → all pass ✓                              │
│    f. Check coverage → 85% ✓                              │
│ 3. Updates KB: implementation_notes.md, test_coverage.md  │
│ 4. Moves card to "QA"                                     │
└───────────────────────────────────────────────────────────┘
↓
┌─ QA Agent (triggered) ────────────────────────────────────┐
│ 1. Reads api_contract.json, acceptance_criteria.md        │
│ 2. Runs integration tests:                                │
│    ✓ Valid login returns token                            │
│    ✓ Invalid password returns 401                         │
│    ✗ Rate limiting not working                            │
│ 3. Security scan: 0 vulnerabilities ✓                     │
│ 4. Performance test: 145ms average ✓                      │
│ 5. GATE FAILED: Rate limiting broken                      │
│ 6. Updates KB: known_issues.md                            │
│ 7. Bounces to "Implementation" with specific error        │
└───────────────────────────────────────────────────────────┘
↓
┌─ Implementation Agent (re-triggered) ─────────────────────┐
│ 1. Reads known_issues.md: "Rate limiting not working"     │
│ 2. Fixes rate limiting middleware                         │
│ 3. Re-runs tests → all pass ✓                             │
│ 4. Moves card to "QA"                                     │
└───────────────────────────────────────────────────────────┘
↓
┌─ QA Agent (re-triggered) ─────────────────────────────────┐
│ 1. Re-runs all tests → 100% pass ✓                        │
│ 2. All gates pass ✓                                       │
│ 3. Moves card to "Complete"                               │
└───────────────────────────────────────────────────────────┘
↓
┌─ Cleanup Agent (background, scheduled) ───────────────────┐
│ 1. Scans all task KBs                                     │
│ 2. Finds duplicate API docs in 3 tasks                    │
│ 3. Merges into single source of truth                     │
│ 4. Archives old debug logs >30 days                       │
│ 5. Rebuilds search index                                  │
│ 6. Updates health dashboard: 98/100                       │
└───────────────────────────────────────────────────────────┘
Success Metrics
System Health
KPIs:
- Task completion rate: >95%
- Average cost per task: <$5
- Human intervention rate: <10%
- Gate pass rate (first attempt): >80%
- KB health score: >90/100
- Agent uptime: >99.5%
Quality Metrics:
- Defect rate: <3.4 per 1000 LOC (adapted from Six Sigma's 3.4 per million opportunities)
- Security vulnerabilities: 0 HIGH/CRITICAL
- Code coverage: >80%
- Performance: p95 <200ms
Efficiency Metrics:
- Average context size per agent: <20k tokens
- KB search hit rate: >90% (answers found without agent ping)
- Cleanup automation rate: 100% (no human intervention)
Dashboard Example
┌─────────────────────────────────────────────────────┐
│ Assembly Line AI System - Dashboard                 │
├─────────────────────────────────────────────────────┤
│                                                     │
│ Active Tasks: 12                                    │
│ ├─ In Progress: 8                                   │
│ ├─ Blocked: 1 (human review needed)                 │
│ └─ Completed Today: 15                              │
│                                                     │
│ Cost Today: $67.50 (avg $4.50/task)                 │
│                                                     │
│ Quality Gates:                                      │
│ ├─ Pass Rate: 87% (first attempt)                   │
│ ├─ Security: ✓ 0 vulnerabilities                    │
│ └─ Performance: ✓ p95 145ms                         │
│                                                     │
│ Knowledge Base Health: 98/100 ✓                     │
│ ├─ Last Cleanup: 4 hours ago                        │
│ ├─ Actions Taken: 12 merges, 5 archives             │
│ └─ Size: 8.2 MB                                     │
│                                                     │
│ Agent Performance:                                  │
│ ├─ PM: 15 tasks, 100% success                       │
│ ├─ Architect: 15 tasks, 100% success                │
│ ├─ Implementation: 15 tasks, 93% first-pass         │
│ ├─ QA: 15 tasks, 87% gate pass                      │
│ └─ Cleanup: Last run 4h ago, 0 issues               │
│                                                     │
└─────────────────────────────────────────────────────┘
Conclusion
Core Insight
"We're not building smarter AI. We're building a smarter system."
Just as Ford didn't need master craftsmen, we don't need AGI. We need:
- ✅ Specialized agents with focused contexts
- ✅ Clear handoffs between stages
- ✅ Quality gates that catch defects
- ✅ Knowledge base that prevents redundant work
- ✅ Automation that runs in the background
The Promise
Current state:
- Human manually orchestrates models
- Expensive context windows
- Inconsistent quality
- Subscription fatigue
Future state:
- System orchestrates specialized agents
- Small, focused contexts
- Quality guaranteed by gates
- Single cohesive workflow
iPhone philosophy: It just works.
References & Inspiration
- Toyota Production System (TPS) - Lean manufacturing, Kaizen, Andon cord
- Six Sigma - DMAIC, defect reduction, statistical process control
- Ford Assembly Line - Specialization, sequential flow, standardization
- Poka-Yoke - Error-proofing mechanisms
- Kanban - Visual workflow management, WIP limits, pull system
End of Document
For implementation questions or architectural discussions, escalate to the human architect.
"The process doesn't care which Bob shows up. The process guarantees the iPhone."