An Empirical Investigation of Enforcement Mechanisms
An empirical investigation into what actually works when building multi-agent plugins for Claude Code — and what doesn't, despite what the documentation says.
Why This Exists
Prompts are suggestions. The LLM can ignore them silently.
When you build a multi-agent system that needs to guarantee execution order, enforce TDD, or block unsafe operations, you need mechanical enforcement — not prose.
This report documents 6 empirical tests, 12 GitHub issues, and community patterns from systems with up to 112 agents in production.
Key Findings
| Finding | Status | Note |
|---|---|---|
| Exit code 2 in `PreToolUse` hooks blocks tool calls | Works | The only reliable blocking method |
| Exit code 1 blocks tool calls | Fails | Does not block by design (non-blocking error) |
| `permissionDecision: "deny"` blocks tool calls | Broken | Issue #4669: Docs say yes, reality says no |
| Subagents can spawn subagents | No | Absolute architectural restriction |
| Blocked Write/Edit bypass (via `sed`, `echo >`, `python3`) | Vector | Known bypass; requires an additional Bash hook |
| Infinite loops without recursion guards | Risk | 3 open issues (#10205, #9579, #9704) |
| Agent frontmatter (hooks, skills) works in teammates | Broken | Bug #30703: Frontmatter is silently ignored |
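
To make the exit-code findings concrete, here is a minimal sketch of the decision logic of a PreToolUse hook in Python. It assumes the hook receives a JSON payload on stdin containing a `tool_name` field and that the message written to stderr is what gets reported when the call is blocked. The `BLOCKED_TOOLS` policy and the function names `decide` and `run` are hypothetical and only there for illustration.

```python
import json

ALLOW = 0
BLOCK = 2  # exit code 1 is a non-blocking error, so it is deliberately not used

# Hypothetical policy for illustration: tools this hook refuses outright.
BLOCKED_TOOLS = {"Write", "Edit"}


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, stderr_message) for one PreToolUse event."""
    tool = payload.get("tool_name", "")
    if tool in BLOCKED_TOOLS:
        return BLOCK, f"{tool} is blocked by policy"
    return ALLOW, ""


def run(raw_stdin: str) -> tuple[int, str]:
    """Parse the JSON text the hook received on stdin, then decide."""
    try:
        payload = json.loads(raw_stdin)
    except json.JSONDecodeError:
        # Fail closed: an unreadable payload is blocked, not waved through.
        return BLOCK, "hook could not parse its input"
    if not isinstance(payload, dict):
        return BLOCK, "hook payload is not a JSON object"
    return decide(payload)


# Entry point of an actual hook script (assumed wiring, shown as a comment so
# the sketch can be imported and tested without reading stdin):
#   code, msg = run(sys.stdin.read())
#   print(msg, file=sys.stderr)
#   sys.exit(code)
```

Keeping the decision in a pure function makes the policy testable without spawning Claude Code. The script's exit code is the only thing that actually enforces it.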
Validated Patterns
- Hub-and-spoke orchestration: A central "brain" agent as main session (`--agent namespace:brain`) spawns specialist subagents.
- Recursion guards: Temporary flag files prevent infinite hook loops when hooks propagate to subagents.
- Bash bypass defense: When Write/Edit are blocked, Claude uses `sed`, `python3 -c`, or `echo >` instead; a second hook on Bash closes this vector.
- Prerequisite gates: Hooks verify artifact existence before allowing specialist agents to execute.
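
Three of the patterns above (recursion guards, Bash bypass defense, prerequisite gates) can be sketched as small Python helpers that a hook script might call. This is an illustration under stated assumptions, not the author's implementation: the regex list is a heuristic covering only the three bypasses named above, and the flag file name, the function names, and the artifact names in the usage note are all hypothetical.

```python
import os
import re
import tempfile
from pathlib import Path

# Heuristic patterns for the three bypasses named above. Regexes are not a
# shell parser: they can miss obfuscated commands and over-match harmless ones.
WRITE_PATTERNS = [
    re.compile(r"\bsed\b.*\s-i"),           # sed with in-place editing
    re.compile(r"\bpython3?\s+-c\b"),       # inline Python one-liner
    re.compile(r"(?<![0-9&])>{1,2}(?!&)"),  # > or >> to a file; skips 2>&1 and >&2
]

# Hypothetical flag location; any path shared by all hook invocations works.
GUARD_FLAG = Path(tempfile.gettempdir()) / "hook-recursion-guard.flag"


def bash_writes_files(command: str) -> bool:
    """Bash bypass defense: does this shell command look like a file write?"""
    return any(p.search(command) for p in WRITE_PATTERNS)


def acquire_guard(flag: Path = GUARD_FLAG) -> bool:
    """Recursion guard: create the flag atomically; False means already held."""
    try:
        fd = os.open(flag, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release_guard(flag: Path = GUARD_FLAG) -> None:
    """Remove the flag so the next top-level hook invocation can run."""
    flag.unlink(missing_ok=True)


def missing_prerequisites(root: Path, required: list[str]) -> list[str]:
    """Prerequisite gate: list the required artifacts that do not exist yet."""
    return [name for name in required if not (root / name).exists()]
```

A Bash hook would exit with code 2 when `bash_writes_files` returns `True`. A hook that may itself trigger further hooks would return early when `acquire_guard()` is `False` and call `release_guard()` in a `finally` block. A prerequisite gate would block when `missing_prerequisites(project_root, ["plan.md"])` is non-empty, where `plan.md` stands in for whatever artifact the specialist depends on.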
Empirical Validation: Express-to-NestJS Migration
These patterns were validated through a complete framework migration of a multi-tenant production project:
- 11 agents spawned with real parallelism (up to 3 concurrent).
- The quality gate rejected the first pass (44% coverage) and forced an autonomous correction to 93%.
- 422 tests, zero regressions, TypeScript strict with zero errors.
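
The source does not say how the quality gate was implemented or what threshold it used. As a hedged sketch, a coverage gate could be a function that maps a coverage summary to a blocking or non-blocking exit code. The 80% threshold, the function name, and the input shape (`total.lines.pct`, as produced by Istanbul-style JSON summary reports) are assumptions made for illustration.

```python
ALLOW = 0
BLOCK = 2

# Hypothetical threshold; the real gate's threshold is not stated in the source.
MIN_LINE_COVERAGE = 80.0


def coverage_gate(summary: dict, minimum: float = MIN_LINE_COVERAGE) -> tuple[int, str]:
    """Return (exit_code, message) for a parsed coverage summary."""
    pct = summary.get("total", {}).get("lines", {}).get("pct")
    if not isinstance(pct, (int, float)):
        # Fail closed when the report is missing or malformed.
        return BLOCK, "coverage summary is missing total.lines.pct"
    if pct < minimum:
        return BLOCK, f"line coverage {pct:.0f}% is below the {minimum:.0f}% gate"
    return ALLOW, ""
```

Under these assumptions, the first pass reported above (44%) would be rejected and the corrected pass (93%) would be allowed through.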
Community References
- Blake Crosley: 95 hooks in production over 9 months.
- kenryu42/claude-code-safety-net: semantic analysis + 5-level recursive wrapper detection.
- wshobson/agents: 112 agents, 16 orchestrators.
- Issue #29795: 5-layer QA system built from 68 documented failures.
Full Research Reports
Credits
- Author: Diego Cheloni
- Date: March 14, 2026
- Environment: Claude Code CLI (March 2026), Claude Opus 4.6