DEV Community

Diego Cheloni

Originally published at github.com

Building Reliable Multi-Agent Claude Code Plugins: What Actually Works (and What’s Broken)

An Empirical Investigation of Enforcement Mechanisms

An empirical investigation into what actually works when building multi-agent plugins for Claude Code — and what doesn't, despite what the documentation says.

Why This Exists

Prompts are suggestions. The LLM can ignore them silently.

When you build a multi-agent system that needs to guarantee execution order, enforce TDD, or block unsafe operations, you need mechanical enforcement — not prose.

This report documents 6 empirical tests, 12 GitHub issues, and community patterns from systems with up to 112 agents in production.

Key Findings

| Finding | Status | Note |
|---|---|---|
| exit code 2 in PreToolUse hooks blocks tool calls | Works | The only reliable blocking method |
| exit code 1 blocks tool calls | Fails | Does not block by design (non-blocking error) |
| permissionDecision: "deny" blocks tool calls | Broken | Issue #4669: Docs say yes, reality says no |
| Subagents can spawn subagents | No | Absolute architectural restriction |
| Blocked Write/Edit bypass (via sed, echo >, python3) | Vector | Known bypass; requires additional Bash hook |
| Infinite loops without recursion guards | Risk | 3 open issues (#10205, #9579, #9704) |
| Agent frontmatter (hooks, skills) works in teammates | Broken | Bug #30703: Frontmatter is silently ignored |
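
To make the exit-code finding concrete, here is a minimal sketch of a PreToolUse hook decision in Python. It assumes the hook receives a JSON payload on stdin with a `tool_name` field; the `BLOCKED_TOOLS` policy and the helper names `decide` and `run_hook` are hypothetical and only illustrate the idea. The one behavior taken from the findings above is that exit code 2 blocks and exit code 1 does not.

```python
import json
from typing import Tuple

BLOCK = 2  # exit code 2: the tool call is blocked
ALLOW = 0  # exit code 0: the tool call proceeds

# Hypothetical policy for illustration: tools this hook refuses.
BLOCKED_TOOLS = {"Write", "Edit"}


def decide(payload: dict) -> Tuple[int, str]:
    """Return (exit_code, message) for one PreToolUse event."""
    # "tool_name" is an assumed field name in the hook payload.
    tool = payload.get("tool_name", "")
    if tool in BLOCKED_TOOLS:
        return BLOCK, f"{tool} is blocked by policy"
    return ALLOW, ""


def run_hook(stdin_text: str) -> Tuple[int, str]:
    """Parse the payload and decide; fail closed on malformed input."""
    try:
        payload = json.loads(stdin_text)
    except json.JSONDecodeError:
        # Exit code 1 would not block, so bad input also maps to 2.
        return BLOCK, "hook received malformed JSON"
    if not isinstance(payload, dict):
        return BLOCK, "hook payload is not a JSON object"
    return decide(payload)
```

In a real hook script, the entry point would read `sys.stdin`, print the message to stderr, and call `sys.exit` with the returned code. Failing closed matters here: an uncaught exception typically exits with code 1, which, per the findings above, lets the tool call through.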

Validated Patterns

  • Hub-and-spoke orchestration: A central "brain" agent runs as the main session (--agent namespace:brain) and spawns specialist subagents.
  • Recursion guards: Temporary flag files prevent infinite hook loops when hooks propagate to subagents.
  • Bash bypass defense: When Write/Edit are blocked, Claude uses sed, python3 -c, or echo > instead — a second hook on Bash closes this vector.
  • Prerequisite gates: Hooks verify artifact existence before allowing specialist agents to execute.
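
The last three patterns above can be sketched as small Python helpers that a hook script might call. This is an illustration under assumptions, not the implementation used in the investigation: the regexes are a deliberately simple heuristic (they will flag harmless redirects such as `> /dev/null`), and every function name, the flag-file path, and the artifact names are hypothetical.

```python
import os
import re
from pathlib import Path
from typing import List

# Heuristic patterns (assumptions) for Bash commands that write files
# without going through the Write/Edit tools.
WRITE_PATTERNS = [
    re.compile(r"\bsed\b[^|;&]*\s-i"),         # sed -i (in-place edit)
    re.compile(r"(?<![&\d])>>?\s*[^&\s>]"),    # > or >> redirect, not 2>&1
    re.compile(r"\bpython3?\s+-c\b"),          # inline Python one-liner
]


def looks_like_file_write(command: str) -> bool:
    """Bash bypass defense: flag commands that may write files."""
    return any(p.search(command) for p in WRITE_PATTERNS)


def acquire_guard(flag: Path) -> bool:
    """Recursion guard: create the flag file atomically.

    Returns False if the flag already exists, meaning a hook is
    already running and this invocation should exit early.
    """
    try:
        fd = os.open(flag, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release_guard(flag: Path) -> None:
    """Remove the flag file so later hook runs are not skipped."""
    flag.unlink(missing_ok=True)  # requires Python 3.8+


def missing_prerequisites(root: Path, required: List[str]) -> List[str]:
    """Prerequisite gate: list required artifacts that do not exist."""
    return [name for name in required if not (root / name).exists()]
```

A Bash hook would block (exit code 2) when `looks_like_file_write` returns True while Write/Edit are restricted, and a prerequisite hook would block when `missing_prerequisites` returns a non-empty list. The guard should be released in a `finally` clause so a crashed hook does not leave a stale flag behind.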

Empirical Validation: Express-to-NestJS Migration

These patterns were validated through a complete framework migration of a multi-tenant production project:

  • 11 agents spawned with real parallelism (up to 3 concurrent).
  • The quality gate rejected the first pass (44% coverage) and forced an autonomous correction to 93%.
  • 422 tests, zero regressions, TypeScript strict with zero errors.
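
A coverage quality gate like the one described above could look roughly like this. The 80% threshold and the function name `coverage_gate` are assumptions chosen for illustration; the report does not state the threshold that was used. The gate returns exit code 2 so that a failing result blocks instead of merely warning.

```python
from typing import Tuple


def coverage_gate(covered: int, total: int,
                  threshold: float = 0.80) -> Tuple[int, str]:
    """Return (exit_code, message); exit code 2 rejects the pass.

    The default threshold is a hypothetical value, not the one
    used in the migration.
    """
    if total <= 0:
        # No coverage data at all: fail closed rather than pass silently.
        return 2, "no coverage data found"
    ratio = covered / total
    if ratio < threshold:
        return 2, f"coverage {ratio:.0%} is below the {threshold:.0%} gate"
    return 0, f"coverage {ratio:.0%} meets the {threshold:.0%} gate"
```

With this default, a first pass at 44% coverage is rejected and a corrected pass at 93% is accepted, matching the sequence reported above.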

Community References

  • Blake Crosley: 95 hooks in production over 9 months.
  • kenryu42/claude-code-safety-net: semantic analysis + 5-level recursive wrapper detection.
  • wshobson/agents: 112 agents, 16 orchestrators.
  • Issue #29795: 5-layer QA system built from 68 documented failures.

Full Research Reports


Credits

  • Author: Diego Cheloni
  • Date: March 14, 2026
  • Environment: Claude Code CLI (March 2026), Claude Opus 4.6
