Sahil Kathpal

Posted on May 4 • Originally published at codeongrass.com

25 Claude Code Agents in Production: The Hooks Architecture

#ai #architecture #claude #agents

Someone built a production security scanner at cqwerty.com running roughly 25 autonomous Claude Code agents with minimal human oversight. An Architect plans the work. An Engineer ships pull requests. A Reviewer pushes back. A CEO emails a weekly summary. The agents argue in pull request comments. The mechanism behind all of it is Claude Code hooks — three event types that let one agent trigger another, constrain its own behavior, and hand off work without any orchestration glue code. This post deconstructs that architecture and walks you through building your own.

TL;DR

Claude Code's PreToolUse, PostToolUse, and Stop hooks are sufficient primitives for a full multi-agent org chart. Each role is a separate Claude Code session launched with an AGENT_ROLE environment variable. A shared .claude/settings.json routes hooks to role-specific guard scripts. Stop hooks trigger the next agent in the cascade; PostToolUse hooks detect events like PR creation; PreToolUse hooks enforce role boundaries. Branch protection and a universal destructive-command blocklist are the non-negotiable safety layer before you run any of this unattended.

What You'll Build

A four-role agent system where roles trigger each other through hooks and communicate through git pull requests — not shared memory, not a message bus:

Role	Responsibility	Key constraint
Architect	Reads codebase, writes plan documents	Cannot commit code or run tests
Engineer	Implements plans, opens PRs	Cannot modify plan documents
Reviewer	Reviews PRs, leaves comments	Read-only on source files
CEO	Weekly summary, notifications	Cannot execute or write code

The cascade: Architect session ends → Stop hook spawns Engineer → Engineer opens PR → PostToolUse hook detects URL → Reviewer spawns → Reviewer leaves comments → Engineer addresses in follow-up session.

Prerequisites

Claude Code installed and authenticated (npm install -g @anthropic-ai/claude-code)
A GitHub repository with gh CLI authenticated
jq installed (for parsing hook payloads)
Optional: Grass for mobile oversight of unattended sessions (npm install -g @grass-ai/ide)

What Are the Three Hook Primitives?

Claude Code hooks are shell scripts that execute at defined points in an agent session:

PreToolUse: runs before each tool call. Receives the tool name and input via stdin as JSON. To prevent execution, exit with code 2 and write the reason to stderr, or print a JSON decision such as {"decision": "block", "reason": "..."}; exit 0 to allow. This is your role constraint and safety layer.
PostToolUse: runs after each tool call and receives the tool's output. Use this to detect downstream trigger events, like a PR URL appearing in bash output, and spawn the next agent.
Stop: runs when the agent finishes responding, which for a one-shot claude -p run is the end of the session. The right place for role handoffs: when Architect finishes, spawn Engineer.

Configure hooks in .claude/settings.json at the project root. This file applies to every Claude Code session run from that directory.
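
Before writing any guard logic, it helps to see what a hook actually receives on stdin. The sketch below assumes a hypothetical helper named log_payload and a log path of your choosing; the sample payload is hand-written to mirror the fields the scripts in this post read, not captured from a real session.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append the raw hook payload (stdin) to a log file and
# echo it back on stdout so the rest of the script can still parse it.
log_payload() {
  local log_file="$1"
  tee -a "$log_file"
}

# Hand-written sample payload (an assumption, not a captured one).
sample='{"tool_name":"Bash","tool_input":{"command":"ls plans/"}}'
payload="$(printf '%s\n' "$sample" | log_payload "${TMPDIR:-/tmp}/hook-input.jsonl")"
printf 'captured %d characters\n' "${#payload}"
```

In a real hook you would replace TOOL_INPUT=$(cat) with TOOL_INPUT=$(log_payload /some/absolute/path.jsonl) while debugging, then switch back once the field paths are confirmed.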

Step 1: Scaffold the Project Structure

mkdir -p .claude/hooks .claude/logs plans

Create the shared settings file, .claude/settings.json, which routes all hook calls:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/role-guard.sh" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/post-bash.sh" }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/on-stop.sh" }
        ]
      }
    ]
  }
}

Invoke each role by setting AGENT_ROLE in the environment. Hook scripts inherit this variable from the parent process:

AGENT_ROLE=architect claude -p "Your Architect task..."
AGENT_ROLE=engineer  claude -p "Your Engineer task..."
AGENT_ROLE=reviewer  claude -p "Your Reviewer task..."
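
One risk with this launch pattern is a typo in the role name: a guard script built on a case statement has no branch for an unknown role, so a misspelled AGENT_ROLE runs with only the universal checks. A minimal sketch of a launcher that refuses unknown roles follows. The names launch_role, role_is_known, and the CLAUDE_BIN override are assumptions for illustration, not part of Claude Code.

```bash
#!/usr/bin/env bash
# Roles this post defines; anything else is treated as a typo.
KNOWN_ROLES=(architect engineer reviewer ceo)

# Return 0 if the given role is one of KNOWN_ROLES, 1 otherwise.
role_is_known() {
  local candidate="$1" role
  for role in "${KNOWN_ROLES[@]}"; do
    [[ "$role" == "$candidate" ]] && return 0
  done
  return 1
}

# Hypothetical wrapper: refuse to launch with an unknown role.
# CLAUDE_BIN is an assumption so the function can be dry-run with a stub.
launch_role() {
  local role="$1" prompt="$2"
  if ! role_is_known "$role"; then
    echo "unknown AGENT_ROLE: $role" >&2
    return 64
  fi
  AGENT_ROLE="$role" "${CLAUDE_BIN:-claude}" -p "$prompt"
}
```

Usage would look like launch_role architect "Your Architect task...", which behaves like the commands above but fails fast on a misspelled role.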

Step 2: Implement Role Constraints in PreToolUse

role-guard.sh handles both the universal safety blocklist and per-role constraints in a single script:

#!/bin/bash
# .claude/hooks/role-guard.sh

TOOL_INPUT=$(cat)
COMMAND=$(echo "$TOOL_INPUT" | jq -r '.tool_input.command // empty')
ROLE="${AGENT_ROLE:-}"

block() {
  echo "GATE BLOCKED: $1" >&2
  exit 2
}

# ── Universal blocklist: applies to every role ──────────────────────────────
DANGER='(git (reset --hard|clean -f|checkout \.)|rm -rf|DROP TABLE)'
if echo "$COMMAND" | grep -qiP "$DANGER"; then
  block "safety-guard: destructive operation requires manual approval"
fi

# ── Role-specific constraints ────────────────────────────────────────────────
case "$ROLE" in
  architect)
    if echo "$COMMAND" | grep -qP '(git (commit|push)|npm (run|test|build)|pytest)'; then
      block "Architect constraint: write a plan doc in plans/ instead of executing code"
    fi
    ;;
  reviewer)
    if echo "$COMMAND" | grep -qP '(git (commit|push|checkout -b)|\bsed -i\b)'; then
      block "Reviewer constraint: read-only role — leave GitHub comments instead"
    fi
    ;;
esac

# Nothing matched: exit 0 with no output lets the tool call proceed
exit 0

Two details worth noting: block() exits 2 — Claude Code PreToolUse hooks use exit code 2 to block a specific tool call without aborting the session. The message goes to stderr so Claude Code surfaces it as the rejection reason. The universal blocklist runs before role checks so it cannot be bypassed by role misconfigurations.

Even with this in place, read Why Claude Code PreToolUse Hooks Can Still Be Bypassed before running anything production-facing. Hooks catch direct shell commands but can miss multi-step paths to the same destructive outcome.
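
One gap worth closing: the settings file above only routes Bash calls through role-guard.sh, so a constraint like "Reviewer is read-only on source files" is not enforced when an agent uses a file-editing tool directly. The sketch below is a path-based check you could call from a second hook script registered for the file-editing tools. The function name path_write_allowed is hypothetical, and it assumes the target path arrives in the payload as tool_input.file_path; verify that against a logged payload before relying on it.

```bash
#!/usr/bin/env bash
# Decide whether a role may write to a path.
# Returns 0 to allow, 2 to block, mirroring the exit codes role-guard.sh uses.
path_write_allowed() {
  local role="$1" root="${2%/}" path="$3"
  path="${path#"$root"/}"                 # make absolute paths project-relative
  path="${path#./}"
  [[ "$path" == *..* ]] && return 2       # refuse anything containing ".." outright
  [[ "$path" == /* ]] && return 2         # still absolute: outside the project
  case "$role" in
    architect) [[ "$path" == plans/* ]] && return 0 ;;   # plan docs only
    engineer)  [[ "$path" != plans/* ]] && return 0 ;;   # anything but plans
    *) ;;                                 # reviewer, ceo, unknown: fail closed
  esac
  return 2
}

# In a real hook the path would come from the payload, for example:
#   path=$(jq -r '.tool_input.file_path // empty')
if path_write_allowed architect /repo /repo/plans/healthz.md; then echo "architect may write the plan"; fi
if ! path_write_allowed reviewer /repo /repo/src/app.js; then echo "reviewer may not write source"; fi
```

The default branch blocks, so a role you forget to list cannot write anything. That is the opposite of the Bash guard's behavior, where an unlisted role passes through.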

Step 3: Trigger the Engineer from the Architect's Stop Hook

When the Architect session ends normally, on-stop.sh checks for a new plan file and spawns an Engineer:

#!/bin/bash
# .claude/hooks/on-stop.sh

ROLE="${AGENT_ROLE:-}"
PROJECT="$(pwd)"

case "$ROLE" in
  architect)
    PLAN_FILE=$(ls -t "$PROJECT/plans/"*.md 2>/dev/null | head -1)
    if [[ -f "$PLAN_FILE" ]]; then
      PLAN_NAME=$(basename "$PLAN_FILE" .md)
      nohup env AGENT_ROLE=engineer claude \
        -p "Implement the plan at $PLAN_FILE. Create branch feature/$PLAN_NAME. Open a PR when done. Do not modify files under plans/." \
        >> "$PROJECT/.claude/logs/engineer.log" 2>&1 &
      echo "Engineer spawned for $PLAN_FILE (PID: $!)"
    fi
    ;;
esac

Always use absolute paths in nohup commands. Relative paths resolve against the working directory at spawn time, which may differ from the project root depending on how the Stop hook is invoked.
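
A sketch of one way to get that absolute root without trusting the working directory. It assumes Claude Code exports CLAUDE_PROJECT_DIR to hook commands and falls back to git and then to the current directory when it is unset. The function name resolve_project_root is hypothetical.

```bash
#!/usr/bin/env bash
# Resolve an absolute project root for use in spawned commands.
# Order: CLAUDE_PROJECT_DIR if set, else the git toplevel, else the current directory.
resolve_project_root() {
  if [[ -n "${CLAUDE_PROJECT_DIR:-}" ]]; then
    printf '%s\n' "$CLAUDE_PROJECT_DIR"
  elif git rev-parse --show-toplevel 2>/dev/null; then
    :   # git already printed the toplevel path
  else
    pwd -P
  fi
}
```

In on-stop.sh this would replace PROJECT="$(pwd)" with PROJECT="$(resolve_project_root)".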

Step 4: Detect PR Creation and Spawn the Reviewer

The Engineer's PostToolUse hook watches bash outputs for GitHub PR URLs:

#!/bin/bash
# .claude/hooks/post-bash.sh

ROLE="${AGENT_ROLE:-}"

case "$ROLE" in
  engineer)
    TOOL_OUTPUT=$(cat)
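    # PostToolUse payloads carry the tool result under .tool_response;
    # stringify it so the URL is found whatever shape the result has.
    PR_URL=$(echo "$TOOL_OUTPUT" \
      | jq -r '(.tool_response // empty) | tostring' \
      | grep -oP 'https://github\.com/[\w.-]+/[\w.-]+/pull/\d+' | head -1)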

    if [[ -n "$PR_URL" ]]; then
      # Dedup: don't spawn multiple reviewers for the same PR
      LOCK="/tmp/reviewer-$(echo "$PR_URL" | md5sum | cut -c1-8).lock"
      [[ -f "$LOCK" ]] && exit 0
      touch "$LOCK"

      nohup env AGENT_ROLE=reviewer claude \
        -p "Review this PR critically. Check implementation against the plan in plans/. Identify bugs, missed requirements, and test gaps. Leave specific GitHub review comments: $PR_URL" \
        >> "$(pwd)/.claude/logs/reviewer.log" 2>&1 &
    fi
    ;;
esac

This is where the "they argue in pull request comments" behavior emerges. The Reviewer calls gh pr review --comment -b "..." with specific feedback. When the Engineer runs in a follow-up session, those review comments are in its context, and it addresses them in new commits.
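
The test-then-touch lock in post-bash.sh leaves a small window in which two hook invocations can both see no lock and both spawn a Reviewer. A sketch of an atomic alternative uses mkdir, which either creates the directory or fails in a single step. The function name claim_once and the lock directory layout are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Claim a key exactly once. mkdir either creates the directory or fails,
# with no window in between, so only one caller ever gets exit status 0.
claim_once() {
  local lock_root="$1" key="$2"
  local safe_key="${key//[^A-Za-z0-9._-]/_}"   # make the URL filesystem-safe
  mkdir -p "$lock_root" && mkdir "$lock_root/$safe_key" 2>/dev/null
}

LOCKS="$(mktemp -d)"
PR_URL="https://github.com/OWNER/REPO/pull/1"   # placeholder URL
if claim_once "$LOCKS" "$PR_URL"; then echo "first caller: spawn reviewer"; fi
if ! claim_once "$LOCKS" "$PR_URL"; then echo "second caller: already claimed"; fi
```

The same helper could guard the Architect handoff, keyed on the plan file path, so that a plan is never handed to two Engineers.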

Step 5: Implement the CEO Weekly Summarizer

Run the CEO agent via cron. It aggregates log tails and recent PR activity, then sends a summary:

#!/bin/bash
# .claude/hooks/ceo-weekly.sh
# Add to crontab: 0 9 * * 1 bash /path/to/.claude/hooks/ceo-weekly.sh

PROJECT="/absolute/path/to/your/project"
LOG_TAIL=$(tail -n 400 "$PROJECT/.claude/logs/"*.log 2>/dev/null)
PR_LIST=$(cd "$PROJECT" && gh pr list --state all --limit 20 \
  --json number,title,state,createdAt 2>/dev/null)

# -p (print mode) already runs non-interactively, so no extra flag is needed
env AGENT_ROLE=ceo claude \
  -p "You are the CEO of an autonomous agent team. Based on the activity below, write a concise weekly summary: what shipped, what's in review, any anomalies. Send it as an email to admin@yourdomain.com.

AGENT LOGS:
$LOG_TAIL

RECENT PRS:
$PR_LIST"

The CEO role needs email capability configured (sendmail, a transactional API, or a custom tool). Keep its allowed-commands list tight — observe and report only.
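
For a role this narrow, an allowlist is easier to reason about than a blocklist. The sketch below shows what "observe and report only" could look like as a check called from role-guard.sh. The function name ceo_command_allowed, the specific allowed prefixes, and the send-summary.sh wrapper script are all hypothetical; substitute whatever your mail setup actually uses.

```bash
#!/usr/bin/env bash
# Hypothetical allowlist for AGENT_ROLE=ceo: read-only reporting commands only.
# Returns 0 to allow, 2 to block.
ceo_command_allowed() {
  local cmd="$1"
  # Reject chaining, redirection, and substitution before matching prefixes,
  # so an allowed prefix cannot smuggle in a second command.
  case "$cmd" in
    *';'*|*'&'*|*'|'*|*'>'*|*'<'*|*'`'*|*'$('*|*$'\n'*) return 2 ;;
  esac
  case "$cmd" in
    "gh pr list"*|"gh pr view"*|"git log"*|"tail "*) return 0 ;;
    "bash .claude/hooks/send-summary.sh "*) return 0 ;;   # hypothetical mail wrapper
  esac
  return 2
}

if ceo_command_allowed "gh pr list --state all"; then echo "allowed: gh pr list"; fi
if ! ceo_command_allowed "git push origin main"; then echo "blocked: git push"; fi
```

Routing email through a single wrapper script keeps the allowlist free of pipes and redirects, which most command-line mailers would otherwise need.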

Step 6: Why Safety Architecture Is Non-Negotiable at This Scale

Before running any of this unattended, consider one reported incident: someone had auto-approve enabled, asked Claude to fix one failing test, and Claude ran git checkout ., wiping out four hours of uncommitted refactoring in 200ms. No stash. No commit. At one agent, that's bad. At 25 running in parallel, the same event multiplies.

The role-guard.sh blocklist handles obvious cases. Add branch protection as a structural hard limit:

gh api repos/OWNER/REPO/branches/main/protection \
  --method PUT \
  --input - <<'JSON'
{
  "enforce_admins": true,
  "required_pull_request_reviews": { "required_approving_review_count": 1 },
  "required_status_checks": { "strict": false, "contexts": [] },
  "restrictions": null
}
JSON

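With this in place, no agent can merge to main without an approving review, regardless of what any hook permits. The Engineer opens PRs; merges require human approval or the Reviewer's explicit gh pr review --approve. GitHub does not count an approval from the PR's own author, so the Reviewer needs a different GitHub identity from the Engineer for its approval to unlock a merge.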

Watch for the subtler failure mode too: model-level scope creep. One developer spent two weeks cleaning up after Opus 4.7 ignored a PRD and wired the wrong architecture entirely — not a hook failure, a comprehension failure. Role constraints reduce blast radius; they don't prevent an agent from misunderstanding its task. Keep system prompts tight, re-inject the plan document as context on every spawn, and be aware that Claude agents drift from their system prompt past ~15 tool calls as context pressure grows.
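
"Re-inject the plan document on every spawn" can be as simple as embedding the plan text in the prompt instead of passing only its path. A minimal sketch, assuming a hypothetical helper named build_engineer_prompt; the prompt wording follows the one used in on-stop.sh.

```bash
#!/usr/bin/env bash
# Build an Engineer prompt that embeds the plan text itself, so the plan is in
# context from the first turn instead of depending on the agent to open the file.
build_engineer_prompt() {
  local plan_file="$1" plan_name
  [[ -r "$plan_file" ]] || { echo "cannot read plan: $plan_file" >&2; return 1; }
  plan_name="$(basename "$plan_file" .md)"
  cat <<EOF
Implement the plan below. Create branch feature/$plan_name. Open a PR when done.
Do not modify files under plans/.

--- PLAN ($plan_file) ---
$(cat "$plan_file")
--- END PLAN ---
EOF
}
```

The spawn line then becomes claude -p "$(build_engineer_prompt "$PLAN_FILE")". The same helper works for follow-up Engineer sessions, so every session starts from the plan instead of from a summary of it.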

How Do You Verify the System Works?

Run a smoke test against a trivial task before pointing this at real code:

# 1. Trigger the Architect with a minimal task
AGENT_ROLE=architect claude \
  -p "Write a one-sentence plan for adding GET /healthz to an Express app. Save it to plans/healthz.md."

# 2. Confirm the plan was created
ls plans/

# 3. Architect Stop hook should have spawned Engineer — watch the log
tail -f .claude/logs/engineer.log

# 4. Wait for Engineer to open a PR
watch gh pr list

# 5. Confirm Reviewer spawned after PR creation
tail -f .claude/logs/reviewer.log

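If the cascade stops at any step, add exec 2>>/tmp/hook-debug.log; set -x to the top of the failing script. Command tracing surfaces the common failures faster than anything else. Remove the redirect once you are done: while it is active, role-guard.sh's block reason lands in the debug log instead of reaching Claude Code on stderr.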
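
If you want tracing on without giving up stderr, bash can send trace output to its own file descriptor. This is a sketch under two assumptions: your hooks run under bash 4.1 or newer (BASH_XTRACEFD is ignored by older versions, including the bash 3.2 that ships with macOS), and file descriptor 9 is not used elsewhere in the script. The function name enable_trace is hypothetical.

```bash
#!/usr/bin/env bash
# Send `set -x` trace output to a log file on its own descriptor (fd 9),
# leaving stderr free for messages such as a guard's block reason.
enable_trace() {
  local log_file="$1"
  exec 9>>"$log_file"     # open the log for appending on fd 9
  BASH_XTRACEFD=9         # tell bash to write xtrace lines to fd 9
  set -x
}
```

Call enable_trace /tmp/hook-debug.log as the first line of the hook you are debugging; everything after it is traced to the log while stderr behaves as before.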

Troubleshooting Common Failures

Symptom	Likely cause	Fix
`role-guard.sh` never fires	Bash matcher wrong or tool name mismatch	Use `"matcher": "*"` temporarily; log `TOOL_INPUT` to verify payload shape
Engineer doesn't spawn	`on-stop.sh` exits non-zero, aborting session	Wrap `nohup` in `\
PR URL never detected	`jq` path wrong or `grep` pattern misses format	Test: `echo "$PAYLOAD" \
Reviewer spawns 3×	Lock file not created before async spawn	Move `touch "$LOCK"` before the `nohup` line
Role constraints ignored mid-session	Context pressure overrides system prompt	Re-inject plan doc as context; shorten session tasks
Two Engineers clobber the same files	No filesystem isolation	Use git worktrees — see coordinating parallel sessions

How Grass Adds Mobile Oversight to This Workflow

The architecture above runs without Grass. The gap it leaves: with agents running asynchronously, the only signal you get by default is the CEO's weekly email. That's fine for routine runs. It's not fine when a Reviewer locks itself in a comment loop, a PostToolUse dedup fails and spawns three Reviewers, or a session hits a decision point your blocklist doesn't cover.

The developer who built macky.dev — a P2P WebRTC tool specifically to reach a Mac terminal from an iPhone — built significant custom infrastructure just to maintain line-of-sight on their agents. Grass is the pre-built version of that layer.

Three concrete integration points for a multi-agent hooks system:

Dispatch from anywhere. Install Grass on the machine running your agents, scan the QR code on your phone, and you can navigate to your project folder, pick Claude Code as the agent, and send the Architect its initial prompt — from your commute, between meetings, wherever. The cascade runs from there without a laptop open.

Permission forwarding for new roles. For a role you haven't yet fully trusted, run it in default permission mode (without --dangerously-skip-permissions). Claude Code pauses before ambiguous tool calls. Grass surfaces those pauses on your phone as approval modals: you see the exact command, the file path, the repo. Tap Allow or Deny. The agent continues or stops. This is how you build confidence in a role before switching it to full hook automation — the pattern is covered in How to Approve or Deny a Coding Agent Action from Your Phone.

Live monitoring across all sessions. The Grass app shows every active session in your workspace. Stream any session's output, view the diff of what it wrote, and abort a runaway session without touching a laptop. As The Permission Layer Is 98% of Agent Engineering argues, the AI logic in any agentic system is a small fraction of the actual engineering surface — hooks, delegation chains, observability, approval gates make up the rest. Grass handles the observability and approval side from your phone.

Grass is BYOK (your API key never touches Grass servers), agent-agnostic (Claude Code and OpenCode are first-class), and the local CLI is MIT-licensed:

npm install -g @grass-ai/ide
grass start   # run on the machine where your agents live
              # scan the QR code on your phone

For always-on cloud VMs where your agent fleet keeps running when your laptop sleeps, visit codeongrass.com — free tier is 10 hours, no card required.

FAQ

How does cqwerty.com actually run 25 Claude Code agents in production?

Based on the builder's post in r/ClaudeCode, cqwerty.com is a production security scanner using hooks-based orchestration with defined agent roles. The builder's exact words: "~25 agents running it (hooks-based orchestration)... An Architect plans the work. An Engineer ships PRs. A Reviewer pushes back. There's a CEO that emails me a weekly summary... They argue with each other in pull request comments." The full implementation hasn't been published, but the architecture maps directly to the PreToolUse/PostToolUse/Stop primitives described in this post.

What is the difference between PreToolUse, PostToolUse, and Stop hooks in Claude Code?

PreToolUse fires before a tool executes and can block the action — it's your enforcement layer. PostToolUse fires after a tool completes with the output — it's your event-detection layer for triggering downstream agents. Stop fires when a session ends normally — it's your handoff layer for cascading roles. For orchestration topology: PreToolUse is constraints, PostToolUse and Stop are the edges of your agent graph.

Why does role separation require hooks rather than just different system prompts?

Hooks are enforced by the Claude Code harness, not the model. A PreToolUse blocklist prevents a command mechanically regardless of what the model believes its instructions say. System prompts are interpreted by the model, which means they're subject to context pressure. Claude agents have a documented tendency to drift from constraints past ~15 tool calls as the conversation grows. Correct role design uses both: system prompt for intent, hooks for mechanical constraint enforcement.

How do I prevent parallel Engineer sessions from conflicting on the same files?

PreToolUse hooks don't solve concurrent file access — that requires filesystem isolation. Each parallel Engineer should run in its own git worktree (git worktree add ../engineer-feature-branch feature-branch), giving it a physically separate working directory. See how to keep parallel coding agents from stepping on each other for the full ownership and isolation framework.
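
A sketch of how a spawn script could give each Engineer its own worktree. The directory convention (a sibling folder suffixed -worktrees) and both function names are assumptions for illustration; only the git worktree and git show-ref commands are real.

```bash
#!/usr/bin/env bash
# Map a branch name to a sibling worktree directory, e.g.
# /srv/app + feature/healthz -> /srv/app-worktrees/feature-healthz
worktree_dir_for() {
  local repo_root="${1%/}" branch="$2"
  printf '%s-worktrees/%s\n' "$repo_root" "${branch//\//-}"
}

# Create a dedicated worktree for the branch (creating the branch if needed)
# and print the worktree path on stdout. git's own output goes to stderr.
add_engineer_worktree() {
  local repo_root="$1" branch="$2" dir
  dir="$(worktree_dir_for "$repo_root" "$branch")"
  mkdir -p "$(dirname "$dir")" || return 1
  if git -C "$repo_root" show-ref --verify --quiet "refs/heads/$branch"; then
    git -C "$repo_root" worktree add "$dir" "$branch" >&2
  else
    git -C "$repo_root" worktree add -b "$branch" "$dir" >&2
  fi || return 1
  printf '%s\n' "$dir"
}
```

A Stop hook could then run dir=$(add_engineer_worktree "$PROJECT" "feature/$PLAN_NAME") and start the Engineer with cd "$dir" first, so two Engineers never share a working directory.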

How do I debug a hook that silently fails or produces no output?

Add exec 2>>/tmp/hook-debug.log; set -x at the top of the suspect script. This logs every command as it runs, with its arguments expanded. Common failures: jq returning an empty string because the field path is wrong (dump the full stdin with tee /tmp/hook-input.json first to inspect the actual payload structure), relative path issues in nohup commands (use absolute paths everywhere), and lock files not being written atomically before the async spawn fires.

What to Build Next

The architecture here gets you to a working cascade. Two gaps to close before scaling past a handful of roles:

Worktree isolation — parallel Engineer sessions need file-level boundaries to prevent silent overwrites: Coordinate Multiple Claude Code Sessions on a Shared Repo.

Mobile oversight — monitoring 25 agents from log files doesn't scale. npm install -g @grass-ai/ide && grass start gets you a single mobile view across all active sessions. Or visit codeongrass.com for always-on cloud VMs — your agents keep running whether your laptop is open or not.

The cqwerty.com system isn't exotic infrastructure. It's three hook types, one settings.json, a handful of bash scripts, and git as the inter-agent communication bus. Start with two roles — Architect and Engineer — get the cascade working, then add Reviewer and CEO. The pattern scales from there.
