Edward Kubiak

Posted on Apr 1

How I built production quality gates into a multi-agent Claude Code workflow

#agents #ai #automation #codequality

Published to dev.to — cross-post from GitHub

1. The problem: agents that write code but never review it

When I started using Claude Code's Agent tool to dispatch subagents, a pattern showed up quickly: the agent would write code, declare success, and move on. There was no review step unless I explicitly asked for one in the prompt, and a prompt instruction is a request, not an enforcement mechanism. When the model was running low on context or the task was complex, the review step got dropped.

The deeper issue is that multi-agent systems are composable but not automatically accountable. You can chain code-writer → commit → push in a plan, but nothing in the default setup prevents a buggy implementation from being committed and pushed before a human or reviewer has seen it. The agent doesn't know what it doesn't know.

I wanted a framework where review wasn't optional — where it was structurally impossible to skip.

2. CAST's hook-driven commit gate

Claude Code exposes a lifecycle hook system via settings.json. One of those hooks is PreToolUse — it fires before every tool call and can return {"decision": "block"} to reject the operation entirely.

I used this to build a hard commit gate. The hook script (pre-tool-guard.sh) intercepts every Bash tool call that matches git commit. If the command doesn't have a specific escape hatch prefix (CAST_COMMIT_AGENT=1), the hook exits with code 2, which Claude Code treats as a hard block — the commit does not happen.

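# pre-tool-guard.sh (simplified)
# FIRST_LINE is assumed to hold the first line of the Bash command from the hook payload.
if echo "$FIRST_LINE" | grep -qE "(^|[[:space:]])git[[:space:]]+commit"; then
  # On exit code 2, Claude Code passes stderr (not stdout) back to the model.
  echo "**[CAST]** Raw git commit blocked. Dispatch the commit agent instead." >&2
  exit 2
fi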
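
The shell snippet starts from a command that has already been extracted. As a rough sketch of the whole decision in Python, assuming the hook payload arrives as JSON with `tool_name` and `tool_input.command` fields, the logic might look like this. The names `should_block` and `decide` are hypothetical, and the escape-hatch check is an assumption based on the description above, not CAST's actual code.

```python
import re
import sys

# Mirrors the grep pattern in the shell snippet: "git commit" at the start
# of the line or after whitespace.
GIT_COMMIT = re.compile(r"(^|\s)git\s+commit")
# Assumption: the escape hatch must be the very first token of the command.
ESCAPE_HATCH = "CAST_COMMIT_AGENT=1 "


def should_block(command: str) -> bool:
    """Return True for a raw git commit that lacks the escape-hatch prefix."""
    lines = command.strip().splitlines()
    first_line = lines[0] if lines else ""
    if not GIT_COMMIT.search(first_line):
        return False
    return not first_line.startswith(ESCAPE_HATCH)


def decide(payload: dict) -> int:
    """Map a parsed hook payload to an exit code: 2 blocks, 0 allows."""
    if payload.get("tool_name") != "Bash":
        return 0
    command = payload.get("tool_input", {}).get("command", "")
    if should_block(command):
        # The message goes to stderr so it is visible alongside the block.
        print(
            "**[CAST]** Raw git commit blocked. Dispatch the commit agent instead.",
            file=sys.stderr,
        )
        return 2
    return 0
```

A hook script would wire this up with something like `sys.exit(decide(json.load(sys.stdin)))`. Keeping the decision in a pure function makes each exit-code path easy to test.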

The only way to commit is through the commit agent workflow, which:

Reads staged changes
Dispatches code-reviewer (Claude Haiku) and waits for a DONE status
If the reviewer returns DONE_WITH_CONCERNS, surfaces those to the user before proceeding
Only then runs CAST_COMMIT_AGENT=1 git commit with the escape hatch
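
The steps above amount to a small decision function. This is a minimal sketch, not the commit agent's real implementation: `commit_command` is a hypothetical name, and treating any status other than DONE or DONE_WITH_CONCERNS as "do not commit" is an assumption.

```python
import shlex
from typing import Optional


def commit_command(review_status: str, user_acknowledged: bool, message: str) -> Optional[str]:
    """Return the gated commit command, or None when the commit must not run."""
    if review_status == "DONE":
        approved = True
    elif review_status == "DONE_WITH_CONCERNS":
        # Concerns are surfaced to the user first; proceed only if they agree.
        approved = user_acknowledged
    else:
        # Assumption: unknown or failed review statuses never commit.
        approved = False
    if not approved:
        return None
    # The escape-hatch prefix is what lets this command through the hook.
    return f"CAST_COMMIT_AGENT=1 git commit -m {shlex.quote(message)}"
```

For example, `commit_command("DONE_WITH_CONCERNS", False, "fix bug")` returns `None`, so nothing is committed until the user has seen the concerns.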

The gate is enforced at the shell level, not at the prompt level. It can't be bypassed by rephrasing a request.

The full framework ships 16 agents, 16 slash commands, and a hook architecture covering 19 hooks across 13 Claude Code lifecycle events. The BATS test suite has 301 tests covering every hook script. It's installable via Homebrew:

brew tap ek33450505/cast && brew install cast

3. cast.db as an event store

Every meaningful lifecycle event gets written to a SQLite database at ~/.claude/cast.db. The schema has four main tables:

sessions — one row per Claude Code session, with start/end timestamps and token counts
agent_runs — one row per subagent dispatch, tracking which agent ran, duration, and status
routing_events — one row per tool call that hits a hook, with tool name, exit code, and latency
hook_health — rolling health state for each hook script (last fired, last exit code)
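
To make the four tables concrete, here is one way such a schema could be declared with Python's built-in `sqlite3` module. Only the table names come from the list above. Every column name and type is an illustrative guess, not the actual cast.db schema.

```python
import sqlite3

# Hypothetical columns; only the four table names are taken from the post.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id            TEXT PRIMARY KEY,
    started_at    TEXT NOT NULL,
    ended_at      TEXT,
    input_tokens  INTEGER DEFAULT 0,
    output_tokens INTEGER DEFAULT 0
);
CREATE TABLE IF NOT EXISTS agent_runs (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id  TEXT REFERENCES sessions(id),
    agent       TEXT NOT NULL,
    duration_ms INTEGER,
    status      TEXT
);
CREATE TABLE IF NOT EXISTS routing_events (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT REFERENCES sessions(id),
    tool_name  TEXT NOT NULL,
    exit_code  INTEGER,
    latency_ms INTEGER,
    fired_at   TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS hook_health (
    hook           TEXT PRIMARY KEY,
    last_fired_at  TEXT,
    last_exit_code INTEGER
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure all four tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Using `CREATE TABLE IF NOT EXISTS` keeps `open_db` safe to call from every hook invocation.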

The writes happen via PostToolUse hooks set to async: true, so they don't block tool execution. The hook script spawns a Python process, parses the Claude Code hook payload from stdin, and appends to the DB. Because Claude Code does not wait for an async hook to finish, the tool call sees almost no added latency.

"PostToolUse": [
  {
    "matcher": "Write|Edit|Agent|Bash",
    "hooks": [
      {
        "type": "command",
        "command": "bash ~/.claude/scripts/post-tool-hook.sh",
        "if": "Write|Edit|Agent|Bash",
        "timeout": 10,
        "async": true
      }
    ]
  }
]

The if: filters matter here: they scope each hook to the tool types it actually cares about, so the cost tracker runs only when a Bash, Edit, Write, or Agent call completes, not on every Read or Glob.
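
The Python side of that async write path could be as small as the sketch below. It assumes the payload carries `session_id` and `tool_name` fields. The function name `record_event`, the column layout, and passing `exit_code` and `latency_ms` as arguments are all hypothetical choices for illustration.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical column layout for the routing_events table.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS routing_events (
    session_id TEXT,
    tool_name  TEXT NOT NULL,
    exit_code  INTEGER,
    latency_ms INTEGER,
    fired_at   TEXT NOT NULL
)
"""


def record_event(conn, raw_payload, exit_code=0, latency_ms=None):
    """Append one routing_events row for a hook payload given as a JSON string."""
    payload = json.loads(raw_payload)
    conn.execute(CREATE_SQL)
    conn.execute(
        "INSERT INTO routing_events VALUES (?, ?, ?, ?, ?)",
        (
            payload.get("session_id"),
            payload.get("tool_name", "unknown"),
            exit_code,
            latency_ms,
            # UTC timestamp in ISO 8601, which sorts correctly as text.
            datetime.now(timezone.utc).isoformat(timespec="seconds"),
        ),
    )
    conn.commit()
```

In a real hook, `raw_payload` would be `sys.stdin.read()` and `conn` a connection to `~/.claude/cast.db`. Since the hook is async, a slow disk write here delays only the telemetry, not the tool call.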

4. The React dashboard: making agent activity queryable

The companion project (claude-code-dashboard) is a React 19 + Vite frontend backed by an Express 5 API that reads from cast.db. It runs locally, with the Vite dev server on port 5173 and the Express API on port 3001.

Key pages:

/activity — live event stream via SSE; shows tool calls in real time as they fire
/sessions — session history with token spend per session
/analytics — aggregate token spend over time, cost by agent, hook fire frequency
/agents — per-agent run history with duration and status distributions
/hooks — hook health dashboard: which hooks are firing, last exit codes, latency percentiles
/token-spend — daily/weekly cost breakdown

The value of having SQLite as the backing store vs. just log files: you can query it. Want to know which agent costs the most per session? One SQL query. Want to see hook latency over the last week? Aggregate routing_events by day. The data is local, structured, and queryable without a cloud backend.
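
As an example of the "aggregate routing_events by day" query, here is a sketch using `sqlite3`. The column names `latency_ms` and `fired_at` are assumptions about the schema, and `daily_latency` is a hypothetical helper, not part of the dashboard.

```python
import sqlite3

# Assumes fired_at is stored as sortable text such as "2025-01-01 10:00:00".
DAILY_LATENCY_SQL = """
SELECT date(fired_at)  AS day,
       COUNT(*)        AS events,
       AVG(latency_ms) AS avg_latency_ms
FROM routing_events
WHERE fired_at >= ?
GROUP BY date(fired_at)
ORDER BY day
"""


def daily_latency(conn: sqlite3.Connection, since: str):
    """Return (day, event_count, average_latency_ms) rows from `since` onward."""
    return conn.execute(DAILY_LATENCY_SQL, (since,)).fetchall()
```

Passing the cutoff as a parameter keeps the query deterministic. A "last week" view would compute `since` from the current date before calling it.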

5. Lessons learned

Async hooks changed the performance profile. Early versions had all hooks synchronous. Moving the telemetry hooks (PostToolUse, SubagentStart/Stop, TaskCreated, Stop) to async removed the measurable latency that observability had been adding to tool calls. The key insight: telemetry hooks can be async because nothing downstream depends on their output. Security and commit gates must stay synchronous because they need to block.

if: filters are essential at scale. Without them, every hook fires on every tool call. The security guard was running on ls commands. Adding if: "Bash(curl *)" filters means it only fires when curl is about to run — which is the only time it matters. The Claude Code if: field supports glob-style matching against the tool name and input.
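
To show what a filter like "Bash(curl *)" expresses, here is a rough approximation of that kind of matching using Python's `fnmatch`. This is not Claude Code's matcher, whose exact rules may differ. `rule_matches` is a hypothetical helper that only illustrates a tool name plus a glob over the input.

```python
import re
from fnmatch import fnmatchcase

# Splits "Tool(pattern)" into its parts; a bare "Tool" has no pattern.
RULE = re.compile(r"^(?P<tool>\w+)(?:\((?P<pattern>.*)\))?$")


def rule_matches(rule: str, tool_name: str, tool_input_text: str) -> bool:
    """Approximate a 'Tool(glob)' filter: tool must match, then glob the input."""
    parsed = RULE.match(rule)
    if parsed is None or parsed.group("tool") != tool_name:
        return False
    pattern = parsed.group("pattern")
    if pattern is None:
        # A rule with no parenthesised pattern matches any input for that tool.
        return True
    return fnmatchcase(tool_input_text, pattern)
```

With this approximation, `rule_matches("Bash(curl *)", "Bash", "ls -la")` is false, which is the behavior that keeps a security guard from running on every `ls`.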

effort frontmatter changes model behavior. Setting effort: low on lightweight agents (commit, code-reviewer, push, test-runner) and effort: high on deep analysis agents (security, planner, researcher, debugger) lets the runtime allocate thinking budget appropriately. A commit agent doesn't need extended thinking. A security agent reviewing auth code does.

isolation: worktree prevents file conflicts in parallel dispatches. When the orchestrator dispatches multiple agents in parallel — code-writer and test-writer running simultaneously on the same codebase — they can clobber each other's edits without worktree isolation. Adding isolation: worktree to parallelizable agents (code-writer, test-writer, security, frontend-qa) gives each agent its own git worktree.
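
Claude Code handles the worktree setup itself when isolation: worktree is set. For readers unfamiliar with git worktrees, the sketch below shows roughly what "one worktree per agent" means in terms of the `git worktree add` command. The branch prefix `cast/` and the sibling-directory naming are invented for illustration.

```python
from typing import List


def worktree_add_command(repo_root: str, agent: str) -> List[str]:
    """Build a `git worktree add` command giving one agent its own checkout."""
    # Hypothetical naming: a sibling directory and a branch named after the agent.
    worktree_path = f"{repo_root.rstrip('/')}-{agent}"
    branch = f"cast/{agent}"
    return ["git", "-C", repo_root, "worktree", "add", "-b", branch, worktree_path]
```

Each command could be run with `subprocess.run(cmd, check=True)`. Because code-writer and test-writer then edit files in separate directories, neither can overwrite the other's uncommitted changes.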

The BATS test suite is non-negotiable. Shell scripts are easy to break silently. 301 BATS tests covering every hook, every exit code path, and every escape hatch means I can refactor hooks without guessing whether I broke the commit gate. CI runs on every push.

The repo is at github.com/ek33450505/claude-agent-team. Issues and PRs welcome — especially around the hook architecture and DB schema.

Top comments (1)

William Wang • Apr 12

This is exactly the pattern I've been converging on. Quality gates in multi-agent workflows are the difference between "AI wrote code" and "AI shipped production-ready code."

A few things I'd add from my experience:

Gate ordering matters more than gate count. Lint → type check → unit test → integration test → security scan. Each gate filters out a class of issues so the next gate isn't overwhelmed with noise.
The review agent should be a different model or at least a different prompt than the implementation agent. Having the same agent review its own code is like asking a developer to review their own PR — they have blind spots for the same reasons.
Cost attribution per gate is underrated. Knowing that your security scan gate costs 2x more tokens than your lint gate lets you optimize the pipeline. Sometimes moving a cheap gate earlier saves expensive downstream re-runs.

The multi-agent approach also opens up a natural observability layer — each agent's output becomes a checkpoint you can audit. That's way better than trying to debug a single agent's 50-step reasoning chain.