DEV Community

Yurukusa

5 Design Patterns for LLM Agent Teams (From Someone Who Learned Them the Hard Way)

I don't write code. Never have.

What I do is run a team of AI agents that build things for me. Claude Code writes the code. Claude Desktop coordinates the work. Shell hooks enforce the rules. Together they've shipped a 15,000-line roguelike game, a browser automation toolkit, and a marketing pipeline across 11 platforms.

Along the way, these agents deleted 5 published articles in a single API call, crashed Chrome in the middle of automated tasks, and forgot critical lessons between sessions.

Here are 5 design patterns for running LLM agent teams. Not theory. Battle scars.


1. The Guard Pattern (Pre-Hook Enforcement)

The failure: An AI agent needed to update article metadata on Zenn (a Japanese dev blogging platform). It used a REST PUT call with only the new footer text. PUT replaces the entire resource. The article body was overwritten with a single line. Five articles destroyed.

We wrote a lesson: "Always GET before PUT."

The next day, the same agent made the same mistake. The lesson existed. The agent didn't read it.

Writing rules is useless if nobody checks them.

The pattern: Don't ask agents to remember rules. Intercept their actions and check automatically.

Claude Code has a hook system called PreToolUse that fires before any tool execution. We built a guard engine that pattern-matches every command against a library of structured lessons:

```yaml
# api-put-safety.yaml
id: api-put-safety
severity: critical
violated_count: 2
trigger_patterns:
  - "PUT /api/"
  - "requests\\.put"
  - "curl.*-X PUT"
  - "fetch.*method.*PUT"
lesson: |
  PUT replaces the ENTIRE resource. Fields not included
  will be overwritten with empty/default values.
  ALWAYS: GET first, modify, then PUT ALL fields.
checklist:
  - "GET the current resource state"
  - "PUT body contains ALL required fields"
  - "Test on 1 item before batch operation"
```

When an agent tries to run curl -X PUT, the guard fires before the command executes:

```
$ brain guard "curl -X PUT https://api.zenn.dev/api/articles/abc"

============================================================
CRITICAL LESSON: api-put-safety
   (violated 2x, last: 2026-02-09)
============================================================
   PUT replaces the ENTIRE resource...

   Checklist:
   [ ] GET the current resource state
   [ ] PUT body contains ALL required fields

Proceed? [y/N]
```

The command is blocked until the agent explicitly acknowledges the lesson.
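The matching core of a guard like this is small. Below is a minimal bash sketch of the idea, not the actual `brain guard` implementation: the patterns and messages are hard-coded here for illustration (the real ones come from the YAML lessons), and returning exit code 2 follows Claude Code's hook convention for blocking a tool call and surfacing stderr to the agent.

```shell
#!/usr/bin/env bash
# guard-check.sh -- illustrative sketch of the Guard pattern's matching core.
# The real lesson library lives in YAML; these patterns are hard-coded examples.

guard_check() {
  local cmd="$1"
  # Trigger patterns from the api-put-safety lesson above
  local patterns='PUT /api/|requests\.put|-X PUT|fetch.*method.*PUT'
  if echo "$cmd" | grep -qiE -e "$patterns"; then
    echo "CRITICAL LESSON: api-put-safety" >&2
    echo "PUT replaces the ENTIRE resource. GET first, then PUT ALL fields." >&2
    return 2   # exit code 2 = block the tool call, show stderr to the agent
  fi
  return 0
}

# Usage: guard_check "curl -X PUT https://api.example.com/articles/abc"
#        prints the lesson to stderr and returns 2 (blocked)
```

A PreToolUse hook wrapping a check like this fires before the command ever executes, so the agent cannot skip the lesson by simply not reading it.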

Key insight: Trust is a liability in multi-agent systems. Enforcement is an asset. The Guard Pattern turns "I hope you remember" into "I know you checked."


2. The Audit Trail Pattern (Compliance Logging)

The failure: After installing the guard, a natural question emerged: Is it actually working? Are agents checking lessons or bypassing them? When something breaks at 3 AM while you're asleep, how do you figure out what happened?

Without logs, you're debugging with vibes.

The pattern: Every guard check, every acknowledgment, every bypass gets logged to a JSONL file. Not for blame. For evidence.

```json
{"timestamp":"2026-02-09T10:30:00Z","agent":"cc-main","action":"PUT /api/articles/abc","lessons_matched":["api-put-safety"],"checked":true,"followed":true}
{"timestamp":"2026-02-09T10:31:00Z","agent":"cc-sub-3","action":"PUT /api/articles/def","lessons_matched":["api-put-safety"],"checked":false,"followed":false,"incident":"article body overwritten"}
```

One command shows the compliance report:

```
$ brain audit

Audit Report
==================================================
Total checks: 47
Followed:     45
Blocked:      2
Compliance:   96%

Per-lesson breakdown:
  [api-put-safety]   checks=12, followed=12, blocked=0
  [git-force-push]   checks=8,  followed=7,  blocked=1
```

This data answers real questions. Which lessons fire the most? Which agents have the worst compliance? Is the violation rate trending down?

After one week, our api-put-safety lesson had fired 12 times and been followed 12 times. Zero violations since installation. Before that? Two violations in two days.

It's also essential for debugging. When a subagent broke something at 3 AM, we traced back through the log and saw exactly which command triggered which lesson. No guessing.
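Nothing fancy is needed to produce a report like that; the totals fall straight out of the JSONL with standard shell tools. A minimal sketch, matching the field names in the log entries above (the per-lesson breakdown is left out, and the log path in the usage comment is just an example):

```shell
#!/usr/bin/env bash
# audit-report.sh -- sketch: summarize a guard log stored as JSONL.

audit_report() {
  local log="$1"
  local total followed blocked pct
  total=$(grep -c '' "$log" || true)                    # one JSON object per line
  followed=$(grep -c '"followed":true' "$log" || true)  # naive but sufficient here
  blocked=$((total - followed))
  pct=$(( total > 0 ? followed * 100 / total : 0 ))
  echo "Total checks: $total"
  echo "Followed:     $followed"
  echo "Blocked:      $blocked"
  echo "Compliance:   ${pct}%"
}

# Usage: audit_report path/to/guard-log.jsonl
```

Grepping raw JSON is crude (a proper version would use jq), but for append-only logs your own hooks write, it answers the compliance question in one command.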

Key insight: "If it's not logged, it didn't happen." Audit trails turn agent behavior from a black box into a dashboard.


3. The Coordinator Pattern (Single Source of Truth)

The failure: I had two AI agents running simultaneously. Claude Code was writing code. Claude Desktop (which we call Tachikoma) was supposed to be managing priorities. Nobody was actually in charge. Both agents started working on different tasks. One pushed changes that broke what the other was building. Merge conflicts. Wasted context. Chaos.

Giving agents autonomy without coordination is just entropy with extra steps.

The pattern: Designate one agent as the coordinator. It doesn't do the work. It assigns, tracks, and reviews.

Our setup:

```
Tachikoma (Claude Desktop)     Claude Code (CC)
        Coordinator        <-->        Worker

  - Receives tasks from human    - Executes tasks
  - Prioritizes work             - Reports progress
  - Reviews output               - Asks Tachikoma (not human)
  - Manages state                  for decisions
```

The communication protocol is explicit:

  1. Human gives high-level goals to Tachikoma
  2. Tachikoma breaks them into concrete tasks and sends them to CC
  3. CC executes, reports completion back to Tachikoma
  4. Tachikoma reviews, gives feedback, assigns next task
  5. CC never asks the human directly. If CC is stuck, it asks Tachikoma
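One way to make the assignment handoff concrete, and durable across session restarts, is a shared task file. This is a hypothetical sketch, not our exact setup: the queue file name and its one-task-per-line format are my assumptions.

```shell
#!/usr/bin/env bash
# task-queue.sh -- hypothetical file-based handoff between coordinator and worker.
QUEUE="${QUEUE:-tasks.queue}"

assign_task() {   # coordinator: append a task for the worker
  echo "$1" >> "$QUEUE"
}

next_task() {     # worker: pop the oldest task; fail if the queue is empty
  local task
  task=$(head -n 1 "$QUEUE" 2>/dev/null)
  [ -n "$task" ] || return 1
  tail -n +2 "$QUEUE" > "$QUEUE.tmp" && mv "$QUEUE.tmp" "$QUEUE"
  echo "$task"
}

# Usage: assign_task "refactor the save system"
#        next_task   -> prints the oldest task and removes it from the queue
```

The worker polls `next_task` when idle; an empty queue (exit 1) is its signal to report to the coordinator rather than invent work.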

This is not micromanagement. It is the exact opposite. The human steps away entirely. The coordinator handles everything. When I wake up in the morning, the work is done and the decisions are documented.

The rule we enforce: "When a task is done or blocked, report to Tachikoma and get the next one. Do not ask the human." This keeps the loop running without a person in the chair.

One subtlety: the coordinator needs to be a different session than the worker. Same agent doing both roles will context-switch between "manager brain" and "coder brain" and do both badly. Separation of concerns applies to agents too.

Also, the coordinator needs enforcement power, not just advisory power. Early on, Tachikoma would suggest tasks and CC would ignore them. We added a hard rule to CC's prompt: "Do NOT start a new task without Tachikoma's assignment." That made the difference between a suggestion and a protocol.

Key insight: "Autonomous" does not mean "unsupervised." It means supervised by another agent. The Coordinator Pattern replaces human bottlenecks with agent bottlenecks -- which scale better because they don't sleep.


4. The Recovery Pattern (Auto-Detect, Auto-Fix)

The failure: My agents automate browser tasks through Chrome DevTools Protocol (CDP). Chrome crashes. Connection gets refused. Lock files pile up. When an agent hits ECONNREFUSED at 2 AM, the entire pipeline stops. The task stalls. The loop breaks. I wake up to nothing done.

Telling agents "just avoid errors" is fantasy. External tools fail. Networks drop. Processes die. The question isn't whether it will break. It's what happens when it does.

The pattern: Hook into tool execution outputs. Detect failure signatures. Trigger automated recovery.

We use Claude Code's PostToolUse hook to inspect every command's output:

```bash
# cdp-failure-recovery.sh (PostToolUse hook)
FAILURE_DETECTED=false

# Detect failure patterns in command output
if echo "$OUTPUT" | grep -qiE 'ECONNREFUSED|connection refused'; then
    FAILURE_DETECTED=true
fi
if echo "$OUTPUT" | grep -qiE 'SingletonLock|profile.*lock'; then
    FAILURE_DETECTED=true
fi

# Trigger recovery
if [[ "$FAILURE_DETECTED" == "true" ]]; then
    /home/user/.claude/hooks/cdp-recover.sh "$CDP_PORT"
fi
```

The recovery script:

  1. Checks if the port is alive (via PowerShell, because of WSL2's networking split)
  2. Deletes stale lock files (SingletonLock)
  3. Restarts Chrome with the correct profile
  4. Retries the port check for 15 seconds
  5. If recovery fails, reports to the coordinator agent (not to the human)
```
$ # Agent runs a CDP command, Chrome is dead
⚠️ CDP failure detected on port 9223. Attempting auto-recovery...
[cdp-recover] Port 9223 is down. Attempting recovery...
[cdp-recover] Deleted SingletonLock for cc-chrome
[cdp-recover] Waiting for CDP on port 9223...
[cdp-recover] Port 9223 recovered successfully!
✅ CDP port 9223 has been recovered. Please retry the command.
```

The agent never even paused. The hook detected the failure, fixed it, and told the agent to retry. Total downtime: 15 seconds instead of 8 hours (until I woke up).
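Condensed, those five steps fit in one function. This is a sketch under stated assumptions: the lock path and Chrome launch command are placeholders, and the port check uses bash's /dev/tcp instead of the PowerShell detour WSL2 forces on our real setup.

```shell
#!/usr/bin/env bash
# cdp-recover.sh -- sketch of the recovery steps; paths and the Chrome
# command are placeholders for whatever your setup actually uses.
LOCK_FILE="${LOCK_FILE:-$HOME/.config/cc-chrome/SingletonLock}"
CHROME_CMD="${CHROME_CMD:-google-chrome --remote-debugging-port=9223}"
RETRIES="${RETRIES:-15}"

port_alive() {   # steps 1 and 4: is anything listening on the CDP port?
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

cdp_recover() {
  local port="$1" i
  if port_alive "$port"; then
    echo "[cdp-recover] Port $port is already up."
    return 0
  fi
  rm -f "$LOCK_FILE"                       # step 2: clear the stale lock
  $CHROME_CMD > /dev/null 2>&1 &           # step 3: restart Chrome
  for (( i = 0; i < RETRIES; i++ )); do    # step 4: retry for ~15 seconds
    sleep 1
    if port_alive "$port"; then
      echo "[cdp-recover] Port $port recovered successfully!"
      return 0
    fi
  done
  echo "[cdp-recover] Recovery failed on port $port" >&2
  return 1                                 # step 5: caller escalates to coordinator
}

# Usage: cdp_recover 9223
```

The nonzero return on failure is what routes the escalation: the hook that calls this reports to the coordinator agent, not to the sleeping human.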

Key insight: "Fix it" beats "avoid it." The Recovery Pattern assumes failure is inevitable and builds the fix into the infrastructure. Your agents should heal themselves before they call for help.


5. The Context Handoff Pattern (Planned Amnesia)

The failure: LLM agents have a context window. It fills up. When it's full, the session ends. Everything the agent knew -- the current task, the project state, the tricky workaround it just figured out -- evaporates. The next session starts from zero.

This happened to us constantly. An agent would spend 120 tool calls making progress on a complex task, hit the context limit, and the new session had no idea what was going on. Work got repeated. Bugs got reintroduced. It felt like Groundhog Day.

The pattern: Monitor context consumption. Auto-generate handoff checkpoints before the session dies.

We built a PostToolUse hook that counts every tool invocation:

```bash
# context-monitor.sh
SOFT_WARNING=80    # "Heads up"
HARD_WARNING=120   # "Start wrapping up"
CRITICAL=150       # "Save everything NOW"

COUNT=$(cat /tmp/cc-context-monitor-count 2>/dev/null || echo 0)
COUNT=$((COUNT + 1))
echo "$COUNT" > /tmp/cc-context-monitor-count
```

At the CRITICAL threshold, the hook auto-generates a session checkpoint:

```markdown
# Session Checkpoint (auto-generated)
**Generated**: 2026-02-10 14:35 JST
**Tool calls**: 150 / 150 (CRITICAL)

## Immediate actions required:
1. Save current task state to status.md
2. Report to Tachikoma (coordinator)
3. Start new session
```

The Stop hook fires when a session ends, saving the timestamp and state to a persistent memory file. The next session reads these files first and picks up where the last one left off.
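The Stop hook itself can be tiny. A hedged sketch, assuming a memory/ directory and reusing the counter file from context-monitor.sh above (the exact files our setup writes differ):

```shell
#!/usr/bin/env bash
# on-stop.sh -- sketch of a Stop hook; MEMORY_DIR and the file layout
# are assumptions for illustration.

save_session_state() {
  local memory_dir="${MEMORY_DIR:-$HOME/.claude/memory}"
  local count_file="/tmp/cc-context-monitor-count"
  mkdir -p "$memory_dir"
  {
    echo "last_session_end: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "tool_calls: $(cat "$count_file" 2>/dev/null || echo 0)"
  } > "$memory_dir/last-session.yaml"
  echo 0 > "$count_file"   # reset the counter so Session N+1 starts fresh
}

# Called by the Stop hook when the session ends
```

Session N+1 reads that YAML before doing anything else, which is what makes the amnesia planned instead of catastrophic.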

The chain looks like this:

```
Session N:
  context-monitor.sh hits SOFT_WARNING (80 calls)  → warns agent
  context-monitor.sh hits CRITICAL (150 calls)     → generates checkpoint
  on-stop.sh fires                                 → saves state to memory/

Session N+1:
  Reads CLAUDE.md → memory files → checkpoint
  Continues from where Session N stopped
```

No data lost. No repeated work. The agent plans for its own amnesia.

Key insight: Plan for amnesia. LLM agents will forget. That's not a bug, it's a hardware constraint. The Context Handoff Pattern treats context like a battery: monitor the level, save state before it dies, and make the next charge seamless.


Putting It All Together

These five patterns form a stack:

| Layer | Pattern | Purpose |
| --- | --- | --- |
| Prevention | Guard | Stop mistakes before they happen |
| Observability | Audit Trail | Prove the system works |
| Orchestration | Coordinator | Keep multiple agents aligned |
| Resilience | Recovery | Fix failures automatically |
| Continuity | Context Handoff | Survive session boundaries |

Each pattern exists because we didn't have it, and something broke. The Guard Pattern came from deleted articles. The Recovery Pattern came from waking up to a dead Chrome. The Coordinator Pattern came from two agents trampling each other's work.

The common thread: don't trust agents to behave correctly. Build infrastructure that forces correct behavior.

I'm not an engineer. I don't write code. But I've spent months running AI agent teams that build real software, and these patterns emerged from that experience. They're not theoretical best practices. They're scar tissue.

If you're building with LLM agents, you'll rediscover these patterns eventually. Hopefully this saves you a few deleted articles along the way.

One last thing. These patterns compound. The Guard catches a mistake. The Audit Trail logs it. The Coordinator assigns a task to write a new lesson. The Recovery keeps the machine running overnight. The Context Handoff ensures the next session picks up where the last one died.

That's not five tools. That's a system. And it gets smarter every time something breaks.


I built Shared Brain, an open-source CLI tool that implements the Guard and Audit Trail patterns. Check it out if you want your agents to stop repeating the same mistakes.

We also open-sourced the hooks: Claude Code Ops Starter - 4 bash hooks (context monitor, autonomous mode, syntax check, decision guard) that let Claude Code run without babysitting. One-command install, MIT licensed.

Not sure where to start? Check your setup safety in 10 seconds - free, runs locally, no signup.

For the complete autonomous operations setup, see the CC-Codex Ops Kit ($79) - 22 files, 15-minute setup.


Free Tools for Claude Code Operators

| Tool | What it does |
| --- | --- |
| cc-health-check | 20-check setup diagnostic (CLI + web) |
| cc-session-stats | Usage analytics from session data |
| cc-audit-log | Human-readable audit trail |
| cc-cost-check | Cost per commit calculator |

Interactive: Are You Ready for an AI Agent? - 10-question readiness quiz | 50 Days of AI - the raw data


More tools: Dev Toolkit - 56 free browser-based tools for developers. JSON, regex, colors, CSS, SQL, and more. All single HTML files, no signup.
