
maymay5692


How to Run 9 AI Agents in Parallel with tmux and Claude Code


Disclaimer: This article describes my personal setup. Costs and results may vary. Some links may be affiliate links.

I run nine AI agents simultaneously in a terminal. They write code, draft articles, review each other's work, and manage deployments — all coordinated through YAML files and tmux panes. No orchestration framework. No cloud infrastructure. Just a Mac, tmux, and Claude Code.

Credit where it's due: This system is built on top of @shio_shoppaize's multi-agent-shogun project and their Zenn article "Claude Codeで『AI部下10人』を作ったら…" (in Japanese). The core design — tmux + Claude Code, shogun/karo/ashigaru hierarchy, YAML-based communication, no-polling architecture — comes directly from their work. I adopted it wholesale and then layered on my own customizations (MAGI consensus system, watchdog monitoring, cost optimization, etc.) over three months of daily use.

This post explains how the system works, what broke along the way, and what I'd change if I started over.

The Setup: tmux + Claude Code

The core idea is simple: each tmux pane runs its own Claude Code session. Each session has its own context window, its own memory, its own task. Nine panes, nine independent agents.

# Create the session with 9 panes
tmux new-session -d -s multiagent
for i in $(seq 1 8); do
  tmux split-window -t multiagent
  # Re-tile after every split so tmux does not run out of room for the next pane
  tmux select-layout -t multiagent tiled
done

But nine agents doing whatever they want is chaos. So I added hierarchy.

The Hierarchy: Feudal Japan, but for AI

I needed to name things, and "agent_manager_01" is boring. So I went with Sengoku-era Japanese military ranks. Partly because it's fun, partly because shogun_to_karo.yaml is more readable than orchestrator_to_dispatcher_queue.yaml.

Human (The Lord)
    │
    ▼
┌──────────┐
│  SHOGUN  │  ← Commander (1 agent)
└────┬─────┘    Never writes code. Delegates everything.
     │
┌────┴─────┐
│   KARO   │  ← Advisor (1 agent)
└────┬─────┘    Decomposes objectives. Manages foot soldiers.
     │
┌─┬─┬─┬─┬─┐
│1│2│3│4│5│  ← Foot soldiers (up to 5)
└─┴─┴─┴─┴─┘    Write code, run tests, draft articles.

Plus: MAGI system (3 agents) for quality review

The Shogun — Never Touches Code

The Shogun's job is to receive instructions from me, write a YAML command file, and send it to the Karo. That's it.

There's a hard rule in the instruction file: if the Shogun starts reading code or debugging, something has gone wrong. Early on, the Shogun tried to fix a bug directly. The Karo didn't know. Two foot soldiers were editing the same file from different angles. Three-way merge conflict. An hour of untangling.

Now the Shogun has exactly one workflow:

  1. Receive command from human
  2. Write YAML to queue/shogun_to_karo.yaml
  3. Wake the Karo with tmux send-keys
  4. Wait for the dashboard to update

The Karo — Middle Management That Works

The Karo takes high-level objectives and decomposes them into concrete tasks. Each task gets written to a dedicated YAML file for a specific foot soldier.

# queue/tasks/ashigaru2.yaml
task:
  task_id: cmd_060_task_002
  description: |
    Run backtests for ETH/USDT and SOL/USDT
    using TOP5 strategies from the engine.
  status: assigned
  assigned_to: ashigaru2
  timestamp: "2026-02-15T09:30:00"

The Karo also maintains dashboard.md — a human-readable status page that I check every morning. It tracks what's in progress, what's blocked, what's done, and what needs my attention.
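A minimal sketch of how a task file in the format above could be produced from a shell helper. The function name write_task and the QUEUE_DIR variable are hypothetical; only the field names and the queue/tasks/ path come from the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one task file per foot soldier.
# QUEUE_DIR and write_task are illustrative names, not part of the original system.
QUEUE_DIR="${QUEUE_DIR:-queue}"

write_task() {
    local worker="$1" task_id="$2" description="$3"
    local file="$QUEUE_DIR/tasks/$worker.yaml"
    mkdir -p "$QUEUE_DIR/tasks"
    {
        printf 'task:\n'
        printf '  task_id: %s\n' "$task_id"
        printf '  description: |\n'
        # Indent every description line so it stays inside the YAML block scalar.
        printf '%s\n' "$description" | sed 's/^/    /'
        printf '  status: assigned\n'
        printf '  assigned_to: %s\n' "$worker"
        printf '  timestamp: "%s"\n' "$(date +%Y-%m-%dT%H:%M:%S)"
    } > "$file"
    # Print the path so the caller knows which file was written.
    printf '%s\n' "$file"
}

# Example call:
#   write_task ashigaru2 cmd_060_task_002 "Run backtests for ETH/USDT and SOL/USDT"
```

Overwriting the whole file on each assignment keeps one task per soldier, which matches the one-file-per-worker layout described here.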

The Foot Soldiers — Do the Actual Work

These are the workers. They pick up tasks from their personal YAML file, execute them, write a report YAML, and ping the Karo.

Each foot soldier has its own task file and its own report file. They can't see each other's files. This prevents the most common multi-agent failure mode: two agents editing the same file simultaneously.
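One way to make the dedicated-file rule checkable is a small ownership guard. This is a sketch under assumptions: the path patterns follow the file names used in this article, while owns_queue_file is a hypothetical helper and how the real system enforces the rule is not shown here.

```shell
#!/usr/bin/env bash
# Paths follow the naming used in this article:
#   queue/tasks/<worker>.yaml and queue/reports/<worker>_report.yaml
task_file()   { printf 'queue/tasks/%s.yaml\n' "$1"; }
report_file() { printf 'queue/reports/%s_report.yaml\n' "$1"; }

# Hypothetical guard: succeed only if the path is one of the worker's own two files.
owns_queue_file() {
    local worker="$1" path="$2"
    [ "$path" = "$(task_file "$worker")" ] || [ "$path" = "$(report_file "$worker")" ]
}

# Example:
#   owns_queue_file ashigaru2 queue/tasks/ashigaru3.yaml || echo "not yours"
```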

Communication: YAML + send-keys (No Polling)

Here's the part that matters most for cost: agents never poll for new instructions.

If an agent sits in a loop checking "any new tasks?" every 30 seconds, that's burning tokens on nothing. Instead, the system is entirely event-driven. Instructions are written to YAML files, and the recipient gets woken up with tmux send-keys:

# Step 1: Write the message (DO NOT press Enter yet)
tmux send-keys -t multiagent:0.2 'New task assigned. Check your queue.'

# Step 2: Press Enter (MUST be a separate call)
tmux send-keys -t multiagent:0.2 Enter

The two-step send-keys is critical. If you combine them into one call, the Enter key sometimes gets swallowed or the message gets corrupted. Every agent's instruction file has this drilled in: always two separate bash calls.

Think of it like Slack. The YAML file is the message. The send-keys is the notification ping. If nobody pings you, you sit idle — and idle agents consume zero API tokens.
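The two-step wake-up can be wrapped in one helper so nobody forgets the second call. This is a sketch: notify_pane is a hypothetical name, and the -l flag (send the text literally instead of looking up key names) is an addition that the original commands do not use.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two-step wake-up described above.
notify_pane() {
    local target="$1" message="$2"
    # Call 1: type the message. -l makes tmux treat it as literal text,
    # so a message that happens to contain a key name is not interpreted.
    tmux send-keys -t "$target" -l "$message"
    # Call 2: press Enter as its own send-keys invocation.
    tmux send-keys -t "$target" Enter
}

# Example:
#   notify_pane multiagent:0.2 'New task assigned. Check your queue.'
```

Whether two tmux invocations inside one function are as reliable as two separate bash calls from the agent is worth verifying on your own setup before relying on it.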

The Communication Chain

foot soldier → Karo → Shogun → human

Foot soldiers NEVER talk to the Shogun directly.
The Shogun NEVER talks to foot soldiers directly.
The Karo NEVER sends messages to the human's pane.

That last rule exists because the human might be typing when a message comes in. A stray send-keys into the human's pane would corrupt their input. So reports go to a dedicated monitoring pane (shogun:0.1), never the human's input pane (shogun:0.0).
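The routing rules can be expressed as a small allow-list. The sketch below is illustrative only: may_notify is a hypothetical function, the role names follow this article, and the magi_ prefix is borrowed from the MAGI report file names.

```shell
#!/usr/bin/env bash
# Returns 0 when the sender may wake the recipient directly, 1 otherwise.
may_notify() {
    local sender="$1" recipient="$2"
    case "$sender:$recipient" in
        shogun:karo|karo:shogun)          return 0 ;;  # command down, report up
        karo:ashigaru*|ashigaru*:karo)    return 0 ;;  # task down, report up
        shogun:magi_*|magi_*:shogun)      return 0 ;;  # review request and verdict
        *)                                return 1 ;;  # anything else skips a layer
    esac
}

# "human" is deliberately absent: nothing is ever typed into the human's input pane.
# Example:
#   may_notify ashigaru3 shogun || echo "blocked: report to the Karo instead"
```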

The Report Format

# queue/reports/ashigaru2_report.yaml
worker_id: ashigaru2
task_id: cmd_060_task_002
timestamp: "2026-02-15T10:15:00"
status: done
result:
  summary: "Backtests complete for ETH and SOL"
  files_modified:
    - "results/eth_usdt_top5.csv"
    - "results/sol_usdt_top5.csv"
  notes: "SOL showed higher volatility than expected"
skill_candidate:
  found: false

Every report requires a skill_candidate field — if the foot soldier notices a repeatable pattern that could be turned into a reusable skill, it flags it. This is how the system improves over time.
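A report can be checked for its required fields before the Karo is pinged. This is a rough sketch: validate_report is a hypothetical helper, and it uses grep instead of a YAML parser, so it only confirms that each top-level key from the example is present.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the report contains every top-level key
# shown in the example report. Missing keys are named on stderr.
validate_report() {
    local file="$1" key
    for key in worker_id task_id timestamp status result skill_candidate; do
        if ! grep -q "^${key}:" "$file"; then
            printf 'missing field: %s\n' "$key" >&2
            return 1
        fi
    done
    return 0
}

# Example:
#   validate_report queue/reports/ashigaru2_report.yaml && echo "report ok"
```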

The Watchdog: Auto-Recovery

Agents crash. Context windows fill up. Sessions hang. The watchdog script runs via crontab every five minutes and checks every pane:

# crontab entry
*/5 * * * * /path/to/tools/watchdog.sh

The watchdog does three things:

1. Detects idle agents with pending tasks

is_pane_idle() {
    # tmux target such as multiagent:0.2; falls back to a caller-set $pane
    local pane="${1:-$pane}"
    local content
    content=$(tmux capture-pane -t "$pane" -p | tail -5)

    if echo "$content" | grep -qE 'thinking|Effecting|Calculating'; then
        echo "busy"
    elif echo "$content" | grep -q '❯ '; then
        echo "idle"
    else
        echo "busy"
    fi
}

If an agent is idle but has unfinished tasks in its queue, the watchdog sends a nudge via send-keys.
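The nudge decision combines the pane state with the task file. A minimal sketch, assuming that "assigned" and "in_progress" are the two unfinished status values; has_pending_task and should_nudge are hypothetical names.

```shell
#!/usr/bin/env bash
# Succeeds if the task file exists and its status is not yet finished.
# Treating both "assigned" and "in_progress" as pending is an assumption.
has_pending_task() {
    local file="$1"
    [ -f "$file" ] &&
        grep -qE '^[[:space:]]*status:[[:space:]]*(assigned|in_progress)[[:space:]]*$' "$file"
}

# Nudge only when the pane is idle and work is waiting.
# pane_state is the string printed by is_pane_idle ("idle" or "busy").
should_nudge() {
    local pane_state="$1" task_file="$2"
    [ "$pane_state" = "idle" ] && has_pending_task "$task_file"
}

# Example:
#   should_nudge "$(is_pane_idle multiagent:0.2)" queue/tasks/ashigaru2.yaml
```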

2. Detects crashed sessions

If Claude Code isn't running in a pane (just a shell prompt), the watchdog restarts it with the instruction file pre-loaded.
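One way to spot a pane that has fallen back to a bare shell is to ask tmux for the pane's foreground command. This is a sketch: is_shell_only is a hypothetical name and the list of shell names is an assumption to adjust for your environment. The restart command is left out because it depends on how your instruction files are loaded.

```shell
#!/usr/bin/env bash
# Succeeds when the pane's foreground command is a plain shell,
# which suggests that Claude Code has exited.
is_shell_only() {
    local pane="$1" cmd
    cmd=$(tmux display-message -p -t "$pane" '#{pane_current_command}')
    case "$cmd" in
        bash|zsh|sh|fish|-bash|-zsh) return 0 ;;  # assumed list of shell names
        *)                           return 1 ;;
    esac
}

# Example:
#   is_shell_only multiagent:0.3 && echo "pane 0.3 needs a restart"
```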

3. Detects deadlocks

The trickiest case: all agents are idle, no pending tasks in queues, but the dashboard shows work in progress. This means something got lost — a report wasn't sent, a task wasn't assigned, or the Karo dropped the ball after a context window reset.

When this happens, the watchdog escalates to the Shogun's monitoring pane, not the Karo. The Shogun decides what to do next. This prevents the Karo from making decisions without the Shogun's knowledge.
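The deadlock condition is a three-part predicate. The sketch below takes the three counts as arguments and leaves open how they are gathered; is_deadlock is a hypothetical name.

```shell
#!/usr/bin/env bash
# Deadlock as described above: every agent idle, no pending tasks in any queue,
# yet the dashboard still lists work in progress.
is_deadlock() {
    local busy_agents="$1" pending_tasks="$2" dashboard_in_progress="$3"
    [ "$busy_agents" -eq 0 ] &&
        [ "$pending_tasks" -eq 0 ] &&
        [ "$dashboard_in_progress" -gt 0 ]
}

# Example: escalate to the Shogun's monitoring pane, never to the Karo.
#   if is_deadlock "$busy" "$pending" "$in_progress"; then
#       tmux send-keys -t shogun:0.1 'Watchdog: possible deadlock. Check dashboard.md'
#       tmux send-keys -t shogun:0.1 Enter
#   fi
```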

MAGI: The Quality Gate

Three of the nine agents are reserved for MAGI — a consensus review system named after the supercomputer in Evangelion.

Pane  Persona                Evaluation Lens
0.6   MELCHIOR (Scientist)   Data accuracy, logic, factual correctness
0.7   BALTHAZAR (Mother)     Safety, sustainability, risk assessment
0.8   CASPER (Woman)         Gut instinct, honesty, reader experience

All three review independently — no peeking at each other's opinions. Then they vote: approve, reject, or conditional. Two out of three must approve.

CASPER is the hardest to please. She catches things like:

  • Paragraphs that are all the same length (AI writing pattern)
  • Lists with exactly three items (AI loves threes)
  • Opening sentences that sound like fortune cookies
  • Technical content that's accurate but boring

My first article went through five rounds of MAGI rejection before passing. Annoying? Yes. Worth it? Also yes.

How MAGI Gets Triggered

# Shogun sends the same question to all three panes simultaneously
tmux send-keys -t multiagent:0.6 'Review article_17. Read queue/magi_question.yaml'
tmux send-keys -t multiagent:0.6 Enter
tmux send-keys -t multiagent:0.7 'Review article_17. Read queue/magi_question.yaml'
tmux send-keys -t multiagent:0.7 Enter
tmux send-keys -t multiagent:0.8 'Review article_17. Read queue/magi_question.yaml'
tmux send-keys -t multiagent:0.8 Enter

Each persona writes their verdict to queue/reports/magi_{name}.yaml. The Shogun collects all three and makes the final call.
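The two-out-of-three rule can be tallied mechanically before the Shogun makes its call. A minimal sketch, assuming each verdict file has a top-level verdict: key (a hypothetical field name) and that "conditional" does not count as an approval.

```shell
#!/usr/bin/env bash
# Succeeds when at least two of the given verdict files say "approve".
# The "verdict:" key is an assumed field name; "conditional" and "reject"
# are both treated as not approved.
magi_passes() {
    local approvals=0 file
    for file in "$@"; do
        if grep -qE '^verdict:[[:space:]]*approve[[:space:]]*$' "$file"; then
            approvals=$((approvals + 1))
        fi
    done
    [ "$approvals" -ge 2 ]
}

# Example (file names follow the magi_{name}.yaml pattern):
#   magi_passes queue/reports/magi_melchior.yaml \
#               queue/reports/magi_balthazar.yaml \
#               queue/reports/magi_casper.yaml && echo "approved"
```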

What Broke

Context Exhaustion

The Karo runs out of context window mid-task more than anyone else. It's the bottleneck — every report passes through it, every task decomposition happens in it, every dashboard update goes through it.

When the Karo's context fills up and gets compacted, it loses the nuance of what's in progress. Reports get mishandled. Tasks get assigned to the wrong foot soldier. The recovery takes longer than the original task.

The fix: monitor context usage and rotate the Karo proactively at 30% remaining. Don't wait for it to hit the wall.

Race Conditions

Two foot soldiers editing the same file at the same time. This happened three times before I added the dedicated-file rule. Now each foot soldier has its own task file, its own report file, and a hard rule: if your task requires editing a file that another soldier might touch, report status: blocked and let the Karo figure out sequencing.

API Rate Limits

Five foot soldiers working simultaneously hit Claude's rate limits. The solution: maximum four agents active at once. If there are five tasks, deploy four, wait for the first to finish, then deploy the fifth.

The Karo manages this automatically — it counts how many foot soldiers are currently in_progress and holds the rest in queue.
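The cap amounts to counting in_progress task files before deploying another one. In the real system the Karo does this itself; the sketch below only shows the same check as a shell helper. count_active and can_deploy are hypothetical names, and the grep assumes the status: line format from the task example.

```shell
#!/usr/bin/env bash
MAX_ACTIVE=4  # the cap described above

# Print how many task files in a directory have status: in_progress.
count_active() {
    local dir="$1" n=0 file
    for file in "$dir"/*.yaml; do
        [ -f "$file" ] || continue  # skips the unexpanded glob when the dir is empty
        if grep -qE '^[[:space:]]*status:[[:space:]]*in_progress[[:space:]]*$' "$file"; then
            n=$((n + 1))
        fi
    done
    printf '%s\n' "$n"
}

# Succeeds while there is room for one more active foot soldier.
can_deploy() {
    [ "$(count_active "$1")" -lt "$MAX_ACTIVE" ]
}

# Example:
#   can_deploy queue/tasks && echo "deploy the next task"
```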

The Shogun Doing Code Review

I mentioned this already, but it's the most important lesson. The Shogun tried to be helpful by reading code directly. This violated the hierarchy, created merge conflicts, and confused the Karo. Now there's a rule enforced by a pre-prompt hook: if the Shogun attempts to read source code files, the hook blocks it and logs a violation.

Watchdog False Positives

An early version of the watchdog couldn't distinguish between "Claude Code is idle at the prompt" and "Claude Code hasn't started yet." It would restart agents that were perfectly fine, killing their context. The fix was parsing the pane content more carefully: looking for the Claude Code prompt character (❯) versus a shell prompt.

Cost and Performance

Running Cost

Nine agents don't all run simultaneously. On a typical day:

  • Shogun: 2-3 activations (receive command, check dashboard, handle report)
  • Karo: 5-10 activations (decompose tasks, process reports, update dashboard)
  • Foot soldiers: 1-2 tasks each, 4 max at once
  • MAGI: triggered only for quality reviews (maybe once a day)

Idle agents cost zero. The send-keys approach means you only pay for work that's actually happening.

Throughput

Real example: deploying sixteen articles across three platforms (note.com, Zenn, dev.to) in three weeks. Five foot soldiers writing simultaneously — three drafting articles for different platforms, one running backtests, one building the publish automation. The Karo coordinating all of them through YAML task files.

A single agent would have taken three times as long. The hierarchy prevents the coordination chaos that makes naive multi-agent setups slower than a single agent.

Lessons Learned

Start with two agents, not nine. One commander, one worker. Get the YAML protocol working. Get send-keys reliable. Then add more workers one at a time.

The hierarchy isn't overhead, it's load-bearing. I tried a flat structure first. Agents duplicated work, overwrote each other's files, and sent conflicting reports. The three-layer hierarchy (Shogun → Karo → foot soldiers) eliminated 90% of coordination failures.

Idle is free, restart is expensive. An idle Claude Code session at the prompt consumes zero API tokens. A restarted session loses its entire context — all the code understanding, task history, and project knowledge. Keep agents idle, don't exit them.

The watchdog is essential. Without it, a single crashed agent can stall the entire pipeline. The five-minute check interval catches problems before they cascade.

Context window is the real bottleneck. Not speed, not rate limits — context. The Karo processes every report and every task. It fills up fastest. Plan for rotation.

YAML is the right protocol. I considered JSON, plain text, and structured tool calls. YAML won because it's human-readable (I debug by reading the files), AI-readable (Claude handles YAML natively), and diff-friendly (git shows clean diffs).

No polling. Ever. Event-driven communication via send-keys is the single most important cost optimization. One agent polling every 30 seconds would cost more per day than all nine agents doing actual work.
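The scale of the polling problem is easy to work out. The snippet below only counts requests; what each request costs depends on context size and pricing, which are not covered here.

```shell
#!/usr/bin/env bash
# Number of polls one agent makes per day at a given interval in seconds.
polls_per_day() {
    local interval_seconds="$1"
    echo $(( 24 * 60 * 60 / interval_seconds ))
}

polls_per_day 30   # prints 2880
```

That is 2,880 requests per day from a single agent that has nothing to do, each one resending that agent's context.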

Try It

The full system runs on any machine with tmux and Claude Code installed. The instruction files, YAML protocol, and watchdog script are the only infrastructure. No Docker, no Kubernetes, no cloud services.

If you want to experiment, start with this:

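# Create a detached session; its first pane is the commander
tmux new-session -d -s multiagent -n commander

# Split the window to add a second pane for one worker
tmux split-window -t multiagent

# Attach so you can see both panes
tmux attach -t multiagent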

# In the commander pane, start Claude Code with an instruction file
# In the worker pane, start Claude Code with a different instruction file
# Write a task YAML, send-keys to the worker, see what happens

Scale up when the two-agent version works. The hierarchy will earn its keep around agent number four.


Built with Claude Code. The system described here manages a crypto trading bot, a technical article pipeline, and a Telegram signal service — all coordinated through YAML files and tmux.
