DEV Community

Kyle Million


Forked Agent Architecture: Run 4 Claude Code Agents in Parallel Without Deadlocks


Most Claude Code setups run one agent at a time. That's fine for small tasks. For anything that decomposes — research across 20 files, writing 8 independent modules, processing 50 API responses — sequential execution means you're waiting on a single thread when you have a machine that could run four.

The naive solution is "just spawn more agents." The naive solution deadlocks.

This post covers the pattern that actually works in production: forked agent architecture. One coordinator, N independent workers, filesystem-mediated handoffs, deterministic collection. No shared state. No race conditions.


Why Naive Parallelism Fails

When two Claude Code agents write to the same file at the same time, one wins and one loses. When both read the same state before either writes, you get duplicate work. When neither checks what the other is doing, you get a coordination nightmare that's harder to debug than sequential execution was slow.

The specific failure modes I've seen:

Race condition on output: Agent A and Agent B both write outputs/result.md. Agent B's write lands 200ms after Agent A's and overwrites it. You get B's work only, with no indication A ran at all.

Duplicate execution: Coordinator spawns workers, workers read from a shared task queue, two workers grab the same task because neither locked it first. You pay for the same work twice.

State corruption: Worker A reads a config file, Worker B reads the same config file, Worker A writes an update. Worker B's subsequent write is based on the stale read, not A's update. The final state is wrong and neither agent knows it.

All of these come from shared mutable state. The fix is to eliminate shared mutable state entirely.


The Forked Agent Pattern

The pattern has three components: a coordinator, isolated worker contexts, and a merge step.

Coordinator
├── splits task into N independent subtasks
├── assigns each subtask a unique ID and isolated output directory
├── spawns N workers, each with its own CLAUDE.md and context
└── waits for all workers to write .done markers, then merges

Worker_1 → ~/intuitek/work/tasks/001/  (reads only: task_spec.md)
Worker_2 → ~/intuitek/work/tasks/002/  (reads only: task_spec.md)
Worker_3 → ~/intuitek/work/tasks/003/  (reads only: task_spec.md)
Worker_N → ~/intuitek/work/tasks/N/    (reads only: task_spec.md)
           ↓
        outputs/001.md
        outputs/002.md  
        outputs/003.md
        outputs/N.md
           ↓
     Coordinator merges

Each worker reads exactly one file (its task spec) and writes exactly one file (its output). No worker reads another worker's output. No worker writes to a shared location. The coordinator is the only process that sees all outputs.


Implementation

Step 1: Task Decomposition

The coordinator writes one spec file per worker before spawning anything:

# coordinator.sh — runs in ~/intuitek/

TASK_BASE="$HOME/intuitek/work/tasks/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$TASK_BASE"/{001,002,003,004}/outputs

# Write spec for each worker
cat > "$TASK_BASE/001/task_spec.md" << 'EOF'
## Task 001
Analyze files in ~/intuitek/capabilities/revenue/ 
Identify the top 3 revenue blockers. 
Write findings to outputs/001.md.
When complete, write "DONE" to outputs/001.done
EOF

cat > "$TASK_BASE/002/task_spec.md" << 'EOF'
## Task 002
Analyze files in ~/intuitek/capabilities/comms/
Identify the top 3 communication gaps.
Write findings to outputs/002.md.
When complete, write "DONE" to outputs/002.done
EOF

# ... repeat for 003, 004
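The repeated heredocs can be collapsed into a loop over the task list. A sketch below; tasks 001 and 002 are the examples from above, while the 003/004 directories and subjects are invented placeholders for illustration:

```shell
#!/bin/sh
# Sketch: generate one task_spec.md per worker from a task list instead
# of repeating heredocs. Entries are "id|directory|subject" triples;
# 003 and 004 here are hypothetical examples, not from the article.
TASK_BASE="${TASK_BASE:-/tmp/tasks_demo}"

for entry in \
    "001|~/intuitek/capabilities/revenue/|revenue blockers" \
    "002|~/intuitek/capabilities/comms/|communication gaps" \
    "003|~/intuitek/capabilities/ops/|operational bottlenecks" \
    "004|~/intuitek/capabilities/growth/|growth constraints"
do
    id=${entry%%|*}; rest=${entry#*|}
    dir=${rest%%|*}; subject=${rest#*|}
    mkdir -p "$TASK_BASE/$id/outputs"
    cat > "$TASK_BASE/$id/task_spec.md" << EOF
## Task $id
Analyze files in $dir
Identify the top 3 $subject.
Write findings to outputs/$id.md.
When complete, write "DONE" to outputs/$id.done
EOF
done
```

This keeps every spec structurally identical, which matters later: the merge and validation steps assume each worker directory looks the same.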

Step 2: Spawn Workers in Parallel

# Spawn all workers in parallel
for i in 001 002 003 004; do
    WORKER_DIR="$TASK_BASE/$i"
    # stderr goes to a separate log so tool noise never corrupts the output
    claude -p "Read $WORKER_DIR/task_spec.md and execute the task exactly as specified." \
        --allowedTools "Read(*),Write(*),Bash(ls,cat,grep)" \
        --output-format text \
        > "$WORKER_DIR/outputs/${i}.md" 2> "$WORKER_DIR/worker.log" &
    echo "Spawned worker $i (PID $!)"
done

echo "All workers spawned. Waiting..."
wait
echo "All workers complete."

The & at the end of each claude -p call is what makes this parallel. wait blocks until all background processes finish. Each worker gets its own --allowedTools scope — they can only read and write within their assigned context.

Step 3: Deterministic Merge

# Merge step — coordinator only
MERGED="$HOME/intuitek/outputs/merged_analysis_$(date +%Y%m%d_%H%M%S).md"

echo "# Merged Analysis" > "$MERGED"
echo "Generated: $(date -u)" >> "$MERGED"
echo "" >> "$MERGED"

for i in 001 002 003 004; do
    echo "## Worker $i" >> "$MERGED"
    if [ -f "$TASK_BASE/$i/outputs/${i}.md" ]; then
        cat "$TASK_BASE/$i/outputs/${i}.md" >> "$MERGED"
    else
        echo "[Worker $i produced no output]" >> "$MERGED"
    fi
    echo "" >> "$MERGED"
done

echo "Merged output: $MERGED"

The merge step is pure file concatenation. No reasoning, no synthesis by the coordinator — that would require another LLM call and burn credits. If you need synthesis, schedule it as a separate, single-agent pass over the merged file.
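That follow-up pass is a single sequential call. A minimal sketch, guarded so it is a no-op on machines without the `claude` CLI on PATH; the synthesis prompt and output path are assumptions, not from the article:

```shell
#!/bin/sh
# Sketch: optional synthesis as a separate single-agent pass over the
# merged file. Runs after all parallel work is done, one call total.
MERGED="${MERGED:-$HOME/intuitek/outputs/merged_analysis.md}"
SYNTHESIS="${MERGED%.md}_synthesis.md"   # hypothetical output path

if command -v claude >/dev/null 2>&1; then
    claude -p "Read $MERGED and write a one-page synthesis: common themes, conflicts between workers, and a ranked action list." \
        --output-format text > "$SYNTHESIS"
else
    echo "claude CLI not found; skipping synthesis pass" >&2
fi
```

Keeping synthesis out of the merge means a failed LLM call never destroys the raw concatenated outputs.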


The .done Marker Pattern

For tasks with unpredictable duration, polling is safer than relying on wait:

# Each worker writes this when complete
echo "DONE" > "$WORKER_DIR/outputs/${TASK_ID}.done"

# Coordinator polls
MAX_WAIT=300  # 5 minutes
for i in 001 002 003 004; do
    elapsed=0
    while [ ! -f "$TASK_BASE/$i/outputs/${i}.done" ]; do
        sleep 5
        elapsed=$((elapsed + 5))
        if [ "$elapsed" -ge "$MAX_WAIT" ]; then
            echo "Worker $i timed out after ${MAX_WAIT}s" >> "$HOME/intuitek/logs/errors.log"
            break
        fi
    done
done

This pattern is essential when workers are Claude Code agents running in tmux panes rather than bare background processes. Tmux panes don't participate in shell job control — wait won't catch them. The .done marker is the only reliable signal.
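For tmux workers, the coordinator can simply wait until the marker count matches the worker count. A sketch, assuming markers live anywhere under the task base:

```shell
#!/bin/sh
# Sketch: wait until the .done marker count reaches the worker count.
# Needed when workers run in tmux panes; `wait` cannot see them.

wait_for_done() {
    # wait_for_done BASE EXPECTED MAX_WAIT: poll for *.done under BASE.
    base=$1; expected=$2; max_wait=$3; elapsed=0
    while :; do
        # $(( )) normalizes wc's whitespace-padded count to a bare integer
        count=$(( $(find "$base" -name '*.done' 2>/dev/null | wc -l) ))
        [ "$count" -ge "$expected" ] && return 0
        sleep 5
        elapsed=$((elapsed + 5))
        if [ "$elapsed" -ge "$max_wait" ]; then
            echo "Timed out: $count/$expected workers done" >&2
            return 1
        fi
    done
}

# Coordinator usage: wait_for_done "$TASK_BASE" 4 300
```

Unlike the per-worker loop above, this gives one global timeout rather than one per worker, so the worst-case wall-clock wait is bounded by MAX_WAIT, not N × MAX_WAIT.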


What Not to Do

Don't use a shared task queue without locks. If workers read from a common queue.txt to pick up tasks, you will get duplicate execution. Either pre-assign tasks before spawning (what this pattern does) or implement a lock file before any queue read.
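If you genuinely need a shared queue, the classic portable lock is an atomic `mkdir`. A sketch, assuming a hypothetical `queue.txt` holding one task ID per line:

```shell
#!/bin/sh
# Sketch: mkdir is atomic on POSIX filesystems, so a lock directory
# serializes queue access. queue.txt format (one task per line) is an
# assumption for illustration.

claim_next_task() {
    queue=$1
    lock="${queue}.lock"
    # Spin until we own the lock directory; only one worker can succeed.
    until mkdir "$lock" 2>/dev/null; do sleep 1; done
    task=$(head -n 1 "$queue")
    # Remove the claimed task from the queue before releasing the lock.
    tail -n +2 "$queue" > "${queue}.tmp" && mv "${queue}.tmp" "$queue"
    rmdir "$lock"
    printf '%s\n' "$task"
}
```

Even with the lock, pre-assignment is simpler: there is no stale-lock cleanup to worry about when a worker dies while holding the lock directory.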

Don't have workers read each other's outputs mid-run. Dependencies between workers turn parallel execution into a pipeline, which is a different pattern with different failure modes. If Worker 3 needs Worker 1's output, Worker 3 should run after Worker 1, not concurrently.

Don't share CLAUDE.md context between workers. Each worker should see only its task spec and the files it needs. Broad CLAUDE.md context in each worker wastes tokens and can cause workers to take on coordinator responsibilities they weren't assigned.


Production Considerations

Tmux for interactive workers. If you need to monitor workers in real time or your workers need interactive capability, spawn them in tmux windows rather than bare background processes:

tmux new-session -d -s "workers"
for i in 001 002 003 004; do
    tmux new-window -t "workers" -n "w$i"
    tmux send-keys -t "workers:w$i" "claude -p 'Read $TASK_BASE/$i/task_spec.md and execute.' --allowedTools 'Read(*),Write(*)'" Enter
done

Rate limiting. Four concurrent Claude Code sessions against a Max subscription will hit the hourly rate limit faster than one. Batch workers in groups of 2 if you're running close to the cap, or stagger spawn times by 30 seconds.
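Batching is a small wrapper around the Step 2 loop. In the sketch below, `spawn_worker` is a placeholder; substitute the `claude -p` invocation from Step 2 for its body:

```shell
#!/bin/sh
# Sketch: spawn workers in batches of BATCH, waiting between batches,
# to stay under the hourly rate limit. spawn_worker is a placeholder
# for the real claude -p call from Step 2.
BATCH=2

spawn_worker() {
    echo "worker $1 running"   # placeholder; replace with claude -p ...
}

in_flight=0
for i in 001 002 003 004; do
    spawn_worker "$i" &
    in_flight=$((in_flight + 1))
    if [ "$in_flight" -ge "$BATCH" ]; then
        wait            # block until the current batch finishes
        in_flight=0
    fi
done
wait                    # catch a trailing partial batch
```

The trade-off is throughput: each `wait` barrier idles finished workers until the slowest member of the batch completes.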

Error propagation. Workers don't know about each other. If Worker 2 fails, Workers 1, 3, and 4 continue. The coordinator must check for missing .done markers and missing output files before the merge step, and log failures explicitly.
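A minimal pre-merge check, written to collect every failure rather than stopping at the first:

```shell
#!/bin/sh
# Sketch: validate all workers before merging. Reports every missing
# .done marker and every missing/empty output in one pass, so the log
# shows the full failure picture.

check_workers() {
    base=$1; failed=""
    for i in 001 002 003 004; do
        if [ ! -f "$base/$i/outputs/$i.done" ]; then
            failed="$failed $i(no-done-marker)"
        elif [ ! -s "$base/$i/outputs/$i.md" ]; then
            failed="$failed $i(empty-output)"
        fi
    done
    [ -z "$failed" ] || echo "Workers failed:$failed"
}

# Coordinator usage, before the merge step:
#   check_workers "$TASK_BASE" >> "$HOME/intuitek/logs/errors.log"
```

The merge step's `[Worker $i produced no output]` fallback then becomes a second line of defense rather than the only signal that something went wrong.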


I packaged the full production implementation — including the coordinator script, worker spec templates, the .done polling loop, tmux spawn variant, and error handling — as a ClawMart skill:

Forked Agent Architecture — Production Parallel Execution

It's the pattern I use when any task in ~/intuitek/ can be split into independent subtasks. Works on a Max subscription without an API key.
