The serial bottleneck
You have a plan with six batches of AI-driven work: build the auth module, write its tests, scaffold the dashboard, add the API routes, wire up the middleware, write the integration tests. Batches 1–3 have nothing to do with batches 4–6. No shared files, no dependency chain, no ordering constraint.
But they run one at a time. Twenty minutes of wall-clock time for work that could finish in ten.
This is the embarrassingly parallel problem. A single Claude Code session is inherently serial — it processes one task, commits, moves to the next. If your batches are independent, you're paying a serial tax for no reason.
The pattern: do it by hand
The fix is git worktrees. A worktree gives you a second (or third, or fourth) working directory for the same repository, each checked out on its own branch. Two Claude Code sessions can work simultaneously in two worktrees without ever touching each other's files.
The manual version is about 15 lines of shell (directory and branch names here are illustrative):

```bash
# Two isolated worktrees, each on a fresh branch cut from HEAD
git worktree add ../stream-a -b stream-a
git worktree add ../stream-b -b stream-b

# One headless Claude Code session per worktree, both in the background
(cd ../stream-a && claude --headless -p "Execute batches 1-3 of the plan") &
(cd ../stream-b && claude --headless -p "Execute batches 4-6 of the plan") &
wait  # block until both sessions finish

# Merge B into A, then A (now holding both sets of changes) into the original branch
git -C ../stream-a merge stream-b
git merge stream-a

# Cleanup: remove the worktrees and delete the temporary branches
git worktree remove ../stream-a
git worktree remove ../stream-b
git branch -d stream-a stream-b
```
Step by step:
- `git worktree add` creates a new working directory on a fresh branch. Both branches start from HEAD, so they share an identical starting point.
- `claude --headless` launches Claude Code without a terminal UI. The `-p` flag passes a prompt; `&` sends each session to the background.
- `wait` blocks until both background processes finish.
- The merge brings Stream B's changes into Stream A, then Stream A, now containing both sets of changes, back into your original branch.
- Cleanup removes the worktrees and their directories.
That's the entire pattern. Each session has its own working directory, its own branch, complete isolation. No file conflicts mid-flight.
It works — but there's a lot that can go wrong. Hit Ctrl+C and you have orphaned claude processes in the background. Forget cleanup and you have stale worktrees cluttering your repo. A merge conflict leaves you stuck with no error handling and no visibility into what happened.
Which is why I automated it.
Where it breaks
Before the automation, some honest caveats.
Batch dependencies. If batch 4 needs output from batch 2, splitting them across streams will cause failures. You need to know your dependency graph before splitting. Independent batches parallelize cleanly; dependent ones don't.
Merge conflicts. Isolated worktrees prevent simultaneous file conflicts — neither session can see the other's uncommitted changes. But they can't prevent logical conflicts. If both sessions modify the same function in different ways, the merge will fail. That's a feature, not a bug: you want to know about it rather than have it silently auto-resolved.
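Surfacing a conflict rather than auto-resolving it comes down to checking the merge's exit status. A minimal sketch, with a function name and arguments of my own invention (not cast-parallel's actual code):

```shell
# Merge one stream's branch into another stream's worktree.
# On conflict, stop and leave the worktree untouched so the
# conflicting files can be inspected and fixed by hand.
# Usage: merge_streams <worktree-dir> <branch-to-merge>
merge_streams() {
  if ! git -C "$1" merge --no-edit "$2"; then
    echo "merge conflict bringing $2 into $1; worktree preserved" >&2
    return 1
  fi
}
```

The nonzero return is the whole point: a caller can halt the pipeline there instead of papering over a logical conflict.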
Double API cost. Two concurrent sessions means double the token usage. For large plans with 6+ batches, the time savings are worth it. For a 3-batch plan, probably not.
Automating it: cast-parallel
I wrapped this pattern into a script called cast-parallel. Before running anything, preview the split with a dry run:
```bash
# invocation reconstructed from the article; the plan path and
# the --dry-run flag name are assumptions
cast-parallel --dry-run plan.md
```
The script reads an Agent Dispatch Manifest — a JSON block embedded in a plan file — counts the batches, and splits them at the midpoint. Override with --split N to force a different cut point.
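The midpoint arithmetic itself is trivial. A sketch of the default cut point, hard-coding the batch count that the real script reads out of the manifest JSON:

```shell
# Hard-coded count stands in for parsing the plan's JSON manifest
total=6
split=$(( (total + 1) / 2 ))  # default midpoint; --split N would override this
echo "stream A: batches 1-$split"
echo "stream B: batches $((split + 1))-$total"
```

With six batches this yields streams of 1-3 and 4-6; an odd count puts the extra batch in stream A.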
Here's what it adds on top of the manual approach:
[diagram block]
A few design decisions worth calling out:
- Subprocess guard: Checks an environment variable at startup and exits immediately if a parent CAST session spawned the script — preventing recursive execution inside agent chains.
- Trap handler: Catches `INT` and `TERM` signals, kills both background processes, and removes worktrees. No orphaned processes, no stale directories.
- PID-based branch names (e.g., `cast-parallel-a-12345`): Prevents collisions when running multiple parallel executions against the same repo.
- Merge conflicts are never auto-resolved: Worktrees are preserved so you can inspect and fix them yourself.
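Sketched in shell, the guard, trap, and naming ideas look roughly like this (the environment variable and branch names are assumptions, not cast-parallel's actual identifiers):

```shell
# Subprocess guard: refuse to run if a parent CAST session spawned us
# (CAST_PARALLEL is an assumed variable name)
if [ -n "${CAST_PARALLEL:-}" ]; then
  exit 0
fi
export CAST_PARALLEL=1

# $$ (this script's PID) keeps branch names unique across concurrent runs
branch_a="cast-parallel-a-$$"
branch_b="cast-parallel-b-$$"

# On Ctrl+C or kill: stop both sessions, then remove the worktrees
cleanup() {
  kill $pid_a $pid_b 2>/dev/null || true   # unquoted: unset pids just disappear
  git worktree remove --force "../$branch_a" 2>/dev/null || true
  git worktree remove --force "../$branch_b" 2>/dev/null || true
}
trap cleanup INT TERM
```

The `|| true` guards keep cleanup from failing partway through if one session already exited or a worktree was never created.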
Optional database logging records events at each stage (parallel_start, parallel_streams_done, parallel_complete, parallel_fail, parallel_merge_conflict) for observability. If the logger isn't present, it's silently skipped.
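That present-or-silently-skipped behavior is easy to express as a wrapper. A sketch using a hypothetical `cast-log` command (the real logger's interface isn't described here):

```shell
# Log an event if a logger is on PATH; otherwise succeed silently
log_event() {
  command -v cast-log >/dev/null 2>&1 || return 0   # no logger: skip
  cast-log "$1"
}

log_event parallel_start   # no-op unless cast-log exists
```

Because `command -v` also finds shell functions, the wrapper is trivially testable by stubbing `cast-log` as a function.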
When to use this pattern
Good fit: Large plans with 6+ independent batches. The wall-clock savings scale linearly — a 20-minute plan becomes a 10-minute plan.
Not worth it: Small plans under 4 batches. Worktree setup, merge, and cleanup overhead eats into the savings.
Don't use: Plans with strict batch ordering where later batches depend on earlier ones. Use sequential execution instead.
Always dry-run first. Preview the split, verify the batches in each stream are truly independent, and adjust with --split N if the auto-midpoint is wrong.
Try it
The pattern is simple enough to implement by hand. The automation handles the parts that break — signal traps, PID tracking, merge conflict preservation, cleanup. If you're already running Claude Code on multi-batch plans, this is a low-effort way to cut your wall-clock time roughly in half.
The repo is at github.com/ek33450505/cast-parallel. Part of the CAST ecosystem, but works standalone with just the Claude CLI and git. MIT licensed, contributions welcome.