GitHub shipped `/fleet` for parallel subagent dispatch in Copilot CLI earlier this year. It works. The problem is what happens after: you get code from multiple agents and no structured way to know if any of it actually works.

I built Copilot Swarm Orchestrator to fill that gap. It wraps `copilot -p` as isolated subprocesses, each on its own git branch, and verifies every agent's output against its session transcript before anything merges.
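The per-agent isolation can be sketched roughly like this. The function names (`agentBranch`, `buildAgentArgs`, `launchAgent`) and the branch naming scheme are illustrative assumptions, not the orchestrator's actual API; `-p` is the real Copilot CLI flag for a non-interactive prompt.

```typescript
import { spawn, type ChildProcess } from "node:child_process";

// Hypothetical branch naming: one branch per plan step.
function agentBranch(stepId: number): string {
  return `swarm/step-${stepId}`;
}

// `copilot -p <prompt>` runs Copilot CLI non-interactively with the prompt.
function buildAgentArgs(prompt: string): string[] {
  return ["-p", prompt];
}

// Assumes the caller has already created a worktree and checked out
// the branch returned by agentBranch(stepId) at `cwd`.
function launchAgent(stepId: number, prompt: string, cwd: string): ChildProcess {
  return spawn("copilot", buildAgentArgs(prompt), { cwd, stdio: "pipe" });
}
```

Because each subprocess gets its own working directory and branch, agents can't clobber each other's edits; the merge happens only after verification.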
## What verification looks like
The orchestrator captures each agent's `/share` transcript and parses it for concrete evidence: commit SHAs, test runner output, build markers, file changes. Every claim the agent makes gets cross-referenced against that evidence. If the agent says "all tests pass" but the transcript shows a test failure, the step fails verification.
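A minimal sketch of that evidence cross-check, assuming plain-text transcripts. The regexes (a 40-hex-char commit SHA, a Mocha-style `N failing` summary) and the claim check are illustrative guesses at what the real parser looks for:

```typescript
interface Evidence {
  commits: string[];
  testFailures: number;
}

// Scan a transcript for hard evidence: commit SHAs and test-failure counts.
function extractEvidence(transcript: string): Evidence {
  const commits = [...transcript.matchAll(/\b[0-9a-f]{40}\b/g)].map(m => m[0]);
  const fail = transcript.match(/(\d+)\s+failing/); // Mocha-style summary line
  return { commits, testFailures: fail ? parseInt(fail[1] ?? "0", 10) : 0 };
}

// Cross-reference an agent's claim against transcript evidence.
function verifyClaim(claim: string, ev: Evidence): boolean {
  // If the agent claims all tests pass, the transcript must show zero failures.
  if (/all tests pass/i.test(claim)) return ev.testFailures === 0;
  return true; // claims with no checkable evidence are not failed outright
}
```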
Failed steps don't just get retried blindly. The Repair Agent classifies the failure (build error, test failure, missing artifact, dependency issue, timeout) and applies a strategy specific to that failure type. Context accumulates across retries so the agent doesn't repeat the same mistake.
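The classify-then-retry loop might look like the following sketch. The marker strings in `classifyFailure` and the prompt format in `repairPrompt` are assumptions for illustration; the source only specifies the five failure categories and that retry context accumulates:

```typescript
type FailureType = "build" | "test" | "missing-artifact" | "dependency" | "timeout";

// Map a failure log to one of the orchestrator's five failure classes.
// The patterns here are illustrative, not the real classifier's rules.
function classifyFailure(log: string): FailureType {
  if (/timed? ?out/i.test(log)) return "timeout";
  if (/error TS\d+|build failed|compilation failed/i.test(log)) return "build";
  if (/\d+\s+failing|AssertionError/i.test(log)) return "test";
  if (/cannot find module/i.test(log)) return "dependency";
  return "missing-artifact";
}

// Accumulate prior failures into the next attempt's prompt so the
// agent doesn't repeat the same mistake.
function repairPrompt(basePrompt: string, history: string[]): string {
  const notes = history.map((h, i) => `Previous attempt ${i + 1} failed: ${h}`);
  return [basePrompt, ...notes].join("\n");
}
```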
## Quality gates
After merge, six automated gates scan the generated code:
| Gate | What it catches |
|---|---|
| Scaffold leftovers | TODO placeholders, Lorem ipsum |
| Duplicate detection | Repeated code blocks |
| Hardcoded config | Magic strings and values |
| README drift | Claims that don't match actual code |
| Test isolation | Cross-test dependencies and shared state |
| Runtime correctness | Execution-time failures |
Gates that fail can auto-inject follow-up remediation steps. The orchestrator spawns a targeted Copilot session to fix what the gate flagged.
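Here is a sketch of the gate-to-remediation handoff, using the scaffold-leftovers gate as the example. The `GateResult` and `Step` shapes and the prompt wording are hypothetical; the source only states that a failed gate can inject a targeted follow-up step:

```typescript
interface Step {
  id: string;
  prompt: string;
}

interface GateResult {
  passed: boolean;
  findings: string[];
}

// Scaffold-leftovers gate: flag files still containing placeholder text.
function scaffoldGate(files: Record<string, string>): GateResult {
  const findings = Object.entries(files)
    .filter(([, body]) => /TODO|Lorem ipsum/i.test(body))
    .map(([name]) => name);
  return { passed: findings.length === 0, findings };
}

// A failed gate injects a remediation step targeted at its findings.
function remediationStep(gateName: string, result: GateResult): Step | null {
  if (result.passed) return null;
  return {
    id: `fix-${gateName}`,
    prompt: `Fix ${gateName} findings in: ${result.findings.join(", ")}`,
  };
}
```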
## Cost tracking
Every `copilot -p` invocation burns a premium request. Model multipliers compound this: o3 costs 20x per invocation, o4-mini costs 5x. An 8-step plan with retries on a 20x model can consume a month's Pro allowance in one run.
The cost estimator predicts consumption before execution starts, using model multipliers and historical failure rates from a persistent knowledge base. You can preview the estimate and exit, or set a hard budget that aborts if the estimate exceeds it. After execution, per-step attribution shows exactly where requests went.
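The arithmetic behind the estimate can be sketched as below. The retry-bound formula and the 300-requests-per-month Pro allowance are my assumptions (300 is GitHub's published Copilot Pro figure); the numbers mirror the example output that follows, not the estimator's actual model:

```typescript
// Estimate premium-request consumption for a plan.
// low: every step succeeds first try; high: pessimistic retry bound.
function estimateRequests(steps: number, multiplier: number, retryRate: number) {
  const low = steps * multiplier;
  const high = Math.ceil(low * (1 + 2 * retryRate)); // assumed pessimistic bound
  return { low, high };
}

// Express a request count as a percentage of the monthly allowance
// (assumed: 300 premium requests/month on Copilot Pro).
function budgetImpact(requests: number, allowance = 300): number {
  return Math.round((requests / allowance) * 100);
}
```

With 8 steps on a 5x model and a ~30% historical retry rate, this gives 40-64 requests, or roughly 13-21% of the allowance, in line with the example below.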
Example cost estimate output for the dashboard-showcase demo:

```
Plan Analysis:
  Steps: 8 (4 parallel waves)
  Model: o4-mini (5x multiplier)
  Estimated requests: 40-65 (accounting for ~30% retry rate)
  Budget impact: ~13-22% of monthly Pro allocation

Per-step breakdown:
  Step 1 (scaffold): 5 requests (low retry probability)
  Step 2 (API routes): 8 requests (moderate complexity)
  Step 3 (chart logic): 10 requests (high retry probability)
  ...
```
## v3.2: the speed release
The latest release replaced the wave-barrier scheduler with greedy scheduling. Steps launch the moment their dependencies resolve instead of waiting for an entire wave to finish.
Other changes in this release:
- Prompt compression extracts shared boilerplate into `.copilot-instructions.md` (which Copilot CLI reads natively), cutting ~60% of repeated tokens per step
- Octopus merge for parallel branch completion: one merge commit instead of N
- Event-driven dependency resolution via `EventEmitter` instead of file polling
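Greedy, event-driven scheduling can be sketched in a few lines: each completed step emits an event, and any step whose dependencies are all satisfied launches immediately, with no wave barrier. The `Step` shape and function names are illustrative, not the orchestrator's real interfaces:

```typescript
import { EventEmitter } from "node:events";

interface Step {
  id: string;
  deps: string[];
}

// Greedy scheduler: launch a step the moment all its dependencies are done.
function schedule(steps: Step[], run: (id: string) => void): EventEmitter {
  const bus = new EventEmitter();
  const done = new Set<string>();
  const launched = new Set<string>();

  const tryLaunch = () => {
    for (const s of steps) {
      if (!launched.has(s.id) && s.deps.every(d => done.has(d))) {
        launched.add(s.id);
        run(s.id); // in the real orchestrator this spawns a copilot -p subprocess
      }
    }
  };

  // Each completed step emits "done", which may unblock others immediately.
  bus.on("done", (id: string) => {
    done.add(id);
    tryLaunch();
  });

  tryLaunch(); // steps with no dependencies launch right away
  return bus;
}
```

Callers emit `bus.emit("done", stepId)` when a step's verification passes; dependents start without waiting for the rest of their wave.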
The dashboard-showcase demo (4 agents building a React + Chart.js + Express app with 27 tests) dropped from 8m 48s to 7m 56s.
## By the numbers
| Metric | Value |
|---|---|
| TypeScript source files | 71 |
| Lines of code | 17,903 |
| Tests passing | 649 (Mocha + Node.js assert) |
| Built-in demo scenarios | 6 (1-min smoke test to 40-min SaaS MVP) |
| Contributors | 1 |
| License | ISC |
The quickest way to see it work:

```bash
npm start demo-fast
```

Runs two parallel agents in about a minute.