TL;DR: Swarm-IOSM is an orchestration engine for Claude Code that transforms complex development tasks into coordinated parallel work streams. It implements continuous dispatch scheduling (no wave barriers) and hierarchical file lock management, and it enforces IOSM quality gates before merge. Reported speedup: commonly 3-8x over sequential execution, depending on how much of the plan can actually run in parallel.
The Parallel Agent Problem
You're working on a complex feature. It needs:
- Codebase analysis to understand existing patterns
- Architecture design for the new system
- Implementation across 3 modules (independent)
- Integration tests
- Security audit
Traditional approach: One agent does everything sequentially. 15 hours of wall-clock time.
What if you could run analysis, design, and implementation in parallel? 4-6 hours.
But here's the catch: parallel AI agents need coordination. They can't all edit the same file. They need to share knowledge. And you need quality guarantees before merging their work.
That's what Swarm-IOSM solves.
What is Swarm-IOSM?
Swarm-IOSM is a Claude Code Skill that orchestrates parallel AI agent execution with built-in quality enforcement. It combines:
- Continuous Dispatch Loop — Tasks launch immediately when dependencies are met (no artificial wave barriers)
- File Lock Management — Hierarchical conflict detection prevents parallel write chaos
- PRD-Driven Planning — Structured requirements → decomposition → execution
- IOSM Quality Gates — Automated code quality, performance, and modularity checks
- Auto-Spawn Protocol — Agents discover new work during execution
Core Model
Touches → Locks → Gates → Done
A correctness model for parallel agent work:
- Declare what files you touch
- Acquire locks to prevent conflicts
- Pass quality gates
- Ship
Key Innovation: Continuous Dispatch
Traditional orchestration waits for entire "waves" to complete:
Wave 1: [T01, T02, T03] → Wait for ALL to finish
Wave 2: [T04, T05] → Can't start until Wave 1 done
Swarm-IOSM uses continuous scheduling:
T01 done → T04 starts IMMEDIATELY (even if T02, T03 still running)
This eliminates idle time and maximizes parallelism. Here's the dispatch algorithm:
while not gates_met:
    # 1. Collect ready tasks (deps satisfied, no conflicts)
    ready = [t for t in backlog if deps_satisfied(t) and not conflicts(t)]
    # 2. Classify by mode (background vs foreground)
    bg = [t for t in ready if can_auto_background(t)]
    fg = [t for t in ready if needs_user_input(t)]
    # 3. Dispatch batch (max 3-6 tasks)
    launch_parallel(bg[:6], mode='background')
    launch_parallel(fg[:2], mode='foreground')
    # 4. Monitor & spawn
    for report in collect_completed():
        spawn_candidates = parse_spawn_candidates(report)
        backlog.extend(deduplicate(spawn_candidates))
    # 5. Check gates
    if all_gates_pass():
        break
Result: Tasks launch as soon as they're ready, not when an arbitrary wave completes.
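To make the difference concrete, here is a minimal sketch that compares the two scheduling models on a small task graph. The task names, durations, and dependencies are hypothetical, and the sketch assumes unlimited parallel slots. It is not Swarm-IOSM's scheduler, only an illustration of why removing the barrier shortens wall-clock time.

```python
# Hypothetical task set: name -> (duration in hours, dependencies).
TASKS = {
    "T01": (1, []),
    "T02": (3, []),
    "T03": (2, []),
    "T04": (2, ["T01"]),
    "T05": (1, ["T02", "T03"]),
}


def continuous_finish_times(tasks):
    """Each task starts the moment its own dependencies finish."""
    finish = {}

    def finish_of(name):
        if name not in finish:
            duration, deps = tasks[name]
            start = max((finish_of(d) for d in deps), default=0)
            finish[name] = start + duration
        return finish[name]

    for name in tasks:
        finish_of(name)
    return finish


def wave_finish_times(tasks):
    """Each wave starts only after every task in the previous wave is done."""
    depth = {}

    def depth_of(name):
        if name not in depth:
            deps = tasks[name][1]
            depth[name] = 1 + max((depth_of(d) for d in deps), default=0)
        return depth[name]

    for name in tasks:
        depth_of(name)

    finish, wave_start = {}, 0
    for level in range(1, max(depth.values()) + 1):
        wave = [n for n in tasks if depth[n] == level]
        for n in wave:
            finish[n] = wave_start + tasks[n][0]
        wave_start = max(finish[n] for n in wave)  # the wave barrier
    return finish


cont = continuous_finish_times(TASKS)
wave = wave_finish_times(TASKS)
print(max(wave.values()), max(cont.values()))  # 5 4
```

In this toy graph T04 only needs T01, so under continuous dispatch it starts at hour 1 instead of waiting for T02 to finish at hour 3.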
Live Example: Adding Redis Caching
Let's walk through a real track from examples/demo-track/.
Problem
API endpoint /api/natal/chart has 450ms P95 latency. Database CPU at 75% during peak hours.
Goal
Add Redis caching to reduce latency to <200ms and achieve 80%+ cache hit rate.
Step 1: Create Track
/swarm-iosm new-track "Add Redis caching to API endpoints"
Claude generates:
- `PRD.md`: 10 sections (Problem, Goals, Requirements, Risks, IOSM Targets)
- `spec.md`: Technical design with acceptance tests
- `plan.md`: Task breakdown with dependencies
Generated plan (7 tasks):
T01: Analyze current performance (Explorer, 1h, read-only)
T02: Design caching strategy (Architect, 2h, foreground)
T03: Implement cache service (Implementer-A, 3h, background)
T04: Add caching to /natal endpoint (Implementer-B, 2h, background, after T03)
T05: Add caching to /transits endpoint (Implementer-C, 2h, background, after T03)
T06: Integration testing (TestRunner, 2h, background, after T04+T05)
T07: Security audit + merge (Integrator, 1h, foreground, after T06)
Step 2: Execute Plan
/swarm-iosm implement
Orchestrator creates continuous_dispatch_plan.md:
## Initial Ready Set
- T01 (Explorer, background)
## Expected Timeline
Batch 1: T01 → completes in 1h
Batch 2: T02 → completes in 2h (total: 3h)
Batch 3: T03 → completes in 3h (total: 6h)
Batch 4: T04, T05 (PARALLEL) → completes in 2h (total: 8h)
Batch 5: T06 → completes in 2h (total: 10h)
Batch 6: T07 → completes in 1h (total: 11h)
Serial estimate: 13h
Parallel estimate: 11h
Speedup: ~1.2x
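The serial and parallel estimates can be reproduced with a short critical-path calculation. This is a sketch, not the orchestrator's own estimator: the durations come from the generated plan, the "after" edges for T04 through T07 come from the plan as well, while the edges T01 → T02 → T03 are an assumption read off the batch timeline.

```python
from graphlib import TopologicalSorter

# Durations in hours, taken from the generated plan.
DURATION = {"T01": 1, "T02": 2, "T03": 3, "T04": 2, "T05": 2, "T06": 2, "T07": 1}

# task -> prerequisites. T02-after-T01 and T03-after-T02 are ASSUMED
# from the batch timeline; the rest follow the plan's "after" notes.
DEPS = {
    "T01": set(),
    "T02": {"T01"},
    "T03": {"T02"},
    "T04": {"T03"},
    "T05": {"T03"},
    "T06": {"T04", "T05"},
    "T07": {"T06"},
}


def estimate(duration, deps):
    """Return (serial_hours, parallel_hours) for a dependency graph."""
    finish = {}
    # static_order() yields every task after all of its prerequisites.
    for task in TopologicalSorter(deps).static_order():
        start = max((finish[d] for d in deps[task]), default=0)
        finish[task] = start + duration[task]
    return sum(duration.values()), max(finish.values())


serial, parallel = estimate(DURATION, DEPS)
print(f"serial={serial}h parallel={parallel}h speedup={serial / parallel:.2f}x")
# serial=13h parallel=11h speedup=1.18x
```

The plan is almost a straight chain, so only T04 and T05 overlap. That is why the estimated speedup is modest here.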
But wait — T01 discovers an N+1 query issue:
## SpawnCandidates (from T01 report)
| ID | Subtask | Touches | Effort | Severity |
|----|---------|---------|--------|----------|
| SC-01 | Optimize calculate_aspects N+1 query | `backend/core/astro/natal.py` | M | medium |
Orchestrator auto-spawns SC-01 and adjusts timeline.
Step 3: Integration & Quality Gates
/swarm-iosm integrate demo-add-caching
Generated iosm_report.md:
## Gate Evaluation Summary
| Gate | Target | Final | Status |
|------|--------|-------|--------|
| Gate-I (Code Quality) | ≥0.75 | 0.89 | ✅ PASS |
| Gate-O (Performance) | Tests pass | All pass | ✅ PASS |
| Gate-M (Modularity) | No circular deps | Pass | ✅ PASS |
| Gate-S (Simplicity) | API stable | N/A | ⚪ SKIP |
IOSM-Index: 0.85 ✅ (threshold: 0.80)
**Result:** APPROVED FOR PRODUCTION MERGE
Results
- ⚡ P95 latency: 450ms → 180ms (60% improvement)
- 🎯 Cache hit rate: 82%
- ✅ All tests passing (24 unit + 6 integration)
- 🔒 Zero production errors during rollout
- ⏱️ Total time: 9.25h parallel vs 16h+ sequential (~1.7x faster)
Technical Deep Dive
1. File Lock Management
Challenge: How do you prevent two agents from editing the same file simultaneously?
Solution: Hierarchical lock manager with folder/file awareness.
Lock rules:
def conflicts(lock_a: str, lock_b: str) -> bool:
    a, b = normalize(lock_a), normalize(lock_b)
    # Exact match
    if a == b:
        return True
    # Folder contains file
    if a.startswith(b + '/') or b.startswith(a + '/'):
        return True
    return False
Example:
## Lock Plan
Tasks with overlapping touches (sequential only):
- `backend/core/__init__.py`: T03, T04 → ❌ Cannot run parallel
- `backend/api/`: T05, T06 → ❌ Folder conflict
Safe parallel execution:
- `backend/auth.py` (T02) + `backend/payments.py` (T07) → ✅ No overlap
Read-only tasks: Always parallel (no locks needed).
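As a rough illustration of how these rules could produce a lock plan, the sketch below applies the same conflict test to every pair of tasks. The `normalize` helper is not shown in the original, so the version here (based on `posixpath.normpath`) is an assumption, and the task-to-path mapping is hypothetical.

```python
import posixpath
from itertools import combinations


def normalize(path):
    """ASSUMED normalization: collapse './' and '..', drop trailing slashes."""
    return posixpath.normpath(path)


def conflicts(lock_a, lock_b):
    a, b = normalize(lock_a), normalize(lock_b)
    # Same path, or one path is a folder that contains the other.
    return a == b or a.startswith(b + "/") or b.startswith(a + "/")


def conflicting_pairs(touches):
    """Return the set of task pairs that must not run in parallel."""
    pairs = set()
    for (t1, locks1), (t2, locks2) in combinations(sorted(touches.items()), 2):
        if any(conflicts(a, b) for a in locks1 for b in locks2):
            pairs.add((t1, t2))
    return pairs


# Hypothetical touches, loosely following the lock plan above.
TOUCHES = {
    "T02": ["backend/auth.py"],
    "T05": ["backend/api/"],
    "T06": ["backend/api/routes.py"],
    "T07": ["backend/payments.py"],
}
print(conflicting_pairs(TOUCHES))  # {('T05', 'T06')}
```

Appending `/` before the prefix test matters: without it, a lock on `backend/api` would wrongly collide with a sibling such as `backend/api_v2/`.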
2. IOSM Quality Gates
Four gates enforce production-grade quality:
Gate-I: Improve (Code Quality)
semantic_coherence: ≥0.95 # Clear naming, no magic numbers
duplication_max: ≤0.05 # Max 5% duplicate code
invariants_documented: true # Pre/post-conditions
todos_tracked: true # All TODOs in issue tracker
Measured by: AST analysis, clone detection, docstring coverage.
Gate-O: Optimize (Performance & Resilience)
latency_ms:
  p50: ≤100
  p95: ≤200
  p99: ≤500
error_budget_respected: true
chaos_tests_pass: true
no_obvious_inefficiencies: true # N+1 queries, memory leaks
Measured by: Load testing (locust, k6), chaos engineering, profiling.
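For the latency part of this gate, a check along the following lines would be enough to turn raw load-test samples into pass/fail results. It is a sketch under assumptions: the nearest-rank percentile definition and the function names are illustrative choices, not something the gate specification prescribes, and the sample data is made up.

```python
import math

# Thresholds in milliseconds, copied from the Gate-O config above.
LATENCY_LIMITS_MS = {50: 100, 95: 200, 99: 500}


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_latency(samples_ms, limits=LATENCY_LIMITS_MS):
    """Return {percentile: (observed_ms, passed)} for each configured limit."""
    results = {}
    for pct, limit in limits.items():
        observed = percentile(samples_ms, pct)
        results[pct] = (observed, observed <= limit)
    return results


# Hypothetical load-test latencies: 90 fast requests, 10 slow ones.
samples = [50] * 90 + [300] * 10
print(check_latency(samples))
# {50: (50, True), 95: (300, False), 99: (300, True)}
```

Here the median looks healthy while p95 fails, which is the kind of tail problem the separate percentile limits are meant to expose.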
Gate-M: Modularize (Clean Boundaries)
contracts_defined: 1.0 # 100% of modules
change_surface_max: 0.20 # ≤20% of codebase touched
no_circular_deps: true
coupling_acceptable: true
Measured by: Dependency graph analysis, interface stability.
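The `no_circular_deps` check can be approximated with the standard library alone. The sketch below assumes the import graph has already been extracted into a dictionary (how Swarm-IOSM builds that graph is not described here), and the module names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(imports):
    """Return one import cycle as a list of module names, or None if acyclic.

    `imports` maps each module to the set of modules it imports.
    """
    try:
        TopologicalSorter(imports).prepare()
    except CycleError as err:
        # args[1] holds the cycle; its first and last entries are the same node.
        return list(err.args[1])
    return None


# Hypothetical module graph with a two-module cycle.
graph = {"payment": {"invoice"}, "invoice": {"payment"}, "api": {"payment"}}
cycle = find_cycle(graph)
print("no_circular_deps:", cycle is None)  # no_circular_deps: False
```

A non-empty result gives the gate something concrete to report, for example as the subject of a follow-up task.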
Gate-S: Shrink (Minimal Complexity)
api_surface_reduction: ≥0.20 # Or justified growth
dependency_count_stable: true
onboarding_time_minutes: ≤15
Measured by: Public API count, requirements.txt diff, README clarity.
IOSM-Index Calculation:
IOSM-Index = (Gate-I + Gate-O + Gate-M + Gate-S) / 4
Production Threshold: ≥ 0.80
Auto-spawn rules:
- Gate-I < 0.75 → Spawn clarity/duplication fixes
- Gate-O fails → Spawn test/performance fixes
- Gate-M fails → Spawn boundary clarification tasks
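The index formula and the auto-spawn rules are simple enough to sketch directly. The example below assumes each gate yields a score in [0, 1] and that Gate-O and Gate-M also report a pass/fail flag. The function names and the sample scores are illustrative, not taken from the tool.

```python
from statistics import mean

PRODUCTION_THRESHOLD = 0.80  # from "Production Threshold" above
GATE_I_SPAWN_BELOW = 0.75    # from the first auto-spawn rule


def iosm_index(scores):
    """Unweighted mean of the four gate scores (each assumed to lie in [0, 1])."""
    return mean(scores[gate] for gate in ("I", "O", "M", "S"))


def spawn_actions(gate_i_score, gate_o_passed, gate_m_passed):
    """Map gate outcomes to the follow-up work named in the auto-spawn rules."""
    actions = []
    if gate_i_score < GATE_I_SPAWN_BELOW:
        actions.append("clarity/duplication fixes")
    if not gate_o_passed:
        actions.append("test/performance fixes")
    if not gate_m_passed:
        actions.append("boundary clarification tasks")
    return actions


# Hypothetical scores, not taken from a real report.
scores = {"I": 0.70, "O": 0.90, "M": 0.85, "S": 0.80}
index = iosm_index(scores)
print(f"IOSM-Index = {index:.2f}, approved = {index >= PRODUCTION_THRESHOLD}")
print(spawn_actions(scores["I"], gate_o_passed=True, gate_m_passed=True))
```

With these numbers the average clears 0.80 even though Gate-I is below its spawn limit. The sketch evaluates the two rules independently; whether a low single gate should also block approval is a policy decision it does not settle.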
3. Auto-Spawn Protocol
Problem: Agents discover issues during execution (e.g., N+1 queries, missing tests).
Solution: Structured SpawnCandidates section in reports.
Format:
## SpawnCandidates
| ID | Subtask | Touches | Effort | User Input | Severity | Dedup Key | Accept Criteria |
|----|---------|---------|--------|------------|----------|-----------|-----------------|
| SC-01 | Fix missing type annotation | `backend/auth.py` | S | false | medium | auth.py\|type-annot | mypy passes |
| SC-02 | Clarify API contract | `docs/api_spec.yaml` | M | true | high | api_spec\|contract | Contract approved |
Orchestrator actions:
- Parse `SpawnCandidates` from completed task reports
- Deduplicate by `dedup_key` (prevents duplicate work)
- If `needs_user_input=false` and `severity != critical` → auto-spawn
- If `needs_user_input=true` → Add to blocked queue
- Run new tasks through planner and dispatch
Spawn protection: Budget limits (default: 20 auto-spawns per track) prevent infinite loops.
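The orchestrator actions above can be sketched as a small parse-and-route step. This is an illustration under assumptions, not the tool's implementation: the function names are hypothetical, SC-03 is an invented duplicate, and sending critical or over-budget candidates to the blocked queue is a guess, since the rules only spell out the auto-spawn and user-input cases.

```python
import re

# Sample report; SC-03 is a hypothetical duplicate added for illustration.
REPORT = r"""
## SpawnCandidates
| ID | Subtask | Touches | Effort | User Input | Severity | Dedup Key | Accept Criteria |
|----|---------|---------|--------|------------|----------|-----------|-----------------|
| SC-01 | Fix missing type annotation | `backend/auth.py` | S | false | medium | auth.py\|type-annot | mypy passes |
| SC-02 | Clarify API contract | `docs/api_spec.yaml` | M | true | high | api_spec\|contract | Contract approved |
| SC-03 | Add type annotation to login() | `backend/auth.py` | S | false | medium | auth.py\|type-annot | mypy passes |
"""


def _cells(row):
    """Split a table row on unescaped pipes, then unescape pipes inside cells."""
    parts = re.split(r"(?<!\\)\|", row.strip().strip("|"))
    return [part.strip().replace("\\|", "|") for part in parts]


def parse_spawn_candidates(report):
    """Turn the SpawnCandidates markdown table into a list of row dicts."""
    rows = [ln for ln in report.splitlines() if ln.lstrip().startswith("|")]
    if len(rows) < 3:  # need header, separator, and at least one data row
        return []
    header = _cells(rows[0])
    return [dict(zip(header, _cells(row))) for row in rows[2:]]


def route(candidates, seen_keys, budget_left=20):
    """Deduplicate, then split candidates into (auto_spawn, blocked)."""
    auto, blocked = [], []
    for cand in candidates:
        key = cand["Dedup Key"]
        if key in seen_keys:
            continue  # same work already queued
        seen_keys.add(key)
        needs_user = cand["User Input"].lower() == "true"
        critical = cand["Severity"].lower() == "critical"
        # ASSUMPTION: critical or over-budget candidates wait for a human.
        if needs_user or critical or budget_left <= 0:
            blocked.append(cand)
        else:
            auto.append(cand)
            budget_left -= 1
    return auto, blocked


auto, blocked = route(parse_spawn_candidates(REPORT), seen_keys=set())
print([c["ID"] for c in auto], [c["ID"] for c in blocked])  # ['SC-01'] ['SC-02']
```

Passing `seen_keys` in from the caller lets the same set persist across reports, so a candidate raised by two different agents is still spawned only once.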
4. Cost Tracking & Model Selection
Model selection rules:
| Model | Use Case | Cost (per 1M tokens, input / output) |
|---|---|---|
| Haiku | Read-only analysis | $0.25 / $1.25 |
| Sonnet | Standard implementation | $3.00 / $15.00 |
| Opus | Architecture, security | $15.00 / $75.00 |
Budget controls:
- Default limit: $10.00 per track
- ⚠️ 80% usage → Warning
- 🛑 100% usage → Pause execution
Check current spend:
## Cost Tracking (from iosm_state.md)
- budget_total: $10.00
- spent_so_far: $6.50
- remaining: $3.50
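The budget thresholds and the price table combine into a simple calculation. The sketch below reads the two prices in each table row as input and output cost per 1M tokens, which is an interpretation of the table; the function names and token counts are hypothetical.

```python
# USD per 1M tokens as (input, output), as read from the table above.
PRICES = {
    "haiku": (0.25, 1.25),
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}


def task_cost(model, input_tokens, output_tokens):
    """Estimated USD cost of one task for the given token counts."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def budget_status(spent, budget_total=10.00):
    """Apply the 80% warning and 100% pause thresholds."""
    ratio = spent / budget_total
    if ratio >= 1.0:
        return "pause"
    if ratio >= 0.8:
        return "warning"
    return "ok"


print(budget_status(6.50))                             # ok
print(f"{task_cost('sonnet', 200_000, 50_000):.2f}")   # 1.35
```

At $6.50 of $10.00 the track sits at 65% of its budget, so it is still below the warning level.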
Real-World Use Cases
1. Greenfield Feature (Email Notifications)
Task: Add complete email notification system to SaaS app
Plan:
- T01: Design email templates (Architect, foreground)
- T02: Implement SMTP service (Implementer-A, background)
- T03: Add queue system (Implementer-B, background, parallel with T02)
- T04: Write integration tests (TestRunner, background, after T02+T03)
- T05: Add API endpoints (Implementer-C, background, after T02)
Results:
- ⚡ ~3x faster (4-6h parallel vs 12-15h sequential)
- ✅ 100% test coverage (Gate-O enforcement)
- 📉 Minimal technical debt (Gate-I: 0.92)
2. Brownfield Refactoring (Payment Module)
Task: Refactor legacy payment processing (5000+ LOC, 3 years old)
Workflow:
- Plan mode: Explorer analyzes codebase (read-only, safe)
- PRD with rollback strategy
- Comprehensive regression tests (before touching code)
- Parallel implementation (2 modules refactored simultaneously)
- Gate-M fails: Circular dependency detected
- Auto-spawn: "Break circular import between Payment and Invoice"
- Re-check Gate-M: Pass ✅
Results:
- 🎯 Gate-driven quality — Forced resolution of hidden issues
- 🔒 Safe refactor — All tests passing before merge
- 📊 Measured improvement — 40% reduction in module coupling
3. Multi-Module Feature (Multi-Tenant Architecture)
Task: Add multi-tenancy (affects 8 modules)
Plan: 20+ tasks across 5 waves
- Wave 1: T01 Design schema (Architect, critical path)
- Wave 2: T02-T04 Database migrations (3 parallel implementers)
- Wave 3: T05-T10 Update 6 modules (6 parallel implementers)
- Wave 4: T11-T15 Tests (5 parallel test runners)
- Wave 5: T16 Integration
Auto-spawn: 3 critical tasks discovered during execution
Results:
- 📈 High parallelism — 6 modules updated simultaneously
- 💰 Budget control — $6.50 spent (within $10 limit)
- ⏱️ Time savings — ~18h parallel vs 60h+ sequential
Getting Started (5 Minutes)
Installation
# Clone into Claude Code skills directory
git clone https://github.com/rokoss21/swarm-iosm.git .claude/skills/swarm-iosm
Verify: type /swarm-iosm in Claude Code.
Create Your First Track
/swarm-iosm new-track "Add user authentication with JWT"
Claude will:
- Ask questions (mode: greenfield/brownfield, priorities, constraints)
- Generate PRD (10 sections)
- Create `plan.md` with task breakdown
- Show orchestration plan
Execute
/swarm-iosm implement
Watch the magic:
- Parallel agents launch automatically
- Progress tracked in `iosm_state.md`
- Reports appear in `reports/` directory
Integrate
/swarm-iosm integrate <track-id>
Quality gates run automatically. You get iosm_report.md with pass/fail.
Commands Reference
| Command | Description |
|---|---|
| `/swarm-iosm setup` | Initialize project context |
| `/swarm-iosm new-track "<desc>"` | Create feature track |
| `/swarm-iosm implement` | Execute plan (auto mode) |
| `/swarm-iosm status` | Check progress |
| `/swarm-iosm watch` | Live monitoring (v1.3) |
| `/swarm-iosm simulate` | Dry-run with timeline (v1.3) |
| `/swarm-iosm resume` | Resume after crash (v1.3) |
| `/swarm-iosm retry <task-id>` | Retry failed task (v1.2) |
| `/swarm-iosm integrate <id>` | Merge and run gates |
What Swarm-IOSM is NOT
To set clear expectations:
- ❌ Not a general-purpose workflow engine — Designed specifically for Claude Code agent orchestration
- ❌ Not a replacement for CI/CD — Complements your pipeline, doesn't replace it
- ❌ Not a code generator "autopilot" — Requires human oversight and decision-making
- ❌ Not safe to run unattended on production repos — Always review changes before merge
Architecture Overview
┌──────────────────────────────────────────────────────────────────────┐
│ ORCHESTRATOR (Main Claude Agent) │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Continuous Dispatch Loop (v1.1+) │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │ │
│ │ │ Collect │→ │ Classify │→ │ Conflict │→ │ Dispatch Batch │ │ │
│ │ │ Ready │ │ Modes │ │ Check │ │ (max 3-6 tasks) │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────────────┘ │ │
│ │ ↑ │ │ │
│ │ │ ┌──────────┐ ┌──────────┐ ↓ │ │
│ │ └────────│ IOSM │←─│ Auto- │←────────┘ │ │
│ │ │ Gates │ │ Spawn │ │ │
│ │ └──────────┘ └──────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────────┼───────────────────┐ │
│ ↓ ↓ ↓ │
│ ┌────────────────────┐ ┌────────────────────┐ ┌─────────────────┐ │
│ │ Subagent (BG) │ │ Subagent (BG) │ │ Subagent (FG) │ │
│ │ Explorer │ │ Implementer-A │ │ Architect │ │
│ │ read-only │ │ write-local │ │ needs_user │ │
│ └────────────────────┘ └────────────────────┘ └─────────────────┘ │
│ │ │ │ │
│ ↓ ↓ ↓ │
│ reports/T01.md reports/T02.md reports/T03.md │
│ + SpawnCandidates + SpawnCandidates + Escalations │
└──────────────────────────────────────────────────────────────────────┘
IOSM Framework Integration
Swarm-IOSM implements the IOSM methodology (Improve → Optimize → Shrink → Modularize) as an executable system:
┌────────────────────────────────────────────────────────────────────────────┐
│ IOSM FRAMEWORK │
│ https://github.com/rokoss21/IOSM │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │ IMPROVE │ → │ OPTIMIZE │ → │ SHRINK │ → │ MODULARIZE │ │
│ │ │ │ │ │ │ │ │ │
│ │ Clarity │ │ Speed │ │ Simplify │ │ Decompose │ │
│ │ No dups │ │ Resil. │ │ Surface │ │ Contracts │ │
│ │ Invars │ │ Chaos │ │ Deps │ │ Coupling │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────────┬─────────┘ │
│ │ │ │ │ │
│ ┌────▼─────┐ ┌────▼─────┐ ┌────▼─────┐ ┌────────▼─────────┐ │
│ │ Gate-I │ │ Gate-O │ │ Gate-S │ │ Gate-M │ │
│ │ ≥0.85 │ │ ≥0.75 │ │ ≥0.80 │ │ ≥0.80 │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────────────┘ │
│ │
│ IOSM-Index = (Gate-I + Gate-O + Gate-S + Gate-M) / 4 │
│ Production threshold: ≥ 0.80 │
└────────────────────────────────────────────────────────────────────────────┘
Version History
v2.1 (2026-01-19) — Current
- Automated State Management (`iosm_state.md` auto-generated)
- Status Sync CLI (`--update-task`)
- Improved Report Conflict Detection
v2.0 (2026-01-18)
- Inter-Agent Communication (`shared_context.md`)
- Task Dependency Visualization (`--graph`)
- Anti-Pattern Detection
- Template Customization
v1.3 (2026-01-17)
- Simulation Mode (`/swarm-iosm simulate`) with ASCII Timeline
- Live Monitoring (`/swarm-iosm watch`)
- Checkpointing & Resume (`/swarm-iosm resume`)
v1.2 (2026-01-16)
- Concurrency Limits (Resource Budgets)
- Cost Tracking & Model Selection (Haiku/Sonnet/Opus)
- Intelligent Error Diagnosis & Retry (`/swarm-iosm retry`)
v1.1 (2026-01-15)
- Continuous Dispatch Loop (no wave barriers)
- Gate-Driven Continuation
- Auto-Spawn from SpawnCandidates
- Touches Lock Manager
Contributing
We welcome contributions! Key areas:
- Gate Automation Scripts — Measure IOSM criteria automatically
- CI/CD Integration — GitHub Actions, GitLab CI examples
- Language-Specific Checkers — Python, TypeScript, Rust evaluators
- More Examples — Real-world track demonstrations
- IDE Integration — VS Code extension
See CONTRIBUTING.md for guidelines.
Conclusion
Swarm-IOSM shows that AI agent orchestration can be both fast (up to 3-8x speedup through parallelism, depending on the dependency graph) and safe (quality gates before merge).
The continuous dispatch model eliminates artificial wave barriers, file lock management prevents conflicts, and IOSM gates enforce production-grade standards.
Key takeaway: Don't choose between speed and quality. With proper orchestration, you get both.
Try it today:
git clone https://github.com/rokoss21/swarm-iosm.git .claude/skills/swarm-iosm
/swarm-iosm new-track "Your next feature"
Links
- Repository: github.com/rokoss21/swarm-iosm
- IOSM Methodology: github.com/rokoss21/IOSM
- Author: Emil Rokossovskiy
  - Email: ecsiar@gmail.com
  - Web: rokoss21.tech
- Related Projects:
  - FACET Standard: Deterministic Contract Layer for AI
  - FACET Compiler: Reference Implementation (Rust)
  - AstroVisor.io: Production IOSM Case Study
Questions? Ideas? Issues?
Built with ⚡ by @rokoss21 | IOSM: Improve → Optimize → Shrink → Modularize