rokoss21

Posted on Jan 18

Parallel Agents Are Easy. Shipping Without Chaos Isn’t.


Introducing Swarm-IOSM — a Parallel Subagent Orchestration Engine for Claude Code

Everyone is building multi-agent workflows now.

Swarm prompts. Agent teams. Tool calling. “Auto-developers”.

And yet… most of them collapse the moment you try to use them on real codebases.

Not because the models can’t code.

Because parallel development has two hard problems that prompt-chains don’t solve:

1) Safe concurrency (two agents writing into the same file is not “parallelism”, it’s a race condition)
2) Stop conditions (how do you know the result is shippable, not just “it ran”)

I built Swarm-IOSM to turn agent orchestration into an engineering discipline:
locks, dispatch scheduling, gates, and anti-chaos rules — executable, repeatable, and production-oriented.

GitHub: https://github.com/rokoss21/swarm-iosm

The Hidden Failure Mode of “Agent Swarms”

Here’s the truth nobody wants to say out loud:

Most “agent swarms” are just concurrency without a correctness model.

They don’t fail spectacularly. They fail quietly:

- Agent A fixes a bug and touches auth.py
- Agent B adds a feature and also touches auth.py
- You merge both and discover behavior drift
- The PR looks large, architecture degrades, confidence drops
- Then the swarm spawns more tasks to “fix” the mess

Congratulations, you built a self-replicating backlog generator.

The root cause is simple:

“Parallel agents” ≠ Parallel development

Parallel development requires conflict prevention, not conflict resolution.

Swarm-IOSM: IOSM Methodology + Execution Engine

IOSM is the methodology:

Improve → Optimize → Shrink → Modularize
A disciplined loop that forces engineering quality to remain measurable, not performative.

Swarm-IOSM is the execution engine:

- PRD-driven decomposition
- Continuous dispatch scheduling
- File-conflict prevention via lock discipline
- Auto-spawn protocol for discoveries
- Quality gates as stop conditions

It’s not “a prompt”.

It’s a workflow runtime for parallel software development inside Claude Code.

The Architecture: An Orchestrator That Does Not Implement

Swarm-IOSM is intentionally designed around one rule:

The Orchestrator does NOT implement.

The main agent coordinates only.

All implementation work happens in subagents, each producing a report.

This is not a style preference — it’s a safety boundary.

When the orchestrator writes code, it stops being a scheduler and becomes “yet another contributor”. It can no longer coordinate the work as a whole.

So Swarm-IOSM splits responsibilities cleanly:

Orchestrator = scheduling + gates + conflict check + state tracking
Subagents = execution + reports + spawn candidates

The Core Engine: Continuous Dispatch (No Wave Barriers)

Most orchestration frameworks work like this:

Prepare plan → run wave 1 → wait → run wave 2 → wait → merge

That’s not how software work actually flows.

Reality is continuous: tasks unblock tasks every minute.

Swarm-IOSM implements continuous dispatch scheduling:

- tasks move through states: backlog → ready → running → done
- as soon as dependencies are satisfied, tasks are eligible to run
- you dispatch ready tasks immediately (no waiting for a “wave boundary”)

This is what makes it fast: a ready task never sits idle waiting for unrelated work to finish.

It maximizes parallelism without turning the repo into a battlefield.
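
To make the state machine concrete, here is a minimal sketch of continuous dispatch in Python. It is an illustration of the idea, not the code that ships in the repo: `Task`, `tick`, and `complete` are hypothetical names, and the task names in the demo are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    deps: set = field(default_factory=set)  # names of tasks this one waits on
    state: str = "backlog"                  # backlog -> ready -> running -> done


def tick(tasks):
    """Promote unblocked backlog tasks to ready, then start every ready task.

    Returns the names of tasks that started running in this tick.
    """
    for task in tasks.values():
        deps_done = all(tasks[d].state == "done" for d in task.deps)
        if task.state == "backlog" and deps_done:
            task.state = "ready"
    started = []
    for task in tasks.values():
        if task.state == "ready":
            task.state = "running"
            started.append(task.name)
    return started


def complete(tasks, name):
    """Mark one task done and immediately dispatch whatever it unblocked."""
    tasks[name].state = "done"
    return tick(tasks)


if __name__ == "__main__":
    plan = [
        Task("schema"),
        Task("api", {"schema"}),
        Task("docs"),
        Task("tests", {"api", "docs"}),
    ]
    tasks = {t.name: t for t in plan}
    print(tick(tasks))                # ['schema', 'docs']
    print(complete(tasks, "schema"))  # ['api'], started while docs is still running
    print(complete(tasks, "docs"))    # [], tests still waits on api
    print(complete(tasks, "api"))     # ['tests']
```

Note that `api` starts the moment `schema` finishes. In a wave-based plan it would have waited for `docs` as well, even though it does not depend on it.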

The Missing Primitive: “Touches” Lock Manager

This is the centerpiece.

Swarm-IOSM treats a codebase like a shared memory system.

If agents are threads, then files are memory regions.

So Swarm introduces a primitive that classic “agent swarms” ignore:

Touches = the set of files/folders a task may modify.

Each task declares:

Touches: auth.py, services/auth/
Concurrency class:
- read-only (no locks, always safe)
- write-local (lock only touches)
- write-shared (exclusive, sequential)

Then Swarm enforces locks:

- folder lock blocks everything inside it
- file lock blocks only that file
- read-only tasks remain parallel always

Result:

✅ real parallelism
✅ predictable merges
✅ no random collisions “because agent decided to edit config too”
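
The lock rules above fit in a few dozen lines. The sketch below is my own illustration, not the engine's actual code: `overlaps` and `LockManager` are hypothetical names, and it assumes that a write-shared task may start only when no other writer holds a lock, and blocks all other writers until it is released.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Optional, Set


def overlaps(a: str, b: str) -> bool:
    """Two touches conflict if they are the same path or one contains the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa == pb or pa in pb.parents or pb in pa.parents


class LockManager:
    def __init__(self) -> None:
        self.held: Dict[str, Set[str]] = {}         # task name -> locked paths
        self.exclusive_holder: Optional[str] = None  # running write-shared task

    def try_acquire(self, task: str, touches: Iterable[str], concurrency: str) -> bool:
        touches = set(touches)
        if concurrency == "read-only":
            return True   # no locks taken, always allowed to run
        if self.exclusive_holder is not None:
            return False  # a write-shared task blocks every other writer
        if concurrency == "write-shared":
            if self.held:
                return False  # assumption: exclusive means no other writers at all
            self.exclusive_holder = task
        else:  # write-local: lock only the declared touches
            for paths in self.held.values():
                if any(overlaps(t, p) for t in touches for p in paths):
                    return False
        self.held[task] = touches
        return True

    def release(self, task: str) -> None:
        self.held.pop(task, None)
        if self.exclusive_holder == task:
            self.exclusive_holder = None


if __name__ == "__main__":
    locks = LockManager()
    print(locks.try_acquire("T1", ["auth.py", "services/auth/"], "write-local"))  # True
    print(locks.try_acquire("T2", ["services/auth/jwt.py"], "write-local"))       # False
    print(locks.try_acquire("T3", ["services/authz/roles.py"], "write-local"))    # True
    print(locks.try_acquire("T4", ["auth.py"], "read-only"))                      # True
```

Comparing path components rather than string prefixes matters: a lock on `services/auth/` blocks `services/auth/jwt.py` but leaves `services/authz/roles.py` free.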

Auto-Spawn… Without Infinite Task Proliferation

Auto-spawn sounds cool until you actually run it.

A naive swarm will spawn tasks forever.

Swarm-IOSM forces auto-spawn to be bounded and deduplicated:

- total spawn budget
- per-gate budgets
- dedup key: <primary_touch>|<intent_category>
- severity thresholds
- anti-loop counters (max iterations without progress)

This is what transforms “agent creativity” into something you can safely run in an engineering process.
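
Here is roughly how those five rules combine into a single admission check. Treat it as a sketch under assumptions: `SpawnPolicy`, its field names, the severity labels, and every default number are hypothetical. Only the dedup key format comes from the list above.

```python
from dataclasses import dataclass, field

# Hypothetical severity scale, lowest to highest.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass
class SpawnPolicy:
    total_budget: int = 20
    gate_budgets: dict = field(default_factory=lambda: {"I": 8, "O": 6, "S": 3, "M": 3})
    min_severity: str = "medium"
    max_stalled_iterations: int = 3
    spawned_total: int = 0
    spawned_per_gate: dict = field(default_factory=dict)
    seen_keys: set = field(default_factory=set)
    stalled_iterations: int = 0

    def record_iteration(self, made_progress: bool) -> None:
        """Anti-loop counter: counts consecutive iterations without progress."""
        self.stalled_iterations = 0 if made_progress else self.stalled_iterations + 1

    def consider(self, primary_touch: str, intent_category: str,
                 gate: str, severity: str) -> str:
        """Return 'accepted' or a rejection reason for one spawn candidate."""
        key = f"{primary_touch}|{intent_category}"
        if self.stalled_iterations >= self.max_stalled_iterations:
            return "rejected: no progress"
        if SEVERITY_RANK[severity] < SEVERITY_RANK[self.min_severity]:
            return "rejected: below severity threshold"
        if key in self.seen_keys:
            return "rejected: duplicate"
        if self.spawned_total >= self.total_budget:
            return "rejected: total budget exhausted"
        if self.spawned_per_gate.get(gate, 0) >= self.gate_budgets.get(gate, 0):
            return "rejected: gate budget exhausted"
        # Only accepted candidates consume budget and register their dedup key.
        self.seen_keys.add(key)
        self.spawned_total += 1
        self.spawned_per_gate[gate] = self.spawned_per_gate.get(gate, 0) + 1
        return "accepted"


if __name__ == "__main__":
    policy = SpawnPolicy(total_budget=2, gate_budgets={"I": 1, "O": 5})
    print(policy.consider("auth.py", "bugfix", "I", "high"))  # accepted
    print(policy.consider("auth.py", "bugfix", "O", "high"))  # rejected: duplicate
    print(policy.consider("db.py", "refactor", "I", "high"))  # rejected: gate budget exhausted
```

Every rejection carries a reason, so a refused spawn can still show up in a report instead of disappearing silently.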

IOSM Gates: Stop Conditions That Mean Something

Most systems “stop” when tasks finish.

Swarm-IOSM stops when quality is achieved.

It tracks four gate families:

Gate-I (Improve)

Clarity, invariants, low duplication.

Gate-O (Optimize)

Latency budget, error budget, chaos checks, no obvious inefficiencies.

Gate-S (Shrink)

Surface area reduction, dependency stability, onboarding time.

Gate-M (Modularize)

Contracts, coupling limits, no circular dependencies.

Swarm is not just “agents executing tasks”.

It’s agents executing tasks until the system crosses a production threshold.
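
The difference between “tasks finished” and “gates passed” is easy to show in code. In the sketch below, the gate names match the four families above, but every metric name and threshold is a hypothetical placeholder; real projects would plug in their own measurements.

```python
from typing import Callable, Dict, List

Metrics = Dict[str, float]

# Hypothetical checks: one predicate per gate family over measured metrics.
GATES: Dict[str, Callable[[Metrics], bool]] = {
    "Gate-I": lambda m: m["duplication_pct"] <= 5.0,
    "Gate-O": lambda m: m["p95_latency_ms"] <= 200 and m["error_rate_pct"] <= 1.0,
    "Gate-S": lambda m: m["public_api_count"] <= m["public_api_baseline"],
    "Gate-M": lambda m: m["circular_deps"] == 0,
}


def failing_gates(metrics: Metrics) -> List[str]:
    """Names of the gates whose check does not pass for these metrics."""
    return [name for name, check in GATES.items() if not check(metrics)]


def should_stop(metrics: Metrics, open_tasks: int) -> bool:
    """Stop only when no work is left AND every gate passes."""
    return open_tasks == 0 and not failing_gates(metrics)


if __name__ == "__main__":
    metrics = {
        "duplication_pct": 3.2,
        "p95_latency_ms": 180,
        "error_rate_pct": 0.4,
        "public_api_count": 41,
        "public_api_baseline": 42,
        "circular_deps": 1,
    }
    print(failing_gates(metrics))               # ['Gate-M']
    print(should_stop(metrics, open_tasks=0))   # False: all tasks done, gate still red
    metrics["circular_deps"] = 0
    print(should_stop(metrics, open_tasks=0))   # True
```

An empty task queue with a red gate is not a finish line. It is the signal to spawn (budgeted) follow-up work.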

Quick Start (The Happy Path)

Swarm-IOSM lives here:

https://github.com/rokoss21/swarm-iosm

1) Install as a Claude Code skill

Project-level:

git clone https://github.com/rokoss21/swarm-iosm.git .claude/skills/swarm-iosm

User-level:

git clone https://github.com/rokoss21/swarm-iosm.git ~/.claude/skills/swarm-iosm

2) Initialize project context

/swarm-iosm setup

3) Create a feature track

/swarm-iosm new-track "Add user authentication with JWT"

Swarm generates a PRD and a plan, then returns a track ID such as:

2026-01-17-001

4) Validate & generate a continuous dispatch plan

python .claude/skills/swarm-iosm/scripts/orchestration_planner.py \
  swarm/tracks/<track-id>/plan.md --validate

python .claude/skills/swarm-iosm/scripts/orchestration_planner.py \
  swarm/tracks/<track-id>/plan.md --continuous

5) Execute

/swarm-iosm implement

6) Integrate

/swarm-iosm integrate <track-id>

This produces integration artifacts and quality gate reporting.

Why This Is Different From “Yet Another Agent Framework”

This part matters.

Swarm-IOSM doesn’t compete with “prompt frameworks” by being smarter.

It wins by being stricter.

Swarm-IOSM treats a repo as a concurrency system.

Locks are not optional.

Swarm-IOSM treats quality as a stop condition.

No gates = no ship.

Swarm-IOSM treats spawn as a budgeted resource.

Infinite loops are a design bug, not “agent autonomy”.

You can replace models, providers, or toolchains.

But you can’t replace engineering discipline with vibes.

Real-World Fit: Where Swarm-IOSM Shines

Use Swarm-IOSM when:

- multi-file features require coordination
- brownfield refactoring needs guardrails
- parallel implementation streams are valuable
- acceptance criteria must exist (not “it compiles”)

Avoid Swarm-IOSM when:

- it’s a single-file change
- you want quick fixes without planning
- you’re doing purely exploratory research

A hammer is not a screwdriver.

A swarm is not a substitute for architecture.

The Meta-Point: This Is Part of a Bigger Stack

I’m building a full deterministic engineering ecosystem around AI systems:

- IOSM = methodology layer
- Swarm-IOSM = execution/orchestration layer
- FACET = deterministic contract layer for AI behavior

If you’ve read my FACET articles, you already know the thesis:

We don’t need “more prompting”.
We need engineering primitives: contracts, determinism, orchestration rules, replayable artifacts.

Swarm-IOSM is exactly that philosophy applied to parallel agent development.

Closing Thoughts

Parallel agents are not the hard part.

The hard part is shipping without chaos:

- no file conflicts
- no accidental coupling
- no architecture collapse
- no infinite spawn loops
- gates that enforce engineering quality

Swarm-IOSM is my answer to that.

If you’re using Claude Code and you’ve ever tried to scale beyond a single agent — try it: