OpenAI released Swarm as an "educational framework" for multi-agent orchestration. The developer community immediately started building production systems with it.
This is both predictable and concerning. Here's why.
What Swarm Gets Right
Swarm's core insight is that most multi-agent systems are over-engineered. You don't need a complex orchestration layer if your agents can simply hand off to each other.
The API is dead simple:
```python
from swarm import Swarm, Agent

client = Swarm()

triage_agent = Agent(
    name="Triage",
    instructions="Route the user to the right specialist.",
    functions=[transfer_to_code_agent, transfer_to_docs_agent],
)

code_agent = Agent(
    name="Code Analyst",
    instructions="Analyze code and answer questions.",
    functions=[search_codebase, get_file_content],
)
```
Agent-to-agent handoff via function calls. No message queues, no state machines, no orchestration layer. Just functions returning agents.
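The mechanics can be sketched without the library at all. This is an illustrative reconstruction of the pattern, not Swarm's actual internals: a handoff is a tool function that returns the target agent, and the loop swaps the active agent whenever that happens.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    instructions: str
    functions: list = field(default_factory=list)

# A handoff is just a tool function that returns the target agent.
def transfer_to_code_agent():
    return code_agent

code_agent = Agent(name="Code Analyst", instructions="Analyze code.")
triage_agent = Agent(
    name="Triage",
    instructions="Route the user to the right specialist.",
    functions=[transfer_to_code_agent],
)

def handle_tool_result(active_agent, result):
    # If a tool call returned an Agent, the conversation hands off to it;
    # any other return value leaves the active agent unchanged.
    return result if isinstance(result, Agent) else active_agent
```

That single `isinstance` check is essentially the whole orchestration model, which is why the framework stays so small.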
For simple workflows — customer support routing, basic task delegation — this is the right level of abstraction.
Where Swarm Falls Short
But developer tools aren't simple workflows. When we built Glue's multi-agent indexing system, we needed:
1. Parallel Execution
Swarm agents execute sequentially. Agent A finishes, hands off to Agent B. For a codebase indexer processing 4,000 files, sequential execution means hours instead of minutes.
Glue runs 6 agents in parallel — symbol extraction, dependency analysis, feature clustering, documentation, architecture mapping, and git history analysis. They share a data layer but don't block each other.
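The fan-out itself is plain `concurrent.futures`. A minimal sketch with stub agents (the function names and return values here are illustrative, not Glue's real entry points — a real agent would call an LLM and write to the shared store):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-agent entry points; each takes the file list
# and returns its own result dictionary.
def extract_symbols(files):
    return {"symbols": len(files)}

def analyze_dependencies(files):
    return {"edges": 0}

def cluster_features(files):
    return {"clusters": 0}

AGENTS = [extract_symbols, analyze_dependencies, cluster_features]

def run_parallel(files):
    # Each agent runs in its own thread; none blocks the others.
    with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
        futures = {pool.submit(agent, files): agent.__name__ for agent in AGENTS}
        return {name: fut.result() for fut, name in futures.items()}
```

Threads are enough here because agent work is I/O-bound (waiting on API responses), so the GIL is not the bottleneck.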
2. Shared State with Consistency
Swarm passes context through conversation history. That works for chat. It doesn't work when Agent 3 needs to read the output of Agent 1 while Agent 1 is still running.
We use a shared PostgreSQL layer where agents write results as they complete. Other agents can read partial results immediately. This is boring database engineering, but it's what makes parallel agents practical.
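The pattern is commit-as-you-go: each agent writes every result the moment it has one, and readers see whatever has landed. A sketch using SQLite as a stand-in for the PostgreSQL layer (schema and function names are illustrative):

```python
import sqlite3

# In-memory SQLite standing in for the shared PostgreSQL layer.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE agent_results (agent TEXT, file TEXT, result TEXT)")

def write_result(agent, file, result):
    # Agents commit each result as it completes, not at the end of the run.
    db.execute("INSERT INTO agent_results VALUES (?, ?, ?)", (agent, file, result))
    db.commit()

def read_partial(agent):
    # Other agents read whatever has landed so far -- possibly an empty set.
    rows = db.execute(
        "SELECT file, result FROM agent_results WHERE agent = ?", (agent,)
    ).fetchall()
    return dict(rows)
```

Because every write is its own committed transaction, a reader never sees a half-written row — the database's consistency guarantees do the coordination work.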
3. Failure Recovery
If a Swarm agent fails mid-conversation, the whole chain fails. In production, you need:
- Per-agent retry logic
- Partial result caching (don't re-index 3,999 files because file 4,000 failed)
- Graceful degradation (show results from successful agents even if one failed)
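All three requirements can be sketched in a few lines. This is a generic pattern, not Glue's actual code — retry with backoff per item, a cache that survives failures, and a `None` sentinel for degraded results:

```python
import time

def run_with_retry(agent_fn, item, retries=3, backoff=0.1):
    # Per-agent retry with exponential backoff between attempts.
    for attempt in range(retries):
        try:
            return agent_fn(item)
        except Exception:
            if attempt == retries - 1:
                return None  # graceful degradation: skip, don't crash the run
            time.sleep(backoff * 2 ** attempt)

def index_files(agent_fn, files, cache):
    # Partial result caching: completed files are never re-processed,
    # so one failure never forces re-indexing the other 3,999.
    for f in files:
        if f in cache:
            continue
        result = run_with_retry(agent_fn, f)
        if result is not None:
            cache[f] = result
    return cache
```

Persist `cache` (to disk or the shared database) and a crashed run resumes exactly where it stopped.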
4. Cost Control
Swarm doesn't track token usage per agent or provide budgeting. In production, a runaway agent can burn through API credits in minutes. You need per-agent token limits and cost alerting.
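The guard can be as simple as a per-agent counter that raises before the next call goes out. A sketch (the class and limit are illustrative; in practice the charge comes from the usage field of each API response):

```python
class BudgetExceeded(Exception):
    pass

class TokenBudget:
    # Per-agent token accounting: charge after every model call,
    # and abort the agent once its budget is exhausted.
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def charge(self, tokens):
        self.used += tokens
        if self.used > self.limit:
            raise BudgetExceeded(f"budget of {self.limit} tokens exhausted")
```

Raising instead of silently truncating matters: a tripped budget should surface in alerting, not disappear into degraded output.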
The Right Mental Model
Think of agent frameworks on a spectrum:
- Use Swarm when: you have 2-3 agents with clear handoff points, sequential execution is fine, and failure recovery isn't critical.
- Use LangGraph when: you need conditional routing, cycles, and more structured state management.
- Build custom when: you need parallel execution, shared state, cost control, failure recovery, and observability.
What We Use at Glue
Glue's agent system is custom-built because our requirements don't fit any framework:
- 6 parallel agents processing simultaneously
- Shared PostgreSQL state — agents read each other's partial results
- Per-agent token budgets — no runaway costs
- Incremental processing — only re-index changed files
- MCP tool layer — 60+ specialized tools shared across agents
The total orchestration code is ~500 lines. Not because frameworks are bad, but because our specific requirements (parallel, stateful, incremental) don't match any framework's assumptions.
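Incremental processing, for instance, reduces to content-based change detection. A sketch of the idea (not Glue's actual implementation) — hash each file, compare against the hashes from the last indexed run, and re-process only the differences:

```python
import hashlib

def file_digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def changed_files(current: dict, seen: dict) -> list:
    # current: path -> file content; seen: path -> digest from the last run.
    # Returns only the paths whose content actually changed (or are new).
    return [
        path for path, content in current.items()
        if seen.get(path) != file_digest(content)
    ]
```

Content hashes rather than mtimes make the check robust to checkouts, touches, and clock skew — a file only counts as changed when its bytes changed.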
The Takeaway
Swarm is a great teaching tool and a solid choice for simple agent workflows. But if you're building developer tools that process large codebases, need parallel execution, or require production-grade reliability — you'll outgrow it quickly.
The meta-lesson: the best architecture for multi-agent systems is the simplest one that meets your actual requirements. Start with Swarm. Graduate to something more structured when you hit the walls.
Originally published on glue.tools. Glue is the pre-code intelligence platform — paste a ticket, get a battle plan.