Single Agent vs Agent Swarm: How to Choose

Do you build one agent that does everything?

Or many specialized agents that talk to each other?

The same debate that haunts backend architecture haunts agents too.

Here's how to choose.

The monolith agent

One agent. One system prompt. All capabilities.

class MonolithAgent:
    def __init__(self):
        self.tools = [
            read_file,
            write_file,
            search_code,
            run_tests,
            query_database,
            send_email,
            create_ticket,
            deploy,
            monitor_metrics,
            # ... 30 more tools
        ]

        self.system_prompt = """
You are an all-in-one development assistant.
You can:
- Read and write code
- Run tests and debug
- Query databases
- Send notifications
- Deploy applications
- Monitor systems
...
"""

    def run(self, user_request):
        return agent_loop(self.system_prompt, self.tools, user_request)

One agent handles everything.
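
agent_loop above stands in for the standard tool-calling loop. A minimal sketch, assuming a pseudocode llm.create client (the same one used in the routing examples below) whose response shape (.tool_calls, .text) and execute_tool dispatcher are placeholders:

def agent_loop(system_prompt, tools, user_request):
    messages = [{"role": "user", "content": user_request}]
    while True:
        # Response shape is assumed: .tool_calls and .text
        response = llm.create(system=system_prompt, tools=tools, messages=messages)
        if not response.tool_calls:
            return response.text  # no tool requested: final answer
        for call in response.tool_calls:
            result = execute_tool(call)  # hypothetical dispatcher
            messages.append({"role": "tool", "content": result})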

The microservices approach

Many specialized agents. Each does one thing well.

# Separate agents
coding_agent = Agent(
    tools=[read_file, write_file, search_code],
    prompt="You are a coding assistant. You read and write code."
)

testing_agent = Agent(
    tools=[run_tests, analyze_coverage, debug],
    prompt="You are a testing specialist. You run and debug tests."
)

database_agent = Agent(
    tools=[query, migrate, backup],
    prompt="You are a database administrator. You manage data."
)

deploy_agent = Agent(
    tools=[build, deploy, rollback],
    prompt="You are a deployment specialist. You ship code safely."
)

# Orchestrator routes to specialists
class Orchestrator:
    def __init__(self):
        self.agents = {
            "coding": coding_agent,
            "testing": testing_agent,
            "database": database_agent,
            "deploy": deploy_agent,
        }

    def route(self, request):
        # classify() picks a specialist; see "The routing problem"
        # below for keyword, LLM, and embedding implementations
        agent_type = self.classify(request)
        return self.agents[agent_type].run(request)

Comparison

Aspect | Monolith | Microservices
--- | --- | ---
Complexity | Simple | Complex
Context | Shared | Isolated
Latency | Lower | Higher (routing)
Failure blast radius | Everything | One service
Prompt size | Large | Small, focused
Tool confusion | Higher | Lower
Development speed | Faster initially | Faster at scale
Debugging | Harder | Easier (isolated)

When monolith wins

Small scope

Total tools: 5-10
Use cases: Focused (just coding, just support, etc.)
Team size: 1-3 developers

One agent handles it fine. Don't over-engineer.

Highly connected tasks

User: "Read the config, update the database URL, then run migrations"

Monolith: One agent, full context, executes sequentially
Microservices: Config agent → Database agent → Migration agent
              (Context lost between handoffs)

When tasks need shared context, monolith is simpler.

Speed matters

Monolith:
User → Agent → Response
Latency: ~2s

Microservices:
User → Router → Classify → Agent → Response
Latency: ~4s (extra LLM call for routing)

If you need fast responses, avoid the routing overhead.

Early stage

Week 1: Build monolith
Week 2: Ship to users
Week 3: Learn what's actually needed
Week 4: Refactor if necessary

vs.

Week 1-4: Design perfect microservices architecture
Week 5: Ship
Week 6: Realize requirements were wrong

Start monolith. Split when you feel pain.

When microservices win

Many tools cause confusion

# Monolith with 40 tools
tools = [
    read_file, write_file, search_code,    # coding
    run_tests, debug, coverage,            # testing
    query_db, migrate, backup,             # database
    build, deploy, rollback,               # deployment
    send_email, send_slack, create_ticket, # notifications
    # ... 25 more tools
]

# Agent thinks:
# "User wants to 'run the tests'...
#  Should I use run_tests? Or debug? Or coverage?
#  Maybe I should query_db first to check test data?
#  Let me send_slack to ask..."

Too many tools = wrong tool selection.

# Microservice: Testing agent with 3 tools
tools = [run_tests, debug, coverage]

# Agent thinks:
# "User wants to 'run the tests'...
#  I'll use run_tests."

Fewer tools = better decisions.

Different trust levels

# Coding agent: Can read/write user's code
# Allowed: read_file, write_file, search

# Deploy agent: Can push to production
# Requires: Extra confirmation, audit log, approval

# Admin agent: Can modify infrastructure
# Requires: 2FA, senior approval, change ticket

Microservices let you apply different security policies.
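
One way to wire this in is to wrap each specialist in a policy layer. A sketch, where require_approval and audit_log are hypothetical hooks into your own approval flow and logging:

class GuardedAgent:
    def __init__(self, agent, needs_approval=False):
        self.agent = agent
        self.needs_approval = needs_approval

    def run(self, request):
        # Hypothetical hooks: approval flow and audit logging
        if self.needs_approval and not require_approval(request):
            return "Denied: approval required."
        audit_log(request)
        return self.agent.run(request)

coding = GuardedAgent(coding_agent)                       # low risk
deploy = GuardedAgent(deploy_agent, needs_approval=True)  # production access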

Different models for different tasks

# Simple routing: Fast, cheap model
router = Agent(model="haiku", ...)

# Complex coding: Smart, expensive model
coding_agent = Agent(model="opus", ...)

# Simple lookups: Fast model is fine
docs_agent = Agent(model="haiku", ...)

Microservices let you optimize cost per task type.

Independent scaling

Coding requests: 1000/day → needs fast responses
Deploy requests: 10/day → can be slower, more careful

Monolith: Scale everything together
Microservices: Scale coding agent, keep deploy agent minimal

Independent deployment

# Coding agent updated → deploy coding agent only
# Deploy agent unchanged → no risk to deployments

vs.

# Monolith updated → everything changes
# Bug in coding → might affect deployments

Blast radius is smaller.

The hybrid: Monolith with modules

You don't have to go full microservices. Modularize within one agent:

class ModularAgent:
    def __init__(self):
        self.modules = {
            "coding": CodingModule(),
            "testing": TestingModule(),
            "database": DatabaseModule(),
        }

        self.system_prompt = self.build_prompt()

    def build_prompt(self):
        prompt = "You are a development assistant.\n\n"

        # The prompt describes every module; the tool set is what
        # varies per request (see get_tools below)
        for name, module in self.modules.items():
            prompt += f"## {name.title()}\n{module.instructions}\n\n"

        return prompt

    def get_tools(self, context):
        """Return relevant tools based on context"""
        relevant = self.detect_relevant_modules(context)
        tools = []
        for module_name in relevant:
            tools.extend(self.modules[module_name].tools)
        return tools

    def run(self, request):
        # Dynamic tool selection based on request
        tools = self.get_tools(request)
        return agent_loop(self.system_prompt, tools, request)

One agent, but tool set changes based on context.
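
detect_relevant_modules is left undefined above. A minimal keyword-based version you could add to ModularAgent might look like this (the keyword lists are illustrative; embedding similarity, shown in Option 3 below, is a sturdier drop-in):

    def detect_relevant_modules(self, context):
        keywords = {
            "coding": ["code", "file", "function", "implement"],
            "testing": ["test", "coverage", "debug"],
            "database": ["query", "sql", "migrate"],
        }
        text = context.lower()
        matched = [
            name for name, words in keywords.items()
            if any(word in text for word in words)
        ]
        # Fall back to everything rather than an empty tool set
        return matched or list(self.modules)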

The routing problem

If you go microservices, how do you route?

Option 1: Keyword routing

def route(request: str) -> str:
    keywords = {
        "coding": ["code", "file", "function", "bug", "implement"],
        "testing": ["test", "coverage", "debug", "failing"],
        "database": ["query", "database", "sql", "migrate"],
        "deploy": ["deploy", "release", "rollback", "production"],
    }

    request_lower = request.lower()
    for agent, words in keywords.items():
        if any(word in request_lower for word in words):
            return agent

    return "coding"  # default

Fast, no LLM call. But brittle: "deploy the code" hits the "coding" keywords first, because the first matching list in the dict wins.

Option 2: LLM routing

def route(request: str) -> str:
    response = llm.create(
        model="haiku",  # Fast, cheap
        messages=[{
            "role": "user",
            "content": f"""Classify this request into one category:
- coding: Reading, writing, or modifying code
- testing: Running tests, debugging, coverage
- database: Database queries, migrations
- deploy: Deployment, releases, rollbacks

Request: {request}

Return only the category name."""
        }]
    )
    return response.strip().lower()

More accurate, but adds latency and cost.
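
The returned category also needs validating, since the model can answer off-list. A small guard over the route function above:

VALID = {"coding", "testing", "database", "deploy"}

def safe_route(request: str) -> str:
    category = route(request)
    # Fall back to a default if the model invents a category
    return category if category in VALID else "coding"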

Option 3: Embedding routing

from sklearn.metrics.pairwise import cosine_similarity

class EmbeddingRouter:
    def __init__(self):
        # Pre-compute embeddings for each agent type
        self.agent_embeddings = {
            "coding": embed("code files functions implementation"),
            "testing": embed("tests debugging coverage failures"),
            "database": embed("queries sql migrations data"),
            "deploy": embed("deployment release production rollback"),
        }

    def route(self, request: str) -> str:
        request_embedding = embed(request)

        best_match = None
        best_score = -1

        for agent, agent_emb in self.agent_embeddings.items():
            score = cosine_similarity([request_embedding], [agent_emb])[0][0]
            if score > best_score:
                best_score = score
                best_match = agent

        return best_match

No LLM call, reasonably accurate.
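
The embed helper is assumed above. One way to implement it, using sentence-transformers (an assumption; any embedding API with the same input/output shape works):

from sentence_transformers import SentenceTransformer

# Small, fast model; load once at startup, not per request
_model = SentenceTransformer("all-MiniLM-L6-v2")

def embed(text: str):
    # Returns a dense numpy vector compatible with cosine_similarity
    return _model.encode(text)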

Context handoff

The hardest part of microservices: passing context between agents.

Bad: No context

User: "Check if the users table has an email column"
Database Agent: "Yes, it has an email column."

User: "Great, now write code to validate emails"
Coding Agent: "What users table? What email column?"

Context lost between agents.

Better: Explicit handoff

class Orchestrator:
    def __init__(self):
        self.context = {}

    def run(self, request):
        agent = self.route(request)

        # Pass accumulated context
        result = agent.run(request, context=self.context)

        # Capture context from result
        self.context.update(result.get("context", {}))

        return result["response"]

Best: Shared memory

class SharedMemory:
    def __init__(self, redis_client):
        self.redis = redis_client

    def store(self, session_id: str, key: str, value: str):
        self.redis.hset(f"session:{session_id}", key, value)

    def get(self, session_id: str, key: str) -> str:
        return self.redis.hget(f"session:{session_id}", key)

    def get_all(self, session_id: str) -> dict:
        return self.redis.hgetall(f"session:{session_id}")

# All agents read/write to shared memory
class Agent:
    def __init__(self, memory: SharedMemory):
        self.memory = memory

    def run(self, request, session_id):
        # Get shared context
        context = self.memory.get_all(session_id)

        # Do work
        result = self.execute(request, context)

        # Store discoveries
        for key, value in result.discoveries.items():
            self.memory.store(session_id, key, value)

        return result.response
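
With shared memory in place, the failing conversation from before works across agents. A usage sketch (decode_responses=True makes redis return plain strings):

import redis

memory = SharedMemory(redis.Redis(decode_responses=True))
database_agent = Agent(memory)
coding_agent = Agent(memory)

# The database agent stores what it learned under the session
database_agent.run("Check if the users table has an email column",
                   session_id="sess-1")

# The coding agent reads the same session and sees the discovery
coding_agent.run("Write code to validate emails", session_id="sess-1")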

Migration path

Start simple, split when needed:

Stage 1: Monolith
- One agent, all tools
- Ship fast, learn

Stage 2: Monolith with modules
- Group tools logically
- Dynamic tool loading
- Still one agent

Stage 3: Extract high-value specialists
- Keep monolith for most tasks
- Extract: deploy agent (needs extra safety)
- Extract: database agent (needs audit)

Stage 4: Full microservices (if needed)
- Router + specialists
- Only if scale/complexity demands it

Most projects never need Stage 4.
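
Stage 3 in code: keep the monolith as the default and peel off only the agents that need special handling. A sketch, where is_deploy_request is a hypothetical check (the keyword or LLM routing from above both work):

class Stage3Orchestrator:
    def __init__(self, monolith, deploy_agent):
        self.monolith = monolith
        self.deploy_agent = deploy_agent

    def run(self, request):
        # Only deploys go to the guarded specialist
        if is_deploy_request(request):
            return self.deploy_agent.run(request)
        return self.monolith.run(request)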

With Gantz

Gantz Run naturally supports the modular approach. Group tools into focused configs:

# gantz-coding.yaml
name: coding-tools
tools:
  - name: read
    description: Read a file
    script:
      shell: cat "{{path}}"

  - name: write
    description: Write to a file
    script:
      shell: echo "{{content}}" > "{{path}}"

  - name: search
    description: Search code
    script:
      shell: rg "{{query}}" . --max-count=20

# gantz-testing.yaml
name: testing-tools
tools:
  - name: run_tests
    description: Run test suite
    script:
      shell: npm test

  - name: coverage
    description: Check test coverage
    script:
      shell: npm run coverage

Run a different MCP server for each agent specialization; your client decides which ones to connect to.

Summary

Situation | Recommendation
--- | ---
< 10 tools | Monolith
Early stage, learning | Monolith
Tightly coupled tasks | Monolith
Latency critical | Monolith
> 20 tools | Consider splitting
Different security needs | Microservices
Different model needs | Microservices
Independent scaling | Microservices
Large team | Microservices

The rule: Start monolith. Split when you feel the pain.

Don't architect for problems you don't have.


Are your agents monoliths or microservices? What drove the decision?
