Single Agent vs Agent Swarm: How to Choose

Do you build one agent that does everything?

Or many specialized agents that talk to each other?

The same debate that haunts backend architecture haunts agents too.

Here's how to choose.

The monolith agent

One agent. One system prompt. All capabilities.

class MonolithAgent:
    def __init__(self):
        self.tools = [
            read_file,
            write_file,
            search_code,
            run_tests,
            query_database,
            send_email,
            create_ticket,
            deploy,
            monitor_metrics,
            # ... 30 more tools
        ]

        self.system_prompt = """
You are an all-in-one development assistant.
You can:
- Read and write code
- Run tests and debug
- Query databases
- Send notifications
- Deploy applications
- Monitor systems
...
"""

    def run(self, user_request):
        return agent_loop(self.system_prompt, self.tools, user_request)

One agent handles everything.
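
agent_loop above stands in for the standard tool-calling loop. A minimal sketch, assuming a pseudocode llm.create client (the same one used in the routing examples below) whose response shape (.tool_calls, .text) and execute_tool dispatcher are placeholders:

def agent_loop(system_prompt, tools, user_request):
    messages = [{"role": "user", "content": user_request}]
    while True:
        # Response shape is assumed: .tool_calls and .text
        response = llm.create(system=system_prompt, tools=tools, messages=messages)
        if not response.tool_calls:
            return response.text  # no tool requested: final answer
        for call in response.tool_calls:
            result = execute_tool(call)  # hypothetical dispatcher
            messages.append({"role": "tool", "content": result})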

The microservices approach

Many specialized agents. Each does one thing well.

# Separate agents
coding_agent = Agent(
    tools=[read_file, write_file, search_code],
    prompt="You are a coding assistant. You read and write code."
)

testing_agent = Agent(
    tools=[run_tests, analyze_coverage, debug],
    prompt="You are a testing specialist. You run and debug tests."
)

database_agent = Agent(
    tools=[query, migrate, backup],
    prompt="You are a database administrator. You manage data."
)

deploy_agent = Agent(
    tools=[build, deploy, rollback],
    prompt="You are a deployment specialist. You ship code safely."
)

# Orchestrator routes to specialists
class Orchestrator:
    def __init__(self):
        self.agents = {
            "coding": coding_agent,
            "testing": testing_agent,
            "database": database_agent,
            "deploy": deploy_agent,
        }

    def route(self, request):
        # classify() picks a specialist; see "The routing problem"
        # below for keyword, LLM, and embedding implementations
        agent_type = self.classify(request)
        return self.agents[agent_type].run(request)

Comparison

Aspect | Monolith | Microservices
--- | --- | ---
Complexity | Simple | Complex
Context | Shared | Isolated
Latency | Lower | Higher (routing)
Failure blast radius | Everything | One service
Prompt size | Large | Small, focused
Tool confusion | Higher | Lower
Development speed | Faster initially | Faster at scale
Debugging | Harder | Easier (isolated)

When monolith wins

Small scope

Total tools: 5-10
Use cases: Focused (just coding, just support, etc.)
Team size: 1-3 developers

One agent handles it fine. Don't over-engineer.

Highly connected tasks

User: "Read the config, update the database URL, then run migrations"

Monolith: One agent, full context, executes sequentially
Microservices: Config agent → Database agent → Migration agent
              (Context lost between handoffs)

When tasks need shared context, monolith is simpler.

Speed matters

Monolith:
User → Agent → Response
Latency: ~2s

Microservices:
User → Router → Classify → Agent → Response
Latency: ~4s (extra LLM call for routing)

If you need fast responses, avoid the routing overhead.

Early stage

Week 1: Build monolith
Week 2: Ship to users
Week 3: Learn what's actually needed
Week 4: Refactor if necessary

vs.

Week 1-4: Design perfect microservices architecture
Week 5: Ship
Week 6: Realize requirements were wrong

Start monolith. Split when you feel pain.

When microservices win

Many tools cause confusion

# Monolith with 40 tools
tools = [
    read_file, write_file, search_code,    # coding
    run_tests, debug, coverage,            # testing
    query_db, migrate, backup,             # database
    build, deploy, rollback,               # deployment
    send_email, send_slack, create_ticket, # notifications
    # ... 25 more tools
]

# Agent thinks:
# "User wants to 'run the tests'...
#  Should I use run_tests? Or debug? Or coverage?
#  Maybe I should query_db first to check test data?
#  Let me send_slack to ask..."

Too many tools = wrong tool selection.

# Microservice: Testing agent with 3 tools
tools = [run_tests, debug, coverage]

# Agent thinks:
# "User wants to 'run the tests'...
#  I'll use run_tests."

Fewer tools = better decisions.

Different trust levels

# Coding agent: Can read/write user's code
# Allowed: read_file, write_file, search

# Deploy agent: Can push to production
# Requires: Extra confirmation, audit log, approval

# Admin agent: Can modify infrastructure
# Requires: 2FA, senior approval, change ticket

Microservices let you apply different security policies.
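
One way to wire this in is to wrap each specialist in a policy layer. A sketch, where require_approval and audit_log are hypothetical hooks into your own approval flow and logging:

class GuardedAgent:
    def __init__(self, agent, needs_approval=False):
        self.agent = agent
        self.needs_approval = needs_approval

    def run(self, request):
        # Hypothetical hooks: approval flow and audit logging
        if self.needs_approval and not require_approval(request):
            return "Denied: approval required."
        audit_log(request)
        return self.agent.run(request)

coding = GuardedAgent(coding_agent)                       # low risk
deploy = GuardedAgent(deploy_agent, needs_approval=True)  # production access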

Different models for different tasks

# Simple routing: Fast, cheap model
router = Agent(model="haiku", ...)

# Complex coding: Smart, expensive model
coding_agent = Agent(model="opus", ...)

# Simple lookups: Fast model is fine
docs_agent = Agent(model="haiku", ...)

Microservices let you optimize cost per task type.

Independent scaling

Coding requests: 1000/day → needs fast responses
Deploy requests: 10/day → can be slower, more careful

Monolith: Scale everything together
Microservices: Scale coding agent, keep deploy agent minimal

Independent deployment

# Coding agent updated → deploy coding agent only
# Deploy agent unchanged → no risk to deployments

vs.

# Monolith updated → everything changes
# Bug in coding → might affect deployments

Blast radius is smaller.

The hybrid: Monolith with modules

You don't have to go full microservices. Modularize within one agent:

class ModularAgent:
    def __init__(self):
        self.modules = {
            "coding": CodingModule(),
            "testing": TestingModule(),
            "database": DatabaseModule(),
        }

        self.system_prompt = self.build_prompt()

    def build_prompt(self):
        prompt = "You are a development assistant.\n\n"

        # The prompt describes every module; the tool set is what
        # varies per request (see get_tools below)
        for name, module in self.modules.items():
            prompt += f"## {name.title()}\n{module.instructions}\n\n"

        return prompt

    def get_tools(self, context):
        """Return relevant tools based on context"""
        relevant = self.detect_relevant_modules(context)
        tools = []
        for module_name in relevant:
            tools.extend(self.modules[module_name].tools)
        return tools

    def run(self, request):
        # Dynamic tool selection based on request
        tools = self.get_tools(request)
        return agent_loop(self.system_prompt, tools, request)

One agent, but tool set changes based on context.
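
detect_relevant_modules is left undefined above. A minimal keyword-based version you could add to ModularAgent might look like this (the keyword lists are illustrative; embedding similarity, shown in Option 3 below, is a sturdier drop-in):

    def detect_relevant_modules(self, context):
        keywords = {
            "coding": ["code", "file", "function", "implement"],
            "testing": ["test", "coverage", "debug"],
            "database": ["query", "sql", "migrate"],
        }
        text = context.lower()
        matched = [
            name for name, words in keywords.items()
            if any(word in text for word in words)
        ]
        # Fall back to everything rather than an empty tool set
        return matched or list(self.modules)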

The routing problem

If you go microservices, how do you route?

Option 1: Keyword routing

def route(request: str) -> str:
    keywords = {
        "coding": ["code", "file", "function", "bug", "implement"],
        "testing": ["test", "coverage", "debug", "failing"],
        "database": ["query", "database", "sql", "migrate"],
        "deploy": ["deploy", "release", "rollback", "production"],
    }

    request_lower = request.lower()
    for agent, words in keywords.items():
        if any(word in request_lower for word in words):
            return agent

    return "coding"  # default

Fast, no LLM call. But brittle: "deploy the code" hits the "coding" keywords first, because the first matching list in the dict wins.

Option 2: LLM routing

def route(request: str) -> str:
    response = llm.create(
        model="haiku",  # Fast, cheap
        messages=[{
            "role": "user",
            "content": f"""Classify this request into one category:
- coding: Reading, writing, or modifying code
- testing: Running tests, debugging, coverage
- database: Database queries, migrations
- deploy: Deployment, releases, rollbacks

Request: {request}

Return only the category name."""
        }]
    )
    return response.strip().lower()

More accurate, but adds latency and cost.
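
The returned category also needs validating, since the model can answer off-list. A small guard over the route function above:

VALID = {"coding", "testing", "database", "deploy"}

def safe_route(request: str) -> str:
    category = route(request)
    # Fall back to a default if the model invents a category
    return category if category in VALID else "coding"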

Option 3: Embedding routing

from sklearn.metrics.pairwise import cosine_similarity

class EmbeddingRouter:
    def __init__(self):
        # Pre-compute embeddings for each agent type
        self.agent_embeddings = {
            "coding": embed("code files functions implementation"),
            "testing": embed("tests debugging coverage failures"),
            "database": embed("queries sql migrations data"),
            "deploy": embed("deployment release production rollback"),
        }

    def route(self, request: str) -> str:
        request_embedding = embed(request)

        best_match = None
        best_score = -1

        for agent, agent_emb in self.agent_embeddings.items():
            score = cosine_similarity([request_embedding], [agent_emb])[0][0]
            if score > best_score:
                best_score = score
                best_match = agent

        return best_match

No LLM call, reasonably accurate.
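
The embed helper is assumed above. One way to implement it, using sentence-transformers (an assumption; any embedding API with the same input/output shape works):

from sentence_transformers import SentenceTransformer

# Small, fast model; load once at startup, not per request
_model = SentenceTransformer("all-MiniLM-L6-v2")

def embed(text: str):
    # Returns a dense numpy vector compatible with cosine_similarity
    return _model.encode(text)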

Context handoff

The hardest part of microservices: passing context between agents.

Bad: No context

User: "Check if the users table has an email column"
Database Agent: "Yes, it has an email column."

User: "Great, now write code to validate emails"
Coding Agent: "What users table? What email column?"

Context lost between agents.

Better: Explicit handoff

class Orchestrator:
    def __init__(self):
        self.context = {}

    def run(self, request):
        agent = self.route(request)

        # Pass accumulated context
        result = agent.run(request, context=self.context)

        # Capture context from result
        self.context.update(result.get("context", {}))

        return result["response"]

Best: Shared memory

class SharedMemory:
    def __init__(self, redis_client):
        self.redis = redis_client

    def store(self, session_id: str, key: str, value: str):
        self.redis.hset(f"session:{session_id}", key, value)

    def get(self, session_id: str, key: str) -> str:
        return self.redis.hget(f"session:{session_id}", key)

    def get_all(self, session_id: str) -> dict:
        return self.redis.hgetall(f"session:{session_id}")

# All agents read/write to shared memory
class Agent:
    def __init__(self, memory: SharedMemory):
        self.memory = memory

    def run(self, request, session_id):
        # Get shared context
        context = self.memory.get_all(session_id)

        # Do work
        result = self.execute(request, context)

        # Store discoveries
        for key, value in result.discoveries.items():
            self.memory.store(session_id, key, value)

        return result.response
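
With shared memory in place, the failing conversation from before works across agents. A usage sketch (decode_responses=True makes redis return plain strings):

import redis

memory = SharedMemory(redis.Redis(decode_responses=True))
database_agent = Agent(memory)
coding_agent = Agent(memory)

# The database agent stores what it learned under the session
database_agent.run("Check if the users table has an email column",
                   session_id="sess-1")

# The coding agent reads the same session and sees the discovery
coding_agent.run("Write code to validate emails", session_id="sess-1")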

Migration path

Start simple, split when needed:

Stage 1: Monolith
- One agent, all tools
- Ship fast, learn

Stage 2: Monolith with modules
- Group tools logically
- Dynamic tool loading
- Still one agent

Stage 3: Extract high-value specialists
- Keep monolith for most tasks
- Extract: deploy agent (needs extra safety)
- Extract: database agent (needs audit)

Stage 4: Full microservices (if needed)
- Router + specialists
- Only if scale/complexity demands it

Most projects never need Stage 4.
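
Stage 3 in code: keep the monolith as the default and peel off only the agents that need special handling. A sketch, where is_deploy_request is a hypothetical check (the keyword or LLM routing from above both work):

class Stage3Orchestrator:
    def __init__(self, monolith, deploy_agent):
        self.monolith = monolith
        self.deploy_agent = deploy_agent

    def run(self, request):
        # Only deploys go to the guarded specialist
        if is_deploy_request(request):
            return self.deploy_agent.run(request)
        return self.monolith.run(request)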

With Gantz

Gantz Run naturally supports the modular approach. Group tools into focused configs:

# gantz-coding.yaml
name: coding-tools
tools:
  - name: read
    description: Read a file
    script:
      shell: cat "{{path}}"

  - name: write
    description: Write to a file
    script:
      shell: echo "{{content}}" > "{{path}}"

  - name: search
    description: Search code
    script:
      shell: rg "{{query}}" . --max-count=20

# gantz-testing.yaml
name: testing-tools
tools:
  - name: run_tests
    description: Run test suite
    script:
      shell: npm test

  - name: coverage
    description: Check test coverage
    script:
      shell: npm run coverage

Run a different MCP server for each agent specialization; your client decides which ones to connect to.

Summary

Situation | Recommendation
--- | ---
< 10 tools | Monolith
Early stage, learning | Monolith
Tightly coupled tasks | Monolith
Latency critical | Monolith
> 20 tools | Consider splitting
Different security needs | Microservices
Different model needs | Microservices
Independent scaling | Microservices
Large team | Microservices

The rule: Start monolith. Split when you feel the pain.

Don't architect for problems you don't have.


Are your agents monoliths or microservices? What drove the decision?
