Multi-Agent Systems in Production: When One Agent Isn't Enough and How We Coordinate Them

#ai #agents #architecture #python

We built our first multi-agent system by accident. We had a single agent handling document analysis for a client — extract data, validate it, write a summary, trigger a follow-up action. It worked in demos. In production, it hallucinated its way through the validation step roughly 15% of the time, because one context window was doing too much and losing the thread. The fix wasn't better prompting. It was splitting the work across agents that each had one job.

That experience shaped how we think about multi-agent architecture now. Not as something exotic, but as the natural answer to a specific set of problems: when a task is too complex for a single context, when different subtasks need different models or tools, or when you need parallelism and one agent is a bottleneck.

The Signal That You Need More Than One Agent

A single agent starts failing in predictable ways. You'll notice the LLM losing track of early context by the time it reaches a later step. You'll see a validation step get skipped or half-done because the agent prioritised completing the previous task. You'll hit context limits on long-running workflows.

The cleaner framing: if you're writing a prompt with three or more distinct roles in it ("you are a researcher, and also a critic, and also a summariser"), you probably need one agent per role.

We apply a simple test. If we can describe the workflow as a sequence of handoffs — agent A produces output X, agent B takes X and produces Y — we build it as multiple agents. If it's a single stream of reasoning with tool calls, one agent is fine.

How We Structure Agent Pipelines in Django

Our typical multi-agent setup runs on Celery for orchestration, with each agent as a separate task. The orchestrator task decides what to run and in what order; the worker agents execute. Here's a stripped-down version of the pattern:

# tasks.py
from celery import chain, shared_task

from .agents import ResearchAgent, ValidationAgent, SummaryAgent
from .models import Document  # assumed location of the Document model


@shared_task
def run_document_pipeline(document_id: str) -> str:
    """Orchestrator: queues the full multi-agent pipeline."""
    document = Document.objects.get(id=document_id)

    # Celery rejects result.get() inside a task by default, so the steps
    # are linked with chain(): each task's return value becomes the first
    # argument of the next task.
    pipeline = chain(
        research_agent_task.s(document.content),  # Step 1: extract structured data
        validation_agent_task.s(),  # Step 2: validate (separate agent, fresh context)
        summary_agent_task.s(),  # Step 3: generate summary
    )
    # Return the id of the final (summary) result so callers can look it up
    return pipeline.apply_async().id


@shared_task
def research_agent_task(content: str) -> dict:
    agent = ResearchAgent()
    return agent.run(content)


@shared_task
def validation_agent_task(data: dict) -> dict:
    agent = ValidationAgent()
    validated = agent.run(data)

    if not validated["is_valid"]:
        # Raising here stops the chain, so the summary agent never runs
        raise ValueError(f"Validation failed: {validated['reason']}")
    return validated["data"]


@shared_task
def summary_agent_task(data: dict) -> dict:
    agent = SummaryAgent()
    return agent.run(data)

Each agent class wraps its own system prompt and model config. The key constraint: agents don't share context. Agent B gets only what agent A explicitly returns, not the full conversation history. This is deliberate. It stops one agent's stray reasoning from leaking into the next agent's context and keeps each agent's prompt focused.
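
To make the "no shared context" constraint concrete, here is a minimal sketch of what such an agent class might look like. It is not our production code: the `Agent` dataclass, the `ModelCall` signature, and the `fake_model` stub are all hypothetical names introduced for illustration, and the stub stands in for a real LLM client.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a real LLM client call:
# (system_prompt, user_message, model) -> response text
ModelCall = Callable[[str, str, str], str]


@dataclass(frozen=True)
class Agent:
    """One agent = one system prompt + one model config. No shared history."""
    name: str
    system_prompt: str
    model: str
    call_model: ModelCall

    def run(self, payload: str) -> str:
        # A fresh call each time: the only input is the explicit payload,
        # never the previous agent's conversation.
        return self.call_model(self.system_prompt, payload, self.model)


def fake_model(system_prompt: str, user_message: str, model: str) -> str:
    """Echo stub so the sketch runs without network access."""
    return f"[{model}] {user_message.upper()}"


extractor = Agent("extractor", "Extract the key figures.", "model-a", fake_model)
validator = Agent("validator", "Check the figures for consistency.", "model-b", fake_model)

extracted = extractor.run("revenue was 12m")
validated = validator.run(extracted)  # sees only the extractor's return value
```

The handoff is just a function argument. Anything the extractor did not put in its return value is invisible to the validator.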

Handling Failures Without Cascading Collapse

The failure modes in multi-agent systems are different from single-agent ones. An individual agent can fail silently — returning something plausible but wrong — and the downstream agent has no way to know.

We handle this with explicit validation contracts between agents. Before any agent hands off to the next, the output is schema-validated. We use Pydantic for this:

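import logging

from celery import shared_task
from pydantic import BaseModel, ValidationError
from typing import Optional

from .agents import ResearchAgent

logger = logging.getLogger(__name__)


class ExtractionOutput(BaseModel):
    company_name: str
    revenue_figure: float
    reporting_period: str
    confidence_score: float
    raw_excerpt: Optional[str] = None


# autoretry_for + retry_backoff give task-level retries with exponential backoff
@shared_task(autoretry_for=(ValidationError,), retry_backoff=True, max_retries=3)
def research_agent_task(content: str) -> dict:
    agent = ResearchAgent()
    raw_output = agent.run(content)

    try:
        validated = ExtractionOutput(**raw_output)
        return validated.model_dump()
    except ValidationError as e:
        # Log and retry once with a clarification prompt
        logger.error("Research agent output failed validation: %s", e)
        refined = agent.run_with_clarification(content, str(e))
        # A second ValidationError propagates and triggers the Celery retry
        return ExtractionOutput(**refined).model_dump()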

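If validation still fails after the clarification retry, the task raises. Celery can then retry at the task level with exponential backoff, provided the task is declared with retry options such as autoretry_for and retry_backoff. We never silently pass bad data downstream.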
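
For readers unfamiliar with exponential backoff, the sketch below shows the general shape of the delay schedule: the wait doubles on each retry up to a cap. It illustrates the idea rather than reproducing Celery's internals, and the function name and default values (`factor`, `maximum`) are assumptions chosen for the example.

```python
import random


def backoff_delay(retries: int, factor: float = 1.0, maximum: float = 600.0,
                  jitter: bool = False) -> float:
    """Seconds to wait before retry number `retries` (0-based)."""
    # Double the delay on every retry, but never exceed the cap
    delay = min(maximum, factor * (2 ** retries))
    if jitter:
        # Full jitter: pick a random delay between 0 and the computed ceiling,
        # so many failing tasks do not all retry at the same instant
        return random.uniform(0, delay)
    return delay


schedule = [backoff_delay(n) for n in range(5)]
print(schedule)
```

Without jitter the first five delays are 1, 2, 4, 8, and 16 seconds. The cap matters for LLM calls in particular, because a rate-limited provider can keep a task failing for a long time.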

Parallel Agents: When Sequence Isn't Required

Not all multi-agent pipelines are sequential. Some tasks can be parallelised. For a client in market research, we run three analysis agents in parallel — one for sentiment, one for entity extraction, one for trend detection — then pass all three outputs to a synthesis agent.

from celery import chord, group, shared_task

@shared_task
def run_parallel_analysis(article_ids: list[str]) -> str:
    # Fan out: three analysis agents per article, all in parallel
    analysis_group = group(
        agent_task.s(article_id)
        for article_id in article_ids
        for agent_task in (sentiment_agent_task, entity_agent_task, trend_agent_task)
    )

    # Fan in: chord() calls the synthesis agent with the list of all results
    # once every analysis task has finished, without blocking on .get()
    # inside a task (which Celery rejects by default).
    return chord(analysis_group)(synthesis_agent_task.s()).id

The synthesis agent's system prompt is built specifically for receiving structured outputs from the three parallel agents. It doesn't need to know how those outputs were generated — just their schemas.
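
As a rough illustration of "just their schemas", the sketch below bundles three structured outputs into a single message for a synthesis agent. The schema classes, their field names, and `build_synthesis_input` are hypothetical; the real fields depend on what each analysis agent is prompted to return.

```python
from dataclasses import dataclass, asdict
import json


# Hypothetical output schemas; real fields depend on the agents' prompts.
@dataclass
class SentimentOutput:
    article_id: str
    score: float  # assumed range: -1.0 (negative) to 1.0 (positive)


@dataclass
class EntityOutput:
    article_id: str
    entities: list[str]


@dataclass
class TrendOutput:
    article_id: str
    trends: list[str]


def build_synthesis_input(sentiment: SentimentOutput, entities: EntityOutput,
                          trends: TrendOutput) -> str:
    """Serialise the three structured outputs into one user message."""
    ids = {sentiment.article_id, entities.article_id, trends.article_id}
    if len(ids) != 1:
        # Refuse to synthesise results that belong to different articles
        raise ValueError(f"Outputs refer to different articles: {sorted(ids)}")
    payload = {
        "sentiment": asdict(sentiment),
        "entities": asdict(entities),
        "trends": asdict(trends),
    }
    return json.dumps(payload, indent=2, sort_keys=True)


message = build_synthesis_input(
    SentimentOutput("a1", 0.4),
    EntityOutput("a1", ["Acme Corp"]),
    TrendOutput("a1", ["pricing pressure"]),
)
```

The synthesis prompt can then describe these three keys and nothing else, which keeps it independent of how each upstream agent is prompted.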

What This Doesn't Solve

Multi-agent systems introduce coordination overhead. You're now managing multiple LLM calls per user request, which adds latency and cost. A pipeline that runs three agents sequentially will be slower than a single agent for simple tasks. We only reach for this pattern when the task genuinely requires it — not as a default architecture.

Debugging is harder. When a result is wrong, the error could have originated in any agent. We address this with structured logging on every agent call (input, output, token count, latency), but it's still more work to trace than a single-agent error.
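
One way to get that kind of logging is a small decorator around each agent call. This is a sketch under assumptions, not our actual logging layer: `log_agent_call` is a hypothetical helper, and it assumes the agent reports token usage under a "tokens" key in its result.

```python
import functools
import json
import logging
import time

logger = logging.getLogger("agents")


def log_agent_call(agent_name: str):
    """Decorator: emit one JSON log line per successful agent call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(payload):
            start = time.perf_counter()
            result = func(payload)
            record = {
                "agent": agent_name,
                "input": payload,
                "output": result,
                # Assumes the agent reports usage under a "tokens" key (hypothetical)
                "tokens": result.get("tokens") if isinstance(result, dict) else None,
                "latency_ms": round((time.perf_counter() - start) * 1000, 2),
            }
            # default=str keeps non-JSON-serialisable values from crashing the log call
            logger.info(json.dumps(record, default=str))
            return result
        return wrapper
    return decorator


@log_agent_call("summary")
def summary_agent(payload: dict) -> dict:
    # Stub standing in for a real model call
    return {"summary": f"{payload['company']} reported revenue.", "tokens": 42}


result = summary_agent({"company": "Acme Corp"})
```

Because every record carries the agent name plus its input and output, a wrong final result can be traced back through the handoffs by reading the log lines in order. This version only logs calls that return; a fuller one would also record exceptions.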

And multi-agent systems don't fix underlying model quality issues. If your agents are calling a model that can't reliably extract the data you need, splitting into five agents won't help. Get the individual agent working first.

The Honest Summary

Multi-agent systems are the right tool when: a single agent is losing context mid-task, different steps require different tools or expertise, or you need parallelism. In those cases, the explicit handoff boundaries between agents are a feature — they force you to define what "done" means for each step, and make validation explicit rather than hoped-for.

The Django + Celery pattern we use is practical and observable. Each agent is a Celery task with clear inputs and outputs. The orchestrator coordinates, not the framework. And every handoff is schema-validated so errors surface at the boundary, not buried in a final output that looks plausible but isn't.

If you're hitting the limits of single-agent workflows, this is the natural next step.

Lycore builds production AI systems for businesses — including multi-agent pipelines, RAG systems, and AI integrations built on your existing Django or Python stack. Get in touch if you want to talk through your use case.