Lycore Development

Posted on Jun 28

Multi-Agent Systems in Production: When One Agent Isn't Enough and How We Coordinate Them

#python #ai #agents #django

We built our first "multi-agent system" by accident. What started as a single agent that could research a topic, draft a report, check it against source data, and send a summary email had grown into a 2,000-token system prompt and a function list so long that the model kept forgetting tools existed. It wasn't a system — it was a monolith pretending to be intelligent.

Breaking it apart into coordinated agents fixed most of the problems. It also introduced a new category of problems we hadn't thought about. Here's what we actually learned.

When One Agent Is Enough (and When It Isn't)

The temptation to add more agents is real, but the overhead isn't free. Every agent boundary you add is a place where context can get lost, latency increases, and errors compound.

One agent is the right call when:

The task fits in a single LLM context window without crowding
The steps are sequential and each depends heavily on the prior output
You need tight reasoning across all the information (summarising a document, for example)

You need multiple agents when:

A single agent's context window is being maxed out with tool definitions, history, or data
Different steps require genuinely different "personas" or instruction sets (research vs. writing vs. fact-checking)
Steps can run in parallel and the latency saving matters
You want to isolate failure — if the data extraction agent fails, the report-writing agent shouldn't be affected

The key question we ask: Is this one job or a pipeline of jobs? If you'd describe it to a human as "first do X, then Y takes that and does Z", you probably have a pipeline, not a single task.

The Three Patterns We Actually Use

1. Supervisor-Worker

A thin orchestrator agent decides what needs doing, dispatches to specialised worker agents, and stitches the results together. The workers are narrow — they do one thing and don't need to know about the rest of the workflow.

This is our most common pattern. The supervisor's system prompt stays small because it's routing, not reasoning. The workers' prompts can be highly optimised for their specific job.
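As a rough illustration of the shape (not the production code), the sketch below uses plain functions in place of LLM-backed workers and a keyword rule in place of the supervisor's routing call. The names `Supervisor`, `research_worker`, and `email_worker` are hypothetical.

```python
from typing import Callable

Worker = Callable[[str], str]


# Hypothetical workers: in a real system each would wrap its own LLM call
# and its own narrowly scoped system prompt.
def research_worker(request: str) -> str:
    return f"research notes for: {request}"


def email_worker(request: str) -> str:
    return f"email summary for: {request}"


class Supervisor:
    """Decides which workers to call and stitches their results together."""

    def __init__(self, workers: dict[str, Worker], default: str) -> None:
        self.workers = workers
        self.default = default

    def route(self, request: str) -> list[str]:
        # Stand-in for the routing decision: a keyword match keeps the sketch
        # runnable without a model. Falls back to the default worker.
        lowered = request.lower()
        chosen = [name for name in self.workers if name in lowered]
        return chosen or [self.default]

    def run(self, request: str) -> dict[str, str]:
        # Workers never see each other; only the supervisor sees every result.
        return {name: self.workers[name](request) for name in self.route(request)}


supervisor = Supervisor(
    {"research": research_worker, "email": email_worker},
    default="research",
)
results = supervisor.run("Research battery costs and email the team")
```

The supervisor holds no task knowledge beyond the routing rule, which is what keeps its prompt small in the LLM-backed version.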

2. Sequential Pipeline

Each agent's output is the next agent's input. No orchestrator — just a chain. We use this for document processing: extract → chunk → summarise → classify. Each step is independent enough that we can swap out or retrain one without touching the others.
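A minimal sketch of the chain, assuming each step is a plain function with toy logic standing in for an LLM-backed agent. The step bodies and `run_pipeline` are illustrative only.

```python
from functools import reduce
from typing import Any, Callable

Step = Callable[[Any], Any]


def extract(document: str) -> str:
    # Stand-in for text extraction: just trims whitespace.
    return document.strip()


def chunk(text: str) -> list[str]:
    # Stand-in for chunking: one chunk per sentence.
    return [part.strip() for part in text.split(".") if part.strip()]


def summarise(chunks: list[str]) -> list[str]:
    # Stand-in for summarisation: keep the first three words of each chunk.
    return [" ".join(c.split()[:3]) for c in chunks]


def classify(summaries: list[str]) -> dict[str, str]:
    # Stand-in for classification: a keyword rule.
    return {s: ("invoice" if "invoice" in s.lower() else "other") for s in summaries}


def run_pipeline(steps: list[Step], initial: Any) -> Any:
    # No orchestrator: each step's output is simply the next step's input.
    return reduce(lambda value, step: step(value), steps, initial)


result = run_pipeline(
    [extract, chunk, summarise, classify],
    "  Invoice 42 is overdue. Please call the client back.  ",
)
```

Because the steps only agree on input and output types, replacing `summarise` with another function of the same shape leaves the other three untouched.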

3. Event-Driven Agents

Agents subscribe to events rather than being called directly. An intake agent processes a new customer request and emits an event; a triage agent picks it up, classifies it, and emits another; a response agent drafts the reply. We use this with Celery and Redis when the steps can happen asynchronously and we don't need the full chain to complete before responding to the user.
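The flow can be sketched with an in-process event bus standing in for Celery and Redis. The `EventBus` class and the event names below are assumptions made for illustration, and everything here runs synchronously in one process.

```python
from collections import defaultdict, deque
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Toy in-memory stand-in for a broker-backed queue."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)
        self._queue: deque[tuple[str, dict]] = deque()

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def emit(self, event_type: str, payload: dict) -> None:
        # Emitting only enqueues; the emitter does not wait for subscribers.
        self._queue.append((event_type, payload))

    def drain(self) -> None:
        # Deliver queued events in order, including ones emitted by handlers.
        while self._queue:
            event_type, payload = self._queue.popleft()
            for handler in self._handlers[event_type]:
                handler(payload)


bus = EventBus()
sent_replies: list[str] = []


def intake_agent(payload: dict) -> None:
    bus.emit("request.received", {"text": payload["text"].strip()})


def triage_agent(payload: dict) -> None:
    category = "billing" if "invoice" in payload["text"].lower() else "general"
    bus.emit("request.triaged", {**payload, "category": category})


def response_agent(payload: dict) -> None:
    sent_replies.append(f"[{payload['category']}] We are looking into: {payload['text']}")


bus.subscribe("request.new", intake_agent)
bus.subscribe("request.received", triage_agent)
bus.subscribe("request.triaged", response_agent)

bus.emit("request.new", {"text": "  My invoice is wrong  "})
bus.drain()
```

No agent calls another directly; each one only knows which event it consumes and which it emits.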

The Orchestrator in Django + Celery

Here's a simplified version of how we implement the supervisor pattern. The orchestrator Celery task manages the workflow; individual agent tasks do the actual LLM calls.

# tasks/orchestrator.py
from celery import chain, shared_task

from .agents import extract_data_task, analyse_data_task, draft_report_task


@shared_task(bind=True, max_retries=3)
def run_report_pipeline(self, document_id: int, user_id: int):
    """
    Supervisor: extract → analyse → draft, with error isolation at each step.
    """
    try:
        # Build the pipeline as a Celery chain: each task's return value
        # becomes the first positional argument of the next task.
        pipeline = chain(
            extract_data_task.s(document_id),
            analyse_data_task.s(user_id=user_id),
            draft_report_task.s(user_id=user_id),
        )
        result = pipeline.apply_async()
        return {"pipeline_id": result.id, "status": "started"}

    except Exception as exc:
        # Only failures while enqueuing the chain (a broker outage, for example)
        # land here; failures inside the agent tasks are handled by those tasks.
        # Retry with exponential backoff before giving up.
        raise self.retry(exc=exc, countdown=2 ** self.request.retries)

Each agent task is responsible for its own LLM call and its own error handling. The orchestrator doesn't need to know what model each agent uses, or whether agent two calls a tool — it just cares about the shape of the data passing between steps.
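The data shape matters because a Celery chain built from `.s()` signatures passes each task's return value to the next task as its first positional argument. The plain-Python sketch below mimics that threading with `functools.partial`. `run_chain` and the three step functions are hypothetical stand-ins, not Celery APIs.

```python
from functools import partial
from typing import Any, Callable


def run_chain(first_call: Callable[[], Any], *later_steps: Callable[[Any], Any]) -> Any:
    """Feed each step's return value to the next step as its first argument."""
    result = first_call()
    for step in later_steps:
        result = step(result)
    return result


def extract_data(document_id: int) -> dict:
    return {"document_id": document_id, "key_facts": ["fact a", "fact b"]}


def analyse_data(extraction: dict, user_id: int) -> dict:
    return {
        "document_id": extraction["document_id"],
        "findings": [fact.upper() for fact in extraction["key_facts"]],
        "requested_by": user_id,
    }


def draft_report(analysis: dict, user_id: int) -> str:
    findings = "; ".join(analysis["findings"])
    return f"Report for user {user_id} on document {analysis['document_id']}: {findings}"


# Mirrors extract.s(document_id), analyse.s(user_id=...), draft.s(user_id=...)
report = run_chain(
    partial(extract_data, 7),
    partial(analyse_data, user_id=3),
    partial(draft_report, user_id=3),
)
```

Each function only depends on the keys of the dict it receives, so any step can change its internals freely as long as those keys stay stable.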

Passing State Between Agents Without Losing Your Mind

The naïve approach is to pass the full output of each agent directly into the next. This breaks down fast: LLM outputs are verbose, and feeding 3,000 tokens of analysis into a drafter that only needs 5 key facts wastes tokens and degrades quality.

We use a structured intermediate format — a plain Python dataclass or Pydantic model — as the contract between agents. Each agent's output is validated against this schema before it's passed downstream.

from typing import Optional

from celery import shared_task
from pydantic import BaseModel, Field

class ExtractionResult(BaseModel):
    document_id: int
    key_facts: list[str] = Field(max_length=10)    # Max 10 bullet points
    raw_data_summary: str = Field(max_length=500)  # At most 500 chars
    confidence_score: float = Field(ge=0, le=1)    # 0–1
    extraction_warnings: list[str]                 # Anything the agent flagged

class AnalysisResult(BaseModel):
    document_id: int
    findings: list[str]
    risk_flags: list[str]
    recommended_actions: list[str]
    analysis_notes: Optional[str] = None

# In the extraction agent task.
# call_llm and get_document_text are project helpers that are not shown here.
@shared_task
def extract_data_task(document_id: int) -> dict:
    raw_output = call_llm(
        system="You are a data extraction specialist...",
        user=get_document_text(document_id),
        response_format=ExtractionResult,  # Structured output enforced
    )
    # Raises pydantic.ValidationError here, at the boundary, if the output is off-contract
    result = ExtractionResult.model_validate(raw_output)
    return result.model_dump()  # Celery serialises as dict

Enforcing the schema at the boundary means your analysis agent never has to guess what the extraction agent gave it. When something breaks, the error is at the boundary where it belongs, not buried three steps later.
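The same boundary check works with the plain-dataclass option, with one caveat: dataclasses do not validate anything by default, so the checks have to be written by hand. The sketch below assumes a trimmed-down, hypothetical `ExtractionHandoff` contract and shows a bad payload failing at the handoff, before any downstream agent sees it.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ExtractionHandoff:
    """Hypothetical, trimmed-down contract between extraction and analysis."""

    document_id: int
    key_facts: list[str]
    confidence_score: float

    def __post_init__(self) -> None:
        # Hand-written checks: dataclasses do not enforce types or ranges.
        if not self.key_facts:
            raise ValueError("key_facts must not be empty")
        if len(self.key_facts) > 10:
            raise ValueError("key_facts is limited to 10 items")
        if not 0.0 <= self.confidence_score <= 1.0:
            raise ValueError("confidence_score must be between 0 and 1")


def hand_off(raw: dict) -> dict:
    """Validate at the boundary, then return a plain dict for serialisation."""
    return asdict(ExtractionHandoff(**raw))


good = hand_off(
    {"document_id": 1, "key_facts": ["Revenue rose 4%"], "confidence_score": 0.9}
)

try:
    hand_off({"document_id": 1, "key_facts": [], "confidence_score": 1.7})
    boundary_error = None
except ValueError as exc:
    # The failure surfaces here, not three steps later in the drafter.
    boundary_error = str(exc)
```

Unexpected or missing keys in `raw` raise a `TypeError` from the dataclass constructor, which is also a boundary failure worth catching and recording.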

Handling Failures in a Multi-Agent Chain

The hardest part of multi-agent systems is failure handling. In a monolithic agent, one failure terminates one task. In a pipeline, a failure in step two means you've wasted step one and need to decide whether to retry from the start or from step two.

Our approach:

Checkpoint results to the database after each step, not just at the end. If step three fails, we can replay from step two's saved output.
Each agent retries independently with a backoff before propagating failure. Most LLM failures are transient.
The orchestrator tracks pipeline state — we have a PipelineRun model with status fields for each step. This lets us resume partial pipelines and gives us visibility into where things are breaking.
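The per-agent retry in the second point can be sketched in plain Python as follows. `TransientLLMError` and `call_with_backoff` are hypothetical names, and the `sleep` parameter is injectable so the example runs without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientLLMError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, rate limits)."""


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with exponentially growing delays."""
    attempt = 0
    while True:
        try:
            return fn()
        except TransientLLMError:
            if attempt >= max_retries:
                raise  # Retries exhausted: propagate the failure upstream
            sleep(base_delay * 2 ** attempt)
            attempt += 1


attempts = {"count": 0}
delays: list[float] = []


def flaky_agent() -> str:
    # Fails twice, then succeeds, to imitate a transient outage.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientLLMError("rate limited")
    return "extraction ok"


outcome = call_with_backoff(flaky_agent, sleep=delays.append)
```

Only the marked transient error is retried; anything else propagates immediately. Celery tasks also accept `autoretry_for` and `retry_backoff` options, which may cover the same ground without a hand-rolled loop.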

# models.py
from django.db import models

# Document is assumed to be defined or imported elsewhere in this module.


class PipelineRun(models.Model):
    document = models.ForeignKey(Document, on_delete=models.CASCADE)
    status = models.CharField(max_length=20, default='pending')

    # Checkpointed results per step
    extraction_result = models.JSONField(null=True)
    analysis_result = models.JSONField(null=True)
    draft_result = models.JSONField(null=True)

    # Step-level status
    extraction_status = models.CharField(max_length=20, default='pending')
    analysis_status = models.CharField(max_length=20, default='pending')
    draft_status = models.CharField(max_length=20, default='pending')

    error_detail = models.TextField(blank=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

This makes debugging a failed pipeline actually feasible. You open the admin, find the PipelineRun, see which step failed, and read the error. Without this, you're parsing Celery logs hoping something tells you what happened.
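Resuming from a checkpoint could look roughly like the sketch below. A plain dict stands in for a `PipelineRun` row so the example runs without Django. `resume_pipeline`, the `'done'` and `'failed'` status values, and the step functions are assumptions. In a real task each mutation would be followed by a `save()`.

```python
from typing import Any, Callable

STEP_ORDER = ["extraction", "analysis", "draft"]


def resume_pipeline(
    run: dict[str, Any],
    steps: dict[str, Callable[[Any], Any]],
    initial_input: Any,
) -> dict[str, Any]:
    """Run only the steps that lack a checkpoint, recording state after each."""
    previous_output = initial_input
    for name in STEP_ORDER:
        if run.get(f"{name}_status") == "done":
            # Reuse the saved result instead of paying for the LLM call again.
            previous_output = run[f"{name}_result"]
            continue
        try:
            previous_output = steps[name](previous_output)
        except Exception as exc:
            run[f"{name}_status"] = "failed"
            run["status"] = "failed"
            run["error_detail"] = f"{name}: {exc}"
            return run
        run[f"{name}_result"] = previous_output
        run[f"{name}_status"] = "done"
    run["status"] = "done"
    run["error_detail"] = ""
    return run


calls: list[str] = []


def extract(document_id: int) -> dict:
    calls.append("extraction")
    return {"facts": ["a"]}


def analyse(extraction: dict) -> dict:
    calls.append("analysis")
    return {"findings": extraction["facts"]}


def draft(analysis: dict) -> str:
    calls.append("draft")
    return "Report: " + ", ".join(analysis["findings"])


# A run whose extraction succeeded earlier but whose analysis failed.
run = {
    "status": "failed",
    "extraction_status": "done",
    "extraction_result": {"facts": ["a"]},
    "analysis_status": "failed",
    "error_detail": "analysis: timeout",
}
run = resume_pipeline(
    run, {"extraction": extract, "analysis": analyse, "draft": draft}, initial_input=42
)
```

The extraction step is skipped on the second attempt because its checkpoint is already marked done, so only the failed step and the ones after it run again.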

The Honest Summary

Multi-agent architectures solve real problems — context overflow, specialisation, parallelism, and failure isolation. But they introduce coordination overhead that a single agent doesn't have. You're trading simplicity for scalability and resilience.

What this doesn't solve: it won't fix a poorly designed system prompt on an individual agent, and it won't save you if your task decomposition is wrong. It also adds latency on any sequential path, because every agent boundary is another round-trip to an LLM.

Start with one agent. Add a second when you have a clear reason — not because it sounds more impressive. The moment you're debugging why agent three hallucinated because agent two gave it a vague extraction result, you'll appreciate the value of simple.

We run multi-agent pipelines in production for document processing, automated research workflows, and customer triage. They work well, but every one of them started life as a single agent that we only split apart when we had a concrete reason.

Lycore builds production AI systems for businesses — we design and implement multi-agent pipelines, RAG systems, and LLM integrations that hold up in production. Get in touch if you want to talk through your use case.

Top comments (1)

Alex Shev • Jun 28

The coordination layer is where multi-agent systems become real software instead of a chat pattern. I would make roles, handoff format, shared state, cancellation rules, and approval gates explicit. Otherwise parallel agents create motion, but no one owns the final state.