Originally published on BlockSimplified.
This post is part of my AI Fluency series. We've covered single agents in Module 4; now we're scaling up. Module 5 is about getting multiple agents to work together, which is harder than it sounds.
I remember the moment I realized single agents weren't enough. I had built a research assistant that could search the web, summarize articles, and answer questions. It worked well for simple queries. Then I asked it to "research the AI agent landscape, compare the top 5 frameworks, and write a technical blog post with code examples." It choked. The context window filled up, the output became unfocused, and the code examples were hallucinated garbage.
That's when I started exploring multi-agent systems. The idea is simple: instead of one agent doing everything, you create a team. A researcher agent gathers information. A writer agent crafts the prose. A coder agent handles the technical examples. A reviewer agent catches errors. Each specialist does what it's good at, and together they produce something none could create alone.
Why Single Agents Hit a Wall
Single agents are powerful. With the right tools and prompts, they can do impressive things. The question is where they break down.
Here are the walls I've hit:
Context window limits. Complex tasks pile up information fast: research results, previous outputs, tool responses, conversation history. A single agent running a 10-step workflow can run out of context space before it finishes the job.
Specialization beats generalization. A system prompt can only stretch one agent so far. Ask it to be a world-class researcher AND writer AND coder AND editor, and you get the jack-of-all-trades problem: competent everywhere, excellent nowhere.
No second opinion. A single agent can hallucinate, make logical errors, or drift off-task, and nobody is watching. A second agent that reviews the first one's work catches mistakes that would otherwise slip through.
Parallelization. A single agent works through tasks one at a time. When subtasks are independent, multiple agents can research different parts of a problem at the same time.
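Here is a minimal sketch of running independent subtasks concurrently with Python's `asyncio`. The `research` coroutine is a hypothetical stand-in for an LLM-backed agent call, and the sleep only simulates API latency. Because none of the subtasks needs another's output, all three run at once and finish in roughly the time of one.

```python
import asyncio


async def research(subtopic: str) -> str:
    # Hypothetical stand-in for an LLM-backed research agent.
    # A real agent would await an API request here; sleep simulates latency.
    await asyncio.sleep(0.1)
    return f"notes on {subtopic}"


async def research_in_parallel(subtopics: list[str]) -> list[str]:
    # Independent subtasks don't depend on each other, so start them all at
    # once; gather returns results in the same order as the inputs.
    return list(await asyncio.gather(*(research(t) for t in subtopics)))


if __name__ == "__main__":
    notes = asyncio.run(research_in_parallel(["CrewAI", "LangGraph", "AutoGen"]))
    print(notes)
```

If a later step needs these results (say, a comparison), it simply awaits `research_in_parallel` first. Only the independent part of the work fans out.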
The Human Team Analogy
Think about how a software team actually ships a feature. Nobody assigns one person to design, build, test, and document it solo. You split the work.
Here's how that maps:
| Role | Responsibility | Agent Equivalent |
|---|---|---|
| Product Manager | Defines requirements, prioritizes | Planner Agent |
| Researcher | Investigates solutions, gathers context | Research Agent |
| Developer | Writes the code | Coder Agent |
| Code Reviewer | Catches bugs, suggests improvements | Reviewer Agent |
| Technical Writer | Documents the work | Writer Agent |
| QA Tester | Validates the implementation | Tester Agent |
Each person has deep expertise in their area. They communicate through defined channels (standups, PRs, docs). A project manager coordinates the workflow. Sound familiar?
Agent orchestration is the AI equivalent of project management. Someone (or something) needs to decide: which agent handles this task? In what order? What happens when an agent fails?
Role Specialization: What Makes Each Agent Unique
In a multi-agent system, each agent has a distinct role. The role defines:
- What the agent knows (system prompt, context)
- What the agent can do (available tools)
- What the agent is responsible for (its piece of the workflow)
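One way to keep those three things explicit in code is to treat a role as data. The sketch below is illustrative only: `AgentRole` and its field names are hypothetical, not any framework's API. The `can_use` check shows how a role boundary can be enforced rather than merely described in a prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRole:
    """A role pins down what an agent knows, can do, and owns (illustrative)."""
    name: str
    system_prompt: str            # what the agent knows
    tools: tuple[str, ...] = ()   # what the agent can do
    responsibility: str = ""      # its piece of the workflow

    def can_use(self, tool: str) -> bool:
        # Enforce the role boundary: only tools listed on the role are allowed.
        return tool in self.tools


researcher = AgentRole(
    name="Technical Researcher",
    system_prompt="You're a meticulous researcher who cites sources.",
    tools=("web_search", "read_documentation"),
    responsibility="Produce structured research notes",
)
writer = AgentRole(
    name="Technical Writer",
    system_prompt="You explain complex topics in accessible language.",
    responsibility="Turn research notes into a draft",
)

if __name__ == "__main__":
    print(researcher.can_use("web_search"), writer.can_use("web_search"))
```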
Here's a concrete example. Let's say you're building a "research and write" system for technical blog posts.
The Research Agent
Role: Technical Researcher
Goal: Gather accurate, comprehensive information on the given topic
Backstory: You're a meticulous researcher who digs deep into technical topics.
You cite sources, verify claims, and organize findings clearly.
Tools: web_search, read_documentation, fetch_github_repos
This agent's entire job is research. It doesn't write prose or format content. It searches, reads, and compiles facts. Its output is structured research notes that another agent will use.
The Writer Agent
Role: Technical Writer
Goal: Transform research into engaging, clear technical content
Backstory: You're an experienced technical writer who explains complex topics
in accessible language. You use analogies, examples, and structure.
Tools: none (pure generation)
The writer takes the researcher's output and crafts it into readable content. It doesn't search the web or verify facts; that was done upstream. It focuses purely on writing quality.
The Editor Agent
Role: Technical Editor
Goal: Review content for accuracy, clarity, and consistency
Backstory: You're a detail-oriented editor who catches errors others miss.
You verify technical claims, improve sentence structure, and ensure
the content matches the target audience.
Tools: fact_check, grammar_check
The editor is the quality gate. It reviews the writer's output, flags issues, and either approves or requests revisions.
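The "approve or request revisions" step is a loop. Below is a minimal sketch, assuming a writer that accepts (notes, feedback) and an editor that returns (approved, feedback). Both signatures are hypothetical, and the toy functions stand in for LLM calls so the loop runs on its own. The round budget matters: without it, a picky editor and a stubborn writer can loop forever.

```python
from typing import Callable

# Hypothetical signatures: writer(notes, feedback) -> draft;
# editor(draft) -> (approved, feedback). Real agents would call an LLM.
Writer = Callable[[str, str], str]
Editor = Callable[[str], tuple[bool, str]]


def write_with_review(notes: str, write: Writer, review: Editor,
                      max_rounds: int = 3) -> tuple[str, bool]:
    """Loop writer -> editor until approved or the round budget runs out."""
    feedback = ""
    draft = ""
    for _ in range(max_rounds):
        draft = write(notes, feedback)
        approved, feedback = review(draft)
        if approved:
            return draft, True
    # Budget exhausted: return the last draft flagged as unapproved
    # so the caller can escalate instead of shipping it silently.
    return draft, False


# Toy stand-ins so the loop runs without an LLM.
def toy_writer(notes: str, feedback: str) -> str:
    return notes + (" (with examples)" if "examples" in feedback else "")


def toy_editor(draft: str) -> tuple[bool, str]:
    if "examples" in draft:
        return True, ""
    return False, "add examples"


if __name__ == "__main__":
    print(write_with_review("Multi-agent basics", toy_writer, toy_editor))
```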
Collaboration Patterns: Centralized vs. Decentralized
Once you have multiple agents, you need to decide how they collaborate. The two main agent collaboration patterns are centralized and decentralized.
Centralized: The Manager Pattern
In centralized orchestration, one agent (the "manager" or "coordinator") controls the workflow. It receives the initial task, breaks it into subtasks, assigns each to the appropriate specialist agent, collects results, and delivers the final output.
```
           ┌──────────────┐
           │   MANAGER    │
           │    AGENT     │
           └──────┬───────┘
                  │
     ┌────────────┼────────────┐
     ▼            ▼            ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ RESEARCH │ │  WRITER  │ │  EDITOR  │
│  AGENT   │ │  AGENT   │ │  AGENT   │
└──────────┘ └──────────┘ └──────────┘
```
Pros:
- Clear control flow
- Easy to debug (you can trace every decision through the manager)
- Single point of coordination
- Works well for defined workflows
Cons:
- Manager is a bottleneck
- If the manager fails, everything fails
- Doesn't scale well to large agent networks
Most teams should start here. It's simpler, and you can always evolve to decentralized later.
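Stripped to its core, the manager pattern is a loop over a plan. The sketch below hard-codes the plan, whereas in the pattern above a manager agent would produce it. The specialist functions are hypothetical lambdas standing in for LLM-backed agents. Every decision goes through `run_manager`, which is exactly why this pattern is easy to trace and also why it is a single point of failure.

```python
from typing import Callable

# Hypothetical specialist signature: (subtask, results_so_far) -> output.
# In a real system each would wrap an LLM call with its own prompt and tools.
Specialist = Callable[[str, dict], str]


def run_manager(plan: list[tuple[str, str]],
                specialists: dict[str, Specialist]) -> dict[str, str]:
    """Centralized orchestration: walk the plan, hand each subtask to the
    named specialist, and keep every result in one place."""
    results: dict[str, str] = {}
    for role, subtask in plan:
        if role not in specialists:
            # Single point of control: unknown roles fail loudly here.
            raise KeyError(f"No specialist registered for role {role!r}")
        # Each specialist sees a copy of the results gathered so far.
        results[role] = specialists[role](subtask, dict(results))
    return results


specialists = {
    "research": lambda task, ctx: f"notes for: {task}",
    "write": lambda task, ctx: f"draft using [{ctx['research']}]",
    "edit": lambda task, ctx: ctx["write"].upper(),
}
plan = [
    ("research", "multi-agent systems"),
    ("write", "blog post"),
    ("edit", "final pass"),
]

if __name__ == "__main__":
    print(run_manager(plan, specialists)["edit"])
```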
Decentralized: Agent-to-Agent
In decentralized patterns, agents communicate directly with each other based on protocols or discovery mechanisms. There's no central manager; agents negotiate, delegate, and collaborate autonomously.
```
┌──────────┐         ┌──────────┐
│ RESEARCH │◄───────►│  WRITER  │
│  AGENT   │         │  AGENT   │
└────┬─────┘         └────┬─────┘
     │                    │
     │    ┌──────────┐    │
     └───►│  EDITOR  │◄───┘
          │  AGENT   │
          └──────────┘
```
Pros:
- No single point of failure
- Scales to large agent networks
- Agents can dynamically discover collaborators
- Works for marketplace/negotiation scenarios
Cons:
- Harder to debug (who decided what?)
- Requires protocols and discovery mechanisms
- More failure modes
- Higher complexity
Decentralized patterns make sense when you have many agents, dynamic environments, or need agents from different vendors/platforms to collaborate. This is where protocols like A2A (Agent2Agent) come in.
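For contrast with the manager loop, here is a minimal sketch of direct hand-offs. Each agent does its work and then decides for itself which peer gets the output, and no coordinator sees the whole flow. `PeerAgent`, its `route` function, and the hop limit are hypothetical illustrations, not a protocol. The `trace` list is the kind of bookkeeping you need to answer "who decided what?" after the fact.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class PeerAgent:
    """An agent that does its own work and picks its own next hop (no manager)."""
    name: str
    work: Callable[[str], str]
    # Returns the name of the peer that should get the output, or None to stop.
    route: Callable[[str], Optional[str]]
    peers: dict[str, "PeerAgent"] = field(default_factory=dict, repr=False, compare=False)

    def handle(self, payload: str, trace: list[str], max_hops: int = 10) -> str:
        # Guard against agents bouncing work back and forth forever.
        if len(trace) >= max_hops:
            raise RuntimeError(f"Gave up after {max_hops} hops: {trace}")
        trace.append(self.name)  # record who touched the work, for debugging
        output = self.work(payload)
        next_peer = self.route(output)
        if next_peer is None:
            return output
        return self.peers[next_peer].handle(output, trace, max_hops)


researcher = PeerAgent("researcher", lambda p: f"{p} -> notes", lambda out: "writer")
writer = PeerAgent("writer", lambda p: f"{p} -> draft", lambda out: "editor")
# The editor sends the first draft back to the writer once, then approves.
editor = PeerAgent(
    "editor",
    lambda p: f"{p} -> approved" if p.count("draft") >= 2 else f"{p} -> revise",
    lambda out: None if out.endswith("approved") else "writer",
)

# Every agent can reach every other agent directly.
network = {agent.name: agent for agent in (researcher, writer, editor)}
for agent in network.values():
    agent.peers = network

if __name__ == "__main__":
    trace: list[str] = []
    print(researcher.handle("topic", trace))
    print(trace)
```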
Communication: How Agents Talk to Each Other
Agents need to exchange information. The mechanism you choose affects how easy the system is to debug, how reliably it delivers results, and whether agents from different vendors can work together.
Message Passing
The simplest approach: agents send messages to each other. The message includes:
- Who it's from
- Who it's for
- The content (task, results, questions)
- Any relevant context
```python
# Simplified message structure
message = {
    "from": "research_agent",
    "to": "writer_agent",
    "type": "task_result",
    "content": {
        "topic": "Multi-agent systems",
        "findings": [...],
        "sources": [...]
    }
}
```
This works for simple systems but gets messy as you scale. Who manages the message queue? How do you handle failed delivery? What if agents speak different "languages"?
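Those questions are easier to see with a tiny broker in place. The sketch below gives each agent an inbox (a standard-library `queue.Queue`) and records undeliverable messages in a dead-letter list instead of dropping them. `Message` and `MessageBus` are hypothetical names. The fields mirror the dict above, with `sender`/`recipient` standing in for `from`/`to` because `from` is a Python keyword.

```python
import queue
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Message:
    sender: str
    recipient: str
    type: str
    content: dict[str, Any] = field(default_factory=dict)


class MessageBus:
    """A tiny in-process broker: one inbox per agent, plus a dead-letter
    list for messages that could not be delivered."""

    def __init__(self) -> None:
        self.inboxes: dict[str, queue.Queue] = {}
        self.dead_letters: list[Message] = []

    def register(self, agent_name: str) -> None:
        self.inboxes[agent_name] = queue.Queue()

    def send(self, msg: Message) -> bool:
        inbox = self.inboxes.get(msg.recipient)
        if inbox is None:
            # Failed delivery is recorded, not silently dropped.
            self.dead_letters.append(msg)
            return False
        inbox.put(msg)
        return True

    def receive(self, agent_name: str, timeout: float = 1.0) -> Message:
        # Blocks until a message arrives; raises queue.Empty on timeout.
        return self.inboxes[agent_name].get(timeout=timeout)


if __name__ == "__main__":
    bus = MessageBus()
    bus.register("writer_agent")
    bus.send(Message("research_agent", "writer_agent", "task_result",
                     {"topic": "Multi-agent systems"}))
    print(bus.receive("writer_agent").content)
    bus.send(Message("research_agent", "coder_agent", "task_result"))
    print(len(bus.dead_letters), "undelivered")
```

In production this role usually goes to a real broker, but the responsibilities (routing, inboxes, failed delivery) stay the same.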
Shared Memory
Instead of passing messages, agents read from and write to a shared state (like a database or in-memory store). Each agent checks the shared memory for new tasks, updates it with results, and other agents see those updates.
```
┌──────────┐ ┌──────────┐ ┌──────────┐
│ RESEARCH │ │  WRITER  │ │  EDITOR  │
│  AGENT   │ │  AGENT   │ │  AGENT   │
└────┬─────┘ └────┬─────┘ └────┬─────┘
     │            │            │
     ▼            ▼            ▼
┌────────────────────────────────────┐
│       SHARED MEMORY / STATE        │
│  (Redis, Vector Store, Database)   │
└────────────────────────────────────┘
```
When to use shared memory:
- Agents need access to the same data
- You want to decouple producers from consumers
- State needs to persist across agent restarts
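A minimal in-process sketch of the shared-memory ("blackboard") idea follows. `Blackboard` is a hypothetical stand-in for Redis or a database and does not persist anything. It uses `threading.Condition` so a consumer can block until a key appears. The writer never learns who produced the research notes; it only waits for the `"research"` key, which is the producer/consumer decoupling described above.

```python
import threading
from typing import Any, Optional


class Blackboard:
    """In-memory stand-in for a shared store such as Redis or a database.
    Agents write results under keys; other agents read them when ready."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}
        self._cond = threading.Condition()

    def write(self, key: str, value: Any) -> None:
        with self._cond:
            self._data[key] = value
            self._cond.notify_all()  # wake any agent waiting for a key

    def wait_for(self, key: str, timeout: Optional[float] = None) -> Any:
        with self._cond:
            # Condition.wait_for re-checks the predicate every time it wakes.
            if not self._cond.wait_for(lambda: key in self._data, timeout=timeout):
                raise TimeoutError(f"{key!r} never appeared on the blackboard")
            return self._data[key]


def researcher(board: Blackboard) -> None:
    board.write("research", ["finding 1", "finding 2"])


def writer(board: Blackboard) -> None:
    notes = board.wait_for("research", timeout=2)
    board.write("draft", f"Draft based on {len(notes)} findings")


if __name__ == "__main__":
    board = Blackboard()
    # Start the consumer first: it blocks until the producer writes.
    t_writer = threading.Thread(target=writer, args=(board,))
    t_research = threading.Thread(target=researcher, args=(board,))
    t_writer.start()
    t_research.start()
    t_writer.join()
    t_research.join()
    print(board.wait_for("draft"))
```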
Communication Protocols: A2A (and where ACP went)
For agents to communicate across platforms or vendors, you need standardized protocols. The picture got a lot clearer in 2025-2026, and it's worth knowing how it shook out so you don't bet on a dead standard.
Agent2Agent (A2A): Originally Google's open protocol for agent interoperability, donated to the Linux Foundation in June 2025 under neutral governance. Agents publish "Agent Cards" describing what they can do, and other agents can discover and invoke them. A2A reached v1.0 in April 2026 with Signed Agent Cards (cryptographic identity verification so a receiving agent can confirm a card really came from its claimed owner), multi-tenancy, and JSON-RPC/gRPC bindings. By its one-year mark it had 150+ supporting organizations and production integrations across Azure AI Foundry, Amazon Bedrock AgentCore, and Google Cloud.
Agent Communication Protocol (ACP): IBM's REST-based protocol (launched March 2025 to power the BeeAI platform) for lightweight agent invocation. In August 2025, ACP merged into A2A under the Linux Foundation — the BeeAI platform now uses A2A. So if you saw "ACP vs A2A" debates from early 2025, that question has been answered: it's A2A.
These protocols matter for the future of multi-agent systems. Today, most teams still let frameworks handle communication internally. But as agent ecosystems grow and you need agents from different vendors to collaborate, A2A is becoming the interoperability layer worth learning.
One concrete sign of maturity: A2A now has an official payments extension, the Agent Payments Protocol (AP2), built with 60+ payments and tech companies (Mastercard, PayPal, American Express, Coinbase, and others) so agents can securely initiate and authorize transactions on a user's behalf. Agentic commerce is moving from demo to standard.
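To make Agent Cards concrete, here is a sketch of skill-based discovery over two card-shaped dictionaries. The fields used (`name`, `description`, `url`, `version`, `skills` with `id`/`name`/`description`/`tags`) are a subset of the A2A Agent Card as I understand it. Treat them as an assumption and check the current spec, especially after v1.0. The URLs are placeholders, and `discover` is a hypothetical helper, not part of any A2A SDK.

```python
from typing import Any

# Two cards shaped like A2A Agent Cards (a subset of fields; check the
# current A2A spec for the authoritative schema). URLs are placeholders.
CARDS: list[dict[str, Any]] = [
    {
        "name": "Research Agent",
        "description": "Searches the web and returns cited research notes.",
        "url": "https://research.example.com/a2a",
        "version": "1.0.0",
        "skills": [
            {"id": "web-research", "name": "Web research",
             "description": "Find and summarize sources on a topic.",
             "tags": ["research", "search"]},
        ],
    },
    {
        "name": "Editor Agent",
        "description": "Reviews drafts for accuracy and clarity.",
        "url": "https://editor.example.com/a2a",
        "version": "1.0.0",
        "skills": [
            {"id": "copy-edit", "name": "Copy edit",
             "description": "Flag errors and suggest fixes.",
             "tags": ["editing", "review"]},
        ],
    },
]


def discover(cards: list[dict[str, Any]], tag: str) -> list[str]:
    """Return the endpoint URLs of agents advertising a skill with `tag`.
    Hypothetical helper: a real client would fetch cards over the network
    and, with Signed Agent Cards, verify the signature before trusting one."""
    return [
        card["url"]
        for card in cards
        if any(tag in skill.get("tags", []) for skill in card.get("skills", []))
    ]


if __name__ == "__main__":
    print(discover(CARDS, "review"))
```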
A Simple Multi-Agent Example
Here's pseudocode for a two-agent "research and write" system with centralized orchestration:
```python
# Pseudocode for a simple multi-agent system
def run_multi_agent(task: str):
    # Step 1: Manager breaks down the task
    manager = create_agent(
        role="Manager",
        goal="Coordinate research and writing tasks"
    )
    subtasks = manager.plan(task)
    # subtasks = ["Research X", "Write article about X"]

    results = {}

    # Step 2: Research agent handles research
    researcher = create_agent(
        role="Researcher",
        tools=[web_search, read_docs]
    )
    results["research"] = researcher.execute(subtasks[0])

    # Step 3: Writer agent handles writing, using research results
    writer = create_agent(
        role="Writer",
        context=results["research"]
    )
    results["draft"] = writer.execute(subtasks[1])

    # Step 4: Manager reviews and returns
    final = manager.review(results["draft"])
    return final
```
This is about 20 lines of pseudocode, but it captures the core pattern:
- A manager agent plans the workflow
- Specialist agents execute their piece
- Results flow from one agent to the next
- The manager delivers the final output
Real implementations add error handling, retries, logging, and more sophisticated orchestration. But the foundation is this simple.
When Things Go Wrong: Handling Agent Failures
In single-agent systems, failure is straightforward: the agent errors, you handle it. In multi-agent systems, failures cascade. Agent A fails, so Agent B doesn't get input, so Agent C produces garbage.
This isn't hand-waving. It's measured. UC Berkeley's MAST study ("Why Do Multi-Agent LLM Systems Fail?") hand-annotated 150 conversation traces across seven popular open-source multi-agent frameworks and found 14 distinct failure modes that cluster into three buckets: system/specification design (~41%), inter-agent misalignment (~37%), and task verification (~21%). The headline takeaway: most multi-agent failures aren't reasoning failures — they're coordination and verification failures. Agents act on stale or divergent views of shared state, or nobody checks the final output. That's exactly where you should spend your engineering effort.
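"Agents act on stale or divergent views of shared state" has a well-known mitigation: optimistic concurrency control. Every read returns a version number, and a write succeeds only if the version hasn't moved since that read. The sketch below is my own illustration of that technique, not something the MAST study prescribes. `VersionedState` and `StaleWriteError` are hypothetical names.

```python
import threading
from typing import Any


class StaleWriteError(Exception):
    """Raised when an agent writes based on a version that is no longer current."""


class VersionedState:
    """Shared state with optimistic concurrency: every read returns a version,
    and a write only succeeds if the version hasn't changed since that read."""

    def __init__(self, value: Any) -> None:
        self._value = value
        self._version = 0
        self._lock = threading.Lock()

    def read(self) -> tuple[Any, int]:
        with self._lock:
            return self._value, self._version

    def write(self, new_value: Any, expected_version: int) -> int:
        with self._lock:
            if expected_version != self._version:
                raise StaleWriteError(
                    f"expected v{expected_version}, state is at v{self._version}"
                )
            self._value = new_value
            self._version += 1
            return self._version


if __name__ == "__main__":
    state = VersionedState({"outline": ["intro"]})
    _, v = state.read()  # writer and editor both read version 0
    state.write({"outline": ["intro", "patterns"]}, expected_version=v)  # editor commits first
    try:
        # The writer's update is based on a stale view, so it is rejected.
        state.write({"outline": ["intro", "faq"]}, expected_version=v)
    except StaleWriteError as err:
        print("rejected:", err)
```

A rejected agent re-reads the current state and redoes its step, so it never silently overwrites a teammate's work.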
Here's how I think about failure handling:
1. Fail Fast with Clear Errors
Each agent should validate its inputs and outputs. If an agent receives garbage, it should fail immediately with a clear error rather than produce garbage output.
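```python
def execute_with_validation(agent, task, input_data):
    # Validate input: refuse to run on empty or malformed upstream output
    if not input_data or not input_data.get("content"):
        raise ValueError(f"Agent {agent.role} received empty input")

    result = agent.execute(task, input_data)

    # Validate output: assumes the result is text; the 100-character floor
    # is a crude heuristic for "the agent produced something substantial"
    if not result or len(result) < 100:
        raise ValueError(f"Agent {agent.role} produced insufficient output")

    return result
```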
2. Retry with Backoff
Transient failures (rate limits, network blips) should trigger retries:
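```python
import time


class TransientError(Exception):
    """Placeholder for rate limits, timeouts, and other retryable errors."""


def execute_with_retry(agent, task, max_retries=3):
    for attempt in range(max_retries):
        try:
            return agent.execute(task)
        except TransientError:
            if attempt == max_retries - 1:
                raise  # Out of retries: surface the original error
            time.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s, ...
```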
3. Fallback Agents
For critical tasks, have a backup. If your primary research agent fails, maybe a simpler agent with different tools can provide basic results:
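```python
import logging

log = logging.getLogger(__name__)


def research_with_fallback(task):
    # AgentError and both agents are placeholders for your framework's types
    try:
        return primary_research_agent.execute(task)
    except AgentError:
        log.warning("Primary researcher failed, using fallback")
        return fallback_research_agent.execute(task)
```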
4. Human Escalation
Sometimes the right answer is "ask a human." Build escalation paths for high-stakes or ambiguous situations:
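```python
def execute_with_escalation(agent, task, confidence_threshold=0.7):
    result = agent.execute(task)
    # Assumes the agent reports a confidence score with its output;
    # request_human_review is a placeholder for your review queue or ticket hook
    if result.confidence < confidence_threshold:
        return request_human_review(task, result)
    return result
```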
Practical Starting Point: CrewAI
If you want to try multi-agent systems today, CrewAI is a good starting point. It provides clear abstractions for:
- Agents: Define role, goal, backstory, tools
- Tasks: Define what needs to be done, expected output
- Crews: Group agents and tasks into workflows
- Processes: Sequential or hierarchical execution
Here's what a minimal CrewAI setup looks like:
```python
from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool, ScrapeWebsiteTool

# SerperDevTool reads a SERPER_API_KEY environment variable;
# the agents also need credentials for the LLM they run on
search_tool = SerperDevTool()
scrape_tool = ScrapeWebsiteTool()

# Define agents
researcher = Agent(
    role="Senior Researcher",
    goal="Find accurate, comprehensive information",
    backstory="You're an expert researcher with attention to detail",
    tools=[search_tool, scrape_tool]
)
writer = Agent(
    role="Technical Writer",
    goal="Create clear, engaging technical content",
    backstory="You explain complex topics in simple terms"
)

# Define tasks
research_task = Task(
    description="Research multi-agent systems and their applications",
    expected_output="Structured research notes with sources",
    agent=researcher
)
writing_task = Task(
    description="Write a blog post based on the research",
    expected_output="1500-word blog post",
    agent=writer
)

# Create and run the crew; sequential runs tasks in order,
# so the research output becomes context for the writing task
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential
)
result = crew.kickoff()
print(result)
```
This is real code (not pseudocode), though it needs `crewai` and `crewai-tools` installed plus API keys for search and the LLM. CrewAI handles the orchestration, message passing, and execution order. You focus on defining agents and tasks.
What's Next
You've got the foundations: why single agents hit walls, how to carve up roles, and when centralized beats decentralized. Later posts in this module go deeper.
For now, pick a task that naturally splits into two phases (research + writing is a good one), build one agent for each, and run it. You'll hit a failure mode or an unexpected behavior within the first few runs. That's the real learning.
Key Concepts Recap
- Multi-Agent Systems
- Agent Orchestration
- Collaboration Patterns
- AI Agents
- Agentic AI
- Agentic Systems
- A2A Protocol
- Function Calling
- Guardrails
FAQs
What is a multi-agent system in AI?
A multi-agent system is a setup where several specialized AI agents collaborate to solve problems that would overwhelm a single agent. Each agent has a specific role, set of tools, and responsibility, like a researcher, a writer, and an editor working as a team instead of one person trying to do all three jobs.
When should I use a multi-agent system instead of a single agent?
Reach for multi-agent when you hit real walls with a single agent: context window limits on long workflows, the need for true specialization that one system prompt can't deliver, wanting a second agent to review for errors, or genuine parallelization. If a well-crafted prompt chain already solves your problem, stop there. Multi-agent adds real complexity.
What is the difference between centralized and decentralized agent orchestration?
Centralized orchestration uses a manager agent to assign tasks and collect results, which gives you clear control flow, easy debugging, and a good fit for defined workflows. Decentralized lets agents communicate directly without a central coordinator, which scales better in dynamic environments but is harder to debug and more complex to build. Start centralized.
How many agents should I use in a multi-agent system?
Start with the minimum: usually 2-3 agents with clearly distinct roles. Every additional agent adds coordination overhead, more API calls, and more potential failure points. Add agents only when you've identified a clear capability gap. In practice, 3-5 agents handle most use cases. If you need more, you might be over-engineering or could restructure into sub-crews.
Can different agents use different LLM providers?
Yes, and sometimes you should. A research agent might benefit from a model with strong web browsing capabilities, while a coding agent works better with a model optimized for code generation. Mix and match based on each agent's needs. Just watch out for increased complexity in error handling, cost tracking, and latency management.
What is the difference between multi-agent systems and prompt chaining?
Prompt chaining is sequential: output from prompt A becomes input for prompt B. It is linear and deterministic. Multi-agent systems add autonomy: agents can decide what to do, use tools, and communicate in non-linear ways. Prompt chains are simpler and sufficient for many use cases. Multi-agent systems handle more complex, dynamic workflows.
Continue Learning
Enjoyed this article? Put your knowledge to the test:
- Take the interactive quiz on BlockSimplified to see how much you retained
- Explore 14 linked Learning Blocks, curated resources, and FAQs for deeper understanding
- Follow for more insights on AI, development, and tech