## Introduction
Agent orchestration is the operational layer that sits between your LLM calls and production reality. A single agent with tool calling works fine for demos. But production systems need multi-agent coordination, task routing, error recovery, and observability. This guide covers the patterns that separate proof-of-concept agent systems from production-grade deployments.
## The Three Orchestration Primitives
Agent orchestration boils down to three coordination patterns, each with different trade-offs:
### 1. Sequential Chain (Linear Pipeline)
Each agent executes in order, passing output to the next. Simple to reason about, easy to debug, but inflexible.
```plaintext
Research Agent → Analysis Agent → Report Writer → Formatter
```
**Use when:** Task has clear sequential dependencies (can't analyze before researching)
**Avoid when:** Steps could run in parallel, or flow needs branching logic
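A sequential chain can be sketched in a few lines. The stage functions below are illustrative stand-ins for real agent calls, not a specific framework's API:

```python
# Minimal sequential chain: each stage receives the previous stage's output.
def run_chain(stages, task):
    """Run `task` through each stage in order, passing output forward."""
    result = task
    for stage in stages:
        result = stage(result)
    return result

# Toy stages standing in for Research → Analysis → Report.
def research(task):
    return {**task, "notes": f"notes on {task['topic']}"}

def analyze(task):
    return {**task, "insight_count": len(task["notes"].split())}

def report(task):
    return {**task, "report": f"{task['insight_count']} findings"}

output = run_chain([research, analyze, report], {"topic": "LLM scaling"})
```

The linearity is the point: every intermediate result is inspectable, which is why chains are the easiest pattern to debug.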
### 2. Router (Conditional Branching)
A coordinator agent examines the input and routes to specialized sub-agents based on task type.
```plaintext
Router Agent
├─ Code Review Agent (if PR detected)
├─ Bug Triage Agent (if issue detected)
└─ Documentation Agent (if docs change)
```
**Use when:** Multiple specialized agents handle different task types
**Avoid when:** All tasks require the same pipeline (unnecessary overhead)
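At its core a router is a dispatch table keyed by task type. In a real system the classification step would itself be an LLM call; the keyword heuristic and handler names below are illustrative:

```python
# Router sketch: classify the task, then dispatch to a specialized handler.
HANDLERS = {
    "pr": lambda task: f"code review: {task}",
    "issue": lambda task: f"bug triage: {task}",
    "docs": lambda task: f"doc update: {task}",
}

def classify(task: str) -> str:
    """Toy classifier; a production router would ask an LLM for the label."""
    lowered = task.lower()
    if "pull request" in lowered:
        return "pr"
    if "bug" in lowered or "error" in lowered:
        return "issue"
    return "docs"

def route(task: str) -> str:
    return HANDLERS[classify(task)](task)
```

Keeping the dispatch table separate from the classifier makes it cheap to add a new specialist: register a handler and teach the classifier one more label.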
### 3. Supervisor (Hierarchical Delegation)
A supervisor agent breaks down complex tasks and delegates to worker agents, aggregating results.
```plaintext
Supervisor
├─ Worker 1: Analyze codebase
├─ Worker 2: Run tests
├─ Worker 3: Check dependencies
└─ Supervisor: Aggregate and decide
```
**Use when:** Task needs decomposition, parallel execution, and synthesis
**Avoid when:** Task is atomic (supervisor adds latency for no benefit)
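The fan-out/fan-in shape maps directly onto `asyncio.gather`. The worker coroutines here are placeholders for real agent calls, and the aggregation rule is an assumption for illustration:

```python
import asyncio

# Supervisor sketch: run independent workers concurrently, then aggregate.
async def analyze_codebase():
    return {"issues": 3}

async def run_tests():
    return {"failures": 0}

async def check_dependencies():
    return {"outdated": 2}

async def supervise():
    # Fan out: the three workers have no mutual dependencies.
    results = await asyncio.gather(
        analyze_codebase(), run_tests(), check_dependencies()
    )
    # Fan in: merge worker outputs; a real supervisor would feed this
    # back to an LLM to decide the next step.
    merged = {}
    for partial in results:
        merged.update(partial)
    merged["ship"] = merged["failures"] == 0
    return merged

summary = asyncio.run(supervise())
```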
## Tool Calling Architecture
Tool calling is where agents interact with external systems. The naive approach (agents call tools directly) breaks in production. Instead, use a centralized tool registry:
```python
import time

class ToolNotFoundError(Exception):
    """Raised when an agent requests a tool that was never registered."""

class ToolRegistry:
    def __init__(self):
        self.tools = {}
        self.call_log = []

    def register(self, name, func, schema):
        self.tools[name] = {
            'function': func,
            'schema': schema,
            'calls': 0,
            'errors': 0
        }

    async def execute(self, tool_name, args):
        if tool_name not in self.tools:
            raise ToolNotFoundError(f"{tool_name} not registered")
        tool = self.tools[tool_name]
        tool['calls'] += 1  # count every invocation, success or failure
        try:
            result = await tool['function'](**args)
            self.call_log.append({
                'tool': tool_name,
                'args': args,
                'success': True,
                'timestamp': time.time()
            })
            return result
        except Exception as e:
            tool['errors'] += 1
            self.call_log.append({
                'tool': tool_name,
                'args': args,
                'success': False,
                'error': str(e),
                'timestamp': time.time()
            })
            raise
```
This gives you:
- **Centralized observability** — all tool calls logged in one place
- **Error tracking** — which tools are failing, with what args
- **Rate limiting** — apply per-tool quotas
- **Schema validation** — catch malformed tool calls before execution
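The fourth benefit, schema validation, can be sketched with a tiny pre-execution check. A production registry would use full JSON Schema; the minimal checker and the `SEARCH_SCHEMA` example below are illustrative stand-ins for the idea:

```python
# Validate tool args against a declared schema *before* executing the tool,
# so malformed LLM tool calls fail fast with a clear error.
def validate_args(schema: dict, args: dict):
    """Raise ValueError if required args are missing or mistyped."""
    for name, expected_type in schema.get("required", {}).items():
        if name not in args:
            raise ValueError(f"missing required arg: {name}")
        if not isinstance(args[name], expected_type):
            raise ValueError(f"{name} should be {expected_type.__name__}")

# Hypothetical schema for a search tool.
SEARCH_SCHEMA = {"required": {"query": str}}

validate_args(SEARCH_SCHEMA, {"query": "auth flows"})  # passes silently
```

Hooking a check like this into `execute()` turns "the model passed garbage" from a cryptic downstream crash into a structured error the agent can retry against.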
## Error Recovery Strategies
Agents fail. LLMs hallucinate tool names, pass malformed JSON, or return nonsense. Your orchestration layer must handle this:
### Retry with Context
Don't just retry blindly. Append the error to the context and let the agent course-correct:
```python
messages.append({
    "role": "user",
    "content": f"Tool call failed: {error}. Please try again with correct parameters."
})
```
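Wrapped in a loop with a retry budget, the idea looks like this. `call_model` and `run_tool` are illustrative stand-ins for a real LLM call and tool executor:

```python
# Retry-with-context sketch: on failure, feed the error back into the
# conversation so the model can correct itself, up to a retry budget.
def run_with_retries(messages, call_model, run_tool, max_retries=3):
    for _ in range(max_retries):
        tool_call = call_model(messages)
        try:
            return run_tool(tool_call)
        except Exception as e:
            messages.append({
                "role": "user",
                "content": f"Tool call failed: {e}. "
                           "Please try again with correct parameters.",
            })
    raise RuntimeError(f"tool still failing after {max_retries} attempts")
```

The budget matters: without it, a model that keeps hallucinating the same bad parameters will loop forever and burn tokens.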
### Fallback Chains
Define fallback tools for critical operations. If primary tool fails, try secondary:
```python
FALLBACK_CHAINS = {
    'search': ['tavily_search', 'duckduckgo_search', 'bing_search'],
    'code_exec': ['e2b_sandbox', 'local_docker', 'read_only_eval']
}
```
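An executor that walks such a chain is short. The simulated tool implementations below are placeholders (the real ones would be registered API clients); only the chain-walking logic is the point:

```python
# Fallback sketch: try each tool in declared order until one succeeds,
# collecting errors so the final failure message shows the whole trail.
FALLBACK_CHAINS = {
    "search": ["tavily_search", "duckduckgo_search", "bing_search"],
}

def execute_with_fallback(registry, capability, args):
    errors = []
    for tool_name in FALLBACK_CHAINS[capability]:
        try:
            return registry[tool_name](**args)
        except Exception as e:
            errors.append(f"{tool_name}: {e}")
    raise RuntimeError("all fallbacks failed: " + "; ".join(errors))

# Simulated registry: primary is down, secondary works.
def tavily_search(query):
    raise ConnectionError("service unavailable")

def duckduckgo_search(query):
    return {"results": [query]}

registry = {
    "tavily_search": tavily_search,
    "duckduckgo_search": duckduckgo_search,
    "bing_search": lambda query: {"results": []},
}

result = execute_with_fallback(registry, "search", {"query": "agents"})
```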
### Circuit Breaker
If a tool fails repeatedly, stop calling it and notify ops:
```python
if tool.error_rate() > 0.5 and tool.calls > 10:
    tool.circuit_open = True
    alert_ops(f"{tool.name} circuit breaker opened")
```
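A fuller breaker also needs a cooldown so the tool gets probed again later instead of staying dead forever. A minimal sketch, with illustrative thresholds:

```python
import time

# Circuit-breaker sketch: after too many consecutive failures, short-circuit
# calls for a cooldown window instead of hammering a broken tool.
class CircuitBreaker:
    def __init__(self, max_errors=5, cooldown_s=60.0):
        self.max_errors = max_errors
        self.cooldown_s = cooldown_s
        self.errors = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            # Cooldown elapsed: half-open, let one probe call through.
            self.opened_at = None
            self.errors = 0
            return True
        return False

    def record(self, success: bool):
        if success:
            self.errors = 0  # any success resets the streak
        else:
            self.errors += 1
            if self.errors >= self.max_errors:
                self.opened_at = time.monotonic()
```

Check `allow()` before every tool call and `record()` the outcome after; the ops alert from the snippet above fires at the moment `opened_at` is set.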
## State Management
Multi-agent workflows need shared state. Three patterns:
### 1. Message-Passing (Stateless)
Agents communicate only via messages. No shared state.
**Pros:** Simple, no race conditions
**Cons:** Context duplication, token waste
### 2. Shared Memory (Stateful)
Agents read/write to a shared key-value store.
**Pros:** Efficient, no duplication
**Cons:** Race conditions, requires locking
### 3. Hybrid (Event Sourcing)
Agents emit events to a log. Derived state is computed from event replay.
**Pros:** Auditable, time-travel debugging
**Cons:** Complex, higher latency
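Pattern 3 can be sketched with an append-only log and a replay function. The last-write-wins reduction and the agent names are illustrative choices, not the only way to derive state:

```python
from dataclasses import dataclass, field

# Event-sourcing sketch: agents append events; shared state is derived by
# replaying the log, which makes every state transition auditable.
@dataclass
class EventLog:
    events: list = field(default_factory=list)

    def emit(self, agent: str, key: str, value):
        self.events.append({"agent": agent, "key": key, "value": value})

    def replay(self) -> dict:
        """Derive current state: last write per key wins."""
        state = {}
        for event in self.events:
            state[event["key"]] = event["value"]
        return state

log = EventLog()
log.emit("researcher", "sources", ["a.pdf", "b.pdf"])
log.emit("analyst", "summary", "2 sources reviewed")
log.emit("researcher", "sources", ["a.pdf", "b.pdf", "c.pdf"])
state = log.replay()
```

Time-travel debugging falls out for free: replay a prefix of `log.events` to see the state any agent observed at that point.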
## Observability in Practice
Production agent systems need tracing at three levels:
### 1. LLM Calls (Token Level)
Log every LLM request/response with tokens, latency, cost:
```json
{
  "model": "claude-3-5-sonnet-20241022",
  "prompt_tokens": 1523,
  "completion_tokens": 412,
  "latency_ms": 3421,
  "cost_usd": 0.0234
}
```
### 2. Tool Calls (Action Level)
Log tool invocations with args and results:
```json
{
  "tool": "search_codebase",
  "args": {"query": "authentication", "file_pattern": "*.py"},
  "result": {"matches": 17, "files": ["auth.py", "login.py"]},
  "latency_ms": 234
}
```
### 3. Workflow Runs (Job Level)
Track end-to-end workflow execution:
```json
{
  "workflow_id": "run_abc123",
  "agents": ["router", "code_reviewer", "test_runner"],
  "total_tokens": 8934,
  "total_cost": 0.12,
  "duration_ms": 45000,
  "success": true
}
```
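One cheap way to capture the tool-call level is a decorator that records latency and outcome into a shared trace. This is a sketch, not a specific tracing library's API; the `TRACE` list stands in for a real exporter:

```python
import functools
import time

# Tracing sketch: wrap any agent or tool function and record each call.
TRACE = []

def traced(kind):
    def wrap(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            ok = False
            try:
                result = func(*args, **kwargs)
                ok = True
                return result
            finally:
                TRACE.append({
                    "kind": kind,
                    "name": func.__name__,
                    "latency_ms": (time.perf_counter() - start) * 1000,
                    "success": ok,
                })
        return inner
    return wrap

# Hypothetical tool standing in for a real codebase search.
@traced("tool")
def search_codebase(query):
    return {"matches": 17}

search_codebase("authentication")
```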
For RAG-based agent workflows, structured debugging tools like [RAG Debugger](https://rag-debugger.pages.dev) help trace retrieval issues by showing which chunks were retrieved, their similarity scores, and how the LLM used them in responses.
## Scaling Patterns
When agent workflows exceed single-machine capacity:
### Horizontal Agent Pools
Run N instances of the same agent type, load-balance across them:
```python
import random

worker_pool = [CodeReviewAgent() for _ in range(10)]
task_queue.submit(random.choice(worker_pool), task)
```
### Async Execution with Queues
Don't block on long-running agents. Use task queues:
```python
await queue.enqueue('research_agent', {'topic': 'LLM scaling'})
# Later...
result = await queue.get_result(task_id)
```
### Streaming Results
For long workflows, stream intermediate results to the user:
```python
async for event in orchestrator.run_streaming(task):
    if event.type == 'agent_started':
        print(f"Starting {event.agent_name}...")
    elif event.type == 'tool_called':
        print(f"Called {event.tool_name}")
    elif event.type == 'agent_completed':
        print(f"Finished {event.agent_name}")
```
## Common Anti-Patterns
What to avoid:
### 1. Over-Orchestration
Adding supervisor agents when a simple chain would work. More agents = more latency, more cost, more failure modes.
### 2. God Agents
One agent with 50 tools instead of specialized agents. Makes prompts bloated, increases hallucination risk.
### 3. Synchronous Waterfalls
Agent A waits for B, which waits for C, which waits for D. Total latency = sum of all agents. Parallelize when possible.
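The fix, when the agents are genuinely independent, is a concurrent fan-out. The toy agents below simulate work with a sleep to make the latency difference concrete:

```python
import asyncio

# Waterfall vs. parallel: with independent agents, gather() makes total
# latency max(agent latencies) instead of their sum.
async def agent(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for a real LLM/tool round-trip
    return name

async def parallel():
    return list(await asyncio.gather(
        agent("B", 0.01), agent("C", 0.01), agent("D", 0.01)
    ))

results = asyncio.run(parallel())
```

`gather` preserves argument order in its results, so downstream aggregation code does not need to re-sort.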
### 4. No Error Boundaries
One failed tool call crashes the entire workflow. Isolate failures per-agent.
## Framework Comparison
Popular orchestration frameworks and their trade-offs:
[table]
## Conclusion
Agent orchestration is less about the LLM and more about operational patterns: routing, error recovery, state management, observability. Start with simple chains, add complexity only when needed. Instrument everything. Use centralized tool registries. And never trust an agent to do the right thing on the first try.
### Build Better AI Tools
DevKits provides developer tools for JSON formatting, Base64 encoding, regex testing, and more — all free and privacy-first.
<a href="/">Try DevKits →</a>
---
*Originally published at [aiforeverthing.com](https://aiforeverthing.com/blog/agent-orchestration-patterns)*