One agent handles simple tasks. Complex workflows need a team. Bedrock's multi-agent collaboration lets a supervisor agent break down problems, delegate to specialists, and combine results. Here's how to build it with Terraform.
In the previous posts, we deployed a single Bedrock agent with action groups. That works for focused tasks. But real workflows are multi-domain: a customer support request might need to check an order, look up a policy, and escalate to a specialist. A single agent with 15 action groups performs poorly because the model struggles to select the right tool from too many options.
Multi-agent collaboration solves this. You create specialized agents, each focused on one domain, and a supervisor agent that routes requests, delegates tasks, and combines results. The supervisor reads the user's intent and decides which specialist to call. Each specialist stays focused and performs better than one overloaded generalist. Multi-agent collaboration is now generally available on Amazon Bedrock. 🎯
🏗️ How Multi-Agent Collaboration Works
```
User: "Cancel my order #1234 and refund to my original payment method"
          │
          ▼
Supervisor Agent (analyzes intent, plans execution)
          │ delegates to
┌───────────────────────┬─────────────────────┐
│ Order Agent           │ Payments Agent      │
│ (cancels order #1234) │ (processes refund)  │
└───────────────────────┴─────────────────────┘
          │ results flow back to
          ▼
Supervisor Agent (combines results)
          │
          ▼
Response: "Order #1234 has been cancelled. A refund of $89.99
           will be processed to your Visa ending in 4242
           within 3-5 business days."
```
The supervisor doesn't do the work itself. It plans, delegates, and synthesizes.
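The plan/delegate/synthesize loop above can be sketched as a toy Python routine. This is purely illustrative: the specialist functions, canned replies, and keyword routing are invented for the example — Bedrock's supervisor does this with an LLM, not keyword matching.

```python
# Toy sketch of the supervisor pattern: plan, delegate, synthesize.
# All names and replies here are invented for illustration.

def order_agent(request: str) -> str:
    # Stand-in for a specialist with order-domain action groups
    return "Order #1234 has been cancelled."

def payments_agent(request: str) -> str:
    # Stand-in for a specialist with payments-domain action groups
    return "A refund of $89.99 will be processed to your original payment method."

SPECIALISTS = {
    "order": order_agent,      # cancellations, shipping, status
    "refund": payments_agent,  # payments, refunds, billing
}

def supervisor(request: str) -> str:
    # Plan: decide which specialists the request needs
    tasks = [fn for keyword, fn in SPECIALISTS.items() if keyword in request.lower()]
    # Delegate: each specialist handles its part
    results = [fn(request) for fn in tasks]
    # Synthesize: combine the partial results into one answer
    return " ".join(results)

print(supervisor("Cancel my order #1234 and refund to my original payment method"))
```

The real system replaces the keyword check with LLM intent analysis and the dictionary with the collaborator associations you define in Terraform below.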
🧠 Two Collaboration Modes
| Mode | How It Works | Best For |
|---|---|---|
| Supervisor | Analyzes every request, breaks down complex tasks, coordinates multiple agents | Complex multi-step workflows |
| Supervisor with Routing | Routes simple requests directly to a specialist; falls back to full supervisor mode for complex queries | Mixed simple/complex traffic |
Supervisor with routing (set `agent_collaboration = "SUPERVISOR_ROUTER"`) is more cost-effective for production traffic where most requests are straightforward: the router sends simple queries straight to a specialist, skipping full orchestration and reducing latency and token usage.
🔧 Terraform: Build the Team
Step 1: Create Specialist (Collaborator) Agents
Each collaborator is a standalone agent with its own model, instructions, and tools:
```hcl
# agents/collaborators.tf

variable "collaborators" {
  description = "Map of specialist agents"
  type = map(object({
    instruction               = string
    collaboration_instruction = string
    model_id                  = string
  }))
}

resource "aws_bedrockagent_agent" "collaborator" {
  for_each = var.collaborators

  agent_name                  = "${var.environment}-${each.key}"
  agent_resource_role_arn     = aws_iam_role.agent.arn
  foundation_model            = each.value.model_id
  instruction                 = each.value.instruction
  prepare_agent               = true
  idle_session_ttl_in_seconds = var.idle_session_ttl
}

resource "aws_bedrockagent_agent_alias" "collaborator" {
  for_each = var.collaborators

  agent_alias_name = "${var.environment}-live"
  agent_id         = aws_bedrockagent_agent.collaborator[each.key].agent_id
}
```

Collaborators don't need a special `agent_collaboration` value — the API accepts `SUPERVISOR`, `SUPERVISOR_ROUTER`, and the default `DISABLED`, and an agent becomes a collaborator simply by being associated with a supervisor in Step 3. Each collaborator can have its own action groups, knowledge bases, and guardrails.
Step 2: Create the Supervisor Agent
```hcl
# agents/supervisor.tf

resource "aws_bedrockagent_agent" "supervisor" {
  agent_name                  = "${var.environment}-supervisor"
  agent_resource_role_arn     = aws_iam_role.supervisor.arn
  foundation_model            = var.supervisor_model.id
  instruction                 = var.supervisor_instruction
  agent_collaboration         = "SUPERVISOR"
  prepare_agent               = false # Prepare AFTER collaborators are associated
  idle_session_ttl_in_seconds = var.idle_session_ttl
}
```
prepare_agent = false is critical for supervisors. A supervisor agent cannot be prepared until collaborators are associated. If you set this to true, Terraform fails because the supervisor has no team members yet.
Step 3: Associate Collaborators with Supervisor
```hcl
# agents/associations.tf

resource "aws_bedrockagent_agent_collaborator" "this" {
  for_each = var.collaborators

  agent_id                   = aws_bedrockagent_agent.supervisor.agent_id
  collaboration_instruction  = each.value.collaboration_instruction
  collaborator_name          = each.key
  relay_conversation_history = "TO_COLLABORATOR"

  agent_descriptor {
    alias_arn = aws_bedrockagent_agent_alias.collaborator[each.key].agent_alias_arn
  }

  depends_on = [aws_bedrockagent_agent.supervisor]
}
```
`collaboration_instruction` tells the supervisor when to use each collaborator. This is the routing logic. Be specific: "Handle all questions about order status, cancellations, and shipping. Do not handle payment or refund questions."

`relay_conversation_history = "TO_COLLABORATOR"` shares the conversation history so collaborators have full context when handling their part.
Step 4: Prepare the Supervisor
After all collaborators are associated, prepare the supervisor:
```hcl
# agents/prepare.tf

resource "null_resource" "prepare_supervisor" {
  triggers = {
    collaborators = jsonencode(keys(var.collaborators))
    supervisor    = aws_bedrockagent_agent.supervisor.agent_id
  }

  provisioner "local-exec" {
    command = <<EOF
aws bedrock-agent prepare-agent \
  --agent-id ${aws_bedrockagent_agent.supervisor.agent_id} \
  --region ${var.region}
EOF
  }

  depends_on = [aws_bedrockagent_agent_collaborator.this]
}

resource "time_sleep" "wait_for_preparation" {
  depends_on      = [null_resource.prepare_supervisor]
  create_duration = "15s"
}

resource "aws_bedrockagent_agent_alias" "supervisor" {
  agent_alias_name = "${var.environment}-live"
  agent_id         = aws_bedrockagent_agent.supervisor.agent_id

  depends_on = [time_sleep.wait_for_preparation]
}
```
The time_sleep gives the supervisor time to reach the PREPARED state before creating the alias. In production, consider polling the agent status instead of a fixed delay.
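A minimal polling sketch in Python: `get_agent` and the `agent.agentStatus` field are the real `bedrock-agent` API shapes, but the exact set of transient statuses below is an assumption to verify against the docs. The client is passed in so the function works with any `boto3.client("bedrock-agent")`.

```python
import time

def wait_for_prepared(client, agent_id: str, timeout: int = 120, interval: int = 5) -> str:
    """Poll until the agent leaves transient states; return the final status.

    `client` is expected to behave like boto3.client("bedrock-agent"),
    whose get_agent response contains agent.agentStatus.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = client.get_agent(agentId=agent_id)["agent"]["agentStatus"]
        # Assumed transient statuses -- check the API reference for your region
        if status not in ("CREATING", "PREPARING", "UPDATING"):
            return status  # e.g. PREPARED, NOT_PREPARED, or FAILED
        time.sleep(interval)
    raise TimeoutError(f"Agent {agent_id} still preparing after {timeout}s")
```

You could call this from a script invoked by `local-exec` in place of the `time_sleep` resource, or run it in CI before creating the alias.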
📝 Environment Configuration
```hcl
# environments/dev.tfvars

supervisor_model = {
  id      = "anthropic.claude-sonnet-4-20250514-v1:0"
  display = "Claude Sonnet 4"
}

supervisor_instruction = <<-EOT
  You are a customer support supervisor. Analyze each request
  and delegate to the appropriate specialist. For complex requests
  that span multiple domains, coordinate between specialists and
  combine their results into a clear response.
EOT

collaborators = {
  order-agent = {
    model_id                  = "anthropic.claude-sonnet-4-20250514-v1:0"
    instruction               = "You handle order lookups, cancellations, and shipping status. You have access to the orders database via action groups."
    collaboration_instruction = "Handle all questions about order status, cancellations, returns, and shipping. Do not handle payment or billing questions."
  }
  payments-agent = {
    model_id                  = "anthropic.claude-3-5-haiku-20241022-v1:0"
    instruction               = "You handle payment processing, refunds, and billing inquiries. You have access to the payments API via action groups."
    collaboration_instruction = "Handle all questions about payments, refunds, billing, and payment methods. Do not handle order status questions."
  }
  policy-agent = {
    model_id                  = "anthropic.claude-3-5-haiku-20241022-v1:0"
    instruction               = "You answer questions about company policies, warranty terms, and return policies based on the knowledge base."
    collaboration_instruction = "Handle all questions about company policies, warranty, and return rules. Refer to the knowledge base for accurate answers."
  }
}
```
Model selection per agent. Use a powerful model (Sonnet) for the supervisor that needs to plan and coordinate. Use a faster, cheaper model (Haiku) for simple specialist agents that execute focused tasks. This optimizes cost without sacrificing quality.
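A back-of-the-envelope sketch of why this matters at scale. The per-token prices below are hypothetical placeholders, not published Bedrock pricing — plug in the current prices for your region and models:

```python
# Hypothetical per-1K-input-token prices -- NOT real Bedrock pricing.
# Substitute the published prices for your region/models.
PRICE_PER_1K_INPUT = {"sonnet": 0.003, "haiku": 0.0008}

def monthly_input_cost(requests: int, tokens_per_request: int, model: str) -> float:
    """Input-token cost for a month of traffic, in dollars."""
    total_tokens = requests * tokens_per_request
    return total_tokens / 1000 * PRICE_PER_1K_INPUT[model]

# 1M specialist invocations/month at ~2K input tokens each
sonnet = monthly_input_cost(1_000_000, 2000, "sonnet")
haiku = monthly_input_cost(1_000_000, 2000, "haiku")
print(f"Sonnet: ${sonnet:,.0f}/mo  Haiku: ${haiku:,.0f}/mo")
```

Even with made-up numbers, the shape of the result is the point: specialist traffic dominates volume, so routing it to a cheaper model is where the savings are.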
🧪 Invoke the Multi-Agent System
From your application, you invoke the supervisor. It handles delegation internally:
```python
import boto3

client = boto3.client("bedrock-agent-runtime")

# Invoke the supervisor; it delegates to specialists internally
response = client.invoke_agent(
    agentId=supervisor_agent_id,
    agentAliasId=supervisor_alias_id,
    sessionId="user-session-456",
    inputText="Cancel my order #1234 and refund to my original payment method",
)

# The response is an event stream; print text chunks as they arrive
for event in response["completion"]:
    if "chunk" in event:
        print(event["chunk"]["bytes"].decode(), end="")
```
Your application calls one endpoint. The supervisor decomposes the request, delegates to the order agent and payments agent, gathers results, and returns a unified response. Zero orchestration code in your application.
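To watch the delegation happen, pass `enableTrace=True` to `invoke_agent` and inspect trace events alongside the text chunks. A sketch — the `chunk` and `trace` event keys are the documented stream shapes, but the fields inside each trace vary by orchestration step, so log the raw payload while exploring:

```python
def consume_completion(events):
    """Split an invoke_agent event stream into answer text and raw trace parts.

    Each event is a dict with either a "chunk" (response bytes) or a
    "trace" (orchestration detail, present when enableTrace=True).
    """
    text_parts, traces = [], []
    for event in events:
        if "chunk" in event:
            text_parts.append(event["chunk"]["bytes"].decode())
        elif "trace" in event:
            traces.append(event["trace"])  # inspect to see which collaborator ran
    return "".join(text_parts), traces

# Usage against a real stream (requires AWS credentials):
# response = client.invoke_agent(..., enableTrace=True)
# answer, traces = consume_completion(response["completion"])
```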
⚠️ Gotchas and Tips
Prompt quality is everything. The most common failure mode is the supervisor trying to handle everything itself instead of delegating. Write clear supervisor instructions that explicitly say "delegate to specialists" and clear collaboration instructions that define non-overlapping responsibilities.
Sequential collaborator association. When adding multiple collaborators, each association triggers a preparation step. Adding them too fast can fail because the agent is in a PREPARING state. The depends_on chain in the Terraform handles this, but be aware of it for manual operations.
Mixed-model teams. Use expensive models for complex reasoning (supervisor, planning agents) and cheaper models for straightforward tasks (lookup agents, policy agents). The cost difference is significant at scale.
Conversation history sharing. Enable relay_conversation_history for collaborators that need context from previous turns. Disable it for stateless lookup agents where the context adds unnecessary tokens.
Adding new specialists. To add a new collaborator, add an entry to the collaborators variable and run terraform apply. The supervisor is re-prepared automatically and can immediately delegate to the new agent.
➡️ What's Next
This is Post 3 of the AWS AI Agents with Terraform series.
- Post 1: Deploy First Bedrock Agent 🤖
- Post 2: Action Groups - Connect to APIs 🔌
- Post 3: Multi-Agent Orchestration (you are here) 🧠
- Post 4: Agent + Knowledge Base Grounding
Your single agent is now a team. A supervisor that plans, specialists that execute, and Terraform that makes the whole team a variable change away from a new specialist. Scale your AI by scaling your team. 🧠
Found this helpful? Follow for the full AI Agents with Terraform series! 💬