Gunnar Grosch for AWS

DEV Track Spotlight: Building Scalable, Self-Orchestrating AI Workflows with A2A and MCP (DEV415)

Together with Allen Helton (AWS Serverless Hero and Ecosystem Engineer at Momento), I had the privilege of demonstrating how to build autonomous multi-agent systems that actually work in production.

The session tackled a challenge many teams face: how do you move beyond single-agent demos to build scalable, reliable multi-agent systems that can handle complex workflows autonomously? The answer lies in structured protocols, disciplined architecture, and treating AI agents like the distributed systems they truly are.

Watch the Full Session:

Why Self-Orchestrating AI Workflows Matter

Traditional workflow orchestration relies on predefined paths and static logic. But AI agents reason dynamically based on context, making decisions that can't be predicted at design time. Allen captured this perfectly: "Agents don't follow linear playbooks. Their reasoning jumps based on context, not code. Every task can take a different path. Static wiring can't predict those branches."

The solution? Let agents choreograph themselves. Instead of a central orchestrator dictating every step, agents discover each other's capabilities at runtime, delegate work autonomously, and adapt their workflows based on real-time context.

This approach unlocks horizontal scalability. Each agent runs independently, processes events asynchronously, and collaborates through standardized protocols without bottlenecks or single points of failure.

SwiftShip Demo: Starting Simple

Before diving into the architecture, we demonstrated SwiftShip with an intentionally simple scenario: a delivery driver reported that a customer was not home. This showed how a single agent can react to a real-world event, interpret what it means, and take appropriate next actions.

The Triage Agent automatically evaluated the situation, scheduled redelivery, and notified the customer without human intervention. This single-agent workflow showed the basics: an agent receives an event, reasons about it, and takes action.

But the real power emerges when multiple agents need to collaborate.

The Foundation: A2A and MCP

Two protocols form the backbone of self-orchestrating workflows:

Agent-to-Agent (A2A) Protocol

A2A gives agents a standardized way to discover each other and collaborate. Each agent publishes an agent card that advertises what it can do.

Think of agent cards as API documentation for AI agents. Each card is a JSON document that describes what capabilities the agent offers, what input schemas it expects, where to invoke the agent, and what version of each capability is available.

This enables dynamic discovery. Agents don't need hardcoded integrations. They find collaborators at runtime, validate schemas, and make typed invocations that ensure reliable communication.
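To make that discovery step concrete, here is a minimal sketch using an in-memory registry rather than the real A2A SDK (which resolves agent cards over HTTP); the card shape mirrors the Order Management Agent card shown later in this post:

```javascript
// Hypothetical in-memory registry standing in for A2A discovery.
// Real A2A implementations fetch agent cards over HTTP at runtime.
const registry = new Map();

function publishCard(card) {
  registry.set(card.name, card);
}

function findAgentWithSkill(skillId) {
  // Scan published cards for one advertising the requested skill.
  for (const card of registry.values()) {
    if (card.skills.some((skill) => skill.id === skillId)) return card;
  }
  return null;
}

publishCard({
  name: 'Order Management Agent',
  url: 'https://api.swiftship.example.com/order-agent',
  skills: [{ id: 'duplicate-order' }, { id: 'change-order-status' }],
});

// A supervisor can now resolve "who handles order duplication?" at runtime.
const collaborator = findAgentWithSkill('duplicate-order');
```

The supervisor never hardcodes the Order Management Agent; it asks "who offers this skill?" and invokes whatever card comes back.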

Model Context Protocol (MCP)

While A2A handles agent-to-agent communication, MCP standardizes how agents access tools and context. Instead of passing raw text or unstructured data, agents call MCP tools that expose typed, validated interfaces.

MCP ensures every interaction follows a schema. This removes ambiguity, improves model reasoning reliability, and keeps system behavior stable when multiple agents use the same data sources.
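The shape of such a tool call can be sketched as follows. This is not the MCP SDK (real MCP servers declare JSON Schema for each tool); validation is hand-rolled here to keep the example self-contained, and the tool name and response shape are illustrative:

```javascript
// A schema-validated tool: invalid input fails fast, valid input
// yields a deterministic, typed response the model can rely on.
const getOrderStatusTool = {
  name: 'getOrderStatus',
  validate(input) {
    if (typeof input?.orderId !== 'string' || input.orderId.length === 0) {
      throw new Error('orderId must be a non-empty string');
    }
    return input;
  },
  handler({ orderId }) {
    // Fixed response shape: no free text for the model to misread.
    return { orderId, status: 'in_transit' };
  },
};

function callTool(tool, input) {
  return tool.handler(tool.validate(input));
}
```

Because every call goes through `validate`, two agents sharing this tool can never drift into incompatible interpretations of the same data.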

The Agent Loop: Four Phases of Autonomous Execution

Every agent in our architecture follows the same execution model, regardless of where it runs. Whether in Lambda, AgentCore, containers, or EC2 instances, the pattern remains consistent.

Phase 1: Compose Prompt

The compose phase transforms an incoming event into a deterministic input package for the model. This involves loading context through MCP tools, reconstructing prior task state from Amazon Bedrock AgentCore memory, and building a system prompt using the RISEN framework (Role, Instructions, Steps, Expectations, Narrowing).
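Because the compose phase must be deterministic, it helps to think of it as a pure function from structured parts to a prompt string. A minimal sketch of a RISEN prompt builder (field names are illustrative, not from the SwiftShip codebase):

```javascript
// Deterministic RISEN prompt assembly: same inputs, same prompt, every time.
function composeRisenPrompt({ role, instructions, steps, expectations, narrowing }) {
  return [
    `## Role\n${role}`,
    `## Instructions\n${instructions}`,
    `## Steps\n${steps.map((step, i) => `${i + 1}. ${step}`).join('\n')}`,
    `## Expectations\n${expectations}`,
    `## Narrowing\n${narrowing.map((rule) => `- ${rule}`).join('\n')}`,
  ].join('\n\n');
}

const prompt = composeRisenPrompt({
  role: 'You are the Order Management Agent for SwiftShip Logistics.',
  instructions: 'Manage shipping orders using changeOrderStatus and duplicateOrder.',
  steps: ['Validate the requested operation', 'Execute it', 'Confirm with order details'],
  expectations: 'Execute order operations reliably.',
  narrowing: ['Only modify orders that exist in the system'],
});
```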

Phase 2: Query LLM

The query phase sends the composed prompt to the language model. Allen emphasized an important point: "Model selection determines stability, determinism, and correctness."

Different agent types need different models. Supervisors coordinate work and need multi-step reasoning (Nova Pro, Claude Sonnet, GPT-4). Workers execute specialized tasks and optimize for cost and latency (Nova Lite, Claude Haiku, Gemini Flash).

The model returns structured output validated against schemas, ensuring predictable responses.
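One simple way to encode that supervisor/worker split is a routing table keyed by agent type. A sketch (the Bedrock model IDs below are illustrative; check the current model catalog for your region):

```javascript
// Route supervisors to a reasoning-heavy model and workers to a cheap, fast one.
const MODEL_BY_AGENT_TYPE = {
  supervisor: 'amazon.nova-pro-v1:0',  // multi-step reasoning and coordination
  worker: 'amazon.nova-lite-v1:0',     // cost- and latency-optimized execution
};

function selectModel(agentType) {
  const modelId = MODEL_BY_AGENT_TYPE[agentType];
  if (!modelId) throw new Error(`Unknown agent type: ${agentType}`);
  return modelId;
}
```

Centralizing the choice also makes it trivial to swap models later when tuning for cost or latency.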

Phase 3: Execute Action

The execute phase turns model output into real effects in the system. This includes schema validation of model output, performing actions like API calls or database updates, using idempotency keys for safe retries, and A2A invocation for handoffs to other agents.
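The idempotency-key part deserves a sketch, since retries are inevitable in event-driven systems. Here the seen-keys store is an in-memory map for illustration; in production it would be durable (for example, DynamoDB conditional writes):

```javascript
// Guard an action with an idempotency key so retries are safe no-ops.
const completed = new Map(); // illustrative; use a durable store in production

async function executeOnce(idempotencyKey, action) {
  if (completed.has(idempotencyKey)) {
    return completed.get(idempotencyKey); // retry: return the recorded result
  }
  const result = await action();
  completed.set(idempotencyKey, result);
  return result;
}
```

If an agent retries a refund with the same task ID, the side effect happens exactly once and the original result is returned.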

Phase 4: Continue the Loop

The continue phase determines whether the agent needs to keep working. The agent evaluates whether it has enough information to finish.

Supervisors decide whether to spawn more agents, wait for results, retry, or finalize. Workers decide whether to retry, refine output, or return results.
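The four phases above can be sketched as a loop that runs until the agent reports it is done or an iteration budget is exhausted. This is a simplified shape, not the SwiftShip implementation; `step` stands in for one compose-query-execute pass:

```javascript
// Minimal agent loop: repeat compose → query → execute until the continue
// phase signals completion or the iteration budget runs out.
async function runAgentLoop(step, maxIterations = 5) {
  let state = { done: false };
  let iterations = 0;
  while (!state.done && iterations < maxIterations) {
    state = await step(state); // one pass through the agent loop
    iterations += 1;
  }
  return { ...state, iterations };
}
```

The hard cap on iterations is a practical safeguard: a confused model cannot spin forever and run up inference costs.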

Complex Multi-Agent Workflow: The Catastrophically Destroyed Package

After the simple demo, we moved to something more interesting: going from a single autonomous agent to a full, choreographed multi-agent workflow.

Allen created an intentionally exaggerated scenario in the driver notes field: "Package severely damaged during transit. Contents caught on fire, then were run over by my car, and the car behind me. Then the ashes caught on fire. This was my fault."

My response during the demo: "Typical delivery event." Allen's reply: "Yeah, we see these a lot at SwiftShip actually, it's kind of a problem."

This absurd scenario perfectly demonstrated the power of multi-agent systems. The agents didn't need predefined rules for "package caught on fire twice and run over by multiple vehicles." They reasoned about the free-text input, understood the severity, and coordinated the appropriate response: refund processing, inventory allocation, and replacement order creation.

The workflow involved:

  1. Driver reporting catastrophic package damage through a mobile interface
  2. Triage Agent analyzing severity and determining next steps
  3. Payment Agent processing a refund
  4. Warehouse Agent checking inventory for replacement
  5. Order Agent creating a replacement order

Each agent operated autonomously, making decisions based on its domain expertise while collaborating through A2A protocol. No amount of traditional if-then logic could have anticipated this scenario, but the multi-agent system handled it gracefully.

Architecture: Serverless Multi-Agent Systems

Our reference architecture uses three core services:

AWS Lambda for Agent Execution

Each agent runs as a standard Lambda function. Because every Lambda execution is isolated, each Agent Loop starts with a clean state.

This isolation is crucial: it prevents state leakage between invocations.

Amazon Bedrock AgentCore for Memory and Observability

AgentCore provides persistent memory for task state, giving us a consistent view of what has already happened.

AgentCore also delivers built-in observability, tracking model invocations, costs, and performance metrics without custom instrumentation.

Momento Cache for Fast Coordination

Momento serves as the fast coordination layer, storing lightweight shared state in Momento Cache for low-latency access.

Momento Topics enable real-time event streaming, allowing the UI to visualize agent workflows as they execute.

Building Deterministic Agents: Agent Cards

Agent cards are the foundation of A2A discovery. They define what an agent can do, how to invoke it, and what inputs it expects. Here's the complete agent card for the Order Management Agent:

```json
{
  "name": "Order Management Agent",
  "description": "Manage shipping orders, statuses, and order operations for SwiftShip Logistics",
  "url": "https://api.swiftship.example.com/order-agent",
  "capabilities": {
    "streaming": false,
    "pushNotifications": false
  },
  "skills": [
    {
      "id": "change-order-status",
      "name": "Change Order Status",
      "description": "Update order status including delivery failures and status transitions",
      "examples": [
        "Change order ORD-12345 status to delivery_failed",
        "Update order status to in_transit",
        "Mark order as delivered"
      ],
      "tags": ["order-management", "status-update", "delivery"]
    },
    {
      "id": "duplicate-order",
      "name": "Duplicate Order",
      "description": "Create replacement orders for failed deliveries with optional customer and address overrides",
      "examples": [
        "Duplicate order ORD-12345 for redelivery",
        "Create a replacement order with new address",
        "Duplicate order with updated customer information"
      ],
      "tags": ["order-management", "redelivery", "replacement"]
    }
  ]
}
```

This card tells other agents exactly what the Order Management Agent can do. The Triage Agent can discover this card at runtime, see that it has a duplicate-order skill, and invoke it with the appropriate schema. No hardcoded integrations required.

Building Deterministic Agents: System Prompts with RISEN

The RISEN framework (Role, Instructions, Steps, Expectations, Narrowing) structures system prompts to ensure deterministic behavior. Here's the complete system prompt for the Order Management Agent:

```javascript
const systemPrompt = `## Role
You are the Order Management Agent for SwiftShip Logistics, specializing in order status updates and order duplication for redelivery scenarios.

## Instructions
Manage shipping orders using two core tools:
- changeOrderStatus: Update order status including delivery_failed states
- duplicateOrder: Create replacement orders with optional customer/address overrides

## Steps
1. Validate the requested operation and required parameters
2. For status changes:
   - Verify order exists
   - Update status with appropriate reason
3. For order duplication:
   - Retrieve original order details
   - Apply any customer or address overrides
   - Create new order with updated information
4. Provide clear confirmation with order details

## Expectations
Execute order operations reliably while maintaining data integrity and providing clear confirmation of all changes.

## Narrowing
- Only modify orders that exist in the system
- Status changes must use valid status values
- Order duplication requires a valid source order`;
```

This prompt demonstrates each element of RISEN:

  • Role: Defines the agent as the Order Management Agent for SwiftShip Logistics
  • Instructions: Specifies the two core tools and their purposes
  • Steps: Explicit numbered procedures for validation and execution
  • Expectations: Clear outcomes for reliability and data integrity
  • Narrowing: Focused constraints on what the agent can and cannot do

The RISEN structure removes ambiguity. The model knows exactly what it is, what it should do, how to do it, what success looks like, and what it should never attempt.

Building Deterministic Agents: Implementation

Here's how we implement the Order Management Agent using the Momento A2A framework:

```javascript
// api/functions/agents/order.mjs
import { createAgent } from 'momento-a2a-agent';
import { buildRequest } from '../utils/api.mjs';
import { converse, convertToBedrockTools } from '../utils/agents.mjs';
import { changeOrderStatus } from '../tools/change-order-status.mjs';
import { duplicateOrder } from '../tools/duplicate-order.mjs';

let agent;

const getAgent = async (baseUrl) => {
  if (!agent) {
    const agentParams = {
      // Agent card shown in previous section
      agentCard: { /* ... */ },
      skills: [ /* ... */ ],
      options: {
        defaultTtlSeconds: 3600,
        registerAgent: true
      },
      handler: agentHandler,
      ...process.env.MOMENTO_API_KEY && {
        cacheName: 'mcp',
        apiKey: process.env.MOMENTO_API_KEY,
      }
    };

    agent = await createAgent(agentParams);
  }

  return agent;
};

export const handler = async (event) => {
  try {
    const { request, baseUrl } = buildRequest(event);
    const agentInstance = await getAgent(baseUrl);
    const response = await agentInstance.fetch(request);

    const body = await response.text();
    const headers = Object.fromEntries(response.headers.entries());
    return {
      statusCode: response.status,
      headers,
      body,
    };
  } catch (error) {
    console.error('Order agent error:', error);
    return {
      statusCode: 500,
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ message: 'Something went wrong' }),
    };
  }
};

const agentHandler = async (message) => {
  // System prompt shown in previous section
  const systemPrompt = `## Role...`;

  const tools = convertToBedrockTools([changeOrderStatus, duplicateOrder]);

  const context = {
    tenantId: 'example-tenant',
    sessionId: 'session-' + Date.now()
  };

  const response = await converse(process.env.MODEL_ID, systemPrompt, message, tools, context);
  return { message: response };
};
```

The key components here are:

Agent Registration: The createAgent function registers the agent with Momento's A2A framework. The registerAgent: true option makes it discoverable by other agents through the agent card.

Lambda Handler: The handler function is the entry point for AWS Lambda. It builds the request, gets the agent instance, and processes the incoming message through the A2A framework.

Agent Handler: The agentHandler function implements the agent loop. It composes the system prompt, converts tools to Bedrock-compatible format, sets up the context (including tenant isolation), and invokes the conversation engine.

Tool Integration: The convertToBedrockTools function transforms our tool definitions into the format Amazon Bedrock expects, with schema validation and typed inputs.

Context Management: The context object includes tenant isolation (tenantId) and session tracking, ensuring secure multi-tenant operations and conversation continuity.

Tool Implementation: Deterministic Business Logic

Tools are what transform agents from chatbots into autonomous systems that actually do things. Here's the complete refund processing tool:

```javascript
// api/functions/tools/process-refund.mjs
import { z } from 'zod';
import { DynamoDBClient, PutItemCommand } from '@aws-sdk/client-dynamodb';
import { marshall } from '@aws-sdk/util-dynamodb';
import { randomUUID } from 'crypto';

const ddb = new DynamoDBClient();

export const processRefundTool = {
  isMultiTenant: true,
  name: 'processRefund',
  description: 'Process a refund for demo scenarios (simplified refund processing)',
  schema: z.object({
    orderId: z.string().min(1).describe('Order ID to process refund for'),
    refundAmount: z.number().positive().describe('Refund amount (must be positive)'),
    reason: z.enum(['delivery_failed', 'damaged_package', 'customer_request']).describe('Reason for the refund'),
    scenarioId: z.string().optional().describe('Demo scenario identifier for A2A event tracking')
  }),
  handler: async (tenantId, { orderId, refundAmount, reason, scenarioId }) => {
    try {
      if (!tenantId) {
        console.error('Missing tenantId in refund processing');
        return 'Unauthorized: Missing tenant context';
      }

      const refundId = `ref_${Date.now()}_${randomUUID().slice(0, 8)}`;
      const now = new Date().toISOString();

      const refundRecord = {
        pk: `${tenantId}#refunds`,
        sk: `refund#${refundId}`,
        GSI1PK: `${tenantId}#orders#${orderId}`,
        GSI1SK: `refund#${now}`,
        refundId,
        orderId,
        refundAmount,
        reason,
        status: 'completed',
        processedAt: now,
        currency: 'USD',
        ttl: Math.floor(Date.now() / 1000) + (90 * 24 * 60 * 60)
      };

      await ddb.send(new PutItemCommand({
        TableName: process.env.TABLE_NAME,
        Item: marshall(refundRecord)
      }));

      return `Refund processed successfully: $${refundAmount} for order ${orderId} (Reason: ${reason})`;

    } catch (error) {
      console.error('Refund processing error:', error);
      return 'Refund processing failed';
    }
  }
};
```

Notice there's no LLM in this code. It's just deterministic business logic that the agent can invoke through schema-validated tool calls. The Zod schema ensures the agent provides valid inputs, and the tool returns a simple string response that the agent can interpret.

The isMultiTenant: true flag is critical. It tells the framework to inject the tenant ID from the authenticated context, preventing agents from accessing data across tenant boundaries. The agent never sees or controls the tenant ID; it's provided by the infrastructure.

Domain Isolation: The Key to Scalable Agent Systems

Allen emphasized a critical principle: "What we need to do is we need to scope agents down to specific domains... One single agent is not allowed to cross those domain boundaries."

The SwiftShip architecture demonstrates this through clear domain separation:

  • Triage Agent: Entry point and supervisor that classifies exceptions
  • Payment Agent: Handles refunds and financial transactions
  • Warehouse Agent: Manages inventory and stock allocation
  • Order Agent: Updates order status and creates replacements

Each agent has specialized tools for its domain. The Payment Agent can process refunds but cannot access inventory. The Warehouse Agent can allocate stock but cannot modify orders. This isolation prevents cross-domain hallucinations and improves security.

Allen warned against the opposite approach: "There's a new term on the streets. They call it a mono-agent, and it's an agent that does everything... Think that's a good thing? Nah, it's not."

Mono-agents become unpredictable at scale. They try to reason across too many domains, leading to confused outputs and unreliable behavior.

Production Best Practices: Treating Agents as Distributed Systems

Deterministic Choreography

When we move from a single agent to a multi-agent workflow, the system only stays predictable if every interaction is deterministic.

Key practices include:

  • Clear prompts using RISEN framework
  • Idempotent actions with task IDs
  • Atomic APIs that represent complete work units
  • Predictable error handling with compensating actions

Microservice Parallels

During the session, we drew parallels to established distributed systems patterns. When you look at a multi-agent system at scale, the interesting part is how familiar the behavior becomes. Even though these agents use LLMs for reasoning, they behave much closer to distributed microservices than most people expect.

Patterns that translate directly:

  • Service discovery becomes agent discovery
  • Circuit breakers become agent health checks
  • Load balancing becomes task distribution
  • Retry logic becomes agent loop continuation
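The retry pattern, in particular, translates almost verbatim. A sketch of retry-with-backoff around an agent invocation, where `invoke` stands in for any async call such as an A2A handoff (this is a generic pattern, not SwiftShip code):

```javascript
// Retry with exponential backoff: the A2A analogue of a microservice retry policy.
async function invokeWithRetry(invoke, { retries = 3, baseDelayMs = 50 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await invoke();
    } catch (err) {
      if (attempt >= retries) throw err; // budget exhausted: surface the failure
      const delay = baseDelayMs * 2 ** attempt; // 50ms, 100ms, 200ms, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Combined with the idempotency keys described earlier, retries like this are safe: a duplicate invocation cannot double-apply a side effect.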

Observability and Tuning

Allen highlighted an important reality: "You cannot optimize a multi-agent system without metrics. And Bedrock gives you cost-performance observability out-of-the-box."

The architecture includes:

  • Amazon Bedrock Observability Dashboard for tracking model invocations, costs, and latency
  • Momento Topics for real-time workflow visualization
  • A2A tracing for understanding agent interactions

You need the ability to tune for cost, latency, and precision, just as you would tune microservices for throughput and resource usage.

Key Takeaways

The core principles from the session:

Agents, not workflows, drive the system. You don't design flows. You design behaviors. Each agent has clear responsibilities and makes autonomous decisions within its domain.

A2A enables real collaboration. Instead of a central orchestrator, agents discover each other at runtime, delegate work, and launch new autonomous loops without bottlenecks.

MCP makes context reliable. Every agent uses tools and schemas that return deterministic, typed data, removing ambiguity from model reasoning.

AgentCore and serverless infrastructure give you scale with almost no operational burden. Lambda handles execution, AgentCore manages memory, and Momento provides fast coordination.

Determinism is what makes autonomous systems production-ready. Clear prompts, idempotent actions, and schema validation ensure predictable behavior even when models reason differently.

These are event-driven systems, so design for idempotency and compensating actions. Agents may retry, events may arrive out of order, and workflows may need to roll back. Build for these realities from day one.

Allen's closing advice captured the essence: Multi-agent systems require the same operational rigor as traditional distributed systems. The LLM provides reasoning, but the architecture provides reliability.

Getting Started

The complete SwiftShip demo is available on GitHub: https://github.com/nullchecktv/swiftship-demo/

The repository includes:

  • Full agent implementations with RISEN prompts
  • A2A agent card definitions
  • MCP tool integrations
  • Infrastructure as Code with AWS SAM
  • Real-time UI with agent workflow visualization

Whether you're building logistics automation, customer support systems, or any complex workflow that benefits from autonomous coordination, these patterns provide a production-ready foundation.


About This Series

This post is part of DEV Track Spotlight, a series highlighting the incredible sessions from the AWS re:Invent 2025 Developer Community (DEV) track.

The DEV track featured 60 unique sessions delivered by 93 speakers from the AWS Community - including AWS Heroes, AWS Community Builders, and AWS User Group Leaders - alongside speakers from AWS and Amazon. These sessions covered cutting-edge topics including:

  • 🤖 GenAI & Agentic AI - Multi-agent systems, Strands Agents SDK, Amazon Bedrock
  • 🛠️ Developer Tools - Kiro, Kiro CLI, Amazon Q Developer, AI-driven development
  • 🔒 Security - AI agent security, container security, automated remediation
  • 🏗️ Infrastructure - Serverless, containers, edge computing, observability
  • Modernization - Legacy app transformation, CI/CD, feature flags
  • 📊 Data - Amazon Aurora DSQL, real-time processing, vector databases

Each post in this series dives deep into one session, sharing key insights, practical takeaways, and links to the full recordings. Whether you attended re:Invent or are catching up remotely, these sessions represent the best of our developer community sharing real code, real demos, and real learnings.

Follow along as we spotlight these amazing sessions and celebrate the speakers who made the DEV track what it was!
