🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Build, deploy, and operate agentic architectures on AWS Serverless (CNS359)
In this video, Dhiraj Mahapatro and Heeki Park demonstrate building agentic applications using AWS Serverless services. They explain the evolution from basic LLM inferencing to tool use and agents, using a retail customer support scenario where customers face double charges and missing items. The presenters cover implementation patterns with Strands Agents SDK, comparing imperative workflows in Step Functions versus declarative agent approaches. They introduce open standards including MCP (Model Context Protocol) for agent-to-tool communication and A2A (Agent-to-Agent) for inter-agent collaboration, showing how to combine these with event-driven architecture using EventBridge. Deployment options are explored across Lambda functions, ECS Fargate, and AgentCore Runtime, with practical examples of decomposing monolithic agents into domain-specific microservices for orders, payments, and inventory. The session emphasizes security considerations, identity propagation through AgentCore Identity, and architectural patterns including serialized pipelines, orchestrator patterns, and swarm architectures for production-scale agentic systems.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: Building Agentic Applications with AWS Serverless
Good morning everyone. Thank you for being here, and happy to have you all for this session, CNS359. My name is Dhiraj Mahapatro. I'm a Principal Specialist Solutions Architect at AWS, and with me I have Heeki Park. My name is Heeki Park. I'm also a Solutions Architect here at AWS covering Generative AI.
All right, so we're going to kick it off today. Hopefully you're excited. We're getting started here at re:Invent. We're going to talk about building, deploying, and operating agentic applications using AWS Serverless, and so we have a lot we want to cover, so we're going to dive right into it. This is just a quick overview of our agenda, but again we're going to dive right into it.
Just a couple of quick assumptions here. This is a 300-level session, so there are some assumptions around knowledge; we won't cover a lot of the basic definitions. We're also going to start with some principles, again with your architect hat on here, and then we're going to apply some pragmatic deployment thinking.
Customer Use Case: A Retail Order Problem
Okay, so the first thing that we want to think through is designing single-agent systems, and we're going to then transition into multi-agent systems. But before we do that, why don't we start with a customer example, and we're going to use this example throughout this talk. So imagine we just went through Black Friday. You're shopping, perhaps gift shopping. I was charged twice for my order. In fact, there should have been two items, I only received one, and I actually need it tomorrow for a gift, right?
And so here's issue number one: they were charged twice. Instead of being charged once, as you would expect, the payment system perhaps accidentally charged you twice for some reason. Issue number two: you did not receive all of the items in the order. And then of course there is this rush, this impending need to get it tomorrow, because you need to actually give this as a gift. So we're going to think through these problem statements throughout this talk.
So for the serverless aficionados here, we may build this with a set of serverless APIs. This is your retail application. Again, using domain-driven design, you can imagine we may have some different bounded contexts. You'll have a number of different APIs here, right? So maybe your orders API is going to manage the lifecycle of that customer purchasing process. You might use inventory management to look in your warehouse and see what's available. Again, maybe on Black Friday you sold out of some of those items, so you want to be able to notify the front end that some of these items perhaps are not available.
Similarly with payments, we may want to do direct payments or maybe even invoicing of your customers. And then of course there's the support API for your support organization: when this request comes in, we can use it to go and triage what perhaps went wrong. Okay, so with this we think about the customer experience and what we're going to try to solve for, right?
These are some of the challenges that we're thinking through: as a customer support agent, maybe I have to respond to these issues, and I have to look in multiple disparate systems. Maybe there's a payments portal internally. Maybe there's something around inventory management. You can imagine trying to build a dashboard or some type of support system that allows me to see all of these disparate sources of information in one place.
Furthermore, of course you want a good customer experience, so we don't want the customer to be sitting and waiting for a long period of time. We want to be able to address this customer issue as quickly as possible, and you also want to make sure that your support personnel get quick guidance, so that your customer isn't waiting a long time before they get feedback of, hey, what's the next potential resolution.
Integrating LLMs: From Search to Dynamic Tool Use
So can we do better? Can we do better? Now you're thinking, I could probably do this just by having better, you know, unit tests and integration tests within my application itself. But as we were just talking with some folks in the front here, there are ways to get insight into data quicker, perhaps using an LLM. So in this scenario, perhaps you integrate an LLM. We're going to invoke a model, provide it with some amount of data, and get some type of reasoning insight back from this model, and we're going to talk a bunch about how to do this in a way that makes sense for our application.
So the quick naive way we want to think about it is we want to provide context to this application and to this LLM. The first way we may think about doing that is through this idea of search. Here the key thing is you're going to have a prompt, so maybe you have an application, you're going to construct a prompt through this application. You will then perhaps go and get some information out of a knowledge source, so this could be some type of semantic knowledge store and maybe this is a prior history with this customer, maybe this is the inventory data warehouse, and maybe this is a finance payments API.
We're going to then go and extract that knowledge and then submit an enhanced prompt to the LLM. The key thing that you want to think about here is the fact that the application is orchestrating the flow of information retrieval. So it is the application that is going to parse the prompt. It is going to go and say, okay, we need to get some set of data out of this knowledge source and then submit that to the LLM.
So as a developer building this type of application, the developer needs to write and orchestrate all of this logic. Perhaps you may also think maybe we want to do this a little bit more dynamically. Maybe as a developer I don't know the actual workflow, so perhaps there's a way we can do this where we can have the LLM make a determination of what the next step is without writing imperative code.
So here you have that same prompt, the same application. And what we're going to now do as part of this prompt is to submit a list of available tools that this LLM can then make a decision about. The LLM can then come back and say, hey, I like tool A and B. I'm going to go make a request. Can you application go and fetch this data on my behalf? Now the application on behalf of the LLM will go and fetch the data from that same knowledge source and then we'll submit that enhanced prompt. So the key difference here is maybe as a developer I've actually simplified the type of code that I'm writing or the amount of code that I'm writing and offloading that logic to the LLM because maybe this is something we want to do a little bit more dynamically.
Understanding Enhanced Prompts and the Agentic Loop
Okay, so we talked a little bit about this enhanced prompt, and you may be wondering what is that, what exactly is that enhanced prompt? Well, great question. Here's perhaps a JSON document that shows examples of pieces of information that may be in this prompt. So of course there's a system prompt that then defines here is the persona, the set of capabilities, and maybe some boundary behaviors that we have for this particular agent. And then of course at the bottom then is the human prompt that was submitted, but the key thing here is all of this additional context at the top is something that the application is going to provide in addition to the human prompt.
So these are all things that are perhaps hidden from the user, maybe in our case the customer support representative, but these are all included as part of that enhanced prompt. And the key thing here then is this list of tools. So we talked before about there's this list of tools that we're going to submit to the LLM to then give it an opportunity to make a decision about what additional information the LLM believes it needs in order to make a better choice or a better insight into the data that we want to have.
Okay, so if we then bring this back to our traditional retail application, we then have this application that can go and make these tool calls. We can then essentially wrap our serverless APIs as tool calls in order to then provide these enhanced prompts to the LLM. So this traditional retail application is going to recursively make calls to the LLM. The LLM maybe requests data, then makes these tool calls, and then kind of has this iterative loop. This is something that the developer is going to write this recursive loop for until it comes to some type of base condition that says we are complete and we are good to then return some type of response back to the end user.
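To make that loop concrete, here is a minimal sketch of what that developer-written recursive loop could look like against the Amazon Bedrock Converse API. The tool definition, the get_order_details helper, and the model ID are illustrative assumptions rather than code from the session.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

# Illustrative tool the application advertises to the LLM as part of the enhanced prompt.
tool_config = {
    "tools": [{
        "toolSpec": {
            "name": "get_order_details",
            "description": "Fetch order details for a given order ID.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            }},
        }
    }]
}

def get_order_details(order_id):
    # Placeholder for a call to the orders serverless API.
    return {"order_id": order_id, "items_ordered": 2, "charges": 2}

messages = [{"role": "user", "content": [{"text": "I was charged twice for order 123."}]}]

while True:
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # assumed model ID
        system=[{"text": "You are a retail customer support assistant."}],
        messages=messages,
        toolConfig=tool_config,
    )
    output = response["output"]["message"]
    messages.append(output)

    # Base condition: the model no longer asks for a tool and returns its final answer.
    if response["stopReason"] != "tool_use":
        print(output["content"][0]["text"])
        break

    # Otherwise execute each requested tool and feed the results back as additional context.
    tool_results = []
    for block in output["content"]:
        if "toolUse" in block:
            use = block["toolUse"]
            result = get_order_details(**use["input"])
            tool_results.append({"toolResult": {
                "toolUseId": use["toolUseId"],
                "content": [{"json": result}],
            }})
    messages.append({"role": "user", "content": tool_results})
```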
Okay, so now of course we're talking about agentic applications: what is the difference between what we did before and what an agent is today? The key difference here is you'll see at the top there is this agentic loop that repeats as needed. So what does that look like? Again we have this prompt, we have our agent now, and it invokes this model with that list of tools. It gets that response back, we execute a tool, we get that result, and then of course we repeat as frequently as needed, right?
And so it is now the agentic framework that handles how many times I actually need to loop. Maybe I need to do this once or maybe I need to do this three times. So this is something where we're going to think through this tool execution. We're going to be continuously retrieving context from our different knowledge sources via these tool calls so that the LLM has a sufficient amount of information, or data, or context in order to make a good decision. Okay, and then of course we return that final result.
Decomposing Monolithic Agents into Domain-Specific Microservices
So one thing that we may want to think through then is we had that serverless API, and we perhaps built it as a single monolithic agent, which is okay to start with. But we may also want to start thinking about whether we want to decompose it, similar to what we did before with monolithic APIs and large monolithic services, where again this audience probably knows all of the challenges around scaling, deployment, and agility. We can start to apply some of those same principles as you start to think about agents. It's okay to start with a monolithic agent, but as you start to build out the use case, maybe you want to start to decompose it, again into agentic microservices.
And here you see we take each of those serverless APIs that we had before, which again you had done that decomposition previously, and now we start to wrap them around with agents. Now you may be thinking, okay, kind of makes sense, but why might we want to do that? So one reason you may want to think through is as you build these domain agents you can imagine you're going to have lots of the business coming back and saying we want additional capabilities.
Maybe developers realize we can extend the capabilities and provide better customer value with these different domain level agents. And so here we have the Inventory Agent that is using that serverless inventory API, but now we start to extend it and say, can we add additional capability? Maybe we're going to add things like find alternative product recommendations based on the information that this domain agent already has. So this is where you can start to build out additional capabilities within a certain scope of, let's say, back office or line of business capabilities.
From LLM Inferencing to Agents: Implementation Evolution
Okay, so with that, I'm going to hand it over to Dhiraj. Thank you, Heeki. So in a nutshell, the principles that Heeki talked about are the foundation of how you should think about agents and how they have evolved. We started with LLM inferencing, something like RAG, which provides context. Then we talked about tool use, where you can imperatively call different tools, and then we talked about agents. Now my goal here is to show how this is implemented, so you can see this evolution going from the basics to actually developing and using an agent.
So let's go back to the example that Heeki talked about. You have a serverless application, you're talking to, let's say, Bedrock models, and the user is calling about the same problem that Heeki brought up: he was charged twice for the order. So how would your application handle it today? Think about calling an LLM. You have to provide all the context: you are an order assistant, these are the tools that you have at your disposal, and this is the problem that the user has. Now you have to go and solve this. That's where the LLM figures out, based on the information that you have provided about the order: first I have to go and get the order details, right? And it's the responsibility of the serverless application to go and somehow fetch the order details. It can go and call an API, invoke a Lambda function, and get those responses back, and the main part here is once the response is received, it has to be fed back to the LLM as additional context. That's how the LLM gets additional context and can work on the next steps.
As the next step, it can go and say, okay, I have the information, now I have to go and get the invoices. And once it gets the invoice results, it has to feed that information back to the LLM. And finally, once all of those things are figured out, it can start the refund process. Once the refund process is complete, the final response will be sent to the user saying that, okay, the refund was applied. So you see that the agentic loop that Heeki was talking about: without using an agent, we are already talking about a loop here. There is a loop where your application has to talk to the LLM, the LLM has to respond, and you have to feed in or update new context.
Now, I hear about Step Functions: can you do this in Step Functions, right, in a workflow? If you want to build this in a workflow, these are multiple steps. The most important part is the choice state here, if you can see the choice state. Every time a tool is executed, you have to figure out in the choice state which tool to use, and programmatically you have to go and execute that tool and feed that information back to the model, so that loop continues in a recursive manner, right? So we see this as a pattern. You can do the same thing with Lambda functions instead of using a Step Function; if you want to write that code in a Lambda function, you can do that. But overall we see this pattern: whether it's Lambda functions, Step Functions, or your application code running on Fargate or EKS, it has to follow the same pattern of calling the LLM, calling tools, and going in a loop, right? So that is the pattern we have to build on top of.
But let's see what the benefits of this pattern are. If you go with this tool use with Lambda functions or Step Functions, you can call the existing APIs. If you take Amazon Bedrock, you can call the Converse API or the InvokeModel API from those services like Lambda functions, and Step Functions has native integrations with the Bedrock API. But the important part is the choice state that I highlighted. You have to imperatively write the code to actually figure out which tool to use, right? So that's a thing to remember here. That allows you to build predefined paths for execution, so in most cases you'll have to create a workflow to create the predefined paths.
But what if you want a workflow, or a process, or a loop where things can change dynamically based on new context? In that case, if you have to change anything in the Step Function, you have to add a new branch to the choice state, then build, develop, test, deploy, and then make the changes. So that's a key thing to remember. Is there a better way to handle this tight coupling? Because if you think about it, there is a tight coupling: whatever branches you have from the choice state, those are the only ones that will be executed, and that is how much you are limited in the LLM use cases. So how can we make it better, so that you don't have the tight coupling, but code execution happens automatically based on whichever tool the LLM decides is right?
Building Agents with Strands: A Declarative Approach
That's where agents come into the picture. So the drastic difference between you writing the imperative code and what an agent can do is that the agent has the innate capability of figuring out which tool to invoke without you writing the choice state. So here is an agent example where we use Strands Agents. Anybody using Strands Agents yet? Okay, cool. Strands Agents is an open source SDK which we announced a couple of months ago, and that's an easy way to get started, and we'll see how it works.
So now the question is, how should I develop an agent? We know how to build Lambda functions, but how should I develop an agent? It's as simple as any other library you want to run in a Lambda function or any other compute. Strands Agents is a very lightweight open source SDK to build agents, and the most important part that is critical to Strands Agents is that it is model driven and it relies on the agentic loop capability that Heeki and I have been talking about. We'll see how it works.
But how do you get started? With Strands Agents, you get started very simply, in a couple of lines. This is Python code where you import Strands Agents as the library. You import some tools; there are some predefined tools that you can import. You define an agent like it is done here, and you provide a prompt to the agent. Now, one thing that you don't see here is that I have not mentioned the LLM. Strands Agents has an opinionated way of using a particular LLM on Bedrock. If you don't provide one, if you just want to experiment, you can easily start with this, and it uses a particular model. I think it uses Claude Sonnet 4.5 right now on Bedrock. But this is executable code right now: if I run this as a Python script, it'll run and give me a response.
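As a rough sketch of what those few lines look like, assuming one of the prebuilt tools shipped in the strands_tools package (no model is specified, so the SDK's default Bedrock model is used):

```python
# Minimal Strands agent: no model configured, so the SDK falls back to its default Bedrock model.
from strands import Agent
from strands_tools import calculator  # one of the predefined tools

agent = Agent(tools=[calculator])

# Calling the agent with a prompt kicks off the agentic loop and prints the response.
agent("What is 1234 * 5678?")
```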
Now, what would the order agent look like? I showed the Step Functions workflow that was the order agent, sorry, the orders workflow. If I want to run the same thing with the Strands Agents SDK for orders, this is how it would look in a single Python file. First, you import Strands Agents from the SDK. And then, if you see, the most important part here is that I have just Python methods here, and they are decorated as tools. That decorator is from Strands Agents, which means you can make any of your Python methods act as tools when you're working with Strands Agents.
Then you create a boto session, define which region you want to use, and define the model. In this case, I'm going one step ahead: instead of relying on the default model selection that Strands has, I want to go and use Claude Haiku 4.5. I define the token counts, I define the session, and now I have everything. I create the agent, and if you see line number 111, I'm providing all the tools, which are just the method names that I have defined up above, and then I call the agent.
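A condensed sketch of that orders agent follows; the tool bodies, the model ID string, and the BedrockModel parameter names are assumptions standing in for the fuller code shown on the slide:

```python
import boto3
from strands import Agent, tool
from strands.models import BedrockModel

# Plain Python methods become tools via the Strands @tool decorator.
@tool
def get_order_details(order_id: str) -> dict:
    """Fetch order details for a given order ID."""
    return {"order_id": order_id, "items_ordered": 2, "charges": 2}  # placeholder

@tool
def get_invoice(order_id: str) -> dict:
    """Fetch the invoices associated with an order."""
    return {"order_id": order_id, "charges": ["ch_1", "ch_2"]}  # placeholder

@tool
def start_refund_process(charge_id: str) -> str:
    """Start a refund for a duplicate charge."""
    return f"refund started for {charge_id}"  # placeholder

# Explicit model selection instead of the SDK default.
session = boto3.Session(region_name="us-east-1")
model = BedrockModel(
    model_id="us.anthropic.claude-haiku-4-5-20251001-v1:0",  # assumed Haiku 4.5 model ID
    boto_session=session,
    max_tokens=1024,
)

agent = Agent(
    model=model,
    system_prompt="You are an order support assistant.",
    tools=[get_order_details, get_invoice, start_refund_process],
)

agent("I was charged twice for order 123. Please investigate and refund the duplicate charge.")
```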
So the drastic difference between the workflow and this one is that nowhere in this code have I written if-then-else logic or a switch statement. Which means when new context comes in, or responses come back from the LLM, it is the agent's responsibility to figure out whether it has to call get order details, get invoice, or start refund process, in whichever order they have to be invoked. So there is no imperative code. We move from writing imperative code to the agent working in a declarative manner and getting things done.
So that's one of the major benefits. Let's see what else. You get built-in error handling. All the best practices of how agents should work are available in the Strands Agents SDK, and this is not limited to Strands Agents. You can experiment with any agentic framework, but Strands Agents is something that I love to work with. It's very lightweight, so it's very easy to work with.
Most importantly, Strands is model and compute agnostic. While I used a Bedrock model in this example, Strands allows you to work with any model provider, not just Bedrock. It can be Anthropic directly, it can be OpenAI directly, or it can be a model that is running on SageMaker as well. Plus, since it is just an SDK, you can run it on any compute. It can be a Lambda function, it can be Fargate, it can be EKS, or it can be on your EC2 instances. Finally, as I mentioned earlier, the major comparison is instead of writing imperative workflows which are tightly coupled based on the use case, this is much more declarative. You can define a new tool by writing a new method and providing that tool in the agent's tools array, and then you're done. The LLM and agent will work together to figure out which tool to invoke and in which order.
Running Agents at Scale: The Challenge of Multiple Protocols
Now that we have a good understanding of how a single agent as a unit of work can be built, we have to figure out how you can run those agents at scale. At the end of the day, when you run it in production, it will not be one agent running on your machine. It has to be multiple agents and multiple tools. When you want to run agents and tools at scale, this kind of chaos is what you will aim for or look for. The reason I say chaos is because we have seen the world when it comes from monolithic to microservices, and we found a way how to build and operate microservices. There will be agents in your department, your agents, and there will be tools that you want to use which are very local to you. There will be agents and tools in your organization, and there will be agents and tools that are externally available. At the end of the day, you don't want to repeat yourself. If you have tools that are working for you, your agents should be able to invoke those tools instead of you creating those tools again. This is how a production setup would look like. Our goal is to be here and then run at scale.
For that, we have some standards that we can rely on and make agents work with tools and other agents seamlessly. So let's see. Again, going back to this, my agent can call a tool that is inside the department. My agent would also need to call a tool in the organization, and it should be able to call an external tool. So how do we make this work? If we don't use any standards, the agent and tool integration would look like this. You have agents, and if I have to use a tool that is supported only for SQL or it's a database-specific tool, then the agent has to write SQL or has to support SQL. That's the standard of the protocol that has to be used to integrate with the database. If the agent has to work with a GraphQL endpoint, then the agent has to learn what GraphQL is and has to integrate with the GraphQL endpoint. Same with RESTful endpoints.
Now you can see what is happening. Agent builders who are building agents are responsible for supporting all those different protocols. And on the tool side, whoever is building tools is restricted in scope. If a vendor is specifically working only on databases, and they want their user base and their agent clients to grow, they now have to support GraphQL or REST endpoints, whatever it may be. You can see there is a tight coupling happening when you think about different protocols and how agents have to work with different tools. Those different entities, the agent builders and the tool builders, have to support different ways of communicating, and only then can they operate together. So that's a tight coupling.
Model Context Protocol (MCP): Unifying Agent-Tool Communication
That's where Anthropic came up with MCP. Has anybody heard about MCP, the Model Context Protocol? What it does is unify all of those different standards and say, okay, you don't have to use different standards when you want agent-to-tool integration. We'll use MCP as the protocol. For that, you need an MCP client, which will be used by the agent, and an MCP server that has to be hosted by the tool provider. Now what happens is the agent builders only need to support a single standard, that is MCP. And the tool builders or the tool providers have to make sure that whatever capability they have is exposed as an MCP server or MCP capability. So what does it look like in the agentic loop? Whenever you have agents calling tools, the agent will work as an MCP client and the tool providers will be the MCP servers.
So a single MCP client using Strands Agents would look like this. If you see here, I have imported the Agent and MCP client from Strands Agents, because Strands Agents supports MCP clients natively. If you see line number five, I'm actually calling an external or remote MCP server whose job is to go and scrape the web. It's a web crawler and it should give some responses back. Then I get all the tools that the MCP server has to offer, I provide those tools to the agent, and I ask a simple question like, "What are the top five best rated cities in the USA grouped by breweries, and provide the answer in a tabular form?"
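A sketch of that client pattern, assuming Strands' MCPClient wrapper and a streamable-HTTP MCP transport; the crawler URL is a placeholder for the remote server used in the demo:

```python
from mcp.client.streamable_http import streamablehttp_client
from strands import Agent
from strands.tools.mcp import MCPClient

# Connect to a remote MCP server (placeholder URL for the web-crawler server).
crawler = MCPClient(lambda: streamablehttp_client("https://example.com/mcp"))

with crawler:
    # Discover whatever tools the remote server exposes and hand them to the agent.
    tools = crawler.list_tools_sync()
    agent = Agent(tools=tools)
    agent(
        "What are the top five best rated cities in the USA grouped by breweries? "
        "Provide the answer in a tabular form."
    )
```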
Anybody from Denver here? No one? Okay, because Denver always comes up in the top when it comes to the top five breweries. This is the answer, and I executed this as just a Python script. You can see where I've highlighted that it executed the tool provided by the remote MCP server and returned the response. It scraped through different pages and provided the response in a tabular format. So this is a simple example of invoking remote MCP servers.
Now, if you are building MCP servers, what would it look like from the MCP server or tool provider side? The easy way to start building an MCP server is using FastMCP. You can import FastMCP and create a basic FastMCP server. In this case, I'm just creating a math server which does the addition of two numbers, and I've defined the tools. One of the tools adds two numbers, and I've decorated it as an MCP tool, and I'm just running that MCP server at port 8000. When I run this, the server comes up, it's up and ready, and it should be able to take calls from MCP clients.
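A minimal version of that math server, assuming the FastMCP implementation bundled with the Python mcp package (transport name and default port follow that package's conventions):

```python
from mcp.server.fastmcp import FastMCP

# A tiny MCP server exposing a single math tool.
mcp = FastMCP("math-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    # Serves an HTTP MCP endpoint; FastMCP listens on port 8000 by default.
    mcp.run(transport="streamable-http")
```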
Agent-to-Agent Protocol (A2A): Enabling Inter-Agent Collaboration
So now we saw from the MCP client side how you should be building MCP clients, and from the MCP server side how the tool providers should be building MCP servers. Coming back to running at scale in production, this is where we ended up: any tool communication that happens from an agent will be over MCP as a protocol, a standard. But what if an agent wants to talk to an agent? There can be a case where agents will try to communicate with an agent instead of a tool. Let's see what happens if we do the same thing without a standard and go from there.
Let's go back to the same use case, like I was charged twice. I have an Order Agent that is run on Strands, which is an MCP client. It has its own tool, it talks to its LLM, it is an agentic loop. Think of that orange dotted boundary as the bounded context of the Orders Agent. Now, in order to get a proper refund or payments, it has to work with the payment domain that has the same setup or a similar setup with an agent talking to an LLM and a bunch of tools that are specific to payments.
Now, if I don't use any standard, then the Order Agent has to directly talk to the tools that are in the Payments domain, which means we see scope creep. Now the Order Agent has to understand what tools are available in the Payments domain. If you think about it, we are back to square one when we think about domain-driven design and building microservices. I think everybody here will be aware of domain-driven design and why we should think about bounded contexts. So there is scope creep: you're running tools from different domains, and the Order Agent will not know which tools it has to get from the Payments domain, or what authentication and authorization requirements it has to work through.
There is ownership leakage, because the Payments domain has to own its tools and somehow make sure they are exposed to other agents without those agents directly invoking the tools. Last but not least, if the Payments Agent is continuously revising its tools or how it works with tools, then the Order Agent has to keep up, otherwise it might fail.
What would it look like if we go one step up and do inter-agent communication with MCP? Because we figured out MCP is a standard and an agent can talk to tools, why not have the Payments Agent work as a tool for the Orders Agent? That's doable. It's better than not having a standard because it relies on MCP as the protocol. But the Payments Agent is used as a tool,
which means the order agent has to discover what this agent is capable of or what are the tools that are available for this agent. The agent should somehow be able to figure out that it has tools A, B, and C to work with. It needs to reach out using MCP, and then it has to delegate all of its MCP calls to its tools. And then, if you look at it, the payments agent must act as an MCP server for the orders agent, and it has to act as an MCP client for its own tools, right?
Can we do it better? That's where Google came up with Agent-to-Agent, which is A2A. It's a standardized protocol for AI agents to discover capabilities of existing or other agents, exchange tasks, and collaborate on complex workflows. We'll see how it would look like if you want to start building with A2A. If you take the orders agent and payments agent, the first thing that the order agent will ask the payment agent, if the payment agent is the A2A server, is give me your capabilities, give me your agent card. The agent card will have all the details: the name, the version, the description of the agent, and the skills.
The agent will have skills and capabilities, and that is kind of an abstraction of how this agent works with its own tools. From those skills and capabilities, what are the supported modalities, what are the ways to authenticate and authorize, all of those will be present in the agent card. Now the order agent has the agent card, so it can figure out how to work with the agent. It'll create a task for the payments agent, and then it'll process the task. In the meantime, if there is more information required from the payments agent as an A2A server or from the order agent as an A2A client, they'll be able to communicate. And finally, the results will be returned back to the order agent.
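For illustration, an agent card is just a small JSON document. A hedged sketch of the fields the speakers mention (names and values are placeholders, not the exact A2A schema) might look like this:

```python
# Illustrative agent card published by the payments agent (field names approximate the A2A spec).
payments_agent_card = {
    "name": "payments-agent",
    "version": "1.0.0",
    "description": "Handles duplicate charges, refunds, and invoicing for the payments domain.",
    "url": "https://payments.example.com/a2a",   # placeholder A2A endpoint
    "capabilities": {"streaming": True},          # supported modalities / interaction modes
    "skills": [
        {
            "id": "investigate-duplicate-charge",
            "name": "Investigate duplicate charge",
            "description": "Check whether an order was charged more than once and start a refund.",
        }
    ],
}
```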
So now, evolving from just using MCP, this is how it will look if you use A2A. It's similar to the concept of using choreography between domains and orchestration inside your domain. You're using MCP when the agent has to talk to a tool, and you're using A2A when the agent has to talk to an agent. So what are the improvements? Now your payments agent is a specialized agent; it has the capabilities to do everything it has to do related to payments, and it exposes an agent card.
Now, the payment agent doesn't leak the domain knowledge because the agent card is taking care of how external agents should be interacting with the payments agent. And then A2A as a standard, as a protocol, supports both synchronous and asynchronous communications. So it can happen where the payments agent can asynchronously call or send some responses, and the order agent can provide a webhook or something and it can get that information back without waiting on the payments agent response.
Event-Driven Architecture with MCP and A2A
So this is an overall picture of how Agent-to-Agent will work. Now can we go one level up? Can we bring event-driven architecture together with A2A and MCP? I'll go back to the same example. This is where we left off in the previous slide: the agentic loops of the orders agent and payments agent, A2A for inter-agent communication, and MCP for communication inside the domain. And we have this request coming in that I was charged twice for my order. Can we use event-driven architecture? I believe we can, and that will allow you to get more benefits.
What if that request comes as an API call and you have an event producer that accepts that API call with API Gateway and Lambda functions, and then they emit events to an event broker like EventBridge? Now you have an event going to the orders agent saying that order support is placed. The order agent can work with the payments agent using A2A and using MCP for its own work, and then ultimately send an event back saying that the order support is completed. Now that event can be captured by the broker again, and now I have another event consumer whose work is to notify the end user, which can use WebSocket and Lambda and then ultimately notify the end user.
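As a sketch, the producer side of that flow is just a PutEvents call to EventBridge; the bus name, source, and detail-type strings below are assumptions. An EventBridge rule would then route the event to the orders agent, which emits an order-support-completed event back onto the same bus when it finishes.

```python
import json
import boto3

events = boto3.client("events")

# Emit an "order support placed" event for any interested agent to consume.
events.put_events(
    Entries=[{
        "EventBusName": "retail-support",        # assumed custom event bus
        "Source": "retail.support",              # assumed source
        "DetailType": "OrderSupportPlaced",      # assumed detail type
        "Detail": json.dumps({
            "ticket_id": "T-1001",
            "order_id": "123",
            "issue": "I was charged twice for my order",
        }),
    }]
)
```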
So what is the benefit of using this? We talked about MCP, we talked about A2A standard, but why EDA? Why event-driven architecture?
The main benefits apply to every application that we build using EDA. It's loose coupling. When you bring loose coupling together with intelligent coordination, you build something really powerful. The agent is autonomously doing its work, and at the same time you have the capability of emitting events to the agent without waiting for the agent to send responses back, which means you can scale really well without depending on the agent or failing when the agent fails.
You get context-aware event handling. MCP provides a standardized way for agents to access contextual information, whether it comes from databases, APIs, and so on, and an agent receiving a customer inquiry event can automatically fetch that context and work independently of other agents. As I mentioned, you can scale agents, because for the events that we talked about, like order placed or order support placed, I just showed you the order agent working, but what if there are other agents that are also working on that event? They can start working on that particular event type without touching the order agent or doing anything to other agents, so you can invoke different agents in parallel when it comes to event-driven architecture.
The other important aspect is asynchronous intelligence. When something happens inside your system, for example, let's say a support ticket was created, or a payment was requested, or an order was misplaced, when you convert that to an event, you don't have to invoke an agent synchronously anymore, because now that event will actually invoke the agent, or the agent will come back to life when the event is received. So it can work asynchronously and do a lot of intelligent automation for you.
Overall, you get observability and auditability because when you use an event broker, you are capturing all the events that are going to the broker. And if an agent fails, you can replay those events and you can retrigger those agents and make them work. So that's another capability. And it's resilient because it's decoupled, and if the payment agent fails or if the order agent fails, you'll still be getting those events. And you can gracefully degrade by bringing up another agent or another application to work on those events. So overall these are the benefits of using all those open standards with event-driven architecture.
If you have to add a new agent, you don't have to touch anything around what you have already set up. If I want to add a new agent to this setup which has to work on the order support placed event, I don't have to touch anything here. I'll just add that agent, subscribe to the new event, and keep on going. So my blast radius will be smaller. I don't have to test everything.
Deploying Agents on AWS: Serverless Patterns with Lambda and Fargate
Cool. So now that we know how you build the patterns, event-driven architectures, the standards, and so on, the next question is how will I run it on AWS, and that's where I have Heeki to talk more about how you can deploy agents and tools on serverless compute. Cool, thank you. So far we talked about tool calls and these agentic frameworks. This allows us to use the LLM to reason through completing these tasks or requests on behalf of your end users.
We also want to then use these agentic frameworks and these tool calls to be able to get additional data context so that the LLM has the appropriate amount of information to adequately process whatever this request might be. And Dhiraj talked about a bunch of these open standards, ways that we can make sure that developers of different teams are doing this in a uniform way that makes it simpler for you to then operate in production. So let's think through a little bit of how we then do this using some serverless capabilities here on AWS.
And so we're going to think through some patterns here. We're going to go through serverless functions with Lambda, we'll do serverless containers with Fargate, and then of course we'll do a quick dive into AgentCore, which went GA about a month or two ago, and we'll look through some of these deployment patterns and how you can actually apply this in your organization. Before we do that, let's take a quick high-level view of what it looks like as a developer as you're building out these capabilities. The first thing is that foundational capability, which Dhiraj spent a bunch of time talking through: you can select whatever agentic framework you're going to use. Of course here at AWS we love Strands, but we've also seen some of our customers choose alternate open source frameworks. And the beauty here is that you can take whatever framework that you're going to use.
Again, I'll just talk about Strands, and you can develop that locally, do a lot of testing locally, then you can package it up in whatever artifact is required for your compute target, which we're going to talk about momentarily. So you have that same experience of developing locally and also deploying into production.
And then of course, when you're exposing these, whether these are in private VPCs, again maybe you have a requirement that's internal only. I think we were talking about that earlier with some of the audience where you have internal only private network capabilities, or maybe you're actually providing this as a service to your end customers, so necessarily these become public endpoints, right? And so if you're going to be doing this with Fargate, of course you're going to perhaps use an ALB, and that will become the ingress mechanism for all requests coming into your agent application. How you'll be exposing that in this scenario is just going to be as a simple HTTP web interface.
Now if you're going to be doing this with Lambda, perhaps you're going to have API Gateway in front of it. You're going to have all of the nice things that API Gateway provides you like an authorizer, WAF integration, CloudFront integration, etc. And now what you'll use is you're going to use that Lambda handler that we love and integrate that with that HTTP interface for your agentic application.
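A hedged sketch of that Lambda integration, with a Strands agent behind an API Gateway proxy handler (the agent configuration is illustrative):

```python
import json
from strands import Agent

# Initialized once per execution environment so the agent is reused across invocations.
agent = Agent(system_prompt="You are a retail customer support assistant.")

def handler(event, context):
    # API Gateway proxy integration: the prompt arrives in the request body.
    body = json.loads(event.get("body") or "{}")
    result = agent(body.get("prompt", ""))
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"response": str(result)}),
    }
```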
And then last but not least, we'll cover a little bit of AgentCore in a moment. That is similar to your Fargate scenario where you're going to have your regular agent application with AgentCore. Then we're going to actually just use imports and dependencies, and then we're going to actually do a little bit of annotation, which again we're going to show you momentarily.
The beauty here is that these are all different options depending on what your organization necessarily requires. I've talked to some customers where some of them say, you know what, all of our artifacts are going to be container artifacts because we've built out our CI pipelines to do all of our testing, security scanning, all of the governance mechanisms based on a container-based artifact. If that's the case, then you may actually make certain choices about how you're going to be deploying your artifacts. Alternatively, if you love Lambda and you love that zip artifact deployment, that could be one approach that you're going to be using, and we're going to talk through some of these patterns, and we'll go through these one by one.
So of course the first one would be if you're using ECS Fargate. Again, you have a pipeline based on containers, and you're going to standardize on ECR, or Elastic Container Registry. So for your agent application, you're going to build that Dockerfile, bundle that as a container image locally, push that into your ECR repository, and then deploy that into Fargate with the ALB in front of it. And in this scenario, just like Dhiraj had shared before, this is that FastAPI and Strands application, or maybe that FastMCP application.
Next, of course then is, as this audience is probably familiar with, your Lambda function. You have both your zip and container deployment artifact, and this then allows you to get that API Gateway to Lambda deployment architecture. You can similarly do the same thing if you're doing MCP servers. So we spent a bunch of time talking about agentic frameworks, whether you're using Strands or otherwise. You could do similar. Again, Dhiraj showed an example of being able to do these MCP servers as well. That same architecture can actually apply in this serverless paradigm.
So again, whether you're doing it with API Gateway and Lambda, again you have those same artifacts, or you can do that into your Fargate. So the nice thing again here is that you have that local developer experience building out that framework again with your local MCP servers. You could then do things like MCP Inspector to connect to that MCP server locally and to do some amount of testing before you do that deployment.
Similarly, so Dhiraj shared before about that order and payments agent scenario. So if we're going to actually apply that scenario and do this now with API Gateway and Lambda with all of these serverless capabilities that we already love, in this scenario, the order agent acts as the client to the payments agent. So you can imagine having just a client that connects to the API Gateway endpoint for the payments agent, and that then acts as the A2A server. So across domains, across these bounded contexts, you have these API calls that are going from the A2A client in the order agent Lambda function to the API Gateway endpoint in the payments agent.
And again, as Dhiraj mentioned, if you want to make these internal tool calls, these can just be MCP calls from each of the respective Lambda functions. So you can imagine the order function: the code that Dhiraj was showing before sits in my Lambda function, has that MCP client, and makes those tool calls within the order domain, and similarly on the payments agent side as well. So the nice thing is that we're applying all of these serverless deployment capabilities and using those open standards that Dhiraj had already been talking about. So we talked about all of the things that we already love on serverless.
AgentCore: Fully Managed Serverless Compute for Agents
Another option that you have then is can we do this at a higher level of abstraction? Can we have some more fully managed services? And this is where AgentCore comes into play, and we'll do a quick primer here again. There are going to be a lot more sessions here at re:Invent where you can go a lot deeper.
There is a suite of services available to you to make the deployment of your agents easier. The first, of course, is AgentCore Runtime. AgentCore Runtime is really your serverless compute for running your agents, and this allows you to bring any open source framework, whether Strands or otherwise. It can interact with any model. Of course we here prefer you to do it on Bedrock, but you can choose any other provider for your models.
There's AgentCore Gateway, perhaps a little bit analogous to API Gateway, but AgentCore Gateway is built specifically as a tool aggregator. Similar to how you use API Gateway as the front end for all of your APIs, AgentCore Gateway becomes the front end for all of your tools, so it provides very similar capabilities around auth and a lot of that front-end capability. It also allows you to expose tools as both public tools and private tools. So a lot of those same capabilities that you expected out of API Gateway are available in AgentCore Gateway.
These are some of the managed tools. So again, Dhiraj showed like the calculator tool example, common use cases that we see for people who are building agentic applications and the type of tools that they need. Again, Dhiraj showed that browser one with the Denver pub breweries, but also there's code interpreter. We've seen a lot of customers who say we're going to use these LLMs to actually generate code on my behalf. However, given that this is generated code, it is untrusted code not written by your developers. You don't necessarily want to run this in your environment that has exposure to the rest of your production environment.
So we've seen a lot of our customers say we want to run untrusted code in an isolated environment that has no access to any of my other production data. Code interpreter is a managed capability that gives you isolation of running untrusted code. So those are just some of the managed tools that are available to you.
Then of course there's AgentCore Identity. One of the biggest things that Dhiraj and I hear as we talk to our customers about agents, and even MCP, is the security implications of building out these agents. How do we ensure that if my customer support representative is making this request to my agentic application, which then makes downstream tool calls, that agent is not elevating privileges and is not getting access to data that he or she should not have access to? And so AgentCore Identity is one of those mechanisms that simplifies the chaining of these calls, so that the original principal, the original user making these calls, has that identity and those permissions propagated down, even down to the tool calls.
And then of course we didn't, we're not going to cover this in much depth, but as you continue to build out these agentic applications, what we're starting to see is, you know, we talked a bunch about how do we provide context to the framework, how do we ensure that it has the appropriate level of data. Well, there's also that chat interaction that you have with your agentic application. So you can imagine I'm a customer support representative, you know, Heeki just gave this support ticket around, you know, this order that was double charged, did not ship all my items, and I need it by tomorrow.
You can imagine perhaps I had a prior support session where I was talking about an issue I had with payments, maybe there was an issue with my credit card. You can imagine some of that information may be relevant for this newer interaction (well, now it's the present one, but at the time it would have been a future interaction). So these are interactions we may want to save and store, where we can do things like short-term and long-term memory in order to enhance that personalization experience across these different end-user interactions.
And of course, as you can imagine with any of these capabilities, we want to make sure that observability is tied into this entire stack. And so this is where we standardize on OpenTelemetry format to ensure that your traces, metrics, and logs are all shipped in a way that can be ingested into your enterprise observability capabilities. So this is a quick overview of what AgentCore allows you to do.
What we're going to do is we're going to spend a little bit more time on AgentCore Runtime and Gateway as serverless options for running your agents and your MCP servers. So similar to the deployment patterns here, fortunately with AgentCore Runtime you have the same deployment capabilities. So again in your organization, if you've built a lot of your CI capabilities around deploying to Lambda functions as zip artifacts, you can leverage those same deployment pipelines in order to be able to deploy zip artifacts to AgentCore Runtime. Again, the same would apply if you're doing this based on containers.
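As a rough sketch of what that AgentCore Runtime packaging can look like, assuming the bedrock-agentcore Python SDK's BedrockAgentCoreApp wrapper (the import path and decorator name follow my understanding of that SDK's conventions and should be verified against its docs):

```python
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from strands import Agent

app = BedrockAgentCoreApp()
agent = Agent(system_prompt="You are a retail customer support assistant.")

@app.entrypoint
def invoke(payload):
    # AgentCore Runtime delivers the request payload; hand the prompt to the agent.
    prompt = payload.get("prompt", "")
    return {"result": str(agent(prompt))}

if __name__ == "__main__":
    app.run()  # local testing; AgentCore Runtime hosts the same entrypoint once deployed
```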
Similarly, if you're building out MCP servers, I know that there are a lot of folks out there either consuming existing public MCP servers or saying they have a bunch of internal data and want to be able to expose this data. Maybe you want to build out an agentic data platform that allows certain enterprise data to be exposed to these agents internally in order to be able to give good internal use cases. This could be something where you build out an MCP server on top of data. Again, we want to ensure that security is implemented in a way that protects that data so that there is not any privilege escalation or exposure of data to entities that should not have access to that data, and then you could again deploy these MCP servers into AgentCore Runtime.
Similarly, as we talked about with AgentCore Gateway, Gateway becomes that front end for all of your tool capabilities. So again, in your organization you may have Lambda functions already built with some amount of business logic, and you want to expose that business logic as a tool call to your agents. Well, you don't actually need to do anything with that Lambda function. You don't need to refactor it to make it a tool. You can put AgentCore Gateway in front of it and provide it with some additional metadata. So when you're putting AgentCore Gateway in front of a Lambda function, you provide it with that tool description that Dhiraj had shown before, and that will then get passed as part of that tool list to the agent. So now the agent has information about this particular Lambda function and whether or not it needs to be invoked.
Similarly, if you have OpenAPI schemas, lots of folks in this room, I imagine, are building out your microservices strategy. You've defined your APIs using OpenAPI schema. You could then import those OpenAPI schemas, and they can then be exposed as tools via AgentCore Gateway. So now we go back to that example where we were doing this before with API Gateway and Lambda, where maybe that Lambda function was my Agent-to-Agent client and maybe API Gateway was exposing that Agent-to-Agent server. Well, in this scenario now we can replace some of those serverless capabilities with AgentCore Runtime and AgentCore Gateway. So we're trying to make this simpler, and the nice thing here then is with both the Agent-to-Agent calls across the agents but also the MCP calls within the agent, we actually do a lot of the security capabilities with AgentCore Identity.
I spent a lot of time talking to customers around how do we do OAuth 2.0 with my MCP tool calls and how do I scope that down to make sure that the end user request with a certain set of scopes can only make these tool calls. Similarly with Agent-to-Agent calls, when we're making a call out to a different agent, we want to ensure that we're using the right identity provider and making sure those OAuth 2.0 flows are implemented throughout that process. So you know, with AgentCore, what we're trying to do with this service is to make a lot of these capabilities more fully managed. Again, in this audience, you guys love serverless. We're trying to give you that serverless experience as you're building out your agents and your MCP capabilities.
Multi-Agent System Architectures: Orchestration and Async Patterns
So we covered a bunch of these practical deployment patterns. Now we want to think through again, I had that earlier slide of how do we start to decompose perhaps that monolithic agent and go out to these multi-agent systems. Dhiraj talked about using these open protocols in order to be able to have the different communication patterns, and then Dhiraj also talked about some of these event-driven architecture patterns. So let's think through that a little bit more as we get close to wrapping up. We have these decomposed agents: orders, inventory, payment, support agents. We've talked about different deployment patterns, so maybe some of these, maybe one line of business likes using Application Load Balancer and Fargate. Another line of business likes API Gateway and Lambda, and maybe another group is like, hey, we're going to kind of cut our teeth on AgentCore. And so again, you can make your choices about where you want to deploy these.
We also talked about this orchestration, right, and those three common problems. So the first might be that double charge. Again, perhaps you could have solved this just by doing better unit and integration testing and rooting out the root cause in those serverless APIs, but you can also augment that and create an agent that is able to do some of that root cause analysis on your behalf. Similarly, for the order that included two items and only shipped with one, maybe you're going to make a call out to the Orders Agent and try to make a determination as to why it didn't ship. It may also talk to the Inventory Agent. Maybe it'll find that there was some race condition where the front end showed that we had one item left, but in reality that item had already shipped and we were actually out of inventory. So this type of orchestration can actually be done entirely by this Orchestrator Agent.
The orchestrator agent can actually handle all of this on your behalf. Now, again, we talked about those challenges of your customer support agent who wants to get faster response time and wants to be able to resolve issues in a quicker way without having to do a lot of manual work across a lot of different dashboards. Well, this orchestrator agent can actually do a lot of this on your behalf and will actually simplify that and get you that better customer experience.
As Dhiraj talked about, maybe instead of an interactive customer support session, you could be in a scenario where maybe this is actually a retail planning exercise where you're doing quarterly purchasing decisioning where you actually want to make a determination about how much inventory we need to buy across different categories. We just went through Black Friday, but now we're going into the Christmas shopping period. What type of inventory do we need? So this could be something now where you build out this planning agent. Maybe this is actually an async agent going and working in the background and saying we're going to go and collect data from a variety of different data sources.
We can do this using some of the async patterns Dhiraj talked about earlier, where we fire off several tasks in parallel. Maybe we do research in our inventory database, analyze the history of order patterns over the last 36 or 48 months, and then collate that with other orders information. From there we can start to build out reports that your inventory planning or retail planning team can then leverage.
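One way to express that fan-out with the event-driven patterns mentioned earlier is to publish one event per research task to EventBridge and let downstream agents pick them up asynchronously. This is only a sketch; the bus name, source, and detail-type values are placeholders rather than a prescribed schema.

```python
# Illustrative fan-out: publish one event per research task to EventBridge,
# letting separate agents consume them asynchronously via rules/targets.
import json
import boto3

events = boto3.client("events")

research_tasks = [
    {"task": "inventory-history", "window_months": 36},
    {"task": "order-patterns", "window_months": 48},
    {"task": "category-demand", "window_months": 12},
]

events.put_events(
    Entries=[
        {
            "EventBusName": "retail-planning-bus",   # hypothetical bus name
            "Source": "planning.agent",
            "DetailType": "ResearchTaskRequested",
            "Detail": json.dumps(task),
        }
        for task in research_tasks
    ]
)
```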
The other benefit, as Dhiraj talked about, is that these agents can have different scaling requirements. Maybe the inventory agent is far busier and needs to scale out more rapidly, whereas maybe you've built your system so well that the number of support requests is actually going down, so the support agent doesn't need to scale as quickly. Just as in the microservices world, this lets you scale each resource based on its own requirements.
Comparing Agent Architecture Patterns: Trade-offs and Use Cases
Okay, so let's talk a little bit about the pros and cons of the different architectural patterns we've been discussing. If we start with the single agent architecture, then as with monoliths, it's simply easier to develop and easier to debug. You get clearer reasoning traces and possibly, I say possibly, fewer compute resources, the idea being that because you don't need to make network calls or spend time waiting on them, there can be real benefits. It's just simpler to build out these capabilities.
However, for those who are already experimenting, you're probably aware that when you're interacting with LLMs there is this notion of a context window. The data we've been collecting and enhancing the prompt with is a finite amount of data we can provide to the LLM. There are models at this point that can handle a million-token context window, but you could in theory exceed even that, and then you'd have to do some work to compress whatever information you want to provide as context. The other challenge is that if you include too much context that isn't relevant to a particular request, the LLM can take longer to respond and cost more than it needs to. This is where you may want to start thinking about decomposing into finer-grained, smaller agents that may be faster and perhaps cheaper to run with these models.
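To make that concrete, here's a rough, illustrative sketch of keeping retrieved context under a token budget before handing it to the model. The four-characters-per-token estimate is a crude heuristic rather than a real tokenizer, and the function names are placeholders.

```python
# Rough sketch: keep only the most relevant context under a token budget
# before sending it to the model. The 4-characters-per-token estimate is a
# crude heuristic, not an exact tokenizer.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fit_context(snippets: list[str], budget_tokens: int) -> list[str]:
    """Assumes snippets are already sorted by relevance, most relevant first."""
    kept, used = [], 0
    for snippet in snippets:
        cost = estimate_tokens(snippet)
        if used + cost > budget_tokens:
            break
        kept.append(snippet)
        used += cost
    return kept
```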
So there are a couple of multi-agent patterns that we're seeing some of our customers build out. The first is the serialized pipeline. This could be something like content generation: you do some amount of research, then build out a draft, then a final production copy, and maybe run some feedback loops. This is a scenario that is simple to model. I realize I'm running out of time here, so I'm going to speed along. But there are downsides, similar to a string of Christmas lights: if there is a problem at one stage, everything can bottleneck at that point.
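A serialized pipeline like that content-generation example might look something like the following sketch, again using the Strands Agents SDK with illustrative prompts and default model settings; the stage boundaries are assumptions, not a prescribed design.

```python
# Minimal serialized pipeline: each stage's output feeds the next stage.
# Assumes the Strands Agents SDK with default model settings; prompts and
# stage boundaries are illustrative only.
from strands import Agent

researcher = Agent(system_prompt="Gather key facts on the given topic.")
drafter = Agent(system_prompt="Turn research notes into a first draft.")
editor = Agent(system_prompt="Polish the draft into final production copy.")

def content_pipeline(topic: str) -> str:
    research = str(researcher(f"Research this topic: {topic}"))
    draft = str(drafter(f"Write a draft from these notes:\n{research}"))
    final = str(editor(f"Edit this draft for publication:\n{draft}"))
    return final
```

The simplicity is the appeal, and the downside mentioned above falls straight out of the shape: if the drafter stalls or produces a bad intermediate result, everything behind it waits.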
The next is the orchestrator pattern, which we've already talked about at some length. It gives you decomposition and lets you orchestrate these interactions, but there is some communication overhead, as we just mentioned. The last pattern is the swarm architecture. The idea here is that you may want to do broad research where there isn't a finite path to the end result, so you send out a number of agents to do that research in parallel.
It's like the scatter-gather pattern: you go out and do the research, bring it back, and try to synthesize what those agents have produced. The challenge is that in this type of swarm architecture it is a lot more difficult to reason about the behavior, and it is somewhat harder to debug. So make sure your use case aligns with whichever pattern makes the most sense. Again, I realize I'm coming up on time, so I'll just skip to the last one.
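A minimal scatter-gather sketch might fan several researcher agents out in parallel and then hand their findings to a synthesizer. This uses plain threads around synchronous Strands agent calls purely for illustration; the questions, prompts, and thread-based approach are assumptions, not the presenters' design.

```python
# Scatter-gather sketch: fan researcher agents out in parallel, then
# synthesize their findings. Questions and prompts are illustrative.
from concurrent.futures import ThreadPoolExecutor
from strands import Agent

questions = [
    "What drove returns last quarter?",
    "Which categories sold out fastest on Black Friday?",
    "How did shipping delays correlate with inventory levels?",
]

researchers = [Agent(system_prompt="Answer concisely with evidence.") for _ in questions]
synthesizer = Agent(system_prompt="Combine the findings into one summary.")

with ThreadPoolExecutor(max_workers=len(questions)) as pool:
    findings = list(
        pool.map(lambda pair: str(pair[0](pair[1])), zip(researchers, questions))
    )

summary = synthesizer("Synthesize these findings:\n" + "\n---\n".join(findings))
```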
Security, Governance, and Key Takeaways
We talked a little bit about security, and this is one where you want to build out your agents in a way that propagates security context appropriately. You also want separate steps along the way: the end user calls into the agent, the agent calls into the MCP server, and the MCP server makes the downstream API calls, and each of those hops should carry its own set of tokens. You don't want to use the same token throughout. And with governance, you want to apply the same governance policies and mechanisms to what you're deploying into production that you've already been using.
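One common way to get a distinct, narrower token at each hop is OAuth 2.0 token exchange (RFC 8693): each component swaps the token it received for a new one scoped to the next hop. The endpoint URL, client credentials, and audience values below are placeholders for whatever your identity provider actually uses; this is a sketch of the flow, not a specific product's API.

```python
# Sketch of OAuth 2.0 token exchange (RFC 8693): swap the inbound token for
# a new, narrower token before calling the next hop, so each hop carries its
# own credential. Endpoint, client credentials, and audiences are placeholders.
import requests

TOKEN_ENDPOINT = "https://idp.example.com/oauth2/token"  # hypothetical IdP

def exchange_token(inbound_token: str, audience: str, scope: str) -> str:
    resp = requests.post(
        TOKEN_ENDPOINT,
        data={
            "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
            "subject_token": inbound_token,
            "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
            "audience": audience,
            "scope": scope,
        },
        auth=("agent-client-id", "agent-client-secret"),  # hypothetical client
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

# e.g. the agent exchanges the user's token for one scoped to the MCP server,
# and the MCP server exchanges again for one scoped to the downstream API.
```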
All right, just a couple of takeaways as we wrap up. A lot of what you've already been doing with serverless APIs in production continues to apply as you bring agents into production. You want to use agents in a way that bridges data and gleans insight out of that information; don't use them as a low-level shim across your data, use them for higher-order insight gathering. And serverless, again, can be a way to accelerate your path to production.
And of course, as we talked about, a lot of these patterns allow you to scale your agents as you build out your particular use cases. There are a number of key considerations, and I think we're out of time at this point, but make sure you're documenting along the way: build and be scrappy, then document and provide feedback so that your internal platform and developer teams can learn from what you're learning within your organization.
We do have a number of resources. Again, you can take a quick picture of that one and find that afterwards. There are a number of other sessions that we want to encourage you to participate in, and so these are some of the other ones that we recommend you go and check out. And of course there are other serverless resources. This deck will be provided afterwards as well, so I recognize there are a bunch of people trying to take pictures.
And again, Dhiraj and I do want to thank you. You can find our socials here so you can connect with us afterwards. Thank you everyone, and please do fill out the survey. We appreciate it.
; This article is entirely auto-generated using Amazon Bedrock.