🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Agentic AI for member-owned financials: Systems that serve (AIM337)
In this video, AWS Solutions Architects demonstrate an agentic AI loan origination system built for member-owned financials using Amazon Bedrock AgentCore and the Strands Agents framework. The solution employs a multi-agent architecture with the A2A (agent-to-agent) protocol, featuring a Supervisor Agent that orchestrates specialized agents for document validation, credit risk assessment, and compliance checking. Key components include AgentCore Runtime for serverless compute, AgentCore Memory for multi-turn conversations, AgentCore Gateway as a managed MCP (Model Context Protocol) server, and Code Interpreter for dynamic code execution. The demo shows automated document processing using Bedrock Data Automation, credit risk evaluation with ML models, and compliance validation with debt-to-income calculations. The system generates comprehensive PDF reports and uses OAuth 2.0 for secure agent communication. Built with the Kiro IDE and spec-driven development, the solution emphasizes observability through OpenTelemetry integration with CloudWatch for complete audit trails and regulatory compliance.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: Agentic AI for Member-Owned Financials and the Loan Origination Demo
Hello. Good morning, everyone. I hope everybody's doing well. Thank you for joining our code talk today. Today, let's explore how member-owned financials can use agentic AI to improve their member experience. I'm Murali Ramanathan, Senior Solutions Architect at AWS. I'm Mrudhula Balasubramanyan, Principal Solutions Architect. We work at AWS with nonprofits and financial services organizations. So how many of you here are from member-owned financials? Awesome. Is it credit unions, insurance? So how many credit unions? Just one. How about insurance? Awesome. You guys outnumber any other member-owned organization. Which one is that? Oh nice. Okay, cool. Welcome all. You're definitely hardcore if you're showing up for the very first session of re:Invent. Hopefully we can get you off to a great start and provide a lot of value for your time.
In today's session, let's explore the art of the possible: we'll show how agentic AI can power a loan origination system. This will enable member-owned financials to process loan applications much more quickly, thereby increasing member satisfaction and experience. The solution we're going to show today is built on Amazon Bedrock AgentCore, and Murali is going to walk through its features. Awesome. So how many of you here have heard of Amazon Bedrock AgentCore? Awesome. And have you actually tried working with it? Okay, not yet. Okay, so you're in a good place. The idea for this code talk is to demonstrate a completely working end-to-end solution built using AWS Strands Agents. Anybody heard of Strands? Awesome. Anybody using other agentic frameworks? You can just call them out. AutoGen, okay. Anybody using LangGraph or CrewAI? Okay, yes, there you go.
So no matter which agentic framework you work with, and we're very aware there are many of them and probably more coming, AgentCore, which was launched last re:Invent and became generally available mid-October, is essentially a comprehensive agentic platform. The solution that we're going to show you today is an agentic loan origination system that many of you in credit unions and banks can relate to. It's an enterprise-grade, production-ready solution that was built using Strands agents and then deployed to AgentCore. Before we get into the architecture, we want to give you the basics. What is AgentCore so that when we go into the core and show you how it works, you'll understand exactly what it does.
Amazon Bedrock AgentCore: A Comprehensive Agentic Platform with Runtime, Gateway, Memory, and Observability
Bedrock AgentCore is a comprehensive agentic platform where you can essentially deploy and manage your agents. It's agnostic to the models you use for building your agents and agnostic to the frameworks you use for building those agents as well. It has a few key components which I'll go through quickly. There's AgentCore Runtime, which is essentially the serverless component. It's the compute where you deploy your agents as container applications. Then you have AgentCore Memory. Especially with agents, you might start off with a single agent and grow into multiple agents with multi-agent collaborations. The idea is that you have a managed memory component that you can associate with your agent. This can be short-term memory that enables multi-turn conversations, or, if you're working with multiple agents, longer-term memory that spans multiple sessions across multiple agents. So AgentCore Memory is essentially that capability of bringing both short-term and long-term memory to your agent applications.
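To make the memory piece concrete, here is a minimal sketch of recording and retrieving conversation events with the MemoryClient from the bedrock-agentcore Python SDK. This is not the speakers' code; the method names follow the SDK as we understand it, and the memory name, actor, and session IDs are illustrative.

```python
from bedrock_agentcore.memory import MemoryClient

client = MemoryClient(region_name="us-east-1")

# One-time setup: create a memory store for the loan-processing agents.
# Long-term strategies (summaries, semantic facts) can be added via `strategies`.
memory = client.create_memory_and_wait(name="LoanProcessingMemory", strategies=[])

# Short-term memory: record one turn of the conversation as an event.
client.create_event(
    memory_id=memory["id"],
    actor_id="applicant-123",
    session_id="session-456",
    messages=[("I'd like to apply for a home improvement loan", "USER")],
)

# Retrieve recent turns to give the agent multi-turn context.
recent_turns = client.list_events(
    memory_id=memory["id"],
    actor_id="applicant-123",
    session_id="session-456",
    max_results=10,
)
print(recent_turns)
```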
Another piece we're going to talk about, which we used in our solution, is AgentCore Gateway. How many of you have heard of MCP? Does that ring a bell? You want to call out what it stands for? Model Context Protocol, exactly. AgentCore supports these open source protocols, specifically the Model Context Protocol, or MCP, and that's really where AgentCore Gateway shines. It's essentially your managed MCP server that you can use as a unified interface for your agents to communicate with tools that are set up as targets behind that gateway.
You might have existing APIs in your organization or other applications that you could make API calls to using serverless compute like Lambda. So essentially, you can have a Lambda function running as an MCP tool behind the gateway. It supports open APIs or Smithy models on the backend, so all of those existing resources can essentially be MCP-enabled behind that gateway. That's really what it is—that MCP interface that we're going to be talking about as well.
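As a rough sketch of what "MCP-enabled behind the gateway" looks like from the agent side, a Strands agent can attach to an AgentCore Gateway as a streamable-HTTP MCP endpoint. This is an illustration rather than the session's actual code; the gateway URL and bearer token below are placeholders.

```python
from mcp.client.streamable_http import streamablehttp_client
from strands import Agent
from strands.tools.mcp import MCPClient

GATEWAY_URL = "https://YOUR-GATEWAY-ID.gateway.bedrock-agentcore.us-east-1.amazonaws.com/mcp"
ACCESS_TOKEN = "..."  # OAuth 2.0 bearer token from your identity provider

# The gateway is a managed MCP server; the agent attaches as an MCP client.
gateway = MCPClient(lambda: streamablehttp_client(
    GATEWAY_URL, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"}
))

with gateway:
    # Every Lambda, OpenAPI, or Smithy target behind the gateway appears as an MCP tool.
    tools = gateway.list_tools_sync()
    agent = Agent(tools=tools)
    agent("Extract the applicant's monthly income from the uploaded pay stub.")
```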
We've covered AgentCore Runtime, which is the compute, the Gateway, which is that managed MCP server, and then the memory. There's also the identity piece. Any enterprise-grade application definitely requires end-to-end authentication and authorization. AgentCore Identity is the managed identity service that centrally manages inbound and outbound operations, as your users and applications access an agent and the agent makes calls to all those tools. We also have A2A, agent-to-agent communication, which is what we're going to be using in the multi-agent collaboration application we're going to show you.
That inbound and outbound operation is centrally managed by AgentCore Identity, which is seamlessly integrated with the compute, the runtime. All of those pieces coming together help you build a scalable, secure, enterprise-grade agentic application. You're also seeing AgentCore Browser and AgentCore Code Interpreter. These are tools that AgentCore offers. AgentCore Browser gives agents the ability to browse websites on your behalf, get the results, and process them.
AgentCore Code Interpreter gives agents the ability to run complex code at runtime. What that means is you don't need to write code that an agent calls and then maintain that code. You can just define what code you want the agent to create and run, and the agent will do that. We'll see this in the example now.
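For a feel of the Code Interpreter tool, here is a hedged sketch using the code_session helper from the bedrock-agentcore SDK to execute a small snippet in a managed sandbox. The region and the exact response shape are assumptions based on the SDK samples.

```python
from bedrock_agentcore.tools.code_interpreter_client import code_session

# Start a sandboxed code interpreter session and run agent-style generated code.
with code_session("us-east-1") as sandbox:
    response = sandbox.invoke("executeCode", {
        "language": "python",
        "code": (
            "monthly_debt, monthly_income = 1800, 6000\n"
            "print(f'DTI: {monthly_debt / monthly_income:.2%}')"
        ),
    })
    # Results stream back as events; print whatever the sandbox produced.
    for event in response["stream"]:
        print(event["result"])
```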
One other thing I want to add quickly is that typically with agents and agentic applications, you tend to think of them as black boxes. Visibility into what those agents are doing, that observability aspect, is really important. The three pillars of observability are logs, metrics, and traces, and all of those are super important so that you can trust the system you built. That's where AgentCore Observability comes in.
It supports OpenTelemetry, and it's integrated with AWS CloudWatch, if you're familiar with that. You can essentially see the session level, the trace level, as well as the spans, the granular pieces of your entire trace end to end. The request and response flows of your agentic systems can all be seen and dashboarded using the OpenTelemetry logs and metrics that come out of AgentCore Observability. I want to point that out because that's a big piece of putting these agentic systems into production.
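As an illustration only (this was not shown in the session), wiring up the OpenTelemetry plumbing for a Python agent typically comes down to the AWS OTel distro plus a few environment variables. The names below follow aws-opentelemetry-distro conventions and should be checked against the AgentCore Observability documentation.

```python
import os

# Auto-instrumentation is typically launched as:
#   opentelemetry-instrument python supervisor_agent.py
# with the AWS distro routing traces, metrics, and logs to CloudWatch.
os.environ.setdefault("OTEL_PYTHON_DISTRO", "aws_distro")
os.environ.setdefault("OTEL_PYTHON_CONFIGURATOR", "aws_configurator")
os.environ.setdefault("OTEL_EXPORTER_OTLP_PROTOCOL", "http/protobuf")
# Name the service so sessions, traces, and spans group sensibly in dashboards.
os.environ.setdefault("OTEL_RESOURCE_ATTRIBUTES", "service.name=loan-supervisor-agent")
```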
Architecture Overview: Multi-Agent Loan Processing with Supervisor, Document, Credit Risk, and Compliance Agents
Yes, it's a first-party capability. Now, let me walk you through the architecture of the solution we built; then we'll show you a demo, and then we'll walk you through the code of the agents.
Here you're seeing the architecture of the loan origination system that we built using AgentCore. Let me walk you through it from left to right. On the left, you see the document upload, where an applicant uploads all of the documents needed for processing the loan. As soon as those documents are uploaded, they land in an S3 bucket, a Lambda event fires, and the Lambda calls what we call the supervisor agent.
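A minimal sketch of that trigger Lambda might look like the following. The invoke_agent_runtime call is from the bedrock-agentcore boto3 client; the runtime ARN and payload shape are illustrative, not taken from the demo.

```python
import json
import boto3

agentcore = boto3.client("bedrock-agentcore")

def handler(event, context):
    # S3 put-event for an uploaded loan document.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Kick off the supervisor agent running on AgentCore Runtime.
    agentcore.invoke_agent_runtime(
        agentRuntimeArn="arn:aws:bedrock-agentcore:us-east-1:123456789012:runtime/supervisor",
        payload=json.dumps({"prompt": f"Process the loan documents at s3://{bucket}/{key}"}),
    )
    return {"statusCode": 200}
```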
The supervisor agent is given the role: you are a supervisor agent who will be processing loan applications. One of its tasks is to find out what other agents are available to work with it on the loan application workflow, and then work with them to get the whole task done. The supervisor agent, using the A2A protocol, reads the agent cards of all of the available agents and finds out that there are three agents it can work with: the document agent, the credit risk agent, and the compliance agent.
Once it decides which agents it can work with, it creates a workflow and calls the document agent with the location of the documents. The document agent, whose role is to validate the documents and extract information from them, knows that it can validate the information but does not have data extraction capability. So, using the A2A protocol, it reads the agent cards of all the other agents, finds the agent with document extraction capability, which is the Bedrock Data Automation agent here, and passes that information to it.
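Conceptually, the discovery step boils down to fetching each agent's card and matching its advertised skills. Here is a hedged sketch using the a2a-sdk's A2ACardResolver; the endpoints are placeholders and this is not the speakers' implementation.

```python
import asyncio
import httpx
from a2a.client import A2ACardResolver

# Endpoints of the specialized agents; in practice these come from configuration.
AGENT_ENDPOINTS = [
    "http://document-agent:9000",
    "http://credit-risk-agent:9001",
    "http://compliance-agent:9002",
]

async def discover_agents() -> None:
    async with httpx.AsyncClient() as http:
        for base_url in AGENT_ENDPOINTS:
            # Each A2A server publishes an agent card describing its skills.
            card = await A2ACardResolver(http, base_url).get_agent_card()
            # The supervisor matches card text against its SOP, e.g. looking
            # for words like "document", "validation", "compliance".
            print(card.name, "->", card.description)

asyncio.run(discover_agents())
```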
That agent has been given the role: you are only to extract information from documents using whatever tools are available to you. The document extraction agent then uses the AgentCore Gateway and does a semantic search to see what tools it can use to extract data from the documents. It sees that it has access to Bedrock Data Automation, an AWS service that intelligently extracts data from documents and images using AI.
Here we are showing the art of the possible, where the Bedrock Data Automation agent accesses BDA, or Bedrock Data Automation, through the AgentCore Gateway, which plays the role of the MCP server you would have. In a real-world scenario, you may not even use Bedrock Data Automation. You may have your own data extraction software running; you would register it behind the AgentCore Gateway as an MCP target, and the agent will automatically connect to it for data extraction.
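For reference, a tool behind the gateway could drive the extraction with the Bedrock Data Automation runtime API along these lines. All ARNs and bucket names are placeholders, and the parameter shapes should be verified against the boto3 documentation.

```python
import boto3

bda = boto3.client("bedrock-data-automation-runtime")

# Kick off asynchronous extraction of one uploaded document.
job = bda.invoke_data_automation_async(
    inputConfiguration={"s3Uri": "s3://loan-docs-bucket/applicant-123/pay-stub.pdf"},
    outputConfiguration={"s3Uri": "s3://loan-extracted-bucket/applicant-123/"},
    dataAutomationConfiguration={
        "dataAutomationProjectArn": "arn:aws:bedrock:us-east-1:123456789012:"
                                    "data-automation-project/loan-documents",
    },
    dataAutomationProfileArn="arn:aws:bedrock:us-east-1:123456789012:"
                             "data-automation-profile/us.data-automation-v1",
)

# Poll until the job completes; structured fields land in the output S3 prefix.
status = bda.get_data_automation_status(invocationArn=job["invocationArn"])
print(status["status"])
```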
Once the data is extracted, it goes back to the document agent, and the document agent performs all of the validation to ensure that all of the data needed for processing the loan is there. It then reports back to the supervisor agent, along with the data, saying whether the data validation passed or failed. The supervisor agent, as part of the workflow it built, then decides: if the data validation passed, it sends the application further down to the credit risk agent and the compliance agent for further processing.
But if the data is not there, using the code interpreter tool, it will send an email back to the applicant saying there is information missing from your loan application. Please provide all of this information for processing the loan application further. Once all of the information is validated, then the supervisor agent, as part of its workflow, using the A2A protocol, gives the information to the credit risk agent and the compliance agent for credit risk processing and compliance checking as well.
Now, the credit risk agent, using the code interpreter as a tool, calls the credit risk machine learning model, which does the actual credit risk assessment. In this demonstration, the ML model is stored in an S3 bucket; the agent builds code on the fly using the code interpreter to download the model and then interacts with it to perform the credit risk assessment.
In a real-world scenario, you will be using the code interpreter to actually call a SageMaker endpoint or wherever your ML model is deployed and interact with it to perform the credit risk assessment. In the same way, the compliance agent uses the code interpreter tool to build code on the fly to perform the compliance checks and then passes all of the information back to the supervisor agent. The supervisor agent then gets all of this information and uses its decision logic, which we have provided in the prompt as a standard operating procedure for the supervisor agent, and then decides to either approve the loan, put it under manual review, or disapprove the loan.
Once it makes a decision, the supervisor agent uses the code interpreter again, along with Amazon Simple Email Service (SES), to send an email to the applicant informing them of the status. It also creates a PDF document of the loan application with all of the details and uploads it to an S3 bucket so that the underwriter can review it and perform the final decisioning. This way, the solution makes the underwriter's job very easy: the underwriter has all the right information, can quickly assess it, and can decide whether to approve or disapprove the loan based on the criteria.
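The notification and report-delivery step is plain AWS plumbing. Here is a sketch of what the generated code might amount to, with illustrative bucket names, file names, and addresses:

```python
import boto3

s3 = boto3.client("s3")
ses = boto3.client("ses")

# Upload the generated report where the underwriter can review it.
s3.upload_file("loan_report.pdf", "underwriter-reports-bucket",
               "applicant-123/loan_report.pdf")

# Notify the applicant of the decision.
ses.send_email(
    Source="loans@example-creditunion.org",
    Destination={"ToAddresses": ["applicant@example.com"]},
    Message={
        "Subject": {"Data": "Your loan application decision"},
        "Body": {"Text": {"Data": (
            "Your application has been approved. "
            "An underwriter will review the final report."
        )}},
    },
)
```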
Live Demonstration: Watching the Agents Process a Loan Application in Real-Time
Now let's go ahead and see a demonstration of this solution, and then we'll walk you through the code and how we built it. So when you have the credit risk ML model, is that something you can also implement with MCP? What about the feature engineering that has to be done, and how do the agents decide what features to extract? In this case, we're using the model simply for inference purposes. We're not actually training or doing feature engineering to build the model. It's an already built model that's been deployed.
That's a good question. Basically, you can have your own model behind an MCP server, and in the compliance or credit risk agent's prompt, you would define as a standard operating procedure what features you need to pass in and what output you expect back. Using that, it will build the code and then interact with the model. Prior to the credit risk evaluation, or creditworthiness check, there is another agent in the workflow, which is the data extraction agent. You'll see it in the demo. If you're ready, you can just go ahead with that, and I'll explain more as we go.
I'm now playing the role of an applicant where I'll be uploading the documents. Because this is a demonstration, we are not going through the process of using a chat window and uploading all the information as that would take time. I'm uploading the loan application, pay stub, bank statement, employment verification, tax returns, and all of these are demo data. I have now uploaded my loan application as an applicant. Now let's see what happens in the back end.
As explained, using AgentCore Observability, we can see from the logs how the agents are processing the information. We see that the supervisor agent has discovered all of those specialized agents and is starting the workflow by sending the document to the document agent for verification. The document agent knows that it has to verify and extract the data. Now it is searching for which agents can do the document extraction. It found the right agent, the document extraction agent, handed it the information, and the document extraction agent is now using a tool to communicate with Bedrock Data Automation to extract the data. It'll take a minute or so for the extraction to come back.
I want to add that all of this is being done using the A2A protocol, so that's the design pattern that we chose for this particular solution because it gives us the most flexibility in terms of demonstrating at an enterprise grade.
The document extraction agent came back with a response, and now the document agent takes that response and performs the validation.
I was saying that A2A is really good when you have a handful of agents using a few tools. But as you start adopting and building more agents, you soon might have a lot of different agents using a lot of different tools, and it really becomes a huge matrix. A2A helps because it does the discovery work for you, finding the agents that are already in your enterprise using an open protocol for agent-to-agent communication, which is the one we chose.
Now the document agent has finished validation and is passing all of this information back to the supervisor agent. The supervisor agent has seen all of that information and is now passing it to the credit risk agent and the compliance agent in parallel. As you can see from the code here, we are not telling any agent, the supervisor agent or the document agent, which agent it should use. Each is able to decide on its own, using the workflow, which agent to use for the processing.
Now you'll see here that the compliance agent is done and the credit risk agent is processing using the code interpreter. You'll see that it downloads the model from S3, then interacts with the model to perform the credit risk assessment. All of this is code built on the fly and then executed. Once the data has been extracted from the forms, the agent has the information it needs to send in the request to the model. Those are the features, and they are all defined in the prompt.
Here, the credit risk agent has responded that the ML model predicts a 98.33% probability of successful loan repayment. Now the supervisor agent has all of this information and is processing its decision logic to decide whether to approve the loan or not. Do you have any questions?
I know it may be a lot to take in, but if you have any questions, please let us know. Do you have thresholds set on BDA where you're asserting a minimum confidence score? You certainly can do that. I think we're using some default thresholds on the back end. We keep calling it BDA, but it's actually Amazon Bedrock Data Automation, a feature of Bedrock. It's a capability where you can essentially extract data using your custom blueprints.
In this case, as Murali was showing, we have five different documents that are pertinent to a loan application. But with Bedrock Data Automation you can work with multimodal data, so that can be text, images, video, or what have you. You can specify blueprints, which are essentially the schema or the overlay on your data, so that you extract only the specific fields you care about. And because it's AI powered, it comes back with explainability in terms of the probability, essentially the accuracy, of extracting that information. You can then use your business rules to decide what the threshold should be. That's also a good point: the threshold is the criterion you would set to incorporate a human in the loop if a result falls below it. And typically, for something like credit decisioning, you do want a human in the loop, especially if the outcome is a denial rather than an approval.
You want to make sure your loan officer takes a good look at it and examines the probabilities of the information that was extracted. Use that as the basis to see why it was approved or denied. Here you can see that the Supervisor Agent has completed the workflow and has given a final decision of approve. It's showing all of the workflow summary. All of these details you are seeing are on the Observability, which is on the CloudWatch logs. Anytime you want to check and trace back what happened for a particular application, what steps the agents took, and how they arrived at the decision, all of that information is there for audit purposes. You can ensure that the right decisions are being made all the time.
Now let's look at the PDF it has generated. This is the PDF it generated. We will walk through the code now. In the prompt of the Supervisor Agent, we simply said create a PDF with all of these details, and it automatically created all of these. It gives all of the information for the underwriter, such as what is the credit risk assessment, what's the compliance validation, and what are the next steps. All of those things will be there in the PDF. The underwriter can use this to make their decisions very quickly.
Security and Data Accuracy: Addressing Prompt Injection, OAuth Authentication, and Audit Trail Concerns
Do you have any questions on the solution? How do you prevent prompt injection for the uploaded documents? How do you ensure that those documents don't have any sort of injective prompts in them to attack your model? That's a good question. All of these agents use the A2A protocol, and all of them use OAuth 2.0 authentication. Within OAuth 2.0 authentication, when these agents want to communicate with each other, only if they have the right token would they be able to communicate and access those functionalities. If some other external agent wants to interact with these agents to do a prompt injection, that would not be possible.
I'll add to that because that's a very important question. We haven't used that in this specific solution per se, but this is what you can do to make sure you don't have issues with prompt injection or other generative AI risks. Strands Agents, the framework we're using here, supports guardrails. If you're familiar with Amazon Bedrock Guardrails, it's essentially a capability where you can specify automated reasoning checks, contextual grounding checks, denied topics related to the questions being asked, and content moderation, among other things. All of those filters can be associated with the guardrail, and that guardrail itself can be associated with the model that is used in the definition of your agent. As we go into the code, I can show you the exact lines of code where you can associate a guardrailed model in the agent itself. That is supported.
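Those "exact lines of code" look roughly like the following sketch: a BedrockModel configured with a guardrail and handed to the agent. The guardrail ID, version, and model ID are placeholders, not values from the session.

```python
from strands import Agent
from strands.models import BedrockModel

# Attach a Bedrock guardrail to the model that backs the agent.
guarded_model = BedrockModel(
    model_id="anthropic.claude-sonnet-4-5-20250929-v1:0",  # placeholder model ID
    guardrail_id="gr-EXAMPLE123",
    guardrail_version="1",
    guardrail_trace="enabled",  # surface which filters fired, for auditability
)

compliance_agent = Agent(
    model=guarded_model,
    system_prompt="You are a compliance agent for loan applications...",
)
```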
There's a lot of numbers here that need to be very accurate. How are you passing them from the extraction to all the different agents across several different calls? How are you ensuring that the data that's extracted is what the models see in each of those individual calls? Is it stored in memory somewhere, or how are you actually making sure those numbers are accurate by the time they get to the Supervisor Agent? That's a great question, and I think it also leads into the auditability of the solution. When we talked about Bedrock Data Automation, the BDA tool, that's the workhorse that actually looks at the documents, extracts the information, and gives you the probabilities. All of that extracted data gets persisted in an object store, which is S3. Long after the process is done, that data is still there to be reviewed anytime by your regulators, your compliance officers, or anybody else. That becomes the source of truth; the agents are simply doing the orchestration back and forth. They encapsulate your business logic, but the data itself persists in that object store.
Building with Quito: Spec-Driven Development from Requirements to Infrastructure Setup
Now let's look at how we built this solution. One of the first things I want to mention is that we built all of it using Kiro, our agentic IDE. You can think of Kiro as one of our team members; it helped us build out this whole solution very quickly.
How many of you have played with Kiro before? Have you even heard of it? This is our latest agentic coding tool. It's an agentic AI IDE, essentially standalone, built from the ground up. It's not a plug-in into an existing IDE like Q Developer, which you might have heard of. This is a standalone IDE, and it's really good at spec-driven development. We used it for developing this solution.
I want to show you the spec-driven development pretty briefly. Unlike vibe coding. Any vibe coders here? Everybody? No vibe coders? I thought if you didn't raise a hand, that means everybody's a vibe coder. No worries. Kiro supports vibe coding, but we're not just looking at POCs. We're looking at how you build an enterprise-grade application, so we're following a very rigorous process of spec-driven development.
What I'm showing you here are the assets of that process—essentially your requirements, your design, and your tasks so you can implement the solution. Part of the requirements is essentially the entire SDLC, starting with what are your user stories and what are the requirements associated with each of those. Like we mentioned, we want to use the A2A protocol, and that was our starting point. That was one of our main requirements. As a systems architect, I want all my agents to communicate using this protocol.
There are a lot of requirements there. Agent discovery is based on what Strands supports and what Agent Core supports, using agent cards. There's quite a bit, and it's pretty comprehensive. Just for each of the agents themselves, they have very specific requirements. Like the BDA agent has its own set of acceptance criteria and so on.
Going into the design, the idea is measure twice, cut once, right? Make sure you come up with the right approach. The way we did this was we gave Kiro a brief description of the problem we were solving, and Kiro came up with all of these assets: the requirements, design, and tasks. But that's just the first draft. It's a pretty iterative process. You review what it comes up with, then modify it using natural language prompts. You continue to refine the design until it feels pretty close to your use case. That's exactly what we did.
It comes up with the as-built architecture. The loan request comes to the Supervisor Agent, which is then communicating using A2A with all the specialized agents. There's lots of information when you work through this in terms of what the design should be, and then into the tasks. These are tasks that you would execute in a phased manner. Starting with the infrastructure setup, this is infrastructure beyond just Agent Core. These are basic things like setting up your S3 buckets, which is where your input data—all your loan application forms—gets stored, as well as the data that's extracted by data automation that gets persisted as well. So getting all those buckets set up is part of the initial infrastructure work.
Getting your OAuth set up is important. In our case, we used Amazon Cognito for that. You might have your own identity provider that you want to integrate with, like Okta or Microsoft Entra ID. Getting that OAuth set up is another piece of getting ready to build a solution. The other things are how do you store your secrets, like Secrets Manager, which is very important, and then your IAM roles—your permissions that each of your agents would have to interact with other AWS services. Then there are things like Amazon Bedrock and the AgentCore gateway that I mentioned, so that needs to get set up so that the agents can then access the tools that are targets behind that AgentCore gateway. Some of that infrastructure needs to be done up front, so that's phase one.
Phase two is essentially building all of your agents and then deploying them to AgentCore Runtime, which is your compute. It's a complete set of tasks that get checked off as Kiro implements your solution. So this is a structured approach to delivering an end-to-end solution. Now let me walk you through how we built the supervisor agent.
Code Walkthrough: Constructing the Supervisor Agent with Strands, A2A Protocol, and System Prompts
For all of these agents, we'll explain the code by breaking it down into four sections. The first section is the libraries you need to import. Here, you need to import A2AServer from Strands' multi-agent A2A module; that enables the A2A communication. You also need to import the A2A client tool provider so that the agent is able to discover other agents. Then we import the code interpreter so we can provide it as a tool to the agents. The other two libraries are required for running the A2A agents as servers.
Then we have the configuration section, where we configure the AWS region, the buckets, and any other parameters we're using. We also have code, and this is the only bit of code you'll see in all of this, where we handle the OAuth tokens between the agents. When this agent needs to communicate with another agent, it needs a refreshed OAuth token, and this part of the code handles that. As mentioned, the client ID, client secret, and token URL used for the OAuth tokens are all retrieved from Secrets Manager to make sure they are not exposed.
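That token-handling code reduces to a client-credentials exchange. Here is a simplified sketch, with an illustrative secret name and field layout rather than the speakers' actual code:

```python
import json
import boto3
import requests

secrets = boto3.client("secretsmanager")
creds = json.loads(
    secrets.get_secret_value(SecretId="loan-agents/oauth")["SecretString"]
)

def get_access_token() -> str:
    """Client-credentials grant against the Cognito (or other IdP) token endpoint."""
    resp = requests.post(
        creds["token_url"],
        data={
            "grant_type": "client_credentials",
            "client_id": creds["client_id"],
            "client_secret": creds["client_secret"],
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]
```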
This is the main part of the supervisor agent: the prompt. In the prompt, we define the role of the agent and the tasks it's going to do. We also define how it will discover the other available agents and what tools it can use for that discovery. Then we define, as a standard operating procedure, what qualities to look for when selecting other agents to hand tasks to. As an example, if you want to use an agent for document validation, you need to make sure its agent card contains words like document, validation, compliance, and completeness, and only use those agents for document validation. Here we are guiding the agents to make sure they perform the way we intend.
Then we also define how agent communication works. As many of you asked, how does an agent know what data comes back and forth, and how does it interact with other systems, such as an ML model? This is where we define the type of output the agent should expect and the type of output it should send to other agents. And specifically for credit risk assessment, we make sure the agent passes the entire validated data structure to the credit risk assessment agent, so that whichever agent performs the assessment has all of the data. Do not truncate some of the data before passing it across.
We provide the agent with the decision logic, specifying how the system should determine whether a loan application is approved, requires manual review, or is rejected. Once it makes its decision, it uses the tool to create a PDF report, and we specify what information must be included in the report and what the report requirements are. Then we specify uploading it to an S3 bucket, indicating which bucket to use, and where to send the email. Because this is a demonstration system, we made sure that all emails go to just one address. In a real-world scenario, however, you can specify that it pick up the email address from the loan application or from the conversation with the applicant and send the response there, and it will do that. You would also specify how you want the email to look, what the formatting should be, and so on.
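To make this concrete, here is an illustrative condensation of what such a supervisor system prompt could contain. It paraphrases the behaviors described above and is not the speakers' actual prompt.

```python
SUPERVISOR_PROMPT = """
You are a supervisor agent that processes loan applications.

1. Discover available agents by reading their A2A agent cards. For document
   validation, select only agents whose cards mention: document, validation,
   compliance, completeness.
2. Pass the ENTIRE validated data structure to the credit risk agent; never
   truncate fields.
3. Decision logic: approve when credit risk is low and all compliance checks
   pass; route to manual review when any check is marginal; otherwise reject.
4. Generate a PDF report with the credit risk assessment, the compliance
   validation, and next steps; upload it to the underwriter S3 bucket; then
   email the applicant the status.
"""
```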
One of the key advantages of this approach is that if requirements change tomorrow, all you would have to do is come back and change the prompt, which is very easy to do. This is more like natural language where you define in your own natural language what you want the agent to do, rather than having it written as code and maintaining the code. You can just come and change the prompt, and it can quickly adapt to your changing business requirements.
Regarding how the model runs and whether the output is the same the next time, you're bringing up the nondeterministic nature of these agents. However, in this case, if your inputs are the same, it's the same set of documents, and the logic inside each of these agents, like the creditworthiness calculation or whatever business logic you have inside the credit risk agent, is still pretty deterministic, because those are just mathematical calculations. The agents themselves are not doing a purely text-based prediction. You're using a lot of prompts to drive the work, but under the hood you are being very specific in those prompts. This is where the standard operating procedures come into play. As long as you have that clearly defined in the prompts, the calculations will follow, and their results are still deterministic.
If the input stays the same, the data extracted stays the same, and as long as you're still keeping that same threshold with the data extraction tool you're using for extracting data from each of those documents, the data that's input into those downstream mathematical calculations still is the same. You should not see something flip from an approval to a denial. As a best practice, if you have a DTI requirement today, you may have another DTI requirement tomorrow. How you want to adapt is that you can change it in your system prompt. You can change the decision logic based on the DTI requirement.
Regarding the business requirement question, which is a great one from an engineering perspective: the prompts don't calculate the DTI in the way you might think. That calculation happens in the Compliance Agent, where we calculate the DTI using the code interpreter. There you can define how you want the DTI to be calculated. The Supervisor Agent plays the role of an orchestrator, passing information to all the other agents, each of which has its tasks clearly defined. For the DTI calculation, it is the Compliance Agent whose prompt defines how the DTI is calculated using the code interpreter, and there you can define your thresholds.
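The DTI math itself is simple; the point is that the Compliance Agent's prompt tells the code interpreter to generate something like this, with the threshold as a configurable rule. The 43% value below is illustrative, not one stated in the session.

```python
def dti_check(monthly_debt: float, monthly_income: float,
              max_dti: float = 0.43) -> tuple[float, bool]:
    """Return the debt-to-income ratio and whether it meets the threshold."""
    dti = monthly_debt / monthly_income
    return dti, dti <= max_dti

ratio, compliant = dti_check(monthly_debt=1800, monthly_income=6000)
print(f"DTI = {ratio:.1%}, compliant = {compliant}")  # DTI = 30.0%, compliant = True
```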
I think we haven't shown that piece yet, but there's a specialized agent and the prompt which captures that. So in designing a system like this, how do you recommend evaluating model accuracy between the different models if you were to switch out for, let's say, a different one? That's a good question. In Bedrock, we have model evaluation where before you start building these solutions you can start evaluating a model. There are two types of model evaluations we offer. One is we have curated datasets where you use those curated datasets to have the model run against them and you can see the accuracy, or you can bring in your own data and use it against those models to see which one works best and then use it in your system as well.
Basically it requires you to have some kind of ground truth data, something that you have historically for similar profiles of customers and what the loan decisioning has been. There's definitely a lot of work involved. You need to set up that evaluation harness on your end so that when you deploy these kinds of systems you have something, a gold standard to compare to the results that are coming out of it. We can't emphasize enough the importance of testing, especially with these nondeterministic applications.
So we have an end-to-end testing framework that includes unit testing, integration testing, and all of that. If you have specific evaluation metrics and a source of data that you can point to, all of that can be incorporated as well. Regarding design patterns, we are in this case using the A2A design pattern, which is pretty standard. Strands also supports others like workflow, which are more deterministic, and the graph-based design. If you think of each of the agents as being a node, the communication between them is an edge. You can define what that multi-agent structure needs to look like in a graph. That is one of the other design patterns that's supported natively by the Strands framework. A2A is more flexible based on agent discovery that we're using, but this is one of the design patterns supported in the Strands framework as well.
Do we have to separately define reflection, like the ReAct pattern? These agents use the ReAct pattern, where they do the reasoning and acting. Given that the Supervisor Agent has the task of processing these loan applications and a set of tools available to it, it uses reasoning to create a workflow, then uses the available tools, the agents, and the code interpreter to perform the tasks.
Yes, so I think the best way to get to your question is through reflection. This is based on the model capability as well. In this case, the model we're using is Anthropic's Claude Sonnet 4.5, which is very capable of reflection. That's the brains behind each of these agents. As newer models come along that support even more advanced reflection or additional patterns, you can swap them in; AgentCore and Strands are both model agnostic. Each agent can use a Bedrock model or an open source model, and each agent can use a different model; it doesn't have to be the same one. So reflection is built into the agent itself based on the model selection.
That's a good segue to the fourth and last section we wanted to show in the code. After the prompt, we define the agent: the name of the agent, the description, the system prompt we just walked through in detail, and, as Mrudhula mentioned, the model, which again is agnostic. You can choose whichever model works best for you and put it here. The tools are all the tools like agent discovery and the code interpreter. Finally, we run the agent as an A2A server so that each of these agents uses A2A communication whenever they want to talk to each other.
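Putting that fourth section together, a sketch of the definition might look like this. The tool-provider and code-interpreter classes are from the strands-agents-tools package as we understand it, and the URLs, model ID, prompt, and port are placeholders rather than the speakers' actual values.

```python
from strands import Agent
from strands.multiagent.a2a import A2AServer
from strands_tools.a2a_client import A2AClientToolProvider
from strands_tools.code_interpreter import AgentCoreCodeInterpreter

# Tools: A2A discovery of the specialist agents plus the code interpreter.
discovery = A2AClientToolProvider(known_agent_urls=[
    "http://document-agent:9000",
    "http://credit-risk-agent:9001",
    "http://compliance-agent:9002",
])
interpreter = AgentCoreCodeInterpreter(region="us-east-1")

supervisor = Agent(
    name="SupervisorAgent",
    description="Orchestrates document, credit-risk, and compliance agents "
                "to process loan applications.",
    system_prompt="You are a supervisor agent that processes loan applications...",
    model="anthropic.claude-sonnet-4-5-20250929-v1:0",  # swappable; model agnostic
    tools=discovery.tools + [interpreter.code_interpreter],
)

# Run the supervisor itself as an A2A server so other agents can reach it.
A2AServer(agent=supervisor, host="0.0.0.0", port=8080).serve()
```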
The advantage of using A2A communication is that tomorrow you may have another agent that is not running on AgentCore but still speaks A2A, and these agents running on AgentCore can communicate with it as well. The data passed between the agents uses A2A's JSON-RPC message format. The output of one agent gets passed in that JSON-RPC format to the next agent, the supervisor agent for example, and that message body contains all the information the next agent needs as input.
Best Practices and Resources: Agent Design Patterns, Tool Management, and Official GitHub Repositories
All of these agents communicate over HTTPS, so even when they exchange this JSON data, it's all encrypted, every step of the way. The OAuth setup I mentioned at the beginning, using Cognito, is a very important piece that we probably haven't covered yet. When you deploy your agent, you configure the OAuth settings for each of those agents. You can use IAM SigV4, which is the default, if you're familiar with that, but it also supports OAuth 2.0. In this case, every agent we deploy has a configuration step where we set that up as a custom JWT token. So every single one of those agent interactions is authenticated and authorized.
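That per-agent OAuth configuration is essentially a JWT authorizer block attached at deploy time. Here is a hedged sketch of its shape, with a placeholder Cognito discovery URL and client ID; check the exact field names against the AgentCore documentation.

```python
# Inbound-auth configuration attached to an agent at deploy time, replacing
# the default IAM SigV4 with a custom JWT authorizer backed by Cognito.
authorizer_configuration = {
    "customJWTAuthorizer": {
        "discoveryUrl": (
            "https://cognito-idp.us-east-1.amazonaws.com/"
            "us-east-1_EXAMPLE/.well-known/openid-configuration"
        ),
        "allowedClients": ["your-cognito-app-client-id"],
    }
}
# This dict is passed when the runtime is created (for example via the
# AgentCore starter toolkit or the CreateAgentRuntime control-plane API),
# so every inbound call must present a valid JWT before reaching the agent.
```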
I think you had one question, about the data across the sessions? Correct. The agent itself, in this case, for example, the BDA agent, the data extraction agent that's actually making that call, has what is called an MCP client set up on it. The agent acts as your MCP client, and it communicates with the AgentCore Gateway, which is the MCP server.
Behind that, you have a Lambda that makes an API call to Bedrock Data Automation. The communication between the MCP client and the gateway uses OAuth 2.0 authentication and authorization. Behind that, it's all IAM permissions, so that's where the IAM role comes in.
Here you have mentioned all the tools and the single agents. I have experience where, if we provide more than 40 tools or so, the agent gets confused choosing between them. Do you want to talk about the semantic search? Yes, one of the features we have in AgentCore Gateway is what we call semantic search. As you said, giving an agent more than 40 tools means it struggles to decide which tool to use. There are two things: first, make sure your prompt is very clear about your requirements, so the agent specifically knows its task and can search for the right tool. Second, AgentCore Gateway supports semantic search. When the Bedrock Data Automation agent has to do data extraction, it first does a semantic search, like "data extraction": what tools are available for data extraction? The gateway returns the available tools for data extraction, and the agent chooses one and performs the extraction.
In your case, if you have more than 40 tools, you can put them behind an AgentCore Gateway, and the agent can do a semantic search based on the task at hand. The AgentCore Gateway might return 10 tools out of the 40, and out of those 10 the agent is able to select the right one to perform the task. The real best practice is to think of the agent as a microservice; that same mindset applies. Depending on the business function, it's purpose-built for that function, so you want to give it only the tools it needs and only the permissions it needs for those tools. You don't want it to have 40 tools. Specialized agents should only have a couple of tools, maybe 2 or 3 that they really need to get the job done.
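Semantic search is enabled on the gateway itself. The following sketch shows what gateway creation with semantic tool search could look like via the control-plane API; we have not verified every parameter name, so treat it as a starting point to check against the documentation. All names and ARNs are placeholders.

```python
import boto3

control = boto3.client("bedrock-agentcore-control")

gateway = control.create_gateway(
    name="loan-tools-gateway",
    roleArn="arn:aws:iam::123456789012:role/AgentCoreGatewayRole",
    protocolType="MCP",
    # Semantic search lets agents narrow a large tool catalog by intent,
    # e.g. "data extraction", before choosing a specific tool.
    protocolConfiguration={"mcp": {"searchType": "SEMANTIC"}},
    authorizerType="CUSTOM_JWT",
    authorizerConfiguration={
        "customJWTAuthorizer": {
            "discoveryUrl": "https://cognito-idp.us-east-1.amazonaws.com/"
                            "us-east-1_EXAMPLE/.well-known/openid-configuration",
            "allowedClients": ["your-app-client-id"],
        }
    },
)
print(gateway["gatewayUrl"])
```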
In the supervisor agent, the tool list accumulates because it's helping the specialized agents figure out which tools to use. It lists them, but the supervisor agent itself is not really calling those tools, so it's different. But yes, that's a very good point: you need purpose-built agents with just the tools they need. We have 1.5 minutes left, so I'll walk through one more agent; for any other questions, we're available outside the room. To wrap up, I wanted to quickly showcase the Compliance Agent and its prompt, and show how it handles the debt-to-income ratio. You can see in the prompt here that it uses the code interpreter to perform these calculations. Tomorrow, if your business requirement changes, you can come here and change it in the prompt itself. It's not the supervisor agent doing this; we have purpose-built agents for each of these functions. When you want to change any of these functions, you go to that purpose-built agent, change the prompt there, and your new business logic comes into play.
Do you want to show them the code they can go to? We want to give you something actionable so that as you walk out of the room today you have something to go build. We have a couple of official repositories. One is for Strands Agents itself, a GitHub repo that has all the different design patterns you can build with; A2A is one of them, which we used for today's solution. The other official GitHub repo is for Bedrock AgentCore. It's a really rich set of code samples, and we based a lot of our solution on those samples. So please use those; they'll accelerate your entire development process. These are vetted solutions meant for building secure, scalable agent applications. Go have fun with those, and definitely try Kiro. It's been very helpful for us. Thank you so much for listening. We hope you liked this, and we're available outside the room for any questions.
; This article is entirely auto-generated using Amazon Bedrock.