Kazuya

AWS re:Invent 2025 - Moody’s: Architecting a multi-agent system on AWS (IND3303)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - Moody’s: Architecting a multi-agent system on AWS (IND3303)

In this video, Samuel Baruffi from AWS and Dennis Clement from Moody's Analytics present how Moody's evolved from a basic RAG chatbot to a sophisticated multi-agent system on AWS. Dennis explains how Moody's, serving 1,500 customers across 165 countries with 600 million entities in their Orbis database, faced challenges requiring 99%+ accuracy for high-stakes financial decisions. They progressed from Research Assistant (December 2023) through PDF upload capabilities to custom orchestrators with 80 tools, 100+ workflows, and specialized task agents processing over a million tokens daily. Key technical solutions include using Amazon Bedrock's global cross-region inference for capacity scaling, Bedrock Data Automation for complex financial table extraction, and intelligent page classification routing different content types to specialized processors. Their agentic retrieval system decomposes queries and executes multiple searches rather than single-shot vector search. Sam introduces AWS AgentCore primitives including Runtime, Gateway, Identity, Observability, and Memory for building production-grade agents at scale.


; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Introduction: Moody's Journey to Multi-Agent AI Systems

People that have put on a headset, can you hear me fine? Can you raise your hand if you can hear me fine? Awesome. Perfect. First of all, thank you so much for being here on a Wednesday morning in Vegas at 8:30 and choosing us versus Swami. I appreciate it. This is IND 3303, Moody's Architecting a Multi-Agent System on AWS. My name is Samuel Baruffi, and I'm a Principal Solutions Architect at AWS, and I have the pleasure to have with me Dennis Clement. Dennis is a Managing Director of Engineering and Architecture for Moody's Digital Content and Innovation.

In the next 60 minutes or so, Dennis and I are going to go through the journey that we've been working together on how Moody's has revolutionized their way of thinking and serving their customers through an agentic AI system. Before I get started, maybe I can ask a question for the audience, and you can just raise your hand. How many of you are currently building agents? Okay, a good majority. How many of you are in the financial services industry? Okay, awesome, the majority of you as well. So hopefully this will be very useful for you.

Thumbnail 70

A quick look at the agenda on what we're going to cover today. We're going to start with Dennis taking us through their journey from almost three years ago on how they saw an opportunity of revolutionizing their business through the lens of generative AI. What was the vision? What were the challenges they faced? After that, we are going to go through some of the technical foundational services that they are currently using on AWS to implement and deliver those services through agentic AI.

Then we are going to take you through the architecture they've decided on, some of the challenges they faced, how they solved those challenges, and how they actually evolved from a simple chatbot through a multi-agent system on AWS. A very important lesson learned that Dennis is going to share with us is that financial services have a lot of unstructured data. How do you actually retrieve insights from those financial documents into a structured ability to use that in agents? Dennis and I are also going to share a little bit more looking forward at what the vision is and what the future entails. So I'm going to pass it over to Dennis to kick us off.

Understanding Moody's: A Century of Financial Intelligence Meets AI

Thanks, Sam. Good morning. I'm Dennis Clement. I'm a Managing Director at Moody's Analytics, and I lead one of the largest engineering teams focused on generative AI. This is a team that has been producing actual production-grade generative AI products since about 2023. I'm going to show you how a 100-year-old financial institution is redefining what is possible with a multi-agent system on AWS.

Thumbnail 180

I need to start with who we are because I think it's really important to understand who we are so that you know why we built what we built. Yes, Moody's is a 100-year-old credit rating agency, but that's what's actually really interesting. That 100 years of domain knowledge is what makes us really dangerous in the generative AI space. We are not a tech company trying to understand financial institutions. We are the preeminent service that financial institutions need to solve their problems. We serve over 1,500 customers across 165 countries, 97% of the Fortune 100, and it's not just ratings anymore.

We are doing risk intelligence across a bunch of different domains: credit, climate, economics, compliance, and more. Folks aren't coming to Moody's just to browse. They are making billion-dollar decisions across a ton of their own internal systems. We need a real AI strategy to actually make a difference for customers on a day-to-day basis. One of the big challenges that we have is the fact that we have a lot of different types of customers coming into our systems. We have commercial banks running loan origination, and they need a multitude of different data sets to solve some of their problems.

Thumbnail 300

We have asset managers that need climate risk models to do portfolio analysis. We have insurance companies that need regulatory compliance validation. It's the same platform, but very different problems to solve. This is our one-pager of our data universe.

Understanding this complexity is key to understanding what we build. We have about four core pillars in our data universe: ratings, research and insights, data and information, and decision solutions. We create our own authoritative datasets through our ratings agencies, which represent decades of research documents, credit opinions, and sector outlooks. We also operate Orbis, one of the largest databases of company and entity data in the world, containing 600 million entities. We are creating climate risk analysis, doing economic forecasts, performing know your customer screening, and handling regulatory filings. As you can see, the scale is quite incredible.

Thumbnail 380

With all of this, we have to truly focus on being accessible, accurate, and fast. The question is how do you build an AI system that takes all of this into account. Our customer diversity is what really drives why we cannot compromise on accuracy. You have to look at who relies on us: 2,600 commercial banks processing loan originations, 1,900 asset managers making portfolio allocation decisions, and 800 plus insurance companies running regulatory stress tests. These are not low-stakes use cases, and that is why introducing generative AI into our system was challenging.

Thumbnail 420

These are our accountability markers. This is why we deliver precision. Our SaaS products carry the Moody's name, and when our company publishes certain credit decisions and credit risk profiles, the market moves. This is why we really needed to think through how our AI systems need to work for high-stakes financial environments.

Thumbnail 450

Thumbnail 460

The Challenge of Precision: Why 99% Accuracy Isn't Enough in Financial Services

This is where our journey really begins, not truly technical, but what forced us to rethink everything. This is the big fundamental tension that we faced. When a commercial bank is making 500 million dollar loan decisions or an asset manager is rebalancing a 2 billion dollar portfolio, 99 percent accuracy just is not good enough. That missing 1 percent could be catastrophic.

Thumbnail 490

Let me paint a picture of why this is so hard. Look at the complexity we are dealing with. A single customer comes to us needing expertise across all of our different domains: credit ratings, sector analysis, economic comparisons, and regulatory compliance, all simultaneously. These are not adjacent domains. These are different knowledge universes, and they require separate analyst teams. Not only that, we are also in a regulated space, so we have to think about compliance, data separation when users input their data into our systems, and true data isolation.

All of this means that we needed to make really specific decisions about how we were producing our products. What makes it even harder is that our customers are not just querying Moody's data. They want their own data as well. They are uploading financial documents and PDFs into our system, and we have to seamlessly merge not only our expertise but also their datasets to solve real-world problems. The PDF reality and that unstructured data is difficult, and we are still tackling it and still fighting that fight, but hopefully a winning fight.

Thumbnail 590

Now we will give you a little bit of how we evolved. This timeline tells our story. We started like most people did with a RAG application. We deployed our research assistant about December of 2023.

Users loved it because they got answers grounded in real research, and it was able to truly solve a lot of their problems. However, the moment somebody asked something truly complex—where you're comparing risk for multiple companies, their financial metrics across your portfolio, analyzing news—that's when it started to crumble. These scenarios were very difficult for a basic RAG application to solve.

Then in August 2024, we gained the understanding that people wanted to introduce their data into our world. We released our first PDF upload capability, which is where a customer adds their data into our systems to solve real-world problems. That's where our unstructured data story really begins, because that's where we truly started to realize where we needed to upskill ourselves and our systems.

Thumbnail 700

Thumbnail 710

Thumbnail 720

Thumbnail 730

Thumbnail 740

Now at the end of 2025, it's the culmination of all of that. We have learned from our customers. We have created custom orchestrators with specialized workflows and specific task agents, all working to solve designated tasks for our users. All of this is powered by AWS infrastructure, operating agentic loops and all the evaluations that we need for our systems.

Here is how this works in practice. This is Moody's Research Assistant, our first generative AI chatbot released in 2023. You'll see it's a simple interface, but behind the scenes there's tremendous complexity. This is a specialized intent engine routing to the right expertise and routing to the right data that is necessary. It quickly provides context and delivers fast and reliable answers. All grounded in real data, and let me tell you, we painstakingly cite everything that we can because this is what gives our customers the continued trust in what we are doing and how we are serving our clients.

Thumbnail 750

But this is where our vision really went, and this is what changed everything for us. We spent much of our time in Research Assistant tweaking prompts and tweaking the intent engine for really small gains. We realized early that the future isn't better prompts—it's better context. Research Assistant proved itself by delivering insights 60% faster and a 30% reduction in task completion time. We were processing decades of Moody's proprietary research to serve our clients, but we realized we could do even more. We realized there were certain things this chatbot was just not doing for us, so we moved from asking a question and getting an answer to orchestrating specialist intelligence. This is how we were going to move forward.

Thumbnail 830

Thumbnail 840

AWS Bedrock Foundation: Global and Geographic Cross-Region Inference

Now I'm going to let Sam show you what the foundation helped build for all of these systems. Thank you, Dennis. Before Dennis takes us through the journey of how they've implemented AWS services and AWS infrastructure into their agentic system, let's quickly go through some of the services and the high-level idea on how they work. The journey that Dennis just took us through is what we've seen in financial service industries and other industries as well. I really like this slide because it demonstrates exactly the journey they've taken. They started with Research Assistant as a generative AI assistant back in 2023. They started creating specialized agents as the next step. Dennis mentioned the PDF upload, and of course, they are now working and have released agentic AI systems as part of a more autonomous multi-agent system collaboration that we're going to spend a little more time diving deeper into in a moment.

Thumbnail 880

But how do you achieve that with AWS services? There are multiple services. We start, of course, with Amazon Bedrock. If you watched the keynote from Matt yesterday, Amazon Bedrock is not just a service—it's a full ecosystem of services that allow companies across different industries to build production-grade workloads through agents and foundational models. The most important aspect of this is the ability to choose models from different providers in a true serverless fashion.

Thumbnail 940

This is what Moody's is currently doing. However, if you have different use cases such as requiring a RAG database and a managed ingestion pipeline, Bedrock also has capabilities to allow you to do that. One of the challenges of moving from simple chatbot systems to multi-agent orchestration that requires a lot of tokens from foundational models is how you achieve that capacity. This year, Bedrock has been hard at work helping customers with more capacity and the capability to expand.

There are two features within Bedrock inference of foundational models that I want to touch on. The first one is what we call global cross-region inference. The way it works is that when Bedrock was released 2.5 years ago, you could call a model within a single region. On this slide, if you have a specific application running on a container or a Lambda, you could call Bedrock within that region, and that region has a specific limit and capacity for a specific model.

Thumbnail 1010

What Bedrock now allows you to do is enable the global cross-region inference endpoint. If you see at the bottom of the screen, this is the inference profile ID that you can pick when building applications. What that allows you to do is if the region where you are located does not have capacity or you have achieved the limits of the region for your specific account, it can automatically route your request. Using the internal backbone of the AWS network, it asks the question: where is there available capacity that is closer to my user across commercial regions where Bedrock and that model are available? This model, of course, is Claude 4.5. There are multiple models that support a global cross-region inference endpoint.

Thumbnail 1030

Thumbnail 1050

In that case, Bedrock behind the scenes will route you to a specific region that has capacity. In this slide, we have chosen AP Southeast One. Keep in mind that everything behind the scenes is using the AWS network backbone, so nothing is going through the internet. If you also hit the limits of that specific region, Bedrock will then redirect the traffic across any commercial available region where that model is available. What that allows you to do is increase your resiliency and increase the limits that you can actually call that specific model for your specific application.
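To make this concrete, here is a minimal sketch of what calling a model through a global cross-region inference profile looks like with boto3. The profile ID shown is illustrative; check the Bedrock console for the exact inference profile IDs available to your account and model.

```python
import boto3

# Bedrock Runtime client in your "home" region; the global inference profile
# lets Bedrock route the actual inference to wherever capacity is available.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# A global cross-region inference profile ID (illustrative, not a guaranteed ID).
model_id = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"

response = bedrock.converse(
    modelId=model_id,
    messages=[{"role": "user", "content": [{"text": "Summarize the latest sector outlook."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```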

Thumbnail 1070

A very common challenge that financial services companies face is that you need to be bound to a specific geographic location. Maybe you have data residency requirements or regulatory compliance that you need to obey, which means you can only serve a specific geographic region. Now you can still call just a specific region, and that is fine, but you have the limits of the capacity of Bedrock and that model within that specific region.

Thumbnail 1120

Thumbnail 1130

Thumbnail 1140

What we introduced as well is what we call geographic cross-region inference. It works very similarly to a global cross-region endpoint. The difference is that rather than picking a global endpoint, you choose a specific geographic location, for example, Europe, the US, or Asia. What happens behind the scenes is if the capacity in this case for EU West One has been breached from your account and you need more capacity, behind the scenes working the same way that I explained for the global endpoint, we will find another region within Europe specifically and redirect the traffic to that specific endpoint for that specific model. Of course, there are more regions that support Bedrock within the European regions, and that allows you to do that.

Why is that important? If you are within a regulated industry, you can still keep the data within a geographic region while increasing the capacity and the ability for you to serve multiple agents that require a higher throughput of tokens. That is definitely something Moody's is using to help them orchestrate across multiple regions in their system.
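In practice, switching from global to geographic cross-region inference is just a different inference profile prefix. A minimal sketch, again with an illustrative profile ID:

```python
import boto3

# Client pinned to an EU region; the "eu." inference profile keeps routing
# within European regions only (profile ID is illustrative).
bedrock = boto3.client("bedrock-runtime", region_name="eu-west-1")

response = bedrock.converse(
    modelId="eu.anthropic.claude-sonnet-4-5-20250929-v1:0",
    messages=[{"role": "user", "content": [{"text": "Same request, but with EU data residency."}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```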

Thumbnail 1170

Knowledge Base for Amazon Bedrock: Managed RAG Workflow and Vector Storage

Dennis mentioned multiple times the challenge of embedding unstructured data into a RAG vector database. As part of the Bedrock service ecosystem, we have something called Knowledge Base for Amazon Bedrock. What this allows you to do is have a fully managed end-to-end RAG workflow. It not only gives you the ability to embed documents into a vector database, but it covers the whole ingestion pipeline for a raw document, a PDF sitting on S3, from parsing the document to creating chunks for the document.

It also gives you the ability to embed the document, save it to a vector database, and choose multiple ways you want to retrieve the document. It's a completely fully managed solution that allows you to have peace of mind rather than trying to manage every single component of the stack. You can just rely on AWS with Bedrock Knowledge Base.

Thumbnail 1230

If you look at the data ingestion workflow, let's assume you have different file formats: text documents, PDF documents, and you might even have some multimodal documents, maybe some audio files or video files. You want the insights from that data to end up in a vector store that you can retrieve. The way Knowledge Base works is you choose an S3 bucket where you save those files. That S3 bucket will evolve over time. Maybe you'll have updated documents or new documents. Knowledge Base can keep an eye on it and update the ingestion pipeline for you. Once it realizes there is a new document or an updated document, it can do the parsing.

There are multiple ways you can choose to do the parsing. You can use the default parser for Bedrock Knowledge Base, you can use a large language model to parse the document, and you can also use what we call Bedrock Data Automation, which is the last service I'm going to mention today. Once it does that, you decide what chunking strategy you want to use. You might want to use a fixed chunking strategy, or you might want to implement your own chunking strategy with a Lambda function. Moody's is actually using this capability. After that, you can choose an embedding model.

The embedding models are available within Bedrock. We have Amazon Titan, Amazon Nova multimodal embedding model that we just released about three weeks ago, and also the Cohere embedding models. If you don't want to use some of these models and you want to use maybe some open source or open weight models, you can also host those embedding models on SageMaker and point Knowledge Base to SageMaker. Finally, because Bedrock wants to provide you with flexibility, you can choose the vector database you want to use behind the scenes.

You can use OpenSearch Serverless, where you don't need to manage any infrastructure. You can use some partners as well, such as Pinecone or Redis. If you are into more relational databases and you're very familiar with PostgreSQL, you can use pgvector on either Aurora or RDS. The list is bigger as well. We've just announced the general availability of S3 Vectors, which is the ability for you to store vectors in S3 at up to 90% lower cost compared to traditional vector databases.
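For completeness, here is a hedged sketch of querying a knowledge base at retrieval time with boto3; the knowledge base ID and the query are placeholders.

```python
import boto3

# Knowledge bases are queried through the Bedrock Agent Runtime client.
agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

# Retrieve the top matching chunks for a query. The knowledge base ID is a
# placeholder; yours is generated when you create the knowledge base.
response = agent_runtime.retrieve(
    knowledgeBaseId="KB1234567890",
    retrievalQuery={"text": "What were the issuer's liquidity covenants in 2024?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)

for result in response["retrievalResults"]:
    # Each result carries the chunk text, a relevance score, and the source location.
    print(result["score"], result["content"]["text"][:120])
```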

Thumbnail 1380

Bedrock Data Automation: Extracting Insights from Unstructured Financial Documents

Now let's talk about how they've tackled the very fundamental challenge of unstructured data. You have these PDFs that are full of complex financial tables and graphs. How do you extract not only the text from those PDFs, but the insights and the tables in the proper format? There was a study done by Amazon that found 80% of the data in financial services is unstructured. It could be in PDF documents, or it could be audio or video from financial reporting that the company has published. However, only 20% of organizations are currently able to take advantage of this unstructured data.

We introduced a capability on Bedrock about a year ago called Bedrock Data Automation. For those specific challenging documents that have a lot of different unstructured data, how can you actually put all the unstructured documents into S3 and extract the insights from that? Well, you can use Bedrock Data Automation.

Thumbnail 1440

Thumbnail 1460

The way Bedrock Data Automation works has two approaches. The first way is to call Bedrock Data Automation as a traditional API and pass the document. It supports multimodality, so it supports audio, video, image, and documents. Dennis is going to go through how they use Bedrock Data Automation for financial documents like very complex, long financial PDFs. You put in the document and choose the capability of extraction. You can do summaries, you can do text, you can do fields. We've actually seen a significant increase in performance and accuracy when using Bedrock Data Automation instead of traditional large language models.
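As a rough sketch of that asynchronous flow, the boto3 bedrock-data-automation-runtime client can be used roughly as follows. The bucket names and ARNs are placeholders, and exact parameter shapes may vary by SDK version, so treat this as an outline rather than a definitive integration.

```python
import time
import boto3

# Bedrock Data Automation is invoked asynchronously: point it at a document in
# S3 and it writes the structured extraction back to S3.
bda = boto3.client("bedrock-data-automation-runtime", region_name="us-east-1")

response = bda.invoke_data_automation_async(
    inputConfiguration={"s3Uri": "s3://my-bucket/filings/annual-report.pdf"},
    outputConfiguration={"s3Uri": "s3://my-bucket/bda-output/"},
    # Project and profile ARNs are placeholders; a BDA project defines which
    # outputs (text, tables, summaries, custom blueprints) you want extracted.
    dataAutomationConfiguration={
        "dataAutomationProjectArn": "arn:aws:bedrock:us-east-1:123456789012:data-automation-project/my-project"
    },
    dataAutomationProfileArn="arn:aws:bedrock:us-east-1:123456789012:data-automation-profile/us.data-automation-v1",
)

# Poll until the extraction job finishes, then read the results from S3.
# (Status field names are as documented at the time of writing; verify in your SDK.)
invocation_arn = response["invocationArn"]
while True:
    status = bda.get_data_automation_status(invocationArn=invocation_arn)
    if status["status"] in ("Success", "ServiceError", "ClientError"):
        break
    time.sleep(10)
print(status["status"])
```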

You can actually gain an advantage in both price and accuracy with Bedrock Data Automation. So with that said, I'll pass it back to Dennis, who is going to go through the architecture deep dive on how they've implemented their agentic system.

Thumbnail 1550

Architectural Evolution: From RAG to Multi-Agent Orchestration

Alright, now the fun part. Let's get technical. I'm going to walk you through how we actually built this, the architectural decisions that we've made, and how we put our theory into production reality.

So again, the evolution. Everybody raise your hand here if you've built a RAG application. All of us have started with that. It's the initial deployment that we all do. It's a RAG system, one model, one context window, trying to solve everything. And so users started to ask about credit risk, sector analysis, all of that, and this one system had to be an expert in all of it. The cracks appeared immediately with context window limitations, performance degradations, and just shallow expertise. We were really asking the impossible.

Thumbnail 1650

We made a series of realizations. The first one is that financial intelligence should mirror how we as humans and expert teams actually view some of these problems, and it requires specialized expertise. Second, we noticed that our users weren't asking just random questions. They were orchestrating repeatable workflows. And third, it's context switching. Just like for a human, context switching kills performance. Even similar expertise benefits from true isolation, and gigantic systems need that focus and context. So the answer we came up with was deep specialists, intelligent coordination, and repeatable patterns. This was the fundamental shift. No more prompt engineering, but true context engineering.

Thumbnail 1710

Alright, so before we dive into this, raise your hand if you've spent hours tweaking a prompt. I bet a bunch of you also hit a real brick wall at some point where no tweaking was ever going to solve that problem. That's the reality of the systems that we've built. We had to move away from prompt engineering, which optimizes little instructions and little parameters here and there, where we pray to the LLM gods and hope we don't fall short. Here's what we learned. The breakthrough wasn't about better prompts. It was about better context boundaries. These multi-agentic architectures and workflows own precise context boundaries around their specific domain, with no cross-domain interference and no diluted expertise.

Thumbnail 1730

Thumbnail 1750

Thumbnail 1760

Thumbnail 1770

Thumbnail 1780

Thumbnail 1790

Thumbnail 1800

Alright, so here I'm going to give you a little context into our workflow system. In late 2023, going into 2024, we were running Research Assistant and customers loved it, but we started to see a very specific pattern. Users were asking questions like: pull credit ratings for these five companies, now do an analysis of their financial metrics, compare it to news, and cross-reference it with some sector research. They were forcing chain of thought through our chat. That's when we realized they don't want smarter chatbots. They want orchestration power. They want to visually stitch together Moody's expertise into repeatable workflows and create really specialized outputs, which means charts, graphs, tables, you name it. So we built this workflow designer. This is before OpenAI's agent builder. This is before most orchestration platforms existed. We built it because our customers needed it. This was true customer empowerment. And so our system here has gone from 20-step workflows to 400-step workflows.

Thumbnail 1810

Thumbnail 1820

Thumbnail 1830

Thumbnail 1840

Thumbnail 1850

It will take anything from a couple of minutes to 15-plus minutes. What you get is a structured output of our expertise that we and our customers are able to stitch together to solve real-world problems. This is probably one of those systems where we realized we needed to create it ourselves. It wasn't something that our users came to us and theorized about. This was us understanding what users wanted, why they needed it, and how we were going to solve some of their problems. We're very proud of this.

Now for the fun stuff. I want to dive deep into the architecture and go through this, but first I want to align on terminology. I'd be really interested to ask this: how many times have any of you been asked by some executive inside of your company, "Hey, I want you to ingest all of our documents. I want to be able to ask any question and I want it to be 100% accurate"? And then they also tell you that they need that probably tomorrow or next month. That's just not realistic. They also say, "Can't you just throw some agents at this?" I think it's really important for us to understand the terminology. If we align on the terminology, we can have a much better conversation.

Here I'm going to talk through what our team has aligned on: what our tools, agents, and workflows are. A tool is a system that performs a very specific task. It is a process that returns context back to an LLM. Think of it as a discrete process. You are fetching data, you are doing calculations, and it is isolated.

An agent for us is an LLM that is autonomously choosing tools and operating in some form of loop, and it determines when its task is complete. Yes, I know that sounds smart, but it's not Skynet. It is what we all know it truly is: honestly, it's a for loop with better PR. A workflow is a deterministic orchestration. This is a predefined sequence of steps that we are coordinating tools, agents, or something along those lines to get a very consistent output.
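That "for loop with better PR" framing maps closely onto what a minimal agent loop looks like in code. The sketch below uses the Bedrock Converse API with a single illustrative tool; the model ID, the tool, and the step budget are all assumptions for illustration, not Moody's implementation.

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Illustrative tool registry: each tool is a plain function plus a JSON schema.
def get_credit_rating(entity: str) -> dict:
    return {"entity": entity, "rating": "Baa2"}  # stand-in for a real data call

TOOLS = {"get_credit_rating": get_credit_rating}
TOOL_CONFIG = {
    "tools": [{
        "toolSpec": {
            "name": "get_credit_rating",
            "description": "Look up the current credit rating for an entity.",
            "inputSchema": {"json": {"type": "object",
                                     "properties": {"entity": {"type": "string"}},
                                     "required": ["entity"]}},
        }
    }]
}

messages = [{"role": "user", "content": [{"text": "What is Acme Corp's rating?"}]}]

# The agent loop: call the model, run any tools it asks for, feed results back,
# and stop when the model decides it is done (or a step budget is exhausted).
for _ in range(10):
    resp = bedrock.converse(
        modelId="global.anthropic.claude-sonnet-4-5-20250929-v1:0",  # illustrative
        messages=messages,
        toolConfig=TOOL_CONFIG,
    )
    message = resp["output"]["message"]
    messages.append(message)
    if resp["stopReason"] != "tool_use":
        break
    tool_results = []
    for block in message["content"]:
        if "toolUse" in block:
            call = block["toolUse"]
            result = TOOLS[call["name"]](**call["input"])
            tool_results.append({"toolResult": {"toolUseId": call["toolUseId"],
                                                "content": [{"json": result}]}})
    messages.append({"role": "user", "content": tool_results})

print(message["content"][0].get("text", ""))
```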

Fundamentally, tools can also be agents and agents can also be tools, but the reality here is that these are just the building blocks we have built the system on. With this, I want to talk through the five pillars that we've thought through in orchestrating the system. The first is we strive to be serverless. The financial world that we live in is incredibly spiky. At any moment, a credit change happens and we go from baseline users on our site to 50X. That is just the way we work. So we built our agentic systems in that same way, truly understanding that we were going to use the foundations of our serverless architecture to perform the agentic tasks that we also need.

Second, we learned that tools, those tools that perform operations, are the essential building blocks. So we wanted to focus on how we surface those tools to agentic systems and to the company as a whole. We chose to do this in Lambdas. The reason for this is because they are single purpose, they are fast, and they are stateless. This allows us to have many different sets of workflows or agents utilizing the same tools and it will be able to scale with us.

Third is our agents. We build two types of agents. There are simple agents, and these are some form of system prompt, a set of curated tools, and some validation steps.

We serve all of these in a JSON object so that we can run them at will, and then they get orchestrated by our orchestration systems. We also have complex agents, which are custom-built software. They consist of tools, our datasets, and code all stitched together. We run these as ECS services for a specific reason. Many times these agents require state and are long-running in nature. One of the caveats to using Lambda functions is that they are not great for a large amount of compute, and they have a limit of about 15 minutes before they time out. ECS, on the other hand, gives us stateful information and allows us to have long-running tasks, which is how we push those things forward.
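To illustrate the "simple agent as a JSON object" idea Dennis describes, here is a hypothetical example of what such a definition might look like. The field names are invented for illustration and are not Moody's actual schema.

```python
# Hypothetical shape of a "simple agent" definition: a system prompt, a curated
# tool list, validation steps, and a per-agent model choice. Field names are
# invented for illustration only.
simple_agent = {
    "name": "sector_comparison_agent",
    "model": "anthropic.claude-sonnet-4-5",           # each agent can pick its own LLM
    "system_prompt": "You are a credit analyst comparing issuers within a sector...",
    "tools": ["get_credit_rating", "search_research", "agentic_pdf_retrieval"],
    "validation": [
        {"type": "citations_required"},                # every claim must cite a source
        {"type": "max_steps", "value": 20},            # a step budget enforced by the orchestrator
    ],
}
```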

The fourth component is our orchestrator, which is the brains of this whole operation. Our custom-built orchestration system takes, in JSON format, your list of tools, your list of agents, and your list of prompts, all orchestrated together in an ECS environment. The complexity we have built into our orchestrator is behind the scenes. Inside of this, we are juggling the act of being performant and cost-effective. We parallelize as many of these steps as possible, but we also understand that you need to have a certain number of these steps finished before the next step is allowed to proceed, and we do all that magic there.

The concerted effort here has been to save costs and handle our errors, like any time we get throttled by our LLM providers; that is the real power of our orchestrator. Our lead engineer set a maximum limit of about 20 steps because we did not think in any way that you would need more than 20 tools or 20 steps to solve a problem. However, some of our customer workflows are over 400 steps, which really shows what people are trying to do inside our systems.
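The behavior described above, running independent steps in parallel while holding back steps whose dependencies are not finished, can be sketched with a small dependency-aware executor. This is a generic illustration, not Moody's orchestrator; the workflow, step names, and run_step stub are placeholders.

```python
import asyncio

# Hypothetical workflow: each step lists the steps it depends on.
WORKFLOW = {
    "fetch_ratings":    {"depends_on": []},
    "fetch_financials": {"depends_on": []},
    "fetch_news":       {"depends_on": []},
    "compare_metrics":  {"depends_on": ["fetch_ratings", "fetch_financials"]},
    "final_report":     {"depends_on": ["compare_metrics", "fetch_news"]},
}

async def run_step(name: str, inputs: dict) -> str:
    # Placeholder for invoking a tool Lambda or an agent running on ECS.
    await asyncio.sleep(0.1)
    return f"result of {name}"

async def run_workflow(workflow: dict) -> dict:
    results: dict[str, str] = {}
    remaining = dict(workflow)
    while remaining:
        # Every step whose dependencies are all satisfied can run in parallel.
        ready = [n for n, spec in remaining.items()
                 if all(dep in results for dep in spec["depends_on"])]
        if not ready:
            raise RuntimeError("Cyclic or unsatisfiable dependencies")
        outputs = await asyncio.gather(*(run_step(n, results) for n in ready))
        results.update(dict(zip(ready, outputs)))
        for n in ready:
            remaining.pop(n)
    return results

print(asyncio.run(run_workflow(WORKFLOW)))
```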

The last architectural decision we needed to make was which LLMs to choose. We made a concerted effort not to be locked into a specific model for our systems. Every agent, every step, and every tool has the ability to choose a very specific LLM for its needs. These are things that we can then test, validate, and push forward in small isolated systems. One of our agents could use a reasoning model while another one could use even a small language model for small computational pieces.

Thumbnail 2350

Let me take a step back and talk through the numbers. We have about 80 tools, we have 100 plus workflows, we have many specialized task agents, and we are processing over a million tokens a day. I want you to know that this is not a demo. This is in production today, satisfying our customers' needs.

Thumbnail 2390

Solving the PDF Problem: Multi-Modal Processing Pipeline for Financial Documents

I want to take a moment here and talk about something that is truly important to all of us. We spent some time talking about the sophistication around our orchestration systems, our workflows, and our agents. But none of that works if you do not have the context to give our agents. This is our fundamental PDF processing and retrieval challenge. Now in our industry, everything is a PDF. 10-Ks have hundreds of pages, annual reports are hundreds of pages, earnings reports as well, and there are regulatory filings. All of this is a challenge.

LLMs are brilliant at reasoning, but without the right context, what will they actually accomplish? We call this an archaeological dig problem. You are trying to excavate layers of buried information across hundreds of different pages. A table header can be on one page, but the actual table can span two to three pages afterwards. There are charts, images, infographics, and text all over the place, with footnotes that are not always attached to what you think they should be attached to. This was a fundamental issue because all of these PDFs have different layouts.

In the financial world, one decimal point that is off could represent a catastrophic change for a customer and what they need. One of the biggest bottlenecks in financial AI continues to be the complexity of trying to deal with this unstructured data. So how do we tackle this? Like a good engineering team, we tried everything that seemed reasonable, and some things that seemed unreasonable. We over-engineered a bunch of things and we also under-engineered a bunch of things. I am going to go through all the ways that we failed, because this is how we can be an honest engineering team together, and hopefully seeing where we have failed can help you as you are reaching for your own solutions.

First, like a lot of us, we started with basic Python libraries. They extract a bunch of the text and do it decently well, fast, and cheap. But the problem is you lose all context. Think about it as throwing a two-hundred-page document into a blender. You are going to get all the parts back out, but it might not have all the context that you truly need. So honestly, the verdict here was that it failed. It did not provide us the type of context needed to solve financial problems.

Approach number two was a custom parsing algorithm, and this is where we realized that we wanted to understand the true hierarchy of a PDF document. We would go through and try to draw bounding boxes around certain sections so that we could group some of these things together. The screenshot that you are seeing on the right is what that looks like. That is complex and it is a bit of a crapshoot. It is not always going to work, and it is not always going to scale with the different types of documents that get thrown at it. So the reality is it also failed. It was not able to scale for us.

Approach number three was multi-modal foundational models. That makes sense because we now have vision models. We should be able to look at these PDFs just like a human would, and LLMs should be able to do the same thing. Honestly, it did pretty well. We definitely found that certain complex tables and certain complex layouts would just struggle. It was not able to be as accurate as we wanted, and the reality was that it was very costly. Thinking about how to do that at scale was going to be very difficult. So the verdict here as well was that it failed. It was too pricey, and it got just enough things wrong to make it not usable in production.

Thumbnail 2490

Last, we have the one-million-token context windows. Once those came out, we thought this was going to solve all of our problems. We can just throw an entire giant document in there and it will be able to understand it all. And it did really, really well. Of course, the problem is that the larger the documents and the more you stuff into a context window, the more you start to degrade what comes out. The reality here too is that this is expensive and incredibly hard to scale. So here we have it: four approaches, all of them failing to reach the kind of requirements that we need in the financial industry.

Thumbnail 2720

Right, but the reality is we learned so much from this. We weren't going to just stop. Our clients need this solved. So we needed to learn from our failures. That led us here. This is the concept of not trying to find one solution to solve our problem, but instead building pipelines that could route different content types to specialized processors.

So the insight was this: not all pages are created equal. A text-heavy narrative page needs far different processing than a page dominated by complex tables, charts, or graphs. So we built an intelligent page classification system. It's an upfront analysis step that can categorize a page that is text-heavy, and that page would go to a Bedrock LLM that could do OCR for us and then convert it into markdown, so we have something that is a lot easier to query.

Now for table-dominated pages, this is where AWS's Bedrock Data Automation came in and BDA became absolutely essential. BDA is purpose-built for these kinds of complex table extractions and honestly it was a game changer. Traditional table extractors just couldn't handle the complexity around these financial documents. The reality is even I look at some of these tables and I'm even confused. So cheers to anything that can look at those and handle them.

Last, there are our charts and our images. Those then go into a vision model, which is able to look at them, create some kind of metadata for them, and create something that allows us to query them in some form of vector database. So this multi-modal approach is what finally unlocked a scalable PDF processing system. Different modalities, different tools, intelligent routing—that's the architecture that solves this.
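A minimal sketch of that classify-then-route step is below. The classifier heuristics and the three processors are stand-ins for the Bedrock LLM OCR, Bedrock Data Automation, and vision-model calls described above.

```python
# All of these are placeholders for the specialized processors described above;
# in a real pipeline they would call a Bedrock LLM, Bedrock Data Automation,
# and a vision model respectively.
def classify_page(page: dict) -> str:
    """Upfront analysis step: label each page as text, table, or chart heavy."""
    if page.get("table_ratio", 0) > 0.5:
        return "table"
    if page.get("image_ratio", 0) > 0.5:
        return "chart"
    return "text"

def ocr_to_markdown(page: dict) -> str:
    return f"markdown for page {page['number']}"

def extract_with_bda(page: dict) -> str:
    return f"structured table data for page {page['number']}"

def describe_with_vision(page: dict) -> str:
    return f"chart description and metadata for page {page['number']}"

ROUTES = {"text": ocr_to_markdown, "table": extract_with_bda, "chart": describe_with_vision}

def process_document(pages: list[dict]) -> list[str]:
    # Route each page to the processor suited to its dominant content type.
    return [ROUTES[classify_page(p)](p) for p in pages]

sample = [{"number": 1, "table_ratio": 0.7}, {"number": 2, "image_ratio": 0.6}, {"number": 3}]
print(process_document(sample))
```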

Thumbnail 2860

Agentic Retrieval and the Path Forward: Smart APIs and MCPs

But here's the thing: this is maybe half the battle, maybe even less than half the battle. The real battle comes in how you get this information back out of our systems. This is where we took a different approach to retrieval. This is about moving beyond traditional keyword search and semantic search, because both fail to understand the complexity of these documents.

Here's where traditional single-shot vector search comes in: it just returns the top K chunks, or you combine it with some form of keyword search, run a re-ranker, get that information into an LLM, and you think you're done. This works, and then you realize it doesn't. What we found is that there were very specific types of questions people were asking us where the information is just scattered across these documents.

Think of a use case where somebody's trying to find in an annual report the business units of the entity, cross-reference that with the revenue that it may have, and then also analyze maybe its sector information. This kind of search query on a document is incredibly complex. Business units exist in one table probably stretched around a couple of different pages. The revenue and other financial metrics are in another table. Maybe some of that information is in a footnote for some table. So how are you able to pull all of that stuff together in a single-shot search? It's just not going to happen.

So this is where we went into agentic retrieval. We built agentic retrieval to look at a document like a human would look at it. In agentic retrieval, we get the user's query, we create a plan off of that. This means decompose the queries, create a search strategy. We then execute that strategy, which could be multiple different searches across it. Then it will reflect. Did it actually answer that question? Was it able to pull down that information? If not, continue the loop.

Then finalize this by providing the individual chunks with their proper citations. This is truly intelligent document navigation, not just search. Taking a step back into the conversation we're having today about this multi-agent system, this now becomes one of the tools inside our tool belt. This is a tool that any workflow or agent is able to utilize so that it can pull the right information at the right time to solve its needs.
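The plan, execute, reflect, finalize loop can be sketched in a few lines. This is a generic illustration of the pattern rather than Moody's implementation; plan_searches, run_search, and is_answered stand in for an LLM planning call, a vector or keyword search, and an LLM reflection check.

```python
def plan_searches(query: str) -> list[str]:
    # Placeholder: an LLM call that decomposes the query into sub-searches.
    return [f"{query} - business units", f"{query} - revenue by segment"]

def run_search(sub_query: str) -> list[dict]:
    # Placeholder: vector/keyword search returning chunks with citations.
    return [{"text": f"chunk for '{sub_query}'", "citation": "annual-report.pdf p.42"}]

def is_answered(query: str, chunks: list[dict]) -> bool:
    # Placeholder: an LLM reflection step deciding whether enough evidence exists.
    return len(chunks) >= 2

def agentic_retrieve(query: str, max_rounds: int = 3) -> list[dict]:
    chunks: list[dict] = []
    pending = plan_searches(query)                     # 1. plan: decompose the query
    for _ in range(max_rounds):
        for sub in pending:                            # 2. execute: run each sub-search
            chunks.extend(run_search(sub))
        if is_answered(query, chunks):                 # 3. reflect: do we have the answer?
            break
        pending = plan_searches(query + " (refined)")  # otherwise re-plan and loop
    return chunks                                      # 4. finalize: chunks with citations

print(agentic_retrieve("Acme Corp business units and revenue"))
```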

Thumbnail 3080

This brings together our complete solution. We have a bunch of different tools and specific task agents that work inside their known domains, orchestrated across a really complex orchestration system, all there to solve a specific customer need. Now we need to talk about what's next. We're evolving at all times. Our customers realize that they need our intelligence, but they don't want to just consume it, they want to build with it. This led us into MCPs and our smart APIs, and a lot of the tools that we built and utilize inside our orchestration systems we are now able to expose to our customers directly.

I think because we were incredibly fast to market, we have been custom building a lot of our solutions, because there just weren't solutions available to us already. I think one of the next steps that we have to take internally is to really figure out the buy versus build decision. Honestly, after being at re:Invent for the last few days, we have realized that a lot of the things that have come out in Bedrock and AgentCore are things we are going to be able to replace a lot of our custom code with. It's going to be able to solve our problems, and all of this utilization of managed services is going to reduce a little bit of our own tech debt.

Thumbnail 3190

AWS Agent Core: Production-Grade Primitives for Scalable Agent Systems

Speaking of primitives, Sam's going to go through AgentCore and teach you how AWS is trying to build this so that you can also have production-ready agents at scale. Thank you, Dennis. I expect the majority of you have heard about AgentCore at re:Invent before. This is not going to be a deep dive. This is going to be a very high-level introduction to AgentCore. AgentCore is an addition to our Bedrock ecosystem that gives you primitives to build production-grade agents at scale on AWS.

There are eight primitives. We just announced a new one yesterday that is not here because it was under embargo and I couldn't add it to this slide. Some of these primitives that I strongly believe Moody's will be able to actually use and benefit from include the Runtime. We start with the Runtime, which gives you a serverless compute environment where you can run agents with session isolation at the VM level in a pay-as-you-go serverless model. It supports any framework. Moody's has built their own framework, so they can bring their own framework. They can call whatever model they are using for specialized agents at any point.
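As a very small sketch of what hosting an agent on the Runtime can look like, assuming the bedrock-agentcore Python SDK (the import path, payload shape, and agent logic here are illustrative, not a definitive integration):

```python
from bedrock_agentcore.runtime import BedrockAgentCoreApp

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload: dict) -> dict:
    # Payload shape is illustrative; plug your own agent framework in here.
    prompt = payload.get("prompt", "")
    answer = f"(agent response to: {prompt})"  # stand-in for the real agent loop
    return {"result": answer}

if __name__ == "__main__":
    # Serves the agent locally; AgentCore Runtime hosts the same entrypoint when deployed.
    app.run()
```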

The whole idea of AgentCore is to give flexibility and options for users. You can pick and choose any of these primitives at any point and combine them with your own solution. Runtime is definitely a big candidate for helping with the compute power. As mentioned, they currently have 80 tools just for that specific multi-agent system, and I could imagine that tool list will keep increasing. We have released AgentCore Gateway, which allows you to create a virtual remote server, a completely managed solution for managing the tools that you can call directly through APIs. You can have Lambda behind the scenes to add your own logic. You can even call your own existing servers behind the scenes, in one centralized platform.

Now, everything that you do, especially in such a regulated industry as financial services, you want to make sure you have proper identity, authentication, and authorization. When you think about an agent, you can think about two types of authorization. You can think about inbound authorization, meaning who is the user accessing the agent and what are the permissions that user has in order to call that agent.

Agentic systems bring a new challenge into authentication and authorization. The key questions are: what tools can agents use, what type of calls can those agents make on behalf of the user, and how can you propagate metadata and federation from the user identity into what we call outbound authentication? We have introduced AgentCore Identity to help you manage all this complexity of authentication and authorization.

The great news about AgentCore Identity is that you can integrate it with your existing identity providers. If you're using Cognito, Okta, or any custom provider that supports OAuth, you can simply connect your AgentCore Identity, and it will call those providers on your behalf to validate the token and validate the credentials of your user. We're not trying to replace your identity providers. We're simply trying to give you the flexibility to support both inbound and outbound authentication.

Of course, everything you do here requires observability. You need to know what the agent is doing, troubleshoot issues, and ensure regulatory compliance by saving every single step. Dennis showed us a very complex workflow that had dozens and dozens of steps. How do you actually collect every single step that every single agent took behind the scenes, every single large language model call, every single tool call, and every single reasoning chain-of-thought prompt? With AgentCore Observability, you can use it with the Runtime, with AgentCore Gateway, or even outside AgentCore. You can extract all that information and ship it to CloudWatch or a third-party observability provider of your preference using OpenTelemetry.

Finally, there is AgentCore Memory, which I think is very important. Let's say you're a bank using Moody's agentic AI systems. You would like the system to know that you are from this bank, perhaps part of the investment bank, and that you have certain stocks in your portfolio. Wouldn't it be awesome if the system could collect that data as you're interacting with the agent and save it for long-term purposes? With AgentCore Memory, you have a fully managed solution that can collect both short-term memory, which is just the user-assistant interaction, and automatically extracted insights: preferences, summarizations, or even your own custom semantic information.

Thumbnail 3480

If you're interested in AgentCore, there are many sessions available at re:Invent. I highly recommend checking them out. That concludes our session. Thank you so much for joining us. Please rate us on the AWS events app. Our session ID is IND 3303. Thank you so much, everyone.


; This article is entirely auto-generated using Amazon Bedrock.
