🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Accelerating Real-World Evidence Generation in Life Sciences (IND202)
In this video, AWS Healthcare and Life Sciences team, along with Eli Lilly, demonstrates how they're revolutionizing real-world data access through AWS Clean Rooms and Datavant's tokenization technology. Anne Evans explains that 80% of healthcare data remains unstructured, while leveraging real-world evidence can reduce clinical trial times by 40%. The session showcases a multi-agent AI system built on AWS that enables non-technical users to analyze complex healthcare datasets using natural language. Greg Cunningham from Eli Lilly shares how their implementation reduced data evaluation time from four months to under four weeks. Eric Brooks details the architecture of seven specialized agents including SQL creator, medical coder, and data navigator that work together to transform questions into actionable insights. The solution addresses critical challenges like data fragmentation across providers, HIPAA compliance, and the need for patient-level data linking across multiple datasets while maintaining privacy through tokenization.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
The Growing Impact of Real-World Data and Evidence in Healthcare
Thank you for joining us, and welcome. We're going to start today with a few interesting facts. Did you know that for every breakthrough drug that makes it to market, pharmaceutical companies analyze over 10 million patient data points? Yet 80% of healthcare data remains unstructured and difficult to access. In today's rapidly evolving healthcare landscape, real-world data and real-world evidence are game changers in how we understand patient outcomes and develop new treatments. Studies show that leveraging real-world evidence in clinical trials can reduce clinical trial times by up to 40% and cut costs by millions.
I'm Anne Evans, and I serve as the Strategic Partners and Programs Lead in AWS Healthcare and Life Sciences. Today I'm joined by Greg Cunningham from Eli Lilly, who will share how Eli Lilly is revolutionizing their approach to real-world data, and Eric Brooks, our AWS Principal Solution Architect, who will dive into agentic workflows and how we are transforming data accessibility. Whether you're a pharmaceutical researcher, a healthcare provider, or a data scientist, we're genuinely excited to share how we've been collaborating with our partners and customers to accelerate and simplify access to real-world data.
Let's dive into how data is used in healthcare and life sciences and how AWS is revolutionizing data access. In the last decade, global data generation has increased elevenfold, from 16 zettabytes in 2015 to 181 zettabytes of data at the end of 2025. The same is true in healthcare data. Between imaging data, lab data, and all the data becoming available in healthcare, we are projecting 10 zettabytes of healthcare data being available at the end of 2025.
This proliferation of data availability has driven transformative growth, evolving the use of real-world data and real-world evidence from an emerging field to a mainstream component in pharmaceutical development and healthcare decision-making. However, while this exponential growth creates unprecedented opportunities, it also creates critical challenges. How do you organize, access, and make value from such massive volumes of information? Before we dive into more specifics, I'd like to ask with a quick show of hands: how many of you are familiar with real-world data and how it is used in healthcare and life sciences?
For those newer to this space, real-world data is healthcare data collected during routine patient care, which means it's everything from electronic health records to wearable devices to the insurance claims we all make. Real-world evidence is what we learn from analyzing this data. This evidence is used to provide insights into the possible benefits and risks of medical products. It's also used to study clinical conditions and biomarkers when developing new drugs. Driven by data-backed decision-making, accelerated drug development, value-based care models, and evolving regulatory expectations across pharmaceutical, biotech, and medical devices, the growing appetite for evidence and insight is fueling a surge in demand for real-world data-backed studies.
Real-world data is used to help boost preventative efforts to identify patients that are at risk of illness or eligible for clinical trials. It's used to shape clinical trials, create public policy, and expand drug safety testing. It's used to determine the value of medical-based interventions and to establish reimbursement strategies while supporting regulatory expectations. We create this flywheel because the more data that is available, the more evidence is being used, creating that demand for yet more data. This is why establishing a strong data foundation layer is essential to effectively use the data and allow agents to use the data to generate these insights and evidence.
Building a Strong Data Foundation: Challenges in Data Fragmentation and Integration
We must start with a strong data foundation layer. In life sciences, this includes historical data that is sitting in your warehouses or on individual scientists' workstations, experimental data being generated in your labs, and computational data being run in simulations. When you take all of your data and mix it with third-party data that will enrich your data, you increase the breadth and depth of the questions that you can answer with information previously unavailable. The ability to make that data work for you is limitless, but we have to remember it all starts with the data foundation layer, because the most sophisticated AI in the world will only ever be as smart and effective as the data that it can access.
Despite this tremendous growth that we've discussed, finding the right data—the data that you need—continues to be messy and challenging and unpredictable, which in this case is fitting for data that comes from the real world. Today, 40% of data purchases are incomplete, which really means ineffective. Given that real-world data is collected all along that patient care journey, there's no one source, no one provider that will have all of the data that you need. Yet 70% of studies require very specific fit-for-purpose data that is linked across multiple datasets. With HIPAA requirements that all of our data remains de-identified, connecting patient data across multiple datasets is extremely challenging when you're looking for very specific fit-for-purpose data.
This fragmentation creates major hurdles. For example, it typically takes 4 to 12 weeks just to find and obtain what looks to be the necessary third-party data. This is further complicated by data producers' reluctance to share data outside of their control. Once you've obtained the data, because each producer will use their own unique format, you still have to harmonize that data, which will add another 4 weeks to your process. Once you've done all of that and you have your data, your different team members from the computational scientists to the experimental scientists all need their own tools to make that data effectively work for them.
AWS and Datavant's Privacy-Preserving Solution for Data Discovery and Access
All of these challenges create timelines exceeding 10 years and costs in the millions, depending on the therapeutic area, even in the 2 to 3 billion dollar range, just to put that new drug out to market. At AWS, we're approaching this complex challenge with a comprehensive multi-part solution. In the first part shown here, we're streamlining data discovery and access. We've been collaborating with our AWS partner Datavant to build a solution on AWS Clean Rooms that uses Datavant's privacy-preserving technology to help researchers quickly find, analyze, and obtain linked patient-level data sources.
In this solution, data producers host their tokenized personally identifiable information in their AWS account. Data consumers can then discover and evaluate multi-modal, fit-for-purpose data across Datavant's broad network of data producers. Consumers then negotiate directly with the producer and subscribe to the data in a private offering, with billing directly through AWS Data Exchange.
Datavant's proprietary software takes an individual's personally identifiable information such as name, gender, and date of birth. It converts this information into a unique string of characters known as a token that de-identifies the information. This token is different than an LLM token. This token is specific to Datavant and is within their proprietary software. Each token in each data set is unique to an individual patient.
With Datavant's proprietary software, consumers are able to connect patient-level data with this token without revealing any underlying personally identifiable information. The same patient will always generate the same encrypted, non-reversible, deterministic token. That unique token can then be used to link records across multiple datasets at the patient level, providing a longitudinal patient journey. The tokenization happens in the data producer's container behind the data producer's firewall, ensuring that no personally identifiable information actually leaves the data producer's secure system.
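Datavant's tokenization engine is proprietary, but the properties described above (deterministic, non-reversible, keyed to the producer's environment) can be illustrated with a minimal sketch. Everything here, from the key name to the field normalization, is a hypothetical stand-in, not Datavant's actual algorithm:

```python
import hashlib
import hmac

# Site-specific secret key -- a stand-in for the proprietary key material;
# in the real system this never leaves the producer's environment.
SITE_KEY = b"example-producer-secret"

def normalize(first: str, last: str, dob: str, gender: str) -> bytes:
    """Canonicalize PII so the same patient always yields the same input."""
    return "|".join(p.strip().lower() for p in (first, last, dob, gender)).encode()

def tokenize(first: str, last: str, dob: str, gender: str) -> str:
    """Deterministic, non-reversible token: a keyed hash of normalized PII."""
    return hmac.new(SITE_KEY, normalize(first, last, dob, gender),
                    hashlib.sha256).hexdigest()

# The same patient always produces the same token...
assert tokenize("Jane", "Doe", "1980-01-01", "F") == \
       tokenize(" jane", "DOE ", "1980-01-01", "f")
# ...so two datasets tokenized with the same key can be joined on the
# token column without either side revealing the underlying PII.
```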
Through our collaboration with Datavant, we have been working with healthcare organizations and life sciences companies to bring this tokenized data into AWS Clean Rooms, transforming how we access and analyze these valuable datasets. Datavant Connect, powered by AWS Clean Rooms, gives data consumers the ability to discover and evaluate fit-for-purpose, multi-modal data in a HIPAA-compliant, privacy-preserving manner. For data producers, this new discovery and evaluation method allows them to share their data with data consumers while maintaining complete control. Because no data actually moves between these environments, data producers retain the ability to share as much or as little of the data as they deem necessary.
For data consumers, multiple personas are able to discover and evaluate fit-for-purpose data with an easy-to-use front-end agent that acts as their own virtual assistant, helping to ensure that they find the right data. Data producers host their tokenized personally identifiable information in their AWS account. They register directly with the Datavant Connect platform by creating a Glue database and a Glue table with pointers to their S3 bucket and table schema. Consumers evaluating different datasets then use AWS Clean Rooms as a secure, neutral evaluation space, meaning that no data moves between these environments.
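As a rough illustration of that registration step, here is how a producer might create the Glue database and table with boto3. The bucket, database, and column names are placeholders, not the actual Datavant Connect schema:

```python
import boto3

glue = boto3.client("glue")

# Hypothetical names -- substitute your own bucket, database, and schema.
glue.create_database(DatabaseInput={"Name": "producer_rwd_catalog"})

glue.create_table(
    DatabaseName="producer_rwd_catalog",
    TableInput={
        "Name": "tokenized_claims",
        "TableType": "EXTERNAL_TABLE",
        "StorageDescriptor": {
            # Pointer to the producer's S3 data -- nothing is copied.
            "Location": "s3://producer-rwd-bucket/tokenized-claims/",
            "Columns": [
                {"Name": "patient_token", "Type": "string"},
                {"Name": "claim_date", "Type": "date"},
                {"Name": "diagnosis_code", "Type": "string"},
            ],
            "InputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
            },
        },
    },
)
```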
Researchers are able to evaluate multiple datasets simultaneously, drastically reducing evaluation time. Data producers don't have to manually update their information in the Datavant Connect platform because the live connection with AWS Clean Rooms ensures that their latest data is always available. Once the right data is found, Datavant will certify the data, and it can be seamlessly procured through AWS Data Exchange.
Accelerating Insights Through Data Harmonization and Agentic Capabilities
Pilot customers have been testing this solution and sharing how they are unlocking access to data that, once available, is accelerating timelines like never before. For the second part of this solution, shown here in blue, we have been working with partners such as Atheneum and their Activate platform, Manifold and their data layer, and Palantir with the IHD cloud to enable faster insights through modern, scalable data harmonization platforms, which allows us to support both technical and non-technical users.
Technical users, like biostatisticians and data scientists, will use open source tools like R and Python. Non-technical users can access agentic capabilities to build cohorts, applying inclusion and exclusion criteria when analyzing that longitudinal patient journey. Through our collaboration with Eli Lilly, we have developed this agent that allows non-technical users to analyze complex healthcare data sets using natural language, built on AWS's AI stack.
Our agent seamlessly connects to data sources across Redshift, S3, Athena, and Databricks. With data delivered directly to the S3 bucket with automated harmonization, technical users continue using R and Python, and non-technical users can access the agent to act as a persona-driven virtual assistant, making sense of the data and providing insights instantaneously, all within a responsible auditable framework.
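As one hedged example of that plumbing, here is roughly what querying one of those sources (Athena, in this case) looks like with boto3; the database name and output location are illustrative, not the actual deployment:

```python
import time

import boto3

athena = boto3.client("athena")

# Hypothetical database and output location -- the talk names Athena as
# one of the connected sources but doesn't show its configuration.
query_id = athena.start_query_execution(
    QueryString="SELECT COUNT(DISTINCT patient_token) AS n FROM claims",
    QueryExecutionContext={"Database": "rwd_harmonized"},
    ResultConfiguration={"OutputLocation": "s3://rwd-query-results/"},
)["QueryExecutionId"]

# Poll until the query finishes, then fetch the result rows.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```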
The streamlined approach eliminates the traditional bottlenecks while maintaining data security and data privacy. What used to take four months can now be done in under four weeks. As we continue to navigate the evolving field of real-world data and evidence, AWS continues to actively collaborate with multiple partners and customers to tackle the complex challenges of real-world data discovery and analysis. Companies such as Manifold, Deloitte, Palantir, Atheneum, EPA, and Atropos are all leveraging AWS infrastructure and incorporating AI agents to transform real-world data analysis.
It's important to remember that these solutions are not just about data processing. These solutions are about intelligent automated decision-making that is identifying patterns, generating evidence and insights that accelerate that path from data through evidence to discovery. But as we stated earlier, it all starts with a data foundation layer. Because even the most sophisticated AI in the world will only ever be as smart and effective as the data that it can access. So with that, I am really excited to welcome our next speaker to the stage who will share firsthand experience of putting this technology to work. It's my pleasure to introduce Greg Cunningham, Senior Director of Real World Data.
Eli Lilly's Journey: From Clinical Trials to Real-World Evidence
Thank you, Anne. It's great to be here. This is my first time at re:Invent. I work at Eli Lilly and Company, a pharmaceutical company based in Indianapolis, Indiana. Unlike my four sons, I've been at one company for 39 years, with the last 12 years in real-world evidence, responsible for our licensing of data and the data platforms that we use. Prior to that, I worked in our clinical trial area on new drugs that made it to the market.
Today I want to talk about a couple of things regarding the business need for a scientific company, but I think this pattern goes beyond just a scientific company. I believe any company can follow the approach that we have developed with AWS. In pharmaceutical companies, the gold standard for data and data foundation is clinical trial data. This data is very clean, follows very rigid standards, and is carefully maintained to ensure high quality.
When I first started at Lilly, we conducted many phase 4 studies after drug approval. These studies helped us understand what was happening with patients after the drug was approved because there was really no other way to know at that time. However, healthcare changed with medical records transitioning from paper to electronic. If you go to a doctor's office today, they are not writing much on paper. Instead, they are entering most information directly into tablets or computers. We found a new source of healthcare research data that allowed us to stop doing costly and slow clinical trials and instead do more real-time work by ingesting data like this.
But first, what is real world evidence? I'll use the FDA's definition: it is clinical evidence about the use and the potential benefits and risks of a product, derived solely from electronic healthcare sources in the healthcare system we all use, in the US and in other countries as well. At Lilly, we have developed our own formula to get this evidence. You see to the right there are three key ingredients. First, you have to have a scientific question or a hypothesis you want to test. The foundational thing is the data, getting the right data. Our group spends time helping people get the data that has the best quality for their question, but also looking at what variables they need and ensuring those variables are present in the datasets they use. Finally, you need a statistical analysis plan and then the execution and the analytics.
Real world evidence is increasingly vital and important to life science companies because we want to have actionable real-time insights into what is happening to our drug now that it has left the clinical trial area and is being used out in the real world. The results are not always the same, and there are sometimes surprises regarding what patients you are getting in the market. Real world evidence is used across the whole product life cycle that we have. When I look at the uses, we use it in our early phase discovery as we are trying to discover new molecules. We use it in our clinical trials and we use it post-launch as I mentioned earlier.
For the clinical trial examples I can give, we have protocols that we are always trying to improve to make them easier for the physician to conduct the trial. In doing that, we use real world evidence to identify the best inclusion and exclusion criteria we can have.
In the health outcomes function that I am part of, we conduct observational studies. These studies are designed to demonstrate the economic value of our products and their comparative effectiveness. We examine one of our drugs versus competitors, and while we know the clinical trial results, we also publish papers to show how the drug performs in the real world compared to clinical expectations. Our end goal with that exercise is to publish in scientific journals, which is one of the high priority things we do in our area. After launching in the market, we also use observational studies to look at patient characteristics—who we thought we would reach in our marketing plans versus who actually takes our drug and how it performs.
Real-World Data Types and the Data Acquisition Process at Eli Lilly
There are a variety of real-world data types, and I'll give you some perspectives on how we see and use data today. Over the last ten to fifteen years, we've used the foundational data you see on the screen. Electronic health records have been the tried and true data for real-world data as they became electronic. They contain great clinical variables for people to use to understand patient information, and lab data is usually included. The data we use most in health outcomes is claims and billing data from insurance because it gives you every healthcare transaction someone had over a long period of time. Since most of us stay with the same insurer over time, it provides a very nice long-term view.
The foundational data is typically broad and covers all diseases because these are large databases used predominantly for that purpose. The most exciting development in the last few years is this emerging data. Our goal is to have data that is truly disease-specific and tells you whether the disease is getting worse or getting better. Wearables, sensors, and health apps are really critical for this. An example of how pharma companies use an app is migraine studies where migraine patients use the app and enter information every time they have a migraine, including the severity. This gives us the ability to see what drug they were on and determine which drug is actually reducing the severity and frequency of migraines.
The merging of broad data and this deep, rich data is extremely helpful. As the human genome continues to evolve, biomarkers and genomic data are critical. Our oncology team uses a lot of this data with our real-world data to match up and see which biomarkers are really making an impact with our drugs and whether they are modifying or treating the disease. The token allows us to take this broad longitudinal data and match it up with deep and rich data from the disease, enabling us to do much richer analysis that helps us understand the patient journey and how we can help impact their lives.
The acquiring and ingesting of real-world data is a process that we're starting to see Datavant and AWS really advance. Our process is not much different from what other pharma companies do. We spend time determining the scientific question and identifying the need that someone has. Then our real-world data team evaluates what data will best meet that need. We compare data already purchased and licensed internally against the external market, which is changing constantly. We spend considerable time on those first two steps—I've seen it take months, and I've actually seen it take up to a year at times. We're trying to figure out how to do this better and faster. Once we've made the decision, we transfer and ingest the data into a Redshift database for our analysis.
We're really excited about the AWS Datavant solution they're looking to put in place because of what they're enabling. I've seen this process take a couple of months, and if we can get it down to a month or less, it allows us to use more current data and real-time data faster. We're really excited with where the technology is going in this space.
Democratizing Data Access: From Complex Coding to No-Code Environments
I've talked about real-world evidence and real-world data to set the context of the challenges we're facing in generating evidence from real-world data. Having been in this space for more than a decade, I've seen us try to hire analysts, statisticians, and data scientists. We've made huge leaps, but we're still dependent on analytic professionals, those data scientists and statisticians. Our end users, the scientists, count on these folks for publications and regulated work, so for us it's the core. It's the most important work we do.
We have trouble just staffing that position with all the improvements we've done and all the changes we've made. That is really keeping us busy, and we've added more and more people, but it's not solving the problem. The second challenge we've had is a significant learning curve. This data is not collected for clinical trials. It's collected for insurance, for doctor's notes, for billing. It's not meant for this purpose, so it's not as clean and accurate. With that and the completeness issues, this data is much more difficult. When new analysts or statisticians join, it takes up to six months for them to be really productive in this space.
The third issue is that real-world data does not have clear standards. There are several floating around, but not one of them has taken over and made it easy. This hinders the ability to make tools or write standard code that can be used and leveraged. So to expand on this, look at the first two columns you see. Our original users were doing complex coding using the AWS cloud platform. We've done a lot of great work in this space. We then said we've got to do something different because we can't get enough work done.
We took some of our scientists who actually did coding in college and did their own analysis. We let them use an analytic tool that's on the market. It's a low-code environment where they're able to enter parameters, get output themselves, and answer questions themselves. The left column covers those publications and regulated activities; this middle column lets scientists generate their own answers. But we found that not all the scientists have the same background, and it's not easy for them all to use a tool.
So we worked with Amazon to say how can we change the game? How can we have a no-code environment where people can enter a text question to get results running across our data? We've been working on this for over a year, and it has come to fruition. We're going live next week. Those scientist users are able to get questions answered themselves. They can even do it in a meeting where someone on the clinical team asks, "How many patients? What's the percent of patients that have disease X? Or what's the most common drug in this space?" These are questions that we don't have time to give to a statistician. It may take a week or longer for them to do it, if they can get to it at all. This is a way to let self-service take over.
I would say that the biggest thing we've learned through doing this is that the agents, the LLMs, and all those things do great work, but they need context. They need business context that's in my head and in our analysts' and statisticians' heads. Putting steps that run before the agents, so that we can inject that context and allow the agent to write better code, is crucial. The second thing we learned in the testing was that the more general your questions are, the more assumptions your agent is going to make because it's trying to figure out what you're saying. So we've told our users to be very specific. If you want data from the prescription claims, say that in your question. Don't leave it to be assumed.
This has produced greater accuracy. The last thing I would say is that when we had people ask questions, I thought you would just put in a big block of questions, it would spit out the answers, and you would move on. But it's a dialogue. You have to ask a question, then ask another question, and another question. The answers and the accuracy of those answers have gotten better because of that iterative process.
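One plausible way to implement the "context that runs before the agents" that Greg describes is to assemble curated business rules into the system prompt ahead of every model call. This sketch is illustrative; the rules and function names are invented, not Lilly's actual configuration:

```python
# Illustrative only: curated business rules assembled ahead of the agent
# call, so the model doesn't have to guess the context that lives in the
# analysts' and statisticians' heads.
BUSINESS_RULES = """\
- "Patients" means distinct patient tokens, not claim rows.
- Prescription questions default to pharmacy claims unless the user says
  otherwise; ask rather than assume the source.
- Require at least 12 months of continuous enrollment for prevalence
  questions on claims data.
"""

def build_system_prompt(dataset_notes: str) -> str:
    """Prepend business context to the agent's system prompt."""
    return (
        "You are an analyst working over de-identified real-world "
        "healthcare data.\n\n"
        f"Business rules:\n{BUSINESS_RULES}\n"
        f"Dataset notes:\n{dataset_notes}\n\n"
        "If a question is ambiguous (e.g., which claims source), ask the "
        "user to be specific rather than assuming."
    )
```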
Scientists by nature are very skeptical, so we're worried about releasing a tool that people who are very skeptical will say, "Well, is this answer right?" I've had that question four times already from our scientists. We're going to do some tests of real questions that they ask as we go live to give them a little more sense that yes, this is on the right path. The SQL code being written has actually been very good and it's improved at every step of the way so far. We're very excited to go live with this next week.
We did a pilot group, and with the pilot group, our intent was to take people who are more tech savvy, people who are into this, and they're going to be the soft launch that we go with. Then we'll do a hard launch with people who may be less technical, and those people who sit with them can work with them more closely. I think that's been some of the learnings we've seen so far. I will finish here, and I'm very excited to have Eric Brooks join us and give his firsthand experience of implementing what we've just discussed. Eric is a Principal Solution Architect from Amazon, and he has worked with us over the last year and a half to bring this to life. Please give Eric a warm welcome.
Architecting Multi-Agent Systems for Complex Real-World Data Analysis
Thanks, Greg, I appreciate it. What we've seen so far is this journey from data acquisition: we're buying the data from various providers and loading it into Redshift or S3. We were governing access with permissions through Redshift and SageMaker Lakehouse, and we've gotten to the point now where we've made data available. But we have to tackle this complex, multimodal, disparate dataset landscape because, as Greg described, what we're not talking about here is a simple sales dataset. This is not data that can simply be queried by a text-to-SQL agent.
What we're also not talking about is a simple text-to-SQL use case. One of the most common questions we get is how this type of solution differs from a common text-to-SQL approach. We've seen this a million times already—nothing new. As we explored this use case with Greg and his team, what we found is that it is very well-suited to an agentic approach. But as we try to move through this process of taking this mountain of data (one data provider delivers something like 15 terabytes of new data on a monthly basis), how do we tackle that kind of evolving landscape, where data might be siloed across different services, different data modalities, and oftentimes different data schemas?
We're working with a research team and ultimately a consumer team who aren't SAS coders, R coders, or Python developers, and they're not deeply knowledgeable about this use case. They don't know that things like continuous enrollment matter when it comes to claims data from a quality perspective. What we're trying to tackle is how we build a system that uses the most recent frontier models from Anthropic and Amazon to democratize access to this data and ensure, as Greg mentioned, that it produces the most accurate and highest quality outputs, because these are critical things that people are interested in at Eli Lilly.
We have to decide how we're going to architect that system, and that's what I'm here to talk about today. I always think it's good to start with a really solid mental model. As we think about agents, there's a lot of talk about agents in the market. We heard a lot of updates this morning about AgentCore and about all the capabilities within AWS to build agents, but I always like to start with a good mental model so that I can understand the system I'm trying to build. The thing I always start with is the goal—what's the user experience I'm trying to provide? Am I just trying to provide SQL queries, or am I trying to build a system which outputs real analysis that has rich insights, solid visualizations, and a high-speed, high-quality, high-performance experience?
So what am I working backwards from? At Amazon, we like to talk about working backwards, and that's a very important thing to define: what's the goal, and what's the goal of the agent? Once I understand the goal, I can think about what tools I'm going to use as a human. I have lots of tools available to me: a Python notebook, a SAS console, my email client. Anything I use that lives within the technology realm is a tool of some kind. I have a tool in my hand right here. It doesn't do that much, but it's pretty important to the job I'm doing.
So what are the tools that I need? If I were a human doing this process or this job, what tools would I need to get the job done? And probably more importantly, what tools do I need to develop knowledge in the moment, when I've been asked a question, to be able to answer it? That's a really important thing to know. Then I have to start to define the protocol. What are the steps I have to take to get to the thing I'm working towards? The goal and the protocol are really what will help you work towards the prompt you're going to hand to the agent so it can do its job, and we'll talk about this in a few slides.
Then finally, if I'm thinking about a multi-agent system, I also have to start thinking about the handoffs. What I don't want to do is say I've got six, seven, or eight agents in a system and hand every agent all the context of all the conversations and all the things that have come before, because that's not a successful architecture approach. So what are the handoffs? If I have a team of people working on a problem, I'm not just going to do a data dump of all the things that have come before. I'm going to say here's what you need, here's what I'm asking of you, and here's maybe some guidance on perhaps how to do the job. Please use your tools, come back to me, and that's very much the mental model for how to start to think about and align multi-agent teams to accomplish a task.
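A minimal sketch of that handoff idea: a scoped payload carrying the task, the minimum context, and some guidance, rather than the full conversation history. The structure is illustrative, not a specific framework's API:

```python
from dataclasses import dataclass, field

@dataclass
class Handoff:
    """What one agent passes to another: the task, the minimum context
    needed to perform it, and optional guidance -- never a dump of the
    whole conversation."""
    task: str
    context: dict = field(default_factory=dict)
    guidance: str = ""

# The supervisor hands the medical coder only what it needs:
to_coder = Handoff(
    task="Resolve 'hypertension' and 'drug X' to code lists",
    context={"entities": ["hypertension", "drug X"]},
    guidance="Return ICD-10 diagnosis codes and NDC drug codes as JSON.",
)
```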
There are critical decisions we have to make along the way. The first one is what type of system is going to fit my use case most effectively. Can I build it with a single agent? Can I simply have a text-to-SQL agent which uses the myriad tools and capabilities that I need, that has a prompt and additional context, and that's going to be able to answer the total set of questions I think I'll have to answer? Or do I want to think about this more in the sense of specialized skills, capabilities, knowledge sets, and context that can break a problem down and answer it effectively?
One of the biggest learnings for me in working with Greg and his team early on was that in the context of real-world data analysis, getting to that real-world evidence outcome, there are lots of different personas and skills involved in this process. It's not just a text-to-SQL agent that looks at the data, says here's your SQL statement, and you move on. That's not the problem we're trying to solve here at all. And I think, as Greg mentioned, in a lot of cases, even if you're not looking at real-world evidence or real-world data, you may find that some of these similar aspects are also part of your use case.
It's not just about the dataset. There's adjacent data, there's context that you need to be able to answer complex questions to really build a system that scales and that generalizes to a larger problem set. So as we look at the single agent versus multi-agent comparison, we have this trade-off. It's the classic architecture trade-off: complexity for capability. In the multi-agent system as we did in this case, what we ended up with is this more complex system which was orchestrated by an underlying framework which enabled us to define agents which encapsulated the key capabilities and functionalities that were necessary to scale to meet the needs of the team at Eli Lilly.
As we look at a multi-agent system, we start to realize that maybe we need a manager of sorts. If we have a larger team (we like to talk about two-pizza teams at Amazon), usually that team has a manager. That manager is sort of the decider of what the business wants, say if you have a developer team. So that manager, in this case the supervisor, is an agent that works with the user. It's kind of the front door, so the agent is going to identify what the user is looking for, do some entity recognition and summarization, and then it's going to work with its partners: the SQL expert, a medical coder. These are some of the aspects of this use case that are really important. There's even a research planner, an agent familiar with a general methodology for answering questions in this context, as well as the data navigator. One of the things we found very quickly, because there's no consistent standard for most of these datasets, as Greg mentioned, and because we don't have the convenience of ETLing all this data into the same format and schema, is that we need to meet the data where it is. We need to be able to navigate multiple datasets to understand very quickly where to find the answer to a question in a dataset which has tens of tables. Then finally, we need to be able to actually interact with that dataset.
So how do we interact with the landscape? It's not just the underlying data. One of the biggest learnings for me was that there are many code sets, for example, as reference data that are critical to translating a human question into what we actually need to know. It's not necessarily in the data set, the thing that we're trying to find, but I need to take a couple of stops along the way to be able to get the right context.
As we started to think more about this problem, this is where module 3 from what Anne was talking about earlier really came out. What we landed on was seven specialized agents. The first, of course, is the assistant, which is the supervisor that chats with the user. The next one is the SQL creator. Of course, we can't do this without interacting with a structured data set, so we need to write some SQL. Then we need a medical coder. If I ask a question about a diagnosis or some medication or some lab procedure or other things, or maybe I ask about all three, I need an expert in medical coding to make sure that I can translate that human question into the real question.
We also need to create visualizations. We need to make a research plan potentially for more complex questions. We may even need to consult some publications. And then finally, we need to be able to navigate that data and do so in a way which integrates with that complex harmonized data set that you see in module 2. It's always good to zoom in on one of these agents to really see what it's doing.
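One way to picture those seven agents is as a simple registry of roles that the supervisor can route to. This is purely illustrative; the actual system is wired together with an orchestration framework, and the role descriptions here are paraphrased from the talk:

```python
# The seven roles described in the talk, paraphrased as a routing table a
# supervisor could consult; the orchestration framework itself is omitted.
AGENTS = {
    "assistant":        "Supervisor: chats with the user, plans, routes, summarizes.",
    "sql_creator":      "Writes, validates, and repairs SQL against the warehouse.",
    "medical_coder":    "Maps clinical terms to ICD-10 / NDC / lab code lists.",
    "visualizer":       "Turns result sets into charts suited to the question.",
    "research_planner": "Drafts a methodology for more complex questions.",
    "publication_consultant": "Consults publications for methods and context.",
    "data_navigator":   "Locates where an answer lives across datasets and schemas.",
}
```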
The neat thing about an agent is it works very much like my brain does, like your brain probably does. You say I've got a problem and I want to solve it, but I have to iterate over it. I have to think about the problem and understand the problem before I can go and solve it, and that's exactly what an agent does. It has this iterative execution loop, a cyclical node on a graph in this case that takes a number of inputs. It starts with the prompt and the reasoning loop, this notion of iterating over a problem until it's solved and really until the agent decides it's solved, which is even more interesting.
The agent has to perform some kind of structured reasoning. This is something that's built into LLMs, so you hear about tool calling, function calling, and instruction following. These are all aspects of what makes LLMs the reasoning engine behind this process. Then I need to be able to do things like, in this case, since my task is to plan and execute SQL queries and provide analysis, plan the query structure and potentially have examples. I've got the graph state, which is all the things that happened before. Have there been other actions or other agents? Has the medical coding agent already deciphered all that complex human language into a set of codes? Have there been queries run before? Has somebody asked a question and then asked a follow-up question, as in the process Greg described?
But there's something missing here, because this isn't enough to accomplish the task. We need a set of curated tools to make this agent able to do the things it's doing, just as a human would use a query reference database or dataset metadata. For example, I might use a data dictionary, or I might go into the database and look at the actual table DDL. I'm going to be looking through table and column names, foreign key relationships, data types, and constraints as I'm constructing a query, and that's exactly what an agent has to do. It has to have the context to be able to solve the problem in front of it.
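A minimal sketch of that iterative execution loop, assuming a generic `call_model` function and a dictionary of tools (both stand-ins for your model and framework of choice):

```python
def run_agent(prompt: str, tools: dict, call_model, max_steps: int = 10):
    """Minimal sketch of the cyclical execution loop. `call_model` and the
    reply contract ({"type", "content", "tool_name", "tool_args"}) are
    stand-ins for whatever model/framework you use."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        reply = call_model(messages, tools=list(tools))  # reasoning step
        if reply["type"] == "final":       # the agent decides it's done
            return reply["content"]
        tool = tools[reply["tool_name"]]   # structured tool call
        observation = tool(**reply["tool_args"])
        messages.append({"role": "tool", "content": str(observation)})
    raise RuntimeError("Agent did not converge within max_steps")
```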
By the way, this is something that applies to any agent. Remember back to our mental model: this is about defining the tools that enable the agent to accomplish the task. But then there's more to it. There are few-shot prompts, so we've got those as part of the tool set. An interesting part about this is that it's a good way to integrate the human in the loop. What we found here was that a great set of baseline ground-truth queries that answer related questions is a really good way to encourage an LLM to make the right decisions when it comes to query generation. Taking those golden queries, as we've stated here, and making them available to the agent on demand through a tool, via semantic search (a similarity or similar-meaning search), is a great way to encourage and improve outputs.
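A hedged sketch of that golden-query tool: human-approved question/SQL pairs retrieved by embedding similarity and handed to the SQL agent as few-shot examples. The `embed` function is a stand-in for an embeddings model, such as one called through Bedrock:

```python
import numpy as np

# Human-approved ground truth: question/SQL pairs vetted by an expert.
GOLDEN_QUERIES = [
    {"question": "Count patients with type 2 diabetes in 2024",
     "sql": "SELECT COUNT(DISTINCT patient_token) FROM diagnoses WHERE ..."},
    # ... more approved examples ...
]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_golden(question: str, embed, k: int = 3):
    """Tool: return the k approved pairs most similar in meaning to the
    user's question, for injection as few-shot examples."""
    q = embed(question)
    ranked = sorted(GOLDEN_QUERIES,
                    key=lambda g: cosine(q, embed(g["question"])),
                    reverse=True)
    return ranked[:k]
```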
SQL validation is a great way to encourage and improve outputs. We have a very quick way to validate those queries, and then the ability to actually execute and manage the outputs of those queries is really important. What that leads us to is this idea of being able to not only generate multiple queries, execute them, troubleshoot them, and gather information and context around them, but then take that and actually turn it into analysis. That's all part of defining the role, defining the tools, defining the protocol, and then defining the handoff that results in a set of agents which can answer a complex set of questions through this iterative cyclical execution loop with context gathering and eventual analysis.
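The talk doesn't name a specific validation mechanism; one cheap approach is to parse the generated SQL before execution, for example with the open-source sqlglot library, and feed any error back into the repair loop:

```python
import sqlglot  # open-source SQL parser; one possible choice, not named in the talk

def validate_sql(sql: str, dialect: str = "redshift") -> tuple[bool, str]:
    """Tool: cheap syntax validation before execution. A semantic check
    (e.g., EXPLAIN against the warehouse) would follow on success."""
    try:
        sqlglot.parse_one(sql, read=dialect)
    except sqlglot.errors.ParseError as err:
        return False, f"Syntax error: {err}"  # fed back into the repair loop
    return True, "Parsed OK"
```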
Practical Implementation: Lessons Learned and AWS Life Sciences Accelerators
Let's talk about how this actually works. We have a question here: What is the age distribution of patients diagnosed with hypertension who are prescribed drug X in the last 12 months? There are about four or five different attributes depending on how you look at that question. How do we break that down? Well, it starts with the supervisor, of course, that's the entry point to this graph. The supervisor is going to look at that question and say, I need to come up with a plan. I see that there's demographic information, there's a diagnosis in there, there's a drug in there, and there's a time frame in there, so I have a bunch of things I need to figure out.
The first thing I'm going to do is stop at the medical coding specialist. I need to figure out the coding for hypertension. Is it one code? Is it many? What's the coding for the drug I'm looking for? And then, what additional information might be of interest? Are there lab codes that might represent that drug? That's one we ran into that was unexpected. Once we've stopped at the medical coding agent, that's great. Now we need to figure out where this is in the dataset. So we stop at the data navigator agent, and each one of these stops goes through this iterative loop of understanding the task that's given to it and providing a high quality output for the rest of the system to use.
Then we stop, of course, at the SQL expert agent, which we just talked about, and then finally we land on the visualization creator, because the supervisor decided this would be a great use case to generate some visualizations for. Not only do we get a high quality output that iteratively follows what feels like a very human logical process, but we've also been able to provide some high quality visualizations based on the context of the question that was asked.
As we built this system, we learned some pretty valuable lessons. The first one was that a great thing about multi-agent systems is that the agents can be evaluated individually. It's really important, as you start to think through the problem set you're addressing and you start to build individual agents that perform these encapsulated, almost atomic skills, to make sure that you validate them individually. Do they do the thing you think they're supposed to do, given the problem set you defined when you thought about the original problem?
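As an illustration of per-agent evaluation, the medical coder can be tested in isolation against terms with known code sets (ICD-10 I10 is essential hypertension; E11.9 is type 2 diabetes without complications). The harness below is a sketch, with `code_lookup` standing in for the agent under test:

```python
# Per-agent evaluation: run the medical coder alone against terms with
# known answers. `code_lookup` stands in for the agent under test.
TEST_CASES = [
    {"term": "hypertension", "expected": {"I10"}},       # essential hypertension
    {"term": "type 2 diabetes", "expected": {"E11.9"}},  # without complications
]

def evaluate_coder(code_lookup) -> float:
    hits = sum(case["expected"] <= set(code_lookup(case["term"]))
               for case in TEST_CASES)
    return hits / len(TEST_CASES)  # recall-style score for this one agent
```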
The second lesson is to deep dive on guardrails. One of the things we found very quickly was that the complexity and nuance in the language of the questions people ask in this space is really important to get exactly right, because the criticality of answering, say, a safety-related question or not is paramount. Being able to understand some of the nuances, and deep diving with the subject matter experts on exactly what they do or do not want to allow from an inputs and outputs perspective, is critical. What's in the underlying data? Are there de-identification concerns or data privacy issues? Deep diving on guardrails is really important.
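On AWS, one way to implement those input and output guardrails (the talk doesn't specify the mechanism) is the Bedrock ApplyGuardrail API; the guardrail ID and version below are placeholders for one you have configured:

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

def input_allowed(question: str) -> bool:
    """Screen a user question with a pre-configured Bedrock guardrail.
    The guardrail identifier and version are placeholders."""
    resp = bedrock.apply_guardrail(
        guardrailIdentifier="YOUR_GUARDRAIL_ID",
        guardrailVersion="1",
        source="INPUT",
        content=[{"text": {"text": question}}],
    )
    return resp["action"] != "GUARDRAIL_INTERVENED"
```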
The next lesson is to perform system level validation. Just because you've validated the individual agents doesn't mean that when you put them all together the system is going to do what you expect. Make sure that you come up with a good test set of questions that you can run through on a regular basis as you move towards production and as you change language models. Claude Sonnet 4.5 was just released a month ago or so, maybe not even, and the pace of model change is going to continue to increase. As new models come up, it's very important to be able to switch quickly to new models, which means validation, which means testing, which means automation, and these are all really key things to build into your original system.
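A sketch of what such a regression harness might look like, where `ask` stands in for the full multi-agent pipeline and the checks are placeholders for real baseline comparisons:

```python
# System-level validation: one question set re-run end-to-end, e.g. before
# switching to a newly released model. `ask` is the full pipeline.
REGRESSION_SET = [
    {"question": "How many patients were diagnosed with hypertension in 2024?",
     "check": lambda answer: "patient" in answer.lower()},
    # ... real checks would compare counts against approved baselines ...
]

def run_regression(ask) -> list[str]:
    failures = [case["question"] for case in REGRESSION_SET
                if not case["check"](ask(case["question"]))]
    return failures  # gate promotion to production on an empty list
```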
Including humans in the loop is also very important for improving the quality of the outputs based on the quality of the inputs. This is almost a garbage in, garbage out situation. Including a human in the loop in a very passive way to provide approval of golden queries and human feedback on the quality of that ground truth reference for the agents is really important. And then finally, develop and track metrics. Human feedback from experience, quality of outputs, and even just latency of query execution are all important things to monitor.
As you start to instrument your system with observability capabilities like an agent ops tool, you really have the opportunity to continually and iteratively improve the solution that you're building by developing and tracking those metrics early on. From an overall standpoint as you think about building agents, the key aspect here is taking that raw data into actionable insights. As you can see, that's a multi-step process. A person cannot do this in an instant, and neither can an agent system, but the key is to build yourself that research team to accomplish the task you're going for.
The second aspect is really key, and Greg touched on this: the context aspect. It's about architecting a set of tools and an overall system architecture which results in the ability to capture the right tools that are going to curate that context to each agent in the right place at the right time. One thing I'll mention on that past one is model choice. A lot of attention is paid to model choice these days, and it's very important, but increasingly so as the models continue to evolve, context becomes a critical aspect of high quality, high performance agent execution.
The next aspect is testing and observability. A lot of this has to do with rigor around application or solution development, and also with picking the right framework, like a Strands agent running on AgentCore, to be able to quickly and easily instrument your system with the observability capabilities you need to move from development all the way to production. Building an extensible ecosystem is one of the great things about multi-agent systems. You can extend them and add more capabilities as your problem set evolves. You can build reusable agents for different things. You can imagine that the medical coding agent can be used for a lot of things. It's not exclusive to this use case, so being able to build individually capable agents and then continuing to add more agents is important.
Agent-to-agent communication is now becoming very popular as a protocol for this exact reason. That's a really important thing to be thinking about: building that ecosystem of agents so that they can plug together is critical. Safer and auditable workflows are essential. In this case, embedding compliance into the reasoning process and being able to manage things like citations as part of the output is really important to build trust in consumers as they interact with an AI system because they don't see all of the reasoning. They need to know how the agent arrived at the output.
Human plus agent collaboration is really important. Greg touched on it with regard to that interaction process of question and answer, question and answer dialogue, but it's also really important, as I mentioned earlier, for a human to be involved in a less direct capacity to provide that ground truth. Subject matter expertise is critical for both this and probably many of the use cases that you might be thinking about today. Finally, of course, we always want you to build on AWS. The agent services continue to evolve on the AWS platform to an incredible set of capabilities that enable you to move forward with your development to production process.
With that in mind, we will be at Sushi Samba tomorrow from 12:45 to 2:45, so please come and see us. We'll be doing a demo of this actual solution on a real world data set. Please come join us, and hopefully we'll see you at Sushi Samba tomorrow. The life sciences team at AWS has also built an incredible set of accelerators in addition to the RWD agent. This includes things like competitive intelligence and what we call clinical supply chain control tower. The intent of these solutions is to show how you can get from idea to production in the fastest way possible.
You can take that native build process, which I know as builders you like to do, but then also you can use the AWS open-source toolkit for life sciences as a next step. Finally, with the accelerators that we're building and iterating on with customers like yourselves, we're able to get there even faster by doing 40, 50, or 60 percent of the work already, so we can hand you something and partner with you to move forward to production. I'm really excited to hopefully see a lot of you at Sushi Samba tomorrow to talk about real world data as well as the other accelerators that we have.
If you're interested in more information, grab the QR code here to learn more about the agentic accelerators. Also, in the massive expo hall, please come and see us at the Industries pavilion. The life sciences team from AWS will be there today and for the rest of the week, and we'll be excited to see you as you come out. Thank you for your time today. Hopefully you learned something from this session. Please provide feedback on the session content with the survey, and I hope everybody has a great rest of their day. Thank you.
; This article is entirely auto-generated using Amazon Bedrock.