Kazuya

AWS re:Invent 2025 - Next-Generation Data Management — Insights at Scale with Agentic AI in Pharma

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - Next-Generation Data Management — Insights at Scale with Agentic AI in Pharma

In this video, a ZS representative discusses agentic AI transformation in life sciences data management. The speaker emphasizes that 9 out of 10 CIOs are shifting from POCs to full-scale implementation, focusing on value creation rather than just automation. Two core paradigms are introduced: "AI for data" (achieving 40% efficiency gains in data engineering) and "data for AI" (creating comprehensive metadata lakes to improve accuracy from 70% to 98%). Key use cases include automated analytics workflows, clinical document generation, and software development lifecycle optimization with up to 75% efficiency gains in testing. The speaker stresses that successful transformation requires rethinking processes before automation, robust infrastructure planning, cross-functional collaboration between business and IT teams, and rich business context as the differentiator in agentic AI implementations.


This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Thumbnail 20

Life Sciences CIOs Shift from Experimentation to Transformation: The Imperative for Change in Agentic AI

Hey everyone, nice to meet you all. Today, we're going to discuss data management and use cases for agentic AI in data management as well as in life sciences. Before we dive into the data management specifics, I want to talk about what our CIOs are saying in life sciences. What do they want to know about, and what are their priorities when it comes to agentic AI? You probably won't be surprised to hear that their number one priority is to see value from agentic AI. Historically, over the last year or two, we've been doing a lot of pilots and POCs, but now the question is how do we enable transformation? How do we use these new capabilities that we've discovered to get value and insight at the end of the agentic transformation?

You'll see two paths. I can have a lot of quick wins, or I can really rethink my entire transformation. Across all of the CIOs that we've asked, nine out of ten are now shifting to the mindset that it is time for change. How do I re-envision and reimagine my processes in this dawn of agentic AI? I can take what I have now and just automate it as it exists, or I can start to rethink my processes so that they can really be optimized by agents. AI is demanding this change, so these are the themes that our CIOs across pharma are starting to ask about now. How do I start bridging the gap between experimentation and actual execution and implementation? That's the crux of what life sciences CIOs want to see.

Thumbnail 100

So how do we do that? How do we start thinking about that change? Before I go there, just to reiterate, this is a survey from research that ZS had done with life sciences CIOs. We have a council where we get a lot of insight into where you want to go and what you want to see. As you can see, nine out of ten want the pace of digital, AI, and tech innovation to grow and scale. What does that mean to us as technologists, as people focusing on insights in agentic AI? That means we cannot continue to think about technology transformations or AI transformations the way we have before. We have to think about them in terms of what is the end value that I want? What are the people and process changes that I need so that we can get to that true transformation?

Thumbnail 160

You can see nine out of ten are asking us to shift that mindset. A lot of the time at the senior levels, we've seen this mindset shift, but how do we permeate that throughout the entire organization? And what does that mean for us in data management? There are two core paradigms from a data management standpoint that we like to think about. At ZS, we define it as AI for data and data for AI. What is our goal? What is the goal of data management? The goal of data management is to enable accurate, high-quality insights. Sure, there might be process optimization down the line that you use your data for, and there may be other operational capabilities that we need to enable, but at its core, the reason we are doing data management is to enable insights.

When we talk about data for AI, one thing that we've all probably learned very quickly is that it's easy to get inaccurate results using generative AI. When we first started doing a lot of this work, the best accuracy we could get by using just an off-the-shelf LLM for conversation was around seventy percent. Nobody wants to go to their CIO or their executive leaders with seventy percent accuracy, so I'll talk a bit about how we prepare data for AI in order to ensure accurate insights and high-quality results.

The second paradigm is AI for data, and even within that, there are two components. First is AI for data engineering. How do I optimize my software engineering lifecycle to be more efficient and effective? We've seen you can get up to forty percent efficiency gains in that. Think about it from requirements all the way through to deployment. How do I maximize my efficiency in data management? The next piece is once I've created that data, once I've created my data products, how do I then operate this? Think about data quality, data governance, access management policies, security, all this data and metadata that I've created. How do I sustain that in an operating model?

Thumbnail 290

We've all talked about metadata governance and discoverability to date. Why hasn't it worked? Because it takes a lot of effort to curate and create that metadata. So how do we sustain that? We believe we can use AI not only to create the data but also to maintain and sustain the data. I'll talk through a bit more of these paradigms and how we've seen them permeating across the industry.

Alright, I spoke a bit about this, but there are three areas where we can really transform. Number one is reimagining data consumption. What does that mean? I've now curated and created my data such that it's effective for AI. What agentic AI is allowing us to do is rethink human-agent and human-machine interaction so that we can get and consume insights in a different way.

How do I transform my analytic workflows so that I no longer just have to receive an insight? How do I interact with my insights? How do I interact and prioritize those insights to get the most valuable one? Agentic AI is really allowing us to completely transform the way we consume insights today and produce insights today.

Why can we redefine data engineering now? Most people started their agentic transformations in other use cases about a year ago. But for data management and data engineering, it wasn't really working. We were writing SQL that might not have been the best. We were writing code but weren't producing the right things, and it was because we lacked the right context. Agents and AI require a high level of context to be accurate, even for code and engineering development. But this is one area where, given where we are today and the context we can provide, we can truly accelerate the data engineering lifecycle by using agents.
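To make that missing-context point concrete, here is a minimal sketch of how curated context might be assembled into a text-to-SQL prompt. The `rx_sales` schema, the business rules, and the `call_llm` stub are all hypothetical illustrations, not ZS's or Amazon's implementation.

```python
# Minimal sketch of context-grounded text-to-SQL. All names are hypothetical.
from textwrap import dedent

# Context an agent needs beyond raw DDL: descriptions, join conditions,
# and codified business rules.
SCHEMA_CONTEXT = dedent("""
    Table rx_sales: weekly prescription sales per product and territory.
      Columns: product_id, territory_id, week_start (Monday), trx (total Rx).
    Table territories: territory_id, region, field_rep_id.
    Join: rx_sales.territory_id = territories.territory_id
    Business rule: "sales" for field reps means trx counts, not revenue.
    Business rule: quarters follow the fiscal calendar starting Feb 1.
""")

def build_sql_prompt(question: str) -> str:
    """Combine the user's question with the curated metadata context."""
    return (
        "You are a SQL generator. Use ONLY the tables, joins, and "
        "business rules below.\n\n"
        f"{SCHEMA_CONTEXT}\n"
        f"Question: {question}\n"
        "Return a single ANSI SQL query."
    )

def call_llm(prompt: str) -> str:
    """Placeholder for an actual model call (e.g., via Amazon Bedrock)."""
    raise NotImplementedError

if __name__ == "__main__":
    print(build_sql_prompt("How much did sales drop last week by region?"))
```

Without the schema descriptions and business rules, the model is left to guess at joins and definitions, which is exactly the failure mode described above.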

Yesterday, Amazon launched a few of their frontier agents covering DevOps, operations, and development that can be used for redefining this engineering lifecycle. How do we start looking at some of those frontier agents and creating our own frontier agents to maximize the scaling of the engineering lifecycle? Next, consider data governance. How do we curate the metadata that we need to drive this accuracy by leveraging AI? How do we have adaptive governance and self-healing data quality? How do I automate the retrieval of information and enhance transparency?

We have all talked about discoverability and verification for years, but it hasn't been successful. We are all still striving toward this verification, but how do we now use agents to ensure that we can maximize it? We finally have the motivation and incentive to create the metadata we needed for that verification because we can now use it not only for discovery and not only for quality, but for insights. We have the incentives lined up and we have the tools and capabilities to do it now. So the time is now for us to really focus on data management innovation.

Thumbnail 440

Data for AI: Building a Metadata Lake to Achieve Near-Perfect Accuracy

What does the first paradigm, data for AI, mean? We have all probably heard about data products and analytic-ready datasets, but to drive accuracy for my insights, I actually need to create a much broader set of metadata. We have coined this at ZS as our metadata lake: we have a data lake, but we now need to create a metadata lake. When we think of metadata, most people will look at the top and say table and column level metadata. Perfect. I have all the metadata I need. I can give you a field description in English. I can give you a table level description in English.

That is not going to take us from that 70% accuracy to the 100% that we need to really ensure people are using and adopting our insight models. We need to explain lineage and join conditions. We need to explain business rules and have those codified so that an entire organization can consistently use those business rules. Then you need to go one level deeper. This is at a high level across all different domains. How do I now start codifying different domains differently?

There may be a nuance where, for example, sales means one thing in market access, sales might mean something different to a field rep, and sales means something completely different to someone in finance. So as I go deeper into my domains, I also need to create the right metadata so that my AI and different agents correctly understand the nomenclature I am using, because even those small domain-level nuances will change the accuracy that we get.
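As a rough illustration of what a single metadata-lake entry might need to hold beyond table- and column-level descriptions, here is a hypothetical sketch; every field name and domain gloss below is an assumption for illustration, not ZS's actual schema.

```python
# Hypothetical metadata-lake entry: goes beyond table/column descriptions
# to capture lineage, join conditions, business rules, and domain nuance.
metadata_entry = {
    "table": "rx_sales",
    "description": "Weekly prescription sales per product and territory.",
    "columns": {
        "trx": "Total prescriptions dispensed in the reporting week.",
        "week_start": "Monday of the reporting week.",
    },
    "lineage": ["raw.vendor_feed", "staging.rx_cleansed"],  # upstream sources
    "join_conditions": [
        "rx_sales.territory_id = territories.territory_id",
    ],
    "business_rules": [
        "Exclude sample units from trx counts.",
    ],
    # The same term means different things per domain -- codify each meaning.
    "domain_glossary": {
        "finance": {"sales": "recognized net revenue"},
        "field": {"sales": "total prescriptions (trx)"},
        "market_access": {"sales": "contracted gross-to-net volume"},
    },
}
```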

So now not only do I have to maintain a data lake, I also need to create a metadata lake that is comprehensive and accurate at a domain level. The other important paradigm change here is your operating model. We may have worked in centralized operating models, and a lot of firms have started to move to a data products operating model, but it becomes even more relevant here. You now need deep business involvement working with IT to bring this to life.

The context for the subdomains that you are seeing here can only come from that domain knowledge and that deep experience with your area of business. Only a finance person will be able to help us articulate whether we are providing the right context for finance. Only someone who works deeply in market access can help us codify that domain. Can we use agents to create this metadata? Absolutely, but the agents still need to be fed that context at some level so that we are helping them refine the accuracy of this metadata lake.

By implementing things like this, we have truly seen it go from that 70% to about 98%. We are still working towards the 100%. That is our goal. ZS is really focusing next year on getting it to that 100% accuracy across insights.

Thumbnail 620

Transformative Use Cases: From Analytics Automation to Software Engineering Efficiency Gains

Now let's talk about how this comes to life and what the use cases are. This slide is overwhelming, so don't necessarily read all the details. This is a client we've been working with to bring the entire insights and analytics engine to life.

In pharma, as you may be familiar, we often have large analytics teams executing and churning out analyses of all different kinds. It's an army of humans producing these analyses, whether in forecasting, performance insights, or other areas. We have hundreds and hundreds of people executing analytics across different types, from simple descriptive analytics like asking how much did my sales drop today, to deterministic analytics and predictive analytics. All of them have different human workflows enabled to bring them to life, and at the end, there is also that last mile insight that needs to be provided.

How have we been using Agentic AI to make this real and to make this automated? First, we create analytic-ready datasets organized by different domains. Then we ensure we have a deep context layer that expands even further than what I was showing before. Now I'm not only creating the context for my data in all of that metadata lake, but I also have to create context of what those human workflows are like. Your knowledge base grows when you're asked to do a driver analysis as a human. The five steps you traditionally follow could be one path, but you might also follow a different step in a different circumstance. All of that knowledge of the human workflow has to be codified.

We can really bring this to life by creating an intelligence layer with APIs accessing those traditional classical AI models. I now have agents for these different use cases that execute that model following the human workflow I would have had before. Of course, we keep humans in the loop as the complexity of the analytics grows; you'll have more humans in the loop and that knowledge base will continue growing. At the descriptive analytics level, you can probably remove the human in the loop because these are very simple questions you might be asking in natural language.
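One way to picture that escalating human-in-the-loop pattern is a simple router keyed on analytic complexity; the tiers and the review rule below are illustrative assumptions, not the client's actual design.

```python
# Illustrative human-in-the-loop routing by analytic complexity.
from enum import Enum

class AnalyticType(Enum):
    DESCRIPTIVE = 1    # "How much did my sales drop today?"
    DETERMINISTIC = 2  # rule-based driver analyses
    PREDICTIVE = 3     # forecasting and other model-driven work

def needs_human_review(kind: AnalyticType) -> bool:
    """Simple descriptive questions run fully automated; anything more
    complex keeps a human reviewer in the loop."""
    return kind is not AnalyticType.DESCRIPTIVE

def run_analysis(kind: AnalyticType, question: str) -> str:
    result = f"[agent output for: {question}]"  # placeholder for agent call
    if needs_human_review(kind):
        return f"QUEUED FOR HUMAN REVIEW: {result}"
    return result

print(run_analysis(AnalyticType.DESCRIPTIVE, "How much did sales drop today?"))
print(run_analysis(AnalyticType.PREDICTIVE, "Forecast Q3 demand."))
```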

This is one of the pretty big transformative use cases we've seen in pharma where we gain 40% efficiency and the time to insight completely decreases. I've now removed a lot of those manual steps from the process because I've codified not only my algorithms and models, but I've also codified a lot of the human intelligence that my agents can help execute.

Thumbnail 770

There's a different use case more on the clinical side, and my colleagues will be speaking about this in depth later, around 11:30. One of the other big use cases in pharma is document generation, and there are many use cases for it: on the commercial side you'll create content for marketing, and on the supply chain side there will also be document generation. On the clinical side, there are hundreds and thousands of documents that need to be generated.

Traditionally, we would have followed a very complex process for generating these documents. We can now use generative AI to power a knowledge hub of your protocols, your informed consent, and all of the different steps that need to be followed to create these documents more efficiently, saving tons of money and time. We're also helping ensure that we're following validated procedures to create these documents because compliance can be embedded across this entire ecosystem as we create the files.
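A compliance-gated generation step might look something like the sketch below; the required sections and the generation stub are assumptions for illustration, not the actual validated procedure.

```python
# Sketch of compliance embedded in a document-generation pipeline.
# Section names and the generation stub are hypothetical.
REQUIRED_SECTIONS = ["Purpose", "Procedures", "Risks", "Confidentiality"]

def generate_draft(protocol_context: str) -> dict:
    """Placeholder for a generative step grounded in the knowledge hub
    (protocols, informed-consent templates, SOPs)."""
    return {s: f"[generated {s} text from: {protocol_context}]"
            for s in REQUIRED_SECTIONS}

def compliance_check(draft: dict) -> list:
    """Flag missing or empty sections before a document leaves the system."""
    return [s for s in REQUIRED_SECTIONS if not draft.get(s)]

draft = generate_draft("Protocol ABC-123, informed consent v2")
missing = compliance_check(draft)
print("compliant" if not missing else f"missing sections: {missing}")
```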

I won't go too in depth since my colleagues are speaking about this in about 30 minutes, so hopefully everyone interested in document generation will go there. This is one of the bigger use cases we've seen take off not just on the clinical side but also on supply chain and commercial. It's a fantastic use of generative AI because it's really great at creating narratives and content. The key question is how do we constrain it to help us be more effective in these massive areas of document generation.

Thumbnail 850

Now my favorite topic, and the one I'm closest to, is what I was speaking to earlier: the software engineering life cycle. This is an area where we've started doing a lot of work and have started to drive efficiency. One of the biggest efficiency drivers in the software development life cycle is the build and test phase. It's very easy to build net new code using generative AI. If you have legacy software, though, it's a little bit more difficult. We found that generative AI is much better at the newer Python-based languages; if you're doing something pretty old, it might not be as effective.

For testing, we've seen up to 75% efficiency gains. It is amazing at looking at your code, finding errors, and really helping us execute that entire test life cycle. Across the entire software engineering life cycle, there is a ton of efficiency to be gained, up to 40% in our entire pipeline development just today by implementing agents across the board.
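As a small illustration of that testing use case, a minimal sketch of prompting a model to draft unit tests for existing code might look like this; the example function and the stubbed model call are hypothetical.

```python
# Minimal sketch of agent-assisted test generation: feed existing source
# to a model and ask for unit tests. The model call is a placeholder.
import inspect

def net_sales(gross: float, rebates: float) -> float:
    """Example function we want test coverage for."""
    return gross - rebates

def build_test_prompt(func) -> str:
    source = inspect.getsource(func)
    return (
        "Write pytest unit tests for the function below, covering "
        "normal cases, zero values, and negative inputs:\n\n" + source
    )

def call_llm(prompt: str) -> str:
    """Placeholder for an actual model call (e.g., via Amazon Bedrock)."""
    raise NotImplementedError

print(build_test_prompt(net_sales))
```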

We've started doing this, and this is our initial measure of what we've seen from an efficiency perspective. One of the focus areas for ZS next year is really how we continue to drive this to scale and ensure that entire organizations can do it consistently so that their whole data management ecosystem can leverage and be effective from this perspective.

Thumbnail 930

Key Takeaways: Business Context as King and the Need for Cross-Functional Collaboration

So what do we want you to take away? Number one, I spoke about this. It is time for CIOs, for pharma organizations, and for all of us to start thinking about how to take your insights to scale. We're not doing this anymore in a proof of concept mode. We really want to transform our organizations, whether it be how our organizations create and manage their data, whether it be how organizations do analytics at scale, whether it be how we create documents for a clinical organization. There are so many use cases, but if we look at it with a narrow mind, we are not going to extract the value because our operating model has to change.

We are now in this human agent interaction era and we all have to be aligned on the goal at hand, the transformation that we want to achieve, and how we operate together. How do we provide sustained ROI? Before you start this transformation, think about the ROI that you want to yield. Is the ROI purely on efficiency? A lot of the time we think that's the only gain we can get from generative AI. We think that's the only gain we can get from agentic AI. But there's also a lot of value that can be created. Up front, think about that value creation as well as the efficiency gains that you want before taking on the endeavor of the transformation.

The middle point, and I'm saying this again, is that we should not transform our processes as they are today. We can, but we might not gain the entire efficiency that we want. We need to rethink our processes and then automate those. Nobody likes to start with process flow mapping and value stream mapping, but that's really where a lot of these transformations need to go first. Before we start automating the existing processes, we need to think about what the future process looks like. The paradigms are different now. It is not the same for a bunch of humans to work on a process as it is for humans and agents to work on it together. So what does that new paradigm look like and what is that new objective that we are trying to achieve?

Infrastructure resiliency and the operating model around your infrastructure become more important. I think in the keynote we were hearing a lot about the changes in infrastructure and how we manage infrastructure. As we start these agentic transformations, it's very easy for cost to blow up. You can imagine all of these calls, all these GPUs being consumed nonstop. We have to ensure that we're starting with robust infrastructure, or thinking about how we are doing our infrastructure, before we start these transformations, especially these large scale ones.

Imagine we're producing those gigantic volumes of clinical documents I was mentioning before, and we're doing things in the way we used to, not rethinking our infrastructure. That could get very costly for everyone. And did we even end up with the savings we were aiming for in the first place? We have to ensure robust infrastructure and robust processes are thought through up front.

Now the most important point is cross functional thinking. Digital and AI teams have to work together. People are going to have to collaborate in a way that they might not have had to before. There is only so much an IT team can do alone in these types of transformations. We are all going to have to have a new digital workforce with agents involved in the mix to make these transformations successful. That cross functional thinking, we've all talked about it before. I've been in pharma now for thirteen years, and prior to that in different areas, but in pharma there's been this divide that I've always seen between the business and IT. That will not be sustainable in these types of transformations. These are all end to end cross functional transformations that will not come to life without both sides: one is critical to set up the infrastructure I was just talking about, and the other sets the context and the business vision. They have to come together.

The differentiator with agentic AI is that business context and setting it up effectively to ensure we are successful. So hopefully as you all go back, you take one thing out of this: business process transformation is what we're going for. Context is going to be king in the age of agentic AI, and we all need to work together to bring that transformation to life. Thank you, and we'd love to see you at booth 1720 if you have any questions.


This article is entirely auto-generated using Amazon Bedrock.
