🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Build and scale AI: from reliable agents to transformative systems (INV204)
In this video, AWS Senior Principal Technical Product Manager Erin Kramer presents the four pillars of building trustworthy agentic AI: reliability, transparency, safety, and ease of use. The session features real-world implementations from Sendbird's delight.ai customer service platform, Lyft's AI-powered support transformation achieving sub-3-minute resolution times, and Cohere Health's Review Resolve system that accelerated medical coverage reviews by 30-40%. Key AWS technologies showcased include Amazon Bedrock AgentCore with built-in observability and sandboxing, the open-source Strands framework downloaded almost 5 million times, Amazon Nova models with comprehensive customization, and AWS infrastructure including Trainium chips and SageMaker HyperPod. Marc Brooker demonstrates AgentCore's memory capabilities, while Jason Vogrinec shares how Lyft achieved 55% automated resolution through partnership with the AWS Gen AI Innovation Center and Anthropic's Claude. The presentation emphasizes that Gartner predicts 40% of agentic AI projects will be canceled by 2027, making trust-first architecture essential for production deployment.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Opening: Building AI Agents You Can Trust
Please welcome to the stage Senior Principal Technical Product Manager, Agentic AI at AWS, Erin Kramer. Hello, my friends. Good afternoon. Welcome to the very first day of re:Invent, the week when builders like you come to imagine what is next. I hope you walked in with a question or a challenge in mind. What problems can I solve with AI agents? How do I know if I can trust them? Or where do I even start?
I want you to spend a moment thinking about that question. Are you ready? I'm going to wait. All right, have you got it? All right, good, because that question is your mission this week. What we're going to cover today is just a starting point. It's a lens to help you connect the dots across everything you're going to hear, see, and build at re:Invent. There is no better place to walk in with a big question than re:Invent.
We're all building something new. But if you've been in technology long enough, you know one thing. The systems people actually rely on are the ones that they trust. So how do you build technology that earns your trust and your customers' trust? How do we know if these AI agents are doing the right thing? Over the next 50 minutes, we're going to break down what building agentic AI with trust looks like so you can build and solve meaningful problems in the real world.
The Evolution of Trust: From Customer Reviews to Agentic AI
I'm a builder, just like you. I joined Amazon in 2000 as a web developer. Really, 2000. That gave me a front row seat to nearly every major technology shift of the last quarter century. I still remember a moment that made trust personal for me. Back in 2001, I was working on Amazon customer reviews. Early e-commerce was built on uncertainty. Am I going to get the product that I ordered? Is it going to even arrive at all? Is my credit card going to get stolen?
To help build trust in e-commerce, we decided to show every customer's opinion—the good, the bad, and the ugly—in customer reviews. We scaled reviews in a way that was radical at the time. But when people saw that the reviews appeared instantly and unedited, they trusted that what they saw was real. This was how we built confidence in the business, in the customer experience, and actually fundamentally in the technology.
Every chapter of AWS's 20-year journey has been taking a spark of innovation and scaling it with your trust in us at the core. In 2006, you trusted us to scale compute with EC2 so that you could run your ideas without having your own data center. Then you asked us, what if we didn't have to think about infrastructure at all? What if you could trust that your workloads were running securely and reliably without ever touching a server? So in 2014, you trusted us to run AWS Lambda, the world's first serverless compute service.
In 2017, you trusted us to make scaled machine learning accessible in hours, not months, with Amazon SageMaker. And in 2023, you trusted us to give you instant access to the world's best models with Amazon Bedrock. Giving you the freedom to choose, customize, and innovate without managing infrastructure.
And then in 2024, you trusted us to build models with trust from the ground up with Amazon Nova. Nova is trained on responsibly sourced data, built with safety and accuracy as first-class objectives, and designed for customization so you can align it with your organization's truth, not somebody else's. And now we're at the next inflection point. You're asking us to give you a way to scale AI agents so that you can trust them for your production systems.
But here's the thing: building AI agents that you can trust is hard. So what's the reason for that? They improvise; they don't behave the same way twice. That's the nature of non-deterministic systems. So Gartner is predicting that over 40% of agentic AI projects will be canceled by the end of 2027. Scaling agents is not just a technical challenge; it's a scientific one, and a human one too. That's why trust has to be built in from the very start.
Reliability: The Foundation of Trusted AI Systems
So what's the good news? At AWS we build the trust foundations. We're making AI reliable, transparent, safe and easy. So that you can focus on what matters, unleashing your ideas. Let me walk you through these four pillars that turn your ideas into trusted systems in the real world. So first, scale without reliability is just risk at speed. A common mistake here is assuming that reliability comes from better prompts or buying more GPUs. But the reality is actually deeper.
So we saw one builder post this on Reddit: "My AI agent worked great in dev. Then in production, it kept looping the same function call. No logs, no fallback, no way to debug." That kind of breakdown is not because the model is weak. It's because the foundations failed: compute bottlenecks, missing observability, non-resilient APIs, lack of fallback paths. That's not something you can patch lightly. It needs to be built in. So unstable infrastructure can turn the most brilliant algorithms into very expensive experiments in the real world.
That's why we spent nearly two decades building the most secure, extensive, and reliable global cloud infrastructure. This is the same one powering mission critical systems for millions of customers every day. Whatever models you choose to build with, open source or proprietary, small or massive, they're going to run best on AWS. And here's why. So first, AWS is the best place to run NVIDIA GPU workloads. We offer a choice of accelerated EC2 instances for customers to choose the compute solution that maximizes performance, optimizes availability, and lowers the cost of training AI models. And we're meeting the expanding compute demands with utmost reliability with Trainium, our custom chip, purpose-built for high performance AI training and inference. A single Trainium chip can complete trillions of calculations in one second. So to put that in context, consider that it would take one person 331,700 years just to count to one trillion.
So our hardware and software teams co-design every layer from silicon to system to software, so workloads can run faster, safer and more efficiently at scale. And this is why startups like Writer, Luma AI, Hugging Face, and OpenAI are scaling their businesses faster from prototype to production with AWS AI infrastructure.
Infrastructure is just the start. With AI agents, reliability is not just about uptime or speed. It's about accuracy. In the real world, off-the-shelf accuracy is not enough, especially when your customers, revenue, and reputation depend on it. Customization is what turns general intelligence into a strategic business objective. That's why we've introduced comprehensive customization with Amazon Nova models. You have full control from pre-training to post-training. You can fine-tune Nova Micro, Lite, or Pro with your own data, aligning them precisely to your domain. You can even distill smaller models that meet your cost and latency needs.
What we're all seeing is that customization doesn't have a single definition. It's what you as the builder decide it needs to be. You can start simple with prompt engineering and retrieval-augmented generation, which is basically using your enterprise's data to quickly ground the outputs of your AI systems. As you scale, maybe you need deeper control. Maybe you need fine-tuning or preference optimization or even continued pre-training. With AWS, that choice is yours. Here's what this means for your customers. Say an employee asks, "Why can't I connect to the VPN?" The generic model says, "Restart your computer and contact IT." But your customized agentic system checks their identity, verifies last login, runs a connectivity test through an internal API, and comes back with, "Your VPN token expired. I've renewed it and pushed the new configuration to your laptop. You're all set." That's the difference between a model that just answers and a customized agent that knows your domain and can act on it. That's what's going to earn the trust of your customers.
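As a rough illustration of that pattern (not code shown in the session), a domain-specific agent like this might be assembled from a handful of internal tools. The sketch below uses the open-source Strands SDK discussed later in the talk; the tool names and their return values are hypothetical stand-ins for real internal APIs.

```python
# Illustrative sketch only: a customized IT-support agent built with the Strands SDK.
# The tools below are hypothetical stand-ins for internal identity/VPN APIs.
from strands import Agent, tool

@tool
def check_last_login(employee_id: str) -> str:
    """Return the employee's last successful login (hypothetical internal API)."""
    return "2025-12-01T07:42:00Z"

@tool
def run_vpn_connectivity_test(employee_id: str) -> str:
    """Run a connectivity test and report the likely cause (hypothetical internal API)."""
    return "VPN token expired"

@tool
def renew_vpn_token(employee_id: str) -> str:
    """Renew the token and push new config to the laptop (hypothetical internal API)."""
    return "Token renewed; new configuration pushed to laptop."

agent = Agent(
    system_prompt=(
        "You are an IT support agent. Diagnose connectivity problems with the "
        "available tools before answering, then summarize what you did."
    ),
    tools=[check_last_login, run_vpn_connectivity_test, renew_vpn_token],
)

agent("Why can't I connect to the VPN? My employee ID is E-12345.")
```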
Sendbird's Delight.ai: Building Enterprise-Grade AI Concierge with AWS
Whether you're optimizing Amazon Nova or building with Claude, Mistral, or Llama through Amazon Bedrock, AWS gives you both depth and breadth of model choices, all running on the world's most reliable foundation for AI. I'm very excited to have one of the top AI startup companies share how they've scaled and built trusted AI solutions with AWS reliable infrastructure. Please join me in welcoming John Kim, CEO and co-founder of Sendbird.
How's everyone doing? I hope you all had a good Thanksgiving. Welcome, everyone. My name is John, co-founder and CEO of Sendbird. At Sendbird, we've been working on something very special. We call it delight.ai. Together with AWS, we're going to show you the future of customer service. For the past ten years, Sendbird has been obsessed with one thing: strengthening human relationships. We built this foundation at scale with the security and reliability for the world's largest enterprises. Now we're taking that massive foundation and using it to power something entirely new: delight.ai. It is the world's most powerful AI agent for customer service. It's not just for communications. It is a truly unified AI concierge, from sales to support to onboarding, in a single voice. It feels very personal. It's a truly continuous experience, and it's delightful. This is a new chapter in how brands connect with their customers.
We've partnered with some of the world's most beloved brands. These aren't just experiments. These are category leaders, the brands that you use every day. Working with them, we learned something really profound. Every single one of them wants to treat their customers like a real person, but at scale, that was not possible. We realized the answer isn't just to automate more. The answer starts with understanding your customers.
Understanding the intent, understanding the context and where they're coming from, the history with your brand. Seven billion conversations—that's how many conversations we power every single month for hundreds of millions of people around the globe. We aren't guessing at scale. We live and breathe with our customers around the globe. So when that holiday rush hits or a massive snowstorm grounds the airplanes, you don't have to worry about your AI agents stalling. It just works. Your brand shows up consistently for your customers and also for your employees.
This is delight.ai. It is a unified AI concierge—one agent across all the channels covering the entire customer journey. From the very first hello to the purchase to the support, it brings it all together into one fluid, magical experience. Now let's look at BJ's. They're a retail giant on the East Coast. They do about twenty billion dollars a year in revenue. They used our platform to build a shopper's sidekick. We call her Bev. The results are staggering. We saw a twenty percent increase in average order value. When customers engage with this AI agent, they spend six times more. It's truly personalized help at scale, resulting in measurable revenue and customer loyalty. She does it all—shopping assistance, member care, and even finding products right there in the store, pointing you to the aisle where you can pick them up.
Let's also talk about travel. Meet Norse Atlantic Airways. We all know what travel feels like today, during a storm. It can be chaotic with cancellations, refunds, and stress. Norse introduced this AI concierge named Freya. Here's the difference: most systems see you as a ticket number, but Freya remembers the traveler. She reads your preferences, knowing whether you prefer an aisle or window seat. She answers your questions with empathy. When the chaos hits, she isn't just a chatbot repeating itself. She's the calm in the storm.
In the food delivery business, which is a tough industry, there's a massive orchestration challenge. You have the courier, the customer, and the merchant—three moving parts all happening in real time. One of the largest food delivery companies in Europe used our AI concierge to make them sing. Not only does it handle the coordination, it automates the customer care in an empathetic way. The result: the tickets sent to human agents went down. The anxiety for customers is gone. For complex issues that require that delicate touch, it hands them off to a human agent seamlessly.
We built this delightful AI agent on three major breakthroughs: personalization, presence, and trust. First, personalization. It all starts with memory. It doesn't just process; it learns and evolves. It remembers. Every conversation feels like it was made just for you. Second is presence. Your brand is everywhere, connected across every channel. There are no dead ends, no infinite loops of canned messages. Third is trust—enterprise-grade reliability. It has to be safe. It has to be rock solid, even for the most demanding brands that people love globally.
Memory begins with the basics: CRM records, transactions, and support tickets. But that is just the surface. The real magic happens in the conversations. As customers, we leave these little traces. I'm planning a trip to Seoul. I have to let the dog in. It's my daughter's birthday this weekend. These small moments matter. When you weave them together, you don't just see the data point. You're seeing the living picture. You see the person as a whole.
So we call this the agent memory platform. It is an intelligence layer. It connects the living memory of the customer to the logic of your business. Think of it as two systems working in tandem. First, the customer view: knowing the customer 360 degrees. Second, the business intent: whether it's reducing churn or driving sales. So the next time a customer reaches out to you, the AI agent doesn't just know them as a record or who they are. It knows what it needs to do.
What if the customer's frustrated? It transitions into recovery. They're hesitating? It nudges them towards the sale. If they're a really loyal customer, it strengthens the bond with the customer. So it gives the AI agent memory a real purpose. Let's talk about omnipresence. I think it's a pretty cool word. Your customers are already everywhere—app, web, SMS, phone calls, even in-store. So usually, when they switch channels, when we as customers switch channels, what happens? The conversation ends. You have to start all over. You have to wait 40 minutes on a phone call. Well, not anymore.
With omnipresence, the context travels with the AI agent and the customer, wherever they might be. And what we did is something really cool. We enable simultaneous multi-channel communication. What it means is you can be on a voice call with an AI agent and also send a photo, share your location, or verify a confirmation while on the call with the AI, all at the same time, without ever breaking the flow. And it can even reach out to you. So you can tell your AI agent, "Hey, can you call me back at 8? I've got to pick up my daughter," or "Call me when my food arrives." It turns this fragmented journey into one continuous conversation. No pauses, no dead ends. Always present for you.
So we've seen memory, personalization, and presence. But of course, before any enterprise deploys this into production, there's one question that remains: Can I trust it? It is the right question. And this is why we created something called the Trust OS. It is the first foundation designed for enterprise-grade confidence. It allows you to deploy your AI agent safely and responsibly, and then scale. And trust isn't just a feature; it is a whole framework. We built this Trust OS on four pillars.
First is observability. You get to see everything. What the AI said, the action it took—you can audit it, you can improve it, so you're never in the dark. Second is control. You can test it automatically, stage it, roll it back instantly in production, so you have the keys to the entire operation at your fingertips. Third is human oversight. Because even the smartest system needs quality judgment. So we made it so easy for your team to step in, review, approve, or intervene when necessary. And finally, it's reliability. It is built on the same infrastructure backbone that powers billions of conversations, the same security and resilience that already powers some of the world's largest applications.
So how does this all fit together? I'll start with something that we call Actionbooks. This is how you define workflows and business logic. But you don't have to write code. You write goals and define the guardrails, all in plain English. And the AI takes those goals and orchestrates actions across your entire stack: the CRMs, the knowledge base, the real-time APIs. It executes complex workflows reliably and at scale. And to build something this powerful, we need the best infrastructure in the world. And that is why delight.ai is powered by AWS.
So obviously, Amazon Bedrock provides the foundation for reasoning and safety. Amazon's open-source offerings give us speed and accuracy. We make heavy use of Amazon Aurora to store all these prompts, workflows, and conversation histories with ACID guarantees. And lots of other services, like Amazon EKS, S3, CloudFront, Elastic Load Balancing, and SES, ensure global-scale infrastructure with low latency and, of course, secure and compliant operations.
The magic of AI agents with the power of a cloud they can trust. So I want to leave you with this. After working with countless enterprises around the world, we realized something very simple. Everyone wants AI. But very few companies actually know how to adopt it well for their customer experience. So we distilled everything into a clear, practical, seven-step process.
Step one, you have to really align at the top. Decide what really matters for your business. Define success metrics: not how many employees are using some AI LLM, but what success looks like for your customers. Secure budget, set the guardrails. Step two, pick the right use cases. Don't try to boil the ocean at once. Just select two to three high-impact use cases, map the workflow, design the agent experience.
Step three, prepare the knowledge and data. Get the right sources, check for quality, see if there's any outdated, stale information. Confirm security and data access controls. And of course, step four is the fun part: actually building the AI agent. Ingest data, author the Actionbooks, add the policies and guardrails, connect tools and systems. And step five, you have to test it, but of course, properly.
So with automated coverage, you also want to make sure there's a quality human review in the early part of the process. Make sure to focus on the quality first. We do this all the time. Don't do the vibe testing with a couple of employees. Actually run the process with coverage. Step six, pilot. Don't rush it out the doors. Test with a small slice of your traffic, five percent, fifteen percent of your global traffic. Do the daily tuning. You want to learn fast from real signals. We call it the hypercare process.
The last step, step seven, is scaling. More traffic, more channels, more use cases, more regions. Keep iterating and measure the business impact. With this process, we've seen timelines as short as twenty-one days, just three short weeks from the first meeting to a production release for a large publicly traded company, which is mind-blowing when you think about the legal process, procurement, infosec, all the processes along the way. So these seven steps are clear, practical, and the fastest way we've seen to turn AI ambition into real business outcomes.
Introducing Amazon Bedrock AgentCore for Production-Ready AI Agents
So I'd love to work with you to adopt AI agents for a delightful customer experience. Thank you very much. Back to you, Erin. So thank you, John, and congratulations on building something amazing to help organizations help their customers in nuanced and thoughtful ways. The very human-centric work of startups like Sendbird is what gets me excited about this technology. But I wanted to do a quick audience poll. So raise your hand if you consider yourself someone who works in AI. I see some hands out there. All right, now keep it raised if you would have said the same thing three years ago. Not as many hands.
All right, so those of you who dropped your hands, me too. It's okay if you're new to this. Three years ago, I never imagined I'd be working in AI. But as I've learned about this technology, I've realized that all the applied technology work I've been doing for twenty-five years at Amazon is highly relevant. This space is indeed new for all of us, or pretty much all of us. But I want to encourage you to think about your past experience. Think about what you can bring forward to help us build trust in this technology.
And that's why I'm really excited to welcome Marc Brooker. Marc is a VP and distinguished engineer and one of my colleagues at AWS and in my view, one of the greatest builders of all time. He has been behind some of those foundational services in the cloud, but to me, he's always going to be the Lambda guy. So he brought his expertise into this new agentic AI era and helped us build trust and foundations for Amazon Bedrock AgentCore, an agentic platform that enables organizations to get to production with confidence. So please join me in welcoming Marc Brooker to the stage.
AgentCore Capabilities: Memory, Observability, and Developer Experience
Thank you. Well, thank you, Erin, for that very humbling introduction. Why did we build AgentCore? We spoke to a lot of customers who were seeing success prototyping agents, who were getting excited about the first set of agents they were trying out on their laptops and desktops, but who were finding it hard to get those agents into production, where they could have a real impact on their businesses. Speaking to you, we learned you had a set of needs for bringing those agents into production.
You needed a secure, scalable place to run the agent code, where your customers can be isolated from each other and issues like prompt injection can be effectively mitigated. You needed a way to connect your existing data sources, services, and microservices to agents, speaking the protocols that agents need. You needed a way to observe your agents working, gather metrics on their success, trace and audit their work, and understand cost and performance. You needed a way to have agents remember state, user preferences, and context, so your users didn't have to start over for every new conversation with an agent.
You needed a secure, scalable, and controllable way for agents to use the web via a web browser. You needed to give your agents an environment where they could run code allowing them to more efficiently process data and work with data intensive tools without driving up token costs. And you needed to do all of this cost-effectively with great scalability, with high availability, and taking advantage of AWS's world-class infrastructure. You needed all of this while keeping the ability to work with the models that you prefer and the tools that your teams already know.
Some of you we spoke to needed an end-to-end solution with everything built in. And some needed a solution where you could pick and choose the components that worked for you, that fit into your architecture, or worked with the agentic platform you had already started building. So we built AgentCore, a set of tools and services that makes it easy to get agents into production and, critically, to operate them once they're there. AgentCore is aimed at getting you faster to positive ROI on your agent investments.
Now, I'm going to show off a demo of the developer experience of AgentCore, adding a feature to a small agent that I built. As I was planning my week at re:Invent, I built myself an agent to help me choose which sessions I wanted to attend. I'm interested in AI and databases, and I like to attend sessions early in the morning. And I found that as I interacted with this agent over and over, I had to say the same thing every time: recommend me sessions about databases and AI, give me sessions in the morning, I don't want to go to that late afternoon session about a topic that doesn't particularly interest me.
So I wanted to add memory to my agent. Here we're going to jump into the AgentCore console and create a new memory that I can wire into my agent with just a few clicks. I'm going to choose the user preferences strategy. Behind the scenes, with just one click on that checkbox, AgentCore is building a whole pipeline that will take the traces of the conversations your customers have with the agent and extract their preferences from those conversations, where they can be used later in prompts. So there is a lot of work going on behind that click.
Now I click and create the memory. In less than one second, the memory is created. I'm going to jump into VS Code now and integrate this with the agent that I built using Python and our Strands framework. First, a handful of imports. This is the AgentCore memory client for Strands, built into the AgentCore SDK.
Next, I'm going to identify the memory ID, which is the memory I just created in the console, and the actor ID, which is the user. Here the actor ID is me every time; this is an agent just for me. But in a real production agent, it would be an identity derived from the user's identity, maybe using AgentCore's identity primitive. Next, a handful of boilerplate: creating the client, wiring in that memory ID, wiring in the session ID, and so on.
Next, the most important part: I need to update the system prompt for this agent to tell it to use these memories, and to tell it that when the user expresses preferences explicitly, to just say, "OK, thank you, I'm going to write those down and remember them for later." But this pipeline will also work for implicit user preferences and learn from smaller interactions. Finally, wire the session manager, which is that memory client I just created, into Strands. Now we're going to jump back to my agent and show what it looks like to express a memory. My agent can now remember things from session to session. I'm going to type in here: "Hey, I like to attend sessions in the morning. I like to attend sessions about databases and AI." And my agent is going to say, "You know, thank you. I'm going to write that down. I'm going to remember it for next time."
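For readers following along, the wiring Marc walks through might look roughly like the sketch below. It is a reconstruction, not the code shown on screen: the module path and session-manager class name for the AgentCore memory integration are assumptions, and the memory, actor, and session IDs are placeholders.

```python
# Reconstruction of the demo wiring, not verbatim code from the session.
from strands import Agent

# Assumed import path/class name for the AgentCore memory integration with Strands;
# check the bedrock-agentcore SDK docs for the exact name.
from bedrock_agentcore.memory.integrations.strands import AgentCoreMemorySessionManager

MEMORY_ID = "reinvent-planner-memory"    # the memory created in the AgentCore console
ACTOR_ID = "marc"                        # in production, derive this from the user's identity
SESSION_ID = "reinvent-2025-planning"

session_manager = AgentCoreMemorySessionManager(
    memory_id=MEMORY_ID, actor_id=ACTOR_ID, session_id=SESSION_ID
)

agent = Agent(
    system_prompt=(
        "You recommend re:Invent sessions. Use any stored user preferences. "
        "When the user states a preference explicitly, acknowledge it and say "
        "you will remember it for later."
    ),
    session_manager=session_manager,     # short-term memory plus preference-extraction pipeline
)

agent("Hey, I like to attend sessions in the morning, about databases and AI.")
```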
Now we're going to dig a little bit below the covers, inside AgentCore's memory, and see what the agent remembered when I asked it to write down this user preference. As I said, AgentCore memory is powered behind the scenes by a data pipeline that we built for you. And so the first time I run this little script that calls the AgentCore memory API, you're going to see that it has not yet stored these memories. Here, I see the memory is empty.
But within just a few seconds, that processing has happened in the background. We captured this conversation with this customer, we put it in short-term memory, and then we ran it through a model which extracted the long-term memory and preferences. And so when I run the script again just a few seconds later, you can see that the memory now says the user explicitly mentioned their preference for sessions in the morning about databases and AI. The next time I run my agent, I don't need to tell it that again. That will be included in the prompt, and it will know those things about me.
Over time, as I interact with this agent, it will become more and more customized to my preferences and needs, with just this integration with AgentCore memory. This is the first step towards customization, towards making agents that are responsive to user needs, and with AgentCore and Strands, it can be done with just a few clicks and a few lines of code. Thank you. Back to you, Erin.
Transparency: Making AI Agent Actions Visible and Traceable
Yes. So thank you, Marc. You've seen how intelligent memory makes your agents truly efficient and effective. But to trust your agents, you need more. On a Hugging Face discussion board, I saw this: "My agent failed silently for two days. It looked fine on the dashboard, but one tool kept timing out in a hidden loop." Another builder wrote, "Debugging AI agents feels like chasing ghosts. You can't fix what you can't see."
So as builders, we know this pain, right? If we don't know what our system is doing, we can't debug it, we can't improve it, and we can't trust it, especially in production. Enterprises don't adopt black boxes. And for startups, where every customer interaction is a critical first impression, transparency matters even more. So let's start with the foundation. Your machine learning infrastructure and workflows. Imagine being able to see everything in one place: every cluster, dataset, experiment, training run, and deployment, from every node to full production.
Amazon SageMaker HyperPod's built-in observability gives you a single, real-time view of performance, utilization, and cluster health. This means you can find problems fast, stay on schedule, and avoid cost surprises.
With SageMaker AI, you don't just build models, you see how they behave in the real world day after day. But with AI agents, the bar is even higher. AI agents don't just generate answers, they take actions. And to trust actions, you need visibility at an entirely new level. To make that possible, we built AgentCore observability to help you see, understand, and control what your agents are doing, whether that's a tool call, chain of thought, or even memory. You can trace every workflow to pinpoint where things go right or go wrong. You can replay actions like rewinding a tape. You can view intermediate states and see how the agent is reasoning. You can track performance metrics to catch issues quickly. And you have full audit trails that verify that your system acted exactly as intended. In short, observability is not just a feature, it's how you build trust in systems that can think and act for themselves.
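AgentCore observability itself is a managed capability, but the underlying idea, one trace span per reasoning step or tool call, can be illustrated with plain OpenTelemetry. The sketch below is generic and assumes a trace exporter (for example, the AWS Distro for OpenTelemetry into CloudWatch) is configured separately; the tool registry is hypothetical.

```python
# Generic illustration of span-per-action tracing; not AgentCore-specific code.
from opentelemetry import trace

tracer = trace.get_tracer("support-agent")

# Hypothetical registry standing in for the agent's real tools.
TOOLS = {"search_sessions": lambda topic: f"3 sessions found for {topic}"}

def call_tool(tool_name: str, **kwargs) -> str:
    # Each tool call becomes one span, giving the trace/replay/audit view described above.
    with tracer.start_as_current_span(f"tool.{tool_name}") as span:
        span.set_attribute("tool.name", tool_name)
        span.set_attribute("tool.args", repr(kwargs))
        result = TOOLS[tool_name](**kwargs)
        span.set_attribute("tool.result_chars", len(result))
        return result

print(call_tool("search_sessions", topic="databases"))
```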
Safety: Governance and Compliance Built into AI Agent Architecture
You've seen how transparency gives you visibility. But visibility alone is not enough, because seeing a problem doesn't stop it. One builder posted that their prototype agent sent a real customer report to the wrong Slack channel. It wasn't a hallucination. It was a permissions miss. Another posted on GitHub that their agent executed the wrong tool call on a production database. They said it perfectly followed instructions, just the wrong ones. So a lot of us get these requests from our customers with security and governance concerns, particularly as they scale. Given these AI agents and their high level of autonomy, data security and policy enforcement are more critical than ever.
It's important to set clear guidelines for your agents regarding which data can be shared, which vendors are off limits, maximum thresholds, and so on. And compliance with regulations must be coded into the very architecture of these agents. But here's the good news. AWS builds governance into the foundation, so your agents behave with the discipline that you'd expect. We recently launched the Responsible AI Lens to guide you through best practices as you build. These best practices help you surface risks early, avoid costly rework later on, and move to production with confidence. Its eight simple focus areas walk you through everything from defining your use case to monitoring your system in the real world. You're not slowing down. You're speeding up responsibly.
And of course, we use the Responsible AI Lens in how we build our services too. For example, AgentCore identity and sandboxes make sure that agents only get the minimum access they need, keep each session separate, and safely contain any risky actions. Policy and compliance controls follow global standards like GDPR, HIPAA, and FedRAMP. And Guardrails for Amazon Bedrock handles content filtering, safety checks, and compliance automatically. That way you can roll out agents across teams, regions, and industries without worrying about breaking the rules.
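As a small, hedged example of the guardrails piece: a guardrail created in Bedrock can be attached to a model call through the Converse API. The guardrail ID and version and the model ID below are placeholders, and error handling is omitted.

```python
# Minimal sketch: attaching a Bedrock guardrail to a model call via the Converse API.
# Guardrail ID/version and model ID are placeholders; error handling omitted.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="amazon.nova-lite-v1:0",                 # placeholder model ID
    messages=[{"role": "user", "content": [{"text": "Summarize this support ticket."}]}],
    guardrailConfig={
        "guardrailIdentifier": "YOUR_GUARDRAIL_ID",  # placeholder
        "guardrailVersion": "1",
        "trace": "enabled",                          # surface why content was blocked or masked
    },
)

print(response["output"]["message"]["content"][0]["text"])
```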
So what does this look like for real people? When you're waiting to hear if your medical treatment is covered by your health insurance, every hour matters. Using Bedrock AgentCore, Cohere Health built Review Resolve, an agentic system that reads clinical notes and test results to surface the key details that determine coverage. Reviews are now thirty to forty percent faster with fewer errors, meaning patients get answers sooner with less stress and uncertainty. And because healthcare is a highly regulated environment, Cohere Health needs more than just speed. They need trust. AgentCore gives them audit trails, session continuity, and explainable decisions across multi-hour reviews, so that every decision is explainable, traceable, and grounded in the patient's real medical context.
Ease of Use: Strands Framework and the Gen AI Innovation Center
So I know the thing that sparked my imagination when I first saw generative AI a couple of years ago was how approachable it was to use. And when we were thinking about helping people use AgentCore to build AI agents, we knew that it wasn't enough to make it powerful. It also had to be easy to build with. So we designed AgentCore so that any builder, whether developers, data scientists, business teams, or even product managers, can stand up agents, see what they're doing, and scale them into production.
Ease of use creates adoption and trust. You may have caught in Marc's demo that he was using a library called Strands. Strands is the agentic AI framework that we built for ourselves as we were building increasingly capable and trustworthy frontier agents.
We realized that these patterns were ones that many builders would encounter when building agents, so we open-sourced Strands. It has quickly become one of the most active open frameworks for building AI agents, downloaded almost 5 million times since launch. Strands Agents is an SDK for building and running AI agents. It's open, it's model-driven, it's easy, and it's fast.
With Strands, you can go from idea to working agent in just a few lines of code. There's no complex orchestration, no heavy scaffolding—just define your model, your tools, and your prompt. Strands handles the reasoning, chaining, and execution. Because this is the framework we are actively using to build agents within Amazon, it will continue to evolve to reflect our learnings and best practices. When you're ready to go to production grade, you can move seamlessly to AgentCore.
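To make the "few lines of code" claim concrete, here is a minimal, hedged sketch, assuming the strands-agents package is installed and default model access (for example, Amazon Bedrock credentials) is configured; the tool is a toy example, not from the session.

```python
# Minimal Strands agent: a prompt, one toy tool, and the default model.
from strands import Agent, tool

@tool
def word_count(text: str) -> int:
    """Count words in a piece of text (toy tool for illustration)."""
    return len(text.split())

agent = Agent(
    system_prompt="You are a concise writing assistant. Use tools when they help.",
    tools=[word_count],
)

agent("How many words are in: 'Build and scale AI, from reliable agents to transformative systems'?")
```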
Once you have your AWS account set up, it should take you less than 5 minutes to get your first agent up and running. Easy means that anyone can start, but you don't have to build all of them yourself. The Gen AI Innovation Center brings AWS experts and partners together to help organizations move from early prototypes to real-world systems responsibly and at speed. From choosing the right models to customizing AI to building agents that work alongside people, the Gen AI Innovation Center is available to you.
Now with the new physical AI fellowship, we're supporting the future of AI. Let me introduce you to a customer who has transformed their business with the help of the Gen AI Innovation Center and made AI truly accessible to their users. Please join me in welcoming Jason Vogrinec, Executive Vice President at Lyft.
Lyft's AI Transformation: Revolutionizing Customer Support at Scale
Thank you very much. Hi everybody. I'm Jason Vogrinec, EVP of Foundations and AI Transformation at Lyft. I'm thrilled to be here today to talk to you about how we've transformed customer support with AI. Over the next 10 minutes, I'll talk to you about how we made AI work for real customers with real problems and real results.
Every car on the Lyft platform has two customers in it: a rider and a driver. Many of these drivers depend on the platform for their livelihood. It's how they feed their family and pay their rent. For many, this isn't a side hustle—it is the job. In the other seat, we have riders who depend on our platform to get home safely, to pick up their kids from school, or to get to that job interview on time. The platform only works if both sides trust us completely.
It could be a payment issue, vehicle damage, a safety concern, or an account question. Every single support interaction is a moment where we either keep that trust or lose it. That's why for us, customer support isn't a cost center—it's a competitive differentiator and it's core to our purpose to serve and connect.
As we entered 2024, we knew we were not meeting customer expectations. But it wasn't for lack of trying. As our business grew, we had agents serving customers every single day. But as the business grew faster, every new rider and driver meant more agents, and the cost curve went in one direction: up. We also saw consistently inconsistent experiences, with thousands of agents around the world in different time zones and with different training levels. That made it very difficult to deliver consistent experiences for customers.
These are some quotes that we heard from customers. One driver told us their answers are so calculated, always the same—it just looked like robot answers. Another driver spent 45 minutes chasing down a $2 cancellation fee after a 10-hour shift. And another said it's such a hassle to get help, I don't even want to try. But these aren't complaints—they're symptoms of three core problems that we needed to address. First, Lyft was misunderstanding our customer problems. Second, our help content was hard to navigate. And frankly, our support felt designed to reject and deny rather than to help.
And this had to change. With the latest developments in generative AI, we asked ourselves a simple but powerful question: What if we use AI to transform customer experiences into ones that actually work for them? Not just incremental improvements, but fundamental transformations. What if instead of forcing our drivers and riders to navigate our organizational chart or our help content, AI could actually understand their problems and respond to their specific needs? What if instead of waiting hours for a support agent, someone could get help in minutes—help that was personalized and contextual, and actually solve their problem?
But before we wrote a single line of code, we did two really important things. First, we secured executive buy-in. Not just budget approval, but genuine alignment that this was about transformation—transforming how we serve our customers, and that these capabilities create durable and competitive advantages that weren't about cutting costs. Second, we went direct to our customers. No internal pilots or random experiments, no testing with employees first. We went straight to drivers with real problems, real pressure, and real stakes, because that was the only way to know if it was actually going to work.
From those conversations, three very clear principles emerged that shaped everything we've done. The experience had to be easy. No more navigating complex menus or searching through help articles—just natural conversation. It also had to be fast. Every minute drivers are offline is an earnings opportunity lost. We needed minutes to resolution, not days. And it had to be accurate. Getting the right answer matters when it affects someone's livelihood, their ability to pay rent or support their family. Easy, fast, accurate—the three principles that became our foundation and how we measured our success. Not by how much we automated, but by whether someone got their answer in three minutes instead of three days.
Now, having clear principles sounds great, but here's the reality. Knowing what we needed to build and actually building it were two very different things. We faced three significant challenges. First, there was no playbook. A lot of the initial AI stories that we had heard about and examples we had seen were relatively simple RAG solutions. But we needed something that was going to be far more capable. Second, we knew we had a trust gap. Our customers didn't trust our old systems, so why would they trust a new one that said it was AI? Just like many of you, I've experienced the previous generation of chatbots, and they don't engender trust. Third, we faced the classic quality versus speed trade-off. Traditional thinking says you can't have both, but we knew we had to move fast and we had to get it right the first time.
So how did we overcome these challenges? Well, we knew we couldn't do it alone, and so creating the right partnerships was absolutely critical to our success. And I want to emphasize that these were true partnerships, not vendor relationships. We partnered with Anthropic for their Claude models because we needed AI that could actually reason, that could understand context, and that could maintain natural conversations at massive scale. We partnered with AWS for two things: the infrastructure needed to reliably support millions of interactions on a regular basis, and the Generative AI Innovation Center, where their team worked alongside ours to solve problems that neither of us had ever solved before. This wasn't "here's a product, go figure it out." This was "let's build something together that neither of us could have done alone." And that collaboration made all the difference.
So let me show you how we tackled one of our biggest challenges: routing. Drivers can contact support for hundreds and hundreds of different types of issues, and trying to categorize those right the first time is nearly impossible. Working with the Generative AI Innovation Center, we built an intent agent. And here's what makes it special: it combines user details with smart, disambiguating questions. Instead of forcing customers through a rigid menu tree, the AI has a conversation to figure out the customer's real intent. Making AI easy meant making it conversational. Just tell us what's wrong and we'll figure it out together.
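Lyft's intent agent is proprietary, but the general shape of that routing step, classify free text into a known intent or fall back to a clarifying question, can be sketched against Claude on Bedrock's Converse API. Everything below (the intent list, model ID, and JSON contract) is an illustrative assumption, not Lyft's implementation.

```python
# Illustrative sketch of intent routing with a clarifying-question fallback.
# Intent names, model ID, and the JSON contract are assumptions, not Lyft's code.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

INTENTS = ["cancellation_fee_dispute", "vehicle_damage_claim", "payment_issue", "account_access"]

def route(message: str) -> dict:
    intent_list = ", ".join(INTENTS)
    prompt = (
        f"Classify the driver's message into one of these intents: {intent_list}. "
        "If you cannot tell, ask one clarifying question instead. "
        'Reply only with JSON: {"intent": <name or null>, "clarifying_question": <string or null>}.'
        f"\n\nMessage: {message}"
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",   # placeholder model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    # Sketch-level parsing; production code would validate the model output.
    return json.loads(response["output"]["message"]["content"][0]["text"])

print(route("I got charged a $2 cancellation fee after a 10-hour shift"))
```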
What does this look like in practice? Well, the interface is pretty simple. Powered by AI, customers just type their message.
Behind the scenes, Claude is analyzing context, asking clarifying questions when needed, and routing to the right resolution path. The complexity is hidden, the experience is simple. And frankly, the immediate impact was big. Our customers are now experiencing resolution times of less than 3 minutes on average, down from 16, and sometimes as long as 3 days. From frustration to resolution in the time it takes to grab a coffee. Internally, we created capacity for our human agents because the impact of our early AI investments was big. Fifty-five percent of our customer interactions today are being resolved without requiring a single human agent. That was two years ago. And this meant that our specialists now can focus on where they're needed most: difficult disputes, complex cases, safety incidents, the things that require human judgment.
And both of these improvements combined allowed us to open up access to support for new customers. While other companies often bury support to reduce costs, we wanted to take the opposite approach. We moved support access front and center so that all customers felt that getting support could be easy. This is what transformation looks like. This is what happens, I think, when you take AI that is actually easy, fast, and accurate at scale.
Now while measuring impact is important, so is evaluating lessons learned. The intent agent is only one of the agents we've deployed this year, and the more agents we launch, the more we learn. First, nothing beats real customer experience. Internal testing is good, but nothing compared to what we learned in the first weeks of real feedback. Plan for rapid iteration based on what you hear. We also learned very quickly that evals are an art, not a science. We spent a lot of time trying to perfect our evaluation metrics upfront. We found it better to start somewhere reasonable and evolve as we learned what actually mattered to customers.
Launching something good enough with strong guardrails and adapting quickly was what made the big difference. We made changes daily in those first few weeks, not because we built it wrong, but because real-world complexity always exceeded our models. And third, AI agent timelines don't fit neatly into a roadmap. Traditional software development is relatively predictable. AI agents are far more organic. In one launch, our AI agent worked great in dev, but with the introduction of millions of customers, we found new edge cases and had to do a tremendous amount of redefining prompts and adjusting guardrails. These weren't bugs; they're the nature of building AI systems. So build flexibility into your timelines.
So what's next for us? Well, we're focused in three areas. We want to engage with our customers in the medium that works best for them, be it voice, image, or text. Later this year, we'll launch our first multi-modal AI agent to resolve driver issues with cleaning fees and damage. We're also not building one-off solutions. We're creating a platform that lets us rapidly deploy new AI agents across different support scenarios. Think of it as AI infrastructure that makes it easier to build new tools. And so here's my takeaway for you all. Making AI easy isn't about the technology alone. It's about deeply understanding your customers' pain points, having the courage to face them honestly, and partnering with the right people to build solutions that work. Thank you so much.
Closing: Building with Trust at Every Layer
Thank you, Jason. I have always admired Lyft as a company that has been at the forefront of serving people in innovative ways, leveraging the latest technology, whether it was the cloud, mobile, or now agentic AI. I hope you're all as inspired as I am by how Lyft is building trustworthy systems that put people first. You've seen what's possible and what it takes to build trust in agentic AI in the real world. From being reliable to transparent, to safe, to easy. Every layer we build has one purpose: so you can build with trust at every layer. That way you don't have to slow down to stay safe. You scale because of it.
So when you put your trust in AWS, you're not choosing infrastructure. You're choosing a company whose technology powers Amazon at massive scale, battle tested and improved daily. Every lesson we learn we bring straight back to you.
In my 25 years at Amazon, one thing has never changed. We build so you can build. Whether it's using Sendbird's fully managed delight.ai to build highly personalized customer experiences, building something that uniquely meets your needs with solutions like SageMaker AI and Bedrock AgentCore, or simply getting you safely to the airport, trust must be at the core of everything that all of us build in this new space.
I'm going to leave you to head off for the rest of your re:Invent week, but do me a favor. Remember that question that I asked you to think about at the beginning? Have you got it? I want you to think about how you're going to prioritize trust as your answer to that question. To get you started, here are sessions that are going to help you go deeper.
First, we want to encourage you to continue your learning journey with AWS Skill Builder, where you can dive deeper with more than 1,000 free expert-led online training courses. Second, we are excited about how easy it is to build powerful agents now, and I'd love for you all to join us tomorrow when we are kicking off our new AWS AI League agentic AI challenge. You can either build your own agent in our workshop or watch top finalists from the 2025 championship compete head to head for $25,000 and ultimate bragging rights.
Finally, Amazon is partnering with Code.org for the 2025 Hour of AI. This is a global initiative that brings AI education directly into classrooms through hands-on, easy-to-follow activities. So I invite you to join us to help the next generation develop the skills they need to grow and thrive in this new technology. And with that, thank you so much. Let's go build.
; This article is entirely auto-generated using Amazon Bedrock.