Kazuya

Posted on Dec 6, 2025 • Edited on Dec 7, 2025

AWS re:Invent 2025 - Build and scale AI: from reliable agents to transformative systems (INV204)

Overview

In this video, Erin Kraemer, Senior Principal Technical Product Manager for Agentic AI at AWS, presents the four pillars of building trustworthy AI agents: reliability, transparency, safety, and ease of use. The session features real-world implementations from Sendbird's Delight AI platform achieving 20% increase in average order value, Lyft reducing support resolution times from 16 minutes to under 3 minutes with 55% automation, and Cohere Health accelerating medical reviews by 30-40%. Marc Brooker demonstrates Amazon Bedrock AgentCore's capabilities including intelligent memory and observability features. The presentation emphasizes AWS infrastructure advantages with Trainium chips, Amazon Nova models, and the open-source Strands Agents framework downloaded 5 million times. Key technical solutions include the AWS Well-Architected Responsible AI Lens, guardrails for compliance with GDPR and HIPAA, and comprehensive customization options from prompt engineering to continued pre-training.

This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Opening: Building AI Agents with Trust at re:Invent

Please welcome to the stage Senior Principal Technical Product Manager, Agentic AI at AWS, Erin Kraemer. Hello, my friends. Good afternoon. Welcome to the very first day of re:Invent, the week when builders like you come to imagine what is next. I hope you walked in with a question or a challenge in mind. What problems can I solve with AI agents? How do I know if I can trust them? Or where do I even start? I want you to spend a moment thinking about that question. Are you ready? I'm going to wait. All right, have you got it? Good, because that question, that's your mission this week.

What we're going to cover today is just a starting point. It's a lens to help you connect the dots across everything you're going to hear, see, and build at re:Invent. Honestly, there is no better place to walk in with a big question than re:Invent. Here's the thing: we're all building something new. But if you've been in technology long enough, you know one thing. The systems people actually rely on are the ones that they trust. So how do you build technology that earns your trust and your customers' trust? How do we know if these AI agents are doing the right thing? Over the next 50 minutes, we're going to break down what building agentic AI with trust looks like so you can build and solve meaningful problems in the real world.

I'm a builder, just like you. I joined Amazon in 2000 as a web developer. Really, 2000. That gave me a front row seat to nearly every major technology shift of the last quarter century. I still remember a moment that made trust personal for me. Back in 2001, I was working on Amazon customer reviews. Early e-commerce was built on uncertainty. Am I going to get the product that I ordered? Is it going to even arrive at all? Is my credit card going to get stolen?

To help build trust in e-commerce, we decided to show every customer's opinion: the good, the bad, and the ugly in customer reviews. We scaled reviews in a way that was radical at the time. But when people saw that the reviews appeared instantly and unedited, they trusted that what they saw was real. This was how we built confidence in the business, in the experience, and fundamentally in the technology. Every chapter of AWS's 20-year journey has been about taking a spark of innovation and scaling it with your trust in us at the core.

AWS's Journey from EC2 to Amazon Nova: Two Decades of Earning Customer Trust

So in 2006, you trusted us to scale compute with Amazon EC2 so that you could run your ideas without having your own data center. Then you asked us, what if we didn't have to think about infrastructure at all? What if you could trust that your workloads were running securely and reliably without ever touching a server?

So in 2014, you trusted us to run AWS Lambda, the world's first serverless compute service. In 2017, you trusted us to make scaled machine learning accessible in hours, not months, with Amazon SageMaker. And in 2023, you trusted us to give you instant access to the world's best models with Amazon Bedrock, giving you the freedom to choose, customize, and innovate without managing infrastructure.

And then in 2024, you trusted us to build models with trust from the ground up with Amazon Nova. Nova's trained on responsibly sourced data, built with safety and accuracy as its first-class objectives, and designed for customization so you can align it with your organization's truth, not somebody else's. And now we're at the next inflection point. You're asking us to give you a way to scale AI agents so that you can trust them for your production systems.

But here's the thing: building AI agents that you can trust is hard. So what's the reason for that? They improvise, they don't behave the same way twice. That's the nature of non-deterministic systems. So Gartner is predicting that over 40% of agentic AI projects will be canceled by the end of 2027. Scaling agents is not just a technical challenge, and it's not just a scientific one or even a human one. That's why trust has to be built in from the very start.

So what's the good news? At AWS, we build the trust foundations. We're making AI reliable, transparent, safe, and easy so that you can focus on what matters: unleashing your ideas. So let me walk you through these four pillars that turn your ideas into trusted systems in the real world.

Reliability: AWS Infrastructure Powers Mission-Critical AI Workloads

First, scale without reliability is just risk at speed. A common mistake here is assuming that reliability comes from better prompts or buying more GPUs, but the reality is actually deeper. We saw one builder post this on Reddit: "My AI agent worked great in dev. Then in production, it kept looping the same function call. No logs, no fallback, no way to debug." That kind of breakdown is not because the model is weak. It's because the foundations failed: compute bottlenecks, missing observability, non-resilient APIs, lack of fallback paths. That's not something you can patch lightly. It actually needs to be built in.
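
The failure quoted above (the same function call repeating, with no logs and no fallback) can be guarded against inside the agent loop itself. The following is a minimal sketch under stated assumptions: `run_with_loop_guard`, `max_repeats`, and the sample tool name are hypothetical names chosen for illustration and are not part of any AWS SDK.

```python
import logging
from collections import Counter
from typing import Callable, Iterable, Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

ToolCall = Tuple[str, str]  # (tool name, serialized arguments)


def run_with_loop_guard(
    planned_calls: Iterable[ToolCall],
    execute: Callable[[str, str], str],
    fallback: Callable[[ToolCall], str],
    max_repeats: int = 3,
) -> list:
    """Execute tool calls, but stop and fall back when one call keeps repeating."""
    seen = Counter()
    results = []
    for call in planned_calls:
        seen[call] += 1
        log.info("tool=%s args=%s attempt=%d", call[0], call[1], seen[call])
        if seen[call] > max_repeats:
            # The identical call was repeated too often: treat it as a loop.
            log.warning("loop detected on %s, using fallback", call[0])
            results.append(fallback(call))
            break
        results.append(execute(*call))
    return results


# Hypothetical usage: a planner stuck on the same lookup.
stuck_plan = [("lookup_order", "id=42")] * 10
results = run_with_loop_guard(
    stuck_plan,
    execute=lambda name, args: f"{name} ok",
    fallback=lambda call: "escalated to human",
)
```

Every attempt is logged, so the loop is visible, and the fallback gives the system a defined exit instead of an endless retry.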

Unstable infrastructure can turn the most brilliant algorithms into very expensive experiments in the real world. That's why we spent nearly two decades building the most secure, extensive, and reliable global cloud infrastructure. This is the same one powering mission-critical systems for millions of customers every day. Whatever models you choose to build with, open source or proprietary, small or massive, they're going to run best on AWS. And here's why.

First, AWS is the best place to run NVIDIA GPU workloads. We offer a choice of accelerated EC2 instances for customers to choose the compute solution that maximizes performance, optimizes availability, and lowers the cost of training AI models. And we're meeting the expanding compute demands with utmost reliability with AWS Trainium, our custom chip purpose-built for high-performance AI training and inference.

A single Trainium chip can complete trillions of calculations in one second. To put that in context, consider that it would take one person roughly 31,700 years just to count to one trillion at one number per second. Our hardware and software teams co-design every layer from silicon to system to software so workloads can run faster, safer, and more efficiently at scale. And this is why startups like Writer, Luma AI, Hugging Face, and OpenAI are scaling their businesses faster from prototype to production with AWS AI infrastructure.

Customization with Amazon Nova: From Generic Models to Domain-Specific Agents

But infrastructure is just the start. With AI agents, reliability is not just about uptime or about speed. It's about accuracy in the real world. Off-the-shelf accuracy is not enough, especially when your customers, revenue, and reputation depend on it. Customization is what turns general intelligence into a strategic business objective.

And so that's why we've introduced comprehensive customization with Nova models. You have full control from pre-training to post-training. You can fine-tune Nova Micro, Lite, or Pro with your own data, aligning them precisely to your domain, and even distill smaller models that meet your cost and latency needs. What we're all seeing is that customization doesn't have a single definition. It's what you as the builder decide it needs to be.

So you can start simple with prompt engineering and retrieval augmented generation, or basically using your enterprise's data to quickly ground the outputs of your AI systems. And as you scale, maybe you need deeper control. Maybe you need fine-tuning or preference optimization or even continued pre-training. With AWS, that choice is yours.

And here's what this means for your customers. Say an employee says, "Why can't I connect to the VPN?" The generic model says, "Restart your computer and contact IT." But your customized agentic system checks their identity, verifies last login, runs a connectivity test through an internal API, and comes back with "Your VPN token expired. I've renewed it and pushed the new configuration to your laptop. You're all set." That's the difference between a model that just answers and a customized agent that knows your domain and can act on it. That's what's going to earn the trust of your customers.
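
The VPN scenario above can be sketched as a short chain of checks followed by an action. This is an illustration only: the user directory, token store, and `connectivity_ok` probe are hypothetical stand-ins for internal systems, and the last-login check is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class VpnToken:
    user: str
    expires_at: datetime


# Hypothetical stand-ins for internal systems (directory and token store).
VERIFIED_USERS = {"alice"}
TOKENS = {"alice": VpnToken("alice", expires_at=datetime(2025, 11, 30))}


def connectivity_ok(user: str) -> bool:
    """Stub for the internal connectivity test; always reports a failure here."""
    return False


def diagnose_vpn(user: str, now: datetime) -> str:
    """Run the checks in order and act on the first concrete finding."""
    if user not in VERIFIED_USERS:
        return "I could not verify your identity. Please contact IT."
    if connectivity_ok(user):
        return "Your VPN connection looks healthy from our side."
    token = TOKENS.get(user)
    if token is not None and token.expires_at <= now:
        # Act, not just answer: renew the token and report what was done.
        TOKENS[user] = VpnToken(user, expires_at=now + timedelta(days=30))
        return "Your VPN token expired. I've renewed it. You're all set."
    return "I could not find the cause. Escalating to IT."


reply = diagnose_vpn("alice", now=datetime(2025, 12, 2))
```

The generic model stops at advice. The customized agent has tools that read and change domain state, which is what lets it close the loop.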

Sendbird's Delight AI: Personalization, Presence, and Trust at Scale

And whether you're optimizing Nova or building with Claude, Mistral, or Llama through Amazon Bedrock, AWS gives you both depth and breadth of model choices, all running on the world's most reliable foundation for AI. So now I'm very excited to have one of the top AI startups share how they've built and scaled trusted AI solutions on AWS's reliable infrastructure. Please join me in welcoming John Kim, CEO and co-founder of Sendbird.

All right. How's everyone doing? You guys have a good Thanksgiving? All right. Welcome, everyone. My name is John, co-founder and CEO of Sendbird. At Sendbird, we've been working on something very, very special. We call it Delight AI. Together with AWS, we're going to show you the future of customer service.

So for the past 10 years, Sendbird has been obsessed with this one thing: strengthening human relationships. We built this foundation at scale, the security, the reliability for the world's largest enterprises. Now, we're taking that massive foundation and using it to power something entirely new: Delight AI. It is the world's most powerful AI agent for customer service.

And it's not just for communications. It is a truly unified AI concierge from sales to support to onboarding, in a single voice. It feels very personal. It's a truly continuous experience, and it's delightful. So this is a new chapter in how brands connect with their customers.

We've partnered with some of the world's most beloved brands. And these aren't just experiments. These are category leaders, the brands that you use every day. Working with them, we learned something really profound: every single one of them wants to treat their customers like real people. But at scale, of course, that was not possible. And we realized the answer isn't just "let's automate more."

The answer starts with understanding your customers, understanding the intent, understanding the context and where they're coming from, and the history with your brand. Seven billion conversations. That's how many conversations we power every single month for hundreds of millions of people around the globe. We aren't guessing at scale. We live and breathe with our customers around the globe. So when that holiday rush hits or a massive snowstorm grounds the airplanes, you don't have to worry about your AI agents stalling. It just works. Your brand shows up consistently for your customers and also for your employees.

So this is Delight.ai. It is a unified AI concierge, one agent across all the channels covering the entire customer journey. From the very first hello to the purchase to the support, it brings it all together into one fluid, magical experience. Now, let's look at BJ's. They're a retail giant on the East Coast. They do about twenty billion dollars a year in revenue. They used our platform to build a shopper's sidekick. We call her Bev. The results are staggering. We saw a twenty percent increase in average order value, and when customers engage with this AI agent, they spend six times more. Six times. It's a lot. It is truly personalized help at scale, resulting in measurable revenue and customer loyalty. And she does it all: shopping assistance, member care, even finding the products right there in the store, which aisle to go to to pick up your product.

Also, let's talk about travel. Meet Norse Atlantic Airways. We all know what travel feels like today during the storm. It can be chaotic, the cancellations, the refunds, the stress. So Norse introduced this AI concierge named Freya. And here's the difference. Most systems see you as a ticket number, but Freya remembers the traveler. She rebooks for you, knowing your preferences. Is it aisle or window seat? She answers your questions with empathy. And when the chaos hits, she isn't just a chatbot that's repeating itself. She's the calm in the storm.

And the food delivery business, my gosh, this is a tough, tough industry. It's a massive orchestration challenge. You have the courier, you have the customer, you have the merchant. There are three moving parts, all happening in real time. Just Eat Takeaway, one of the largest food delivery companies in Europe, uses our AI concierge to make them sing. Not only does it handle the coordination, it automates customer care in an empathetic way. The result: tickets sent to human agents are down, and the anxiety for customers is gone. For the complex issues that require that delicate touch, it hands them off to a human agent seamlessly.

We built this delightful AI agent on three major breakthroughs: personalization, presence, and of course, trust. First, personalization. It all starts with memory. It doesn't just process, it learns and evolves. It remembers. Every conversation feels like it was made just for you. And second is presence. Your brand is everywhere, so it's connected across every channel. There are no dead ends, no infinite loops of canned messages. And the third is trust, enterprise-grade reliability. It has to be safe. It has to be rock solid, even for the most demanding brands that people love globally.

Agent Memory Platform and Omnipresence: Creating Continuous Customer Experiences

And memory begins with the basics: the CRM records, the transactions, the support tickets. But that is just the surface. The real magic happens in the conversations. The customers, we as customers, leave these little traces. I'm planning a trip to Seoul. I have to let the dog in. Or it's my daughter's birthday this weekend. These small moments. But when you weave them together, you don't just see the data point. You're seeing the living picture. You see the person as a whole.

So we call this the Agent Memory Platform. It is an intelligence layer. It connects the living memory of the customer to the logic of your business.

Think of it as two systems working in tandem. First is the view: knowing the customer from all 360 degrees. Second is the business intent, which might be reducing churn or driving sales. So the next time a customer reaches out to you, the AI agent doesn't just know them as a record or who they are. It knows what it needs to do.

What if the customer's frustrated? It transitions into recovery. They're hesitating? It nudges them towards the sale. If they're a really loyal customer, it strengthens the bond with the customer. So it gives the AI agent memory a real purpose.
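
The pairing of a customer view with a business intent can be sketched as a simple routing function. This is an assumption-laden illustration, not Sendbird's implementation: the `CustomerView` fields, the `loyal_threshold` value, and the playbook names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CustomerView:
    sentiment: str          # e.g. "frustrated" or "neutral"
    cart_abandoned: bool    # used here as a proxy for hesitation
    orders_last_year: int   # used here as a proxy for loyalty


def choose_playbook(view: CustomerView, loyal_threshold: int = 12) -> str:
    """Map the customer view to a business intent; the order encodes priority."""
    if view.sentiment == "frustrated":
        return "recovery"
    if view.cart_abandoned:
        return "nudge_to_purchase"
    if view.orders_last_year >= loyal_threshold:
        return "strengthen_loyalty"
    return "standard_service"


playbook = choose_playbook(CustomerView("frustrated", True, 20))
```

Putting recovery first reflects one possible design choice: a frustrated customer is handled as a recovery case even if they also look like a sales opportunity.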

Let's talk about omnipresence. I think it's a pretty cool word. Your customers are already everywhere: app, web, SMS, phone calls, even in-store. So usually, when they switch channels, when we as customers switch channels, what happens? The conversation ends. You have to start all over. You have to wait 40 minutes on a phone call.

Well, not anymore. With omnipresence, the context travels with the AI agent and to the customer, wherever they might be. And what we did is something really cool. We enable simultaneous multi-channels. It's a mouthful. What it means is you can be on a voice call with an AI agent and you can also send a photo or share your location or verify a confirmation while on the call with an AI, all at the same time, without ever breaking the flow.

And it can even reach out to you. You can tell your AI agent, "Hey, can you call me back at 8? I gotta pick up my daughter," or "Call me when my food arrives." It turns this fragmented journey into one continuous conversation. No pauses, no dead ends. Always present for you.

Trust OS and Seven-Step AI Adoption: From Concept to Production in 21 Days

So we've seen memory, personalization, and presence. But of course, before any enterprise deploys this into production, there's one question that remains: Can I trust it? It is the right question. And this is why we created something called the Trust OS. It is the first foundation designed for enterprise-grade confidence. It allows you to deploy your AI agent safely, responsibly, and then scale. And trust isn't just a feature, it is a whole framework.

We built this Trust OS on four pillars. First is observability. You get to see everything. What the AI said, the action it took. You can audit it, you can improve it, so you're never in the dark. Second is control. You can test it automatically, stage it, roll it back instantly in production, so you have the keys to the entire operation at your fingertips.

Third is human oversight. Because even the smartest systems need quality judgment. So we made it so easy for your team to step in, review, approve, or intervene when necessary. And finally, it's reliability. It is built on the same Sendbird backbone that powers billions of conversations, the same security, the resilience that already powers some of the world's largest applications.

So, how does this all fit together? I'll start with something that we call Actionbooks. This is how you define the workflows and business logic. But you don't have to write code. You write goals and define the guardrails, all in plain English. The AI takes those goals and orchestrates actions across your entire stack: the CRMs, the knowledge base, the real-time APIs. It executes complex workflows reliably and at scale.

And to build something this powerful, we need the best infrastructure in the world. And that is why Delight.ai is powered by AWS. Amazon Bedrock provides a foundation for reasoning and safety. We use Amazon Nova for speed and accuracy. We are a heavy user of Aurora database to store all these prompts, workflows, and conversation history with ACID guarantees. And lots of other services, like Amazon EKS, S3, CloudFront, Elastic Load Balancing, and SES, ensure a global-scale, low-latency infrastructure and, of course, secure and compliant operations.

The magic of AI agents with the power of the cloud they can trust.

So, I want to leave you with this. After working with countless enterprises everywhere, really around the world, we realized something very simple. Everyone wants AI. But very few companies actually know how to adopt it really well for their customer experience. So we distilled everything into a clear, practical, seven-step process.

Step one, you have to really align at the top. Decide what really matters for your business. Define success metrics, not about how many employees are using some AI LLM but really what the success looks like for your customers. Secure budget, set the guardrails.

Step two, pick the right use cases. Don't try to boil the ocean at once. Just select two to three high impact use cases, map the workflow, design the agent experience.

Step three, prepare the knowledge and data. So get the right sources, check for quality, see if there's any outdated stale information. Confirm security and data access controls.

And of course, step four is the fun part. It's actually building the AI agent. Ingest data, author the Actionbooks, add the policies and guardrails, connect tools and systems.

And step five, you have to test it. But of course, properly. So with automated coverage, you also want to make sure there's a quality human review in the early part of the process. Make sure to focus on the quality first. We do this all the time. Don't do the vibe testing with a couple of employees, actually run the process with coverage.

Step six, pilot. Don't rush it out the doors. Test with a small slice of your traffic, five percent, fifteen percent of your global traffic. Do the daily tuning. You want to relearn fast from real signals. We call it the hypercare process.
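
One common way to pilot with a fixed slice of traffic is deterministic, hash-based bucketing, so that the same user always lands on the same side of the split. The sketch below is a general technique, not something described in the talk: `in_pilot`, the `salt` value, and the user ID format are hypothetical.

```python
import hashlib


def in_pilot(user_id: str, percent: float, salt: str = "pilot-v1") -> bool:
    """Deterministically assign a user to the pilot slice."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # buckets 0..9999
    return bucket < percent * 100


users = [f"user-{i}" for i in range(20_000)]
share = sum(in_pilot(u, 5) for u in users) / len(users)
print(f"pilot share at 5%: {share:.3f}")
```

Because the bucket depends only on the salt and the user ID, raising the slice from five to fifteen percent keeps every existing pilot user in the pilot and only adds new ones.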

The last step, step seven, is scaling: more traffic, more channels, more use cases, more regions. Keep iterating and measure the business impact. With this process, we've seen timelines as short as twenty-one days. That's just three short weeks from the first meeting to a production release for a large publicly traded company, which is mind-blowing when you consider the legal process, procurement, and infosec reviews that have to happen somewhere along the way.

So these seven steps are clear, practical, and the fastest way we've seen to turn AI ambition into real business outcomes. I'd love to work with you to adopt AI agents for a delightful customer experience. Thank you very much. Back to you, Erin.

Bringing Past Experience Forward: Introducing Amazon Bedrock AgentCore

So thank you John, and congratulations on building something amazing to help organizations help their customers in nuanced and thoughtful ways. The very human centric work of startups like Sendbird is actually what gets me excited about this technology.

But I wanted to do a quick audience poll. So raise your hand if you consider yourself someone who works in AI. I see some hands out there. All right, now keep it raised if you would have said the same thing three years ago. Not as many hands.

All right, so those of you who dropped your hands, me too. It's okay if you're new to this. Three years ago, I never imagined I'd be working in AI. But as I've learned about this technology, I've realized that applied technology, all that work that I've been doing for twenty-five years at Amazon, it's highly relevant. This space, it's indeed new for all of us, or pretty much all of us.

But I want to encourage you to think about your past experience. Think about what you can bring forward to help us build trust in this technology. And that's why I'm really excited to welcome Marc Brooker. Marc is a VP and distinguished engineer and one of my colleagues at AWS and in my view, one of the greatest builders of all time. He has been behind some of those foundational services in the cloud, but to me, he's always going to be the Lambda guy.

So he brought his expertise into this new agentic AI era and helped us build trust and foundations for Amazon Bedrock AgentCore, an agentic platform that enables organizations to get to production with confidence. So please join me in welcoming Marc Brooker to the stage.

Marc Brooker Demos AgentCore: Adding Memory to Agents in Minutes

Thank you. Well, thank you, Erin, very humbling introduction. Why did we build AgentCore? We spoke to a lot of customers who were seeing success prototyping agents, who were getting excited about the first set of agents they were trying out on their laptops and desktops, but finding it hard to get those agents into production where they can have a real impact on their businesses. Speaking to you, we learned you had a set of needs for bringing those agents into production.

You needed a secure, scalable place to run the agent code where your customers can be isolated from each other and issues like prompt injection can be effectively mitigated. You needed a way to connect your existing data sources, services, and microservices to agents, talking the protocols that agents need. You needed a way to observe your agents working, gather metrics on their success, trace and audit their work, and understand cost and performance.

You needed a way to have agents remember state, user preferences, and context, so your users didn't have to restart or start over every time for every new conversation with an agent. You needed a secure, scalable, and controllable way for agents to use the web via a web browser. You needed to give your agents an environment where they could run code, allowing them to more efficiently process data and work with data-intensive tools without driving up token costs. And you needed to do all of this cost-effectively with great scalability, with high availability, and taking advantage of AWS's world-class infrastructure.

You needed all of this while keeping the ability to work with the models that you prefer and the tools that your teams already know. Some of you we spoke to needed an end-to-end solution with everything built in, and some needed a solution where you could pick and choose the components that worked for you, that fitted into your architecture, or worked with the agentic platform you had already started building.

So we built AgentCore, a set of tools and services that makes it easy to get agents into production and critically, to operate them once they're there. AgentCore is aimed at getting you faster to positive ROI on your agentic investments. Now, I'm going to show off a demo of the developer experience of AgentCore, adding a feature to a small agent that I built.

As I was planning my week at re:Invent, I built myself an agent to help me choose which sessions I wanted to attend. I'm interested in AI and databases, and I like to attend sessions early in the morning. As I interacted with this agent over and over, I found that I had to say the same thing every time. Recommend me sessions about databases and AI. Give me sessions in the morning. I don't want to go to that late afternoon session about a topic that doesn't particularly interest me. So I wanted to add memory to my agent.

Here we're going to jump into the AgentCore console and create a new memory that I can wire into my agent with just a few clicks. I'm going to choose the user preferences strategy. Behind the scenes, with just one click on that checkbox, AgentCore is building a whole pipeline that will take the traces of the conversations your customers have with the agent and extract their preferences from those conversations so they can be used later in prompts. So there is a lot of work going on behind that click.

Now I click and create the memory. In less than one second, the memory is created. I'm going to jump into my code now and integrate this with the agent that I built using Python and our Strands framework. First, a handful of imports. This is the AgentCore memory client for Strands built into the AgentCore SDK. Next, I'm going to jump down and set the memory ID, which identifies the memory I just created in the console, and the actor ID, which represents the user. In this case, the actor ID is me every time since this is an agent just for me. However, in a real production agent, it would be an identity derived from the user's identity, perhaps using AgentCore's identity primitive.

Next, I set up a handful of lines of boilerplate code, creating the client, wiring in that memory ID, wiring in the session ID, and so on. Then comes the most important part. I need to update the system prompt for this agent to tell it to use these memories and instruct it that when the user expresses preferences explicitly, it should simply say, "Okay, thank you, I'm going to write those down and remember them for later." This pipeline will also work for implicit user preferences and will learn from smaller interactions as well.

Finally, I wire the session manager, which is that memory client I just created, into Strands. Now we're going to jump back to my agent and show what it looks like to express a memory. My agent can now remember things from session to session. I'm going to type in here, "Hey, I like to attend sessions in the morning. I like to attend sessions about databases and AI." My agent is going to respond with, "Thank you, I'm going to write that down and remember it for next time."

Now we're going to dig a little bit below the covers, inside AgentCore's memory, and see what the agent remembered when I asked it to write down this user preference. As I mentioned, this AgentCore memory is powered behind the scenes with a data pipeline that we built for you. The first time I run this little script that calls the AgentCore memory API, you're going to see that it has not yet remembered these memories. Here, I see the memory is empty. But within just a few seconds, that processing has happened in the background. We captured this conversation with this customer, put it in short-term memory, and then ran it through a model which extracted the long-term memory and preferences.

When I ran the script again just a few seconds later, you can see that memory now says the user explicitly mentioned their preference for sessions in the morning about databases and AI. The next time I run my agent, I don't need to tell it that again. That will be included in the prompt, and it will know those things about me. Over time, as I interact with this agent, it will become more and more customized to my preferences and needs with just this integration with AgentCore memory. This is the first step towards customization, towards making agents that are responsive to user needs, and with AgentCore and Strands, it can be done with just a few clicks and a few lines of code. Thank you, back to you, Erin.
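
The flow shown in the demo (conversation turns land in short-term memory, a background step extracts long-term preferences, and those preferences are later added to the prompt) can be sketched with a toy stand-in. This is not the AgentCore API: `PreferenceMemory`, its methods, and `build_system_prompt` are hypothetical, and the keyword rule stands in for the model-based extraction described above.

```python
from collections import defaultdict


class PreferenceMemory:
    """Toy stand-in for a managed memory service (not the AgentCore API)."""

    def __init__(self):
        self.short_term = defaultdict(list)  # actor_id -> raw conversation turns
        self.long_term = defaultdict(list)   # actor_id -> extracted preferences

    def record_turn(self, actor_id: str, text: str) -> None:
        self.short_term[actor_id].append(text)

    def process(self) -> None:
        """Simulate the background extraction; a real pipeline would use a model."""
        for actor_id, turns in self.short_term.items():
            for turn in turns:
                for sentence in turn.split("."):
                    s = sentence.strip()
                    if s.lower().startswith("i like") and s not in self.long_term[actor_id]:
                        self.long_term[actor_id].append(s)
            turns.clear()  # processed turns leave short-term memory

    def preferences(self, actor_id: str) -> list:
        return list(self.long_term.get(actor_id, []))


def build_system_prompt(base: str, prefs: list) -> str:
    """Append remembered preferences to the base system prompt."""
    if not prefs:
        return base
    return base + "\nKnown user preferences:\n" + "\n".join(f"- {p}" for p in prefs)


memory = PreferenceMemory()
memory.record_turn(
    "marc",
    "I like to attend sessions in the morning. I like sessions about databases and AI.",
)
before = memory.preferences("marc")  # extraction has not run yet, so this is empty
memory.process()
after = memory.preferences("marc")
prompt = build_system_prompt("You recommend re:Invent sessions.", after)
```

As in the demo, the first read returns nothing and the read after processing returns the extracted preferences, which then travel with every later prompt.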

Transparency Through Observability: Seeing What AI Agents Actually Do

Yes, so thank you, Marc. You've seen how intelligent memory makes your agents truly efficient and effective, but to trust your agents, you need more. On a Hugging Face discussion board, I saw this: "My agent failed silently for two days. It looked fine on the dashboard, but one tool kept timing out in a hidden loop." Another wrote, "Debugging AI agents feels like chasing ghosts. You can't fix what you can't see." As builders, we know this pain, right? If we don't know what our system is doing, we can't debug it, we can't improve it, and we can't trust it, especially in production.

Enterprises don't adopt black boxes, and for startups where every customer interaction is a critical first impression, transparency matters even more. So let's start with the foundation: your machine learning infrastructure and workflows. Imagine being able to see everything in one place: every cluster, dataset, experiment, training run, and deployment, from every node to full production. Amazon SageMaker HyperPod's built-in observability gives you a single real-time view of performance, utilization, and cluster health. This means you can find problems fast, stay on schedule, and avoid cost surprises.

With Amazon SageMaker AI, you don't just build models—you see how they behave in the real world, day after day. But with AI agents, the bar is even higher. AI agents don't just generate answers, they take actions. And to trust actions, you need visibility at an entirely new level.

To make that possible, we built Bedrock AgentCore Observability to help you see, understand, and control what your agents are doing, whether that's a tool call, chain of thought, or even memory. You can trace every workflow to pinpoint where things go right or go wrong. You can replay actions like rewinding a tape. You can view intermediate states and see how the agent is reasoning. You can track performance metrics to catch issues quickly. And you have full audit trails that verify that your system acted exactly as intended. In short, observability is not just a feature—it's how you build trust in systems that can think and act for themselves.
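To make the idea of tracing tool calls concrete, here is a minimal sketch using only the Python standard library. It is not the AgentCore Observability API; `Tracer`, `Span`, and the two example tools are hypothetical. It records one span per tool call, including failures, so that a tool timing out in a loop shows up in a summary instead of failing silently.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Span:
    """One recorded step: which tool ran, with what input, and how it ended."""
    tool: str
    args: dict[str, Any]
    status: str        # "ok" or "error"
    detail: str        # result or error text, kept for the audit trail
    duration_s: float


@dataclass
class Tracer:
    spans: list[Span] = field(default_factory=list)

    def call(self, tool: str, fn: Callable[..., Any], **args: Any) -> Any:
        """Run a tool and record a span whether it succeeds or fails."""
        start = time.perf_counter()
        try:
            result = fn(**args)
        except Exception as exc:
            # The error is recorded and None is returned to the caller.
            self.spans.append(
                Span(tool, args, "error", repr(exc), time.perf_counter() - start)
            )
            return None
        self.spans.append(
            Span(tool, args, "ok", repr(result), time.perf_counter() - start)
        )
        return result

    def failures_by_tool(self) -> dict[str, int]:
        """Count errors per tool so a repeatedly failing tool stands out."""
        counts: dict[str, int] = {}
        for span in self.spans:
            if span.status == "error":
                counts[span.tool] = counts.get(span.tool, 0) + 1
        return counts


def lookup_order(order_id: str) -> str:
    return f"order {order_id}: shipped"


def flaky_inventory(sku: str) -> int:
    raise TimeoutError(f"inventory service timed out for {sku}")


tracer = Tracer()
tracer.call("lookup_order", lookup_order, order_id="A17")
for _ in range(3):  # simulates an agent retrying the same tool in a loop
    tracer.call("inventory", flaky_inventory, sku="X9")

print(tracer.failures_by_tool())
```

Because every span keeps its arguments and outcome, the same list doubles as a simple audit trail that can be inspected step by step after the run.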

Safety and Governance: AWS Well-Architected Responsible AI Lens in Action

You've seen how transparency gives you visibility. But visibility alone is not enough, because seeing a problem doesn't stop it. One builder posted that their prototype agent sent a real customer report to the wrong Slack channel. It wasn't a hallucination—it was a permissions miss. Another posted on GitHub that the agent executed the wrong tool call on a production database. It perfectly followed instructions, just the wrong ones.

A lot of us get these requests from our customers with security and governance concerns, particularly as they scale. Given AI agents' high level of autonomy, data security and policy enforcement are more critical than ever. It's important to set clear guidelines for your agents—which data can be shared, which vendors are off limits, maximum thresholds, and so on. And compliance with regulations must be coded into the very architecture of these agents.
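The guidelines mentioned here (which data can be shared, which vendors are off limits, maximum thresholds) can be expressed as an explicit check that runs before an agent acts. The sketch below is an assumption-laden illustration: the `Policy` and `Action` types, the action kinds, and the example values are all hypothetical, and this is not how any AWS service implements policy enforcement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    allowed_channels: frozenset[str]   # where the agent may post
    blocked_vendors: frozenset[str]    # vendors the agent must not use
    max_refund_usd: float              # upper bound on a single refund


@dataclass(frozen=True)
class Action:
    kind: str                # "post_message", "place_order", or "issue_refund"
    target: str              # channel name, vendor name, or account id
    amount_usd: float = 0.0


def check(policy: Policy, action: Action) -> tuple[bool, str]:
    """Return (allowed, reason). Unrecognized action kinds are denied."""
    if action.kind == "post_message":
        if action.target in policy.allowed_channels:
            return True, "channel is on the allow list"
        return False, f"channel {action.target!r} is not on the allow list"
    if action.kind == "place_order":
        if action.target in policy.blocked_vendors:
            return False, f"vendor {action.target!r} is off limits"
        return True, "vendor is permitted"
    if action.kind == "issue_refund":
        if action.amount_usd <= policy.max_refund_usd:
            return True, "refund is within the threshold"
        return False, "refund exceeds the maximum threshold"
    return False, f"unknown action kind {action.kind!r}"


policy = Policy(
    allowed_channels=frozenset({"#support-internal"}),
    blocked_vendors=frozenset({"acme-unvetted"}),
    max_refund_usd=50.0,
)

print(check(policy, Action("post_message", "#general")))
print(check(policy, Action("issue_refund", "acct-1", amount_usd=20.0)))
```

Denying by default means a new kind of action has to be added to the policy deliberately before the agent can perform it, which is the behavior a permissions miss like the Slack example calls for.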

But here's the good news. AWS builds governance into the foundation, so your agents behave with the discipline that you'd expect. We recently launched the AWS Well-Architected Responsible AI Lens to guide you through the best practices as you build. These best practices help you surface risks early, avoid costly rework later on, and move to production with confidence. These eight simple focus areas walk you through everything from defining your use cases to monitoring them in the real world. You're not slowing down—you're speeding up responsibly.

And of course, we use the Responsible AI Lens in how we build our services too. For example, Bedrock AgentCore Identity and Sandboxes make sure that agents only get the minimum access they need, keep each session separate, and safely contain any risky actions. Policy and compliance controls follow global standards like GDPR, HIPAA, and FedRAMP. And Guardrails for Amazon Bedrock handle content filtering, safety checks, and compliance automatically. That way you can roll out agents across teams, regions, and industries without worrying about breaking the rules.

So what does this look like for real people? When you're waiting to hear if your medical treatment is covered by your health insurance, every hour matters. Using Bedrock AgentCore, Cohere Health built Review Resolve, an agentic system that reads clinical notes and test results to surface the key details that determine coverage. Reviews are now 30 to 40 percent faster with fewer errors, meaning patients get answers sooner with less stress and uncertainty.

And because healthcare is a highly regulated environment, Cohere needs more than just speed—they need trust. AgentCore gives them audit trails, session continuity, and explainable decisions across multi-hour reviews so that every decision is explainable, traceable, and grounded in the patient's real medical context.

Making AI Accessible: Strands Agents Framework and Generative AI Innovation Center

I know the thing that sparked my imagination when I first saw generative AI a couple of years ago was how approachable it was to use. And when we were thinking about helping people use AgentCore to build AI agents, we knew that it wasn't enough to make it powerful; it also had to be easy to build with. So we designed AgentCore so any builder, whether developers, data science and business teams, or even product managers, can stand up agents, see what they're doing, and scale them into production, because ease of use creates adoption and trust.

You may have caught in Marc's demo that he was using a library called Strands Agents. Strands Agents is the agentic AI framework that we built for ourselves as we were building increasingly capable and trustworthy frontier agents.

We realized that these patterns were ones that many builders would encounter when building agents, and so we open sourced Strands Agents, and it's quickly become one of the most active open frameworks for building AI agents, downloaded almost 5 million times since launch. Strands Agents is an SDK for building and running AI agents. It's open, it's model-driven, it's easy, and it's fast. With Strands Agents, you can go from idea to working agent in just a few lines of code. There's no complex orchestration, no heavy scaffolding, just define your model, your tools, and prompt, and Strands Agents handles the reasoning, chaining, and execution.
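The "define your model, your tools, and prompt" pattern can be illustrated with a toy loop. To be clear, this sketch does not use the Strands Agents SDK and does not reproduce its API: `run_agent`, `scripted_model`, and `word_count` are hypothetical, and the "model" is a scripted stand-in rather than an LLM. It only shows the shape of a model-driven loop, where the model decides at each step whether to call a tool or to answer.

```python
from typing import Callable

# A "model" is any callable that reads the transcript and returns either
# ("tool", name, argument) to request a tool or ("final", text) to finish.
Model = Callable[[list[str]], tuple]
Tool = Callable[[str], str]


def word_count(text: str) -> str:
    """Example tool: count whitespace-separated words."""
    return str(len(text.split()))


def scripted_model(transcript: list[str]) -> tuple:
    """Stand-in for an LLM: requests one tool call, then answers."""
    tool_results = [line for line in transcript if line.startswith("tool: ")]
    if not tool_results:
        user_text = transcript[-1].removeprefix("user: ")
        return ("tool", "word_count", user_text)
    count = tool_results[-1].removeprefix("tool: ")
    return ("final", f"Your message has {count} words.")


def run_agent(
    model: Model,
    tools: dict[str, Tool],
    system_prompt: str,
    user_message: str,
    max_steps: int = 5,
) -> str:
    """Loop until the model returns a final answer or max_steps is reached."""
    transcript = [f"system: {system_prompt}", f"user: {user_message}"]
    for _ in range(max_steps):
        decision = model(transcript)
        if decision[0] == "final":
            return decision[1]
        _, name, argument = decision
        transcript.append(f"tool: {tools[name](argument)}")
    raise RuntimeError("agent did not finish within max_steps")


answer = run_agent(
    scripted_model,
    {"word_count": word_count},
    "You count words in the user's message.",
    "trust must be at the core",
)
print(answer)
```

The `max_steps` bound is a design choice worth keeping in any real loop: it turns a model that never stops calling tools into a visible error instead of a hidden one.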

Because this is the framework that we are actively using to build agents within Amazon, it's going to continue to evolve to reflect our learnings and best practices. When you're ready to go to production grade, you can move seamlessly to Amazon Bedrock AgentCore. Once you have your AWS account set up, it should take you less than 5 minutes to get your first agent up and running. Easy means that anyone can start, but you don't have to build alone.

The Generative AI Innovation Center brings AWS experts and partners together to help organizations move from early prototypes to real-world systems responsibly and at speed, from choosing the right models to customizing AI to building agents that work alongside people. The Generative AI Innovation Center is available to you, and now with the new Physical AI Fellowship, we're supporting the future of AI. Now, let me introduce to you a customer who's transformed their business with the help of the Generative AI Innovation Center and made AI truly accessible to their users. Please join me in welcoming Jason Vogrinec, Executive Vice President at Lyft.

Lyft's AI Transformation: From 16 Minutes to 3 Minutes Resolution Time

Thank you very much. Hi everybody, I'm Jason Vogrinec, EVP of Foundations and AI Transformation at Lyft, and I'm thrilled to be here today to talk to you about how we've transformed customer support with AI. Over the next 10 minutes, I'll talk to you about how we made AI work for real customers with real problems and real results.

Every car on the Lyft platform has 2 customers in it, a rider and a driver. Many of these drivers depend on the platform for their livelihood. It's how they feed their family and pay their rent. For many, this isn't a side hustle, it is the job. In the other seat, we have riders who depend on our platform to get home safely, to pick up their kids from school, or to get to that job interview on time. The platform only works if both sides trust us completely.

It could be a payment issue, vehicle damage, a safety concern, or an account question. Every single support interaction is a moment where we either keep that trust or lose it. That's why for us, customer support isn't a cost center, it's a competitive differentiator, and it's core to our purpose to serve and connect. As we entered 2024, we knew we were not meeting customer expectations, but it wasn't for lack of trying.

As our business grew, we had agents serving customers every single day. But as the business grew faster, every new rider and driver meant more agents, and the cost curve went in only one direction: up. We also saw really inconsistent experiences, with thousands of agents around the world in different time zones and with different training levels, which made it very difficult to deliver a consistent experience for customers.

These are some quotes that we heard from these customers: "Their answers are so calculated, always the same." "It just looked like robot answers, you know." One driver told us about spending 45 minutes chasing down a $2 cancellation fee after a 10-hour shift. Another said, "It's just such a hassle to get help with you, I don't even want to try." These aren't complaints; they're symptoms of three core problems that we needed to address.

First, Lyft was misunderstanding our customer problems. Second, our help content was hard to navigate.

Frankly, our support felt designed to reject and deny rather than to help, and this had to change. With the latest developments in generative AI, we asked ourselves a simple but powerful question: What if we use AI to transform customer experiences into ones that actually work for them? Not just incremental improvements, but fundamental transformations. What if, instead of forcing our drivers and riders to navigate our org chart or our help content, AI could actually understand their problems and respond to their specific needs? What if, instead of waiting hours for a support agent, someone could get help in minutes—help that was personalized and contextual and actually solved their problem?

But before we wrote a single line of code, we did two really important things. First, we secured executive buy-in—not just budget approval, but genuine alignment that this was about transformation, transforming how we serve our customers. These capabilities create durable and competitive advantages that weren't about cutting costs. Second, we went direct to our customers. No internal pilots or random experiments, no testing with employees first. We went straight to drivers with real problems, real pressure, and real stakes, because that was the only way to know if it was actually going to work.

From those conversations, three very clear principles emerged that shaped everything we've done. The experience had to be easy—no more navigating complex menus or searching through help articles, just natural conversation. It also had to be fast. Every minute drivers are offline is an earnings opportunity lost. We needed minutes to resolution, not days. And it had to be accurate. Getting the right answer matters when it affects someone's livelihood, their ability to pay rent or support their family. Easy, fast, accurate—the three principles that became our foundation and how we measured our success, not by how much we automated, but by whether Maria got her answer in three minutes instead of three days.

Now, having clear principles sounds great, but here's the reality. Knowing what we needed to build and actually building it were two very different things. We faced three significant challenges. First, there was no playbook. A lot of the initial AI stories that we had heard about and examples we had seen were relatively simple RAG solutions, but we needed something that was going to be far more capable. Second, we knew we had a trust gap. Our customers didn't trust our old systems, so why would they trust a new one that said it was AI? Just like many of you, I've experienced the previous generation of chatbots, and they don't engender trust. Third, we faced the classic quality versus speed tradeoff. Traditional thinking says you can't have both, but we knew we had to move fast and we had to get it right the first time.

So how did we overcome these challenges? Well, we knew we couldn't do it alone, and so creating the right partnerships was absolutely critical to our success. And I want to emphasize that these were true partnerships, not vendor relationships. We partnered with Anthropic for their Claude models because we needed AI that could actually reason, that could understand context, and that could maintain natural conversations at massive scale. We partnered with AWS for two things: the infrastructure needed to reliably support millions of interactions on a regular basis, and the Generative AI Innovation Center, where their team worked alongside ours to solve problems that neither of us had ever solved before. This wasn't "here's a product, go figure it out." This was "let's build something together that neither of us could have done alone," and that collaboration made all the difference.

So let me show you how we tackled one of our biggest challenges: routing. Now, drivers can contact support for hundreds and hundreds of different types of issues, and trying to categorize those right the first time is nearly impossible. Working with the AWS Generative AI Innovation Center, we built an intent agent, and here's what makes it special. It combines user details with smart disambiguating questions. Instead of forcing customers through a rigid menu tree, the AI has a conversation to figure out the customer's real intent. Making AI easy meant making it conversational—just tell us what's wrong and we'll figure it out together.
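The routing idea, combining user details with disambiguating questions, can be sketched with a simple keyword matcher. This is not Lyft's implementation, which the talk describes as model-driven; the intents, keywords, and roles below are hypothetical. The point is the control flow: resolve when one intent is clear, and ask a follow-up when several remain.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical intents and keywords; a production system would use a model.
INTENT_KEYWORDS = {
    "cancellation_fee": {"cancellation", "cancel", "fee"},
    "payout_issue": {"payout", "earnings", "fee"},
    "vehicle_damage": {"damage", "scratch", "dent"},
}

# User details narrow the candidates: here, only drivers have payouts.
ROLE_INTENTS = {
    "driver": {"cancellation_fee", "payout_issue", "vehicle_damage"},
    "rider": {"cancellation_fee"},
}


@dataclass
class Routing:
    intent: Optional[str]     # set when a single intent is clear
    question: Optional[str]   # set when a follow-up question is needed


def route(message: str, role: str) -> Routing:
    words = set(re.findall(r"[a-z]+", message.lower()))
    candidates = ROLE_INTENTS.get(role, set())
    scores = {i: len(words & INTENT_KEYWORDS[i]) for i in candidates}
    best = max(scores.values(), default=0)
    if best == 0:
        return Routing(None, "Can you tell me a bit more about what went wrong?")
    top = sorted(i for i, score in scores.items() if score == best)
    if len(top) == 1:
        return Routing(top[0], None)
    options = " or a ".join(name.replace("_", " ") for name in top)
    return Routing(None, f"Is this about a {options}?")


print(route("There is a fee I do not understand", "rider"))
print(route("There is a fee I do not understand", "driver"))
```

The same message resolves immediately for a rider but triggers a clarifying question for a driver, because the driver's details leave two plausible intents.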

What does this look like in practice? Well, the interface is pretty simple. Powered by AI, customers just type their message.

Behind the scenes, Claude is analyzing context, asking clarifying questions when needed, and routing to the right resolution path. The complexity is hidden, the experience is simple. And frankly, the immediate impact was big. Our customers are now experiencing resolution times of less than three minutes on average, down from 16 minutes, and sometimes as long as three days. From frustration to resolution in the time it takes to grab a coffee.

Internally, we created capacity for our human agents because the impact of our early AI investments was big. 55% of our customer interactions today are being resolved without requiring a single human agent. That was two years ago. And this meant that our specialists now can focus on where they're needed most: difficult disputes, complex cases, safety incidents, the things that require human judgment.

Together, these two improvements allowed us to open up access to support for new customers. While other companies often bury support to reduce costs, we wanted to take the opposite approach. We moved support access to be front and center so that all customers felt that support could be easy. This is what transformation looks like. This is what happens, I think, when you take AI that is actually easy, fast, and accurate at scale.

Now while measuring impact is important, so is evaluating lessons learned. The intent agent is only one of the agents we've deployed this year, and the more agents we launch, the more we learn. First, nothing beats real customer experience. Internal testing is good, but nothing compared to what we learned in the first weeks of real feedback. Plan for rapid iteration based on what you hear.

We also learned very quickly that evals are an art, not a science. We spent a lot of time trying to perfect our evaluation metrics upfront. We found it better to start somewhere reasonable and evolve as we learned what actually mattered to customers. Launching something good enough with strong guardrails and adapting quickly was what made the big difference. We made changes daily in those first few weeks, not because we built it wrong, but because real world complexity always exceeded our models.
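One way to read "start somewhere reasonable and evolve" is as a small evaluation set that grows with real feedback. The sketch below is an assumption, not Lyft's evaluation setup: `toy_agent` is a placeholder for the system under test, and the cases and checks are invented for illustration.

```python
from typing import Callable

# Each case pairs a customer message with a check on the agent's reply.
EvalCase = tuple[str, Callable[[str], bool]]


def toy_agent(message: str) -> str:
    """Placeholder for the agent under test."""
    if "cancellation fee" in message.lower():
        return "I have refunded the cancellation fee."
    return "Let me connect you with a specialist."


def run_evals(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose check passes."""
    passed = sum(1 for message, check in cases if check(agent(message)))
    return passed / len(cases)


# A reasonable starting set.
cases: list[EvalCase] = [
    ("I was charged a cancellation fee by mistake", lambda r: "refunded" in r),
    ("Someone damaged my car", lambda r: "specialist" in r),
]
initial_score = run_evals(toy_agent, cases)

# A case added later, after real feedback exposed a gap.
cases.append(("Where is my weekly payout?", lambda r: "payout" in r.lower()))
updated_score = run_evals(toy_agent, cases)

print(initial_score, updated_score)
```

The score drops once the new case is added, which is the intended signal: the evaluation set now measures something customers care about that the first version missed.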

And third, AI agent timelines don't fit neatly in a roadmap. Traditional software development is relatively predictable. AI agents are far more organic. In one of our launches, the AI agent worked great in development. But once it was exposed to millions of customers, we found new edge cases and had to do a tremendous amount of refining prompts and adjusting guardrails. These things weren't bugs; they're the nature of building AI systems. So build flexibility into your timelines.

So what's next for us? Well, we're focused on three areas. We want to engage with our customers in the medium that works best for them, be it voice, image, or text. Later this year, we'll launch our first multi-modal AI agent to resolve driver issues with cleaning fees and damage. We're also not building one-off solutions. We're creating a platform that lets us rapidly deploy new AI agents across different support scenarios. Think of it as AI infrastructure that makes it easier to build new tools.

And so here's my takeaway for you all. Making AI easy isn't about the technology alone. It's about deeply understanding your customers' pain points, having the courage to face them honestly, and partnering with the right people to build solutions that work. Thank you so much.

Thank you, Jason. I have always admired Lyft as a company that has been at the forefront of serving people in innovative ways, leveraging the latest technology, whether it was the cloud, mobile, or now agentic AI. I hope you're all as inspired as I am by how Lyft is building trustworthy systems that put people first.

So you've seen what's possible, and what it takes to build trust in agentic AI in the real world. From being reliable to transparent, to safe to easy. Every layer we build has one purpose: so you can build with trust at every layer. That way you don't have to slow down to stay safe. You scale because of it.

So when you put your trust in AWS, you're not choosing infrastructure. You're choosing a company whose technology powers Amazon at massive scale, stretched, battle tested and improved daily. Every lesson we learn we bring straight back to you. In my 25 years at Amazon, one thing has never changed.

We build so you can build. Whether it's using Sendbird's fully managed Delight AI to build highly personalized customer experiences, building something that uniquely meets your needs with solutions like SageMaker AI and Bedrock AgentCore, or simply as your Lyft gets you safely to the airport, trust must be at the core of everything that all of us build in this new space. So I'm going to leave you to head off for the rest of your re:Invent week, but do me a favor.

Remember that question that I asked you to think about at the beginning? Have you got it? I want you to think about how you're going to prioritize trust as your answer to that question. And to get you started, here are sessions that are going to help you go deeper.

Alright, and here are a few more resources. First, we want to encourage you to continue your learning journey with the AWS Skill Builder, where you can dive deeper with more than 1,000 free expert-led online training courses. Second, we are so excited about how easy it is to build powerful agents now that I'd love for you all to join us tomorrow when we are kicking off our new AI League agentic AI challenge. You can either build your own agent in our workshop or watch top finalists from the 2025 championship compete head to head for $25,000 and ultimate bragging rights.

Finally, Amazon's partnering with Code.org for the 2025 Hour of AI. This is a global initiative that brings AI education directly into classrooms through hands-on, easy to follow activities. So I invite you to join us to help the next generation develop the skills they need to grow and thrive in this new technology. And with that, thank you so much. And let's go build.

; This article is entirely auto-generated using Amazon Bedrock.