Kazuya

Posted on Dec 5, 2025 • Edited on Dec 8, 2025

AWS re:Invent 2025 - Cox Automotive's Blueprint for Agentic AI on Amazon Bedrock AgentCore (IND3329)

🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.

Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!

Overview

📖 AWS re:Invent 2025 - Cox Automotive's Blueprint for Agentic AI on Amazon Bedrock AgentCore (IND3329)

In this video, Ravi from AWS and Brian Lloyd Newberry and Tabaré Gowon from Cox Automotive share their blueprint for moving AI agents from prototype to production. They introduce Amazon Bedrock AgentCore's five managed services (Runtime, Memory, Identity, Gateway, Observability) that address key production challenges. Cox Automotive's journey is highlighted, where they launched 3 of 5 agentic AI products to production in just 5 weeks using AgentCore and the Strands framework. Tabaré details their automated virtual assistant for dealership customer conversations, emphasizing critical patterns: using AgentCore as foundation, implementing both hard and soft guardrails, red teaming before each phase, automating evaluation with LLM-as-judge, and setting circuit breakers for cost and turn limits. The session concludes with lessons learned: get moving immediately, think disruptively, use agentic AI daily, red team continuously, design for worst cases, and set hard limits to fail gracefully.

; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

The Production Gap: Why AI Agents Struggle to Scale Beyond Prototypes

Good afternoon, everyone, and welcome to our session. I'm Ravi, Senior Solutions Architect at AWS, and today I'm joined by Brian Lloyd Newberry, Associate Vice President of Enterprise Architecture, and Tabaré Gowon, Lead Architect at Cox Automotive. Just a quick question before we dive in: How many of you are experimenting with AI agents in your organization? Keep those hands up if you have launched one to production and operating at scale. You see the drop—that's what we're going to talk about today, and you'll see Cox Automotive's blueprint for addressing this challenge.

Here's the agenda. I'm going to do a quick introduction of Amazon Bedrock AgentCore, then hand it over to Brian Lloyd Newberry, who'll talk about Cox Automotive's agentic AI journey and their success blueprint. Tabaré will then walk you through a reference implementation, key considerations, and lessons learned from the trenches. We'll stick around for any questions, so let's dive in.

The biggest challenge we see is getting your agents from prototype to production. What are AI agents anyway? They're autonomous systems that can reason, make decisions, use tools to take actions, and perform goals autonomously. Your proof of concept works great, but there are five key challenges that you must address at runtime. You need scalable infrastructure to run long-running workflows with complex orchestration without performance overhead and cost overruns.

Agents need to maintain context and have meaningful conversations with end customers across sessions, so you need a managed memory offering. Agents need secure access to the agents themselves, the tools they access, and the data. Agents need a mechanism to discover your existing systems through agent-ready tools and talk to other agents seamlessly. Because agents are non-deterministic in nature, you need complete visibility into their execution so that you can trace, debug, and evaluate their behavior.

Amazon Bedrock AgentCore: Infrastructure for Production-Ready AI Agents

Let me use an example to drive this home. Picture a service technician assistant. A customer walks in with a check engine light. The technician takes pictures of the dashboard, perhaps records the engine noise, and keys in the customer concern. The agent uses this data and looks at knowledge sources like OEM manuals, technical service bulletins, and recall notices to come up with the proper diagnostic plan.

The technician starts working on it and finds something else. Then the agent has to adapt and make a different recommendation. You see this back and forth that goes on to adapt and diagnose the car. The agent determines that the service technician needs parts to fix the car. Then the agent goes and checks the internal system for pricing, inventory, and availability while keeping customer preferences in mind. This customer prefers quality OEM parts, or this customer is budget conscious. Then it recommends the parts. If parts are not in stock, perhaps the agent has to log into legacy supplier applications and web applications to place an order.

This repair can take hours, and the agent maintains the session throughout. When the customer returns weeks later, the agent must remember everything to resume the diagnostic process again. Now imagine this whole infrastructure scaling up, supporting multiple technicians and addressing multiple customer problems with full session isolation. You need to trace every single step of how an agent recommended a particular diagnostic step and what sources it used for complete debugging and evaluation. Without these capabilities, we believe you're stuck in prototype mode, and that's the gap AgentCore fills.

AgentCore offers five fully managed services with two special purpose tools. Runtime is secure, scalable, and serverless. It supports multi-modal inputs up to 100 megabytes of payload with long-running sessions up to eight hours. You can use any framework to build the agent, and you can use it with any model.

Memory offers short-term memory for storing the conversational state and long-term memory to extract and learn from customer interactions and adapt. There are out-of-the-box strategies such as user preferences, semantic facts, and summarization. You can use those strategies, overwrite them, or bring your own strategy. Identity provides secure authentication and credential management. It supports OAuth and IAM out of the box. Gateway converts your existing APIs and Lambda functions as agent-ready MCP tools and provides a built-in semantic search mechanism for intelligent discovery. Observability gives you complete visibility into every interaction.

The agent's path trajectory includes out-of-the-box CloudWatch dashboards and integrates with your existing observability stacks via OpenTelemetry format. Browser enables the agent to perform complex web automation tasks, and code interpreter allows agents to run ad hoc complex calculations in any language in a secure sandbox. We believe this is the infrastructure that gets your agents into production. With that, I'll hand over to BLN to walk through Cox Automotive's agentic AI blueprint.

Cox Automotive's Journey: From Traditional AI to Generative AI Leadership

Good, you can hear me. My name is Brian Lloyd Newberry and everybody calls me BLN. If you want to know why, I'm happy to tell you after the talk. I have the privilege of leading the Cox Automotive Product and Technology Innovation lab. Me and a small team have for the last three years been completely embedded in generative AI and in this journey to move toward agentic software. I want to start by introducing you to Cox Automotive because you may not understand the brand or Cox Automotive as an entity. But if you've shopped for a car or bought a car, you've probably heard of brands like AutoTrader.com and Kelley Blue Book. These are some of the largest consumer portals in the automotive industry. People use them to buy vehicles every single day, and we get a lot of traffic there.

We do websites for the majority of the OEMs, the automotive manufacturers in the United States, so we see a lot of web traffic there. In the consumer space, we see a ton of consumer information coming together. We also provide liquidity in the automotive market. If you've ever traded a car in, you probably had that car liquidated through Mannheim Auctions, where the dealer sold it and bought other cars from Mannheim Auctions to sell to other people. We also offer an entire suite of dealer software things like ERPs, custom CRMs, and inventory management, as well as a suite of finance applications. We're moving into mobility and other spaces as well. We're the dominant player in the automotive industry for software providers.

Because we have that much visibility in the industry, we have a view of everything happening. We see the automotive industry every single day. Just as a round number, we have about 5.1 trillion insights about vehicles in the market. We see hundreds of millions of customer interactions as people look at vehicles, get their vehicles serviced, and move through that lifecycle. We take all of that information, which is the lifeblood of AI, and we wrap AI models around it. We've been doing AI for quite a while. We have hundreds of AI models in production. I think 150 something was a threshold we crossed last month. Some of those might be heuristic.

What I want to do today is tell you how as an organization you can get moving. One of the things we see as we work with product teams across the organization and engineering teams is there's a lot of playing with agentic AI. Although most of the time people are just playing with assistants, not agentic AI. There's a lot of generative AI and sort of noodling around the edges, but people don't really know what an agentic application is and how to get moving and how to build it. Our chief product officer has this saying: start with crazy and work backwards. So I'm going to tell you about a crazy experiment we ran this past August, and then Tabaré's going to show you how one of the specific projects that we kicked off went, and then we'll wrap it up.

As I said, we've been in this journey a long time. Mannheim, our auction provider, has been around for probably 80 years at this point. In 2018, we went all in with AWS and migrated from 50 data centers down to 3, with the rest running in AWS regions, both east and west. We've been doing predictive AI for a very long time using traditional machine learning models.

Kelley Blue Book, that little book that people used to literally buy to look at the price of used vehicles, was a mathematical compilation of vehicle evaluations going back 100 years. In 2023, something happened. On November 30th, 2022, ChatGPT was released by OpenAI. The ChatGPT 3 model went live, which created a ton of buzz. We knew these models were out there and had been looking at them, but this created tremendous buzz.

What happened is in March of 2023, we went all in on generative AI. We took the team that I'm leading right now, and within a month had access to multiple providers and models in our environment. We built a sandbox, started to build products, and in the following year we had three major products live. One of which saved about $750,000 a year by eliminating the need to pay for content to be generated elsewhere, and our content was better and faster. We also went in with products for dealers that increased response rates from customers by 50 percent, and we did other products as well.

To power that generative AI model, we really focused on our data and bringing all of our data together. In 2024, on the way back from Amazon re:Invent, we launched a new initiative. We set up a couple of teams to dive into the agentic space and get ahead of the industry. There was a lot of FUD and excitement there, but there wasn't a lot of substance behind it. We did a ton of work and were really successful driving through building internal apps and doing proof of concepts, but we were kind of stuck. We didn't really get a lot of pull-through on our products.

We had four or five products using generative AI in production, a couple of them internal and the rest external, but we weren't seeing the pull-through from our product teams. People had things they needed to deliver and didn't want to disrupt what they were doing. People were scared. If we all sat down and asked what an agent is, there would actually be some debate. Agentic applications people think about as a chatbot with a generative AI interface strapped on, and that's a very naive version of agentic AI. Google Deep Research being one of the first examples I saw of the true power of this kind of system.

The Five-Week Challenge: Launching Five Agentic Products by Labor Day

Going back to Newton's principles from high school physics, you've got inertia, momentum, and reaction. What you need to do is figure out how to get moving. If you're not changing anything, nothing is going to change. You have to get moving. No one is an expert at building agentic AI systems. It's been around for about a year, right? Having a PhD in math doesn't help you because it's a soft skill. It's looking at systems, doing systems modeling, writing text that helps guide these LLMs to do things. Just pick something and get your teams moving.

So let's talk a little bit about the craziness. The second week of July, our chief product officer said, "I want to launch five products by Labor Day, and there are going to be agentic AI products, brand new product capabilities in the market." We said okay. We had been building things and knew we could do that, but we had five weeks to do it and the teams we had had never built agentic AI products before. Tabaré and team had built a little bit, but they were a little ahead of the curve but not very far, so we were starting from ground zero.

What do you do when your chief product officer and EVP of product and technology is saying we're launching five products by Labor Day? You're starting from ground zero with five teams and you've got the blank page problem, so you panic. It's kind of like when you have to put together a deck in a hurry. You don't know where to go, you don't know how to move, you're stuck. So here's what's necessary to accelerate: have a goal, set a deadline, and drive excitement. We had a goal of five projects with five teams in production. We had a deadline of Labor Day, which was about five and a half weeks from that point. And we had excitement because what we did is we brought those five teams together in a room.

We made a lot of noise around it. It was a big event, and we said that we're doing this hard thing. We're going to launch five products, and here's what the products are. They weren't fully fleshed out, but they had a good core around them, and we're going to use Agentic AI to deliver them and the solutions that we're bringing to market. So you've got the excitement. Now, what do you do next? What are our assets? What are our gaps? And then let's get moving.

So what do we have? We were all in with AWS. We're a close partner with them. We understood what their product roadmap was, and we had been deeply engaged with Bedrock. We were one of the first customers running Bedrock models in production, and first with access to Cloud. We had a small team of internal expertise, a couple of teams that have built these types of apps, and we had some ideas, whether they were good or not, we didn't know yet. We needed to get them into production, and we had to launch five things. That's our inventory.

So what do we need? I'm an architect, and blank pages are scary to architects, especially in new spaces. So you start out with the reference architecture placemat. This is what we sat down and worked through. Most of the gray in this is stuff that we already had solved for, but there's a lot of white boxes here that we had to figure out, things like cost management, model monitoring, orchestration, agent workflows, tools, API management, and how to connect it to our data and how to see what's going on.

You know where the story is going because Ravi talked to you about the products that AWS has put together in Bedrock and in Agent Core. But this is a really scary problem for a lot of teams because it's a blank slate. They don't know what to do. So what do we do? We were on Amazon Bedrock, and we said we're going to stay here. We're going to take a risk and we're going to dive into Agent Core. Agent Core hadn't been released. It wasn't released until after we were done with these projects. We never release stuff into the market, and it wasn't hyper-scaled at that point, but we never put first-generation technology products into market. We did here because we partnered with AWS. We saw the roadmap. We understood what was there, and we knew how silly it would be for us to build it ourselves and more silly for our teams to have to figure out how to build them themselves. You heard some of what Ravi was talking about earlier.

Building with Strands: Focusing Teams on Features Over Frameworks

So the next decision we made is you get into this conversation about which agentic framework to use, and I make the joke, and it's really not a joke, that there's a new agentic framework every two or three days, and there has been for the last year plus, just like there's new models every day. Pick one and go. What you're doing here we'll talk about in a second. You have the ability to pivot and change, especially if you're using Agentic AI to help you do that work. Pick things and go. Strands is an AWS-backed open source product that runs natively, and at the point we started, was the best option to go and run on Agent Core, and it met all of the needs we had.

The other thing is Agent Core could run other frameworks, and we could have given one framework to each of the five teams, but maybe we would have learned things. We really wanted to get to market in five weeks, so let's focus all of our energy on learning one thing and getting really good at it and enabling the teams. Focus is an important part of this. So we talked a little bit about Strands at the high-level architecture level. Building an agent, we talk about how magical they are and all the things you have to maintain, but building an agent yourself as a developer, there's only a couple of core things you do in the agent.

One, you write some text, which is the prompt. Two, you make some configuration, which is usually also written in text. Three, you attach it to a model with some configuration parameters, and then you attach it to some tools that allow it to do things. Conceptually, that's all you have to do with Strands. All you really have to think about is what the text you write is and the tools you use are, and how you take that very abstract system and make it real. You're literally just writing Python as configuration.

In this super simple case from the Strands website, you're bringing in an agent, you're bringing in a calculator tool that already exists. I don't have time today to show you the implementation of that tool, but if you looked at it, you'd see the Python documentation describing what the tool is and when to use it and what the parameter values mean, just like you should be doing with any Python or any source code you write. It's well documented, and the Strands framework can bring it in and wire it in automatically.

Then you say to that agent, "What's the square root of whatever that number is?" This is how LLMs actually solve math problems, by the way—or how they should. You should rely on a deterministic process to do that. It really is that simple, and you can get started in Strands running on your desktop today or running on AWS infrastructure in minutes. So the key message here is get started and focus your teams on the features and functionalities, not the frameworks and infrastructure. The language doesn't matter. I can write it for you. The frameworks don't matter. There are 100 of them, and they all do pretty much the same thing. Find one that's really well supported and run it. Since you're all here at the AWS conference, start with AgentCore and Strands. You can use the other frameworks as necessary.

From Proof of Concept to Production: Building a Trusted Virtual Assistant

I'd like to invite Tabaré up. Tabaré was the lead architect for one of the projects we launched, probably the most successful project with the most customers using it today. He's going to tell you about what he learned and what the journey was for that, and then we'll come together and wrap up on how the rest of the projects went. All right, am I on? Great. OK, thank you, and good afternoon, everyone. My name is Tabaré Gowon, and as mentioned, I'm a lead architect at Cox Automotive. Just a moment ago, I shared with you our agentic journey, our company's overall agentic strategy, and where we're planning to go with these principles. Now, I'd like to walk you through what that's actually looked like on a product team at Cox Automotive and hopefully leave you with some of the patterns that can guide you to building your own agentic solutions. But of course, I have to tell you how we got here.

Let's start at the top. As mentioned earlier, it's around 2022 and 2023, and the LLMs are hitting the mainstream with ChatGPT and others. If you were like our team, you were probably thinking around this time that this technology is actually pretty cool. How can we use it? How can we put it here? How could we put it there? Our team at the end of 2023 started our first GenAI proof of concept, and at the beginning of this year, we actually launched that product called Predictive Insights. It's a human-in-the-loop GenAI message generator for our company's CRM.

With Predictive Insights, as our dealers click a button, they can craft personalized messages to their customers using all of the data mentioned earlier. Since we launched earlier this year, we've received a lot of good praise. Our dealers are loving it because they can move their customers right along their journey with just a simple click. However, we have a problem. Predictive Insights requires you to actually click a button and generate a message to the customer, but the reality is that at the dealership, more than half of their leads come in after hours when there isn't anyone to click the button.

So we thought to ourselves, this is working out pretty well. How can we take what we just built and move to fully autonomous? In doing that process, we realized the actual question was how do we build automated AI solutions that our users will trust? It's one thing to say we need to build a solution that knows a dealer's voice and knows their brand and can meet their customer at any time of the day. But at the same time, we need to do that while maintaining safety and brand reputation for us and the dealer, all without human oversight. So that became the challenge, and here's what we built.

This is our fully automated virtual assistant for customer conversations at the dealership. It's built on Amazon AgentCore and the Strands agent framework. Starting at the top, we have an orchestrator. When a customer sends in a new message, the orchestrator understands that intent and routes that message down to one of several sub-agents—sales, service, and so on. Each sub-agent now understands its own domain and can handle its part of the conversation independently.

Once all the sub-agents are complete, the orchestrator grabs all of that resulting data and crafts a message back to the customer and continues the conversation. This goes back and forth for as long as the conversation is necessary. As you can see in this diagram, there is more to this story. We can give you all of the technical details, but the reality is when you are starting one of these projects, where do you actually start? You start with your foundation, and that foundation is Agent Core.

When you are building conversational assistants, one thing you need is to keep your data separated. As mentioned earlier, many of you know that these AI frameworks change every day, so the tools that you build your solutions with today might not be the tools that you use tomorrow. You are going to need to build your infrastructure on things that are both safe and that can evolve as your business evolves, and this is where Agent Core comes in.

Agent Core keeps your data sessions separated and allows you to swap frameworks without rearchitecting your entire stack. When my team started our process, the guidance that we received from several sources was to start with Agent Squad, which I believe is AWS Labs. However, by the time we got into mid-project, the new guidance was to use Strands agents, which is a better solution. Normally in the old days, this could have taken several weeks to migrate the project, but because we had started with a solid foundation, it only took us two weeks.

When you are building your solutions, start with a foundation that can evolve with your business. This will leave you the room to think about the problems that actually matter to your domain, like safety. How do you keep these agentic solutions safe? For us, we red team them.

Red teaming is when you actively try to make the system go off the rails, and you can do this in a number of ways. For our assistant, for example, you could try asking the assistant a question in a foreign language. Another option is to feed the assistant bits and bytes of unreadable characters, or if you are trying to be clever, you could try to sweet talk the assistant to maybe give it its system prompt or tool definitions. You are basically trying to find your system's failure modes.

Testing checks what works, but red teaming actually tries to break it. You cannot leave this type of solution to the end. For us, we red teamed before our alpha phase, our beta phase, and we continue to red team as we move to production. We catalog every exploit, then we fix them, and then we iterate to the next thing. We do this after every code deployment and every prompt change because realistically, your stakeholders are going to ask you how this thing breaks. If you do this, you will have both the answers and the examples of what could go wrong, and then of course you are going to have to fix it.

How exactly do you fix these solutions? You use your guardrails. You have probably heard about Bedrock guardrails. Most of you know that you need something to prevent AI from going off the rails. That is a given. But when you are dealing with customer service, if you only focus on blocking, you are going to provide a poor customer experience. In our opinion, you have to consider your guardrails from at least two perspectives: something to completely block the system, as well as something to gently nudge the conversation into a different direction that you actually want to do something with. We call these hard guardrails and soft guardrails.

With hard guardrails, the system does not even talk to the LLM. Usually these guardrails sit right on top, and they are the things that respond with things like "I cannot help with that," and the conversation stops. These are the things that you can configure with Bedrock guardrails. On the other hand, soft guardrails do use the LLM, but you configure them with your workflow or your prompt design. For example, let us say you created a virtual assistant for the automotive industry. In the process of a customer having a conversation, the customer says, "You know, I really do like this car, but I have pretty bad credit, so I was hoping I could get into this for maybe 100 to 150 a month."

You've designed your virtual assistant to solve a lot of problems, but you don't want it to handle pricing negotiation. When you use soft guardrails, you tune the system so that it responds with, "That's a great question for our finance team, let me schedule an appointment." These kinds of guardrails allow you to continue to be helpful while staying safe. Think about your guardrails from both of these perspectives.

I've given you a lot already. Let's pause for a second. Let's say that many of you here are trying to hit that orange "ready to launch" milestone, and after this conversation you've decided to use AgentCore as your foundation. What I just said sounded about right, so you're going to go home and red team your application. Because we are safe practitioners of this agentic landscape, you will also configure your guardrails so that things don't go off the rails, and you'll use the soft guardrails. At this point, you should probably be ready to launch, correct? No. The reality is that even if you red team your system to see if it breaks, you still actually have to test that it works. Traditional testing can only take you so far because these LLMs are probabilistic. You're going to need to find something that's more dynamic.

We know this, so what are our options? Well, you could manually review every conversation, and that will work for a couple dozen conversations, maybe a couple hundred conversations. But when you're dealing with tens of thousands of transactions daily, manual review just won't scale. You're going to have to look for some other option, and this is where automated evaluation comes in. What do I mean by automating your evaluation? Just as you can use an LLM to generate these responses, you can use another LLM to judge whether or not these responses are correct. This approach is called LLM as a judge, and the process is straightforward. First, you generate a bunch of test conversations, then you run them through your system. Then you evaluate how it did and track this over time.

Once you have actual data, you can use the same evaluation framework to test whether your system is working for the things that matter. This type of testing actually checks what's right. For us, we track metrics like relevancy, completeness, and tone—things that matter for customer conversations. Of course, yours will be different, so find the metrics that matter most for your business. We're going to go back to that previous slide one more time. We're still here. We know that AgentCore is where we're going to go. Red teaming is the right thing to do. Sure, be our guardrails and the soft guardrails, we got it. And now you know that not only are you trying to test what breaks, but you've got to make sure that it works.

So we're going to automate our evaluation so we can scale up for our actual production needs. At this point, now you're ready to launch, right? I think you already know where this is going. No. Because like I said just a moment ago, LLMs are probabilistic, and you could do all of this work and you still will have the LLM do something that you least expected. The last thing you want is to wake up to a whopping LLM bill. So now what do you do? You need to put something in place to stop the system before it goes off the rails while you're not around, and that's where circuit breakers come into the mix.

When I talk about circuit breakers, I'm talking about the hard limits that stop the assistant when things go wrong. In our case, we consider two metrics as our hard limits: cost limits and turn limits. For example, let's say that in the process of a conversation, we've identified that the conversation has hit our P95 threshold in terms of cost. At that point, the assistant will stop. Likewise, let's say that we're having a conversation and we've gone back and forth and back and forth and back and forth, and all of a sudden, the system recognizes we've exceeded our turn limit.

At the same time, I don't know why there's a 2 and a 0 underneath, but let's say that we've hit 20 turns. In either case, the P99 cost per turn, we hand off the conversation to someone at the dealership, so now they can decide, does this conversation continue? These types of metrics—cost and turn—are things that Bedrock tracks for you, but it's your job to analyze that data and set the thresholds that make the most sense for your business.

Set these limits from day one and know where your boundaries are so that when you fail, you can fail gracefully and you can sleep well at night. Those are some of the patterns that we have learned through the process of getting our project into beta and beyond. Let me share with you where we are today. We're right now in beta and our dealers are seeing the results that they need. We're hearing that their customers are getting the answers that they need both during the day and after hours.

We plan to launch at the beginning of Q1, so stay tuned. That's really all I have right now. Your system, however, is going to look different because your requirements are understandably different. But if you start with the right foundation, like AgentCore, and focus on the patterns that matter, you also can create automated AI that your users trust. Thank you, Stanford.

Results and Lessons Learned: Three Successes and the Path Forward

So I promised you we'd come up and tell you how we did, but before you do that, who thinks we can—you know, you saw one—who thinks that's the only success we had? Who thinks we knocked all 5 out of the park? So we had 5 projects today. We have 3 apps in production in a beta format. The first one to go full live is the one that Tabaré is talking to you about.

We have an app going into production that's actually being used in dealerships. When you price inventory, there's a lot of work you have to do, and we remind people to do it and they don't do it. We built an agentic solution to do it for them, and instead of nagging them to do the work, we said, "Hey, I did the work for you. Are you good with it?" and then we'll move that into full agentic moving forward. So that's another type of product we built. We built a couple others as well, so 3 are in production today. One's about to launch a little bit later and let's just say 1 we took back to the drawing board and reimagined.

That actually is probably some of the most valuable data we got—what didn't work. What I'll tell you is agentic systems are hard. The hardest thing that we ran into is that this is a soft way of programming. It's not deterministic. This whole thing is non-deterministic, and so the way you think about products and the way you think about capabilities here is different. Agents have agency. They can do what they want. You have to put the guardrails around them, but that's a hard thing for people to wrap their mind around from a product point of view.

So what were our lessons? Let's talk really quickly about our lessons today. The first lesson is get moving. Nobody's an expert. There are some great Amazon offerings here in Bedrock and AgentCore Strands is an option, but there are a bunch of agentic frameworks that can run on it if you prefer something different. The other is to think disruptively. Many people would not have said we're taking 5 projects and putting them with live production traffic with customers in 5 weeks. That seems insane, right?

Start with crazy and work backwards. Agentic AI lets you fundamentally change your assumptions. You can use it to build systems faster than you can ever imagine before, and you can have it build things that you could never imagine. The embrace of agentic is key, and you need to be using agentic every day, not just as engineers but as product leaders, as engineering leaders, as architects. Because until you get past the fundamental understanding that agentic is different, you can't understand how to build agentic products. I talked a little bit earlier about chatbots. Tabaré's application is far more than a chatbot.

You have those multiple agentic layers. There's a chatbot agent with other agents in the background doing work. You can build tools like deep research. You can build tools that essentially invert the shopping funnel so that you give customers the ability to just click on a button and buy it. There's all kinds of things you can do that are not just chatbots. Think beyond chatbots, but you've got to use it every day because it's going to change the way you work.

When you're thinking about those things and you start building, what can you start with right after this? Number one: start red teaming today. I know it says tomorrow, but start today. Learn how your system breaks. Learn how to break your system and do these things before your customers get to it first so that you can set up your guardrails and prevent some of those things from happening down the line.

Second, design for your worst case. If you already have an idea of what keeps you up at night with what your agentic solution could be, build your evaluation framework around that so that over time you can see whether or not that was truly the worst case and whether or not you've mitigated the crisis from occurring. Then finally, as I said before, even if you do all these other things, things will still go wrong. So let you know, things will probably still go wrong. So set your hard limits, understand what your thresholds are, understand what your boundaries are, so that when you do have these failures, you can fail gracefully.

If you do those things, you'll be all right. So to wrap this up, we're at AWS re:Invent and so I was thinking about what's the right message to land this on. Today is your day one. This is a Bezos quote. Day one is a core Amazon philosophy. We work very closely with it. We talk about day one and we adopt it in our practices and try to model it. Day two is stasis followed by irrelevance, followed by excruciating painful decline, followed by death. And that's why it's always day one. Get moving. Don't be stuck in waiting for someone else to do this or the right answer. Get moving and build something in five weeks or one.

I want you to know there are a couple of resources here. There's a QR code for leveling up skills and there's a QR code for the AgentCore deep dive hands-on workshops. I highly recommend that. It's a great way to get your hands on and see how easy it is to build these things. So definitely scan those QR codes. And please, please, please complete the session survey in the mobile app. We'd love to hear your feedback and how we could do better. Thank you very much. If you have questions, we'll be up here for a little while. We've got a few more minutes before we need to relinquish the room, so thank you very much for being in attendance today.

; This article is entirely auto-generated using Amazon Bedrock.