Kazuya

Posted on Dec 6, 2025 • Edited on Dec 8, 2025

AWS re:Invent 2025 - Behind the curtain: How Amazon’s AI innovations are powered by AWS (INV211)

🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.

Overview

In this video, Amazon leaders demonstrate how AWS powers innovation across Amazon.com, Zoox, and Prime Video. Dave Treadwell reveals Amazon Stores deployed over 21,000 AI agents, achieving $2 billion in cost savings and 4.5x developer velocity increase through AI-native development with Kiro and Spec Studio. Amazon Rufus, the generative AI shopping assistant, scaled to 80,000 Trainium and Inferentia chips during Prime Day, processing 3 million tokens per minute with customers 60% more likely to complete purchases. Zoox CTO Jesse Levinson explains how their autonomous robotaxis run trillions of calculations using tens of thousands of GPUs on AWS, leveraging SageMaker HyperPod and S3 for simulation and training. Prime Video's Eric Orme showcases Prime Insights features like Defensive Alerts and the Burn Bar, built on ECS, Kinesis, and Bedrock, delivering real-time AI predictions for NFL and NASCAR broadcasts. All innovations demonstrate AWS services enabling massive-scale AI deployment across Amazon's businesses.

This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Day One Culture: The Foundation of Customer-Centric Innovation

We're very comfortable that we're doing a better job for customers, and the way you do that is just keep working on it every day, raising the bar every day. And that's the heart of day one.

If you want a culture that behaves like a day one culture, you need people who wake up hungry to improve the customer experience, hungry to invent, hungry to move quickly, hungry to learn, and hungry to figure out what else could I be doing for customers to make their lives better that we're not trying to do right now.

Behind the Curtain: How Amazon Powers AI Innovations on AWS

Please welcome to the stage Director of Technology for AWS, Paul Roberts.

Good morning. Welcome everybody. It's great to be here with everyone today. My name is Paul Roberts and I'm a Director of Technology here at AWS, and we have a really special session for all of you today. We're going to go behind the curtain and show you how Amazon is powering their AI innovations on AWS. Some of these are our largest businesses, such as Amazon Stores, and we're going to dive really deep into what they're doing. This is super exclusive. It's not something that we do every day, and I'm really excited to share it with all of you.

So kicking off, we've brought together three senior leaders across Amazon, starting with Amazon.com, and next, our autonomous vehicle business, Zoox, and Prime Video. We're going to give you a behind the scenes look at these extraordinary customer experiences that these teams have built, and they have built them on AWS. It is re:Invent after all. And they're going to share how they are powering these innovations. I am really excited because like many of you, I'm a builder and I love geeking out and understanding how these things work under the covers and under the hood.

Prime Day at Massive Scale: AWS Services Powering Amazon's Super Bowl

Some of these businesses operate at truly massive scale. One of the biggest examples is Amazon Prime, which also supports our Prime Day event. So let's spend a few minutes understanding how this event works. Just recapping real quick, Prime is a massive operation. We have over 200 million Prime members, and last year we shipped over 9 billion packages the same day or the next day to them.

Besides the everyday work that we do with Prime, we also run events that require a lot of planning, like Prime Day. For those of you that aren't aware, Prime Day is our Super Bowl. It is really incredible. But Prime Day doesn't just happen. Months of work lead up to it. Our teams run game days to prepare for the actual event. We run Well-Architected Reviews to make sure that our architectures are optimized, as efficient as possible, and ready to scale to meet the demands of Prime Day. In addition, we run capacity management exercises to ensure we are able to scale to meet the moment when customers are shopping on Prime Day.

And on Prime Day, Amazon leverages AWS services on a truly massive scale. Let's dig into a couple of these. During Prime Day, over 40% of Amazon.com is actually served up by our Graviton instances. Then we have other services such as ElastiCache, which served up over 1.5 quadrillion daily requests, and over 1.4 trillion requests per minute serving ads on Amazon.com. Our Elastic Block Store service was transferring up to an exabyte of data per day handling Prime Day.

Other services, like our NoSQL database DynamoDB, were serving up responses in under 10 milliseconds, and CloudFront was serving well over 3 trillion HTTP requests. One of my favorites is AWS Outposts. What's unique about Outposts is that it manages the command and control of Amazon's fulfillment centers and the robots that are operating within them. Over 524 million commands are sent to these 7,000 robots, and that is just one of our fulfillment centers, one of our largest.

Amazon Rufus: A Generative AI Shopping Assistant Built on Custom LLMs

But there's one use case that I am really excited to talk to you all today about, and something that we're really proud of here at Amazon, and it's used quite extensively during Prime Day, and this is Amazon Rufus. And I want to dive in a little bit deeper here. Rufus is Amazon's generative AI shopping assistant. It's built to make discovering, comparing, and buying products as intuitive as having your own personal shopping assistant, like you would at any store here in Vegas or wherever you live.

It uses our own custom LLM that's trained on Amazon's product catalog and on information from across the internet, which gives us a really rich amount of data to give our customers the best shopping experience. This provides the best balance of cost, latency, and accuracy to provide a superior customer experience. Let me show you Rufus in action.

So, here, we're on the homepage, and I'm going to open it up here on the top left. And my son, he loves playing soccer and he loves red cleats. And you'll see that immediately, it starts to generate a response thanks to the streaming inferencing architecture that's built on AWS. Rufus uses continuous batching through vLLM that's hosted on Trainium instances on Amazon ECS. We dynamically group these requests to maximize utilization while streaming the responses back to customers in near real time.
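To make the idea of continuous batching concrete, here is a toy scheduler in Python. It is a sketch of the general technique only, not the Rufus or vLLM implementation: the `Request` class, the `max_batch_size` parameter, and the "one token per request per step" model are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request that needs `tokens_needed` decode steps to finish."""
    name: str
    tokens_needed: int
    tokens_done: int = 0


def continuous_batching(requests, max_batch_size):
    """Simulate decode steps; return {request name: step at which it finished}.

    Unlike static batching, a finished request frees its slot immediately,
    and a waiting request is admitted on the very next step.
    """
    waiting = deque(requests)
    running = []
    finished_at = {}
    step = 0
    while waiting or running:
        # Admit waiting requests into any free slots before this step.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        step += 1
        # One decode step: every running request emits one token.
        for req in running:
            req.tokens_done += 1
        # Retire finished requests so their slots are free for the next step.
        still_running = []
        for req in running:
            if req.tokens_done >= req.tokens_needed:
                finished_at[req.name] = step
            else:
                still_running.append(req)
        running = still_running
    return finished_at
```

With requests needing 2, 5, and 3 tokens and a batch size of 2, the third request is admitted as soon as the first one finishes, so all three complete within 5 steps. A static batch would make the third request wait for the whole first batch and finish at step 8.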

Continuous batching helps reduce cost and time to first token, in other words latency, which helps provide a superior user experience. You don't want to wait for your responses to come back. In addition to using ECS, we're also using the Application Load Balancer with the least outstanding requests routing algorithm, which sends each new request to the target with the fewest requests in flight, so that we're funneling as many requests as possible to the backend. This increases throughput by about 5x.
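The least outstanding requests idea can be sketched in a few lines of Python. This is a minimal in-process illustration of the routing rule, not how the Application Load Balancer is implemented; the class name and target names are hypothetical.

```python
class LeastOutstandingRequestsBalancer:
    """Toy router: send each request to the target with the fewest in-flight requests."""

    def __init__(self, targets):
        # In-flight request count per target (target names are hypothetical).
        self.outstanding = {target: 0 for target in targets}

    def acquire(self):
        """Pick the least-loaded target and count the new request against it."""
        # min() returns the first minimum, so ties go to the earliest-listed target.
        target = min(self.outstanding, key=self.outstanding.get)
        self.outstanding[target] += 1
        return target

    def release(self, target):
        """Call when a response completes so the target's count drops."""
        self.outstanding[target] -= 1
```

The point of the rule for LLM inference is that response lengths vary widely, so a target stuck on a few long generations naturally receives fewer new requests than one that has just finished its work.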

So, Rufus gave us an answer, but it's not just a simple answer that you might get from another chatbot. This is actually a rich response that includes images and additional data that you can click on, and so these are the widgets that we can click on. So here, as I mentioned, my son, he loves these red soccer cleats. I love red as well, so we're going to look at these Adidas Predator shoes. Let's dig in a little bit deeper.

So from here, you can see some of the common questions that people are asking when they're shopping, similar to what you'd ask if you had a personal shopping assistant in the store. You can think about these blue pills as prompt caching in a way, because we're using Amazon Bedrock behind the scenes to actually pre-generate some of the questions that you would be asking if you're looking for these red shoes. In addition to using our own custom model that we've built, we're leveraging Nova and Claude Sonnet to help power these innovations.

Scaling Rufus with 80,000 Trainium and Inferentia Chips for Prime Day

And you'll notice, I'm still on the product page, but I'm maintaining my conversation with Rufus, just like I would with my personal shopping assistant who is on that shopping journey with me. So, if I have a question, how are these cleats, how are these shoes going to perform in rainy weather? Rufus knows exactly the type of shoes that I'm talking about because it has that context.

But in addition, Rufus is looking at other data sources like the product description or even data coming from the Adidas website by using Amazon Bedrock's agentic ability to call tools. Rufus then reasons about this information and tries to provide the best information back to you. So Rufus is doing all of this hard work for you.

Now, let's say I get distracted when I'm shopping, I think everyone gets distracted and they pull out their phones sometimes, but what's really interesting is now I can go to the mobile app on my mobile phone and I can say, what were the soccer cleats that I was looking at before? Rufus will reflect on the prior conversation that we were having using agentic memory capabilities, and it's going to surface the same shoes and I can pick up where I left off.

Now, this is just me looking at some soccer cleats. Imagine on Prime Day when we have millions of customers shopping at the same time. So to handle that volume of traffic, it takes a lot. We scaled up Rufus to use over 80,000 Trainium and Inferentia chips that Matt was just referencing in the keynote. With the inferencing stack served up by ECS, we averaged about 3 million tokens per minute, and the responses were served up in under one millisecond of latency.

Now, with AWS custom silicon here, we reduce costs over 4.5 times, and then the real big benefit that we love here is that it drives 54% better performance per watt. Now you might be thinking, why is Rufus so important to the business? Because we are customer obsessed, and what we found is that customers using Rufus are about 60% more likely to complete a purchase when they're using it. When you have a personal shopping assistant, it makes things easier to find what you're looking for.

What's most powerful here is not just any single component, but how AWS enables all of these primitives to work together and enable Rufus at Amazon scale. And the same services and silicon that make Rufus possible are available for all of you, for all builders, for all of our customers. For Rufus and every other initiative that you'll hear about in the next hour, you're going to hear how AWS has helped Amazon drive this innovation, and you don't have to worry about any of the undifferentiated heavy lifting. AWS is going to take that on.

Dave Treadwell on Amazon's eCommerce Foundation: Amplifying Every Knowledge Worker with AI

So that's a quick look at Prime Day and one of our key innovations in Rufus. But that is just the tip of the iceberg. There is so much more innovation that is happening across Amazon. And our next speaker, he's essentially the CTO of the Amazon Stores business. He owns the mobile app, he owns Prime Air, which is the Amazon delivery drone program, among other things. He oversees tens of thousands of developers managing hundreds of millions of lines of code. It is staggering. Here to share how they're working to turbocharge Amazon's innovation using agentic AI, please welcome the Senior Vice President of Amazon's eCommerce Foundation, Dave Treadwell.

Good morning, re:Invent. Thank you. I'm excited to tell you about how Amazon.com is leveraging AWS to embrace AI to do more for our customers. I lead what we call the eCommerce Foundation team at Amazon. We're responsible for being the layer above AWS. We're demanding customers of AWS, and we support all of the commerce across Amazon. We focus on Stores, but we support many other divisions in Amazon. In addition, we're stewards of many core fundamentals, things like security, privacy, site availability, site latency, and cost efficiency. All of these things are key to Amazon delivering a great experience for our customers and to running efficiently to help us provide customers with products at low prices.

We own everything from the catalog and offer systems to the order pipeline, our identity systems, tax calculations, promotions, and shipping. We provide support for Amazon Search to help customers find the products that they want and select the things based on great information from that catalog. We help Amazon customers in all these different ways, and we're really about providing you with a great experience. Every day around the world, hundreds of millions of customers access Amazon.com. Supporting that, we have hundreds of millions of lines of code and we have hundreds of thousands of microservices all operating together. These hundreds of millions of lines of code support hundreds of millions of requests per second, all between these individual microservices. There's a huge amount of complexity and intricacy in all of this, and our customers expect that the store will be fast, that it's reliable and innovative every single time they use it.

So with that scope, there's a lot of complexity. There are challenges. How do we maintain and scale the complexity while accelerating innovation and doing more for our demanding customers? To do this, we have thousands of people working across all these systems, and the key thing is that we make all these employees efficient and productive, improve their velocity, and enable them to delight their customers, not getting bogged down in low-level tasks that don't really matter. So that's what eCommerce Foundation is all about.

We exist to improve knowledge work productivity and to unleash developer velocity. And now we've been using AI and AI-native processes more and more. Today we're going to show you a lot about how we're doing it.

We're going to show you the tools, the practices, and the platforms that we use at Amazon to empower our workers and our developers on an unprecedented scale. Because when we move faster, we deliver more for our customers. Our core belief is that every knowledge worker at Amazon should be amplified by AI, not replaced but amplified. AI should help them get more done and deliver more for their customers.

Autonomous Agents Delivering $2 Billion in Cost Savings: From Address Correction to 21,000 Agents

We started down this path about two and a half to three years ago, leveraging AI and foundational models to make a step change in employee productivity. We first started in work that was more standard, common processes, things people do over and over again. We have lots of people who do this kind of work, and we found early on that AI is exceptionally good at optimizing and automating a lot of their work and allowing these people to get even more done.

One of the things we found is that it's not just about embracing tools. The use of the tools alone doesn't drive impact. You have to really reshape the work and change the processes of how it all comes together. So we've had to build services, orchestration platforms, and we've had to reshape these processes to let teams across Amazon rapidly build, deploy, and scale AI agents to their specific workflows.

We crossed the chasm for the more standard work. And in 2025, we're delivering over $2 billion of cost savings to Amazon by using AI and agents. Earlier this year, we started delivering an agentic platform to enable teams to create agents with very little engineering work, a low code system. Since July, as of this slide created two days ago, we had 20,930 agents. I hear today from the team that it's over 21,000 agents that teams are using across Amazon to deliver more for customers.

The interesting part is that the primitive building blocks that we use, things like AgentCore, Kiro, and Bedrock, are the same ones available to you. And we orchestrate all these with a system we call internally AgentZ. Let me show you a real example of how we're using it.

As you know, a core part of Amazon is delivering packages to our customers. We get an address from a customer, and that's where they want us to deliver the package. But it's really a lot more complicated than just having an address. Our systems need to know, are we delivering to a business? Is it a PO box? Is it an apartment, or is it a single family home? Because the way we do these deliveries for each of those kinds of addresses is actually somewhat different.

Historically, when we got a new address, we might not have known what kind of location it was. So we had people who would try to predict, is this a business that's only going to be open from 8:00 a.m. to 6:00 p.m., or is it a home where they don't want a super early delivery, or is it some other kind of address? As a result, we had a lot of what we call first-time delivery defects. We get a new address from a customer, we might not know what it is, and we're seeing more defects than we want.

Our customers demand that we get the package to where it's supposed to be when they want it. A lot of what happens quietly behind the scenes is trying to get that right. So it's an enormous effort to keep all of this data current.

Earlier this year, we created a new autonomous agent. It basically takes an address as input and then leverages numerous data sources to do a better job predicting what kind of address we need to deliver to. It uses information that we already have stored in S3 and DynamoDB. One of the cool things it does, though, is this agent actually crawls things like government websites to help learn from those websites what kind of address we're delivering to. We also get other information sources like emails from customers and the notes that they might put on with delivery instructions. So the agent takes all of this information about the address and gives an output to our delivery systems.
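As a rough illustration of the last step described above, combining evidence from several sources into one address-type prediction, here is a minimal Python sketch. The talk does not describe how the agent actually weighs its sources, so the summed-confidence vote, the `classify_address` function, the source names, and the confidence values are all assumptions.

```python
from collections import defaultdict

# The four address kinds mentioned in the talk.
ADDRESS_TYPES = ("business", "po_box", "apartment", "single_family_home")


def classify_address(signals):
    """Combine (source, address_type, confidence) signals into one prediction.

    Returns (best_type, share_of_total_confidence), or (None, 0.0) when
    there is no usable evidence.
    """
    scores = defaultdict(float)
    for source, address_type, confidence in signals:
        if address_type not in ADDRESS_TYPES:
            raise ValueError(f"unknown address type from {source}: {address_type}")
        scores[address_type] += confidence
    total = sum(scores.values())
    if total <= 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total
```

For example, if stored records and a government website both point to a business while a delivery note hints at an apartment, the combined prediction is "business" with the share of evidence that supports it. A downstream delivery system could then decide whether that share is high enough to act on.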

What we found with this agent is that first off, it saved 2,500 hours on the team that does this address correction. But most importantly, it resulted in a 74.4% reduction in first-time delivery defects, something our customers love. We're much more accurate about delivering a package where they want. This is just one example of the over 21,000 agents that we have.

AI-Native Development: Spec Studio and Kiro Driving 4.5x Developer Velocity

Next, unleashing developer productivity.

About a year ago, it became clear that AI tools were not just going to make standard work more efficient, but that they were also going to make software developers significantly more efficient. And we started adopting what we call an AI-native approach. We want to reshape development so it's AI-native from the ground up, from the very beginning.

One of the things that we learned very early on in this process is that while writing code is the core of a developer's job, at Amazon, our builders spend a majority of their time doing tasks other than coding. Things like writing specifications, documenting their code, attending meetings and stand-ups, and working with their stakeholders and the teams with which they collaborate. There's also aligning roadmap details: we have over 100,000 microservices, and those teams all have to coordinate with each other. We try to minimize it, but it still takes time and effort.

Design reviews are a key part of a builder's job. So is performing functional integration testing: AI is really good at writing tests, and facilitating those tests with AI is something we've found very effective. Security compliance is another: security is a top priority for Amazon, and AI can really help us there too. Then there are system alerts and operations work. These are all things that take a lot of time from builders.

Of course, for the core of development, Kiro has been an amazing tool for us. You heard about this this morning in Matt's keynote. Kiro has really enabled our builders to get more done quickly. And the core thing it does, as Matt talked about, is what we call spec-driven development. With this structured development approach, a builder can create a specification, and then Kiro turns that specification into code. Effectively, the coding process becomes natural language. Kiro creates the code from it, and then the compiler creates the machine code ultimately.

One of the challenges we have with Kiro goes back to those over 100,000 microservices that I mentioned earlier. We want to apply AI-native development to all our teams, not just teams that are starting new projects. So the challenge is, how do you get started with AI-native development when you have an existing code base?

So one of the tools that we created internally is what we call Spec Studio. The idea with Spec Studio is it takes a code base, converts it into a spec, then you can modify the spec. Kiro does its core function of turning that spec into code. So what's happening here is that for the teams that have embraced this, it's an entirely new SDLC, a new software development life cycle, where you go code to spec, spec to code, production.

Spec Studio has gone viral inside of Amazon. We're seeing over 100% month-over-month adoption growth. Over 15,000 specs have been created already using this tool. And we're working with the Kiro team to figure out how we can bring this capability to all customers of Kiro. I think it will be very compelling for the vast majority of people who have existing code bases, existing services that they want to move to an AI-native development approach.

In Stores this year, we had dozens of teams pilot AI-native development using tools like Spec Studio. And what we found for these teams is that on average, they had a 4.5 times increase in their developer velocity. These teams are delivering 4.5 times more deployments on average to their customers. Next year, in 2026, because this has been so successful, our target is to have 75% of the teams across Stores, across the Amazon.com organization, using these AI-native development techniques. We're very excited about what that's going to allow us to do for our customers.

Transforming Amazon.com: Building an AI-Amplified Organization on AWS

So AI is enabling Amazon to deliver more for our customers. Agents in particular are substantially increasing our productivity, and our development teams are rapidly embracing AI-native development. Using AI, Amazon.com is reinventing every part of how we deliver for customers. We're building an organization where every developer, every product manager, every designer, every TPM, every operations engineer, they're all amplified by AI that understands their context and helps them get more done.

And of course, all of this is powered by AWS services like Bedrock, AgentCore, Kiro, and Quick Suite, which we're using extensively across all kinds of job disciplines, and it's making us able to do more and more for our customers. Frankly, we're just getting started. All of this is possible today. The services we're using from AWS, you as customers can use as well.

The question isn't whether AI will transform your organization, it will. The question is how fast will you move? We're not waiting for the future, we're building it and we're doing it on AWS. Thank you. Let's go build.

Thank you, Tread. All of this is possible today because of the services that they're using. It is really incredible. The value that Dave Treadwell and his team are realizing from these agents today, and the pace of development, is all powered by these AI-native experiences. 21,000 agents so far. It is really mind-blowing, and it is all underpinned by AWS. Dave, thank you so much for sharing your journey with all of us.

Amazon's Million-Robot Milestone and the Computational Complexity of Autonomous Vehicles

Next up, we're going to talk a little bit about robotics. At Amazon earlier this year, we passed a major milestone. We have deployed over a million robots across our entire fulfillment network. And these fleets of robots are operating in more than 300 facilities globally. And these robots help Amazonians move inventory, sort products, and even pack pills. All of these robots are leveraging computer vision models that have been trained on AWS and these robots, they save our workers a lot of time and a lot of effort.

Now, those fulfillment robots are cool, but for a truly cutting-edge robot, I'm excited for our next speaker from Zoox. Earlier this week, Zoox launched their autonomous vehicle here in Las Vegas, their robotaxi. Some of you may have seen them driving up and down the Strip this week here in Vegas, but let me tell you, riding in one of these feels like magic. But there's an insane amount of computational complexity that goes into delivering that magical experience.

Just to make a left turn, you have to analyze millions of scenarios. Imagine just a person walking their dog, or perhaps a person who trips and is trying to tie their shoelace, or whether the cars are all going to stop at that stop sign. You have to then run all of these simulations of that scenario and train your model against all of that just for that left turn. Then imagine doing that for every scenario you can think of as a driver. We're talking about billions of calculations that need to be handled in real time.

Jesse Levinson on Zoox: Designing Mobility from the Ground Up with AI and Simulation

Here to share how they do it and how AWS enables that innovation, please welcome to the stage the co-founder and CTO of Zoox, Jesse Levinson.

Good morning. Thank you. Great to be here. So, raise your hand if you've been in a Zoox. OK. A couple of handfuls of people. That's exciting. We've been out here for a couple of months, open to the public. The wait times aren't great because we have 32 robots driving around and they're still free. So there's a lot of demand, but hopefully you get a chance to try that thing out. It's pretty neat.

So, why did we do all this? It's, you know, it's a lot of work to build a vehicle from the ground up. People thought it was kind of nuts. So, what inspired us to tackle this thing? Basically, you know, we started at Zoox in 2014 and we didn't set out to just build a better car. We really wanted to build a better way for people to move through cities.

From the very beginning, we weren't about retrofitting technology into cars. We wanted to design mobility itself from the ground up for autonomy. These are some of our early sketches. Now, these are some slides from our original pitch deck back in 2014, and interestingly, we're still doing pretty much exactly what we set out to do back then. It is taking a while, but it's kind of cool that it's finally starting to work out.

And really the insight was, you know, this is not a retrofitted car. It's an opportunity to redefine how people move around cities. It's an opportunity to create a vehicle that can move in either direction, no steering wheel, no pedals, no front or back, that can sense its environment symmetrically, 360 degrees in all directions, and then also can change the business model. Instead of having to sell cars to customers who on average only use them 4% of the time, we can own and operate a fleet and offer rides, and that's a much better use of resources, environmentally and economically.

So a few years into Zoox, we were able to build some pretty serious prototypes. They're a little janky looking, but they have the same architecture, generally the same shape, and a similar kind of drivetrain. They have four-wheel steering. They're symmetrical. They're bidirectional, and this gave us some confidence that from a vehicle perspective, we were on the right track. And of course, in the meantime, we're also working on the compute and the software. It turns out to be an incredibly difficult problem.

But when you put that all together, you get a very different type of vehicle. It's not a car. It's a vehicle designed for AI to drive and for humans to enjoy. So let's take a look at this thing. When you remove the driver, you actually get to rethink everything, and that is exactly what we've done.

So it turns out it's very compact. This is an amazing advantage of having a purpose-built design. You can see it's bidirectional and it's symmetrical. And interestingly, on the exterior, it's very, very compact. It's a really short vehicle. There's tons of room on the inside. None of that would be possible in a conventional car architecture.

And then on the inside, it's a rider-first experience. You'll notice there's no driver's seat, so you get a better and equal experience for all four riders. That's never been done before. And then we can optimize our sensor architecture for autonomy and for safety. So we get amazing coverage and reliability.

Now the vehicle is only half of the story. The other half is the technology that powers it. Let's talk about that. If you're at an uncontrolled intersection, you have cars behind you, they might be honking, you have oncoming traffic, maybe other people are trying to turn, traffic from your left, from your right, cars, buses, bicycles, pedestrians, maybe there's a police siren in the distance. You have to take all this information in, hundreds, maybe thousands of data points.

Now, as a seasoned driver, maybe you don't have to think about this too much. You kind of just know what to do. But for a robot, you have to actually think through all of these things and process them all. And so that means that we as the developers have to handle all of that stuff all the time while our vehicles are driving.

That meant we had to develop our AI stack from the ground up. There wasn't one we could just get that does all this. It doesn't exist. So at the core of this system is advanced AI and machine learning models that enable the vehicle to perceive, predict, and plan in complex urban environments.

So this is what it looks like when we're actually driving the robotaxi. We start with perception. So we can think of this as the eyes and ears of a robotaxi. The perception models help the vehicle understand everything around it, who's sharing the road, what's happening in every direction. We have a multimodal sensor suite of cameras, LiDARs, radars, thermal sensors, and even microphones, and we fuse all that data in real time to build a living, breathing view of the world.

And then for prediction, we have to figure out what's going to happen next. It's not enough to just perceive. We have to be able to predict what everybody else is going to do next, and it's probabilistic because you can never know for sure. But might the cyclist merge? Is the car going to brake? Is the pedestrian going to cross? We have to think about all that in real time.

And then the motion planner has to decide what our vehicles should do in response. Should we accelerate? Should we change lanes? Should we slow down? Always with safety, rider comfort, and efficiency at the forefront of these decisions. Now, unfortunately, you cannot train this level of intelligence and validate it purely on real-world miles. It would take decades, and you still wouldn't see all the possible corner cases.
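The perceive, predict, plan split described above can be sketched as a toy decision rule in Python. This is an illustration of the data flow only, not the Zoox stack: the `TrackedAgent` fields, the thresholds, and the three possible actions are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class TrackedAgent:
    """Hypothetical perception and prediction output for one road user."""
    kind: str                     # e.g. "pedestrian", "cyclist", "car"
    distance_m: float             # distance ahead of our vehicle
    crossing_probability: float   # predicted chance it enters our path


def plan(agents, brake_threshold=0.5, slow_threshold=0.2, horizon_m=30.0):
    """Choose the most cautious action demanded by any nearby agent."""
    action = "proceed"
    for agent in agents:
        if agent.distance_m > horizon_m:
            continue  # too far away to affect this decision
        if agent.crossing_probability >= brake_threshold:
            return "brake"  # safety first: one high-risk agent is enough
        if agent.crossing_probability >= slow_threshold:
            action = "slow_down"
    return action
```

The prediction is a probability rather than a yes or no, which is why the planner works with thresholds: a nearby cyclist with a modest chance of merging leads to slowing down, while a likely crossing leads to braking.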

So we built one of the most advanced simulation systems in the industry with a powerful data infrastructure on top of AWS, anchored by Amazon S3. Our simulation environment lets us create a digital twin of the real world, virtual cities that look, behave, and even feel like the places our vehicles drive. Every intersection, every lighting condition, every unpredictable human behavior can be recreated safely inside of this virtual environment.

Now we can validate our AI stack thousands or millions of times over before it ever goes to a live vehicle. And because safety depends on how we perform in these very rare moments, we focus heavily on edge cases like a child running into the street, an unprotected left turn in tricky lighting, or an emergency vehicle approaching at high speed.

To make this scalable, we use advanced diffusion-based machine learning models running on tens of thousands of GPUs to generate these simulations and run them automatically.

The outcome is this really cool continuous feedback loop. Data from the road feeds our simulations. The simulations improve our models, and then those models make our robotaxis safer on every trip. Every safe, smooth ride is powered by millions of lines of code, petabytes of data, and many AI models learning from pretty much every scenario imaginable. Scaling that intelligence, training it, testing it, and deploying it is where AWS comes in.

AWS Powers Zoox: Petabytes of Data, Tens of Thousands of GPUs, and Trillions of Calculations

Now, when Zoox was acquired by Amazon in 2020, we gained not only a parent company, but a very important technology partner. AWS plays a crucial role in enabling and scaling our AI workloads, from data ingestion and data processing to training our foundation models and validating them with large-scale simulation workloads. So let's talk a little bit more about that.

S3 is our source of truth for storing petabytes of sensor data from our vehicles. It's been highly scalable and reliable, and leveraging S3 Intelligent-Tiering keeps our storage costs under control. We've been working closely with the SageMaker Unified Studio team to share our early-adopter learnings on their notebook offerings for interactive, data-intensive workloads, and to ensure they provide best-in-class features that are useful to all AWS customers.

Now, training, simulation, and large-scale model validation all depend on GPU performance. In our case, tens of thousands of GPUs running on AWS. That scale lets us build really rich, realistic simulations and continuously retrain and validate our AI models. But it's very expensive and complex to manage, so we've had to be thoughtful and strategic about efficiency.

First, we focus on smart scheduling. We leverage an open-source scheduler, Slurm, that prioritizes workloads based on importance. Critical jobs like safety validation or perception model training get immediate access to GPU resources, while lower-priority experiments can pause or queue until capacity frees up.
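The priority idea described above can be sketched with a simple priority queue. This is a simplified illustration, not Slurm's actual scheduling algorithm or configuration; the job names, priority values, and GPU counts are all hypothetical.

```python
import heapq


def schedule(jobs, free_gpus):
    """Start jobs in priority order while GPUs remain; queue the rest.

    jobs: list of (name, priority, gpus_needed); higher priority runs first.
    Returns (started, queued) as lists of job names.
    """
    # heapq is a min-heap, so negate the priority; the submission order
    # breaks ties between jobs that share a priority.
    heap = [
        (-priority, order, name, gpus)
        for order, (name, priority, gpus) in enumerate(jobs)
    ]
    heapq.heapify(heap)

    started, queued = [], []
    while heap:
        _, _, name, gpus = heapq.heappop(heap)
        if gpus <= free_gpus:
            free_gpus -= gpus
            started.append(name)
        else:
            queued.append(name)  # waits until capacity frees up
    return started, queued


# Hypothetical workload mix on a hypothetical 256-GPU pool.
jobs = [
    ("ablation-experiment", 10, 64),
    ("safety-validation", 100, 128),
    ("perception-training", 90, 96),
    ("hyperparam-sweep", 20, 32),
]
started, queued = schedule(jobs, free_gpus=256)
print("started:", started)
print("queued:", queued)
```

In this toy run the two critical jobs claim capacity first, a small sweep fits into what is left, and the lowest-priority experiment waits.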

Then we focus on data locality and orchestration. By ensuring that our data and models sit close to where the compute happens, like within the same AWS Region or Availability Zone, we can minimize our transfer costs and reduce latency, which adds up to pretty massive efficiency gains at scale. And finally, we're very intentional about how we procure and balance our GPU capacity.

We use SageMaker HyperPod to train and scale some of these foundation models efficiently. HyperPod is purpose-built for large-scale distributed training with Elastic Fabric Adapter networking optimized for high throughput, low latency communication. It also provides a resilient environment for model development by automatically detecting, diagnosing, and recovering from infrastructure failures, and it provides a rich set of GPU observability and system health metrics.

We've also been working closely with the SageMaker HyperPod team to ensure their EKS-based infrastructure is easy to set up and flexible enough to port a training plan across AWS accounts, which will also help other AWS customers. In addition, training some of our large-scale models with FSx pushed some of the scale boundaries and unearthed corner cases, which AWS and Zoox iterated on to improve.

EC2 Capacity Blocks help us procure state-of-the-art, training-optimized GPUs for a short period of time to handle spiky ML workloads. That means no waiting in a queue, no worrying about GPU shortages, and we can align our compute with our engineering milestones, whether that's testing new perception architectures or scaling massive reinforcement learning experiments.

It's also incredibly efficient. We can experiment with the latest GPU types like the new P5 and P6 instances, which allows us to pivot to newer GPUs as they become available. And that gives our teams the freedom to test, learn, and fine-tune models faster, all while optimizing cost and utilization.

We use tens of thousands of inference-optimized GPUs for executing our clearance runs. That's how we validate that a particular version of software is safe enough to deploy on the robotaxis. AWS helps by providing on-demand reserved capacity for this very large number of GPUs, which we actually only need for a short duration of time, as you can see from this very spiky graph. Reserving GPUs is a delicate balance between cost efficiency and performance, and with the elasticity of AWS we can optimize for both.
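To see why a spiky demand curve favors short-term reservations, it helps to compare GPU-hours under two provisioning strategies. The demand profile below is entirely hypothetical (the talk does not give hourly figures); it only illustrates the arithmetic.

```python
# Hypothetical GPUs needed per hour over one day: a low baseline for
# 20 hours, plus a 4-hour spike for a clearance run.
demand = [2_000] * 20 + [30_000] * 4

# Strategy 1: hold peak capacity around the clock.
peak = max(demand)
static_gpu_hours = peak * len(demand)

# Strategy 2: reserve only what each hour actually needs.
elastic_gpu_hours = sum(demand)

# Fraction of the always-on fleet that would actually be used.
utilization = elastic_gpu_hours / static_gpu_hours

print(f"static:  {static_gpu_hours:,} GPU-hours")
print(f"elastic: {elastic_gpu_hours:,} GPU-hours")
print(f"utilization of static fleet: {utilization:.0%}")
```

Under these assumed numbers, holding peak capacity all day would leave most of it idle, which is the gap that short-duration reserved capacity is meant to close.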

AWS powers our reliable live service and fleet operations. Its secure VPC networking and foundational managed services like EKS, MSK, DynamoDB, S3, and ElastiCache provide the resilient and scalable infrastructure we need to manage and operate our robotaxi service backend efficiently.

So this really important partnership lets us keep what we've always done best, innovating systematically and relentlessly. So again, what might look like a smooth lane change or a calm stop at a crosswalk actually represents trillions and trillions of calculations. It's all happening seamlessly, safely, and silently.

Now, we live and breathe this every day at Zoox, but most people won't realize how much intelligence, precision, and teamwork it takes to make this autonomy look effortless. It's been quite a journey at Zoox, over 11 years now that we've been working on this, and we're really excited that it's finally out for people to use. We launched in Las Vegas a couple of months ago to the public. We are in the process right now of rolling out to the public in San Francisco. We've announced that next year we'll also be coming to Austin and Miami. We have Los Angeles and Atlanta coming after that, and a bunch more cities later.

So, we're really excited. It's been an amazing journey so far. We have tremendously more to do to scale this and bring it to lots more people, but we're really grateful for the opportunity and very thankful to Amazon and the AWS teams for helping to make it possible. Really excited to keep building what's next together, and thank you so much for having me here.

Eric Orme on Prime Video: Where Science Meets Storytelling in Live Sports with AI

All right, thank you, Jesse. That was really incredible, and it is really incredible to see all the innovations that the Zoox team is building. And as we discussed, Zoox vehicles operate on trillions of calculations to make an accurate prediction. If you'd actually like to go sit in one, we have one over in the Caesars Forum, so please check it out.

Okay, our next speaker comes to us from Prime Video, and you may be surprised to learn that when you watch your favorite team on Thursday Night Football, I like the Commanders, they're taking those calculations just as seriously. In fact, AI and machine learning models are powering Prime Video storytelling. They are effectively reasoning across millions of possible trajectories and state combinations per play in order to make those predictions. That is off the charts, and it's something that nobody else in broadcast is doing.

So Prime Video is doing incredible work in sports with Thursday Night Football, with NASCAR, and most recently the NBA, in addition to producing shows like Reacher, The Terminal List, Fallout, and The Lord of the Rings: The Rings of Power. To hear how Prime Video is innovating on AWS to power all of this incredible entertainment, I'm delighted to welcome to the stage the Vice President of Live Sports and Engineering at Prime Video, Eric Orme. But first, take a look at this.

We are bringing artificial intelligence into the mainstream. This is Prime Video's Defensive Alerts. This is basically telling you what the quarterback's looking at, where are the safeties, where is it leaning. There's some blitzes that the Defensive Alerts picks up and I'm not seeing as a quarterback. Oh, look out. I did not want to think that a machine could predict blitzes better than the humans, but I came to it and very much respect the power.

The Burn Bar, an AI tool that calculates fuel mileage for all the cars in a NASCAR Cup Series race. We're totally screwed up on fuel here. The ability to calculate every car in the field just makes me a better analyst in the booth. That's the best mileage by a very big chunk. Show you what you need.

Welcome into the home of NBA on Prime. You can see the LED court. We got guys on the wings. We got guys in the corner. Back to back three. Returned to sender. See, we're learning things as we go.

Everyone, it's great to be here. I'm very excited to be here at re:Invent. I'm very excited today to take you behind the scenes and show you some of the cool innovations that we're building for customers in Prime Video.

Prime Video is operating on a massive global scale. We're delivering streams, millions of streams to customers across a wide variety of geographies, device types, and formats every single moment. And for us, availability is actually the biggest, most important feature. We call it feature zero. Every stream must work flawlessly every single time.

Built on top of AWS, our multi-region highly available architecture allows customers to have uninterrupted service no matter where in the world they're watching. Customers rightly expect every single stream to be at least as good as broadcast, and every decision that we make is geared towards delivering the highest picture quality possible. Built on this rock-solid infrastructure, we've spent over a decade developing our programming. That's even more true with sports.

The challenge in live sports is not a lack of data. Every single play, every game is generating literally millions of data points. So the real question is how do we use all this data to tell a story to better engage our fans. This is an area that we're innovating in constantly, both for our live products as well as on-demand, trying to bring the action closer to our customers through AI and immersive experiences. We've been pioneering these new experiences for a while, leaving a lot of others in the industry still chasing us.

Over the next few minutes, I'm going to share three examples of how science meets storytelling here at Prime Video. Our scientists work hand in hand with our on-air talent and production teams to create better predictions and deliver unique and exclusive insights that you just can't find anywhere else. AWS enables this pace of innovation. It allows us to focus on the features and not on all the infrastructure. So let me show you how we're solving some of the toughest challenges in live streaming through examples of NASCAR, Thursday Night Football, and the NBA.

How many of you out there have watched Thursday Night Football? Maybe Black Friday last week where we had about 15 hours of live streaming. It was great. So many of you then have seen the new and compelling ways that we're engaging our fans and explaining the game to them. Over the last three seasons, we've created five new-to-broadcast innovations that collectively we call Prime Insights.

First, Defensive Alerts, where we're actually predicting who's going to blitz the quarterback before the snap. Pressure Alerts, where we predict who's going to disrupt the quarterback after the snap. Coverage ID, where we're predicting whether the defense is in a man or a zone coverage. And my favorite, Pocket Health, where we show the pressure on the quarterback and their decision-making abilities. And then finally, we have End of Game Sweep. This allows customers to understand their path, their team's path to winning based on the allotted time remaining and the predicted number of plays.

We make all this look really simple, but behind every single one of these is a very complex architecture that's running all the time. We're combining our cloud and local architecture to make sure all of this runs flawlessly on game day. On site, we leverage tools like ECS Anywhere that allow us to deploy our models in containers to servers right inside of our trucks. This allows us to have the ultra-low latency that we need on game day. Combined with AWS Systems Manager, ECS Anywhere extends AWS manageability to the edge, right where the action happens.

For example, we're getting data constantly from hundreds of tracking sensors, live play-by-play data, as well as processing video frames, all in real time. This data, by the way, powers a wide range of workloads, both locally and in the cloud. Our models are calculating insights and expected play probabilities that we use to augment the feed with our features. This is a simplified diagram, but at the end of the day, it's a sophisticated architecture that ties everything together during these games.

From the camera to the screen, our broadcast pipeline relies heavily on AWS to deliver us reliability, low latency, and the operational simplicity that we need, all in real time. Essentially what we've done is we've created a common data science workflow to deliver all of these compelling insights. We've tagged thousands of historical plays that have generated millions of data points such as position, play-by-play, formation patterns, all of this to help us better predict things in real time.

These features may take different AI approaches, but they all lead to insights that even experts might miss. As a result, we have a system that anticipates plays with incredible precision, and as you saw in the video earlier, we even shocked Andrew Luck. So this approach that we've taken has really given us a recipe for success. We started with the Prime Vision feed, which allowed us to have an incubator, a sandbox for all of our ideas where we could try things out quickly without fear of failing.

Over the years, that's now evolved into an infrastructure that we use for all of our sports. Using AWS, we now have a pipeline that allows us to innovate, validate, and deliver these innovations quickly. Take NASCAR, for example.

We applied the same approach there. You have 40 cars that are traveling at breakneck speed, just inches apart, all of them chasing the perfect line. But underneath all that noise and adrenaline is a bigger strategy. There's a race within a race. Every single team is running a chess match at 200 miles an hour. Do I pit? Do I stay out? Do I need two tires? Do I need four? It's a sea of constant predictions, and they're also trying to figure out what their competitors are doing. Prime Insights really helps us tell those storylines and bring them to life as they unfold.

So how many of you have seen a fuel gauge in a stock car? Okay, great. That's actually a trick question. There really isn't one. It shocked me too when I first found this out, but what's happening is every single team is calculating their fuel consumption on paper with their own formula that no one else has access to. There's zero visibility except for that team. And so we asked ourselves, what if we could give fans that visibility? What if we could do that in real time? And that's what we set out to do with the Burn Bar, an AI overlay that's predicting and visualizing fuel consumption in real time.

The Burn Bar is ingesting vehicle telemetry, positional, throttle, RPM data constantly. And those inputs are changing every single race. The track type is different, the weather is different, the tires are different, and so on. And so this model is taking in thousands of data points every single second. This would be impossible to do manually. And we build visualizations both for our production teams and for viewers to help everybody follow this fuel strategy in real time.
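As a rough intuition for how telemetry could turn into a fuel estimate, here is a deliberately naive sketch. It is not the Burn Bar model, which the talk describes as an AI model with many changing inputs; the linear burn formula, the constant `K`, and all sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    dt_s: float      # seconds covered by this telemetry sample
    throttle: float  # throttle position, 0..1
    rpm: float       # engine speed


# Hypothetical constant: gallons burned per (throttle * rpm * second).
K = 1.0e-6


def fuel_burned(samples: list[Sample]) -> float:
    """Integrate a naive burn rate proportional to throttle * rpm."""
    return sum(K * s.throttle * s.rpm * s.dt_s for s in samples)


def laps_remaining(fuel_left_gal: float, burn_per_lap_gal: float) -> float:
    """How many more laps the remaining fuel would cover at this burn rate."""
    return fuel_left_gal / burn_per_lap_gal


# One made-up 30-second lap: 20 s at full throttle, 10 s at half throttle.
lap = [Sample(1.0, 1.0, 9000)] * 20 + [Sample(1.0, 0.5, 8000)] * 10

burn = fuel_burned(lap)
laps = laps_remaining(11.0, burn)
print(f"burn per lap: {burn:.2f} gal, laps remaining: {laps:.1f}")
```

A production model would have to account for the factors the talk mentions, such as track type, weather, and tires, which this sketch ignores entirely.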

So if you look at this diagram, you see that we're taking all of this data, and we're ingesting it into continuous clients on Amazon ECS and AWS Fargate, streaming it via Amazon Kinesis, processing it via Flink, and then delivering that to our production graphics through Amazon DynamoDB and APIs. Each crew chief is doing this for one car. We're doing this for all the cars on the track in real time. Both fans and NASCAR teams now see what was invisible, the race within a race. And the crazy thing about this is we did it on AWS in less than three months. And those closest to the race absolutely love it. We're doing something that no one's ever seen before, and we're just getting started.
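The processing stage of a pipeline like this typically groups a telemetry stream by car and time window before writing results to a key-value store. The sketch below imitates that pattern in plain Python, with no Kinesis, Flink, or DynamoDB calls: the record layout, the one-second window, and the dictionary standing in for a table are all assumptions made for illustration.

```python
from collections import defaultdict


def tumbling_window_avg(records, window_s):
    """Average a value per (car, window) over fixed, non-overlapping windows.

    records: iterable of (timestamp_s, car, value).
    Returns {(car, window_start_s): average_value}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for ts, car, value in records:
        window_start = int(ts // window_s) * window_s
        acc = sums[(car, window_start)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical throttle readings: (timestamp in seconds, car id, throttle).
records = [
    (0.2, "car-5", 0.9),
    (0.7, "car-5", 0.7),
    (1.1, "car-5", 0.5),
    (0.4, "car-9", 1.0),
]

# The resulting dict plays the role of the table the graphics would read.
table = tumbling_window_avg(records, window_s=1)
for key in sorted(table):
    print(key, round(table[key], 3))
```

A real stream processor would also handle late or out-of-order events and emit results continuously, which this batch-style sketch does not attempt.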

And so we've taken all of these learnings and now we're applying them to innovate for the NBA. This first year we have 67 games available to Prime customers, and we have all the games in League Pass. But first, let me touch on a system called Event Detection Classification, what we call internally EDC, which is a robust system that supports all this data ingestion and allows us to build our models. It powers experiences like Rapid Recap, a two-minute highlight reel where if fans join the game late, we can catch them up and then transition them right into the live stream.

It's also powering features like Key Moments, where during live games, our AI models are detecting and surfacing the most impactful plays. Fans can now browse three-pointers, momentum changes, dunks, all in real time. EDC is a foundational system in our Prime Insights platform that automatically tags and categorizes every single moment across multiple sports. It's leveraging tools like Amazon SageMaker, and we're using Claude running on Amazon Bedrock, for example. We combine a wide range of data, including official league data. We enrich all of that with the deep contextual metadata that we have, and that's resulted in this immersive, ever-growing ecosystem, hundreds of millions of tagged moments, all of it fueling personalized viewing experiences around the globe.

And so beyond these features, there's a lot more that we're doing that I just can't show you today. It's too early, unfortunately. It's just too soon to announce. But you'll see them throughout the season, I promise. And when we're not on the court, we're also innovating in our physical production spaces. Coinciding with the launch of the NBA, we introduced our most technically advanced two-story, 13,000 square foot studio. It has 2,300 LED screens, which is about three billion pixels. They're allowing us to fully immerse our fans in off-the-court experiences.

Commentators and analysts can now literally step into the data, interact with the plays, and bring fans closer to the action than they've ever been before. And AWS is powering all of the data, all of the compute and networking to make this possible, feeding us live player tracking data, statistics, and predictions straight into the studio all in real time. This space is what's going to showcase where the physical and the digital come together to help us tell better stories for fans.

This is just a glimpse of a few of the innovations that we're building on AWS. But now, before we go, let's hear a little bit more about AI from the original AI. Thank you.

Innovation Never Stops: Amazon's Day One Journey on AWS

The NBA on Prime is changing the game with AI. Want a quick runback with Rapid Recap? How about Multiview with League Pass? The NBA with AI, it's on Prime.

You know, I remember as a teenager trying to emulate Allen Iverson's massive crossover, and having Allen Iverson partnering with Prime Video to help with artificial intelligence is mind-blowing. Look, this has been a thrill of a lifetime to be able to work with Eric and the broader Prime Video organization, and every time I meet with them, I keep hearing the new things that are coming, as Eric was hinting at. So it's really incredible.

As a sports fan, I already mentioned this, I love the Washington Commanders, so Eric and I are always talking about our fantasy team. It's great to be able to partner with them, and I love seeing how they're improving the overall viewing experience of sports fans around the world. We've just heard three powerful innovation examples that are happening across all of Amazon, and these are just a few of the ways that Amazon is innovating on AWS. There are so many more: Alexa+, our broader devices business, fulfillment centers and the robotics that's happening there, and also Amazon Kuiper, which is our satellite internet service.

The common thread is that AWS is how these businesses are driving innovation at unprecedented scale. I really wish we could talk about them all. Everyone wants to get into the club. We're here in Las Vegas, but this was only a sixty-minute session. Thankfully, there's always next year. I hope you enjoyed this look behind the curtain at how Amazon runs on AWS.

I want you all to please take some time and go check out One Amazon Lane at the Caesars Forum. This is an activation that we have where you can actually go sit in a Zoox and experience some of the innovations that you've heard about earlier today. Remember, the innovation never stops here at Amazon. I've been here for ten years, and as Andy said in our opening, it is truly day one. Thank you all for joining us. Have a great rest of your week at re:Invent.

; This article is entirely auto-generated using Amazon Bedrock.
