Kazuya

Posted on Dec 8, 2025

AWS re:Invent 2025 - Powering Prime Video's NASCAR Coverage: ML Fuel Analytics in Action (SPF303)

🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.

Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!

Overview

In this video, AWS engineers Aravind and Mona present the Burn Bar, an industry-first real-time fuel analytics innovation for NASCAR on Prime Video. They detail how they built a serverless solution using Amazon Kinesis Data Streams, Amazon Managed Service for Apache Flink, and Amazon SageMaker to process thousands of telemetry signals per second and deliver fuel consumption insights to broadcasters in under 5 seconds. The team overcame challenges including the absence of fuel sensors in NASCAR cars and lack of ground truth data by combining physics-based models with machine learning. Key technical concepts covered include Flink's keyed streams, broadcast streams, tumbling windows, and state management. The solution achieved 534 million media impressions and was built from MVP to production in just 12 weeks with a small team.

This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Introduction: Bringing Real-Time Fuel Analytics to NASCAR Fans on Prime Video

All right, good morning everybody. Welcome to session SPF303. Can I get a quick show of hands? How many of you watched NASCAR races on Prime Video? All right, great, we've got some NASCAR fans in the room today. My name is Aravind. I'm a Senior Solutions Developer in the Prototyping and Customer Engineering team. With me I have Mona. Hello everyone. I am Mona. I am a Solutions Prototyper, also from the Prototyping team. Thank you. We are super excited to take you through this journey of innovation on real-time fuel analytics and how we actually brought it to millions of NASCAR fans.

So on the agenda today, we're going to have a brief touch on the NASCAR coverage on Prime Video. We'll understand more about the Burn Bar innovation. For most of the session today we're going to spend some time on understanding the solutioning aspects, and then I'll wrap it up with a call to action, which I hope will inspire you to build solutions in your organizations which are very similar to what we've done here as well.

So I want you to imagine this. You're watching a NASCAR race, and you see two racers fight it out for a position. One of them is charging really hard while the other is holding back, and as a fan you wonder why is this driver really holding back? Is it a problem in the skills? Is there a problem in the car? Or is this racer actually playing a much longer game of fuel strategy so that he wants to position himself to be a leader at the end of the race?

So fuel strategy is something which has never been made visible to both broadcasters and fans. But if you think about it, fuel strategy still remains one of the most critical and exciting parts of a NASCAR race. So we challenged ourselves and said how can we use the concept of fuel strategy to unlock unique storytelling opportunities and equip our on-air talent so that now they can drive fan engagement by providing deep technical insights to what the racers are really doing on the track.

The key challenge here is this is a race, right, and fuel strategy keeps changing very, very often, every few seconds perhaps in some of the races. So we want to make sure that these fuel insights are provided to on-air talent in a near real-time basis. We chose an option where we said we want to actually equip our on-air talent to have these fuel insights in less than 5 seconds. So this was the primary challenge for us.

The Burn Bar Innovation: Making Fuel Strategy Visible Through Graphics and Dashboards

So what we did was we built the Burn Bar. So what is the Burn Bar? It is truly an industry-first innovation from the Prime Video innovation team, which basically gives us an ability to look at a driver's pattern. What is the strategy on their driving? Are they driving really aggressively? Are they driving conservatively? What's the fuel efficiency they're getting out of their cars?

There are two aspects to the Burn Bar. We actually created a customer-facing graphic, which is displayed during the broadcast, which is an example that you see on the screen. And what you don't see is a real-time dashboard which is actually made available to the on-air talent. Now this real-time dashboard is critical because this is what enables our on-air talent to be proactive rather than being reactive. They're able to spot fuel strategies as they develop during the race before any of that becomes obvious for the viewers.

So this combination of the customer-facing graphics and the on-air talent dashboard is really powerful because we're not just showing some data, we're actually equipping the on-air talent to develop interesting stories and engage the fans as they watch these races.

So for those of you who haven't seen what a Burn Bar really looks like, let's take a look at it. And trying not to allow that 11 car to get by. Well, let's check it out. Chase Briscoe getting nearly 5 miles a gallon, Denny Hamlin 4.6, meaning for every gallon of gas, Chase Briscoe's going about another third of a mile, another quarter of a mile. That over the entire fuel run may be a difference. So to your point, that's not by luck and that's not by tune-up. That is Chase Briscoe saving gas versus Denny Hamlin.

Fuel mileage continues to play a role here at Pocono, right here at the end of the straightaway. He's probably lifting a little bit there and that jumps up to 52. Look at that last lap, big saving, big chunk of saving right there, and it allows no brake into one or less brake. Just drive in deeper. That partial lift is not doing it for us. So what you saw just there is an example of what a Burn Bar actually looks like and how it's being used by the on-air talent to engage the fans as the race occurs.

Now the Pocono race here was very intriguing because it's one of those races where fuel strategy played a vital role for Chase Briscoe to actually win it. If you analyze this race, he actually had a really nice fuel saving strategy where he was basically lifting off the throttle at the beginning of the straights and coasting at the turns. So if you really haven't watched this race, I highly encourage you to take a look at that.

Technical Challenges: No Fuel Sensors, Guarded Secrets, and Stringent Performance Goals

Well, let's look at some of the challenges that we had to go through to build this capability. The first one is the lack of real-time fuel data that our on-air talent can make use of to provide strategic insights. We wanted to create a really engaging customer broadcast. Now this is a race. We deal with thousands of telemetric signals that are coming from each of these NASCAR cars. And we wanted to process all of these telemetric signals in real time in a reliable and highly accurate way.

Now I know some of you who may not be aware of NASCAR racing might be wondering why we are even talking about things like fuel consumption and fuel efficiency. Isn't it all straightforward? I mean, all cars have sensors or gauges. Not in the case of NASCAR cars. None of these cars have any fuel gauges in them. None of them have any sensors which give us any indication about fuel efficiency, fuel burn, any of that.

Another interesting challenge here is that fuel strategy still remains one of the most guarded secrets in the NASCAR teams. Each of these NASCAR teams owns this secret. It is competitive intelligence at play here. So there's really no way for us as a development team to reach out to them and say, hey, can you share any of your strategies with us? Finally, the lack of ground truth data. Obviously we've got some machine learning models as well, but the lack of ground truth data really makes it hard for us to train a model and integrate it into the processing system.

On top of it, there are too many variables which actually impact fuel consumption and fuel efficiency. These variables could be, for example, driving styles. It could be track temperatures, weather conditions, and a whole lot more. So, having seen these challenges, let's look at some of the goals that we actually set for ourselves here.

The first was to determine the fuel analytics. We want to figure out how much fuel is being burned in the current lap, in the last few laps across stages, and provide trends to the on-air talent so they're able to spot these fuel strategies as they're developing during the race. Next, the Prime Video innovation team have really stringent KPIs both in terms of performance and accuracy, so this solution had to meet all of that. We had to provide these fuel insights to on-air talent in less than 5 seconds. We want to enable broadcasters the ability to display this Burn Bar graphic on demand and finally create a real-time dashboard for the on-air talent.

High-Level Architecture: A Serverless Solution Built on AWS Managed Services

So for the rest of the session today, we're going to look at details about how we built the solution. I'm going to begin with a really high-level architecture of what's involved in the system.

The first thing I want you to look at is that we've only relied on serverless and managed services from AWS. We are an extremely small team, and we didn't want to spend a lot of our time looking at the operational overhead or managing servers. So we said from day one we're going to adopt these serverless technologies and managed services from AWS. I'm going to quickly walk you through our data flow here. Telemetry signals from these race cars basically flow into ERDP. ERDP happens to be NASCAR's official data distribution platform. All of that data is now ingested into AWS.

Now you'll see that we've got different layers. These are all separation of concerns. The first layer is the data ingestion, where we use AWS Fargate along with Amazon Kinesis Data Streams. Once we've got all of this data flowing into Kinesis Data Streams, we use the processing layer where we actually make use of Amazon Managed Service for Apache Flink. So Flink happens to be the core processing engine for the entire solution here. Flink integrates with the AI layer, where Mona is going to talk about some of the models that we've developed and how we integrate that with Flink.

And finally, once we've determined some of these fuel analytics and trends, we push all of that into the storage layer, which is later distributed to broadcasters and to on-air talent dashboards using API Gateway and delivered over Amazon CloudFront, where we also make use of AWS AppSync. So what we're going to do in the remainder of the session is go through every layer and understand each of them in depth. But first, I'm going to call on Mona to talk about the work that we've done on fuel analytics.

Fuel Analytics Approaches: Computer Vision, Physics-Based Models, and Machine Learning

Thank you, Aravind. As Aravind explained, one of our biggest challenges was we had no direct information about how much fuel was left in each car. No car broadcasts "I have 5 gallons left" in the middle of the race. So we watched hours of NASCAR coverage. We consulted with subject matter experts from the Prime Video Innovation team to learn about fuel analysis, race analysis, and also the physics of fuel consumption. So from day one, we knew that no single approach would crack this problem. So we took three different approaches.

The first one was visual analysis or computer vision. There are several fuel consumption patterns that we can extract from NASCAR video coverage by training a specialized computer vision model. We analyzed different behaviors like drafting, pack racing, and so on that have correlation with fuel consumption. The next approach was physics-based models. Physics-based models are parametric models that map the set of telemetry data points to fuel consumption. And to find the parameters of this model, we used historic races with different characteristics, and we also used real-world events such as green flag and yellow flag as calibration points to optimize the parameters of this model.

Once we built our foundation, we used Amazon SageMaker to train an AI specialized predictive model to calculate fuel consumption. Unlike the previous approach where it was a parametric model, this was data-driven, allowing the model to find patterns and correlations in the data. The result of this exercise, or all these experiments, was that we settled down on two approaches: the physics-based model and the machine learning model.

Now, let's see how we deploy this model and use it in our solution. The AI-based model lives on Amazon SageMaker Inference endpoint that's capable of running computationally intensive inference predictions.

SageMaker gives us auto-scaling, which is critical when we are processing over 40 cars during the race, so the endpoint scales automatically with the inference load. We are also talking about millisecond inference time using the SageMaker inference endpoint, which is critical when we want insights every 5 seconds. Our physics-based model is embedded in our real-time processing layer, and Aravind will explain more about this layer.

Now that we've talked about how we train the model and where we deploy the model, let's talk about the data. NASCAR Next Gen cars transmit thousands of telemetry data points every second. This can be throttle position, RPM, brake pressure, and so on. This data flows into the NASCAR Event Racing Data Platform, or ERDP. This is how broadcasters, racing teams, and other authorized entities subscribe to access the data. ERDP uses NATS, which is a messaging system that enables pub-sub for real-time messaging. This is particularly good when we are working with real-time data, and this gives us high performance and low latency. Now Aravind is going to talk in detail about the AWS architecture that we use to ingest the data from ERDP using NATS. Thank you.

Data Ingestion Layer: Using ECS Fargate, NATS, and Kinesis Data Streams

Alright, thanks Mona. So now that we've got telemetry signals coming into the ERDP platform, let's understand how we ingest all of that data into AWS. You will see here that the first thing we're doing is running subscribers. We've got subscribers who subscribe to this ERDP data using a single ERDP endpoint. These subscribers are essentially ECS Fargate tasks which run in multiple availability zones. More importantly, because we use NATS, which is enabled by ERDP and is what ERDP supports, we also make sure that these tasks share the same NATS queue group. This actually gives us two important capabilities.

With NATS queue groups, NATS will ensure that it is actually able to load balance messages as they arrive to multiple subscribers, which are running in multiple availability zones. NATS also will ensure that it will deliver a message only once to a single subscriber in this group. Now if one of those subscribers were to become unhealthy, NATS will ensure that it will load balance and push those messages to the healthiest subscribers. That's basically the fault tolerance aspect that we get out of it.
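The delivery behavior described above can be illustrated with a toy, in-memory model. This is not a NATS client and does not talk to ERDP. The class name, the task names, and the round-robin choice are assumptions made only to keep the sketch deterministic; a real NATS server decides for itself which group member receives each message.

```python
import itertools


class QueueGroup:
    """Toy in-memory model of queue-group delivery (not a NATS client)."""

    def __init__(self, members):
        self.healthy = {name: True for name in members}
        self.delivered = {name: [] for name in members}
        self._cycle = itertools.cycle(members)

    def mark_unhealthy(self, name):
        self.healthy[name] = False

    def publish(self, message):
        # Deliver to exactly one healthy member, skipping unhealthy ones.
        for _ in range(len(self.healthy)):
            name = next(self._cycle)
            if self.healthy[name]:
                self.delivered[name].append(message)
                return name
        raise RuntimeError("no healthy subscribers in the group")


# Hypothetical subscriber tasks, one per availability zone.
group = QueueGroup(["task-az1", "task-az2", "task-az3"])
for i in range(6):
    group.publish(f"msg-{i}")

group.mark_unhealthy("task-az2")  # simulate one subscriber failing
for i in range(6, 10):
    group.publish(f"msg-{i}")
```

Every message ends up with exactly one member, and after `task-az2` is marked unhealthy the remaining two members absorb all new traffic.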

Now that we've handled all of those aspects, the next thing we do is route all of those incoming data into multiple Kinesis Data Streams. One thing we don't have here is an exhaustive list of all the telemetry signals that are being used, but you want to imagine that there are plenty of signals that we get, and so we basically separate out these signals logically to be sent to multiple Kinesis Data Streams.

I want to quickly talk about some best practices for stream consumers here. If you step back and look at what the subscribers are doing, they're essentially subscribing to the ERDP endpoint, getting all those messages, and routing them back to multiple Kinesis Data Streams. If you think about what the subscribers' role is here, it's more like a pass-through layer with very little processing happening. We want to make sure that we maximize the throughput for each of the subscribers.

From that aspect, you want to make sure that every implementation that you're doing within the subscriber is asynchronous in nature. You don't want any blocking network calls. One of the things we also do within the subscriber is, for example, publish a lot of custom metrics for monitoring the latency, the throughput, and so on. We don't want any of those blocking calls.

We obviously want to rely on a lot of asynchronous calls there. The next thing we do is monitor for back pressure. This is a critical aspect for many of the streaming systems because this is essentially a problem that occurs when you have a lot of messages that arrive much quicker than what a subscriber can process, and it gets manifested in two different ways.

First is memory exhaustion, where you have messages that land and get accumulated within a queue, and thereby your queue depth keeps increasing. The second is message loss where you typically run into buffer overruns. You want to make sure you can use CloudWatch with alerting, which is what we use, and we set up alerts to monitor all of these aspects. The other thing we also want to monitor is the throughput and the latency that we're seeing. As I said, the subscriber has to be extremely efficient so that we are not causing any more latency, with the overarching goal of saying we want to actually provide insights every five seconds.
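As a rough illustration of the two back-pressure symptoms, here is a minimal asyncio sketch with a bounded buffer between a fast producer and a slower consumer. The function name, the buffer size, and the stats dictionary are all hypothetical; in a real subscriber the counters would be published as custom metrics and alarmed on, rather than returned.

```python
import asyncio


async def run_pipeline(n_messages, queue_size):
    queue = asyncio.Queue(maxsize=queue_size)
    stats = {"dropped": 0, "max_depth": 0, "processed": 0}

    async def producer():
        for i in range(n_messages):
            try:
                queue.put_nowait(i)      # never block the receive path
            except asyncio.QueueFull:
                stats["dropped"] += 1    # buffer overrun: the message is lost
            stats["max_depth"] = max(stats["max_depth"], queue.qsize())
            if i % 10 == 9:
                await asyncio.sleep(0)   # yield so the consumer can run
        await queue.put(None)            # sentinel: no more messages

    async def consumer():
        while True:
            item = await queue.get()
            if item is None:
                return
            stats["processed"] += 1
            await asyncio.sleep(0)       # stand-in for a slower downstream call

    await asyncio.gather(producer(), consumer())
    return stats


stats = asyncio.run(run_pipeline(n_messages=100, queue_size=5))
print(stats)
```

Bounding the buffer turns unbounded memory growth into an explicit, countable drop, which is the kind of signal an alert can be built on.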

Another really critical aspect when it comes to stream processing is you have to question yourself: do you need every message that is coming through the pipe? Do you really need to process it? If not, you want to actually use something called adaptive sampling. Now, there are cases where, for example, you will require every message to be processed. In our case, if you think about it, we're receiving thousands of telemetry signals per second. If I get thousands of telemetry data points in every second, which is corresponding to, let's say, engine speed, do I really need so many data points? Absolutely not, right?

When we did that analysis, we saw that there is really no need for us to process every single message. What we do instead is we have some sophisticated adaptive sampling strategies in place. That really helps us because now all of a sudden we don't really have to scale the processing layer to some crazy levels that are really not required for what we want to do. Now that we have the data ingested and stored durably within Kinesis Data Streams, let's look at what and how we actually process all of this real-time data stream.
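The talk does not describe the actual sampling strategies, so the following is only one plausible shape for the idea: forward a reading when it has changed meaningfully since the last forwarded value, or when too much time has passed. The class name, the thresholds, and the `engine_rpm` signal name are assumptions for illustration.

```python
class AdaptiveSampler:
    """Forward a reading only if it moved enough or the last one is stale."""

    def __init__(self, min_delta, max_gap_s):
        self.min_delta = min_delta
        self.max_gap_s = max_gap_s
        self._last = {}  # signal name -> (timestamp, value) last forwarded

    def should_forward(self, signal, ts, value):
        last = self._last.get(signal)
        if (
            last is None
            or abs(value - last[1]) >= self.min_delta
            or ts - last[0] >= self.max_gap_s
        ):
            self._last[signal] = (ts, value)
            return True
        return False


sampler = AdaptiveSampler(min_delta=50.0, max_gap_s=1.0)
# Made-up (timestamp seconds, rpm) readings.
readings = [(0.0, 8000), (0.1, 8010), (0.2, 8020), (0.3, 8100), (0.4, 8105), (1.4, 8110)]
kept = [(ts, v) for ts, v in readings if sampler.should_forward("engine_rpm", ts, v)]
print(kept)
```

Here three of the six readings are forwarded: the first one, the jump to 8100, and the reading that arrives after more than a second of silence.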

Processing with Amazon Managed Service for Apache Flink: Building the Real-Time Pipeline

First, I want to actually start off with some core requirements on why we picked the Amazon Managed Service for Apache Flink. We wanted a service which can give us the ability to process stateful computation for both bounded and unbounded data streams. We want the ability to connect to multiple data stores with really nice state management capabilities built in. Additionally, but more importantly for our specific use case, we wanted a service which can help us convert unbounded data streams into bounded data chunks. We call it windowing, a concept that's present within Apache Flink. We'll get into those details very soon.

That brings us to Amazon Managed Service for Apache Flink. It is a managed service, perhaps the easiest way for you to actually start building analytical applications when it comes to real-time data streams. You have the ability to run these Flink applications on a continuous basis. It supports both bounded and unbounded data streams. A great way of getting started with Apache Flink is to make use of Zeppelin notebooks. With Zeppelin, you're able to analyze these data streams in an on-demand way without even having to create an Apache Flink application.

For example, you could interact with the data streams using SQL. If you want to figure out whether there are anomalies in the real-time data you're getting, or you want to create some visualizations, or you want to perform some data analysis, you're able to do that on the fly using Zeppelin notebooks, and the service supports that. Finally, the service lets you build applications in multiple programming languages.

So where does Flink fit in the overall data streaming services that AWS provides? On the left, we've got different sources for real-time data streams. In the ingestion and storage layer, we offer two capabilities or services: Amazon Kinesis Data Streams and Managed Streaming for Apache Kafka. In our solution, we made use of Kinesis Data Streams.

Once we've got this ingestion covered, we can use the Amazon Managed Service for Apache Flink to process all of these real-time data streams. And then with Flink, you have the opportunity to send all of this processed data to multiple destinations using the concept of Flink sinks. Let's come back to our solution and look at the concept of a Flink pipeline. What does it really mean?

At a high level, what this really is is essentially a data processing pipeline. So in our context, we're dealing with multiple Kinesis Data Streams. We'll build a pipeline within Flink using some of the constructs that Flink offers. There's a really nice integration with Kinesis Data Streams, and then we also use the concept of a broadcast stream. We'll get to the concept and what it really means in the next few slides.

Then we actually use enrichers to add additional context to the stream that is flowing into Apache Flink. Finally, if you look at one of our goals, our goal was to actually provide these insights to the on-air talent every 5 seconds. So we are dealing with a case where we have unbounded data streams coming in, but our requirement is to actually provide insights every 5 seconds. So we really don't have to process all of those streams in an unbounded fashion.

Rather, what we could do is we could use windowing options within Flink to break it down into more of a batch process. So this is exactly what the tumbling window actually provides us, and we'll get into what tumbling windows means. A lot of the fuel analytics work and the processing that we do, including integration with the AI models, happens within these windowing functions. Once Flink processes all of those data points, we've got the fuel analytics in place, and we flush all of that data out to an output Kinesis Data Stream, which is further consumed by downstream components.

Core Flink Concepts: Keyed Streams, Broadcast Streams, Tumbling Windows, and State Management

Next, we'll walk you through a few key concepts within Flink that are really important when you build processing for real-time data streams. The first is the concept of a keyed stream. Apache Flink is a distributed system. When you start sending data streams to Apache Flink, you have the ability to partition this data stream and process it across multiple task instances.

How you partition the data stream is done using something called keyed streams. An example is being shown here where we use the keyBy operator. The enriched data is the enriched data stream, which has all the enrichments in it. We apply the keyBy operator to it, using the vehicle ID as the key in this case, and let Flink partition it out.

What that does is Flink will now partition the stream and make sure that all of these partitions are being processed on each of these tasks. On the diagram, you will see that we've got events coming in for three different vehicles, and Flink has partitioned it and is basically running each of those partitions on different task managers in parallel. So when we talk about low latency, high throughput processing capabilities on Apache Flink, this is exactly what we mean. If you really want higher throughput, you have an option to now horizontally scale it and run it on multiple task managers and task instances.
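To make the partitioning idea concrete, here is a plain-Python stand-in for what keying by vehicle ID achieves. It is not Flink code: Flink's own key-to-task assignment works differently internally, and the CRC32-modulo hash, the function names, and the sample events are assumptions chosen for a stable, runnable sketch.

```python
import zlib
from collections import defaultdict


def partition_for(key, parallelism):
    # Stable hash so the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % parallelism


def key_by(events, key_fn, parallelism):
    partitions = defaultdict(list)
    for event in events:
        partitions[partition_for(key_fn(event), parallelism)].append(event)
    return partitions


# Made-up enriched telemetry events for three vehicles.
events = [
    {"vehicle_id": "car-A", "rpm": 8200},
    {"vehicle_id": "car-B", "rpm": 7900},
    {"vehicle_id": "car-A", "rpm": 8250},
    {"vehicle_id": "car-C", "rpm": 8100},
    {"vehicle_id": "car-B", "rpm": 7950},
]
partitions = key_by(events, key_fn=lambda e: e["vehicle_id"], parallelism=3)
```

The property that matters is that all events for one vehicle land in the same partition, so per-vehicle logic never has to coordinate across tasks.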

Now let's get into the concept of a broadcast stream. Broadcast streams are really powerful in that you're always going to have cases where you've got your real-time data streams being processed by an existing Apache Flink application, and you actually have specific events that occur very rarely. In our case, very rarely in the race, for example, maybe there is a car crash that happens in the race. There are also events where racers have to pit stop. Now if you think about some of these events, these are events that don't occur on a continuous basis.

So you're always going to have a main data stream, and then you're going to have the sporadic data streams as well. But you want your Flink application to make use of these events, these short-lived events as they occur. So the way you're going to enable that is using the concept of broadcast streams. With broadcast streams, you have an ability to distribute your reference data to all the parallel instances, and they can process them in parallel. One of the things you could also look at here is how you want to make sure that this data is available for all of those task instances that are processing your real-time data streams. And in those process functions that you're writing, you're able to now tap into these broadcast streams and pick up the state changes and incorporate that into your logic.
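The broadcast pattern can be sketched without Flink as follows. Each parallel instance keeps its own copy of the rarely changing reference data, and every update is delivered to all instances. The class, the `flag` field, and the event contents are hypothetical and only illustrate the shape of the pattern.

```python
class ParallelInstance:
    """Stand-in for one parallel task holding a local copy of broadcast state."""

    def __init__(self, name):
        self.name = name
        self.race_state = {"flag": "green"}
        self.outputs = []

    def on_broadcast(self, update):
        # Called for every instance whenever a sporadic event arrives.
        self.race_state.update(update)

    def on_telemetry(self, event):
        # Main-stream processing reads the latest broadcast state.
        self.outputs.append({**event, "flag": self.race_state["flag"]})


instances = [ParallelInstance(f"task-{i}") for i in range(3)]


def broadcast(update):
    for instance in instances:
        instance.on_broadcast(update)


instances[0].on_telemetry({"vehicle_id": "car-A", "rpm": 8200})
broadcast({"flag": "yellow"})  # a rare race event reaches every instance
instances[1].on_telemetry({"vehicle_id": "car-B", "rpm": 6100})
instances[2].on_telemetry({"vehicle_id": "car-C", "rpm": 6000})
```

Telemetry processed after the broadcast sees the new flag on every instance, regardless of which vehicle the instance handles.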

Next, let's talk about tumbling windows. This goes back to our goal of providing insights every 5 seconds, so we break down this unbounded data stream into bounded chunks using the concept of windows. Now in our case, we use tumbling windows, and we do that on purpose because we don't want any overlap between those windows. We want to make sure that events fall exclusively into specific isolated time buckets. So with tumbling windows, you get non-overlapping fixed-size time buckets. And in the example here, you see we've got 3 different windows with a window size of 3 seconds, and you'll see that each of these events has been placed in one of those windows, and there's absolutely no overlap.

By far the most powerful concept within windows is process functions. So what happens with Flink is after 3 seconds, if you look at Window 1, after 3 seconds, Flink will close this window and invoke the process function, and this is where you perform all the analytics that you actually want to do with all of those events that have been aggregated into that time bucket. Another important concept to understand here is you're going to be dealing with a lot of state within Flink. So one thing you want to understand here is if you're not managing state the right way within Flink, Flink is going to actually remove all of that state when it closes that window. We're going to talk about how we persist the state across windows using something called Keyed State.
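A tumbling window assigns each event to exactly one fixed-size bucket based on its timestamp, and the per-window function runs once per bucket. The sketch below shows that assignment in plain Python with a 3-second window, as in the slide example. The function names, the sample events, and the averaging logic are assumptions standing in for the real analytics.

```python
from collections import defaultdict


def window_start(ts_ms, size_ms):
    # Each timestamp belongs to exactly one [start, start + size) bucket.
    return ts_ms - (ts_ms % size_ms)


def tumbling_windows(events, size_ms):
    windows = defaultdict(list)
    for ts_ms, payload in events:
        windows[window_start(ts_ms, size_ms)].append(payload)
    return dict(sorted(windows.items()))


def process_window(start_ms, payloads):
    # Stand-in for the per-window analytics invoked when a window closes.
    return {
        "window_start_ms": start_ms,
        "count": len(payloads),
        "avg": sum(payloads) / len(payloads),
    }


# Made-up (timestamp in ms, value) events.
events = [(500, 10.0), (2900, 20.0), (3000, 30.0), (5999, 50.0), (6000, 70.0)]
windows = tumbling_windows(events, size_ms=3000)
results = [process_window(start, payloads) for start, payloads in windows.items()]
```

The event at 2900 ms falls in the first window and the event at 3000 ms in the second, which is the non-overlap property described above.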

Keyed State lets you persist state across windows within Flink. So if you look at our example, say I want to know at any point in time, for this vehicle, how much fuel it has burned till now in the current lap. It's like a running total that I want to maintain. And because I have these 3-second windows, I want to make sure that I'm aggregating that state across all of these windows. So I want to persist this across windows, and the way I do that is by using Flink primitives such as ValueState and MapState. Flink actually supports a lot more, but we're using ValueState and MapState in our solution.
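The running-total idea can be sketched as follows. A plain dictionary keyed by vehicle ID stands in for per-key state; this is not the Flink state API, and the class name, the reset-on-new-lap rule, and the fuel figures are assumptions for illustration.

```python
class FuelBurnTracker:
    """Keeps a per-vehicle running total that survives across windows."""

    def __init__(self):
        # vehicle_id -> (lap, gallons burned so far in that lap)
        self._lap_total = {}

    def on_window(self, vehicle_id, lap, gallons_in_window):
        prev_lap, total = self._lap_total.get(vehicle_id, (lap, 0.0))
        if lap != prev_lap:
            total = 0.0  # a new lap starts a fresh running total
        total += gallons_in_window
        self._lap_total[vehicle_id] = (lap, total)
        return total


tracker = FuelBurnTracker()
# Made-up per-window fuel burn estimates.
first = tracker.on_window("car-A", lap=12, gallons_in_window=0.02)
second = tracker.on_window("car-A", lap=12, gallons_in_window=0.03)
other = tracker.on_window("car-B", lap=12, gallons_in_window=0.04)
new_lap = tracker.on_window("car-A", lap=13, gallons_in_window=0.01)
```

Each window contributes only its own increment, while the state carries the total forward until the lap changes.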

The other cool thing about this is also it enables fault tolerance and state recovery using the concept of checkpoints within Flink. We'll talk about checkpoints very soon. Now, as with anything, whenever you're dealing with state, you always have to think about how much of state am I actually aggregating.

An important consideration is to think about state size. So what do you do with that? If you don't really handle state in the sense that you want to actually try and clean up unwanted state, you're going to have some resource contention issues here. Perhaps you're going to get into unbounded memory growth and have problems with availability and so on.

So what you want to do is think about what makes sense for your applications. Think about different ways in which you can clean up the state. Flink gives us multiple options. There's state time-to-live (State TTL), which Flink handles automatically: it expires the state once there has been no activity on it for the configured time. Then you also have a programmatic way of cleaning up the state. Finally, you also have a timer-based approach where you register callbacks and Flink will call them on your behalf.
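The cleanup options can be illustrated with a small stand-in for expiring state. This is not Flink's State TTL implementation: the class, the explicit `now` arguments, and the choice to refresh the timestamp only on writes are assumptions made to keep the sketch deterministic.

```python
class TtlState:
    """Dictionary-backed state whose entries expire after ttl_s without a write."""

    def __init__(self, ttl_s):
        self.ttl_s = ttl_s
        self._entries = {}  # key -> (value, last_write_ts)

    def put(self, key, value, now):
        self._entries[key] = (value, now)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, last_write = entry
        if now - last_write >= self.ttl_s:
            del self._entries[key]  # lazy expiry on read
            return None
        return value

    def sweep(self, now):
        # Timer-style cleanup: remove everything that has expired.
        expired = [k for k, (_, ts) in self._entries.items() if now - ts >= self.ttl_s]
        for key in expired:
            del self._entries[key]
        return expired


state = TtlState(ttl_s=30)
state.put("car-A", 1.5, now=0)
state.put("car-B", 0.9, now=20)
early_read = state.get("car-A", now=10)
removed = state.sweep(now=35)
```

Without some form of expiry, entries for keys that never receive another event would stay in memory indefinitely.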

Now let's talk about checkpoints, which give us a really important capability, especially when you're running systems continuously to process unbounded and bounded data streams. What checkpoints allow you to do is to take a consistent backup of your application state across all Flink operators. So if you look at the example here, if you look at the data stream that we have, we've got multiple events coming in. And the Flink service can initiate a checkpoint at predefined intervals.

This is a configuration that you can have within the service, and the Flink service will make sure that it's actually checkpointing the states. It does so by storing all of this state in Amazon S3. And when Flink sees that there's a problem with one of the operators or the underlying task instances, it initiates a restoration from the previous successful checkpoint.

So when you're thinking about this, you want to make sure that the state that you're persisting within Flink is really optimal. You want to monitor for checkpoint duration. You want to monitor for checkpoint sizes, and the way you do that is by using custom metrics with CloudWatch. And if you see some of these checkpoint durations or sizes looking abnormal, it's a pretty good indication that your Flink application state is too large, and it's time for you to optimize it.

Another important aspect when you deal with real-time streams is messages getting out of order. In a distributed system, it is bound to happen. You're going to have messages which will get out of order sometimes. So as a solution provider, you want to think about what does this really mean to my system. If I have messages coming in in real time, but all of a sudden I've got some messages which come 10 seconds out, do I still process those messages, or do I simply ignore it? This is a decision that you really have to take as a development team.

This is where the concept of watermarks is applicable within Flink. So with watermarks, you're essentially telling Flink that events up to time T have arrived. And when Flink sees that, it basically closes the window and triggers the process function. So in this example, what we're showing you is a setup where we have a window size configured as 3 seconds. However, we're also telling Flink that you can expect some events to come out 10 seconds out from now. So basically now Flink, instead of closing the window in the first 3 seconds, it's going to actually wait for 13 seconds.

So this is a trade-off between latency and accuracy. Whether you want to actually process every message, even though some of those messages are delayed, or is it okay for you to drop the delayed messages with the gain that you're getting in terms of latency. So coming back to our use case, we want to actually create insights in less than five seconds. So if I have a message which is beyond five seconds, we essentially drop it. Right, so this is a conscious decision that our team took, and I encourage you to kind of revisit this concept when you start developing your systems as well.
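The latency versus accuracy trade-off can be seen in a small simulation of watermark-driven windows. It follows the usual bounded-out-of-orderness idea, where the watermark trails the largest timestamp seen by a fixed delay, and uses the slide's numbers of a 3-second window with a 10-second allowance. It is a plain-Python sketch, not Flink code, and the class and field names are hypothetical.

```python
class WatermarkedWindows:
    def __init__(self, size_ms, max_out_of_order_ms):
        self.size_ms = size_ms
        self.delay_ms = max_out_of_order_ms
        self.max_ts = None
        self.open = {}     # window start -> payloads collected so far
        self.fired = []    # (window start, payloads) in firing order
        self.dropped = []  # events that arrived after their window closed

    def watermark(self):
        return self.max_ts - self.delay_ms

    def on_event(self, ts_ms, payload):
        start = ts_ms - ts_ms % self.size_ms
        # Late event: its window end is already at or behind the watermark.
        if self.max_ts is not None and start + self.size_ms <= self.watermark():
            self.dropped.append((ts_ms, payload))
            return
        self.open.setdefault(start, []).append(payload)
        self.max_ts = ts_ms if self.max_ts is None else max(self.max_ts, ts_ms)
        # Fire every open window whose end the watermark has reached.
        for s in sorted(self.open):
            if s + self.size_ms <= self.watermark():
                self.fired.append((s, self.open.pop(s)))


w = WatermarkedWindows(size_ms=3000, max_out_of_order_ms=10000)
for ts, payload in [(1000, "a"), (2500, "b"), (4000, "c"), (12999, "d")]:
    w.on_event(ts, payload)
fired_before = list(w.fired)  # nothing has fired yet
w.on_event(13000, "e")        # watermark reaches 3000: first window fires
w.on_event(2000, "late")      # belongs to the closed window, so it is dropped
```

The first window only fires once an event with timestamp 13000 ms arrives, which is the 13-second wait described above. Shrinking the allowance reduces that wait at the cost of dropping more late events.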

So when you're building such a system, you obviously want to look at observability, which is very critical. You can use a lot of the CloudWatch metrics that are available out of the box. But when you're working with Apache Flink, you're bound to be creating a lot of custom metrics as well. A great way of implementing that is by creating custom sinks within Apache Flink. With custom sinks, you have the liberty to encapsulate all of these API calls. You can batch them up for cost optimization. You can handle any throttling errors you're subjected to by implementing retry logic using exponential backoff, or you could even implement circuit breakers as well. Think about encapsulating that into custom sinks within Apache Flink.
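The batching and retry behavior described above can be sketched independently of Flink. The following is a minimal Python illustration, not the team's sink: `BatchingMetricSink`, `ThrottledError`, and the injected `send_batch` callable are hypothetical names. In a real deployment `send_batch` would wrap whatever metrics API you call (for example CloudWatch's PutMetricData), and the logic would live inside a Flink sink implementation.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical error the sender raises when the metrics API throttles."""


class BatchingMetricSink:
    """Buffers metric points and sends them in batches with retries."""

    def __init__(self, send_batch, batch_size=20, max_retries=5,
                 base_delay_s=0.1, sleep=time.sleep):
        self._send_batch = send_batch      # callable taking a list of metric dicts
        self._batch_size = batch_size
        self._max_retries = max_retries
        self._base_delay_s = base_delay_s
        self._sleep = sleep                # injectable so tests need not wait
        self._buffer = []

    def write(self, name, value):
        """Buffer one data point; flush once the batch is full."""
        self._buffer.append({"MetricName": name, "Value": value})
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        """Send the buffered batch, backing off exponentially on throttling."""
        if not self._buffer:
            return
        batch, self._buffer = self._buffer, []
        for attempt in range(self._max_retries + 1):
            try:
                self._send_batch(batch)
                return
            except ThrottledError:
                if attempt == self._max_retries:
                    raise  # give up; the caller decides what to do with the loss
                delay = self._base_delay_s * (2 ** attempt)
                # Add jitter so many parallel subtasks do not retry in lockstep.
                self._sleep(delay + random.uniform(0, delay))
```

Batching reduces the number of API calls, which is where the cost optimization comes from, while the backoff keeps a throttled sink from making the throttling worse. A circuit breaker could be layered on top by skipping sends for a cool-down period after repeated failures.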

Throughput Optimization: Scaling Kinesis Data Streams, Apache Flink, and SageMaker Inference

Next, I want to talk about various options for you to optimize on the throughput, and I'm going to focus on three important services in our solution. Starting with Kinesis Data Streams.

The way you increase throughput and scale Kinesis Data Streams is using the capacity mode that the service provides. There are three different capacity modes that we support. The first one is the provisioned capacity mode. This is applicable for use cases where you have a predictable data stream, you know how much throughput you will require, or you have a steady increase in the input data stream traffic. That way you are able to now configure the number of shards you want to actually enable within Kinesis Data Streams.
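For provisioned mode, the shard count follows from the per-shard write limits, which are 1 MB per second and 1,000 records per second. The sketch below is a rough sizing helper; the function name and the 25% default headroom are assumptions for illustration, not figures from the talk.

```python
import math

SHARD_WRITE_MB_PER_S = 1.0        # per-shard ingest limit by size
SHARD_WRITE_RECORDS_PER_S = 1000  # per-shard ingest limit by record count


def shards_needed(peak_mb_per_s, peak_records_per_s, headroom=0.25):
    """Estimate provisioned shards for a predictable write workload.

    `headroom` is an assumed safety margin on top of the expected peak.
    """
    by_size = peak_mb_per_s * (1 + headroom) / SHARD_WRITE_MB_PER_S
    by_count = peak_records_per_s * (1 + headroom) / SHARD_WRITE_RECORDS_PER_S
    # Whichever limit is hit first determines the shard count.
    return max(1, math.ceil(max(by_size, by_count)))
```

For example, a stream peaking at 4 MB per second and 3,000 records per second is bound by size rather than record count, so the size limit sets the number of shards.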

The next option is on-demand standard, and you want to use this option where you have unpredictable traffic patterns. You may expect certain surges in the traffic pattern as well, and that's where you want to make use of on-demand standard. The only thing you want to watch out for with on-demand standard is that when you have extremely high surges in data stream traffic, the producers may be subjected to throttling.

Now we very recently announced the third option, which is the on-demand advantage option. With this, you get a combination of good performance with cost optimization. You want to use this option again when you have unpredictable traffic patterns, but the distinguishing part between advantage and standard is that with advantage, you essentially provision warm throughput. So you could say, for example, I want my Kinesis data stream to have a warm throughput of 100 megabytes per second. If your traffic starts at five megabytes per second, then all of a sudden you're seeing 50 megabytes per second, and then 85 megabytes per second, this is going to be instant scaling. Kinesis will handle this ingestion without any problems because you've already warmed the throughput.

So if you haven't checked this option, I encourage you to definitely take a look at it.

The next service to think about in terms of throughput optimization is Apache Flink, and your option here is to make use of parallelism and something called KPUs. With Apache Flink, your option to scale out is by increasing the parallelism. That parallelism is in turn mapped onto something called KPUs, and you're essentially billed for the number of KPUs you're provisioning. So it's a great option for you to think about, one, optimizing the Flink pipeline that you've developed, and two, how you optimize the throughput by provisioning the right amount of parallelism and KPUs.
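As a rough illustration of how parallelism relates to KPUs, the sketch below assumes the commonly documented relationship for Amazon Managed Service for Apache Flink, where the KPUs allocated to the application are the parallelism divided by the parallelism per KPU, rounded up. The function name is hypothetical, and billing may include additional overhead (such as a KPU for orchestration), so treat this as an estimate and check the current pricing documentation.

```python
import math


def allocated_kpus(parallelism, parallelism_per_kpu=1):
    """Estimate KPUs for a given application parallelism.

    Assumes KPUs = ceil(parallelism / parallelism_per_kpu); any extra
    overhead that the service bills for is deliberately left out.
    """
    if parallelism < 1 or parallelism_per_kpu < 1:
        raise ValueError("parallelism and parallelism_per_kpu must be >= 1")
    return math.ceil(parallelism / parallelism_per_kpu)
```

Raising the parallelism per KPU packs more parallel subtasks onto each KPU, which lowers cost but leaves each subtask with a smaller share of CPU and memory, so it suits I/O-bound pipelines better than CPU-bound ones.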

Finally, I want to talk about Amazon SageMaker AI inference as well. With SageMaker inference, auto scaling is a great option. It's an option that we currently use in our solution as well. There are different techniques for you to establish this auto scaling, as Mona called out. If you are interested in low-latency inference and you want to be able to scale out as the demand rises, auto scaling is the way to go.
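One common way to set up endpoint auto scaling is a target-tracking policy through Application Auto Scaling. The sketch below only builds the request parameters so it can run without AWS credentials; the endpoint name, capacity limits, target value, and cooldowns are placeholder assumptions, and the talk does not say which scaling technique or values the team used.

```python
def build_autoscaling_requests(endpoint_name, variant_name="AllTraffic",
                               min_instances=1, max_instances=4,
                               invocations_per_instance=100.0):
    """Build parameters for a target-tracking policy on an endpoint variant.

    All numeric values here are illustrative assumptions, not tuned settings.
    """
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    # Bounds within which the instance count is allowed to move.
    target = {**common, "MinCapacity": min_instances, "MaxCapacity": max_instances}
    # Keep average invocations per instance near the target value.
    policy = {
        **common,
        "PolicyName": f"{endpoint_name}-invocations-target-tracking",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": 300,   # seconds to wait before removing instances
            "ScaleOutCooldown": 60,   # seconds to wait before adding more
        },
    }
    return target, policy
```

With boto3, these dictionaries would be passed to the `application-autoscaling` client as `register_scalable_target(**target)` followed by `put_scaling_policy(**policy)`. A short scale-out cooldown and a longer scale-in cooldown favor latency over cost during bursts.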

Live Dashboards for On-Air Talent: Leveraging AWS AppSync and GraphQL Subscriptions

Now that we've seen some details about what we do within Apache Flink, let's look at the next architectural area, which is about how we enable these live dashboards for on-air talent. If you recall, one of the steps within the Flink workflow is that once we've figured out all the fuel analytics, we push all of that data into an output Kinesis data stream. From that data stream, we use AWS Lambda functions to consume the data and push it into the storage layer through the API layer. In our design, we use a single-table design within DynamoDB for storage, and we make use of AWS AppSync for GraphQL capabilities.

So what is AppSync here? If you think about traditional APIs, there are a lot of complexities in them. One of the biggest is over-fetching and under-fetching. If you've ever created RESTful APIs, you will see that in some cases clients get a lot more data than they need. In other cases, clients get less data than they need, in which case they end up making multiple API calls to get the right information. The other thing you also want to think about is real-time APIs: if you want to enable pub-sub mechanisms for APIs, how do you do that? So there is some complexity out there, and these are the kinds of complexities that the AWS AppSync service solves for us using GraphQL.

GraphQL is a strongly typed query language which lets clients specify exactly the data structures they need, so each API call returns the right amount of information. AppSync is a serverless managed offering from AWS, and it makes it really easy for you to build highly scalable pub-sub mechanisms for APIs and for websites as well.

So how do we use this in our solution? The Lambda function, which is triggered by the output Kinesis data stream, reads the data from the Kinesis shard and initiates a mutation on the AWS AppSync service. We've already registered DynamoDB resolvers with the AppSync service. Once these resolvers are in place, AppSync automatically pushes the new data that we got from the Kinesis data stream into the DynamoDB table.
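The Lambda step can be sketched as follows. Kinesis delivers record payloads to Lambda base64-encoded under `Records[].kinesis.data`; everything else here is an assumption for illustration: the mutation name `publishFuelUpdate`, the `FuelUpdateInput` type, and the field names are hypothetical, and the actual HTTPS POST to the AppSync GraphQL endpoint (with IAM or API key authorization) is omitted.

```python
import base64
import json

# Hypothetical mutation; the real schema is not shown in the talk.
MUTATION = """
mutation PublishFuelUpdate($input: FuelUpdateInput!) {
  publishFuelUpdate(input: $input) {
    carNumber
    lap
    fuelRemainingPct
  }
}
"""


def build_mutation_requests(event):
    """Turn a Kinesis-triggered Lambda event into GraphQL request bodies."""
    requests = []
    for record in event.get("Records", []):
        # Kinesis record data arrives base64-encoded; assume a JSON payload.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        requests.append({"query": MUTATION, "variables": {"input": payload}})
    return requests


def handler(event, context):
    """Lambda entry point: build requests; sending them is left out here."""
    requests = build_mutation_requests(event)
    # Each request body would be POSTed to the AppSync GraphQL endpoint.
    return {"mutations": len(requests)}
```

Keeping the request building separate from the network call makes the transformation easy to unit test with a hand-built event.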

Once we've persisted this state in the DynamoDB table, AppSync delivers all of this data back to the connected clients using a concept called GraphQL subscriptions. In our example, these are custom React front-end web applications which use GraphQL subscriptions to connect to the AWS AppSync service. Once they connect, AppSync establishes a secure WebSocket connection with these clients. When the mutation happens on the AppSync side, it uses a resolver to push data into DynamoDB and broadcasts that data back to the connected clients over WebSockets.

Results and Call to Action: From MVP in Eight Weeks to 534 Million Media Impressions

So let's look at what we've been able to achieve and what kind of results we've seen with this solution. We launched the Burn Bar for the very first race which was broadcast on Prime Video. This was the Coca-Cola 600, the very first time fuel strategy was made publicly visible by any broadcaster, and the reviews and the positive feedback have been amazing.

There was a lot of media attention as well. We've landed about 534 million impressions across the articles published by various media outlets. NASCAR on Prime averaged more than 2 million viewers. More importantly, what we've done here is create a foundation on which we can build additional capabilities and features.

Finally, going back to the success criteria we had, where we wanted to provide fuel analytics and meet stringent KPIs, we've been able to meet all of that. So how did we really get to the solution? It was really not luck. We've actually made use of the AWS Well-Architected Framework. This is a framework which gives you a set of best practices around different pillars.

If you haven't really checked out the Well-Architected Framework, I highly encourage you to take a look at it. This is our very first iteration of the solution. While we've been very successful in meeting the business objectives, we know there are opportunities for us to improve upon in the solution as well, and we will use this framework as a guiding principle for us to get there. For those of you who are interested in this, we do have a Well-Architected Framework tool which is a self-service tool that you can use to perform reviews on your own.

You could also reach out to your account teams or account SAs to get their help if you're interested in getting some of your workloads reviewed using this framework. Next, I want to talk about the call to action. If you look at some of the challenges, this was an area where we were completely new, so we were getting into uncharted paths. You want to embrace failure here, right? The biggest risk is not taking an uncharted path and failing, but rather staying on the paved road while others take the uncharted one. So definitely be open to taking this uncharted path and embrace failure.

When you've been doing experiments, look at how you can scale them as well, because scaling is where the real value lives, right? So you definitely want to make sure you're scaling. You want to be receptive to two-way door decisions as well, because when you're innovating, you're going to hit some problems, and you have to be able to take a detour, go through a different door, and try a different approach to be successful.

Finally, empower teams to accelerate the experimentation velocity. AWS gives you a lot of services and tools for you to do that.

We do have a few blogs that I highly recommend you take a look at if you're interested in understanding how NASCAR sends all of that telemetry data and signals into ERDP. There's a really nice blog that you can look at. Feel free to take a look at some of the driver testimonials and press releases that we have on the Burn Bar innovation.

Quick show of hands: what do you think is the effort we took on building the solution? Any takers? How long do you think it took to build the solution? Six months? Well, we had an MVP in eight weeks. We had already started some of the integration testing with real racing data in eight weeks. In about twelve weeks, we had productionized the system with a small team.

Mona and I are going to be sticking around. If you have additional questions, please reach out to us. We'll be here. Thank you so much for your time. Have a great conference, everybody. Thank you.

; This article is entirely auto-generated using Amazon Bedrock.
