Kazuya

Posted on Dec 11, 2025

AWS re:Invent 2025-NBA Inside The Game, Powered by AWS - Building the NBA’s New Stats Program-SPF307

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025-NBA Inside The Game, Powered by AWS - Building the NBA’s New Stats Program-SPF307

In this video, the NBA Stats team and AWS present their Inside the Game platform, showcasing how they process 29 skeletal body points per player at 60 frames per second to generate advanced basketball statistics. The team explains their migration to AWS infrastructure, including EKS and Flink applications that add only 50 milliseconds of latency to real-time data delivery. They introduce Player Gravity, a new metric using transformer architecture and factorized attention to calculate defensive pressure scores, revealing which players draw more defensive attention than expected. The leaderboard shows Stephen Curry leading in gravity, validating the model. Future stats include Leverage Score for clutch moments and Handle Score for ball-handling skills, with plans extending to WNBA analytics.

This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Introduction: The NBA Stats Team and the Mission to Educate and Engage Fans

Alright, hello everyone. My name is Ian McKiernan. I'm the lead product manager for the NBA Stats platform, and we're here to talk to you today about the NBA's new Inside the Game platform with AWS. I'm Dashiell Flynn. I'm a principal consultant in our sports practice at AWS. I work on and lead a lot of our AI/ML data and analytics projects across our league, team, and sports customers. And I'm Charlie Rohlf. I get the honor of leading the NBA stats team, and I'll kick it off here with a slide presentation.

This is our 300 level talk about Inside the Game. We're excited to talk about the NBA and what we're doing with AWS. I'm a retired 500 level engineer who can no longer cut it at that level, so I'm appropriate at the 300 level for today. So can everyone hear me okay? Yes, excellent. My first presentation to green earmuffs all around, so thank you. But let's start with our customary hype video as the NBA. We put a lot of effort into our marketing, but this is great to get us excited and we'll get into what the Inside the Game platform does, what our team does, and go from there. We're excited for the presentation. So with that, enjoy.

It's game time. Great. So with that we want to unveil the Inside the Game platform, which is from the NBA and powered by AWS, and we're excited to talk to you about that today. But first we have to start with a little bit of background and context. So you'll see on the loop there this expected field goal percentage, and we'll talk more about that later. But what I really want to highlight before we move into our team are those little green dots all over Luka's body, right?

That represents our player tracking data and it underpins a lot of what we do here. So we are generating 29 data points, wrists, elbows, shoulders, knees, hips, etc., 29 body parts of every player 60 times a second throughout the course of the game in three dimensions, and you can see those little green dots. They represent X, Y, Z coordinates for every player 60 times a second. It's an incredible amount of data. The scope of that data really speaks to the scope of our team, and to understand what Inside the Game does and how we did it, we have to start with what our team does.

So our stats team handles everything on the screen here. It's everything in the back end related to the basketball data, the data that describes the game on the court. So we do the ingestion and storage wherever that data comes from, whether it's a partner, whether it's internal, whatever data describes the game on the court, we're taking it in, we're ingesting it, we're storing it, we're processing it, we're ETLing it, we're keeping it for the historical record books and doing that generation in the middle where we add intelligence to it. And that's really where Inside the Game shines, where we try to add intelligence to the content, to the data. We want to tell stories with the data and then ultimately we have to deliver that data.

We want to take that data and deliver it to our products, deliver it to our teams, to our broadcasters, and ultimately our fans, right? Because if we have one mission, our mission is to educate and engage fans. So we have built this Inside the Game platform to educate and engage our fans and take that data, tell compelling stories, and help our fans understand the game better, understand why great players are so great, what makes them the best athletes in the world, right? Why are these teams succeeding? Why is another team struggling? We want to be able to use data to tell those stories because the more we can educate, the more we can engage, it will lead to better broadcasts, better engagement with our digital products, and generally better business outcomes.

From One Data Point to Millions: The Evolution of NBA Player Tracking Technology

But you know, since we're operating at the 300 level here, you can see the scope of our team is broad. Like we're only going to cover a slice of this with the Inside the Game platform, but I'm lucky to lead a team of extremely talented engineers and product managers that handle all of the steps on here and happy to answer questions about it as we go. So the first thing to drill in a little bit more, and again we're going to go into that content generation, that's where the Inside the Game really shines and where AWS has stepped in to be an incredible partner across the whole stack, our entire tech stack, and amplified our mission to educate and engage fans really from day one. We're going to focus on how Inside the Game renders for our fans.

And in order to do that, we have to understand what we are trying to do. We're trying to quantify a play. So in case you're not a basketball expert, let's just show a play, and we'll jump back to this later. We have Giannis here. He's going to get a lot of attention from the defense and kick it out, and then there's a nice little pass for a three in the corner. So that's a play, right, from the moment the team got the ball to the end of the possession, but not every play is the same. Some plays are a distinct moment in time, some plays span a lot of time. But ultimately what we're trying to do is quantify the action and quantify that play.

But there's a lot of history here that we need to understand. We've come a long way. You remember that shot of Luka with the 29 data points on his body. It wasn't always that way. We have on the screen here a shot of the scorebook from Wilt's 100 point game. It used to be that a play like that, a play like Giannis with a kick out and a give and go, or all those plays were really at best one data point. Just one data point. If they scored, what was the score? That was the only data point we had.

Over time we added data points. So we got to the point where we had shots and misses and makes, and then we had rebounds, and we had assists, and over the years we added more and more stats, and it became the box score that you knew. But then the box score wasn't enough. There's only so much you could do because you'd have the aggregate of all the stats. But you needed the play by play, you needed the temporal aspect. When did they happen? Who was on the court? And we had the play by play.

So from 1996 onward, we had the play by play. We would know the time of each event, and then we could tell you who was on the court and which lineups were efficient. And we could really measure the success of each player in a different way from '96 onward. But you're still talking about relatively little data compared to the scale we're dealing with now. And it took a big leap forward when we added player tracking data.

A little over 10 years ago, we started with just a single data point for every player, an X, Y, Z coordinate of the player on the court as they moved around, like a point on a grid in your high school geometry class. That was a big leap, a stepwise leap. And then we took another step forward in the last 3 years and we moved to that pose data as we call it. That's the head, shoulders, knees and toes, the skeletal pose data of the players, 60 times a second, 29 body parts. And then imagine what that unlocks when you're trying to quantify a play like the one we saw, and you can see our little dancing skeletons as we like to call them, matched up to the video. But that's what the data looks like when you just play it out.

It's an incredible data set. There's so much we can do, and in many ways we're only scratching the surface of what's possible. But we now work with AWS to take in all this data and think of that pipeline we showed in the early slide, ingest it, store it, process it, do live real-time split second data science through our Flink architecture, running seamlessly on AWS and deliver it to our products, our broadcasters, our teams so that they can make use of it. And really, back to that quantifying a play idea, that amount of data is incredible. Millions, literally millions of data points for every single play. We went from 1 to millions.
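As a rough check on the "millions of data points for every single play" figure, the arithmetic can be sketched as follows. The 29 body points, three dimensions, and 60 frames per second come from the talk; the 10 players on court and the 20-second play length are assumptions chosen only for illustration.

```python
# Back-of-the-envelope estimate of pose data volume for one play.
BODY_POINTS = 29         # from the talk
COORDS_PER_POINT = 3     # X, Y, Z
FRAMES_PER_SECOND = 60   # from the talk
PLAYERS_ON_COURT = 10    # assumption: five per team, ball and referees ignored


def values_per_second() -> int:
    """Coordinate values produced per second across all players on court."""
    return BODY_POINTS * COORDS_PER_POINT * FRAMES_PER_SECOND * PLAYERS_ON_COURT


def values_per_play(duration_seconds: float) -> int:
    """Coordinate values produced over a play of the given length."""
    return round(values_per_second() * duration_seconds)


if __name__ == "__main__":
    print(values_per_second())    # 52200
    print(values_per_play(20.0))  # 1044000
```

Under these assumptions a single 20-second possession already crosses one million coordinate values, which is consistent with the order of magnitude quoted above.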

But we can't take millions of data points and give millions of data points to a broadcaster. We can't give millions of data points to a coach or a fan. They're not going to know what to do with that. So a lot of the work we do when we quantify a play is how do we understand the aspects of the game and quantify the specific basketball things that a broadcaster can then act on, or an insight we can give the fan to engage them and educate them about what they're seeing on the court.

Quantifying the Play: Turning Raw Data into Basketball Insights

And all these aspects go into it and we'll talk more over the course of this presentation about how we prioritize all the things we could do with the stats. But again, it all boils down to this idea of how do we quantify a play. And not every play is the same. As I mentioned, we have our expected field goal percentage, xFG%, that takes in all these features, the pose of the shooter, how closely defended were they when they shot it, were they leaning, were they coming off a screen, how fast were they running. All those aspects of a single moment in time, that moment of a shot. That's part of quantifying a play.

We can tell you how effectively shooters shoot, like what's the expected field goal percentage, which shooters exceed the expected field goal percentage, which don't quite do that. But that's just a moment in time. That's a play. But so is something like a horns action. Double high ball screen, there's 100 variations of this the teams run, but that spans a long time. So there's not one size fits all. There's a lot of different plays to quantify.

But ultimately, if we quantify the play, and we can identify it and use AI and ML to find that shot, to find the features that describe that shot, to build the AI model, the ML model, to pull out the expected field goal percentage, to detect that horns action, once we find it, after we find it, then we can quantify and we can tell you tendencies, frequencies, efficiencies. Where on the court do they do it? How often does LeBron go to his left or his right more often? What happens when he goes to his left and his right?

But the first step is always finding it. You have to quantify the play. From that, then you can generate the insights and you can go from there. And we're so excited to go into how AWS has helped us really amplify that goal of educating and engaging fans through quantifying these plays and generating these cool new stats that we're going to show off today. So with that, I will hand it over to Dash to talk about the program overall.

The AWS Partnership: Building the Basketball Intelligence Platform at Scale

Thank you, Charlie, very much. At AWS, we're very excited to be partnering with the NBA. I think as we've gotten into our overall objective, we really see this as an innovation partnership across the organization, and that the NBA is really living up to its reputation as the most innovative professional league we have here.

What we started to build out is that overall partnership framework and how do we start to develop these capabilities over a five-year period of the partnership and how are we leveraging core AWS services and combining that with the wealth of data Charlie was talking about at the NBA. So how do you take those three years, the third season of pose data, and start to develop insights? How are you starting to look at 14 plus years of tracking data at center of mass and start to develop and pull out insights there, and how do you start to do that at scale? So setting the foundation of really being able to apply different services for different outcomes and where are we leading to with that foundation of integration of NBA's data with AWS services.

On top of that core foundation, we're really looking to build out scale. So how do we start to scale across the organization and help the NBA move faster at higher volume with the same kind of resource inputs? And so that's really where the Basketball Intelligence Platform is designed to build out those core foundations, right? And so to be able to build things once and then scale it across the organization. So if you have an integrated pipeline for pose and video connections synchronization, you use that for advanced stats as we are, but also helpful for player health and looking at biomechanics and those sorts of data. So you can use it once across the organization, or you're building distribution pipelines, you're building those once and scaling them. So really that core engine and that platform that enables the organization to scale its speed and help these guys start to develop rapidly and at an innovation pace that we at AWS built around.

And then from there, then we start to get into the individual products and services that we're building. So you see here what we started with so far is what we call Play Finder, which is the ability to search for any play or any similar play in the history of that tracking data. So you can start to either just say I want to see LeBron dunks, or a fan can design a play. I want to run a Horns Action into elevators into an open wing three, you know, corner three. How do you find that play and how do you put that in to be able to search and pull back the most similar? How do you filter that down to your favorite team, your favorite player involved, those kind of capabilities? And so those products and services are going to be a suite of capabilities that we build out together over time that add to that top layer for more and more inputs and engagement with fans, supporting teams, supporting the organization, et cetera.

And then today, what we're really invested in showing is the Advanced Stats platform, right? What are those additional new advanced stats that we're able to develop, leveraging our AI and machine learning services to draw out insights and understand nuances of the game in a measurable way that we haven't been able to previously? And that leads to the question of how we got there and how the core stats platform migrated over. So I'll welcome Ian up to talk through how that migration happened and how it's led into where we are today and what we're launching with Gravity.

Migrating to AWS: Real-Time Player Tracking Infrastructure and Processing Architecture

Thanks. Yeah, so we're going to start with how we migrated the platform over. There was a whole conversation about moving the whole NBA over yesterday, but a small piece of that was the stats platform. We'll go through that, then we're going to go a little bit through the stats ideation process, and we'll finish up by looking specifically at the new stat that was released last week on a Prime Video broadcast.

So with migrating the Advanced Stats platform, Charlie talked about it, we have many different operations within the stats team, so we had to migrate our historical data warehouse. We have our live stats processing and delivering those via our APIs, and then we have a real-time player tracking data which included moving terabytes of data over to AWS, which there are many tools available to help us do that effectively and efficiently. But that also includes the live processing of that player tracking data and getting that out to our fans, and I'm going to go into details more about that architecture and how we migrated that over on the next slide. But we also have our stats alert system, which is a real-time data science application which is looking for insights using data science to look for outliers, right? Career highs. This player has scored or accounted for 80 percent of his team's points in a quarter, or this team is going on a run, right? And we're able to alert our broadcasters and tell stories to our fans through that. And then we have our data science tools in our development environment that we've migrated over as well. So all of this had to go over, a lot of moving pieces, but there were a lot of tools and AWS and great products on there for us to migrate over to.
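To make the stats alert idea concrete, here is a minimal sketch of one such rule: flag a player who has accounted for at least 80 percent of a team's points in a quarter. The `QuarterLine` record, the `share_alerts` function, and the threshold parameter are hypothetical names for illustration; the talk does not describe how the real alert system is implemented.

```python
from dataclasses import dataclass


@dataclass
class QuarterLine:
    """One player's scoring in a single quarter (hypothetical record shape)."""
    player: str
    team: str
    points: int


def share_alerts(lines, threshold=0.8):
    """Return (player, team, share) for players at or above the threshold share
    of their team's points in the quarter."""
    team_totals = {}
    for line in lines:
        team_totals[line.team] = team_totals.get(line.team, 0) + line.points

    alerts = []
    for line in lines:
        total = team_totals[line.team]
        if total > 0 and line.points / total >= threshold:
            alerts.append((line.player, line.team, line.points / total))
    return alerts


if __name__ == "__main__":
    quarter = [
        QuarterLine("Player A", "Team X", 16),
        QuarterLine("Player B", "Team X", 4),
        QuarterLine("Player C", "Team Y", 10),
        QuarterLine("Player D", "Team Y", 12),
    ]
    for player, team, share in share_alerts(quarter):
        print(f"{player} has {share:.0%} of {team}'s points this quarter")
```

A production system would presumably evaluate many rules like this against streaming data; this sketch only shows the shape of a single threshold check.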

So this is a look at our real-time player tracking infrastructure. This is one of our more complicated pieces of infrastructure, right? We have a lot of data stored, terabytes stored in S3 as the background of this, and then we're running a lot of Kubernetes operations in EKS. They need to talk to each other with low latency, right? This needs to move quickly. From the time the data leaves the RabbitMQ topic in the arena, this whole operation is only adding about 50 milliseconds to reaching the end user at the other side of the API, right? So we're moving about 10 streams a game, right, and this needs to scale obviously because we can have 12 concurrent games.

Say two games in that first set go into overtime while we're still receiving data from a G League game earlier that day. This system needs to scale, right? It needs to scale back down and be cost effective and needs to be reliable and fast. And we've been able to do that with EKS on AWS.

So we have our optimum feed service that tells the shepherd spawner, hey, there's a game coming up. The shepherd spawner spawns a shepherd that says, hey, make sure that everything's running for this game. That shepherd will then spin up a number of pods that we call listeners here that move data from the RabbitMQ topics to an internal Kafka topic. We have listeners that then move that data to S3 for future use. We have Flink applications that are spun up. We spin up close to 100 Flink applications a game to calculate in real time player speed and movement angles and create these building blocks that we use to then create the stats that you see headlining Inside the Game.

There's a lot of little pieces that go into that in this infrastructure that add up to player gravity, for example, which we're going to talk about a little later. And then we have listeners that move that data to an external facing Kafka topic that feeds our API for users to consume that data. So all of this, and then it's scaled, you know, this should be three dimensional. We have a live version of this, and we have a historical version. So we prioritize that live data for live games and then historical games are processed on a different node pool, but everything's able to happen within the same environment within the same node pool to minimize the networking costs or overhead and get that data through as quickly as possible to the end user. And we're able to do that very effectively using these AWS products.
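The "building block" calculations mentioned above, player speed and movement angle, can be illustrated with a small sketch that compares a player's position in two consecutive frames. This is plain Python rather than Flink code, and the function name, the 2D court coordinates, and the angle convention are assumptions for illustration only.

```python
import math

FRAME_INTERVAL_S = 1.0 / 60.0  # 60 frames per second, from the talk


def speed_and_heading(prev_xy, curr_xy, dt=FRAME_INTERVAL_S):
    """Return (speed in court units per second, heading in degrees).

    Heading is measured counterclockwise from the +X axis. It is None when the
    player has not moved between frames, because the direction is undefined.
    """
    dx = curr_xy[0] - prev_xy[0]
    dy = curr_xy[1] - prev_xy[1]
    distance = math.hypot(dx, dy)
    if distance == 0.0:
        return 0.0, None
    return distance / dt, math.degrees(math.atan2(dy, dx))


if __name__ == "__main__":
    # A player moves 0.3 units in X and 0.4 units in Y within one frame.
    speed, heading = speed_and_heading((10.0, 5.0), (10.3, 5.4))
    print(f"speed={speed:.1f} units/s, heading={heading:.1f} degrees")
```

In practice raw per-frame differences are noisy, so a real pipeline would likely smooth over a window of frames; the talk does not say how that is handled.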

The Stats Ideation Process: Identifying Gaps and Building New Metrics

So how do we come up with a stat though? We have that infrastructure when you want to build something on it. How do we do that? Well, we start with a couple of questions. Where are the gaps in the stats discourse? What are people talking about? What do we all intuitively know about the game when we watch it, but there's no way to talk about it with numbers? And so one thing is defensive value, and we've been able to target this with our defensive box score.

We have steals and blocks which we have had for a long time, but our defensive box score has allowed us to go a step deeper to see who is guarding who for what period of time and what stats do they accumulate during that period of time. And so this is one way that we can look deeper and add context to that conversation about which players are adding defensive value to their team. Offensive process, which team gets the best shots, not who scores the most, you know that, just look at the score, but which team got the best shots. A team can use that to look and say, well, did we have the right process? Maybe we don't need to change our game plan for the next game because you know what, we got great shots, we just didn't hit them. And we're able to do that with our expected field goal percentage metric and shots over expected there.

And then another thing, impact without scoring. We all know hustle players, you know, steals, rebounds, blocks, assists, the guy who's everywhere on the court but maybe doesn't fill out the stat sheet. How can we quantify that? These are just some areas that we're looking to explore when we come up with a new stat. Then we're talking to our fans, our broadcast partners and teams, and what are they highlighting in the discourse.

Earlier this year, we heard a lot of talk about player gravity, which players are drawing the defense towards them, morphing the defense and creating open shots for their teammates. Well, everyone notices that on the court, the Giannis clip that Charlie showed earlier, but how do we quantify that? Well, with player gravity, which we're going to talk about next, we're able to do just that. Shot makers, our expected field goal percentage, that shots made above expected, this is a way for us to quantify who the best shot makers are and have a discussion about it, add that context for fans and not just say, well, he's my favorite player, that's your favorite player, who's better.
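The "shots made above expected" idea can be sketched with simple arithmetic: sum each shot's expected field goal percentage to get expected makes, then compare with actual makes. The per-shot xFG values below are invented for illustration, and the function is a hypothetical helper, not the NBA's implementation.

```python
def shots_over_expected(shots):
    """Compare actual makes with expected makes.

    shots: iterable of (made, xfg) pairs, where made is a bool and xfg is the
    model's make probability for that shot, between 0 and 1.
    Returns (makes, expected_makes, makes_over_expected).
    """
    makes = 0
    expected = 0.0
    for made, xfg in shots:
        if not 0.0 <= xfg <= 1.0:
            raise ValueError("xFG must be a probability between 0 and 1")
        makes += 1 if made else 0
        expected += xfg
    return makes, expected, makes - expected


if __name__ == "__main__":
    # Four shots: three makes on looks the model rated at 50%, 25%, 25%, 50%.
    game = [(True, 0.5), (True, 0.25), (False, 0.25), (True, 0.5)]
    makes, expected, over = shots_over_expected(game)
    print(makes, expected, over)  # 3 1.5 1.5
```

The same expected-makes total, taken at the team level, is one way to express the "did we get great shots and just not hit them" question.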

And then lastly, as Charlie talked about, we're in a very data rich environment, but what data do we have available? We have the box score and the play by play. We may need to join back to that. We have player tracking data, and then I talked a little bit about these NBA developed stats which are the building blocks towards some of these larger headline stats. One of the things underlying both our player gravity metric and our defensive box score is a defensive pressure score, something that's not really talked about.

We'll go into a little bit more detail about that later, but essentially it quantifies defensive principles. Everyone's heard ball-you-man in rec basketball: stay between your man and the basket. We've been able to create an algorithm that measures the amount of pressure that each defender is putting on each offensive player. And through that we're able to use an optimization algorithm that maximizes total team defensive pressure to figure out which player should be matched up with which player at any given moment in time, which helps us make our defensive box scores and also helps us with player gravity.
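Choosing matchups that maximize total team defensive pressure is an instance of the assignment problem. The talk does not say which optimization algorithm is used, so the sketch below simply tries every pairing, which is feasible for five defenders (120 permutations). The pressure values and the `best_matchups` name are hypothetical.

```python
from itertools import permutations


def best_matchups(pressure):
    """Pick the defender-to-offense pairing with the highest total pressure.

    pressure[d][o] is the pressure defender d applies to offensive player o.
    Returns (assignment, total), where assignment[d] is the index of the
    offensive player matched to defender d.
    """
    n = len(pressure)
    if any(len(row) != n for row in pressure):
        raise ValueError("pressure must be a square matrix")

    best_total = float("-inf")
    best_assignment = None
    for perm in permutations(range(n)):
        total = sum(pressure[d][perm[d]] for d in range(n))
        if total > best_total:
            best_total = total
            best_assignment = perm
    return best_assignment, best_total


if __name__ == "__main__":
    # Defender 0 pressures both attackers heavily, defender 1 only attacker 0.
    # Greedily giving defender 0 their strongest matchup would be worse overall.
    scores = [
        [0.9, 0.8],
        [0.8, 0.1],
    ]
    assignment, total = best_matchups(scores)
    print(assignment, round(total, 2))  # (1, 0) 1.6
```

Brute force keeps the sketch dependency-free; for larger matrices a dedicated solver such as SciPy's `linear_sum_assignment` (with `maximize=True`) would be the usual choice.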

And the next slide, because everyone loves a good flow chart, here it's really how we go through this process. What do we want to measure? Do we have the data to measure it? If no, can we collect the data? If yes, we go back to the front and start again and hopefully get to the left side of the tree this time.

If not, we have to wait for technology to catch up sometimes, right? And that's where the player tracking data comes in. In this data-rich environment, we do have the opportunity to look at things we've never looked at before and get more to the left side of this tree. If we do have the data, will it require a model? No, great, we can start building it. If it does require a model, does it require labels? And you get the rest, right? But this is basically the workflow that we go through when we're coming up with our new stats, trying to figure out how we can build something for the fans, for the broadcasters, for our teams that's meaningful and will provide more context and insights as part of the Inside the Game platform.

Player Gravity: Quantifying Defensive Attention with Machine Learning

So specifically, let's talk about player gravity. This is the play that we watched earlier. We'll watch it again one more time and go into a little bit more detail. You'll see a moment here where Giannis drives. He's going to draw all three of these defenders, and that starts this turn, right? Myles Turner's open, behind-the-back pass to Rollins, knocks down the three. This moment in time right here, in the traditional box score, Giannis gets no credit, right? Maybe in the new age where we have some secondary assist metrics, you can quantify some of that value. But really, this is what starts the turn and gets the play open and creates that open shot at the end. But we've had no way to quantify that until now, right?

And so here is some of what we're working on under the hood, where as the play's flipped, but as Giannis is dribbling here, you'll see the defensive pressure scores are being calculated in real time, right? And you'll see we have actual DPS, defensive pressure scores, expected DPS, expected pressure scores. It's the key piece of the gravity model that we have developed with AWS. But what this allows us to do is quantify which players are getting more attention from the defense than we would expect from the average player, right?

So a little bit more on the gravity definition. We have that actual DPS, right? So we're doing pairwise defensive pressure scores calculated at 60 frames a second in real time based on players' speed and movement angles, how far they are relative to the basket, other things like that go into the actual defensive pressure score, where the ball is, where you are, where the basket is. We can calculate that, or we've been calculating that for a few years now. What AWS has helped us develop is an expected defensive pressure score model, which is a machine learning model that takes the inputs of the offensive positioning and the ball and tells us where we expect, essentially where we expect the defense to be standing based on where the offensive players and the ball are. And what this allows us to do, like I said, if we take the actual minus expected, we get attention from the defense above expected, which seems like a pretty straightforward definition for player gravity. And so we're really excited for what this can do.
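The "actual minus expected" definition can be written down directly. In this sketch each list holds the total defensive pressure on one offensive player per frame, and gravity is the mean difference over those frames. Averaging over frames is an assumption, since the talk does not say how per-frame values are aggregated, and the numbers are made up.

```python
def gravity(actual_dps, expected_dps):
    """Mean of (actual - expected) defensive pressure over a run of frames.

    Positive values mean the defense paid more attention to the player than
    the expected-pressure model predicted for that offensive alignment.
    """
    if len(actual_dps) != len(expected_dps):
        raise ValueError("actual and expected must cover the same frames")
    if not actual_dps:
        return 0.0
    diffs = [a - e for a, e in zip(actual_dps, expected_dps)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    actual = [0.5, 0.75, 1.0]     # hypothetical per-frame pressure on a driver
    expected = [0.25, 0.25, 0.5]  # hypothetical model output for the same frames
    print(round(gravity(actual, expected), 3))  # 0.417
```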

But a little bit more about that expected pressure score model. As I said, we take the defense out, and basically the model learns. This is a similar architecture to the coverage model that was being talked about at the NFL, but using factorized attention and a transformer architecture, we're able to feed the player locations over a two-second period as well as trajectories, time, and score into the model to understand where do we expect the defense to be in this situation. So if your man is pulled in, you know, if that's Joel Embiid at the high post and that guy in the corner takes a step closer, right, this model is going to pick that up as being more attention than otherwise would be had at that high post area, giving more gravity to a player like Joel Embiid. So we're really excited about this application.
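For readers unfamiliar with factorized attention, the sketch below shows the general idea on a small frames-by-players grid of feature vectors: attend across players within each frame, then across frames for each player, instead of letting every (frame, player) token attend to every other one. It is a toy in pure Python with no learned projections, and it is not the NBA and AWS model; the function names and shapes are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over equal-length vectors.

    Queries, keys, and values are the raw tokens (no learned weights), which
    keeps the sketch short.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    out = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        out.append([sum(w * tok[i] for w, tok in zip(weights, tokens))
                    for i in range(dim)])
    return out


def factorized_attention(x):
    """x[t][p] is the feature vector for player p at frame t."""
    num_frames, num_players = len(x), len(x[0])
    # Step 1: players attend to each other inside each frame.
    across_players = [self_attention(frame) for frame in x]
    # Step 2: each player attends over their own timeline.
    out = [[None] * num_players for _ in range(num_frames)]
    for p in range(num_players):
        timeline = [across_players[t][p] for t in range(num_frames)]
        attended = self_attention(timeline)
        for t in range(num_frames):
            out[t][p] = attended[t]
    return out


if __name__ == "__main__":
    # Three frames, two players, two features per player (made-up values).
    grid = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.5, 0.5], [0.5, 0.5]],
    ]
    result = factorized_attention(grid)
    print(len(result), len(result[0]), len(result[0][0]))  # 3 2 2
```

The appeal of the factorization is cost: for T frames and P players, joint attention compares (T x P) squared token pairs, while the two-step version compares T x P squared plus P x T squared.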

And as you can see from the actual leaderboard, current leaderboards for the season as of last week, anyone who's a fan probably expected Steph Curry to be at the top, so we know the model's working, we're done, right? No, but what this allows us to do is have intelligent conversations, right? Everyone expects Steph Curry to be at the top, but maybe you're asking why is Michael Porter Jr. second, right? He's on a struggling team. He may be one of the better players on a struggling team, and therefore the defense is able to throw more attention at him. And so what this allows us to do with teams and fans and broadcasters is have those intelligent conversations about what team strategies are being applied.

Maybe one team produces more gravity in general because they're pressuring up more than is expected, right? And we're able to quantify that now, and we're able to see those differences and evaluate that both in real time for our broadcast and show that to our fans, but also over the course of seasons and different games and compare those numbers. So you see the rest of the list looks pretty good from the eye test. So like I said, really excited. This was launched last week on the Prime Video broadcast, and we're excited to bring it to the Inside the Game platform online in the coming weeks. But I'm going to pass it back to Charlie to talk about what's coming in the future. Thank you, Ian.

The Impact of Gravity and What's Coming Next for NBA and WNBA

But before we get to that, I just want to bring this back to what we talked about at the beginning. Let's think through that data science achievement: analyzing the tracking data, coming up with defensive pressure score, developing the ML model for actual and expected defensive pressure score, and ultimately resulting in Gravity. That data science effort, that achievement, manifests in a lot of different things. You see the leaderboard here. It's a new leaderboard that speaks to an aspect of defense and offensive pull that we've never been able to quantify before, and that's a great fan engagement tool. But it's also every moment of the game. It can turn into something that we generate for highlights, helping broadcasters see a specific Gravity play and the color commentators speak to it as Stan Van Gundy did on the first Prime Video broadcast. It touches so many aspects of our business and it also touches all the aspects of that stats infrastructure, and that's really why AWS has been a great partner and able to amplify our mission of educating and engaging fans.

They've come in and helped us with the database layer, the ingestion layer, the data science aspects, all the parts that go into creating this. It comes out as a leaderboard, but there's so much work along the way. We're so appreciative of AWS's contributions to all the steps that go into this and all the other ways that Gravity will impact our fans, our broadcasters, and help our teams analyze the game in a new way.

So what's to come, looking ahead? Some of our tentpole events here, Rivalries Week, All-Star. You've seen Defensive Box Score, Shot Difficulty, we talked about at the outset the Expected Field Goal Percentage. We just released Gravity here on the Black Friday broadcast. Next up is our Leverage Score, and that's a look at how players contribute to the team's chances of winning. It's not just points, it's the rebounds, the steals, getting a defensive rebound in a clutch moment. We're building a new model that will measure those players' impact and how each player has helped their team win the game.

For All-Star, we're targeting Handle Score. That'll be a fun model that really speaks to how players handle the ball, the great crossovers, the spin moves behind the back, a fun metric that'll speak to the great athletes and the incredible things they can do with the ball and how they can move around the game, and that will come out around All-Star.

But then let's not forget about the WNBA too. AWS is coming in. We're going to work, same stats team on our end. We work with the W. Basketball data is basketball data, and we're excited for what we can do with the WNBA. We're looking at things around chemistry, and we're honestly still kicking around ideas, but how do we speak to the players' chemistry and who interacts with each other and how do they pass to each other, the types of passes? We're looking at rebound prediction, we're looking at triple threat, and we love stats that are easy for fans to understand, but we want to make the most impact we can make with the stats that we do.

So how do we develop stats that are both compelling from a data science standpoint, push the industry forward from basketball analytics, help our teams, but are ultimately approachable? So that fans, no matter your interest level in the NBA, whether you're a novice or you're a hardcore stat fan and basketball nerd like me, is it something engaging, something that will help you, and something that will teach you about the game? We try to find things that are approachable at all those levels.

Q&A: Video Games, Foul Tracking, and Democratizing Analytics Across the League

So hopefully you'll have some questions. Finally, we'll end with where to watch. Right here. Watch right here. We're going to have the happy hour and the game after the next session, but we appreciate all our broadcast partners: ABC, NBC, Prime Video, and Peacock. Hopefully we'll see you all at the upcoming happy hour where we can turn on the game and learn more, but with that, we would love some questions from the group. Can we stand up? We have the mics ready in the back. All right. Who's going to break the ice? All right. Yeah. Sir.

Hi there, do you think this data is going to make its way into the next generation of video games? If you introduce these concepts, you'll see that. And kind of the follow up to that is, are you doing any predictive for full game outcomes? Do you see a future in that?

All right, starting with a tough two-parter. Certainly, as the game evolves, the video game, I don't want to speak for the folks at 2K, but we certainly work with them and provide data to them and help them in any way we can, but ultimately they're video game experts and I'm not that. But it's a good question and I'm sure they're looking at all ways to make the game more realistic. On the predictive side, yes, we do. We are the league office. We don't always want to predict who's winning or losing the game because we are the neutral third party arbiters of the game.

But we do use things like win probability models under the hood to help inform what content to serve and things like that. So yes, we absolutely do predictive things like that, and that win probability model is underlying the leverage score. So how do we find those moments that have high swings, potential swings in win probability to be a leverage moment? Exactly. That leverage score, although it's not directly surfaced, you can often think of it as how did the players' contributions move their team's win probability.

Obviously, contributions in clutch close games are going to move the win probability more than at the beginning of the game, but that's part of basketball and which players perform in those situations. A rebound and a steal moves win probability a lot too. So while we may not surface exactly the win probability, it's a very important factor to that leverage score that goes into it.
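
As a rough illustration of the idea described above, each contribution can be credited with the change in the team's win probability that it produced. The sketch below is hypothetical: the event tuples, the win probability values, and the `leverage_by_player` function are assumptions made for illustration, not the NBA's actual Leverage Score model.

```python
from collections import defaultdict


def leverage_by_player(events):
    """Sum win-probability swings credited to each player.

    `events` is a list of (player, wp_before, wp_after) tuples, where both
    probabilities are for the player's own team (hypothetical inputs).
    """
    totals = defaultdict(float)
    for player, wp_before, wp_after in events:
        totals[player] += wp_after - wp_before
    return dict(totals)


# Made-up events: the same kind of play moves win probability more late
# in a close game than it does early on.
events = [
    ("Player A", 0.50, 0.52),  # early-game rebound: small swing
    ("Player B", 0.48, 0.60),  # late steal in a close game: large swing
    ("Player A", 0.60, 0.63),
]
scores = leverage_by_player(events)
```

In this toy data, Player B ends up with the larger total from a single play because that play happened in a high-swing moment.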

Over here. Quick question, do you guys track foul calls? We track foul calls. Certainly, I mean, we need to know when a player gets the sixth foul so they can't play anymore. No, I'm saying every time a coach or a team or owner sends that type of information to the league, they say, hey, this is clear-cut footage that this is a foul call. So that's the basketball operations department, not our team. In terms of that, it's an internal process between the teams and the league.

You'll be next. So do you think you'll ever track, if you have sensors on the players, you could technically, if they're pressure points, you could see when they got fouled, if they got fouled, right? So we don't, 29 body points per player is a lot, but not necessarily enough to know things like that in terms of contact. But you never know what technology is around the corner, and certainly we're looking at ways to use the data to help automate officiating things and improve the flow of the game and things along those lines.

And just to be clear, the skeletal pose data, there's no sensors on the players. But in the NFL they have sensors in the shoulder pads for Next Gen Stats. In the NBA it's all optical tracking, so it's all generated off of video to build that full pose data set. So there's no actual sensors deployed.

Okay, the last question. Do you ever think we'll have a robot ref? Listen, that's your limit on controversial questions. We've already, Adam Silver I said hi. Yeah, hi, Brian Jackson with InfoTech Research Group. I know that NBA teams have their own analytics departments that might be trying to suss out some of these more advanced metrics even before the NBA would, right? So do you feel that by releasing these advanced metrics that maybe you're sort of democratizing that intelligence across the league?

Certainly, anything we build, our team builds, we are going to share with all of our teams, NBA, WNBA, etc. We want to make sure they have access and that they can do what they want to do with it, and then what teams do and their competitive advantage is up to them. But we definitely talk to our teams about what would be helpful to them and what would be meaningful and help their workflows, not just from a data science standpoint, from a data delivery and a logistics standpoint and cloud infrastructure. How do we get the data to them easier with less data engineering and less ETL process? So we're in touch with teams all the time and absolutely want to help them run more efficiently and get the insights they need to run their business.

Hi, I'm Sam Meyer from BMS. Thank you for this. I have two questions. The first one is, how do you account for the propensity of the defense to help or not when it comes to the Gravity score? I'm curious about that. And then also, are you thinking about anything that's sort of the inverse of the Gravity score for defense, sort of a Wemby factor, if you will?

Technical Deep Dive: Model Architecture, Real-Time Inference, and Strategy Changes

That's a funny one. Someone did bring that up to me. It's a little bit more difficult to tease out on the, I'll take the second question first. It's a little bit more difficult to tease out because you'd have to figure out what, you know, you have to give the information about where the defense is, right? And once you do that, the game's kind of up. You know where they are, not where they would otherwise be, right? It's a tricky question. It's been brought up. We're thinking about it for sure, which players on defense push the offense away. They don't want to come near Wembanyama, right? It's definitely on our minds, but how to solve that problem is something we're still thinking through and talking about.

And then your first question is, by using a season's worth of data, right, we downsample a little. We're doing 10 frames a second here. We account for hopefully all of those different moments in time. The model takes in 2 seconds worth of input data, and we're deploying it with at least, you know, we can have 25% of that masked out. So within half a second of the players coming over half court, we're calculating that gravity or running that expected defensive pressure score model at 10 frames a second.

We hope by training the model over a season of data, we find those situations where help versus no help, because all teams do it differently. Where's the average defender on this play? That's how we can tell that overhelp situation is when that player really steps over and increases that defensive pressure on that player above the expectation. So it's really just a volume question to kind of tease out that help situation. If that's a strategy to take it away, we want to understand that because if you're deploying your help, you're leaving someone else open, and that's a high gravity moment that we want to capture and may hopefully lead to a high expected field goal percentage shot. We can tie all these things together and tell really great stories about how teams are creating value on the offensive end.
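
The numbers quoted in this answer can be checked with a little arithmetic. The sketch below assumes the 60 frames per second tracking rate mentioned in the session overview; the constant names and the `downsample` helper are illustrative only, and the session does not describe how masking is actually implemented.

```python
SOURCE_FPS = 60        # tracking rate stated in the session
MODEL_FPS = 10         # downsampled rate the model runs at
WINDOW_SECONDS = 2     # length of the model's input window
MASK_FRACTION = 0.25   # share of the window that may be masked out


def downsample(frames, source_fps=SOURCE_FPS, target_fps=MODEL_FPS):
    """Keep every (source_fps // target_fps)-th frame."""
    step = source_fps // target_fps
    return frames[::step]


window_frames = MODEL_FPS * WINDOW_SECONDS             # frames per input window
maskable_frames = int(window_frames * MASK_FRACTION)   # frames that may be masked
maskable_seconds = maskable_frames / MODEL_FPS         # the "half a second" figure

one_second_of_tracking = list(range(SOURCE_FPS))
sampled = downsample(one_second_of_tracking)
```

Under these assumptions the window holds 20 frames, and 25% of it corresponds to 5 frames, or half a second, which matches the figure quoted above.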

If you look at the play we showed right when Clarkson comes over as a third defender on Giannis, you can see the gravity scores going up significantly. We're using all of those 25 pairwise defensive pressure scores from all the defenders to each of their offensive players throughout, and then the secondary defender, the third defender comes over, has a big impact on gravity.
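
To make the "25 pairwise scores" concrete: five defenders times five offensive players gives a 5 x 5 matrix. The sketch below uses a simple inverse-distance function as a stand-in for the pressure score. That formula, the court coordinates, and the function names are assumptions for illustration; the session does not say how the real defensive pressure score is computed.

```python
import math


def pairwise_pressure(defenders, offense):
    """Return a len(defenders) x len(offense) matrix of placeholder scores.

    The inverse-distance formula is a stand-in, not the NBA's calculation.
    """
    return [
        [1.0 / (1.0 + math.hypot(dx - ox, dy - oy)) for (ox, oy) in offense]
        for (dx, dy) in defenders
    ]


def total_pressure_per_offensive_player(matrix):
    """Sum each column: total pressure on each offensive player."""
    return [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]


# Made-up 2D positions: each defender starts one unit from their matchup.
offense = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (10.0, 15.0), (20.0, 15.0)]
defenders = [(1.0, 0.0), (11.0, 0.0), (21.0, 0.0), (11.0, 15.0), (21.0, 15.0)]

matrix = pairwise_pressure(defenders, offense)
before = total_pressure_per_offensive_player(matrix)

# A help defender leaves their matchup and moves toward the first player.
helped = list(defenders)
helped[1] = (2.0, 0.0)
after = total_pressure_per_offensive_player(pairwise_pressure(helped, offense))
```

In this toy setup, pressure on the first offensive player rises when the help defender arrives, while pressure on the player they left drops, which is the trade-off the speakers describe.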

Not to turn this into an engineering talk, but without going into too much detail, can you give us a quick overview of how the machine learning model actually works, and how you're serving it fast enough to give real-time predictions during a game?

So like I said, we are using an inference server located in the same node pool that we're sending these queries to, making sure we're scaled and that those are equipped with the right machine sizes to handle the load of the concurrent games that are going on. We're sending those queries and we're also using dynamic batching on those inference servers to basically take any queries that come in within, let's say, a few milliseconds of each other, batch those together, and run inference on them as one batch. So when we have 10 concurrent games going on, we can actually do inference more efficiently with those servers. That's how we're doing it. As far as the details of the model, we've tried a couple of different architectures, and I'm going to have to defer to the data scientists. Based on what I talked about, we did try some simple CatBoost and XGBoost options, but we found over time that the transformer architecture with factorized attention, where the model is able to learn the relationships between players in space and time over the course of that two-second window that we're feeding it, was the most accurate overall. That's why we decided to go with that architecture.
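
The dynamic batching idea can be sketched in a few lines. This is a simplified, offline simulation under stated assumptions: the `dynamic_batches` function, the 5 ms window, and the batch size cap are hypothetical, and the session does not name the inference server or its settings. Real inference servers typically implement this internally with queues and timers.

```python
def dynamic_batches(requests, window_ms=5.0, max_batch_size=32):
    """Group (arrival_ms, payload) requests into batches.

    A batch opens when its first request arrives. It closes when a later
    request arrives more than `window_ms` after that, or when it is full.
    """
    batches = []
    current = []
    batch_start = None
    for arrival_ms, payload in sorted(requests, key=lambda r: r[0]):
        window_expired = current and arrival_ms - batch_start > window_ms
        batch_full = len(current) >= max_batch_size
        if window_expired or batch_full:
            batches.append(current)
            current = []
        if not current:
            batch_start = arrival_ms
        current.append(payload)
    if current:
        batches.append(current)
    return batches


# Made-up arrival times (in milliseconds) for queries from concurrent games.
requests = [
    (0.0, "game1"),
    (1.5, "game2"),
    (3.0, "game3"),
    (20.0, "game4"),
    (21.0, "game5"),
]
batches = dynamic_batches(requests)
```

Here the first three queries arrive within the window and share one model call, and the last two share another, so five queries cost two inference passes instead of five.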

Next year, a 500 level session, absolutely. How we're getting the subsecond latency bit by bit, absolutely, and maybe they could move the tire changing closer. That would be an extra year. The only other thing I'd highlight too is we're really excited to see what each of the broadcast partners do with that live data. Gravity will be available in subsecond, and you started to see some of the stuff Ian showed in the graphics, but how do you overlay that in real time on the video and see how does that third defender come over and spike Giannis's gravity, for example. Do they show that as a number? Do they show that as the size of a bar? Working with their graphics teams, we're really excited to see where they go and how they start to show that to fans in live games.

That's an interesting point on the subsecond latency. We're looking to achieve that by moving the actual defensive pressure score from a calculation of different subscores to a model. By including the defense back into the model as an additional data point, we're actually able to predict the actual defensive pressure score in the moment more accurately. Then, by running inference on the raw data as opposed to calculating features, we'll be able to serve up gravity at subsecond latency. It's been really interesting to see how the model reacts differently when you include the defense versus when you don't, with similar architectures, and how the different approaches are better for the different models, but that's what we're moving towards for the subsecond latency gravity.

Thank you again for having us. Nice shoes. I could listen to you talk about NBA stats all day, and you have a great broadcasting voice. My question is, is this data going to be available for fans through an API or something in order to empower them to maybe create their own advanced metrics or develop stats later on? Thank you for the compliments. I appreciate it. Certainly it'll be available through our products and putting the leaderboards on, but the raw data, we're looking at potentially ways of doing that through hackathons and things like that, but nothing formal yet.

I appreciate that there's interest. You can go today to NBA.com and see all of the shots taken and get the expected field goal percentage for both teams. You can see how they shot over expected, under expected, the actual field goal percentage versus their expected field goal percentage. So all of that data is now available on the NBA's digital sites to be able to look at. But 29 skeletal points 60 times a second, that would be amazing if we could get that. We appreciate the interest.
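
For readers who want to reproduce the "shot over expected" comparison from shot-level data, a minimal sketch follows. The `(made, expected_probability)` shot format and the `shooting_vs_expected` function are assumptions for illustration; the session does not describe how NBA.com structures this data.

```python
def shooting_vs_expected(shots):
    """Compare actual and expected field goal percentage.

    `shots` is a list of (made, expected_probability) pairs (hypothetical
    format). Returns (actual, expected, actual minus expected).
    """
    attempts = len(shots)
    actual = sum(1 for made, _ in shots if made) / attempts
    expected = sum(xfg for _, xfg in shots) / attempts
    return actual, expected, actual - expected


# Made-up shots: two makes out of four attempts of varying difficulty.
shots = [(True, 0.55), (False, 0.40), (True, 0.35), (False, 0.50)]
actual_fg, expected_fg, over_expected = shooting_vs_expected(shots)
```

A positive difference means the team shot better than the difficulty of its attempts would suggest; a negative one means it shot worse.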

Again, back to the 500 talk, we've got to come back for the arm and spinning up your own cluster to manage all the storage. Going back to the data points that are in the model, like the defensive data points, are those depending on the actual player, like who the player is, or just that they're a defender? No, so the data points in the model, like for if you're talking about calculating actual defensive pressure scores, all of that is player agnostic. It's more based on the 2D locations of the players.

We have goals and aspirations to actually improve that using player orientation and where they're looking over time. Now that we have this 3D player tracking data, you don't necessarily need to use all 29 points, but it's really cool to be able to see the hip orientation of the players, which way they're facing. So what's going into that actual defensive pressure is all player agnostic on the model front, and that's what allows us to calculate that actual over expected because we just basically assume every player is the same with the model and the calculation, and then we see which specific players are drawing that attention.
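
The "actual over expected" logic can be sketched as follows: the model's expectation ignores who the player is, so any consistent gap between actual and expected pressure is attributed to the player. The per-player averaging, the sample tuples, and the `gravity_leaderboard` function are assumptions for illustration; the session does not specify how Gravity is aggregated.

```python
from collections import defaultdict


def gravity_leaderboard(samples):
    """Rank players by average (actual - expected) defensive pressure.

    `samples` is a list of (player, actual_pressure, expected_pressure)
    tuples. Averaging per player is an assumed aggregation.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for player, actual, expected in samples:
        totals[player] += actual - expected
        counts[player] += 1
    averages = {player: totals[player] / counts[player] for player in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Made-up samples: Player A draws more pressure than a generic player would.
samples = [
    ("Player A", 0.80, 0.50),
    ("Player A", 0.70, 0.50),
    ("Player B", 0.40, 0.45),
    ("Player B", 0.50, 0.45),
]
leaderboard = gravity_leaderboard(samples)
```

Because the expected value is the same for any player in the same situation, Player A's positive average reflects attention drawn by that specific player rather than by the situation.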

Did you also make that decision because you didn't want to have to retrain the model each year as new players came in, or I guess also are you expecting to have to retrain the model based on any changes to rules or anything year over year? Yeah, something that we're discussing is how to best service year over year because there's changes in strategy. One thing we've noticed with the defensive pressure score, or sorry, with the expected defensive pressure score models, is that gravity is higher this year compared to last year, and that's if you've heard the discourse that teams are pressuring up more.

So directionally, gravity is still a very powerful metric, but compared to last year you're going to see different magnitudes of gravity because defenses are taking a different strategy. And funny enough, two years ago, we saw higher gravities as well. So for whatever reason, last year was lower, and you know, this is the discourse we love to have to understand the game better. This year the strategy has everyone pressuring more, which leads to higher gravities. Now, like I said, ordinally the leaderboards and everything still make sense, but we have had conversations about when do we retrain, when has strategy changed enough that we need to retrain to better serve or to understand the game. Cool, thank you.

Closing Questions: From Audio Tracking to Historical Data and Measuring Success

Hello. Hi, Ben Shalo. Thank you. This is fantastic. I love this. I was wondering if you do any audio tracking and if maybe there was an opportunity to do a smack talk leaderboard. I know that's big on the court. I think people would love that. So I was wondering if you thought about that at all. It would be interesting. I'll say yes, we've thought about it, but I don't think that will be forthcoming. But not the first person to suggest tracking what is said and trying to translate it to data in some way, but you know, a lot of hurdles there.

Yeah, from a technical perspective, there's an ability to do it and to be able to separate out different sounds. Do you want background noise? Do you not want background noise? Using some of the speech-to-speech models and those sorts of things, the ability to actually hear what the players say, I think the players association would have a lot to say about that before it ever comes out publicly for sure. And they might change what they're saying if they knew they were all being mic'd up. So then you're, you know.

As I work in high school sports, I'm curious how important the video equipment itself is in kind of grabbing some of these 29 poses and everything like that. Could I do this with a single camera in a gym? How close could I get, or is the video equipment really helping out? Really it's a question for Hawk-Eye, our vendor who builds the player tracking system. Our team takes that raw 29 points per player and works with that data, but we don't necessarily do the computer vision. That's our vendor.

However, you know, from a computer vision background, again back when I used to operate at a 500 level, the technology has improved a lot and there's a lot you can do with one camera. You know, but whether you would get 50% or 70% or 80% of what we have with 20 cameras and 60 frames a second, I'm not sure. But you know, it is impressive what some companies are doing with one camera in a more amateur high school setting.

Yeah, it's definitely coming, and I think the problem you run into with one camera is just occlusion. It's one thing that we deal with on the NBA side is that, you know, before this season we were looking at fixed length joints for joint positions that were high occlusion zones, right, the hip joints and the finger joints.

It's hard to determine where they are in space sometimes, but our vendor was able to improve that with the upgrades and the number of cameras they have to be able to go to variable lengths. But that's really the problem you're probably going to run into with a single camera. Once that player runs, once I stand behind Charlie, you can't get any of mine. You're just guessing, right?

So in a former life I did a master's thesis on tracking the basketball in the high school team I was coaching. And then I quickly realized there's a vendor doing it way better than I would ever do it by myself. So while it got me the master's, that was the end of that project. I would say that is the goal though over time, right, as you see this technology. You know, years ago you'd have to have 100 cameras around a football field to start to get some of this data, then it goes down to 40, now in the teens. So I think there's certainly, you know, the pace of innovation is such that getting to a point where you can do it with 3 cameras, then 2, and then hopefully ultimately with 1. It's certainly, you know, I think an industry goal, whether, you know, who gets there, how quickly that happens, we'll all be watching and seeing, but it's certainly where that evolution is happening.

Hi, Chandra from Nielsen Sports, partner with NBA. So the audio tracking, we do track brands on audio, so probably you can partner with us to catch that. I had a question on the APIs. We are partners, so do partners get access to the data through APIs? What do you know? Did you say NESN, the broadcast? Nielsen, Nielsen. Nielsen, I'm sorry, Nielsen. Happy to meet you after; we can talk about it. Sure, I'm not the partnership person, so I don't know all the ins and outs, but happy to talk with you about it. Perfect. Thank you.

So we've been hearing a lot about Agentic. I love my NBA podcasts, but sometimes, you know, you don't know who's doing their research, and when I go to NBA stats.com, it can be kind of overwhelming to see all the stats. So have you thought about a chatbot agent who really understands the stats so I can ask it detailed questions like why do the Clippers stink this year or why are the Celtics turning around lately, those sort of things? Going to pass the mic to the people in the back corner. Stay tuned. Good question. I don't, I'm not allowed to tip anything, but good, good question, you're on to something. So, yeah.

Can you apply this data to historical footage, kind of going back to the question from before, you know, who has a better Gravity score, Michael or LeBron, those sorts of comparisons would be interesting to see? It's something we'd love to do and certainly something we looked at. It's not quite there yet from a technology standpoint, but, you know, that'd be an incredible world to live in. Well, not that far historically, right, but one of the things is, the reason we decided to build these metrics the way that we have, focusing some of these metrics on the 2D data, like Gravity, which is built mostly, right, or completely, using the 2D tracking data, is to be able to backfill it to all of the seasons for which we have tracking. But we do not have the broadcast tracking back to the Jordan era, unfortunately. That's, you know, that grainy footage, yeah, that'd be tough. Thank you. Yeah, no problem. It'd be great to see what Gravity was in the hand checking era and not need to send a double because going back to how many models you imagine with era, yeah, yeah, no 3 point line would be incredible.

Hey guys, I was just wondering, do you guys track any clutch metrics or, can you put the mic up a little higher? Oh, sorry about that. I was wondering, do you guys track any clutch metrics, you know, things that really turn the tide of a game, for example, like Ray Allen's 3 or block by James. Do you guys have any of those scores assigned to each player? So we will. With the leverage score, I think you're going to enjoy what comes out here soon with the leverage score, but you can go on NBA.com right now and add our clutch filter, which looks at the last 5 minutes of games when there's a 5 point or less deficit between teams, and we've had that standard for a long time and look at how players have performed when those clutch situations are happening. That's available right now, but the leverage score will enhance that in an exciting way that I think will be right in line with your question. Awesome, thank you. OK.
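
The clutch filter described here is easy to express as a predicate: last five minutes of the game, with the teams within five points. The sketch below assumes a simple play record with game-clock seconds and both scores; the field names and the `is_clutch` function are hypothetical, and overtime handling is not addressed because the answer does not mention it.

```python
def is_clutch(seconds_remaining, score_a, score_b,
              window_seconds=300, max_margin=5):
    """True when the play falls in the last five minutes of the game
    and the score margin is five points or fewer."""
    in_window = seconds_remaining <= window_seconds
    close_game = abs(score_a - score_b) <= max_margin
    return in_window and close_game


# Made-up plays with seconds remaining in the game and both scores.
plays = [
    {"seconds_remaining": 600, "score_a": 90, "score_b": 88},   # too early
    {"seconds_remaining": 240, "score_a": 98, "score_b": 95},   # clutch
    {"seconds_remaining": 120, "score_a": 110, "score_b": 99},  # not close
]
clutch_plays = [
    p for p in plays
    if is_clutch(p["seconds_remaining"], p["score_a"], p["score_b"])
]
```

Only the second play passes both conditions, since the first is outside the time window and the third has an 11-point margin.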

Good. Going once, going twice. Oh, wait. One more. Sorry, just one more question since nobody else had one. I didn't see on your roadmap anything about the GOAT score. When is that coming out? We need the tracking data back further, right? Yeah. Perfect. Oh, is everyone right that? No.

Hi, I was wondering for when you release a new stat like Gravity, how do you measure the success of a stat, like how much people are engaging with it?

Yeah, it's a great question that we struggle with, wearing the product management hat on our team. In a lot of ways, we talked about the North Star, which is how do we educate and engage fans. But if you actually go page by page through the app or the site, stats are everywhere. Our team does the scores, the standings, the schedule, and there are components of lots of different parts of the page, and it's hard to assign an ROI to any specific change we make to the presentation or one new stat in a box score.

It's hard to say how that stat impacted the overall experience and drew page views or any of the classic product management benchmarks that you might look at. And that's why we've adopted this idea of, all right, well if we're helping the discourse and moving the basketball analytics industry forward, and ultimately that phrase that we keep saying, educating and engaging fans, that's leading to good ROI and good business outcomes. And if broadcasters are wanting it and teams are using it and it becomes part of basketball, that's a home run success for us.

To plug our friends at Prime Video, you can go across the way to One Amazon Lane where Prime Video is right now. They've got NASCAR from this summer and Thursday Night Football, but they'll go very much into it with you about how they see these new advanced stats programs driving viewership engagement, how much the producing teams are putting them on screen. So we're really excited to drive that with each of these partners here around how much these advanced stats are actually going on screen, being used, or becoming a part of the dialogue. I think all of those metrics are things we'll start to look to track, both internally and then with the broadcast partners.

All right, all right, thank you. Thanks, everyone. Thank you. This was great. Good work.

; This article is entirely auto-generated using Amazon Bedrock.
