
Kazuya

AWS re:Invent 2025 - GenAI game coach: Real-time gameplay feedback (DEV201)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - GenAI game coach: Real-time gameplay feedback (DEV201)

In this video, Gaganjot Kaur Kang, Staff Software Engineer at Sony Interactive Entertainment, presents an AI-powered game coach that provides real-time gameplay feedback. She demonstrates an AWS architecture using Amazon Bedrock with the Nova Pro model, AgentCore, Rekognition, Transcribe, and Comprehend to analyze raw gameplay videos. The system processes a 5-minute Destiny 2 clip, identifying performance issues such as lapses in situational awareness and timing mistakes that unfold within 100-150 milliseconds, faster than human reaction time. The solution costs $0.90 per video analysis with about 10 minutes of processing time. She shares a GitHub repository with full implementation details and discusses future improvements for latency reduction and live streaming integration.


; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

The Challenge of Real-Time Skill Improvement in Fast-Paced Gaming

Hi everyone. Can you hear me? Good. Okay, perfect. So I think it's time we should start this presentation. First of all, thank you so much for joining me today. I really hope you're having a great start to this conference. I know it's the first day, and today I'm very excited to share something that brings together two of my favorite things: gaming and applied generative AI.

I am from the gaming domain and have worked in it for the last seven years. Applied generative AI is a domain most of us are quite curious about nowadays. The title of my session is "GenAI game coach: Real-time gameplay feedback." The big idea behind it is actually very simple. In this talk, we'll explore what it would look like if your gameplay clips could talk back to you and tell you exactly how to improve.

If you're into video games, there's another aspect of this talk worth noting. We are not going to talk about SDKs or game engine plugins. We are going to take a raw gameplay video and use AWS-powered offerings to build an intelligent agent that guides us and acts like a coach looking over our shoulder while we play a game online.

Thumbnail 100

Before I get into this talk, I'll give a brief introduction about myself. My name is Gaganjot Kaur Kang, and everyone calls me Gagan. I am currently working for Sony Interactive Entertainment, the gaming division of Sony known for PlayStation. I have been with PlayStation for almost seven years now and work as a Staff Software Engineer.

My focus is backend engineering, and I have worked in different domains, starting from personalization, then jumping into cloud streaming, and most recently in the AI enablement space. I am very passionate about experimenting with generative AI for gameplay, for developer tools, and for interactive intelligence. This talk is one of those experiments which turned into a super fun prototype.

Thumbnail 150

We have limited time, but I'll try to go over a couple of things in the next few minutes. On a high level, we'll talk about what the skill improvement demand is in the gaming industry and why players are struggling. Then we'll look into how generative AI can help players make sense of fast or chaotic combat. Finally, I'll show you an AWS-powered architecture where we'll look into different offerings, mainly focusing on Amazon Bedrock and AgentCore, which orchestrates the entire coaching workflow.

I'll also show you a demo video of what I built and what it looks like. You'll see what the coach built using AWS says about a sample clip that I'll show to you. Finally, I'll wrap it up. If you're interested in continuing this discussion or have more ideas, definitely let's catch up. Once this talk is done, please meet me afterwards and I would be very happy to take your questions and your further ideas in this area.

Thumbnail 220

So let's get started. The very first thing I would like to talk about is a scenario where you're playing Spider-Man, one of my favorite games. You're swinging through Manhattan, chasing a group of thugs, and one brute suddenly charges at you. A sniper laser lines up from a rooftop, and you get hit. In that moment, you think, "I reacted on time and I did all the right things, but I still got hit."

However, if you think about a game with Spider-Man style combat, it's very fast and chaotic. Human reaction time is usually 250 milliseconds, but these telegraphed attacks resolve in 100 to 150 milliseconds. This mismatch of timing means most players cannot even see their own mistakes. The scope for improvement is very limited because you don't even know what you did wrong or what move caused you to get hit.

This is not just Spider-Man. You can imagine this about most games in the gaming world. Looking at some studies, I've seen that most players actually misdiagnose their own mistakes in a video game.

Thumbnail 310

This is where we have some challenges, opportunities, and a potential solution. Talking about the challenge, gameplay is chaotic. Players don't know what went wrong: a mistimed dodge happens in 150 milliseconds, and a wrong rotation decision happens in half a second. There is a complete mismatch between the player's reaction time and how fast the game responds. Raw gameplay is very noisy, and most skill mistakes happen far too fast for players to break down on their own.

That's where we have the opportunity. We have millions of players who want to get better. They don't have coaches and they don't want to invest their money into expensive tools or expensive software, which could also be complex to understand. Even studios don't want to invest in huge GPU clusters to build scalable analysis. This is where GenAI can shine. What if your gameplay could talk back? What if you had a personal AI coach that understands timing, movement, and decisions just from a short clip? That's where we are headed.

Thumbnail 400

Building an AWS-Powered AI Coach: Architecture with Amazon Bedrock and AgentCore

Before we get into the actual solution I built, I wanted to show you a sample game clip. It's a long clip, so I'm not going to show all of it, but it's relevant because it's the clip I used for the demo and for building the website that analyzes it. The clip is 578 MB in size and roughly five minutes long, but there is a lot of action and movement happening in it. I'm going to play it for a few minutes to show you what it's all about. Afterwards, in the demo, you'll see what the coach I built using AWS had to say about this clip and what this player could improve.

I'll go on to the next slide because I just wanted to give you a glimpse of the video I used for my analysis. Now comes the AI-powered architectural solution. Before we jump into the demo, I want to show you a high-level flow of what my architecture looks like. Let's start at the very beginning. We have a user interface where the user uploads a clip, maybe five or ten minutes long, whatever it is.

The interface is a simple React-based single-page application delivered globally through Amazon CloudFront, the CDN layer, which ensures low-latency access for players anywhere. You don't just upload the clip there; you can also view the analysis report on this UI, and it can trigger a conversational agent that talks to you and gives suggestions in the areas where you need them.

From there, we go on to the API layer. The clip goes into S3 storage. When you click the upload button, behind the scenes it uses two kinds of entry points: an API Gateway and a Lambda function URL, which serve different purposes. API Gateway provides several endpoints: one to get the upload URL for the S3 bucket where the clip is stored, an analyze endpoint that triggers the AI analysis, and a chat endpoint that starts a conversation with the AI agent.

On the other hand, I also have a Lambda function URL. The purpose of this is larger videos: I was hitting timeouts on API Gateway, where the default timeout is 30 seconds, but with Lambda you can increase this timeout. So for bigger videos it's better to use the Lambda function URL instead of API Gateway to upload a video, get the URL, and send it for analysis. This is what the API layer looks like at a high level.
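To make the upload-URL endpoint a little more concrete, here is a minimal sketch of what such a Lambda handler might look like; this is my own illustration rather than the speaker's code, and the bucket name and key scheme are assumptions.

```python
# Hypothetical Lambda behind the "get upload URL" endpoint: it returns an
# S3 pre-signed PUT URL so the browser can upload the clip straight to S3.
import json
import uuid

import boto3

s3 = boto3.client("s3")
UPLOAD_BUCKET = "gameplay-clips-bucket"  # assumed bucket name


def handler(event, context):
    key = f"uploads/{uuid.uuid4()}.mp4"
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": UPLOAD_BUCKET, "Key": key, "ContentType": "video/mp4"},
        ExpiresIn=900,  # give the browser 15 minutes to finish the upload
    )
    return {
        "statusCode": 200,
        "body": json.dumps({"uploadUrl": url, "key": key}),
    }
```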

Thumbnail 610

Next comes the compute layer, which consists of the different Lambda functions this application uses. At a high level, there are three types of Lambdas. The analysis Lambda generates S3 pre-signed URLs for video upload and then processes the video from S3. Next is the chat handler Lambda, which receives chat messages from the front end, invokes the Bedrock agent behind the scenes, and returns the results to the front end. The last is the agent action Lambda, which executes Bedrock agent action group functions. I built this use case for myself, but even if multiple users access the website, Lambda scales: additional instances are created when multiple users upload videos at the same time.
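As a rough illustration of the chat handler's role, here is a minimal sketch that forwards a message to a Bedrock agent using the Bedrock Agents runtime API; the agent IDs are placeholders, and the speaker's AgentCore-based setup may use a different invocation path.

```python
# Hypothetical chat-handler Lambda: take the player's message, forward it to a
# Bedrock agent, and return the streamed completion as plain text.
import json

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")


def handler(event, context):
    body = json.loads(event.get("body", "{}"))
    message = body.get("message", "")
    session_id = body.get("sessionId", "demo-session")

    response = agent_runtime.invoke_agent(
        agentId="AGENT_ID",          # placeholder
        agentAliasId="AGENT_ALIAS",  # placeholder
        sessionId=session_id,        # reusing the session keeps chat history
        inputText=message,
    )

    # The completion arrives as an event stream of chunks; concatenate them.
    reply = "".join(
        event_part["chunk"]["bytes"].decode("utf-8")
        for event_part in response["completion"]
        if "chunk" in event_part
    )
    return {"statusCode": 200, "body": json.dumps({"reply": reply})}
```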

Thumbnail 670

The storage layer is S3, and finally there is the AI/ML layer, which has several components. Amazon Bedrock is a key part of this architecture, and I used it in two different ways. First, I give my video directly to one of the models available on Bedrock. Bedrock gives you access to a variety of LLMs; I used Amazon Nova Pro, which can take a video directly, analyze it, and understand what is happening in it.
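For reference, passing an S3-hosted clip to Nova Pro through the Bedrock Converse API looks roughly like the sketch below; the model ID, bucket, and prompt are illustrative assumptions, not the speaker's exact configuration.

```python
# Hypothetical analysis call: hand the uploaded clip to Nova Pro and ask for
# coaching-oriented feedback in one request.
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")


def analyze_gameplay(bucket: str, key: str) -> str:
    response = bedrock_runtime.converse(
        modelId="amazon.nova-pro-v1:0",  # placeholder; some regions need an inference profile ID
        messages=[
            {
                "role": "user",
                "content": [
                    # Nova reads the video directly from S3; no manual frame extraction.
                    {
                        "video": {
                            "format": "mp4",
                            "source": {"s3Location": {"uri": f"s3://{bucket}/{key}"}},
                        }
                    },
                    {
                        "text": (
                            "Analyze this gameplay clip as a coach. Point out "
                            "timing, positioning, and situational-awareness mistakes."
                        )
                    },
                ],
            }
        ],
    )
    return response["output"]["message"]["content"][0]["text"]
```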

The other area I built into this application uses AgentCore, which provides a facility to build a production-grade agentic platform. It takes care of memory and any kind of networking you want to build. I used it to build an interactive coaching chat with two-way communication. You can ask questions, and your chat is stored, so it retains the history of your conversation. In parallel, Amazon Rekognition performs computer vision analysis to identify different objects in a given clip, what the scene could be about, and any text present. If your video has speech, such as you saying certain things while playing the game, you can get a speech-to-text transcription using Amazon Transcribe, which is another parallel analysis that gets triggered. Finally, Amazon Comprehend performs sentiment analysis of the entire clip.
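A hedged sketch of how those parallel analyses might be kicked off with boto3 is shown below; the job names and bucket are illustrative, and I'm assuming Comprehend runs on the transcript text once Transcribe finishes.

```python
# Hypothetical fan-out of the parallel analyses on the uploaded clip.
import boto3

rekognition = boto3.client("rekognition")
transcribe = boto3.client("transcribe")
comprehend = boto3.client("comprehend")


def start_parallel_analysis(bucket: str, key: str, job_name: str) -> str:
    # Asynchronous computer-vision pass: objects, scenes, on-screen elements.
    label_job = rekognition.start_label_detection(
        Video={"S3Object": {"Bucket": bucket, "Name": key}}
    )

    # Asynchronous speech-to-text for any player commentary in the clip.
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        MediaFormat="mp4",
        LanguageCode="en-US",
    )
    return label_job["JobId"]


def analyze_sentiment(transcript_text: str) -> dict:
    # Once the transcript is ready, Comprehend scores its overall sentiment.
    return comprehend.detect_sentiment(Text=transcript_text, LanguageCode="en")
```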

Thumbnail 790

I wanted to quickly show you the prompts I used for this system. Prompts are very important because you can have different personas for your agent. This agent is a coach, and the type of coach you want matters. Do you want a supportive coach, a coach that is motivating and friendly, or a coach that is competitive and really grilling a player? That depends highly on the prompt. The kind of language generated in terms of suggestions from your coach depends on the prompts you give. As an example, I provided: "You're an expert gaming coach with years of experience helping players improve their skills. You have access to the player's gameplay," and so on. I also told it to be conversational, friendly, and motivating, and that you are here to help them level up their game. These are the detailed prompts I gave for different personas.
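One simple way to manage such personas is to keep them as named prompt templates, roughly as below; only the supportive persona paraphrases the prompt quoted above, and the competitive one is an invented example to show how the persona could be swapped.

```python
# Illustrative persona registry for the coaching agent's system prompt.
COACH_PERSONAS = {
    "supportive": (
        "You are an expert gaming coach with years of experience helping "
        "players improve their skills. You have access to the player's "
        "gameplay analysis. Be conversational, friendly, and motivating; "
        "you are here to help them level up their game."
    ),
    "competitive": (
        "You are a demanding, competitive gaming coach. Point out every "
        "mistake directly and hold the player to a high standard."
    ),
}


def build_system_prompt(persona: str = "supportive") -> str:
    # Fall back to the supportive coach if an unknown persona is requested.
    return COACH_PERSONAS.get(persona, COACH_PERSONAS["supportive"])
```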

Thumbnail 860

Thumbnail 870

Thumbnail 900

Demo Results, Cost Analysis, and Future Directions for GenAI Gameplay Feedback

Moving on, I wanted to show you a quick demo that uses the same gameplay video I showed you earlier. As soon as I click Start AI Analysis, everything runs behind the scenes: the video is uploaded to S3, the Lambda triggers all the AI analyses on it, and a report with suggestions is generated. One of the challenges I noticed was that uploading to S3 takes a very long time, and that's where most of the time is spent.

Thumbnail 920

I'll fast-forward and show you the actual results from this analysis. You can see the computer vision analysis from Rekognition, which detects a person, a road, a weapon, and a gun. Then there is the AI gameplay analysis, which comes from the Nova Pro model. There's a lot of detail, but it essentially recognizes that this video shows a player navigating various environments in Destiny 2 and engaging in combat with enemies. Under player performance, it tells you there are moments where the player could improve in terms of situational awareness and enemy engagement.

Thumbnail 960

Thumbnail 970

At the same time, if you look at the tone, it's very friendly and motivating, telling the player, for example, that their gameplay is commendable, especially in terms of aiming and cover usage. This is the high-level AI analysis; there is also an overall assessment and a sentiment analysis. I also wanted to quickly show you the coach. At the end, there is a button that triggers this coach. There are some custom prompts for the coach, so you can select one of them or frame your own question. If you select one of the prompts, it will generate the question on its own.

Thumbnail 1000

Thumbnail 1010

Thumbnail 1020

For example, if you select Analyze My Performance, it creates a question asking for a detailed breakdown of your performance. These are screenshots of the demo I wanted to show as a video, but since the video isn't advancing, I've walked you through the screenshots instead. I also wanted to quickly touch on the cost side. For my use case, using all these different offerings, the total cost for this one five-minute video was $0.90. Of course, once you build something at a production level, you have to consider how many concurrent users are in your environment, how frequently you want to run this analysis, and how large the video files are that you're dealing with.

Thumbnail 1060

For a casual gamer with five videos per month, it would be about $50 per year, or for 20 videos per month, roughly $235 per year. Regarding latency, the upload took a lot of time: almost eight minutes, which is one area where I would like to improve further. The AI analysis itself is still reasonably quick, taking almost two minutes to finish the entire parallel analysis. In total, it takes roughly 10 minutes for this five-minute video to complete the entire analysis and produce the report.
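As a back-of-the-envelope check of those figures, assuming the quoted $0.90 per clip (the yearly numbers from the talk likely also fold in storage and other small charges):

```python
# Rough cost estimate based on the per-clip figure quoted in the talk.
COST_PER_CLIP_USD = 0.90  # 5-minute clip, full analysis pipeline


def yearly_cost(clips_per_month: int) -> float:
    return COST_PER_CLIP_USD * clips_per_month * 12


print(yearly_cost(5))   # ~54 USD/year, in line with the ~$50 quoted
print(yearly_cost(20))  # ~216 USD/year, close to the ~$235 quoted
```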

For lessons learned, I've already touched on a few things, including hitting video-size limits and figuring out other ways to pass videos to models like Nova. For Nova, the limit is one gigabyte, and if you're uploading to S3, you can upload a video as large as one gigabyte. The other thing I wanted to point out is that I did not have to do any frame extraction: as long as the video is in S3, Nova takes care of extracting frames on its own. As far as I understand, Nova is doing intelligent sampling behind the scenes, so that was all taken care of.
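A simple guard reflecting that limit might look like the following sketch; the one-gigabyte figure comes from the talk, and everything else is illustrative.

```python
# Hypothetical pre-upload check mirroring the 1 GB video limit mentioned above.
MAX_VIDEO_BYTES = 1 * 1024**3  # 1 GiB


def validate_clip_size(size_bytes: int) -> None:
    if size_bytes > MAX_VIDEO_BYTES:
        raise ValueError(
            f"Clip is {size_bytes / 1024**2:.0f} MB; "
            "clips larger than 1 GB cannot be analyzed by this pipeline."
        )
```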

Thumbnail 1130

Thumbnail 1170

What's next? I would like to work on the latency aspect and reduce it from 10 minutes to something near real-time. I would also like to explore video compression and caching to improve the overall latency and cost. Another area where this would be very useful is live streaming, because a lot of players stream on platforms like Twitch or YouTube. Can we analyze the stream and build this conversational experience while the players are playing? Those are the areas I want to explore in the future. With that, I would like to share some links. There is a GitHub repo where I have put the entire code with instructions on how to run it, and you can connect with me on LinkedIn if you want to continue this discussion after the conference.

Thumbnail 1200

That is all from my side, and thank you so much for coming to the session. I see that I'm right on time. If you have any questions, please meet me afterwards, and I'm more than happy to chat with you all. Thank you so much, and have a great rest of your day.


; This article is entirely auto-generated using Amazon Bedrock.
