
Kazuya


AWS re:Invent 2025 - Rapidly accelerate product development and launches with AI (IND383)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - Rapidly accelerate product development and launches with AI (IND383)

In this video, VF Corporation and Fanatics Collectibles share how they transformed content creation using generative AI on AWS. VF Corporation reduced B2B catalog production time from 120 to 60 days by replacing traditional photography with AI-generated product images using Stable Diffusion and ControlNet, achieving 70% cost savings. They automated PDP copywriting across 20 languages with Claude, cutting costs by 80%. Fanatics Collectibles built an agentic AI system on AWS Bedrock to automate trading card back copy creation, combining structured player stats databases with web search agents and quality assurance agents. Both companies emphasized reimagining entire workflows rather than just automating existing processes, demonstrating how AI shifts human roles from execution to strategic oversight and creative work.


; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

The Content Creation Challenge Facing Modern Retailers

Good afternoon everyone. Welcome. I hope all of you had an incredible few days here at re:Invent, and I know by this time on day four, your legs are probably really getting tired. So I really appreciate all of you making time to join us here today. Let's get started.

Today, brands and retailers face unprecedented content demands across different channels and markets, especially as you need to launch new products faster and faster to keep pace with rapidly shifting consumer expectations. Whether it's for internal ideation between your design and merchant teams, or when your sales teams are meeting with your wholesale and retail buyers, or when you're presenting your assortment to the end consumer, compelling content is critical to your success. My name is Shiva, and at AWS I lead industry strategy for our fashion, apparel, and pure play e-commerce retail segments. In today's session, I'm really excited to introduce two of our customers who have leveraged generative AI and agentic AI to solve this important industry challenge in a very innovative way. Let's take a deeper look.

Traditional content creation barriers in retail span three key areas: process bottlenecks, escalating costs, and quality challenges. Take the example of the apparel segment, where long lead-time planning cycles often start a year in advance of a new season. A typical brand has four to eight collections each year with thousands of new products, all of which require high-quality images, accurate product descriptions, and marketing materials. If you're a brand operating globally, you probably need that in over twenty different languages. As the window to launch new products gets shorter and shorter, this challenge really gets amplified, and that's where innovative brands are looking at AI as a viable solution approach.

Thumbnail 130

Thumbnail 140

Three Critical Themes for AI Transformation Success

Over the next fifty minutes, you're going to hear from two of our customers who are leaders when it comes to adoption of AI across the enterprise. First up, Davide Azzalini from VF Corporation is going to share their set of business challenges across both B2B and B2C segments and how, using generative AI, they've been able to deliver substantial improvements on cost and speed metrics. In fact, they didn't just deliver incremental improvements. They've completely reimagined their go-to-market process and shaved off two months in that whole GTM process, and this is huge because it helps the business become more agile in responding to shifts in consumer and market preferences.

Now, what if content is the most important part of your product itself? That's a completely different story. The expectations in terms of quality and accuracy go up another notch. Following Davide Azzalini, Celia from Fanatics Collectibles is going to share her set of unique challenges and how they leveraged not just generative AI but also built agents to deliver improvements on cost, speed, and accuracy metrics.

Thumbnail 200

Thumbnail 210

Thumbnail 220

So before I invite our featured speakers on stage, I want to share three important themes that you will notice as you hear these two stories in detail that are critical to the success. The first wave of generative AI in retail largely focused on marketing and customer experience. Think personalized recommendations and chatbots, but that was just the beginning. Today, agentic AI is penetrating deep into retail's core operations. Imagine agents that don't just analyze trends but actively contribute to your merchandise planning process, be it predicting emerging styles or optimizing your assortments. Agents are transforming pricing strategies by delivering real-time market competitiveness into the process or improving inventory management by autonomously balancing supply across complex networks. This shift of AI going from peripheral to core retail execution is a pivotal point in terms of how retailers need to think of AI in the enterprise.

Thumbnail 270

Thumbnail 280

Typical retailers start their AI journey by automating manual processes. While that delivers efficiency, it only unlocks a fraction of AI's real potential. You need to be able to step back and reimagine your entire business process through the lens of what's possible with AI agents in order to really accomplish breakthrough results. You'll actually see that in both of these examples.

They're not looking at AI assisting with existing workflows but reimagining the workflow from scratch as if AI were a team member. This is going to deliver transformational results because the role of the human associate shifts from just doing to orchestrating AI-enabled business processes. The focus is less on execution and more on managing exceptions, handling quality checkpoints, and making strategic decisions.

Thumbnail 340

Lastly, we all know business sponsorship is really important to the success of business transformation initiatives, but it can truly make or break your AI transformation. I'm not just talking about having supportive business executives or leaders. Today, technical teams can build AI solutions faster than ever thanks to AI itself. But for success, it's really important for business leaders to understand that we are no longer deploying just software. We are actually onboarding AI agents as team members. They need time to learn your business context, understand your business processes, and adapt to your brand voice.

This requires time. What I see when I speak to industry leaders is that leaders who are expecting instant perfection are going to face disappointments. But those who understand that it's a learning journey and commit to continuous improvements, where the AI solutions, just like your human associates, are constantly learning and improving, they're the ones who are going to see remarkable results.

VF Corporation's Business Problem: The Scale and Cost of Traditional Content Production

With that as a backdrop, please join me in welcoming Davide Azzalini from VF Corporation on stage. Hello, everyone. My name is Davide Azzalini, and I lead AI innovation at VF Corporation. Today, I want to share with you something that has truly transformed our 125-year-old footwear and apparel company and the way we go to market.

Together with AWS, we have embedded generative AI into our core business processes across our brand portfolio and elevated the way we create product visuals, we generate product copy, and ultimately, how quickly and efficiently we can bring product stories to our consumers and wholesale partners. You'll see how we have replaced large parts of traditional photography with AI-generated imagery, how we automate the creation and localization of global PDP copy, and how companies like Vans can use these capabilities to move faster, improve operational efficiency, and increase creative and operational flexibility at scale.

Thumbnail 500

Thumbnail 510

This is what we'll go through today: who we are, who is VF Corporation, where were we before embracing this AI journey, what we have built, details on how this works, the impact that we have been able to generate, what failed, and of course, what's next. VF Corporation is one of the world's largest apparel and footwear companies, and within our portfolio, we have iconic global brands like The North Face, Vans, Timberland, and several others. Across these brands, our products enable sustainable and active lifestyles, and most of these brands launch thousands of new SKUs every season.

Here's the business problem. With these many products across so many categories, generating the assets that you need to bring a season to market not only costs tens of millions of dollars, but it's slow, operationally heavy, and a bit painful. This inefficiency creates a big opportunity for an AI-driven transformation, and this is where we start.

Thumbnail 600

For the rest of this session, we will focus on Vans, which was our pilot brand. Here, you can see in a tangible way the scale which we are dealing with. You will need to photograph more than 3,000 products to build a B2B catalog for just a single season, and you will need to generate PDP content for over 1,000 products in 20 languages, just for one season. The one that you see on the left is just an example of one of those B2B catalogs. This was the Spring 2025 B2B catalog for Vans.

Thumbnail 610

Thumbnail 620

Thumbnail 630

So this was pre-AI, and as you can see, there is a lot of content in there. For footwear, we used to photograph prototypes in the studio, while for apparel we just used 2D sketches to save money and to save on time. And as you can imagine, 2D sketches are not great at conveying how materials look and the general appearance of a product.

Thumbnail 650

Thumbnail 660

What you see on the right is just an example of a PDP, and I added it there just to give you an impression of how much copy you need to write for each PDP. You need to write titles, headlines, long descriptions, short descriptions, bullet lists, benefits, features, and so on for every product, and then localize it all in 20 languages. It was expensive, slow, and inconsistent, as you can see.

Thumbnail 680

Transforming B2B Catalogs with Generative AI: From 2D Sketches to Realistic Renders

Let's go to B2B catalogs. Our goal here was simple but ambitious. The first thing was finding a way of turning a 2D flat sketch into a realistic rendering, and not only that, but doing it fast, cost-effectively, and ideally faster than studio photography. Before AI, our pain points were significant because it was very costly. You have to build a prototype, have the prototype shipped to a photo studio, book the studio, book the photographer, do the shooting, and do post-production. It's a lot of steps, as you can imagine, which often resulted in delays. Given the large number of products you needed to shoot and build prototypes for, scalability was for sure not there.

Thumbnail 760

But this was the perfect case for generative AI because you have high volumes, you have high cost, you have slow turnaround, and you have, most importantly, structured inputs. So this was the perfect case for generative AI. This is how we turn one of our 2D sketches into a 3D realistic rendering. And by the way, those 2D sketches were already something that the company was building. So designers were already building these 2D sketches before embracing this journey. So we actually didn't need additional inputs with respect to what we already had.

The first thing that we do is pass this 2D sketch through ControlNet. ControlNet is a plugin for Stable Diffusion, which is the text-to-image model that we use, and what we do is edge detection. And those edges are very important because they will anchor the generation so that you can make sure that everything is going to be there structurally: stitching, panels, every detail is going to be there, just more realistic. In parallel, we take the 2D sketch and product metadata such as compositions, materials, colors, which material goes where in a shoe, and we feed both those things into Claude from Bedrock. And the purpose of feeding that into Claude is to use Claude to do prompt engineering so that you don't need to have a human doing prompt engineering.

And then we bring together the prompt that Claude generated, the edge detection, together with the original 2D sketch, and we give this to Stable Diffusion. This is not the off-the-shelf version of Stable Diffusion. It's a fine-tuned version of Stable Diffusion, and this was very important for us to get visual accuracy. Especially with earlier models, you need to do fine-tuning. Otherwise, the model will not know what studio photography looks like for Vans, for example.
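To make that flow concrete, here is a minimal sketch of the sketch-to-render step using the open-source diffusers library. This is not VF's actual pipeline (they run a fine-tuned model behind ComfyUI on EC2); the model IDs, the fine-tuned checkpoint path, and the file names below are placeholders.

```python
# A minimal sketch of the sketch-to-render step with the open-source diffusers
# library. Model IDs, the fine-tuned checkpoint path, and file names are
# placeholders, not VF's actual assets or pipeline.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# 1. Edge detection on the 2D flat sketch; the edges anchor the generation so
#    stitching, panels, and details stay structurally in place.
sketch = cv2.imread("flat_sketch.png")
gray = cv2.cvtColor(sketch, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 3-channel

# 2. Canny ControlNet plus a Stable Diffusion checkpoint fine-tuned on the
#    brand's studio photography (hypothetical local path).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "./vans-studio-finetune",  # placeholder fine-tuned model
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# 3. The text prompt would come from Claude on Bedrock, built from the sketch
#    and the product metadata (materials, colors, which material goes where).
prompt = "studio photo of a skate shoe, black canvas upper, white vulcanized sole"
render = pipe(prompt, image=control_image, num_inference_steps=30).images[0]
render.save("render.png")
```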

But it's not only that. I remember that with earlier models, it was impossible for the model to get the Vans checkerboard pattern right. Sometimes the same color would just repeat, one square after the other, which is a bit strange. Isn't it easier to just do it right? I don't know why. So we did fine-tuning to fix that. And the other big use case for fine-tuning for us has been logos. Those models are trained on blurred logos because it's copyrighted information. So those models will never know how to do The North Face logo the right way. So you need to fine-tune those models if you want them to be able to do that.

More recent models are much better at rendering text, so it's not certain that you will need to do fine-tuning. If you're using one of the latest models, such as Stable Diffusion 3.5 Large, one of the Amazon Nova models like Nova Canvas, or Flux or other models, you can probably do that without fine-tuning. But for the older models, fine-tuning was really important. And the result is what you see on the right: studio-quality, realistic renderings which are good enough to replace traditional photography.

Thumbnail 950

This is what it looks like from an architectural standpoint. We have Bedrock for cloud model inferencing. We have S3 for storing input and output. We have Redshift for storing product metadata, and then we have EC2 and ECR for orchestrating both the UI, which is ComfyUI, and for running inference on Stable Diffusion. That's needed for two purposes. The first one is that if you want to fine-tune Stable Diffusion, you'll need a GPU-backed EC2 instance. The other reason is that the Stable Diffusion you will find on Bedrock does not have ControlNet, so if you want to use ControlNet, you'll have to deploy it yourself on EC2 instances. Of course, we have Cognito for authentication. Everything is nicely integrated with our IAM and Active Directory, and everything is modular, scalable, and brand agnostic, so it's really easy to scale it across brands at VF.
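As a rough illustration of the Bedrock piece, the Claude prompt-engineering step could look something like the following boto3 call. The model ID and metadata fields are illustrative assumptions, not VF's actual configuration.

```python
# Rough illustration of the prompt-engineering step: Claude on Bedrock turns
# product metadata into a Stable Diffusion prompt, so no human has to write it.
# The model ID and metadata fields are illustrative assumptions.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

metadata = {
    "category": "footwear",
    "materials": {"upper": "canvas", "sole": "vulcanized rubber"},
    "colors": ["black", "white"],
}

response = bedrock.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder model ID
    messages=[{
        "role": "user",
        "content": [{
            "text": (
                "Write a single Stable Diffusion prompt for a studio-quality "
                "product photo on a plain background. Use only the attributes "
                f"below and do not invent details:\n{metadata}"
            )
        }],
    }],
    inferenceConfig={"maxTokens": 200, "temperature": 0.2},
)

sd_prompt = response["output"]["message"]["content"][0]["text"]
print(sd_prompt)
```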

Thumbnail 1020

Thumbnail 1070

Reimagining the Go-to-Market Process: Cutting 60 Days from Production Timeline

All right. Here is the whole process. Before AI, it took 120 days to go from line lock to having your PDF catalog ready to share with your buyers. These are the steps you have to go through: wait for line lock, build your prototypes, have your prototypes shipped to the photo studio, do two rounds of studio photography and post-production, and then there is pagination. All of this takes at least 120 days. This process was really expensive, it constrained design flexibility, because any change to the products after prototype building was basically impossible (we would have had to rebuild the prototype and shoot it again), and it also slowed down our go-to-market process a lot.

Thumbnail 1080

This is the new process with AI. Right after line lock, we can start generating realistic renders already. And then we will go through a feedback loop in which we have merchandisers and designers provide feedback on the renders that we do. And based on this feedback, we will adjust them if needed. And then in a month, we are ready to go to pagination. And in six days, the catalog is ready. So this whole process condenses from 120 days to 60 days, and a major bottleneck in our go-to-market calendar simply disappears.

Thumbnail 1110

Thumbnail 1120

Thumbnail 1130

Thumbnail 1140

This is a quick explanatory video, which is not starting. OK. It's just to show you the before and after and how realistic you can actually get with your renders. The one that you see here is the Holiday 25 B2B catalog, which was also the first one that we did fully with AI. So everything that you see here is AI, and by the way, everything you see there is in stores right now. So if you like something, yeah.

Thumbnail 1150

And I'm showing you this Holiday 25 catalog, which is not even one of the best that we have done. I'm showing you this one because it's the only one I can share with you. But after this, we went on to do the Spring 26 catalog just for EMEIA. Vans Global liked it very much and decided to adopt this capability for the Fall 25 catalog, which is the one that we have just wrapped up. And to give you a sense of how the brand is liking this, for Fall 26, they actually canceled the photo studio, so they went with AI 100%. And now we are already working on the Altitude 26 and on the Spring 27, so this has become standard operating procedure for Vans.

Thumbnail 1200

Automating PDP Copy Generation: From Agency Dependence to AI-Powered Efficiency

Now let's go over to PDP copy. Again, the same goal actually: finding a way to automate this while maintaining brand identity and tone of voice, and while also customizing it across our distribution channels. Before AI, we faced several challenges. The biggest one was how heavily we relied on copywriting agencies, because copywriting was done externally. Copywriting agencies were extremely expensive, and sometimes our timelines didn't align with theirs. This meant that we had delays in getting our copy back, which meant delays in launching a product on the website.

Copy was also inconsistent because, if we changed our guidelines for writing copy, we usually didn't rewrite the copy for carryovers. Carryovers just kept the copy they already had, so it was also a bit inconsistent. We also had missed opportunities because ideally, what you would want to do is differentiate the copy that you use on your own website from the copy that you give to your partner marketplaces. Since writing additional copy meant additional cost, what we did was share our PDP copy with them, which, as you can imagine, is not great for search engine optimization purposes and for sure is not great at optimizing traffic.

Thumbnail 1310

This is how we do copywriting with AI. The inputs are product images, as many as you have, to reduce hallucinations and keep the model from inferring what it cannot see. Then we have product metadata, everything we mentioned before: compositions, materials, everything. We have the Vans brand book, which is where it's written how you should talk and what the Vans brand means. The most important piece is a carefully engineered prompt. The result is what you see on the right: copy that is on-brand, accurate, SEO-friendly, and fully channel optimized.

Thumbnail 1350

This slide shows the full journey of how we generate product copy at scale. Step one is prompt engineering, and this is the most important one. If you don't get this right, everything downstream will suffer. Even if most people say that prompt engineering is dead and LLMs are smart enough that you don't need to do that anymore, this is certainly not the case for copywriting for many reasons.

Reason number one is that when you do copywriting and you want to do it at scale, you will not chat back and forth with an LLM until you get what you want. You need to do this one shot with no human supervision. Getting things consistent requires a lot of prompt engineering because you need to detail exactly how you want your copy to be structured: formatting, branding, tone of voice, and those things. The other thing that you need to do with prompt engineering is mitigate a bit AI's tendency to over-please, which in this context translates to overselling.

If you don't do careful prompt engineering on this, you will get 1,000 PDPs, every one of which says, oh my God, this is the best product ever, the one you have been waiting for your whole life. It's not always like that, and it cannot work. You need to be careful about that, otherwise it will not work.
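To illustrate what a "carefully engineered prompt" can mean in practice, here is a hypothetical one-shot prompt template that bakes in structure, tone of voice, and the anti-overselling rules described above. It is only a sketch of the shape of such a prompt, not VF's actual prompt.

```python
# Illustrative shape of a one-shot copywriting prompt: structure, tone of voice,
# and anti-overselling rules live in the prompt itself, because there is no
# human chatting back and forth. This is not VF's actual prompt.
COPY_PROMPT = """You are writing e-commerce PDP copy for the brand described in
the brand book excerpt below. Write strictly in the brand's tone of voice.

Return exactly these fields as JSON:
- "title": max 60 characters
- "headline": one sentence
- "short_description": max 300 characters
- "long_description": 2-3 paragraphs
- "bullets": 3-5 feature/benefit bullets

Rules:
- Describe only what is visible in the images or stated in the metadata.
- No superlatives ("best ever", "revolutionary") and no exaggerated claims.
- Vary sentence structure; do not reuse the same opener across products.

Brand book excerpt:
{brand_book}

Product metadata:
{metadata}
"""

def build_prompt(brand_book: str, metadata: dict) -> str:
    # Product images would be attached as separate image content blocks in the
    # same Bedrock request; only the text part is shown here.
    return COPY_PROMPT.format(brand_book=brand_book, metadata=metadata)
```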

Step number two is generation at scale. The tool that we've built is already integrated with our image repository, which is Cloudinary, and also with our product databases. All a copywriter at a brand like Vans needs to do to create copy for the full season is provide the list of products that need copy, and then the tool will do everything else with just one click.

Step number three is localization, and this is done fully with AI again. This is what allows us to maintain accuracy and brand tone of voice while at the same time adapting to regional nuances in how you address your customers. Step number four, which is as important as step number one, is proofreading. Here, it's humans that do the proofreading because copy is consumer-facing and it's on your website, so you want to be sure that everything is as it should be. This is the last quality gate before going live, and you need to get this right. So we are still using humans to proofread our copy.
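For the localization step, a simple sketch is to loop the approved English copy through Claude on Bedrock once per locale. The locale list, model ID, and prompt wording below are illustrative assumptions, not VF's actual setup.

```python
# Sketch of the localization step: loop the approved English copy through
# Claude on Bedrock once per locale. The locales, model ID, and prompt wording
# are illustrative assumptions.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")
LOCALES = ["de-DE", "fr-FR", "it-IT", "es-ES", "ja-JP"]  # subset of the ~20

def localize(english_copy: dict, locale: str) -> dict:
    prompt = (
        f"Translate this PDP copy into {locale}. Keep the brand tone of voice, "
        "keep product and technology names in English, adapt idioms to the "
        f"region, and return the same JSON structure:\n{json.dumps(english_copy)}"
    )
    resp = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return json.loads(resp["output"]["message"]["content"][0]["text"])

localized = {loc: localize({"title": "Classic Slip-On"}, loc) for loc in LOCALES}
```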

Thumbnail 1520

Thumbnail 1540

Architecturally, the copy pipeline mirrors the catalog one almost exactly: the same building blocks, the same security, the same scalability, relying heavily on Bedrock. This is what we use. This is how the process changed. Before AI, writing copy for a batch of 100 products took 15 days, because you had to brief your agency, wait for them to write the copy, get it back and approve it, and then send the English version back to have it translated. This took two weeks just for 100 products.

Thumbnail 1560

Measuring Impact and Looking Ahead: 70% Cost Reduction and Beyond

This is the new process with AI. From day one, you instantly have all your copy written, so you just need to do proofreading; then you instantly get translations, you proofread the translations as well, and then you're ready to upload to your website. This is a 50% reduction in time as well. Now let's take a look at the measurable impact that we have been able to generate with this.

Thumbnail 1590

Here are the headline numbers, so minus 70% cost, minus 50% time on B2B catalogs, minus 80% cost, minus 50% time on PDP copy. And these workflows, as I said, are already standard operating procedure. But what I want to focus on is not just about saving time and money, it's also improving on quality and unlocking new possibilities.

For B2B catalogs, for example, we went from having apparel done as 2D sketches to having realistic renders. We went from not being able to change our products after line lock to being able to change them up until the end, right before the catalog is paginated. We went from a very slow process, which meant that sometimes we had to go to our biggest customers, which are the first ones you go to in your open-to-buy process, with an incomplete catalog because the second photo shoot wasn't done yet. Now we are able to go with a full, complete catalog even earlier than we used to.

And for PDP copy, we have seen how we are now able to dynamically adapt to changing guidelines in PDP copywriting. We have seen how we are now able to differentiate across markets, and it also turns out that AI is better than humans at following guidelines, so even better quality. So it's not just about money and time, it's also about quality and having new possibilities. And if you ask me, this is really the AI superpower. Before AI, cheaper, faster, better wasn't really a thing, but now, as we've seen in many sessions here at the event, it is a thing and it's thanks to AI.

Thumbnail 1720

I'll go super quickly on what failed and what's next. So what failed? At the beginning, for sure, visual fidelity wasn't there. We had to work a lot on finding the right model and the right process to get consistency in our outputs. And if you had to start doing this today, it's going to be much faster for you and much better. But I firmly believe that having started doing this a year and a half ago positioned us in a very nice place where now we can capitalize on the process change that we have already implemented and be ready to use those new models which are really, really good, and we are ready to use them and to create value with them.

Copy governance and proofreading: as I said, we had to change the way we do business with copywriting agencies, because they're not generating content anymore, they're just proofreading it. And change management actually wasn't even that bad for us, because for both use cases you've seen today, capacity external to VF was already doing those things. So convincing people to adopt it internally was actually quite easy, and the quality that we have been able to generate convinced everyone right away.

Thumbnail 1810

What's on the roadmap for us? For sure, expanding this to all the other brands in the portfolio; we're already doing this. We'll keep exploring new AI models, and it's not just new AI models, it's also new use cases. Video models are a big thing today. I think conversational editing is particularly exciting. These are models where you can upload a picture, chat with the picture, and say, fix this, fix that, which is great if you want to turn your selfie into Miyazaki style, but it turns out it's also great for post-production, a new way of doing Photoshop-style editing on images.

Virtual try-on is another area we're exploring to create lookbooks and to let people know how products fit. So not just doing flat, off-model B2B catalogs anymore, but also having full looks so that people can know how products are supposed to fit and how to pair them together. And then, of course, doing agentic AI. So both an agentic framework for doing this go-to-market content generation process end to end, but also using agents to validate outputs of intermediate steps so that we can remove some heavy lifting for our merchandisers for validating images.

So this was our story, a story of how a heritage company, when empowered by the right technologies and the right partners, can innovate at the speed of a startup. And for us, AWS has been instrumental in this journey, so thank you, AWS and thank you all for being here. Thank you, David.

Thumbnail 1960

Fanatics Collectibles: Leveraging AI for Trading Card Content Creation

Good afternoon, everyone. I'm Celia Chen. I lead the data science and generative AI efforts at Fanatics Collectibles. It's an honor to be here at AWS re:Invent to share with all of you how we leverage generative AI through AWS Bedrock to transform the process of building trading card content. For those of you who are unfamiliar with Fanatics Collectibles, we are a trading cards and collectibles company. We produce trading cards for major sports and entertainment properties, such as MLB, NBA, UFC, and others, primarily through our iconic Topps brand. So if any of you here have ever collected baseball cards or opened a pack, there's a good chance that you've already come across our products and brands.

As you can see on the screen, here is an example of a trading card. If you have never seen one: on the front, every card captures a moment, a highlight, a player storyline through the image. Some of them even feature a premium element. It could be an autograph from the player, or a piece of a jersey that was worn in a very important game, or even fragments of bases and bats, as we show in this one. And on the back of the card, that's where the story lives. That's where we recap memorable moments, where we acknowledge a season milestone, and where we share details about the player to deepen the connection between our collectors and their players.

Thumbnail 2040

So today, we want to show you how we leverage Bedrock to actually transform, enhance, and automate this card back copywriting process. So here is our plan for today. First of all, I will give you more context about the challenge we're facing in terms of producing the card back copy. And then we didn't just go build a full-on production-ready solution. It's always smart to test the validity with minimal effort first. So that's why we designed a POC experiment, and I will share the results with you. And luckily, the POC went well. That's why we went ahead and built this full solution through AWS Bedrock, but it wasn't a straightforward process. So we had quite a few learnings we'd like to share with you today. And finally, we want to share what's the biggest impact we have observed through this solution.

Thumbnail 2090

So, first of all, actually building and writing the trading card back copy is very time intensive. It takes our editors weeks to do the latest player research, keep up and read all the game summaries, and craft that unique narrative for each and every single player.

And that's just a starting point, because we also work with very strict licenses and agreements with our licensing partners. They have this gigantic, complex, and always evolving rulebook that we have to play by. Our sports rosters also change constantly, so there are always last-minute additions, deletions, or just last-minute requests for changes to players, which adds enormous pressure to our editorial team who are already under enormous pressure from our tight printing timeline.

Each and every card copy needs to go through multiple quality assurance rounds. It goes through our internal quality assurance team, and then it has to go through our licensing partners. That means every single little mistake may result in one more week of iteration. This is just a lot of pain for our tight timeline. So that's why we saw an opportunity. Is it possible that we leverage generative AI to help us automate and enhance this research, check the compliance, and reduce our iterative cycle?

Thumbnail 2180

That's why we went ahead and built a proof of concept first. We wanted to answer a critical question: can a large language model produce card back copy whose quality is actually to the satisfaction of our most important collectors? So we designed an experiment. We took a set of players, and for each player, we created a pair of copies. One was written by a human writer, the other one was generated by the LLM. Then we showed these pairs to our collectors in a focus group and asked for their feedback.

Let me show you one of the examples. On the screen, you can see there are two copies. This is one of the pairs we used in the experiment. Can you take a moment, take a look, and guess which one is actually written by a human and which one is written by LLM? Anyone in the audience willing to share why they chose one or the other? Let me give you the mic.

I was saying that the second one has too much expressive language. I would say that's why I think that one's the AI one, right? This one on the right-hand side, right? That's the one you think is AI, and this one is the human-written one. Thank you, sir, and you are absolutely spot on.

Thumbnail 2270

So the one on the left-hand side is actually written by a human. That's exactly one of the tells we found: the LLM tends to use those dramatic big words. If you do this over and over again across sets of players, it starts using the same big words and also shows a repetitive sentence pattern, which is not great. Another thing I found is that if you read the left-hand human-written copy, it says "the next night." Humans naturally have a great sense of chronological order, but we actually saw the LLM struggling to get the timing right. We will talk more about how we address these shortcomings in the solution we implemented.

Thumbnail 2320

Our proof of concept results were actually very compelling. We found our collectors had a very high acceptance rate for our AI-generated copy, and there was no significant preference between the AI-generated copy and the human-written one. More importantly, it seems like our collectors were satisfied with both of the copies. This really gave us the confidence to go ahead and start building the full solution on AWS Bedrock.

Thumbnail 2350

Building the Solution: Player Research Workflow and Quality Assurance Agents

The first workflow we intended to automate through this exercise is the player research workflow. The first big challenge we face is that we need to combine accurate, precise player stats with free-form qualitative narratives for each and every single player. We also need to identify the relevant, compelling, or flattering stats for each player from the vast amount of information available, both from the stats and from the articles.

That's why we built two parallel systems to address this. On the one hand, we have a structured database to store the player stats with intelligent stats selection built in. On the other hand, we also have a free-form web search agent who is ready to complement this research whenever additional information is needed.

So let me give you more details on this. First of all, we have a player stats knowledge base. We built a data pipeline to pull and refine the player stats from the MLB official database. That's how we get completely accurate player stats. However, the raw data alone is not good enough, because if you watch baseball, you know every player has dozens of stats, and we cannot just include all of them. That's why we built a ranking system to find the most flattering and relevant stats for each player, given their position and their category, to be included in our card copy.
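As an illustration of the stat-selection idea, the ranking can be as simple as filtering to position-relevant stats and scoring them, for example by league percentile. The stat names, position groups, and the percentile placeholder below are made up for illustration; this is not Fanatics' actual ranking logic.

```python
# Illustrative stat-selection sketch: filter to position-relevant stats and rank
# them, for example by league percentile. Stat names, position groups, and the
# percentile placeholder are made up; this is not Fanatics' actual logic.
RELEVANT_STATS = {
    "pitcher": ["era", "strikeouts", "whip", "wins"],
    "batter": ["avg", "home_runs", "rbi", "ops"],
}
LOWER_IS_BETTER = {"era", "whip"}  # flattering means a low value for these

def league_percentile(stat: str, value: float) -> float:
    """Placeholder: in a real pipeline this would be computed against the full
    MLB stats database; here it just echoes the value."""
    return value

def select_stats(player: dict, top_n: int = 3) -> list[tuple[str, float]]:
    scored = []
    for stat in RELEVANT_STATS[player["position_group"]]:
        value = player["stats"].get(stat)
        if value is None:
            continue
        pct = league_percentile(stat, value)
        score = 100 - pct if stat in LOWER_IS_BETTER else pct
        scored.append((stat, score))
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_n]

player = {"position_group": "batter",
          "stats": {"avg": 0.312, "home_runs": 41, "rbi": 104, "ops": 0.981}}
print(select_stats(player))  # the top stats would go into the writer prompt
```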

On the other side, we also have a web search agent. This is how it works. Whenever the writer agent initializes the overall workflow, it makes a decision. After looking at the knowledge the writer agent possesses and also checking the player stats, if more information is needed, it will invoke a Lambda function that calls the web search API to kick off the additional research. The results, together with the reference URLs, are then returned to the writer agent to use in the rest of the writing process.
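A minimal sketch of what that web search action could look like as a Lambda function behind a Bedrock Agents action group is shown below. The event and response shapes follow the Bedrock Agents function-details contract, while the search call itself is a hypothetical helper standing in for whichever web search API is used.

```python
# Minimal sketch of a web-search Lambda behind a Bedrock Agents action group.
# The event/response shape follows the function-details contract; the search
# call itself is a hypothetical helper for whichever search API is used.
def run_web_search(query: str) -> list[dict]:
    """Hypothetical wrapper around the web search API."""
    return [{"title": "Example result", "url": "https://example.com", "snippet": "..."}]

def lambda_handler(event, context):
    # The agent passes its search request as named parameters.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    results = run_web_search(params.get("query", ""))

    body = "\n".join(f"{r['title']} ({r['url']}): {r['snippet']}" for r in results)
    # Results and reference URLs go back to the writer agent as plain text.
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "function": event["function"],
            "functionResponse": {"responseBody": {"TEXT": {"body": body}}},
        },
    }
```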

Thumbnail 2510

Here, the takeaway I want to bring up is: I know RAG is what everybody is talking about, but I believe if you are looking for your agent to provide information like numbers, or you need it to be 100% accurate, a structured database is still your best bet for that use case.

Then we went ahead and implemented the quality assurance agent. Remember those complex compliance documents I mentioned earlier? This is where we tackle that problem. AWS Bedrock makes it very straightforward to integrate a knowledge base, so that's what we initially did. We just loaded that gigantic documentation into the knowledge base, but we quickly discovered we had a problem, because the quality assurance agent constantly missed a lot of important rules, even though those rules were already present in the knowledge base.

I know these days our LLMs have larger and larger context windows, but I still believe that proactively managing your context window, by providing only the most relevant information and reducing the noise in the prompt as much as possible, is still one of the best ways to improve your LLM's performance. That's why we took that gigantic document and started breaking it down into different sections and topics. We did a similar exercise on the card copy coming from the writer agent: we divided it into the player name, the team they belong to, the narrative around the stats, the highlight story, and so on, then matched those sections up and processed them systematically instead of processing everything all at once. The result is a significant improvement in the catch rate of our quality assurance agent.
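The context-trimming idea can be sketched as a simple mapping from copy fields to compliance topics, so the QA agent only ever sees the sections that apply to the text it is checking. The section names and field names below are illustrative, not the actual rulebook structure.

```python
# Sketch of the context-trimming idea: keep the rulebook split by topic and feed
# the QA agent only the sections relevant to the fields being checked. Section
# and field names are illustrative, not the actual rulebook structure.
COMPLIANCE_SECTIONS = {
    "player_names": "Rules on approved player name formats, nicknames, ...",
    "team_references": "Rules on team names, logos, and league marks, ...",
    "stats_claims": "Rules on citing statistics, records, and milestones, ...",
    "storylines": "Rules on injuries, off-field topics, and tone, ...",
}

FIELD_TO_SECTIONS = {
    "player_name": ["player_names"],
    "team": ["team_references"],
    "stat_narrative": ["stats_claims"],
    "highlight_story": ["storylines", "stats_claims"],
}

def build_qa_context(copy_fields: dict) -> str:
    """Return only the compliance text the QA agent needs for this copy."""
    needed = {s for field in copy_fields for s in FIELD_TO_SECTIONS.get(field, [])}
    return "\n\n".join(COMPLIANCE_SECTIONS[s] for s in sorted(needed))

qa_context = build_qa_context({"player_name": "...", "stat_narrative": "..."})
```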

This is not where we stopped, because while LLMs and generative AI have unlocked so much possibility for us, I still believe in the fundamental principle that we should always use the simplest solution for a problem. That's why we also implemented traditional NLP methods, specifically a progressive word-tracking system to track the usage of words across the card set. The LLM doesn't have visibility into the other card copies generated in a set of cards, so to avoid repetition and to encourage variety and creativity in the copywriting, we use NLP methods to track which keywords have been used and to avoid repeatedly using those words.
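A progressive word-tracking system along these lines can be a few lines of plain Python, for example a Counter over notable words; the threshold and the crude "big word" filter below are illustrative choices, not the actual implementation.

```python
# Sketch of a progressive word-tracking system: count notable words already used
# across the set and tell the writer to avoid the overused ones. The threshold
# and the crude "big word" filter are illustrative choices.
import re
from collections import Counter

OVERUSE_THRESHOLD = 3
word_counts: Counter = Counter()

def update_usage(card_copy: str) -> None:
    words = re.findall(r"[a-z']+", card_copy.lower())
    word_counts.update(w for w in words if len(w) > 6)  # crude "dramatic word" proxy

def banned_words() -> list[str]:
    return [w for w, n in word_counts.items() if n >= OVERUSE_THRESHOLD]

# Before generating the next card, the overused words go into the prompt, e.g.
# "Avoid these words, they are already overused in this set: ..."
```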

Thumbnail 2680

To further produce engaging, creative, and varied content, we also randomly select a few relevant historical card copies to include in our prompt as inspiration for our agent. This way the agent has some great examples from our history to emulate.
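That few-shot piece can be equally simple, for example random sampling from a pool of historical card backs; where that pool comes from is assumed here.

```python
# Sketch of the few-shot step: randomly pull a few historical card backs into
# the prompt as style examples. Retrieval of the pool itself is assumed here.
import random

def sample_examples(historical_copies: list[str], k: int = 3) -> str:
    picks = random.sample(historical_copies, k=min(k, len(historical_copies)))
    return "\n---\n".join(picks)

few_shot_block = sample_examples([
    "Card back A ...", "Card back B ...", "Card back C ...", "Card back D ...",
])
# few_shot_block is prepended to the writer prompt as "great examples to emulate".
```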

Implementation Journey and Business Impact: From POC to Production at Scale

When we actually started this project, AWS Bedrock was new to our team. The fastest way to learn a new service and technology is to use the console. So we did the POC entirely just in the GUI, no CDK, no deployment, just hands-on experimentation. I don't think this is a corner-cutting move. I think it really reduced a lot of the friction at the beginning.

Now we can just focus on testing the prompts, comparing model performance, and understanding the system behavior within hours instead of days. For a brand new service, this fast feedback loop is incredibly valuable when you are doing exploration and learning. Once the POC worked and we had determined the workflow, that was the time to move to the AWS CDK for the full deployment and enjoy all the benefits the CDK gives us, like repeatable deployments, multi-environment support, and scalability to other sports and brands.

We started the effort with baseball, but we also have products in soccer, basketball, and other entertainment properties we want to scale towards. We don't want to implement a new solution from scratch every time, because each of these properties has slightly different requirements, but in general their overall architecture stays very much the same. That's why we built a template with the AWS CDK. Now we can just deploy and expand our efforts to new licenses and sports with minimal effort.

Thumbnail 2800

The takeaway here is: don't be afraid to start from the GUI. I think that's a great place to start exploration and learning, and the CDK is for when you are ready to scale it up. Another challenge we face as a data science team on this project is, after we wrap up and build a nicely working backend solution for the business, how do we make it accessible to our business stakeholders? A lot of data science teams probably don't possess the skill sets or expertise in front-end engineering or UI/UX design.

Our choice was to build a clean and functional interface in Streamlit, purely in Python. But then we quickly realized we had another issue, because when people are writing the copy, they don't just write one. They submit a set of cards, like hundreds of players, and want to get the results all together, which means the UI is always waiting on long-running jobs to retrieve the results. The browser can time out, your laptop can fall asleep, your network may lose the connection, so that type of architecture is not stable for long-running jobs.

That's why we changed the design to be asynchronous. Now the UI, instead of waiting for the job to finish, is just the place where you submit the job. A worker then processes all the requests independently, and the UI just checks and monitors the progress and status of the job and displays the results progressively whenever the user checks back in. In this way, we finally built an application that is stable, scalable, and user-friendly for our business stakeholders, without needing a full front-end application.
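Here is a rough sketch of that asynchronous pattern with Streamlit as the thin front end: the UI only enqueues a job and polls a status table, while a separate worker consumes the queue and writes results. The queue URL, table name, and field names are placeholders, not Fanatics' actual resources.

```python
# Rough sketch of the asynchronous pattern: the Streamlit UI only enqueues a job
# and polls a status table; a separate worker consumes the queue and writes
# results. Queue URL, table name, and fields are placeholders.
import json
import uuid
import boto3
import streamlit as st

sqs = boto3.client("sqs")
jobs_table = boto3.resource("dynamodb").Table("card-copy-jobs")  # placeholder table
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/card-copy-jobs"  # placeholder

players_text = st.text_area("Player list (one per line)")
if st.button("Submit batch"):
    job_id = str(uuid.uuid4())
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"job_id": job_id, "players": players_text.splitlines()}),
    )
    st.write(f"Submitted job {job_id}")

job_to_check = st.text_input("Check job status (job ID)")
if job_to_check:
    item = jobs_table.get_item(Key={"job_id": job_to_check}).get("Item", {})
    st.write(item.get("status", "not found"))
    for copy in item.get("results", []):  # the worker appends finished copies here
        st.markdown(copy)
```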

Thumbnail 2910

Now let me tie all the components together for you to see how this solution actually works together. It starts with several predefined specifications. You need to provide a player list, what's the theme we're writing about, and any additional requirements. We do have a template for the initial prompt, but the editors can edit and add more specifications or requirements through the prompt.

This triggers our writer supervisor agent, powered by Claude Opus 4, which orchestrates the entire process. The writer agent starts by analyzing all the requirements. The first thing it does is go to the MLB historical copies knowledge base to retrieve a random set of relevant, great, human-written historical copies as examples. Then it goes to the player stats database to retrieve the most flattering, compelling stats for this particular player.

Then the writer agent needs to make a decision: given the requirements and all the information it has collected from these sources, is it enough to start writing? If the answer is no, the writer sends further, very detailed instructions to our web search agent, asking it to conduct deeper research on certain topics and return with more results.

Once the writer agent is happy with all the material it has already gathered, then it will generate the initial draft copy. And this draft copy will be sent to our QA agent.

The QA agent then does all the goodness we talked about: it checks whether the character limit has been met, how we are doing on adhering to the compliance book, and whether a particular word or sentence pattern is being overused. The QA agent has some capability to edit and modify to a certain degree, but if it decides the copy is just not up to the standard, it sends it back to the writer agent with feedback, and the writer agent kicks off another round of revision.
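Putting the loop together, the writer/QA handshake can be sketched as below, with a placeholder ask_claude standing in for the Bedrock calls and a round limit to stop endless revision; the prompts and the limit are illustrative assumptions, not the production agents.

```python
# Sketch of the writer/QA handshake: the writer drafts, the QA agent reviews
# against the trimmed compliance context, and the draft is revised until it
# passes or a round limit is hit. ask_claude is a placeholder for the Bedrock
# calls; prompts and the limit are illustrative.
MAX_ROUNDS = 3

def ask_claude(prompt: str) -> str:
    """Placeholder for a Bedrock Runtime converse() call."""
    raise NotImplementedError

def generate_card_copy(requirements: str, stats: str, examples: str, qa_context: str) -> str:
    draft = ask_claude(
        "Write a trading card back copy.\n"
        f"Requirements:\n{requirements}\nStats:\n{stats}\nExamples to emulate:\n{examples}"
    )
    for _ in range(MAX_ROUNDS):
        review = ask_claude(
            "Review this card copy against the rules. Reply PASS, or list the issues.\n"
            f"Rules:\n{qa_context}\nCopy:\n{draft}"
        )
        if review.strip().startswith("PASS"):
            return draft
        draft = ask_claude(
            f"Revise the copy to address this feedback:\n{review}\nCopy:\n{draft}"
        )
    return draft  # fall back to the latest draft for human review
```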

Thumbnail 3050

The whole solution has proved to be very valuable to our business. The first thing we saw is that we already receive 40% fewer edits from our QA team. This is very important because we found the QA agent follows the rules more consistently than a human can, and every little error we catch and eliminate in this process means one less revision cycle, which also contributes to time savings for the entire production. The entire system is also just so much faster than the weeks of manual process we used to follow, and it is a game changer for handling last-minute requests and changes, reducing the pressure on our editorial team.

We also see significant content production savings from this solution, because AI-generated card copy costs very little. But beyond the cost savings, what this really means is that our editorial team now has the time to focus on high-value creative work, like crafting premium content for special editions of cards, brainstorming other innovative products we can provide, and so on. That's also where I see generative AI really bringing value: it can take over the mechanical, repetitive tasks we probably don't enjoy that much, and let us get back to tasks that require creativity and humanity. So we can all be the artists we always have been and just didn't have the time for. Thank you for your time; back to Shiva.

Thank you, Celia, and thank you, David. I found this really compelling, and I hope you did too. I think it's remarkable to see how the consumer AI that you and I use every day is now starting to make its way into enterprise AI. The themes that I mentioned at the beginning, AI moving from peripheral to core retail functions and reimagining the entire business process using AI, really came out in these two examples. I also really like how they shared not just the end result but the journey, because hopefully the most valuable thing for you all to take back to your business is the learnings: you don't have to go through the same process, you can anticipate and get ahead of it.

Thumbnail 3230

And for that I just want to leave you with this mental model of how we think of AI transformation in the retail business. It may look like a sequential journey, but of course it's anything but sequential. I think the transformation is a function of the organizational maturity, the complexity of the business process that you're trying to transform, and the business sponsorship. So in some cases it may be important to start small, show the pilot results, and scale, whereas in other cases you may want to really step back and reimagine that entire business process with AI as a team member.

Thumbnail 3310

And then I want to leave you with this last thought. We all have this question: if AI takes over, what happens to the human roles? I truly believe it's about changing the human role from doing things better to doing better things. I know we have a few minutes left and I saw a few hands being raised, so we are going to be available at the side of the podium. Please come up, and we're happy to answer questions. Please also take a moment to fill out the survey in the app if the session was valuable; that's really important for us to bring similar sessions to future events, including re:Invent. Thank you so much for your time. Enjoy the rest of re:Invent and re:Play.


; This article is entirely auto-generated using Amazon Bedrock.
