Kazuya

Posted on Dec 6, 2025 • Edited on Dec 8, 2025

AWS re:Invent 2025 - Beyond the Hype: Delivering Measurable ROI with Generative AI on AWS (AIM231)

🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.

Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!

Overview

📖 AWS re:Invent 2025 - Beyond the Hype: Delivering Measurable ROI with Generative AI on AWS (AIM231)

In this video, a speaker with 20+ years of industry experience discusses Generative AI implementation challenges and best practices. They address a headline claiming 95% of GenAI projects failed to show measurable ROI, clarifying that 40% still reached production and 67% succeeded with expert help. Key issues include choosing poor use cases and lacking internal expertise. The speaker traces GenAI history from 1906 to present, emphasizing that ChatGPT's 2023 breakthrough was simply making models accessible via a website. Major topics include chatbots as primary interfaces, image generation limitations, hallucination rates (40-45% for GPT-5, dropping to 6% with internet connection), and the importance of RAG databases. The speaker cautions against overusing agents when prompts suffice, noting agents are slower and more expensive. Real-world use cases are shared, including IDP solutions for insurance underwriters that reduced processing time by 50%, and IVR systems using Nova Micro, Connect, Lex, and Polly. The company has built 175+ GenAI projects over two years.

; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

The Reality Behind Generative AI ROI: Separating Hype from Implementation Success

Good morning. I appreciate the warm welcome. Can you all hear me out there? I'm excited to be here today to talk about Generative AI. Let me tell you a bit about myself. I've been in this industry for over 20 years, which means I've watched the terminology evolve significantly. We used to talk about NLP much more prominently, and now it's all Generative AI. Everyone might remember discussions about image registration versus computer vision and all those related topics. I've been working on this for a long time, helping companies develop these capabilities.

We work with AWS and have won numerous awards. We've been runners-up in Generative AI consulting for the last two years and have received other accolades as well. Our mission is to help you succeed. We've built over 175 Generative AI projects over the last two years. We were the first SI partner to get access to Bedrock, so we've been building on Bedrock for 2.5 years. If you have questions, please feel free to find us at our booth right behind this in the Dev Center. I'm excited to continue the conversation we start today.

How many of you saw this article? How many of you actually read any of these articles? That's better than usual. I love headlines nowadays because they're often clickbait, designed to get people to read them. But what did the article actually say? It did say that 95% did not see measurable ROI, but what does measurable ROI actually mean? It doesn't mean they didn't see ROI. They just didn't see something they felt had a big return, like a 2 or 3X return. It might have only had a 1.1X return, but they still saw value.

What's interesting is that 40% of them did put this into deployment. So even though the article said only 5%, you still saw 40% put into production. The number I really like is that 67% of those deployments were done by companies like Mission that were able to come in and help companies build solutions because we have the expertise to help you. One of the issues the article highlighted was that people chose bad use cases. That's the one thing you should always be looking at: what use case am I solving? AI isn't the problem most of the time. AI does what you tell it, but you might be choosing a bad problem to solve. You might be looking at something like building images for your marketing team, and then you have a hard time quantifying the value because you're only saving a little bit of money.

People who were looking at better use cases, automating routine tasks that weren't particularly interesting, were able to see much more ROI. Having internal experts is important, but it's not sufficient. You also need to be working with people who understand the technology. You've probably seen this over and over, especially with coding tools: this is not just a technology change, it's a cultural change. How many of you have started to look at Cursor as a developing platform? A couple of people. Cursor is really pushing on spec-driven coding. Most of the time when you're coding, people just start building their application without thinking through all the steps. But now as you start using these coding agents, you have to think about what each step needs to be. You need to look at it really early, so your developers need to start thinking more like product managers. You're going to see this real cultural change that has to happen inside these organizations.

Let me give you a little summary. We're going to talk about who we are, which I've already mentioned. We do a lot of AWS work with tons of competencies across all the different spaces. We are a one-stop shop where we provide costs, resell, 24/7 support, and professional services. If you're looking for any of those, feel free to come and talk to us. Now I'd like to talk a bit about Generative AI. Who thinks Generative AI is new? You can see from my slide that I don't think it's new. It really dates back to 1906. Who thought it went back over 100 years? Nobody thought it went back that far. That was probabilistic text generation in 1906, using handwritten algorithms to figure out what text should come next in a sentence. Then you had a lot of early concepts in the 1950s with rule-based systems. You started to see early neural networks, but then we had the AI winter. In the 1980s, everyone had personal computers and you started to see clusters in universities, so neural networks were pushed further and further again. Then in the early 2010s, research on GANs started, and a lot of it was really enabled because the cloud had arrived. Now we had essentially infinite compute and storage.

That's really where you started to see all of these innovations emerge. In the early 2020s, we were doing GPT-1 stuff, and then GPT-2, with various things being done. Now everything is about agents. I used image generation to create a pretty chart for the timeline. Does the timeline really work? Did anyone notice that great timeline? I then tried to prompt it better, thinking I could improve it, but I actually made it worse. How many of us love prompting? I really like a good timeline, but I have to say, a timeline that has the 1500s mixed in with the 1900s is a really powerful tool here. So I'd like to ask this question: What innovation happened in 2023 that made everyone excited about GenAI? Anyone want to shout it out? ChatGPT. What was ChatGPT? Anybody? ChatGPT was someone made a website. That's it. These models existed before, but they were only accessed by people like myself, data scientists who would spin them up in SageMaker, or maybe you were using Hugging Face at the time. But now you have a website that fronts a model that you can talk to. They made it intuitive. It's cool, it works. The innovation, when you really think about it, is that someone just made it easier to access a model. That's the whole craze that started—it's now easy for anyone to think about it, look at the ideas, and understand where you can go from there.

For us, we have our own chatbot. Everyone probably has a chatbot now. You can make them super easy. Bedrock even has OpenSearch, and you can create a chatbot and ask all kinds of fun questions to it. I have a little joke going on with my own chatbot. But why is it a chatbot? It's really because that's the easiest way to do back and forth conversations. You're really looking at how to simplify the interactions, because everyone wants to be able to talk to it and get information out. Now what we've noticed is that people confuse everything with a chatbot because they want to talk to things. They may want to have traditional machine learning models. You still may want to do clustering or prescriptive analytics. But all that people want to say is, "I just want to talk to my data" or "I want to do this." Everything gets buried behind a chatbot. So even though someone's coming to you and asking about GenAI, often it's just an interface. Your chatbots are often just your interface, and now chatbots are going to kickstart agentic workflows. The other thing we see a lot of is GenBI. How many of you have used QuickSight Q or QuickSuite or any of those tools? A couple of people. They're really nice, but they do have limitations. People are using generative systems with cool React libraries to populate all of your graphs, and then you really get into pushing an LLM to write code. There's a lot of really fun things there, but at the end of the day, everyone is looking at how they can just talk to the system. That's why chatbots have become our main interface.

For a minute, I was playing with images, as you can see. I asked for a monkey astronaut riding a bike on the moon. How did we do? Well, it did mostly good. There is a moon in the background if anyone didn't catch that. Early on, it was really bad at letters, but it actually works pretty well with letters now. Then we get into the bad. Anybody have a guess on what I wanted the system to do? Any guesses? Tower Bridge, good guess, but I asked for an AWS architecture diagram of Glue connected to Redshift connected to QuickSight. How did we do? Did we do good? Did we do bad? This woman is very smart. She says we did good, and we did. I asked for an architecture diagram, and it made buildings because I said architecture. I said Glue, and it glued them together. How many people know what Redshift is? Sunset, right? It made a sunset. It did exactly what I asked it to do. It just didn't understand that I needed a diagram with boxes of SageMaker and things like that attached to it. But it did exactly what I asked. That's one of the bad parts about GenAI—it will always give you an answer. It's not necessarily the answer you want or need, but it will give you an answer.

And so you as the individual have to be looking at the answers and you have to come and understand what am I trying to do. Now the ugly. We were doing an event in July and we wanted Rudolph the Red-Nosed Reindeer. Now these are some older Genie images, but I use them for an example because the new ones will create Rudolph the Red-Nosed Reindeer, but before they wouldn't. And we went through a lot of prompt iterations. There's like 30 pictures in the series, so I'll only show a few. And I was like, give me a reindeer with a red nose and nothing. Then I was trying to go closer and it was like, give me a reindeer with a clown nose, and it gave me a balloon with a clown balloon. So it just didn't really get it right.

Now, I was giving this talk a while back and someone goes, do you know what the biggest problem is with your images? And I said, no. I'm sitting there thinking, hey, I'm going to win because it's not Rudolph the Red-Nosed Reindeer, right? And so this guy's like, do you know the real issue? And I don't know how many of you guys know, but this is a white-tailed buck deer. It's not even a reindeer. So if you're asking questions to GenAI and you don't really know the answer or what you're expecting, you can't even validate that it's giving you the right answer because to me I was like, oh yeah, that looks like a reindeer, it just doesn't look like Rudolph.

So it's really important as you're building these systems that you can understand how do I validate it? How do I prove out that it is giving me the answers I want? How am I going to write unit tests? How am I going to make cases that just validate that yes, the answer is coming out, especially when you're going to deliver a system to people that may not know what the answer should be because it will hallucinate. How many of you guys get inundated with all the news, right? Even last week was crazy, right? There are like 10 new announcements. Everything's giving you whiplash.

The advice we often give to people is choose a model that works for your particular use case, and each use case may have a slightly different model that you're using, but choose one family, and then you just keep using that family because you're going to gain expertise. They're all like coding languages, right? You don't want your coders to be switching from Python to .NET to everything because they're just not going to be as adept at getting the answers as if they just always are working in the same family and every model's always leapfrogging the next, right? So wait two weeks and whatever your favorite model is, is going to be the top model on the leaderboards. So you don't have to always do that model hopping.

The other thing that's always kind of interesting to me is, did you guys read any of the reports when OpenAI launched GPT-5? Anybody read the reports? So OpenAI said that GPT-5 hallucinates 40 to 45 percent of the time. Anybody know that? Hallucinations are astronomical in these systems. Now, if you connect GPT-5 to the internet, which is why OpenAI has GPT-5 connected to the internet, the hallucination rate drops to 6 percent. And so that's why. If you're building a model inside of AWS, you need to be either attaching it to the internet or putting it with a RAG database so that you can push your hallucinations down, right? If you're not doing that, you will be looking at a high hallucination rate depending on the questions you're asking. So it's really important to think about that when you're building out these systems.

From GenAI Agents to Real-World Use Cases: IDP, Chatbots, and Practical Applications

GenAI agents, is everyone tired of GenAI agents? No? Wow, I'm tired of GenAI agents. So agents are fun, right? It's going to go do everything. I've got an agent there. He's my group leader. He's got all my other sub agents. I'm going to send them out. I'm going to do all kinds of cool things. We're going to have some autonomous goal achievement, and we're going to decide how to answer all of our questions, right?

Now when you look at agents, it's really important to understand your use case because I feel, especially, you know, most engineers I talked to have just gotten lazy. They're drawing a diagram that's just like agent to do this, an agent to do that, an agent to do this, right? They're not even really putting in the thought like what does each of those agents do? Is it actually an agent or is it a prompt? My team, there's a lot of people and we have conversations all the time. Is that a prompt or is it an agent? And there's a lot of things between the two of those, right?

Like if I go to an agent, I have to build the framework, although there's agent core right now to do the framework to make it easier. But you still have, when you're doing agents, delays in your time. So if you have time critical matters, often you're going to want to just run with prompts versus agents because you're going to get the answer back faster through just a prompt and most of the time you can prompt most of the things you're trying to do that an agent can do. And so that's things to look at is what's my use case? Do I really need an agent? Agents are also potentially going to be more expensive and they take longer.

You really have to think about what your system is and whether you're truly doing something that's an agent or not an agent. We've done a lot of use cases, as I mentioned—tons of cool stuff. I won't get to talk about too many of these, but we've done everything. I haven't seen anything new in at least six months. So if you guys have use cases you want to ask about—whether something would be possible or not—we're here. We're right on this side of the CrowdStrike booth. The biggest use cases we see in the market right now are IDP, about 60% of the market's doing IDP stuff, which is a really cool space. Then you've got chatbots, as I mentioned. But then there's everything else: code generation, recruiting, drug discovery. There are cool things you can do with chatbots where you can do training—educational training or other training of your employees. If they're going the wrong route, you can iteratively change how the chatbot responds.

We had that built in where we were working with a company that does medical training for new doctors. If the doctor was going down the wrong path of questioning, the agent would then get agitated, and the patient would then get agitated and start saying things like, "Hey, you've asked me that already." So there are really cool things you can do. Document translation has been big, with all kinds of cool stuff. If you really want to learn about chatbots or have us build you a chatbot, we have a lot of really cool fast-track packages that we can help you guys build. These are things we've done day in and day out. They're really cool and quick. You'll see a lot of this week—something to look at and talk about.

I'll skip through IDP for a second. This was a really fun use case for IDP. We're working with an insurance company, and their underwriters get applications all the time. The underwriters were spending four to five hours on the system. This company gets ten to fifteen thousand applications, and they were growing. They just got bought by a bigger insurance company, so they can't hire fast enough. That's one of those use cases where you look at how much manual work there is and how you're going to solve that problem. If it's mostly hiring and training somebody, then you can probably automate it. That's a system to look at and automate. You can build out IDP really cool, and then you can put everything into a database and ask questions to it. Now the underwriter gets results, and if he asks questions where he wants further answers, he can then ask about the application to do his work. It was really neat. We were able to take their time from four to five hours down to two hours, saving them 50%. There are massive savings out there when you look at these chatbots. This was just the first phase. We're looking at how we can help continue to improve this to get it down even further.

Another use case is retrieval augmented generation. I'm sure everyone knows retrieval augmented generation. Here's a pretty standard architecture: some kind of interface connected to an LLM connected to your agents at the end of the day. We were working with a company, and this is where we get into all those timing bits. This is an IVR system. You have to think about somebody on the phone calling and talking to your bot, so it has to be fast. I don't know if you guys have used Nova, but Nova Micro is actually pretty awesome. You can sort through really quickly with Nova Micro on the inference side and get to an answer. This was the architecture for an IVR system. We used Connect, which we all love, and Connect Lex Polly to do the text-to-voice. There are a lot of really cool things going on. I have way too many slides, but we do lots of cool stuff. We could have talked about agents and cool things.

On agents, I'm giving a talk with AWS in about two weeks about agents. Anyone that was here should look for a talk about agents. It's a pretty cool use case with lots of fun architecture at the end. Agent Core is something we work with, and I have way too many slides. If you want to work together, you can find us in the booth in the back. I put out a newsletter every week where we talk about fun, cool stuff. If anyone wants to hear my random thoughts, that's how you can hear it. I appreciate you guys all attending today, and they're going to give me the hook in nineteen seconds. If anyone has a question, I'll be over at our booth for the next thirty-ish minutes and happy to talk through anything you guys want. Thank you, everyone.

; This article is entirely auto-generated using Amazon Bedrock.