Kazuya

AWS re:Invent 2025 - A leader's guide to AI-powered FinOps (SNR306)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - A leader's guide to AI-powered FinOps (SNR306)

In this video, Katherine Graham and Chris Hennessy from AWS explore both AI for FinOps and FinOps for AI. They share insights from conversations with over a dozen company leaders, revealing that only 10% have scaled AI for FinOps. Key themes include building strong data foundations, buying proven solutions, and keeping humans in the loop for validation. They present case studies showing 20x and 100x increases in user adoption through chatbot interfaces and Slack integration. The speakers emphasize that traditional cloud cost planning will leave organizations 3-5x short for AI budgets, as hidden costs like data preparation consume 30-40% of resources. They introduce a GPU utilization framework using a 2x2 matrix of usage versus criticality, and discuss ROI measurement challenges including the productivity paradox and baseline absence. The presentation concludes with a crawl-walk-run maturity model spanning base camp (foundation), climb (enablement), and summit (automation) phases for both AI for FinOps and FinOps for AI implementations.


This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Introduction: Navigating AI Costs Through Personal Experience and Universal Challenges

In the last year, I've been approached by people asking questions about their AI costs that I wasn't prepared to answer. For some of you in the audience, this might be a CFO asking about the ROI on model training, an engineer asking for budget approval for GPUs, or a board member asking about the cost of your AI strategy. Whoever is asking and whatever the question is, hopefully you'll be happy to hear that you're not alone if you're feeling unprepared.

My name is Katherine Graham. I lead the OPTICS team, which is a FinOps team at AWS. My name's Chris Hennessy, and I'm on the Executive in Residence team. I'm really excited to be here today. Before joining that team five years ago, I was a technology CFO at Capital One for twenty years, so I'm excited to share insights from that experience and from the content we created for today.

Today we're going to talk to you about AI for FinOps and FinOps for AI, in the hope of educating and enabling you. In preparation for today's presentation, Chris and I spoke with over a dozen leaders from different companies about their AI and FinOps journeys. In doing so, I was shocked to find parallels to my personal life. So I want to start with a quick personal story, and then we'll get started.

Four years ago, my husband and I were planning for our second child, and some of you in the audience may recognize this misplaced confidence, but we were going in thinking, okay, we've done this before. How hard can it be? It can't be that different. Fast forward about two weeks after my son was born, we found out he was deaf. All of a sudden, information is flying at us at a rapid speed. Lots of unknowns, lots of uncertainty. To say we were overwhelmed is an understatement.

But we had people in our corner—subject matter experts, a community—helping us with the transition and helping us make decisions. Why am I sharing this with you? And what on earth does my son being born deaf have to do with AI or FinOps? It was in the conversations that we had with these leaders that we saw not just a parallel into expecting the unexpected or those unknowns that I mentioned, but in the advice that leaders were giving to people just starting this journey.

Thumbnail 160

Thumbnail 170

Thumbnail 190

Thumbnail 200

So I want to share four of those with you, and then we'll dive in. So first, you're not alone. You may feel like you are, but you're not. Second, focus on the real problem. I was focusing on the fact that my son was deaf. That wasn't the problem. The problem was communication. If you can focus on real problems that your customers are having or real pain points your teams are experiencing, you will deliver value.

Third, don't overcomplicate it. FinOps 101—cloud financial management basics—those principles still apply. The variables are different. And then last, never stop learning. I thought we had the proper tools in place for kid number two, and we had a decent foundation. But we still had to learn new and different things. We had to learn a completely different language. And the same thing is true here.

The Perfect Storm: Convergence of Cloud Growth, AI Expansion, and FinOps Evolution

This isn't just parenting advice, but our hope today is that after you leave, if that CFO, board member, or engineer comes up and asks you about AI costs, you'll have a better answer. So let's get started. As we look at the elements of what we're going to cover, there's a perfect storm, a convergence of activity, happening. I think all of us see how much cloud growth is occurring. If you look at any cloud provider, it's in that twenty to forty percent per year range, but AI is growing at a much faster clip, and that's across all of the ecosystems and all of the players.

Thumbnail 270

The other thing we see as we go through this is that there's a lot that's changing—new nomenclature and dynamics you need to learn as you go through this, specifically on the AI side. So understanding some of the terms, the dynamics, and how that fits into the FinOps practice is something we're going to cover here today. And then lastly, you need to focus on both sides of the equation. This isn't all about cost for cost's sake. It's about the value articulation, and that's an area that we see as a blind spot that came up a lot in our customer conversations.

Thumbnail 290

Thumbnail 300

So today we're going to look at both sides. It's almost two sides of the coin. One is how can you leverage AI to enable FinOps, and then specifically how do you bring FinOps into AI and the capability that exists there. So as we start out, there's this convergence that happens, and there's a good overlap that exists. There are three key areas we saw as we had conversations with customers. The first is on intelligence.

I think a lot of us have leveraged AI in both your personal and professional life as a way to help with interaction and prompting and engaging. It's a way to provide answers and insights to the information that exists. Automation is a big deal within FinOps. I know most of the companies that I meet with have FinOps teams that are usually one to three people, even in large enterprises. It's really trying to scale through the engineering community and help them self-sustain. So automation is a key element of that. And then lastly, it's all about getting scale and democratizing this to the organization.

A lot of the use cases we're going to share that we learned from customers involve finding ways to scale and get more reach and benefit and insight to those who need it inside of the organization. One thing that's important to remember is you don't need all three on day one or even in year one to add value. The importance is remembering that the value lives in the intersection. And that's when AI stops being just an experiment and starts being a business enabler. For FinOps, it stops being about just cost control and starts being about proactive optimization.

So just to set up the next forty-five minutes or so, we're going to start with AI for FinOps. How are we using AI to transform cloud financial management? What are the challenges? How are leaders approaching it? What's actually working? Then we're going to jump to FinOps for AI. How do you budget for something that's constantly changing? How do you determine a return on investment or build a strategy? Then we'll wrap with tips, advice from leaders, and some educational recommendations.

Thumbnail 410

Audience Insights: Where Organizations Stand in Their AI Journey

But before we do that, can everybody get out their phone? I can't believe I'm actually saying that we're going to roll the dice on this. You know, when you bring out a survey, it can either go well or not, so we're hoping it goes well. We'd love to get insight from those of you in the room. The first question you'll hear and see up there is: where is your organization in the AI journey? There are four options. You can only vote once, but we'd love your input, and we're going to share the results here in a minute so everyone can see.

Thumbnail 440

I'm going to keep it up there just for a second, and then there's another one. Where do you think most organizations are in their AI journey? And as you're doing this, I just want to mention that several months ago when Chris and I started planning for this presentation, we decided from the beginning that we wanted this to not just be cloud agnostic, but to reflect the reality of what our customers are going through and to include action items that you all can take as soon as possible. And so that's why we have these conversations, to check our bias and hopefully what we learned and share with you today is helpful.

Thumbnail 480

So let's see if this works. Before I show the results, I'd love your estimate on the first question, if you could go back. I know people are taking pictures. What do you think the predominant answer is? Not to give the answer away, and not that there's a right answer, but I originally thought it would be piloting specific use cases. Did you think something different? I thought the same as well, but we're going to try the toggle. Here are the answers. Let's see if it works.

Thumbnail 500

Thumbnail 510

All right, almost three hundred of you responded, and it's leaning a little toward the lower end: piloting specific use cases takes the majority. I'm like, smart group. That's what I thought too. All right, you want to go back to the second question? If anybody hasn't answered, feel free to answer that. The fact that it's been smooth so far is nice, knock on wood. Did the second survey work for everyone? Yes, it did. I see it now.

Thumbnail 540

Thumbnail 550

We only have one QR code back to back, so on this one, let's look at the results. Where do you think most organizations are? It's in the same wheelhouse, a similar vein, skewed a little to the left. Perfect. And as any good CFO, as you'll see throughout the presentation, I love data to help inform decisions. So as you look at this chart, and again we're specifically focusing on AI for FinOps here on the front end, what do you all think it represents? Any guesses? How many organizations are doing AI for FinOps? That's a very good estimate, a good forecast there.

Thumbnail 580

So as you look at it, of all the companies we talked to, only about ten percent were using AI for FinOps at scale. All the examples we're going to talk through in the use cases are very early in the journey. So when you think about that poll from earlier, I think that's where a lot of customers are. I meet with hundreds of customers in a year, and a lot of them are looking for use cases and proof points. They want to see and understand where they actually see value.

Thumbnail 620

AI for FinOps: When Not to Use AI and How Leaders Are Building Foundations

Versus maybe just feeling like they have to do AI in everything, which I know is how a lot of customers feel. So the first topic I'm going to talk about is when not to use AI, and specifically generative AI. I know there's a lot of pressure in the ecosystem to use AI everywhere, but when we met with customers, it was very clear that for FinOps there are often better, and especially more cost-effective, solutions for certain capabilities, particularly forecasting. Many of you have probably done forecasting, and there are machine learning and other analytical capabilities that can do a better job at lower cost. So we thought it was important to frame out the use cases for when not to use AI.

Thumbnail 660

Thumbnail 680

As you look at this, especially on the financial side, the difference between probabilistic (generative AI) and deterministic approaches drives a big difference in what you can use and when you can use it. This decision tree gives a little insight into what to use when. We're going to dive a little more into the specifics of what customers shared on this slide as well. Next, as you look at the capabilities behind some of the use cases, forecasting is obviously a great one. I've spent some time recently trying to use generative AI to help predict patterns. I think agentic AI will do a better job of this than generative AI alone, but a lot of great services exist, both within Amazon and from partners, that leverage more traditional analytics and do a really good job of it.
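To make that probabilistic-versus-deterministic framing concrete, here is a minimal routing sketch. It's our illustration, not the speakers' actual decision tree, and the task attributes are assumptions:

```python
# Illustrative sketch (not the presenters' tool): route a FinOps task to a
# technique family based on the probabilistic vs. deterministic distinction.
from dataclasses import dataclass

@dataclass
class FinOpsTask:
    name: str
    needs_exact_numbers: bool   # billing math, chargeback, committed forecasts
    is_language_heavy: bool     # Q&A, summaries, natural-language explanations

def recommend_approach(task: FinOpsTask) -> str:
    if task.needs_exact_numbers and not task.is_language_heavy:
        # Deterministic work (cost allocation, commitment math) should not be
        # delegated to a probabilistic model.
        return "deterministic pipeline / traditional ML (e.g., time-series forecasting)"
    if task.is_language_heavy:
        return "generative AI (chat, summarization, natural-language queries)"
    return "start simple: rules or analytics; add AI only if it earns its cost"

print(recommend_approach(FinOpsTask("monthly forecast", True, False)))
print(recommend_approach(FinOpsTask("explain this cost spike", False, True)))
```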

Thumbnail 730

You also need to balance what's the value expected versus the cost that's incurred. That framing is really important as you go through this. So we just wanted to make sure and start out the conversation that the answer for everything is not AI. There are different solutions that can be applied to this, and it's something you need to keep in mind as you go through it. So the reality is that there are challenges, and while these aren't all-encompassing, I want to touch on a few that made a repeat appearance.

We've got data quality and preparation. This came up in every single one of our conversations. This is the hard part, not the AI, not the tools. Most customers are doing this, but they're doing it retroactively, which can be painful and expensive. Then we've got multi-cloud. Most organizations are running multi-cloud, and if you're doing that, you've got different billing constructs and different terminology. For AI to be useful, you have to normalize that data into a common framework, as sketched below. Then we have what I call the trust gap: hallucinations, AI reliability. One leader mentioned that their system recommended twelve million dollars in savings by buying DocumentDB Reserved Instances. It sounds good, but those don't currently exist. AI has the ability to increase productivity ten times or more, but if we're not careful, it can create more work or, worse, cause distress for our customers or our teams.
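As a rough illustration of that normalization step, here is a hedged sketch. The target column names are simplified stand-ins (a real pipeline would likely align to the FinOps Foundation's FOCUS specification), and the toy rows are invented:

```python
# Hypothetical sketch: normalize two providers' billing exports into one schema.
import pandas as pd

AWS_MAP = {"lineItem/UnblendedCost": "billed_cost",
           "product/ProductName": "service_name",
           "lineItem/UsageStartDate": "usage_start"}
AZURE_MAP = {"costInBillingCurrency": "billed_cost",
             "meterCategory": "service_name",
             "date": "usage_start"}

def normalize(df: pd.DataFrame, column_map: dict, provider: str) -> pd.DataFrame:
    out = df.rename(columns=column_map)[list(column_map.values())].copy()
    out["provider"] = provider
    out["usage_start"] = pd.to_datetime(out["usage_start"])
    return out

# Toy rows standing in for real exports.
aws = pd.DataFrame({"lineItem/UnblendedCost": [12.5],
                    "product/ProductName": ["Amazon EC2"],
                    "lineItem/UsageStartDate": ["2025-11-01T00:00:00Z"]})
azure = pd.DataFrame({"costInBillingCurrency": [8.1],
                      "meterCategory": ["Virtual Machines"],
                      "date": ["2025-11-01"]})

combined = pd.concat([normalize(aws, AWS_MAP, "aws"),
                      normalize(azure, AZURE_MAP, "azure")], ignore_index=True)
print(combined)
```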

Thumbnail 840

Thumbnail 850

Skill shortage is another challenge. Most people aren't great at writing prompts, and this doesn't even begin to touch on the technical gap that compounds this challenge. Then there's integration: the people, the processes, the systems, the operational friction points that can add up and demotivate people from using the systems that may actually help them. I'm sharing these to let you know that they're universal, not to discourage you, but to show you how leaders are approaching it. We had a lot of conversations, but these are the four main themes that came out: build the foundation, all about the data; buy proven solutions, focused on the tooling; enable with guardrails, the culture; and then the process, experimentation and iterating.

Thumbnail 870

Thumbnail 890

Let's dive in a little bit deeper on what these look like. So build the foundation, data first. This is fundamental. This is what I call table stakes. No amount of AI can compensate for a poor data foundation, and you'll hear me say that multiple times today. Buy proven solutions. There was a time when buy versus build was a real debate, but it seems that in AI for FinOps, it's mostly resolved and it's buy. The landscape is moving so fast. One interesting thing that we did find out is that customers' commitment timelines are changing from annual to quarterly or even monthly, so customers can keep up with that fast-paced change.

The next trend we saw was around culture and enablement: this notion of experimentation and the freedom to test, have fun, and see where it can take you, while ensuring guardrails exist. I was recently in Mexico City, and I meet with a lot of banks given my banking background. They're obviously very worried, as most are, about security and data, especially customer information. They don't want that being used to train LLMs; they want to retain it in-house. While they were building the foundation to make LLMs available to their workforce, they took a lot of pride in not yet giving their internal associates access to public LLMs. As they went through that, I asked, okay, have you set that expectation for your workforce? They said yes, it's very clear. Then I literally walk out of the conference room and someone's putting company information into the public ChatGPT, and I'm cringing. So you have to have guardrails and expectations in place. That doesn't mean every associate can do whatever they want. You have to be clear about those guardrails and the culture you want to create.

Thumbnail 960

The last one is around learning first, this notion of experimentation and starting small. We get a lot of questions in the job that we're in around how do you get started, how can you even begin to experiment using AI for FinOps. I think the whole mantra we've all heard, which is start small, think big, is a good mantra here. I always try to focus on a pain point in an area where there's an opportunity. There are varying ways to do this, and I'm sure you've all heard many of these. How do you have promptathons specifically locally for people to learn about that? How do you encourage peer-to-peer connections with maybe someone who's more advanced and someone less advanced? How do you create the environment and the time? It's usually about time trade-offs that come up with customers of being able to learn and experiment as a part of using this.

Thumbnail 1010

Case Study: Large Bank's Generative AI Chatbot Drives 20x Increase in Reach

The first case study, and we're going to do three or four of these, was at a large bank and how they were driving their AI evolution. The challenge they were having, just like many of us, is that you needed somewhat deeper knowledge to answer questions from the product owners and the engineering community. Some of the harder questions took more time, and there was only a small pool of analysts who could actually support this. So they were looking at how AI could help them solve it.

Thumbnail 1030

Thumbnail 1050

They focused first on one thing: could they put a generative AI chatbot on top of the data to enable people to self-serve? They knew it wouldn't answer every question, but they were hoping to pair that data with historical analysis they'd done, using RAG (retrieval-augmented generation) to bring that insight to bear. When they started rolling it out, it caught on like wildfire. They saw a 20x increase in reach, specifically in adoption of the tool. They enabled people to self-serve, and it took a very small investment to build. The customer quote was: this isn't about just giving people more data, it's about providing answers and insights.
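The talk doesn't show the bank's implementation, but the pattern it describes, RAG over internal cost data, might look something like this minimal sketch. The model ID, the stored notes, and the keyword retrieval (a stand-in for vector search) are all illustrative assumptions:

```python
# A minimal RAG-style sketch: retrieve relevant cost context, then ask a
# model to answer in plain language. Requires AWS credentials to actually run.
import boto3

COST_NOTES = [  # in practice: embeddings over reports, dashboards, past analyses
    "2025-10: EC2 spend rose 18% MoM, driven by the ml-training account.",
    "2025-10: S3 spend flat; lifecycle policies added in September.",
]

def retrieve(question: str) -> str:
    # Toy keyword retrieval as a stand-in for a real vector search.
    hits = [n for n in COST_NOTES
            if any(w in n.lower() for w in question.lower().split())]
    return "\n".join(hits) or "\n".join(COST_NOTES)

def ask(question: str) -> str:
    prompt = (f"Context from our FinOps data:\n{retrieve(question)}\n\n"
              f"Answer the engineer's question concisely: {question}")
    client = boto3.client("bedrock-runtime")
    resp = client.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model choice
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return resp["output"]["message"]["content"][0]["text"]

# print(ask("Why did EC2 spend go up in October?"))
```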

Thumbnail 1070

Thumbnail 1080

So this is one of the use cases that we saw from a customer interview that we were really excited to see being put to use, especially for the group that was in scale. Let's look at the value chain and what it looks like in real life. It starts with the raw data and metadata. This is not just your cost and usage reports, this is context: business events, seasonal patterns, organizational structure. Then you've got your data normalization and quality. This is when you're translating across clouds. This is invisible to the end user but critical.

Then you have AI processing and analysis. This is when the machine learning is actually doing the work. Insights and human review: do you want to be confident in the decision or the action? Because if you do, you want explainability or human review in there. If you've got a system that says, hey Katherine, let's right-size 500 instances: okay, why? What's the risk if I do? What's the risk if I don't? Is confidence high or low? If it's low, let's get a human in there. If it's high, let's kick it over to the developer's queue for approval.

Then you have automated and semi-automated actions. I say semi-automated because even the most mature organizations are keeping humans in the loop. And then last, outcome measurement and learning. Did optimization work? Did it save costs? Was performance stable? You want to feed what you learned back into the system so AI gets smarter with time. You'll notice it's a loop and not a straight line because the actions should train the system and the outcomes should continuously refine the model.
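A minimal sketch of that confidence-gated loop might look like the following; the threshold, queue names, and recommendation fields are invented for illustration:

```python
# Sketch of confidence-gated human-in-the-loop routing described above.
CONFIDENCE_THRESHOLD = 0.85  # assumed value; tune per organization

def route_recommendation(rec: dict) -> str:
    """Decide whether an AI recommendation needs a human before any action."""
    if rec["confidence"] < CONFIDENCE_THRESHOLD:
        return "human-review"          # low confidence: analyst checks explainability
    if rec["blast_radius"] == "high":  # e.g., right-sizing 500 instances at once
        return "human-review"
    return "developer-approval-queue"  # semi-automated: a human still approves

def record_outcome(rec: dict, saved_usd: float, performance_stable: bool) -> dict:
    """Close the loop: feed results back so the system improves over time."""
    return {**rec, "saved_usd": saved_usd, "performance_stable": performance_stable}

print(route_recommendation({"confidence": 0.60, "blast_radius": "low"}))   # human-review
print(route_recommendation({"confidence": 0.95, "blast_radius": "low"}))   # developer-approval-queue
```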

Thumbnail 1180

So there were three themes that popped out in the conversations, and it's from most popular to early stages of development on the other side. The no-brainers are the conversational interfaces. These are the use cases that probably you've seen grow in the last year as generative AI has been fully adopted. How do I use our internal information along with leveraging a large language model to enable capabilities and reach that exists? There are a lot of good examples of customer questions that exist there. The middle is where things are evolving and it's growing much faster, which is how can I have either potentially an agent work on your behalf or escalate and elevate exceptions.

There are many great anomaly detection solutions available, including those from AWS, but this approach lets you start applying these capabilities internally to other datasets beyond just your cloud usage data. The last stage is everyone's goal: how can you automate some of this? How can you enable autonomous agents to act on your behalf? I've been close to how this is evolving in both the startup and large-company ecosystems, and it's coming quickly. The pace of change is rapid. This was an area we didn't see much of in the customers we met with, but there's significant discussion about the development work they're doing and where they're beginning to focus.

Thumbnail 1260

Case Study: SaaS Company Achieves 100x User Growth by Meeting Engineers Where They Work

One SaaS company we met with for our next case study was beginning to narrow in on this approach. Similar to the first case, they didn't have many people but faced high demands on their team. I know many of you feel the same way. Engineers avoided FinOps. I started and led a FinOps team for a long time, and it's a hard team to be on. Not because engineers don't want to do the right thing, but because there's a lot on their plates. When you're taking capacity away from developing features and asking them to optimize, there has to be a really good incentive for them to focus on it. That was true for this SaaS company: the engineers were trying to avoid the FinOps team.

Thumbnail 1320

Thumbnail 1340

Thumbnail 1360

As you think about the solution they built (the slide's not advancing for me, but maybe it will for you), the other big thing I know from talking to customers is that you don't want to create a bunch of dashboards and expect people to look at them. You need to bring information to where engineers work. For me in the past, the big areas were GitHub and Slack; those were places where a lot of engineers spend time. So how do I bring financial data there? This SaaS company did the same. They integrated notification capabilities and natural-language queries directly into Slack to deliver this information. The results were incredible: a 100x increase in the user community, comparing who was reviewing the data before with who was then accessing and leveraging it. They could monitor the conversations and questions, look for key themes and patterns, and elevate FAQs and other things from it. It really drove cultural change inside the organization. The quote from the customer was: take data to where engineers are. That was the big takeaway for them, the breakthrough when they started to see the change.
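As an illustration of that "meet engineers where they work" pattern (not the company's actual integration), a cost notification can be as simple as posting to a Slack incoming webhook; the URL and message shape below are placeholder assumptions:

```python
# Hypothetical sketch: push a cost anomaly into a Slack channel via an
# incoming webhook so engineers see it where they already work.
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

def notify_team(team: str, service: str, delta_pct: float, delta_usd: float) -> None:
    text = (f":warning: {team}: {service} spend is up {delta_pct:.0f}% "
            f"(~${delta_usd:,.0f}) vs. last week. Reply here or ask the bot why.")
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # fire-and-forget for the sketch

# notify_team("checkout-platform", "Amazon DynamoDB", 34.0, 4200.0)
```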

Thumbnail 1370

One question we get a lot is about the timeline for this same company. They started out in almost like a pilot, agile sprint kind of construct. This is the timeline they leveraged in terms of building out the capability. They focused on 10 development teams as a start to really work out the kinks, and then scaled it to over 500 users inside the organization. As you're all experiencing, the pace of change with AI and LLMs is moving quickly. It's also true with how people are leveraging it inside the company and trying to bring those mindsets to fail fast and move quickly as part of it.

Thumbnail 1400

So next up is another survey question. If you pull out your phone again, I would love to hear from you on this one: what's preventing faster AI adoption in your FinOps practice? There are four options; choose one. We don't want to taint the audience by sharing our guesses, but I know we talked a little before about what we thought this would be. I don't want to give away what I thought it was going to be, but I was pretty confident. Through these conversations I learned I was wrong, very wrong. But it was humbling to learn.

Thumbnail 1450

Thumbnail 1460

Thumbnail 1470

Thumbnail 1480

Thumbnail 1490

Great, we got some great results here. The results are pretty evenly distributed. Expertise is probably the one that stands out, and that's what I thought it was, so I'm glad I'm aligned with the audience. So while we wrap the first section, if you're going to remember anything, let it be these three things. Data quality over everything. No amount of AI can compensate for a poor data foundation. Trust but verify. Use AI to do 80 percent of the heavy lifting, but keep humans in the loop. And then problems, not wish lists. Focus on something that changes behavior or saves time and money. If it doesn't, delay it.

Thumbnail 1500

FinOps for AI: Managing AI Costs and the GPU Capacity Challenge

All right, so now that we've talked about how we use AI to manage costs, let's talk about managing the costs of AI itself. I want to start with a quote from one of the leaders we spoke to because I feel like it frames up this topic nicely.

Thumbnail 1510

I like this for multiple reasons, but I want to call out two specific ones. First, it's something that I mentioned at the beginning, reminding us to not overcomplicate things. And then the second thing, which was our biggest takeaway in our conversations, is that AI is not that different.

Thumbnail 1550

As you weigh traditional cloud workloads against AI workloads, there are some differences, and this comes up pretty actively in our customer conversations. Some may laugh about cloud having predictable patterns. I get a lot of questions about how to forecast cloud and what to expect, but consumption is much spikier with AI than with traditional cloud. As you read through the rest, there's a bit more maturity and knowledge on the traditional cloud side, whereas the AI side is new.

You did call out the big one, which is establishing common nomenclature across the providers you're leveraging. That has definitely come up in our customer conversations. But I think the key is that the principles are the same, and that was true in every conversation we had. Although there are different terms and different dynamics, it was clear that the mechanisms were in place for how to hold leaders accountable and how to plan for these elements. While it's a little different, it was very similar in terms of the optimization structure itself.

Thumbnail 1610

Thumbnail 1630

What better way than the iceberg analogy? You don't want to make the mistake of budgeting only for what you can see. You also want to plan and consider everything below the waterline: data preparation, integration, fine-tuning, and training. The hard thing about the items at the bottom is that they consume existing capacity. A lot of times customers say they can plan for the top because they can point to it, but the bottom is about redirecting your capacity and your teams to do this work.

That takes prioritization and trade-off, which is not an easy thing to do. I know a lot of customers are focusing on that. So just make sure as you go through this you don't ignore that just because it may be viewed as a sunk cost of capacity internally. It is capacity that could be directed elsewhere. So make sure you're considering that as you go through the focus.

Thumbnail 1660

Thumbnail 1680

Thumbnail 1690

Thumbnail 1700

This is a high-level, simplified checklist, but we want to run through it. Budget about 30 to 40 percent for data preparation. Your FinOps team capacity: if you're scaling AI, you're probably going to want to scale your FinOps team as well. Budget and plan for governance, integration, and training; about 20 to 30 percent for experimentation and failure; and then organizational change management. But once you've planned for all that, what's next? GPU capacity. This is something we've all seen, heard, or experienced. It's a big-ticket item, a big line item, and a big challenge to manage.
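To make the checklist's ratios tangible, here is a back-of-the-envelope sketch. The data preparation and experimentation percentages echo the guidance above; the base budget and the other two allowances are invented examples:

```python
# Rough AI budget sketch using the checklist's ratios (midpoints assumed).
visible_ai_budget = 1_000_000  # what you can "see": inference, training runs

data_preparation = 0.35 * visible_ai_budget             # 30-40% guidance
experimentation_and_failure = 0.25 * visible_ai_budget  # 20-30% guidance
governance_integration_training = 0.20 * visible_ai_budget  # assumed allowance
change_mgmt_and_finops_capacity = 0.10 * visible_ai_budget  # assumed allowance

total = (visible_ai_budget + data_preparation + experimentation_and_failure
         + governance_integration_training + change_mgmt_and_finops_capacity)
print(f"Plan ~${total:,.0f} against a ${visible_ai_budget:,.0f} visible budget "
      f"({total / visible_ai_budget:.1f}x, before GPU capacity and headcount growth "
      f"push toward the 3-5x gap the speakers warn about)")
```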

Thumbnail 1710

Another case study was with a global bank. They had a pretty good analysis and framework for making the necessary trade-offs around GPUs, and we thought we'd share it with you. The challenge is that GPU capacity is scarce, at least in some instances, and a lot of times even when you allocate that capacity, it's unclear how it's actually being used inside the organization. This was a highly federated organization, so every business had the freedom and the right to do what it wanted. They allocated capacity in the aggregate, and once you do that, you lose insight and visibility into how it's being used.

Thumbnail 1740

Thumbnail 1760

Thumbnail 1770

So they decided to build a framework and an approach to help solve that. This got very detailed. When we reviewed it with them, they had 19 unique GPU metrics they were looking at around utilization and its specifics. They were trying to aggregate those to understand where the patterns in utilization were. But the big element was how to give visibility into usage patterns and elevate where GPUs were being used, to make sure they were delivering the most value to the organization.

They had a really good 2x2, which I loved and want to share with you as well. There's nothing better than a good 2x2 to frame a conversation. It's usage on the y-axis and criticality on the x-axis. There are some no-brainers, especially in the bottom left: if a GPU has low utilization and provides low criticality to the business, get rid of it and redirect that capacity elsewhere. The top right is all about high usage and high criticality. There could be ways to optimize that, but I think the key is thinking about the use cases in your company, how you're leveraging this scarce capacity, and whether there's some framing or approach you can put in place to ensure you and your leaders are making the best decisions possible for your customers.
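A toy version of that 2x2, with assumed thresholds and invented actions for the off-diagonal quadrants, might look like:

```python
# Sketch of the bank's 2x2 framing: usage (y-axis) vs. criticality (x-axis).
def quadrant(avg_utilization: float, criticality: str) -> str:
    high_usage = avg_utilization >= 0.5   # assumed threshold
    high_crit = criticality == "high"
    if not high_usage and not high_crit:
        return "reclaim: release the GPU and redirect capacity elsewhere"
    if high_usage and high_crit:
        return "protect and optimize: tune the jobs, keep the capacity"
    if high_usage and not high_crit:
        return "question it: why is non-critical work consuming this much?"
    return "investigate: critical but idle; reserved too early or mis-scheduled?"

print(quadrant(0.12, "low"))   # bottom-left no-brainer
print(quadrant(0.80, "high"))  # top-right
```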

Thumbnail 1810

Thumbnail 1820

The ROI Dilemma: Measuring Value Beyond Traditional Metrics in AI Investments

So now that we've planned for the budget, let's talk about recognizing ROI. Get out your phones one last time. What's preventing you from measuring your AI ROI today?

Thumbnail 1870

This is another one where I felt like we had differing points of view, but I'm curious what the results are. One observation: the response rate has dwindled with every question; we won't read too much into it. Now we're back up, even above the last question. The results show that scattered costs and no metrics are the top concerns, coming in at numbers one and two, but it's fairly evenly distributed as well.

Thumbnail 1880

Thumbnail 1900

I'm going to dive deep into this because I've spent my career creating business cases. If any of you have done that, it's pretty easy to create a business case; it is very hard to track it after you go live. It becomes a mishmash of information. But from a framework standpoint, the nice thing about a traditional ROI analysis is that on the front end there are costs and investments that are necessary, there are expected benefits as you go, and you can tie things back pretty easily to cause and effect for most things. Not everything, but most things.

Thumbnail 1910

Thumbnail 1920

Thumbnail 1950

Thumbnail 1960

Once you introduce AI into the mix, things change a little because there are a lot of second- and third-order impacts. There's a good example here with an externally facing chatbot. When you add a capability and customers can start getting answers quickly, they feel like service is better and things are improving. That's great: brand perception improves and customer satisfaction improves. But back inside the business, I meet with a lot of CFOs, and one CFO said to me, "I need numbers, not stories." This isn't about feeling good. I want to know how that translates to the P&L or to performance and metrics. I get a lot of questions specifically about how to measure this, and I think a few challenges come up.

One is what we mentioned as the iceberg, which is the denominator effect. There tends to be a lot more cost involved than you're actually quantifying as you go through some of this. On the numerator side, I know most organizations struggle with value articulation, and that can range from being a private enterprise where you're for profit to a government entity where it's mission driven. It's a totally different conversation of value creation. Then the last one is that it's kind of a moving target in terms of timeline and expectations. These are the three patterns that came up and have come up in customer conversations.

Thumbnail 2020

The first one is the productivity paradox. All of us have experienced this. When you've leveraged AI to do something in an everyday task, it can provide a great experience and do things better and faster, which is awesome. But in the example here with code, just because you can code faster, what does that mean? Does that mean better quality? Does that mean more features? What does that actually translate to in terms of customer value? That's one of the challenges that customers have.

The second is baseline absence. I always love to ask IT leaders: do you track time? I know time tracking is a thorn in everyone's side. You don't want somebody looking over your shoulder, but baseline creation is really important as you go into this. I've spent a lot of time on the total cost of ownership of cloud and IT: if you don't know where you're starting, it's very hard to know what's changed and how it's evolved. There are a lot of activities we do that we just don't measure, and it almost doesn't make sense to measure them in this regard, but that's where some of the challenges come from: what was it before, what is it after, and what do I do with it?

Thumbnail 2060

Thumbnail 2090

Then lastly is the diffusion of value, which was one of the items called out there. There's a downstream impact of things that can be unearthed as you go through this, but how do you quantify some of these items? We've always struggled on the qualitative side. If you're more secure, how do you translate that into an articulation of value? You could say we're more secure, but what does that actually translate to in terms of performance? A lot of this is reputation risk and softer things, because they're about mitigating events, but those are the things that come up as a challenge.

There are a couple of patterns I've seen of what's working and what's not. A lot of times people say if you save time, you save money, and I know many of us have probably gotten more productive in the last year and a half in certain ways, some ways maybe not as much, but in some ways we have. That doesn't mean you don't need us. Those resources are still necessary.

Thumbnail 2120

So just because you save time: with the stories of "I saved 1,000 hours," most financially minded people are going to ask, that's 1,000 hours, that's a lot of money. Where is it going? Where is it being redirected? So a lot of this is about the resource reallocation that happens inside companies.

Thumbnail 2140

The second is around revenue attribution. When you roll out a capability or a service, I work with a lot of ISV and SaaS companies that are looking to build AI into a lot of their solutions now, so they're thinking about how do I price it? How do I assess this? What's the value that's expected through this? So just because you add that, it's very hard to attribute revenue only to that in isolation.

Thumbnail 2160

The third is around cost avoidance, the age-old question: if I didn't have this, I would have had to hire these people. I would have had to do this. Well, that's a good storyline, but in aggregate, sometimes it can feel like phantom money when you talk about it. However, it's a real thing. So I think it's important as you go through this to use this wisely and specifically. And lastly is around efficiency metrics. I'm a big believer in metrics and ratios, but as you go through this, be careful of the cumulative effect of all the ratios because if you add up all these ratios together, it seems like you don't need a team or a resource or something else that's there. So if that's true, that's great, but if it's not, be really mindful of the words you use and the metrics you use as you go through it.

Thumbnail 2210

So there are three patterns I've seen be successful for companies as they think about value articulation. One is a value driver tree, which has primary, measurable items: lower cost, fewer losses, more revenue. That's very clear; those are traditional measures. But you need to make sure you're articulating the secondary areas as well. You may not have numbers behind all of them, but you need a qualitative assessment of them, and you can't ignore them when you're talking about returns.

Thumbnail 2230

The second is outcome-based metrics. This came up in a lot of our conversations: how do I think about what the outcomes of these activities are, so that we're measuring that? Sometimes that in isolation won't tell you everything, but in concert with the value tree it will definitely help. But the one that probably resonates the most is the portfolio approach: thinking like a VC firm. A VC firm makes a lot of investments. There's always the sound bite that nine out of ten businesses fail, so you need to start ten because one will be successful. It's the same construct here. Not that nine out of ten AI projects will fail, but you need multiple irons in the fire because some will work out and some won't.

I was at an event two weeks ago in New York around AI and finance, and someone speaking gave an example of where the value is coming from a lot of the AI projects. As you think about it, it's usually the more obscure, far-reaching elements, not the basics that are delivering the most value. So you need to have a wide spectrum of areas you're investing in as you're going through it.

Thumbnail 2280

Thumbnail 2310

To summarize this section, if you're going to remember anything, let it be these things. Traditional cloud cost planning will leave you three to five times short on AI, so plan for that from the beginning. You can't control when or if AI features get adopted, so instead focus on cost per user and cost per transaction, or on optimizing model selection and token efficiency. And don't let the fact that you can't fully measure ROI stop you from investing in AI. Think outside the box. Think of proxy metrics: time reduced, errors reduced, things of that nature.
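As a sketch of those unit-economics proxy metrics, with all inputs being invented example numbers (pricing varies by model and provider):

```python
# Toy unit-economics calculation for an AI feature.
monthly_ai_cost_usd = 42_000
monthly_active_users = 3_500
monthly_transactions = 120_000
input_tokens, output_tokens = 900_000_000, 150_000_000

cost_per_user = monthly_ai_cost_usd / monthly_active_users
cost_per_txn = monthly_ai_cost_usd / monthly_transactions
cost_per_1k_tokens = monthly_ai_cost_usd / ((input_tokens + output_tokens) / 1_000)

print(f"cost/user: ${cost_per_user:.2f}  cost/txn: ${cost_per_txn:.3f}  "
      f"cost/1K tokens: ${cost_per_1k_tokens:.4f}")
```

Tracked month over month, these ratios show whether optimization (model swaps, prompt trimming, caching) is working, even before a full ROI story exists.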

Thumbnail 2330

Thumbnail 2340

Building Your Strategy: The Crawl, Walk, Run Approach for AI and FinOps Maturity

So now let's talk about building the strategy and putting it into action. I'm going to start with AI for FinOps. This follows the crawl, walk, run approach, for anyone who's familiar with the FinOps Foundation. Phase one is base camp, your foundation. This is when you're building the basics: tagging, data quality. What does success look like here? Your team can ask simple questions in natural language and get an accurate answer.

Thumbnail 2350

Then you have the climb, the enablement. This is when you're scaling intelligence. This is when you have more conversational interfaces for more teams, and you've got those auto-generated recommendations that are actually useful. Success here is your FinOps teams are no longer pulling reports, but they're doing that strategic, proactive optimization.

Thumbnail 2370

And then the summit, automation. This is a mature state. This is self-healing optimization. This is when you've got predictive controls in place that are preventing waste proactively. Success here is you've got those auto-generated recommendations with autonomous execution, and those cost surprises are starting to become a thing of the past.

Thumbnail 2400

I'm not going to spend a ton of time on this slide. We've actually got a decent amount of time, but before I hand it over to Chris, I just want to mention:

Thumbnail 2430

Thumbnail 2440

you don't have to be at the summit to get value. We spoke with a lot of companies who are recognizing ROI at base camp. If base camp is where you are, that's where you need to be, and that's where most of the market is. There's a very similar construct on the path for FinOps for AI. At base camp, a lot of this is about visibility. A lot of organizations are yearning for insight into how they're leveraging AI, how much they're investing in it, and where. It sounds very similar to what we just talked about: you need the right level of perspective and insight into who's consuming it inside the organization.

Thumbnail 2480

Thumbnail 2490

On the optimization side, it's obviously all about model selection and a lot of choice. There's a lot of evolution moving from large language models to more targeted small language models. There are three legs of the stool: speed, accuracy, and cost. Not every use case is the same, and there are a lot of levers you can pull to drive optimization, as the sketch below illustrates. Then lastly, the summit is about embedded governance, predictive elements, and being able to plan capacity, especially on the GPU side, in the right and effective way. We did include a bunch of the details as well, and it's the evolution you would expect.
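One way to picture that three-legged trade-off is a simple weighted score per use case. The models, their scores, and the weights below are invented for illustration:

```python
# Illustrative model selection across the speed / accuracy / cost "legs".
MODELS = {
    "small-lm": {"speed": 0.9, "accuracy": 0.60, "cost": 0.95},  # cheap and fast
    "mid-lm":   {"speed": 0.7, "accuracy": 0.80, "cost": 0.70},
    "large-lm": {"speed": 0.4, "accuracy": 0.95, "cost": 0.30},  # best, priciest
}

def pick_model(weights: dict) -> str:
    """Weights express what this use case cares about (they should sum to 1)."""
    score = lambda m: sum(MODELS[m][k] * w for k, w in weights.items())
    return max(MODELS, key=score)

# A latency-sensitive chatbot vs. a nightly batch analysis:
print(pick_model({"speed": 0.50, "accuracy": 0.20, "cost": 0.30}))  # small-lm
print(pick_model({"speed": 0.05, "accuracy": 0.80, "cost": 0.15}))  # large-lm
```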

What are the right metrics to look at to assess this? How do I then evolve the right accountability model and link this to business outcomes? Ultimately, how do you have preventative planning and levers as you go through it? There are a lot of levers that exist to drive automation and cost efficiency of using AI, and there are a lot of options that exist there. Taking advantage of those options is an important element as we go through it.

Key Takeaways and Resources: Essential Advice from Leaders for Your AI Journey

We're in the final stretch. We're going to do tips, then we're going to do educational recommendations, and then advice from leaders, and that will wrap it. If you're going to remember anything from today, it's these five things. First, you're not alone. Ninety percent of organizations are still in the early phases. I know I'm a broken record with this, but data quality beats AI sophistication always. No amount of AI magic is going to compensate for poor data.

Thumbnail 2570

Thumbnail 2590

Human judgment still matters. Use AI to accelerate and humans to validate. Solve for real problems. Don't deploy AI because it's cool or because everybody else is doing it. Deploy it because it solves a painful, expensive, or repetitive problem. And then, as I mentioned at the beginning, never stop learning. From the leaders we spoke to, we gathered people they follow and podcasts they listen to. We even got a query that an individual on my team, Adam Richter, runs almost every day just to stay current with everything going on in AI. Take a look at these resources and follow them. I'll call out one of my favorites. There are a lot of great ones up there, but the AI Daily Brief is a must-listen podcast if anybody doesn't listen to it already.

Thumbnail 2650

They do a really good job of distilling the five minutes of insight you need about what happened in AI that day, and then they go into a deeper dive. I know time is precious for everyone, so if you can devote five minutes, the AI Daily Brief is a must-listen from my vantage point if you want to stay informed about what's evolving out there. We'll wrap with some advice we heard; these are direct quotes from leaders we met with. The first one is around data, and the quote was that they thought they could skip this and go straight to the AI. I get a lot of structured and unstructured data questions about how to leverage this.

Thumbnail 2680

Thumbnail 2690

The gold inside companies is their own data. It's not just the large language model; it's pairing the model with the insights and information you have in your organization, and spending time integrating the two, that is key. The second is around humans in the loop. Make sure there's good fact-checking and good frameworks. I loved the loop you showed earlier, specifically the review process you can integrate on top of it. Third, it's all about painful and repetitive problems. Don't just assume a dashboard solves issues. You need to make sure it's integrated and shared in a way that actually solves customers' problems.

Thumbnail 2700

The whole buy versus build question—I know there's a kind of hybrid answer, which is that you're likely going to buy an LLM. Some companies here, many large enterprises, may build a lot of their own.

Thumbnail 2720

Most are buying, but integrating that with your own data is the key, and that's where you can build an in-house solution that brings the two together. This is all about experimentation and taking the time to do it. And then lastly, while the details are different, don't overthink it. The FinOps basics still exist. I don't know what your viewpoint is, but I would say FinOps has matured a lot over the last several years, yet I still meet a ton of customers who haven't gotten started.

Thumbnail 2750

Thumbnail 2770

Learning the basics here, you're at a really good time, especially if you're in the early stages, to leapfrog and mature your FinOps practice by using some of the AI capabilities that exist. And just to bring home the summit analogy: base camp is where you prepare, the climb is where you grow, and the summit is where you see the view. But you can't do it standing still, so it's really important to get started. Obviously AWS can help you, and there are great partners out there that can help you too. So if you're interested, I'd definitely reach out to your account team to engage and learn more.

Thumbnail 2800

And then I just want to wrap. I was talking about my son at the beginning, and I share this photo, from about 23 weeks ago now, because he's thriving, in my humble opinion. I don't think we could have gotten him to this point without the subject matter experts and the community that have helped us, and we want to be that for you. As Chris mentioned, reach out to the account teams, engage us, let us help you. And of course, these are our QR codes for LinkedIn if you want to connect.

We will also stay outside for any questions after, and then please do the survey. Feedback is a gift and we want to continuously get better. So, with that being said, Happy re:Invent. Thank you.


This article is entirely auto-generated using Amazon Bedrock.
