Kazuya

AWS re:Invent 2025 - A leader's guide to data strategy in the era of agentic AI (SNR202)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - A leader's guide to data strategy in the era of agentic AI (SNR202)

In this video, Tom Godden, AWS Executive in Residence, and Matt Quinn, CTO of CarGurus, present a data strategy framework for the agentic AI era using Formula One racing as a metaphor. They challenge the volume-focused approach, noting that 99% of organizations invest in data but only 29% see meaningful value. The framework consists of three pillars: reimagine (strategic curation over volume, data products with clear ownership), rewire (embedding data teams near decision-makers, minimum viable governance that enables rather than restricts), and realize (real-time dynamic data, eliminating batch processes). Matt shares CarGurus' transformation journey, implementing lineage, data stewards, and governance to enable innovations like CarGurus Discover and Price Vantage. Key insights include treating 50% of data as open by default, decentralizing teams while centralizing only what speeds execution, and measuring success by decision velocity rather than data volume. The session emphasizes that data is oxygen, not oil—essential for survival in modern business.


This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Thumbnail 20

Introduction: Learning Data Strategy from Formula One Racing

Thank you very much for spending your time here at the end of the day here at re:Invent. Hopefully, everyone's having a good time at re:Invent. My name is Tom Godden. I am an Executive in Residence here at AWS. I'm going to be joined here a little bit later by Matt Quinn. Matt is the Chief Technology Officer of CarGurus. So if anyone needs to do a little bit of car shopping, some comparison shopping, Matt can hook you up and get you going.

Thumbnail 30

Thumbnail 40

Thumbnail 50

Thumbnail 60

The Executive in Residence team is a team of 15 former Chief Information Officers, Chief Technology Officers, Chief Executive Officers from companies like Coca-Cola, McDonald's, Capital One, Airbus, NASA, and JPL. I was the Chief Information Officer at Foundation Medicine. Foundation Medicine is the world's largest genomics company. And as you can imagine, data was front and center to what we had to do. And I think that's true now for all of us. I think data is front and center for all of us.

Thumbnail 80

Thumbnail 110

So, we're going to talk a little bit about building a data strategy in the era of agentic AI, but we're going to do it through the lens of Formula One. Hopefully, maybe we got a few fans in here. Hopefully you're rooting for the right team. Yeah, okay, yeah. I don't know who I'm rooting for. My team's out, you know, so Ferrari would be the team for AWS. But in Formula One, margins come down to milliseconds. Champions are turning all of this data that they're gathering into success. But how do we learn from that? How can we take some of those lessons and apply that precision and that discipline to our organizations to succeed? So I'm going to take you through a framework that we've been working with organizations on that talks about some of the higher-level things to help you build out a data strategy.

Thumbnail 120

You know, in Formula One, we think it's the drivers, we love the drivers, we love the cars, but it's the data. And as I was building this deck and researching this and talking to the Formula One data strategists, it's so important that the data is the differentiator for these cars. It's the invisible information. We are a title sponsor for Formula One at AWS. We are also a sponsor of the Ferrari team. Each Formula One car captures 1 million data points per second that it's on the track. It's an unbelievable amount of data, but what's important about it is that those teams are translating that into actionable data, into actions that help the cars go faster. Fans get to see the spectacle, but it's that analytical edge that is actually making the difference and creating the winners and losers.

Thumbnail 170

But unfortunately, unlike Formula One, most businesses are struggling and failing to extract the value that they want out of their data strategy. Gartner warns that 80% of data governance initiatives will fail by 2027. Those expensive data lakes are going to sit unutilized, and the reality is many organizations have prioritized data volume, vanity metrics, over the value that we're supposed to be getting out of this.

Thumbnail 200

So have we been approaching data strategy the wrong way? We're going to drive a Formula One car the opposite direction, see how well that works. We celebrate collecting petabytes of data while customers are waiting for value. Guys, I did that at the genomics company. You want to talk data volume, I can talk data volume all day, and I'm so proud of it. I'm like, oh, we just crossed 100 petabytes of data. Isn't that cool? We just got to 500 petabytes of data. That's so cool. Who cares? Who cares? Formula One teams show us a better way. They only collect the data, I know they're capturing a million data points, but they only capture the data that helps them win races.

Thumbnail 250

Here's a sobering statistic from Harvard. 99% of organizations are investing in data. Now, my first question is, who's the 1% that aren't? Let's choose to believe that Harvard did the study correctly, and apparently it is 99%. But here's the sobering part, only 29% say they're seeing meaningful value from those investments. We're chasing this data, but we're not finding ways to get value out of these things.

Thumbnail 280

Thumbnail 300

The Agentic AI Era: A Fundamental Shift in Data Requirements

And now we have agentic AI that's changing everything. It's like when the Formula One teams redesigned the cars. But agentic AI isn't just an extension of AI. It's a fundamental rethinking of how we approach AI and systems. The data requirements are different. The approach is different. The risks are certainly different. And we've seen data strategies evolve. You know, BI gave us reports, some dashboards. Big data.

We built dashboards in the big data era, remember that day? We built a lot of data lakes. Now we're moving farther into the generative AI era where we're generating data and information. And now this fourth era, as we get into the agentic era, is where our data lakes are going to become those training grounds. Those pipelines need to feed those autonomous agents. The infrastructure stays, but the purpose and how we're approaching it needs to expand.

Thumbnail 340

Thumbnail 350

So we're going to talk about a data framework and how to approach this and give you some practical advice and guidance on this, and we'll try to transform how we think about this a little bit. Now, promises are easy, so give me time. I'm going to take you through a little bit of a framework that will help. Hopefully not scattered tactics, but some real strategy on this.

Thumbnail 380

We're going to start by reimagining. I want to challenge some of the assumptions that we've held, that I held, proudly held, around data and its worth. And then we're going to rewire. We're going to talk about how to change how we operate a little bit, and then we've got to realize the results. Results that executives can see, that your customers can see, that they can actually believe in.

Reimagining Data: Ruthless Prioritization Over Volume

So how do we actually deliver on those? We need to start with a solid foundation. That means reimagining how we've thought about data, the assumptions we've held about data, how you manage it, how you extract it, how you get value from it. Unfortunately, a lot of companies are chasing data by using technology as opposed to thinking about the process and the value.

Thumbnail 410

Now, let's take some inspiration from Formula One. In Formula One, every gram on the car matters. They obsess about every single thing. One of the cars changed the way they put the decals on because they could get the decals actually to be 0.10 of a gram lighter. If they add a new sensor to capture data, something else on the car needs to come off. The engineers are asking what data truly drives value. We need to have that same discipline inside of our organization. We need to have ruthless prioritization over what data we're capturing and what data we're using.

Thumbnail 490

Now, let me pause here for just a second. We need to rethink the importance of gathering data because we've fallen into a trap to say we're going to gather everything, we're going to capture everything. And the answer to that is you probably still should because you don't want to miss the opportunity. You may not be able to go back and get the data if you don't capture it, but capturing it and persisting it is different than capturing it, persisting it, and using it and trying to manage it and do all these things with it. We don't want to get into that. We want to let value be our guide. This means every data point must enhance value just like in Formula One.

In Formula One, the races come down to the tiniest margins. 0.8 seconds separated the first car from the last car in qualifying in the recent race in Austria. We need to ruthlessly prioritize. Nothing can make the race car on race day unless it provides value. We need to be thinking of the same thing as we look at this data.

Thumbnail 510

Thumbnail 530

But here's an uncomfortable truth. Again, that sophisticated infrastructure that we're happy to sell you doesn't automatically create value. We need to connect them to customer decisions so we can form real business value. We need to prioritize the impact that we can have on this, but one of the challenges that we see is this translation gap.

Thumbnail 570

Bridging the Translation Gap: Embedding Teams for Measurable Impact

Domain experts know how things operate, but then you have the technology guys that are hidden over there, and your data experts are, I don't know, they're probably, we didn't let them come. They're in the other office. And we federated this. We're building these sophisticated solutions, but we're building them in isolation in many cases. Success comes when we embed the teams together, when we put them together and give them a sense of ownership and accountability.

So I like to look at this and say every investment should pass these three criteria. One, does it have measurable business impact? Seems obvious. Doesn't always happen. Question two, maybe your teams are different than mine. Maybe Matt's got more success than I did, but we need to prevent overengineering when simpler solutions exist. Gosh, my teams love to tie something in a Gordian knot. It was beautiful. It was awesome. It would have been fun to build. Didn't add value. Question three, we need to focus on speed. Speed is going to be king in this agentic AI era.

Bezos has a good quote: the only thing that your competition cannot match you on is you being first. That's it. Speed wins in this new world, so we need to be thinking about that as we look at this data. When we're gathering this abundance of data, we're not able to go fast to deliver that value and how we go forward.

Thumbnail 640

Thumbnail 680

Now, a great organization that's been able to use this purpose-built value type of thinking is Lonely Planet. They've exemplified this. They began looking at travelers' actual needs, wanting to understand more information on personalized recommendations as they traveled, and they worked back from that. Rather than leading with the technology, they reimagined the experience of that end user as they were traveling. They now use Amazon Bedrock to ask queries like, can I find vegetarian dining in Rome? I guess pasta might be vegetarian, depends. I don't know why you'd want vegetarian in Rome, but maybe you do, and that's the point. They built and they focused and they went after just the data that they needed to answer that question.

Now, do you ever wonder what Formula One drivers stare at on those screens before they leave the garage? It's this: it's their custom lap analysis. It's showing them exactly as they go around the track where they gained or lost time against a competitor. Every turn is mapped. It's color coded. It provides instant clarity to those drivers as they're sitting in their garage before they go. No overwhelming dashboards, just the data they need to go faster on the next lap. This focused clarity is what we need to drive for and strive for as well.

Thumbnail 720

So we need this strategic data curation like those driver teams. We need to focus on how we can eliminate the noise and provide clear data. When data serves decision makers directly and they see value out of it, transformation actually happens. I've built a lot of data warehouses in my time. Okay, let's be honest, Tom didn't build any of them. Tom's teams built them. We have this notion of we'll create this repository, it'll be awesome, master it all, clean all the data, it's going to be the brilliant oracle of information, and people will come to it and glean value out of it. Sounds great. My experience is it doesn't happen. The "if you build it, they will come" approach doesn't work. We need to be more purpose-built and deliberate about how we're going about this data, and this focus is necessary to do it.

Thumbnail 780

Interestingly, pit stops improved from 67 seconds to 1.8 seconds, a staggering 97% gain. Now, for your Formula One fans in the room, they're going to tell me that part of the gain was because they stopped fueling the cars in the pit stops, and you're right. But a lot of the other gain, as I talked to the Formula One strategists, they talked about how they captured data on how the people in the pit stop, the pit crew, moved. Did you put a left knee down or did you put a right knee down? Was the gun to do the wheel one foot off the ground or two feet off the ground, or did you have it turned at a 90-degree angle? They obsessed over all these things, and they went and captured just the data that they needed to be able to do that, to understand how they orchestrate being able to pull a car into the pits, tear four tires off, put four new tires on, and send the car back out in 1.8 seconds. It's unbelievable, but it's because they obsessed about what was the data that they needed on that.

Thumbnail 840

Thumbnail 860

Strategic Data Curation: Context and Connections for Agentic AI

But again, there's this volume fallacy of how we chase all of this data, messy data. It's not just that it takes all this time to get the data or that it's expensive to get the data, which it is. All that excessive data creates complexity. It creates noise. It doesn't create clarity. So we need to scrutinize that and how we look at it. Agentic AI is changing how we look at this. Generative AI is changing things. Unstructured data is having its day in the sun. Those emails, those notes, those documents now are directly shaping how we're thinking about these things. No more forcing a structure and a peer ontology on top of these things. Our entire ecosystem is changing about this.

Thumbnail 880

And as we look at how agentic AI is changing this, agentic AI is changing this because it wants to understand how your data connects, how this piece of data relates to this piece of data. We need to get more dimensionality to our data. You see, when we have one customer complaint, that becomes more powerful when I go and understand, well, is it new customers who are complaining?

Or existing customers? Is it customers in Texas that are complaining, or are they complaining in other areas? Are they happy after they're done complaining because I've helped them and fixed their problem? All of that context information is what agentic AI needs to be able to operate autonomously and make those decisions. Because if it doesn't understand that, it's going to execute on its own rules. So we need to deliver that. We need to connect the data together. We need to provide the context to that data so that we can succeed.
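As a rough illustration of the context described above, here is a minimal Python sketch that turns isolated complaints into the questions from the talk (new or existing customers, which state, satisfied afterwards). The record fields and the `summarize` helper are hypothetical, invented for this example rather than taken from the session.

```python
from collections import Counter

# Hypothetical complaint records, each already joined with its context.
complaints = [
    {"id": 1, "customer_type": "new", "state": "TX", "satisfied_after": True},
    {"id": 2, "customer_type": "new", "state": "TX", "satisfied_after": False},
    {"id": 3, "customer_type": "existing", "state": "CA", "satisfied_after": True},
    {"id": 4, "customer_type": "new", "state": "TX", "satisfied_after": True},
]


def summarize(records: list[dict]) -> dict:
    """Aggregate complaints along the context dimensions an agent would need."""
    if not records:
        raise ValueError("no complaints to summarize")
    return {
        "by_customer_type": dict(Counter(r["customer_type"] for r in records)),
        "by_state": dict(Counter(r["state"] for r in records)),
        # Share of complaints where the customer ended up satisfied.
        "satisfied_after_share": sum(r["satisfied_after"] for r in records) / len(records),
    }


summary = summarize(complaints)
print(summary)
```

A single complaint says little on its own; the aggregated view is the kind of connected context an autonomous agent could act on.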

Thumbnail 950

You see, Formula One teams, again, they show us something that's powerful. The driver gets just those eight pieces of data that they need. The tire engineer gets just the 200 pieces of data that they need. Most organizations are struggling with this. One out of six managers in a recent survey said they trust their data enough to use it to make business decisions. Wow. I don't even know where to start with that. When they don't trust their data, what do they do? They make gut decisions. They go on instinct. The problem with instinct is, well, it's not always right. The other problem with instinct is it doesn't scale.

Thumbnail 1000

So we want organizations to have a deeper sense of ownership over their data. And so we advocate that they move more towards a data product model. You know, data products aren't just raw information. They're purpose-built assets. They have an owner. They have quality requirements. But here's the most important thing: the owner is responsible for you getting value out of the data. They can't just build it and hope you come. Their job is to obsess over whether or not people are using the data that they're responsible for to get value, to go meet those people inside your business, have conversations. Why aren't you using my data? What could I change to help that? How could I associate other data with this data to make that happen? We need to have that data product mentality and think of data like a product. But it's that ownership and accountability of am I getting value that's the important thing that we need to get.
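The data product idea can be sketched in a few lines of Python. This is only an illustration under assumed names: the `DataProduct` class, its fields, and the 30-day idle threshold are hypothetical and not something the talk specifies. It shows an owner, quality requirements, and a way for the owner to find consumers who have stopped using the product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataProduct:
    """A purpose-built data asset with a named owner (all fields are illustrative)."""
    name: str
    owner: str                 # accountable for value, not just delivery
    quality_checks: list[str]  # names of the quality requirements to meet
    consumers: dict[str, date] = field(default_factory=dict)  # team -> last use

    def record_use(self, team: str, day: date) -> None:
        """Note that a team used the product on a given day."""
        self.consumers[team] = day

    def inactive_consumers(self, today: date, max_idle_days: int = 30) -> list[str]:
        """Teams the owner should go talk to: no recorded use within max_idle_days."""
        return sorted(
            team for team, last_used in self.consumers.items()
            if (today - last_used).days > max_idle_days
        )


product = DataProduct(
    name="customer_complaints",
    owner="support-analytics",
    quality_checks=["no_duplicate_ticket_ids", "region_not_null"],
)
product.record_use("sales", date(2025, 11, 28))
product.record_use("marketing", date(2025, 9, 1))

print(product.inactive_consumers(today=date(2025, 12, 1)))
```

The list of idle teams is the owner's prompt to go have the "why aren't you using my data?" conversation.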

Thumbnail 1060

But here's the other challenge. There's a mistake I see. And yes, guys, you're probably wondering at this point, did Tom get anything right? I made this mistake also. Don't boil the ocean. What I mean by that, and I've seen organizations do this, you say, okay, the first thing we need to do is find all the data in the company. We're going to gather it all, clean it up, master it. And once we're done, we'll make it available for everyone. It's going to cost you $50 million and quite likely your job if you take that approach.

Instead, let value be your guide. Find that specific business use case. Look at how Formula One said, what is the data that the driver needs in order to go faster, and go get just that data. Get that data inside your data lake, clean just that data, deliver value for that use case, declare victory, take your team out bowling or whatever you want to go do, and lather, rinse, repeat, and move on to the next one. Don't boil the ocean. Let value and incremental growth be your guide.

Thumbnail 1120

Thumbnail 1140

You see, with agentic AI, the why behind the what is actually becoming almost as, if not more, important. We need to understand why things happened, and in order to understand that, we need that context and we need those connections. It will help us understand and act more successfully. You know, this strategic curation in action was exemplified by BMW. They were trying to optimize their manufacturing process, and instead of measuring everything, they identified just the exact sensors that impacted quality and stopped using the others. Their focus on that data helped them drive better quality through their cars and increase their manufacturing efficiency. Better cars, greener plants, from smarter data curation and specific focus.

Thumbnail 1170

Thumbnail 1190

Rewiring Organizations: The Proximity Principle and Team Structure

So now that we've reset our focus on what matters and maybe we've oriented around data products and strategic curation, let's let Formula One teach us another lesson. The fastest car means nothing without the team behind it. So let's talk a little bit about how we think of rewiring our organizations. Like that 1.8-second pit stop, we need a perfect choreography for our data. Every handoff engineered, every pathway clear, no wasted time or motion. Most companies have built brilliant insights, but they're dying in silos, never reaching decision makers, because we hoped that they would finally come.

Thumbnail 1210

Thumbnail 1220

It's time to engineer those pathways into value-creating types of things. So back to Formula One, Ferrari's tire specialist is not back at the factory. They're sitting right there in the pit. They have conversations with the driver. The fuel strategist hears every idea, every complaint, probably a lot more complaints about how to optimize the car. The race engineer talks to the driver about handling. They're there together. Every expert is positioned where their decisions will create immediate impact, and we need to have that same discipline with our data scientists.

Thumbnail 1250

This proximity principle transforms how we think about this because proximity is so crucial, but without the right structure, it fails. It's like having a trackside expert who can't communicate with the pit crew. The best platforms become paperweights unless the teams can use them effectively. Anyone can buy the technology. Your edge is going to be how you mobilize your team.

Thumbnail 1270

Now, here's an uncomfortable truth. Focusing on just the technology and neglecting people is like taking the best technology in the world for a car, a Formula One car, and putting it on a dirt track. Now, I came up with this brilliant slide because I thought it was unique, and someone came up to me and pointed out that Red Bull actually took a car and drove it on a dirt track two or three years ago in Australia. I think Danny was driving the car. But the point is you wouldn't do this, right? We can't just focus on technology. Technology alone is not the answer to where we need to go. It's the people.

Thumbnail 1310

And the research backs this up. For five straight years, culture beats technology as a roadblock to succeeding with data. The data experts are staying isolated. Remember, they're behind the curtain because we won't let them come out. We need to fix these cultural barriers in order to be successful.

Thumbnail 1340

So where should your data people sit? I get this question all the time. Do I centralize my data team or do I decentralize my data team? Do both. Here's what I mean. Centralization sounds perfect, right? Why would I want sprawl? But distributed allows me to have more domain knowledge, expertise, and accountability. The problem with that is it works great until standards start to erode and you get different things popping up. So the answer isn't either, it's both.

Thumbnail 1370

And what we want to do is focus on saying, and I encourage people to use HR or finance as your model. So with HR, you have a centralized HR organization. I'm quite certain they establish your policies, your rules. Trust me, if you don't believe me, try to come up with your own on how they're going to operate. But you as a leader are also expected to understand those and know how to apply them as a manager, as a leader inside of your organization. We want that same type of model. We want a centralized team that's worried about governance, about standards around the platform, and I tell you, however big you think that team is, make it smaller.

One of the things that I told my centralized team, all of them, several centralized teams, they had one goal a year, only one goal every single year. And the goal was to put themselves out of business. I wanted them to enable all of my team so well that I didn't need them. That was the goal. Now clearly I told them, don't worry, you're not out of a job. I have plenty of work, there's lots to do. But the point is I don't want to be doing this. I'm not selling that platform to my customers. That isn't my value add.

Breaking Down Data Hoarding: Default Sharing and Decentralization

But one of the other problems that we get in organizational structures is data hoarding. People treat it like their personal treasure. And candidly, knowledge is power. So holding that in, they believe that gives them more power and value to the organization. But we want this smart governance. We want to shift this where sharing becomes the default. All of your data across all of your organization is fully available to everyone, period. Sort of. That's where I want you to start. And then obviously, I want you to apply the appropriate controls for risk, for compliance, for other different things, but start with the assumption of sharing by default.

We need to be able to move people past this data hoarding. It isn't malicious. Oftentimes people fear, oh, you're going to misunderstand my data, so I'll just do it for you. We need to transform this culture.

Thumbnail 1510

We need to understand their concerns. We need to share that vision. Your role as a leader in this is to have conviction as to where they're going so people will follow you in this. And then we need to enable by building these easy-to-use, user-friendly self-service types of capabilities.

Thumbnail 1530

And we need to fix this decision distance problem with data products with a three-team structure. That platform team, remember, smaller than you think it is, will provide the tools and the standards. The product teams, then I'm going to embed them out on the edge with sales, with marketing, with wherever, accountable for value. And then obviously the business consumers are going to be the ones using the data to be able to get value out of it.

Thumbnail 1560

So my guidance to you is, decentralize your teams by default. We want to move all your people out as close to the edge as we can. Why? Because that's where the problems, needs, and opportunities are. Centralize only the things that speed you up, not the things that save money. Not because I don't want you to save money, but unless you're better than me, I centralized lots of things because I thought they'd save money, and good luck calculating whether it actually happened.

I can measure speed. I can measure speed really easy. It's harder for me to measure cost. Now, the good news is often when I go faster, I'm going to save money. And we want to be able to centralize those things that only speed things up, and we want to flip the script on the data hoarding and make sharing the default. And we need to help our organizations drive through those things.

Thumbnail 1620

Enabling Governance: Guardrails Instead of Roadblocks

But now let's go back to Formula One again. Formula One uses identical telemetry standards, whether you're in the pit or the factory. Every team member knows exactly what tire temperature means. There's no confusion. These standards help them operate at 200 miles an hour and do amazing things. And this is what an enabling governance looks like.

Thumbnail 1640

We need to be enabling, not restricting. Governance should make things easier, not take things away. I worked in a highly regulated space, FDA Class 3 regulated device, matter of life and death. Trust me, I understand the value of governance. But we need to think of how can we help our people do this.

Let me give you a good example. Pick your favorite app store. Don't care which one. I guarantee you they're governing you. They're restricting what can get installed on your phone, how it can get installed, how you can pay for it, even in some cases the content that goes on it. They're governing you. But by and large, we're all okay with it because it just flat out works.

I can install a new app on my phone in seconds. I can pay for it consistently without worrying about what's happening. So I'm getting commensurate value out of it that allows me to tolerate the governance. The governance is also kind of hidden under the covers. We need to use that same mindset in how we help people use our data. Govern by enabling: make it easy for them to do the right thing.

Thumbnail 1710

And I know it feels like an impossible puzzle. 92% of agentic AI teams worry that they don't have data of the quality they need to do their jobs. Half of data leaders say that poor data quality blocks their broad AI implementation. You're not behind. Everyone's struggling with this.

Thumbnail 1730

So what do we do? Companies, well, this is what I did. You lock everything down. You're like, well, we need to have control. Bad data, remember? Garbage in, garbage out. We're locking it down. But here's the irony. Those strict controls create the problems they were meant to prevent. You force people to go underground because your governance is slowing them down and it's burdensome and complex.

Thumbnail 1770

So when those official channels fail, people build their own and they build them without safeguards. The tighter your grip, the more that's escaping through the cracks. So what's a better way? Minimum viable governance. Think guardrails, not roadblocks. Accepting some visible risk actually reduces your overall risk. You do that by keeping this work visible.

Thumbnail 1800

When teams work within your systems, not around them, you maintain true control. And we get them to use these systems by governing by enabling, not by restricting. Make it easy for them to do the right thing. You see, traditional governance asks, are we complying? We want to move that to, are we enabling better decisions? Instead of documenting things after things have been created, build that context into the systems as they run. We need to transform this.

Thumbnail 1810

Thumbnail 1820

Now, this three-tier data classification model, you can come up with your own four tiers, whatever. The important point here is 50%, I think, is a good estimation of your data that should be open, should be available to anyone inside your company, not outside, but anyone inside your company. It's only when we get to the next tier, to the protected names and addresses, yes, obviously PII, that we're starting to get into data that needs to be more restricted or redacted. And then there's the highly protected tier: government IDs, financial and health records. I see organizations often treating all of their data as though it's in the highly protected category. And in doing so, we're creating those roadblocks and delaying value, and we're forcing people to go around. Smart classification will help us unlock something bigger.
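To make the three tiers concrete, here is a minimal Python sketch of open-by-default classification. The tier names follow the talk, but the field names, the `FIELD_TIERS` mapping, and the redaction behavior are assumptions made purely for illustration.

```python
from enum import IntEnum


class Tier(IntEnum):
    OPEN = 0              # available to anyone inside the company
    PROTECTED = 1         # e.g. names and addresses (PII): restricted or redacted
    HIGHLY_PROTECTED = 2  # e.g. government IDs, financial and health records


# Hypothetical field-to-tier mapping; only the exceptions are listed.
FIELD_TIERS = {
    "customer_name": Tier.PROTECTED,
    "home_address": Tier.PROTECTED,
    "government_id": Tier.HIGHLY_PROTECTED,
    "health_record": Tier.HIGHLY_PROTECTED,
}


def tier_of(field_name: str) -> Tier:
    """Unlisted fields default to OPEN, so sharing is the starting assumption."""
    return FIELD_TIERS.get(field_name, Tier.OPEN)


def visible_fields(record: dict, clearance: Tier) -> dict:
    """Return the record with any field above the caller's clearance redacted."""
    return {
        key: (value if tier_of(key) <= clearance else "[REDACTED]")
        for key, value in record.items()
    }


record = {
    "vehicle_model": "hatchback",
    "listing_price": 18500,
    "customer_name": "A. Driver",
    "government_id": "000-00-0000",
}

print(visible_fields(record, Tier.OPEN))
```

Listing only the exceptions keeps the default open, which mirrors the "start with sharing, then apply controls" stance. A real system would still need a review step so that sensitive fields are not left unclassified.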

Thumbnail 1860

Thumbnail 1890

We need to be able to understand the lineage. Where did this data come from? Agentic AI is going to demand that you have that understanding. It needs that real-time transparency, knowing how that decision happened. Remember that context: how many people returned it? Were they happy after they returned it? Did they live in the state of Texas? Without that clarity, we're going to be flying blind in this. And one of the ways that we find to help this model is to have people have clear values. When people know their organizational values, they can act without constant approval. You see, documented principles help empower decisions and improve that data quality because they enable people to make decisions themselves.
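One simple way to picture the "where did this data come from?" question is a lineage map that can be walked upstream. The sketch below is a hypothetical Python example: the dataset names and the `trace_sources` helper are invented for illustration and do not represent any specific lineage tool.

```python
# Hypothetical lineage map: each dataset lists the datasets it was derived from.
LINEAGE = {
    "return_rate_by_state": ["returns_enriched"],
    "returns_enriched": ["returns_raw", "customer_profile"],
    "returns_raw": [],
    "customer_profile": [],
}


def trace_sources(dataset: str, lineage: dict[str, list[str]]) -> list[str]:
    """Walk upstream from a dataset and return its root sources, sorted."""
    roots: set[str] = set()
    seen: set[str] = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against revisiting shared or cyclic ancestors
        seen.add(current)
        parents = lineage.get(current, [])
        if not parents:
            roots.add(current)  # nothing upstream: this is an original source
        stack.extend(parents)
    return sorted(roots)


print(trace_sources("return_rate_by_state", LINEAGE))
```

With a map like this kept current as pipelines run, an agent (or a person) can answer which original sources fed a given decision.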

Thumbnail 1940

If you had a reality check and you asked your current team, what are your data principles, do you think they're going to come back with a consistent answer? A shocking fact in a recent study, 71% of people surveyed could not name half of their organization's values. Our entire approach we have seen, we've distilled it down into these eight essential tenets that we believe can help work in perfect harmony. Build that foundation. Customer value should drive all investment. We need to manage the flow. We need to minimize the distance from the decisions to the data. And we need to open it up to have shared access, and we need that execution with clear ownership through that monitoring and automation.

Thumbnail 1970

I used to tell my teams, if it's not automated, it's not done, and that has never been more true than it is today. If your data flow is not automated, it is not done. We need to think of that model. So again, traditional governance is holding us back through restriction. But we need to get to that enablement point. We need to have that default for these requests be that automated tracking where we're enabling that data to be fully shared. And that governance that enables instead of blocks.

Thumbnail 1990

Realizing Value: Dynamic Data and Living Strategy

Now we built a solid foundation through reimagine. We've organized our teams a little bit through rewire. But the first two steps mean nothing unless we use it to realize the value. So let's dive into some of the principles that help us drive value. Race day, it's where champions prove themselves. Even the fastest car though, without a perfect strategy, needs flawless execution to win. So let's talk about how we do that.

Thumbnail 2020

Ferrari doesn't stick to a pre-race strategy when conditions change mid-race. You would hope. Safety car, rain, car isn't performing the way they want, they make adjustments. They might start with a strategy like this, a tire strategy. But they adapt based upon what's happening. All too often in business, our quarterly planning process cannot respond to those dynamic changes in the world we live in today. And our data strategy needs the same adaptability.

Thumbnail 2050

What we mean is we need dynamic data, and Agentic AI demands that you have live data. Like Ferrari's real-time adjustments, your data must respond to the conditions and change along with it. After each event, Ferrari goes through and they refine their systems. They look at all the data, they adjust what they're doing, but they can do that because they have real-time data that's put in front of people, not a batch report that will run back in Maranello that they're able to give their people later. It's real time and we need to see that.

Thumbnail 2080

Thumbnail 2100

Agentic AI demands that continuous learning, not that periodic refreshing. Forget batch processing. Obsess over eliminating batch processing. Find those batch processing jobs inside your organization and obsess over eliminating those just like we're looking at inside of Formula One. Yes, Formula One processes millions of data points per second, but it's that instant feedback that they're getting that allows them to make those changes.

It's the speed that they're operating at, and it's not optional anymore. So we need to transform that into a competitive advantage. Audit your decision delays. Precisely where do the data bottlenecks happen? Find two or three business cases inside of that and drive them to done. Get rid of those bottlenecks, those batch processes. Replace them with real-time capabilities. That fast data will enable success.

Thumbnail 2120

Thumbnail 2140

But dynamic data does not mean capturing every customer interaction as it happens. We need to reduce the time between data creation and decision. Like Ferrari, changing their race strategy mid-race based upon track conditions, we need to replace those weekly reports with real-time dashboards so that we can have that speed. But we ultimately, as we look at this, need to understand the importance of this winning strategy. So I want you to obsess over eliminating those batch processes. I want you to obsess over the time it takes from acquisition or generation of data to the point that you get value. Obsess over it. We need to evolve into that fast.
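One way to act on "obsess over the time from data creation to decision" is simply to measure it. The sketch below is a minimal Python example under stated assumptions: the process names, timestamps, and the one-hour threshold are hypothetical, and it assumes you can log both when a piece of data was created and when a decision first used it.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical records: (process name, when data was created, when a decision used it)
EVENTS = [
    ("pricing_update", datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 6, 9, 5)),
    ("pricing_update", datetime(2025, 1, 6, 10, 0), datetime(2025, 1, 6, 10, 15)),
    ("weekly_sales_report", datetime(2025, 1, 6, 9, 0), datetime(2025, 1, 13, 9, 0)),
    ("weekly_sales_report", datetime(2025, 1, 7, 9, 0), datetime(2025, 1, 13, 9, 0)),
]


def median_latency(events):
    """Median time from data creation to decision, per process."""
    by_process = {}
    for name, created, decided in events:
        by_process.setdefault(name, []).append(decided - created)
    return {name: median(deltas) for name, deltas in by_process.items()}


def bottlenecks(events, threshold):
    """Processes whose median latency exceeds the threshold, slowest first."""
    latencies = median_latency(events)
    slow = [(name, lat) for name, lat in latencies.items() if lat > threshold]
    return sorted(slow, key=lambda item: item[1], reverse=True)
```

With this sample data, `bottlenecks(EVENTS, timedelta(hours=1))` flags only the weekly batch report, which is the sort of short list of two or three cases the talk suggests driving to done.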

Thumbnail 2190

Going now, as we think of our data strategy back to Formula One, Ferrari doesn't wait until next season to improve their car. Now, however, I wrote this deck a couple of months ago and I think Ferrari actually stopped working on their current car and went to the one next season halfway through the year. But between qualifying and practice, they adjust the wings, the suspension, the tire pressure. Every session generates data that allows them again to use that dynamic data purpose-built to be able to make changes, and our strategy needs that same continuous behavior.

Thumbnail 2230

Thumbnail 2250

You see, strategy isn't a document. I wrote a lot of them. I should know. It's what you do every day. Picture those Formula One teams huddled around the car making decisions instantly. Our data approach needs that same real-time strategy. Success comes from that constant testing and learning, not PowerPoints, not annual planning cycles. And it turns out strategy documents might be less important than we think. McKinsey found that only 21% of top performers credit a strategy document for their success. Now, I don't know if McKinsey wrote those strategy documents. What actually works is treating strategy as a daily practice. The best companies embed that inside of their organization. It lives in our actions and our outcomes, not in the documents that we lovingly and longingly create. I built several of those. This might have been a picture of the ones that are on my desk.

Thumbnail 2300

Thumbnail 2310

These aren't minor issues that we see here. These things are existential threats to your entire data strategy. If you see these signs, pause your investments, pause what you're doing, and do not move forward until you fix these challenges. We need to be able to improve the data quality on this, and strategy is not that annual process. Just like Ferrari's practicing on every single lap, changing, adapting, our daily decisions need to evolve to be that same way. We need to be testing our strategic assumptions with real-time data and not that batch data that's coming in.

Thumbnail 2330

CarGurus Case Study: Matt Quinn on Sociotechnical Transformation

Now, a company that has mastered some of this data strategy that has really been able to excel and succeed at this is CarGurus, and for those of you maybe not familiar, maybe I should let Matt do this a little bit more. CarGurus is a car shopping and a car comparison website that is dominating inside of their industry, and I want to bring up Matt Quinn, the Chief Technology Officer at CarGurus, to share a little bit about how they are approaching their data strategy. So please join me in welcoming Matt to the stage.

Thumbnail 2370

Thumbnail 2380

Thumbnail 2390

Hi everybody. Thanks Tom. Thrilled to be on stage with you today and share a little bit with all of you about our data journey at CarGurus and how we manifested a lot of what Tom is talking about. We're not done, we're by no means ever done, but we've made a lot of progress, and again for those that aren't aware of who CarGurus is, just a little bit more information, we're an automotive listing site, a marketplace. We connect consumers shopping for cars with car dealers who list their inventory with us. We're number one on a number of different measures including ROI for the dealer themselves in terms of what they're paying us for the leads and the profit they're making off of the cars they're selling, and in terms of the most visitors to our site. I'll explain a little bit about how we got there, and most notably, this is a stat that we're just incredibly proud of. We influenced 53% of attributed sales in the US, so if you're shopping for a car, you should consider using CarGurus.

Thumbnail 2420

Thumbnail 2430

So the theme of my talk today is about change, and as leaders, we all know change is super hard. I find inspiration through others often. One of my favorite technology leaders in the industry is Charity Majors from Honeycomb, an incredible observability platform if anyone's looking. She's co-founder and CTO, and she has this term sociotechnical. Tom talked a little bit about this, but it's effectively the combination of changing culture and winning hearts and minds and changing people and how they're operating along with the technology change. Changing technology can be hard, but when you add in that socio piece, it gets really hard. Especially with data, it's super important, as Tom already mentioned, to bring people along with that change.

Thumbnail 2480

So let's dig in first a little bit to a brief history of CarGurus and data specifically, and sort of our founding moment, where Langley Steinert and the founding team realized an opportunity to use data to improve the experience for customers and focus on customers. At the time, the other automotive listings marketplaces were advertising models, so consumers would come to the website and the first car that they would see would be the one that the dealer paid the most to advertise for. Langley realized that was a terrible experience for customers and invented technology that showed customers the best car based on a number of different factors that we'll dig into, and it completely changed the game and vaulted us into our leading position.

Fast forward a few years, the team realized they needed better technology for data, so they picked a bunch of different data technologies and actually made some great choices, but they stuck with sort of old ways in terms of how they were using it. The best analogy I can give, I came up as a software engineer, is if you were to write code without doing code reviews or without writing unit tests. We had data technology that we were using without all the best practices that you need to use in order to get the most value out of that data. And finally, the last part of our chapter, which we've been moving to slowly and gradually but materially over the last year or two, is to fix that with that sociotechnical change again, people change, behavior change in addition to technology, and going to more modern ways.

Thumbnail 2580

But first I want to zoom in a little bit to one of our core IPs, our Instant Market Value, and talk a little bit about why data is so important for us and potentially why it is for all of you. Instant Market Value effectively takes dozens and dozens of signals about the car, the age of the car, the trim, the features, the make, the model, how old it is, and various other things that we pull in from many, many different data sources and uses traditional machine learning and data science to create a price for that car, what it should be valued at, and tells the customer is it a great deal, a good deal, or a bad deal. That's how we vaulted ourselves into a leading position. The point of this slide is to say if you have data, there is tremendous value in that data. Our company was founded on it and still thrives today because of that data and bringing value to customers through that data.
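The talk does not describe how Instant Market Value is computed or where the deal boundaries sit, so the following Python sketch only illustrates the final labeling step: comparing a listing price with an estimated value that some model has already produced. The function name and both thresholds are invented for the example and should not be read as CarGurus' actual method.

```python
def rate_deal(listing_price: float, estimated_value: float,
              great_below: float = 0.95, good_below: float = 1.00) -> str:
    """Label a listing by the ratio of its price to an estimated market value.

    The thresholds are illustrative assumptions, not figures from the talk.
    """
    if estimated_value <= 0:
        raise ValueError("estimated_value must be positive")
    ratio = listing_price / estimated_value
    if ratio <= great_below:
        return "great deal"
    if ratio <= good_below:
        return "good deal"
    return "bad deal"
```

The interesting work in a system like this is producing `estimated_value` from dozens of signals; the labeling itself stays simple so the customer can understand why a car was rated the way it was.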

Thumbnail 2640

So let's go a little further on sociotechnical and talk a little bit about what I mean here and sort of dig in. So on the technical side, you can see some of the choices that the team made years back. Some great tech up on the slide here, admittedly no Amazon tech. We love Amazon. We host on AWS. Our data stack is based on a number of other things. And then when I talk about socio, these are some of the things that I'm talking about. I'll dig into what those are in a minute.

But first I'll say that I shared some of what I was going to talk about today with someone yesterday, another technology leader, and they said, wow, you've got all the data buzzwords in your talk. And it sort of stopped me in my tracks and made me think, is this motherhood and apple pie? Is this going to actually be helpful for the audience? So I reflected on that, and where I ended up was that it's not. These are actually very powerful terms. They're buzzwords if you use them as such, but if you lean into them and you really think about what lineage means and what governance is and how stewards operate and where they should be, it can materially change the game for you in data, and it's done so for us.

Thumbnail 2710

Next, what happened? Tom had some red flags earlier. These are the ones that we experienced. It was not good. Four years ago when I joined the company, we identified data as code red. We were experiencing all sorts of problems. We had data ingestion issues where the data engineering team was getting hammered with requests, and it was impeding innovation. We had quality problems that caused delays in terms of figuring out incidents, with no lineage or ability to figure out where the data was coming from. We had limited visualizations, which was hamstringing our entire organization in terms of getting answers to questions and analyzing tests. And finally, we had disjointed analytics where there was a lack of consistency, and effectively everything was going to the same ten people, in some cases the same two people in our organization all the time. Everyone knew their names, everyone knew where they sat, and these guys were working nights and weekends just to keep the organization going.

Thumbnail 2770

So what did we do? Well, we checked off all the boxes. Easy, right? Not easy. So we adopted some additional technology, but in addition to that, we leaned into every one of these items in the socio box, and I call it socio because these are all behavior changes. These are not technology, this is culture. And we had to win the hearts and minds of people and explain to them why we were doing this and actually structurally install it into the organization.

So one at a time, lineage. Where does the data come from? Can you trace where it emanated from? And to Tom's point, with the new products that we're building, our customers are demanding that we're able to explain why we got the answer that we did. AI is complicated and you need explainability, and lineage is one of the ways that you do that. We installed data stewards throughout the organization in non-technical functions. The producers of the data now own the data, not the data engineering team.

Governance allowed us to identify what all of our data is, define it, and produce a common vernacular that all of us can use so that we're all speaking the same language, and when we talk about a KPI we know we're talking about the same thing. Security, an obvious one, but one that we didn't have and we installed as well. And then finally, single source of truth, which you would think, well of course you need a single source of truth. We had three for some of our most important statistics, and there were very good reasons why those three existed. Each of the teams that used them needed them for a very specific purpose, but it was preventing us from getting quality decisions and the right data out of our systems.

Thumbnail 2860

Thumbnail 2870

So how did we get there? First, we built a data culture that was centered around governance and hygiene. We built a data governance team, a very small team to start, but one that knew our data and was able to navigate the organization and use a new tool to tag all of our data and define it so that we were all speaking that same language I mentioned earlier. Second, we democratized it. We had Looker, but we went even further. We empowered citizen data technologists throughout the organization and brought dbt Cloud into the fold so that in addition to our data engineering team, we were able to democratize data pipeline creation out to edge analytics teams that sat within the functions, and that eliminated bottlenecks in tons of places.

And then finally, we looked for leverage. There were a number of tools that we found that enabled our engineering teams in the data space to do more work innovating and going faster and actually bringing new products to market. So the last thing I want to say is, so what? What did this actually do? What value did we get out of it? Well, it enabled tremendous innovation within our company.

The two examples I'll provide, first is a new feature called CarGurus Discover, which is a conversational search experience that sits right on our home page where you can talk in a conversational way about the car that you're looking for and actually do things that our current filters or old filters did not afford you. For example, you can say, find me cars with fancy rims. That's not something we have in our ontology, but we're able to do that now because of this data cleanup.

And then on the dealer side, we recently brought a new product to market called Price Vantage that provides advanced intelligence to dealers on the price of a car and actually tells them if you drop the price of this car by one thousand dollars, it'll move a week faster than it will at its current price, again all through data. And that was the fastest to market product that we've had to date because of all this cleanup that we did. So there's real impact when you actually follow these best practices. With that, I'm going to hand it back to Tom.

Conclusion: Data as Oxygen, Not Oil

Thank you, Matt, for sharing that with us. I encourage you guys when we're done here today, if you guys have some questions about how they've done it, you know, Matt just covered very briefly that they're doing a phenomenal amount of work with data, really exceptional stuff, so use the opportunity maybe to catch up with them.

Thumbnail 3010

Now that we've shared the playbook a little bit, I'm going to talk a little bit about the foundation that will help you invent. At AWS, we're trying to build that proven data infrastructure that hopefully will be able to scale without limits to provide you the value that you need. We're also building those purpose-built databases and processing engines and that ecosystem that can help you with that transformation.

Thumbnail 3020

Thumbnail 3040

But as we put all these pieces together, now we have a complete playbook, hopefully: reimagine, rewire, realize. The question isn't whether or not we need to transform, it's how fast we can move to do that. So start by reimagining your foundation by treating data through your customer's eyes as products. Focus on that strategic curation over volume and focus on value driving everything you're doing. Then rewire your organization for that people-powered culture. Have those clear tenets, have that minimum viable governance that enables. Decentralize your teams by default, centralize only what helps you go faster, and then help your teams realize those results through that real-time dynamic data and that living strategy.

Thumbnail 3050

Thumbnail 3100

So to my Formula One fans, it's lights out, it's away we go. Let's get started. But I want to leave you with one last thing. Everyone's heard the phrase data is like oil. No, it's not. Data is like oxygen. I can live without oil. I can't live without good data. Companies that are succeeding in this economy are those that are able to drive decisions based upon data in real time. I'd like to thank Matt for joining me here today and sharing a little bit of this. This is my contact information here. If you want to continue the conversation with myself, please feel free to reach out to me or connect with me on LinkedIn.

Thumbnail 3140

I'd like to thank all of you for taking some of your time here today at re:Invent to listen to Matt and I talk about data strategy. Matt and I will be here on the side after we're done if you have any questions. Everyone, thank you very much.


; This article is entirely auto-generated using Amazon Bedrock.
