🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.
Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!
Overview
📖 AWS re:Invent 2025 - Explore what’s new in data and AI governance with SageMaker Catalog (ANT308)
In this video, Shikha Verma, head of product for Amazon SageMaker, introduces Amazon SageMaker Unified Studio and Amazon SageMaker Catalog as solutions for AI-ready data management. The session emphasizes metadata as the foundation for AI success, demonstrating features like automatic metadata generation, column-level metadata forms, glossary term suggestions, and catalog federation with external Iceberg catalogs from Snowflake and Databricks. Leonardo provides a comprehensive demo showing data pipeline creation, metadata documentation, data quality monitoring, lineage tracking, and the new polyglot notebook with built-in AI agents. Karen from NatWest shares their real-world implementation journey, highlighting how they're using SageMaker Unified Studio to modernize their 300-year-old bank's data architecture and enable 72,000 employees to access data safely through centralized governance.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
The Foundation Challenge: Why Data Governance is Critical for AI Success
My name is Shikha Verma. I'm the head of product for Amazon SageMaker. I see some familiar faces in the room here, and thank you for being here. I'm also joined by Leonardo, our principal specialist, who will show you a very cool demo, and Karen, the chief data and analytics officer at NatWest and one of our customers. We're delighted to have them on stage with us. Thank you all. They'll join us shortly. Let's get started. We have a power-packed agenda for you.
I'm going to start by asking a question. How many of you have been working on new AI initiatives in the last 12 months? You can raise your hands. Many of you. Awesome. That's what I expected. And how many of you consider yourselves successful in these AI initiatives and have generated net new value for your company? Some of you. Well, that's why we're here. That's why we wanted to bring this group together with this agenda, because, as you know, Gartner says that by 2027, 60% of organizations will fail to realize the anticipated value of their AI initiatives if they don't have their data and data governance in order. Do we agree with that statement? Yes. That's why you're here. Awesome.
So we know that data is the foundation for AI. The tip of the iceberg is the AI stuff, the cool stuff that we all want to do, but there's a ton of work underneath that we must get in place first in order to get the true value of it. There's obviously managing your data correctly at the storage layer, then processing it all, then cataloging it all, adding the right metadata to it so that you can discover and use it, and so on and so forth. Let's come back to what you've been telling us at AWS about what you would like to do.
Please chime in with your hands. I know you won't be able to hear me because this is a silent session, but please chime in with your hands. On the voice of the customer: I've been in the industry for 25 years, and what I have heard over and over again is that customers really want three things to enable their AI initiatives. One, they want a single place for all of their structured and unstructured data. All kinds of data, your data, your models, your dashboards, and even the agents you're creating should go into a place where anybody can discover and use them. Two, you want metadata to be the main context added to all of these things that you're cataloging, so that you can discover them easily, and not just as humans: our agents are not going to call another human to get context, so we had better bake that context into our data. Three, all of that comes together with consistent governance across all of these datasets. How do you manage them? How do you grant permissions to these datasets, and who can grant them? How are they being used? Are you able to share them with others or not? It all needs to be applied consistently. Does this make sense? I'm going to do another show of hands. Many people. Perfect.
Let's take it back a little bit. A lot of us have been in the data space for quite some time, and it really starts, again, with structured and unstructured data sources coming in. We process it all. We used to do data warehouses, then we did data lakes, then we brought it all together in lakehouses, and now it's: just leave the data where it is, and we manage it wherever it lives, but with the right context. What do you need to do this in today's day and age? The central thing is metadata, the data about the data, which gives us context and meaning so that you can use the data in the correct manner. This is vital in this new day and age because, as we talked about earlier, AI agents might call other AI agents, but they're not going to call a human for context. You don't have time for that, because you want agents to have as much automation as possible.
Just as we all need maps and navigation systems, AI needs metadata. Metadata is cool. Metadata may be the word of the decade. We have to spend a lot more on metadata so that the data becomes usable for humans as well as AI. Let me talk to you about some of the solutions that AWS has around this. How many of you have heard of Amazon SageMaker? Well, that's why you're in the session, of course.
Introducing Amazon SageMaker Unified Studio and SageMaker Catalog: A Metadata-Driven Solution
We have created a new solution that we launched last year: Amazon SageMaker Unified Studio and Amazon SageMaker Catalog. This really gives you the foundation to have all kinds of assets cataloged. In the center of SageMaker Catalog, you'll see data, models, generative AI agents you're creating, dashboards, and all of that can be cataloged. You have a centralized metadata repository for all of these things and end-to-end lineage showing where all of this data and assets are coming from.
You can add context to your assets via the built-in tools that we have. Leonardo, our chief demo officer, is going to show you a demo of exactly that. He'll demonstrate how you can bake in metadata automatically without having to create it yourself, because it's a tremendous amount of work to add all this context to all of these assets. Of course, data quality is important. I hope everybody is paying attention to that, because with bad data, you're going to get bad results. Agents, prone to hallucination as they are, can produce a lot of interesting insights with a great deal of confidence on top of very bad data. You don't want that to happen, so data quality is paramount right now.
A centralized way of discovery and sharing is something that we always encourage. There are three things that I would like you to remember from this. What does SageMaker Catalog offer you? Number one, it gives you a metadata-driven approach to all discovery. You can leave the data where it is, but the metadata comes together, and that becomes how you search for your data. Then we have AI-ready data. There are two ways that AI plays into this ecosystem. One, you can use AI to get your data ready. Two, you have to get your data ready for more AI. It's a loop where one feeds into the other, and we're going to show you both of those things in the demo.
We're keeping it very show-and-tell today. I'm going to run through some of the material and hand it over to Leonardo, who's going to show us an end-to-end demo. Then Karen will help wrap us up with a real-life customer use case; I'm sure a lot of you are dealing with some of the same problems that Karen has dealt with and solved. Please engage with her after the session if you need to. Last but not least, we are open by design. One thing we all have to accept is that there are more systems than just AWS systems and more databases than just what AWS offers. How many of you use things other than AWS here? Yes, me too. What you really need is to adopt a solution that offers you options to bring in data through, say, an Iceberg-compatible API from Databricks or Snowflake or wherever into the same ecosystem. You also have things like OpenLineage to capture lineage across the breadth of the data assets that might be flowing through your system. We'll show you all three of these things today.
Let me quickly show you what is new in these categories with SageMaker this year. We've been hard at work since we launched last year. Over 200 releases have gone in across the breadth of services. There's Amazon SageMaker, but at the storage layer level, we have added metadata capabilities to S3. At the processing layer level, we have added capabilities with AWS Glue and Lake Formation, and the whole stack comes together in SageMaker. There's a lot of stuff here that I won't call out necessarily, but I just wanted to show you that we are investing tremendously in this space. Metadata being the golden child of the decade, we are recognizing it and investing a lot in this space.
New Features and Capabilities: Column-Level Metadata, Auto-Glossary Terms, and External Catalog Integration
Since last year, a lot of customers have come on board; some of you I see in the room, thank you very much. Karen is here from NatWest as well. In our top left corner, I see Tomaso from Hema in the crowd. There are other folks from NatWest in the crowd as well. Thank you all. Does anybody else see their logo here? If you're in the room and you see your logo here, please raise your hand. We'll get you in next year's slide.
Something that is new here, which we launched a couple of weeks ago, is column-level metadata forms. What this gives you is metadata that doesn't stop at the table, database, or asset level. At the column level, you can add additional context to your data. You can say this particular column can be used for this kind of analysis, or this column is PII, so don't use it for any non-PII purposes.
You can manage it at the column level. I'm super excited about this, and some of our largest customers were waiting for this feature to be available so that they can manage their data at a much more granular level. Leo is going to show us this one in a live demo as well. Going with the same metadata theme, last year we launched automatic metadata generation, which Leo is going to show you in our demo. You pick a table and say generate me a description, and it generates a description based on the content of the columns that it sees. It gives you recommendations on how you can use it. This year we have added automatic glossary term suggestions. Some of you use glossary terms. That's a lot of hard work trying to create all these glossary terms and then make sure that you can associate them with everything that is coming in.
With this feature, if you have a standard set of glossary terms already defined, then for any new data asset coming in, even at the column level, it can correlate the information and, as in the screenshot here, generate glossary term suggestions and ask whether you want to apply them to this asset. You just click a button and it does it for you. This simplifies and really speeds up the whole process. You can also enforce metadata rules and glossaries. It's not enough to come up with these things if nobody uses them and business teams ignore the rules when they upload data. You can enforce them here, because the advantage of a centralized system is that producers and consumers can play freely while you build the technology in the middle that gives you the controls your company needs. And you can relax those controls as much as you want; if you're a no-controls kind of company, relax the rules. But more and more enterprises are becoming very conscious of the data that they use.
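For teams that manage glossaries programmatically, the same terms can be created through the Amazon DataZone APIs that underpin SageMaker Catalog. Here is a minimal sketch with boto3, assuming a domain and glossary already exist; the identifiers below are hypothetical placeholders:

```python
import boto3

datazone = boto3.client("datazone")

# Hypothetical identifiers; look yours up in the console or via list_domains().
DOMAIN_ID = "dzd_example123"
GLOSSARY_ID = "glossary_example456"

term = datazone.create_glossary_term(
    domainIdentifier=DOMAIN_ID,
    glossaryIdentifier=GLOSSARY_ID,
    name="PII",
    shortDescription="Columns containing personally identifiable information.",
    status="ENABLED",
)
print(term["id"])  # the new term's identifier, usable when tagging assets
```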
This is new today: with all of this metadata being generated and all of these assets being used, we are putting it back into the S3 layer. Most of you use S3. Anything that we create at the top of the stack in terms of additional metadata, we want to push all the way down to the storage layer, so that no matter where your agents are interacting, either at the top of the stack or at the sources, the right metadata gets pulled up and used. This really simplifies a lot of things for you and for us. We are actually using this feature internally ourselves, because we push our own data assets back into SageMaker too. We like to use the features we build.
Another advantage this gives you is when you want to see who is using these assets and how many assets are being used by whom. You get all of that from this particular feature. Pushed down to S3, metadata can be shared as a reusable asset. You can bring it all the way up the stack, or you can use it where it belongs. We have talked a lot about the AWS ecosystem, but I often hear from our customers that their environments are diverse. They have so many third-party tools already in their systems, and they want us to just work with them. Some of these names are familiar to you. We don't want you to think that all of that investment went to waste. If you're using Collibra, Alation, or other catalogs, we have built sync solutions with them.
I'm very excited to share with you that you can sync metadata across the non-AWS catalogs with AWS catalogs such as SageMaker Catalog. This means that the hard work that your data stewards and your governance teams may have done in any of these catalogs can be synced into SageMaker and can be used from SageMaker for your discovery and analysis for your developers. We will show you this in action as well. We're also introducing catalog federation to external Iceberg catalogs.
Almost all of you use Snowflake or Databricks here. With Snowflake or Databricks already in your companies, as long as the commonality is Iceberg, you can pull all of that data and use it with AWS analytics engines. You can bring that data and the metadata over to the SageMaker Catalog, and then you can use Athena, SageMaker, or any of our engines to process and use that data. This data will become visible in the SageMaker Catalog just the same way as all of your AWS data, and you can use all of our generative AI capabilities to generate auto business descriptions on all of this data. Imagine the power of that. This is one of the most exciting things we have done for customers because it really opens up the entire ecosystem for you, and you can truly bring it together.
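Once a federated Iceberg catalog is mounted, it can be queried like any other catalog from the AWS analytics engines. Here is a sketch of what that might look like with Athena via boto3; the catalog, database, table, and S3 output location are all hypothetical:

```python
import time

import boto3

athena = boto3.client("athena")

# Three-part name: federated catalog, database, table (names are hypothetical).
resp = athena.start_query_execution(
    QueryString='SELECT * FROM "snowflake_fed"."sales"."orders" LIMIT 10',
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/queries/"},
)
query_id = resp["QueryExecutionId"]

# Poll until the query reaches a terminal state, then fetch the rows.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
print(len(rows))
```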
One-Click Onboarding and AI-Powered Notebooks: Simplifying Data Analysis
Now you put all that data to use. How do you bring it together and actually use it? So far we have talked about bringing your AWS and non-AWS data together. It is all available in one central catalog. Now this new feature in SageMaker is one-click onboarding of existing datasets. How many of you are familiar with IAM and how policies are set up? Every AWS customer is familiar with that. What this allows you to do is if you already have permissions set up in IAM for your datasets, through a single click you can bring it over into SageMaker, and what that gives you the power to do is use this new notebook, this fancy new notebook that we have just launched.
It is a polyglot notebook, serverless, no provisioning required. You can have a cell in which you use Python, another cell can use SQL, you can correlate the cells, pass datasets between them, and use visualization. It is super cool. There is also an agent built into the notebooks. With a simple prompt it can generate your entire code base. It can create new pipelines, schedule them for you, and of course provide a great explanation of the step-wise thinking it is going through. How many of you are using any kind of IDE like Cursor or Windsurf in your environments? You should definitely give this a try, because with a prompt I was able to do deep analysis. I just fed it a few spreadsheets, honestly just raw spreadsheets. I did not even apply any data quality rules to them, and it gave me a beautiful analysis showing how one thing correlates to another. I was using dress sales, the kinds of things I like, and sales data that I just uploaded from somewhere, and it gave me a really cool analysis with very little work.
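The polyglot notebook mixes SQL and Python cells natively inside SageMaker Unified Studio. As a rough stand-in for what a SQL-then-Python flow looks like, here is a sketch using the AWS SDK for pandas (awswrangler) outside that environment; the database and table names are hypothetical, and plotting assumes matplotlib is installed:

```python
import awswrangler as wr

# "SQL cell": run an aggregation on Athena and get a pandas DataFrame back.
df = wr.athena.read_sql_query(
    sql="SELECT event_type, SUM(revenue) AS total FROM sales GROUP BY event_type",
    database="marketing_db",  # hypothetical database
)

# "Python cell": continue in pandas and visualize the result.
top = df.sort_values("total", ascending=False).head(10)
top.plot.bar(x="event_type", y="total", title="Revenue by event type")
```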
Demo Part 1: Creating Data Pipelines and Adding Business Metadata with AI
So now, who is ready for a demo? All right, Leo, our chief demo officer, for all of you. Thank you, Shikha. Hey folks, thank you Shikha for all the announcements. Now let's see every single one of them in action. I have a very comprehensive demo showcasing all of these features. Give me two minutes to switch to demo mode and we can start. Perfect. Can you see the screen? Good, perfect.
So we are going to start with the use case. For the use case we have four different personas. We have Leo. He works in sales, and he is a data engineer. He is going to create a dataset for us. Then we have Samantha, who also works in sales, but she is a data steward. She is going to help us document that data asset with business context. Then we have Sarah. Sarah works in a different department, in this case marketing, and she is going to play with the data. She is going to run some SQL, create dashboards, and so on. And then we have Oliver, who works with Sarah on the marketing team. He is a data scientist.
He's going to create a forecasting model using Amazon SageMaker notebooks. To give you more context, Sarah and Samantha will interact with the same data, but they don't know each other personally. Even so, they are going to collaborate and innovate on top of the same data.
Let's see the first use case. As I mentioned before, we're going to work with Leo. I'm going to show you very quickly how he creates a data pipeline using SageMaker Unified Studio. We are here on the home page. We go to our Build section and select visual ETL flows. Let's create one very quickly. Here you can see everything that we have available: sources, transformations, and destination targets to create our ETL jobs. But today we are lazy. It's 4 p.m., so we're going to use generative AI to generate the pipeline. I'm going to paste in the prompt that I'm going to use. You can see the full description here: join this table with this one, do this aggregation, and so on. I'm going to submit it, and just with that, I create a data pipeline.
Next, I'm going to save it. Just because I use generative AI to generate it doesn't mean that I cannot change whatever I want. Here I'm changing parameters in order to customize my job. I'm going to save it, and after I save it, of course I need to run it. So let's click run. We have the confirmation message that the ETL job ran successfully. Now let's see the result. It's an aggregation of different tables as you saw before. So here we are in our Data Explorer, and you can see the new table that was created: sales performance by buyer. From the Data Explorer, you can see the columns, the schema, and also a sample of the data in order for you to double check that everything is correct.
Now we have our new dataset, and we're going to transition to our next persona, Samantha. As I mentioned before, she is our data steward, and I'm going to show you how to document and add business metadata to this asset. Let's go back to the home page of SageMaker Unified Studio, this time as Samantha. We're going to go to the data part, the Data Explorer that I showed you before, but in this case I'm going to select data sources. Why? Because we're going to harvest the technical metadata from the asset that was created by Leo. I already created this data source for the demo. I'm going to just run it. It's going to take just 3 seconds, and just with that, when it finishes, that means we've harvested the metadata from that source.
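The demo triggers the harvest from the UI; the same run can be kicked off through the DataZone APIs behind SageMaker Catalog. Here is a sketch, assuming the data source was already created (identifiers are hypothetical); if the data source was created with business-name generation enabled, the run also produces the AI-generated name suggestions shown later:

```python
import boto3

datazone = boto3.client("datazone")

run = datazone.start_data_source_run(
    domainIdentifier="dzd_example123",     # hypothetical domain ID
    dataSourceIdentifier="ds_example789",  # hypothetical data source ID
)
print(run["status"])  # progresses e.g. REQUESTED -> RUNNING -> SUCCESS
```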
So let's go and add metadata to it. Let's go to the asset section, the inventory part, and look for the asset that we just created: sales performance by buyer. That takes us to the home page of our asset. Here you're going to be able to add metadata at the column level, at the asset level, and so on. Something you should notice is the star icon that appears in different sections of the asset. It means you can generate descriptions using generative AI in those sections. I'm going to show you, but I just wanted to let you know that that's why the icon appears in different places.
So the first thing that I'm going to do is generate the descriptions. Based on your technical metadata and using generative AI, we generate a description for your asset. You can reject it, edit it, or accept it; it's up to you. In this case we're going to accept everything. We can also add a README. We support Markdown, so you're going to see the formatting in a minute. That way you can make the asset more visual for your consumers.
Now let's move to the glossary terms section. You can see the traditional add-terms option, but now we also have the generate-terms capability using generative AI. This is valuable because, based on the glossary terms that you have already defined in SageMaker Catalog, we will automatically suggest glossary terms to associate with your asset.
Something that is very important is that we can also detect whether your data contains PII and, based on that, add a PII-related glossary term to the asset. You can see the suggestions here, which is why they are grayed out. If I click the star option, I can accept or reject the recommendations. We will accept them, and you can see that the third recommendation is PII. So we identified that your asset contains PII data and automatically added the glossary term. You can also add your terms manually the way we have always supported, but now you have a hybrid between the generative AI recommendations and manual assignment.
Let's move on to the metadata forms section. Here you can see all the technical metadata that is coming from the source. In this case, because it's AWS Glue, we are showing Glue information, but if the data were coming from Redshift, we would bring the metadata from there. You also have the option to add your custom metadata. This is very important, because this is where you add your own flavor to the asset. In this demo, I added a simple example: the date on which this asset was certified, the business owner, the classification, and the SLA. You can customize this form however you want, and you can also add multiple metadata forms to the asset if you want to.
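Attaching a custom metadata form like the one in the demo can also be done through the API by revising the asset. Here is a sketch, assuming a custom form type (called certification_form here) was defined beforehand; all identifiers and field names are hypothetical, so check them against the current API reference:

```python
import json

import boto3

datazone = boto3.client("datazone")

datazone.create_asset_revision(
    domainIdentifier="dzd_example123",  # hypothetical domain ID
    identifier="asset_abc",             # hypothetical asset ID
    name="sales_performance_by_buyer",
    formsInput=[
        {
            "formName": "certification_form",        # hypothetical form name
            "typeIdentifier": "certification_form",  # form type created earlier
            "content": json.dumps(
                {
                    "certified_on": "2025-12-01",
                    "business_owner": "samantha@example.com",
                    "classification": "internal",
                    "sla": "24h",
                }
            ),
        }
    ],
)
```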
With that we complete the asset-level metadata, but there is more. Let me show you how to document everything at the schema level. We already generated suggestions for the description, the column names, and the glossary terms. I will accept all of them because I want to show you everything without any grayed-out items, the way it normally looks. So you can see all the recommendations here. Another thing to pay attention to is that we also automatically associate glossary terms with each column. For example, for the second column, first name, we identified that it contains PII data and is also related to name. Again, you can edit and change this. Let me show you how.
If you click view and edit, you now have your own metadata section at the column level. This is a nice feature because it was inspired by you, and the person who inspired the creation of this feature is actually here in the audience, so it's great that you are seeing this right now. We added the metadata and description sections. You can also see the suggested glossary terms, and you can add more glossary terms at the column level if you want to. You can also go down and add metadata forms at the column level. Remember, before we were adding metadata forms at the asset level; now we are doing it at the column level. Columns are becoming first-class citizens in the catalog. We will add some information about the ownership and purpose of this specific column, and I will click save. Perfect. Now we have our asset well documented and our columns well documented.
Of course, doing this column by column could be a lot of overhead, so we also support APIs for you to do it programmatically. Now let's move on to asset filters. This is where we implement security inside SageMaker Catalog. Here we create a filter that we can apply when we approve a subscription request, and in that way we can filter the data that a specific requester will have access to. As you can see, we support column and row filters, and I'm going to filter by state.
I'm going to set it equal to Florida. So if I approve a subscription request and apply this filter, the consumer is going to be able to see data related to Florida only. Again, you can combine column and row filters.
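The row filter from the demo maps to the DataZone CreateAssetFilter API. Here is a sketch of a state = Florida row filter with hypothetical identifiers; the exact configuration shape should be verified against the current API reference:

```python
import boto3

datazone = boto3.client("datazone")

datazone.create_asset_filter(
    domainIdentifier="dzd_example123",  # hypothetical domain ID
    assetIdentifier="asset_abc",        # hypothetical asset ID
    name="florida-only",
    description="Restrict consumers to rows where state = Florida",
    configuration={
        "rowConfiguration": {
            "rowFilter": {
                "expression": {
                    "equalTo": {"columnName": "state", "value": "Florida"}
                }
            }
        }
    },
)
```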
Let's move on to data quality. Here you can see that we have three sections. The first one is the overall score of the asset. The middle part shows you, rule by rule, which passed and which failed. And the last one is a histogram that shows you how the quality score has changed over time.
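The scores on this page come from data quality rules evaluated against the asset. If you manage rules yourself, the equivalent with AWS Glue Data Quality looks roughly like this sketch: a small DQDL ruleset attached to the demo table (the database name and the rules themselves are illustrative, not from the demo):

```python
import boto3

glue = boto3.client("glue")

# A small DQDL ruleset; the rules below are illustrative.
ruleset = """
Rules = [
    IsComplete "buyer_id",
    Uniqueness "buyer_id" > 0.99,
    RowCount > 0
]
"""

glue.create_data_quality_ruleset(
    Name="sales-performance-dq",
    Ruleset=ruleset,
    TargetTable={
        "DatabaseName": "sales_db",                 # hypothetical database
        "TableName": "sales_performance_by_buyer",  # table from the demo
    },
)
```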
Now let's go to my favorite part. We have data lineage, which is pretty cool, because if you remember the pipeline that we ran at the beginning of the demo, just by running the pipeline we automatically generated this lineage diagram for you. You can expand the diagram and see the source of the asset that you are consuming, and not only that, you can also see who is consuming the data asset. So you see not only how it was created, but also who is consuming it. Remember that our lineage functionality is based on OpenLineage, so it's based on open standards.
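Because the lineage is OpenLineage-based, tools outside SageMaker can contribute to the same graph by posting standard run events. Here is a minimal sketch using the DataZone PostLineageEvent API; the job and dataset names are hypothetical, and the payload follows the OpenLineage RunEvent spec:

```python
import json
import uuid
from datetime import datetime, timezone

import boto3

datazone = boto3.client("datazone")

# A minimal OpenLineage RunEvent; the names below are hypothetical.
event = {
    "eventType": "COMPLETE",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "run": {"runId": str(uuid.uuid4())},
    "job": {"namespace": "sales-etl", "name": "sales_performance_by_buyer_job"},
    "inputs": [{"namespace": "glue", "name": "sales_db.orders"}],
    "outputs": [{"namespace": "glue", "name": "sales_db.sales_performance_by_buyer"}],
    "producer": "https://example.com/my-pipeline",
    "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json",
}

datazone.post_lineage_event(
    domainIdentifier="dzd_example123",       # hypothetical domain ID
    event=json.dumps(event).encode("utf-8"),
)
```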
Demo Part 2: Data Discovery, SQL Queries, and Dashboard Creation Using Generative AI
Perfect. So here we have everything that we need as data producers in order to publish our asset. I'm going to click publish asset. When I click this button, this asset is going to be available and visible for the rest of the company. Perfect. It's published and well documented. Now let's switch to the consumer persona, in this case Sarah, the marketing data analyst, and see how she explores the data catalog. I'm going to show you two different ways to do it. The first one is the traditional way, using a search engine.
I'm going to browse by assets. You can see here all the assets that we have in the catalog. You can see here that I can filter by glossary terms. As you remember, we added the PII tag to the asset that we created. So if I filter by PII, I'm going to see all the assets that contain PII. Now let's go to the more innovative way to search for assets. You can use Amazon Q. Just using natural language, you can ask questions on top of your catalog. In this case, give me a list of the assets that contain PII data, and that's it. It's going to give you a list of all the assets. Also, next to the asset, as you can see here, we show you the criteria that we used in order to tell you that that asset is the right one for your search. You get more details about the context that we used in order to identify this asset.
So here, remember we're seeing everything from the consumer point of view. Look at this, it's beautiful. It's a well documented asset at the asset level and schema level, with data quality scores and data lineage diagram, everything. So I can make an educated decision as a consumer based on all the metadata that I have here. I'm going to click subscribe. Here you can see that as a requester and subscriber, you need to fill out information. Just to let you know, you can customize this form and you can ask here whatever you want based on your use case. I'm going to click request and that's all that I need in order to request access to an asset.
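Searching the catalog and requesting access also have API equivalents in DataZone, which is useful for automation. Here is a sketch of the search-then-subscribe flow Sarah just performed, plus the approval the steward does next; all identifiers and the response shapes are illustrative:

```python
import boto3

datazone = boto3.client("datazone")
DOMAIN_ID = "dzd_example123"  # hypothetical domain ID

# Find the published listing for the asset.
listings = datazone.search_listings(
    domainIdentifier=DOMAIN_ID,
    searchText="sales performance by buyer",
)
listing_id = listings["items"][0]["assetListing"]["listingId"]  # shape illustrative

# Request access for Sarah's project, with a business justification.
req = datazone.create_subscription_request(
    domainIdentifier=DOMAIN_ID,
    requestReason="Marketing revenue analysis",
    subscribedListings=[{"identifier": listing_id}],
    subscribedPrincipals=[{"project": {"identifier": "prj_marketing"}}],  # hypothetical
)

# Later, the data steward approves the request (filters are chosen at approval).
datazone.accept_subscription_request(
    domainIdentifier=DOMAIN_ID,
    identifier=req["id"],
    decisionComment="Approved with the florida-only asset filter.",
)
```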
Let's switch back to Samantha, the data steward. She has the ownership to review and approve the subscription request. Go back to the home page, go to my data tab, and open the subscription request section. I can see my subscription request there and view the details: it's coming from marketing, it was Sarah, and here is the use case that Sarah entered. You have the option to grant full access or access with a filter.
Of course, we are going to implement access with a filter. You can see here the filter that we created together. Then I'll enter the reason why I'm approving. I click approve, and just with that, starting now, Sarah has access to the actual data. Let me show you that very quickly. We are going back to Sarah, and now let's play with the data. Look, we have a notification from SageMaker Unified Studio.
Sarah received a notification directly in the UI showing that Samantha approved the request. She can see all the information that Samantha added. Now let's consume the data. I go directly to my Data Explorer and click preview data, which takes us to our SQL experience. I'm going to run a quick discovery query. I can see all the data there, but there is a problem that I haven't told you about yet. Sarah doesn't know how to run a SQL query. She got the data analyst job, but she doesn't have that skill. No problem. Sarah can use generative AI to generate the queries that she needs to run. For that, she goes to the Q agent that we have as part of the SQL experience and asks questions on top of the data.
She asks, "Give me the top 5 cities by revenue." Just by asking that, she was able to execute the query. Not only that, she can keep the conversation going. Based on these results, she asks, "Give me the type of event that is most popular in those cities." The system keeps the context of the previous query, and as you can see here, just by executing the query, she gets the result she's looking for. Sarah is interacting with the data without knowing how to write a SQL query.
Sarah identified that this is the right data that she needs to consume. Now she is going to create a BI dashboard to show some visuals. We go back to our asset and select the action button, then click open using QuickSight. The good thing about this is that it opens a QuickSight instance with everything already associated to the asset. Something I also forgot to tell you is that Sarah doesn't know how to create a dashboard either, so she's going to use generative AI to create it. Here we have the build option. She asks the assistant for a visual that shows the top users by revenue or by spend.
She likes the visual and adds it to the dashboard. Perfect. We can see it here, and now she's going to add a second visual. It's a different question but related, and she got the visual that she was looking for. She's going to add it to the dashboard as well. I'm going to clean it up a little. I'm going to close here and remove this default option, and I'm ready to publish it. I click the publish option and need to put a name for the asset. When we click the publish button, that creates a new asset inside the catalog. You're able to catalog a QuickSight dashboard.
Let me show you how that looks very quickly. Let's go back to SageMaker Unified Studio. We are there. Let's go to data and then to assets again. In this case, I'm going to look for any assets related to QuickSight. As you can see here, I have a new asset called Revenue, and if you see the type, it says QuickSight dashboard. You can also augment that asset using metadata and business metadata. I'm adding the README section here to my dashboard and also adding glossary terms to the dashboard. You can add metadata forms as well. I'm not going to do that as part of this demo, but just to let you know that you can document a dashboard in the same way that we document a Glue table.
So we are going to now publish the asset, in this case the dashboard. As you can see, Sarah was not only a consumer, she became a producer. With the data that she consumed from Samantha, she was able to create an asset, in this case a QuickSight dashboard that she published as part of the catalog. She became a data producer. Now let's go to the last persona, my friend Oliver. He's a data scientist, and I'm going to show you how Oliver is going to create a forecast model using our new notebook experience.
Demo Part 3: Building Forecasting Models with AI-Assisted Notebooks
Remember, Oliver works in the marketing department, just like Sarah. Here you can see that we have access through the AWS console, because that's a different experience we are offering with these notebooks. You can have a hybrid approach where some users work in SageMaker Unified Studio using IDEs, while more technical users use IAM roles and users to access the experience directly.
Let me show you how this works. I'll click on SageMaker. It identifies my role, in this case Oliver, and based on that it offers a customized experience. This is a new experience, as I mentioned before. You have the home page with all the options available, including data pipelines, machine learning, and everything else. Let's focus on the data part for now. You can see that Oliver has the same level of access as Sarah. Why? Because they work in the same project and the same line of business, so they have the same level of access. Even though they're using different experiences, you can see the same data asset that Sarah got access to, but in this case Oliver is going to use a notebook to work with the data.
Another thing I should mention is that Oliver doesn't know how to create a forecasting model. So he's going to use the AI agent that we have as part of the notebook experience. Here, Oliver is asking it to create a forecasting model about sales. This is very interesting, because the agent starts analyzing the request and divides the answer into different stages. It could provide the full answer at once, but it's going to do it step by step. What I love about this is that Oliver interacts with the agent to execute every single step. Oliver says yes, let's implement step number one, and you start seeing how the agent populates every single cell, generating and executing the code, with my permission of course.
You can see visuals here showing that after the agent explored the data, it identified that we need to do some aggregations to get the information we need for the forecasting model. We are running the aggregation right now; I need to approve that step. Let's do it and start populating the cells to perform the aggregation. Of course, you can see the results and decide whether you want to continue with the steps or change something. In this case it's a demo, so everything is straightforward. Step two worked beautifully. Now let's execute step three. Again, something I like about this is that you need to validate every single step before you execute the next one.
Now we are executing the predictive analysis. Oliver didn't have to write a single line of code. The last step is executing the recommendations and visualizations. In this case, we are more focused on the recommendations than the visualization. Once I approve the step and scroll down, we see all the recommendations for the forecasting model based on the data. As you can see, the only thing I had to do was provide the right prompt and then interact with the agent to get all of these recommendations.
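To make the agent's steps concrete, here is a hypothetical stand-in for the kind of code it generates across those stages: load, aggregate to a monthly series, fit a simple trend model, and project forward. The file name, column names, and the choice of a linear model are all assumptions for illustration:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Step 1: explore/load the data (hypothetical extract of the subscribed asset).
sales = pd.read_csv("sales.csv", parse_dates=["sale_date"])

# Step 2: aggregate to a monthly revenue series.
monthly = (
    sales.set_index("sale_date")["revenue"]
    .resample("MS").sum()
    .reset_index()
)
monthly["t"] = range(len(monthly))

# Step 3: fit a simple linear trend on the time index.
model = LinearRegression().fit(monthly[["t"]], monthly["revenue"])

# Step 4: project the next six months and inspect the forecast.
future = pd.DataFrame({"t": range(len(monthly), len(monthly) + 6)})
print(model.predict(future))
```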
This was a long demo, but as you saw, we covered different personas: four of them, all interacting on top of the same platform. They don't need to know each other, but they are able to collaborate and innovate on top of the same data. Now Karen, please come up. Karen is going to show you that all of this is possible in reality.
NatWest's Data Transformation Journey: A Real-World Customer Success Story
Thank you everybody. I am delighted to be with you this afternoon, and I hope you share my concern for both Sarah and Oliver's apparent lack of skills in their chosen profession. For those of you who don't know NatWest, let me start by bringing to life our story and our journey. We are one of the main high street banks within the United Kingdom, and we are on a mission that I'm proud to be leading a part of.
We have been in existence for 300 years now, and I share that with you because we will celebrate our 300th birthday in 2027. When I'm with AWS, our strategic hosting partner, I'm always entertained by the relative difference in our ages. But as a 300-year-old bank, there is no way we would exist today without a strong culture of innovation and constantly striving to find new ways to meet and serve our customers' financial needs.
As we stand today, our ambition is to succeed with our customers as a sustainable partner, and we want to be the bank that turns possibilities into progress. I am a banker by trade, and I have actually spent 28 years with NatWest Group. What I would say is that today more than ever before, banking is dynamic, exciting, and full of possibilities. That's why our purpose of turning those possibilities into progress really resonates with where we are today and where our customers are.
We want to use the data that we hold about our customers to understand their hopes and know what they need. We operate in a very dynamic and changing world, so our data and our responsible AI are critical to being able to help our customers when they need us most. I have captured some of our core statistics. We have over 20 million customers in the UK, and we process almost 750 million financial transactions every month.
We would not be able to serve our customers if we didn't use our data as an asset, and we know that in the future, AI is how we will evolve to better meet our customers where they are. As the Chief Data and Analytics Officer, I am leading NatWest's data transformation. As we have been running through our session this afternoon, a number of things have really struck home for me, one of which is that now feels like the perfect time to be doing this work.
The capability that Leo just demonstrated is not make-believe; it is real. We spent a day with our Chief Information Officer, Scott Marcar, as a data team, and we were able to demonstrate almost exactly what Leo has just shown you, but using our data and creating our own assets in our own marketplace. So genuinely, as a customer, I am here to say that it is real.
If I can talk to you about our wider data transformation, I am here to deliver high-quality, well-curated data in the cloud. One of the big shifts that we are leading is moving from being a very small focused team of data professionals into making our data accessible for all of the roles and all of the teams who need to use it. I am proud to say we have quite a strong track record in the deployment of AI. We recently achieved 16th global ranking in the Evident AI league tables, which is no mean feat for a bank of our size, and we have been working diligently with AI to ensure that we are able to leverage that capability at scale and indeed to get ready for generative and agentic AI at a much higher rate than we use them today.
We are on a transformation journey, and we need to move to a modern data architecture that is simpler, removing some of our heritage and legacy data platforms. We need to do it really quickly so that we are able to deliver generative and agentic AI solutions to our customers. We will support our organization's strategy of growing our business, simplifying our estate to give us greater agility, and having strong governance.
Having strong governance is at the absolute heart of how we deliver control. In the UK and in Europe, we have stringent regulation that helps us achieve the right level of quality and control over our customer data. I understand that in the States it's mostly financial services and healthcare who care about that, but in Europe and in the UK we have really strong regulation. Some of the demos that you've seen today help us achieve that regulatory standard.
How we're going about it is a little bit unique as well. We have created and launched in July of this year a unique three-party partnership across ourselves, NatWest Group, across AWS, our strategic hosting partner, and across Accenture. This is all to make our journey go faster, so we are bringing the best of each of these three companies to move quicker to that modern data architecture. Our outcome—we'll know we are successful when we are able to deliver generative and agentic AI at scale. How that will feel is that we are personalized, we are relevant, and we are able to be tailored when we're dealing with our commercial and institutional customers and our customers across the whole organization.
We had a provocation: if we did data transformation differently, could we go faster? Could we ensure that we would deliver the results, and could we make it safer? I genuinely believe we're on a path to do exactly that, reducing our timeline to completion from five to six years down to three to four years. This therefore becomes our actual customer mission. We are looking to move the whole organization to something we call our digital spine, and that's how you take a three-hundred-year-old bank and make it modern, agile, and able to compete with cloud-born data companies.
We will be using radical technology and organizational simplification to get there. If you think about making a change to some of our core systems, we carry a huge overhead. Through simplification, we'll be able to reduce the cost of change and make it faster to keep up with growing and changing customer needs. We want to be known as an AI-powered bank, and we're committed to do that. The thing that NatWest brings to our partnership is this relentless focus on the customer—what do they need, what do they expect, and how can we deliver?
All of this should translate into sustainable returns for our customers, for our shareholders, and for the societies in which we operate in the UK. So a really compelling mission. As a three-hundred-year-old organization with a small number of really important data teams, you can see that I have a lot of data, but it is difficult to find. It isn't organized, and often it's difficult to get the insights or to help support making really good data-driven decisions because of the mess that you can see in front of you.
One of the best examples of this is that today it is sometimes still fastest to find the data you need by phoning someone rather than searching for it. That is why we need the glossary that's part of SageMaker Unified Studio. I think Shikha put this incredibly well: we are getting ready for a world where metadata is how our agentic AI solutions will find the data that they need and access it using the controls that we're building through SageMaker Unified Studio. Humans need maps, but AI needs metadata.
We've touched on many of the core technical capabilities that we're seeking to leverage as part of NatWest Group's journey to that modern data architecture. Let me skip past that and talk about how I am realistically using the catalog today. The key thing I would like to hammer home is that I'm very proud to lead a team of almost three thousand trusted data professionals, whether they're engineers, analysts, or data scientists, and I know that I can help them find what they need.
To be faster and to be more customer focused, I need to federate or increase the access that all seventy-two thousand employees of my company have to the data that we hold about our customers. The only way that I will achieve that outcome is having my data and my APIs discoverable through the catalog as part of SageMaker Unified Studio.
It is also the only way in which I will drive reuse, which makes me faster and more consistent in the products and services I'm able to offer. I can share my data without moving it and without storing duplicate copies in every different team. And most importantly for me, I can keep my most critical asset, my customer data, safe, because I get to control who is able to access it, down to a really fine level, through the catalog within SageMaker Unified Studio.
Key Takeaways: Metadata, Unified Platforms, and Incremental Innovation
So I'd like to invite Shikha and Leo back to the stage, and we're going to share with you our top three takeaways. Thank you, Karen. Was that fun, folks? Did you learn something new? All right, so we'll bring this to a wrap. We'll all be around afterwards, so if you have questions or anything, please feel free to grab us. But here are our top three takeaways.
Number one, we talked about metadata a lot today. Extend your architecture so that metadata and governance are built into your mainstream, so that you can amplify your journey with AI. All right. Second takeaway, Leo?
Yeah, as you saw, SageMaker Unified Studio is a central place that you can use, no matter what kinds of personas you have, to let them innovate on top of the same data. Using SageMaker Unified Studio as a central place, as NatWest just showed, was the magic formula to make everything work. And mine is: start small and build incrementally. We've been proud to be part of the journey of innovation with AWS, starting with DataZone and moving into SageMaker Unified Studio, and as part of that work we've been a strong voice of the customer helping shape the design. But as a company, it is incredibly important to make progress quickly, supported by our partners at AWS and Accenture. I would say start small, build incrementally, but deliver initial and ongoing value in order to keep your organization's support for making this change.
Cool, thank you very much. Thank you. Folks, don't forget to leave your feedback. Please leave your feedback. Also, I want to highlight that we tried something new with this session, so we're looking for your input. We did very little tell and a lot more show, and we included a customer example for you to learn from. So if this was useful, please do note it in the survey so that we can do more of this going forward. And of course there are a ton of SageMaker sessions throughout the remaining days, I think about thirty of them, so you can drill into whatever areas you want and come grab us. We are here. Thank you all. Thank you folks.
; This article is entirely auto-generated using Amazon Bedrock.