AWS re:Invent 2025 - Simplify Data Lake Access with IAM Identity Center and trusted identity...

🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.

Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!

Overview


In this video, Laura Reith, a Solutions Architect from the Identity Solutions team, explains how IAM Identity Center and trusted identity propagation simplify data lake access control in AWS. She addresses the challenge of managing multiple IAM roles and policies for different user access patterns. Trusted identity propagation enables administrators to assign permissions based on human identities rather than roles, creating a unified access control model across AWS services like QuickSight, Athena, Redshift, and Lake Formation. The solution maintains service roles with minimal permissions while enhancing them with user identity context, allowing precise tracking of user actions in CloudTrail logs through unique user IDs. This approach reduces role proliferation, improves security auditing, and enables granular user-specific permissions across the AWS ecosystem.

This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

The Challenge of Managing Data Lake Access with Traditional IAM Roles

Okay, hi everyone. I know I am between you and the party, so I'm going to try to be very entertaining. My name is Laura Reith. I am a Solutions Architect from the Identity Solutions team, and I'm here to talk to you about how to simplify data lake access with IAM Identity Center and trusted identity propagation.

Now you are probably very familiar with how this looks in AWS, right? You have several data sources. Maybe you have a traditional database, maybe you're streaming in data, or maybe you have some data sets on premises that you store in your file systems. The first step is to connect to and ingest this data. In AWS we do that with Glue. Glue connects to your external data sources, and it also offers crawlers so that you can populate the catalog and infer the schema automatically. The data set is then registered in the catalog and discoverable.

After that, you may want to transform that data. Maybe some cleansing or augmentation is necessary before it's ready to be consumed by end users and applications. But that is not the focus of today's talk, because AWS has tools for this that you're probably very familiar with. What matters here is what happens after: when the data is ready and sits in storage such as S3 or Redshift, and you have an application that needs to query this data and give accurate answers according to the persona interacting with it.

Take an example that everybody is talking about at re:Invent: an AI chatbot. We can easily power this chatbot with Athena, so that the application in the background runs an Athena query, and then we enrich the answer with LLMs before sending it back to the front end. But how do we do access control here? There are two parts to this. One is you, the administrator, setting up these services. The second part is the end user: how does Athena know what that end user has access to?

In AWS we do this with IAM roles, and you're probably familiar with them, but they bring challenges, especially from the end user perspective. Why? Let's say you have Laura, who is a sales manager, and then you have the marketing team and the data science team. You need a role for every single access pattern and a policy for each of those roles. As your access patterns grow, you end up with a multitude of roles and policies to manage, because instead of modeling access on the user, you're modeling it in IAM policies. Your policies can become complicated, so you can easily reach policy size limits.

Over time, with the proliferation of roles, there is a high risk that you end up with overly permissive roles or with permissions that are no longer used. But if you think about this from the end user perspective, all of these challenges stem from the fact that you don't have a global identity and a unified access control model. One role can be assumed by many users, and once a user is inside that assumed-role session, it's very difficult to track every single action that the user has performed.
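To make the tracking problem concrete, here is a minimal Python model of the role-per-access-pattern approach. Everything in it is hypothetical (account ID, role names, users, and the simplified AuditEvent record), and it deliberately ignores role session names, which can carry a per-user hint if you configure them. The point is only that when an audit record identifies a shared role, several people could be behind it.

```python
from dataclasses import dataclass

# Hypothetical role-per-access-pattern setup: every access pattern gets its own
# IAM role, and several users can assume the same role. Names are illustrative.
ROLE_ASSIGNMENTS: dict[str, set[str]] = {
    "arn:aws:iam::111122223333:role/MarketingAnalyst": {"alice", "bob"},
    "arn:aws:iam::111122223333:role/SalesManager": {"laura"},
    "arn:aws:iam::111122223333:role/DataScience": {"carol", "dave"},
}


@dataclass
class AuditEvent:
    # Simplified audit record: it only knows which role made the call.
    role_arn: str
    action: str


def candidate_actors(event: AuditEvent) -> set[str]:
    """Every user who could have produced this event by assuming its role."""
    return set(ROLE_ASSIGNMENTS.get(event.role_arn, set()))


event = AuditEvent("arn:aws:iam::111122223333:role/MarketingAnalyst",
                   "athena:StartQueryExecution")
print(sorted(candidate_actors(event)))  # ['alice', 'bob']
```

Every new access pattern adds another entry to ROLE_ASSIGNMENTS and another policy to maintain, which is the proliferation described above.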

Trusted Identity Propagation: Enabling User-Based Access Control Through IAM Identity Center

We recognized this challenge, and our first step toward solving it is trusted identity propagation. Trusted identity propagation is a feature of IAM Identity Center. It allows you, as an administrator of an AWS service, to assign permissions based on the human identity: the end user, or the group that end user belongs to. How is that possible? Because it's a feature of Identity Center, you first need Identity Center itself, right? And where do the identities come from?

You first have Identity Center. You integrate Identity Center with your identity provider, which could be Okta, could be Entra, could be Active Directory, whatever your company is using. And you bring those identities into AWS.

Once that identity is inside AWS, it becomes a single universal identity in your AWS ecosystem. That identity has a unique user ID and is recognized by all the integrated applications. Pay attention when I say integrated: AWS applications such as QuickSight, Amazon Redshift, or Kiro (everybody wants to go to the House of Kiro right now) are able to talk to Identity Center and the identity store and retrieve information about this user, such as which identity store the user belongs to and which groups the user is a member of.

This allows you as an administrator to go inside Redshift, for example, and grant access to a schema for that specific user or for a group that now exists in Identity Center. And if you're using Redshift with Lake Formation, Lake Formation also understands that same user. So it's a unified access control model, and we are building up to this. Now, how does it actually work? If you look at this diagram (thank you, you're very responsive), we see all the AWS managed applications, and these applications first need to be registered with Identity Center.

When you enable one of the integrated applications, you will see an option to connect it with Identity Center. You can do that from a member account: Identity Center is an organization-level construct, but it was designed so that its use can be delegated. You probably don't manage Identity Center yourself, but you want to create a Redshift application, and as long as Identity Center is set up in the organization, you are able to access that directory. So let's follow the graphic. First, you register the application with Identity Center. Then you assign access inside Redshift, in this case to Alice.

When Alice authenticates with Identity Center and then tries to access Amazon Redshift, Redshift can go to Identity Center and the identity store, check who Alice is and which groups she belongs to, and authorize access to the data or schema based on Alice herself or on her groups. Compare that with how it worked before, because I think the visual is really important here. We have three different personas. Alice is the marketing analyst, and there is an IAM role that Alice assumes to access the bucket. But Alice isn't the only one who assumes that role; maybe Bob can assume it too. When both of them assume that role, your CloudTrail logs show the assumed role, but you don't see which of them performed the actual data access in Athena or Redshift.
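The authorization step described above can be sketched as a lookup: resolve the user and her groups from the identity store, then check whether any of those principals holds a grant on the requested schema. This is a hypothetical in-memory model, not the Redshift or Identity Center API; IDENTITY_STORE, SCHEMA_GRANTS, and is_authorized are names made up for illustration.

```python
# Hypothetical in-memory stand-in for the Identity Center identity store.
IDENTITY_STORE: dict[str, dict] = {
    "alice": {"user_id": "alice-user-id", "groups": {"marketing-analysts"}},
    "bob": {"user_id": "bob-user-id", "groups": {"data-science"}},
}

# Hypothetical grants made by an administrator inside the data service.
# A principal is either ("user", <user ID>) or ("group", <group name>).
SCHEMA_GRANTS: dict[str, set[tuple[str, str]]] = {
    "marketing": {("group", "marketing-analysts")},
    "ml_features": {("group", "data-science"), ("user", "alice-user-id")},
}


def is_authorized(username: str, schema: str) -> bool:
    identity = IDENTITY_STORE.get(username)
    if identity is None:
        return False  # unknown user: deny by default
    # Collect the user's own principal plus one principal per group membership.
    principals = {("user", identity["user_id"])}
    principals |= {("group", g) for g in identity["groups"]}
    # Authorized if any of those principals holds a grant on the schema.
    return bool(principals & SCHEMA_GRANTS.get(schema, set()))


print(is_authorized("alice", "marketing"))  # True
print(is_authorized("bob", "marketing"))    # False
```

Because grants can target either a user or a group, Alice reaches ml_features through a direct user grant even though she is not in the data-science group.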

Another pattern that can happen with IAM roles is that the role has cross-account access to many other data sets, which makes tracking even more challenging. With trusted identity propagation, this doesn't mean there are no roles. There are roles, but for the services, not for the end users, and that's the beauty of it. In the application itself, there are no more user roles: Alice no longer assumes a role. Alice logs in to QuickSight or Redshift or whatever application you're using, and the application itself controls her access to the data. So you no longer have that one-to-many relationship between a role and its users.

Now if we still have roles, but the end user doesn't need roles, how does this work? It's a little confusing, right?

Let me clarify this with the example I gave at the beginning. Let's say we have QuickSight, which we have enhanced with a chat experience. You can power that chat experience with Athena, for example, as shown here. All the AWS services you see here still have a role. These are service roles that the AWS services use to communicate with each other, and they carry only the minimum permissions the services need to make the calls that access the data. The difference is that we now enhance that service role with the identity context of the user who logs in.

So when Alice logs in to Amazon QuickSight, QuickSight assumes its service role and enhances it with Alice's identity context. Alice starts chatting with your chatbot, and because the chatbot is powered by Athena, QuickSight calls Athena. The call goes through the service role, but it also contains Alice's context, so Athena knows it is made on behalf of Alice. If Athena uses Lake Formation, the same thing happens: Athena calls Lake Formation and passes on Alice's context. And that happens for every single user.
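A minimal model of that propagation chain: each hop signs its call with a service role but carries the same user identity context, and the data layer authorizes on that user. CallContext, TABLE_GRANTS, the role ARNs, and the function names are hypothetical stand-ins for QuickSight, Athena, and Lake Formation; the real services exchange this context internally, and the sketch assumes, as the talk describes, that each service uses its own service role.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CallContext:
    caller_service: str
    service_role_arn: str      # role the calling service uses to sign the request
    on_behalf_of_user_id: str  # identity context added by trusted identity propagation


# Hypothetical Lake Formation-style grants, keyed by user ID rather than by role.
TABLE_GRANTS: dict[str, set[str]] = {"sales.orders": {"alice-user-id"}}


def lake_formation_authorize(ctx: CallContext, table: str) -> bool:
    # The decision uses the propagated user; the service role is not consulted.
    return ctx.on_behalf_of_user_id in TABLE_GRANTS.get(table, set())


def athena_query(ctx: CallContext, table: str) -> str:
    # Athena calls downstream with its own (hypothetical) service role but
    # forwards the user identity context unchanged.
    downstream = replace(ctx, caller_service="athena",
                         service_role_arn="arn:aws:iam::111122223333:role/example-athena-role")
    if not lake_formation_authorize(downstream, table):
        raise PermissionError(f"{downstream.on_behalf_of_user_id} may not read {table}")
    return f"rows from {table} for {downstream.on_behalf_of_user_id}"


def quicksight_chat(user_id: str, table: str) -> str:
    # QuickSight uses its service role and attaches the signed-in user's context.
    ctx = CallContext("quicksight",
                      "arn:aws:iam::111122223333:role/example-quicksight-role",
                      user_id)
    return athena_query(ctx, table)


print(quicksight_chat("alice-user-id", "sales.orders"))
```

The service roles stay fixed and minimal, while the answer to "may this person read this table?" depends only on the propagated user.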

So instead of managing one, two, three roles for your users in this case, plus all the roles for the services, you manage only the service roles, with minimum permissions, and you grant access directly to the user. Another benefit is that tracking becomes easier, because every service API call is logged in CloudTrail, and the CloudTrail record looks a little different. Can you see the CloudTrail log over there? This is the CloudTrail record for an assumed-role session, and you see the call comes from Athena, the service itself. You have the typical user identity portion of the record, and then you have the portion in the red square: the on-behalf-of portion (the onBehalfOf element) with the user ID.

This on-behalf-of portion appears in every single log: the QuickSight log, the Athena log, the Lake Formation log, and the S3 log. The user identity is unique across the AWS organization, so all of these services know who this user is. And because it appears in every log, you can query all the actions performed by this user, something you couldn't do before.
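Because the identity context appears in each service's CloudTrail record, finding everything one user did becomes a filter on that field. The sketch assumes the userIdentity.onBehalfOf element carries userId and identityStoreArn, as we understand the CloudTrail record format for trusted identity propagation; the sample records and user IDs are hypothetical and heavily trimmed, so check the field names against your own logs. In practice you would run the same filter in CloudTrail Lake or in Athena over your trail rather than in Python.

```python
# Hypothetical, trimmed CloudTrail records; real records contain many more fields.
IDENTITY_STORE_ARN = "arn:aws:identitystore::111122223333:identitystore/d-1234567890"

RECORDS = [
    {"eventSource": "athena.amazonaws.com", "eventName": "StartQueryExecution",
     "userIdentity": {"type": "AssumedRole",
                      "onBehalfOf": {"userId": "alice-user-id",
                                     "identityStoreArn": IDENTITY_STORE_ARN}}},
    {"eventSource": "lakeformation.amazonaws.com", "eventName": "GetDataAccess",
     "userIdentity": {"type": "AssumedRole",
                      "onBehalfOf": {"userId": "alice-user-id",
                                     "identityStoreArn": IDENTITY_STORE_ARN}}},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "userIdentity": {"type": "AssumedRole",
                      "onBehalfOf": {"userId": "bob-user-id",
                                     "identityStoreArn": IDENTITY_STORE_ARN}}},
    # A plain assumed-role call with no propagated user identity.
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "userIdentity": {"type": "AssumedRole"}},
]


def actions_for_user(records: list[dict], user_id: str) -> list[tuple[str, str]]:
    """(eventSource, eventName) for every record made on behalf of user_id."""
    hits = []
    for record in records:
        on_behalf_of = record.get("userIdentity", {}).get("onBehalfOf") or {}
        if on_behalf_of.get("userId") == user_id:
            hits.append((record["eventSource"], record["eventName"]))
    return hits


for source, name in actions_for_user(RECORDS, "alice-user-id"):
    print(source, name)
```

Records without the onBehalfOf element are simply skipped, which is what a plain assumed-role call looks like in this model.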

I think the biggest takeaway from all of this is this user identity. If you look at it, you have the user ID, which belongs to the directory, and you also have the identity store ARN, so you can identify exactly which Identity Center instance it comes from, in case you're running other proofs of concept and have more than one instance. You can identify exactly where Alice comes from. And the ID is unique not only in your organization but across all organizations: it is a unique user ID for Alice.
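Keeping the identity store next to the user ID tells you which Identity Center instance an event came from, for example a proof-of-concept instance versus production. The sketch below counts events per (identity store ID, user ID) pair. It assumes the onBehalfOf element shape used by CloudTrail for trusted identity propagation and an identity store ARN ending in "identitystore/<IdentityStoreId>"; the ARNs and user IDs are hypothetical placeholders.

```python
from collections import Counter


def identity_store_id(identity_store_arn: str) -> str:
    # Assumes the ARN ends in ".../identitystore/<IdentityStoreId>".
    return identity_store_arn.rsplit("/", 1)[-1]


def events_per_identity(records: list[dict]) -> Counter:
    """Count events per (identity store ID, user ID); skip records without onBehalfOf."""
    counts: Counter = Counter()
    for record in records:
        obo = record.get("userIdentity", {}).get("onBehalfOf")
        if obo:
            counts[(identity_store_id(obo["identityStoreArn"]), obo["userId"])] += 1
    return counts


# Hypothetical production and proof-of-concept Identity Center instances.
PROD = "arn:aws:identitystore::111122223333:identitystore/d-1111111111"
POC = "arn:aws:identitystore::444455556666:identitystore/d-2222222222"
records = [
    {"userIdentity": {"onBehalfOf": {"userId": "alice-user-id", "identityStoreArn": PROD}}},
    {"userIdentity": {"onBehalfOf": {"userId": "alice-user-id", "identityStoreArn": PROD}}},
    {"userIdentity": {"onBehalfOf": {"userId": "test-user-id", "identityStoreArn": POC}}},
    {"userIdentity": {"type": "AssumedRole"}},
]
print(events_per_identity(records))
```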

This means you now have a centralized identity that is understood across your AWS ecosystem. It is logged, so it can be tracked, which was more challenging before. You can also grant granular, user-specific permissions directly to the user, where before you needed several roles to achieve the same result. It removes the large number of roles you needed before, and you only worry about what is important to you, which is what you do with that data.

One more thing that I think is important is the identity-aware path through every downstream service, which starts at the entry-point service. You need a client-facing application. In my example it was QuickSight, but it could be a custom application of your own. Let's say you have an employee portal where your employees also check their support tickets. From there you can do the same thing, and I think that's very powerful, because it's not limited to AWS services. You can integrate your own application and use the same identity all the way from your identity provider into the AWS environment. Well, thank you very much for coming.
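For a custom application such as the employee portal, the flow as we understand it from the AWS documentation is: the application exchanges the user's token from your identity provider for an Identity Center token (the SSO OIDC CreateTokenWithIAM API, with a trusted token issuer configured), reads the identity context claim from the returned ID token, and passes it to STS AssumeRole so the resulting session is identity-enhanced. The sketch covers only the local step of building the AssumeRole parameters. The claim name sts:identity_context and the provider ARN come from that documentation and are worth verifying; assume_role_params is a hypothetical helper.

```python
import base64
import json

# Context provider ARN for Identity Center, per our reading of the AWS docs on
# identity-enhanced sessions (verify before relying on it).
IDC_CONTEXT_PROVIDER_ARN = "arn:aws:iam::aws:contextProvider/IdentityCenter"


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64 padding stripped by JWT
    return json.loads(base64.urlsafe_b64decode(payload))


def assume_role_params(role_arn: str, id_token: str, session_name: str) -> dict:
    """Hypothetical helper: kwargs for an identity-enhanced STS AssumeRole call."""
    identity_context = jwt_claims(id_token)["sts:identity_context"]
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "ProvidedContexts": [
            {"ProviderArn": IDC_CONTEXT_PROVIDER_ARN, "ContextAssertion": identity_context}
        ],
    }
```

With boto3, the surrounding calls would look roughly like `boto3.client("sso-oidc").create_token_with_iam(clientId=..., grantType="urn:ietf:params:oauth:grant-type:jwt-bearer", assertion=idp_token)` to obtain the `idToken`, followed by `boto3.client("sts").assume_role(**params)`. The payload is decoded without signature verification only because the token arrives directly from AWS; never use this shortcut to validate tokens from untrusted sources.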

This article is entirely auto-generated using Amazon Bedrock.