🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.
Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!
Overview
📖 AWS re:Invent 2025 - KMS over the decade: How architecture evolved to earn customer trust (SEC218)
In this video, AWS Senior Product Manager Kevin Lee and security specialist Samuel Waymouth explain how AWS KMS has evolved over 10 years to earn customer trust through three design tenets: security, durability, and availability. They detail KMS architecture including FIPS 140-3 Level 3 validated HSMs that process over 30 billion requests hourly, key management options from native KMS to Custom Key Store with CloudHSM and External Key Store (XKS), and how encryption combined with IAM policies addresses digital sovereignty requirements. The session concludes with an introduction to AWS European Sovereign Cloud, a new partition launching in Brandenburg, Germany with 8 billion euro investment, featuring separate infrastructure, EU-resident staff, independent billing, and enhanced controls for metadata and law enforcement access requests to meet European sovereignty needs.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: AWS KMS Evolution and Digital Sovereignty
Good morning. My name is Kevin Lee. I'm a Senior Product Manager at AWS. I've been with AWS for seven years, and I'll let Sam introduce himself. Good morning, everybody. My name is Samuel Waymouth. I am a specialist in security and compliance based in Amere, UK, but I'm from New Zealand. My other job was working for His Majesty the King in the Royal Signals as part of the Joint Cyber Unit Reserve in the British Army.
So it's a pleasure having you here today. Quick show of hands: is this your first re:Invent? Put a hand up if you don't mind. Wow, that's a lot of you. Anyone got more than seven re:Invents? I see a man there. I have the matching t-shirt, so 2017 was a vintage year. Thank you for attending. That was a lot of queuing you did, so hats off for being here on Friday morning.
Welcome to our session SEC218, how AWS KMS has evolved over the years to earn customers' trust. Today's agenda will look into how AWS KMS architecture has evolved over the years from a product lens. Next, we'll cover what it means to have control versus ownership of your encryption keys, how the two differ, and how AWS KMS offers you choice on this matter. In the second half of this presentation, we'll cover how encryption and authorization work together to solve digital sovereignty use cases, and then we'll cover our new partition, the AWS European Sovereign Cloud.
The Genesis of AWS KMS: Addressing Cloud Encryption Challenges
So I'd like to begin with a short history lesson as to why AWS KMS was needed back then. To do that, we need to go back to the state of the world ten years ago. At the time, cloud computing was gaining momentum, and regulated industries such as financial services, government, and healthcare were busy migrating into the cloud. As these industries started migrating their workloads into the cloud, customers were busy trying to analyze the impact of their compliance and regulation requirements while being in the cloud. When it comes to encryption and key management, they were always mandatory.
Back then in AWS, encryption was relatively common, but key management wasn't. For example, services like S3 and EBS offered encryption, but keys were managed by the services themselves. As you know, exerting control over your encryption keys ultimately determines who has access to what data. Trying to offer traditional key management in the cloud didn't make much sense. For instance, we couldn't ask each service team to operate a fleet of HSMs just to do key management. Users are authenticated and authorized differently in the cloud, and HSMs often require special interfaces such as PKCS 11 in order to perform cryptographic operations.
So when we set out to create AWS KMS, not only did we have to design a secure system, but also one that's scalable in the cloud. To help guide these design decisions, we came up with three design tenets that we strictly adhere to when developing our service: security, durability, and availability. First and foremost, security is our top priority, and it goes beyond the keys themselves: by entrusting us with your encryption keys, you're ultimately entrusting us with your sensitive data as well.
Second, keys must be durable. You rely on us to safeguard your keys from any natural disaster or event, so we have multiple redundancies in place to make sure that your keys are safeguarded. Third, availability. You depend on us to run your most critical workloads, so your encryption keys must be available at all times.
Building Trust Through Transparency: KMS Architecture and Scale
Underpinning everything is transparency. We are in the business of earning your trust, and we knew from the start that earning your trust would be very difficult. People are used to hugging their encryption keys, and now we're telling them to put them into the cloud. So not only did we have to design a secure and scalable system, but also one that's trustworthy. We do that by being transparent about how we build our service.
For example, through sessions like these, we educate customers on how we operate and how we build our service. Third-party independent auditing helps ensure what we claim to do is rightfully done. We follow standards like FIPS 140-2 Level 3 to ensure that our system meets the security requirements of government and international standards.
So with these design philosophies in mind, we set out to create AWS KMS. Back in November of 2014, we launched AWS KMS with a handful of services such as S3, EBS, and RDS. Here is the actual blog post written by Jeff Barr himself eleven years ago. So where is AWS KMS today?
We're highly scalable to meet your needs. We process over 30 billion cryptographic requests every hour, a scale unheard of in the HSM market today. On top of that, we offer a five-nines SLA to ensure your critical workloads always stay up. KMS is convenient and available in many AWS services through transparent server-side encryption, allowing you to enable encryption in your workload easily. It is also available in products from vendors such as Salesforce and Atlassian, allowing you to bring your own KMS key and encrypt your data on those platforms.
KMS is proven and trustworthy. We are FIPS 140-3 validated, independently audited, and we provide public documentation on how our system operates and the processes that we have behind the scenes. Here's a high-level architecture view of the KMS service at the regional level. As with most AWS services, we make a distinction between the control plane and the data plane. When you make an API request to the KMS regional endpoint, we first load balance using DNS amongst the different availability zones. Within the availability zone, we further load balance again among the front-end hosts which process your authentication and authorization on your request itself. If the request requires HSM operation, we forward that to the backend HSM for cryptographic operation.
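To make that request path concrete, here is a minimal, hedged sketch of a data-plane call from the client side. The helper function, region, and key alias are illustrative, not part of the KMS API; the equivalent boto3 call is shown in comments.

```python
# Illustrative sketch only: build the regional endpoint and parameters
# for a kms:Encrypt call. The helper name, region, and alias are assumptions.
def kms_encrypt_request(region, key_id, plaintext):
    """Return the regional endpoint and request parameters for kms:Encrypt."""
    return {
        "endpoint": f"https://kms.{region}.amazonaws.com",
        "params": {"KeyId": key_id, "Plaintext": plaintext},
    }

req = kms_encrypt_request("eu-central-1", "alias/app-data", b"hello")

# Equivalent with boto3 (not executed here):
#   kms = boto3.client("kms", region_name="eu-central-1")
#   resp = kms.encrypt(KeyId="alias/app-data", Plaintext=b"hello")
```

The front-end hosts described above authenticate and authorize this request before any HSM is involved; only if a cryptographic operation is needed does the request reach the backend HSM fleet.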
Understanding Control vs. Ownership: Customer Perspectives on Encryption Keys
Over the years of operating our service, one thing we learned is that customers have different interpretations of what it means to be in control of their encryption keys. This understanding usually comes from various frameworks, regulations, and compliance requirements. We found that customers who are familiar with the concept of hardware abstraction have a more favorable view of resources they own and control remotely. For example, you own an EC2 instance or an S3 bucket within your own AWS account, and you control access through IAM policy. At the other extreme, there are customers who believe that control means physical ownership, and when it comes to encryption keys, the keys must be within their own data center to be controlled directly.
The challenge that we face is that many of these requirements were written before cloud computing became more common. While security is an important factor, performance factors like scalability and availability are often treated in isolation and as an afterthought. Although we have seen over the years that customers have changed their view on this topic, time and time again we still see customers who get too focused on this one aspect, missing the bigger picture and the actual risk that they're trying to mitigate.
The key message here is that while encryption keys are an important part of your data protection strategy, you need to think holistically about your overall security. Consider aspects like IAM policy, resource policy, service control policies, and resource control policies, and use concepts like segregation of duties to ensure that specific users and roles have access to your data. Most importantly, your keys belong to you. AWS has designed its core products and services to prevent anyone but yourself and those authorized by you from accessing your content, and this principle extends to KMS keys as well.
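As one hedged sketch of the segregation-of-duties idea above, here is a minimal KMS key policy that gives one role administrative permissions and another role usage permissions, with neither able to do both. The account ID, role names, and the exact action lists are illustrative assumptions, not a recommended production policy.

```python
import json

ACCOUNT = "111122223333"  # placeholder account ID

# Minimal sketch: key administrators manage the key but cannot use it to
# decrypt data; application roles use the key but cannot administer it.
key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "KeyAdministration",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{ACCOUNT}:role/KeyAdmin"},
            "Action": ["kms:Create*", "kms:Describe*", "kms:Enable*",
                       "kms:Disable*", "kms:Put*", "kms:ScheduleKeyDeletion"],
            "Resource": "*",
        },
        {
            "Sid": "KeyUsage",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{ACCOUNT}:role/AppRole"},
            "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey*"],
            "Resource": "*",
        },
    ],
}

policy_json = json.dumps(key_policy)  # what you'd pass to PutKeyPolicy
```

Layering this key policy with IAM policies and service control policies is what gives the holistic control the speakers describe.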
Import Key Feature: Enabling Customer-Managed Key Material
As you'll see in a bit, this foundation of trust and control became the bedrock for addressing the evolving sovereignty needs within our service. From the KMS perspective, we focus on offering customers more ways to control their encryption keys while minimizing the impact to the customer experience. A couple of years after we launched KMS, we began hearing feedback from customers. Customers loved how KMS is integrated with AWS services and how super simple it is to enable encryption in their workflow. But a small set of customers still held out, saying that they wanted to do their own key management. These are important keys, and the customers wanted extra assurance that their keys are well safeguarded.
So we launched the import key feature, which allows you to manage your own key material within an HSM you control and provide KMS with a copy of that key material to use. The way it works is that you download a wrapping key from KMS, wrap your secret key within the HSM that you own, and provide the wrapped secret key back to KMS. We'll unwrap it and use it securely within our service.
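The three-step workflow just described can be sketched as the sequence of boto3 calls and their parameters. This is a hedged outline, not a complete program: the wrapping algorithm, key spec, and the `<key-id>` placeholder are illustrative choices, and the actual wrapping of your secret key happens inside your own HSM.

```python
# Sketch of the import-key (BYOK) call sequence as (api_name, params) pairs.
# Placeholders like "<key-id>" stand in for real identifiers.
def import_key_steps(wrapped_key, import_token):
    return [
        # 1. Create a KMS key with no key material of its own.
        ("create_key", {"Origin": "EXTERNAL", "Description": "BYOK key"}),
        # 2. Download the public wrapping key and an import token from KMS.
        ("get_parameters_for_import", {
            "KeyId": "<key-id>",
            "WrappingAlgorithm": "RSAES_OAEP_SHA_256",
            "WrappingKeySpec": "RSA_4096",
        }),
        # 3. After wrapping your secret key in your own HSM, upload it.
        ("import_key_material", {
            "KeyId": "<key-id>",
            "ImportToken": import_token,
            "EncryptedKeyMaterial": wrapped_key,
            "ExpirationModel": "KEY_MATERIAL_DOES_NOT_EXPIRE",
        }),
    ]
```

Each pair would map to a call such as `kms.create_key(**params)` in a real client session.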
When we look into why customers did this, their reasons are quite straightforward. Many organizations have a dedicated team for managing encryption keys for years. These teams were responsible for creating and deploying keys throughout the organization, and they just viewed cloud as another location to deploy their keys into. They are comfortable with the existing process and approach that they already had.
But the challenge here is that the cloud is designed to be highly agile and adaptable, and having to wait for a single team to deploy something goes against this principle. So what are some common misses when customers did this? They quickly learned that key rotation became a nightmare scenario. When you rotate a key managed outside KMS, you need to decrypt all the data that was encrypted under the old key and re-encrypt it under the new key. That's a tremendous amount of work, effort, cost, and risk for everyone.
In native KMS, we solve this elegantly by offering automatic key rotation: you just set a rotation period, from 90 days up to about 7 years, and we'll rotate the key for you. Another pitfall customers ran into with imported keys is the management itself. If you're importing a single key, that's not a big deal; it's relatively simple to do. But what if you're dealing with hundreds of keys, or even thousands? Now you need to independently upload them and configure their use within AWS, and that's an operational headache most customers don't want to deal with either.
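The automatic rotation just mentioned reduces to a single API call on a native KMS key. A hedged sketch, with the key ID as a placeholder and the period validation reflecting the roughly 90-day-to-7-year window:

```python
# Sketch: build the parameters for enabling automatic rotation on a
# native KMS key. Key ID is a placeholder.
def rotation_request(key_id, days):
    # KMS accepts rotation periods of roughly 90 to 2560 days (~7 years).
    if not 90 <= days <= 2560:
        raise ValueError("RotationPeriodInDays must be between 90 and 2560")
    return {"KeyId": key_id, "RotationPeriodInDays": days}

req = rotation_request("1234abcd-12ab-34cd-56ef-1234567890ab", 365)

# boto3 equivalent (not executed here):
#   kms.enable_key_rotation(**req)
```

KMS keeps the old key material around to decrypt existing ciphertext, which is exactly the re-encryption burden you avoid compared to rotating an imported key yourself.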
So when should you be doing this? One good use case is asymmetric keys. If you have a signing use case such as code signing or signing certificates and you want to bring that workload into AWS, you can upload the private key into KMS securely using this feature and move the entire workload into AWS. Another use case is if you want to control the durability of the key yourself. While KMS offers 11 nines of durability, if you feel safer knowing that you have a copy of your encryption keys sitting beside you in your own HSM, then this is one way to achieve that objective.
Custom Key Store with AWS CloudHSM: Dedicated Hardware for Key Management
Two years after we launched the import key feature, we started hearing from another group of customers. This time, not only did they want to manage the key themselves, they wanted to manage the entire HSM as well. And we launched custom key store backed by AWS CloudHSM. A custom key store is a feature that allows you to bring an external key manager and connect it to KMS. This allows you to own the cryptographic material and control the cryptographic operation within the HSM that you control.
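A hedged sketch of what connecting a CloudHSM-backed custom key store looks like in terms of boto3 parameter shapes. The store name, cluster ID, certificate, and password values are placeholders, and connecting the store is a separate step omitted here.

```python
# Step 1 (sketch): register a custom key store backed by your CloudHSM cluster.
create_store = {
    "CustomKeyStoreName": "example-cloudhsm-store",
    "CloudHsmClusterId": "cluster-1a2b3c4d5e6",          # placeholder
    "TrustAnchorCertificate": "<customerCA.crt contents>",  # placeholder
    "KeyStorePassword": "<kmsuser password>",               # placeholder
}
# kms.create_custom_key_store(**create_store)

# Step 2 (sketch): after connecting the store, new keys are generated
# inside YOUR CloudHSM cluster rather than the KMS HSM fleet.
create_key = {
    "Origin": "AWS_CLOUDHSM",
    "CustomKeyStoreId": "<store-id>",  # placeholder
    "Description": "Key material lives in the CloudHSM cluster",
}
# kms.create_key(**create_key)
```

The `Origin` of `AWS_CLOUDHSM` is what tells KMS to relay cryptographic operations to your cluster instead of performing them itself.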
The way it works is that your keys are actually hosted inside the CloudHSM instance, and when the call is made to KMS, we simply relay that call back to the CloudHSM instance. Initially, we couldn't really understand why customers actually wanted this. After all, we spent significant energy developing a highly secure, highly scalable key management service, yet customers still wanted to manage their own HSM for some reason. When we dove deeper as to why, it all stemmed from customer misconception around multi-tenancy.
They were afraid of things like the noisy neighbor problem where the action of one tenant would affect another. And to avoid this kind of issue from happening, they would rather just pay extra cost to get their own dedicated hardware instance. Another concern that we heard was what if KMS accidentally leaked a key or accidentally intermingled a key from one customer to another. In this kind of situation, isn't it better to have an isolated, independent instance to prevent this from happening?
And my answer here is no, as you'll see in a bit, because KMS is designed such that this simply cannot happen. But nonetheless, customers still choose to use custom key store and CloudHSM to mitigate risks that are unfounded. So what went wrong when customers did this? The main challenge is scalability and availability. Instead of relying on us to scale for you, now you need to manage your own capacity, set up your infrastructure, and set up your networking.
That's a tremendous amount of work and effort on your side, and it comes with an additional cost overhead that customers often overlook. This kind of setup usually works for low-throughput workloads, such as attaching an EBS volume or starting an RDS cluster.
However, anything high throughput, such as analytic or data processing jobs, will often fail because your data requires constant access to the key itself, and the custom key store becomes a bottleneck at this point. So when does it actually make sense to use custom key store and CloudHSM?
Based on our experience, it makes the most sense for the initial migration itself. If you don't have the resources to refactor or rearchitect your application, then custom key store allows you to simply lift and shift your existing workload into AWS without much effort. We've been talking about custom key store and CloudHSM for a while, and I think this is a good time to talk about our own HSM, the KMS HSM itself.
Inside the KMS HSM: Stateless Architecture and FIPS 140-3 Validation
Since day one, we have been investing in and developing our own hardware. We own the entire stack from the hardware and firmware all the way through the application code itself. Throughout the development process, we submit our system for FIPS evaluation. The FIPS standard and the Cryptographic Module Validation Program are overseen by NIST, the National Institute of Standards and Technology, which is a US government agency.
The FIPS standard is important because it sets the benchmark for cryptographic security, helps prevent weak implementation, and is widely trusted by many governments and industries alike. This program itself has evolved over the years. Recently it was updated to FIPS 140-3, incorporating modern security requirements and aligning with international standards. We reached FIPS 140-3 security level 3 back in May of 2023.
So what happens inside the KMS HSM? What makes our HSM a little unique is that it has no concept of tenancy built into it. It's a stateless machine with no persistent state kept inside, meaning we don't store customers' keys permanently in the HSM or partition the hardware in any way on a per-customer basis. You can kind of think of it like a black box where input and output behavior is predefined like a mathematical function.
Each request is load balanced and handled independently from the others, at tremendous speed and scale: millions of requests per second across a fleet of thousands of HSMs. When handling your requests, your KMS key always remains in encrypted form. If and only if the key must be used, we send an encrypted copy of your key into the memory of the HSM.
It's decrypted within memory, the operation is performed, the key is quickly flushed and deleted from memory, and the payload is returned to the requester. Every piece of ciphertext produced by KMS has a key ID embedded within it. This helps ensure the integrity of the ciphertext and also prevents your key from being used without proper authentication and authorization for that ciphertext.
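One visible consequence of the embedded key ID is that a Decrypt call on symmetric ciphertext does not need the key specified: KMS resolves the key from the ciphertext and authorizes the caller against it. A hedged sketch of the parameter shape (the helper name is an assumption; with asymmetric keys the `KeyId` is required):

```python
# Sketch: build parameters for kms:Decrypt. For symmetric KMS ciphertext,
# only the blob is required because the key ID is embedded in it.
def decrypt_request(ciphertext_blob, key_id=None):
    params = {"CiphertextBlob": ciphertext_blob}
    if key_id is not None:  # required for asymmetric keys, optional otherwise
        params["KeyId"] = key_id
    return params

# kms.decrypt(**decrypt_request(blob)) would return the plaintext payload.
```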
Now that we've covered what the KMS HSM is, let's dive deeper into how we protect and secure the software running on our HSM. This is a really dense slide, so let's break it down one point at a time. When the HSM host is loaded with the software serving customer requests, it is in operational mode.
When it's in operational mode, there is zero operator access. No software updates can be made, and customers' keys are provisioned within the memory only. It is simply impossible for our operators to access the HSM at this time as there is no SSH, no private API, and no debugging mechanism whatsoever. As I explained earlier, every input and output is predefined, and only the attested KMS front-end system can make a request to our HSM.
When the host reboots or initializes for the first time, it enters a non-operational state. One thing to note is that our entire software stack runs in memory, as there is no persistent disk within the HSM. So when the HSM loses power or reboots, the software, along with any keys, is quickly flushed from memory.
Once we have this blank state, this is the only time when our operators can upload a new version of the software into the HSM. That requires a rigorous code review and multi-party approval process.
This is a two-step process overseen by a quorum of operators to ensure that no one can introduce unapproved changes into our system. Now you might be wondering to yourself, how do we actually validate what we claim to do? And this is where the FIPS validation comes into play. Each time we make a modification to our software, we submit it to the FIPS evaluation for verification, ensuring that we're not deviating away from the standard itself.
In addition, there are SOC controls which help ensure that our processes are always followed. And I have highlighted one here, the SOC 1 Control 414, which says the firmware is always validated or in the process of being validated at all times. There are many more controls all designed to give you transparency and assurance behind our process, and I highly encourage you to take a look at our independent report which can be found in the AWS Artifact.
External Key Store (XKS): Bridging On-Premises and Cloud Key Management
Coming back to the KMS architecture, last but not least we have something called External Key Store, or XKS. The way XKS works is similar to the Custom Key Store, but this time we're introducing a proxy between KMS and your on-premises HSM, and all the cryptographic operations happen inside your on-premises system. Now, we truly believe KMS is the most secure, most durable, and most highly available place to safeguard your keys. Nonetheless, sometimes you're put in a difficult situation, working with a stakeholder or requirement that is absolutely hard to change. While XKS is not an optimal solution, it provides you a way out of this trap and gives you a path forward.
When we dive deeper as to why a customer wanted XKS, one of the misconceptions that we found was the idea around ciphertext portability or the ability to take encrypted data and decrypt it elsewhere outside of AWS. For example, in case of a serious political event or natural disaster, the idea is that you can take a copy of your backup data and decrypt it locally using the key that you've been holding on to. But the problem with this idea is that there's actually no way for you to download the encrypted data as is from the AWS services.
For example, when you make a get call to S3, everything is decrypted on the server side and provided back to you over a secure transport channel in a plain text format. So holding on to your keys gives you no additional assurance or risk mitigation whatsoever for this kind of scenario. Another reason we heard was the ability to take out your key or yank out your key, preventing anyone, including AWS itself, from decrypting your data. But again, this reasoning is flawed because there are multiple ways you can achieve this objective within AWS, and looking at KMS only, you can do so by importing a key and taking it out later on when needed.
Looking at the challenges, similar to the Custom Key Store, we have the same scalability and availability issues. But what makes XKS even worse is that you're now introducing a single point of failure into your workload. If the proxy goes down, or if your on-premises HSM goes down, your entire workload goes down as well. Not to mention, every workload you have within AWS will need to make a network call over the Internet to reach your on-premises HSM, which introduces latency and instability into your workload as well.
So when should you use it? My frank answer is: ideally never. However, if you're working with a stakeholder who pushes unreasonable demands upon you, then XKS offers a workaround, a way out of that trap.
So here's an example of what the External Key Store architecture can look like. Everything to the right of KMS, in the green box, belongs to you and is under your control, meaning you need to manage the VPC and the instance hosting the proxy, the link between the proxy and your on-premises HSM, and the on-premises HSM itself.
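The pieces you manage in that diagram show up directly in the XKS registration parameters. A hedged sketch of the shape of that call, with endpoint, path, credentials, and VPC endpoint service name all as placeholder values:

```python
# Sketch: parameters to register an external key store with KMS.
# All concrete values below are placeholders for your own infrastructure.
xks_store = {
    "CustomKeyStoreName": "example-xks-store",
    "CustomKeyStoreType": "EXTERNAL_KEY_STORE",
    "XksProxyConnectivity": "VPC_ENDPOINT_SERVICE",  # or "PUBLIC_ENDPOINT"
    "XksProxyUriEndpoint": "https://xks.example.com",
    "XksProxyUriPath": "/example/kms/xks/v1",
    "XksProxyAuthenticationCredential": {
        "AccessKeyId": "<xks-access-key-id>",
        "RawSecretAccessKey": "<xks-secret>",
    },
    "XksProxyVpcEndpointServiceName": "com.amazonaws.vpce.us-east-1.vpce-svc-example",
}
# kms.create_custom_key_store(**xks_store)
```

Every one of these values maps to a component you operate, which is exactly where the availability and latency risks described above come from.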
So, taking a step back, we started out by solving simple customer needs. Customers needed encryption keys and needed a way to control their encryption keys while being in the cloud.
As more and more customers migrated their workloads into AWS, we encountered customers with different expectations around what it means to be in control of their encryption keys. Some are comfortable with the cloud-native approach, while others believe that physical ownership of the key is crucial. To help bridge this gap, we had to innovate and come up with technical solutions that meet customers' needs. We understand that every customer has different requirements and comfort levels, hence we provide options. But it is important to understand what these options bring, and the challenges they carry, before pursuing them.
Digital Sovereignty Fundamentals: Technical, Organizational, and Contractual Controls
I've been talking almost exclusively about how you can use encryption to control access to your data, with your key as the gatekeeper in all of this. Now let's apply this knowledge to a use case that's hard for most customers: digital sovereignty. Hello, everyone, can you hear me? Hands up, just to make sure the audio is working. So, digital sovereignty use cases first. I want to talk about what the term digital sovereignty means, because it may be a bit different from what you understand as data sovereignty, data resiliency, and residency. Digital sovereignty is a term we have coined at AWS to cover all of these topics, because the following four areas tend to be what customers are most concerned about when they talk about digital sovereignty.
They worry about country of residence. Where is my data going to be stored? Where is it going to be processed? Are there any third parties involved in that processing? The second area is operator access restriction. They want to know who can go into a data center, plug into the back of one of our racks and have a look at something, or who in the service teams can access their data and when they do that. The third is resiliency and survivability, and this goes a bit beyond the usual concerns about natural disasters. As we saw a couple of years ago with Ukraine, which moved a lot of its public sector workloads into the AWS commercial cloud, the cloud can deal with situations of kinetic warfare, where entire industries or entire countries are failing.
Sovereignty usually refers to country-level or regional law. The fourth item is independence and transparency, and this is where customers want to be able to say there's no dependency on us-east-1, which some of you may have had challenges with in the last month or so. They want to make sure the technology stacks are truly independent. They may have special requirements, such as that only citizens or residents of certain countries are allowed to operate these environments. All of these things add together to make the picture of how customers understand digital sovereignty. This also varies by country and by industry, so it's a very complex field. Sometimes in the same country, there are multiple laws dealing with different aspects of sovereignty in different situations.
There are three broad categories of controls that we provide, some on our side of the shared responsibility model and some on your side that you can configure and control. These three categories are technical measures, organizational measures, and contractual measures. Using AWS KMS is a really good technical measure. Protecting your keys that are stored in KMS with resource policies is a technical measure. Protecting access to the KMS API with Identity and Access Management policies is a technical measure. Role-based access control to keys, backups, and logging are all technical control examples.
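As one hedged example of a technical measure from that list, here is a sketch of an IAM policy that only allows KMS usage in a single chosen region, a common residency control. The region, action list, and the decision to scope to `Resource: "*"` are illustrative assumptions.

```python
import json

# Sketch: an identity-based policy permitting KMS data-plane calls only
# when the request targets eu-central-1 (Frankfurt). Values are illustrative.
region_lock_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "KmsOnlyInFrankfurt",
        "Effect": "Allow",
        "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey*"],
        "Resource": "*",
        "Condition": {
            "StringEquals": {"aws:RequestedRegion": "eu-central-1"}
        },
    }],
}

region_lock_json = json.dumps(region_lock_policy)
```

The same condition can also be applied in a service control policy across an entire organization, which moves it from a per-identity measure to an organizational guardrail.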
Organizational measures include policies such as we only use these cryptographic algorithms, or we keep data for this retention period, whether it be from external forces such as laws or regulations that require you to set those values or something your organization sets for itself for its own purposes. Standard operating procedure documentation, hiring procedures, vetting staff, and determining what country people are from and what their background is before you employ them are all organizational controls. We employ these measures. We security clear a lot of our employees. We do background checks on our employees, and so do you.
Contractual measures are the not so fun stuff, the service level agreements. Who's read the service terms for AWS extensively online? Hands up. Has everyone read the service terms? Alright, no, not even one hand. Okay, interesting. Your end user license agreements, third party supplier assessments, all that sort of good stuff, and independent audits are in your contractual controls piece, the bits of paper that lawyers can go and refer to should things go wrong. So sovereignty, technical measures, what do we have?
How does AWS Key Management Service support digital sovereignty based on those requirements, particularly operator access, residency, and control? Let's dive a little deeper. Data sovereignty and encryption combine to give you control over location. Perhaps it is enough of a control to say that my data is stored and processed in Frankfurt, in Europe. Hands up, who's actually here from Europe, out of curiosity? Okay, I've got a few people. Yeah, me too. Well, London is technically not part of the EU, but that's a different story.
The aspect of location of data is very important now. We already have contractual commitments, and our data processing addendum says that when you put data in a particular region, it will stay there unless particular circumstances occur. AWS KMS is a regional service, so we talked a little bit earlier about control plane and data plane. The KMS service, when you create a key, it is available in the region, and therefore it is highly available and highly resilient because we look after that stuff on our side of the shared responsibility model.
Keys don't transfer outside of a region unless you say so. Even multi-region keys, technically there is no point where the keys are available in plain text outside of a KMS Hardware Security Module, even though you've got multi-region keys. So they're always secure within the KMS environment. Now, by protecting the key used to perform encryption, we can ensure that no one, even if they have access to the ciphertext, can get the plain text.
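The multi-region behavior just described is explicit, not automatic: you create a primary key and then ask for a replica. A hedged sketch of the parameter shapes involved, with the key ARN and regions as placeholders:

```python
# Sketch: a multi-region key is created in one region...
primary = {
    "MultiRegion": True,
    "Description": "MRK primary, e.g. in eu-central-1 (Frankfurt)",
}
# resp = kms_frankfurt.create_key(**primary)

# ...and replicated only when you explicitly request it. The key material
# is copied between KMS HSMs without ever appearing in plaintext outside them.
replicate = {
    "KeyId": "<primary-key-arn>",   # placeholder
    "ReplicaRegion": "eu-west-1",   # replication happens only on your request
}
# kms_frankfurt.replicate_key(**replicate)
```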
Remember, crypto is a triad: plain text, ciphertext, and a key. Given any two of those, you can solve for the third part of the equation. Even if a bad actor gets your ciphertext, there's no way to extract a key from KMS, so they cannot complete that critical step of getting to plain text. And given that KMS is audited and our HSMs are reviewed and independently attested, there is no engineering possibility of extracting a key in plain text from the service.
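The two-of-three intuition can be made concrete with a toy XOR cipher. This is only an illustration of the triad, not how KMS encrypts (modern ciphers like AES do not let you recover the key from one plaintext/ciphertext pair), but it shows why withholding the key protects the ciphertext:

```python
# Toy one-time-pad illustration of the crypto triad: with XOR, any two of
# {plaintext, ciphertext, key} yield the third. NOT how KMS encrypts.
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

key = b"\x13\x37\xab\xcd\xef"
plaintext = b"hello"
ciphertext = xor_bytes(plaintext, key)

assert xor_bytes(ciphertext, key) == plaintext   # ciphertext + key -> plaintext
assert xor_bytes(ciphertext, plaintext) == key   # ciphertext + plaintext -> key
```

An attacker holding only the ciphertext has one element of the triad, and without the key there is no path to the plaintext.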
Threat Modeling External Actors: How KMS Protects Against Unauthorized Access
So let's start with an example using threat modeling. Who's done threat modeling? Hands up. Familiar with threat grammar? Okay. Threat Composer, anyone used that before? Hand up, okay. As an aside, we have a threat modeling tool that was so good internally that we open-sourced it: Threat Composer, available on GitHub.
Creating threats with a particular grammar helps structure your threat so that you focus on the issues that are really important, whereas people sometimes waffle when they talk about threats, particularly when it comes to state-sponsored actors. So the threat actor in our case is any unauthorized third party. It could be a government entity you don't want reading your data. It could be a competitor. We'll cover internal actors in the next threat model, but for now: any unauthorized third party, assumed external in this particular case, with access to your crypto keys. Remember the crypto triad: they'd have to have two of those three things.
Can they decrypt your data? Potentially they can, resulting in reduced confidentiality of the encrypted data. So this is the threat grammar version of it: actor, prerequisites, and the thing they can do. Now we have some assumptions for this particular threat. The unauthorized party has access to your encrypted data. So now they've got two pieces of the crypto triad, and they may have acquired that data through following some due legal process, or they may have stolen it, intercepted it in some way. So there are various ways they might get hold of your ciphertext.
The mitigation, in simple terms, is: use KMS. It's very simple. You can't get the key out of it. Not even AWS employees can extract the keys for a third party to take that ciphertext away and turn it back into plain text. Focusing particularly on the CLOUD Act, which was passed in 2018, we have an independent report from IDC. Even if the encrypted data is transferred to a law enforcement body via due process, with all the right warrants and all the right courts and judges approving it, possibly including yourselves as customers, there's no requirement under the Act to give away the key.
To decrypt that encrypted data, they have to come to you as a customer, and that allows you as a customer to maintain agency over your data. There is no leak here because encrypted data is zeros and ones, it is noise. You can't turn it back into plain text in a reasonable time with today's technology.
Maybe technology 1,000 years in the future might be able to do something with it, but not today. So the essential point is: this is what AWS KMS does for you. Trust KMS. The keys don't escape. Encrypt your data with it, particularly with customer-managed keys.
Defending Against Internal Threats: Policy-Based Key Protection and Separation of Duties
Now, let's adjust our threat position a little. The malicious actor or process is now inside your organization. Perhaps you use a managed services company based in a different country, where the laws around data management and key management are different from those in your country of operations. Perhaps you employ developers, internally or externally, to write code for you and put it in your repositories, and due to some flaw in your static analysis or code threat scanning, a piece of malicious code has made its way through your pipeline to run on an EC2 instance, in a Lambda function, or in a container. This threat actor, be it a human or a process, now has a policy it can use to access keys. And this is exactly what we've seen recently: ransomware attacks where malicious code was injected into customers' Lambda functions and EC2 workloads, and a key was used to decrypt the data and re-encrypt it with a different key held outside of AWS.
This is a nasty attack, a form of ransomware. It does happen: when that user, process, or developer has the ability to access KMS, manage keys, and reach your ciphertext, it can decrypt or re-encrypt that data and exfiltrate it in various ways. At this point it is very much on your side of the shared responsibility model.
Assumptions: the internal actor must have AWS credentials. Everything relies on a SigV4-signed request (access key, secret key, session token), whether the caller is a server, a container, a Lambda function, or a human being. And it must have a role with permissions on the KMS service to run crypto calls as a crypto user for those particular keys. So these are our assumptions.
And before we cover the mitigations, let's do a quick micro refresh. Hands up, everyone who's totally comfortable with SigV4 signing. One hand, okay. I cannot emphasize this enough: become familiar with it. If you've ever done Python development, you've used the boto3 library; put it in debug mode and see what happens, see how keys are used. That access key and secret key are critical, and that's why we tell you so many times as customers: be very careful not to accidentally commit those keys to a GitHub repository or somewhere similar. That's a bad thing, and a different talk, but you don't want to do that.
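To make the SigV4 refresher concrete, here is a minimal sketch of the signing-key derivation at the heart of the scheme, using only the standard library. The secret key, date, region, and service values below are throwaway examples; the HMAC chain itself follows the publicly documented SigV4 process.

```python
import hashlib
import hmac

def derive_sigv4_signing_key(secret_key: str, date_stamp: str,
                             region: str, service: str) -> bytes:
    """Derive the SigV4 signing key via a chain of HMAC-SHA256 operations,
    as described in the AWS Signature Version 4 documentation."""
    def _hmac(key: bytes, msg: str) -> bytes:
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")

# Example with a throwaway secret (never hard-code real credentials):
key = derive_sigv4_signing_key("EXAMPLE-SECRET", "20251201", "eu-west-2", "kms")
print(key.hex())  # a 32-byte key, deterministic for the same inputs
```

The derived key then signs a canonical representation of the request, which is why a leaked secret key is so dangerous: anyone holding it can derive valid signing keys for any date, region, and service.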
So we know there is a user; it could be a process, could be a human. We know there's a bucket where data is going to be stored, and we want to maintain the sovereignty of that data. We know there are front-end APIs to the KMS service itself, where the KMS key lives. So what protects our data and our keys? The SigV4-signed request, that's the essential piece. The real perimeter of the cloud is that authentication and authorization layer; your firewalls matter only to a lesser extent.
We know that AWS services generally support TLS 1.3, with post-quantum algorithms already available in many services. TLS 1.2 is still supported, but we recommend you don't use it. So we know the traffic is encrypted as it passes down through these services. We know there is an IAM policy that allows you to make calls to the KMS service (a regional service, remember that). We know there is an S3 bucket policy, a resource policy assigned to a particular bucket. And we have KMS key policies as well, which determine who the crypto admin users will be and who the crypto users will be; those two types of accounts may be familiar from on-premises HSMs or AWS CloudHSM you've used in the past.
Additionally, and this one is optional since you may not use AWS Organizations, but if you do, there are service control policies that can be applied across multiple accounts and organizational units, plus resource control policies to control which principal can do what with a key and with the data, separately. All of these things combine to give us our mitigation example, where we need to make sure the encryption key cannot be misused. We know the ciphertext can escape; it can be pulled up under an official warrant and taken away. But we know the key can't, so how do we protect the key from these internal actors?
Well, in this first policy statement, we can see that the principal for the encrypt, decrypt, and generate-data-key operations is an EC2 role. Our example uses an application running on an EC2 instance that calls AWS KMS to decrypt data and process it in some way. This is effectively our crypto user type of account. The policy ensures that only that principal can perform these operations, which essentially involve generating data keys used for envelope encryption.
Let me ask, hands up who understands envelope encryption in its entirety? Okay, we've got a few more there. Envelope encryption is pretty cool. If you don't know what it is, there are lots of talks and blogs on that particular topic that go into depth about how it works.
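For readers who haven't met envelope encryption before, the flow is: a locally generated data key encrypts the payload, and that data key is itself encrypted ("wrapped") under the KMS key, so only the wrapped key is stored alongside the ciphertext. The sketch below shows only the shape of that flow; the toy XOR keystream cipher is for illustration and is NOT real cryptography (real systems use KMS `GenerateDataKey` with AES-GCM, e.g. via the AWS Encryption SDK).

```python
import hashlib
import os

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 counter keystream. Illustration ONLY; real
    envelope encryption uses AES-GCM via the AWS Encryption SDK."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        stream = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ s for b, s in zip(chunk, stream))
    return bytes(out)

# 1. KMS GenerateDataKey would return a plaintext data key plus the same
#    key wrapped under the customer-managed KMS key. We simulate both.
kms_root_key = os.urandom(32)          # never leaves the HSM in real KMS
data_key = os.urandom(32)              # the plaintext data key
wrapped_key = toy_cipher(kms_root_key, data_key)

# 2. Encrypt the payload locally with the data key, then discard the key.
ciphertext = toy_cipher(data_key, b"customer content")

# 3. To decrypt later: ask KMS to unwrap the stored wrapped key, then decrypt.
recovered_key = toy_cipher(kms_root_key, wrapped_key)
plaintext = toy_cipher(recovered_key, ciphertext)
print(plaintext)  # b'customer content'
```

The key design point: the wrapping key (here `kms_root_key`) never leaves KMS, so possessing the stored `wrapped_key` and `ciphertext` gets an attacker nothing without a successful, authorized call to KMS.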
Now, we've covered crypto users, but we haven't covered crypto admins. That's the second policy statement, where we say that only this particular role is allowed to do puts, updates, and revokes in the KMS service itself. That's essentially the crypto admin role that does key management. Separation of duties and least privilege: these are critical things in the KMS key policy itself.
Another thing to think about with KMS key policies is that they enforce explicit allows only. What this means is that if a principal is not in the policy, it can't use the key, and that includes the root user in your account. So when you set up policies for key admin or crypto admin users, make sure they really work, because once you've locked root out, that's bye-bye key. Test your policies and check them.
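The key policy pattern described above (keep root enabled, separate crypto admin from crypto user) might be sketched like this. The account ID and role names are placeholders invented for illustration; the KMS actions themselves are real API actions.

```python
import json

account = "111122223333"  # placeholder account ID
key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Keep the account root enabled so the key is never orphaned.
            "Sid": "EnableRootAccess",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{account}:root"},
            "Action": "kms:*",
            "Resource": "*",
        },
        {   # Crypto admin: manages the key but cannot use it on data.
            "Sid": "AllowKeyAdministration",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{account}:role/KeyAdminRole"},
            "Action": ["kms:Create*", "kms:Put*", "kms:Update*",
                       "kms:Revoke*", "kms:Disable*", "kms:ScheduleKeyDeletion"],
            "Resource": "*",
        },
        {   # Crypto user: the EC2 application role, data operations only.
            "Sid": "AllowCryptoUse",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{account}:role/AppEC2Role"},
            "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey*"],
            "Resource": "*",
        },
    ],
}
print(json.dumps(key_policy, indent=2))
```

Note that the admin statement deliberately omits `kms:Decrypt` and the user statement omits the management actions; that omission, combined with explicit-allow-only evaluation, is what enforces the separation of duties.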
In my case, I love Kro CLI. It does a really good job of producing least-privilege policies and optimizing policies. I've seen an SCP reduced by 78% while keeping the same effective controls on data. It's quite clever.
So in summary, this policy locks key usage and key management down to very specific principals. But that's only part of the picture, because the other part of this mitigation is a resource-based policy in S3 itself. We've done a defense-in-depth implementation in this case: we deny unless it's a particular role writing to a particular bucket. We could also deny unless the request comes through a particular VPC endpoint, so you can add lots of depth there, via networking and via the role that is used.
You can even add a condition now with a new feature for attested, signed EC2 images, much like you used to do with Nitro Enclaves. So the usage of a bucket or a key can be locked down to whether a signed EC2 AMI, or container, is allowed to use it. In this particular case, what I want to draw your attention to is that we've said not only does the caller have to be a particular role from a particular VPC (an EC2 instance in this case), but it can only use a particular key to write objects to that bucket. So it's absolute control of that key.
This helps prevent exactly the type of ransomware attack we talked about earlier. And importantly, this is a resource policy on the S3 bucket itself, so even if attackers have KMS permissions and a bunch of other permissions, if they don't have permission on the bucket under the bucket policy, they can't steal your data or otherwise reduce your ability to use it. Explicit denies in this case create a security invariant, something you can very quickly test and validate, so they are super useful here.
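A deny-unless bucket policy of the kind described above might look like the sketch below. The bucket name, key ARN, and VPC endpoint ID are placeholders; the condition keys (`s3:x-amz-server-side-encryption-aws-kms-key-id`, `aws:SourceVpce`) are real S3/IAM condition keys.

```python
import json

bucket = "example-sovereign-bucket"   # placeholder names throughout
key_arn = "arn:aws:kms:eu-west-2:111122223333:key/EXAMPLE-KEY-ID"

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Deny any write that does not use the approved KMS key.
            "Sid": "DenyWrongKey",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Condition": {"StringNotEquals": {
                "s3:x-amz-server-side-encryption-aws-kms-key-id": key_arn}},
        },
        {   # Deny access that does not arrive via the approved VPC endpoint.
            "Sid": "DenyOutsideVpce",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}",
                         f"arn:aws:s3:::{bucket}/*"],
            "Condition": {"StringNotEquals": {
                "aws:SourceVpce": "vpce-0123456789abcdef0"}},
        },
    ],
}
print(json.dumps(bucket_policy, indent=2))
```

In practice you would carve out an exception for an administrative role in the VPC-endpoint deny (for example via `aws:PrincipalArn` in the condition), or you risk locking yourself out of the bucket from the console; the broad denies here are kept minimal to show the invariant.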
AWS European Sovereign Cloud: Operational Independence and Regional Compliance
Now, those are technical measures. There are two more types, and this is where the European Sovereign Cloud comes into its own, because it's important to note that the European Sovereign Cloud is a partition. Partitions are not a new thing: GovCloud is a partition, AWS China is a partition. There are also partitions for particular customers with very special needs who want a slightly different slant on the shared responsibility model, called Amazon Dedicated Cloud. I'm not going to ask if you have a customer who uses ADC here, because you're not really supposed to say if you're an ADC customer, but anyway, I'm in the British Army, what can I say?
So technical controls, we talked about those. Let's talk about the organizational and contractual controls because this is where European Sovereign Cloud is different. We wanted to make sure that there are other sovereign cloud options available to you in the European marketplace. One of our competitors, OVH, does a very good product, other competitors do as well.
But in our view, that's not the AWS experience you as customers expect in terms of the number of services and the availability of features. So when we build a sovereign cloud, we want it to be exactly like the commercial partition you're used to, with the same services, the same reliability and availability. The European Sovereign Cloud is only one region today. That doesn't mean more regions aren't on the way. And remember, a region is a unit of availability: regions are highly available in terms of data centers, network, compute, and storage, and they meet the many-nines SLAs that are required. So the fact that it's a single region isn't really a big deal.
Now, we know customers want to adopt sovereign cloud, but they want all the features; they don't want the compromise. In our view, to build a cloud that can do that, you need to spend a little under 8 billion euros. When thinking about how competitors do sovereign cloud in Europe, think about how much they have spent. For us at Amazon, it's 8 billion euros to get something that does what customers need. It's going to launch before the end of the year, so that gives you about three weeks; we won't run a sweepstake on which week. It's a serious investment to make sure European customers can do what they want to do in terms of organizational controls.
Now, what are some of these organizational and contractual controls? We commit that for any region, when you put your data there, it stays there. The European Sovereign Cloud is the same, but it is a different cloud. I'll talk more about the contractual side, but if you create an account in the European Sovereign Cloud, with an access key and a secret key, those credentials will not work in the commercial cloud. It's a different root of trust. It's a different partition. It's a different IAM. If you want to move data between the European Sovereign Cloud and some region in commercial, you need to choose to do that, via the internet, via APIs you create yourselves, or by using something like Snowmobiles or Snowballs.
So the European Sovereign Cloud, it is separate. Separate root of trust, separate region in terms of infrastructure, network, compute, and storage. Separate partition, and these are very important things to point out because it's not a matter of just saying I'm going to replicate the bucket from commercial to ESC. You can't do that. You need an intermediate process. There are also extra controls that are placed on customer metadata versus customer content. So customer metadata is things like the name of an S3 bucket to use an example. Customer content is all the zeros and ones that create the object that are put into the bucket.
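The partition boundary just described is visible in every ARN, whose second field names the partition. A small sketch below extracts it; the partition prefixes `aws`, `aws-cn`, and `aws-us-gov` are the documented ones, and the European Sovereign Cloud adds its own partition identifier (not shown here, since the talk doesn't name it).

```python
def arn_partition(arn: str) -> str:
    """Extract the partition field from an ARN of the form
    arn:<partition>:<service>:<region>:<account>:<resource>."""
    parts = arn.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not a valid ARN: {arn}")
    return parts[1]

# Known partition prefixes for commercial, China, and GovCloud:
print(arn_partition("arn:aws:s3:::my-bucket"))        # aws
print(arn_partition("arn:aws-cn:s3:::my-bucket"))     # aws-cn
print(arn_partition(
    "arn:aws-us-gov:kms:us-gov-west-1:111122223333:key/abc"))  # aws-us-gov
```

Because principals, roles, and resource ARNs are scoped to a partition, a policy or credential minted in one partition simply cannot reference or authenticate against another, which is the mechanical reason cross-partition replication requires an explicit intermediate process.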
The name of the file, the name of the bucket, the names of policies and things you use are metadata; they are not necessarily invisible to service teams. The encrypted zeros and ones that make up the object in the bucket are customer content. That's the best way to think about it, and the S3 example is a very clear one for this space. The point is that in the European Sovereign Cloud, metadata is not transferred out of region either, whereas in commercial regions, metadata can be visible in other regions to other operations and service teams. So this covers part of the operational access question we talked about earlier.
Now, the environment is physically independent, with multiple availability zones, all powered by the Nitro architecture. An important point there is that the technology stack does not change: the S3 service in the European Sovereign Cloud has the same API as the S3 service in any commercial region, or in GovCloud for that matter, though sometimes there isn't feature parity in those partitions. It is also separate in billing, with in-region billing and systems. You have to create a new account in the European Sovereign Cloud, with new billing, and if you already have an enterprise agreement, you may need to revise it, because it's now a separate agreement for a separate sovereign cloud. The offshoot of that, thinking from your side of the shared responsibility model: if you want to stand up a stack in the European Sovereign Cloud and a stack in the commercial cloud, and you want them to be more or less the same (the services are more or less the same between the regions), then ask yourself: if we've gone to the bother of separating out a sovereign cloud, do you really want one Terraform stack and one operations team pushing stacks to both environments? Does that still meet your sovereignty requirements? Again, this will be driven by your customers' needs.
You may have to do what we have done, which is to stand up proper operational autonomy. This is another organizational control. For the European Sovereign Cloud, it's not just a separate environment and a separate partition; it's separate people and separate processes, based in the state of Brandenburg in Germany, so the law that applies to the use of data in that region is European law.
There are separate management chains, with qualified European Sovereign Cloud staff who must be EU residents, and in the future will be EU citizens only, available to operate that environment on our side of the shared responsibility model. And this opens a question: if you're a multinational organization that plans to do business in Europe or needs that European sovereignty, do you need to set up your own sovereign operations rather than a common operations team, or even offshored operations, when you're running workloads in sovereign cloud? These are questions to think about on your side of the shared responsibility model. We would suggest you probably do, because that's what we've done. That's part of why it costs 8 billion euros: cost of people and cost of power, among other things.
With staff resident in the EU and full control over operations and support, we can ensure we meet European requirements, and that there's a separation so that things running in the European Sovereign Cloud have no dependencies on commercial cloud or other partitions. Moving on, here's one that comes up a lot with European customers. For American customers, maybe not quite so much, but again it depends on your industry, where you are based, and to a large extent who your customers are.
There will always be talk about law enforcement access requests. We commit that your data stays in region unless it needs to be looked at or moved as part of maintaining and operating the service, or in compliance with a subpoena from a valid court in a valid legal jurisdiction. As part of how we approach these law enforcement access requests, there are technical, organizational, and legal and contractual measures we enforce as part of the European Sovereign Cloud. The technical measure is KMS: you can't pull keys out of it. The key stays there.
Even if German law enforcement required you to give up your data for some reason, it's encrypted data; they still have to come to you for the key. So your integrity of control over your data is maintained, regardless of where the request comes from. All computing in the sovereign cloud is based on the Nitro System, which has been independently attested and proven to provide no operator access to guest operating system images, that is, to EC2 instances running inside the Nitro hypervisor.
There's no console port to go into like we used to have in the old days, when you'd go into the data center to reboot things and keep the Windows servers working. You just can't do it. Not us in our data centers, not you, not any other external entity that might want into our data centers, which we generally don't tell people the locations of, and we don't let them in even if they ask.
On operational measures: only qualified AWS European staff run the AWS European Sovereign Cloud, EU residents in the short term and, in the longer term, EU citizens only. So there's an absolute connection between the people running it on our side of the shared responsibility model and where you want to run your workloads. It's inviolate and it's contractually guaranteed. They are Europe-first employees, from the managing director level all the way down through the organization, with many subunits established in the state of Brandenburg to make sure that control truly lives in Europe.
And on legal and contractual measures: in the European Sovereign Cloud, each and every law enforcement access request is considered individually. It's not like they go in batches ("we've had five requests from the same entity, load them all into the same job"). No, every single request is treated independently, reviewed by our legal team, and then, where we are able to, we talk to you and your legal team about it. AWS commits to making every reasonable effort to ensure that you, the customer, are the ones who interact with law enforcement when data access requests come along, because to a certain extent we don't want to be. That's part of the contractual and organizational controls we have in that space.
That more or less brings us to the end of the session. All right, thank you, Sam. So we've talked a lot in the last hour about what it means to be in control of your encryption keys, how that relates to digital sovereignty needs, and the European Sovereign Cloud. You might be asking yourself where to go from here. If we have earned your trust and you believe AWS KMS is the right place to safeguard your keys, then great, our job is almost done.
However, whether you're convinced or not, we still recommend you do your due diligence: read our documentation, read our audit reports, run experiments, and tell us what we can do to earn your trust even more. If something isn't explained well, ask us to clarify. The only way for us to improve is to hear from you about what we can do better. Thank you very much for attending our session, and please fill out the survey on your way out. Thank you.
; This article is entirely auto-generated using Amazon Bedrock.