Dmytro Sirant for AWS Community Builders

Originally published at sirantd.com

How I Overlooked the Problem and Shot Myself in the Foot

Migration Setup

As part of my work as an AWS consultant this year, I’ve been doing migrations from IAM users to SSO (yes, I know, but better late than never). There’s a checklist I follow during such migrations, and one of the last stages is to keep IAM users disabled (access keys and console access) for about a month, just to make sure everything keeps working. This approach proved especially helpful with EKS clusters before AWS introduced the ability to manage cluster access via API instead of the legacy aws-auth ConfigMap.
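With the access-entry API, the SSO role mapping lives in plain Terraform instead of the aws-auth ConfigMap. A minimal sketch, assuming the terraform-aws-eks module v20+ (which added the `authentication_mode` and `access_entries` inputs); the cluster name and SSO permission-set role ARN are placeholders:

```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "demo"   # placeholder
  cluster_version = "1.29"

  # Manage cluster access via the EKS API instead of the legacy aws-auth ConfigMap
  authentication_mode = "API"

  access_entries = {
    sso_admins = {
      # Hypothetical SSO permission-set role ARN
      principal_arn = "arn:aws:iam::111122223333:role/AWSReservedSSO_AdminAccess_abc123"

      policy_associations = {
        admin = {
          policy_arn   = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
          access_scope = { type = "cluster" }
        }
      }
    }
  }
}
```

Because access entries are ordinary Terraform resources, adding or removing a principal is a plan/apply away, with no risk of corrupting a shared ConfigMap.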

The Missed Detail

But there was one issue I kept overlooking until it finally caught up with me. Long story short: the migration from IAM to SSO was completed, and after the planned cooldown, IAM users were deleted. Some time later, I decided to upgrade the IaC with a new version of the terraform-aws-eks module. The terraform plan showed expected changes, but during terraform apply I got an error stating that my SSO account had no permission to update the KMS key alias (a minor change due to improved naming conventions).

A quick check showed that the KMS key created by the previous version of the module had a neat, least-privilege key policy: kms:PutKeyPolicy was granted only to the IAM user I’d used to create the EKS cluster with KMS envelope encryption. Ironically, that IAM user had been disabled for a month and deleted only a week earlier.
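This is exactly the trap the AWS KMS documentation warns about: if the key policy doesn't include the account-root statement, IAM policies have no effect on the key, and deleting the last principal named in the policy makes the key unmanageable. Keeping the standard root statement alongside any least-privilege grants avoids it (account ID is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EnableIAMPolicies",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:root" },
      "Action": "kms:*",
      "Resource": "*"
    }
  ]
}
```

With this statement in place, access to the key can be governed by IAM policies, and therefore by SSO permission sets, so deleting any single IAM user never locks the key.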

False Sense of Victory

My first thought was that it wasn’t a big deal — I’d just remove the current KMS key object from the Terraform state, let it create a new key, and associate it with the cluster. Sounded good. Even better, terraform plan and terraform apply completed without errors. Problem solved!

Or so I thought...

After a few small tweaks, I ran another change and noticed that Terraform tried to update the EKS cluster resource again. The only difference was the KMS envelope encryption key association. Multiple runs, same behaviour — Terraform applied the change successfully, yet it wasn’t reflected in the cluster settings (it still used the old KMS key I had no access to).

A quick check of the documentation confirmed that the envelope encryption key can’t be changed after creation. Fair enough. But why didn’t Terraform respect that? I’m not sure yet whether the AWS API isn’t returning a proper response code, or the Terraform AWS provider doesn’t handle it correctly.
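In the Terraform AWS provider, the key association sits in the cluster's `encryption_config` block; EKS lets you add encryption to an unencrypted cluster, but never change or remove the key afterwards. A sketch of the relevant fragment, with placeholder names:

```hcl
resource "aws_eks_cluster" "this" {
  name     = "demo"                    # placeholder
  role_arn = aws_iam_role.cluster.arn  # placeholder

  vpc_config {
    subnet_ids = var.subnet_ids        # placeholder
  }

  encryption_config {
    provider {
      # Effectively immutable after cluster creation: the apply "succeeds"
      # but the cluster keeps using the original key
      key_arn = aws_kms_key.eks.arn
    }
    resources = ["secrets"]
  }
}
```

That immutability is why swapping the key in state and re-applying could never work: the new key gets created, but the cluster's secrets remain encrypted under the old one.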

Recovery Attempt

So, I needed recovery access to the KMS key somehow. How do you get maximum access permissions to an AWS account? Correct — use the root login. Unfortunately, even with root permissions, I couldn’t update the KMS key policy. Great for security, but I still needed to regain access to the key. Time to check if anyone else had faced the same issue, and within a few minutes the answer was clear — contact AWS Support.

I opened a support case, explained the problem, and provided the ARN of the KMS key along with the policy I wanted to attach. What could be simpler? The support response surprised me. I had to follow a specific set of instructions:

To recover your unmanageable keys, please follow these steps:

  1. Create 1 IAM user for every affected key using the following naming convention: kms_key_recovery_<Key ID>. For example, if the Key ID is "17e51010-cc0f-2268-bccd-2699f10c133a", then the corresponding recovery user would be "kms_key_recovery_17e51010-cc0f-2268-bccd-2699f10c133a". Create and attach the following IAM policy to the newly created users:

         {
           "Version": "2012-10-17",
           "Statement": [
             {
               "Effect": "Allow",
               "Action": ["kms:ListAliases"],
               "Resource": "*"
             }
           ]
         }

  2. Once the recovery users have been created, please respond to this case with a list of the ARNs of the keys for which you have made recovery users and the respective users' ARNs.
  3. We will then contact you via the phone number listed in your account Contact Information to verify the One-Time Password listed above. Once verified, we will engage our internal KMS team to initiate recovery and update your case when complete. Your key recovery users will have the ability to modify the key policies of their respective keys. If the process was abandoned, we will inform you which users in your account still have access.
Because it’s not a synchronous operation and the KMS key policy can be updated at any time within a 12-hour window, I was worried the cluster might enter a degraded state before I could apply the proper policy. Luckily, that didn’t happen. The AWS team just granted the required permissions to the key as an additional rule without dropping the existing ones. The next day, I received an update confirming the recovery procedure was complete, and I was able to attach the correct policy to the key.

Lessons Learnt

Now there’s a new step in my IAM-to-SSO migration checklist:
[ ] update KMS key policies before deleting IAM users.


I'm looking forward to your comments, and let's connect on LinkedIn.
