I attached a permission-robbing SCP to an experimental OU, moved the target account into that OU, and observed how its workload behaved.
What is AWS Organizations?
AWS Organizations is an account management service that lets you consolidate multiple AWS accounts into an "organization." This makes it very useful for managing many accounts centrally.
You can also integrate supported AWS services with Organizations and control their use across the member accounts of the organization.
Chaos with SCP
Simulate the failure of AWS services and observe how a target workload behaves.
Attach a permission-robbing SCP to an experimental OU, move accounts into it, and observe workload behavior.
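The steps above can be scripted. Below is a minimal sketch using boto3's Organizations client, assuming it runs with management-account credentials. The policy name `chaos-permission-robbing-scp`, the function name `start_experiment`, and all IDs are hypothetical placeholders. The client is passed in as a parameter so the logic can be dry-run against a stub.

```python
import json

# Hypothetical policy name, used only for this sketch.
POLICY_NAME = "chaos-permission-robbing-scp"

# The same Deny statement as the SCP used in this post.
DENY_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Statement1",
            "Effect": "Deny",
            "Action": ["SNS:*", "SES:*", "S3:*", "Lambda:*", "Dynamodb:*", "SQS:*"],
            "Resource": "*",
        }
    ],
}


def start_experiment(org, account_id, source_parent_id, chaos_ou_id):
    """Create the SCP, attach it to the experimental OU, then move the account in.

    `org` is a boto3 Organizations client (or any object with the same methods).
    Returns the ID of the created policy so it can be cleaned up later.
    """
    created = org.create_policy(
        Content=json.dumps(DENY_POLICY),
        Description="Chaos experiment: deny the workload's services",
        Name=POLICY_NAME,
        Type="SERVICE_CONTROL_POLICY",
    )
    policy_id = created["Policy"]["PolicySummary"]["Id"]
    org.attach_policy(PolicyId=policy_id, TargetId=chaos_ou_id)
    # Moving the account into the OU is the step that actually injects the failure.
    org.move_account(
        AccountId=account_id,
        SourceParentId=source_parent_id,
        DestinationParentId=chaos_ou_id,
    )
    return policy_id
```

With real credentials this would be called as `start_experiment(boto3.client("organizations"), "<account-id>", "<current-parent-id>", "<experimental-ou-id>")`. Attaching the policy before moving the account means nothing is denied until the move, which keeps the start of the experiment to a single, easy-to-identify action.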
Target workload
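This time, I chose our in-house integrated email notification system as the target workload. (1) SES receives an email, (2) SNS passes the event on, and (3) a Lambda function, the Converter, reads the email data from S3 and notifies Slack.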
Permission-robbing SCP
I set the following policy as the SCP:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Statement1",
      "Effect": "Deny",
      "Action": [
        "SNS:*",
        "SES:*",
        "S3:*",
        "Lambda:*",
        "Dynamodb:*",
        "SQS:*"
      ],
      "Resource": "*"
    }
  ]
}
```
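To reason about which API calls this SCP blocks, here is a simplified Python sketch of how its `Action` patterns match: the service prefix and action name are compared case-insensitively, and `*` matches any sequence of characters. It deliberately ignores everything else in real policy evaluation (conditions, `NotAction`, Allow statements, other attached policies), so treat it only as an illustration. The function name `is_denied` is hypothetical.

```python
import re

# The Action list from the SCP above.
DENIED_ACTIONS = ["SNS:*", "SES:*", "S3:*", "Lambda:*", "Dynamodb:*", "SQS:*"]


def _pattern_to_regex(pattern):
    # Escape the pattern, then turn IAM wildcards back into regex wildcards.
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile(regex, re.IGNORECASE)


def is_denied(action, patterns=DENIED_ACTIONS):
    """Return True if `action` (e.g. "s3:GetObject") matches any Deny pattern."""
    return any(_pattern_to_regex(p).fullmatch(action) for p in patterns)
```

For example, `is_denied("s3:GetObject")` and `is_denied("lambda:GetFunction")` are true, while `is_denied("logs:PutLogEvents")` is false. CloudWatch Logs actions are not in the list, which is consistent with the Converter still being able to log its error.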
Let's take a guess
How would the workload behave if I moved its account to an OU with the permission-robbing SCP attached?
What part of the architecture would fail?
Experimental outcome
I sent an email that this workload should forward to Slack as a notification.
However, no notification arrived in Slack.
I found that (3) Converter failed when it tried to read the email data, with the following error:
```
[ERROR] ClientError: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
Traceback (most recent call last):
```
The (1) SES -> (2) SNS -> (3) Lambda calls are event-driven invocations made by the AWS services themselves, not through an IAM role in the account, so they still worked.
Next, the Lambda function in (3) tried to read the email data from S3 through its IAM execution role, so the request was blocked by the Deny on "S3:*" and the function failed at this point.
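The Converter's code is not shown in this post, so the following is only a hypothetical Python sketch of the S3 read step, written to make an SCP denial easy to spot. `read_email` and `error_code` are illustrative names, and the bucket and key are assumed to have already been extracted from the incoming event. It inspects the `response["Error"]["Code"]` field that botocore's `ClientError` carries, without importing botocore, so it can be tested with a stub.

```python
import logging

logger = logging.getLogger(__name__)


def error_code(exc):
    """Return the AWS error code from a ClientError-like exception, or None."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code")


def read_email(s3, bucket, key):
    """Read raw email data from S3, logging AccessDenied explicitly before re-raising.

    `s3` is a boto3 S3 client (or any object with a compatible get_object method).
    """
    try:
        obj = s3.get_object(Bucket=bucket, Key=key)
    except Exception as exc:
        if error_code(exc) == "AccessDenied":
            # Under the chaos SCP, "S3:*" is denied, so this is the expected failure.
            logger.error("S3 GetObject denied for s3://%s/%s", bucket, key)
        raise
    return obj["Body"].read()
```

Re-raising keeps the function failing as before, so the Lambda runtime still reports the `[ERROR] ClientError ...` line, while the extra log message makes the failing bucket and key easy to find.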
Incidentally, when I tried to check the Lambda function in the management console, it showed an error ending in "~with an explicit deny in a service control policy," and I could not view the function. This is because "Lambda:*" is also denied.
It was a reminder that when a service fails, I may not be able to check the status of its resources either.
Conclusion
I tried a simple form of chaos engineering using AWS Organizations SCPs.
Good points of this method
Easy to start, easy to clean up
For access that goes through an IAM role, you can simulate, to some extent, a failure to access a specific service.
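Cleaning up is mostly a matter of moving the account back. A minimal sketch with boto3's Organizations client follows; the function names `current_parent` and `end_experiment` and all IDs are hypothetical, and detaching and deleting the SCP is optional in case you want to reuse the experimental OU.

```python
def current_parent(org, account_id):
    """Return the ID of the OU or root that currently contains the account."""
    return org.list_parents(ChildId=account_id)["Parents"][0]["Id"]


def end_experiment(org, account_id, original_parent_id, chaos_ou_id, policy_id=None):
    """Move the account out of the experimental OU and optionally remove the SCP.

    `org` is a boto3 Organizations client (or any object with the same methods).
    Returns the account's parent ID after the move, as a quick confirmation.
    """
    # Leaving the OU means the SCP attached to it no longer applies to the account.
    org.move_account(
        AccountId=account_id,
        SourceParentId=chaos_ou_id,
        DestinationParentId=original_parent_id,
    )
    if policy_id is not None:
        # A policy must be detached from all targets before it can be deleted.
        org.detach_policy(PolicyId=policy_id, TargetId=chaos_ou_id)
        org.delete_policy(PolicyId=policy_id)
    return current_parent(org, account_id)
```

Because the failure is injected by a single `move_account` call, the rollback is the same call with source and destination swapped.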
Bad points of this method
You are attaching an SCP with strong restrictions, so a mistake (for example, moving the wrong account into the OU) can cause a real incident.
However, limiting which account IDs can be moved with the "Organizations:MoveAccount" permission may reduce this risk.
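SCPs only deny actions performed by principals such as IAM users and IAM roles in member accounts, so event-driven calls between services, like the SES -> SNS -> Lambda chain above, keep working, and the failures you can simulate are limited.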
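As one way to apply the MoveAccount restriction mentioned above, here is a Python sketch that generates an IAM policy allowing `organizations:MoveAccount` only on specific account ARNs. The ARN format `arn:aws:organizations::<management-account-id>:account/<org-id>/<account-id>` is my understanding of the AWS Service Authorization Reference; check it, and whether the OU or root resources also need scoping, before relying on this. The function name and all IDs are hypothetical placeholders.

```python
import json


def move_account_policy(management_account_id, org_id, allowed_account_ids):
    """Build an IAM policy that allows organizations:MoveAccount only for listed accounts."""
    resources = [
        f"arn:aws:organizations::{management_account_id}:account/{org_id}/{account_id}"
        for account_id in allowed_account_ids
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowMovingOnlyChaosTargets",
                "Effect": "Allow",
                "Action": "organizations:MoveAccount",
                "Resource": resources,
            }
        ],
    }


# Example with placeholder IDs: only the chaos target account may be moved.
print(json.dumps(move_account_policy("123456789012", "o-exampleorgid", ["111111111111"]), indent=2))
```

This only narrows what the operator can do if no other attached policy grants `organizations:MoveAccount` (or `organizations:*`) more broadly, since IAM permissions from multiple Allow statements add up.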
Final thoughts
Can I detect the failure? Can I identify the cause?
I think "Easy Chaos Engineering with AWS Organizations SCP" is a good starting point for checking these questions.
This post is an English rewrite of a post I wrote in Japanese.