Introduction
Hey Folks! Today we have an "IaC sandwich" on the menu: a guide to centralized, secure AWS multi-account management built on a layered strategy that combines AWS-native capabilities, infrastructure-as-code flexibility, and streamlined multi-account governance.
Back in the day, while diving deep into the Terraform/Terragrunt duo and its incredible power, I found myself asking: how can I further automate infrastructure provisioning while ensuring security, scalability, and maintainability across multiple AWS accounts? And how can I do that with short-lived, role-based authorization that remains seamless and controlled? I wanted a solution that would streamline access management, enforce security best practices, and provide a centralized approach to managing infrastructure.
While the Terraform and Terragrunt duo already offers fantastic Infrastructure-as-Code (IaC) capabilities, there's one AWS-native service that often gets overlooked: CloudFormation StackSets. Although CloudFormation is AWS-specific and lacks the flexibility of Terraform providers, it has a significant advantage: organization-wide deployments at scale. This made me realize that CloudFormation StackSets could be a powerful addition to a Terraform/Terragrunt-based workflow, addressing the challenge of managing IAM roles securely and efficiently. It becomes the first layer of my sandwich, the pre-provisioning phase, where I need an easy "init-hook" that puts all the required roles and permissions in place before the actual infrastructure deployment begins.
So, in this article, I’ll walk you through how I designed a fully automated, secure, and scalable infrastructure management approach using Terraform, Terragrunt, and AWS CloudFormation StackSets—leveraging the best of each tool to create an ironclad AWS setup.
The Challenge
In multi-account AWS environments, access management and security are paramount. When using Terraform and Terragrunt, you need IAM roles that allow infrastructure automation while maintaining strict security controls. However, these roles must be pre-provisioned before Terraform can even begin managing infrastructure. This raises an important question:
How can we centrally create IAM roles across all AWS accounts in a secure and automated way?
Here's what we needed:
- A centralized approach for IAM role provisioning across all AWS accounts.
- A secure way to assume roles with least-privilege access.
- Seamless integration with Terraform and Terragrunt for managing infrastructure.
- Short-lived credentials for increased security via AWS IAM Identity Center (SSO).
The Solution Overview
All Terraform state will be stored in the Shared-Services account. This ensures centralized state management, improving security and consistency across environments.
The main Terraform execution role and the GitHub Actions OIDC role will also reside in the Shared-Services account. These roles, alongside Management account admins, will be able to assume the account-level Terraform execution roles, so any entity that can assume the main role can control multiple accounts through IaC.
Account-Based Terraform Execution Roles: Each AWS account (Development, Staging, Production) has its own Terraform execution role, which can be assumed when provisioning infrastructure within that specific account. Each per-environment role should be tailored to the least-privilege principle; for demo purposes, however, our roles have the AdministratorAccess policy attached.
GitHub Actions OIDC Role: This role enables secure CI/CD automation by allowing GitHub Actions workflows to assume it for deployments.
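To make the OIDC role less abstract, here is a minimal sketch of what the gha-oidc module might contain. The module's real contents are not shown in this article, so the resource names, the github_thumbprints variable, and the exact conditions are assumptions; tf_repo and gha_role_name correspond to the inputs defined in common.hcl.

```hcl
# Hypothetical sketch of modules/gha-oidc (the real module is not shown here).
variable "tf_repo" {
  type        = string
  description = "GitHub repository in owner/name form (the tf_repo input from common.hcl)"
}

variable "gha_role_name" {
  type    = string
  default = "gha-role"
}

variable "github_thumbprints" {
  type        = list(string)
  description = "Assumed input: certificate thumbprints for GitHub's OIDC endpoint"
}

# Registers GitHub as an identity provider in the Shared-Services account
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = var.github_thumbprints
}

data "aws_iam_policy_document" "gha_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Only workflows from the configured repository may assume the role
    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:${var.tf_repo}:*"]
    }
  }
}

resource "aws_iam_role" "gha" {
  name               = var.gha_role_name
  assume_role_policy = data.aws_iam_policy_document.gha_trust.json
}
```

In a setup like this, the role would also need an identity policy allowing sts:AssumeRole on the account-level execution roles. Narrowing the sub condition to a specific branch or GitHub environment is a common way to tighten it further.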
To maintain strict security controls, we need to limit the identities that can assume the Shared-Services Terraform Execution Role. In our case:
- Management account administrators can assume both the Shared-Services Terraform Execution Role and the individual account-based roles directly.
- Only authorized users from the Management account will have permission to assume roles within the Shared-Services account, ensuring controlled access.
So, ultimately, the set of identities able to assume Terraform execution roles should be very limited. In our case, these entities are:
- AWS Management account Administrators
- Shared-Services "terraform-execution-role"
- GitHub Actions OIDC federated role
The only entities that apply anything on the infrastructure-org path, which stores the organization-level configurations that go through the management account, are AWS Management account administrators. Therefore, we are not providing CI automation for this section, and we limit access to it to very few people (Management account admins).
So, getting back to the pre-provisioning phase to prepare these roles before the actual provisioning—how do we achieve that?
The answer? Leverage AWS CloudFormation StackSets to pre-provision IAM roles across all AWS accounts before using Terraform/Terragrunt against infrastructure-live. In simple terms, we will have specific management modules utilized under the infrastructure-org path before switching to infrastructure-live.
Two-Phase Deployment
- Phase 1: Use CloudFormation StackSets to pre-provision IAM roles across AWS accounts.
- Phase 2: Terraform/Terragrunt assumes these pre-created roles to deploy infrastructure on managed accounts.
GitOps Folder-Based Environment Approach
We also want to leverage a GitOps-style approach, structuring environments using a folder/directory per environment model. This will be combined with Terragrunt functions for seamless mapping of common variables (e.g., dynamically mapping account IDs based on environment folder names).
Closer look
Now that we have established the core concepts, let’s explore the structure of our Infrastructure-as-Code (IaC) project in detail.
All the example code referenced here is stored on GitHub.
.
├── common.hcl
├── infrastructure-live
│   ├── development
│   ├── production
│   ├── shared-services
│   │   └── gha-oidc
│   ├── staging
│   └── terragrunt.hcl
├── infrastructure-org
│   ├── root
│   │   ├── cfstacksets
│   │   └── organization
│   └── terragrunt.hcl
└── modules
    ├── cfstacksets
    ├── gha-oidc
    └── organization
For simplicity, we have only the skeleton defined for infrastructure-live; there is no actual provisioning per environment except for the gha-oidc module in shared-services.
Key Aspects of Our IaC Structure
- infrastructure-org (Organization-Level Management) - This path represents the root (management) account and delegated administrator accounts. It includes CloudFormation StackSets (cfstacksets) and the AWS Organizations setup: OUs, SCPs, etc. (organization). Since it manages organization-wide resources, access to this path must be strictly limited.
- infrastructure-live (Environment-Specific Infrastructure) - This path contains per-environment configurations (development, staging, production, etc.). It includes shared-services, which hosts the gha-oidc module for GitHub Actions OIDC setup.
- common.hcl (Shared Variables and Configuration) - This file contains shared variables used by both infrastructure-org and infrastructure-live. Common configurations such as account IDs, the main region, and global settings (inputs and locals) are defined here.
Terragrunt Pathing Pattern
As highlighted earlier, our Terragrunt pathing follows a structured pattern:
infrastructure-path/account|env/modules
The parent terragrunt.hcl lives under the infrastructure path, and child terragrunt.hcl files live at the module level and include the parent.
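As a minimal sketch of that parent/child relationship, a module-level file could look like the following. The file location and the input shown are illustrative assumptions, not copied from the repository.

```hcl
# infrastructure-live/shared-services/gha-oidc/terragrunt.hcl (hypothetical child file)

# Pull in the nearest parent terragrunt.hcl found by walking up the directory tree
include "root" {
  path = find_in_parent_folders()
}

# Module-specific inputs; these are merged over the inputs the parent provides
inputs = {
  gha_oidc_enabled = true
}
```

Because the parent builds its terraform source from basename(get_terragrunt_dir()), and that function resolves to the child's directory when the parent is included, a child directory named gha-oidc would pick up modules/gha-oidc without declaring a source of its own.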
infrastructure-org
Here is a sample parent file for the org path, intended for organization-level management.
# Parent config only: it is never applied directly, children include it
skip = true
terraform {
  # The module name is derived from the child directory name
  source = "${get_repo_root()}/modules/${basename(get_terragrunt_dir())}"
}
locals {
  common_vars   = read_terragrunt_config(find_in_parent_folders("common.hcl"))
  global_prefix = local.common_vars.locals.global_prefix
  env           = "root"
  profile       = get_env("AWS_PROFILE_ROOT", "${local.global_prefix}-root-sso")
  region        = local.common_vars.inputs.region
}
inputs = merge(
  local.common_vars.inputs,
  {
    env        = local.env
    region     = local.region
    account_id = local.common_vars.inputs.org_account_ids[local.env]
  }
)
# Organization-level state lives in the management (root) account
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "${local.global_prefix}-terraform-state-root"
    key            = "${local.global_prefix}/${get_path_from_repo_root()}/terraform.tfstate"
    region         = local.region
    encrypt        = true
    dynamodb_table = "root-tfstate-lock-table"
  }
}
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<-EOF
    provider "aws" {
      region              = "${local.region}"
      # Guard against applying with credentials for the wrong account
      allowed_account_ids = ["${local.common_vars.inputs.org_account_ids[local.env]}"]
      default_tags {
        tags = {
          Environment = "${local.env}"
          ManagedBy   = "terraform"
        }
      }
    }
  EOF
}
generate "versions" {
  path      = "versions.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<-EOF
    terraform {
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 5.74"
        }
      }
    }
  EOF
}
Provisioning the Root Account Modules
To apply the root account (that is, the management account) modules (cfstacksets and organization), we assume a management account administrator role and run the following command:
export AWS_PROFILE=your_management_admin_profile
terragrunt run-all apply --terragrunt-working-dir infrastructure-org
cfstacksets Module
The cfstacksets module provisions key IAM roles across different AWS Organizational Units (OUs). Let's examine the main components:
stacks-sdlc.tf (IAM Role Provisioning for SDLC Accounts)
resource "aws_cloudformation_stack_set" "terraform_role_sdlc" {
  permission_model = "SERVICE_MANAGED"
  name             = "${var.tf_role_name}-sdlc"
  # Also deploy to accounts that join the target OUs later
  auto_deployment {
    enabled = true
  }
  capabilities = ["CAPABILITY_NAMED_IAM"]
  template_body = jsonencode({
    AWSTemplateFormatVersion = "2010-09-09",
    Description              = "AWS CloudFormation Template to create an IAM Role named '${var.tf_role_name}' and attach the 'AdministratorAccess' AWS managed policy.",
    Resources = {
      OrgRole = {
        Type = "AWS::IAM::Role",
        Properties = {
          RoleName = var.tf_role_name,
          AssumeRolePolicyDocument = {
            Version = "2012-10-17",
            Statement = [
              {
                # Shared-Services: only the Terraform execution role and the GitHub Actions role
                Effect = "Allow",
                Principal = {
                  AWS = ["arn:aws:iam::${var.shared_services_id}:root"]
                },
                Action = ["sts:AssumeRole", "sts:TagSession"],
                Condition = {
                  StringLike = {
                    "aws:PrincipalArn" = [
                      "arn:aws:iam::${var.shared_services_id}:role/${var.tf_role_name}",
                      "arn:aws:iam::${var.shared_services_id}:role/${var.gha_role_name}"
                    ]
                  }
                }
              },
              {
                # Management account: access is delegated to IAM permissions in that account
                Effect = "Allow",
                Principal = {
                  AWS = ["arn:aws:iam::${var.root_account_id}:root"]
                },
                Action = ["sts:AssumeRole"],
              }
            ]
          },
          ManagedPolicyArns = [
            "arn:aws:iam::aws:policy/AdministratorAccess"
          ]
        }
      }
    }
  })
  # Avoids a perpetual diff on this attribute with SERVICE_MANAGED stack sets
  lifecycle {
    ignore_changes = [administration_role_arn]
  }
}
resource "aws_cloudformation_stack_set_instance" "terraform_role_sdlc" {
  stack_set_name = aws_cloudformation_stack_set.terraform_role_sdlc.name
  deployment_targets {
    organizational_unit_ids = [var.org_ou_ids["sdlc"], var.org_ou_ids["production"], var.org_ou_ids["sandbox"]]
  }
}
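The resources above reference several input variables. The module's variables.tf is not shown in this article, so the following is only a sketch of what the declarations might look like. The names match the var.* references in the code; the descriptions and the validation rule are my own additions. Note that org_ou_ids does not appear in common.hcl, so it is presumably supplied some other way, for instance from the organization module's outputs.

```hcl
# modules/cfstacksets/variables.tf (assumed layout; not shown in the article)
variable "tf_role_name" {
  type        = string
  description = "Name of the Terraform execution role created in every target account"
}

variable "gha_role_name" {
  type        = string
  description = "Name of the GitHub Actions OIDC role in the Shared-Services account"
}

variable "shared_services_id" {
  type        = string
  description = "Account ID of the Shared-Services account"

  # Illustrative guard: catch a placeholder or truncated ID before it reaches a trust policy
  validation {
    condition     = can(regex("^[0-9]{12}$", var.shared_services_id))
    error_message = "shared_services_id must be a 12-digit AWS account ID."
  }
}

variable "root_account_id" {
  type        = string
  description = "Account ID of the management (root) account"
}

variable "org_ou_ids" {
  type        = map(string)
  description = "Map of OU name to OU ID, with keys such as sdlc, production, sandbox and core"
}
```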
stacks-shared-services.tf (IAM Role Provisioning for Shared Services)
This stack provisions an IAM role for shared services that can assume Terraform roles across multiple OUs.
resource "aws_cloudformation_stack_set" "terraform_role_shared" {
  permission_model = "SERVICE_MANAGED"
  name             = "${var.tf_role_name}-shared"
  auto_deployment {
    enabled = true
  }
  capabilities = ["CAPABILITY_NAMED_IAM"]
  template_body = jsonencode({
    AWSTemplateFormatVersion = "2010-09-09",
    Description              = <<-EOT
      AWS CloudFormation StackSet template to create an IAM Role named '${var.tf_role_name}' on Shared-Services
      account and attach the 'AdministratorAccess' AWS managed policy. The role can be assumed by an external account with
      a matching condition. Exclusively this role itself is able to assume '${var.tf_role_name}'s across the SDLC and
      Production OUs. Note: Root Administrators are also able to assume target '${var.tf_role_name}'s across the SDLC
      and Production OUs.
    EOT
    Resources = {
      OrgRole = {
        Type = "AWS::IAM::Role",
        Properties = {
          RoleName = var.tf_role_name,
          AssumeRolePolicyDocument = {
            Version = "2012-10-17",
            Statement = [
              {
                # Trusts principals in Shared-Services itself and in the management account
                Effect = "Allow",
                Principal = {
                  AWS = [
                    "arn:aws:iam::${var.shared_services_id}:root",
                    "arn:aws:iam::${var.root_account_id}:root"
                  ]
                },
                Action = ["sts:AssumeRole"]
              }
            ]
          },
          ManagedPolicyArns = [
            "arn:aws:iam::aws:policy/AdministratorAccess"
          ]
        }
      }
    }
  })
  lifecycle {
    ignore_changes = [administration_role_arn]
  }
}
resource "aws_cloudformation_stack_set_instance" "terraform_role_shared" {
  stack_set_name = aws_cloudformation_stack_set.terraform_role_shared.name
  deployment_targets {
    organizational_unit_ids = [var.org_ou_ids["core"]]
    # Within the Core OU, deploy only to the Shared-Services account
    account_filter_type     = "INTERSECTION"
    accounts                = [var.shared_services_id]
  }
}
This ensures that the terraform_role_shared StackSet is provisioned only in the Shared-Services account, and that the role it creates can securely assume the Terraform execution roles across the other OUs.
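Trusting an account's root principal delegates the decision to IAM permissions inside that account: any identity there that is allowed to call sts:AssumeRole on this role can use it. If you would rather have the trust policy itself name the allowed identities, one option is a condition on aws:PrincipalArn, similar to what the SDLC template already does for the Shared-Services roles. The sketch below assumes administrators sign in through IAM Identity Center with a permission set called AdministratorAccess; that name, the local, and the ARN pattern are illustrative assumptions to verify against your own role ARNs.

```hcl
# Hypothetical tightening of the management-account trust statement.
variable "root_account_id" {
  type        = string
  description = "Account ID of the management (root) account"
}

locals {
  # Drop-in candidate for one entry of the Statement list in the CloudFormation template
  management_admin_statement = {
    Effect = "Allow"
    Principal = {
      AWS = ["arn:aws:iam::${var.root_account_id}:root"]
    }
    Action = ["sts:AssumeRole"]
    Condition = {
      ArnLike = {
        # Assumed pattern for Identity Center roles; the first wildcard covers an
        # optional region segment in the role path, the second the generated suffix
        "aws:PrincipalArn" = "arn:aws:iam::${var.root_account_id}:role/aws-reserved/sso.amazonaws.com/*AWSReservedSSO_AdministratorAccess_*"
      }
    }
  }
}
```

The trade-off is that the trust policy then has to change whenever the set of admin roles changes, which in this setup means updating the StackSet.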
infrastructure-live
Now that the organization-wide IAM roles and policies are in place, we move on to the environment-specific infrastructure provisioning under infrastructure-live. This is where Terraform/Terragrunt dynamically maps account-specific configurations, enabling seamless role assumption and execution.
Common Configuration (common.hcl in the root of the project)
The common.hcl file defines global configurations, including account mappings, environment inference, and shared settings.
locals {
  # Captures the directory directly under infrastructure-live as the environment name
  env_regex = "infrastructure-live/([a-zA-Z0-9-]+)/"
  # Falls back to "shared-services" when the path does not match
  env       = try(regex(local.env_regex, get_original_terragrunt_dir())[0], "shared-services")
  sdlc_account_ids = {
    development = "XXXXXXXXXXXXX"
    staging     = "XXXXXXXXXXXXX"
    production  = "XXXXXXXXXXXXX"
  }
  core_account_ids = {
    shared-services = "XXXXXXXXXXXXX"
    backups         = "XXXXXXXXXXXXX"
  }
  management_account_id = {
    root = "XXXXXXXXXXXXX"
  }
  sandbox_account_id = {
    sandbox = "XXXXXXXXXXXXX"
  }
  global_prefix = "XXXXXXXXXXXXX"
}
inputs = {
  global_prefix      = local.global_prefix
  # Main region; both parent terragrunt.hcl files read inputs.region
  region             = "XXXXXXXXXXXXX"
  sdlc_account_ids   = local.sdlc_account_ids
  core_account_ids   = local.core_account_ids
  org_account_ids    = merge(local.sdlc_account_ids, local.core_account_ids, local.management_account_id, local.sandbox_account_id)
  shared_services_id = local.core_account_ids["shared-services"]
  backups_id         = local.core_account_ids["backups"]
  root_account_id    = local.management_account_id["root"]
  org_units          = ["SDLC", "Production", "Core", "Sandbox"]
  tf_repo            = "XXXXXXXXXXXXX/terragrunt-infrastructure"
  tf_role_name       = "terraform-execution-role"
  gha_role_name      = "gha-role"
  gha_oidc_enabled   = true
  repo_root_path     = get_repo_root()
}
Terragrunt Configuration for infrastructure-live (terragrunt.hcl)
Each environment directory (e.g., development/, staging/, production/, etc.) will reference a common terragrunt.hcl file, ensuring consistent execution policies and automatic role assumption.
# Parent config only: it is never applied directly, children include it
skip = true
terragrunt_version_constraint = ">= 0.66"
terraform_version_constraint  = ">= 1.9.0"
# Retry transient webhook failures
retryable_errors         = ["(?s).*failed calling webhook.*"]
retry_max_attempts       = 2
retry_sleep_interval_sec = 30
# Phase 1 first: the StackSets-provisioned roles must exist before anything here runs
dependencies {
  paths = ["${get_repo_root()}/infrastructure-org/root/cfstacksets"]
}
terraform {
  source = "${get_repo_root()}/modules/${basename(get_terragrunt_dir())}"
}
locals {
  common_vars   = read_terragrunt_config(find_in_parent_folders("common.hcl"))
  region        = local.common_vars.inputs.region
  env_regex     = local.common_vars.locals.env_regex
  env           = local.common_vars.locals.env
  global_prefix = local.common_vars.locals.global_prefix
}
inputs = merge(
  local.common_vars.inputs,
  {
    env        = local.env
    region     = local.region
    account_id = local.common_vars.inputs.org_account_ids[local.env]
  }
)
# All environment state lives in the Shared-Services account
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "${local.global_prefix}-terraform-state-shared-services"
    key            = "${local.global_prefix}/${get_path_from_repo_root()}/terraform.tfstate"
    region         = local.region
    encrypt        = true
    dynamodb_table = "shared-services-tfstate-lock-table"
    assume_role = {
      role_arn = "arn:aws:iam::${local.common_vars.inputs.org_account_ids["shared-services"]}:role/${local.common_vars.inputs.tf_role_name}"
    }
  }
}
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<-EOF
    provider "aws" {
      region              = "${local.region}"
      allowed_account_ids = ["${local.common_vars.inputs.org_account_ids[local.env]}"]
      # Assume the execution role of the account that matches the environment folder
      assume_role {
        role_arn = "arn:aws:iam::${local.common_vars.inputs.org_account_ids[local.env]}:role/${local.common_vars.inputs.tf_role_name}"
      }
      default_tags {
        tags = {
          Environment = "${local.env}"
          ManagedBy   = "terraform"
          DeployedBy  = "terragrunt"
        }
      }
    }
  EOF
}
generate "versions" {
  path      = "versions.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<-EOF
    terraform {
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 5.74"
        }
      }
    }
  EOF
}
Dynamic Mapping of Environments and Role Assumption
One of the key benefits of Terragrunt's DRY (Don't Repeat Yourself) approach is that we dynamically infer account configurations based on folder structure.
- Each environment folder (infrastructure-live/development, staging, production, etc.) automatically determines its AWS account ID and region.
- Local profile or CI/CD execution will dynamically assume the appropriate Terraform execution role for the target environment.
How It Works
- env_regex captures the environment name from the path.
- The inputs block maps the environment name to its corresponding AWS account ID.
- remote_state ensures each environment uses its own Terraform state stored in the Shared-Services account.
- The provider configuration automatically assumes the appropriate IAM role for infrastructure provisioning.
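To see the environment inference in isolation, the same try/regex expression can be evaluated in a throwaway Terraform configuration. This is a standalone sketch, not part of the repository; the two sample paths are made up and simply stand in for what get_original_terragrunt_dir() would return.

```hcl
# Standalone sketch for experimenting with the env inference logic.
locals {
  env_regex = "infrastructure-live/([a-zA-Z0-9-]+)/"

  # Hypothetical paths standing in for get_original_terragrunt_dir()
  live_path = "/repo/infrastructure-live/staging/vpc"
  org_path  = "/repo/infrastructure-org/root/cfstacksets"

  # One unnamed capture group, so regex() returns a list and [0] is the env name
  live_env = try(regex(local.env_regex, local.live_path)[0], "shared-services")

  # No match makes regex() raise an error, so try() returns the fallback
  org_env = try(regex(local.env_regex, local.org_path)[0], "shared-services")
}

output "live_env" {
  value = local.live_env # "staging"
}

output "org_env" {
  value = local.org_env # "shared-services"
}
```

The fallback never matters for infrastructure-org, because that parent file sets env = "root" itself instead of reading it from common.hcl. Also note that the pattern ends with a slash, so it only matches paths with a module directory below the environment folder.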
Executing Terraform/Terragrunt in infrastructure-live
With this setup, running Terraform/Terragrunt becomes straightforward. Whether executed locally or via GitHub Actions OIDC, the appropriate role is automatically assumed.
export AWS_PROFILE=your_profile # This can be Shared-Services Terraform Exec Role or Management Admin
terragrunt run-all apply --terragrunt-working-dir infrastructure-live
What Happens?
- Terragrunt reads the environment directory structure (infrastructure-live/development, staging, production).
- It dynamically assumes the correct IAM role for Terraform execution.
- State files are stored centrally in the Shared-Services account.
- The appropriate infrastructure is provisioned, following AWS best practices for multi-account security.
Farewell 😊
We've navigated the intricacies of multi-account AWS infrastructure automation using Terraform, Terragrunt, and AWS CloudFormation StackSets. By layering pre-provisioned IAM roles and dynamic environment mapping, we've explored a secure, scalable, and streamlined approach to managing AWS at scale.
I hope this guide provides practical insights and a solid foundation. Keep refining, keep automating, and embrace the power of infrastructure as code! 🚀