By the end of this guide, you will not just know what CloudFormation does. You will understand why each feature exists, what problem it solves, what breaks without it, and how features chain together to solve real infrastructure problems. Every concept is introduced with a problem first, not a definition first.
Table of Contents
- What is CloudFormation and Why It Matters
- Logical and Physical Resources
- Templates: Portable vs Non-Portable
- Template Parameters and Pseudo Parameters
- Intrinsic Functions
- Mappings
- Outputs
- Conditions
- DependsOn
- Wait Conditions and cfn-signal
- cfn-init
- cfn-hup
- Nested Stacks
- Cross-Stack References
- StackSets
- Deletion Policy
- Stack Roles
- ChangeSets
- Custom Resources
What is CloudFormation and Why It Matters
The Problem It Solves
Imagine you are building a three-tier web application. You need a VPC, subnets, an EC2 instance, a security group, an RDS database, an S3 bucket, and a load balancer. You click through the AWS Console and get it working. Three weeks later, a colleague asks you to replicate the exact environment for staging. You click through everything again. Two hours later, something is different: you missed a security group rule, the subnet CIDR is wrong, and the RDS instance has a different parameter group.
This is the core problem. Manual infrastructure is not repeatable, not auditable, and not scalable.
CloudFormation is AWS's answer. You describe your infrastructure in a template (a YAML or JSON file), and CloudFormation takes that description and builds the real AWS resources. The template becomes the single source of truth.
What CloudFormation Actually Is
CloudFormation is a declarative infrastructure provisioning engine. You declare what you want, not how to create it. CloudFormation figures out the how, including the order in which resources must be created, the dependencies between them, and what to clean up if something fails.
Why It Matters Beyond Convenience
- Repeatability: The same template deployed ten times produces ten identical environments.
- Version control: Your infrastructure lives in Git. Every change is tracked. Every rollback is a git revert.
- Accountability: Who changed the security group? Check the commit history.
- Speed: A 40-resource stack that takes two hours to click through manually deploys in minutes.
- Disaster recovery: When a region fails, you re-deploy the template in another region. Your infrastructure is code, not memory.
- Cost: Stacks can be deleted entirely after use. Temporary environments cost nothing when they are gone.
CloudFormation is not just a tool. It is a practice: treating infrastructure with the same discipline as application code.
Logical and Physical Resources
The Concept
This is the foundation. If you misunderstand this, everything else will be confusing.
A logical resource is what you write in the template. It is a declaration: a name you give to a resource and a description of what you want.
A physical resource is what AWS actually creates when CloudFormation processes your template. It has a real ID, such as an instance ID, a bucket name, or a security group ID.
The Relationship
When you write this:
Resources:
  MyWebServer:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t3.micro
      ImageId: ami-0c55b159cbfafe1f0
MyWebServer is the logical resource. The actual EC2 instance that gets created, say i-0a1b2c3d4e5f6, is the physical resource.
The logical resource exists only in your template. The physical resource exists in AWS.
Why This Distinction Matters
CloudFormation tracks the mapping between logical and physical resources in something called the stack. This mapping is what enables CloudFormation to:
- Update a resource when you change the template
- Replace a resource when a change requires replacement
- Delete all physical resources when you delete the stack
If you manually delete a physical resource that CloudFormation manages, the stack's record no longer matches reality. On the next update or delete, CloudFormation may fail or behave unpredictably because the physical resource it expects to find is gone.
Rule: Never manually modify a physical resource that is managed by a CloudFormation stack. Make the change in the template instead.
What Happens During Stack Creation
- CloudFormation reads your template.
- It builds a dependency graph of all logical resources.
- It creates physical resources in the correct order.
- It maps each logical resource to its new physical resource.
- The stack reaches CREATE_COMPLETE.
If any resource creation fails, CloudFormation rolls back by default: it deletes everything it already created, and the stack reaches ROLLBACK_COMPLETE. Nothing is left half-built.
Templates: Portable vs Non-Portable
The Problem with Non-Portable Templates
Here is a simple template:
AWSTemplateFormatVersion: "2010-09-09"
Description: Simple EC2 instance
Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t3.micro
      ImageId: ami-0c55b159cbfafe1f0
      SubnetId: subnet-0abc1234
This works. Once. In one region. For one person.
The ImageId is an AMI ID. AMI IDs are region-specific. ami-0c55b159cbfafe1f0 in us-east-1 is not the same image in eu-west-1; in fact, it probably does not exist there at all. The SubnetId is hardcoded to a specific subnet in a specific AWS account.
If you or anyone else tries to deploy this template in a different region or a different account, it will fail.
This is a non-portable template. It has hardcoded values that only work in one specific context.
What Would Happen If You Ran It
- In your account, same region: Works.
- In your account, different region: Fails. AMI ID does not exist.
- In a colleague's account: Fails. Subnet ID does not exist.
- In a CI/CD pipeline targeting staging: Fails. Both IDs are wrong.
The Solution: Portable Templates
A portable template contains no hardcoded environment-specific values. Instead, it accepts inputs, references dynamic values, and uses lookup mechanisms to resolve environment-specific details at deploy time.
The tools that enable portability are:
- Parameters: Inputs you provide at deploy time
- Pseudo Parameters: Values AWS provides automatically (account ID, region, etc.)
- Intrinsic Functions: Functions that resolve values dynamically
- Mappings: Lookup tables built into the template
The portable version of the above template:
AWSTemplateFormatVersion: "2010-09-09"
Description: Portable EC2 instance
Parameters:
  SubnetId:
    Type: AWS::EC2::Subnet::Id
    Description: The subnet to launch the instance into
Mappings:
  RegionAMIMap:
    us-east-1:
      AMI: ami-0c55b159cbfafe1f0
    eu-west-1:
      AMI: ami-0d71ea30463e0ff49
Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t3.micro
      ImageId: !FindInMap [RegionAMIMap, !Ref AWS::Region, AMI]
      SubnetId: !Ref SubnetId
Now this template works in any region that is in the map, and in any account. The subnet is provided at deploy time. The AMI is looked up based on which region you are deploying to.
This is the difference between a template you write once and a template you write once and use everywhere.
Template Parameters and Pseudo Parameters
Template Parameters
Parameters make your template an interface. Instead of hardcoding values, you expose inputs that the person deploying the template, or an automation system, provides at deploy time.
Parameters:
  EnvironmentName:
    Type: String
    Default: development
    AllowedValues:
      - development
      - staging
      - production
    Description: The environment this stack is being deployed to
  InstanceType:
    Type: String
    Default: t3.micro
    AllowedValues:
      - t3.micro
      - t3.small
      - t3.medium
    Description: EC2 instance type
  DBPassword:
    Type: String
    NoEcho: true
    MinLength: 8
    MaxLength: 32
    Description: Database password (will not be displayed)
Key parameter types:
| Type | What It Validates |
|---|---|
| String | Any string |
| Number | Numeric value |
| AWS::EC2::Subnet::Id | Must be a valid subnet ID in your account |
| AWS::EC2::KeyPair::KeyName | Must be a valid key pair name |
| AWS::SSM::Parameter::Value<String> | Pulls value from Systems Manager Parameter Store |
The NoEcho: true property is important. It prevents the value from being displayed in the Console or CLI output. Note that this is masking, not encryption: the value is still stored with the stack. It is acceptable for low-stakes passwords, but for genuinely sensitive production secrets, use Secrets Manager or SSM SecureString instead.
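As a rough sketch of the Secrets Manager route, a dynamic reference lets CloudFormation fetch the secret at deploy time, so the password never travels through a template parameter. The secret name myapp/db, its JSON key password, and the other property values are hypothetical, and the secret is assumed to exist already.

```yaml
Resources:
  AppDatabase:
    Type: AWS::RDS::DBInstance
    Properties:
      Engine: mysql
      DBInstanceClass: db.t3.micro
      AllocatedStorage: "20"
      MasterUsername: appadmin   # illustrative value
      # Resolved by CloudFormation at deploy time. "myapp/db" and the
      # "password" key are assumed names for an existing secret.
      MasterUserPassword: "{{resolve:secretsmanager:myapp/db:SecretString:password}}"
```

With this approach there is no DBPassword parameter to mask at all, which is usually the simpler design for production stacks.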
To reference a parameter inside the template:
Properties:
  InstanceType: !Ref InstanceType
!Ref on a parameter returns its value. That is it. Simple.
Pseudo Parameters
Pseudo parameters are values that AWS provides automatically. You do not declare them. They are always available.
| Pseudo Parameter | What It Returns |
|---|---|
| AWS::AccountId | Your 12-digit AWS account ID |
| AWS::Region | The region being deployed to, e.g. eu-west-1 |
| AWS::StackName | The name of the current stack |
| AWS::StackId | The full ARN of the stack |
| AWS::NoValue | Used to conditionally remove a property |
| AWS::Partition | aws, aws-cn, or aws-us-gov |
| AWS::URLSuffix | amazonaws.com, or a partition-specific suffix such as amazonaws.com.cn |
Usage example:
Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub "myapp-${AWS::AccountId}-${AWS::Region}-logs"
This creates a bucket name like myapp-123456789012-eu-west-1-logs. Because the name embeds your account ID and the region, it is very unlikely to collide with anyone else's bucket, and the same template deployed to multiple regions produces a distinct bucket name in each.
AWS::NoValue is used when you want to conditionally exclude a property:
Properties:
  DBSnapshotIdentifier: !If [IsProduction, !Ref SnapshotId, !Ref AWS::NoValue]
If the condition IsProduction is false, the property is completely removed from the resource definition. AWS sees it as if you never wrote it.
Intrinsic Functions
What They Are
Intrinsic functions are built-in functions that CloudFormation evaluates when it processes your template. They let you compute values, reference other resources, join strings, look up mappings, and make decisions, all at deploy time.
You cannot use them in every part of a template. They work in the Properties section of resources, in Outputs, and in Metadata, and the condition functions (such as !Equals and !And) work in the Conditions section. They do not work in the Parameters section.
The Core Functions
!Ref
Returns the value of a parameter or the default identifier of a resource.
# On a parameter: returns the parameter value
InstanceType: !Ref InstanceTypeParam
# On a resource: returns the resource's primary identifier
SubnetId: !Ref MySubnet # Returns the Subnet ID of MySubnet
What !Ref returns depends on the resource type. For an EC2 instance it returns the instance ID. For an S3 bucket it returns the bucket name. For a security group it returns the group ID. Always check the CloudFormation documentation for what !Ref returns for each resource type.
!GetAtt
Returns a specific attribute of a resource, not just the primary identifier.
# Get the ARN of a Lambda function
FunctionArn: !GetAtt MyLambdaFunction.Arn
# Get the DNS name of a load balancer
LoadBalancerDNS: !GetAtt MyALB.DNSName
# Get the ARN of an IAM role
RoleArn: !GetAtt MyRole.Arn
!GetAtt is how you wire resources together. You create a load balancer and then pass its DNS name to your Route 53 record. You create an IAM role and pass its ARN to a Lambda function.
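The load balancer to Route 53 wiring described above might look roughly like this. It assumes MyALB is an AWS::ElasticLoadBalancingV2::LoadBalancer defined elsewhere in the same template; the hosted zone and record name are placeholders.

```yaml
Resources:
  AppDnsRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.   # assumed hosted zone
      Name: app.example.com.         # assumed record name
      Type: A
      AliasTarget:
        # Both values come from attributes of the load balancer
        DNSName: !GetAtt MyALB.DNSName
        HostedZoneId: !GetAtt MyALB.CanonicalHostedZoneID
```

Because the record refers to MyALB through !GetAtt, CloudFormation also knows to create the load balancer first.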
!Sub
Substitutes variables into a string. The most readable way to build dynamic strings.
# Simple substitution with pseudo parameters
BucketName: !Sub "myapp-${AWS::AccountId}-${AWS::Region}"
# Substitution with a template parameter
Description: !Sub "This is the ${EnvironmentName} environment stack"
# Substitution with explicit variable map
Command: !Sub
  - "aws s3 cp s3://${BucketName}/config.json /etc/myapp/config.json"
  - BucketName: !Ref MyBucket
!Sub is cleaner than !Join for most string-building tasks. Use it by default and only reach for !Join when you are working with lists.
!Join
Joins a list of values with a delimiter.
# Join with no delimiter
PolicyArn: !Join ["", ["arn:aws:iam::", !Ref AWS::AccountId, ":root"]]
# Join with comma delimiter
AllowedOrigins: !Join [",", [!Ref Domain1, !Ref Domain2]]
!Select
Returns a single value from a list by index.
# Get the first availability zone in the region
AvailabilityZone: !Select [0, !GetAZs ""]
# Get the second
AvailabilityZone: !Select [1, !GetAZs ""]
!GetAZs returns all availability zones in a region. !Select picks one. This combination is used constantly when creating subnets across AZs.
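A minimal sketch of that subnet pattern, assuming a VPC named MyVPC is defined elsewhere in the template and the target region has at least two availability zones; the CIDR ranges are illustrative.

```yaml
Resources:
  SubnetA:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref MyVPC
      CidrBlock: 10.0.1.0/24
      # First AZ of whichever region the stack is deployed to
      AvailabilityZone: !Select [0, !GetAZs ""]
  SubnetB:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref MyVPC
      CidrBlock: 10.0.2.0/24
      # Second AZ of the same region
      AvailabilityZone: !Select [1, !GetAZs ""]
```

No zone name is hardcoded, so the same fragment stays portable across regions.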
!FindInMap
Looks up a value in a Mapping (covered in the next section).
ImageId: !FindInMap [RegionAMIMap, !Ref AWS::Region, AMI]
!If
Returns one of two values based on a condition.
InstanceType: !If [IsProduction, m5.xlarge, t3.micro]
!And, !Or, !Not, !Equals
Logical operators used when defining conditions.
Conditions:
  IsProductionAndEU: !And
    - !Condition IsProduction
    - !Equals [!Ref AWS::Region, "eu-west-1"]
!ImportValue
Imports a value exported by another stack. This is how Cross-Stack References work — covered in detail in section 14.
VpcId: !ImportValue SharedInfra-VpcId
How Functions Compose
The power comes from nesting. Functions can be composed:
SecurityGroupId: !Select
  - 0
  - !Split [",", !ImportValue SharedInfra-SecurityGroupIds]
Here: import a comma-separated string of security group IDs from another stack, split it into a list, then select the first one. Three functions, one clean result.
This is not just syntax. It is the way CloudFormation templates become self-describing infrastructure documents that adapt to context.
Mappings
What They Are
Mappings are lookup tables built into your template. They let you define a set of key-value pairs and then look up a value at deploy time based on known context, such as which region you are deploying to or which environment was selected.
Structure
Mappings:
  RegionAMIMap:
    us-east-1:
      AMI: ami-0c55b159cbfafe1f0
      BastionAMI: ami-0a887e401f7654935
    eu-west-1:
      AMI: ami-0d71ea30463e0ff49
      BastionAMI: ami-08d658f84a6d84a80
    ap-southeast-1:
      AMI: ami-01f7527546b557442
      BastionAMI: ami-0c5199d385b432989
  EnvironmentConfig:
    development:
      InstanceType: t3.micro
      MultiAZ: false
      DeletionProtection: false
    production:
      InstanceType: m5.large
      MultiAZ: true
      DeletionProtection: true
Usage
Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !FindInMap [RegionAMIMap, !Ref AWS::Region, AMI]
      InstanceType: !FindInMap [EnvironmentConfig, !Ref EnvironmentName, InstanceType]
!FindInMap takes three arguments: the map name, the top-level key, and the second-level key. It returns the value at that intersection.
Limitations
Mappings are static. They are baked into the template at write time. You cannot populate them dynamically at deploy time, and you cannot look up values in external systems like SSM Parameter Store.
This means:
- When an AMI is updated, you must update the mapping manually and redeploy.
- You cannot have a different mapping value per account without writing account IDs into the template.
For dynamic lookups, the solution is SSM Parameter Store with AWS::SSM::Parameter::Value<String> parameter types, or Custom Resources (covered in section 19). The AWS-managed AMI parameter store path (/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64) is the standard way to always get the latest Amazon Linux AMI without hardcoding AMI IDs at all.
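As an illustration of the SSM approach, the sketch below declares a parameter whose default is the AWS-managed path mentioned above. The parameter name LatestAmiId is an arbitrary choice for this example.

```yaml
Parameters:
  LatestAmiId:
    # CloudFormation reads this SSM parameter at deploy time and
    # passes its value (an AMI ID) into the template.
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64
Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t3.micro
      ImageId: !Ref LatestAmiId
```

Here !Ref returns the resolved AMI ID rather than the SSM path, so no region-to-AMI mapping needs to be maintained for this image.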
Outputs
What They Are
Outputs are values that CloudFormation makes available after a stack is created or updated. They serve two purposes:
- Visibility: Display useful information about what was created (endpoint URLs, resource IDs, ARNs).
- Cross-Stack References: Export a value so other stacks can import it.
Structure
Outputs:
  WebServerPublicIP:
    Description: Public IP address of the web server
    Value: !GetAtt MyWebServer.PublicIp
  LoadBalancerDNS:
    Description: DNS name for the application load balancer
    Value: !GetAtt MyALB.DNSName
  VpcId:
    Description: VPC ID for use by other stacks
    Value: !Ref MyVPC
    Export:
      Name: !Sub "${AWS::StackName}-VpcId"
When to Use Export
Only add Export when you intend for another stack to reference the value. Not every output needs to be exported. Exports create a dependency: you cannot delete the exporting stack while any other stack is consuming its exports.
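For context, the consuming side of an export might look like the sketch below. The NetworkStackName parameter and the security group are hypothetical; the export name follows the "stack name plus -VpcId" convention used in the Outputs example above.

```yaml
Parameters:
  NetworkStackName:
    Type: String
    Description: Name of the stack that exports the VPC ID
Resources:
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Application security group
      VpcId:
        # Long form of ImportValue so that !Sub can be nested inside it
        Fn::ImportValue: !Sub "${NetworkStackName}-VpcId"
```

Once this stack is deployed, the exporting stack can no longer be deleted, and the exported value can no longer be changed, until the import is removed.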
What Would Happen Without Outputs
Without outputs, you would need to go into the AWS Console or run CLI commands to find the DNS name of your load balancer, the ID of your VPC, or the ARN of your IAM role. Outputs surface these values automatically after deployment, making them available to operators, pipelines, and other stacks.
In a CI/CD pipeline:
# After CloudFormation deploy, grab the load balancer URL
ALB_URL=$(aws cloudformation describe-stacks \
  --stack-name my-app \
  --query "Stacks[0].Outputs[?OutputKey=='LoadBalancerDNS'].OutputValue" \
  --output text)
# Use it to run integration tests
curl -f "http://$ALB_URL/health"
Outputs make your stack queryable. This is essential for automated pipelines.
Conditions
The Problem
You want one template that can deploy to both development and production, but the production environment needs a Multi-AZ RDS instance and an additional NAT Gateway, while development needs neither. Without conditions, you need two templates. With conditions, you need one.
Structure
Conditions are defined in the Conditions section and referenced in resources.
Parameters:
  EnvironmentName:
    Type: String
    AllowedValues: [development, production]
Conditions:
  IsProduction: !Equals [!Ref EnvironmentName, production]
  IsNotProduction: !Not [!Condition IsProduction]
Resources:
  PrimaryDatabase:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceClass: !If [IsProduction, db.m5.large, db.t3.micro]
      MultiAZ: !If [IsProduction, true, false]
      DeletionProtection: !If [IsProduction, true, false]
  NATGateway:
    Type: AWS::EC2::NatGateway
    Condition: IsProduction
    Properties:
      SubnetId: !Ref PublicSubnet
      AllocationId: !GetAtt ElasticIP.AllocationId
The Condition: IsProduction on NATGateway means: only create this resource if IsProduction is true. The !If inside PrimaryDatabase means: use different property values depending on the condition.
Condition Operators
Conditions:
  IsProduction: !Equals [!Ref Env, production]
  IsUS: !Equals [!Ref AWS::Region, us-east-1]
  # Both must be true
  IsProductionUS: !And
    - !Condition IsProduction
    - !Condition IsUS
  # Either must be true
  IsProductionOrUS: !Or
    - !Condition IsProduction
    - !Condition IsUS
  # Invert
  IsNotProduction: !Not [!Condition IsProduction]
Limitation
Conditions are evaluated at deploy time. They cannot change during stack execution. You cannot create a condition that says "if resource X was successfully created, then create resource Y"; that is what DependsOn and WaitConditions handle.
DependsOn
The Problem
CloudFormation builds resources in parallel where possible. It determines the order automatically by following !Ref and !GetAtt references: if resource B references resource A, CloudFormation knows to create A first.
But sometimes resource B does not reference resource A in its properties, yet it still needs A to exist before it can be created or function correctly. CloudFormation does not know about this implicit dependency, so it might try to create both simultaneously, and B fails because A is not ready.
The Solution
DependsOn explicitly tells CloudFormation: do not start creating this resource until that resource is complete.
Resources:
  MyVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
  InternetGateway:
    Type: AWS::EC2::InternetGateway
  VPCGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref MyVPC
      InternetGatewayId: !Ref InternetGateway
  PublicSubnet:
    Type: AWS::EC2::Subnet
    DependsOn: VPCGatewayAttachment
    Properties:
      VpcId: !Ref MyVPC
      CidrBlock: 10.0.1.0/24
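Here, PublicSubnet references MyVPC via !Ref, so CloudFormation knows it must wait for the VPC. But the subnet does not reference VPCGatewayAttachment in its properties, so without DependsOn CloudFormation is free to create the subnet before the gateway is attached. A bare subnet tolerates that ordering; the resources that genuinely break are those that rely on the attachment, such as a default route through the Internet gateway, an Elastic IP, or a NAT gateway, and they need the same DependsOn.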
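One case where the ordering is required by the service itself is a default route through the Internet gateway. The following sketch reuses MyVPC, InternetGateway, and VPCGatewayAttachment from the example above; the route table and route names are illustrative.

```yaml
Resources:
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref MyVPC
  DefaultPublicRoute:
    Type: AWS::EC2::Route
    # The route references the gateway, not the attachment, so the
    # dependency on the attachment has to be stated explicitly.
    DependsOn: VPCGatewayAttachment
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway
```

Without the DependsOn, the route and the attachment may be created in parallel, and the route can fail because the gateway is not yet attached to the VPC.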
A More Common Use Case — RDS and EC2
Resources:
  Database:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceClass: db.t3.micro
      Engine: mysql
  AppServer:
    Type: AWS::EC2::Instance
    DependsOn: Database
    Properties:
      UserData:
        # Long form here: two short-form tags cannot be stacked on one value
        Fn::Base64: !Sub |
          #!/bin/bash
          echo "DB_HOST=${Database.Endpoint.Address}" >> /etc/app/config
The AppServer already references Database via !Sub, which creates an implicit dependency, so CloudFormation waits for the database either way. The explicit DependsOn is redundant here but harmless: it documents the intent that the database must be created before the application server boots and tries to connect.
What Happens Without It
Without DependsOn where it is needed, resources are created in the wrong order. The stack may still reach CREATE_COMPLETE but your application may fail at runtime because the dependency was not ready when the dependent resource was configured.
Limitation
DependsOn only tells CloudFormation to wait until the resource creation is complete. It does not tell CloudFormation to wait until the resource is ready to serve traffic or until a process inside the resource has finished running. For that, you need Wait Conditions and cfn-signal.
Wait Conditions and cfn-signal
The Problem That DependsOn Cannot Solve
CloudFormation considers an EC2 instance "created" the moment the API call to launch it succeeds. From CloudFormation's perspective, the resource is done. But the instance has not finished booting. The operating system is still starting. Your bootstrap script (the one that installs your application, configures the web server, and starts your process) is still running.
If a second resource depends on the application being ready, DependsOn is not enough. CloudFormation will proceed the moment the EC2 API reports success, not when your application is actually ready.
The Solution: cfn-signal and WaitConditions
cfn-signal is a script that runs inside your EC2 instance and sends a signal back to CloudFormation: "I am done and I succeeded" or "I am done and I failed."
A WaitCondition is a CloudFormation resource that pauses the stack and waits for a specific number of signals before proceeding.
Together, they let you tell CloudFormation: "Wait until the bootstrap process inside the instance has finished before you continue creating other resources."
How It Works, Step by Step
- You create an EC2 instance with a UserData script that runs your bootstrap logic.
- When the script finishes, it calls cfn-signal to send a success or failure signal.
- CloudFormation sees the signal and either continues or fails the stack.
The Code
Resources:
  WaitHandle:
    Type: AWS::CloudFormation::WaitConditionHandle
  WebServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !FindInMap [RegionAMIMap, !Ref AWS::Region, AMI]
      InstanceType: t3.micro
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash -xe
          # On exit, report the script's exit status to the wait handle URL.
          # The trap also fires when -e aborts the script on a failed command.
          trap '/opt/aws/bin/cfn-signal -e $? "${WaitHandle}"' EXIT
          # Install and configure application
          yum update -y
          yum install -y httpd
          systemctl start httpd
          systemctl enable httpd
          echo "<h1>Hello from ${AWS::StackName}</h1>" > /var/www/html/index.html
  WebServerWaitCondition:
    Type: AWS::CloudFormation::WaitCondition
    DependsOn: WebServer
    Properties:
      Handle: !Ref WaitHandle
      Timeout: 600
      Count: 1
-e $? passes the script's exit status to cfn-signal. The signal goes to the pre-signed URL that !Ref WaitHandle returns (substituted here as ${WaitHandle}); the --stack and --resource options are meant for resources that carry a CreationPolicy, not for a WaitCondition that uses a handle. Because the script runs with bash -e, a failing command aborts it immediately, so the signal is sent from an EXIT trap rather than from the last line. If the script ran successfully, $? is 0 and CloudFormation receives a success signal. If a command failed, $? is non-zero: CloudFormation receives a failure signal, fails the wait condition, and rolls back the stack.
What Happens Without cfn-signal
Without signalling, CloudFormation marks the EC2 instance as CREATE_COMPLETE the moment the API call succeeds, typically within 10-15 seconds. Any resource that depends on the instance being fully configured will attempt to use it before it is ready. Load balancer health checks will fail. Downstream resources will be misconfigured. Your application stack reaches CREATE_COMPLETE in a broken state.
This is the most common cause of "CloudFormation says it worked but the application is not working" problems.
Timeout
The Timeout is in seconds. 600 means CloudFormation will wait up to 10 minutes for the signal. If no signal arrives, the stack times out and rolls back. Size this based on how long your bootstrap realistically takes, with a comfortable buffer.
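For EC2 instances, CloudFormation also supports a CreationPolicy attribute that achieves the same pause without a separate WaitCondition and handle. The fragment below is a sketch under that assumption: the instance itself stays in CREATE_IN_PROGRESS until it receives one signal, and the timeout is written as an ISO 8601 duration (PT10M is ten minutes) rather than in seconds. The bootstrap commands are illustrative.

```yaml
Resources:
  WebServer:
    Type: AWS::EC2::Instance
    CreationPolicy:
      ResourceSignal:
        Count: 1
        Timeout: PT10M
    Properties:
      ImageId: !FindInMap [RegionAMIMap, !Ref AWS::Region, AMI]
      InstanceType: t3.micro
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash -x
          yum install -y httpd
          systemctl enable --now httpd
          # $? here is the exit status of the systemctl command above
          /opt/aws/bin/cfn-signal -e $? \
            --stack ${AWS::StackName} \
            --resource WebServer \
            --region ${AWS::Region}
```

In this variant the signal targets the instance's own logical ID, so anything that depends on WebServer waits for the bootstrap automatically.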
Limitation
cfn-signal handles the moment of creation well. But what about ongoing configuration? What if you need to update the instance configuration after the stack is deployed, or re-apply configuration if it drifts? For that, you need cfn-init.
CloudFormation Init (cfn-init)
The Problem with UserData
UserData is a blunt instrument. It is a script that runs once at instance launch, and that is it. If you update the CloudFormation template and the instance is not replaced, UserData does not re-run. If the configuration drifts (someone manually changes a file on the instance), there is no way for CloudFormation to detect or correct it.
Also, UserData is imperative: you write step-by-step instructions. The result depends entirely on the starting state of the machine. If any step fails partway through, you have a half-configured instance with no clean way to recover.
The Solution: cfn-init
cfn-init is a declarative configuration engine built into the CloudFormation helper tools. Instead of writing scripts that say "run these commands," you declare the desired state: "these packages should be installed, these files should exist with this content, these services should be running."
Configuration is written in the Metadata section of the resource using the AWS::CloudFormation::Init key.
Structure
Resources:
  WebServer:
    Type: AWS::EC2::Instance
    Metadata:
      AWS::CloudFormation::Init:
        config:
          packages:
            yum:
              httpd: []
              php: []
          files:
            /var/www/html/index.php:
              content: !Sub |
                <?php
                echo "<h1>Environment: ${EnvironmentName}</h1>";
                echo "<p>Stack: ${AWS::StackName}</p>";
                ?>
              mode: "000644"
              owner: apache
              group: apache
            /etc/httpd/conf.d/myapp.conf:
              content: |
                <VirtualHost *:80>
                  DocumentRoot /var/www/html
                  DirectoryIndex index.php
                </VirtualHost>
              mode: "000644"
              owner: root
              group: root
          services:
            sysvinit:
              httpd:
                enabled: true
                ensureRunning: true
                files:
                  - /etc/httpd/conf.d/myapp.conf
                packages:
                  yum:
                    - httpd
    Properties:
      ImageId: !FindInMap [RegionAMIMap, !Ref AWS::Region, AMI]
      InstanceType: t3.micro
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash -x
          # No -e here: the script must keep going after a cfn-init failure
          # so that the failure can be signalled.
          # Run cfn-init to apply the configuration
          /opt/aws/bin/cfn-init -v \
            --stack ${AWS::StackName} \
            --resource WebServer \
            --region ${AWS::Region}
          # Signal success or failure (cfn-init's exit status) to the wait handle URL
          /opt/aws/bin/cfn-signal -e $? "${WaitHandle}"
  WebServerWaitCondition:
    Type: AWS::CloudFormation::WaitCondition
    DependsOn: WebServer
    Properties:
      Handle: !Ref WaitHandle
      Timeout: 600
      Count: 1
  WaitHandle:
    Type: AWS::CloudFormation::WaitConditionHandle
The Four cfn-init Keys
| Key | What It Does |
|---|---|
| packages | Installs system packages via yum, apt, rpm, or other package managers |
| files | Creates files with specific content, permissions, and ownership |
| commands | Runs shell commands in a specific order with optional test conditions |
| services | Ensures services are started, enabled, and restarted when dependencies change |
configSets: Ordering Multiple Configurations
When you have complex configuration that needs to run in phases, use configSets:
Metadata:
  AWS::CloudFormation::Init:
    configSets:
      full_install:
        - install_cfn
        - install_base
        - install_app
        - configure_app
    install_cfn:
      files:
        /etc/cfn/cfn-hup.conf:
          content: !Sub |
            [main]
            stack=${AWS::StackId}
            region=${AWS::Region}
          mode: "000400"
          owner: root
          group: root
    install_base:
      packages:
        yum:
          httpd: []
          php: []
          php-mysqlnd: []
    install_app:
      files:
        /var/www/html/index.php:
          content: !Sub |
            <?php phpinfo(); ?>
          mode: "000644"
          owner: apache
          group: apache
    configure_app:
      services:
        sysvinit:
          httpd:
            enabled: true
            ensureRunning: true
Then in UserData, reference the configSet:
/opt/aws/bin/cfn-init -v \
  --stack ${AWS::StackName} \
  --resource WebServer \
  --configsets full_install \
  --region ${AWS::Region}
What Happens Without cfn-init
Without cfn-init, you rely entirely on UserData scripts. These are harder to maintain, harder to debug, and run only once at launch. Configuration drift is invisible, and updating configuration requires replacing the instance. cfn-init makes your instance configuration as declarative and auditable as your infrastructure definition.
Limitation
cfn-init applies configuration at launch. It does not continuously monitor or re-apply configuration when the template changes, unless the instance is replaced. For tracking template changes and re-applying configuration without replacing instances, you need cfn-hup.
cfn-hup
The Problem cfn-init Alone Cannot Solve
You update your CloudFormation template: specifically, you change the content of a configuration file managed by cfn-init. You run aws cloudformation update-stack. CloudFormation processes the update.
If the EC2 instance is not being replaced (because the change is only a metadata change, which does not require replacement), CloudFormation will not re-run cfn-init. The instance keeps running with the old configuration. The update completes, the stack reaches UPDATE_COMPLETE, and your configuration is silently out of date.
The Solution: cfn-hup
cfn-hup is a daemon that runs on the instance and polls the CloudFormation stack for changes to the resource's metadata. When it detects a change, it re-runs cfn-init to apply the updated configuration.
This gives you the ability to update instance configuration through CloudFormation without replacing the instance.
How to Set It Up
cfn-hup requires two configuration files on the instance, typically created via cfn-init itself:
Metadata:
  AWS::CloudFormation::Init:
    configSets:
      full_install:
        - install_cfn
        - install_app
    install_cfn:
      files:
        /etc/cfn/cfn-hup.conf:
          content: !Sub |
            [main]
            stack=${AWS::StackId}
            region=${AWS::Region}
            interval=5
          mode: "000400"
          owner: root
          group: root
        /etc/cfn/hooks.d/cfn-auto-reloader.conf:
          # The action is kept on a single line; the hook file is an
          # INI-style file, not a shell script.
          content: !Sub |
            [cfn-auto-reloader-hook]
            triggers=post.update
            path=Resources.WebServer.Metadata.AWS::CloudFormation::Init
            action=/opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource WebServer --configsets full_install --region ${AWS::Region}
            runas=root
          mode: "000400"
          owner: root
          group: root
      services:
        sysvinit:
          cfn-hup:
            enabled: true
            ensureRunning: true
            files:
              - /etc/cfn/cfn-hup.conf
              - /etc/cfn/hooks.d/cfn-auto-reloader.conf
    # install_app (the application configuration) is not shown in this fragment
How It Works
cfn-hup.conf tells cfn-hup which stack to watch and how often to poll (every 5 minutes in this example).
cfn-auto-reloader.conf tells cfn-hup what to do when it detects a change: watch the Metadata.AWS::CloudFormation::Init path of the WebServer resource, and when it changes, re-run cfn-init with the full configSet.
When you update the template — change a file, add a package, modify a service — cfn-hup detects the metadata change within the poll interval and re-applies the full configuration to the running instance.
The Complete Pattern
cfn-init, cfn-signal, and cfn-hup form a complete configuration management pattern:
- cfn-init applies configuration at launch
- cfn-signal tells CloudFormation that configuration is complete
- cfn-hup keeps configuration in sync with the template over time
Together, they turn your EC2 instances into configuration-as-code managed infrastructure that stays synchronized with your CloudFormation templates without requiring replacement.
Nested Stacks
The Problem with Single Large Stacks
A CloudFormation stack has a hard limit of 500 resources. But the real problem appears well before that limit.
A template with 80+ resources becomes difficult to read, difficult to test, and difficult to reason about. Changes to one part of the template require updating and deploying the entire thing. Teams working on different parts of the infrastructure step on each other. Re-using patterns across projects is impossible because everything is in one monolithic file.
This is the same problem that led software engineers to adopt functions, modules, and packages. The solution is the same: decomposition.
The Solution — Nested Stacks
A nested stack is a CloudFormation resource of type AWS::CloudFormation::Stack. It references another CloudFormation template stored in S3 and deploys it as a child stack. The parent stack manages the lifecycle of its child stacks.
Resources:
  NetworkStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/mybucket/templates/network.yaml
      Parameters:
        EnvironmentName: !Ref EnvironmentName
        VpcCidr: 10.0.0.0/16
      TimeoutInMinutes: 20
  AppStack:
    Type: AWS::CloudFormation::Stack
    DependsOn: NetworkStack
    Properties:
      TemplateURL: https://s3.amazonaws.com/mybucket/templates/app.yaml
      Parameters:
        EnvironmentName: !Ref EnvironmentName
        VpcId: !GetAtt NetworkStack.Outputs.VpcId
        SubnetIds: !GetAtt NetworkStack.Outputs.SubnetIds
      TimeoutInMinutes: 30
  DatabaseStack:
    Type: AWS::CloudFormation::Stack
    DependsOn: NetworkStack
    Properties:
      TemplateURL: https://s3.amazonaws.com/mybucket/templates/database.yaml
      Parameters:
        SubnetIds: !GetAtt NetworkStack.Outputs.SubnetIds
        DBPassword: !Ref DBPassword
The parent template is thin: it orchestrates the child stacks and passes data between them via outputs and parameters. Each child template is a focused, independently testable unit.
Passing Data Between Nested Stacks
Notice !GetAtt NetworkStack.Outputs.VpcId. This is how you pass data from one nested stack to another:
- The network.yaml template has an Outputs section that declares VpcId.
- The parent references it via !GetAtt NetworkStack.Outputs.VpcId.
- The parent passes it as a parameter to AppStack.
# network.yaml Outputs section
Outputs:
  VpcId:
    Value: !Ref MyVPC
  SubnetIds:
    Value: !Join [",", [!Ref SubnetA, !Ref SubnetB]]
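Stack outputs are plain strings, which is why the subnet list is joined with commas before it leaves network.yaml. The following Python sketch mimics that round trip. It assumes the receiving template declares SubnetIds as a comma-delimited list parameter; the subnet and VPC IDs are made up for illustration.

```python
def join_output(values):
    """Mimic !Join [",", [...]] in network.yaml: a list becomes one string."""
    return ",".join(values)


def split_parameter(value):
    """Mimic how a comma-delimited parameter is split back into a list."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Hypothetical physical IDs standing in for !Ref SubnetA and !Ref SubnetB.
network_outputs = {
    "VpcId": "vpc-0abc",
    "SubnetIds": join_output(["subnet-0aaa", "subnet-0bbb"]),
}

# The parent wires one child's outputs into another child's parameters,
# which is what !GetAtt NetworkStack.Outputs.<Name> expresses.
app_parameters = {
    "VpcId": network_outputs["VpcId"],
    "SubnetIds": network_outputs["SubnetIds"],
}

print(app_parameters["SubnetIds"])
print(split_parameter(app_parameters["SubnetIds"]))
```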
What Would Happen Without Nested Stacks
All resources live in one template. It grows. Team members modify the same file. Changes to the network layer require touching the application template. Testing one component requires deploying everything. The template becomes the infrastructure equivalent of a 5,000-line monolithic application: technically functional but practically unmaintainable.
Limitation
Nested stacks solve the organization and size problem within a single deployment. They do not solve the problem of sharing infrastructure across multiple independent teams or projects. If Team A creates a VPC that Team B also needs, nested stacks are not the right tool. Cross-Stack References are.
Cross-Stack References
The Problem Nested Stacks Do Not Solve
Nested stacks keep related resources together. But what about shared infrastructure that multiple independent stacks need to reference?
Your networking team creates a VPC stack: VPC, subnets, route tables, NAT gateways. This is the foundation. Three separate application teams each deploy their own stacks that need to run inside that VPC.
With nested stacks, the networking stack would need to be the parent of all three application stacks. That creates an artificial coupling: the networking team owns the deployment of all application stacks. This does not reflect how real teams work.
The right model is: the networking stack exists independently, exports its values, and each application stack imports what it needs.
The Solution: Exports and ImportValue
A stack can export named values. Any other stack in the same region and account can import those values.
Exporting stack (networking stack):
Outputs:
  VpcId:
    Description: VPC ID for use by application stacks
    Value: !Ref MyVPC
    Export:
      Name: SharedNetwork-VpcId
  PrivateSubnetIds:
    Description: Comma-separated private subnet IDs
    Value: !Join [",", [!Ref PrivateSubnetA, !Ref PrivateSubnetB]]
    Export:
      Name: SharedNetwork-PrivateSubnetIds
  AppSecurityGroup:
    Description: Security group for application instances
    Value: !Ref AppSG
    Export:
      Name: SharedNetwork-AppSecurityGroupId
Importing stack (application stack):
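Resources:
  AppServer:
    Type: AWS::EC2::Instance
    Properties:
      # ImageId, InstanceType, and other properties omitted for brevity
      SubnetId: !Select
        - 0
        - !Split [",", !ImportValue SharedNetwork-PrivateSubnetIds]
      SecurityGroupIds:
        - !ImportValue SharedNetwork-AppSecurityGroupId
  # AWS::EC2::Instance has no VpcId property; the VPC export is consumed by
  # resources that take one, such as this illustrative security group.
  AdminAccessSG:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Admin access for this application
      VpcId: !ImportValue SharedNetwork-VpcId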
Naming Convention
Export names must be unique within a region and account. The convention StackName-ResourceName is standard. You can also use !Sub with pseudo parameters for uniqueness:
Export:
  Name: !Sub "${AWS::StackName}-VpcId"
The Critical Constraint
You cannot delete an exporting stack while any importing stack exists, and you cannot change or remove an exported value while another stack imports it. CloudFormation prevents it. This is by design: it enforces that shared infrastructure cannot be removed while it is in use.
This means Cross-Stack References create a real dependency at the infrastructure level. Your networking team cannot tear down the VPC stack without first removing all application stacks that import from it.
Plan your exports carefully. Only export what genuinely needs to be shared. Do not export everything.
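The two rules above (unique export names, and no deleting an exporter while it has importers) can be modeled with a small in-memory registry. This is only a sketch of the behavior, not of how CloudFormation implements it; the `ExportRegistry` class and the stack names are hypothetical.

```python
class ExportRegistry:
    """Toy model of the per-account, per-region export namespace."""

    def __init__(self):
        self.exports = {}    # export name -> (exporting stack, value)
        self.importers = {}  # export name -> set of importing stacks

    def export(self, stack, name, value):
        owner = self.exports.get(name, (stack, None))[0]
        if owner != stack:
            raise ValueError(f"export {name!r} already exists in stack {owner!r}")
        self.exports[name] = (stack, value)

    def import_value(self, stack, name):
        if name not in self.exports:
            raise KeyError(f"no export named {name!r}")
        self.importers.setdefault(name, set()).add(stack)
        return self.exports[name][1]

    def delete_stack(self, stack):
        for name, (owner, _) in self.exports.items():
            users = self.importers.get(name, set())
            if owner == stack and users:
                raise RuntimeError(f"{name!r} is still imported by {sorted(users)}")
        # Drop the stack's own exports and its import registrations.
        self.exports = {n: e for n, e in self.exports.items() if e[0] != stack}
        for users in self.importers.values():
            users.discard(stack)


registry = ExportRegistry()
registry.export("network", "SharedNetwork-VpcId", "vpc-0abc")
registry.import_value("app-team-a", "SharedNetwork-VpcId")

try:
    registry.delete_stack("network")
except RuntimeError as err:
    print("blocked:", err)

registry.delete_stack("app-team-a")  # the importer goes first
registry.delete_stack("network")     # now the exporter can be removed
print(registry.exports)
```

The teardown order at the end is the practical consequence: importers first, exporter last.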
Cross-Stack vs Nested Stacks. When to Use Which
| Scenario | Use |
|---|---|
| Related components in one deployment, managed together | Nested Stacks |
| Shared infrastructure consumed by independent teams/stacks | Cross-Stack References |
| One team controls everything | Nested Stacks |
| Multiple teams share a foundation | Cross-Stack References |
StackSets
The Problem
You have a security baseline: CloudTrail enabled, AWS Config rules in place, specific IAM roles for your operations team, and a default VPC security configuration. You need all of this in every AWS account and every region in your organization.
You have 12 accounts and deploy to 4 regions. That is 48 stacks. You could deploy them one by one. Or you could use StackSets.
What StackSets Are
A StackSet lets you deploy a single CloudFormation template across multiple AWS accounts and multiple regions in a single operation. You define the template once, specify the target accounts and regions, and CloudFormation handles the deployment everywhere.
# CLI command to deploy a StackSet
aws cloudformation create-stack-set \
--stack-set-name SecurityBaseline \
--template-url https://s3.amazonaws.com/mybucket/security-baseline.yaml \
--permission-model SERVICE_MANAGED \
--auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false
Permission Models
SELF_MANAGED: You manually create IAM roles in each target account that trust the administrator account. Full control, more setup.
SERVICE_MANAGED: Uses AWS Organizations integration. CloudFormation assumes roles automatically. Supports auto-deployment: when a new account joins the organization, the StackSet deploys to it automatically. This is the recommended model for most organizations.
Deployment Options
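# Deploy to specific accounts and regions (SELF_MANAGED StackSets;
# with SERVICE_MANAGED, target OUs via --deployment-targets instead of --accounts)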
aws cloudformation create-stack-instances \
--stack-set-name SecurityBaseline \
--accounts 111111111111 222222222222 333333333333 \
--regions us-east-1 eu-west-1 ap-southeast-1 \
--operation-preferences MaxConcurrentPercentage=25,FailureTolerancePercentage=10
MaxConcurrentPercentage controls what percentage of the target accounts in each region are deployed to simultaneously. FailureTolerancePercentage controls what percentage of accounts in a region can fail before CloudFormation stops the operation in that region.
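To get a feel for what those percentages mean in practice, the sketch below converts them into account counts. It assumes percentages are rounded down to a whole number of accounts, with concurrency never dropping below one; treat that rounding rule as an assumption to verify against the StackSets documentation. The `plan_region` helper is hypothetical.

```python
def accounts_from_percentage(total_accounts: int, percentage: int, minimum: int = 0) -> int:
    """Convert a percentage of target accounts into a whole number, rounding down."""
    return max(minimum, (total_accounts * percentage) // 100)


def plan_region(total_accounts: int, max_concurrent_pct: int, failure_tolerance_pct: int) -> dict:
    """Summarize how one region's rollout would be paced under the stated assumptions."""
    return {
        # Assumed: concurrency never rounds down to zero, or nothing would deploy.
        "max_concurrent": accounts_from_percentage(total_accounts, max_concurrent_pct, minimum=1),
        "failures_tolerated": accounts_from_percentage(total_accounts, failure_tolerance_pct),
    }


print(plan_region(total_accounts=12, max_concurrent_pct=25, failure_tolerance_pct=10))
print(plan_region(total_accounts=3, max_concurrent_pct=25, failure_tolerance_pct=10))
```

Under these assumptions, 12 accounts deploy three at a time with one failure tolerated, while the three-account example above deploys one at a time and stops on the first failure. Small account counts make percentages coarse.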
Failure Handling
If a stack instance fails in one account/region, StackSets can continue deploying to others (depending on your failure tolerance settings) or stop entirely. Failed instances can be retried without redeploying to successful targets.
What Would Happen Without StackSets
You write automation scripts. You loop through accounts and regions. You track which deployments succeeded and which failed. You handle retries manually. You update 48 stacks individually when the template changes. StackSets replace all of that with a managed, auditable, retryable deployment system.
Limitation
StackSets deploy the same template to all targets. If you need different configurations per account or region, you can pass parameter overrides, but all targets share the same template structure. Highly variable per-account configurations are better handled at the application layer or with separate stacks.
Deletion Policy
The Problem
By default, when you delete a CloudFormation stack, CloudFormation deletes the resources in it. For stateful resources such as a database, an EBS volume, or a DynamoDB table, this can mean permanent data loss. This default makes sense for stateless resources and temporary environments. It is dangerous for production data.
The Solution
DeletionPolicy lets you control what happens to a resource when its stack is deleted.
Resources:
  ProductionDatabase:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot
    Properties:
      DBInstanceClass: db.m5.large
      Engine: mysql
      DBName: myapp
  LogBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    Properties:
      BucketName: myapp-logs
  TempQueue:
    Type: AWS::SQS::Queue
    DeletionPolicy: Delete
    Properties:
      QueueName: myapp-temp
The Three Options
Delete: The default for most resources. The resource is deleted when the stack is deleted. (RDS DB clusters, and RDS DB instances that do not belong to a cluster, default to Snapshot instead.)
Retain: The resource is not deleted. CloudFormation removes it from the stack but leaves the physical resource running in AWS. You become responsible for managing it manually.
Snapshot: Only available for resources that support snapshots, such as EBS volumes, RDS instances, RDS clusters, Redshift clusters, and ElastiCache clusters. A final snapshot is taken before deletion. The resource is then deleted. The snapshot persists and can be used to restore.
UpdateReplacePolicy
There is a related but distinct attribute: UpdateReplacePolicy. This controls what happens to the old resource when an update requires replacement: CloudFormation creates a new resource and must decide what to do with the old one.
Resources:
  Database:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot
    UpdateReplacePolicy: Snapshot
    Properties:
      DBInstanceClass: db.m5.large
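Set both DeletionPolicy and UpdateReplacePolicy to Snapshot for any snapshot-capable resource containing data that you cannot afford to lose. For data resources that do not support snapshots, such as S3 buckets, use Retain for both.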
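A rule like this is easy to forget on one resource, so it is a natural candidate for an automated check. The sketch below scans a template that has already been parsed into a Python dict and reports stateful resources missing either attribute. The two type sets are an assumption chosen for illustration, not a complete list, and `find_unprotected` is a hypothetical helper.

```python
# Resource types treated as stateful for this sketch; the sets are assumptions, extend as needed.
SNAPSHOT_CAPABLE = {"AWS::RDS::DBInstance", "AWS::RDS::DBCluster", "AWS::EC2::Volume"}
RETAIN_ONLY = {"AWS::S3::Bucket", "AWS::DynamoDB::Table"}


def find_unprotected(template: dict) -> list:
    """Return (logical_id, attribute, suggestion) for stateful resources missing a policy."""
    findings = []
    for logical_id, resource in template.get("Resources", {}).items():
        resource_type = resource.get("Type")
        if resource_type in SNAPSHOT_CAPABLE:
            suggestion = "Snapshot"
        elif resource_type in RETAIN_ONLY:
            suggestion = "Retain"
        else:
            continue  # stateless or unknown type: nothing to flag
        for attribute in ("DeletionPolicy", "UpdateReplacePolicy"):
            if attribute not in resource:
                findings.append((logical_id, attribute, suggestion))
    return findings


template = {
    "Resources": {
        "Database": {"Type": "AWS::RDS::DBInstance", "DeletionPolicy": "Snapshot"},
        "LogBucket": {"Type": "AWS::S3::Bucket", "DeletionPolicy": "Retain",
                      "UpdateReplacePolicy": "Retain"},
        "TempQueue": {"Type": "AWS::SQS::Queue"},
    }
}
for finding in find_unprotected(template):
    print(finding)
```

Here only the Database resource is flagged, because it sets DeletionPolicy but not UpdateReplacePolicy.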
Limitation
Retain does not mean the resource is protected from modifications made outside of CloudFormation. It only means CloudFormation will not delete it when the stack is deleted. Once retained, the resource is no longer managed by any stack; it is orphaned in your account.
Stack Roles
The Problem
When you deploy a CloudFormation stack without a stack role, CloudFormation uses your IAM identity's permissions to create resources. This means you need permissions to create EC2 instances, RDS databases, IAM roles, S3 buckets: everything in the template. In practice, this means your user or role needs very broad permissions.
This creates two problems:
- Least privilege violation. Your identity has permissions far beyond what it needs for day-to-day work, just because it needs to be able to deploy infrastructure.
- Privilege escalation risk. A CloudFormation template can create IAM roles. If you can deploy any template, you can create an IAM role with AdministratorAccess and assume it.
The Solution: Stack Roles
Instead of using your own identity, CloudFormation assumes a specific IAM role to perform all resource operations. You pass this role when creating or updating the stack.
Your identity only needs cloudformation:CreateStack, cloudformation:UpdateStack, and iam:PassRole (to pass the stack role to CloudFormation). CloudFormation then uses the stack role, not your identity, to create EC2 instances, RDS databases, IAM roles, and everything else.
aws cloudformation create-stack \
--stack-name my-app \
--template-url https://s3.amazonaws.com/mybucket/template.yaml \
--role-arn arn:aws:iam::123456789012:role/CloudFormationDeployRole
The Stack Role
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "cloudformation.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
The role's trust policy allows CloudFormation to assume it. The role's permission policies define what CloudFormation can create on your behalf.
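The other half of the arrangement is the deployer's own identity policy. A minimal sketch follows, generated in Python so the role ARN can be parameterized. The `deployer_policy` helper is hypothetical, `cloudformation:DescribeStacks` is added on the assumption that deployers need to watch stack status, and a real policy would likely scope the stack actions to specific stack ARNs.

```python
import json


def deployer_policy(stack_role_arn: str) -> dict:
    """Build an identity policy for a user who deploys only through the stack role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ManageStacks",
                "Effect": "Allow",
                "Action": [
                    "cloudformation:CreateStack",
                    "cloudformation:UpdateStack",
                    "cloudformation:DescribeStacks",
                ],
                "Resource": "*",
            },
            {
                "Sid": "PassOnlyTheStackRole",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": stack_role_arn,
                # Restrict the pass so the role can only be handed to CloudFormation.
                "Condition": {
                    "StringEquals": {"iam:PassedToService": "cloudformation.amazonaws.com"}
                },
            },
        ],
    }


policy = deployer_policy("arn:aws:iam::123456789012:role/CloudFormationDeployRole")
print(json.dumps(policy, indent=2))
```

Note that this identity has no EC2, RDS, or IAM write permissions of its own. Everything it can create is bounded by what the passed role allows.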
The Security Architecture
With stack roles, you can define exactly which resources your CloudFormation deployments can create. A developer can deploy CloudFormation stacks but cannot create arbitrary AWS resources directly. They can only do what the stack role permits and only through CloudFormation, where every change is tracked and auditable.
This is how mature organizations operate CloudFormation in production. Infrastructure changes go through CloudFormation. CloudFormation uses a controlled role. The role is the policy enforcement point.
What Would Happen Without Stack Roles
Every CloudFormation operator needs direct permissions to create every resource type in the templates they deploy. Permissions sprawl. Security boundaries erode. There is no clear separation between "can deploy through approved process" and "can create arbitrary resources."
ChangeSets
The Problem
You have a production stack. You want to update it. The update involves changing the instance type of an EC2 instance, updating an IAM policy, and modifying an S3 bucket configuration. What exactly will CloudFormation do? Will it modify the instance in place? Will it replace it? Will the replacement cause downtime? Will data be lost?
If you run the update directly, you find out in production.
The Solution: ChangeSets
A ChangeSet lets you preview exactly what CloudFormation will do before committing to it. You create a ChangeSet from your updated template, review the planned actions, and then decide whether to execute it.
# Create a ChangeSet
aws cloudformation create-change-set \
--stack-name my-production-app \
--change-set-name planned-update-2024-04 \
--template-url https://s3.amazonaws.com/mybucket/template-v2.yaml \
--parameters ParameterKey=InstanceType,ParameterValue=m5.large
# Review the ChangeSet
aws cloudformation describe-change-set \
--stack-name my-production-app \
--change-set-name planned-update-2024-04
# Execute if acceptable
aws cloudformation execute-change-set \
--stack-name my-production-app \
--change-set-name planned-update-2024-04
What the ChangeSet Shows
For each resource that will be affected, the ChangeSet shows:
- Action - Add, Modify, or Remove
- Replacement - True, False, or Conditional
- Scope - Which resource attributes are changing (Properties, Metadata, Tags, and so on)
- Details - The specific changes
Action: Modify
LogicalResourceId: WebServer
ResourceType: AWS::EC2::Instance
Replacement: True
Scope: [Properties]
Details:
  - Attribute: Properties
    Name: InstanceType
    RequiresRecreation: Always
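Replacement: True means CloudFormation will create a new instance and delete the existing one. In production, that means downtime unless you have architected for it. You now know this before executing the change. (The output above is simplified to show the format. For an EBS-backed instance, an InstanceType change is normally applied by stopping and restarting the instance rather than replacing it.)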
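Reading a long ChangeSet by eye is error-prone, so it can help to filter it for the entries that deserve attention. The sketch below works on the JSON returned by describe-change-set after it has been loaded into a dict. The `risky_changes` helper is hypothetical, and the sample data is hand-written in the shape of a real response rather than captured output.

```python
def risky_changes(change_set: dict) -> list:
    """List resources whose planned change removes or may replace them."""
    flagged = []
    for change in change_set.get("Changes", []):
        detail = change.get("ResourceChange", {})
        action = detail.get("Action")
        replacement = detail.get("Replacement", "n/a")
        if action == "Remove" or (action == "Modify" and replacement in ("True", "Conditional")):
            flagged.append((detail.get("LogicalResourceId"), action, replacement))
    return flagged


# Hand-written sample in the shape of a describe-change-set response (not real output).
sample = {
    "Changes": [
        {"Type": "Resource", "ResourceChange": {
            "Action": "Modify", "LogicalResourceId": "WebServer",
            "ResourceType": "AWS::EC2::Instance", "Replacement": "True"}},
        {"Type": "Resource", "ResourceChange": {
            "Action": "Modify", "LogicalResourceId": "AppPolicy",
            "ResourceType": "AWS::IAM::Policy", "Replacement": "False"}},
        {"Type": "Resource", "ResourceChange": {
            "Action": "Remove", "LogicalResourceId": "OldQueue",
            "ResourceType": "AWS::SQS::Queue"}},
    ]
}
for item in risky_changes(sample):
    print(item)
```

In a pipeline, you could feed this function the output of the describe-change-set command with JSON output and fail the build, or require a manual approval, whenever the list is not empty.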
What Would Happen Without ChangeSets
You deploy the update and discover the consequences in production. An unexpected instance replacement causes 90 seconds of downtime on a Sunday evening instead of during a planned maintenance window. A removed security group rule breaks a critical connection between services. ChangeSets move these discoveries from runtime to review time.
ChangeSets and Drift
ChangeSets do not show you what differs from the current running state; they show you what will change compared to the current stack template. If the actual infrastructure has drifted from the template (someone manually changed something), ChangeSets will not catch it. Stack drift detection is a separate feature.
Custom Resources
The Problem That Built-In Resources Cannot Solve
CloudFormation supports hundreds of AWS resource types. But it does not support everything.
You need to:
- Populate an S3 bucket with default content after it is created
- Register an AMI from a snapshot and get back the AMI ID to use in your template
- Look up a value from an external API during stack creation
- Perform a database migration as part of a stack update
- Create a resource in a third-party system (Datadog, PagerDuty, Cloudflare)
None of these are standard CloudFormation resource types. Without Custom Resources, you would do these steps manually before or after the stack, breaking the "everything in the template" principle.
The Solution: Custom Resources
A Custom Resource invokes a Lambda function (or publishes to an SNS topic) when the resource is created, updated, or deleted. Your Lambda function does whatever the built-in CloudFormation resource types cannot and returns a result that the template can use.
How It Works
- CloudFormation encounters the Custom Resource during stack operations.
- It invokes the Lambda function with an event containing the request type (Create, Update, or Delete), the resource properties, and a pre-signed S3 URL to send the response to.
- Your Lambda function performs the custom logic.
- The Lambda sends a JSON response to the pre-signed URL with an HTTPS PUT, indicating success or failure and optionally returning data attributes.
- CloudFormation reads the response and either continues or fails the stack.
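The response in the fourth step has a fixed shape, and getting it wrong is a common reason stacks hang. Here is a minimal sketch of building that response body in isolation, without any network call. The `build_response` helper, the fallback physical ID, and the event values are hypothetical.

```python
import json


def build_response(event: dict, status: str, data: dict = None, reason: str = "") -> str:
    """Serialize the reply CloudFormation expects at the event's ResponseURL."""
    if status not in ("SUCCESS", "FAILED"):
        raise ValueError("status must be SUCCESS or FAILED")
    if status == "FAILED" and not reason:
        reason = "See the function's CloudWatch logs"  # a FAILED reply needs a reason
    return json.dumps({
        "Status": status,
        "Reason": reason,
        # Reuse the ID CloudFormation already knows on Update/Delete; returning a
        # different ID on Update is treated as a replacement of the resource.
        "PhysicalResourceId": event.get("PhysicalResourceId", "example-lookup-id"),
        # These three fields are echoed back unchanged from the request.
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    })


# Hypothetical Create event, trimmed to the fields used here.
event = {
    "RequestType": "Create",
    "ResponseURL": "https://example.invalid/presigned",
    "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/demo/abc",
    "RequestId": "req-1",
    "LogicalResourceId": "CustomConfigLookup",
    "ResourceProperties": {"Environment": "prod", "ConfigKey": "database/endpoint"},
}
print(build_response(event, "SUCCESS", {"ConfigValue": "db.example.internal"}))
```

Keeping the body construction separate from the HTTP PUT makes it straightforward to unit test without a live stack.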
The Code
Template:
Resources:
  CustomConfigLookup:
    Type: AWS::CloudFormation::CustomResource
    Properties:
      ServiceToken: !GetAtt ConfigLookupFunction.Arn
      Environment: !Ref EnvironmentName
      ConfigKey: database/endpoint
  ConfigLookupFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: python3.12
      Handler: index.handler
      Role: !GetAtt LambdaRole.Arn
      Code:
        ZipFile: |
          import json
          import boto3
          import urllib3

          def handler(event, context):
              http = urllib3.PoolManager()
              try:
                  request_type = event['RequestType']
                  props = event['ResourceProperties']
                  if request_type in ['Create', 'Update']:
                      # Look up value from SSM or external API
                      ssm = boto3.client('ssm')
                      key = f"/{props['Environment']}/{props['ConfigKey']}"
                      value = ssm.get_parameter(Name=key)['Parameter']['Value']
                      send_response(http, event, 'SUCCESS', {
                          'ConfigValue': value
                      })
                  elif request_type == 'Delete':
                      # Nothing to clean up for a lookup
                      send_response(http, event, 'SUCCESS', {})
              except Exception as e:
                  # Always report failures so the stack does not wait for a timeout
                  send_response(http, event, 'FAILED', {}, str(e))

          def send_response(http, event, status, data, reason=""):
              body = json.dumps({
                  'Status': status,
                  'Reason': reason,
                  'PhysicalResourceId': event.get('PhysicalResourceId', 'custom-resource'),
                  'StackId': event['StackId'],
                  'RequestId': event['RequestId'],
                  'LogicalResourceId': event['LogicalResourceId'],
                  'Data': data
              })
              # Send an empty Content-Type, as AWS's cfnresponse helper does,
              # so the request matches the pre-signed URL's signature
              http.request('PUT', event['ResponseURL'],
                           body=body,
                           headers={'Content-Type': ''})
  # Reference the returned value in another resource
  Database:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceIdentifier: !GetAtt CustomConfigLookup.ConfigValue
The Three Events Your Lambda Must Handle
| Event | When | What Your Lambda Should Do |
|---|---|---|
| Create | Stack creation or new resource | Perform the action, return data |
| Update | Stack update with changed properties | Re-perform the action with new values, return updated data |
| Delete | Stack deletion or resource removal | Clean up anything you created during Create |
Failing to handle Delete properly is the most common Custom Resource bug. If you create an external resource during Create and do not clean it up during Delete, it persists after the stack is gone and becomes orphaned infrastructure.
What Would Happen Without Custom Resources
The "everything in the template" principle breaks. You have pre-deployment scripts, post-deployment scripts, manual steps in runbooks. The stack no longer fully represents the deployed system. Custom Resources close this gap.
The Critical Timeout Consideration
CloudFormation waits up to one hour for a Custom Resource response. If your Lambda times out without sending a response to the pre-signed S3 URL, the stack will wait for the full hour before timing out. Always wrap your Lambda in a try-except and always call the response URL even on failure before the Lambda exits.
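One way to make "always respond" hard to forget is to wrap the business logic once and reuse the wrapper. The sketch below shows the idea with a recorder in place of the real HTTPS PUT. The `always_respond` wrapper and both toy handlers are hypothetical.

```python
def always_respond(logic, send):
    """Wrap custom resource logic so exactly one response is sent, even on errors."""
    def handler(event, context):
        try:
            data = logic(event, context) or {}
        except Exception as exc:  # report any failure instead of leaving the stack waiting
            send(event, "FAILED", {}, str(exc))
            return
        send(event, "SUCCESS", data, "")
    return handler


# Stand-ins for illustration: a recorder instead of the HTTPS PUT, and two toy handlers.
sent = []
record = lambda event, status, data, reason: sent.append((status, data, reason))

ok_handler = always_respond(lambda e, c: {"ConfigValue": "42"}, record)
bad_handler = always_respond(lambda e, c: 1 / 0, record)

ok_handler({"RequestType": "Create"}, None)
bad_handler({"RequestType": "Create"}, None)
print(sent)
```

A try-except cannot catch the Lambda runtime killing the function at its configured timeout. For long-running logic, consider checking `context.get_remaining_time_in_millis()` and sending FAILED while there is still time to do so.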
Putting It All Together. The Complete Pattern
Every feature in CloudFormation exists because there was a real problem that could not be solved without it. Here is how they connect:
Template Parameters → Make templates reusable across environments
Pseudo Parameters → Make templates reusable across regions and accounts
Mappings → Resolve environment-specific values at deploy time
Conditions → Create or configure resources conditionally
Intrinsic Functions → Wire resources together and build dynamic values
Outputs → Surface useful values and enable cross-stack sharing
Cross-Stack References → Share foundation infrastructure across independent stacks
Nested Stacks → Decompose large templates into manageable units
StackSets → Deploy consistently across accounts and regions
DependsOn → Control creation order for implicit dependencies
Wait Conditions + cfn-signal → Pause until application bootstrap is complete
cfn-init → Declare instance configuration instead of scripting it
cfn-hup → Keep instance configuration in sync with template changes
Deletion Policy → Protect data on stack deletion or replacement
Stack Roles → Enforce least privilege for infrastructure deployments
ChangeSets → Preview changes before applying them in production
Custom Resources → Extend CloudFormation to anything Lambda can do
None of these features are optional in a serious production environment. Each one closes a gap that, without it, requires manual intervention, custom scripting, or accepted risk.
CloudFormation is not just syntax. It is a system for describing, deploying, and maintaining infrastructure as code, with every feature designed to solve a specific failure mode that organizations encountered in practice.
Written by Onyedikachi Obidiegwu | Cloud Security Engineer