DEV Community

Kachi

Infrastructure as Code with AWS CloudFormation: From Fundamentals to Production Patterns

By the end of this guide, you will not just know what CloudFormation does. You will understand why each feature exists, what problem it solves, what breaks without it, and how features chain together to solve real infrastructure problems. Every concept is introduced with a problem first, not a definition first.


Table of Contents

  1. What is CloudFormation and Why It Matters
  2. Logical and Physical Resources
  3. Templates: Portable vs Non-Portable
  4. Template Parameters and Pseudo Parameters
  5. Intrinsic Functions
  6. Mappings
  7. Outputs
  8. Conditions
  9. DependsOn
  10. Wait Conditions and cfn-signal
  11. cfn-init
  12. cfn-hup
  13. Nested Stacks
  14. Cross-Stack References
  15. StackSets
  16. Deletion Policy
  17. Stack Roles
  18. ChangeSets
  19. Custom Resources

What is CloudFormation and Why It Matters

The Problem It Solves

Imagine you are building a three-tier web application. You need a VPC, subnets, an EC2 instance, a security group, an RDS database, an S3 bucket, and a load balancer. You click through the AWS Console and get it working. Three weeks later, a colleague asks you to replicate the exact environment for staging. You click through everything again. Two hours later, something is different: you missed a security group rule, the subnet CIDR is wrong, and the RDS instance has a different parameter group.

This is the core problem. Manual infrastructure is not repeatable, not auditable, and not scalable.

CloudFormation is AWS's answer. You describe your infrastructure in a template (a YAML or JSON file), and CloudFormation takes that description and builds the real AWS resources. The template becomes the single source of truth.

What CloudFormation Actually Is

CloudFormation is a declarative infrastructure provisioning engine. You declare what you want, not how to create it. CloudFormation figures out the how, including the order in which resources must be created, the dependencies between them, and what to clean up if something fails.

Why It Matters Beyond Convenience

  • Repeatability: The same template deployed ten times produces ten identical environments.
  • Version control: Your infrastructure lives in Git. Every change is tracked. Every rollback is a git revert.
  • Accountability: Who changed the security group? Check the commit history.
  • Speed: A 40-resource stack that takes two hours to click through manually deploys in minutes.
  • Disaster recovery: When a region fails, you re-deploy the template in another region. Your infrastructure is code, not memory.
  • Cost: Stacks can be deleted entirely after use. Temporary environments cost nothing when they are gone.

CloudFormation is not just a tool. It is a practice: treating infrastructure with the same discipline as application code.


Logical and Physical Resources

The Concept

This is the foundation. If you misunderstand this, everything else will be confusing.

A logical resource is what you write in the template. It is a declaration: a name you give to a resource and a description of what you want.

A physical resource is what AWS actually creates when CloudFormation processes your template. It has a real ID: an instance ID, a bucket name, a security group ID.

The Relationship

When you write this:

Resources:
  MyWebServer:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t3.micro
      ImageId: ami-0c55b159cbfafe1f0

MyWebServer is the logical resource. The actual EC2 instance that gets created (say, i-0a1b2c3d4e5f6) is the physical resource.

The logical resource exists only in your template. The physical resource exists in AWS.

Why This Distinction Matters

CloudFormation tracks the mapping between logical and physical resources in something called the stack. This mapping is what enables CloudFormation to:

  • Update a resource when you change the template
  • Replace a resource when a change requires replacement
  • Delete all physical resources when you delete the stack

If you manually delete a physical resource that CloudFormation manages, the stack's record no longer matches reality. On the next update or delete, CloudFormation may fail or behave unpredictably because the physical resource it expects to find is gone.

Rule: Never manually modify a physical resource that is managed by a CloudFormation stack. Make the change in the template instead.

What Happens During Stack Creation

  1. CloudFormation reads your template.
  2. It builds a dependency graph of all logical resources.
  3. It creates physical resources in the correct order.
  4. It maps each logical resource to its new physical resource.
  5. The stack reaches CREATE_COMPLETE.

If any resource creation fails, CloudFormation rolls back by default: it deletes everything it already created, and the stack reaches ROLLBACK_COMPLETE. Nothing is left half-built.
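As a minimal sketch of how the dependency graph in step 2 arises, consider two resources where one references the other. The `WebSecurityGroup` resource and its ingress rule are illustrative assumptions, not part of the original example, and the fragment assumes the account has a default VPC.

```yaml
Resources:
  # Hypothetical security group, added only to illustrate a dependency.
  WebSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow inbound HTTP
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0

  MyWebServer:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t3.micro
      ImageId: ami-0c55b159cbfafe1f0
      SecurityGroupIds:
        # This reference is what tells CloudFormation to create
        # WebSecurityGroup before MyWebServer.
        - !GetAtt WebSecurityGroup.GroupId
```

Because `MyWebServer` refers to `WebSecurityGroup`, CloudFormation orders the two creations itself; nothing in the template states the order explicitly.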


Templates: Portable vs Non-Portable

The Problem with Non-Portable Templates

Here is a simple template:

AWSTemplateFormatVersion: "2010-09-09"
Description: Simple EC2 instance

Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t3.micro
      ImageId: ami-0c55b159cbfafe1f0
      SubnetId: subnet-0abc1234

This works. Once. In one region. For one person.

The ImageId is an AMI ID. AMI IDs are region-specific. ami-0c55b159cbfafe1f0 in us-east-1 is not the same image in eu-west-1; in fact, it probably does not exist there at all. The SubnetId is hardcoded to a specific subnet in a specific AWS account.

If you or anyone else tries to deploy this template in a different region or a different account, it will fail.

This is a non-portable template. It has hardcoded values that only work in one specific context.

What Would Happen If You Ran It

  • In your account, same region: Works.
  • In your account, different region: Fails. AMI ID does not exist.
  • In a colleague's account: Fails. Subnet ID does not exist.
  • In a CI/CD pipeline targeting staging: Fails. Both IDs are wrong.

The Solution: Portable Templates

A portable template contains no hardcoded environment-specific values. Instead, it accepts inputs, references dynamic values, and uses lookup mechanisms to resolve environment-specific details at deploy time.

The tools that enable portability are:

  • Parameters: Inputs you provide at deploy time
  • Pseudo Parameters: Values AWS provides automatically (account ID, region, etc.)
  • Intrinsic Functions: Functions that resolve values dynamically
  • Mappings: Lookup tables built into the template

The portable version of the above template:

AWSTemplateFormatVersion: "2010-09-09"
Description: Portable EC2 instance

Parameters:
  SubnetId:
    Type: AWS::EC2::Subnet::Id
    Description: The subnet to launch the instance into

Mappings:
  RegionAMIMap:
    us-east-1:
      AMI: ami-0c55b159cbfafe1f0
    eu-west-1:
      AMI: ami-0d71ea30463e0ff49

Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t3.micro
      ImageId: !FindInMap [RegionAMIMap, !Ref AWS::Region, AMI]
      SubnetId: !Ref SubnetId

Now this template works in any region that is in the map, and in any account. The subnet is provided at deploy time. The AMI is looked up based on which region you are deploying to.

This is the difference between a template you write once and a template you write once and use everywhere.


Template Parameters and Pseudo Parameters

Template Parameters

Parameters make your template an interface. Instead of hardcoding values, you expose inputs that the person deploying the template, or an automation system, provides at deploy time.

Parameters:
  EnvironmentName:
    Type: String
    Default: development
    AllowedValues:
      - development
      - staging
      - production
    Description: The environment this stack is being deployed to

  InstanceType:
    Type: String
    Default: t3.micro
    AllowedValues:
      - t3.micro
      - t3.small
      - t3.medium
    Description: EC2 instance type

  DBPassword:
    Type: String
    NoEcho: true
    MinLength: 8
    MaxLength: 32
    Description: Database password (not displayed in console or CLI output)

Key parameter types:

| Type | What It Validates |
| --- | --- |
| String | Any string |
| Number | Numeric value |
| AWS::EC2::Subnet::Id | Must be a valid subnet ID in your account |
| AWS::EC2::KeyPair::KeyName | Must be a valid key pair name |
| AWS::SSM::Parameter::Value<String> | Pulls value from Systems Manager Parameter Store |

The NoEcho: true property is important. It prevents the value from being displayed in the Console or CLI output. Use it for passwords and other values that should not be echoed. Note that it is not encryption: the value is still stored in the stack. Do not rely on it for genuinely sensitive production secrets; use Secrets Manager or an SSM SecureString parameter instead.
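One way to keep a secret out of the template and its parameters entirely is a Secrets Manager dynamic reference, which CloudFormation resolves at deploy time. The sketch below is illustrative: the secret name `myapp/db-credentials` and its JSON keys `username` and `password` are assumptions, and the secret must already exist before the stack is deployed.

```yaml
Resources:
  Database:
    Type: AWS::RDS::DBInstance
    Properties:
      Engine: mysql
      DBInstanceClass: db.t3.micro
      AllocatedStorage: "20"
      # "myapp/db-credentials" is a hypothetical secret whose value is a
      # JSON document with "username" and "password" keys.
      MasterUsername: "{{resolve:secretsmanager:myapp/db-credentials:SecretString:username}}"
      MasterUserPassword: "{{resolve:secretsmanager:myapp/db-credentials:SecretString:password}}"
```

With this approach no password parameter is needed at all, so there is nothing for NoEcho to hide.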

To reference a parameter inside the template:

Properties:
  InstanceType: !Ref InstanceType

!Ref on a parameter returns its value. That is it. Simple.

Pseudo Parameters

Pseudo parameters are values that AWS provides automatically. You do not declare them. They are always available.

| Pseudo Parameter | What It Returns |
| --- | --- |
| AWS::AccountId | Your 12-digit AWS account ID |
| AWS::Region | The region being deployed to, e.g. eu-west-1 |
| AWS::StackName | The name of the current stack |
| AWS::StackId | The full ARN of the stack |
| AWS::NoValue | Used to conditionally remove a property |
| AWS::Partition | aws, aws-cn, or aws-us-gov |
| AWS::URLSuffix | The domain suffix for the partition: usually amazonaws.com, or amazonaws.com.cn in the China regions |

Usage example:

Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub "myapp-${AWS::AccountId}-${AWS::Region}-logs"

This creates a bucket name like myapp-123456789012-eu-west-1-logs. Including the account ID and region makes the name distinct for every account and region you deploy to, so the same template can be deployed to multiple regions without colliding with itself. S3 bucket names share one global namespace, so a collision with someone else's bucket is unlikely but not impossible.

AWS::NoValue is used when you want to conditionally exclude a property:

Properties:
  DBSnapshotIdentifier: !If [IsProduction, !Ref SnapshotId, !Ref AWS::NoValue]

If the condition IsProduction is false, the property is completely removed from the resource definition. AWS sees it as if you never wrote it.
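For context, a fuller fragment might look like the following sketch. The parameter definitions and the `Database` resource name are assumptions added for illustration, and several required RDS properties are deliberately left out.

```yaml
Parameters:
  EnvironmentName:
    Type: String
    AllowedValues: [development, production]
  SnapshotId:
    Type: String
    Default: ""   # ignored outside production in this sketch

Conditions:
  IsProduction: !Equals [!Ref EnvironmentName, production]

Resources:
  Database:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceClass: db.t3.micro
      # Other required properties (engine, storage, credentials) are
      # omitted for brevity.
      # In development the whole DBSnapshotIdentifier key is dropped.
      DBSnapshotIdentifier: !If [IsProduction, !Ref SnapshotId, !Ref AWS::NoValue]
```

The alternative, passing an empty string, is not equivalent: an empty value is still a value, and many properties reject it.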


Intrinsic Functions

What They Are

Intrinsic functions are built-in functions that CloudFormation evaluates when it processes your template. They let you compute values, reference other resources, join strings, look up mappings, and make decisions, all at deploy time.

You cannot use them in every part of a template. They work in the Properties section of resources, in Outputs, and in Metadata, and the condition functions (!Equals, !And, !Or, !Not) are used in the Conditions section. They do not work in the Parameters section.

The Core Functions

!Ref

Returns the value of a parameter or the default identifier of a resource.

# On a parameter: returns the parameter value
InstanceType: !Ref InstanceTypeParam

# On a resource: returns the resource's primary identifier
SubnetId: !Ref MySubnet   # Returns the Subnet ID of MySubnet

What !Ref returns depends on the resource type. For an EC2 instance it returns the instance ID. For an S3 bucket it returns the bucket name. For a security group it returns the group ID. Always check the CloudFormation documentation for what !Ref returns for each resource type.

!GetAtt

Returns a specific attribute of a resource, not just the primary identifier.

# Get the ARN of a Lambda function
FunctionArn: !GetAtt MyLambdaFunction.Arn

# Get the DNS name of a load balancer
LoadBalancerDNS: !GetAtt MyALB.DNSName

# Get the ARN of an IAM role
RoleArn: !GetAtt MyRole.Arn

!GetAtt is how you wire resources together. You create a load balancer and then pass its DNS name to your Route 53 record. You create an IAM role and pass its ARN to a Lambda function.
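The load balancer case could be sketched as follows. This assumes `MyALB` is an `AWS::ElasticLoadBalancingV2::LoadBalancer` defined elsewhere in the template; the hosted zone `example.com.` and the record name are placeholders.

```yaml
Resources:
  AppDnsRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.   # placeholder hosted zone
      Name: app.example.com.         # placeholder record name
      Type: A
      AliasTarget:
        # Both values come from attributes of the load balancer resource.
        DNSName: !GetAtt MyALB.DNSName
        HostedZoneId: !GetAtt MyALB.CanonicalHostedZoneID
```

The two `!GetAtt` calls also make the record depend on the load balancer, so CloudFormation creates them in the right order.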

!Sub

Substitutes variables into a string. The most readable way to build dynamic strings.

# Simple substitution with pseudo parameters
BucketName: !Sub "myapp-${AWS::AccountId}-${AWS::Region}"

# Substitution with a parameter (logical resource names work the same way)
Description: !Sub "This is the ${EnvironmentName} environment stack"

# Substitution with explicit variable map
Command: !Sub
  - "aws s3 cp s3://${BucketName}/config.json /etc/myapp/config.json"
  - BucketName: !Ref MyBucket

!Sub is cleaner than !Join for most string-building tasks. Use it by default and only reach for !Join when you are working with lists.
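To make the comparison concrete, here is the same ARN built both ways. The output names are illustrative, and `MyBucket` is assumed to be an S3 bucket defined in the same template.

```yaml
Outputs:
  ObjectArnWithJoin:
    Value: !Join ["", ["arn:", !Ref AWS::Partition, ":s3:::", !Ref MyBucket, "/*"]]

  ObjectArnWithSub:
    # ${MyBucket} inside !Sub behaves like !Ref MyBucket.
    Value: !Sub "arn:${AWS::Partition}:s3:::${MyBucket}/*"
```

Both outputs resolve to the same string; the `!Sub` form reads like the final value, which is why it is usually easier to review.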

!Join

Joins a list of values with a delimiter.

# Join with no delimiter
PolicyArn: !Join ["", ["arn:aws:iam::", !Ref AWS::AccountId, ":root"]]

# Join with comma delimiter
AllowedOrigins: !Join [",", [!Ref Domain1, !Ref Domain2]]

!Select

Returns a single value from a list by index.

# Get the first availability zone in the region
AvailabilityZone: !Select [0, !GetAZs ""]

# Get the second
AvailabilityZone: !Select [1, !GetAZs ""]

!GetAZs returns all availability zones in a region. !Select picks one. This combination is used constantly when creating subnets across AZs.
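A typical use is spreading two subnets across the first two zones of whatever region the stack lands in. The sketch assumes a VPC named `MyVPC` with a 10.0.0.0/16 CIDR is defined elsewhere in the template; the subnet names and CIDR blocks are illustrative.

```yaml
Resources:
  SubnetA:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref MyVPC
      CidrBlock: 10.0.1.0/24
      # First availability zone of the current region
      AvailabilityZone: !Select [0, !GetAZs ""]

  SubnetB:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref MyVPC
      CidrBlock: 10.0.2.0/24
      # Second availability zone of the current region
      AvailabilityZone: !Select [1, !GetAZs ""]
```

No zone name is hardcoded, so the fragment stays portable across regions that have at least two zones.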

!FindInMap

Looks up a value in a Mapping (covered in the next section).

ImageId: !FindInMap [RegionAMIMap, !Ref AWS::Region, AMI]

!If

Returns one of two values based on a condition.

InstanceType: !If [IsProduction, m5.xlarge, t3.micro]

!And, !Or, !Not, !Equals

Logical operators used when defining conditions.

Conditions:
  IsProductionAndEU: !And
    - !Condition IsProduction
    - !Equals [!Ref AWS::Region, "eu-west-1"]

!ImportValue

Imports a value exported by another stack. This is how Cross-Stack References work, covered in detail in section 14.

VpcId: !ImportValue SharedInfra-VpcId

How Functions Compose

The power comes from nesting. Functions can be composed:

SecurityGroupId: !Select
  - 0
  - !Split [",", !ImportValue SharedInfra-SecurityGroupIds]

Here: import a comma-separated string of security group IDs from another stack, split it into a list, then select the first one. Three functions, one clean result.

This is not just syntax. It is the way CloudFormation templates become self-describing infrastructure documents that adapt to context.


Mappings

What They Are

Mappings are lookup tables built into your template. They let you define a set of key-value pairs and then look up a value at deploy time based on known context, such as which region you are deploying to or which environment was selected.

Structure

Mappings:
  RegionAMIMap:
    us-east-1:
      AMI: ami-0c55b159cbfafe1f0
      BastionAMI: ami-0a887e401f7654935
    eu-west-1:
      AMI: ami-0d71ea30463e0ff49
      BastionAMI: ami-08d658f84a6d84a80
    ap-southeast-1:
      AMI: ami-01f7527546b557442
      BastionAMI: ami-0c5199d385b432989

  EnvironmentConfig:
    development:
      InstanceType: t3.micro
      MultiAZ: false
      DeletionProtection: false
    production:
      InstanceType: m5.large
      MultiAZ: true
      DeletionProtection: true

Usage

Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !FindInMap [RegionAMIMap, !Ref AWS::Region, AMI]
      InstanceType: !FindInMap [EnvironmentConfig, !Ref EnvironmentName, InstanceType]

!FindInMap takes three arguments: the map name, the top-level key, and the second-level key. It returns the value at that intersection.

Limitations

Mappings are static. They are baked into the template at write time. You cannot populate them dynamically at deploy time, and you cannot look up values in external systems like SSM Parameter Store.

This means:

  • When an AMI is updated, you must update the mapping manually and redeploy.
  • You cannot have a different mapping value per account without writing account IDs into the template.

For dynamic lookups, the solution is SSM Parameter Store with AWS::SSM::Parameter::Value<String> parameter types, or Custom Resources (covered in section 19). The AWS-managed AMI parameter store path (/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64) is the standard way to always get the latest Amazon Linux 2023 AMI without hardcoding AMI IDs at all.
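A minimal sketch of the SSM approach, replacing the AMI mapping with a parameter whose default is the public path mentioned above. The parameter name `LatestAmiId` is an arbitrary choice for illustration.

```yaml
Parameters:
  LatestAmiId:
    # CloudFormation reads the AMI ID from Parameter Store at deploy time.
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64

Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: t3.micro
      # !Ref returns the resolved AMI ID, not the parameter path.
      ImageId: !Ref LatestAmiId
```

The trade-off is that the AMI can change between deployments, so a later stack update may replace the instance; pin the ID if you need reproducible builds.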


Outputs

What They Are

Outputs are values that CloudFormation makes available after a stack is created or updated. They serve two purposes:

  1. Visibility: Display useful information about what was created (endpoint URLs, resource IDs, ARNs).
  2. Cross-Stack References: Export a value so other stacks can import it.

Structure

Outputs:
  WebServerPublicIP:
    Description: Public IP address of the web server
    Value: !GetAtt MyWebServer.PublicIp

  LoadBalancerDNS:
    Description: DNS name for the application load balancer
    Value: !GetAtt MyALB.DNSName

  VpcId:
    Description: VPC ID for use by other stacks
    Value: !Ref MyVPC
    Export:
      Name: !Sub "${AWS::StackName}-VpcId"

When to Use Export

Only add Export when you intend for another stack to reference the value. Not every output needs to be exported. Exports create a dependency: you cannot delete the exporting stack while any other stack is consuming its exports.
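On the consuming side, a stack might import the exported VPC ID like this. The `NetworkStackName` parameter and the security group are hypothetical; the export name follows the `${AWS::StackName}-VpcId` pattern used above.

```yaml
Parameters:
  NetworkStackName:
    Type: String
    Description: Name of the stack that exports the VPC ID

Resources:
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Application tier
      VpcId:
        # The long form Fn::ImportValue is required when the export name
        # is itself built with !Sub.
        Fn::ImportValue: !Sub "${NetworkStackName}-VpcId"
```

Once this stack is deployed, the network stack cannot be deleted, and the exported value cannot be changed, until the import is removed.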

What Would Happen Without Outputs

Without outputs, you would need to go into the AWS Console or run CLI commands to find the DNS name of your load balancer, the ID of your VPC, or the ARN of your IAM role. Outputs surface these values automatically after deployment, making them available to operators, pipelines, and other stacks.

In a CI/CD pipeline:

# After CloudFormation deploy, grab the load balancer URL
ALB_URL=$(aws cloudformation describe-stacks \
  --stack-name my-app \
  --query "Stacks[0].Outputs[?OutputKey=='LoadBalancerDNS'].OutputValue" \
  --output text)

# Use it to run integration tests
curl -f "http://$ALB_URL/health"

Outputs make your stack queryable. This is essential for automated pipelines.


Conditions

The Problem

You want one template that can deploy to both development and production, but the production environment needs a Multi-AZ RDS instance and an additional NAT Gateway, while development needs neither. Without conditions, you need two templates. With conditions, you need one.

Structure

Conditions are defined in the Conditions section and referenced in resources.

Parameters:
  EnvironmentName:
    Type: String
    AllowedValues: [development, production]

Conditions:
  IsProduction: !Equals [!Ref EnvironmentName, production]
  IsNotProduction: !Not [!Condition IsProduction]

Resources:
  PrimaryDatabase:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceClass: !If [IsProduction, db.m5.large, db.t3.micro]
      MultiAZ: !If [IsProduction, true, false]
      DeletionProtection: !If [IsProduction, true, false]

  NATGateway:
    Type: AWS::EC2::NatGateway
    Condition: IsProduction
    Properties:
      SubnetId: !Ref PublicSubnet
      AllocationId: !GetAtt ElasticIP.AllocationId

The Condition: IsProduction on NATGateway means: only create this resource if IsProduction is true. The !If inside PrimaryDatabase means: use different property values depending on the condition.
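Anything that refers to a conditional resource needs the same guard. As a sketch, an output for the NAT Gateway would carry the condition too; the output name is illustrative.

```yaml
Outputs:
  NatGatewayId:
    # Without this condition, a development deployment would fail,
    # because NATGateway does not exist there.
    Condition: IsProduction
    Description: ID of the production NAT Gateway
    Value: !Ref NATGateway
```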

Condition Operators

Conditions:
  IsProduction: !Equals [!Ref Env, production]
  IsUS: !Equals [!Ref AWS::Region, us-east-1]

  # Both must be true
  IsProductionUS: !And
    - !Condition IsProduction
    - !Condition IsUS

  # Either must be true
  IsProductionOrUS: !Or
    - !Condition IsProduction
    - !Condition IsUS

  # Invert
  IsNotProduction: !Not [!Condition IsProduction]

Limitation

Conditions are evaluated at deploy time. They cannot change during stack execution. You cannot create a condition that says "if resource X was successfully created, then create resource Y"; ordering and readiness are what DependsOn and WaitConditions handle.


DependsOn

The Problem

CloudFormation builds resources in parallel where possible. It determines the order automatically by following !Ref and !GetAtt references: if resource B references resource A, CloudFormation knows to create A first.

But sometimes resource B does not reference resource A in its properties, yet it still needs A to exist before it can be created or function correctly. CloudFormation does not know about this implicit dependency, so it might try to create both simultaneously, and B fails because A is not ready.

The Solution

DependsOn explicitly tells CloudFormation: do not start creating this resource until that resource is complete.

Resources:
  MyVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16

  InternetGateway:
    Type: AWS::EC2::InternetGateway

  VPCGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref MyVPC
      InternetGatewayId: !Ref InternetGateway

  PublicSubnet:
    Type: AWS::EC2::Subnet
    DependsOn: VPCGatewayAttachment
    Properties:
      VpcId: !Ref MyVPC
      CidrBlock: 10.0.1.0/24

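Here, PublicSubnet references MyVPC via !Ref, so CloudFormation knows it must wait for the VPC. But the subnet does not reference VPCGatewayAttachment in its properties, so CloudFormation sees no relationship between them. DependsOn forces the subnet to wait until the gateway is attached. Strictly speaking, a subnet can be created before the attachment exists; the resources that genuinely need this ordering are the ones that use the gateway, such as a default route to it, an Elastic IP, or an instance with a public IP address. The mechanism is the same for all of them.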
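One case where the ordering matters in practice is a default route that targets the internet gateway. The sketch below assumes the MyVPC, InternetGateway, and VPCGatewayAttachment resources from the template above; the names `PublicRouteTable` and `DefaultPublicRoute` are illustrative.

```yaml
Resources:
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref MyVPC

  DefaultPublicRoute:
    Type: AWS::EC2::Route
    # The route references the gateway, not the attachment, so the
    # ordering against the attachment has to be stated explicitly.
    DependsOn: VPCGatewayAttachment
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway
```

Without the DependsOn, CloudFormation may attempt to create the route before the gateway is attached to the VPC, and the route creation can fail.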

A More Common Use Case — RDS and EC2

Resources:
  Database:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceClass: db.t3.micro
      Engine: mysql

  AppServer:
    Type: AWS::EC2::Instance
    DependsOn: Database
    Properties:
      UserData: !Base64
        !Sub |
          #!/bin/bash
          echo "DB_HOST=${Database.Endpoint.Address}" >> /etc/app/config

The AppServer references Database via !Sub, which already creates an implicit dependency: CloudFormation will not launch the instance until the database reaches CREATE_COMPLETE. The explicit DependsOn is therefore redundant here, though harmless; some teams keep it to make the ordering obvious to anyone reading the template.

What Happens Without It

Without DependsOn where it is needed, resources may be created in the wrong order. The stack may still reach CREATE_COMPLETE, but your application may fail at runtime because the dependency was not ready when the dependent resource was configured.

Limitation

DependsOn only tells CloudFormation to wait until the resource creation is complete. It does not tell CloudFormation to wait until the resource is ready to serve traffic or until a process inside the resource has finished running. For that, you need Wait Conditions and cfn-signal.


Wait Conditions and cfn-signal

The Problem That DependsOn Cannot Solve

CloudFormation considers an EC2 instance "created" the moment the API call to launch it succeeds. From CloudFormation's perspective, the resource is done. But the instance has not finished booting. The operating system is still starting. Your bootstrap script (the one that installs your application, configures the web server, and starts your process) is still running.

If a second resource depends on the application being ready, DependsOn is not enough. CloudFormation will proceed the moment the EC2 API reports success, not when your application is actually ready.

The Solution: cfn-signal and WaitConditions

cfn-signal is a script that runs inside your EC2 instance and sends a signal back to CloudFormation: "I am done and I succeeded" or "I am done and I failed."

A WaitCondition is a CloudFormation resource that pauses the stack and waits for a specific number of signals before proceeding.

Together, they let you tell CloudFormation: "Wait until the bootstrap process inside the instance has finished before you continue creating other resources."

How It Works, Step by Step

  1. You create an EC2 instance with a UserData script that runs your bootstrap logic.
  2. A WaitCondition in the same stack pauses resource creation until it receives the expected number of signals or its timeout expires.
  3. At the end of the script, you call cfn-signal to send a success or failure signal.
  4. CloudFormation sees the signal and either continues or fails the stack.

The Code

Resources:
  WaitHandle:
    Type: AWS::CloudFormation::WaitConditionHandle

  WebServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !FindInMap [RegionAMIMap, !Ref AWS::Region, AMI]
      InstanceType: t3.micro
      UserData:
        !Base64
          !Sub |
            #!/bin/bash -xe
            # Install and configure application
            yum update -y
            yum install -y httpd
            systemctl start httpd
            systemctl enable httpd
            echo "<h1>Hello from ${AWS::StackName}</h1>" > /var/www/html/index.html

            # Signal CloudFormation that bootstrap is complete.
            # A handle-based WaitCondition is signalled through the handle's
            # presigned URL, which is what the WaitHandle reference resolves to.
            /opt/aws/bin/cfn-signal -e $? "${WaitHandle}"

  WebServerWaitCondition:
    Type: AWS::CloudFormation::WaitCondition
    DependsOn: WebServer
    Properties:
      Handle: !Ref WaitHandle
      Timeout: 600
      Count: 1

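-e $? passes the exit code of the preceding command. If that command succeeded, $? is 0 and CloudFormation receives a success signal. If it failed, $? is non-zero: CloudFormation receives a failure signal, fails the wait condition, and rolls back the stack. Note that because the script runs with bash -e, a failure in any earlier command stops the script before cfn-signal runs; in that case no signal is sent, and the wait condition fails only when its timeout expires.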

What Happens Without cfn-signal

Without signalling, CloudFormation marks the EC2 instance as CREATE_COMPLETE as soon as EC2 reports it launched, typically long before the bootstrap script has finished. Any resource that depends on the instance being fully configured will attempt to use it before it is ready. Load balancer health checks will fail. Downstream resources will misconfigure. Your application stack reaches CREATE_COMPLETE in a broken state.

This is the most common cause of "CloudFormation says it worked but the application is not working" problems.

Timeout

The Timeout is in seconds. 600 means CloudFormation will wait up to 10 minutes for the signal. If no signal arrives, the stack times out and rolls back. Size this based on how long your bootstrap realistically takes, with a comfortable buffer.
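For EC2 instances, AWS's documentation recommends a CreationPolicy attribute over a separate WaitCondition. It is a different technique from the handle-based one shown above: the instance itself stays in CREATE_IN_PROGRESS until it is signalled. A minimal sketch, assuming the same RegionAMIMap mapping and an AMI with the CloudFormation helper scripts installed:

```yaml
Resources:
  WebServer:
    Type: AWS::EC2::Instance
    CreationPolicy:
      ResourceSignal:
        Count: 1
        Timeout: PT10M   # ISO 8601 duration: 10 minutes
    Properties:
      ImageId: !FindInMap [RegionAMIMap, !Ref AWS::Region, AMI]
      InstanceType: t3.micro
      UserData:
        !Base64
          !Sub |
            #!/bin/bash -x
            yum install -y httpd
            systemctl enable --now httpd
            # Signal the instance resource itself; no handle URL is needed.
            /opt/aws/bin/cfn-signal -e $? \
              --stack ${AWS::StackName} \
              --resource WebServer \
              --region ${AWS::Region}
```

Note the two differences: the timeout is an ISO 8601 duration rather than a number of seconds, and cfn-signal names the stack and resource instead of a presigned URL.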

Limitation

cfn-signal handles the moment of creation well. But what about ongoing configuration? What if you need to update the instance configuration after the stack is deployed, or re-apply configuration if it drifts? For that, you need cfn-init.


CloudFormation Init (cfn-init)

The Problem with UserData

UserData is a blunt instrument. It is a script that runs once at instance launch, and that is it. If you update the CloudFormation template and the instance is not replaced, UserData does not re-run. If the configuration drifts (someone manually changes a file on the instance), there is no way for CloudFormation to detect or correct it.

Also, UserData is imperative: you write step-by-step instructions. The result depends entirely on the starting state of the machine. If any step fails partway through, you have a half-configured instance with no clean way to recover.

The Solution: cfn-init

cfn-init is a declarative configuration engine built into the CloudFormation helper tools. Instead of writing scripts that say "run these commands," you declare the desired state: "these packages should be installed, these files should exist with this content, these services should be running."

Configuration is written in the Metadata section of the resource using the AWS::CloudFormation::Init key.

Structure

Resources:
  WebServer:
    Type: AWS::EC2::Instance
    Metadata:
      AWS::CloudFormation::Init:
        config:
          packages:
            yum:
              httpd: []
              php: []

          files:
            /var/www/html/index.php:
              content: !Sub |
                <?php
                echo "<h1>Environment: ${EnvironmentName}</h1>";
                echo "<p>Stack: ${AWS::StackName}</p>";
                ?>
              mode: "000644"
              owner: apache
              group: apache

            /etc/httpd/conf.d/myapp.conf:
              content: |
                <VirtualHost *:80>
                    DocumentRoot /var/www/html
                    DirectoryIndex index.php
                </VirtualHost>
              mode: "000644"
              owner: root
              group: root

          services:
            sysvinit:
              httpd:
                enabled: true
                ensureRunning: true
                files:
                  - /etc/httpd/conf.d/myapp.conf
                packages:
                  yum:
                    - httpd

    Properties:
      ImageId: !FindInMap [RegionAMIMap, !Ref AWS::Region, AMI]
      InstanceType: t3.micro
      UserData:
        !Base64
          !Sub |
            #!/bin/bash -xe
            # Run cfn-init to apply the configuration
            /opt/aws/bin/cfn-init -v \
              --stack ${AWS::StackName} \
              --resource WebServer \
              --region ${AWS::Region}

            # Signal success or failure through the wait handle's presigned URL
            /opt/aws/bin/cfn-signal -e $? "${WaitHandle}"

  WebServerWaitCondition:
    Type: AWS::CloudFormation::WaitCondition
    DependsOn: WebServer
    Properties:
      Handle: !Ref WaitHandle
      Timeout: 600
      Count: 1

  WaitHandle:
    Type: AWS::CloudFormation::WaitConditionHandle

Four Commonly Used cfn-init Keys

A config block supports several keys, including groups, users, and sources. These four cover most use cases:

| Key | What It Does |
| --- | --- |
| packages | Installs system packages via yum, apt, rpm, or other package managers |
| files | Creates files with specific content, permissions, and ownership |
| commands | Runs shell commands in a specific order with optional test conditions |
| services | Ensures services are started, enabled, and restarted when dependencies change |
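The commands key is the only one of the four not shown in the structure above. A small sketch follows; the command names and the /etc/myapp paths are illustrative. cfn-init runs commands in alphabetical order by name, which is why numeric prefixes are a common convention.

```yaml
Metadata:
  AWS::CloudFormation::Init:
    config:
      commands:
        01_create_config_dir:
          command: mkdir -p /etc/myapp
          # The command runs only if the test exits with status 0,
          # here only when the directory does not exist yet.
          test: test ! -d /etc/myapp
        02_write_marker:
          command: 'echo configured > .bootstrap'
          # Working directory for this command
          cwd: /etc/myapp
```

The test key is what keeps a command safe to re-run: a second pass over the same config skips work that is already done.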

configSets: Ordering Multiple Configurations

When you have complex configuration that needs to run in phases, use configSets:

Metadata:
  AWS::CloudFormation::Init:
    configSets:
      full_install:
        - install_cfn
        - install_base
        - install_app
        - configure_app

    install_cfn:
      files:
        /etc/cfn/cfn-hup.conf:
          content: !Sub |
            [main]
            stack=${AWS::StackId}
            region=${AWS::Region}
          mode: "000400"
          owner: root
          group: root

    install_base:
      packages:
        yum:
          httpd: []
          php: []
          php-mysqlnd: []

    install_app:
      files:
        /var/www/html/index.php:
          content: !Sub |
            <?php phpinfo(); ?>
          mode: "000644"
          owner: apache
          group: apache

    configure_app:
      services:
        sysvinit:
          httpd:
            enabled: true
            ensureRunning: true

Then in UserData, reference the configSet:

/opt/aws/bin/cfn-init -v \
  --stack ${AWS::StackName} \
  --resource WebServer \
  --configsets full_install \
  --region ${AWS::Region}

What Happens Without cfn-init

Without cfn-init, you rely entirely on UserData scripts. These are harder to maintain, harder to debug, and run only once at launch. Configuration drift is invisible, and updating configuration requires replacing the instance. cfn-init makes your instance configuration as declarative and auditable as your infrastructure definition.

Limitation

cfn-init applies configuration at launch. It does not continuously monitor or re-apply configuration when the template changes, unless the instance is replaced. For tracking template changes and re-applying configuration without replacing instances, you need cfn-hup.


cfn-hup

The Problem cfn-init Alone Cannot Solve

You update your CloudFormation template. Specifically, you change the content of a configuration file managed by cfn-init. You run aws cloudformation update-stack. CloudFormation processes the update.

If the EC2 instance is not being replaced (because the change is only a metadata change and does not require replacement), CloudFormation will not re-run cfn-init. The instance keeps running with the old configuration. The update completes, the stack reaches UPDATE_COMPLETE, and your configuration is silently out of date.

The Solution: cfn-hup

cfn-hup is a daemon that runs on the instance and polls the CloudFormation stack for changes to the resource's metadata. When it detects a change, it re-runs cfn-init to apply the updated configuration.

This gives you the ability to update instance configuration through CloudFormation without replacing the instance.

How to Set It Up

cfn-hup requires two configuration files on the instance, typically created via cfn-init itself:

Metadata:
  AWS::CloudFormation::Init:
    configSets:
      full_install:
        - install_cfn
        - install_app

    install_cfn:
      files:
        /etc/cfn/cfn-hup.conf:
          content: !Sub |
            [main]
            stack=${AWS::StackId}
            region=${AWS::Region}
            interval=5
          mode: "000400"
          owner: root
          group: root

        /etc/cfn/hooks.d/cfn-auto-reloader.conf:
          content: !Sub |
            [cfn-auto-reloader-hook]
            triggers=post.update
            path=Resources.WebServer.Metadata.AWS::CloudFormation::Init
            action=/opt/aws/bin/cfn-init -v \
              --stack ${AWS::StackName} \
              --resource WebServer \
              --configsets full_install \
              --region ${AWS::Region}
            runas=root
          mode: "000400"
          owner: root
          group: root

      services:
        sysvinit:
          cfn-hup:
            enabled: true
            ensureRunning: true
            files:
              - /etc/cfn/cfn-hup.conf
              - /etc/cfn/hooks.d/cfn-auto-reloader.conf

How It Works

cfn-hup.conf tells cfn-hup which stack to watch and how often to poll (every 5 minutes in this example).

cfn-auto-reloader.conf tells cfn-hup what to do when it detects a change: watch the Metadata.AWS::CloudFormation::Init path of the WebServer resource, and when it changes, re-run cfn-init with the full configSet.

When you update the template — change a file, add a package, modify a service — cfn-hup detects the metadata change within the poll interval and re-applies the full configuration to the running instance.
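
The detect-and-reapply loop described above can be sketched in a few lines of Python. This is a simplified model of the idea, not the real cfn-hup daemon: the names `metadata_fingerprint` and `poll_once`, and the use of a SHA-256 hash to detect changes, are assumptions for illustration.

```python
import hashlib
import json


def metadata_fingerprint(metadata):
    """Hash the metadata so two polls can be compared cheaply."""
    canonical = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def poll_once(fetch_metadata, last_fingerprint, on_change):
    """Run one poll cycle and return the fingerprint to remember.

    fetch_metadata: callable returning the resource metadata as a dict
                    (hypothetical stand-in for the call to CloudFormation).
    on_change:      callable run when the metadata differs from the last poll
                    (stand-in for the hook's cfn-init action).
    """
    current = metadata_fingerprint(fetch_metadata())
    if last_fingerprint is not None and current != last_fingerprint:
        on_change()
    return current
```

A daemon built on this sketch would call poll_once once per interval and keep the returned fingerprint between calls. The first poll only records a baseline, so nothing is re-applied until the metadata actually changes.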

The Complete Pattern

cfn-init, cfn-signal, and cfn-hup form a complete configuration management pattern:

  • cfn-init applies configuration at launch
  • cfn-signal tells CloudFormation that configuration is complete
  • cfn-hup keeps configuration in sync with the template over time

Together, they turn your EC2 instances into configuration-as-code managed infrastructure that stays synchronized with your CloudFormation templates without requiring replacement.


Nested Stacks

The Problem with Single Large Stacks

A CloudFormation stack has a hard limit of 500 resources. But the real problem appears well before that limit.

A template with 80+ resources becomes difficult to read, difficult to test, and difficult to reason about. Changes to one part of the template require updating and deploying the entire thing. Teams working on different parts of the infrastructure step on each other. Re-using patterns across projects is impossible because everything is in one monolithic file.

This is the same problem that led software engineers to adopt functions, modules, and packages. The solution is the same: decomposition.

The Solution — Nested Stacks

A nested stack is a CloudFormation resource of type AWS::CloudFormation::Stack. It references another CloudFormation template stored in S3 and deploys it as a child stack. The parent stack manages the lifecycle of its child stacks.

Resources:
  NetworkStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/mybucket/templates/network.yaml
      Parameters:
        EnvironmentName: !Ref EnvironmentName
        VpcCidr: 10.0.0.0/16
      TimeoutInMinutes: 20

  AppStack:
    Type: AWS::CloudFormation::Stack
    DependsOn: NetworkStack
    Properties:
      TemplateURL: https://s3.amazonaws.com/mybucket/templates/app.yaml
      Parameters:
        EnvironmentName: !Ref EnvironmentName
        VpcId: !GetAtt NetworkStack.Outputs.VpcId
        SubnetIds: !GetAtt NetworkStack.Outputs.SubnetIds
      TimeoutInMinutes: 30

  DatabaseStack:
    Type: AWS::CloudFormation::Stack
    DependsOn: NetworkStack
    Properties:
      TemplateURL: https://s3.amazonaws.com/mybucket/templates/database.yaml
      Parameters:
        SubnetIds: !GetAtt NetworkStack.Outputs.SubnetIds
        DBPassword: !Ref DBPassword

The parent template is thin: it orchestrates the child stacks and passes data between them via outputs and parameters. Each child template is a focused, independently testable unit.

Passing Data Between Nested Stacks

Notice !GetAtt NetworkStack.Outputs.VpcId. This is how you pass data from one nested stack to another:

  1. The network.yaml template has an Outputs section that declares VpcId (no Export is needed between a parent and its nested stacks).
  2. The parent references it via !GetAtt NetworkStack.Outputs.VpcId.
  3. The parent passes it as a parameter to AppStack.

# network.yaml Outputs section
Outputs:
  VpcId:
    Value: !Ref MyVPC
  SubnetIds:
    Value: !Join [",", [!Ref SubnetA, !Ref SubnetB]]

What Would Happen Without Nested Stacks

All resources live in one template. It grows. Team members modify the same file. Changes to the network layer require touching the application template. Testing one component requires deploying everything. The template becomes the infrastructure equivalent of a 5,000-line monolithic application: technically functional but practically unmaintainable.

Limitation

Nested stacks solve the organization and size problem within a single deployment. They do not solve the problem of sharing infrastructure across multiple independent teams or projects. If Team A creates a VPC that Team B also needs, nested stacks are not the right tool; Cross-Stack References are.


Cross-Stack References

The Problem Nested Stacks Do Not Solve

Nested stacks keep related resources together. But what about shared infrastructure that multiple independent stacks need to reference?

Your networking team creates a VPC stack: VPC, subnets, route tables, NAT gateways. This is the foundation. Three separate application teams each deploy their own stacks that need to run inside that VPC.

With nested stacks, the networking stack would need to be the parent of all three application stacks. That creates an artificial coupling: the networking team owns the deployment of all application stacks. This does not reflect how real teams work.

The right model is: the networking stack exists independently, exports its values, and each application stack imports what it needs.

The Solution: Exports and ImportValue

A stack can export named values. Any other stack in the same region and account can import those values.

Exporting stack (networking stack):

Outputs:
  VpcId:
    Description: VPC ID for use by application stacks
    Value: !Ref MyVPC
    Export:
      Name: SharedNetwork-VpcId

  PrivateSubnetIds:
    Description: Comma-separated private subnet IDs
    Value: !Join [",", [!Ref PrivateSubnetA, !Ref PrivateSubnetB]]
    Export:
      Name: SharedNetwork-PrivateSubnetIds

  AppSecurityGroup:
    Description: Security group for application instances
    Value: !Ref AppSG
    Export:
      Name: SharedNetwork-AppSecurityGroupId

Importing stack (application stack):

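Resources:
  AppServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref AmiId  # assumes an AmiId parameter is defined in this template
      SubnetId: !Select
        - 0
        - !Split [",", !ImportValue SharedNetwork-PrivateSubnetIds]
      SecurityGroupIds:
        - !ImportValue SharedNetwork-AppSecurityGroupId

  # VpcId is not an EC2 instance property, so the imported VPC ID is used
  # on a resource that accepts one (an illustrative target group here).
  AppTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      VpcId: !ImportValue SharedNetwork-VpcId
      Port: 80
      Protocol: HTTP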

Naming Convention

Export names must be unique within a region and account. The convention StackName-ResourceName is standard. You can also use !Sub with pseudo parameters for uniqueness:

Export:
  Name: !Sub "${AWS::StackName}-VpcId"

The Critical Constraint

You cannot delete an exporting stack while any importing stack exists, and you cannot modify or remove an exported output while another stack imports it. CloudFormation prevents it. This is by design: it enforces that shared infrastructure cannot be removed while it is in use.

This means Cross-Stack References create a real dependency at the infrastructure level. Your networking team cannot tear down the VPC stack without first removing all application stacks that import from it.

Plan your exports carefully. Only export what genuinely needs to be shared. Do not export everything.
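
Before changing or removing an export, it helps to know who depends on it. The sketch below pages through the CloudFormation ListImports API via a boto3-style client passed in by the caller. The helper name `stacks_importing` is an assumption for this example, and the code is a starting point rather than a hardened tool.

```python
def stacks_importing(cfn_client, export_name):
    """Return the names of stacks that import the given export.

    cfn_client is expected to behave like boto3.client("cloudformation").
    Note: the real API may report an error instead of an empty list when
    no stack imports the export, so production code should handle that too.
    """
    importing = []
    kwargs = {"ExportName": export_name}
    while True:
        page = cfn_client.list_imports(**kwargs)
        importing.extend(page.get("Imports", []))
        token = page.get("NextToken")
        if not token:
            return importing
        kwargs["NextToken"] = token


# Usage (requires AWS credentials, so it is left commented out):
# import boto3
# print(stacks_importing(boto3.client("cloudformation"), "SharedNetwork-VpcId"))
```

Passing the client in, instead of creating it inside the function, keeps the helper easy to exercise with a stub in tests.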

Cross-Stack vs Nested Stacks. When to Use Which

| Scenario | Use |
| --- | --- |
| Related components in one deployment, managed together | Nested Stacks |
| Shared infrastructure consumed by independent teams/stacks | Cross-Stack References |
| One team controls everything | Nested Stacks |
| Multiple teams share a foundation | Cross-Stack References |

StackSets

The Problem

You have a security baseline: CloudTrail enabled, AWS Config rules in place, specific IAM roles for your operations team, and a default VPC security configuration. You need all of this in every AWS account and every region in your organization.

You have 12 accounts and deploy to 4 regions. That is 48 stacks. You could deploy them one by one. Or you could use StackSets.

What StackSets Are

A StackSet lets you deploy a single CloudFormation template across multiple AWS accounts and multiple regions in a single operation. You define the template once, specify the target accounts and regions, and CloudFormation handles the deployment everywhere.

# CLI command to deploy a StackSet
aws cloudformation create-stack-set \
  --stack-set-name SecurityBaseline \
  --template-url https://s3.amazonaws.com/mybucket/security-baseline.yaml \
  --permission-model SERVICE_MANAGED \
  --auto-deployment Enabled=true,RetainStacksOnAccountRemoval=false

Permission Models

SELF_MANAGED: You manually create IAM roles in each target account that trust the administrator account. Full control, more setup.

SERVICE_MANAGED: Uses AWS Organizations integration. CloudFormation assumes roles automatically. Supports auto-deployment: when a new account joins the organization, the StackSet deploys to it automatically. This is the recommended model for most organizations.

Deployment Options

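# Deploy to specific accounts and regions. --accounts applies to SELF_MANAGED StackSets;
# SERVICE_MANAGED StackSets take --deployment-targets with organizational unit IDs instead.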
aws cloudformation create-stack-instances \
  --stack-set-name SecurityBaseline \
  --accounts 111111111111 222222222222 333333333333 \
  --regions us-east-1 eu-west-1 ap-southeast-1 \
  --operation-preferences MaxConcurrentPercentage=25,FailureTolerancePercentage=10

MaxConcurrentPercentage controls what percentage of the target accounts in each region are deployed to at the same time. FailureTolerancePercentage controls what percentage of accounts in each region can fail before CloudFormation stops the operation in that region.
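
To see what those percentages mean in account terms, here is a small arithmetic sketch using the 12-account example from earlier. It assumes the percentages are rounded down to whole accounts, with at least one account deployed at a time; treat that rounding rule and the function name `stackset_batch_limits` as assumptions of this example.

```python
def stackset_batch_limits(account_count, max_concurrent_pct, failure_tolerance_pct):
    """Translate StackSet percentages into whole-account numbers per region.

    Assumption: both values round down, and concurrency never drops below 1.
    """
    max_concurrent = max(1, account_count * max_concurrent_pct // 100)
    failure_tolerance = account_count * failure_tolerance_pct // 100
    return max_concurrent, failure_tolerance


# 12 accounts with MaxConcurrentPercentage=25 and FailureTolerancePercentage=10
print(stackset_batch_limits(12, 25, 10))  # → (3, 1)
```

Under these assumptions, each region is processed three accounts at a time, and a second failed account in a region stops the operation there.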

Failure Handling

If a stack instance fails in one account/region, StackSets can continue deploying to others (depending on your failure tolerance settings) or stop entirely. Failed instances can be retried without redeploying to successful targets.

What Would Happen Without StackSets

You write automation scripts. You loop through accounts and regions. You track which deployments succeeded and which failed. You handle retries manually. You update 48 stacks individually when the template changes. StackSets replace all of that with a managed, auditable, retryable deployment system.

Limitation

StackSets deploy the same template to all targets. If you need different configurations per account or region, you pass parameters but all targets share the same template structure. Highly variable per-account configurations are better handled at the application layer or with separate stacks.


Deletion Policy

The Problem

By default, when you delete a CloudFormation stack, every resource in it is deleted. For an RDS database or an S3 bucket, this means permanent data loss. This default makes sense for stateless resources and temporary environments. It is dangerous for production data.

The Solution

DeletionPolicy lets you control what happens to a resource when its stack is deleted.

Resources:
  ProductionDatabase:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot
    Properties:
      DBInstanceClass: db.m5.large
      Engine: mysql
      DBName: myapp

  LogBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    Properties:
      BucketName: myapp-logs

  TempQueue:
    Type: AWS::SQS::Queue
    DeletionPolicy: Delete
    Properties:
      QueueName: myapp-temp

The Three Options

Delete: The default for most resource types. The resource is deleted when the stack is deleted. (AWS::RDS::DBCluster, and AWS::RDS::DBInstance resources that do not set DBClusterIdentifier, default to Snapshot instead.)

Retain: The resource is not deleted. CloudFormation removes it from the stack but leaves the physical resource running in AWS. You become responsible for managing it manually.

Snapshot: Only available for resource types that support snapshots, including EBS volumes, RDS instances, RDS clusters, Redshift clusters, and ElastiCache clusters. A final snapshot is taken before deletion. The resource is then deleted. The snapshot persists and can be used to restore.

UpdateReplacePolicy

There is a related but distinct attribute: UpdateReplacePolicy. This controls what happens to the old resource when an update requires replacement: CloudFormation creates a new resource and must decide what to do with the old one.

Resources:
  Database:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot
    UpdateReplacePolicy: Snapshot
    Properties:
      DBInstanceClass: db.m5.large

Set both DeletionPolicy and UpdateReplacePolicy to Snapshot for any resource containing data that you cannot afford to lose. For resource types that do not support snapshots, such as S3 buckets, use Retain instead.
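
A rule like this is easy to forget, so it can be worth checking mechanically. The following sketch scans a template (already parsed into a Python dict) for data-bearing resources that lack an explicit protective policy. The `STATEFUL_TYPES` set and the function name `find_unprotected` are assumptions for illustration, and the list of types is deliberately incomplete.

```python
# Illustrative, incomplete set of resource types that usually hold data.
STATEFUL_TYPES = {
    "AWS::RDS::DBInstance",
    "AWS::RDS::DBCluster",
    "AWS::EC2::Volume",
    "AWS::S3::Bucket",
    "AWS::DynamoDB::Table",
}


def find_unprotected(template):
    """Return (logical_id, attribute) pairs that are missing or set to Delete."""
    findings = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") not in STATEFUL_TYPES:
            continue
        for attr in ("DeletionPolicy", "UpdateReplacePolicy"):
            # Require an explicit Retain or Snapshot instead of relying on defaults.
            if resource.get(attr) in (None, "Delete"):
                findings.append((logical_id, attr))
    return findings


template = {
    "Resources": {
        "ProductionDatabase": {"Type": "AWS::RDS::DBInstance", "DeletionPolicy": "Snapshot"},
        "LogBucket": {"Type": "AWS::S3::Bucket", "DeletionPolicy": "Retain"},
        "TempQueue": {"Type": "AWS::SQS::Queue", "DeletionPolicy": "Delete"},
    }
}
print(find_unprotected(template))
```

Run against the first example template in this section, the check flags the missing UpdateReplacePolicy on both ProductionDatabase and LogBucket, and ignores the queue.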

Limitation

Retain does not mean the resource is protected from modifications made outside of CloudFormation. It only means CloudFormation will not delete it when the stack is deleted. Once retained, the resource is no longer managed by any stack; it is orphaned in your account.


Stack Roles

The Problem

When you deploy a CloudFormation stack, CloudFormation uses your IAM identity to create resources. This means you need permissions to create EC2 instances, RDS databases, IAM roles, S3 buckets, and everything else in the template. In practice, this means your user or role needs very broad permissions.

This creates two problems:

  1. Least privilege violation. Your identity has permissions far beyond what it needs for day-to-day work, just because it needs to be able to deploy infrastructure.
  2. Privilege escalation risk. A CloudFormation template can create IAM roles. If you can deploy any template, you can create an IAM role with AdministratorAccess and assume it.

The Solution: Stack Roles

Instead of using your own identity, CloudFormation assumes a specific IAM role to perform all resource operations. You pass this role when creating or updating the stack.

Your identity only needs cloudformation:CreateStack, cloudformation:UpdateStack, and iam:PassRole (to pass the stack role to CloudFormation). CloudFormation then uses the stack role, not your identity, to create EC2 instances, RDS databases, IAM roles, and everything else.

aws cloudformation create-stack \
  --stack-name my-app \
  --template-url https://s3.amazonaws.com/mybucket/template.yaml \
  --role-arn arn:aws:iam::123456789012:role/CloudFormationDeployRole

The Stack Role

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "cloudformation.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

The role's trust policy allows CloudFormation to assume it. The role's permission policies define what CloudFormation can create on your behalf.
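
The other half of the arrangement is the narrow policy attached to the person or pipeline that deploys. As a rough sketch, the Python below builds such a policy document for the three permissions named earlier. The function name `deployer_policy` is an assumption, and a real deployer would typically also need read-only actions (for example describing stacks) that are omitted here.

```python
import json


def deployer_policy(stack_role_arn):
    """Build a minimal IAM policy for an identity that deploys via a stack role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["cloudformation:CreateStack", "cloudformation:UpdateStack"],
                "Resource": "*",
            },
            {
                # Only this one role may be passed, and only to CloudFormation.
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": stack_role_arn,
                "Condition": {
                    "StringEquals": {"iam:PassedToService": "cloudformation.amazonaws.com"}
                },
            },
        ],
    }


role_arn = "arn:aws:iam::123456789012:role/CloudFormationDeployRole"
print(json.dumps(deployer_policy(role_arn), indent=2))
```

Scoping iam:PassRole to a single role ARN is what stops a deployer from handing CloudFormation a more powerful role than intended.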

The Security Architecture

With stack roles, you can define exactly which resources your CloudFormation deployments can create. A developer can deploy CloudFormation stacks but cannot create arbitrary AWS resources directly. They can only do what the stack role permits and only through CloudFormation, where every change is tracked and auditable.

This is how mature organizations operate CloudFormation in production. Infrastructure changes go through CloudFormation. CloudFormation uses a controlled role. The role is the policy enforcement point.

What Would Happen Without Stack Roles

Every CloudFormation operator needs direct permissions to create every resource type in the templates they deploy. Permissions sprawl. Security boundaries erode. There is no clear separation between "can deploy through approved process" and "can create arbitrary resources."


ChangeSets

The Problem

You have a production stack. You want to update it. The update involves changing the instance type of an EC2 instance, updating an IAM policy, and modifying an S3 bucket configuration. What exactly will CloudFormation do? Will it modify the instance in place? Will it replace it? Will the replacement cause downtime? Will data be lost?

If you run the update directly, you find out in production.

The Solution: ChangeSets

A ChangeSet lets you preview exactly what CloudFormation will do before committing to it. You create a ChangeSet from your updated template, review the planned actions, and then decide whether to execute it.

# Create a ChangeSet
aws cloudformation create-change-set \
  --stack-name my-production-app \
  --change-set-name planned-update-2024-04 \
  --template-url https://s3.amazonaws.com/mybucket/template-v2.yaml \
  --parameters ParameterKey=InstanceType,ParameterValue=m5.large

# Review the ChangeSet
aws cloudformation describe-change-set \
  --stack-name my-production-app \
  --change-set-name planned-update-2024-04

# Execute if acceptable
aws cloudformation execute-change-set \
  --stack-name my-production-app \
  --change-set-name planned-update-2024-04

What the ChangeSet Shows

For each resource that will be affected, the ChangeSet shows:

  • Action - Add, Modify, or Remove
  • Replacement - True, False, or Conditional
  • Scope - Which properties are changing
  • Details - The specific changes

Action: Modify
LogicalResourceId: WebServer
ResourceType: AWS::EC2::Instance
Replacement: True
Scope: [Properties]
Details:
  - Attribute: Properties
    Name: ImageId
    RequiresRecreation: Always

Replacement: True means CloudFormation will create a new instance and delete the existing one. In production, that means downtime unless you have architected for it. You now know this before executing the change.
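
In a pipeline, this review step can be partly automated. The sketch below takes the dictionary returned by a describe-change-set call and lists the changes most likely to cause disruption: removals and replacements. The function name `risky_changes` is an assumption for this example, and what counts as risky is a policy choice, not something CloudFormation defines.

```python
def risky_changes(change_set):
    """Return (logical_id, action, replacement) for removals and replacements.

    change_set is the parsed response of describe-change-set.
    """
    risky = []
    for change in change_set.get("Changes", []):
        rc = change.get("ResourceChange", {})
        action = rc.get("Action")
        replacement = rc.get("Replacement", "N/A")
        if action == "Remove" or replacement in ("True", "Conditional"):
            risky.append((rc.get("LogicalResourceId"), action, replacement))
    return risky


# Usage (requires AWS credentials, so it is left commented out):
# import boto3
# response = boto3.client("cloudformation").describe_change_set(
#     StackName="my-production-app", ChangeSetName="planned-update-2024-04")
# for item in risky_changes(response):
#     print(item)
```

A pipeline could refuse to execute the ChangeSet automatically whenever this list is non-empty and ask for a human approval instead.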

What Would Happen Without ChangeSets

You deploy the update and discover the consequences in production. An unexpected instance replacement causes 90 seconds of downtime on a Sunday evening instead of during a planned maintenance window. A removed security group rule breaks a critical connection between services. ChangeSets move these discoveries from runtime to review time.

ChangeSets and Drift

ChangeSets do not show you what differs from the current running state; they show you what will change compared to the current stack template. If the actual infrastructure has drifted from the template (someone manually changed something), ChangeSets will not catch it. Stack drift detection is a separate feature.


Custom Resources

The Problem That Built-In Resources Cannot Solve

CloudFormation supports hundreds of AWS resource types. But it does not support everything.

You need to:

  • Populate an S3 bucket with default content after it is created
  • Register an AMI from a snapshot and get back the AMI ID to use in your template
  • Look up a value from an external API during stack creation
  • Perform a database migration as part of a stack update
  • Create a resource in a third-party system (Datadog, PagerDuty, Cloudflare)

None of these are standard CloudFormation resource types. Without Custom Resources, you would do these steps manually before or after the stack, breaking the "everything in the template" principle.

The Solution: Custom Resources

A Custom Resource invokes a Lambda function (or publishes to an SNS topic) when the resource is created, updated, or deleted. Your Lambda function does whatever the built-in CloudFormation resource types cannot and returns a result that the template can use.

How It Works

  1. CloudFormation encounters the Custom Resource during stack operations.
  2. It invokes the Lambda function with an event containing the request type (Create, Update, or Delete), the resource properties, and a pre-signed S3 URL to send the response to.
  3. Your Lambda function performs the custom logic.
  4. The Lambda sends a JSON response to the S3 URL indicating success or failure, and optionally returning data attributes.
  5. CloudFormation reads the response and either continues or fails the stack.

The Code

Template:

Resources:
  CustomConfigLookup:
    Type: AWS::CloudFormation::CustomResource
    Properties:
      ServiceToken: !GetAtt ConfigLookupFunction.Arn
      Environment: !Ref EnvironmentName
      ConfigKey: database/endpoint

  ConfigLookupFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: python3.12
      Handler: index.handler
      Role: !GetAtt LambdaRole.Arn
      Code:
        ZipFile: |
          import json
          import boto3
          import urllib3

          def handler(event, context):
              http = urllib3.PoolManager()

              try:
                  request_type = event['RequestType']
                  props = event['ResourceProperties']

                  if request_type in ['Create', 'Update']:
                      # Look up value from SSM or external API
                      ssm = boto3.client('ssm')
                      key = f"/{props['Environment']}/{props['ConfigKey']}"
                      value = ssm.get_parameter(Name=key)['Parameter']['Value']

                      send_response(http, event, 'SUCCESS', {
                          'ConfigValue': value
                      })
                  elif request_type == 'Delete':
                      # Nothing to clean up for a lookup
                      send_response(http, event, 'SUCCESS', {})

              except Exception as e:
                  send_response(http, event, 'FAILED', {}, str(e))

          def send_response(http, event, status, data, reason=""):
              body = json.dumps({
                  'Status': status,
                  'Reason': reason,
                  'PhysicalResourceId': event.get('PhysicalResourceId', 'custom-resource'),
                  'StackId': event['StackId'],
                  'RequestId': event['RequestId'],
                  'LogicalResourceId': event['LogicalResourceId'],
                  'Data': data
              })
              http.request('PUT', event['ResponseURL'],
                          body=body,
                          headers={'Content-Type': 'application/json'})

# Reference the returned value in another resource
  Database:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceIdentifier: !GetAtt CustomConfigLookup.ConfigValue

The Three Events Your Lambda Must Handle

| Event | When | What Your Lambda Should Do |
| --- | --- | --- |
| Create | Stack creation or new resource | Perform the action, return data |
| Update | Stack update with changed properties | Re-perform the action with new values, return updated data |
| Delete | Stack deletion or resource removal | Clean up anything you created during Create |

Failing to handle Delete properly is the most common Custom Resource bug. If you create an external resource during Create and do not clean it up during Delete, it persists after the stack is gone and becomes orphaned infrastructure.
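
One way to make the Delete case hard to forget is to route every event through a single dispatcher that requires a handler for each request type. The sketch below shows that shape. The function name `dispatch` and the three callback parameters are assumptions for illustration, and sending the result to the response URL is left out.

```python
def dispatch(event, on_create, on_update, on_delete):
    """Route a Custom Resource event and build the success response body."""
    request_type = event["RequestType"]
    props = event.get("ResourceProperties", {})

    if request_type == "Create":
        # on_create returns the ID of whatever it created, plus output data.
        physical_id, data = on_create(props)
    elif request_type == "Update":
        physical_id = event["PhysicalResourceId"]
        data = on_update(physical_id, props, event.get("OldResourceProperties", {}))
    elif request_type == "Delete":
        physical_id = event["PhysicalResourceId"]
        on_delete(physical_id)  # clean up what Create made
        data = {}
    else:
        raise ValueError(f"unexpected RequestType {request_type!r}")

    return {
        "Status": "SUCCESS",
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data,
    }
```

Because Create returns the physical ID and Delete receives it back, the cleanup code always knows exactly which external resource it is responsible for.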

What Would Happen Without Custom Resources

The "everything in the template" principle breaks. You have pre-deployment scripts, post-deployment scripts, manual steps in runbooks. The stack no longer fully represents the deployed system. Custom Resources close this gap.

The Critical Timeout Consideration

CloudFormation waits up to one hour by default for a Custom Resource response. If your Lambda times out without sending a response to the pre-signed S3 URL, the stack waits for the full hour before failing. Always wrap your Lambda logic in a try-except, and always send a response to that URL, even on failure, before the Lambda exits.
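
A try-except cannot catch the Lambda itself running out of time. One possible safeguard, sketched below, is a timer that fires shortly before the Lambda deadline and sends a FAILED response. The helper name `arm_timeout_guard` and the 500 ms margin are assumptions for this example; `get_remaining_time_in_millis` is the method the Lambda context object provides.

```python
import threading


def arm_timeout_guard(context, send_failed, margin_ms=500):
    """Schedule send_failed() to run shortly before the Lambda deadline.

    context:     the Lambda context object passed to the handler.
    send_failed: zero-argument callable that reports FAILED to CloudFormation.
    Returns the timer so the handler can cancel it once it has responded.
    """
    remaining_ms = context.get_remaining_time_in_millis()
    delay_seconds = max(remaining_ms - margin_ms, 0) / 1000.0
    timer = threading.Timer(delay_seconds, send_failed)
    timer.daemon = True
    timer.start()
    return timer
```

In a handler, you would arm the guard first, run the normal logic, and call cancel() on the returned timer in a finally block after the normal response has been sent. This is a sketch only: the timer and the main path could both try to respond in a narrow window, so a production version would want a flag to ensure a single response.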


Putting It All Together. The Complete Pattern

Every feature in CloudFormation exists because there was a real problem that could not be solved without it. Here is how they connect:

Template Parameters        → Make templates reusable across environments
Pseudo Parameters          → Make templates reusable across regions and accounts
Mappings                   → Resolve environment-specific values at deploy time
Conditions                 → Create or configure resources conditionally
Intrinsic Functions        → Wire resources together and build dynamic values
Outputs                    → Surface useful values and enable cross-stack sharing
Cross-Stack References     → Share foundation infrastructure across independent stacks
Nested Stacks              → Decompose large templates into manageable units
StackSets                  → Deploy consistently across accounts and regions
DependsOn                  → Control creation order for implicit dependencies
Wait Conditions + cfn-signal → Pause until application bootstrap is complete
cfn-init                   → Declare instance configuration instead of scripting it
cfn-hup                    → Keep instance configuration in sync with template changes
Deletion Policy            → Protect data on stack deletion or replacement
Stack Roles                → Enforce least privilege for infrastructure deployments
ChangeSets                 → Preview changes before applying them in production
Custom Resources           → Extend CloudFormation to anything Lambda can do

None of these features are optional in a serious production environment. Each one closes a gap that, without it, requires manual intervention, custom scripting, or accepted risk.

CloudFormation is not just syntax. It is a system for describing, deploying, and maintaining infrastructure as code, with every feature designed to solve a specific failure mode that organizations encountered in practice.


Written by Onyedikachi Obidiegwu | Cloud Security Engineer
