
ほうき星 for AWS Community Builders

Originally published at qiita.com

Automatically Healing CloudFormation Drift with Durable Functions

This article is a machine translation of the following article, which I originally wrote in Japanese:

Durable Functions を用いて CloudFormation のドリフトを自動修復する #AWS - Qiita


Introduction

Hello, I’m @H0ukiStar.

Did you know that in November 2025, AWS introduced a new CloudFormation feature called drift-aware change sets?

https://aws.amazon.com/about-aws/whats-new/2025/11/configuration-drift-enhanced-cloudformation-sets/

Before this feature was introduced, repairing drift typically required making a temporary dummy change to the stack, updating the stack, and then reverting the change afterward.

Now, by specifying --deployment-mode REVERT_DRIFT when creating a change set, CloudFormation can recognize differences between the IaC definition and the actual infrastructure, and automatically generate a change set specifically for drift remediation.
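For readers who call the API from Python rather than the CLI, the request might look like the following minimal sketch. It assumes that boto3 exposes the CLI flag as a `DeploymentMode` parameter on `create_change_set` and that the stack's current template is reused. The function names and the change set name are hypothetical and are not taken from the repository.

```python
from typing import Any, Dict


def build_revert_drift_request(stack_name: str, change_set_name: str) -> Dict[str, Any]:
    """Build keyword arguments for a drift-aware change set request."""
    return {
        "StackName": stack_name,
        "ChangeSetName": change_set_name,
        "ChangeSetType": "UPDATE",
        # Reuse the template already attached to the stack.
        "UsePreviousTemplate": True,
        # Assumption: boto3 spelling of the CLI flag --deployment-mode REVERT_DRIFT.
        "DeploymentMode": "REVERT_DRIFT",
    }


def create_revert_drift_change_set(cfn_client: Any, stack_name: str, change_set_name: str) -> str:
    """Create the change set and return its ID.

    cfn_client is expected to behave like boto3.client("cloudformation").
    """
    request = build_revert_drift_request(stack_name, change_set_name)
    response = cfn_client.create_change_set(**request)
    return response["Id"]
```

Keeping the request builder separate from the API call makes the parameters easy to inspect before anything is sent to CloudFormation.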

In this article, I’ll show how I used AWS Lambda Durable Functions to implement a Configuration Healing mechanism that:

  • Detects CloudFormation stack drift
  • Automatically creates a drift-aware change set
  • Repairs the drift automatically

Trying Drift-Aware Change Sets

Deploying a Sample Stack

First, let’s see how drift-aware change sets work.

I deployed the following CloudFormation template with the stack name test.

AWSTemplateFormatVersion: 2010-09-09
Description: Stack for testing drift-aware change sets

Resources:
  TestParameter:
    Type: AWS::SSM::Parameter
    Properties:
      Name: /drift-test/sample
      Type: String
      Value: initial-value
      Description: Parameter for drift testing

Immediately after deployment, the stack is not drifted.

Drift status immediately after stack deployment

[!WARNING]
The stack was deployed using a dedicated CloudFormation execution role.

The IAM role template is omitted here for brevity.
Please refer to the following GitHub repository for the complete template definition.

sample-aws-cfn-configuration-healing

Introducing Drift

Next, I intentionally introduced drift by modifying the SSM parameter value with the following command.

aws ssm put-parameter \
  --name /drift-test/sample \
  --value updated-value \
  --overwrite

You can confirm from the CloudFormation drift detection screen that the stack is now drifted.

Drift status after modifying the SSM parameter
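The same information can be read through the API. The sketch below is not part of the original implementation: it assumes a client that behaves like `boto3.client("cloudformation")`, that drift detection has already been run on the stack, and that a single page of results is enough. The function name `summarize_drift` is hypothetical.

```python
from typing import Any, List


def summarize_drift(cfn_client: Any, stack_name: str) -> List[str]:
    """Return one human-readable line per drifted property or resource."""
    response = cfn_client.describe_stack_resource_drifts(
        StackName=stack_name,
        StackResourceDriftStatusFilters=["MODIFIED", "DELETED"],
    )
    lines: List[str] = []
    for drift in response["StackResourceDrifts"]:
        logical_id = drift["LogicalResourceId"]
        differences = drift.get("PropertyDifferences", [])
        if not differences:
            # For example, a deleted resource has no property-level differences.
            lines.append(f"{logical_id}: {drift['StackResourceDriftStatus']}")
        for diff in differences:
            lines.append(
                f"{logical_id} {diff['PropertyPath']}: "
                f"{diff['ExpectedValue']!r} -> {diff['ActualValue']!r}"
            )
    return lines
```

For the sample stack, each returned line would pair the value in the template with the value currently set on the parameter.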

Repairing Drift with a Drift-Aware Change Set

The drift screen in the CloudFormation console displays guidance indicating that the drift can be repaired using a drift-aware change set.

Select Create change set.

Creating a change set

Make sure the change set type is set to Drift-aware change set, then follow the instructions to create it.

Creating a drift-aware change set

If you review the generated change set, you can see that CloudFormation correctly detects the drifted resource and prepares the remediation changes automatically.

Reviewing the generated change set

After executing the change set, the drift is resolved.

Drift status after applying the change set

Configuration Healing with Durable Functions

CloudFormation drift detection and change set creation are asynchronous operations, so the caller must poll each one until it completes.

Implementing this behavior using only standard Lambda functions can quickly become complicated due to retry handling, wait control, and state management.

For this implementation, I used Durable Functions on AWS Lambda to simplify the waits and state transitions.

Durable Functions are designed for long-running workflows, making them a great fit for asynchronous APIs like CloudFormation.

They are especially helpful for workflows that repeat the following pattern:

  • Start operation
  • Wait for completion
  • Check status
  • Proceed to the next step

For this kind of workflow, Durable Functions make the orchestration logic much easier to implement and maintain.
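To make the start, wait, check, proceed pattern concrete, here is a minimal sketch of the drift detection step in plain Python. It deliberately does not use the Durable Execution SDK, whose API is not shown in this article. The wait is an injectable `sleep` callable, which is where a durable wait would be substituted in the real function. The function name, polling interval, and attempt limit are assumptions for illustration.

```python
import time
from typing import Any, Callable


def detect_drift(
    cfn_client: Any,
    stack_name: str,
    sleep: Callable[[float], None] = time.sleep,
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
) -> str:
    """Start drift detection, poll until it finishes, and return the stack drift status."""
    # Start operation
    detection_id = cfn_client.detect_stack_drift(StackName=stack_name)["StackDriftDetectionId"]

    for _ in range(max_attempts):
        # Check status
        status = cfn_client.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id
        )
        if status["DetectionStatus"] == "DETECTION_COMPLETE":
            # Proceed to the next step with e.g. "DRIFTED" or "IN_SYNC"
            return status["StackDriftStatus"]
        if status["DetectionStatus"] == "DETECTION_FAILED":
            raise RuntimeError(status.get("DetectionStatusReason", "drift detection failed"))
        # Wait for completion
        sleep(interval_seconds)

    raise TimeoutError(f"drift detection for {stack_name} did not finish in time")
```

With a standard Lambda function, the time spent in `sleep` counts against the function's timeout and billed duration, which is one reason a durable wait is attractive here.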

Workflow

The implementation follows the workflow below to detect and repair CloudFormation stack drift.

Durable Functions make it straightforward to implement waiting logic for both drift detection and change set creation.

In addition, by passing CreateChangeSetOnly: true in the Lambda event payload, the workflow can stop after creating the change set without executing it automatically.

Workflow figure
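The final part of the workflow, including the `CreateChangeSetOnly` option, can be sketched as follows. This is an illustration under assumptions rather than the repository's code: it assumes the change set has already been requested, uses plain polling with an injectable `sleep` instead of the Durable Execution SDK, and returns hypothetical result strings.

```python
import time
from typing import Any, Callable, Dict


def wait_for_change_set(
    cfn_client: Any,
    stack_name: str,
    change_set_name: str,
    sleep: Callable[[float], None] = time.sleep,
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
) -> Dict[str, Any]:
    """Poll describe_change_set until creation completes or fails."""
    for _ in range(max_attempts):
        description = cfn_client.describe_change_set(
            StackName=stack_name, ChangeSetName=change_set_name
        )
        if description["Status"] == "CREATE_COMPLETE":
            return description
        if description["Status"] == "FAILED":
            raise RuntimeError(description.get("StatusReason", "change set creation failed"))
        sleep(interval_seconds)
    raise TimeoutError(f"change set {change_set_name} was not created in time")


def finish_healing(
    cfn_client: Any,
    event: Dict[str, Any],
    change_set_name: str,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Wait for the change set, then execute it unless the event asks for review only."""
    stack_name = event["StackName"]
    create_only = bool(event.get("CreateChangeSetOnly", False))

    wait_for_change_set(cfn_client, stack_name, change_set_name, sleep=sleep)
    if create_only:
        # Leave the change set in place for a human operator to review.
        return "CHANGE_SET_CREATED"

    cfn_client.execute_change_set(StackName=stack_name, ChangeSetName=change_set_name)
    return "CHANGE_SET_EXECUTED"
```

Defaulting `CreateChangeSetOnly` to false matches the behavior described above, where the change set is executed automatically unless the caller opts out.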

Durable Functions Implementation Example

The Durable Functions implementation follows the workflow shown above.

The full source code, including the SAM template, is available in the following repository.

H0ukiStar / sample-aws-cfn-configuration-healing

Sample implementation of an AWS Lambda function to detect and automatically heal CloudFormation stack configuration drift.

CloudFormation Configuration Healing with Durable Functions

Automatic detection and healing of CloudFormation stack drift using AWS Lambda with Durable Functions.

This sample demonstrates how to automatically detect configuration drift in CloudFormation stacks and heal them by creating and executing change sets using the AWS Durable Execution SDK for Python.

Features

  • Automatic Drift Detection: Detects configuration drift in CloudFormation stacks
  • Automatic Healing: Creates and executes change sets to restore the stack to its desired state
  • Durable Functions: Uses AWS Lambda Durable Execution SDK to handle long-running operations reliably
  • SNS Notifications: Sends notifications about drift detection and healing operations
  • Error Handling: Comprehensive error handling for drift detection and change set operations

Installation

Deploy using the AWS SAM CLI with the following commands:

cd configuration-healing
sam build
sam deploy --guided

During the guided deployment, you will be prompted to provide:

  • SNS Topic ARN: The ARN of an…
# The complete implementation is available in the GitHub repository above.
# The code is omitted here for brevity.

Verification

To verify the function, I again modified the value of the SSM parameter in the test stack to introduce drift, then invoked the deployed Lambda function.

aws lambda invoke \
  --function-name arn:aws:lambda:<region>:<account-id>:function:cfn-drift-healing:Alias \
  --invocation-type Event \
  --cli-binary-format raw-in-base64-out \
  --payload '{"StackName": "test"}' \
  response.json

From the logs, you can confirm that the automatic remediation proceeds as follows:

  1. Start drift detection
  2. Detect stack drift
  3. Create a drift-aware change set
  4. Execute the change set
  5. Resolve the drift

Lambda function execution logs

Durable execution result

You can also confirm the remediation completion through SNS notifications.

SNS notification after successful remediation

To stop the workflow once the change set has been created, pass CreateChangeSetOnly: true in the event payload.

This allows a human operator to review and execute the change set manually.

aws lambda invoke \
  --function-name arn:aws:lambda:<region>:<account-id>:function:cfn-drift-healing:Alias \
  --invocation-type Event \
  --cli-binary-format raw-in-base64-out \
  --payload '{"StackName": "test", "CreateChangeSetOnly": true}' \
  response.json

Lambda function logs when stopping after change set creation

SNS notification when stopping after change set creation

Conclusion

CloudFormation drift-aware change sets make it much safer and easier to repair configuration drift than before.

In this article, I implemented a Configuration Healing mechanism using Durable Functions that automatically detects and repairs CloudFormation drift.

Even when using IaC, long-running environments inevitably experience unintended configuration changes over time.

By combining this workflow with services like EventBridge Scheduler for periodic execution, you can continuously and automatically remediate drift caused by:

  • Unintended manual changes
  • Temporary fixes that were never reverted
  • Configuration updates forgotten over time

This helps maintain infrastructure consistency and improve long-term IaC governance.
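As a rough sketch of the periodic execution mentioned above, the following builds a request for the EventBridge Scheduler `create_schedule` API that invokes the healing function with the same payload used in the verification step. The schedule name, rate expression, and role ARN are hypothetical placeholders, and the role is assumed to allow EventBridge Scheduler to invoke the function.

```python
import json
from typing import Any, Dict


def build_schedule_request(
    schedule_name: str,
    function_arn: str,
    role_arn: str,
    stack_name: str,
    create_change_set_only: bool = False,
    expression: str = "rate(1 day)",
) -> Dict[str, Any]:
    """Build keyword arguments for boto3.client("scheduler").create_schedule."""
    payload = {"StackName": stack_name, "CreateChangeSetOnly": create_change_set_only}
    return {
        "Name": schedule_name,
        "ScheduleExpression": expression,
        # Run at the scheduled time without a flexible window.
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": function_arn,
            "RoleArn": role_arn,
            # The JSON string below becomes the Lambda event payload.
            "Input": json.dumps(payload),
        },
    }
```

Setting `create_change_set_only=True` in the schedule gives a daily review queue of change sets instead of unattended remediation.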

If fully automated remediation feels too risky for production environments, you can instead stop after creating the change set and require human review before execution, as demonstrated earlier in this article.

I hope this article serves as a useful example of implementing Configuration Healing with CloudFormation.
