Kenta Goto for AWS Heroes

Posted on Feb 26

Why is cdk.out (Cloud Assembly) Necessary in AWS CDK?

#aws #awscdk #cdk

What is cdk.out

cdk.out is a directory that stores a collection of files called the cloud assembly.

When you run the cdk synth command or the cdk deploy command, the cloud assembly is generated from your CDK code and stored in the cdk.out directory.

What is the Cloud Assembly

The cloud assembly is a collection of files generated by the synthesis of a CDK application (CDK code). The CDK CLI references this information to deploy resources to the AWS environment.

In other words, the cloud assembly can be considered an intermediate artifact that CDK uses to deploy infrastructure definitions written in CDK code to the AWS environment.

Components of the Cloud Assembly

The cloud assembly primarily consists of the following files and directories:

(stack name).template.json
- CloudFormation template generated by CDK code
(stack name).assets.json
- File describing information about assets used in CDK code
manifest.json
- File describing metadata of the cloud assembly
tree.json
- File representing the Construct tree structure of resources defined in CDK code
cdk.out
- File (with the same name as the directory) storing Cloud Assembly schema version information
asset.(hash value)/
- Directory storing asset files used in CDK code
- Uses hash values calculated for each asset as directory names
- Assets are of two types: S3 assets and Docker image assets. When the cdk deploy command is executed, they are uploaded to the S3 bucket or ECR repository created by the cdk bootstrap command
- Docker image assets are built when the cdk deploy command is executed, not when the cdk synth command is executed
assembly-(stage name)/
- Directory storing cloud assemblies generated for each stage when using Stage Construct
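
To make the role of manifest.json more concrete, here is a minimal sketch that lists the stacks recorded in a manifest. The interfaces model only the fields used here, the listStacks helper is a hypothetical name, and the sample object is illustrative rather than real synthesis output.

```typescript
// Minimal shape of manifest.json; only the fields used below are modeled.
interface ArtifactManifest {
  type: string;
  properties?: { templateFile?: string; file?: string };
}

interface AssemblyManifest {
  version: string;
  artifacts?: Record<string, ArtifactManifest>;
}

interface StackInfo {
  name: string;
  templateFile: string;
}

// Hypothetical helper: returns the stack artifacts with their template files.
function listStacks(manifest: AssemblyManifest): StackInfo[] {
  const artifacts = manifest.artifacts ?? {};
  return Object.keys(artifacts)
    .filter((name) => artifacts[name].type === "aws:cloudformation:stack")
    .map((name) => ({
      name,
      templateFile: artifacts[name].properties?.templateFile ?? "(not set)",
    }));
}

// Illustrative sample, not real output. The version value is a placeholder.
const sampleManifest: AssemblyManifest = {
  version: "0.0.0",
  artifacts: {
    "DevStack.assets": {
      type: "cdk:asset-manifest",
      properties: { file: "DevStack.assets.json" },
    },
    DevStack: {
      type: "aws:cloudformation:stack",
      properties: { templateFile: "DevStack.template.json" },
    },
    Tree: { type: "cdk:tree", properties: { file: "tree.json" } },
  },
};

console.log(listStacks(sampleManifest));
```

In a real project you would parse cdk.out/manifest.json instead of using an inline object.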

Why cdk.out is Necessary

As explained above, cdk.out is a directory for storing the cloud assembly, which is an intermediate artifact generated from CDK code.

So why is this directory necessary? You might wonder whether it's really necessary to generate and store information needed for deployment as files.

Before explaining this, let me first explain that synthesis can be executed multiple times depending on how CDK commands are used, and the risks associated with this.

Risks of Running Synthesis Multiple Times

The cdk deploy command internally executes synthesis processing equivalent to the synth command.

For example, if you execute cdk deploy after executing cdk synth, synthesis is executed in both synth and deploy, meaning the CDK application is synthesized twice.

In this case, the second synthesis may produce output different from the first, for example if the code depends on the current time, environment variables, or other external values that changed in between. The developer reviewed and approved the first result, but what gets deployed is the second, so the deployed infrastructure can differ from what was intended.

Even if you run cdk deploy without running cdk synth first, synthesis still runs on every deployment when you manage multiple environments, such as development (dev) and production (prod), with one CDK application. Because the synthesis for dev and the synthesis for prod run at different times, unintended differences between the environments can creep in.

Apart from these risks, running synthesis twice is redundant and inefficient. For large applications in particular, synthesis can take considerable time and noticeably lengthen deployments.
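
The risk described above can be illustrated with a toy model. The synthesize function below is a hypothetical stand-in, not the CDK API: it returns a template-like object whose content depends on an external value (a clock), which is one way two synthesis runs can diverge.

```typescript
// Toy stand-in for synthesis (NOT the CDK API). The result depends on
// an external value, here a clock passed in as a function.
type ToyTemplate = {
  Resources: Record<string, { Type: string; Properties: Record<string, string> }>;
};

function synthesize(now: () => number): ToyTemplate {
  return {
    Resources: {
      Param: {
        Type: "AWS::SSM::Parameter",
        Properties: { Type: "String", Value: `built-at-${now()}` },
      },
    },
  };
}

// The output the developer reviewed (first synthesis).
const reviewed = JSON.stringify(synthesize(() => 1000));
// The output produced when synthesis runs again at deploy time.
const deployed = JSON.stringify(synthesize(() => 2000));
console.log(reviewed === deployed); // false

// Reusing the stored result instead of synthesizing again keeps it identical.
const stored = reviewed;
console.log(stored === reviewed); // true
```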

Reusing cdk.out with the `--app` Option

So how can we avoid the risks of running synthesis multiple times?

The cdk deploy command provides the --app (-a) option for this.

This option allows you to specify the path to the cdk.out directory. By specifying this, you can deploy using an already synthesized cloud assembly and skip the synthesis that is normally executed during the deploy command. If you know that no code has been changed since the last cdk synth, you can avoid redundant synthesis processing by reusing the cloud assembly with this option.

npx cdk deploy --app ./cdk.out

Instead of a directory path, you can also pass the --app option a command that synthesizes the CDK code directly. In that case, cdk deploy runs that command to synthesize before deploying. This command is typically the same as the command defined in the app property of the cdk.json file.

npx cdk deploy --app "npx ts-node --prefer-ts-exts bin/cdk-sample.ts"

# For environments where TypeScript can be executed with Node.js
npx cdk deploy --app "node bin/cdk-sample.ts"

By specifying cdk.out with this option, you can synthesize once and deploy the same cloud assembly to multiple environments, such as development and production. In other words, the --app option lets you reuse cdk.out, which avoids both the risks and the redundancy of running synthesis multiple times.

npx cdk synth

npx cdk deploy -a ./cdk.out DevStack
npx cdk deploy -a ./cdk.out ProdStack

Separating Synthesis and Deployment

I explained that reusing cdk.out with the --app option can avoid the risks of running synthesis multiple times. This is precisely because cdk.out enables the separation of synthesis and deployment.

This ability to separate synthesis from deployment is the reason cdk.out is necessary in AWS CDK.

Additionally, any tool that understands the cloud assembly can work from it, not just the CDK CLI. For example, tools like cdk-assets also reference the cloud assembly to manage assets. Separating synthesis and deployment in this way, such as synthesizing with CDK and deploying with a tool other than the CDK CLI, enables flexible workflows in CDK application development.

Synthesize Once, Deploy Many

In application development, for example with containers, there is an approach called "build once, deploy many." This is the idea of building an application only once and deploying that build artifact to multiple environments.

The approach of reusing cdk.out with the --app option in CDK is exactly the same idea. It can be called "synthesize once, deploy many," where synthesis is performed only once and deployment can be done multiple times afterward.

Other Benefits and Use Cases of cdk.out

Asset Caching

cdk.out includes asset directories that store asset files used in CDK code. This includes code files used by Lambda and ECS, Dockerfiles, and other files.

During synthesis, if cdk.out already contains an asset with the same hash value as an asset used in the CDK code, CDK skips the bundling process for that asset and reuses the existing one. In other words, cdk.out also functions as a cache for assets.
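
The caching behavior described above can be sketched as a toy model. Everything here is an assumption for illustration: the fingerprint function is a simple string hash standing in for CDK's real asset hash, bundle is a hypothetical stand-in for an expensive bundling step, and a plain object plays the role of the cdk.out directory.

```typescript
// Stand-in fingerprint: a simple polynomial string hash.
// Real CDK asset hashes are computed differently.
function fingerprint(content: string): string {
  let hash = 7;
  for (let i = 0; i < content.length; i++) {
    hash = (hash * 31 + content.charCodeAt(i)) >>> 0;
  }
  return hash.toString(16);
}

// Simulated contents of cdk.out: directory name -> bundled output.
interface OutDir {
  [dirName: string]: string;
}

let bundleCount = 0;

// Hypothetical bundler standing in for an expensive bundling step.
function bundle(source: string): string {
  bundleCount++;
  return `bundled(${source})`;
}

// Stage an asset, skipping bundling when asset.<hash> already exists.
function stageAsset(outDir: OutDir, source: string): string {
  const dirName = `asset.${fingerprint(source)}`;
  if (!(dirName in outDir)) {
    outDir[dirName] = bundle(source);
  }
  return dirName;
}

const cdkOut: OutDir = {};
const first = stageAsset(cdkOut, "exports.handler = async () => 'v1';");
const second = stageAsset(cdkOut, "exports.handler = async () => 'v1';"); // cache hit
const third = stageAsset(cdkOut, "exports.handler = async () => 'v2';"); // source changed

console.log(first === second); // true: the same directory is reused
console.log(bundleCount); // 2: bundling ran only for the two distinct sources
```

Note that the third call adds a new directory while the old one stays in place, which is why asset directories accumulate over time.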

Utilizing cdk.out in CI/CD Pipelines

I mentioned that cdk.out allows you to deploy using the same cloud assembly in multiple environments. Specifically, you can utilize cdk.out in CI/CD pipelines such as GitHub Actions as follows:

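jobs:
  synthesize:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npx cdk synth
      # Store the cloud assembly so that later jobs deploy exactly this output
      - uses: actions/upload-artifact@v6
        with:
          name: cdk-out
          path: cdk.out

  deploy-dev:
    needs: synthesize
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      # Restore the cloud assembly into ./cdk.out (AWS credential setup omitted)
      - uses: actions/download-artifact@v6
        with:
          name: cdk-out
          path: cdk.out
      - run: npx cdk deploy --app cdk.out DevStack

  deploy-prod:
    needs: deploy-dev
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      # Deploy the same cloud assembly that was deployed to dev
      - uses: actions/download-artifact@v6
        with:
          name: cdk-out
          path: cdk.out
      - run: npx cdk deploy --app cdk.out ProdStack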

Reference Materials

In the explanations so far, I mentioned that multiple deployments are possible based on a single synthesis, but did you know that there are actually two types of CDK stack creation methods: static stack creation and dynamic stack creation?

The approach of performing synthesis only once introduced this time is possible with the static stack creation approach. On the other hand, dynamic stack creation also has other benefits, so if you are interested in the differences between them, please check out the article "CDK Environment Management: Static vs Dynamic Stack Creation". This is an article I co-authored with Thorsten Höger, an AWS Hero from Germany.

Also, an article by a former CDK maintainer is helpful, so please take a look.

cdk.out Bloat and How to Deal with It

cdk.out Bloat

We have seen that cdk.out plays an important role in CDK. However, as you repeat synthesis, new files and directories accumulate in it, and it can grow to several GB before you notice.

This is because a new asset directory is created whenever application files are updated, while the old ones remain. The size tends to be particularly large for container applications using Docker.

Additionally, since the cdk deploy command also builds Docker images, built images accumulate in local Docker as well as in cdk.out.
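
As a rough illustration of how stale asset directories can be identified, here is a minimal sketch that compares the entries in cdk.out with the directories referenced by the current (stack name).assets.json files. The interface models only the fields used here, the helper names are hypothetical, and the sample data is illustrative. This is a simplified sketch, not the implementation of any particular tool.

```typescript
// Minimal shape of a (stack name).assets.json file; only the fields used here.
interface AssetManifest {
  files?: Record<string, { source: { path?: string } }>;
  dockerImages?: Record<string, { source: { directory?: string } }>;
}

// Collect the paths that the given asset manifests reference.
function referencedDirs(manifests: AssetManifest[]): string[] {
  const dirs: string[] = [];
  for (const manifest of manifests) {
    const files = manifest.files ?? {};
    for (const key of Object.keys(files)) {
      const path = files[key].source.path;
      if (path) dirs.push(path);
    }
    const images = manifest.dockerImages ?? {};
    for (const key of Object.keys(images)) {
      const directory = images[key].source.directory;
      if (directory) dirs.push(directory);
    }
  }
  return dirs;
}

// Return the asset.* entries in cdk.out that no current manifest references.
function staleAssetDirs(entries: string[], manifests: AssetManifest[]): string[] {
  const used = referencedDirs(manifests);
  return entries.filter(
    (entry) => entry.indexOf("asset.") === 0 && used.indexOf(entry) === -1
  );
}

// Illustrative directory listing and manifest (hash values shortened).
const entries = [
  "manifest.json",
  "DevStack.template.json",
  "DevStack.assets.json",
  "asset.aaa111",
  "asset.bbb222",
  "asset.ccc333",
];
const currentManifest: AssetManifest = {
  files: {
    aaa111: { source: { path: "asset.aaa111" } },
    template: { source: { path: "DevStack.template.json" } },
  },
  dockerImages: {
    ccc333: { source: { directory: "asset.ccc333" } },
  },
};

console.log(staleAssetDirs(entries, [currentManifest])); // ["asset.bbb222"]
```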

cdk-agc

To optimize such bloated cdk.out, I created a tool called cdk-agc.

(It's very useful, so please star it on GitHub if you like!)

Specifically, it does the following:

- Deletes old asset files and directories in cdk.out that are not used by the current stack
- Automatically deletes locally built Docker images if the deleted assets are Docker assets
- Deletes temporary CDK directories accumulated in $TMPDIR
- No installation required - just run the command with npx and it executes immediately

This optimizes cdk.out, removes accumulated Docker images, and saves local disk space. Furthermore, if you cache cdk.out in CI/CD, it can significantly reduce cache size, improving the efficiency of your CI/CD pipeline.

Since it runs directly with npx, please feel free to try it.

You can use -d for a dry run, -k to specify a retention period, and -t to clean up temporary directories in $TMPDIR, so please use them according to your needs.

# Navigate to the CDK project with cdk.out you want to optimize
cd my-cdk-project

# Clean up cdk.out (also deletes built Docker asset images)
npx cdk-agc

# Check files to be deleted with dry run feature
npx cdk-agc -d

# Keep files modified within 24 hours
npx cdk-agc -k 24

# Clean up temporary directories in $TMPDIR
npx cdk-agc -t

Conclusion

cdk.out is often overlooked, but it plays an important role in CDK application development. By understanding and properly utilizing cdk.out, you can develop CDK applications more efficiently.
