A behind-the-scenes guide to CDK internals — Logical IDs, synthesis, bootstrap trust chains, and the replacement logic that most teams learn the hard way.
A Behind-the-Scenes Guide That Most Teams Learn the Hard Way
It was a Friday afternoon. The kind of Friday where everything had gone smoothly — too smoothly.
A senior engineer on a team pushed what he called a "cleanup refactor." No business logic changed. No new features. Just reorganizing CDK constructs into cleaner modules. The kind of work that gets approved in code review in minutes because nothing functional changed.
He ran cdk deploy.
CloudFormation accepted the changeset. Thirty seconds later, Slack lit up.
The production DynamoDB table — 4 million user records — was gone.
Not corrupted. Not locked. Deleted and recreated empty. Because CloudFormation saw a different Logical ID and concluded the old table should be removed and a new one created in its place.
The engineer didn't change a single schema property. He moved a construct from one file to another.
That one action cost the team 11 hours of downtime, a backup restoration, and a very uncomfortable conversation with their CTO.
This guide exists so that conversation never happens on your team.
Who This Guide Is For
If you lead an engineering team that uses — or is adopting — AWS CDK, this guide is for you.
This is not a getting-started tutorial. You won't find "how to create your first S3 bucket" here.
Instead, this is the guide that explains what is actually happening beneath the surface when your team runs cdk deploy. The mental model that separates teams who use CDK confidently from teams who are one refactor away from a production incident.
We'll cover:
- Why CDK is a compiler, not a provisioning tool
- The Logical ID system that silently controls resource identity
- How context caching creates invisible divergence between local and CI
- The bootstrap trust chain that most teams never fully understand
- The replacement logic that CloudFormation uses — and CDK does not control
- A production playbook for teams managing real infrastructure
Every section connects back to a single question: How does this knowledge protect production?
Part 1: CDK Is a Compiler — Not What You Think It Is
Here's the first mental model shift that changes everything.
CDK does not provision infrastructure.
Read that again. The tool your team writes infrastructure code in — it never talks to EC2, S3, DynamoDB, or any AWS service directly. Not once.
Here's what CDK actually does:
- Executes your code — your TypeScript, Python, or Java runs like any normal program
- Builds an in-memory construct tree — a hierarchy of objects representing your infrastructure
- Synthesizes a CloudFormation template — translates that tree into a JSON/YAML template
- Hands everything off to CloudFormation, then only waits and reports progress
That's it. CDK's job is done before a single resource is created.
CloudFormation is the engine that:
- Stores the current state of your stack
- Calculates the diff between old and new templates
- Determines which resources need updating, replacing, or deleting
- Calls the actual AWS service APIs
- Handles rollback if something fails
Think of it this way:
CDK is the compiler. CloudFormation is the runtime.
This distinction matters enormously. When something goes wrong during deployment — a resource gets replaced, a permission fails, a rollback triggers — the answer almost never lives in your CDK code. It lives in the relationship between your synthesized template and CloudFormation's state machine.
Why this matters for your team: When an engineer says "CDK deleted my table," that's technically wrong. CDK produced a template. CloudFormation decided to delete the table based on that template. Understanding this boundary is the first step to debugging infrastructure issues effectively.
Part 2: The Construct Tree — CDK's Object Model
Before we can understand why refactoring causes replacements, we need to understand how CDK organizes infrastructure internally.
Everything Is a Construct
Every piece of infrastructure in CDK — a bucket, a Lambda function, an IAM role — is a Construct. Constructs are nested inside other constructs, forming a tree:
App
└── Stack (e.g., ProdStack)
    ├── Construct (e.g., ApiService)
    │   ├── Lambda Function
    │   └── API Gateway
    └── Construct (e.g., DataLayer)
        ├── DynamoDB Table
        └── S3 Bucket
This tree is the source of truth for template generation. Every construct has a path determined by its position in the tree — and that path has consequences we'll explore in the next section.
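To make the idea of a construct path concrete, here is a minimal sketch of a construct tree in plain TypeScript. `ToyConstruct` is a hypothetical stand-in written for this guide, not the real `constructs` library, and it keeps the root ID in the path only so the output matches the path notation used in this guide.

```typescript
// Toy model of a construct tree. This is NOT the real `constructs` library.
class ToyConstruct {
  readonly children: ToyConstruct[] = [];

  constructor(readonly scope: ToyConstruct | undefined, readonly id: string) {
    if (scope) {
      scope.children.push(this);
    }
  }

  // The path is the chain of IDs from the root down to this node.
  get path(): string {
    return this.scope ? `${this.scope.path}/${this.id}` : this.id;
  }
}

// A table nested inside a DataLayer construct.
const app = new ToyConstruct(undefined, "App");
const prodStack = new ToyConstruct(app, "ProdStack");
const dataLayer = new ToyConstruct(prodStack, "DataLayer");
const nestedTable = new ToyConstruct(dataLayer, "UsersTable");

// The same table declared directly under the stack.
const otherApp = new ToyConstruct(undefined, "App");
const otherStack = new ToyConstruct(otherApp, "ProdStack");
const flatTable = new ToyConstruct(otherStack, "UsersTable");

const nestedPath = nestedTable.path; // "App/ProdStack/DataLayer/UsersTable"
const flatPath = flatTable.path;     // "App/ProdStack/UsersTable"
```

The construct ID is the same in both cases. Only the parent differs, and that alone produces a different path.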
Three Levels of Abstraction
CDK constructs operate at three levels:
| Level | What It Is | Example |
|---|---|---|
| L1 | Raw CloudFormation resource, a 1:1 mapping. Prefixed with Cfn. | CfnBucket, CfnTable |
| L2 | An opinionated abstraction with sensible defaults and helper methods. | Bucket, Table, Function |
| L3 | Pre-wired patterns that compose multiple resources together. | LambdaRestApi, ApplicationLoadBalancedFargateService |
Most teams work at L2. It's the sweet spot — enough abstraction to move fast, enough control to customize.
The critical thing to understand: When you write CDK code, you are building an object tree in memory. No AWS API calls happen during this phase. No infrastructure is queried or created. You're constructing a blueprint.
The moment that blueprint becomes real is during synthesis.
Part 3: Synthesis and Logical IDs — Where Refactoring Becomes Dangerous
This is the section that explains the Friday afternoon disaster from our opening story.
What Happens During cdk synth
When you run cdk synth, your CDK application executes as a normal program. The construct tree is built, and then CDK walks that tree to produce a CloudFormation template.
During this walk, four things happen:
- Each construct is visited — its properties are collected
- Logical IDs are generated — a unique identifier for each resource
- Tokens are resolved — cross-references between resources are wired up
- A template is written to the cdk.out directory
No infrastructure exists yet. This is pure compilation.
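The walk described above can be sketched as a small pure function. This is a simplified model under stated assumptions: `toySynth`, `RefToken`, and the `TableRef` property are hypothetical names invented for illustration, and the Logical ID here is just the path with separators removed (no hash). The point is that synthesis only transforms in-memory data into a template and makes no AWS calls.

```typescript
// A "token": a placeholder that points at another resource by its path.
interface RefToken { readonly refTo: string }

type PropValue = string | number | RefToken;

interface ToyResource {
  readonly path: string; // e.g. "ProdStack/UsersTable"
  readonly type: string; // e.g. "AWS::DynamoDB::Table"
  readonly props: Record<string, PropValue>;
}

interface ToyTemplate {
  Resources: Record<string, { Type: string; Properties: Record<string, unknown> }>;
}

// Simplified stand-in for Logical ID generation: drop the stack name, join the rest.
function toyLogicalId(path: string): string {
  return path.split("/").slice(1).join("");
}

function isRef(value: PropValue): value is RefToken {
  return typeof value === "object";
}

function toySynth(resources: ToyResource[]): ToyTemplate {
  const template: ToyTemplate = { Resources: {} };
  for (const resource of resources) {
    const properties: Record<string, unknown> = {};
    for (const key of Object.keys(resource.props)) {
      const value = resource.props[key];
      // Resolve tokens into CloudFormation-style { Ref: <logical id> } objects.
      properties[key] = isRef(value) ? { Ref: toyLogicalId(value.refTo) } : value;
    }
    template.Resources[toyLogicalId(resource.path)] = { Type: resource.type, Properties: properties };
  }
  return template;
}

const synthesized = toySynth([
  { path: "ProdStack/UsersTable", type: "AWS::DynamoDB::Table", props: { BillingMode: "PAY_PER_REQUEST" } },
  // "TableRef" is an illustrative property name, not a real Lambda property.
  { path: "ProdStack/Api/Handler", type: "AWS::Lambda::Function", props: { TableRef: { refTo: "ProdStack/UsersTable" } } },
]);

// Roughly what would be written to disk as the template file.
const serializedTemplate = JSON.stringify(synthesized, null, 2);
```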
Logical IDs — The Hidden Identity System
This is the single most important concept in CDK that most engineers never fully grasp.
Every resource in a CloudFormation template has a Logical ID. It looks something like:
UsersTableA1B2C3D4
This ID is generated from the construct's path in the tree plus a hash. For example:
Path: App/ProdStack/UsersTable
Logical ID: UsersTableA1B2C3D4
CloudFormation uses this Logical ID as the primary key for tracking resources. It's how CloudFormation knows that the UsersTable in today's deployment is the same UsersTable from yesterday's deployment.
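The following sketch illustrates the "readable name plus hash of the path" idea. It is an approximation, not CDK's actual algorithm: `toyLogicalId` and `toyHash` are hypothetical helpers, and the hash is FNV-1a, chosen only to keep the example dependency-free. CDK's real implementation uses a different hash and applies extra rules for sanitizing and shortening names.

```typescript
// FNV-1a (32-bit), used here only as a dependency-free stand-in for CDK's real hash.
function toyHash(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts, then wrap to 32 bits.
    h = (h + ((h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24))) >>> 0;
  }
  return ("0000000" + h.toString(16)).slice(-8).toUpperCase();
}

// Simplified Logical ID: readable part from the path below the stack, plus a hash of the full path.
function toyLogicalId(path: string): string {
  const components = path.split("/").slice(2); // drop "App" and the stack name
  return components.join("") + toyHash(path);
}

const idBefore = toyLogicalId("App/ProdStack/UsersTable");
const idAfter = toyLogicalId("App/ProdStack/Storage/UsersTable");
const idRepeated = toyLogicalId("App/ProdStack/UsersTable");

const identityChanged = idBefore !== idAfter;     // the path changed, so the ID changed
const stableAcrossRuns = idBefore === idRepeated; // same path, same ID
```

The ID is a pure function of the path. Nothing about the table's schema or configuration enters the calculation.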
The Refactoring Trap
Now watch what happens when an engineer "cleans up" the code by moving the table into a nested construct:
Before:
// Table is directly in the stack
new dynamodb.Table(this, "UsersTable", {
  partitionKey: { name: "id", type: dynamodb.AttributeType.STRING }
});
Path: App/ProdStack/UsersTable
Logical ID: UsersTableA1B2C3D4
After the refactor:
// Table is now inside a "Storage" construct
const storage = new Construct(this, "Storage");
new dynamodb.Table(storage, "UsersTable", {
  partitionKey: { name: "id", type: dynamodb.AttributeType.STRING }
});
Path: App/ProdStack/Storage/UsersTable
Logical ID: StorageUsersTableE5F6G7H8 ← DIFFERENT
The schema didn't change. The table configuration didn't change. But the Logical ID changed because the construct path changed.
CloudFormation receives the new template and sees:
- A resource with Logical ID UsersTableA1B2C3D4: no longer present → delete it
- A resource with Logical ID StorageUsersTableE5F6G7H8: new → create it
That's a delete and recreate. Your data is gone.
This is exactly what happened in our opening story. A "cleanup refactor" changed the construct tree, which changed Logical IDs, which CloudFormation interpreted as resource replacement.
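A rough model of this comparison, assuming templates are reduced to a map keyed by Logical ID. `diffByLogicalId` is a hypothetical helper for illustration, and CloudFormation's real diffing considers far more than the ID. The sketch shows that identity is matched by key only, so an identical resource under a new key looks like one deletion plus one creation.

```typescript
// Template reduced to its essentials: Logical ID -> resource type.
type ToyTemplate = Record<string, { Type: string }>;

interface IdentityDiff {
  toCreate: string[];
  toDelete: string[];
  kept: string[];
}

function diffByLogicalId(deployed: ToyTemplate, next: ToyTemplate): IdentityDiff {
  const deployedIds = Object.keys(deployed);
  const nextIds = Object.keys(next);
  return {
    toCreate: nextIds.filter((id) => deployedIds.indexOf(id) === -1),
    toDelete: deployedIds.filter((id) => nextIds.indexOf(id) === -1),
    kept: nextIds.filter((id) => deployedIds.indexOf(id) !== -1),
  };
}

const deployedTemplate: ToyTemplate = {
  UsersTableA1B2C3D4: { Type: "AWS::DynamoDB::Table" },
};
const refactoredTemplate: ToyTemplate = {
  StorageUsersTableE5F6G7H8: { Type: "AWS::DynamoDB::Table" },
};

// Same resource type, nothing kept: one ID to delete and one to create.
const identityDiff = diffByLogicalId(deployedTemplate, refactoredTemplate);
```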
Why this matters for your team: Your engineers need to understand that infrastructure code is not like application code. In application code, moving a function between files changes nothing about runtime behavior. In CDK, moving a construct between parent constructs changes the resource's identity. Refactoring infrastructure requires a fundamentally different discipline.
Part 4: Context Caching — The Silent Divergence
There's a common misconception I encounter repeatedly: teams believe that cdk.context.json has something to do with drift detection. It doesn't. But what it does do is equally dangerous if misunderstood.
How Context Works
Some CDK constructs need to query AWS during synthesis. The most common example:
const vpc = ec2.Vpc.fromLookup(this, "MainVpc", {
  vpcId: "vpc-0123456789abcdef0"
});
When this runs during cdk synth, CDK actually calls the AWS API to look up VPC details — availability zones, subnets, route tables. It then caches the result in cdk.context.json.
On subsequent runs, CDK reads from the cache instead of calling AWS again.
The Divergence Problem
Here's where teams get burned:
- Developer A runs cdk synth locally. Context is cached with current VPC state.
- The VPC changes: a new subnet is added, an AZ is modified.
- CI/CD pipeline runs cdk synth, but cdk.context.json wasn't committed to git. CI performs a fresh lookup and gets different VPC data.
- The template generated in CI differs from local. Resources reference different subnets. Deployment behaves unexpectedly.
The engineer stares at the diff and thinks: "I didn't change anything."
They're right — they didn't. The environment changed, and the lack of committed context allowed that change to silently propagate into the template.
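The divergence is easier to see in a toy simulation. Everything here is hypothetical: `lookupSubnets` stands in for a context lookup, `ContextCache` for the contents of cdk.context.json, and the subnet IDs and cache key are made up. The real cache format and keys are different.

```typescript
// Stand-in for the contents of cdk.context.json: lookup key -> cached result.
type ContextCache = Record<string, string[]>;

// Hypothetical lookup helper: use the cached value when present, otherwise query the live account.
function lookupSubnets(cache: ContextCache, key: string, queryAws: () => string[]): string[] {
  if (!Object.prototype.hasOwnProperty.call(cache, key)) {
    cache[key] = queryAws();
  }
  return cache[key];
}

// Simulated account state. The subnet IDs are invented for illustration.
let liveSubnets = ["subnet-111", "subnet-222"];
const queryLiveAccount = (): string[] => liveSubnets.slice();

// Developer A synthesizes locally, which fills the cache.
const committedCache: ContextCache = {};
const localResult = lookupSubnets(committedCache, "vpc:MainVpc", queryLiveAccount);

// The VPC changes after the local synth.
liveSubnets = ["subnet-111", "subnet-222", "subnet-333"];

// CI with the committed cache sees what the developer saw. CI with an empty cache does not.
const ciWithCache = lookupSubnets(committedCache, "vpc:MainVpc", queryLiveAccount);
const ciWithoutCache = lookupSubnets({}, "vpc:MainVpc", queryLiveAccount);
```

With the cache available, both environments synthesize from the same two subnets. Without it, CI picks up the third subnet and produces a different template from unchanged code.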
What Context Is and Isn't
| Context... | Does | Doesn't |
|---|---|---|
| Affects | Template generation during synthesis | Deployed resource state |
| Relates to | Lookup values cached locally | CloudFormation drift detection |
| Deleting it | Forces fresh AWS lookups | Fix or prevent stack drift |
Drift detection — comparing what's actually deployed vs. what the template says — is handled entirely by CloudFormation. The context file has no role in that process.
Why this matters for your team: Commit cdk.context.json to version control. Treat it as part of your infrastructure definition. When the context file is committed, every developer and every CI pipeline synthesizes the same template from the same cached data. When you want to pick up environment changes, delete the context file deliberately and re-synthesize — as a conscious decision, not an accident.
Part 5: Bootstrap — The Trust Chain Nobody Explains
Every CDK tutorial tells you to run cdk bootstrap. Almost none of them explain what it actually creates or why it matters.
What Bootstrap Creates
When you run cdk bootstrap, it deploys a CloudFormation stack (called CDKToolkit) into your target account and region. This stack contains:
- S3 Bucket — stores file assets (Lambda code bundles, Docker context files)
- ECR Repository — stores Docker image assets
- Deploy Role — an IAM role that CDK assumes to initiate deployments
- CloudFormation Execution Role — the role CloudFormation assumes to create/modify resources
- File Publishing Role — for uploading assets to S3
- Image Publishing Role — for pushing images to ECR
The Trust Chain
Deployment flows through a specific chain of trust:
Your credentials (local or CI)
↓ assumes
Deploy Role
↓ passes to
CloudFormation
↓ assumes
Execution Role
↓ calls
AWS Service APIs (EC2, S3, DynamoDB, etc.)
Each arrow is an IAM trust relationship. If any link in this chain is misconfigured — a missing trust policy, an incorrect principal, an account ID mismatch — deployment fails. And the error messages are often cryptic enough to send engineers down the wrong debugging path for hours.
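When debugging, it can help to check the chain one link at a time. The sketch below models that check in plain TypeScript. The role names, account IDs, and `findBrokenLink` helper are all hypothetical, and real trust policies are IAM policy documents rather than flat lists of principals.

```typescript
interface ToyRole {
  readonly roleName: string;
  readonly trustedPrincipals: string[]; // who is allowed to assume this role
}

interface ChainLink {
  readonly caller: string; // the principal attempting to assume the role
  readonly role: ToyRole;
}

// Returns the first link whose role does not trust its caller, or undefined if the chain is intact.
function findBrokenLink(chain: ChainLink[]): ChainLink | undefined {
  for (const link of chain) {
    if (link.role.trustedPrincipals.indexOf(link.caller) === -1) {
      return link;
    }
  }
  return undefined;
}

// Hypothetical accounts: 111111111111 is the CI account, 222222222222 is some other account.
const ciPrincipal = "arn:aws:iam::111111111111:root";
const executionRole: ToyRole = {
  roleName: "cfn-exec-role",
  trustedPrincipals: ["cloudformation.amazonaws.com"],
};

const healthyChain: ChainLink[] = [
  { caller: ciPrincipal, role: { roleName: "deploy-role", trustedPrincipals: [ciPrincipal] } },
  { caller: "cloudformation.amazonaws.com", role: executionRole },
];

// Same chain, but the deploy role was bootstrapped to trust a different account.
const misconfiguredChain: ChainLink[] = [
  { caller: ciPrincipal, role: { roleName: "deploy-role", trustedPrincipals: ["arn:aws:iam::222222222222:root"] } },
  { caller: "cloudformation.amazonaws.com", role: executionRole },
];

const healthyResult = findBrokenLink(healthyChain);      // undefined: every link trusts its caller
const brokenResult = findBrokenLink(misconfiguredChain); // the deploy-role link
```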
Why This Matters for Multi-Account Setups
In production environments, most teams use multiple AWS accounts — development, staging, production, shared services. CDK's bootstrap model is designed for this:
- Each target account needs to be bootstrapped
- The bootstrap roles in each account must trust the deploying account (often a CI/CD account)
- The execution role in each account determines what CloudFormation can actually create
This is where security teams get involved — and rightfully so. The execution role in your production account determines the blast radius of a deployment. An overly permissive execution role means a bad template can create or modify anything in production.
Why this matters for your team: Bootstrap is not a one-time setup command you run and forget. It's the security boundary of your deployment pipeline. Review the execution role's permissions. Understand which accounts trust which. In mature organizations, the bootstrap template is customized to enforce least-privilege — restricting what CloudFormation can do, even if the CDK code asks for it.
Part 6: The Deploy Lifecycle — What Actually Happens
Now that we understand all the components, let's trace the full lifecycle of cdk deploy from start to finish.
Step 1: Synthesis
Your CDK app executes. The construct tree is built. Logical IDs are generated. A CloudFormation template is written to cdk.out/.
Step 2: Asset Upload
If your stack includes file assets (Lambda code) or Docker images, CDK uploads them to the S3 bucket and ECR repository created during bootstrap.
Step 3: ChangeSet Creation
CDK submits the synthesized template to CloudFormation as a ChangeSet. A ChangeSet is CloudFormation's way of previewing what will happen — it's a diff between the currently deployed template and the new one.
Step 4: CloudFormation Diff Calculation
CloudFormation compares the new template against its stored state. For each resource, it determines:
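- No change: the resource definition is identical, so it is skipped
- Update: a mutable property changed, so the resource is updated in place
- Replace: an immutable property changed, so CloudFormation creates a replacement and then deletes the original. A changed Logical ID appears as a separate add and remove, which has the same net effect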
Step 5: Dependency Graph Execution
CloudFormation doesn't execute changes randomly. It builds a dependency graph and processes resources in the correct order — creating dependencies before dependents, deleting dependents before dependencies.
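Dependency-ordered execution is essentially a topological sort. The sketch below is a generic depth-first version, not CloudFormation's actual scheduler, which also runs independent resources in parallel. The resource names and the `creationOrder` helper are hypothetical.

```typescript
// resource -> the resources it depends on
type DependencyGraph = Record<string, string[]>;

// Depth-first topological sort: every resource appears after all of its dependencies.
function creationOrder(graph: DependencyGraph): string[] {
  const order: string[] = [];
  const state: Record<string, "visiting" | "done"> = {};

  const visit = (node: string): void => {
    if (state[node] === "done") return;
    if (state[node] === "visiting") throw new Error(`Circular dependency at ${node}`);
    state[node] = "visiting";
    for (const dep of graph[node] || []) {
      visit(dep);
    }
    state[node] = "done";
    order.push(node);
  };

  Object.keys(graph).forEach((node) => visit(node));
  return order;
}

const stackGraph: DependencyGraph = {
  Api: ["Handler"],
  Handler: ["UsersTable", "HandlerRole"],
  UsersTable: [],
  HandlerRole: [],
};

const createSequence = creationOrder(stackGraph);         // dependencies first
const deleteSequence = createSequence.slice().reverse();  // dependents first
```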
Step 6: API Execution
CloudFormation calls the actual AWS service APIs — CreateTable, PutBucketPolicy, CreateFunction, etc.
Step 7: State Update
Once all changes are applied (or rolled back on failure), CloudFormation updates its internal state to reflect the new reality.
The key insight: CDK's decisions end at Step 3. Once the ChangeSet is submitted and executed, everything from Step 4 onward is CloudFormation operating independently. The CDK CLI keeps running while you watch your terminal during cdk deploy, but only to poll CloudFormation for status updates; it is not controlling the process.
Part 7: Replacement Logic — Who Decides, and How
This is the question I get asked most often: "Why did CloudFormation replace my resource?"
The answer is never CDK. It's always CloudFormation, and it follows a specific decision tree:
Reason 1: Logical ID Changed
As we covered in Part 3, if the construct path changes, the Logical ID changes. CloudFormation interprets this as "old resource removed, new resource added." This is the most common cause of unintended replacements.
Reason 2: Immutable Property Changed
Some resource properties can only be set at creation time. Changing them requires replacement. Examples:
- DynamoDB partition key or sort key
- RDS engine type
- EC2 instance type in some configurations
- S3 bucket name (if explicitly set)
CloudFormation knows which properties are immutable for each resource type. When one changes, replacement is the only option.
Reason 3: Resource Type Changed
If you change a resource from one type to another (rare, but it happens during refactors), CloudFormation treats it as a deletion and creation.
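For a resource whose Logical ID is unchanged, Reasons 2 and 3 can be sketched as a small classifier. This is a simplified model: `classifyChange` is a hypothetical helper, property values are reduced to strings, and `IMMUTABLE_PROPS` lists only two well-known examples. The authoritative per-property rules are the "Update requires" notes in the CloudFormation resource reference.

```typescript
interface ToyResource {
  readonly Type: string;
  readonly Properties: Record<string, string>;
}

type ChangeKind = "none" | "update" | "replace";

// Illustrative subset only. CloudFormation defines this per property for every resource type.
const IMMUTABLE_PROPS: Record<string, string[]> = {
  "AWS::DynamoDB::Table": ["TableName"],
  "AWS::S3::Bucket": ["BucketName"],
};

function classifyChange(oldRes: ToyResource, newRes: ToyResource): ChangeKind {
  // Reason 3: a different resource type is always a replacement.
  if (oldRes.Type !== newRes.Type) return "replace";

  const immutable = IMMUTABLE_PROPS[newRes.Type] || [];
  const keys = Object.keys(oldRes.Properties).concat(Object.keys(newRes.Properties));
  let kind: ChangeKind = "none";
  for (const key of keys) {
    if (oldRes.Properties[key] === newRes.Properties[key]) continue;
    // Reason 2: any changed immutable property forces replacement.
    if (immutable.indexOf(key) !== -1) return "replace";
    kind = "update";
  }
  return kind;
}

const currentTable: ToyResource = {
  Type: "AWS::DynamoDB::Table",
  Properties: { TableName: "users", BillingMode: "PROVISIONED" },
};

const unchanged = classifyChange(currentTable, currentTable);
const billingChanged = classifyChange(currentTable, {
  Type: "AWS::DynamoDB::Table",
  Properties: { TableName: "users", BillingMode: "PAY_PER_REQUEST" },
});
const renamed = classifyChange(currentTable, {
  Type: "AWS::DynamoDB::Table",
  Properties: { TableName: "users-v2", BillingMode: "PROVISIONED" },
});
```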
How to Protect Against Unintended Replacement
Always review the ChangeSet before executing. CDK provides a built-in tool for this:
cdk diff
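This shows the differences between the deployed template and the newly synthesized one, including which resources will be added, removed, or replaced. Treat it as a preview rather than a guarantee, because CloudFormation makes the final decision when the ChangeSet executes. Make this a mandatory step in your deployment process. In CI/CD pipelines, generate the diff as a review artifact before applying changes.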
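One way to enforce this in a pipeline is to inspect a ChangeSet description before it is executed. The sketch below assumes JSON shaped like the output of `aws cloudformation describe-change-set`, trimmed to the fields it reads. `findDestructiveChanges` and the `STATEFUL_TYPES` list are assumptions made for this example, not part of CDK or the AWS CLI.

```typescript
// Subset of the fields found in a ChangeSet description.
interface ResourceChange {
  Action: string; // e.g. "Add", "Modify", "Remove"
  LogicalResourceId: string;
  ResourceType: string;
  Replacement?: string; // "True", "False", or "Conditional" on Modify actions
}

interface ChangeSetDescription {
  Changes: { ResourceChange: ResourceChange }[];
}

// Hypothetical team-specific list of resource types that hold data.
const STATEFUL_TYPES = ["AWS::DynamoDB::Table", "AWS::RDS::DBInstance", "AWS::S3::Bucket"];

// Flags removals and (possible) replacements of stateful resources.
function findDestructiveChanges(changeSet: ChangeSetDescription): ResourceChange[] {
  return changeSet.Changes
    .map((change) => change.ResourceChange)
    .filter((rc) => STATEFUL_TYPES.indexOf(rc.ResourceType) !== -1)
    .filter((rc) => rc.Action === "Remove" || (rc.Action === "Modify" && rc.Replacement !== "False"));
}

const sampleChangeSet: ChangeSetDescription = {
  Changes: [
    { ResourceChange: { Action: "Remove", LogicalResourceId: "UsersTableA1B2C3D4", ResourceType: "AWS::DynamoDB::Table" } },
    { ResourceChange: { Action: "Add", LogicalResourceId: "StorageUsersTableE5F6G7H8", ResourceType: "AWS::DynamoDB::Table" } },
    { ResourceChange: { Action: "Modify", LogicalResourceId: "ApiHandler", ResourceType: "AWS::Lambda::Function", Replacement: "False" } },
  ],
};

const flaggedChanges = findDestructiveChanges(sampleChangeSet);
const shouldBlockDeploy = flaggedChanges.length > 0;
```

The sketch treats a "Conditional" replacement as risky on purpose, because for a stateful resource a possible replacement deserves the same review as a certain one.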
Part 8: The Questions Your Team Will Ask — Answered
These are the questions that come up in every CDK engagement I've been part of. Having clear answers to these saves hours of debugging and prevents production incidents.
"Why did my resource get replaced when I didn't change anything?"
You changed the construct path. The Logical ID shifted. CloudFormation interpreted this as a new resource. Check the ChangeSet — it will show the old and new Logical IDs. The fix: either revert the path change, or migrate the resource using CloudFormation's resource import feature.
"Does deleting cdk.context.json fix drift?"
No. Drift detection compares deployed resources against CloudFormation's stored state. The context file only affects synthesis. Deleting it forces fresh lookups, which may change your template — but it tells you nothing about drift. Use aws cloudformation detect-stack-drift for that.
"Why does the CI template differ from my local template?"
Because context wasn't committed. Your local machine has cached lookup results. CI performed fresh lookups and got different data. Commit cdk.context.json. If you deliberately want fresh lookups, delete the file and re-run cdk synth locally, then commit the updated cache.
"Why do I get permission errors during deployment?"
The trust chain is broken. Remember: your credentials → Deploy Role → CloudFormation → Execution Role → AWS APIs. Check each link. Common issues: the deploy role doesn't trust your CI account, the execution role lacks permission for a specific service, or the bootstrap stack is out of date.
"Why does refactoring break production?"
Because infrastructure identity depends on construct path stability. In application code, moving a class between packages is a safe operation. In CDK, moving a construct between parents changes the Logical ID, which changes the resource identity. Infrastructure code requires architectural discipline that application code does not.
The Production Playbook
These five practices are what separate teams that deploy CDK with confidence from teams that deploy with crossed fingers.
1. Separate Stateful and Stateless Stacks
NetworkStack → VPCs, Subnets, NAT Gateways
DatabaseStack → DynamoDB Tables, RDS Instances, ElastiCache
ApplicationStack → Lambdas, API Gateways, ECS Services
MonitoringStack → Alarms, Dashboards, SNS Topics
Stateful resources (databases, storage) live in stacks that change rarely. Stateless resources (compute, APIs) live in stacks that change frequently. This separation limits the blast radius of any single deployment. Your database stack should be boring — deployed once, modified almost never.
2. Apply Removal Policies to Stateful Resources
const table = new dynamodb.Table(this, "UsersTable", {
  partitionKey: { name: "id", type: dynamodb.AttributeType.STRING },
  removalPolicy: RemovalPolicy.RETAIN,
});
RemovalPolicy.RETAIN tells CloudFormation: "Even if you think this resource should be deleted, don't." If a Logical ID change causes CloudFormation to attempt replacement, the old resource will be retained instead of deleted. You'll have an orphaned resource to clean up, but you won't have data loss.
Apply this to every DynamoDB table, every RDS instance, every S3 bucket that holds data you cannot afford to lose.
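A policy like this is easier to keep if it is checked automatically. In the synthesized template, a retain policy appears as the `DeletionPolicy` and `UpdateReplacePolicy` resource attributes. The sketch below audits a template object for stateful resources missing either one. `findUnprotected`, the sample Logical IDs, and the list of stateful types are hypothetical, and the check deliberately accepts only `Retain`, ignoring other protective values such as `Snapshot`.

```typescript
interface TemplateResource {
  Type: string;
  DeletionPolicy?: string;
  UpdateReplacePolicy?: string;
}

interface SynthesizedTemplate {
  Resources: Record<string, TemplateResource>;
}

// Returns the Logical IDs of stateful resources that are not fully protected by Retain.
function findUnprotected(template: SynthesizedTemplate, statefulTypes: string[]): string[] {
  return Object.keys(template.Resources).filter((logicalId) => {
    const resource = template.Resources[logicalId];
    const stateful = statefulTypes.indexOf(resource.Type) !== -1;
    const retained = resource.DeletionPolicy === "Retain" && resource.UpdateReplacePolicy === "Retain";
    return stateful && !retained;
  });
}

// Hand-written sample standing in for a template loaded from the synth output.
const sampleTemplate: SynthesizedTemplate = {
  Resources: {
    UsersTable: { Type: "AWS::DynamoDB::Table", DeletionPolicy: "Retain", UpdateReplacePolicy: "Retain" },
    SessionsTable: { Type: "AWS::DynamoDB::Table" },
    ApiHandler: { Type: "AWS::Lambda::Function" },
  },
};

const unprotected = findUnprotected(sampleTemplate, ["AWS::DynamoDB::Table", "AWS::S3::Bucket"]);
```

Run as a unit test or a CI step against the synthesized template, a check like this turns a convention into a gate.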
3. Avoid Volatile Lookups
Every fromLookup() call introduces non-determinism into your synthesis. The template you get depends on the state of your AWS account at synthesis time.
Prefer explicit configuration:
// Instead of this:
const vpc = ec2.Vpc.fromLookup(this, "Vpc", { vpcId: "vpc-abc123" });
// Consider this (explicit, deterministic):
const vpc = ec2.Vpc.fromVpcAttributes(this, "Vpc", {
  vpcId: "vpc-abc123",
  availabilityZones: ["us-east-1a", "us-east-1b"],
  publicSubnetIds: ["subnet-111", "subnet-222"],
});
Deterministic synthesis means the same code always produces the same template, regardless of when or where it runs. That's a property worth protecting.
4. Refactor Using a Migration Strategy
Never restructure constructs and deploy in one step. Use a phased approach:
Phase 1: Add the new construct alongside the old one. Deploy.
Phase 2: Migrate data or traffic from old to new.
Phase 3: Switch references to point to the new resource.
Phase 4: Remove the old construct (with RETAIN policy, so the underlying resource persists until you manually clean up).
This is slower than a single refactor-and-deploy. It's also the only approach that doesn't risk data loss.
5. Review ChangeSets — Always
Make this a non-negotiable rule:
# Before every production deployment:
cdk diff
# In CI/CD pipelines:
cdk deploy --require-approval broadening
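No engineer should deploy to production without reading the ChangeSet. Note that --require-approval broadening only pauses for changes that broaden IAM or security group permissions; it does not gate resource deletions or replacements. No CI pipeline should apply those without a separate human review of the diff.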
The five minutes spent reviewing a ChangeSet can save you the 11 hours it takes to restore a database from backup.
The Full Lifecycle — At a Glance
[CDK Code Written]
↓
[cdk synth runs your program]
↓
[Construct Tree built in memory]
↓
[Logical IDs generated from construct paths]
↓
[CloudFormation template written to cdk.out/]
↓
[Assets uploaded to S3/ECR via bootstrap roles]
↓
[ChangeSet submitted to CloudFormation]
↓
[CloudFormation diffs old vs new template]
↓
[Update / Replace / Delete decided per resource]
↓
[AWS APIs called in dependency order]
↓
[Stack state updated — deployment complete]
Key Takeaways
- CDK is a compiler. It produces CloudFormation templates. It does not manage infrastructure directly.
- Logical IDs are resource identity. They're derived from construct paths. Change the path, change the identity.
- Refactoring is not free. Moving constructs is an infrastructure operation, not a code cleanup.
- Context caching affects templates, not drift. Commit cdk.context.json to version control.
- Bootstrap is your security boundary. The execution role determines what CloudFormation can do in each account.
- CloudFormation decides replacement, not CDK. Immutable property changes and Logical ID changes trigger replacement.
- Separation of stateful and stateless is non-negotiable. Your database stack should be the most boring stack in your codebase.
- Deterministic synthesis prevents surprises. Same code, same template, every time.
What's Next
This guide covers the foundation — the mental model every team needs before they can use CDK safely at scale. But there's more ground to cover: multi-account deployment strategies, custom constructs, pipeline architecture, and testing infrastructure code.
I write about cloud architecture, AWS patterns, and the hard-won lessons from building production infrastructure.
If this guide saved you from a future production incident — or explained something you've been struggling with — follow me on LinkedIn. I publish in-depth guides like this regularly.
Let me know in the comments: What's the most painful CDK lesson your team has learned?


