Day 12 of my 30-Day Terraform Challenge was all about one of the most practical and important DevOps concepts: deploying updates without taking an application offline.
Today I learned how Terraform handles infrastructure replacement, why its default behavior can cause downtime, and how to solve that using:
- create_before_destroy
- Auto Scaling Groups
- Application Load Balancers
- a full blue/green deployment strategy
This was one of the most exciting challenge days so far because it felt much closer to how production systems are actually managed.
Why Default Terraform Can Cause Downtime
Terraform is great at managing infrastructure, but not every resource can be updated in place.
Some resources — especially those tied to EC2 instance configuration, such as Launch Templates or Launch Configurations — often require replacement instead of modification.
By default, Terraform follows this sequence:
- Destroy the old resource
- Create the new resource
That is a problem for live infrastructure.
If an Auto Scaling Group is destroyed before the replacement is ready, then:
- the old EC2 instances are terminated
- the application stops serving traffic
- users experience downtime
- only after that does the new infrastructure come online
That gap is the outage window.
In real systems, even a short outage like that can be unacceptable.
The create_before_destroy Solution
Terraform provides a lifecycle setting called create_before_destroy to fix this.
Instead of deleting the old resource first, Terraform reverses the order:
- Create the new resource
- Wait for it to become healthy
- Destroy the old resource
That simple change makes a huge difference.
It allows new infrastructure to come online before the old infrastructure disappears, which is exactly what you want during updates.
This was the key pattern behind today’s zero-downtime deployment.
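As a minimal sketch of where the setting lives (the resource name, the web- prefix, the instance type, and the ami_id variable are illustrative assumptions, not values from the challenge), the lifecycle block sits inside the resource that gets replaced:

```hcl
# Hypothetical input: the AMI to launch. Not taken from the original setup.
variable "ami_id" {
  type = string
}

resource "aws_launch_configuration" "example" {
  # name_prefix lets the provider append a unique suffix, so the old and
  # new launch configurations can coexist during the swap.
  name_prefix   = "web-"
  image_id      = var.ami_id
  instance_type = "t3.micro"

  lifecycle {
    # Create the replacement first, then destroy the old resource.
    create_before_destroy = true
  }
}
```

A launch configuration is used here only because any change to it forces a replacement, which makes the ordering easy to see. AWS now recommends launch templates for new setups, and the same lifecycle block can be added to other resource types.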
The Auto Scaling Group Naming Problem
There is an important catch with create_before_destroy.
When Terraform creates the replacement resource before destroying the original one, both resources must exist at the same time.
That becomes a problem with Auto Scaling Groups because AWS does not allow two ASGs with the same name to exist simultaneously.
So if your ASG name is fixed, the deployment fails.
The Fix
The correct solution is to avoid hardcoding the name and instead use something like:
- name_prefix
- or a generated unique suffix
I used name_prefix so Terraform's AWS provider could generate unique names for replacement Auto Scaling Groups during deployment.
That solved the conflict and made zero-downtime replacement possible.
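A sketch of that naming fix might look like the following. All variable names, sizes, and the resource name web are assumptions for illustration. It also assumes the launch configuration name is passed in and changes whenever the instance configuration changes.

```hcl
# Hypothetical inputs, not taken from the original configuration.
variable "launch_configuration_name" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

variable "target_group_arn" {
  type = string
}

resource "aws_autoscaling_group" "web" {
  # Changing name_prefix forces a new ASG. Tying it to the launch
  # configuration name means a new launch configuration yields a new,
  # uniquely named ASG instead of a name collision.
  name_prefix          = "${var.launch_configuration_name}-"
  launch_configuration = var.launch_configuration_name
  vpc_zone_identifier  = var.subnet_ids
  target_group_arns    = [var.target_group_arn]
  health_check_type    = "ELB"

  min_size = 2
  max_size = 4

  # Wait until this many instances pass load balancer health checks
  # before the new ASG counts as created (and the old one is destroyed).
  min_elb_capacity = 2

  lifecycle {
    create_before_destroy = true
  }
}
```

The min_elb_capacity argument is what makes the "wait until healthy" step explicit: without it, Terraform does not wait on load balancer health checks before removing the old group.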
Deploying Version 1
For the first deployment, I launched my infrastructure with a simple application response showing version 1.
Once the Auto Scaling Group instances passed health checks and the load balancer reported its targets as healthy, I verified the application in the browser.
This gave me my initial “live” environment.
At that point, traffic was being served successfully by the original deployment.
Verifying Zero Downtime with a Traffic Loop
To actually prove that my deployment was zero-downtime, I opened a second terminal and ran a continuous traffic check against the ALB.
The goal was simple:
- keep sending requests during deployment
- observe whether traffic ever failed
- confirm that the application stayed available the whole time
This is a very practical way to test infrastructure changes because it simulates a real user continuously accessing the app while updates are happening.
Deploying Version 2
Next, I updated the application response from v1 to v2.
This forced Terraform to replace the underlying instance configuration.
Normally, this kind of change could cause downtime.
But because I had configured Terraform with create_before_destroy, the update behaved differently:
- the replacement infrastructure was created first
- the new instances came online
- health checks passed
- traffic remained available
- then the old resources were removed
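One way a v1 to v2 change can be wired up is through a variable that feeds the instance startup script. This is a sketch under assumptions: the variable name server_text, port 8080, and the busybox web server are illustrative and may differ from the actual setup.

```hcl
# Hypothetical variable controlling the response text.
variable "server_text" {
  description = "Text returned by the demo web server"
  type        = string
  default     = "v1"
}

locals {
  # Rendered startup script. Changing server_text changes this string,
  # and a launch configuration that uses it as user_data must then be
  # replaced, because launch configurations cannot be edited in place.
  user_data = <<-EOF
    #!/bin/bash
    echo "${var.server_text}" > index.html
    nohup busybox httpd -f -p 8080 &
  EOF
}
```

With local.user_data passed as the user_data of the launch configuration, running terraform apply -var 'server_text=v2' would trigger the replacement sequence listed above.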
What I Observed
The traffic loop kept returning responses during the entire deployment.
At one point, the output changed from:
- v1
to:
- v2
The important part is that there were:
- no connection errors
- no timeouts
- no interruption in service
That was my clearest proof that the deployment was truly zero-downtime.
This was honestly one of the most satisfying Terraform experiments I’ve done so far.
Taking It Further with Blue/Green Deployment
After proving a zero-downtime replacement update, I moved on to a more advanced deployment strategy:
Blue/Green Deployment
Blue/green deployment keeps two complete environments running at the same time:
- Blue = current live environment
- Green = new environment
Instead of replacing the live environment directly, you deploy the new version separately and then shift traffic when it is ready.
This pattern is extremely powerful because it makes rollbacks much easier and reduces deployment risk.
How the Traffic Switch Works
I implemented blue/green using:
- two target groups
- two Auto Scaling Groups
- one ALB listener rule
A variable called active_environment controlled which target group received traffic.
That meant I could switch between:
- blue
- green
just by changing one Terraform variable and applying again.
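The switch can be sketched roughly like this. Only active_environment comes from the setup described above. The other variable names, the rule priority, and the catch-all path pattern are assumptions for illustration, and the listener and both target groups are assumed to exist already.

```hcl
variable "active_environment" {
  description = "Which environment receives live traffic: blue or green"
  type        = string
  default     = "blue"

  validation {
    condition     = contains(["blue", "green"], var.active_environment)
    error_message = "The active_environment value must be \"blue\" or \"green\"."
  }
}

# Hypothetical inputs for resources created elsewhere.
variable "listener_arn" {
  type = string
}

variable "blue_target_group_arn" {
  type = string
}

variable "green_target_group_arn" {
  type = string
}

locals {
  # Map each environment name to its target group.
  target_group_arns = {
    blue  = var.blue_target_group_arn
    green = var.green_target_group_arn
  }
}

resource "aws_lb_listener_rule" "active" {
  listener_arn = var.listener_arn
  priority     = 100

  # Match every request path.
  condition {
    path_pattern {
      values = ["*"]
    }
  }

  # Forward to whichever target group is currently active.
  action {
    type             = "forward"
    target_group_arn = local.target_group_arns[var.active_environment]
  }
}
```

Running terraform apply -var 'active_environment=green' then changes only the listener rule's target group. Both Auto Scaling Groups stay up, which is what keeps rollback to blue a one-variable change.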
Why This Is So Effective
The traffic switch happens at the load balancer level.
That means Terraform does not need to tear down one environment before enabling the other.
Instead, the ALB updates where traffic is routed.
This makes the cutover:
- fast
- clean
- effectively instantaneous
That is what makes blue/green deployment such a strong production deployment pattern.
Testing the Blue/Green Switch
After both environments were deployed, I tested switching traffic from:
- blue → green
- then green → blue
Each time, the Terraform apply completed successfully and traffic moved to the selected environment.
There was no observable interruption while switching.
This was a really useful demonstration because it showed how infrastructure can support safe application rollouts and fast rollback paths.
That is exactly the kind of deployment flexibility teams want in production systems.
Limitations of create_before_destroy
Even though create_before_destroy is powerful, it has tradeoffs.
Some limitations include:
- You temporarily run duplicate infrastructure, which increases cost
- Some AWS resources still have naming or uniqueness constraints
- It does not provide fine-grained traffic control
- It is still tied to replacement behavior rather than explicit traffic switching
This is where blue/green becomes more powerful.
Tradeoffs of Blue/Green Deployment
Blue/green solves several of the limitations of create_before_destroy, but it introduces its own tradeoffs.
Advantages:
- safer rollouts
- fast rollback
- cleaner traffic switching
- better deployment isolation
Tradeoffs:
- higher cost because two environments run at once
- more infrastructure complexity
- more moving parts to manage
So while blue/green is more powerful, it also requires more operational discipline.
What I Learned from Day 12
Today taught me that infrastructure updates are not just about changing code — they are about changing systems safely.
The biggest lessons for me were:
- why Terraform’s default replacement behavior can be dangerous
- how create_before_destroy avoids downtime
- why naming strategy matters in AWS
- how blue/green deployment gives even more control
- how to verify deployment safety using real traffic
This was one of the most production-relevant challenge days so far.
Final Thoughts
Day 12 made Terraform feel much closer to real-world DevOps work.
It moved beyond simply provisioning infrastructure and into something more valuable:
deploying live updates safely and predictably
That is one of the most important skills in cloud and infrastructure engineering.
And honestly, seeing traffic switch from v1 → v2 and blue → green without downtime was very satisfying.
Connect With Me
I’m documenting my journey through the 30-Day Terraform Challenge as I continue learning more about:
- Terraform
- AWS
- Infrastructure as Code
- DevOps best practices
If you are also learning cloud or IaC, feel free to connect.