Most ECS blue-green deployment tutorials eventually lead to the same stack:
- AWS CodeDeploy
- Deployment groups
- AppSpec files
- Lifecycle hooks
- Weighted traffic shifting
- Complex rollback orchestration
And while CodeDeploy works, I kept running into one practical limitation during real deployments:
I couldn’t let my internal team validate a new release on the actual production URL before exposing it to customers.
That became the entire motivation behind this setup.
I didn’t want:
- separate staging domains
- duplicate ALBs
- temporary preview environments
- “almost production” testing
I wanted something much simpler:
- Internal users should see the new version first
- Customers should continue seeing the stable version
- Both should use the same production domain
- Rollback should be immediate
- Deployments should remain fully zero downtime
So I built a Terraform-driven deployment workflow, without CodeDeploy, using:
- ECS Fargate
- Application Load Balancer (ALB)
- ALB listener priorities
- Source IP routing
After running this setup in practice, I ended up preferring it for many ECS workloads.
The Core Idea
Both BLUE and GREEN environments run behind the same ALB.
Internal office/VPN IPs get routed to GREEN first.
Everyone else continues hitting BLUE.
That means QA and internal teams can validate the new release directly on the real production infrastructure before public rollout begins.
Same:
- domain
- SSL certificate
- ALB
- authentication flow
- redirects
- networking path
No “staging surprises” later.
A lot of deployment issues only appear on the real production routing path.
Real Example
Internal users open:
https://nginx.jayakrishnayadav.cloud
…and immediately see the GREEN version.
Meanwhile, public users continue seeing BLUE.
No DNS switching.
No duplicate infrastructure.
Just ALB listener routing.
Architecture Overview
The deployment flow looks like this:
                   ┌────────────────────┐
                   │   Application LB   │
                   └─────────┬──────────┘
                             │
            ┌────────────────┴────────────────┐
            │                                 │
 Internal Office/VPN IPs                Public Users
            │                                 │
            ▼                                 ▼
   GREEN Target Group                 BLUE Target Group
            │                                 │
     ECS GREEN Tasks                   ECS BLUE Tasks
The canary routing rule gets evaluated first.
If the request source IP matches internal CIDRs, traffic goes to GREEN.
Everything else falls back to BLUE.
Terraform Structure
I kept the Terraform layout modular so it could be reused across multiple services.
.
├── main.tf
├── variables.tf
├── outputs.tf
├── env/
│   ├── backend.hcl
│   └── terraform.tfvars
├── modules/
│   ├── vpc/
│   ├── iam/
│   ├── alb/
│   ├── ecs-cluster/
│   └── ecs-blue-green-service/
└── scripts/
    └── zero-downtime-test.sh
Each ECS service gets:
- BLUE ECS service
- GREEN ECS service
- BLUE target group
- GREEN target group
- production listener rule
- optional canary listener rule
ALB Listener Rule Logic
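The entire deployment behavior depends on ALB listener rule priorities.
An ALB evaluates rules in ascending priority order and applies the first one that matches, so the canary rule (priority 99) is checked before the production rule (priority 100).
If the request source IP matches the internal CIDRs and the host header matches, traffic gets forwarded to GREEN: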
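resource "aws_lb_listener_rule" "canary" {
  # Only created while canary routing is switched on.
  count = var.activate_canary ? 1 : 0

  # Required by aws_lb_listener_rule; var.listener_arn is an assumed name.
  listener_arn = var.listener_arn
  priority     = 99

  # Both conditions must match: internal source IP AND the production host.
  condition {
    source_ip {
      values = var.canary_source_ips
    }
  }

  condition {
    host_header {
      values = ["nginx.jayakrishnayadav.cloud"]
    }
  }

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.green.arn
  }
}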
The production rule remains below it:
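resource "aws_lb_listener_rule" "production" {
  # Required by aws_lb_listener_rule; var.listener_arn is an assumed name.
  listener_arn = var.listener_arn
  priority     = 100

  condition {
    host_header {
      values = ["nginx.jayakrishnayadav.cloud"]
    }
  }

  action {
    type = "forward"
    # Points at the BLUE or GREEN target group depending on promotion state.
    target_group_arn = local.active_target_group
  }
}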
That’s it.
No weighted routing.
No lifecycle hooks.
Just listener priorities.
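To make the first-match behavior concrete, here is a small shell model of how the two rules interact. It is a toy sketch, not how the ALB is implemented: the rule string format (priority:requires_internal:target) and the function name resolve_target are made up for illustration.

```shell
#!/usr/bin/env bash
# Toy model of ALB listener rule evaluation (illustrative only).
# Each rule is a string "priority:requires_internal:target"; this format is
# invented for the sketch and is not an AWS or Terraform construct.

resolve_target() {
  local client_is_internal=$1   # "yes" if the source IP is inside the canary CIDRs
  shift                         # remaining arguments are the rules
  local match
  # Lowest priority number is evaluated first; the first matching rule wins.
  match=$(printf '%s\n' "$@" | sort -t: -k1,1n |
    while IFS=: read -r priority requires_internal target; do
      [ -n "$priority" ] || continue
      if [ "$requires_internal" = "no" ] || [ "$client_is_internal" = "yes" ]; then
        echo "$target"
        break
      fi
    done)
  # No rule matched: the listener's default action would apply.
  echo "${match:-default-action}"
}
```

Calling resolve_target yes "100:no:BLUE" "99:yes:GREEN" prints GREEN, and the same call with no prints BLUE. Dropping the priority 99 rule, which is what activate_canary = false does, sends internal users to BLUE as well.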
Real Deployment Workflow
This wasn’t built as a theoretical architecture exercise.
I tested the rollout flow directly from Terraform while continuously validating traffic behavior against live ECS Fargate services.
Terraform initialization:
terraform init -backend-config=env/backend.hcl
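Deployment apply. Note that -lock=false disables Terraform state locking and -auto-approve skips the interactive plan review, so drop both when more than one person or pipeline can apply against the same state: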
terraform apply \
-var-file=env/terraform.tfvars \
-lock=false \
-auto-approve
During canary validation, I continuously verified my public IP:
curl ifconfig.me
That mattered because the ALB source-IP rule decides whether a request reaches BLUE or GREEN.
Once my IP matched the configured canary CIDRs, traffic immediately started routing to GREEN.
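Before applying, it can help to check locally whether the address curl reports actually falls inside one of the canary CIDRs. The helper below is a minimal IPv4-only sketch: the function names are mine, and it assumes well-formed input with an explicit prefix length. Keep in mind that the ALB matches on the address it sees on the connection, so behind a proxy or NAT gateway that is the proxy’s or gateway’s address.

```shell
#!/usr/bin/env bash
# IPv4-only CIDR membership check (sketch; no input validation).

# Convert a dotted-quad address such as 10.0.0.1 into a single integer.
ip_to_int() {
  local IFS=.
  set -- $1          # split the address on dots into $1..$4
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeeds (exit 0) if the address in $1 is inside the CIDR block in $2.
# $2 must include a prefix length, e.g. 160.30.39.198/32.
ip_in_cidr() {
  local ip=$1 cidr=$2
  local network=${cidr%/*}
  local bits=${cidr#*/}
  # Build a 32-bit netmask with the top $bits bits set.
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$network") & mask )) ]
}

# Example (needs network access, so it is left commented out):
# ip_in_cidr "$(curl -s ifconfig.me)" 160.30.39.198/32 && echo "canary rule should match"
```

If the check fails even though you are on the office network or VPN, the egress IP has probably changed and the canary CIDR list needs updating before the canary rule will match.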
Deployment Flow
The nice part about this setup is that every stage of a deployment is driven by three Terraform variables.
Step 1 — Normal Production State
BLUE handles all production traffic.
GREEN remains scaled down.
enable_canary = false
activate_canary = false
promote_to_all = false
Apply:
terraform apply \
-var-file=env/terraform.tfvars \
-lock=false \
-auto-approve
Result:
- BLUE active
- GREEN inactive
- minimal Fargate cost
Step 2 — Start GREEN Tasks
Now we start the GREEN environment.
enable_canary = true
activate_canary = false
promote_to_all = false
Apply again:
terraform apply \
-var-file=env/terraform.tfvars \
-lock=false \
-auto-approve
At this stage:
- GREEN tasks start
- ECS health checks complete
- ALB target registration completes
- no production traffic reaches GREEN yet
Users never hit partially starting containers.
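One way to confirm that GREEN really is ready before moving on is to look at the target health states the ALB reports. The helper below is a sketch under a few assumptions: GREEN_TG_ARN is a placeholder for the GREEN target group ARN, and the GREEN target group is already attached to a listener (for example the optional internal one), since targets in a group that no listener uses are reported as unused rather than healthy.

```shell
#!/usr/bin/env bash
# Succeeds only if at least one state is passed and every state is "healthy".
all_targets_healthy() {
  [ "$#" -gt 0 ] || return 1
  local state
  for state in "$@"; do
    [ "$state" = "healthy" ] || return 1
  done
}

# Example with the AWS CLI (commented out; GREEN_TG_ARN is a placeholder):
# states=$(aws elbv2 describe-target-health \
#   --target-group-arn "$GREEN_TG_ARN" \
#   --query 'TargetHealthDescriptions[].TargetHealth.State' \
#   --output text)
# all_targets_healthy $states && echo "GREEN is ready for canary traffic"
```

Treating an empty list as not ready is deliberate: a target group with no registered targets should not count as a healthy GREEN environment.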
Step 3 — Internal Canary Validation
Now we enable canary routing.
enable_canary = true
activate_canary = true
promote_to_all = false
Apply again:
terraform apply \
-var-file=env/terraform.tfvars \
-lock=false \
-auto-approve
Now:
- internal office/VPN users hit GREEN
- public users continue hitting BLUE
This became the most valuable phase of the deployment workflow.
Because now:
- QA validates production behavior
- developers inspect logs
- authentication flows get tested
- sessions and redirects get verified
while customers remain completely unaffected.
Internal Canary Routing
With canary routing enabled, the ALB listener shows two rules for the host.
The priority 99 rule matches internal source IPs and forwards them to GREEN, while the priority 100 rule keeps everyone else on BLUE.
Step 4 — Promote GREEN to Production
Once validation looks good:
enable_canary = true
activate_canary = false
promote_to_all = true
Apply again:
terraform apply \
-var-file=env/terraform.tfvars \
-lock=false \
-auto-approve
Now:
- production listener switches to GREEN
- BLUE scales down
- all users see the new version
No downtime occurs.
Traffic simply moves from one target group to another.
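The four flag combinations used in the steps above can be captured in a small guard, which is handy if a wrapper script or pipeline edits the tfvars for you. This is only a sketch: the phase names are labels I made up, and any combination not shown in this post is rejected rather than guessed at.

```shell
#!/usr/bin/env bash
# Map the three tfvars flags to the deployment phase they represent.
# Arguments, in order: enable_canary activate_canary promote_to_all
deployment_phase() {
  case "$1/$2/$3" in
    false/false/false) echo "steady" ;;    # Step 1: BLUE serves everyone, GREEN scaled down
    true/false/false)  echo "warm-up" ;;   # Step 2: GREEN tasks start, no traffic routed to them
    true/true/false)   echo "canary" ;;    # Step 3: internal CIDRs reach GREEN, public stays on BLUE
    true/false/true)   echo "promoted" ;;  # Step 4: production rule points at GREEN
    *)
      echo "undocumented flag combination: $1/$2/$3" >&2
      return 1
      ;;
  esac
}
```

For example, deployment_phase true true false prints canary, while a combination such as false/true/true returns a non-zero status, so a wrapper can refuse to run terraform apply with it.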
Verifying Zero Downtime
I didn’t want to assume the deployment was safe.
I wanted to verify it during the rollout itself.
So I used a simple curl-based validation script that repeatedly hit both applications while traffic shifted between BLUE and GREEN.
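#!/usr/bin/env bash
# Hit both applications 100 times and report the status code and serving color.
for i in {1..100}
do
  for url in \
    "https://nginx.jayakrishnayadav.cloud/" \
    "https://apache.jayakrishnayadav.cloud/"
  do
    # -k skips TLS verification, -s hides progress, -w appends the status code.
    response=$(curl -k -s -w " HTTPSTATUS:%{http_code}" "$url")
    body=${response% HTTPSTATUS:*}
    status=${response##*HTTPSTATUS:}

    # Each page body carries a "BLUE - v" or "GREEN - v" version marker.
    if [[ $body == *"BLUE - v"* ]]; then
      color="BLUE"
    elif [[ $body == *"GREEN - v"* ]]; then
      color="GREEN"
    else
      color="UNKNOWN"
    fi

    echo "Run: $i | URL: $url | Status: $status | Version: $color"
  done
done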
During deployment, the output showed:
- HTTP 200 responses throughout deployment
- no failed requests
- no 503s
- clean traffic movement from BLUE to GREEN
That confirmed the deployment was genuinely zero downtime.
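Rather than eyeballing the output, the lines printed by the loop can be tallied. The function below is a sketch that assumes the exact "Run: ... | URL: ... | Status: ... | Version: ..." format echoed by the script; summarize_run is a name I picked. Anything other than a 200 counts as a failure, including the 000 that curl reports when it gets no HTTP response at all.

```shell
#!/usr/bin/env bash
# Read validation-script output on stdin and print one summary line.
summarize_run() {
  awk -F' [|] ' '
    {
      # $3 looks like "Status: 200", $4 looks like "Version: BLUE".
      sub(/^Status: /, "", $3)
      sub(/^Version: /, "", $4)
      total++
      if ($3 != "200") failed++
      seen[$4]++
    }
    END {
      printf "total=%d failed=%d blue=%d green=%d unknown=%d\n",
        total, failed, seen["BLUE"], seen["GREEN"], seen["UNKNOWN"]
    }'
}

# Example, assuming the loop is saved as scripts/zero-downtime-test.sh:
# ./scripts/zero-downtime-test.sh | tee run.log | summarize_run
```

A run that stays at failed=0 with both blue and green counts above zero is what a clean cutover looks like at this request rate. Since the requests are sequential, very short gaps between them could still go unnoticed.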
Production Promotion View
After promotion:
- the canary rule disappears
- the production listener points directly to GREEN
- all traffic reaches the new version
- BLUE scales down to zero
Clean and simple.
Rollback
Rollback became extremely simple.
I just reverted the Terraform variables:
enable_canary = false
activate_canary = false
promote_to_all = false
Apply Terraform again:
terraform apply \
-var-file=env/terraform.tfvars \
-lock=false \
-auto-approve
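Once the apply completes, the ALB routes traffic back to BLUE.
This is only instant while BLUE tasks are still running: if BLUE has already scaled to zero after a promotion, its tasks have to start and pass health checks before they can serve traffic.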
The rollback process stays predictable because traffic switching is entirely controlled through ALB listener rules.
HTTPS Configuration
The ALB uses ACM certificates for HTTPS.
Listeners:
- Port 80 → redirect to HTTPS
- Port 443 → production traffic
- optional internal listener → restricted to internal CIDRs
Example:
test_listener_allowed_cidrs = [
"160.30.39.198/32"
]
That keeps internal preview traffic private while still using the same production infrastructure.
Cost Optimization
One thing I specifically wanted to avoid was permanently doubling infrastructure cost.
Normal state:
- only BLUE tasks run
Deployment window:
- BLUE + GREEN both run temporarily
After promotion:
- BLUE scales down again
So infrastructure cost only increases briefly during deployments.
Final Thoughts
This project started because I wanted a very practical deployment workflow:
Internal users should validate the new version on the actual production URL before customers ever see it.
Once I implemented that using ALB listener priorities and source IP routing, I realized I no longer really needed CodeDeploy for this workflow.
The end result became:
- simpler
- easier to operate
- easier to roll back
- easier to debug
- easier to reason about
- fully zero downtime
And because everything is Terraform-driven, the deployment process stays reproducible and predictable.
GitHub Repository
Full Terraform implementation:
https://github.com/jayakrishnayadav24/ecs-blue-green-deployment/tree/canary



