I Got Tired of Lambda Deployment Headaches, So I Built My Own Blue-Green Deployment System
We've all had that moment.
You push a new Lambda version to production, hit deploy, and then spend the next few minutes refreshing dashboards and logs hoping nothing breaks.
I've been there more times than I'd like to admit.
Sometimes deployments worked perfectly. Other times I would see unexpected 503 errors, traffic routing issues, or situations where rolling back wasn't as fast as I wanted. The Lambda deployment itself wasn't usually the problem. The challenge was everything around it—traffic routing, validation, and rollback.
After hitting the same deployment issues multiple times, I decided to build a deployment model that matched how I actually wanted releases to work.
The result was a Terraform-driven blue-green deployment framework for AWS Lambda that gives me:
- Zero-downtime deployments
- IP-based canary testing
- Weighted traffic shifting
- Instant rollback
- No CodeDeploy
- No Route53 dependency
- No hardcoded VPC IDs
Everything is controlled through Terraform.
The Problem I Wanted to Solve
AWS already provides deployment options for Lambda through aliases and CodeDeploy.
For many teams, that's enough.
But in my case, there were a few limitations that kept bothering me.
1. I Wanted to Test Production Before Users Did
When deploying a new version, I wanted a way to access the new release from my own machine while everyone else continued using the stable version.
With standard Lambda deployments, that wasn't straightforward.
2. Rollbacks Needed To Be Instant
If something goes wrong after deployment, I don't want to wait for another deployment process.
I want traffic to switch back immediately.
3. I Wanted Full Traffic Control
Sometimes I want:
- 10% rollout
- 25% rollout
- 50% rollout
- 75% rollout
Not just predefined deployment strategies.
4. I Wanted Less Moving Parts
CodeDeploy works, but it introduces additional services, configurations, hooks, and deployment definitions.
For this project, I wanted something simpler.
The Idea
Instead of thinking about Lambda deployments as version updates, I started thinking about them the same way ECS blue-green deployments work.
What if two Lambda environments always existed?
One would be stable.
One would contain the new release.
Traffic could simply move between them.
That led to this architecture:
- Blue Lambda (current production)
- Green Lambda (new version)
- Blue Target Group
- Green Target Group
- ALB Listener Rules controlling traffic
The important detail is that both environments always exist.
Nothing is destroyed.
Nothing is recreated.
Traffic is simply redirected.
That's what eliminates deployment interruptions.
Everything Is Controlled Through Terraform
The deployment workflow is driven through a simple Terraform configuration.
nginx-lambda = {
family = "nginx-svc"
blue_image = ".../nginx-lambda:blue"
green_image = ".../nginx-lambda:green"
active_color = "blue"
activate_canary = false
green_weight = 0
}
Whenever I want to change deployment behavior, I update the values and run:
terraform apply
That's it.
No deployment pipelines.
No manual traffic switching.
No complicated release process.
Phase 1: Canary Testing
The first thing I do after deploying a new version is test it myself.
I enable canary mode:
activate_canary = true
green_weight = 0
This creates a higher-priority ALB rule.
If requests come from my whitelisted IP address:
My IP → Green Version
Everyone else still reaches:
Users → Blue Version
This allows me to validate:
- Application functionality
- API responses
- Headers
- Authentication flows
- Database interactions
All in production.
Without exposing the new version to actual users.
Phase 2: Gradual Traffic Shifting
Once canary testing looks good, I begin moving real traffic.
For example:
green_weight = 50
Now:
50% → Blue
50% → Green
The Application Load Balancer handles traffic distribution automatically.
Because both target groups already exist and already have Lambda functions attached, traffic shifting happens smoothly.
No downtime.
No target registration delays.
No deployment gaps.
Phase 3: Full Promotion
When confidence is high enough, I move all traffic to Green.
green_weight = 100
At this point:
100% Users → Green
But here's the part I like most.
Blue still exists.
It's sitting there untouched.
Ready for rollback.
Phase 4: Instant Rollback
Every deployment system claims to support rollback.
The real question is:
How fast?
With this setup, rollback is literally:
green_weight = 0
Apply Terraform.
Traffic immediately returns to Blue.
No redeployment.
No waiting for Lambda versions.
No waiting for target group registration.
No waiting for containers to start.
The previous stable environment never disappeared.
What About The Next Release?
This was actually one of the hardest design problems.
Let's say:
- Blue = v1
- Green = v2
After promoting v2, what happens when v3 arrives?
The answer is surprisingly simple.
Rotate the images.
blue_image = ".../nginx-lambda:green"
green_image = ".../nginx-lambda:yellow"
Now:
Blue = v2 (stable)
Green = v3 (candidate)
The deployment cycle repeats.
Canary.
Weighted rollout.
Promotion.
Rollback if needed.
Forever.
Why I Stopped Seeing 503 Errors
Before building this, I experimented with several approaches.
Lambda Alias Switching
This occasionally introduced timing issues during updates.
Dynamic Target Groups
Using conditional resources sometimes caused Terraform to destroy resources before listener rules updated.
Permission Propagation Delays
Lambda permissions occasionally weren't fully propagated before attachments occurred.
The common problem was simple:
Resources were being created and destroyed during deployment.
Eventually I realized the best deployment is often the one that changes the least.
So I stopped changing infrastructure during releases.
Instead:
- Blue always exists
- Green always exists
- Target groups always exist
- Attachments always exist
The only thing that changes is traffic routing.
That's why the process is so reliable.
GitHub Repository
https://github.com/jayakrishnayadav24/lambda-blue-green-deployment
Final Thoughts
The biggest lesson from building this wasn't about Lambda or Terraform.
It was about reducing moving parts.
Instead of making deployments smarter, I made them simpler.
Both environments stay alive.
Traffic moves gradually.
Rollback is immediate.
And I can personally validate a release before customers ever touch it.
After several production deployments using this pattern, it's become one of my favorite deployment strategies for Lambda workloads.
If you've been fighting deployment-related downtime or complicated release workflows, you might find this approach useful too.











Top comments (0)