When deploying applications to production, the method you use to replace the old code with the new code dictates your risk level. Historically, deployments meant overwriting the live server. Today, modern architectures use load balancers to shift traffic dynamically, drastically reducing the blast radius of a bad release.
Here is an in-depth look at how these deployment strategies work, complete with architectural flows, followed by a manual, CLI-driven hands-on guide to deploying a FastAPI container from Amazon ECR to ECS using AWS CodeDeploy.
The Danger of "In-Place" Deployments
Deploying without traffic shifting is known as an In-Place Deployment.
In practice, this means your server stops serving the current version (Version 1) and replaces it with the new version (Version 2) on the same machine or instance. There is no buffer, no parallel environment, just a direct swap.
- The Risk: During the replacement, your server is either down or running in a partially updated state. If Version 2 contains a fatal bug, every single user hitting your server experiences the failure immediately. There is no subset of users; it is all or nothing.
- The Rollback: You must manually re-deploy the old version, wait for it to boot, and wait for it to pass health checks. Depending on your setup, this can mean minutes of active downtime while you scramble to fix it.
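To make the mechanics concrete, here is a minimal sketch of what an in-place deploy often looks like. The hostname, service name, and paths are placeholders, and the exact commands depend entirely on your setup:

ssh app-server <<'EOF'
sudo systemctl stop myapp                # Version 1 goes offline here
cd /opt/myapp && git pull origin main    # swap in Version 2 on the same machine
sudo systemctl start myapp               # downtime ends only if Version 2 boots cleanly
curl -sf http://localhost/health || echo "Version 2 is broken and users already see it"
EOF

Everything between the stop and a healthy start is user-visible downtime, which is exactly the window traffic shifting eliminates.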
The Traffic Shifting Alternatives
Traffic shifting decouples infrastructure provisioning from the release. Instead of swapping the running server, you spin up the new version completely in the background. A Load Balancer sits in front of your servers and acts as a traffic cop: you manipulate its routing rules to control which users see which version, without ever taking the old version offline.
Blue/Green Deployment
Blue/Green provisions a completely identical, parallel server environment alongside the live one.
- Blue is your current live server; it is serving 100% of your users right now.
- Green is the new server; it is fully provisioned and running, but the load balancer is not sending anyone to it yet.
The steps are:
- Provision: Spin up the Green server with Version 2. Users still hit Blue.
- Test: Send internal or test traffic directly to Green to verify it works correctly.
- Shift: Instruct the load balancer to swap: Blue receives 0% of traffic, Green receives 100%. The switch is instant.
- Rollback: If something goes wrong, the load balancer just points back to Blue. Blue was never torn down, so the rollback is instant with zero re-provisioning.
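Under the hood, the Shift step is little more than a weighted-routing change on the load balancer. As a rough sketch (the ARNs below are placeholders, and this is not literally what CodeDeploy runs, but it is the same ALB API):

BLUE_TG="arn:aws:elasticloadbalancing:ap-southeast-1:123456789012:targetgroup/blue/abc"
GREEN_TG="arn:aws:elasticloadbalancing:ap-southeast-1:123456789012:targetgroup/green/def"
LISTENER="arn:aws:elasticloadbalancing:ap-southeast-1:123456789012:listener/app/demo/ghi/jkl"

# Send 100% of traffic to Green; rolling back is the same call with the weights reversed
aws elbv2 modify-listener --listener-arn "$LISTENER" \
  --default-actions '[{"Type":"forward","ForwardConfig":{"TargetGroups":[
    {"TargetGroupArn":"'"$BLUE_TG"'","Weight":0},
    {"TargetGroupArn":"'"$GREEN_TG"'","Weight":100}]}}]'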
Canary Deployment
Canary deployments use the same two-server setup as Blue/Green, but instead of an instant full switch, the load balancer gradually bleeds a small percentage of real user traffic to the new version while the majority stays on the old one.
The name comes from the old mining practice of sending a canary into a coal mine first: if it survives, it's safe for everyone else.
The steps are:
- Provision: Spin up the new server with Version 2 in the background.
- Shift: The load balancer routes 10% of real user traffic to the new server, and 90% stays on the old one.
- Monitor: Watch your error rates, latency, and logs during this window. Only a small fraction of users are exposed to any potential bug.
- Complete: If the 10% remains stable for a set bake time (e.g., 5 minutes), the load balancer shifts the remaining 90%. If errors spike, traffic is shifted back to the original server before the damage spreads.
The key difference from Blue/Green is the graduated exposure. Instead of betting everything on a single switch, you use real user traffic as a controlled test, with an automatic escape hatch if things go wrong.
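If you were to script a canary by hand, it would look roughly like the following (reusing the placeholder ARNs from the sketch above, plus a hypothetical alarm name). This is precisely the fragile glue code that CodeDeploy replaces:

set_weights () {  # $1 = percentage of traffic sent to Green
  aws elbv2 modify-listener --listener-arn "$LISTENER" \
    --default-actions '[{"Type":"forward","ForwardConfig":{"TargetGroups":[
      {"TargetGroupArn":"'"$BLUE_TG"'","Weight":'"$((100 - $1))"'},
      {"TargetGroupArn":"'"$GREEN_TG"'","Weight":'"$1"'}]}}]'
}

set_weights 10    # expose 10% of real users to the new version
sleep 300         # 5-minute bake time

# Promote to 100% only if the 5XX alarm stayed quiet; otherwise revert to 0%
STATE=$(aws cloudwatch describe-alarms --alarm-names "my-5xx-alarm" \
  --query 'MetricAlarms[0].StateValue' --output text)
if [ "$STATE" = "OK" ]; then set_weights 100; else set_weights 0; fi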
AWS CodeDeploy
These deployment strategies can easily be implemented with AWS CodeDeploy. But before that, what is AWS CodeDeploy?
AWS CodeDeploy is the delivery fleet. CodeDeploy does not build your code, and it does not run your tests. Its entire job is to take a finished, ready-to-go artifact (like a Docker image or a ZIP file) and place it onto your servers (EC2, ECS, or Lambda) safely.
Why CodeDeploy?
Without CodeDeploy, you would have to write custom, complex scripts to tell your load balancer to shift traffic, wait 5 minutes, check for errors, and then shift more traffic. CodeDeploy abstracts all of that. You just give it an appspec.yaml file, and it acts as the conductor—coordinating the load balancers, the containers, and the health checks to execute those advanced Blue/Green and Canary strategies automatically.
Hands-On: ECR to ECS via CodeDeploy (CLI Workflow)
Prerequisites:
- AWS CLI installed and configured (aws configure)
- Docker CLI installed
- jq installed
- An AWS account with permissions to create CloudFormation stacks
Provision the Stack Using a CloudFormation Template
- In CloudFormation, press Create Stack, then choose With new resources (standard)
- Paste the link https://codedeploy-canary-bluegreen-template.s3.ap-southeast-1.amazonaws.com/CanaryBlueGreenWorkshopStack.template.json into the Amazon S3 URL field so the stack uses this template
- Provide a stack name, preferably CanaryBlueGreenWorkshopStack-<unique number>, then click Next
- Scroll down to the very bottom, tick the checkbox under Capabilities, then click Next
- On the next page, click Submit
Understanding the Infrastructure Stack
Before we push code, let’s look at the "piping" provisioned in your AWS account. These resources work together as a single unit to enable traffic shifting:
- Virtual Private Cloud (VPC): The isolated network where your application lives. To keep costs low, we use public subnets only; by assigning each Fargate task a public IP, containers can reach ECR and Docker Hub directly without expensive NAT Gateways.
- Application Load Balancer (ALB): The "Front Door." It receives all incoming traffic and uses Listener Rules to decide which Target Group gets the request.
- Target Groups (Blue & Green): These are logical groupings of your containers.
  - Blue: Holds the currently running production version (v1).
  - Green: The staging area where the new version (v2) is deployed before anyone sees it.
- ECS Cluster & Fargate Service: The "Compute." We use Fargate, which is serverless—you don't manage EC2 instances; you just provide the Docker image and AWS runs it.
- CodeDeploy & CloudWatch Alarms: The "Safety Officer." CodeDeploy manages the ALB weights. It watches a CloudWatch Alarm (monitoring 5XX errors). If the alarm triggers during the 5-minute Canary window, CodeDeploy immediately shifts traffic back to Blue.
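If you prefer to verify this from the terminal, the stack's resources and outputs (including the ALB URL we will need later) can be listed with the AWS CLI, substituting your own stack name:

aws cloudformation describe-stack-resources \
  --stack-name CanaryBlueGreenWorkshopStack-<unique number> \
  --query 'StackResources[].[ResourceType,LogicalResourceId]' --output table

aws cloudformation describe-stacks \
  --stack-name CanaryBlueGreenWorkshopStack-<unique number> \
  --query 'Stacks[0].Outputs'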
Key Definitions for the Manual Workflow
- Task Definition (task-def.json): Think of this as the Blueprint for the ECS task. It defines which Docker image to use, how much CPU/memory it needs, and which ports are open.
- AppSpec (appspec.yaml): This is the Instruction Manual for CodeDeploy. It tells CodeDeploy which Task Definition to deploy and which container inside that definition should receive the load balancer traffic.
Give ECS Permission to Pull the ECR Image
- Search for the Execution Role and click it
- Press Attach policies and attach AmazonECSTaskExecutionRolePolicy
- Copy the Execution Role ARN for the later steps
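If you would rather do this from the CLI, the equivalent commands look like the following; the role name is a placeholder for whatever execution role your stack created:

aws iam attach-role-policy \
  --role-name <YourExecutionRoleName> \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy

# Print the role ARN to copy into task-def.json later
aws iam get-role --role-name <YourExecutionRoleName> --query 'Role.Arn' --output text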
The FastAPI Application
To install the dependencies, create a virtual environment with python -m venv .venv, install the dependencies with pip install fastapi uvicorn, and freeze them into a requirements.txt with pip freeze > requirements.txt.
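Put together, the setup looks like this (the activate line assumes Linux/macOS; on Windows use .venv\Scripts\activate):

python -m venv .venv
source .venv/bin/activate
pip install fastapi uvicorn
pip freeze > requirements.txt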
main.py
from fastapi import FastAPI

app = FastAPI()

# ---------------------------------------------------------
# WORKSHOP INSTRUCTIONS:
# For Version 1: Set VERSION = "v1"
# For Version 2: Set VERSION = "v2"
# ---------------------------------------------------------
VERSION = "v2"

@app.get("/")
def read_root():
    return {"status": "success", "version": VERSION}

@app.get("/crash")
def crash():
    raise Exception("Intentional failure for workshop demo")

@app.get("/health")
def health_check():
    # Keep this as JSON for the AWS Load Balancer health check
    return {"status": "healthy", "version": VERSION}
Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]
Build and Push the Docker Image to ECR
First, authenticate the Docker CLI to your Amazon ECR registry, build the image, and push it.
- Click View Push Commands to see how to push your image to the ECR registry, and run those commands in your terminal (the AWS CLI and Docker CLI are required for them to work)
- In this case, instead of codedeploy-workshop:latest, use codedeploy-workshop:v2
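For reference, those push commands expand to roughly the following; the account ID is a placeholder, the region matches the workshop's ap-southeast-1, and the tag is v2 as noted above:

aws ecr get-login-password --region ap-southeast-1 | \
  docker login --username AWS --password-stdin <AWS_ACCOUNT_ID>.dkr.ecr.ap-southeast-1.amazonaws.com
docker build -t codedeploy-workshop:v2 .
docker tag codedeploy-workshop:v2 <AWS_ACCOUNT_ID>.dkr.ecr.ap-southeast-1.amazonaws.com/codedeploy-workshop:v2
docker push <AWS_ACCOUNT_ID>.dkr.ecr.ap-southeast-1.amazonaws.com/codedeploy-workshop:v2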
Register the New ECS Task Definition
ECS needs to know about the new image. You define this in a JSON file.
task-def.json
{
  "family": "codedeploy-workshop-task",
  "executionRoleArn": "<The Execution Role ARN you copied earlier>",
  "networkMode": "awsvpc",
  "containerDefinitions": [
    {
      "name": "fastapi-container",
      "image": "<AWS_ACCOUNT_ID>.dkr.ecr.ap-southeast-1.amazonaws.com/codedeploy-workshop:v2",
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 80,
          "protocol": "tcp"
        }
      ]
    }
  ],
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512"
}
Register this new definition with AWS to generate a new revision number (e.g., codedeploy-workshop-task:2).
aws ecs register-task-definition --cli-input-json file://task-def.json
- You will now see the task definition on the ECS Task Definitions page
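You can also confirm the registration from the CLI; the newest revision appears last in the list:

aws ecs list-task-definitions --family-prefix codedeploy-workshop-task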
In another terminal, poll the ALB URL to see which version is being served, and keep the loop running until the CodeDeploy trigger. First, you need to get the ALB URL.
- To get the ALB URL, navigate to your CloudFormation stack and go to the Outputs tab; there you will see the LoadBalancerUrl. Paste it into the command below to call the load balancer.
while true; do
curl -s "http://<ALB_URL>/"
sleep 1
done
Trigger the CodeDeploy Deployment
CodeDeploy uses an appspec.yaml file to understand which Task Definition to deploy and how to map it to the load balancer.
Finding the Task Definition
- Go to the ECS dashboard and, in the sidebar, navigate to Task Definitions
- From there, click on the Task Definition with your stack name
- You will see the different revisions; click the most recent one
- Copy the ARN and paste it into the TaskDefinition field in appspec.yaml
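Alternatively, the same ARN can be fetched from the CLI; when no revision number is given, ECS returns the latest ACTIVE revision:

aws ecs describe-task-definition \
  --task-definition codedeploy-workshop-task \
  --query 'taskDefinition.taskDefinitionArn' --output text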
appspec.yaml
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        # Update this to the new revision you just generated
        TaskDefinition: "arn:aws:ecs:ap-southeast-1:<AWS ACC ID>:task-definition/codedeploy-workshop-task:2"
        LoadBalancerInfo:
          ContainerName: "fastapi-container"
          ContainerPort: 80
Blue/Green Deployment
First, let's take a look at a Blue/Green deployment with CodeDeploy:
aws deploy create-deployment \
--application-name AppECS-workshop-cluster-fastapi-service \
--deployment-group-name DgpECS-workshop-cluster-fastapi-service \
--deployment-config-name CodeDeployDefault.ECSAllAtOnce \
--revision "{\"revisionType\": \"AppSpecContent\", \"appSpecContent\": {\"content\": $(jq -Rs . appspec.yaml)}}"
- You will now see the deployment running with a Blue/Green deployment configuration
- Now check what the while loop from earlier is returning
- You will see that traffic is now being routed to the other container
- The CodeDeploy dashboard also shows the traffic being shifted
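You can also watch the same shift from the CLI. create-deployment prints a deployment ID (d-XXXXXXXXX below is a placeholder), which you can poll until the status flips to Succeeded or Stopped:

aws deploy get-deployment --deployment-id d-XXXXXXXXX \
  --query 'deploymentInfo.status' --output text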
Now let's go back to the original version: edit appspec.yaml to use the original task definition for the v1 server, then trigger another deployment to perform the rollback.
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        # Point this back at the original v1 task definition
        TaskDefinition: "<Find the Task Definition named CanaryBlueGreenWorkshopStackTaskDef>"
        LoadBalancerInfo:
          ContainerName: "fastapi-container"
          ContainerPort: 80
aws deploy create-deployment \
--application-name AppECS-workshop-cluster-fastapi-service \
--deployment-group-name DgpECS-workshop-cluster-fastapi-service \
--deployment-config-name CodeDeployDefault.ECSAllAtOnce \
--revision "{\"revisionType\": \"AppSpecContent\", \"appSpecContent\": {\"content\": $(jq -Rs . appspec.yaml)}}"
- Run the deployment command above to start the rollback, then check the while loop again; once it shows v1, the rollback has completed
Canary Deployment
After rolling back to v1, point appspec.yaml back at the v2 task definition and let's try deploying with a Canary configuration. We will also see how CodeDeploy automatically stops shifting traffic to the replacement server once errors appear.
aws deploy create-deployment \
--application-name AppECS-workshop-cluster-fastapi-service \
--deployment-group-name DgpECS-workshop-cluster-fastapi-service \
--deployment-config-name CodeDeployDefault.ECSCanary10Percent5Minutes \
--revision "{\"revisionType\": \"AppSpecContent\", \"appSpecContent\": {\"content\": $(jq -Rs . appspec.yaml)}}"
Once this command executes, CodeDeploy takes over. It spins up the replacement container, waits for the /health endpoint to pass, and instructs the ALB to send 10% of traffic to the new containers. If the CloudWatch Alarms remain quiet for 5 minutes, it routes the remaining 90% and gracefully terminates the old containers.
See the deployment in action during the 5-minute window:
while true; do
curl -s "http://<ALB_URL>/"
sleep 1
done
- As you can see, the response occasionally switches to v2, since 10% of requests are redirected to the replacement server
Call the crash endpoint in another terminal
while true; do
curl -s "http://<ALB_URL>/crash"
sleep 0.5
done
- This triggers the CloudWatch alarm for 5XX errors
- This then triggers CodeDeploy to stop the traffic shift to the replacement task, start the rollback, and keep 100% of traffic on the original
- As you can see in the Deployment Details, the rollback succeeded once it saw the alarm
- You might wonder why it shows 100% to the replacement when it is supposed to be 100% to the original
- That's because the rollback is itself a deployment: in it, the "replacement" is your original v1 task set (the 90%), and the "original" is the v2 task set that was receiving the 10% earlier
Clean Up
- Navigate to our stack in CloudFormation and press Delete Stack. This will start deleting the resources you used, cleaning up your environment and avoiding further cost.
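The CLI equivalent, if you prefer to stay in the terminal (substitute your stack name):

aws cloudformation delete-stack --stack-name CanaryBlueGreenWorkshopStack-<unique number>
aws cloudformation wait stack-delete-complete --stack-name CanaryBlueGreenWorkshopStack-<unique number>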
Takeaways
Deployment is a Spectrum of Risk
Standard "In-Place" deployments are the riskiest because they lack a safety buffer. Blue/Green and Canary strategies move the risk from the infrastructure (will it boot?) to the traffic (will it work for users?), allowing for zero-downtime releases.
Health Checks vs. CloudWatch Alarms
Modern CI/CD requires two layers of defense:
- Target Group Health Checks: Ensure the container is "alive" before shifting any traffic (prevents Dead-on-Arrival deployments).
- CloudWatch Alarms: Monitor the application's "behavior" during the shift (prevents runtime bugs from affecting 100% of users).
Rollbacks are just "Forward" Deployments
In AWS CodeDeploy, a rollback isn't an "undo" button that rewinds time; it is a new, automated deployment that treats your last stable version as the "replacement." This ensures that even reverting a change follows a monitored, safe process.
Automation Reduces "Human Error" Burnout
By using appspec.yaml and CodeDeploy, you remove the need for manual Load Balancer manipulation. This allows developers to focus on the code while the infrastructure handles the "bake time" and safety monitoring automatically.