DEV Community

bardawilpeter

Zero-Downtime Deployments with AWS and Terraform

Greetings, tech enthusiasts! I'm excited to share an approach to setting up blue/green deployments on AWS ECS using Terraform. This post isn't just about the 'how' but also the 'why' behind each step, making it different from what you might have read online.

Why Blue/Green Deployment?

Before diving into the code, let's understand why blue/green deployment is a game-changer:

Minimized Downtime: Seamless transition between versions.
Risk Reduction: Immediate rollback capabilities.
Consistent Environment: Identical production environments reduce surprises 😲

Have you ever wondered how to maintain your application’s availability even during updates? That's exactly what we'll uncover here.

How It Works: The Mechanics of Zero-Downtime Deployment

To appreciate the beauty of zero-downtime deployments, let's break down how the setup functions:

ECS Cluster: This is the core where our containerized applications run. When we deploy new versions, ECS ensures that the transition happens without interrupting the current services.

Task Definitions: These are like blueprints for our applications. Each task definition specifies how to run a container, including which Docker image to use, CPU and memory allocations, and more.

CodeDeploy Integration: AWS CodeDeploy plays a pivotal role. It orchestrates the deployment by launching the new version (green) alongside the old version (blue) and shifting traffic between them. Depending on the deployment configuration, the shift happens all at once (as in this post, which uses CodeDeployDefault.ECSAllAtOnce) or gradually through a canary or linear configuration, which allows monitoring the new version under load before all traffic is shifted.

Terraform's Role: Terraform scripts are used to define and create the necessary AWS resources in a repeatable and consistent manner. This includes setting up the ECS cluster, task definitions, and CodeDeploy configurations.

Transitioning to Terraform: Building the Foundation

Now that we've explored the key components and their roles in zero-downtime deployment, let's dive into the heart of the matter - the Terraform code that brings our setup to life. This is where the theory meets practice.

Terraform Code Breakdown

- Setting up the AWS Provider

We start by configuring our Terraform provider for AWS. This step is crucial as it tells Terraform how to interact with our AWS resources.

provider "aws" {
  region = "us-east-1"  # Feel free to choose the region that best fits your needs
}
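
Since the provider block decides how Terraform talks to AWS, it can help to pin the provider version so the rest of the code behaves the same on every machine. The following is a minimal sketch, not part of the original code; the version constraints are assumptions and should be adjusted to whatever you have tested against.

```hcl
# Assumed version constraints (not from the original post); adjust as needed.
terraform {
  required_version = ">= 1.3"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # allow any 5.x release, but not 6.0
    }
  }
}
```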

- Creating the VPC and Subnets
A VPC provides network isolation for your resources. Private and public subnets are created for ECS tasks and the ALB.

# Create a VPC
resource "aws_vpc" "my_vpc" {
  cidr_block = "10.0.0.0/16"
}

# Create a private subnet for ECS
resource "aws_subnet" "private_subnet" {
  vpc_id                  = aws_vpc.my_vpc.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = false
}

# Create public subnets in two different AZs for the ALB (an ALB needs subnets in at least two AZs)
resource "aws_subnet" "public_subnet_1" {
  vpc_id                  = aws_vpc.my_vpc.id
  cidr_block              = "10.0.4.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = true
}

resource "aws_subnet" "public_subnet_2" {
  vpc_id                  = aws_vpc.my_vpc.id
  cidr_block              = "10.0.5.0/24"
  availability_zone       = "us-east-1b"
  map_public_ip_on_launch = true
}

- Network Configuration and Routing
The next crucial part of our Terraform setup involves establishing a robust network infrastructure. This step is pivotal for maintaining secure and efficient communication within our AWS environment.

resource "aws_eip" "nat_eip" {}

resource "aws_nat_gateway" "nat_gateway" {
  allocation_id = aws_eip.nat_eip.id
  subnet_id     = aws_subnet.public_subnet_1.id
  depends_on    = [aws_internet_gateway.my_gateway]
}


resource "aws_route_table" "private_route_table" {
  vpc_id = aws_vpc.my_vpc.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat_gateway.id
  }
}

resource "aws_route_table_association" "private_subnet_association" {
  subnet_id      = aws_subnet.private_subnet.id
  route_table_id = aws_route_table.private_route_table.id
}


resource "aws_internet_gateway" "my_gateway" {
  vpc_id = aws_vpc.my_vpc.id
}

resource "aws_route_table" "public_route_table" {
  vpc_id = aws_vpc.my_vpc.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.my_gateway.id
  }
}

resource "aws_route_table_association" "public_subnet_association_1" {
  subnet_id      = aws_subnet.public_subnet_1.id
  route_table_id = aws_route_table.public_route_table.id
}

resource "aws_route_table_association" "public_subnet_association_2" {
  subnet_id      = aws_subnet.public_subnet_2.id
  route_table_id = aws_route_table.public_route_table.id
}
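
Note that the ECS tasks in this setup run in a single private subnet in us-east-1a, so an outage in that Availability Zone would take down every task. A possible extension, sketched here under assumptions, adds a second private subnet in us-east-1b that reuses the same private route table. The resource names private_subnet_2 and private_subnet_association_2 and the CIDR block 10.0.2.0/24 are hypothetical and not part of the original code.

```hcl
# Hypothetical second private subnet for ECS tasks in another AZ
resource "aws_subnet" "private_subnet_2" {
  vpc_id                  = aws_vpc.my_vpc.id
  cidr_block              = "10.0.2.0/24" # assumed free range inside 10.0.0.0/16
  availability_zone       = "us-east-1b"
  map_public_ip_on_launch = false
}

# Route its outbound traffic through the existing NAT gateway
resource "aws_route_table_association" "private_subnet_association_2" {
  subnet_id      = aws_subnet.private_subnet_2.id
  route_table_id = aws_route_table.private_route_table.id
}
```

To take effect, the new subnet ID would also have to be added to the subnets list in the ECS service's network_configuration.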

- Security Groups for ECS and ALB
Security Groups act as a virtual firewall, defining the rules that control traffic to and from our ECS tasks and the Application Load Balancer (ALB). Here's how we set them up in Terraform:

# Create a security group for ECS tasks
resource "aws_security_group" "ecs_sg" {
  name_prefix = "ecs-sg-"
  description = "Security group for ECS tasks"
  vpc_id      = aws_vpc.my_vpc.id

  # Allow inbound HTTP traffic from the ALB
  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.alb_sg.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1" # -1 means all protocols
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# Create a security group for the Application Load Balancer (ALB)
resource "aws_security_group" "alb_sg" {
  name_prefix = "alb-sg-"
  description = "Security group for ALB"
  vpc_id      = aws_vpc.my_vpc.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

- Creating the ECS Cluster
The ECS cluster is where all our services and tasks will be managed. Here's how we define it in Terraform:

resource "aws_ecs_cluster" "ecs_cluster" {
  name = "zero-downtime-cluster"
}

- Task Definitions and IAM Role
Now, let's dive into defining the ECS task that will run in our cluster. But before that, it's crucial to discuss the essential IAM role required for ECS task execution.

IAM Role for ECS Task Execution:
Before we proceed to define the ECS task, we need to ensure that our tasks have the necessary permissions to run seamlessly within our cluster. To achieve this, we create an IAM role specifically for ECS task execution.

Here's how we define the IAM role in Terraform:

resource "aws_iam_role" "ecs_execution_role" {
  name = "ecs_execution_role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action = "sts:AssumeRole",
        Effect = "Allow",
        Principal = {
          Service = "ecs-tasks.amazonaws.com"
        }
      }
    ]
  })
}
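
The role above only defines who may assume it; it carries no permissions yet. That is enough for a public image without log shipping, but pulling from a private Amazon ECR repository or writing to CloudWatch Logs needs more. A minimal sketch, assuming the AWS managed policy AmazonECSTaskExecutionRolePolicy fits your needs (the resource name ecs_execution_attachment is hypothetical):

```hcl
# Attach the AWS managed execution policy (ECR pull + CloudWatch Logs write)
resource "aws_iam_role_policy_attachment" "ecs_execution_attachment" {
  role       = aws_iam_role.ecs_execution_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
```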

Task Definition:
Now that we have the IAM role in place, we can proceed to define the ECS task that will run in our cluster. For simplicity and focus on the deployment strategy, we'll use a pre-built Docker image. In this example, we'll use nginx:latest, the official image of the widely used NGINX web server.

Here's how the task definition looks in Terraform:

# Create an ECS task definition
resource "aws_ecs_task_definition" "ecs_task" {
  family                   = "my-ecs-task"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = aws_iam_role.ecs_execution_role.arn

  container_definitions = jsonencode([
    {
      name      = "my-container",
      image     = "nginx:latest", # Replace with your Docker image
      essential = true,
      portMappings = [
        {
          containerPort = 80,
          hostPort      = 80
        }
      ],
      memoryReservation = 128
    }
  ])
}

Note on Docker Images: This example uses nginx:latest for its simplicity and ease of demonstration. However, in a real-world scenario, you might use a custom image that you've built and pushed to a container registry like Amazon ECR. You can replace nginx:latest with the URI of your custom image.
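
One way to make the image swappable without editing the task definition is an input variable. This is only a sketch: the variable name container_image is hypothetical, and the ECR URI in the comment is a placeholder pattern rather than a real repository.

```hcl
# Hypothetical input variable (not in the original code)
variable "container_image" {
  description = "Container image to run in the ECS task"
  type        = string
  default     = "nginx:latest"
}

# In the task definition's container_definitions, the hard-coded image
# would then become:
#   image = var.container_image
#
# A custom image could be supplied at apply time, for example:
#   terraform apply -var 'container_image=<account-id>.dkr.ecr.us-east-1.amazonaws.com/<repo>:<tag>'
```

Keep in mind that the ECS service in this post ignores changes to task_definition, so a new revision created this way is registered by Terraform but only reaches the running service through a CodeDeploy deployment.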

- Creating an ECS Service
Now, let's proceed to create an ECS service, a fundamental component for managing and ensuring the availability of our containers within our ECS cluster.

# Create an ECS service
resource "aws_ecs_service" "ecs_service" {
  name            = "my-ecs-service"
  cluster         = aws_ecs_cluster.ecs_cluster.id
  task_definition = aws_ecs_task_definition.ecs_task.arn
  launch_type     = "FARGATE"
  desired_count   = 2

  network_configuration {
    subnets         = [aws_subnet.private_subnet.id]
    security_groups = [aws_security_group.ecs_sg.id]
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.blue_group.arn
    container_name   = "my-container"
    container_port   = 80
  }

  deployment_controller {
    type = "CODE_DEPLOY"
  }
  lifecycle {
    ignore_changes = [task_definition, desired_count, load_balancer]
  }
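  # The target group must be attached to the ALB (through the listener)
  # before ECS accepts it, so wait for the listener rather than the target groups
  depends_on = [aws_lb_listener.my_alb_listener]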
}

- Application Load Balancer and Target Groups

Now, let's set up our Application Load Balancer (ALB) and the target groups for blue/green deployment.

ALB Configuration:

# Create an Application Load Balancer (ALB)
resource "aws_lb" "zero_downtime_alb" {
  name               = "my-alb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb_sg.id]
  subnets            = [aws_subnet.public_subnet_1.id, aws_subnet.public_subnet_2.id]
}

Target Groups for Blue/Green Deployment:

# Create target groups for blue/green deployment (HTTP)
resource "aws_lb_target_group" "blue_group" {
  name        = "blue-target-group"
  port        = 80
  protocol    = "HTTP"
  vpc_id      = aws_vpc.my_vpc.id
  target_type = "ip"
  health_check {
    enabled             = true
    interval            = 30
    path                = "/"
    port                = "traffic-port"
    protocol            = "HTTP"
    healthy_threshold   = 3
    unhealthy_threshold = 3
    timeout             = 5
    matcher             = "200"
  }
}

resource "aws_lb_target_group" "green_group" {
  name        = "green-target-group"
  port        = 80
  protocol    = "HTTP"
  vpc_id      = aws_vpc.my_vpc.id
  target_type = "ip"
  health_check {
    enabled             = true
    interval            = 30
    path                = "/"
    port                = "traffic-port"
    protocol            = "HTTP"
    healthy_threshold   = 3
    unhealthy_threshold = 3
    timeout             = 5
    matcher             = "200"
  }
}
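
The two target groups above differ only in their name. If you prefer to avoid the duplication, a single resource with for_each is one option. This is an alternative sketch, not the original author's code; the resource label bg is hypothetical, and every reference elsewhere would have to change accordingly.

```hcl
# Alternative sketch: one resource block that creates both target groups
resource "aws_lb_target_group" "bg" {
  for_each = toset(["blue", "green"])

  name        = "${each.key}-target-group" # blue-target-group, green-target-group
  port        = 80
  protocol    = "HTTP"
  vpc_id      = aws_vpc.my_vpc.id
  target_type = "ip"

  health_check {
    enabled             = true
    interval            = 30
    path                = "/"
    port                = "traffic-port"
    protocol            = "HTTP"
    healthy_threshold   = 3
    unhealthy_threshold = 3
    timeout             = 5
    matcher             = "200"
  }
}

# References then take the form:
#   aws_lb_target_group.bg["blue"].arn
#   aws_lb_target_group.bg["green"].name
```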

Listener Configuration:

# Create an ALB listener for HTTP
resource "aws_lb_listener" "my_alb_listener" {
  load_balancer_arn = aws_lb.zero_downtime_alb.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.blue_group.arn
  }
  lifecycle {
    ignore_changes = [
      default_action # Ignore changes to the default action to prevent Terraform from reverting CodeDeploy changes
    ]
  }
}
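
CodeDeploy can also route test traffic to the green tasks through a second listener before production traffic is switched. The original setup does not use this; the following is a sketch under assumptions, where the resource name test_listener and port 8080 are hypothetical. The ALB security group would also need an ingress rule for that port.

```hcl
# Hypothetical test listener for validating the green version before cutover
resource "aws_lb_listener" "test_listener" {
  load_balancer_arn = aws_lb.zero_downtime_alb.arn
  port              = 8080 # assumed test port
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.green_group.arn
  }

  lifecycle {
    # CodeDeploy rewrites the default action during deployments
    ignore_changes = [default_action]
  }
}

# In the CodeDeploy deployment group, inside target_group_pair_info and
# alongside prod_traffic_route, the listener would be referenced as:
#   test_traffic_route {
#     listener_arns = [aws_lb_listener.test_listener.arn]
#   }
```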

- Integrating with CodeDeploy

IAM Role for CodeDeploy:

resource "aws_iam_role" "codedeploy_role" {
  name = "codedeploy_role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action = "sts:AssumeRole",
        Effect = "Allow",
        Principal = {
          Service = "codedeploy.amazonaws.com"
        }
      }
    ]
  })
}

IAM Policy for CodeDeploy:
This IAM policy defines the permissions required for CodeDeploy to interact with ECS and Elastic Load Balancing.

resource "aws_iam_policy" "codedeploy_policy" {
  name        = "CodeDeployECSPolicy"
  description = "Policy for AWS CodeDeploy ECS Role"

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Effect = "Allow",
        Action = [
          "ecs:DescribeServices",
          "ecs:UpdateService",
          "ecs:CreateTaskSet",
          "ecs:UpdateServicePrimaryTaskSet",
          "ecs:DeleteTaskSet",
          "ecs:ListTaskSets",
        ],
        Resource = "*"
      },
      {
        Effect = "Allow",
        Action = [
          "elasticloadbalancing:Describe*",
          "elasticloadbalancing:DeregisterTargets",
          "elasticloadbalancing:RegisterTargets",
          "elasticloadbalancing:DeleteListener",
          "elasticloadbalancing:CreateListener",
          "elasticloadbalancing:ModifyListener"
        ],
        Resource = "*"
      },
      {
        Effect   = "Allow",
        Action   = "iam:PassRole",
        Resource = aws_iam_role.ecs_execution_role.arn
      }
    ]
  })
}

IAM Role Policy Attachment:

resource "aws_iam_role_policy_attachment" "codedeploy_attachment" {
  role       = aws_iam_role.codedeploy_role.name
  policy_arn = aws_iam_policy.codedeploy_policy.arn
}
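
As an alternative to maintaining a custom policy, AWS publishes a managed policy for this purpose, AWSCodeDeployRoleForECS. The sketch below attaches it to the same role; the resource name codedeploy_managed_attachment is hypothetical. The managed policy is broader than the custom one above (it also covers things like Lambda hooks and CloudWatch alarms), so whether it is appropriate depends on how tightly you want to scope permissions.

```hcl
# Alternative: use the AWS managed policy instead of the custom policy
resource "aws_iam_role_policy_attachment" "codedeploy_managed_attachment" {
  role       = aws_iam_role.codedeploy_role.name
  policy_arn = "arn:aws:iam::aws:policy/AWSCodeDeployRoleForECS"
}
```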

CodeDeploy Application and Deployment Group:
The following resources create a CodeDeploy application and deployment group for ECS:

# Create CodeDeploy application
resource "aws_codedeploy_app" "codedeploy_app" {
  name             = "my-codedeploy-app"
  compute_platform = "ECS"
}

# Create CodeDeploy deployment group
resource "aws_codedeploy_deployment_group" "codedeploy_group" {
  app_name               = aws_codedeploy_app.codedeploy_app.name
  deployment_config_name = "CodeDeployDefault.ECSAllAtOnce"
  deployment_group_name  = "my-codedeploy-group"
  service_role_arn       = aws_iam_role.codedeploy_role.arn

  ecs_service {
    cluster_name = aws_ecs_cluster.ecs_cluster.name
    service_name = aws_ecs_service.ecs_service.name
  }


  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  load_balancer_info {
    target_group_pair_info {
      prod_traffic_route {
        listener_arns = [aws_lb_listener.my_alb_listener.arn]
      }

      target_group {
        name = aws_lb_target_group.blue_group.name
      }

      target_group {
        name = aws_lb_target_group.green_group.name
      }
    }
  }

  blue_green_deployment_config {
    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }

    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 1
    }
  }

  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE"]
  }
}
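
The deployment group above uses CodeDeployDefault.ECSAllAtOnce, which switches all traffic in one step. For a gradual shift, CodeDeploy offers canary and linear configurations. Here is a sketch of a custom canary configuration; the resource name canary, the configuration name, and the 10 percent / 5 minute values are assumptions chosen for illustration.

```hcl
# Hypothetical canary configuration: 10% of traffic first, the rest after 5 minutes
resource "aws_codedeploy_deployment_config" "canary" {
  deployment_config_name = "my-ecs-canary-10-percent-5-minutes"
  compute_platform       = "ECS"

  traffic_routing_config {
    type = "TimeBasedCanary"

    time_based_canary {
      interval   = 5  # minutes to wait before shifting the remaining traffic
      percentage = 10 # share of traffic shifted in the first step
    }
  }
}

# The deployment group would then reference it instead of ECSAllAtOnce:
#   deployment_config_name = aws_codedeploy_deployment_config.canary.deployment_config_name
#
# AWS also ships predefined equivalents, such as:
#   deployment_config_name = "CodeDeployDefault.ECSCanary10Percent5Minutes"
```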

- Outputting the ALB DNS Name

After deploying our ECS service with CodeDeploy, it's crucial to retrieve essential information for monitoring and accessing our application. Here, we define Terraform outputs to provide this information:

ALB DNS Name:
This output displays the DNS name of the Application Load Balancer (ALB), allowing easy access to your deployed application. You can use this DNS name in your browser to access the application.

output "alb_dns_name" {
  description = "The DNS name of the ALB"
  value       = aws_lb.zero_downtime_alb.dns_name
}

Latest Task Definition ARN:
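This output provides the Amazon Resource Name (ARN) of the task definition managed by Terraform. It's useful for monitoring and tracking your task definitions. Note that revisions registered outside Terraform (for example, the httpd revision registered with the AWS CLI later in this post) will not appear here.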

output "latest_task_definition_arn" {
  description = "The ARN of the latest task definition"
  value       = aws_ecs_task_definition.ecs_task.arn
}

Container Name:
This output extracts the name of the container from the ECS task definition. Knowing the container name is essential for tasks like debugging and logging.

output "container_name" {
  description = "The name of the container"
  value       = jsondecode(aws_ecs_task_definition.ecs_task.container_definitions)[0].name
}
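
Since the deployment step later needs the CodeDeploy application and deployment group names, it may be convenient to expose them as outputs too. This is an optional sketch; the output names are hypothetical.

```hcl
# Optional outputs for use when creating deployments from the CLI
output "codedeploy_app_name" {
  description = "The name of the CodeDeploy application"
  value       = aws_codedeploy_app.codedeploy_app.name
}

output "codedeploy_deployment_group_name" {
  description = "The name of the CodeDeploy deployment group"
  value       = aws_codedeploy_deployment_group.codedeploy_group.deployment_group_name
}
```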

Why This Code Matters

In this setup, every line of Terraform code plays a critical role in ensuring a smooth and reliable deployment process. Here's a breakdown of why each component is significant:

AWS Provider Setup: Starting with the AWS provider is foundational. The chosen region impacts latency, cost, and availability, making this a crucial first step.

ECS Cluster Creation: The ECS cluster is where our containerized applications are managed. It's the backbone of our deployment, providing the environment where our applications run.

Task Definition: The task definition is essential for describing how our application runs in the containers. It includes specifications for the Docker image, CPU, memory, and more, ensuring our application behaves as expected.

ALB Configuration: The Application Load Balancer is the traffic director of our setup. It handles incoming requests and routes them to the appropriate target groups. This is vital for managing traffic during the blue/green deployment process.

Target Groups for Blue/Green Deployment: Setting up separate target groups for the blue and green environments allows us to control and monitor traffic flow to each version of the application. It's key for a seamless transition and rollback if needed.

CodeDeploy Integration: Integrating with CodeDeploy automates the deployment process. It manages the complex task of shifting traffic between the blue and green environments, ensuring a smooth transition with minimal user impact.

Preparing to Launch: Setting Up and Running Your Terraform Code

Now that we've explored the significance of each component in our Terraform setup, let's move into the practical steps of initializing and applying our Terraform configuration. But first, we need to set up our AWS credentials to ensure Terraform can interact with our AWS resources.

Setting Up AWS Credentials for Terraform

  1. AWS CLI Configuration: If you haven't already, install the AWS CLI and configure it with your credentials. This can be done using the aws configure command, which will prompt you for your Access Key, Secret Key, and preferred AWS region.

  2. Environment Variables: Alternatively, you can set your AWS credentials as environment variables. This method is often preferred for temporary credentials or when working on multiple projects.

export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
export AWS_DEFAULT_REGION="us-east-1"

  3. Initialize Terraform: This command initializes your Terraform project, downloading the required provider plugins and modules.
terraform init

  4. Apply Terraform Configuration: Run terraform apply, review the execution plan, and confirm the changes if everything looks correct.

terraform apply

Testing Your Deployment

After applying your Terraform configurations:

  1. Retrieve the ALB DNS name from the Terraform output.
  2. Open a web browser and enter the ALB DNS name.
  3. You should see the NGINX welcome page or your application's specific content.
  4. This validates the successful deployment of your ECS service and proper configuration of your ALB.

Validating Blue/Green Deployment with a New Task Definition

To test the blue/green deployment process, we'll introduce a new task definition using the httpd image, and then deploy it using AWS CodeDeploy. This step is crucial to confirm that our setup properly handles the transition between two different application versions.

Creating a New Task Definition with httpd Image
We'll create a new task definition, this time using the httpd image. This represents a new version of your application. Here's the JSON for the new task definition:

{
    "family": "my-ecs-task",
    "networkMode": "awsvpc",
    "containerDefinitions": [
        {
            "name": "my-container",
            "image": "httpd:latest",
            "cpu": 10,
            "memory": 512,
            "portMappings": [
                {
                    "containerPort": 80,
                    "hostPort": 80
                }
            ]
        }
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "cpu": "256",
    "memory": "512"
}

Registering the Task Definition with AWS ECS
Once you have defined your new task, you need to register it with ECS. Use the AWS CLI to register the task definition:

aws ecs register-task-definition --cli-input-json file://path-to-your-task-definition.json

Deploying with AWS CodeDeploy
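Next, initiate a deployment using AWS CodeDeploy. We'll specify the new task definition in the appspec.yaml file. Despite its name, this file is the full input for the create-deployment command: it names the application and deployment group, and embeds the AppSpec content under revision.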

Example appspec.yaml:

applicationName: 'my-codedeploy-app'
deploymentGroupName: 'my-codedeploy-group'
revision:
  revisionType: AppSpecContent
  appSpecContent:
    content: |
      version: 0.0
      Resources:
        - TargetService:
            Type: AWS::ECS::Service
            Properties:
              TaskDefinition: "[YOUR_LATEST_TASK_DEFINITION]" # Replace with your new task definition ARN
              LoadBalancerInfo:
                ContainerName: "my-container"
                ContainerPort: 80

Be sure to replace the task definition ARN with the one for your new httpd-based task.

Running the Deployment
Deploy using the AWS CLI (the --cli-input-yaml option requires AWS CLI version 2):

aws deploy create-deployment --cli-input-yaml file://appspec.yaml

Verifying the Deployment
Monitor the deployment in the AWS CodeDeploy console. Once it completes, visit the ALB DNS link again. You should now see the httpd welcome page, confirming the successful blue/green deployment with the new version of your application.

Performance and Cost Considerations

When implementing blue/green deployments, it's crucial to consider the potential impact on performance and costs:

Resource Utilization: Blue/green deployment requires running two environments simultaneously, which can increase resource usage and associated costs.
Optimization Strategies: Employ strategies like auto-scaling and right-sizing the CPU and memory of your Fargate tasks (Fargate has no instance types to choose) to optimize resource utilization and control costs.
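
As a rough illustration of the auto-scaling idea, the sketch below adds target-tracking scaling on average CPU to the ECS service from this post. It is not part of the original code: the resource names, the capacity bounds of 2 to 4 tasks, and the 60 percent CPU target are all assumptions to adapt to your workload.

```hcl
# Hypothetical scaling target: let the service run between 2 and 4 tasks
resource "aws_appautoscaling_target" "ecs_target" {
  min_capacity       = 2
  max_capacity       = 4
  resource_id        = "service/${aws_ecs_cluster.ecs_cluster.name}/${aws_ecs_service.ecs_service.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

# Hypothetical policy: add or remove tasks to keep average CPU near 60%
resource "aws_appautoscaling_policy" "ecs_cpu_policy" {
  name               = "cpu-target-tracking"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 60
  }
}
```

Because the ECS service already ignores changes to desired_count, Terraform will not fight the task count that auto-scaling sets.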

Real-world Application Example

Imagine a scenario where a fintech company uses this setup for their customer-facing application. By employing blue/green deployment, they can seamlessly update their application with zero downtime, ensuring continuous service for their users.

GitHub Repository: Access the Code

To help you better understand and implement the concepts discussed in this post, I've made the Terraform code available in a GitHub repository. This repository includes all the scripts and configurations you need to set up your own blue/green deployment on AWS ECS using Terraform.

Explore the Code:
Visit GitHub Repository Link to access the code and get started. Feel free to clone the repository, explore the code, and adapt it to your specific needs.

Conclusion and Recap
In this post, we explored the importance of blue/green deployments and walked through setting one up using AWS ECS and Terraform. This approach minimizes downtime, reduces risks, and ensures consistency in production environments.

Happy coding! 😊
