Kieran Jennings

Posted on Oct 10, 2020

ECS Fargate Service Auto Scaling with Terraform

#aws #terraform #cloud

Note: This is my first blog post! Any feedback is totally welcome.

You can access the completed code for this blog here.

Introduction

ECS (Elastic Container Service) is AWS's container orchestration service. You can read more about ECS here.

There are two deployment options that can be used, EC2 and Fargate.

With EC2 deployments, you need to manage the EC2 instances that your containers run on yourself.

Fargate is a serverless compute engine provided by AWS. This means the servers that your containers are launched on are managed by AWS. All you need to do is specify the CPU and memory your container will use and AWS will provision an appropriate amount of compute resource.

In this blog post, I will be using Terraform (v0.13.4). Terraform is one of the most popular Infrastructure as Code tools.

I will be demonstrating how to configure the whole infrastructure in this post. If there is a certain section you are focusing on, feel free to skip to it!

Image

I will be using a sample Docker image that uses nginx to show a simple static page. You can find information about the image here. The image name is dockersamples/static-site.

Let's Start!

Directory Structure

Create the below directory structure, where modules is an empty directory for now, and leave the files empty.



├── terraform
│   ├── modules
│   ├── main.tf
│   ├── variables.tf
│   ├── backend.tf

Backend Config

To store my Terraform state I will be using S3. The bucket needs to exist before you initialise the backend. Open up the backend.tf file and add the following:



terraform {
  backend "s3" {
    bucket = "<your-s3-bucket>"
    region = "eu-west-1"
    key = "terraform.tfstate"
  }
}

Make sure you replace <your-s3-bucket> with the name of your S3 bucket. You can also change the region and the key location if you want.
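
If you share the state with other people, you might also want encryption and state locking. The following is a minimal sketch, assuming you have already created a DynamoDB table with a string partition key named LockID; the table name is a placeholder I made up, not something created in this post.

```hcl
terraform {
  backend "s3" {
    bucket = "<your-s3-bucket>"
    region = "eu-west-1"
    key    = "terraform.tfstate"

    # Encrypt the state object at rest in S3.
    encrypt = true

    # Placeholder table name (assumption). The table must have a
    # string partition key called "LockID" for locking to work.
    dynamodb_table = "<your-lock-table>"
  }
}
```

With locking in place, two people running terraform apply at the same time should not be able to overwrite each other's state.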

You can confirm that your configuration is working by initialising the terraform backend. To do this, open your terminal to the terraform directory you created above. When in that directory run terraform init.

You should see output confirming that the "s3" backend was configured and that Terraform has been successfully initialized.

If you have any problems, drop a comment below or take a look at Terraform's documentation on configuring backends here.

VPC

We need to create a VPC to hold all our infrastructure.

Create a directory under the modules directory called vpc. Put three empty files, called main.tf, variables.tf and outputs.tf in the vpc directory.



├── terraform
│   ├── modules
│       ├── vpc
│           ├── main.tf
│           ├── variables.tf
│           ├── outputs.tf
│   ├── main.tf
│   ├── variables.tf
│   ├── backend.tf

In the main.tf file, we want to add a VPC resource, like below.



resource "aws_vpc" "vpc" {
  cidr_block = "192.0.0.0/16"
  enable_dns_support = true
  enable_dns_hostnames = true

  tags = {
    Name = "dev-to"
    Project = "dev-to"
  }
}

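If you understand CIDR blocks, you can choose your own; just make sure you apply the change consistently throughout the post. Note that 192.0.0.0/16 is not one of the private RFC 1918 ranges (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16), so for anything beyond a demo a block such as 10.0.0.0/16 is the more conventional choice. If you don't know much about CIDR blocks, you are probably better off sticking with the ones I use.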

If you are just starting out with AWS, or any cloud platform, it would be beneficial to do some reading about CIDR blocks.
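
If you want to see how the /24 subnet blocks used later in this post relate to the VPC's /16 block, Terraform's built-in cidrsubnet function can derive them for you. This is only an illustrative sketch; the local and output names are my own and are not used elsewhere in the post.

```hcl
locals {
  vpc_cidr = "192.0.0.0/16"

  # cidrsubnet(prefix, newbits, netnum): adding 8 bits to a /16 gives /24 blocks.
  # netnum 0..5 yields 192.0.0.0/24 through 192.0.5.0/24.
  subnet_cidrs = [for i in range(6) : cidrsubnet(local.vpc_cidr, 8, i)]
}

output "subnet_cidrs" {
  value = local.subnet_cidrs
}
```

You can also try individual calls such as cidrsubnet("192.0.0.0/16", 8, 3) in terraform console to get a feel for how the arguments behave.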

Internet Gateway

We need an internet gateway to give internet access to the load balancer and to the Fargate subnets, so they can download the docker image.



resource "aws_internet_gateway" "internal_gateway" {
  vpc_id = aws_vpc.vpc.id
  tags = {
    Name = "dev-to"
    Project = "dev-to"
    Billing = "dev-to"
  }
}

Route table

We need a route table to describe the route for certain CIDR blocks. In this post, we will just route all traffic to the internet gateway.

In a production system, for security reasons, you would want two route tables. You would want a public route table that routes traffic to the internet gateway, and a second route table that routes traffic to a NAT Gateway. There is a section on adding a NAT Gateway under the Improvements section below.



resource "aws_route_table" "route_table" {
  vpc_id = aws_vpc.vpc.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.internal_gateway.id
  }

  tags = {
    Name = "dev-to"
    Project = "dev-to"
  }
}

Subnets

I'm going to create six subnets. There will be three subnets for the load balancer and three subnets for the ECS tasks to be placed in.

I use three subnets for each because there will be a subnet in each availability zone in the eu-west-1 region. The number of subnets might be different for you, depending on which region you are using.



data "aws_availability_zones" "available" {}

resource "aws_subnet" "elb_a" {
  vpc_id = aws_vpc.vpc.id
  cidr_block = "192.0.0.0/24"
  availability_zone = data.aws_availability_zones.available.names[0]
  map_public_ip_on_launch = true
  tags = {
    Name = "elb-a"
    Project = "dev-to"
  }
}

resource "aws_subnet" "elb_b" {
  vpc_id = aws_vpc.vpc.id
  cidr_block = "192.0.1.0/24"
  availability_zone = data.aws_availability_zones.available.names[1]
  map_public_ip_on_launch = true
  tags = {
    Name = "elb-b"
    Project = "dev-to"
  }
}

resource "aws_subnet" "elb_c" {
  vpc_id = aws_vpc.vpc.id
  cidr_block = "192.0.2.0/24"
  availability_zone = data.aws_availability_zones.available.names[2]
  map_public_ip_on_launch = true
  tags = {
    Name = "elb-c"
    Project = "dev-to"
  }
}

resource "aws_subnet" "ecs_a" {
  vpc_id = aws_vpc.vpc.id
  cidr_block = "192.0.3.0/24"
  availability_zone = data.aws_availability_zones.available.names[0]
  map_public_ip_on_launch = true
  tags = {
    Name = "ecs-a"
    Project = "dev-to"
  }
}

resource "aws_subnet" "ecs_b" {
  vpc_id = aws_vpc.vpc.id
  cidr_block = "192.0.4.0/24"
  availability_zone = data.aws_availability_zones.available.names[1]
  map_public_ip_on_launch = true
  tags = {
    Name = "ecs-b"
    Project = "dev-to"
  }
}

resource "aws_subnet" "ecs_c" {
  vpc_id = aws_vpc.vpc.id
  cidr_block = "192.0.5.0/24"
  availability_zone = data.aws_availability_zones.available.names[2]
  map_public_ip_on_launch = true
  tags = {
    Name = "ecs-c"
    Project = "dev-to"
  }
}

resource "aws_route_table_association" "elb_a" {
  subnet_id = aws_subnet.elb_a.id
  route_table_id = aws_route_table.route_table.id
}

resource "aws_route_table_association" "elb_b" {
  subnet_id = aws_subnet.elb_b.id
  route_table_id = aws_route_table.route_table.id
}

resource "aws_route_table_association" "elb_c" {
  subnet_id = aws_subnet.elb_c.id
  route_table_id = aws_route_table.route_table.id
}

resource "aws_route_table_association" "ecs_a" {
  subnet_id = aws_subnet.ecs_a.id
  route_table_id = aws_route_table.route_table.id
}

resource "aws_route_table_association" "ecs_b" {
  subnet_id = aws_subnet.ecs_b.id
  route_table_id = aws_route_table.route_table.id
}

resource "aws_route_table_association" "ecs_c" {
  subnet_id = aws_subnet.ecs_c.id
  route_table_id = aws_route_table.route_table.id
}
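
The subnet and association blocks are quite repetitive. If you prefer something shorter, the load balancer subnets could be written with count instead. Treat this as a sketch of an alternative, not something to add alongside the resources above (the CIDR blocks would clash). It assumes the aws_vpc.vpc, aws_route_table.route_table and availability zone data source from this post; the az_suffixes local is a name I made up.

```hcl
locals {
  # Suffixes used in the Name tags, one per availability zone.
  az_suffixes = ["a", "b", "c"]
}

resource "aws_subnet" "elb" {
  count = length(local.az_suffixes)

  vpc_id                  = aws_vpc.vpc.id
  # Derives 192.0.0.0/24, 192.0.1.0/24 and 192.0.2.0/24 from the VPC block.
  cidr_block              = cidrsubnet(aws_vpc.vpc.cidr_block, 8, count.index)
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name    = "elb-${local.az_suffixes[count.index]}"
    Project = "dev-to"
  }
}

resource "aws_route_table_association" "elb" {
  count = length(local.az_suffixes)

  subnet_id      = aws_subnet.elb[count.index].id
  route_table_id = aws_route_table.route_table.id
}
```

If you go this way, anything that refers to aws_subnet.elb_a would need to become aws_subnet.elb[0], and so on. The rest of the post keeps the explicit version.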

Security groups

For our application, we will create two security groups, one for the load balancer and one for the ECS tasks.



resource "aws_security_group" "load_balancer" {
  vpc_id = aws_vpc.vpc.id
  tags = {
    Name = "load-balancer"
    Project = "dev-to"
  }
}

resource "aws_security_group" "ecs_task" {
  vpc_id = aws_vpc.vpc.id
  tags = {
    Name = "ecs-task"
    Project = "dev-to"
  }
}

We can then assign rules to these security groups. The load balancer will need access from anywhere on ports 80 and 443. The ECS task security group needs to allow traffic from the load balancer to the port that the docker container will run on. The default port for the image we are using is 80.



resource "aws_security_group_rule" "ingress_load_balancer_http" {
  from_port = 80
  protocol = "tcp"
  security_group_id = aws_security_group.load_balancer.id
  to_port = 80
  cidr_blocks = [
    "0.0.0.0/0"]
  type = "ingress"
}

resource "aws_security_group_rule" "ingress_load_balancer_https" {
  from_port = 443
  protocol = "tcp"
  security_group_id = aws_security_group.load_balancer.id
  to_port = 443
  cidr_blocks = [
    "0.0.0.0/0"]
  type = "ingress"
}

resource "aws_security_group_rule" "ingress_ecs_task_elb" {
  from_port = 80
  protocol = "tcp"
  security_group_id = aws_security_group.ecs_task.id
  to_port = 80
  source_security_group_id = aws_security_group.load_balancer.id
  type = "ingress"
}

resource "aws_security_group_rule" "egress_load_balancer" {
  type = "egress"
  from_port = 0
  to_port = 65535
  protocol = "tcp"
  cidr_blocks = [
    "0.0.0.0/0"]
  security_group_id = aws_security_group.load_balancer.id
}

resource "aws_security_group_rule" "egress_ecs_task" {
  type = "egress"
  from_port = 0
  to_port = 65535
  protocol = "tcp"
  cidr_blocks = [
    "0.0.0.0/0"]
  security_group_id = aws_security_group.ecs_task.id
}

NACLs

This section isn't necessary but is an added layer of security. It can be confusing for a beginner, so feel free to skip to the outputs section if you want.

Network access control lists (NACLs) control access in and out of the subnets. The thing to remember with NACLs is they're stateless. This means that return traffic must be explicitly allowed by rules.

Let's create an NACL for our load balancer subnets and a different NACL for the ECS task subnets.



resource "aws_network_acl" "load_balancer" {
  vpc_id = aws_vpc.vpc.id
  subnet_ids = [
    aws_subnet.elb_a.id,
    aws_subnet.elb_b.id,
    aws_subnet.elb_c.id]
}

resource "aws_network_acl" "ecs_task" {
  vpc_id = aws_vpc.vpc.id
  subnet_ids = [
    aws_subnet.ecs_a.id,
    aws_subnet.ecs_b.id,
    aws_subnet.ecs_c.id]
}

resource "aws_network_acl_rule" "load_balancer_http" {
  network_acl_id = aws_network_acl.load_balancer.id
  rule_number = 100
  egress = false
  protocol = "tcp"
  rule_action = "allow"
  cidr_block = "0.0.0.0/0"
  from_port = 80
  to_port = 80
}

resource "aws_network_acl_rule" "load_balancer_https" {
  network_acl_id = aws_network_acl.load_balancer.id
  rule_number = 200
  egress = false
  protocol = "tcp"
  rule_action = "allow"
  cidr_block = "0.0.0.0/0"
  from_port = 443
  to_port = 443
}

resource "aws_network_acl_rule" "ingress_load_balancer_ephemeral" {
  network_acl_id = aws_network_acl.load_balancer.id
  rule_number = 300
  egress = false
  protocol = "tcp"
  rule_action = "allow"
  cidr_block = "0.0.0.0/0"
  from_port = 1024
  to_port = 65535
}

resource "aws_network_acl_rule" "ecs_task_ephemeral" {
  network_acl_id = aws_network_acl.ecs_task.id
  rule_number = 100
  egress = false
  protocol = "tcp"
  rule_action = "allow"
  cidr_block = "0.0.0.0/0"
  from_port = 1024
  to_port = 65535
}

resource "aws_network_acl_rule" "ecs_task_http" {
  network_acl_id = aws_network_acl.ecs_task.id
  rule_number = 200
  egress = false
  protocol = "tcp"
  rule_action = "allow"
  cidr_block = aws_vpc.vpc.cidr_block
  from_port = 80
  to_port = 80
}

resource "aws_network_acl_rule" "load_balancer_ephemeral" {
  network_acl_id = aws_network_acl.load_balancer.id
  rule_number = 100
  egress = true
  protocol = "tcp"
  rule_action = "allow"
  from_port = 0
  to_port = 65535
  cidr_block = "0.0.0.0/0"
}

resource "aws_network_acl_rule" "ecs_task_all" {
  network_acl_id = aws_network_acl.ecs_task.id
  rule_number = 100
  egress = true
  protocol = "tcp"
  rule_action = "allow"
  from_port = 0
  to_port = 65535
  cidr_block = "0.0.0.0/0"
}

Outputs

We then need to output the resources we have created in vpc/outputs.tf.



output "vpc" {
  value = aws_vpc.vpc
}

output "load_balancer_subnet_a" {
  value = aws_subnet.elb_a
}

output "load_balancer_subnet_b" {
  value = aws_subnet.elb_b
}

output "load_balancer_subnet_c" {
  value = aws_subnet.elb_c
}

output "ecs_subnet_a" {
  value = aws_subnet.ecs_a
}

output "ecs_subnet_b" {
  value = aws_subnet.ecs_b
}

output "ecs_subnet_c" {
  value = aws_subnet.ecs_c
}

output "load_balancer_sg" {
  value = aws_security_group.load_balancer
}

output "ecs_sg" {
  value = aws_security_group.ecs_task
}

Application Load Balancer

We need an application load balancer to route traffic to the ECS tasks and manage the load across all the ECS tasks.

I will also be pointing a Route 53 hosted zone record to the load balancer to use https and give a better-looking URL. If you don't have a Route 53 hosted zone then don't worry, you can still do this part and only use http.

Create a directory called elb under the modules directory. You should have a directory structure similar to below.



├── terraform
│   ├── modules
│       ├── vpc
│           ├── main.tf
│           ├── variables.tf
│           ├── outputs.tf
│       ├── elb
│           ├── main.tf
│           ├── variables.tf
│           ├── outputs.tf
│   ├── main.tf
│   ├── variables.tf
│   ├── backend.tf

Certificate

You can skip this section if you do not have a Route 53 hosted zone to use.

We need to create a certificate using ACM so that we can enable https on our load balancer.

To do this we need to add a variable to the elb/variables.tf file.



variable "hosted_zone_id" {}

This variable will represent the id of the Route 53 hosted zone you want to use.

We then need to create the certificate and validate it using DNS. Terraform also allows you to wait for the validation to complete.

Open up elb/main.tf and add the following.



data "aws_route53_zone" "selected" {
  zone_id = var.hosted_zone_id
}

resource "aws_acm_certificate" "elb_cert" {
  domain_name = data.aws_route53_zone.selected.name
  validation_method = "DNS"

  tags = {
    Project = "dev-to"
    Billing = "dev-to"
  }
}

resource "aws_route53_record" "cert_validation" {
  for_each = {
      for dvo in aws_acm_certificate.elb_cert.domain_validation_options : dvo.domain_name => {
        name = dvo.resource_record_name
        record = dvo.resource_record_value
        type = dvo.resource_record_type
      }
    }

  allow_overwrite = true
  name            = each.value.name
  records         = [each.value.record]
  ttl             = 60
  type            = each.value.type
  zone_id         = data.aws_route53_zone.selected.zone_id
}

resource "aws_acm_certificate_validation" "elb_cert" {
  certificate_arn = aws_acm_certificate.elb_cert.arn
  validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}

Make sure you put the correct domain_name attribute for the certificate. It needs to cover the domain name you intend to use.
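
If you intend to serve the site from a subdomain as well as the zone apex, one option is to add a wildcard name to the certificate. This is a hedged variant of the elb_cert resource above, not an extra resource. As far as I know, ACM uses the same validation record for a domain and its wildcard, which the allow_overwrite = true setting on the validation record should cope with, but check the plan output before relying on it.

```hcl
resource "aws_acm_certificate" "elb_cert" {
  domain_name = data.aws_route53_zone.selected.name

  # Also cover any single-level subdomain, e.g. www.<your-domain>.
  subject_alternative_names = ["*.${data.aws_route53_zone.selected.name}"]
  validation_method         = "DNS"

  # Create the replacement certificate before destroying the old one,
  # so a listener is never left pointing at a deleted certificate.
  lifecycle {
    create_before_destroy = true
  }

  tags = {
    Project = "dev-to"
    Billing = "dev-to"
  }
}
```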

Load balancer and listeners

There are a few more variables we need to add in elb/variables.tf.



variable "load_balancer_sg" {}

variable "load_balancer_subnet_a" {}

variable "load_balancer_subnet_b" {}

variable "load_balancer_subnet_c" {}

variable "vpc" {}

We can then create the application load balancer.

Open up elb/main.tf and add the following.



resource "aws_lb" "elb" {
  name               = "dev-to"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [
    var.load_balancer_sg.id]
  subnets            = [
    var.load_balancer_subnet_a.id,
    var.load_balancer_subnet_b.id,
    var.load_balancer_subnet_c.id]

  tags = {
    Name = "dev-to"
    Project = "dev-to"
    Billing = "dev-to"
  }
}

We now need to create the load balancer listeners. This section will depend on whether you are using a Route 53 hosted zone or not.

With Hosted Zone



resource "aws_lb_target_group" "ecs" {
  name     = "ecs"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc.id
  target_type = "ip"

  health_check {
    enabled             = true
    interval            = 300
    path                = "/"
    timeout             = 60
    matcher             = "200"
    healthy_threshold   = 5
    unhealthy_threshold = 5
  }

  tags = {
    Name = "dev-to"
    Project = "dev-to"
    Billing = "dev-to"
  }
}

resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.elb.arn
  port              = "443"
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-2016-08"
  certificate_arn   = aws_acm_certificate_validation.elb_cert.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.ecs.arn
  }
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.elb.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type = "redirect"

    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}

Make sure the port in the target group resource matches your application port.

Without Hosted Zone



resource "aws_lb_target_group" "ecs" {
  name     = "ecs"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc.id
  target_type = "ip"

  health_check {
    enabled             = true
    interval            = 300
    path                = "/"
    timeout             = 60
    matcher             = "200"
    healthy_threshold   = 5
    unhealthy_threshold = 5
  }

  tags = {
    Name = "dev-to"
    Project = "dev-to"
    Billing = "dev-to"
  }
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.elb.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.ecs.arn
  }
}

Make sure the port in the target group resource matches your application port.
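
One way to keep the port consistent is to put it in a variable. The sketch below is a variant of the target group above and assumes a container_port variable in the elb module, which is my own addition. It also uses a shorter health check interval than the 300 seconds above, so unhealthy tasks are noticed sooner; the exact numbers are illustrative.

```hcl
variable "container_port" {
  type        = number
  default     = 80
  description = "Port the container listens on"
}

resource "aws_lb_target_group" "ecs" {
  name        = "ecs"
  port        = var.container_port
  protocol    = "HTTP"
  vpc_id      = var.vpc.id
  target_type = "ip"

  health_check {
    enabled             = true
    interval            = 30 # seconds between checks
    path                = "/"
    timeout             = 5 # must be lower than the interval
    matcher             = "200"
    healthy_threshold   = 3
    unhealthy_threshold = 3
  }

  tags = {
    Name    = "dev-to"
    Project = "dev-to"
    Billing = "dev-to"
  }
}
```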

Both

We then need to add the load balancer and target group resources as outputs. Open the elb/outputs.tf file and add the following.



output "elb" {
  value = aws_lb.elb
}

output "ecs_target_group" {
  value = aws_lb_target_group.ecs
}

These resources can now be used in other modules. This will be done later on when we bring it all together.

IAM

We need to create an IAM role for the ECS service and task to assume.

Create an iam directory under the modules directory so your new directory structure is like the following.



├── terraform
│   ├── modules
│       ├── vpc
│           ├── main.tf
│           ├── variables.tf
│           ├── outputs.tf
│       ├── elb
│           ├── main.tf
│           ├── variables.tf
│           ├── outputs.tf
│       ├── iam
│           ├── main.tf
│           ├── variables.tf
│           ├── outputs.tf
│   ├── main.tf
│   ├── variables.tf
│   ├── backend.tf

In the iam/variables.tf file add the following.



variable "elb" {}

Then open the iam/main.tf file and add:



resource "aws_iam_role" "ecs_service" {
  name = "ecs-service"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": "sts:AssumeRole",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Effect": "Allow"
    }
  ]
}
EOF
}

data "aws_iam_policy_document" "ecs_service_elb" {
  statement {
    effect = "Allow"

    actions = [
      "ec2:Describe*"
    ]

    resources = [
      "*"
    ]
  }

  statement {
    effect = "Allow"

    actions = [
      "elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
      "elasticloadbalancing:DeregisterTargets",
      "elasticloadbalancing:Describe*",
      "elasticloadbalancing:RegisterInstancesWithLoadBalancer",
      "elasticloadbalancing:RegisterTargets"
    ]

    resources = [
      var.elb.arn
    ]
  }
}

data "aws_iam_policy_document" "ecs_service_standard" {

  statement {
    effect = "Allow"

    actions = [
      "ec2:DescribeTags",
      "ecs:DeregisterContainerInstance",
      "ecs:DiscoverPollEndpoint",
      "ecs:Poll",
      "ecs:RegisterContainerInstance",
      "ecs:StartTelemetrySession",
      "ecs:UpdateContainerInstancesState",
      "ecs:Submit*",
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:PutLogEvents"
    ]

    resources = [
      "*"
    ]
  }
}

data "aws_iam_policy_document" "ecs_service_scaling" {

  statement {
    effect = "Allow"

    actions = [
      "application-autoscaling:*",
      "ecs:DescribeServices",
      "ecs:UpdateService",
      "cloudwatch:DescribeAlarms",
      "cloudwatch:PutMetricAlarm",
      "cloudwatch:DeleteAlarms",
      "cloudwatch:DescribeAlarmHistory",
      "cloudwatch:DescribeAlarmsForMetric",
      "cloudwatch:GetMetricStatistics",
      "cloudwatch:ListMetrics",
      "cloudwatch:DisableAlarmActions",
      "cloudwatch:EnableAlarmActions",
      "iam:CreateServiceLinkedRole",
      "sns:CreateTopic",
      "sns:Subscribe",
      "sns:Get*",
      "sns:List*"
    ]

    resources = [
      "*"
    ]
  }
}

resource "aws_iam_policy" "ecs_service_elb" {
  name = "dev-to-elb"
  path = "/"
  description = "Allow access to the service elb"

  policy = data.aws_iam_policy_document.ecs_service_elb.json
}

resource "aws_iam_policy" "ecs_service_standard" {
  name = "dev-to-standard"
  path = "/"
  description = "Allow standard ecs actions"

  policy = data.aws_iam_policy_document.ecs_service_standard.json
}

resource "aws_iam_policy" "ecs_service_scaling" {
  name = "dev-to-scaling"
  path = "/"
  description = "Allow ecs service scaling"

  policy = data.aws_iam_policy_document.ecs_service_scaling.json
}

resource "aws_iam_role_policy_attachment" "ecs_service_elb" {
  role = aws_iam_role.ecs_service.name
  policy_arn = aws_iam_policy.ecs_service_elb.arn
}

resource "aws_iam_role_policy_attachment" "ecs_service_standard" {
  role = aws_iam_role.ecs_service.name
  policy_arn = aws_iam_policy.ecs_service_standard.arn
}

resource "aws_iam_role_policy_attachment" "ecs_service_scaling" {
  role = aws_iam_role.ecs_service.name
  policy_arn = aws_iam_policy.ecs_service_scaling.arn
}

We can then output the role by adding the following to iam/outputs.tf.



output "ecs_role" {
  value = aws_iam_role.ecs_service
}
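
The role above uses a heredoc for its trust policy, while the other policies use aws_iam_policy_document. If you would like them to be consistent, the trust policy can be expressed the same way. This is a sketch of an equivalent ecs_service role; the data source name ecs_assume_role is my own.

```hcl
data "aws_iam_policy_document" "ecs_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    # Only the ECS tasks service may assume this role.
    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "ecs_service" {
  name               = "ecs-service"
  assume_role_policy = data.aws_iam_policy_document.ecs_assume_role.json
}
```

The benefit is that Terraform validates the structure for you, so a stray comma in the JSON can't slip through.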

ECS

We can now create the ECS cluster, service, and task.

Add an ecs directory under the modules directory, and create the main.tf, variables.tf and outputs.tf files under the ecs directory.

Add the following variables in ecs/variables.tf.



variable "ecs_target_group" {}

variable "ecs_subnet_a" {}

variable "ecs_subnet_b" {}

variable "ecs_subnet_c" {}

variable "ecs_sg" {}

variable "ecs_role" {}

You can then add the following to ecs/main.tf.



resource "aws_ecs_cluster" "dev_to" {
  name = "dev-to"
  capacity_providers = [
    "FARGATE"]
  setting {
    name = "containerInsights"
    value = "enabled"
  }

  tags = {
    Name = "dev-to"
    Project = "dev-to"
    Billing = "dev-to"
  }
}

resource "aws_ecs_task_definition" "dev_to" {
  family = "dev-to"
  container_definitions = <<TASK_DEFINITION
  [
  {
    "portMappings": [
      {
        "hostPort": 80,
        "protocol": "tcp",
        "containerPort": 80
      }
    ],
    "cpu": 512,
    "environment": [
      {
        "name": "AUTHOR",
        "value": "Kieran"
      }
    ],
    "memory": 1024,
    "image": "dockersamples/static-site",
    "essential": true,
    "name": "site"
  }
]
TASK_DEFINITION

  network_mode = "awsvpc"
  requires_compatibilities = [
    "FARGATE"]
  memory = "1024"
  cpu = "512"
  execution_role_arn = var.ecs_role.arn
  task_role_arn = var.ecs_role.arn

  tags = {
    Name = "dev-to"
    Project = "dev-to"
    Billing = "dev-to"
  }
}

resource "aws_ecs_service" "dev_to" {
  name = "dev-to"
  cluster = aws_ecs_cluster.dev_to.id
  task_definition = aws_ecs_task_definition.dev_to.arn
  desired_count = 1
  launch_type = "FARGATE"
  platform_version = "1.4.0"

  lifecycle {
    ignore_changes = [
      desired_count]
  }

  network_configuration {
    subnets = [
      var.ecs_subnet_a.id,
      var.ecs_subnet_b.id,
      var.ecs_subnet_c.id]
    security_groups = [
      var.ecs_sg.id]
    assign_public_ip = true
  }

  load_balancer {
    target_group_arn = var.ecs_target_group.arn
    container_name = "site"
    container_port = 80
  }
}

This example doesn't need much memory or CPU, as it is only a static site. Make sure you adjust the cpu and memory attributes if you are using your own image.

If you are using a different image to the one used in this example, make sure you change any references to port 80 so it is correct for your application.
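
To make the image and port easier to change, the container definition could be built with jsonencode instead of a heredoc. This is a sketch under the assumption that you add image and container_port variables to the ecs module; neither exists in the code above.

```hcl
variable "image" {
  type    = string
  default = "dockersamples/static-site"
}

variable "container_port" {
  type    = number
  default = 80
}

locals {
  # jsonencode turns the HCL structure into the JSON string ECS expects.
  container_definitions = jsonencode([
    {
      name      = "site"
      image     = var.image
      essential = true
      cpu       = 512
      memory    = 1024
      portMappings = [
        {
          containerPort = var.container_port
          hostPort      = var.container_port
          protocol      = "tcp"
        }
      ]
      environment = [
        { name = "AUTHOR", value = "Kieran" }
      ]
    }
  ])
}
```

You would then set container_definitions = local.container_definitions in the task definition and container_port = var.container_port in the service's load_balancer block.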

Adding the lifecycle block to ignore changes to desired_count is important as the desired_count attribute will be handled by the auto-scaling policies.

We need to output the cluster and service resources for use in other modules. Add the following to ecs/outputs.tf.



output "ecs_cluster" {
  value = aws_ecs_cluster.dev_to
}

output "ecs_service" {
  value = aws_ecs_service.dev_to
}

Auto-scaling

With everything we have created so far, we would have a working ECS cluster that is running a static site. The problem would be it wouldn't scale. If there was a large increase in the number of requests, the application could fail.

In this section, I will explain how to apply auto-scaling to your ECS service. There are three types of auto-scaling that can be applied to an ECS service:

Target Tracking Scaling Policies
Step Scaling Policies
Scheduled Scaling

This post will demonstrate the target tracking scaling type. You can read more about all three types here.

Create an auto-scaling directory under the modules directory and create the main.tf, variables.tf and outputs.tf files in this directory.

Add the following to auto-scaling/variables.tf.



variable "ecs_cluster" {}

variable "ecs_service" {}

After adding the above variables, open auto-scaling/main.tf and add the following.



resource "aws_appautoscaling_target" "dev_to_target" {
  max_capacity = 5
  min_capacity = 1
  resource_id = "service/${var.ecs_cluster.name}/${var.ecs_service.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace = "ecs"
}

resource "aws_appautoscaling_policy" "dev_to_memory" {
  name               = "dev-to-memory"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.dev_to_target.resource_id
  scalable_dimension = aws_appautoscaling_target.dev_to_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.dev_to_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }

    target_value       = 80
  }
}

resource "aws_appautoscaling_policy" "dev_to_cpu" {
  name = "dev-to-cpu"
  policy_type = "TargetTrackingScaling"
  resource_id = aws_appautoscaling_target.dev_to_target.resource_id
  scalable_dimension = aws_appautoscaling_target.dev_to_target.scalable_dimension
  service_namespace = aws_appautoscaling_target.dev_to_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }

    target_value = 60
  }
}

Application Auto Scaling offers predefined metrics for ECS services, including the two used above, ECSServiceAverageCPUUtilization and ECSServiceAverageMemoryUtilization.

You can view the Terraform docs for more information about the aws_appautoscaling_policy resource, including how you can utilise other metrics.

You can experiment with the values used above, such as max_capacity, min_capacity, and target_value.

Creating these resources will cause Application Auto Scaling to create four CloudWatch alarms, two for each auto-scaling policy. There will be an alarm for scaling out, and an alarm for scaling back in. These alarms are managed by AWS, so they will not appear in your Terraform state.
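
Another predefined metric that can suit a web application is ALBRequestCountPerTarget, which scales on requests per task. Here is a hedged sketch. It assumes you pass the load balancer and target group into the auto-scaling module as elb and ecs_target_group variables, which the module in this post does not do, and the target and cooldown values are illustrative only.

```hcl
# Assumed extra inputs to the auto-scaling module (not in the original post).
variable "elb" {}

variable "ecs_target_group" {}

resource "aws_appautoscaling_policy" "dev_to_requests" {
  name               = "dev-to-requests"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.dev_to_target.resource_id
  scalable_dimension = aws_appautoscaling_target.dev_to_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.dev_to_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      # Format: <load balancer ARN suffix>/<target group ARN suffix>
      resource_label = "${var.elb.arn_suffix}/${var.ecs_target_group.arn_suffix}"
    }

    target_value = 1000

    # Wait longer before removing tasks than before adding them.
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}
```

The cooldown arguments work with the CPU and memory policies too, if you find the service scaling in and out too eagerly.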

Route 53

Skip this section if you do not have a hosted zone available.

Lastly, we need to create a Route 53 record to point our domain at the load balancer we created.

Create a route53 directory under the modules directory and create the main.tf, variables.tf and outputs.tf files in this directory.

Add the following to route53/variables.tf.



variable "elb" {}

variable "hosted_zone_id" {}

After adding these variables, add the following to route53/main.tf.



data "aws_route53_zone" "selected" {
  zone_id  = var.hosted_zone_id
}

resource "aws_route53_record" "dev_to" {
  zone_id = data.aws_route53_zone.selected.zone_id
  name    = data.aws_route53_zone.selected.name
  type    = "A"

  alias {
    name                   = var.elb.dns_name
    zone_id                = var.elb.zone_id
    evaluate_target_health = true
  }
}

Bringing it all together

If you don't have a hosted zone, remove any reference to hosted_zone_id below, along with the route53 module block.

Now we have all the separate modules configured, we can add them all to the terraform/main.tf file. This is the entry point when you create a Terraform plan for your infrastructure.

Before doing this, we need to define the global variables that need to be passed in as part of the plan. Add the following to terraform/variables.tf.



variable "region" {
  default = "eu-west-1"
  type = string
  description = "The region you want to deploy the infrastructure in"
}

variable "hosted_zone_id" {
  type = string
  description = "The id of the hosted zone of the Route 53 domain you want to use"
}

After adding the variables, add the following to terraform/main.tf.



provider "aws" {
  version = "~> 3.0"
  region = var.region
}

module "vpc" {
  source = "./modules/vpc"
}

module "elb" {
  source = "./modules/elb"
  hosted_zone_id = var.hosted_zone_id
  load_balancer_sg = module.vpc.load_balancer_sg
  load_balancer_subnet_a = module.vpc.load_balancer_subnet_a
  load_balancer_subnet_b = module.vpc.load_balancer_subnet_b
  load_balancer_subnet_c = module.vpc.load_balancer_subnet_c
  vpc = module.vpc.vpc
}

module "iam" {
  source = "./modules/iam"
  elb = module.elb.elb
}

module "ecs" {
  source = "./modules/ecs"
  ecs_role = module.iam.ecs_role
  ecs_sg = module.vpc.ecs_sg
  ecs_subnet_a = module.vpc.ecs_subnet_a
  ecs_subnet_b = module.vpc.ecs_subnet_b
  ecs_subnet_c = module.vpc.ecs_subnet_c
  ecs_target_group = module.elb.ecs_target_group
}

module "auto_scaling" {
  source = "./modules/auto-scaling"
  ecs_cluster = module.ecs.ecs_cluster
  ecs_service = module.ecs.ecs_service
}

module "route53" {
  source = "./modules/route53"
  elb = module.elb.elb
  hosted_zone_id = var.hosted_zone_id
}

You can see how you can reference the outputs of one module to be used as a variable in another module. Terraform will automatically resolve dependencies when a resource is referenced by a module.
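
One note on the provider block: from Terraform 0.13 the version argument inside a provider block is deprecated in favour of required_providers. The configuration above still works on v0.13.4, but you may see a warning. A minimal sketch of the newer form, which you could put in terraform/main.tf in place of the version line:

```hcl
terraform {
  required_version = ">= 0.13"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}

provider "aws" {
  region = var.region
}
```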

Building the infrastructure

Make sure you have credentials configured for your AWS account using one of the methods supported by Terraform. You can find how to configure credentials here.

Run the following commands from the terraform directory to build the infrastructure:



terraform init

With Hosted Zone



terraform plan -out=plan -var="hosted_zone_id=<your-hosted-zone-id>" -var="region=eu-west-1"

Make sure you replace <your-hosted-zone-id> with your hosted zone id in the command above.

Without Hosted Zone



terraform plan -out=plan -var="region=eu-west-1"

Both



terraform apply plan

When this command has executed, everything should be up and running! To check it is working, you can either visit the domain associated with your hosted zone or go to the DNS name associated with your load balancer.

To destroy everything that has been created, you can run:



terraform plan -out=plan -destroy -var="hosted_zone_id=<your-hosted-zone-id>" -var="region=eu-west-1"
terraform apply plan

Adding the -destroy parameter tells Terraform to create a plan to tear down the infrastructure.
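
If you get tired of typing the -var flags, Terraform automatically loads a file called terraform.tfvars from the working directory. A small sketch, using the same placeholder as the commands above:

```hcl
# terraform/terraform.tfvars
hosted_zone_id = "<your-hosted-zone-id>"
region         = "eu-west-1"
```

With this file in place, terraform plan -out=plan and terraform plan -out=plan -destroy pick up the values without any -var arguments.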

Improvements

There are a few improvements we can make to the infrastructure. I will explain what they are but I will leave you to work out how to do it using Terraform.

Add logging

If an error occurs inside your ECS task, you currently wouldn't be able to see what has gone wrong.

You can create a CloudWatch log group and tell ECS to use this log group to store the logs from the Docker containers.

When you have created the log group, you can add the following to your task definition to make ECS use the new log group.



"logConfiguration": {
  "logDriver": "awslogs",
  "secretOptions": null,
  "options": {
    "awslogs-group": "${var.ecs_dashboard_log_group.name}",
    "awslogs-region": "eu-west-1",
    "awslogs-stream-prefix": "ecs"
  }
}
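As a starting point for the log group itself, here is a minimal sketch. The resource name, the group name /ecs/static-site, the retention period and the output name are all assumptions chosen for illustration and do not come from the code earlier in the post.

```hcl
# Hypothetical log group for the ECS task; the name and retention are illustrative.
resource "aws_cloudwatch_log_group" "ecs" {
  name              = "/ecs/static-site"
  retention_in_days = 30
}

# Expose the name so it can be passed to the module that builds the task definition.
output "ecs_log_group_name" {
  value = aws_cloudwatch_log_group.ecs.name
}
```

The value for awslogs-group in the task definition would then be this group's name. The task execution role also needs permission to create log streams and put log events. The AWS managed policy AmazonECSTaskExecutionRolePolicy grants both, so if your execution role uses that policy, no extra IAM changes should be needed.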

Use a NAT Gateway

Our ECS subnets currently have a route to the internet using the internet gateway we created. The downside of this is internet gateways also allow inbound traffic. This means we have to make sure our security groups and NACL rules are correct or we risk exposing our application more than we want.

A more secure way to give your application access to the internet is to use a NAT Gateway. A NAT Gateway only allows connections that are initiated from inside the VPC, so nothing on the internet can open a connection to your tasks. In our case, outbound access is all we need to pull the image from DockerHub.

The reason I didn't demonstrate using a NAT Gateway is that they are expensive, especially when you want a highly available setup. In our infrastructure, we would need three NAT Gateways, one for each availability zone.
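If you do want to try it, the rough shape in Terraform could look like the sketch below. It assumes two hypothetical input variables that are not part of the code in this post: public_subnet_ids (one public subnet per availability zone, each with a route to the internet gateway) and private_route_table_ids (one route table per availability zone, associated with the ECS subnets, in the same order).

```hcl
# Hypothetical inputs; both lists are assumed to have one entry per availability zone.
variable "public_subnet_ids" {
  type = list(string)
}

variable "private_route_table_ids" {
  type = list(string)
}

# One Elastic IP per NAT Gateway.
# Newer versions of the AWS provider use domain = "vpc" instead of vpc = true.
resource "aws_eip" "nat" {
  count = length(var.public_subnet_ids)
  vpc   = true
}

# The NAT Gateway itself lives in a public subnet.
resource "aws_nat_gateway" "this" {
  count         = length(var.public_subnet_ids)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = var.public_subnet_ids[count.index]
}

# Send outbound traffic from each private route table to the NAT Gateway in the same zone.
resource "aws_route" "private_default" {
  count                  = length(var.private_route_table_ids)
  route_table_id         = var.private_route_table_ids[count.index]
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.this[count.index].id
}
```

Keeping one NAT Gateway per availability zone means an outage in one zone does not cut off outbound access for tasks in the others, which is where the cost adds up.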

ECR

The reason we need access to the internet is so we can pull the docker image from DockerHub. We can eliminate that requirement by storing our images in ECR. ECR is AWS's own container registry.

The benefit of using ECR is you can use a VPC Endpoint. This essentially gives you access to the ECR service without access to the internet, so you can remove the internet gateway route from the ECS subnets.

A drawback of using ECR appears when you don't control the source image. If you are using a third-party image, like in this example, you are responsible for keeping your copy in ECR up to date.
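For reference, a rough sketch of the ECR setup might look like the following. All four input variables and the repository name are assumptions made for illustration. Fargate pulls through two interface endpoints (ecr.api and ecr.dkr), and because ECR stores image layers in S3, an S3 gateway endpoint is needed as well.

```hcl
# Hypothetical inputs; none of these names come from the code in this post.
variable "vpc_id" {
  type = string
}

variable "ecs_subnet_ids" {
  type = list(string)
}

variable "ecs_route_table_ids" {
  type = list(string)
}

# Security group for the endpoints; it should allow inbound HTTPS (443) from the ECS tasks.
variable "endpoint_security_group_id" {
  type = string
}

# Private repository to hold your copy of the image.
resource "aws_ecr_repository" "static_site" {
  name = "static-site"
}

# Interface endpoints for the ECR API and the Docker registry.
resource "aws_vpc_endpoint" "ecr" {
  for_each            = toset(["ecr.api", "ecr.dkr"])
  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.eu-west-1.${each.key}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.ecs_subnet_ids
  security_group_ids  = [var.endpoint_security_group_id]
  private_dns_enabled = true
}

# Gateway endpoint for S3, where ECR keeps the image layers.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.eu-west-1.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.ecs_route_table_ids
}
```

If you also send container logs to CloudWatch with the awslogs driver, the tasks would need a similar interface endpoint for the logs service once the internet route is removed.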

I hope this has been useful. If you have any issues or feedback, drop a comment below!

Top comments (15)

Sathyajith Bhat • Oct 11 '20 • Edited

Hey, pretty good write-up. I have only a couple of suggestions:

Better tags. I'm glad to see at least one tag; adding some extra tags (for example, I add a "used_for" tag to every stack I build) helps me analyze and possibly revise stacks for cost savings/analysis.
Terraform variable blocks can accept description, default and type constraint arguments. The description is useful for documentation (it is also shown in the prompt when Terraform asks for an input variable), defaults can help set some sane default values, and type constraints ensure you enter the right type.
For new folks, I really don't recommend touching the NACLs. The defaults are good and security groups provide a good enough firewall. Updating NACLs without understanding the "stateless" mechanism can lead to a lot of head-scratching about why something is broken.
Given DockerHub's new rate limiting policies, I think storing images in ECR is definitely worth it.

Great post overall!

Kieran Jennings • Oct 11 '20

Thanks for the feedback!

I tend to add a "Billing" tag to the resources so then they can be grouped in Cost Explorer. I will add that in!

I always seem to forget about variable descriptions and types!

I did mention the cost aspect of NAT gateways but I will make sure it's clearer.

I was so hesitant about putting in NACLs! I think everyone has been burned by NACLs at some point 😁 I will take them out to avoid confusion.

Again thanks for the feedback! I'm new to blogging and trying to make the call on what can be too confusing for people when reading. It's great to have someone else's opinion.

Tiago Correia • Oct 11 '20

Maybe put a note at the top of the NACLs section just to warn people "You can do this if you want, but the defaults are ok if you're a beginner"?

Sathyajith Bhat • Oct 11 '20

Yep, totally

Sathyajith Bhat • Oct 11 '20

Kieran,

I missed the part about the NAT gateways, and it's quite clear, hence I edited my post. Your first post is quite impressive and I look forward to reading more. Cheers

Kieran Jennings • Oct 11 '20

Thank you!

Sergey Podalov • Apr 20 '21

Hi, thank you for great post!
I've found a lot of posts related to Terraform, ECS and Fargate, but this one is the best!

Kieran Jennings • Apr 20 '21

Thank you. That means a lot!

A. Ruiz • Nov 27 '21

So after a couple of days searching how to exactly do this process, I finally found the gold mine. Thanks a lot, Kieran!! You nailed it.

Kieran Jennings • Jan 2 '22

Thank you for reading! I'm glad it was useful

Primadi Setiawan • Sep 13 '22

There is always a low alarm for CPU and memory in CloudWatch. Is there any way to prevent the low alarm when the instance count is 1? Thanks.

Kieran Jennings • Oct 20 '22

Hi! Sorry for the late reply. There isn't a way to prevent that, but there is a tick box option in the CloudWatch alarms panel to hide auto scaling alarms. I hope this helps.

Henrique Vilela • Apr 26 '23

There seems to be an error in the first line of the load balancer explanation:

There are a few more variables we need to add in ecs/variables.tf.

Shouldn't it be elb/variables.tf?

Praveen HA • Aug 5 '24

It's too nice. I have seen many blogs, and this is too neat. We are trying to migrate from EC2-based ECS to Fargate, and it took me 2 hours to migrate my entire code to go with Fargate.

Kieran Jennings • Aug 19 '24

I'm confused as to what you mean by too neat. This is the process I used at the time to create an auto-scaling ECS Fargate service, nothing more, nothing less. If this is too difficult to do in your codebase, I would consider refactoring it down to make it easier to manage, then work from there.
