DEV Community

Cover image for Create an AWS ECS Cluster Using Terraform
Tacio Nery
Tacio Nery

Posted on

Create an AWS ECS Cluster Using Terraform

Hey everyone, I'd like to share my experience with Terraform and AWS. In this post I'll describe the resources I used to build a infrastructure on AWS and deploy a NodeJS application on it.

Resources

The application I needed to deploy is a monolithic NodeJS application, so, to deploy and make it scalable I decided to use containers with an autoscaling tool to scale the application based on CPU and Memory usage. To build this environment on AWS I used the services listed below:

  1. VPC and Networking (Subnets, Internet Groups...)
  2. Elastic Container Registry
  3. Elastic Container Service
  4. Application Load Balancer
  5. Auto Scaling
  6. Cloud Watch

Terraform Initial Configuration

The Terraform configuration I used was quite simple. The first step is create a Bucket on AWS S3 to store the Terraform State. It's not required but, it'll make our life easier if someone else needs to maintain this infrastructure. This is the main.tf file with this configuration.

# main.tf | Main Configuration

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "2.70.0"
    }
  }

  backend "s3" {
    bucket = "terraform-state-bucket"
    key    = "state/terraform_state.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region     = var.aws_region
  access_key = var.aws_access_key
  secret_key = var.aws_secret_key
}
Enter fullscreen mode Exit fullscreen mode

The provider section is using some variables. We can define variables in a tfvars. I'll explain it later in this post.

UPDATE: With this initial configuration, just run terraform init.

VPC and Networking

Let's create a VPC and configure some Networking resources we're gonna use further. The sample code bellow will create a VPC.

# vpc.tf | VPC Configuration

resource "aws_vpc" "aws-vpc" {
  cidr_block           = "10.10.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = {
    Name        = "${var.app_name}-vpc"
    Environment = var.app_environment
  }
}
Enter fullscreen mode Exit fullscreen mode

For Networking, it is necessary to create Public and Private Subnets within the VPC, also a Internet Gateway and Route Tables for Public Subnets. The sample bellow will create these resources

# networking.tf | Network Configuration

resource "aws_internet_gateway" "aws-igw" {
  vpc_id = aws_vpc.aws-vpc.id
  tags = {
    Name        = "${var.app_name}-igw"
    Environment = var.app_environment
  }

}

resource "aws_subnet" "private" {
  vpc_id            = aws_vpc.aws-vpc.id
  count             = length(var.private_subnets)
  cidr_block        = element(var.private_subnets, count.index)
  availability_zone = element(var.availability_zones, count.index)

  tags = {
    Name        = "${var.app_name}-private-subnet-${count.index + 1}"
    Environment = var.app_environment
  }
}

resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.aws-vpc.id
  cidr_block              = element(var.public_subnets, count.index)
  availability_zone       = element(var.availability_zones, count.index)
  count                   = length(var.public_subnets)
  map_public_ip_on_launch = true

  tags = {
    Name        = "${var.app_name}-public-subnet-${count.index + 1}"
    Environment = var.app_environment
  }
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.aws-vpc.id

  tags = {
    Name        = "${var.app_name}-routing-table-public"
    Environment = var.app_environment
  }
}

resource "aws_route" "public" {
  route_table_id         = aws_route_table.public.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.aws-igw.id
}

resource "aws_route_table_association" "public" {
  count          = length(var.public_subnets)
  subnet_id      = element(aws_subnet.public.*.id, count.index)
  route_table_id = aws_route_table.public.id
}
Enter fullscreen mode Exit fullscreen mode

Container Registry and ECS Cluster

Now, it's time to create the Container Registry and the ECS Cluster. First let's create the Container Registry with the code bellow:

# ecr.tf | Elastic Container Repository

resource "aws_ecr_repository" "aws-ecr" {
  name = "${var.app_name}-${var.app_environment}-ecr"
  tags = {
    Name        = "${var.app_name}-ecr"
    Environment = var.app_environment
  }
}
Enter fullscreen mode Exit fullscreen mode

The ECR is a repository where we're gonna store the Docker Images of the application we want to deploy. It works like the Docker Hub, if you're familiar with Docker. You can build the Docker Image locally and push it to the ECR or use a CI/CD platform to do it.

Now we're going to create the ECS Cluster, Service and Task Definition.
A service is a configuration that enables us to run and maintain a number of tasks simultaneously in a cluster. The containers are defined by a Task Definition that are used to run tasks in a service.

Before we create the ECS Cluster, we need to create an IAM policy to enable the service to pull the image from ECR.

# iam.tf | IAM Role Policies

resource "aws_iam_role" "ecsTaskExecutionRole" {
  name               = "${var.app_name}-execution-task-role"
  assume_role_policy = data.aws_iam_policy_document.assume_role_policy.json
  tags = {
    Name        = "${var.app_name}-iam-role"
    Environment = var.app_environment
  }
}

data "aws_iam_policy_document" "assume_role_policy" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role_policy_attachment" "ecsTaskExecutionRole_policy" {
  role       = aws_iam_role.ecsTaskExecutionRole.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}
Enter fullscreen mode Exit fullscreen mode

Now let's create what we need for ECS. First we create the ECS Cluster:

resource "aws_ecs_cluster" "aws-ecs-cluster" {
  name = "${var.app_name}-${var.app_environment}-cluster"
  tags = {
    Name        = "${var.app_name}-ecs"
    Environment = var.app_environment
  }
}
Enter fullscreen mode Exit fullscreen mode

I created a Log Group on CloudWatch to get the containers logs.

resource "aws_cloudwatch_log_group" "log-group" {
  name = "${var.app_name}-${var.app_environment}-logs"

  tags = {
    Application = var.app_name
    Environment = var.app_environment
  }
}
Enter fullscreen mode Exit fullscreen mode

I created a Task Definition compatible with AWS FARGATE, I preferred to do so in order to have a better cost of this infrastructure.

data "template_file" "env_vars" {
  template = file("env_vars.json")
}

resource "aws_ecs_task_definition" "aws-ecs-task" {
  family = "${var.app_name}-task"

  container_definitions = <<DEFINITION
  [
    {
      "name": "${var.app_name}-${var.app_environment}-container",
      "image": "${aws_ecr_repository.aws-ecr.repository_url}:latest",
      "entryPoint": [],
      "environment": ${data.template_file.env_vars.rendered},
      "essential": true,
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "${aws_cloudwatch_log_group.log-group.id}",
          "awslogs-region": "${var.aws_region}",
          "awslogs-stream-prefix": "${var.app_name}-${var.app_environment}"
        }
      },
      "portMappings": [
        {
          "containerPort": 8080,
          "hostPort": 8080
        }
      ],
      "cpu": 256,
      "memory": 512,
      "networkMode": "awsvpc"
    }
  ]
  DEFINITION

  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  memory                   = "512"
  cpu                      = "256"
  execution_role_arn       = aws_iam_role.ecsTaskExecutionRole.arn
  task_role_arn            = aws_iam_role.ecsTaskExecutionRole.arn

  tags = {
    Name        = "${var.app_name}-ecs-td"
    Environment = var.app_environment
  }
}

data "aws_ecs_task_definition" "main" {
  task_definition = aws_ecs_task_definition.aws-ecs-task.family
}
Enter fullscreen mode Exit fullscreen mode

An observation about the Task Definition is that I'm using the Terraform data function to set some environment variables that I defined in a JSON file (it needs an improvement to use AWS EKS or some other way to store secrets).

Ok, now let's create the ECS Service.

resource "aws_ecs_service" "aws-ecs-service" {
  name                 = "${var.app_name}-${var.app_environment}-ecs-service"
  cluster              = aws_ecs_cluster.aws-ecs-cluster.id
  task_definition      = "${aws_ecs_task_definition.aws-ecs-task.family}:${max(aws_ecs_task_definition.aws-ecs-task.revision, data.aws_ecs_task_definition.main.revision)}"
  launch_type          = "FARGATE"
  scheduling_strategy  = "REPLICA"
  desired_count        = 1
  force_new_deployment = true

  network_configuration {
    subnets          = aws_subnet.private.*.id
    assign_public_ip = false
    security_groups = [
      aws_security_group.service_security_group.id,
      aws_security_group.load_balancer_security_group.id
    ]
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.target_group.arn
    container_name   = "${var.app_name}-${var.app_environment}-container"
    container_port   = 8080
  }

  depends_on = [aws_lb_listener.listener]
}
Enter fullscreen mode Exit fullscreen mode

I also defined a Security Group to avoid external connections to the containers.

resource "aws_security_group" "service_security_group" {
  vpc_id = aws_vpc.aws-vpc.id

  ingress {
    from_port       = 0
    to_port         = 0
    protocol        = "-1"
    security_groups = [aws_security_group.load_balancer_security_group.id]
  }

  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }

  tags = {
    Name        = "${var.app_name}-service-sg"
    Environment = var.app_environment
  }
}
Enter fullscreen mode Exit fullscreen mode

Application Load Balancer

The next step is to setup a Load Balancer. As you could notice on the ECS configuration is that there's a reference to a load_balancer on it.

resource "aws_alb" "application_load_balancer" {
  name               = "${var.app_name}-${var.app_environment}-alb"
  internal           = false
  load_balancer_type = "application"
  subnets            = aws_subnet.public.*.id
  security_groups    = [aws_security_group.load_balancer_security_group.id]

  tags = {
    Name        = "${var.app_name}-alb"
    Environment = var.app_environment
  }
}
Enter fullscreen mode Exit fullscreen mode

Now let's add a security group for the Load Balancer

resource "aws_security_group" "load_balancer_security_group" {
  vpc_id = aws_vpc.aws-vpc.id

  ingress {
    from_port        = 80
    to_port          = 80
    protocol         = "tcp"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }

  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }
  tags = {
    Name        = "${var.app_name}-sg"
    Environment = var.app_environment
  }
}
Enter fullscreen mode Exit fullscreen mode

We also need to create a Load Balancer Target Group, it will relate the Load Balancer with the Containers.

resource "aws_lb_target_group" "target_group" {
  name        = "${var.app_name}-${var.app_environment}-tg"
  port        = 80
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = aws_vpc.aws-vpc.id

  health_check {
    healthy_threshold   = "3"
    interval            = "300"
    protocol            = "HTTP"
    matcher             = "200"
    timeout             = "3"
    path                = "/v1/status"
    unhealthy_threshold = "2"
  }

  tags = {
    Name        = "${var.app_name}-lb-tg"
    Environment = var.app_environment
  }
}
Enter fullscreen mode Exit fullscreen mode

One very important thing here is the attribute path within health_check. This is a route on the application that the Load Balancer will use to check the status of the application.

At last let's create a HTTP listener for out Load Balancer.

resource "aws_lb_listener" "listener" {
  load_balancer_arn = aws_alb.application_load_balancer.id
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.target_group.id
  }
}
Enter fullscreen mode Exit fullscreen mode

Autoscaling

So, autoscaling is essential for the application I'm working on. To configure it on AWS I just needed to create an Autoscaling Target and two simple Autoscaling Policies. One to scale by CPU usage and another one for Memory usage.

# autoscaling.tf | Auto Scaling Group

resource "aws_appautoscaling_target" "ecs_target" {
  max_capacity       = 2
  min_capacity       = 1
  resource_id        = "service/${aws_ecs_cluster.aws-ecs-cluster.name}/${aws_ecs_service.aws-ecs-service.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "ecs_policy_memory" {
  name               = "${var.app_name}-${var.app_environment}-memory-autoscaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageMemoryUtilization"
    }

    target_value = 80
  }
}

resource "aws_appautoscaling_policy" "ecs_policy_cpu" {
  name               = "${var.app_name}-${var.app_environment}-cpu-autoscaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.ecs_target.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }

    target_value = 80
  }
}
Enter fullscreen mode Exit fullscreen mode

So, the application will scale up if the memory or the cpu usage reaches 80% of usage. When it comes bellow this value, the application will scale down.

Variables

I believe you noticed we used a lot of variables for the Terraform configuration files. To use variables I created a file called variables.tf. This file only have the variables definitions.

# variables.tf | Auth and Application variables

variable "aws_access_key" {
  type        = string
  description = "AWS Access Key"
}

variable "aws_secret_key" {
  type        = string
  description = "AWS Secret Key"
}

variable "aws_region" {
  type        = string
  description = "AWS Region"
}

variable "aws_cloudwatch_retention_in_days" {
  type        = number
  description = "AWS CloudWatch Logs Retention in Days"
  default     = 1
}

variable "app_name" {
  type        = string
  description = "Application Name"
}

variable "app_environment" {
  type        = string
  description = "Application Environment"
}

variable "cidr" {
  description = "The CIDR block for the VPC."
  default     = "10.0.0.0/16"
}

variable "public_subnets" {
  description = "List of public subnets"
}

variable "private_subnets" {
  description = "List of private subnets"
}

variable "availability_zones" {
  description = "List of availability zones"
}
Enter fullscreen mode Exit fullscreen mode

The values for each variable are defined in a file called terraform.tfvars.

aws_region        = "us-east-1"
aws_access_key    = "your aws access key"
aws_secret_key    = "your aws secret key"

# these are zones and subnets examples
availability_zones = ["us-east-1a", "us-east-1b"]
public_subnets     = ["10.10.100.0/24", "10.10.101.0/24"]
private_subnets    = ["10.10.0.0/24", "10.10.1.0/24"]

# these are used for tags
app_name        = "node-js-app"
app_environment = "production"
Enter fullscreen mode Exit fullscreen mode

This file is not committed in my repository. I created it locally and use S3 to manage access and control its versions. It needs some improvements as well that I'll do further.

UPDATE: Now, with all the configuration files properly written, run the command terraform plan to check what changes are going to be done and terraform apply to review and apply the changes.

You can also be asking about the Database. Well, in this project I created a Cluster on MongoCloud and put the credentials on the environment.

The full code can be found on my [Github].(https://github.com/thnery/terraform-aws-template)

Thanks

Thank you for reading this post. I hope it could be useful. If you have any feedback, please, let me know.

Oldest comments (10)

Collapse
 
marcellodesales profile image
Marcello de Sales

Any idea on how to simplify your approach by creating the basics for aws.amazon.com/blogs/containers/au...?

Collapse
 
_farrukh_saeed profile image
Farrukh Saeed

Can someone tell me please, where is the best tutorial for learning Terraform.

Collapse
 
_farrukh_saeed profile image
Farrukh Saeed

Very nice article. Thanks for sharing.

Collapse
 
petarmitrovic profile image
Petar

I found this article very helpful when setting up my ECS cluster.

A several improvements I'd like suggest:

  • use different roles for execution_role_arn and task_role_arn (in order to follow least permissions principle)
  • not store aws_access_key and aws_secret_key in .tfvars
Collapse
 
fomm profile image
foMM

Thank you very much for this article. I got most things working except I am getting an error for the task to pull the ecr image. service call has been retried 3 time(s): RequestError: send request failed caused by: Post https://api.ecr.ap-southeast-2.amazonaws.com/: dial tcp 99.82.184.189:443: i/o timeout

Anyone got the same issue?

Collapse
 
vardyng profile image
Vardy • Edited

You must use us-east-1 to uplaod the image, while pulling image doesn't necessary in us-east-1

Collapse
 
felipeco14 profile image
Felipe Chaves de Oliveira

The problem is that your service/task is in a private subnet, and is not able to access the ECR registry, to solve this you need to create a NAT gateway to provide such access... stackoverflow.com/a/66802973

Collapse
 
softmarshmallow profile image
UZU, J

Github link is dead

Collapse
 
srak profile image
rak

Hi,
Any way to get the env_vars.json file?
Can't access the repository either is no longer avilable.

Cheers

Collapse
 
nmphuong200288 profile image
nmphuong200288

Hi brother.
data "template_file" "env_vars" {
template = file("env_vars.json")
i need the content of env_vars file. I don't know the content of that file also you can share me your github to minhphuong49th@gmail.com ? Thanks. I just start to learn terraform about three months