Note: This is my first blog post! Any feedback is totally welcome.
You can access the completed code for this blog here.
Introduction
ECS (Elastic Container Service) is AWS's container orchestration service. You can read more about ECS here.
There are two launch types that can be used: EC2 and Fargate.
With EC2 deployments, you need to manage the number of EC2 instances that are required for your container.
Fargate is a serverless compute engine provided by AWS. This means the servers that your containers are launched on are managed by AWS. All you need to do is specify the CPU and memory your container will use and AWS will provision an appropriate amount of compute resource.
In this blog post, I will be using Terraform (v0.13.4). Terraform is one of the most popular Infrastructure as Code tools.
I will be demonstrating how to configure the whole infrastructure in this post. If there is a certain section you are focusing on, feel free to skip to it!
Image
I will be using a sample docker image that uses nginx to show a simple static page. You can find information about the image here. The image is dockersamples/static-site.
Let's Start!
Directory Structure
Create the below directory structure, where modules is an empty directory for now, and leave the files empty.
├── terraform
│   ├── modules
│   ├── main.tf
│   ├── variables.tf
│   ├── backend.tf
Backend Config
To store my Terraform state I will be using S3. Open up the backend.tf file and add the following:
terraform {
backend "s3" {
bucket = "<your-s3-bucket>"
region = "eu-west-1"
key = "terraform.tfstate"
}
}
Make sure you replace <your-s3-bucket> with the name of your S3 bucket. You can also change the region and the key location if you want.
You can confirm that your configuration is working by initialising the Terraform backend. To do this, open your terminal in the terraform directory you created above and run terraform init.
If everything is configured correctly, the output should end with a message confirming that Terraform has been successfully initialized.
If you have any problems, drop a comment below or take a look at Terraform's documentation on configuring backends here.
VPC
We need to create a VPC to hold all our infrastructure.
Create a directory under the modules directory called vpc. Put three empty files, called main.tf, variables.tf and outputs.tf, in the vpc directory.
├── terraform
│   ├── modules
│   │   ├── vpc
│   │   │   ├── main.tf
│   │   │   ├── variables.tf
│   │   │   ├── outputs.tf
│   ├── main.tf
│   ├── variables.tf
│   ├── backend.tf
In the main.tf file, we want to add a VPC resource, like below.
resource "aws_vpc" "vpc" {
cidr_block = "192.0.0.0/16"
enable_dns_support = true
enable_dns_hostnames = true
tags = {
Name = "dev-to"
Project = "dev-to"
}
}
If you understand CIDR blocks then you can choose your own; just make sure you are consistent with the change throughout the blog! If you don't know much about CIDR blocks, you are probably better off sticking with the CIDR blocks I use.
If you are just starting out with AWS, or any cloud platform, it would be beneficial to do some reading about CIDR blocks.
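As an aside, if you would rather derive the subnet ranges from the VPC CIDR than hard-code them, Terraform's built-in cidrsubnet() function can do the arithmetic for you. The snippet below is just an illustrative sketch; the hard-coded /24 blocks used throughout this post work exactly the same way.

# cidrsubnet(prefix, newbits, netnum) adds "newbits" to the prefix length,
# so a /16 plus 8 new bits gives a /24, and "netnum" picks which /24 to use.
locals {
  vpc_cidr = "192.0.0.0/16"

  elb_a_cidr = cidrsubnet(local.vpc_cidr, 8, 0) # 192.0.0.0/24
  elb_b_cidr = cidrsubnet(local.vpc_cidr, 8, 1) # 192.0.1.0/24
  ecs_a_cidr = cidrsubnet(local.vpc_cidr, 8, 3) # 192.0.3.0/24
}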
Internet Gateway
We need an internet gateway to give internet access to the load balancer and to the Fargate subnets, so they can download the docker image.
resource "aws_internet_gateway" "internal_gateway" {
vpc_id = aws_vpc.vpc.id
tags = {
Name = "dev-to"
Project = "dev-to"
Billing = "dev-to"
}
}
Route table
We need a route table to describe the route for certain CIDR blocks. In this post, we will just route all traffic to the internet gateway.
In a production system, for security reasons, you would want two route tables. You would want a public route table that routes traffic to the internet gateway, and a second route table that routes traffic to a NAT Gateway. There is a section on adding a NAT Gateway under the Improvements section below.
resource "aws_route_table" "route_table" {
vpc_id = aws_vpc.vpc.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.internal_gateway.id
}
tags = {
Name = "dev-to"
Project = "dev-to"
}
}
Subnets
I'm going to create six subnets. There will be three subnets for the load balancer and three subnets for the ECS tasks to be placed in.
I use three subnets for each because there will be a subnet in each availability zone in the eu-west-1 region. The number of subnets might be different for you, depending on which region you are using.
data "aws_availability_zones" "available" {}
resource "aws_subnet" "elb_a" {
vpc_id = aws_vpc.vpc.id
cidr_block = "192.0.0.0/24"
availability_zone = data.aws_availability_zones.available.names[0]
map_public_ip_on_launch = true
tags = {
Name = "elb-a"
Project = "dev-to"
}
}
resource "aws_subnet" "elb_b" {
vpc_id = aws_vpc.vpc.id
cidr_block = "192.0.1.0/24"
availability_zone = data.aws_availability_zones.available.names[1]
map_public_ip_on_launch = true
tags = {
Name = "elb-b"
Project = "dev-to"
}
}
resource "aws_subnet" "elb_c" {
vpc_id = aws_vpc.vpc.id
cidr_block = "192.0.2.0/24"
availability_zone = data.aws_availability_zones.available.names[2]
map_public_ip_on_launch = true
tags = {
Name = "elb-c"
Project = "dev-to"
}
}
resource "aws_subnet" "ecs_a" {
vpc_id = aws_vpc.vpc.id
cidr_block = "192.0.3.0/24"
availability_zone = data.aws_availability_zones.available.names[0]
map_public_ip_on_launch = true
tags = {
Name = "ecs-a"
Project = "dev-to"
}
}
resource "aws_subnet" "ecs_b" {
vpc_id = aws_vpc.vpc.id
cidr_block = "192.0.4.0/24"
availability_zone = data.aws_availability_zones.available.names[1]
map_public_ip_on_launch = true
tags = {
Name = "ecs-b"
Project = "dev-to"
}
}
resource "aws_subnet" "ecs_c" {
vpc_id = aws_vpc.vpc.id
cidr_block = "192.0.5.0/24"
availability_zone = data.aws_availability_zones.available.names[2]
map_public_ip_on_launch = true
tags = {
Name = "ecs-c"
Project = "dev-to"
}
}
resource "aws_route_table_association" "elb_a" {
subnet_id = aws_subnet.elb_a.id
route_table_id = aws_route_table.route_table.id
}
resource "aws_route_table_association" "elb_b" {
subnet_id = aws_subnet.elb_b.id
route_table_id = aws_route_table.route_table.id
}
resource "aws_route_table_association" "elb_c" {
subnet_id = aws_subnet.elb_c.id
route_table_id = aws_route_table.route_table.id
}
resource "aws_route_table_association" "ecs_a" {
subnet_id = aws_subnet.ecs_a.id
route_table_id = aws_route_table.route_table.id
}
resource "aws_route_table_association" "ecs_b" {
subnet_id = aws_subnet.ecs_b.id
route_table_id = aws_route_table.route_table.id
}
resource "aws_route_table_association" "ecs_c" {
subnet_id = aws_subnet.ecs_c.id
route_table_id = aws_route_table.route_table.id
}
Security groups
For our application, we will create two security groups, one for the load balancer and one for the ECS tasks.
resource "aws_security_group" "load_balancer" {
vpc_id = aws_vpc.vpc.id
tags = {
Name = "load-balancer"
Project = "dev-to"
}
}
resource "aws_security_group" "ecs_task" {
vpc_id = aws_vpc.vpc.id
tags = {
Name = "ecs-task"
Project = "dev-to"
}
}
We can then assign rules to these security groups. The load balancer will need access from anywhere on ports 80 and 443. The ECS task security group needs to allow traffic from the load balancer to the port that the docker container will run on. The default port for the image we are using is 80.
resource "aws_security_group_rule" "ingress_load_balancer_http" {
from_port = 80
protocol = "tcp"
security_group_id = aws_security_group.load_balancer.id
to_port = 80
cidr_blocks = [
"0.0.0.0/0"]
type = "ingress"
}
resource "aws_security_group_rule" "ingress_load_balancer_https" {
from_port = 443
protocol = "tcp"
security_group_id = aws_security_group.load_balancer.id
to_port = 443
cidr_blocks = [
"0.0.0.0/0"]
type = "ingress"
}
resource "aws_security_group_rule" "ingress_ecs_task_elb" {
from_port = 80
protocol = "tcp"
security_group_id = aws_security_group.ecs_task.id
to_port = 80
source_security_group_id = aws_security_group.load_balancer.id
type = "ingress"
}
resource "aws_security_group_rule" "egress_load_balancer" {
type = "egress"
from_port = 0
to_port = 65535
protocol = "tcp"
cidr_blocks = [
"0.0.0.0/0"]
security_group_id = aws_security_group.load_balancer.id
}
resource "aws_security_group_rule" "egress_ecs_task" {
type = "egress"
from_port = 0
to_port = 65535
protocol = "tcp"
cidr_blocks = [
"0.0.0.0/0"]
security_group_id = aws_security_group.ecs_task.id
}
NACLs
This section isn't necessary but is an added layer of security. It can be confusing for a beginner, so feel free to skip to the outputs section if you want.
Network access control lists (NACLs) control access in and out of the subnets. The thing to remember with NACLs is they're stateless. This means that return traffic must be explicitly allowed by rules.
Let's create an NACL for our load balancer subnets and a different NACL for the ECS task subnets.
resource "aws_network_acl" "load_balancer" {
vpc_id = aws_vpc.vpc.id
subnet_ids = [
aws_subnet.elb_a.id,
aws_subnet.elb_b.id,
aws_subnet.elb_c.id]
}
resource "aws_network_acl" "ecs_task" {
vpc_id = aws_vpc.vpc.id
subnet_ids = [
aws_subnet.ecs_a.id,
aws_subnet.ecs_b.id,
aws_subnet.ecs_c.id]
}
resource "aws_network_acl_rule" "load_balancer_http" {
network_acl_id = aws_network_acl.load_balancer.id
rule_number = 100
egress = false
protocol = "tcp"
rule_action = "allow"
cidr_block = "0.0.0.0/0"
from_port = 80
to_port = 80
}
resource "aws_network_acl_rule" "load_balancer_https" {
network_acl_id = aws_network_acl.load_balancer.id
rule_number = 200
egress = false
protocol = "tcp"
rule_action = "allow"
cidr_block = "0.0.0.0/0"
from_port = 443
to_port = 443
}
resource "aws_network_acl_rule" "ingress_load_balancer_ephemeral" {
network_acl_id = aws_network_acl.load_balancer.id
rule_number = 300
egress = false
protocol = "tcp"
rule_action = "allow"
cidr_block = "0.0.0.0/0"
from_port = 1024
to_port = 65535
}
resource "aws_network_acl_rule" "ecs_task_ephemeral" {
network_acl_id = aws_network_acl.ecs_task.id
rule_number = 100
egress = false
protocol = "tcp"
rule_action = "allow"
cidr_block = "0.0.0.0/0"
from_port = 1024
to_port = 65535
}
resource "aws_network_acl_rule" "ecs_task_http" {
network_acl_id = aws_network_acl.ecs_task.id
rule_number = 200
egress = false
protocol = "tcp"
rule_action = "allow"
cidr_block = aws_vpc.vpc.cidr_block
from_port = 80
to_port = 80
}
resource "aws_network_acl_rule" "load_balancer_ephemeral" {
network_acl_id = aws_network_acl.load_balancer.id
rule_number = 100
egress = true
protocol = "tcp"
rule_action = "allow"
from_port = 0
to_port = 65535
cidr_block = "0.0.0.0/0"
}
resource "aws_network_acl_rule" "ecs_task_all" {
network_acl_id = aws_network_acl.ecs_task.id
rule_number = 100
egress = true
protocol = "tcp"
rule_action = "allow"
from_port = 0
to_port = 65535
cidr_block = "0.0.0.0/0"
}
Outputs
We then need to output the resources we have created in vpc/outputs.tf.
output "vpc" {
value = aws_vpc.vpc
}
output "load_balancer_subnet_a" {
value = aws_subnet.elb_a
}
output "load_balancer_subnet_b" {
value = aws_subnet.elb_b
}
output "load_balancer_subnet_c" {
value = aws_subnet.elb_c
}
output "ecs_subnet_a" {
value = aws_subnet.ecs_a
}
output "ecs_subnet_b" {
value = aws_subnet.ecs_b
}
output "ecs_subnet_c" {
value = aws_subnet.ecs_c
}
output "load_balancer_sg" {
value = aws_security_group.load_balancer
}
output "ecs_sg" {
value = aws_security_group.ecs_task
}
Application Load Balancer
We need an application load balancer to route traffic to the ECS tasks and manage the load across all the ECS tasks.
I will also be pointing a Route 53 hosted zone record at the load balancer to use https and give a better-looking URL. If you don't have a Route 53 hosted zone then don't worry, you can still do this part and only use http.
Create a directory called elb under the modules directory. You should have a directory structure similar to below.
├── terraform
│   ├── modules
│   │   ├── vpc
│   │   │   ├── main.tf
│   │   │   ├── variables.tf
│   │   │   ├── outputs.tf
│   │   ├── elb
│   │   │   ├── main.tf
│   │   │   ├── variables.tf
│   │   │   ├── outputs.tf
│   ├── main.tf
│   ├── variables.tf
│   ├── backend.tf
Certificate
You can skip this section if you do not have a Route 53 hosted zone to use.
We need to create a certificate using ACM so that we can enable https on our load balancer.
To do this, we need to add a variable to the elb/variables.tf file.
variable "hosted_zone_id" {}
This variable will represent the id of the Route 53 hosted zone you want to use.
We then need to create the certificate and validate it using DNS. Terraform also allows you to wait for the validation to complete.
Open up elb/main.tf and add the following.
data "aws_route53_zone" "selected" {
zone_id = var.hosted_zone_id
}
resource "aws_acm_certificate" "elb_cert" {
domain_name = data.aws_route53_zone.selected.name
validation_method = "DNS"
tags = {
Project = "dev-to"
Billing = "dev-to"
}
}
resource "aws_route53_record" "cert_validation" {
for_each = {
for dvo in aws_acm_certificate.elb_cert.domain_validation_options : dvo.domain_name => {
name = dvo.resource_record_name
record = dvo.resource_record_value
type = dvo.resource_record_type
}
}
allow_overwrite = true
name = each.value.name
records = [each.value.record]
ttl = 60
type = each.value.type
zone_id = data.aws_route53_zone.selected.zone_id
}
resource "aws_acm_certificate_validation" "elb_cert" {
certificate_arn = aws_acm_certificate.elb_cert.arn
validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}
Make sure you put the correct domain_name attribute for the certificate. It needs to cover the domain name you intend to use.
Load balancer and listeners
There are a few more variables we need to add in elb/variables.tf.
variable "load_balancer_sg" {}
variable "load_balancer_subnet_a" {}
variable "load_balancer_subnet_b" {}
variable "load_balancer_subnet_c" {}
variable "vpc" {}
We can then create the application load balancer.
Open up elb/main.tf and add the following.
resource "aws_lb" "elb" {
name = "dev-to"
internal = false
load_balancer_type = "application"
security_groups = [
var.load_balancer_sg.id]
subnets = [
var.load_balancer_subnet_a.id,
var.load_balancer_subnet_b.id,
var.load_balancer_subnet_c.id]
tags = {
Name = "dev-to"
Project = "dev-to"
Billing = "dev-to"
}
}
We now need to create the load balancer listeners. This section will depend on whether you are using a Route 53 hosted zone or not.
With Hosted Zone
resource "aws_lb_target_group" "ecs" {
name = "ecs"
port = 80
protocol = "HTTP"
vpc_id = var.vpc.id
target_type = "ip"
health_check {
enabled = true
interval = 300
path = "/"
timeout = 60
matcher = "200"
healthy_threshold = 5
unhealthy_threshold = 5
}
tags = {
Name = "dev-to"
Project = "dev-to"
Billing = "dev-to"
}
}
resource "aws_lb_listener" "https" {
load_balancer_arn = aws_lb.elb.arn
port = "443"
protocol = "HTTPS"
ssl_policy = "ELBSecurityPolicy-2016-08"
certificate_arn = aws_acm_certificate_validation.elb_cert.certificate_arn
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.ecs.arn
}
}
resource "aws_lb_listener" "http" {
load_balancer_arn = aws_lb.elb.arn
port = "80"
protocol = "HTTP"
default_action {
type = "redirect"
redirect {
port = "443"
protocol = "HTTPS"
status_code = "HTTP_301"
}
}
}
Make sure the port in the target group resource matches your application port.
Without Hosted Zone
resource "aws_lb_target_group" "ecs" {
name = "ecs"
port = 80
protocol = "HTTP"
vpc_id = var.vpc.id
target_type = "ip"
health_check {
enabled = true
interval = 300
path = "/"
timeout = 60
matcher = "200"
healthy_threshold = 5
unhealthy_threshold = 5
}
tags = {
Name = "dev-to"
Project = "dev-to"
Billing = "dev-to"
}
}
resource "aws_lb_listener" "http" {
load_balancer_arn = aws_lb.elb.arn
port = "80"
protocol = "HTTP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.ecs.arn
}
}
Make sure the port in the target group resource matches your application port.
Both
We then need to add the load balancer and target group resources as outputs. Open the elb/outputs.tf file and add the following.
output "elb" {
value = aws_lb.elb
}
output "ecs_target_group" {
value = aws_lb_target_group.ecs
}
These resources can now be used in other modules. This will be done later on when we bring it all together.
IAM
We need to create an IAM role for the ECS service and task to assume.
Create an iam directory under the modules directory so your new directory structure is like the following.
├── terraform
│   ├── modules
│   │   ├── vpc
│   │   │   ├── main.tf
│   │   │   ├── variables.tf
│   │   │   ├── outputs.tf
│   │   ├── elb
│   │   │   ├── main.tf
│   │   │   ├── variables.tf
│   │   │   ├── outputs.tf
│   │   ├── iam
│   │   │   ├── main.tf
│   │   │   ├── variables.tf
│   │   │   ├── outputs.tf
│   ├── main.tf
│   ├── variables.tf
│   ├── backend.tf
In the iam/variables.tf file add the following.
variable "elb" {}
Then open the iam/main.tf file and add:
resource "aws_iam_role" "ecs_service" {
name = "ecs-service"
assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Action": "sts:AssumeRole",
"Principal": {
"Service": "ecs-tasks.amazonaws.com"
},
"Effect": "Allow"
}
]
}
EOF
}
data "aws_iam_policy_document" "ecs_service_elb" {
statement {
effect = "Allow"
actions = [
"ec2:Describe*"
]
resources = [
"*"
]
}
statement {
effect = "Allow"
actions = [
"elasticloadbalancing:DeregisterInstancesFromLoadBalancer",
"elasticloadbalancing:DeregisterTargets",
"elasticloadbalancing:Describe*",
"elasticloadbalancing:RegisterInstancesWithLoadBalancer",
"elasticloadbalancing:RegisterTargets"
]
resources = [
var.elb.arn
]
}
}
data "aws_iam_policy_document" "ecs_service_standard" {
statement {
effect = "Allow"
actions = [
"ec2:DescribeTags",
"ecs:DeregisterContainerInstance",
"ecs:DiscoverPollEndpoint",
"ecs:Poll",
"ecs:RegisterContainerInstance",
"ecs:StartTelemetrySession",
"ecs:UpdateContainerInstancesState",
"ecs:Submit*",
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
]
resources = [
"*"
]
}
}
data "aws_iam_policy_document" "ecs_service_scaling" {
statement {
effect = "Allow"
actions = [
"application-autoscaling:*",
"ecs:DescribeServices",
"ecs:UpdateService",
"cloudwatch:DescribeAlarms",
"cloudwatch:PutMetricAlarm",
"cloudwatch:DeleteAlarms",
"cloudwatch:DescribeAlarmHistory",
"cloudwatch:DescribeAlarms",
"cloudwatch:DescribeAlarmsForMetric",
"cloudwatch:GetMetricStatistics",
"cloudwatch:ListMetrics",
"cloudwatch:PutMetricAlarm",
"cloudwatch:DisableAlarmActions",
"cloudwatch:EnableAlarmActions",
"iam:CreateServiceLinkedRole",
"sns:CreateTopic",
"sns:Subscribe",
"sns:Get*",
"sns:List*"
]
resources = [
"*"
]
}
}
resource "aws_iam_policy" "ecs_service_elb" {
name = "dev-to-elb"
path = "/"
description = "Allow access to the service elb"
policy = data.aws_iam_policy_document.ecs_service_elb.json
}
resource "aws_iam_policy" "ecs_service_standard" {
name = "dev-to-standard"
path = "/"
description = "Allow standard ecs actions"
policy = data.aws_iam_policy_document.ecs_service_standard.json
}
resource "aws_iam_policy" "ecs_service_scaling" {
name = "dev-to-scaling"
path = "/"
description = "Allow ecs service scaling"
policy = data.aws_iam_policy_document.ecs_service_scaling.json
}
resource "aws_iam_role_policy_attachment" "ecs_service_elb" {
role = aws_iam_role.ecs_service.name
policy_arn = aws_iam_policy.ecs_service_elb.arn
}
resource "aws_iam_role_policy_attachment" "ecs_service_standard" {
role = aws_iam_role.ecs_service.name
policy_arn = aws_iam_policy.ecs_service_standard.arn
}
resource "aws_iam_role_policy_attachment" "ecs_service_scaling" {
role = aws_iam_role.ecs_service.name
policy_arn = aws_iam_policy.ecs_service_scaling.arn
}
We can then output the role by adding the following to iam/outputs.tf.
output "ecs_role" {
value = aws_iam_role.ecs_service
}
ECS
We can now create the ECS cluster, service, and task.
Add an ecs directory under the modules directory, and create the main.tf, variables.tf and outputs.tf files under the ecs directory.
Add the following variables in ecs/variables.tf.
variable "ecs_target_group" {}
variable "ecs_subnet_a" {}
variable "ecs_subnet_b" {}
variable "ecs_subnet_c" {}
variable "ecs_sg" {}
variable "ecs_role" {}
You can then add the following to ecs/main.tf.
resource "aws_ecs_cluster" "dev_to" {
name = "dev-to"
capacity_providers = [
"FARGATE"]
setting {
name = "containerInsights"
value = "enabled"
}
tags = {
Name = "dev-to"
Project = "dev-to"
Billing = "dev-to"
}
}
resource "aws_ecs_task_definition" "dev_to" {
family = "dev-to"
container_definitions = <<TASK_DEFINITION
[
{
"portMappings": [
{
"hostPort": 80,
"protocol": "tcp",
"containerPort": 80
}
],
"cpu": 512,
"environment": [
{
"name": "AUTHOR",
"value": "Kieran"
}
],
"memory": 1024,
"image": "dockersamples/static-site",
"essential": true,
"name": "site"
}
]
TASK_DEFINITION
network_mode = "awsvpc"
requires_compatibilities = [
"FARGATE"]
memory = "1024"
cpu = "512"
execution_role_arn = var.ecs_role.arn
task_role_arn = var.ecs_role.arn
tags = {
Name = "dev-to"
Project = "dev-to"
Billing = "dev-to"
}
}
resource "aws_ecs_service" "dev_to" {
name = "dev-to"
cluster = aws_ecs_cluster.dev_to.id
task_definition = aws_ecs_task_definition.dev_to.arn
desired_count = 1
launch_type = "FARGATE"
platform_version = "1.4.0"
lifecycle {
ignore_changes = [
desired_count]
}
network_configuration {
subnets = [
var.ecs_subnet_a.id,
var.ecs_subnet_b.id,
var.ecs_subnet_c.id]
security_groups = [
var.ecs_sg.id]
assign_public_ip = true
}
load_balancer {
target_group_arn = var.ecs_target_group.arn
container_name = "site"
container_port = 80
}
}
This example doesn't have large memory or CPU requirements as it is only a static site. Make sure you adjust the cpu and memory attributes if you are using your own image.
If you are using a different image to the one used in this example, make sure you change any references to port 80 so they are correct for your application.
Adding the lifecycle block to ignore changes to desired_count is important, as the desired_count attribute will be handled by the auto-scaling policies.
We need to output the cluster and service resources for use in other modules. Add the following to ecs/outputs.tf.
output "ecs_cluster" {
value = aws_ecs_cluster.dev_to
}
output "ecs_service" {
value = aws_ecs_service.dev_to
}
Auto-scaling
With everything we have created so far, we have a working ECS cluster running a static site. The problem is that it wouldn't scale: if there was a large increase in the number of requests, the application could fail.
In this section, I will explain how to apply auto-scaling to your ECS service. There are three types of auto-scaling that can be applied to an ECS service:
- Target Tracking Scaling Policies
- Step Scaling Policies
- Scheduled Scaling
This post will demonstrate the target tracking scaling type. You can read more about all three types here.
Create an auto-scaling directory under the modules directory and create the main.tf, variables.tf and outputs.tf files in this directory.
Add the following to auto-scaling/variables.tf.
variable "ecs_cluster" {}
variable "ecs_service" {}
After adding the above variables, open auto-scaling/main.tf and add the following.
resource "aws_appautoscaling_target" "dev_to_target" {
max_capacity = 5
min_capacity = 1
resource_id = "service/${var.ecs_cluster.name}/${var.ecs_service.name}"
scalable_dimension = "ecs:service:DesiredCount"
service_namespace = "ecs"
}
resource "aws_appautoscaling_policy" "dev_to_memory" {
name = "dev-to-memory"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.dev_to_target.resource_id
scalable_dimension = aws_appautoscaling_target.dev_to_target.scalable_dimension
service_namespace = aws_appautoscaling_target.dev_to_target.service_namespace
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "ECSServiceAverageMemoryUtilization"
}
target_value = 80
}
}
resource "aws_appautoscaling_policy" "dev_to_cpu" {
name = "dev-to-cpu"
policy_type = "TargetTrackingScaling"
resource_id = aws_appautoscaling_target.dev_to_target.resource_id
scalable_dimension = aws_appautoscaling_target.dev_to_target.scalable_dimension
service_namespace = aws_appautoscaling_target.dev_to_target.service_namespace
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "ECSServiceAverageCPUUtilization"
}
target_value = 60
}
}
ECS provides the two metrics ECSServiceAverageCPUUtilization and ECSServiceAverageMemoryUtilization, which you can use for the auto-scaling policies.
You can view the Terraform docs for more information about the aws_appautoscaling_policy resource, including how you can utilise other metrics.
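As an illustration of one of those other metrics, the sketch below scales on request count per target instead of CPU or memory. It is only a rough example and assumes you also pass the load balancer and target group outputs into this module as elb and ecs_target_group variables, which the post doesn't do by default.

resource "aws_appautoscaling_policy" "dev_to_requests" {
  name               = "dev-to-requests"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.dev_to_target.resource_id
  scalable_dimension = aws_appautoscaling_target.dev_to_target.scalable_dimension
  service_namespace  = aws_appautoscaling_target.dev_to_target.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      # Ties the metric to our load balancer / target group pair.
      resource_label = "${var.elb.arn_suffix}/${var.ecs_target_group.arn_suffix}"
    }
    # Target number of requests per task before scaling out.
    target_value = 1000
  }
}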
You can obviously play around with the values used above, such as max_capacity, min_capacity, and target_value.
Creating these resources will make Terraform create four CloudWatch alarms, two for each auto-scaling policy: one for scaling up, and one for scaling down again.
Route 53
Do not do this section if you do not have a hosted zone available.
Lastly, we need to create a Route 53 record to point our domain at the load balancer we created.
Create a route53 directory under the modules directory and create the main.tf, variables.tf and outputs.tf files in this directory.
Add the following to route53/variables.tf.
variable "elb" {}
variable "hosted_zone_id" {}
After adding these variables, add the following to route53/main.tf.
data "aws_route53_zone" "selected" {
zone_id = var.hosted_zone_id
}
resource "aws_route53_record" "dev_to" {
zone_id = data.aws_route53_zone.selected.zone_id
name = data.aws_route53_zone.selected.name
type = "A"
alias {
name = var.elb.dns_name
zone_id = var.elb.zone_id
evaluate_target_health = true
}
}
Bringing it all together
If you don't have a hosted zone, remove any reference to hosted_zone_id below.
Now we have all the separate modules configured, we can add them all to the terraform/main.tf file. This is the entry point when you create a Terraform plan for your infrastructure.
Before doing this, we need to define the global variables that need to be passed in as part of the plan. Add the following to terraform/variables.tf.
variable "region" {
default = "eu-west-1"
type = string
description = "The region you want to deploy the infrastructure in"
}
variable "hosted_zone_id" {
type = string
description = "The id of the hosted zone of the Route 53 domain you want to use"
}
After adding the variables, add the following to terraform/main.tf.
provider "aws" {
version = "~> 3.0"
region = var.region
}
module "vpc" {
source = "./modules/vpc"
}
module "elb" {
source = "./modules/elb"
hosted_zone_id = var.hosted_zone_id
load_balancer_sg = module.vpc.load_balancer_sg
load_balancer_subnet_a = module.vpc.load_balancer_subnet_a
load_balancer_subnet_b = module.vpc.load_balancer_subnet_b
load_balancer_subnet_c = module.vpc.load_balancer_subnet_c
vpc = module.vpc.vpc
}
module "iam" {
source = "./modules/iam"
elb = module.elb.elb
}
module "ecs" {
source = "./modules/ecs"
ecs_role = module.iam.ecs_role
ecs_sg = module.vpc.ecs_sg
ecs_subnet_a = module.vpc.ecs_subnet_a
ecs_subnet_b = module.vpc.ecs_subnet_b
ecs_subnet_c = module.vpc.ecs_subnet_c
ecs_target_group = module.elb.ecs_target_group
}
module "auto_scaling" {
source = "./modules/auto-scaling"
ecs_cluster = module.ecs.ecs_cluster
ecs_service = module.ecs.ecs_service
}
module "route53" {
source = "./modules/route53"
elb = module.elb.elb
hosted_zone_id = var.hosted_zone_id
}
You can see how you can reference the outputs of one module to be used as a variable in another module. Terraform will automatically resolve dependencies when a resource is referenced by a module.
Building the infrastructure
Make sure you have credentials configured for your AWS account using one of the methods supported by Terraform. You can find how to configure credentials here.
Run the following commands from the terraform directory to build the infrastructure:
terraform init
With Hosted Zone
terraform plan -out=plan -var="hosted_zone_id=<your-hosted-zone-id>" -var="region=eu-west-1"
Make sure you replace <your-hosted-zone-id> with your hosted zone id in the command above.
Without Hosted Zone
terraform plan -out=plan -var="region=eu-west-1"
Both
terraform apply plan
When this command has executed, everything should be up and running! To check it is working, you can either visit the domain associated with your hosted zone or go to the DNS name associated with your load balancer.
To destroy everything that has been created, you can run:
terraform plan -out=plan -destroy -var="hosted_zone_id=<your-hosted-zone-id>" -var="region=eu-west-1"
terraform apply plan
Adding the -destroy parameter tells Terraform to create a plan to tear down the infrastructure.
Improvements
There are a few improvements we can make to the infrastructure. I will explain what they are but I will leave you to work out how to do it using Terraform.
Add logging
If an error occurs inside your ECS task, you currently wouldn't be able to see what has gone wrong.
You can create a Cloudwatch log group and tell ECS to use this log group to store the logs from the docker containers.
When you have created the log group, you can add the following to your task definition to make ECS use the new log group.
"logConfiguration": {
"logDriver": "awslogs",
"secretOptions": null,
"options": {
"awslogs-group": "${var.ecs_dashboard_log_group.name}",
"awslogs-region": "eu-west-1",
"awslogs-stream-prefix": "ecs"
}
}
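The log group itself can also be managed with Terraform. Below is a minimal sketch, assuming you create it in one of your modules and pass it into the ECS module as a variable, so that the var.ecs_dashboard_log_group reference in the snippet above points at this resource. The logs:* actions already granted in the IAM module are enough for the tasks to write to it.

resource "aws_cloudwatch_log_group" "ecs_dashboard" {
  name              = "/ecs/dev-to"
  retention_in_days = 30 # keep a month of logs; adjust to suit

  tags = {
    Name    = "dev-to"
    Project = "dev-to"
    Billing = "dev-to"
  }
}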
Use a NAT Gateway
Our ECS subnets currently have a route to the internet using the internet gateway we created. The downside of this is internet gateways also allow inbound traffic. This means we have to make sure our security groups and NACL rules are correct or we risk exposing our application more than we want.
A more secure way to give your application access to the internet is to use a NAT Gateway. The benefit of a NAT Gateway is they only allow outbound access. In our case outbound access is all we need to get access to DockerHub.
The reason I didn't demonstrate using a NAT Gateway is that they are expensive, especially when you want a highly available setup. In our infrastructure, we would need three NAT Gateways, one for each availability zone.
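If you do want to try it, a rough single-AZ sketch might look like the following. For a highly available setup you would repeat the NAT Gateway and private route table per availability zone, remove map_public_ip_on_launch and the public route table associations from the ECS subnets, and associate those subnets with the private route tables instead.

# An Elastic IP for the NAT Gateway to use.
resource "aws_eip" "nat_a" {
  vpc = true
}

# The NAT Gateway must live in a public subnet (one routed to the internet gateway).
resource "aws_nat_gateway" "nat_a" {
  allocation_id = aws_eip.nat_a.id
  subnet_id     = aws_subnet.elb_a.id

  tags = {
    Name    = "dev-to"
    Project = "dev-to"
  }
}

# Private route table for the ECS subnets: outbound traffic leaves via the NAT Gateway.
resource "aws_route_table" "private_a" {
  vpc_id = aws_vpc.vpc.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat_a.id
  }
}

resource "aws_route_table_association" "ecs_a_private" {
  subnet_id      = aws_subnet.ecs_a.id
  route_table_id = aws_route_table.private_a.id
}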
ECR
The reason we need access to the internet is so we can pull the docker image from DockerHub. We can eliminate that requirement by storing our images in ECR. ECR is AWS's own container registry.
The benefit of using ECR is you can use a VPC Endpoint. This essentially gives you access to the ECR service without access to the internet, so you can remove the internet gateway from the ECS subnets.
A drawback to using ECR appears when you don't control the source of the image. If you are using a third-party image, like in this example, you become responsible for keeping your copy in ECR up to date.
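As a rough sketch of what that could look like in the VPC module, assuming the same eu-west-1 region: ECR pulls need the ecr.api and ecr.dkr interface endpoints plus an S3 gateway endpoint for the image layers (and a CloudWatch Logs interface endpoint if you use the awslogs driver). The interface endpoints also need a security group that allows HTTPS from the tasks.

resource "aws_ecr_repository" "static_site" {
  name = "dev-to/static-site"
}

# Security group for the interface endpoints: allow HTTPS from the ECS tasks.
resource "aws_security_group" "vpc_endpoints" {
  vpc_id = aws_vpc.vpc.id
}

resource "aws_security_group_rule" "vpc_endpoints_https" {
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.vpc_endpoints.id
  source_security_group_id = aws_security_group.ecs_task.id
}

resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.vpc.id
  service_name        = "com.amazonaws.eu-west-1.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = [aws_subnet.ecs_a.id, aws_subnet.ecs_b.id, aws_subnet.ecs_c.id]
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}

resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.vpc.id
  service_name        = "com.amazonaws.eu-west-1.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = [aws_subnet.ecs_a.id, aws_subnet.ecs_b.id, aws_subnet.ecs_c.id]
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}

# Image layers are fetched from S3, so a gateway endpoint is needed as well.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.vpc.id
  service_name      = "com.amazonaws.eu-west-1.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.route_table.id]
}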
I hope this has been useful. If you have any issues or feedback, drop a comment below!
Top comments (15)
Hey, pretty good write-up. I have only a couple of suggestions:
Thanks for the feedback!
I tend to add a "Billing" tag to the resources so then they can be grouped in Cost Explorer. I will add that in!
I always seem to forget about variable descriptions and types!
I did mention the cost aspect of NAT gateways but I will make sure it's clearer.
I was so hesitant about putting in NACLs! I think everyone has been burned by NACLs at some point 😁 I will take them out to avoid confusion.
Again thanks for the feedback! I'm new to blogging and trying to make the call on what can be too confusing for people when reading. It's great to have someone else's opinion.
Maybe put a note at the top of the NACLs section just to warn people "You can do this if you want, but the defaults are ok if you're a beginner"?
Yep, totally
Kieran,
I missed reading up about the NAT gateways and it's quite clear, hence I edited out my post. Your first post is quite impressive and I look forward to reading more. Cheers
Thank you!
Hi, thank you for great post!
I've found a lot of posts related to Terraform, ECS and Fargate, but this one is the best!
Thank you. That means a lot!
There are always low alarms for CPU and memory in CloudWatch. Is there any way to prevent the low alarm when the instance number is 1? Thanks.
Hi! Sorry for the late reply. There isn't a way to prevent that, but there is a tick box option in the CloudWatch alarms panel to hide auto-scaling alarms. I hope this helps.
So after a couple of days searching how to exactly do this process, I finally found the gold mine. Thanks a lot, Kieran!! You nailed it.
Thank you for reading! I'm glad it was useful
It's too nice.. I have seen many blogs and this is too neat.. We are trying to migrate from EC2-based ECS to Fargate... it took me 2 hours to migrate my entire code to go with Fargate
I'm a little confused as to what you mean by too neat. This is the process I used at the time to create an auto-scaling ECS Fargate service, nothing more, nothing less. If this is too difficult to do in your codebase I would consider refactoring it down to make it easier to manage, then work from there
Seems to have an error on the first line of the load balancer and listeners explanation: shouldn't it be elb/variables.tf?