
Deploying Hasura on AWS with Fargate, RDS and Terraform

Gordon Johnston

Hasura is an awesome GraphQL gateway for Postgres. You can get going really simply on Heroku, but if you're looking to deploy onto AWS with a fully automated deploy, this post will guide you through one possible method.

When deploying in AWS it is strongly recommended to deploy across multiple Availability Zones (AZs). This ensures that if one AZ fails your service should only suffer a brief interruption, rather than being down until the AZ is restored.

The components used in this deployment are:

  • Postgres RDS database deployed in 'Multi-AZ'
  • Hasura deployed in Fargate across multiple AZs
  • An ALB load balancing between the Hasura tasks
  • A certificate issued by ACM for securing traffic to the ALB
  • Logging for RDS, ECS and ALB into CloudWatch Logs

This is the architecture we will build:

[Diagram: ECS Fargate, Hasura and RDS architecture]

You could use CloudFormation to build this, but I selected Terraform for various reasons, not least the ability to do terraform plan.

By the way, if you're just getting started with Fargate then begin by experimenting in the web admin console; it takes care of a lot of the complexity below, such as creating service roles, IAM permissions, log groups, etc. When you want to automate things, come back and dive into the detail below.

Before you can configure ECS resources in an AWS account, the account must have the AWSServiceRoleForECS IAM role created. If you have manually created a cluster in the web console then this role will have been created for you. You can import it into your Terraform configuration if you want to manage it with Terraform, with something like: terraform import aws_iam_service_linked_role.ecs_service arn:aws:iam::<account-id>:role/aws-service-role/ecs.amazonaws.com/AWSServiceRoleForECS

It's important to note that the AWSServiceRoleForECS can only exist once per account (it does not support service role suffixes), so if you are deploying multiple Hasura stacks in one AWS account then the Terraform for the service role will need to live independently from the main stack.

Create the role like this:

# Service role allowing AWS to manage resources required for ECS
resource "aws_iam_service_linked_role" "ecs_service" {
  aws_service_name = "ecs.amazonaws.com"
}

Before diving into the infrastructure components, some variables are required:

# Which region to deploy to
variable "region" { }
# Which domain to use. Service will be deployed at hasura.domain
variable "domain" { }
# The access key to secure hasura with. For admin access
variable "hasura_access_key" { }
# The secret shared HMAC key for JWT authentication
variable "hasura_jwt_hmac_key" { }
# User name for RDS
variable "rds_username" { }
# Password for RDS
variable "rds_password" { }
# The DB name in the RDS instance. Note that this cannot contain hyphens
variable "rds_db_name" { }
# The size of RDS instance, eg db.t2.micro
variable "rds_instance" { }
# How many AZ's to create in the VPC
variable "az_count" { default = 2 }
# Whether to deploy RDS and ECS in multi AZ mode or not
variable "multi_az" { default = true }
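For reference, a terraform.tfvars supplying these variables might look like the following. All of the values here are illustrative placeholders of my own, not values from a real deployment; substitute your own secrets and sizes.

```hcl
# terraform.tfvars -- example values only
region              = "us-east-1"
domain              = "example.com"
hasura_access_key   = "a-long-random-admin-secret"
hasura_jwt_hmac_key = "a-random-key-of-at-least-32-characters"
rds_username        = "hasura"
rds_password        = "another-long-random-secret"
rds_db_name         = "hasura"
rds_instance        = "db.t2.micro"
```

The post doesn't show a provider block, so you will also need a minimal one along these lines:

```hcl
# provider.tf
provider "aws" {
  region = "${var.region}"
}
```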

Next we will create a certificate for the ALB. If you are going to be regularly deleting and recreating your stack, say for a dev environment, then it is a good idea to create the certificate in a separate Terraform stack so that it is not destroyed and recreated each time. New AWS accounts have a default limit of 20 certificates per year, so it's easy to exhaust this accidentally. The limit can be increased on request but, in my experience, that takes a day or two to go through.

If you're using Route 53 you can have your ACM certificate validated automatically; this is the easiest way to get a fully automated workflow. Alternatively, if Terraform has support for your DNS provider, you can have it add the DNS record there.

Create the certificate:

resource "aws_acm_certificate" "hasura" {
  domain_name       = "hasura.${var.domain}"
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

Validate the certificate:

data "aws_route53_zone" "hasura" {
  name = "${var.domain}."
}

resource "aws_route53_record" "hasura_validation" {
  depends_on = ["aws_acm_certificate.hasura"]
  name       = "${lookup(aws_acm_certificate.hasura.domain_validation_options[0], "resource_record_name")}"
  type       = "${lookup(aws_acm_certificate.hasura.domain_validation_options[0], "resource_record_type")}"
  zone_id    = "${data.aws_route53_zone.hasura.zone_id}"
  records    = ["${lookup(aws_acm_certificate.hasura.domain_validation_options[0], "resource_record_value")}"]
  ttl        = 300
}

resource "aws_acm_certificate_validation" "hasura" {
  certificate_arn         = "${aws_acm_certificate.hasura.arn}"
  validation_record_fqdns = ["${aws_route53_record.hasura_validation.*.fqdn}"]
}

OK, now we can crack on with the body of the infrastructure.

First we need a VPC to put this infrastructure in. We will create private subnets for RDS and public subnets for ECS, one of each per AZ. The ECS tasks are placed in a public subnet so they can fetch the Hasura image from Docker Hub. If you place them in a private subnet you will need to add a NAT gateway to enable them to pull their images.


### VPC

# Fetch AZs in the current region
data "aws_availability_zones" "available" {}

resource "aws_vpc" "hasura" {
  cidr_block = "172.17.0.0/16"
}

# Create var.az_count private subnets for RDS, each in a different AZ
resource "aws_subnet" "hasura_rds" {
  count             = "${var.az_count}"
  cidr_block        = "${cidrsubnet(aws_vpc.hasura.cidr_block, 8, count.index)}"
  availability_zone = "${data.aws_availability_zones.available.names[count.index]}"
  vpc_id            = "${aws_vpc.hasura.id}"
}

# Create var.az_count public subnets for Hasura, each in a different AZ
resource "aws_subnet" "hasura_ecs" {
  count                   = "${var.az_count}"
  cidr_block              = "${cidrsubnet(aws_vpc.hasura.cidr_block, 8, var.az_count + count.index)}"
  availability_zone       = "${data.aws_availability_zones.available.names[count.index]}"
  vpc_id                  = "${aws_vpc.hasura.id}"
  map_public_ip_on_launch = true
}

# IGW for the public subnet
resource "aws_internet_gateway" "hasura" {
  vpc_id = "${aws_vpc.hasura.id}"
}

# Route the public subnet traffic through the IGW
resource "aws_route" "internet_access" {
  route_table_id         = "${aws_vpc.hasura.main_route_table_id}"
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = "${aws_internet_gateway.hasura.id}"
}


Now create some security groups so the ALB can talk to ECS and the ECS tasks can talk to RDS:

# Security Groups

# Internet to ALB
resource "aws_security_group" "hasura_alb" {
  name        = "hasura-alb"
  description = "Allow access on port 443 only to ALB"
  vpc_id      = "${aws_vpc.hasura.id}"

  ingress {
    protocol    = "tcp"
    from_port   = 443
    to_port     = 443
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port = 0
    to_port   = 0
    protocol  = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
# ALB to ECS

resource "aws_security_group" "hasura_ecs" {
  name        = "hasura-tasks"
  description = "allow inbound access from the ALB only"
  vpc_id      = "${aws_vpc.hasura.id}"

  ingress {
    protocol        = "tcp"
    from_port       = "8080"
    to_port         = "8080"
    security_groups = ["${aws_security_group.hasura_alb.id}"]
  }

  egress {
    protocol    = "-1"
    from_port   = 0
    to_port     = 0
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# ECS to RDS
resource "aws_security_group" "hasura_rds" {
  name        = "hasura-rds"
  description = "allow inbound access from the hasura tasks only"
  vpc_id      = "${aws_vpc.hasura.id}"

  ingress {
    protocol        = "tcp"
    from_port       = "5432"
    to_port         = "5432"
    security_groups = ["${aws_security_group.hasura_ecs.id}"]
  }

  egress {
    protocol    = "-1"
    from_port   = 0
    to_port     = 0
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Now we can create our RDS instance. It needs a 'subnet group' to place the instance in; we will use the hasura_rds subnets created above.

resource "aws_db_subnet_group" "hasura" {
  name       = "hasura"
  subnet_ids = ["${aws_subnet.hasura_rds.*.id}"]
}

Then create the RDS instance itself:

resource "aws_db_instance" "hasura" {
  name                        = "${var.rds_db_name}"
  identifier                  = "hasura"
  username                    = "${var.rds_username}"
  password                    = "${var.rds_password}"
  port                        = "5432"
  engine                      = "postgres"
  engine_version              = "10.5"
  instance_class              = "${var.rds_instance}"
  allocated_storage           = "10"
  storage_encrypted           = false
  vpc_security_group_ids      = ["${aws_security_group.hasura_rds.id}"]
  db_subnet_group_name        = "${aws_db_subnet_group.hasura.name}"
  parameter_group_name        = "default.postgres10"
  multi_az                    = "${var.multi_az}"
  storage_type                = "gp2"
  publicly_accessible         = false
  # snapshot_identifier       = "hasura"
  allow_major_version_upgrade = false
  auto_minor_version_upgrade  = false
  apply_immediately           = true
  maintenance_window          = "sun:02:00-sun:04:00"
  skip_final_snapshot         = false
  copy_tags_to_snapshot       = true
  backup_retention_period     = 7
  backup_window               = "04:00-06:00"
  final_snapshot_identifier   = "hasura"
}

In the configuration above a new RDS instance called hasura will be built. It is possible to have Terraform restore the RDS instance from an existing snapshot by uncommenting the # snapshot_identifier line. However, I would suggest reading this issue before creating instances from snapshots. In short, if you create an instance from a snapshot you must always include the snapshot_identifier in future runs of the template, or Terraform will delete the instance and recreate it as new.

Onwards to ECS / Fargate...

Create the ECS cluster:

resource "aws_ecs_cluster" "hasura" {
  name = "hasura-cluster"
}

Before we create the Hasura service, let's create somewhere for it to log to:

resource "aws_cloudwatch_log_group" "hasura" {
  name = "/ecs/hasura"
}

Creating the log group is simple; allowing the ECS tasks to log to it is, like most things IAM, a little more complex!

data "aws_iam_policy_document" "hasura_log_publishing" {
  statement {
    actions = [
      "logs:CreateLogStream",
      "logs:PutLogEvents",
      "logs:PutLogEventsBatch",
    ]
    resources = ["arn:aws:logs:${var.region}:*:log-group:/ecs/hasura:*"]
  }
}

resource "aws_iam_policy" "hasura_log_publishing" {
  name        = "hasura-log-pub"
  path        = "/"
  description = "Allow publishing to CloudWatch"

  policy = "${data.aws_iam_policy_document.hasura_log_publishing.json}"
}

data "aws_iam_policy_document" "hasura_assume_role_policy" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "hasura_role" {
  name               = "hasura-role"
  path               = "/system/"
  assume_role_policy = "${data.aws_iam_policy_document.hasura_assume_role_policy.json}"
}


resource "aws_iam_role_policy_attachment" "hasura_role_log_publishing" {
  role = "${aws_iam_role.hasura_role.name}"
  policy_arn = "${aws_iam_policy.hasura_log_publishing.arn}"
}

Then create a task definition. This is where you size your instance, and also where you configure the environment properties that are passed to the Docker container. Here we are configuring the instance for JWT authentication.

Update the image definition to whichever version you want to run. You will need to update the CORS setting for your application name, or remove it entirely.


resource "aws_ecs_task_definition" "hasura" {
  family                   = "hasura"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "256"
  memory                   = "512"
  execution_role_arn       = "${aws_iam_role.hasura_role.arn}"

  container_definitions = <<DEFINITION
    [
      {
        "image": "hasura/graphql-engine:v1.0.0-alpha34",
        "name": "hasura",
        "networkMode": "awsvpc",
        "portMappings": [
          {
            "containerPort": 8080,
            "hostPort": 8080
          }
        ],
        "logConfiguration": {
          "logDriver": "awslogs",
          "options": {
            "awslogs-group": "/ecs/hasura",
            "awslogs-region": "${var.region}",
            "awslogs-stream-prefix": "ecs"
          }
        },
        "environment": [
          {
            "name": "HASURA_GRAPHQL_ACCESS_KEY",
            "value": "${var.hasura_access_key}"
          },
          {
            "name": "HASURA_GRAPHQL_DATABASE_URL",
            "value": "postgres://${var.rds_username}:${var.rds_password}@${aws_db_instance.hasura.endpoint}/${var.rds_db_name}"
          },
          {
            "name": "HASURA_GRAPHQL_ENABLE_CONSOLE",
            "value": "true"
          },
          {
            "name": "HASURA_GRAPHQL_CORS_DOMAIN",
            "value": "https://app.${var.domain}:443"
          },

          {
            "name": "HASURA_GRAPHQL_PG_CONNECTIONS",
            "value": "100"
          },
          {
            "name": "HASURA_GRAPHQL_JWT_SECRET",
            "value": "{\"type\":\"HS256\", \"key\": \"${var.hasura_jwt_hmac_key}\"}"
          }
        ]
      }
    ]
DEFINITION

}

Now create the ECS service. If you have set the multi_az variable to true it will start 2 tasks, and ECS will automatically distribute them evenly over the subnets configured in the service, i.e. both AZs.

resource "aws_ecs_service" "hasura" {
  # Note: depends_on may only appear once per resource, so the listener
  # dependency is merged in here
  depends_on = [
    "aws_ecs_task_definition.hasura",
    "aws_cloudwatch_log_group.hasura",
    "aws_alb_listener.hasura",
  ]

  name            = "hasura-service"
  cluster         = "${aws_ecs_cluster.hasura.id}"
  task_definition = "${aws_ecs_task_definition.hasura.arn}"
  desired_count   = "${var.multi_az == true ? "2" : "1"}"
  launch_type     = "FARGATE"

  network_configuration {
    assign_public_ip = true
    security_groups  = ["${aws_security_group.hasura_ecs.id}"]
    subnets          = ["${aws_subnet.hasura_ecs.*.id}"]
  }

  load_balancer {
    target_group_arn = "${aws_alb_target_group.hasura.id}"
    container_name   = "hasura"
    container_port   = "8080"
  }
}

Now we have an ECS service and an RDS database; we just need some public access to them, which will be provided by an ALB.

First, create somewhere for the ALB to log (if you want logging), starting with an S3 bucket. You can add whatever lifecycle policy you want, and remember that bucket names are globally unique.

resource "aws_s3_bucket" "hasura" {
  bucket = "hasura-${var.region}"
  acl    = "private"
}

Add an IAM policy to allow the ALB to log to it. Remember to update the bucket name.

data "aws_elb_service_account" "main" {}

resource "aws_s3_bucket_policy" "hasura" {
  bucket = "${aws_s3_bucket.hasura.id}"

  policy = <<POLICY
{
  "Id": "hasuraALBWrite",
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "hasuraALBWrite",
      "Action": [
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::hasura-${var.region}/alb/*",
      "Principal": {
        "AWS": [
          "${data.aws_elb_service_account.main.arn}"
        ]
      }
    }
  ]
}
POLICY
}

If you have put your ACM certificate in a separate Terraform stack then you will need to import it.

data "aws_acm_certificate" "hasura" {
  domain      = "hasura.${var.domain}"
  types       = ["AMAZON_ISSUED"]
  most_recent = true
  statuses    = ["ISSUED"]
}

Create the ALB itself.

resource "aws_alb" "hasura" {
  name            = "hasura-alb"
  subnets         = ["${aws_subnet.hasura_ecs.*.id}"]
  security_groups = ["${aws_security_group.hasura_alb.id}"]

  access_logs {
    bucket = "${aws_s3_bucket.hasura.id}"
    prefix = "alb"
    enabled = true
  }
}

Then create the target group. ECS will register the tasks with this target group as they stop and start.

resource "aws_alb_target_group" "hasura" {
  name        = "hasura-alb"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = "${aws_vpc.hasura.id}"
  target_type = "ip"
  health_check {
    path = "/"
    matcher = "302"
  }
}

Then create the listener. Set the certificate_arn to "${data.aws_acm_certificate.hasura.arn}" if you have imported it.

resource "aws_alb_listener" "hasura" {
  load_balancer_arn = "${aws_alb.hasura.id}"
  port              = "443"
  protocol          = "HTTPS"
  certificate_arn   = "${aws_acm_certificate.hasura.arn}"

  default_action {
    target_group_arn = "${aws_alb_target_group.hasura.id}"
    type             = "forward"
  }
}

Finally, create a Route 53 record to point to your ALB:

resource "aws_route53_record" "hasura" {
  zone_id = "${data.aws_route53_zone.hasura.zone_id}"
  name    = "hasura.${var.domain}"
  type    = "A"

  alias {
    name                   = "${aws_alb.hasura.dns_name}"
    zone_id                = "${aws_alb.hasura.zone_id}"
    evaluate_target_health = true
  }
}
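If you find it useful, you could also add a couple of outputs; these are my addition, not part of the original walkthrough, so that terraform apply prints the service URL and database endpoint when it finishes:

```hcl
# Where Hasura will be reachable once DNS propagates
output "hasura_url" {
  value = "https://hasura.${var.domain}"
}

# Handy when connecting with psql for debugging
output "rds_endpoint" {
  value = "${aws_db_instance.hasura.endpoint}"
}
```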

That completes the Terraform config! You should be good to give it a go!

The stack should boot with an empty schema and a Hasura instance listening at https://hasura.domain.

Best of luck, feel free to hit me up with any comments or you can find me at @elgordino in the Hasura Discord.


@rayraegah took this post and turned it into a proper Terraform module. If you want to deploy this you should check it out here: https://github.com/Rayraegah/terraform-aws-hasura


Discussion

Hi. I've been using your "method" as a guide with Python, Flask and Postgres. I did not use Route 53, Hasura or GraphQL. However, I used a NAT gateway with an Elastic IP for each private subnet. Now I'm getting this error when the service starts; any idea? Can you help me with this? pastebin.com/Dx55pr8P

In your diagram it seems like there are 2 RDS & Hasura nodes, but the code only seems to describe 1 of each. Is this correct?