
Josh Simpson for AWS Community Builders

Originally published at man-yells-at.cloud

How I set up a Gatsby blog on AWS

Hi! In this post, I'm going to run through what I did to get my site https://man-yells-at.cloud onto the internet - the intent is for it to be a useful guide for anybody looking to do something similar, but as I'm writing this at the same time as I build everything, it'll likely include a bit of stream-of-consciousness on the process!

The best bit for you is that you get to skip the parts where I called AWS services foul things because I'd configured something wrong, and get right to the bits that work, with some explanation wherever something had me Googling or backtracking on a decision.

The Stack

Gatsby

I won't be focusing too much on this in this post, but to get this built, I followed the Gatsby tutorial for a bit, threw a strop when I realised how much Gatsby loves GraphQL, and then begrudgingly came back when I realised that building much else was going to be a fair bit more work than I wanted to do.

Once I'd finished the guide, I made a couple of modifications so that I could messily cram it all into a blog template theme built by Xiaoying Riley - you can find their templates here - which I'm hopefully going to be able to build and throw into AWS.

At a certain point, I realised that the routing wouldn't work without a plugin called gatsby-plugin-s3, which I set up with the following config:

gatsby-config

{
  resolve: `gatsby-plugin-s3`,
  options: {
    bucketName: "<BUCKET_NAME>",
    protocol: "https",
    hostname: "<DOMAIN_NAME>"
  }
}

AWS

Most of my interactions with AWS are through Terraform at the moment (and then a bit of ClickOps when I'm confused or trying to ascertain the state of a system).

Route53

Route53 is AWS' DNS service, which lets me manage where my domain points. By delegating my domain's name servers to it, I can then automate setting up new records via Terraform.

One caveat to doing this: the Route53 control plane currently lives in AWS' us-east-1 region, which has seen a couple of major outages in the last year. If you have critical applications in production and DNS changes are part of your disaster recovery plan, make sure to consider this whilst building!

S3

AWS' Simple Storage Service serves as a suitable service to situate my very serious site. Whilst primarily advertised as a 'file storage' service, S3 is one of my favourite AWS services because of its versatility. In this instance, we're going to use it as the place where we host our site, meaning we don't have to spin up and manage any pesky servers. I'm hoping this pleases the Serverless cult crowd.

Cloudfront

Cloudfront is a CDN that ties the whole project together by caching my site in 'edge locations' as close to users as possible. This helps by reducing latency for users, and by reducing the number of requests that have to fetch content from the S3 bucket itself, which is a massive cost saver.


The Guide

In broad steps, I'll be walking through:

  • Managing my externally purchased domain using AWS Route53
  • Hosting a Gatsby site in S3
  • Serving that site from Cloudfront

So let's get into it.

My setup

This is not prescriptive, but if you're following along then this is how I'm doing things:

  • Using Terraform to manage infrastructure, installed via tfenv
    • Using a .tfvars file that I feed into my terraform apply for values like the domain name that get repeated
  • Running everything in a new AWS Organizations sub account
  • The AWS CLI
  • Authenticating using Granted

Here's what my variables.tf file looks like:

variable "domain_name" {
  type = string
}

variable "bucket_name" {
  type = string
}

variable "environment" {
  type = string
}

I then fill these out in a .tfvars file and pass it in when I run terraform apply, as shown below.
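
To give an idea of the shape (the values here are placeholders, in the same spirit as the rest of the post), that file looks something like this:

blog.tfvars

domain_name = "<DOMAIN_NAME>"
bucket_name = "<BUCKET_NAME>"
environment = "production"

A file named exactly terraform.tfvars is picked up automatically; anything else gets passed in explicitly, e.g. terraform apply -var-file=blog.tfvars.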

Managing my domain through AWS

Before setting up this site, I had purchased the domain man-yells-at.cloud through Namecheap (they always seem to be decently priced, making DNS changes isn't frustrating as hell, and they're familiar).

I could manually point the domain at things later on in the guide, but I like being able to manage everything through Terraform, so I'm going to point my Namecheap domain at AWS. In order to do this, I'm going to create a little Terraform module to manage a Route53 Hosted Zone, starting with the provider and remote backend in a main.tf file.
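
That file is roughly this shape - the backend bucket, key and regions below are placeholders rather than my real values:

main.tf

# Remote state and provider setup - placeholder values
terraform {
  backend "s3" {
    bucket = "<STATE_BUCKET_NAME>"
    key    = "blog/terraform.tfstate"
    region = "<STATE_BUCKET_REGION>"
  }
}

provider "aws" {
  region = "<AWS_REGION>"
}

With that set up, I created a new file -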

Route53

route53.tf

resource "aws_route53_zone" "main" {
  name = var.domain_name

  tags = {
    Name = var.bucket_name
    Environment = var.environment
  }
}

output "nameservers" {
  value = aws_route53_zone.main.name_servers
}

Then I ran terraform apply. I included the output block so that I'd get something that looks like this:

Terminal showing a set of AWS nameservers as the output of a Terraform apply

I then used these to change the nameserver settings in Namecheap. To check that everything had propagated and that I'd used the correct nameservers, I created a TXT record by adding this block:

route53.tf

resource "aws_route53_record" "propagation_check" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "test.${var.domain_name}"
  type    = "TXT"
  ttl     = 300
  records = ["teststring"]
}

And applying again. After a couple of minutes, I ran

dig -t txt test.man-yells-at.cloud

And saw the record I'd just set. So far, so good.

Creating an S3 bucket and uploading my site

I had to come back to this bit after a reminder of how some types of site operate in S3. This was initially a bucket with absolutely no public read access, where our Cloudfront distribution had the permissions necessary to serve the objects in it - a setup that's generally best practice, as it means nobody interacts directly with the bucket. Unfortunately that isn't feasible without breaking the way some routes work in Gatsby, so I'll write a different post about how to do that securely at some point!

With the above addendum in mind, I switched to making a publicly readable bucket.

S3 Bucket

resource "aws_s3_bucket" "site_bucket" {
  bucket = var.bucket_name

  tags = {
    Name = var.bucket_name
    Environment = var.environment
  }
}

resource "aws_s3_bucket_acl" "site_bucket_acl" {
  bucket = aws_s3_bucket.site_bucket.id
  acl    = "public-read"
}

resource "aws_s3_bucket_website_configuration" "site_config" {
  bucket = aws_s3_bucket.site_bucket.id

  index_document {
    suffix = "index.html"
  }

  error_document {
    key = "404.html"
  }
}

resource "aws_s3_bucket_public_access_block" "public_access_block" {
  bucket = aws_s3_bucket.site_bucket.id

  block_public_acls       = false
  block_public_policy     = false
  ignore_public_acls      = false
  restrict_public_buckets = false
}

locals {
  s3_origin_id = "myS3Origin"
}

And another terraform apply.
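
One aside on how the public read actually works here: as far as I can tell, the public-read ACL on the bucket doesn't make the objects themselves readable - gatsby-plugin-s3 uploads each object with a public-read ACL by default, and that's what does the job. If you'd rather grant read access in one place instead, a bucket policy along these lines is an equivalent approach (a sketch, not part of my actual setup):

# Sketch: allow anonymous reads on everything in the site bucket,
# instead of relying on per-object ACLs set at deploy time.
resource "aws_s3_bucket_policy" "public_read" {
  bucket = aws_s3_bucket.site_bucket.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "PublicReadGetObject"
        Effect    = "Allow"
        Principal = "*"
        Action    = "s3:GetObject"
        Resource  = "${aws_s3_bucket.site_bucket.arn}/*"
      }
    ]
  })
}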

Once the bucket and its corresponding website configuration were set up, it was a simple matter of running a Gatsby build and then the plugin's deploy command (gatsby-plugin-s3 deploy, which the plugin suggests wiring up as an npm deploy script).

Getting this into Cloudfront

Okay so this got out of hand...

This bit is where it got frustrating. At first, as noted earlier, I had intended for the buckets to be locked down and only available via Cloudfront. It quickly became clear, thanks to the way routing works in Gatsby, that this couldn't be the case, and so I slowly had to undo the pieces I had put in place to protect the bucket from direct access.

The most frustrating part of this was the subtle difference (when you're looking at code at least) that if you serve an S3 bucket on its website endpoint, it changes from an s3_origin_config to a custom_origin_config which uses some different rules. It's a small gripe, but I am a drama queen and I will make a mountain out of this molehill.

Anyway, let's get building

Firstly, I knew I needed a certificate that I could attach to my Cloudfront distribution, so I made that first. Cloudfront only accepts certificates made in the us-east-1 region, so remember this when you're building. I used a separate aliased Terraform provider for this, and considering how simple it makes things I used the AWS ACM module to set up the certificate - this handily does the minor lifting of validating the certificate via Route53 for me (but if you want to roll your own, Terraform gives you all the component pieces to do so pretty easily - there's a sketch of that below, after the apply):

Certificate

certificate.tf

provider "aws" {
  alias  = "us-east-1"
  region = "us-east-1"
}

module "acm" {
  source  = "terraform-aws-modules/acm/aws"
  version = "~> 3.0"

  domain_name  = var.domain_name
  zone_id      = aws_route53_zone.main.zone_id

  subject_alternative_names = [
    "*.${var.domain_name}",
  ]

  providers = {
    aws = aws.us-east-1
  }

  wait_for_validation = true
}

Applied that - this can take slightly longer if the DNS validation doesn't happen straight away - and I had my certificate!
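
For reference, rolling your own instead of using the module looks roughly like this - a sketch of the usual certificate / DNS-validation dance, with resource names of my own choosing rather than anything from the setup above:

# Sketch: manual ACM certificate with Route53 DNS validation
resource "aws_acm_certificate" "site" {
  provider = aws.us-east-1

  domain_name               = var.domain_name
  subject_alternative_names = ["*.${var.domain_name}"]
  validation_method         = "DNS"
}

# One validation record per domain validation option
resource "aws_route53_record" "cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.site.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      type   = dvo.resource_record_type
      record = dvo.resource_record_value
    }
  }

  zone_id         = aws_route53_zone.main.zone_id
  name            = each.value.name
  type            = each.value.type
  ttl             = 60
  records         = [each.value.record]
  allow_overwrite = true
}

# Blocks until ACM has seen the validation records
resource "aws_acm_certificate_validation" "site" {
  provider = aws.us-east-1

  certificate_arn         = aws_acm_certificate.site.arn
  validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}

Either way, the end result is a certificate ARN to hand to the Cloudfront distribution.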

Cloudfront

With the certificate made, I proceeded onto the Cloudfront distribution. This uses the S3 Website Endpoint as an origin (which doesn't accept HTTPS - this revelation took me another short while to figure out as I refreshed to multiple 504 pages). I made sure to include a redirect-to-https in the configuration so that we don't have anybody accessing the site over HTTP:

cloudfront.tf

resource "aws_cloudfront_distribution" "s3_distribution" {
  origin {
    domain_name = aws_s3_bucket.site_bucket.website_endpoint
    origin_id   = local.s3_origin_id

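    # The website endpoint only speaks plain HTTP, so this has to be a custom
    # origin; an s3_origin_config is only for the REST (non-website) endpoint.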
    custom_origin_config {
      http_port = 80
      https_port = 443
      origin_protocol_policy = "http-only"
      origin_ssl_protocols = ["SSLv3", "TLSv1.2", "TLSv1.1", "TLSv1"]
    }
  }

  enabled             = true
  is_ipv6_enabled     = true
  comment             = "Some comment"
  default_root_object = "index.html"

  aliases = [ var.domain_name ]

  default_cache_behavior {
    allowed_methods  = ["GET", "HEAD", "OPTIONS"]
    cached_methods   = ["GET", "HEAD"]
    target_origin_id = local.s3_origin_id

    forwarded_values {
      query_string = false

      cookies {
        forward = "none"
      }
    }

    viewer_protocol_policy = "redirect-to-https"
    min_ttl                = 0
    default_ttl            = 3600
    max_ttl                = 86400
  }

  price_class = "PriceClass_200"

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  tags = {
    Environment = var.environment
  }

  viewer_certificate {
    acm_certificate_arn = module.acm.acm_certificate_arn
    ssl_support_method = "sni-only"
  }
}

Finally, I added one more block to my route53.tf:

route53.tf

resource "aws_route53_record" "site" {
  zone_id = aws_route53_zone.main.zone_id
  name    = var.domain_name
  type    = "A"

  alias {
    name = aws_cloudfront_distribution.s3_distribution.domain_name
    zone_id = aws_cloudfront_distribution.s3_distribution.hosted_zone_id
    evaluate_target_health = false
  }
}
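
Since the distribution has is_ipv6_enabled switched on, you may also want an AAAA alias next to the A record - this wasn't part of my original apply, but it's a near copy-paste if you fancy it:

# Optional: IPv6 alias, mirroring the A record above
resource "aws_route53_record" "site_ipv6" {
  zone_id = aws_route53_zone.main.zone_id
  name    = var.domain_name
  type    = "AAAA"

  alias {
    name                   = aws_cloudfront_distribution.s3_distribution.domain_name
    zone_id                = aws_cloudfront_distribution.s3_distribution.hosted_zone_id
    evaluate_target_health = false
  }
}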

Applying this all took the longest because Cloudfront distributions can take a little while (in their defence, they sort of have to take over the world).

Once the route had propagated... well, if you're reading this on https://man-yells-at.cloud then you're looking at it!

Conclusion

MVP is best

Some bits didn't work out the way I wanted, and that's okay! This is meant to be a rough side project that:

  • gives me the ability to post ✅
  • without having to worry about significant cost or security issues ✅
  • doesn't take long to set up and / or maintain ✅

It's hit the minimum of what I set out to do, and if it becomes infeasible to manage down the line, I can spend a bit of time moving the React bits into their own thing and making something a bit more fully formed. Maybe it's Lambdas!

Can we get more friendly?

Definitely the biggest issue I hit was the Cloudfront -> S3 connection. Whilst typical setups are well documented, it took a fair bit of exploration and trial-and-error to get what I'd assume is a fairly common outlier up and running. Maybe it doesn't get documented much because there are friendlier platforms for Gatsby out there, and I added complication by trying to put it on S3?

Stay tuned for more!

Getting this published once is all well and good, but I'm going to need to automate the deployment. In my next post, I'll build a secure pipeline that authenticates with AWS without needing an access key for deploying changes / new posts on my site.
