Roko Romic for mklabs

Posted on Jun 17, 2021

Bootstrapping a Startup on AWS with AWS Serverless and Go

#aws #go #terraform #cloudflare

In this article we want to share an overview of a recent project: building a logistics platform from scratch using AWS, Go and serverless. Yes, there is already an abundance of blog posts on each of these technologies individually. What we often find lacking is a description of how to glue them all together, which is exactly what this post covers.
Here is our full code supporting this blog post.

Pre-Requisites

If you want to play around with our examples, you'll need the following set up, in case you don't have them already:

AWS account
AWS CLI
AWS SAM CLI
Terraform
Docker
Make
AWS VPC
AWS S3 buckets

To follow the article completely, some basic understanding of these tools and frameworks helps, but it's not required.
The idea behind this post and its examples is to show how you can set up your own architecture using these techniques.

Overview - architecture

So basically our architecture consists of:

AWS Lambdas - Go runtime running our business logic
AWS S3 buckets - for hosting frontend and Terraform backend configuration
AWS RDS (Postgres version 12) - instance for storing business related data
AWS DynamoDB - used for the Terraform locking mechanism, preventing multiple users from applying simultaneously
AWS SSM - used to connect to our RDS instance inside the private VPC so we can monitor and maintain data
AWS Cognito - used for identity management and authentication
Terraform and AWS SAM (Serverless Application Model) as glue to keep all states synced and up-to-date

From an outside networking point of view, we decided to go with Cloudflare as a DNS service. The main benefit of using Cloudflare is its DDoS protection; with this feature on, there is one thing less to think about while designing a solution. Our previous experience with Cloudflare was very good, so it was logical to stay with it. Above all, we found adding and maintaining DNS records to be user friendly, even for those without a networking background.
The last bit of awesomeness is that the Cloudflare features we rely on (such as automatically setting up SSL certificates for our (sub)domains) are completely free.

The whole project is handled by the GitLab CI/CD pipeline.
You can get free minutes on GitLab to run your own CI/CD pipelines; for small projects, this is a fair deal.

You can see in the next picture how we designed a solution for the project.
This illustration represents an overview of the technical solution.

Programming Language

Regarding the programming language choice, Go seemed like an easy choice: low memory footprint (allowing thin run times), fast startup times, static typing, and ease of usage and readability. A lot of work in this platform involved integration with different partners, so the simplicity in Go’s concurrency model was also a big appeal.
Regarding the ORM side of things: in the mkops project this was handled by a Java 11 backend based on Spring Boot and Hibernate, and Go was a pure stateless supporting microservice. This time we had a chance to dive into GORM to model our entities, persisted in Postgres 12.

Serverless model

Due to the nature of most of our projects (related to big data), our default go-to infrastructure platform is Kubernetes. Here, however, a serverless model for the API was quite a nice fit.
After reviewing all the workflows that needed to happen, it was clear to us: the logistics platform would be heavy on background processes that integrate between different platforms and that could be started in a cron fashion. The database provided all the state locking required, so the processes could run independently and without communicating with each other.
Another argument was the relatively low amount of user traffic expected. The cost and administration effort of running a standalone and permanent Fargate instance (ECS/EKS) did not add up.
In summary, the motives for our choice are much of the usual suspects: low administration effort, higher security due to ephemeral nature, and lower operating costs.

To provide a seamless local development environment, AWS offers a cool framework named AWS SAM. It allows you to run AWS Lambdas locally and, most importantly, to easily create the relevant service events, like API Gateway. So our solution for running things locally and for testing involved Docker with a SAM template and, of course, Localstack for simulating the other AWS resources we were using, such as AWS SES, AWS S3 and AWS Secrets Manager.

AWS SAM

In the spirit of keeping maintenance effort low while having a speedy implementation, we decided to start by considering serverless frameworks. The two main options boiled down to AWS SAM or the Serverless Framework.
The main difference between the two is that Serverless is used to deploy FaaS (Function as a Service) functions with support for different providers. SAM, on the other hand, is used specifically for AWS as a cloud provider, deploying not only cloud functions but also the (minimum) underlying infrastructure to expose some business logic with an HTTP Endpoint.

AWS SAM allows us to spin up (most) of the resources it manages locally, including Lambdas and the API Gateway. With this in mind, it made it easier for us to develop and test Lambda functions before deploying them to AWS.

We were fairly set on using AWS as a cloud provider, which helped our decision to go for AWS SAM.

AWS SAM follows a declarative approach for defining the stack and makes use of a command line utility to deploy: resources are declared in one key file called template.yaml. More details on the anatomy of the AWS SAM template here.

Under the hood, AWS SAM is a simplified version of CloudFormation, albeit a very similar one. Resources declared in template.yaml get automatically converted to CloudFormation by the sam cli, so it's very easy to inspect what exactly is getting created on our behalf.

After installing the sam cli for your operating system, these are the commands you will need to create AWS resources:

sam package --template-file template.yaml --s3-bucket your_s3_bucket --output-template-file package.yaml
sam deploy --template-file package.yaml --stack-name your_stack_name --capabilities CAPABILITY_IAM

sam package is used for converting the template.yaml into deployable artifacts as well as uploading said artifacts to an AWS S3 bucket. It’s important to note that this bucket needs to be created beforehand.
An important caveat of these deployments is that there is a hard limit on your total package size: 250MB. Meaning, if you have multiple lambdas defined and their total size exceeds the limit, SAM will throw an error. If you are tempted to create an individual function for each endpoint, you will likely reach this limit quite fast. One way to solve this is to package your lambdas as Docker images (a recent Lambda feature, with a hard limit of 10GB), upload them to AWS ECR and reference them in the SAM template, or to mount EFS volumes within Lambda. More details you can find here. Another way is to write a single lambda with its own router and point all traffic there (read on for an example).
The output generated from command sam package is the input template file for next command, sam deploy.
To delete your resources created by AWS SAM run following command:
aws cloudformation delete-stack --stack-name *your_stack_name*
Here is an example of the AWS SAM template.yaml file. The file is used to create AWS resources like AWS Lambda, and API Gateway with Custom domain through which lambda will be exposed on the Internet.
We’re setting up a Custom domain for API Gateway to be exposed to the Internet over a user-friendly domain name, instead of a generic one created by AWS.

---
AWSTemplateFormatVersion: 2010-09-09

Transform: AWS::Serverless-2016-10-31

Parameters:
  TargetStage:
    Description: "dev/prd"
    Type: String
  DomainName:
    Type: String
  AcmCertificateArn:
    Type: String
  VPCSecurityGroupIDs:
    Description: "A comma-delimited list of strings - the security groups that your Lambda function should be in"
    Type: CommaDelimitedList
  VPCSubnetIDs:
    Description: "A comma-delimited list of strings - the subnet IDs that your Lambda function should be assigned to"
    Type: CommaDelimitedList

Globals:
  Api:
    Cors:
      AllowMethods: "'GET, POST, PUT, OPTIONS, DELETE'"
      AllowHeaders: "'*'"
      AllowOrigin: "'*'"

  Function:
    Runtime: go1.x
    Tracing: Active # https://docs.aws.amazon.com/lambda/latest/dg/lambda-x-ray.html
    Timeout: 30
    VpcConfig:
      SecurityGroupIds:
        Ref: VPCSecurityGroupIDs
      SubnetIds:
        Ref: VPCSubnetIDs

Resources:
  ApiDetails:
    Type: AWS::Serverless::Api
    Properties:
      StageName: !Ref TargetStage
      Auth:
        UsagePlan:
          CreateUsagePlan: PER_API
          Description: Usage plan for this API
          Quota:
            Limit: 3000
            Period: MONTH
          Throttle:
            BurstLimit: 50
            RateLimit: 20

  ApiDomain:
    Type: AWS::ApiGateway::DomainName
    Properties:
      RegionalCertificateArn: !Ref AcmCertificateArn
      DomainName: !Ref DomainName
      EndpointConfiguration:
        Types:
          - REGIONAL
      SecurityPolicy: TLS_1_2

  ApiDomainMappings:
    Type: AWS::ApiGateway::BasePathMapping
    Properties:
      DomainName: !Ref DomainName
      RestApiId: !Ref ApiDetails
      Stage: !Ref ApiDetails.Stage

  EchoFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: bin/echo-sample
      MemorySize: 128
      Events:
        AllEvents:
          Type: Api
          Properties:
            Path: /{proxy+}
            Method: any
            RestApiId: !Ref ApiDetails

Outputs:
  ApiCustomDomainRegionalDomainName:
    Description: 'Regional domain name for the API'
    Value: !GetAtt ApiDomain.RegionalDomainName

As you can see in the example, there are several sections defined, such as Parameters, Globals, Resources and Outputs. More details about each section you can read here.
To set or override parameters, you can use the --parameter-overrides option of the sam deploy command. For example: sam deploy --stack-name test --template-file packaged.yaml --capabilities CAPABILITY_IAM --parameter-overrides TargetStage=dev, where we set the parameter TargetStage to the value dev.
The section of interest is Resources: this is where we describe what our API Gateway, the Custom Domain used for API Gateway and our simple Lambda should look like.

It’s also worthwhile mentioning that, for each API, you can define an Invocation Quota which can be both (1) a way to protect your deployment and (2) avoid cost escalation.
In this example we're attaching a Custom Domain to the API Gateway. To do so, we define two resources: ApiDomain and ApiDomainMappings. To fully set up a custom domain on API Gateway you have to be the owner of that domain, but it doesn't need to be registered with AWS Route53; any DNS registrar will do. We're using Cloudflare as our DNS registrar, so we use it in this example as well.
Next step is to create an ACM certificate. AWS Certificate Manager (ACM) is a service that provisions and manages SSL certificates on AWS. This is a prerequisite if we want to do anything custom domain related in API Gateway. Note: if you use a regional API endpoint type, as we do in our example, the ACM certificate has to be created in the same AWS region as your API Gateway.
In our case, we’re using Cloudflare, and our domain is registered on Cloudflare service. In that case, we're going to need to create an SSL certificate on Cloudflare and import that one into AWS ACM.
Here is a blog post from Cloudflare which can help you create a certificate; it's pretty straightforward. After successfully creating an origin certificate, you need to store the origin certificate and the private key generated for you in PEM format. You will need this data for importing the certificate into AWS ACM. Note that if you don't save the private key and leave the page that displays it, you won't be able to retrieve it again. In that case you will have to revoke the certificate first and then create a new one.
To import an existing origin certificate issued by Cloudflare, follow these instructions.
Save the ARN of the newly imported certificate; you will assign it to the ApiDomain resource in the SAM template.

Note that your custom DomainName property in resource ApiDomain must match the one that was added in the ACM certificate, otherwise it won't be created. In this example we’re using an SSL certificate for api-test.your.domain.com, so we have to set the parameter DomainName exactly like that subdomain.

The last thing to set up for API Gateway and custom domain is to define resource ApiDomainMappings. Basically it will allow us to point the custom domain at an existing API Gateway. It’s a simple mapping that lets us point API stages at specific paths. This is a useful feature if you have multiple APIs making up a suite that you want behind a shared domain.

We ran into an issue while setting up the Stage property. At first we tried to simply reference the parameter, Stage: !Ref TargetStage, but we got a strange error saying “Invalid stage identifier specified”. After some digging we found a GitHub issue describing the same problem.
The problem is that the Stage resource is not declared explicitly in the SAM template, so referencing it only by name fails: the ApiDomainMappings resource gets created before the Stage resource exists. Thanks to the fellows on GitHub, their suggestion worked.
So we changed the Stage property to: Stage: !Ref ApiDetails.Stage.

The very last step in setting up API Gateway is to update the DNS with a new record pointing to the domain name that AWS generated for the API Gateway.
A generic domain name looks like https://api-id.execute-api.region.amazonaws.com/stage, where api-id is generated by API Gateway, region (AWS Region) is specified by you when creating the API, and stage is specified by you when deploying the API.
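
As a small illustration of that format, the generic URL can be assembled from its three parts. This is only a sketch: the helper name invokeURL and the sample values are made up for the example and are not part of our project.

```go
package main

import "fmt"

// invokeURL builds the generic API Gateway URL from the API id,
// the AWS region and the stage name. The function name is hypothetical.
func invokeURL(apiID, region, stage string) string {
	return fmt.Sprintf("https://%s.execute-api.%s.amazonaws.com/%s", apiID, region, stage)
}

func main() {
	// "abc123" is a placeholder id, not a real API.
	fmt.Println(invokeURL("abc123", "eu-central-1", "dev"))
}
```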

All you have to do is log in to your Cloudflare account and add a new CNAME record whose subdomain matches the one you put into the DomainName parameter of the SAM template, pointing to the generated domain name of your API Gateway.

With this set up, you just hand your developers the newly created user-friendly REST API endpoint (e.g. https://app-test.your-domain.com/stage).
If you want to avoid manually adding new DNS records to your Cloudflare provider, this is an example of how we managed to automate adding new records to Cloudflare.

resource "null_resource" "get_api_gateway_endpoint" {
  triggers = {
    template   = sha1(file("../../../backend/template.yaml"))
    stack_name = local.stack_name
  }
  provisioner "local-exec" {
    interpreter = ["bash", "-c"]
    command     = "aws cloudformation describe-stacks --stack-name ${self.triggers.stack_name} | jq '.Stacks[0].Outputs[0].OutputValue'| sed 's/\"//g' > ${path.module}/gateway_endpoint.txt"
  }
  depends_on = [null_resource.build_deploy_sam_resource]
}

In this part, we use a null_resource to call aws cloudformation describe-stacks and get the API Gateway domain name (the Outputs section of the AWS SAM template defines what is returned), then store it in a file. jq prints the value as a quoted JSON string, so the sed command strips the surrounding double quotes, leaving only the bare domain name that we add to the new DNS record.
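
If you would rather do this extraction in Go than with jq and sed, a rough equivalent could look like the sketch below. It assumes the usual JSON shape printed by describe-stacks; the type and function names (stackDescription, firstOutputValue) and the sample domain are invented for the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// stackDescription mirrors only the fields of the
// `aws cloudformation describe-stacks` JSON output that we need.
type stackDescription struct {
	Stacks []struct {
		Outputs []struct {
			OutputKey   string `json:"OutputKey"`
			OutputValue string `json:"OutputValue"`
		} `json:"Outputs"`
	} `json:"Stacks"`
}

// firstOutputValue returns the value that the jq filter
// '.Stacks[0].Outputs[0].OutputValue' would select.
func firstOutputValue(raw []byte) (string, error) {
	var d stackDescription
	if err := json.Unmarshal(raw, &d); err != nil {
		return "", err
	}
	if len(d.Stacks) == 0 || len(d.Stacks[0].Outputs) == 0 {
		return "", errors.New("stack has no outputs")
	}
	return d.Stacks[0].Outputs[0].OutputValue, nil
}

func main() {
	// Trimmed-down sample response; the domain is a placeholder.
	sample := []byte(`{"Stacks":[{"Outputs":[{"OutputKey":"ApiCustomDomainRegionalDomainName","OutputValue":"d-example.execute-api.eu-central-1.amazonaws.com"}]}]}`)
	v, err := firstOutputValue(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(v)
}
```

Because the JSON decoder already returns an unquoted string, no extra quote stripping is needed in this variant.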

data "local_file" "api_gateway_endpoint" {
  filename   = "${path.module}/gateway_endpoint.txt"
  depends_on = [null_resource.get_api_gateway_endpoint]
}

resource "cloudflare_record" "api_gateway_endpoint" {
  depends_on = [data.local_file.api_gateway_endpoint]
  name       = var.subdomain_name_backend
  value      = data.local_file.api_gateway_endpoint.content
  type       = "CNAME"
  proxied    = true
  zone_id    = lookup(data.cloudflare_zones.default.zones[0], "id")
}

Using the Terraform resource cloudflare_record, we set value to the content of the file that we generated above (it's the target our CNAME record will point to), while name is the subdomain that we want to use for our REST API endpoint.

The last resource defined in the SAM template example is our lambda EchoFunction.
The key things to keep in mind are setting up the lambda handler (pointing to the Go executable located in our project structure) and the events (defining what kind of event will trigger our lambda).
In this example we are using the API as a trigger for our lambda.

          Type: Api
          Properties:
            Path: /{proxy+}
            Method: any
            RestApiId: !Ref ApiDetails

As you can see, any path and method can trigger our lambda through the API Gateway that we created earlier. Combining this with the Go Echo framework reduced the number of lambdas we need, and it also solved the problem of the hard limit on the total size of all lambdas. Using aws-lambda-go-api-proxy with the Echo framework, we defined a REST controller that manages the real paths and methods our lambda serves.

Note: if your lambda needs more permissions to access other AWS resources, you can set the Role property. One way is to create a role with Terraform, add a parameter to the SAM template to receive the ARN of the created role, and reference it in the Role property.
Another way is to create the role inside the SAM template and reference it in the Role property.

To expose your lambda over API Gateway, the last thing to do is to enable, you guessed it, CORS. All you have to do is define a Cors section in the SAM template. In our case we set it up under Globals, so our API Gateway automatically has CORS enabled.
Even with CORS enabled on API Gateway, your lambda still has to return the CORS headers itself, otherwise invoking your lambda from a browser will not work.

With the Echo framework we created a CORS middleware to make our lambda functional.
Here is an example of how to do it:

e := echo.New()
e.Use(middleware.CORSWithConfig(middleware.CORSConfig{
    AllowOrigins: []string{"*"},
    AllowMethods: []string{echo.GET, echo.PUT, echo.POST, echo.DELETE, echo.OPTIONS},
    AllowHeaders: []string{"Accept", "Content-Type", "Content-Length", "Accept-Encoding", "X-CSRF-Token", "Authorization"},
}))

AWS Lambda

Lambda is the Serverless computing platform service provided on AWS. Using a Lambda function you can run or execute your application code without actually provisioning any application servers.

Since December 2020, AWS allows up to 10GB of memory to be assigned to a Lambda function, which also raises the maximum to 6 vCPUs.
Another cool new feature is that you can now package and deploy Lambda functions as container images of up to 10 GB in size. If you have a big lambda function with huge dependencies, this is an easy way around the 250MB maximum size of a regular deployment package.

To work around this hard limit, we used aws-lambda-go-api-proxy to pack multiple lambdas into one. Basically, this solution allows us to use one lambda to handle multiple methods or paths for one API Gateway, so we don't need a separate lambda for each path or method. It also makes the code base easier to maintain.
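
The idea of one function serving many routes can be illustrated without any framework. The following is a simplified stand-in for what the proxy adapter and a web framework do together, not their actual code; the request type, the routes table and the dispatch function are invented for this sketch.

```go
package main

import "fmt"

// request is a stripped-down stand-in for an API Gateway proxy event.
type request struct {
	Method string
	Path   string
}

// routes maps "METHOD path" to the handler for that endpoint.
var routes = map[string]func() string{
	"GET /user":         func() string { return "get user" },
	"POST /user":        func() string { return "create user" },
	"GET /organization": func() string { return "get organization" },
	"PUT /organization": func() string { return "update organization" },
}

// dispatch is the single entry point; it returns an HTTP-like status and body.
func dispatch(r request) (int, string) {
	h, ok := routes[r.Method+" "+r.Path]
	if !ok {
		return 404, "not found"
	}
	return 200, h()
}

func main() {
	fmt.Println(dispatch(request{Method: "GET", Path: "/user"}))
	fmt.Println(dispatch(request{Method: "DELETE", Path: "/user"}))
}
```

A real framework adds path parameters, middleware and proper error responses on top of this, which is why we did not write the router ourselves.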

A very cool thing is that if you have already developed your REST API with the Echo framework, or another one like Gin, Iris, Negroni, Mux, Fiber or Chi, and you're planning to move to serverless (AWS Lambda, GCP Cloud Function, ...), you can easily achieve it by using this solution.
If usage goes up and it becomes price prohibitive, it's a one line change to swap it to a container and run it via AWS services like Fargate or ECS.

In the next example we’ll show you how to use the Echo framework with aws-lambda-go-api-proxy to achieve the solution of having one lambda function for multiple paths and methods.
We have defined two paths, /user and /organization, with two supported methods each.

package main

import (
    "context"
    "net/http"

    "github.com/aws/aws-lambda-go/events"
    "github.com/aws/aws-lambda-go/lambda"

    echoadapter "github.com/awslabs/aws-lambda-go-api-proxy/echo"
    "github.com/labstack/echo/v4"
    "github.com/labstack/echo/v4/middleware"
)

type UserData struct {
    UserId      string `json:"userId"`
    DisplayName string `json:"displayname"`
    Status      string `json:"status"`
}

type User struct {
    User UserData `json:"user"`
}

type OrganizationData struct {
    OrganizationId string `json:"organizationId"`
    DisplayName    string `json:"displayname"`
    Status         string `json:"status"`
}

type Organization struct {
    Organization OrganizationData `json:"organization"`
}

var echoLambda *echoadapter.EchoLambda

func init() {
    e := echo.New()
    //define routes
    registerUserRoutes(e)
    registerOrganizationRoutes(e)

    //define CORS
    e.Use(middleware.CORSWithConfig(middleware.CORSConfig{
        AllowOrigins: []string{"*"},
        AllowMethods: []string{echo.GET, echo.PUT, echo.POST, echo.DELETE, echo.OPTIONS},
        AllowHeaders: []string{"Accept", "Content-Type", "Content-Length", "Accept-Encoding", "X-CSRF-Token", "Authorization"},
    }))
    e.Use(middleware.Logger())

    echoLambda = echoadapter.New(e)
}

func handler(ctx context.Context, req events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
    return echoLambda.ProxyWithContext(ctx, req)
}

func main() {
    lambda.Start(handler)
}

func registerUserRoutes(e *echo.Echo) {
    user := e.Group("/user")
    user.GET("", getUser)
    user.POST("", createUser)
}

func registerOrganizationRoutes(e *echo.Echo) {
    organization := e.Group("/organization")
    organization.GET("", getOrganization)
    organization.PUT("", updateOrganization)
}

func getUser(c echo.Context) error {
    return c.JSON(http.StatusOK, User{
        User: UserData{UserId: "1", DisplayName: "Test", Status: "active"},
    })
}

func createUser(c echo.Context) error {
    return c.JSON(http.StatusOK, User{
        User: UserData{UserId: "2", DisplayName: "Test2", Status: "created"},
    })
}

func getOrganization(c echo.Context) error {
    return c.JSON(http.StatusOK, Organization{
        Organization: OrganizationData{
            OrganizationId: "1",
            DisplayName:    "Test organization",
            Status:         "active",
        },
    })
}

func updateOrganization(c echo.Context) error {
    return c.JSON(http.StatusOK, Organization{
        Organization: OrganizationData{
            OrganizationId: "1",
            DisplayName:    "Test organization",
            Status:         "updated",
        },
    })
}

This simple lambda function returns a JSON response when you send a request to the correct path using the correct method.

One cool way of using AWS Lambda is for running migration code. As mentioned earlier, we are using GORM as an ORM library for modeling and interacting with RDS-based Postgres.
GORM lets us define data models which represent database tables with appropriate constraints. That allows us to have a dedicated Lambda that runs migrations/creation of resources (such as tables, indexes, constraints, pre-populated data, etc.) in the database without the need to maintain incremental SQL scripts. If you are curious about GORM, check out their documentation here (bonus: major version 2 was recently released with some very nice improvements).

Local testing

To test our lambda functions locally we’re using Docker and AWS SAM.
As we're using the Go runtime environment for our lambda functions, we have defined a Dockerfile with a Go base image and also installed the awscli and aws-sam-cli tools.

FROM golang:1.15

RUN apt-get update
RUN apt-get install python3 python3-pip -y

RUN pip3 install --upgrade pip
RUN pip3 install awscli
RUN pip3 install aws-sam-cli==1.12

WORKDIR /var/opt

EXPOSE 3003

Next step is to create a docker-compose service that will use our docker base image defined in Dockerfile, and serve our Lambda function with an API Gateway-like local deployment.

In docker-compose service we’re mounting our backend folder which includes SAM template.yaml and the lambda executable. To run a lambda function with API Gateway defined in template.yaml, all you have to do is to define an entrypoint to use SAM command sam local start-api.

version: "3.5"

services:
  backend:
    build:
      context: ./backend
      dockerfile: Dockerfile
    image: backend:local
    hostname: backend
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./backend:/var/opt
    working_dir: /var/opt
    environment:
      # don’t share statistics usage with AWS
      SAM_CLI_TELEMETRY: 0
      LOG_LEVEL: DEBUG
    ports:
      - "3003:3003"
    networks:
      - sample
    entrypoint:
      - /bin/bash
      - -c
      - |
        sam local start-api \
          --host 0.0.0.0 -p 3003 \
          --docker-volume-basedir "$PWD"/backend \
          --docker-network sample

networks:
  sample:
    name: sample

To start local development api, just use command docker-compose up, or our make command make up-be.

The API is now ready to be tested.

If you run a simple curl command, curl http://localhost:3003/user, you will get a response from the API.

A cool thing about combining the sam local command with docker-compose is that you can easily define new docker-compose services like postgres, localstack (to simulate other AWS services) or a frontend, and connect your lambda function to those services.
This approach allows us to have a full local development cycle without the need to deploy to AWS to test.

Terraform

Terraform is a general purpose infrastructure as code tool that can create infrastructure on different cloud providers like AWS, GCP, Azure and Oracle Cloud Infrastructure.
Besides cloud providers, Terraform supports plenty of other kinds of providers, such as Cloudflare, Helm, etc.
If interested, you can find a full list of supported providers here.
Thanks to its very useful module system, it's relatively straightforward to deploy large infrastructures to any kind of cloud provider.

We start with Terraform because it is the glue for all these services and frameworks, not just because we're big fans of it. In practice we are stitching AWS and Cloudflare together and, in a clunky way, AWS SAM. Yes, using null_resource is by far not our favorite solution.

At the moment of writing this post there is no better option to run AWS SAM via Terraform, or at least we couldn't find one. We hope that in the future Terraform will support creating SAM resources directly within a native provider.

To glue things together, we went with the previous release of Terraform, which at the moment of writing was version 0.12. To find out about some cool features released with this and newer versions, you can read our blog post here.

If you take a closer look at our architecture, you’ll notice that Terraform is responsible directly or indirectly for managing all resources that run on AWS.
It’s directly responsible for creating AWS VPC, AWS RDS, AWS Cognito, AWS S3 buckets, AWS DynamoDB, AWS CloudFront distribution, AWS SSM and Cloudflare DNS records.
Indirectly, it's responsible for creating the AWS Lambdas and the AWS API Gateway, meaning that the creation of those resources is managed by the AWS SAM CLI, as defined by the SAM template, via a null_resource.

To manage different stages, we're using Terraform workspaces. We chose to name Terraform workspaces after our branches, which gives us a simple flow, mostly because resources are managed by GitLab CI/CD.

Below follows an example of how you could deploy SAM resources using a null_resource within your Terraform scripts. Essentially, you need to run both package and deploy commands and ensure all parameters (e.g. VPC, SGs, etc.) are injected into the deployment. To ensure the resource is called when the dependent files change, one could use the sha1 of template as a trigger (as shown).

resource "null_resource" "build_deploy_sam_resource" {
  triggers = {
    s3_bucket     = var.s3_bucket_artifacts
    template      = sha1(file("../../../backend/template.yaml"))
    stack_name    = local.stack_name
    sec_group_ids = join(",", aws_security_group.lambda_sg.*.id)
    subnet_ids    = join(",", var.private_subnet_ids)
    domain_name   = "${var.subdomain_name_backend}.${var.domain_name}"
    target_stage  = "dev"
  }
  provisioner "local-exec" {
    interpreter = ["bash", "-c"]
    command     = <<-EOT
      sam package --template-file ${local.template} --s3-bucket ${self.triggers.s3_bucket} --output-template-file ${local.packaged}
      sam deploy --stack-name ${self.triggers.stack_name} --template-file ${local.packaged} --capabilities CAPABILITY_IAM --parameter-overrides TargetStage=${self.triggers.target_stage} DomainName=${self.triggers.domain_name} VPCSecurityGroupIDs=${self.triggers.sec_group_ids} VPCSubnetIDs=${self.triggers.subnet_ids} AcmCertificateArn=${aws_acm_certificate_validation.backend.certificate_arn}
    EOT
  }
  depends_on = [aws_security_group.lambda_sg, aws_acm_certificate_validation.backend]
}
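
To see why the sha1 trigger works, it helps to look at what the fingerprint does when the template changes. The sketch below computes a hex-encoded SHA-1 of some file contents, the same kind of value Terraform's sha1(file(...)) yields; the helper name templateFingerprint and the sample contents are made up.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// templateFingerprint returns the hex-encoded SHA-1 of the template contents.
func templateFingerprint(contents []byte) string {
	sum := sha1.Sum(contents)
	return hex.EncodeToString(sum[:])
}

func main() {
	before := templateFingerprint([]byte("Timeout: 30\n"))
	after := templateFingerprint([]byte("Timeout: 60\n"))
	// Any edit to the template yields a different fingerprint, so the
	// trigger value changes and the null_resource is re-created.
	fmt.Println(before != after)
}
```

Note that only files covered by a trigger cause a redeploy; a change to the Go sources alone would not alter the fingerprint of template.yaml.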

Cron jobs

Another use case we needed to implement required some business logic to run on a schedule. This is also easy to set up with SAM, which creates a CloudWatch Events rule to trigger the appropriate Lambda. AWS supports two ways of defining a schedule. One is a cron expression, in the format cron(Minutes Hours Day-of-month Month Day-of-week Year). The other is a rate expression, in the format rate(Value Unit), where Value is a positive integer and Unit can be minute(s), hour(s), or day(s).
More details on how to define schedule expressions can be found on the AWS site.

Keep in mind that a rate expression starts counting from the moment the rule is created. For example, if you set up a scheduler to run every hour using the rate keyword, rate(1 hour), the next run time will be the time when your rule was created plus 1 hour. Let's say you created a rule at 10:20 AM, so the next run will occur at 11:20 AM.

An example of how to define scheduled resources within the SAM template.

  ScheduledLambda:
    Type: AWS::Serverless::Function
    Properties:
      Role: !Ref Role
      Timeout: 30
      Handler: bin/scheduled_lambda
      Events:
        ScheduleExample:
          Type: Schedule
          Properties:
            Schedule: cron(0/30 * * * ? *)
            Description: "Example of cron jobs that runs lambda every 30 min"

The most crucial part of defining a scheduler is setting the type to Schedule and providing a valid cron or rate value for the Schedule property.

      Events:
        ScheduleExample: # custom name of the event
          Type: Schedule # mandatory, so AWS SAM knows what kind of event this is
          Properties:
            Schedule: cron(0/30 * * * ? *)

Inside the Schedule property you can define your own expressions.
Here is the list of all properties that you can assign to the cloudwatch event resource in the SAM template.

AWS SSM Sessions

After setting up our environment on AWS, we looked at how to access a database located in a private network. We found the AWS Systems Manager remote sessions service to be a very nice solution, because it gives us access to private networks without exposing any service publicly. This is great for troubleshooting and for accessing our database.
It also lets us easily create a cron job for backing up databases or for any other maintenance work.

To be able to create SSM sessions/tunnels, you need an EC2 instance running the SSM Agent. Note that the SSM Agent is preinstalled by default on Amazon's AMIs. In our case, we used Amazon Linux 2 for convenience. And of course, we are using Terraform to spawn the instance.

In the following example, we’ll show you how to create this subset of our stack, including the installation of a Postgres client in the EC2 instance.

locals {
  // internal user used for admin maintenance tasks
  ssm_instance_user  = "example"
  ssm_instance_group = "example"
}

/**
  * SSM instance security group
*/
resource "aws_security_group" "ssm_instance" {
  vpc_id      = var.vpc_id
  name        = "ssm-sg"
  description = "Allow egress from SSM Agent to Internet."
  tags = {
    "Name" = "ssm-sg"
  }
}

resource "aws_security_group_rule" "ssm_instance_allow_egress_https" {
  type              = "egress"
  from_port         = 443
  to_port           = 443
  protocol          = "TCP"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.ssm_instance.id
}

resource "aws_security_group_rule" "ssm_instance_allow_egress_http" {
  type              = "egress"
  from_port         = 80
  to_port           = 80
  protocol          = "TCP"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.ssm_instance.id
}

data "aws_iam_policy_document" "ssm_instance_default" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "ssm_instance" {
  name               = "ssm-iam-role"
  assume_role_policy = data.aws_iam_policy_document.ssm_instance_default.json
}

resource "aws_iam_role_policy_attachment" "ssm_instance_policy" {
  role       = aws_iam_role.ssm_instance.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

resource "aws_iam_instance_profile" "ssm_instance" {
  name = "ssm-iam-instance-profile"
  role = aws_iam_role.ssm_instance.name
}

data "template_file" "install_ssm_instance" {
  template = file("install-ssm-instance.yaml")
  vars = {
    user  = local.ssm_instance_user
    group = local.ssm_instance_group
  }
}

/*
 * Ref: https://registry.terraform.io/providers/hashicorp/template/latest/docs/data-sources/cloudinit_config
 */
data "template_cloudinit_config" "ssm_instance_config" {
  part {
    content_type = "text/cloud-config"
    content      = data.template_file.install_ssm_instance.rendered
  }
  // cloud-init has apparently a 8 year unresolved issue (face-palm) [1], and is unable
  // to create users before write_files directive . Thus, we use this hack.
  // [1] - https://bugs.launchpad.net/cloud-init/+bug/1486113
  part {
    content_type = "text/x-shellscript"
    content      = "/usr/bin/install-pg.sh"
  }
}

resource "aws_instance" "ec2" {
  ami                    = "ami-0d712b3e6e1f798ef"
  instance_type          = "t2.micro"
  subnet_id              = var.private_subnet_ids[0]
  vpc_security_group_ids = [aws_security_group.ssm_instance.id]
  iam_instance_profile   = aws_iam_instance_profile.ssm_instance.name

  root_block_device {
    delete_on_termination = true
    volume_type           = "gp2"
    volume_size           = 20
  }
  user_data = data.template_file.install_ssm_instance.rendered
  tags = {
    "Name" = "ssm-ec2"
  }
}

Example of template install-ssm-instance.yaml

#cloud-config

# See docs for more details: https://cloudinit.readthedocs.io/en/latest/topics/examples.html

# Upgrade installed packages on first boot (Amazon Linux 2 uses yum, not apt-get).
package_upgrade: true

users:
  - default
  - name: ${user}
    gecos: ${user}
    shell: /bin/bash
    primary_group: ${group}
    sudo: ALL=(ALL) NOPASSWD:ALL
    groups: users, admin
    lock_passwd: false

# download & install following packages
packages:
  - curl

write_files:
  - permissions: '0750'
    owner: root:root
    content: |
      #!/bin/bash
      set -euo pipefail
      tee /etc/yum.repos.d/pgdg.repo<<EOF
      [pgdg12]
      name=PostgreSQL 12 for RHEL/CentOS 7 - x86_64
      baseurl=https://download.postgresql.org/pub/repos/yum/12/redhat/rhel-7-x86_64
      enabled=1
      gpgcheck=0
      EOF
      yum makecache
      yum install -y postgresql12 postgresql12-server
      chown ${user}:${group} -R /home/${user}/
    path: /usr/bin/install-pg.sh
final_message: "The system is finally up, after $UPTIME seconds"

After setting up the EC2 instance together with the Postgres install script, the next step is to connect to the instance using the following command.

aws ssm start-session --target <instance-id>

Here <instance-id> is the ID of your EC2 instance, not its Name tag.

If you have trouble remembering the instance ID, you can use a make target to look it up for you.

All you need is the value of the instance's Name tag: an AWS CLI describe-instances call filtered by that tag returns the ID of the matching running instance.
Once that is in place, connecting to your EC2 instance is a single make command.
In our example the target is called ssm-instance, so running make ssm-instance gets you to the place you wanted to be :).

Here is an example of how you can achieve this using make:

SSM_TAG=ssm-ec2
SSM_INSTANCE := $(shell aws ec2 describe-instances --filters "Name=tag:Name,Values=$(SSM_TAG)" --query "Reservations[].Instances[?State.Name == 'running'].InstanceId[]" --output text)

.PHONY: ssm-instance
ssm-instance:
    @echo Connecting to SSM INSTANCE
    @aws ssm start-session --target $(SSM_INSTANCE)

The last thing is to verify that the install-pg.sh script has been created on your EC2 instance by the Terraform example above.

Type sudo -s in the session terminal and run /usr/bin/install-pg.sh. The script downloads and installs the Postgres client. Voila!
To verify that the Postgres client was installed successfully, run psql --version.

In this example you saw how to use AWS SSM and a running EC2 instance to get access to resources deployed in a private VPC.

Cost

If you are a startup at an early stage, then you should definitely consider applying to the AWS startup program as early as possible. While an AWS VPC itself doesn't cost anything, the bare minimum resources that make it production ready do (e.g. private/public subnets with a NAT Gateway). We have set up a VPC with a single NAT Gateway and internet gateway as a shared resource for our multiple stages (development, integration, production). We agree it's not ideal, but for a project at an early stage we wanted to keep expenses from escalating, and we can all agree on that :).

Besides the NAT Gateway, our biggest contributors to cost are of course the encrypted RDS instances db.t2.small.

To sum up, this basic (but very functional) setup costs us around 100 euros per month, which is very fair considering we have multiple stages and encryption enabled for all possible resources (RDS, S3, secrets).

Cloudflare

There is a bit of a challenge in using Cloudflare strict mode with a frontend deployed on an AWS S3 bucket. Cloudflare's strict mode requires the origin server to have an SSL certificate, which is a problem when AWS S3 is used for hosting.
As you might know, an S3 bucket configured for static website hosting is publicly exposed only over HTTP.

To overcome this issue, we needed to create an SSL certificate in AWS Certificate Manager (ACM) for the domain of our publicly exposed S3 bucket, along with an AWS CloudFront distribution that applies that certificate. Basically, the CloudFront distribution allows us to expose the S3 bucket over HTTPS.

In this example we are creating an ACM certificate for a domain that will be used for our frontend exposed via Cloudfront distribution.

provider "aws" {
  alias = "virginia"
  region = "us-east-1"
}

This provider in region us-east-1 needs to be defined because CloudFront only accepts ACM certificates created in us-east-1.

resource "aws_acm_certificate" "default" {
  provider          = aws.virginia
  domain_name       = "${var.subdomain_name}.${var.domain_name}"
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

data "cloudflare_zones" "default" {
  filter {
    name = var.domain_name
  }
}

resource "cloudflare_record" "validation_domain" {
  name    = aws_acm_certificate.default.domain_validation_options[0]["resource_record_name"]
  value   = trimsuffix(aws_acm_certificate.default.domain_validation_options[0]["resource_record_value"], ".")
  type    = aws_acm_certificate.default.domain_validation_options[0]["resource_record_type"]
  zone_id = lookup(data.cloudflare_zones.default.zones[0], "id")
  depends_on = [aws_acm_certificate.default]
}

This resource will create the CNAME record in Cloudflare DNS that validates our ACM certificate.
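A small detail in the record above is the trimsuffix call: ACM returns the validation value as a fully qualified name ending in a dot, and the snippet strips that dot before handing the value to Cloudflare, presumably so that it matches what Cloudflare stores. The Go sketch below shows the same normalization. The function name and the sample value are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeRecordValue removes one trailing dot from a fully qualified
// DNS name, mirroring Terraform's trimsuffix(value, ".").
func normalizeRecordValue(value string) string {
	return strings.TrimSuffix(value, ".")
}

func main() {
	// Hypothetical value shaped like an ACM DNS validation target.
	fmt.Println(normalizeRecordValue("_example.acm-validations.aws."))
	// Values without a trailing dot are returned unchanged.
	fmt.Println(normalizeRecordValue("_example.acm-validations.aws"))
}
```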

resource "aws_acm_certificate_validation" "default" {
  provider                = aws.virginia
  certificate_arn         = aws_acm_certificate.default.arn
  validation_record_fqdns = cloudflare_record.validation_domain.*.hostname
}

Note that certificate validation can take up to 20 minutes, so have some patience.

The next part is an example of creating a CloudFront distribution and exposing its generic domain name under a user-friendly one that we will assign in Cloudflare.

resource "aws_cloudfront_distribution" "frontend" {
  enabled         = true
  aliases         = ["${var.subdomain_name}.${var.domain_name}"]
  is_ipv6_enabled = true
  // cheapest: https://github.com/laurilehmijoki/s3_website/issues/150
  price_class = "PriceClass_100"

  default_cache_behavior {
    allowed_methods        = ["GET", "HEAD", "OPTIONS"]
    cached_methods         = ["GET", "HEAD"]
    target_origin_id       = var.frontend_s3_origin_id
    viewer_protocol_policy = "redirect-to-https"
    default_ttl            = 0
    max_ttl                = 0

    forwarded_values {
      query_string = false
      cookies {
        forward = "none"
      }
    }
  }

  origin {
    domain_name = var.frontend_s3_origin_domain_name
    origin_id   = var.frontend_s3_origin_id

    custom_origin_config {
      http_port                = 80
      https_port               = 443
      origin_keepalive_timeout = 5
      origin_protocol_policy   = "http-only" // setting defined after terraform import. can try with https-only
      origin_read_timeout      = 30
      origin_ssl_protocols     = ["TLSv1", "TLSv1.1", "TLSv1.2"]
    }
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn            = aws_acm_certificate_validation.default.certificate_arn
    cloudfront_default_certificate = false
    minimum_protocol_version       = "TLSv1.2_2019"
    ssl_support_method             = "sni-only"
  }
}

resource "cloudflare_record" "frontend_service" {
  name    = "${var.subdomain_name}.${var.domain_name}"
  value   = aws_cloudfront_distribution.frontend.domain_name
  type    = "CNAME"
  proxied = true
  zone_id = lookup(data.cloudflare_zones.default.zones[0], "id")
}

After successfully creating those resources, your frontend will be available over the internet on the domain app-test.your-domain.com. Ta-da! We are now happy, having everything set up on HTTPS. Hurray!

It is also worth noting that at some point AWS marked Cloudflare as a non-trusted authority (there are references to Mozilla's authorities list, more can be found here), so the resources we found online for exposing an AWS S3 bucket over SSL weren't helpful.
Those solutions suggested importing Cloudflare SSL certificates into AWS Certificate Manager and using them in the AWS CloudFront distribution.

Gitlab runners

We used GitLab CI/CD to run our Terraform scripts and automatically handle deployments to different environments like test, staging and production. To set up the GitLab runners to run successful workflows, we simply ensure that the AWS CLI, SAM CLI, Terraform and Make are installed during the execution. We are aware this step has a lot of room for improvement - we could have a prebuilt Docker image containing all of these or use a community maintained one, but we wanted to focus on product use cases first and on improvements later.

Here is an example of how to set up a GitLab project with .gitlab-ci.yml. The most important part of this example is the before_script section of the .gitlab-ci.yml file.

workflow:

variables:
  AWS_DEFAULT_REGION: "eu-west-1"
  WORKSPACE: "dev"
  TF_ENVIRONMENT: "terraform"

stages:
  - terraform-apply

terraform-apply:
  image: google/cloud-sdk:slim
  stage: terraform-apply
  before_script:
    - apt-get update && apt-get install -y unzip make jq
    - curl https://releases.hashicorp.com/terraform/0.12.29/terraform_0.12.29_linux_amd64.zip --output /tmp/terraform.zip
    - unzip /tmp/terraform.zip -d /tmp
    - chmod +x /tmp/terraform
    - mv /tmp/terraform /usr/local/bin/
    - curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64-2.0.30.zip" -o "awscliv2.zip"
    - unzip -q awscliv2.zip
    - ./aws/install
    - aws sts get-caller-identity
    - curl --location "https://github.com/aws/aws-sam-cli/releases/download/v1.18.1/aws-sam-cli-linux-x86_64.zip" -o "awssamcli.zip"
    - unzip -q awssamcli.zip
    - ./install
    - sam --version
    - curl --location https://github.com/terraform-linters/tflint/releases/download/v0.21.0/tflint_linux_amd64.zip -o /tmp/tflint.zip
    - unzip /tmp/tflint.zip -d /tmp
    - chmod +x /tmp/tflint
    - mv /tmp/tflint /usr/local/bin/
  script:
    - echo yes | make tf-apply

Final thoughts

Overall we are pretty satisfied with the usability of the AWS tooling for serverless environments.
The major drawback of this solution, and one we want to make sure we highlight, is the usage of null_resource to integrate with AWS SAM. Terraform's null_resource is something everyone should try to stay away from, so we sincerely hope to see future integration of SAM in the AWS provider.
And that is it. Thank you for reading, we hope this has been useful. Feel free to reach out to us if you have questions or suggestions.

Once again, you can find all the code supporting this post here.
