This blog post is part of my AWS series:
- Infrastructure as Code - Managing AWS With Terraform
- Deploying an HTTP API on AWS using Lambda and API Gateway
- Deploying an HTTP API on AWS using Elastic Beanstalk
- Deploying and Benchmarking an AWS RDS MySQL Instance
- Event Handling in AWS using SNS, SQS, and Lambda
- Continuous Delivery on AWS With Terraform and Travis CI
- Sensor Data Processing on AWS using IoT Core, Kinesis and ElastiCache
- Monitoring AWS Lambda Functions With CloudWatch
Introduction
In the previous posts we introduced and extensively used Terraform to automate infrastructure deployments. If you are aiming at true continuous delivery, a high degree of automation is crucial. Continuous delivery (CD) is about producing software in short cycles with high confidence, reducing the risk of delivering changes.
In this blog post we want to combine Terraform with an automated build pipeline on Travis CI. To use Terraform in a shared setting we have to configure it to use remote state, as plain local state is impractical for projects that involve multiple developers or automated build pipelines. The application we are deploying will be a static website generated by Jekyll.
The remainder of the post is structured as follows. In the first section we will briefly discuss the overall solution architecture, focusing on the continuous deployment infrastructure. The next section elaborates on two ways to provision the remote state resources using Terraform. Afterwards we walk through the implementation of the remote state bootstrapping, the static website deployment, and the automation using Travis CI. We close the post by summarizing the main ideas.
Architecture
The above figure visualizes the solution architecture including the components for continuous integration (CI) and CD. The client is the developer in this case as we are looking at the setup from the development point of view.
As soon as a developer pushes new changes to the remote GitHub repository it triggers a Travis CI build. Travis CI is a hosted build service that is free to use for open source projects. Travis then builds the website artifacts, deploys the infrastructure, and pushes the artifacts to production.
We are using an S3 backend with DynamoDB for Terraform. Terraform will store the state within S3 and use DynamoDB to acquire a lock while performing changes. The lock prevents two Terraform processes from modifying the same state concurrently.
To use the S3 remote state backend we need to create the S3 bucket and DynamoDB table beforehand. This bootstrapping is also done and automated with Terraform. But how do we manage infrastructure with Terraform that is required to use Terraform? The next section will discuss two approaches to solve this 🐔 & 🥚 problem.
Remote State Chicken And Egg Problem
How can we use Terraform to set up the S3 bucket and DynamoDB table we want to use for the remote state backend? First we create the remote backend resources with local state. Then we somehow need to share this state to allow modifications of the backend resources later on. From what I can tell there are two viable solutions to do that:
- Shared local state. Commit local state to your version control and share it in a remote repository.
- Migrated state. Migrate local state to remote state backend.
Both solutions involve creating the remote state resources using local state. They differ in how the state for provisioning the remote state resources is shared. While the first option is easy to set up, there are two major risks that need to be taken into account:
- Terraform state might contain secrets. In the case of only the S3 bucket and DynamoDB table there is only one variable which might be problematic: the AWS access key. If you are working with a private repository, this might not be a huge issue. When working on open source code it might be useful to encrypt the state file before committing it. You can do this with OpenSSL or more specialized tools like Ansible Vault.
- Shared local state has no locking or synchronization mechanism. When publishing your local Terraform state to the remote source code repository you have to manually keep this state file in sync between all developers. Anyone who modifies the resources has to commit and push the updated state file and make sure that no one else is modifying the infrastructure at the same time.
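As a rough sketch of the encryption idea from the first bullet, the state file could be encrypted with OpenSSL before it is committed. The function names, the STATE_PASSPHRASE variable, and the .enc suffix are assumptions made for this example, and the -pbkdf2 option requires OpenSSL 1.1.1 or newer.

```shell
# Hypothetical helpers: encrypt the local state before committing it and
# decrypt it again after pulling. The passphrase is read from the
# STATE_PASSPHRASE environment variable (an assumed name).

encrypt_state() {
  # $1: plaintext state file, $2: encrypted output file
  openssl enc -aes-256-cbc -pbkdf2 -salt -in "$1" -out "$2" -pass env:STATE_PASSPHRASE
}

decrypt_state() {
  # $1: encrypted state file, $2: plaintext output file
  openssl enc -d -aes-256-cbc -pbkdf2 -in "$1" -out "$2" -pass env:STATE_PASSPHRASE
}

# Usage (commit only the .enc file and keep terraform.tfstate out of Git):
#   encrypt_state terraform.tfstate terraform.tfstate.enc
#   decrypt_state terraform.tfstate.enc terraform.tfstate
```

Note that this only addresses the secrets problem. The missing locking described in the second bullet remains.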
The second option is a bit safer with regard to the above-mentioned issues. S3 supports encryption at rest out of the box and you can have fine-grained access control on the bucket. Also, if DynamoDB is used for locking, two parties cannot modify the resources concurrently. The disadvantage is that the solution is more complex.
After we migrate the local state to the created remote state backend, it will contain the state for the backend itself plus the application infrastructure state. Luckily Terraform provides a built-in way to isolate state of different environments: Workspaces. We can create a separate workspace for the backend resources to avoid interference between changes in our backend infrastructure and application infrastructure.
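To make the isolation a bit more concrete: the S3 backend keeps the state of each non-default workspace under its own object key, prefixed with env: by default (the backend's workspace_key_prefix setting). The small helper below is only an illustration of that naming scheme. The function name is made up, and terraform.tfstate is the key used for the backend in this post.

```shell
# Hypothetical helper: print the S3 object key under which the S3 backend
# stores the state of a given workspace. "env:" is the backend's default
# workspace_key_prefix.
state_object_key() {
  workspace="$1"
  key="$2"
  if [ "$workspace" = "default" ]; then
    # The default workspace uses the configured key as is.
    printf '%s\n' "$key"
  else
    printf 'env:/%s/%s\n' "$workspace" "$key"
  fi
}

state_object_key state terraform.tfstate   # env:/state/terraform.tfstate
state_object_key prod terraform.tfstate    # env:/prod/terraform.tfstate
```

Both states live in the same bucket, but a change applied in one workspace never touches the object of the other.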
Working with workspaces is a bit difficult to wrap your head around, so we are going to tackle this option in the course of this post to get to know it in detail. In practice I am not sure if the increased complexity is worth the effort, especially as you usually do not touch the backend infrastructure unless you want to shut down the project. The next section will explain the bootstrapping and application implementation and deployment step by step.
Implementation
Development Tool Stack
To develop the solution we are using the following tools:
- Terraform v0.11.7
- Jekyll 3.8.3
- Git 2.15.2
- IntelliJ + Terraform Plugin
The source code is available on GitHub. Now let's look into the implementation details of each component.
Remote State Bootstrapping And Configuration
We will organize our Terraform files in workspaces and folders. Workspaces isolate the backend resource state from the application resource state. Folders will be used to organize the Terraform resource files.
We will create two workspaces: state and prod. The state workspace will manage the remote state resources, i.e. the S3 bucket and the DynamoDB table. The prod workspace will manage the production environment of our website. You can add more workspaces for staging or testing later but this is beyond the scope of this blog post.
We will create three folders containing Terraform files: bootstrap, backend, and website. The next listing outlines the directory and file structure of the project.
.
├── locals.tf
├── providers.tf
├── backend
│   ├── backend.tf
│   ├── backend.tf.tmpl
│   ├── locals.tf -> ../locals.tf
│   ├── providers.tf -> ../providers.tf
│   └── state.tf -> ../bootstrap/state.tf
├── bootstrap
│   ├── locals.tf -> ../locals.tf
│   ├── providers.tf -> ../providers.tf
│   └── state.tf
└── website
    ├── backend.tf -> ../backend/backend.tf
    ├── locals.tf -> ../locals.tf
    ├── providers.tf -> ../providers.tf
    └── website.tf
The project root will contain a shared AWS provider configuration providers.tf, as well as a project name variable inside locals.tf. We will go into details about the file contents later.
In addition to the shared files, bootstrap contains state.tf, which defines the S3 bucket and DynamoDB table backend resources. We share them across folders using symbolic links. The backend folder will have the same resources but uses the already present S3 backend defined in backend.tf. When switching from bootstrap to backend after the initial provisioning, Terraform will migrate the local state to the remote backend.
The website folder contains the remote backend configuration and all resources related to the actual website deployment. We will access backend and bootstrap from the state workspace and website from prod and any other additional workspace related to the application.
The next listing shows what the bootstrap/state.tf file looks like. The project_name local variable is defined within the shared locals.tf file. The current aws_caller_identity and aws_region are defined within the shared providers.tf file.
# state.tf
locals {
  state_bucket_name = "${local.project_name}-${data.aws_caller_identity.current.account_id}-${data.aws_region.current.name}"
  state_table_name  = "${local.state_bucket_name}"
}

resource "aws_dynamodb_table" "locking" {
  name           = "${local.state_table_name}"
  read_capacity  = "20"
  write_capacity = "20"
  hash_key       = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

resource "aws_s3_bucket" "state" {
  bucket = "${local.state_bucket_name}"
  region = "${data.aws_region.current.name}"

  versioning {
    enabled = true
  }

  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }

  tags {
    Name        = "terraform-state-bucket"
    Environment = "global"
    project     = "${local.project_name}"
  }
}

output "BACKEND_BUCKET_NAME" {
  value = "${aws_s3_bucket.state.bucket}"
}

output "BACKEND_TABLE_NAME" {
  value = "${aws_dynamodb_table.locking.name}"
}
Here we define the S3 bucket and enable encryption as well as versioning. Encryption is important because Terraform state might contain secret variables. Versioning is highly recommended to be able to roll back in case of accidental state modifications.
We also configure the DynamoDB table which is used for locking. Terraform uses an attribute called LockID so we have to create it and make it the primary key. When using DynamoDB without auto scaling you have to specify a maximum read and write capacity before request throttling kicks in. To be honest I think you should go with the minimum here.
We can now create the state workspace and start bootstrapping with local state:
terraform workspace new state
terraform init bootstrap
terraform apply bootstrap
After the S3 bucket and DynamoDB table are created we will migrate the local state. This is done by initializing the state resources with the newly created remote backend. Before we can proceed, however, we need to insert the values of the BACKEND_BUCKET_NAME and BACKEND_TABLE_NAME outputs into backend/backend.tf. I did it by generating the file from backend/backend.tf.tmpl using envsubst:
# backend.tf.tmpl
terraform {
  backend "s3" {
    bucket         = "${BACKEND_BUCKET_NAME}"
    key            = "terraform.tfstate"
    region         = "eu-central-1"
    dynamodb_table = "${BACKEND_TABLE_NAME}"
  }
}
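The generation step itself is not shown in the post, so here is a minimal sketch of how it might look. The render_backend function is a made-up name, and the usage lines assume that terraform output is run while the state workspace with the bootstrap state is selected.

```shell
# Hypothetical helper: fill the two placeholders in the template and write
# the result. Listing the variable names restricts envsubst to exactly
# these placeholders and leaves any other ${...} expression untouched.
render_backend() {
  # $1: bucket name, $2: table name, $3: template file, $4: output file
  BACKEND_BUCKET_NAME="$1" BACKEND_TABLE_NAME="$2" \
    envsubst '${BACKEND_BUCKET_NAME} ${BACKEND_TABLE_NAME}' < "$3" > "$4"
}

# Usage after `terraform apply bootstrap` (Terraform 0.11 prints the plain
# value; newer versions need `terraform output -raw NAME`):
#   render_backend \
#     "$(terraform output BACKEND_BUCKET_NAME)" \
#     "$(terraform output BACKEND_TABLE_NAME)" \
#     backend/backend.tf.tmpl backend/backend.tf
```

Restricting envsubst to the two names is a defensive choice: without the argument it would replace every environment variable reference it finds in the template.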
Now let's initialize the remote backend resources to migrate the local state.
$ terraform init backend
Initializing the backend...
Do you want to migrate all workspaces to "s3"?
Both the existing "local" backend and the newly configured "s3" backend support
workspaces. When migrating between backends, Terraform will copy all
workspaces (with the same names). THIS WILL OVERWRITE any conflicting
states in the destination.
Terraform initialization doesn't currently migrate only select workspaces.
If you want to migrate a select number of workspaces, you must manually
pull and push those states.
If you answer "yes", Terraform will migrate all states. If you answer
"no", Terraform will abort.
Enter a value: yes
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
That's it! We created the remote state backend using local state and migrated the local state afterwards. Next we will deploy some actual application resources using the remote state backend.
Static Website
I chose a static webpage as an example application for this post. The reason is that the main focus lies in automation and working with remote state so this one will be kept rather simple. The website is generated using Jekyll and the source code is stored in website/static.
To make the website publicly available we will use another S3 bucket and configure it to display files as a website. Here is the configuration of the bucket within website/website.tf.
# website.tf
locals {
  website_bucket_name = "${local.project_name}-${terraform.workspace}-website"
}

resource "aws_s3_bucket" "website" {
  bucket = "${local.website_bucket_name}"
  acl    = "public-read"

  policy = <<POLICY
{
  "Version":"2012-10-17",
  "Statement":[
    {
      "Sid":"PublicReadGetObject",
      "Effect":"Allow",
      "Principal": "*",
      "Action":["s3:GetObject"],
      "Resource":["arn:aws:s3:::${local.website_bucket_name}/*"]
    }
  ]
}
POLICY

  website {
    index_document = "index.html"
    error_document = "error.html"
  }

  tags {
    Environment = "${terraform.workspace}"
  }
}
We configure the bucket to be publicly readable using the appropriate ACL and policy. We can set up website hosting using the website stanza. The index_document will be served when no specific resource is requested, while the error_document is used if the requested resource does not exist.
Next we have to specify the HTML and CSS files. This is a bit cumbersome as we cannot tell Terraform to upload a whole folder structure. We will also output the URL which can be used to access the website in the end.
# website.tf
locals {
  site_root  = "website/static/_site"
  index_html = "${local.site_root}/index.html"
  about_html = "${local.site_root}/about/index.html"
  post_html  = "${local.site_root}/jekyll/update/2018/06/30/welcome-to-jekyll.html"
  error_html = "${local.site_root}/404.html"
  main_css   = "${local.site_root}/assets/main.css"
}

resource "aws_s3_bucket_object" "index" {
  bucket       = "${aws_s3_bucket.website.id}"
  key          = "index.html"
  source       = "${local.index_html}"
  etag         = "${md5(file(local.index_html))}"
  content_type = "text/html"
}

resource "aws_s3_bucket_object" "post" {
  bucket       = "${aws_s3_bucket.website.id}"
  key          = "jekyll/update/2018/06/30/welcome-to-jekyll.html"
  source       = "${local.post_html}"
  etag         = "${md5(file(local.post_html))}"
  content_type = "text/html"
}

resource "aws_s3_bucket_object" "about" {
  bucket       = "${aws_s3_bucket.website.id}"
  key          = "about/index.html"
  source       = "${local.about_html}"
  etag         = "${md5(file(local.about_html))}"
  content_type = "text/html"
}

resource "aws_s3_bucket_object" "error" {
  bucket       = "${aws_s3_bucket.website.id}"
  key          = "error.html"
  source       = "${local.error_html}"
  etag         = "${md5(file(local.error_html))}"
  content_type = "text/html"
}

resource "aws_s3_bucket_object" "css" {
  bucket       = "${aws_s3_bucket.website.id}"
  key          = "assets/main.css"
  source       = "${local.main_css}"
  etag         = "${md5(file(local.main_css))}"
  content_type = "text/css"
}

output "url" {
  value = "http://${local.website_bucket_name}.s3-website.${aws_s3_bucket.website.region}.amazonaws.com"
}
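Because every file needs its own resource, it is easy to forget one when the site grows. The following sketch, which is not part of the original project, lists every file below the Jekyll output folder together with the key and content type a matching aws_s3_bucket_object would need. The function names and the fallback content type are assumptions.

```shell
# Hypothetical helper: map a file name to the content type its
# aws_s3_bucket_object resource should declare.
content_type_for() {
  case "$1" in
    *.html) echo "text/html" ;;
    *.css)  echo "text/css" ;;
    *)      echo "application/octet-stream" ;;  # assumed fallback
  esac
}

# Print "<key> <content type>" for every file below the given site root.
list_site_objects() {
  site_root="$1"
  find "$site_root" -type f | sort | while read -r path; do
    # The object key is the path relative to the site root.
    key="${path#"$site_root"/}"
    printf '%s %s\n' "$key" "$(content_type_for "$path")"
  done
}

# Usage after `jekyll build`:
#   list_site_objects website/static/_site
```

Comparing this listing with the resources in website.tf shows which generated files are not uploaded yet.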
Before we deploy the changes we should create a new workspace. The state workspace will only be used in case we need to make modifications to the remote state backend resources. We'll call the new workspace prod and use it to initialize and deploy the website resources.
terraform workspace new prod
terraform init website
cd website/static && jekyll build && cd -
terraform apply website
- 🎉🎉🎉
"But what about continuous delivery", I hear you ask? The following section is going to cover setting up the automated Travis job.
Travis Job
To use Travis CI we have to provide a build configuration file called .travis.yml. Simply put, it tells the build server which commands to execute. Here is what we are going to do:
# .travis.yml
language: generic
install:
- gem install bundler jekyll
script:
- ./build.sh
The build.sh file contains the actual logic. While it is possible to put all the commands in the YAML file directly, it is a bit clumsy. The following listing contains the contents of the build script. Note that we committed the Terraform Linux binary to the repository. This way we do not have to download it on every build and we are sure to always run the correct version.
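#!/usr/bin/env bash
# build.sh
# Abort on the first failing command so that a broken build never deploys.
set -e

# Build the static website with Jekyll.
cd website/static
bundle install
bundle exec jekyll build
cd -

# Initialize Terraform and validate the website configuration.
./terraform-linux init
./terraform-linux validate website

# Deploy to production only from the master branch.
if [[ $TRAVIS_BRANCH == 'master' ]]
then
  ./terraform-linux workspace select prod
  ./terraform-linux apply -auto-approve website
fi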
Notice that we are only deploying the changes to production from the master branch. On other branches Terraform only validates the syntax and checks that all required variables are defined.
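One caveat worth keeping in mind: for pull request builds Travis CI sets TRAVIS_BRANCH to the branch the pull request targets, so a pull request against master also passes a plain branch check. A stricter guard could additionally look at TRAVIS_PULL_REQUEST. The following is only a sketch, and the should_deploy function is a hypothetical name that does not exist in the original build script.

```shell
# Succeeds only for push builds on the master branch. On Travis CI,
# TRAVIS_PULL_REQUEST is "false" for push builds and holds the pull
# request number otherwise.
should_deploy() {
  [ "${TRAVIS_BRANCH:-}" = "master" ] && [ "${TRAVIS_PULL_REQUEST:-false}" = "false" ]
}

if should_deploy; then
  echo "deploying to prod"
else
  echo "validation only"
fi
```

In the build script, the echo lines would be replaced by the workspace selection and the apply command.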
To enable the Terraform binary to talk to AWS from within the build server, we also need to set up AWS credentials. This can be done by setting up secret environment variables in the build settings.
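The AWS provider reads the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. A small pre-flight check can fail the build early with a readable message when one of them is missing. This is a sketch under that assumption, and require_aws_credentials is a made-up helper name.

```shell
# Hypothetical pre-flight check: report every missing credential variable
# and return a non-zero status if at least one is missing.
require_aws_credentials() {
  missing=0
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage at the top of build.sh:
#   require_aws_credentials || exit 1
```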
Then we only have to enable the repository on the Travis dashboard and trigger a build either by pushing a commit or using the UI. If everything works as expected you will receive a green build:
Conclusion
In this post we have seen how to use Terraform in an automated setting for continuous deployment. Using a combination of the AWS remote state backend and workspaces, we were able to solve the chicken and egg problem when provisioning the remote state resources. We then deployed a Jekyll-generated static website using S3.
In my opinion, however, the solution with the state migration and all the symbolic links is rather complex. If possible I would probably go for local state only and store it directly within the repository. What do you think? Did you ever use remote state with Terraform? How did you provision it? Let me know in the comments.
Top comments (3)
This is really interesting. I'm a big user of remote state but the S3 bucket and DynamoDB table are shared between lots of components so they're created as a set of minimal bootstrap resources in another project. There's then a wrapper tool which delivers the values wherever Terraform is invoked.
I use workspaces to manage different environments rather than different sets of resources.
Always very keen to see how other people organise their Terraform. How do you find this method? Any drawbacks?
Hi Graham,
Thank you for your comment! Would you mind sharing the wrapper tool? Is it open source? The method shown in this post only works if you are using the DynamoDB table and S3 bucket exclusively, I guess. When sharing the remote state resources I assume you specify a different S3 key in every project and create a new DynamoDB table for each project centrally?
I personally found the workspaces and symbolic links pretty awkward to use and I'd go with committing the local state to version control. I wanted to give it a try though and thought it's worth sharing :)
Hey Frank,
The wrapper code is open but it's not well documented enough or supported outside the organisation in which it's used just yet for it to be properly open-source: github.com/mergermarket/cdflow
You're welcome to take a look and marvel at the lack of warranty.