Adrien F

Use Cloudflare Workers to store your Terraform states

TL;DR: Check out the resulting worker on GitHub

When starting a Cloudflare-focused project, you're probably tempted, like me, to use Terraform to configure everything the right way, and the question of where to store the states quickly comes up.

Terraform Cloud offers a free tier but is limited to 500 managed resources across all states (from what I understand of the pricing page), and if you're really starting from scratch, you might not already have one of the supported backends ready, be it an object storage engine, Kubernetes or PostgreSQL. So what are we left with?

Well, for a few years now, Cloudflare has offered an object storage engine called R2! Even better, it's compatible with the S3 API, which should let us use it with Terraform's S3 backend. Will it be that easy?

The non-obvious S3 way

You might be thinking that since Cloudflare implemented the S3 API, it should be quite easy to configure Terraform to use it.

Create a new R2 bucket in the Cloudflare dashboard, and then a new API token scoped to the bucket. You will be given the credentials and URL to pass to Terraform:

[Screenshot: creating an R2 API token scoped to the bucket in the Cloudflare dashboard]

Finally, we just need to override the S3 endpoint with our bucket's URL, provide our access keys and voilà, we can run terraform init and go on our way. Reading the documentation, you'll see that the endpoints can indeed be overridden. It might look like this:

terraform {
  backend "s3" {
    bucket     = "tf-states"
    key        = "foo.tfstate"
    endpoints  = { s3 = "https://xxx.r2.cloudflarestorage.com" }
    access_key = "xxx"
    secret_key = "xxx"
  }
}

⚠ Please don't commit your access/secret keys, use environment variables or a secret engine 😅

But sadly, you will quickly get an error:

❯ terraform init

Initializing the backend...
╷
│ Error: Missing region value
│
│   on main.tf line 2, in terraform:
│    2:   backend "s3" {
│
│ The "region" attribute or the "AWS_REGION" or "AWS_DEFAULT_REGION" environment variables must be set.

Alright, that makes sense, let's add a region parameter. Reading the R2 documentation, we can safely set it to us-east-1.

It now looks like this. Will we finally get lucky?

terraform {
  backend "s3" {
    bucket     = "tf-states"
    key        = "foo.tfstate"
    endpoints  = { s3 = "https://xxx.r2.cloudflarestorage.com" }
    access_key = "xxx"
    secret_key = "xxx"
    region = "us-east-1"
  }
}

Wrong!

❯ terraform init

Initializing the backend...
╷
│ Error: validating provider credentials: retrieving caller identity from STS: operation error STS: GetCallerIdentity, https response error StatusCode: 403, RequestID: abb0da81-771e-4724-ad9b-842ceb81e6de, api error InvalidClientTokenId: The security token included in the request is invalid.
│
│
╵

And so it goes on and on. This GitHub issue summarizes what everyone went through to identify the right set of parameters, and you'll also find more information there if you want to use the AWS provider to manage S3 resources on R2. This is what your backend configuration should look like:

terraform {
  backend "s3" {
    bucket     = "tf-stats"
    key        = "foo.tfstate"
    endpoints  = { s3 = "https://xxx.r2.cloudflarestorage.com" }
    access_key = "xxx"
    secret_key = "xxx"
    region     = "us-east-1"

    skip_credentials_validation = true
    skip_region_validation      = true
    skip_requesting_account_id  = true
    skip_metadata_api_check     = true
    skip_s3_checksum            = true
  }
}

And finally, it goes through 🎉

❯ terraform init

Initializing the backend...

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...

Terraform has been successfully initialized!

So we can store states, great, right? The configuration is a bit verbose, though, and you'll need to repeat it for every stack, so if you use CDKTF you can quickly wrap it in a function:

import { Construct } from "constructs";
import { S3Backend } from "cdktf";

/**
 * Configure a S3 backend for Cloudflare R2
 * You will need to set the following environment variables:
 * - AWS_ACCESS_KEY_ID > xxx
 * - AWS_SECRET_ACCESS_KEY > xxx
 * - AWS_ENDPOINT_URL_S3 > https://xxx.r2.cloudflarestorage.com
 *
 * @param scope This needs to be a Stack
 * @param stateName The name of the state file
 * @param bucketName The name of the R2 bucket
 * @returns
 */
export const cloudflareR2Backend = (
  scope: Construct,
  stateName: string,
  bucketName: string
): S3Backend =>
  new S3Backend(scope, {
    bucket: bucketName,
    key: stateName,
    region: "us-east-1",
    skipCredentialsValidation: true,
    skipMetadataApiCheck: true,
    skipRegionValidation: true,
    skipRequestingAccountId: true,
    skipS3Checksum: true,
  });
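Using the helper is then a one-liner inside your stack. A quick sketch, where the stack, state and bucket names are just placeholders:

import { Construct } from "constructs";
import { App, TerraformStack } from "cdktf";

// Hypothetical stack wiring up the helper above; "MyStack" is a placeholder name.
class MyStack extends TerraformStack {
  constructor(scope: Construct, id: string) {
    super(scope, id);
    cloudflareR2Backend(this, "foo.tfstate", "tf-states");
    // ...Cloudflare resources go here
  }
}

const app = new App();
new MyStack(app, "my-stack");
app.synth();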

⚠️ At the time of writing, not all of the S3 backend configuration keys had been backported to CDKTF. They've been added in this PR and will land in the v0.20 release. You can install a pre-release in the meantime.

Alright, that was fun, we're done, right? Well… not exactly. You see, one key feature of the AWS S3 backend is that it also supports state locking via DynamoDB, a NoSQL datastore. These locks prevent others from overwriting the state file while you're working on the stack, and they're pretty much required when multiple people work on the same project. If you need this feature, read on.
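For comparison, enabling locking on AWS itself only takes a DynamoDB table name in the backend configuration. A hedged CDKTF sketch (the table name is made up), and of course not an option on R2:

import { Construct } from "constructs";
import { S3Backend } from "cdktf";

// What locking looks like on AWS proper: the S3 backend simply points at a
// DynamoDB table. The table name is hypothetical, and R2 has no equivalent.
export const awsS3BackendWithLocks = (
  scope: Construct,
  stateName: string,
  bucketName: string
): S3Backend =>
  new S3Backend(scope, {
    bucket: bucketName,
    key: stateName,
    region: "us-east-1",
    dynamodbTable: "terraform-locks",
  });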

The Worker way

Cloudflare does not have a DynamoDB-compatible service, so if we want lock support, we either have to implement some kind of Frankenstein DynamoDB API in a Worker for Terraform to talk to (which sounds like a fun weekend, to be honest), or evaluate another backend. In this case, we'll look at the HTTP backend and implement it with a Cloudflare Worker, backed by R2.

Design

The HTTP backend needs an API service implementing the basic GET, POST, and DELETE methods. For the lock feature, it can use the LOCK and UNLOCK methods from RFC 2518 (WebDAV). Not all frameworks and HTTP servers support these unusual methods, so Terraform lets you configure different ones. In our case there's no need: they will work natively.

So we will develop a quick Worker to manage Terraform states with R2 storage. What about our locks?

You might be aware that Cloudflare has a key-value product, Workers KV, which would be tempting for the locks, and it was my first thought. But for locks, consistency is important (to avoid race conditions if you have a coworker in the US and another in Asia), and of the two only R2 offers a strong consistency model, so we will use it to store the locks as well. (I'm also aware of Durable Objects, which could be an interesting alternative!)
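To make the idea concrete, here's a minimal sketch of how a lock could be kept in R2. The helper name, key layout and lock fields are illustrative, not the worker's actual API; the repository is the reference:

// Minimal sketch: one R2 object per lock, holding (a subset of) Terraform's lock info.
interface LockInfo {
  ID: string;
  Operation: string;
  Who: string;
  Created: string;
}

export async function acquireLock(
  bucket: R2Bucket,
  stateId: string,
  lock: LockInfo
): Promise<{ ok: boolean; holder?: LockInfo }> {
  const lockKey = `locks/${stateId}`; // hypothetical key layout
  const existing = await bucket.get(lockKey);
  if (existing !== null) {
    // Already locked: Terraform expects the current holder's info back
    // (with a conflict status) so it can report who owns the lock.
    return { ok: false, holder: (await existing.json()) as LockInfo };
  }
  // R2's strong consistency means a freshly written lock is immediately
  // visible to the next reader, which is what makes this scheme workable.
  await bucket.put(lockKey, JSON.stringify(lock));
  return { ok: true };
}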

Writing a worker

I’ve been wanting to write a Worker for quite some time and already had a library on my radar: Hono

[Screenshot: the Hono homepage]

It is a full-featured HTTP router with support for Cloudflare Workers, so onward I went with it and created my project like so: npm create hono@latest tf-state-worker

This will set up a basic project and configure wrangler, a Cloudflare tool to develop and deploy your brand-new worker.

From there, it was a matter of implementing the API (which is not exactly documented; it took a lot of reading Terraform's source code) for reading and writing states, as well as the locking logic:

export const statesRouter = new Hono<{ Bindings: { STATE_BUCKET: R2Bucket } }>();

statesRouter.get('/:stateId{[a-zA-Z0-9][\\w\\-\\.]*}', async (c) => { ... });
statesRouter.post('/:stateId{[a-zA-Z0-9][\\w\\-\\.]*}', async (c) => { ... });
statesRouter.delete('/:stateId{[a-zA-Z0-9][\\w\\-\\.]*}', async (c) => { ... });
statesRouter.on('LOCK', '/:stateId{[a-zA-Z0-9][\\w\\-\\.]*}', async (c) => { ... });
statesRouter.on('UNLOCK', '/:stateId{[a-zA-Z0-9][\\w\\-\\.]*}', async (c) => { ... });
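To give a feel for what hides behind those ..., here's roughly what the state read handler boils down to. A simplified sketch, assuming the STATE_BUCKET binding declared above and a hypothetical states/ key prefix; the repository holds the real code:

statesRouter.get('/:stateId{[a-zA-Z0-9][\\w\\-\\.]*}', async (c) => {
  // Look up the stored state in R2.
  const object = await c.env.STATE_BUCKET.get(`states/${c.req.param('stateId')}`);
  if (object === null) {
    // Terraform treats a 404 as "no state yet" and starts from scratch.
    return c.body(null, 404);
  }
  // Stream the stored state back as JSON.
  return c.body(object.body, 200, { 'Content-Type': 'application/json' });
});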

As a bonus, I quickly wrote a listing endpoint, which could probably also serve some HTML later:

❯ curl https://tf-state-worker.xxx.workers.dev/states
{"states":[{"id":"states/foo.tfstate","size":247,"uploaded":"2024-01-07T22:21:52.030Z"}],"locks":[]}
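Such a listing endpoint is mostly a thin wrapper around R2's list() API. A rough sketch continuing the router above (the route path and key prefix are assumptions, not the worker's exact layout):

statesRouter.get('/', async (c) => {
  // List every stored state object and expose a few useful fields.
  const listing = await c.env.STATE_BUCKET.list({ prefix: 'states/' });
  return c.json({
    states: listing.objects.map((o) => ({
      id: o.key,
      size: o.size,
      uploaded: o.uploaded,
    })),
  });
});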

The result

The resulting worker can be found at https://github.com/adrien-f/tf-state-worker and you can follow the README to deploy it on your Cloudflare account. As an interesting exercise for you, dear reader, I suppose that with a few tweaks this could be deployed as a Deno worker with their KV storage solution.

Clone the repository in your development environment, install everything, then move wrangler_example.toml to wrangler.toml, and finally run npx wrangler deploy to get a URL pointing to your new worker:

 npx wrangler deploy
 ⛅️ wrangler 3.22.3
-------------------
Your worker has access to the following bindings:
- R2 Buckets:
  - STATE_BUCKET: tf-states
- Vars:
  - AUTH_PLUGIN: "fail"
Total Upload: 55.37 KiB / gzip: 12.93 KiB
Uploaded tf-state-worker (3.06 sec)
Published tf-state-worker (6.00 sec)
  https://tf-state-worker.xxx.workers.dev
Current Deployment ID: xxx

Aaaand wait a minute, no security? Dear astute reader, we are indeed missing something quite important. We do not want the worker to accept just any request to read and write anything in our R2 bucket, especially since states can contain sensitive data 😱!

Since security could be implemented in many ways, the worker rejects all requests by default and will not work until an auth implementation is configured. There is a plugin system, with the first plugin being a basic username/password combo; JWT support is planned.
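For illustration only, this is what a username/password guard looks like in Hono using its built-in basic-auth middleware; it is not the worker's actual plugin API, and the STATE_USERNAME / STATE_PASSWORD bindings are hypothetical:

import { Hono } from 'hono';
import { basicAuth } from 'hono/basic-auth';

// Hypothetical bindings carrying the credentials; the real worker configures
// authentication through its own plugin system instead.
const app = new Hono<{ Bindings: { STATE_USERNAME: string; STATE_PASSWORD: string } }>();

// Protect every state route with HTTP basic auth.
app.use('/states/*', async (c, next) =>
  basicAuth({ username: c.env.STATE_USERNAME, password: c.env.STATE_PASSWORD })(c, next)
);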

As an alternative, you could also put that worker behind Zero Trust or an mTLS-secured route with API Shield. The choice is yours.

With that done, it’s just a matter of configuring your backend:

terraform {
  backend "http" {
    address = "https://foo:bar@tf-state-worker.xxx.workers.dev/states/foo"
    lock_address = "https://foo:bar@tf-state-worker.xxx.workers.dev/states/foo"
    unlock_address = "https://foo:bar@tf-state-worker.xxx.workers.dev/states/foo"
  }
}
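If you're on CDKTF, the same thing can be expressed with the HttpBackend construct, with the credentials passed as attributes instead of being embedded in the URL. A rough sketch where the helper name, worker URL and credentials are placeholders:

import { Construct } from "constructs";
import { HttpBackend } from "cdktf";

// Rough CDKTF equivalent of the HCL above. In practice, pull the URL and
// credentials from variables or the environment rather than hardcoding them.
export const workerHttpBackend = (scope: Construct, stateName: string): HttpBackend =>
  new HttpBackend(scope, {
    address: `https://tf-state-worker.xxx.workers.dev/states/${stateName}`,
    lockAddress: `https://tf-state-worker.xxx.workers.dev/states/${stateName}`,
    unlockAddress: `https://tf-state-worker.xxx.workers.dev/states/${stateName}`,
    username: "foo",
    password: "bar",
  });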

And there we go, a functional Terraform state solution with locks:

 terraform apply -auto-approve
Acquiring state lock. This may take a few moments...

Changes to Outputs:
  + foo = "bar"

You can apply this plan to save these new output values to the Terraform state, without changing any real infrastructure.

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

foo = "bar"

What’s next?

On the TODO list for the worker are a better HTML view, some audit logs, webhook support, and more. Contributions are very welcome, so let's get in touch in the GitHub issues 🙂

Terraform also has a remote backend, which is used by some external vendors like JFrog to store state, but I couldn't find any documentation on implementing it, and I've reached my limit of reverse-engineering APIs for the weekend. Until next time!
