
Delivering APIs at the edge with Cloudflare Workers

Peasey ・14 min read

TL;DR

This article gives the background on why we chose Cloudflare Workers to deliver an API, walks through an exploration phase covering constraints, architecture, development, delivery and operations, and then describes an implementation phase, with demo videos, covering Node.js and VS Code for local development and debugging, logical Cloudflare environments, blue/green deployments, middleware and routing, and observability.

Background

While we were looking at solutions for a new service, we faced uncertainty over some requirements and whether they could be met with a third-party solution we’d found. We also considered whether we should build a solution ourselves or wrap the third-party solution to plug any requirement gaps. We decided that the most likely outcomes would require us to build an API of some description. We made good progress on an innovative approach to building APIs using Cloudflare Workers, so we thought we’d share the approach.

This article is a summary of a series of posts I wrote on my blog about this. There’s a GitHub repo accompanying most of the posts, so I’ll link to the relevant posts for those who want a deeper dive.

Our high-level API requirements

At the time, our primary concern was the lack of OpenID Connect integration with the third-party solution. We wanted to ensure only end-users who had been authenticated with our identity provider could use the service.

We also needed to store a small amount of data and apply some processing logic for each user, which wasn’t currently configurable with the third-party solution.

We knew that any solution had to be highly available and capable of handling the demand of our global user base.

In line with our design guidelines, we wanted to keep costs and operational complexity to a minimum and leverage serverless technology where possible.

Finally, in line with our CI/CD guidelines, we wanted to automate everything and ensure the solution was always up.

Why Cloudflare Workers?

Good question. Originally, we looked at a more typical serverless architecture in AWS using API Gateway and Lambda functions. The new HTTP API type had just been introduced to API Gateway and we were weighing up the pros and cons of choosing that over the REST API type. As a team, we’d also recently had a frustrating experience trying to automate the delivery of multi-region, zero-downtime architectures (blue/green deployments) with the serverless tech in AWS.

It just felt like there should be a simpler way to deploy highly available and scalable APIs using serverless technology.

Another team had recently used Cloudflare Workers to process HTTP headers on requests before they hit their API. We thought that was an interesting approach to running code with global availability, scale and performance, and that it might offer a solution for the API “wrapper” architecture we were considering, without the headache of multi-region architectures and other deployment complexity.

We decided to commit some time to explore the idea.

Exploration

Cloudflare Workers weren’t specifically designed to deliver APIs, so we needed to focus our attention on the following to test the feasibility of the idea:

Runtime constraints

The Workers platform limits are published. We have an enterprise agreement, so we are subject to the “bundled” limits. For us, the constraints of note are:

CPU runtime

At first glance, 50ms seems low, but it's important to note that this is the CPU time you use on the edge servers per request, not your request duration. While your Worker is waiting for asynchronous I/O to complete, it isn't counting towards your CPU usage.

Interestingly, not long after we’d finished looking at this, Cloudflare announced Workers Unbound with the CPU restriction removed altogether, which I think is confirmation that Workers are being used for increasingly complex use cases.

Programming environment

You have two options for programming Workers: JavaScript or a WebAssembly compatible language. A quick look at both approaches showed that the JavaScript approach seemed more mature and benefited from better community engagement and tooling support.

The Worker JavaScript environment is aligned to Web Workers, so writing JavaScript for Workers is more akin to writing a Worker in a browser than a server-side environment like Node.js. This means care needs to be taken when adding dependencies to ensure they are compatible with the runtime APIs. As an example, you can’t use the standard AWS JavaScript SDK as it doesn’t use the Fetch API for HTTP.

Worker script size

The maximum size for a Worker script is 1MB. This shouldn’t be an issue if you use webpack to bundle your JavaScript and use a (smaller) script per Worker rather than sharing a (large) script across all Workers.

We did see an issue with this, though, when we added the moment package to perform some date processing: the default package size is very large due to the locale files, but you can optimise it (or just replace it with something else).

API architecture and routing

When building APIs, your service/framework typically allows you to define API routes based on properties of the HTTP request. For RESTful APIs, the HTTP method and path are typically used to map requests to resource handlers. Popular API frameworks such as Express and ASP.NET Core allow you to define middleware that enables you to factor out common tasks into pipelines that can be applied in sequence to multiple API routes.

The route matching capabilities in Cloudflare Workers are quite basic. You can use a wildcard (*) in matching patterns but only at the beginning of the hostname and the end of the path, and there's no support for parameter placeholders. So, the following are ok:

*api.somewhere.com/account*
api.somewhere.com/account/action*

But these aren’t:

api.somewhere.com/*/account*
api.somewhere.com/account/:id/action

The last example above is a valid route; it just won't do what you're probably trying to do, i.e. use :id as a placeholder for any value and provide that value in an easily accessible way in the Worker.

Also, note that in the valid examples the pattern doesn't include the trailing slash of the path before the wildcard. This is so the pattern still matches requests to the root of that path/resource (with or without the trailing slash).
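
To make the trailing-slash point concrete, here is a rough local approximation of the matching rules described above. It is only a sketch for reasoning about patterns, not Cloudflare's actual matcher, and matchesRoutePattern is a hypothetical helper name.

```javascript
// Local approximation only: '*' is treated as a wildcard when it is the first
// character (hostname prefix) or the last character (path suffix) of the pattern.
function matchesRoutePattern(pattern, hostAndPath) {
  const leading = pattern.startsWith('*')
  const trailing = pattern.length > 1 && pattern.endsWith('*')
  const literal = pattern.slice(leading ? 1 : 0, trailing ? -1 : undefined)
  // Escape regex metacharacters so the rest of the pattern is matched literally.
  const escaped = literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
  const regex = new RegExp(`^${leading ? '[^/]*' : ''}${escaped}${trailing ? '.*' : ''}$`)
  return regex.test(hostAndPath)
}

const pattern = 'api.somewhere.com/account*'
console.log(matchesRoutePattern(pattern, 'api.somewhere.com/account')) // true
console.log(matchesRoutePattern(pattern, 'api.somewhere.com/account/')) // true
console.log(matchesRoutePattern(pattern, 'api.somewhere.com/account/123/action')) // true
// Including the trailing slash before the wildcard misses the root of the resource.
console.log(matchesRoutePattern('api.somewhere.com/account/*', 'api.somewhere.com/account')) // false
```

Note that under these rules a pattern ending in account* would also match a path such as /accounts, which is one more reason to do the precise matching inside the Worker.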

This all means we must move the API route handling logic into our Worker, as you would with frameworks like Express:

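const express = require('express')

const app = express()
// Run readAccount for GET requests whose path matches /account/:id
app.get('/account/:id', readAccount)

function readAccount(req, res) {
  // Express exposes the value matched by :id on req.params
  const id = req.params.id
  // ...
}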

The above code configures Express to run the readAccount function for GET requests whose path matches /account/:id (where :id is a placeholder for an arbitrary value).
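
Inside a Worker, the same idea can be sketched without Express. The snippet below is a minimal illustration rather than the code from the accompanying repo: compileRoute and matchRoute are hypothetical names, and only :name placeholders are handled.

```javascript
// Hypothetical helper: turn '/account/:id' into a RegExp plus a list of parameter names.
function compileRoute(pattern) {
  const names = []
  const source = pattern
    .split('/')
    .map((segment) => {
      if (segment.startsWith(':')) {
        names.push(segment.slice(1))
        return '([^/]+)'
      }
      // Escape regex metacharacters in literal segments.
      return segment.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
    })
    .join('/')
  // Allow an optional trailing slash on the request path.
  return { regex: new RegExp(`^${source}/?$`), names }
}

// Returns the extracted params when method and path match, otherwise null.
function matchRoute(route, method, pathname) {
  if (route.method !== method) return null
  const { regex, names } = compileRoute(route.pattern)
  const match = regex.exec(pathname)
  if (!match) return null
  const params = {}
  names.forEach((name, i) => {
    params[name] = decodeURIComponent(match[i + 1])
  })
  return params
}

const route = { method: 'GET', pattern: '/account/:id' }
console.log(matchRoute(route, 'GET', '/account/42')) // { id: '42' }
console.log(matchRoute(route, 'POST', '/account/42')) // null
```

In a Worker, the method and pathname would come from the incoming request (for example via new URL(request.url).pathname), and the returned params would be handed to the resource handler.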

Development experience

When developing applications/services, engineers want fast local feedback cycles to quickly iterate on their work and deliver efficiently. Working with cloud services can significantly slow down that cycle while you're waiting for code to deploy and execute.

Cloudflare provides the wrangler CLI to support local development and publishing of Workers. Its dev mode aims to enable a faster local feedback cycle by listening to requests on a local server.

However, the ability to easily debug the code using local development tools such as VS Code is key to effective and efficient development.

It’s also worth considering the consistency of tooling between local development and CI/CD processes.

Delivery experience

Deliverability of the API is crucial. From the outset, we want to know how we're going to provision resources in environments and how we can deploy and roll-back/forward/sideways with zero downtime to ensure high availability.

We're also going to deploy other services in AWS that we’ll be integrating with, so ideally, we’ll have a consistent tooling experience for our CI/CD processes across different service providers.

Operations experience

Once the API is deployed, we want to keep an eye on it and make sure we can react to any issues.

Cloudflare offers some basic Worker metrics you can periodically query via their GraphQL API, but they won’t give you an API-centric view or the ability to easily trigger alerts, so some custom metrics will be required to monitor the API effectively.

By default, log messages in Workers are ephemeral and simply sent to the standard output/error streams. This is ok to support local development and debugging in the Cloudflare workers.dev dashboard, but it would be useful to persist these logs from production workloads to support potential troubleshooting scenarios.

Implementation

After a phase of exploration, we had an idea of how to implement it in a way that would tie all of the above together and enable a global serverless API that was cost-effective to run, highly available, scalable, and easy to deliver. So, we built a proof of concept that incorporated the following elements:

Serverless framework

From a delivery point of view, we decided to use the Serverless framework to provide a common approach to provisioning and deploying our Cloudflare and AWS resources, both locally and from our CI/CD processes.

The AWS provider in the Serverless framework is an abstraction over CloudFormation and other AWS service APIs, and the Cloudflare provider is an abstraction over the Cloudflare APIs:

Illustration of multi-provider architecture in the Serverless framework

The plugin model for the Serverless framework allows you to augment/extend the capabilities of each provider where there are gaps in the framework, or if you want to provide custom functionality:

Illustration of plugin augmentation architecture in the Serverless framework

For instance, we wrote a plugin that would hydrate KV (Cloudflare’s key/value data store) with data such as signing certificates and reference data.

Blue/Green deployments

While exploring Cloudflare Workers, the simplicity of the routing capability struck us as a great way to flexibly and quickly change the code that would run for requests to a given endpoint. The idea was to use this flexibility to enable blue/green deployments for our API by using state embedded in a naming convention of the Workers and dynamically update the Worker route mappings at the point of deployment.

By creating a Serverless plugin we could hook into the before:deploy hook to inspect the current Worker route mappings and determine the current slot, and then pre-process the template to configure it for deployment to the next slot. We could do the same for the before:remove hook to ensure the correct resources were removed when required.
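
The slot detection step can be sketched as follows. This is an illustration under assumed conventions, not the plugin's actual code: the -blue/-green script name suffix, the { pattern, script } route shape, and the function names are all hypothetical.

```javascript
// Assumed convention: the slot is the suffix of the Worker script name.
const SLOTS = ['blue', 'green']

// routes: [{ pattern, script }], e.g. the current route mappings (shape assumed).
function currentSlot(routes, pattern) {
  const route = routes.find((r) => r.pattern === pattern)
  if (!route) return null
  return SLOTS.find((slot) => route.script.endsWith(`-${slot}`)) || null
}

// The slot the next deployment should target.
function nextSlot(routes, pattern) {
  const current = currentSlot(routes, pattern)
  // First deployment: nothing is live yet, so start with the first slot.
  if (current === null) return SLOTS[0]
  return SLOTS[(SLOTS.indexOf(current) + 1) % SLOTS.length]
}

// Name of the Worker script to deploy for a given service and slot.
function scriptNameFor(service, slot) {
  return `${service}-${slot}`
}

const routes = [{ pattern: 'api.somewhere.com/account*', script: 'account-api-blue' }]
const slot = nextSlot(routes, 'api.somewhere.com/account*')
console.log(scriptNameFor('account-api', slot)) // account-api-green
```

Rotating slots then amounts to pointing the route at the newly deployed script, which leaves the previous script in place for a quick roll-back.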

In addition to those hooks, we could create plugin commands that are actionable from the Serverless CLI to activate and rotate slots by calling the appropriate Cloudflare APIs.

Those plugin commands would be available locally and in CI/CD processes, so the rotate slot command could be executed at the end of a Continuous Deployment process, or via an approval trigger after a Continuous Delivery process.

Watch a demo of blue/green deployments using the Serverless framework:

You can read more about blue/green deployments with the Serverless framework and details on accessing the code in the blog post on the subject.

Node.js and VS Code

The dev command in the wrangler CLI enables you to send HTTP requests to an instance of your Worker running locally, but to be honest we didn't find the mapping of Workers to scripts and routes in the required wrangler.toml file as intuitive, flexible or extensible as it is with the Serverless framework. We also struggled to find a way to easily launch (i.e. hit F5) into a debugging session with VS Code when using wrangler.

Since we preferred the Serverless framework for provisioning and deploying anyway, we decided to design a development experience that would allow us to use VS Code and Node.js to build and debug our API without using wrangler.

To do that we embedded the principles of substitutable dependencies and substitutable execution context into our design.

Substitutable dependencies is an inversion of control technique that requires identification of specific runtime features you will depend on when running in a given execution context (Cloudflare Workers) that may require an alternative implementation in another execution context (Node.js), and making sure you have a mechanism for substituting the dependencies (a form of dependency injection). Environment variables are an example: in Node.js you access them via process.env, whereas in Cloudflare Workers they are accessible in the global scope.
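
As a minimal sketch of the environment variable example, assuming Workers where variables are exposed as globals as described above, the configuration source can be injected rather than read directly. createConfig and API_BASE_URL are hypothetical names used for illustration.

```javascript
// Wrap whichever object holds the configuration values for this execution context.
function createConfig(source) {
  return {
    get(name) {
      const value = source[name]
      if (value === undefined) {
        throw new Error(`Missing configuration value: ${name}`)
      }
      return value
    },
  }
}

// Node.js: environment variables live on process.env.
// Cloudflare Workers (as described above): they are globals, so globalThis is the source.
const isNode =
  typeof process !== 'undefined' && process.versions != null && process.versions.node != null
const config = createConfig(isNode ? process.env : globalThis)

// Application code only sees the abstraction, so tests can substitute a plain object.
const testConfig = createConfig({ API_BASE_URL: 'https://api.somewhere.com' })
console.log(testConfig.get('API_BASE_URL')) // https://api.somewhere.com
```

The rest of the code depends on config.get rather than on process.env or the global scope, which is what makes the dependency substitutable.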

Substitutable execution context follows on from the principle of substitutable dependencies and is the principle that your code should be appropriately encapsulated so that it is runnable in any execution context, with minimal integration to acquire input and generate output. Practically speaking this involves identifying the entry and exit points of your execution context and ensuring as much of your code as possible is contained within portable abstractions. This allows you to test most of your application code irrespective of the target execution context, and for those thin layers of integration, you can use appropriate mocks and integration tests at appropriate points in your delivery pipeline.

With appropriate abstractions in place for configuration, etc., and a substitution mechanism that took advantage of the global scope used in Cloudflare Workers, we were able to easily run and test our API resources locally in Node.js. Since we were able to run in a Node.js process, we could create a debug launch configuration in VS Code that allowed us to easily debug via the debugging tools or by hitting F5.

Watch a demo of Worker debugging in VS Code:

Logical environments

The approach above enabled us to iterate quickly while working locally, but we wanted a way to test the integration of our code into Cloudflare Workers while working locally before committing to the shared repo. When we do commit to the shared repo, we want to have CI/CD processes running on our commits and pull requests (PRs) that can deploy our Workers and run integration tests. Having a separate Cloudflare account per developer and CI/CD process isn't feasible, especially when premium features are required, and we share resources such as DNS records/TLS certs.

Enter the logical environment. This is a concept that allows multiple deployments of the same resources to exist in the same physical environment. The concept follows the blue/green deployments approach where an environment label forms part of the naming convention for the routes and Worker scripts and is dynamically embedded at the point of deployment. We modified the Serverless plugin to include the concept of an environment.

Practically speaking this means that each engineer can have a private local environment file (.env) that contains an environment identifier specific to them, which ensures any resources they deploy are uniquely namespaced to them. Likewise, CI/CD processes can set the environment identifier appropriately to create resources for specific purposes, and then remove them at the end of a lifecycle (such as closing/merging a PR).
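
The namespacing idea can be sketched like this. The ENVIRONMENT variable name, the label-prefix convention, and the namespaced helper are assumptions made for illustration; the plugin's real naming convention may differ.

```javascript
// Prefix a resource name with a sanitised environment label.
function namespaced(environment, name) {
  const label = String(environment)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse anything unsafe into a single dash
    .replace(/^-+|-+$/g, '') // trim leading/trailing dashes
  if (!label) throw new Error('An environment identifier is required')
  return `${label}-${name}`
}

// Assumed variable name; each engineer or CI/CD run would set its own value.
const environment = process.env.ENVIRONMENT || 'local-example'

// Both the Worker script and its route carry the environment label.
console.log(namespaced(environment, 'account-api-blue'))
console.log(namespaced(environment, 'api.somewhere.com/account*'))
```

A pull request pipeline could, for example, pass an identifier derived from the PR number so that everything it deploys can be found and removed when the PR is closed.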

Watch a demo of a logical environment being used for local development:

Watch a demo of a logical environment being used for a GitHub Pull Request review:

You can read more on using Node.js, VS Code and logical environments, along with details on accessing the code, in the blog post on the subject.

Routing and Middleware

While the simplicity of the Workers routing is great for enabling use cases like zero-downtime deployments, it’s not great for mapping HTTP requests to API endpoints – but Cloudflare Workers wasn’t designed to be an API gateway.

The solution is not so different from how you might do it in other execution contexts, such as containers, if you aren’t using an API gateway: middleware.

We considered the feasibility of running existing middleware frameworks like Express in a Worker, but they’re too dependent on the Node.js runtime, would require extensive customisation/adaptation, and would be unlikely to fit within the 1MB script size limit.

Instead, we borrowed concepts such as route matching and found lightweight modules we could integrate and adapt to enable modular asynchronous pipelines to handle different combinations of HTTP methods and paths.
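
A modular asynchronous pipeline of this kind can be sketched in a few lines. This is a generic compose function in the style popularised by Koa, not the modules we actually used; the context shape and the authorise and readAccount middleware are hypothetical.

```javascript
// Compose async middleware of the form (context, next) => Promise into one function.
function compose(middleware) {
  return function run(context) {
    let lastIndex = -1
    function dispatch(index) {
      if (index <= lastIndex) {
        return Promise.reject(new Error('next() called multiple times'))
      }
      lastIndex = index
      const fn = middleware[index]
      if (!fn) return Promise.resolve()
      try {
        return Promise.resolve(fn(context, () => dispatch(index + 1)))
      } catch (err) {
        return Promise.reject(err)
      }
    }
    return dispatch(0)
  }
}

// Hypothetical middleware: short-circuit with a 401 when no credentials are present.
async function authorise(context, next) {
  if (!context.request.headers.authorization) {
    context.response = { status: 401, body: 'Unauthorised' }
    return
  }
  await next()
}

// Hypothetical resource handler at the end of the pipeline.
async function readAccount(context) {
  context.response = { status: 200, body: JSON.stringify({ id: context.params.id }) }
}

const handle = compose([authorise, readAccount])
const context = { request: { headers: { authorization: 'Bearer example' } }, params: { id: '42' } }
handle(context).then(() => console.log(context.response.status)) // 200
```

Each route can be given its own pipeline, so common steps such as authorisation and validation are written once and reused across combinations of HTTP methods and paths.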

Watch a demo of middleware with authorisation and validation middleware responding accordingly:

You can read more on the middleware architecture, along with details on accessing the code, in the blog post on the subject.

AWS CloudWatch Logs and Metrics

Since part of our solution was going to be in AWS anyway, we decided that CloudWatch would be a good option for observability. There’s some impedance mismatch between a globally available solution like Cloudflare Workers and regional solutions in AWS, but the cross-region reporting capabilities of CloudWatch gave us confidence we could have a global solution to observability if we implemented failure detection and multi-region capabilities in our Workers (although we only implemented a single region for the proof of concept).

There were three options for integrating with AWS CloudWatch, which are also relevant for other AWS services:

  1. Direct from Cloudflare Workers to AWS Service APIs, but this required implementing the AWS v4 request signing process with CPU intensive crypto functions.
  2. Via API Gateway, a Lambda function and the AWS SDK, but the cost of running Lambda was orders of magnitude higher than the cost to run the entire API in Cloudflare.
  3. Via API Gateway but mapped directly to the AWS Service APIs, i.e. no Lambda.

We chose the third option as it offered minimal cost and there was no need for CPU-intensive crypto in our Workers, balanced against a little bit of complexity to set up the API Gateway mappings.

For logs, we wanted the logger to be easily accessible to all code and for log messages to go to standard output regardless of the execution context. When running in Cloudflare, we also wanted the messages to be persisted so they can be flushed to an observability endpoint at the end of the request. We created a logging abstraction that was substitutable to handle those requirements.

For metrics, we were only interested in creating/seeing them when running in Cloudflare. Most of the metrics could be derived from data in the original request or the response. The exception was duration: for that, we needed to track the start and end time of the request. We created a substitutable observability abstraction that encapsulated the steps to create the stream, log messages and metrics.

The logs and metrics are asynchronously dispatched to the observability endpoint at the end of each Cloudflare Worker request.
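
The shape of such an abstraction can be sketched as below. It is a simplified illustration, not the code from the repo: createObservability, the payload layout, and the injected dispatch and now functions are assumptions, and the in-memory dispatcher stands in for the real observability endpoint.

```javascript
// Collects log messages and request metrics, then hands them to `dispatch`.
// `dispatch` and `now` are injected so the same code runs in Node.js and in a Worker.
function createObservability({ dispatch, now = Date.now }) {
  const startedAt = now()
  const logs = []
  return {
    log(level, message) {
      // Always write to standard output, whatever the execution context.
      console.log(`[${level}] ${message}`)
      logs.push({ timestamp: now(), level, message })
    },
    // Call once at the end of the request with details from the request/response.
    flush({ method, path, status }) {
      const payload = {
        dimensions: { method, path, status },
        metrics: [{ name: 'Duration', unit: 'Milliseconds', value: now() - startedAt }],
        logs: logs.splice(0),
      }
      return Promise.resolve(dispatch(payload)).then(() => payload)
    },
  }
}

// Example with an in-memory dispatcher standing in for the observability endpoint.
const sent = []
const observability = createObservability({
  dispatch: async (payload) => {
    sent.push(payload)
  },
})
observability.log('info', 'reading account')
observability
  .flush({ method: 'GET', path: '/account/42', status: 200 })
  .then((payload) => console.log(payload.metrics[0].name, payload.logs.length)) // Duration 1
```

In a Worker, one option is to pass the promise returned by flush to event.waitUntil so that dispatching does not delay the response; when running locally in Node.js, dispatch can simply be a no-op.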

Watch a demo of observability for Cloudflare Workers using AWS CloudWatch:

You can read more on observability, along with details on accessing the code, in the blog post on the subject.

Conclusion and recommendations

It took a little bit of effort to create an ideal development, delivery and operations experience for using Cloudflare Workers as an API. I think in total we spent 1-2 months exploring and implementing it, and at the end of that, we had a good slice of the API ready to go.

My recommendation to Cloudflare would be to provide local development tooling that can be decoupled from wrangler and easily integrated into local development and debugging workflows. It would be useful to allow more complex route matching too.

I love the simplicity of deploying Cloudflare Workers and the use cases they open up. Due to their global scale and performance characteristics, I think they’re perfect for so-called “wrapper” APIs, or abstraction layers, that enable you to mitigate vendor lock-in, plug feature gaps and augment the vendor offering, or even provide a short to long term migration strategy from a vendor-based solution to a bespoke solution. You could even just use them as a filter layer for authentication, authorisation and validation for other APIs, which would remove a lot of the duplication and deployment trade-offs you get with some other API technologies.

Edge network serverless computing could be the next big thing, but a major part of that is having global data persistence solutions. Not long after we’d completed our work on this, Cloudflare announced the “Durable Objects” beta, which is a new way of thinking about persistence, but still a step in that direction. There are also services like Fauna emerging to offer solutions in that space. These are exciting times for the way we think about cloud computing. I think the ultimate experience should be to simply deploy code to a cloud service and have it run performantly at scale and near your end-users, without having to concern ourselves with choosing regions and the trade-offs in multi-region architectures. That's the dream, and I don't think we're very far away.
