<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Peasey</title>
    <description>The latest articles on DEV Community by Peasey (@peasey).</description>
    <link>https://dev.to/peasey</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F513708%2F9634d2fc-7ad8-47d2-8b93-4baeb62f85e2.png</url>
      <title>DEV Community: Peasey</title>
      <link>https://dev.to/peasey</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/peasey"/>
    <language>en</language>
    <item>
      <title>Using AWS Lambda and Slack to find Xbox Series X stock, so you don't have to</title>
      <dc:creator>Peasey</dc:creator>
      <pubDate>Mon, 28 Dec 2020 08:56:23 +0000</pubDate>
      <link>https://dev.to/peasey/using-aws-lambda-and-slack-to-find-xbox-series-x-stock-so-you-don-t-have-to-1d1g</link>
      <guid>https://dev.to/peasey/using-aws-lambda-and-slack-to-find-xbox-series-x-stock-so-you-don-t-have-to-1d1g</guid>
      <description>&lt;p&gt;Creating an event-driven serverless web browsing and notification tool to automate web-based tasks with AWS Lambda, Chrome, Puppeteer and Slack.&lt;/p&gt;

&lt;h1&gt;
  
  
  TL;DR
&lt;/h1&gt;

&lt;p&gt;Some fun examples including stock availability checks for the Xbox Series X are used to demonstrate the automation of web browsing tasks and notifications using AWS Lambda, headless Chrome, &lt;br&gt;
Puppeteer and Slack. The design decisions are explained, the code repo and implementation notes are shared, and video demos show the tool in action.&lt;/p&gt;
&lt;h1&gt;
  
  
  The idea
&lt;/h1&gt;

&lt;p&gt;During lockdown earlier this year, I wanted to buy a specific outdoor storage solution for the garden. However, this particular product was only available from one retailer and was seemingly always out of stock. The retailer didn’t have a stock alerting feature, and I got tired of periodically checking the website only to see it was still out of stock. I decided it would be cool to have a little tool that did the checking for me and notified me when the product was back in stock. I've been meaning to write this post for a while, and just recently Xbox Series X stock availability became a thing, which gave me a good topical reason to finally do it.&lt;/p&gt;
&lt;h1&gt;
  
  
  Design goals
&lt;/h1&gt;

&lt;p&gt;These are the design goals I had for the tool:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I’d like to be able to quickly script the automation of basic web browsing tasks (script/test/deploy in around 30 mins)&lt;/li&gt;
&lt;li&gt;I’d like to run multiple tasks&lt;/li&gt;
&lt;li&gt;I’d like to run the tasks on a schedule, such as daily or hourly, with each task having a different schedule&lt;/li&gt;
&lt;li&gt;I’d like to receive a notification on my phone when the task has something worth telling me, i.e. something is in stock or there was an unexpected error while running the task (so I can investigate/fix it)&lt;/li&gt;
&lt;li&gt;I don’t want to spend much (any) money to do this&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  Conceptual design
&lt;/h1&gt;

&lt;p&gt;This is the conceptual design of the tool I want to create:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ky7uB3LZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/b0tdvt2luicn9ldw8os1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ky7uB3LZ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/b0tdvt2luicn9ldw8os1.png" alt="Illustration of the conceptual architecture for the web automation tool"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Technology selection
&lt;/h1&gt;

&lt;p&gt;Since we were in lockdown, I had some spare time on my hands and decided to invest some time researching how to build a tool/framework that would allow me to easily automate web browsing tasks.&lt;/p&gt;
&lt;h2&gt;
  
  
  Programming environment
&lt;/h2&gt;

&lt;p&gt;JavaScript/Node.js, with its package ecosystem and community, is my go-to for getting up and running quickly, so I’d be using that to build the tool and task framework.&lt;/p&gt;
&lt;h2&gt;
  
  
  Web browser automation
&lt;/h2&gt;

&lt;p&gt;There are several tools in the JavaScript/Node.js ecosystem you can use for this; &lt;a href="https://www.npmtrends.com/nightmare-vs-puppeteer-vs-selenium-vs-slimerjs-vs-webdriverio"&gt;Puppeteer&lt;/a&gt; seems to be the most popular, and I’ve used it successfully for other automation tasks recently. Puppeteer is headless by default, so it’s ideal for automation.&lt;/p&gt;
&lt;h2&gt;
  
  
  Zero-cost infrastructure
&lt;/h2&gt;

&lt;p&gt;The cost goal might seem a bit unreasonable, but due to the scheduling requirement, I knew this was a perfect fit for an event-driven serverless architecture. I’ve worked with AWS Lambda quite a lot for work and personal projects, and the free tier is quite generous; I don’t think I’ve paid anything for my personal projects yet, and if I have, it’s been pennies. However, I needed to validate that I could run web browsing tasks within the constraints of a Lambda function.&lt;/p&gt;
&lt;h2&gt;
  
  
  Headless browser
&lt;/h2&gt;

&lt;p&gt;Puppeteer automates Chromium browsers (headless and non-headless), but can Chromium run in a Lambda function? Not without some great work from the community to create a &lt;a href="https://github.com/alixaxel/chrome-aws-lambda"&gt;Chrome build for the AWS Lambda runtime&lt;/a&gt;. There’s also a &lt;a href="https://github.com/shelfio/chrome-aws-lambda-layer"&gt;Lambda layer&lt;/a&gt; solution for this, although I haven’t tried that approach yet. Another great feature of this package is that it runs headless in Lambda and non-headless when running locally, so it’s frictionless to develop, test and run your scripts.&lt;/p&gt;
&lt;h2&gt;
  
  
  Notifications
&lt;/h2&gt;

&lt;p&gt;Getting push notifications on your phone usually requires you to have an app you can publish the notification to via the vendor’s push notification service. There’s no chance I’m developing an app just to get notifications. I could use Twilio/SNS to send SMS messages instead of push notifications, but SMS isn’t a very flexible messaging format, plus it wouldn’t be completely free (although arguably a negligible cost for my usage). I already use Slack to get notifications for AWS billing alerts etc. via SNS, and I know its Webhook API provides a simple but powerful way to deliver fairly rich messages that can appear as notifications on your devices. Plus, it would be a cost-free solution (for my usage).&lt;/p&gt;
&lt;h1&gt;
  
  
  Validation
&lt;/h1&gt;

&lt;p&gt;Feeling comfortable I had all the components to build this tool, I created a quick proof of concept to validate the technology choices and the approach. I used the &lt;a href="https://www.serverless.com/"&gt;serverless framework&lt;/a&gt; to get up and running quickly with a single function that ran a basic web scraping task using &lt;a href="https://github.com/alixaxel/chrome-aws-lambda"&gt;chrome-aws-lambda&lt;/a&gt; and &lt;a href="https://github.com/puppeteer/puppeteer#readme"&gt;puppeteer-core&lt;/a&gt;. The serverless framework enables you to add AWS CloudWatch event rules as schedules to your Lambda functions with a &lt;a href="https://www.serverless.com/framework/docs/providers/aws/events/schedule/"&gt;few lines of YAML&lt;/a&gt;. Sure enough, the solution packaged in under 50MB, and once deployed it ran on schedule and did exactly what I expected.&lt;/p&gt;
&lt;h1&gt;
  
  
  Design
&lt;/h1&gt;

&lt;p&gt;After the technology selection and validation, the conceptual design evolved into something more concrete:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s---HZRR4vA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/tx8xcoc9vdyuxtihxefl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---HZRR4vA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/tx8xcoc9vdyuxtihxefl.png" alt="Illustration of the logical architecture for the web automation tool"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Implementation
&lt;/h1&gt;

&lt;p&gt;I’ve published the code for the tool on &lt;a href="https://github.com/peasey/lambda-surf"&gt;GitHub&lt;/a&gt;, along with the examples from the demos later in the post; feel free to use and adapt it. Below are some notes on the implementation:&lt;/p&gt;
&lt;h2&gt;
  
  
  Plugins
&lt;/h2&gt;

&lt;p&gt;To make it quick and easy to add/remove tasks in the future I decided to create a plugin model where the tasks are dynamically loaded at runtime from a specified directory. The plugin implementation recursively scans the specified directory and requires any JavaScript modules it finds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if (!pluginPath.endsWith('.test.js') &amp;amp;&amp;amp; pluginPath.endsWith('.js')) {
  if (!require.cache[pluginPath]) {
    log.info(`loading plugin: ${pluginPath}`)
    // eslint-disable-next-line import/no-dynamic-require
    return require(pluginPath)(container)
  }
  log.info(`plugin already loaded: ${pluginPath}`)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each plugin is passed a plugin container (an array) that it should push itself into. I also wanted to develop my tasks using TDD, and my preference is to colocate the test file with the subject file, so I had to specifically ignore test scripts in the loading sequence (line 1).&lt;/p&gt;

&lt;p&gt;I originally designed this as an ephemeral process and loaded the plugins on each invocation, but it turns out a Lambda process can hang around for a while, which makes sense from an optimisation point of view (especially if it has scheduled events within a relatively short time frame). Anyway, I had to add a check to see if the plugin was already loaded (line 2).&lt;/p&gt;

&lt;h2&gt;
  
  
  Tasks
&lt;/h2&gt;

&lt;p&gt;Now adding a task is as simple as adding a new JavaScript module, but what would a task look like? I decided each task should have the following structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;name&lt;/strong&gt;: used as the display name in notifications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;url&lt;/strong&gt;: the entry point for the task and also a link in the notification for quick access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;emoji&lt;/strong&gt;: to easily distinguish the content for each task in a notification I decided to include an emoji as a prefix to the content&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;schedule&lt;/strong&gt;: the event schedule to run the task with; I decided to use the &lt;a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html#RateExpressions"&gt;AWS CloudWatch ‘rate’ expression for event schedules&lt;/a&gt; as it covers my needs and is easy to parse (I can always add ‘cron’ support later if I ever need it)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;run&lt;/strong&gt;: a function that performs the task (async, of course); it should return a result that can be used in subsequent notifications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;shouldNotify&lt;/strong&gt;: a function that is given the result of the task and returns true/false to signal whether a notification should be sent; this enables flexibility about what gets notified. For example, I might only want a notification if stock is available or if the task failed, and otherwise not be notified at all.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here’s a basic example from the task scheduling test for a task that runs every 5 minutes (demo later on):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const task = () =&amp;gt; ({
  name: 'Every 5 mins',
  url: 'http://localhost/task/minutes/5',
  emoji: ':five:',
  schedule: 'rate(5 minutes)',
  shouldNotify: () =&amp;gt; true,
  run: async function run() {
    return `${this.name} just ran`
  },
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A &lt;a href="https://github.com/peasey/lambda-surf/blob/main/src/integration/tasks/plugin-task-provider.js"&gt;plugin task provider&lt;/a&gt; loads the tasks from a specified location and parses the schedule into a more filterable object representation using the &lt;a href="https://github.com/peasey/lambda-surf/blob/main/src/integration/aws/cloudwatch/rules/schedule-parser.js"&gt;schedule parser&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const matches = schedule.match(/(.*)\((\d*) (.*)\)/)
if (matches &amp;amp;&amp;amp; matches.length &amp;gt;= 4) {
  if (matches[1] === 'rate') {
    return {
      type: 'rate',
      unit: matches[3],
      value: parseInt(matches[2], 10),
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
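As an illustration, the parsing logic above could be wrapped in a reusable function like this. The function name is mine, not necessarily the repo's, and the nested checks mirror the guard in the snippet (which uses a combined condition):

```javascript
// sketch: wrap the rate-expression parsing in a function
const parseSchedule = (schedule) => {
  const matches = schedule.match(/(.*)\((\d*) (.*)\)/)
  if (matches !== null) {
    if (matches.length >= 4) {
      if (matches[1] === 'rate') {
        return {
          type: 'rate',
          unit: matches[3],
          value: parseInt(matches[2], 10),
        }
      }
    }
  }
  // unsupported expressions (e.g. 'cron(...)') fall through
  return null
}

// parseSchedule('rate(5 minutes)')
// -> { type: 'rate', unit: 'minutes', value: 5 }
```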



&lt;p&gt;Now a chainable &lt;a href="https://github.com/peasey/lambda-surf/blob/main/src/integration/tasks/filter.js"&gt;task filter&lt;/a&gt; can easily filter a list of tasks based on their schedules.&lt;/p&gt;
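The repo has its own implementation, but a chainable filter along these lines could look like this minimal sketch. The method names are taken from the usage shown later (`taskFilter(tasks).schedule(...).select()`); the matching logic is an assumption:

```javascript
// compare two parsed schedule objects, e.g. { type, value, unit }
const sameSchedule = (a, b) =>
  [a.type, a.value, a.unit].join('|') === [b.type, b.value, b.unit].join('|')

// chainable filter: each chained call narrows the selection,
// and select() returns the surviving tasks
const taskFilter = (tasks) => {
  let selected = tasks

  const api = {
    schedule: (invocation) => {
      selected = selected.filter((task) => sameSchedule(task.schedule, invocation))
      return api
    },
    select: () => selected,
  }

  return api
}
```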

&lt;h2&gt;
  
  
  Task schedules
&lt;/h2&gt;

&lt;p&gt;I want a single Lambda function to run the tasks, which means I'll need multiple event schedules defined on the function. Since one of my design goals is to make it as simple as possible to add a new task, I don't want to have to remember to add new schedules to my function as and when the need for them comes up. I'd prefer the schedule requirements were picked up automatically from the tasks that have been defined.&lt;/p&gt;

&lt;p&gt;One of the reasons I chose the serverless framework is its extensibility; I've &lt;a href="https://blog.peasey.co.uk/blog/blue-green-deployments-for-cloudflare-workers/"&gt;previously written about using plugins and lifecycle hooks to add new capabilities&lt;/a&gt;. I created a &lt;a href="https://github.com/peasey/lambda-surf/blob/main/.serverless_plugins/task-schedules/index.js"&gt;serverless framework plugin&lt;/a&gt; that hooks into the &lt;code&gt;before:package:initialize&lt;/code&gt; lifecycle hook to load the tasks and build a unique list of schedules, which it adds to the function definition dynamically before the function is packaged and deployed.&lt;/p&gt;
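As a rough sketch of that plugin shape, assuming a single function named `host` and a hypothetical `loadSchedules()` helper that collects the unique schedule expressions from the task modules:

```javascript
// hypothetical helper: in the real plugin this would load the
// task modules and collect their unique schedule expressions
const loadSchedules = () => ['rate(5 minutes)', 'rate(1 hour)']

class TaskSchedulesPlugin {
  constructor(serverless) {
    this.serverless = serverless
    // run before the function is packaged and deployed
    this.hooks = {
      'before:package:initialize': this.addSchedules.bind(this),
    }
  }

  addSchedules() {
    const fn = this.serverless.service.functions.host
    // append one schedule event per unique rate expression
    fn.events = (fn.events || []).concat(
      loadSchedules().map((rate) => ({ schedule: rate })),
    )
  }
}

module.exports = TaskSchedulesPlugin
```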

&lt;h2&gt;
  
  
  Task host
&lt;/h2&gt;

&lt;p&gt;The task host is the execution environment that receives the invocation event and is responsible for resolving the invocation schedule. In this case, the host is a Lambda function, and unfortunately the event payload only contains the ARN of the CloudWatch event rule that invoked the Lambda, rather than the rule itself. So, I have to jump through some hoops: split the rule ARN to get the rule name using the &lt;a href="https://github.com/peasey/lambda-surf/blob/main/src/integration/aws/lambda/resource-parser.js"&gt;resource parser&lt;/a&gt;, fetch the rule with its schedule from the CloudWatch events API, then parse it with the &lt;a href="https://github.com/peasey/lambda-surf/blob/main/src/integration/aws/cloudwatch/rules/schedule-parser.js"&gt;schedule parser&lt;/a&gt;. This all comes together in the &lt;a href="https://github.com/peasey/lambda-surf/blob/main/src/integration/aws/lambda/host.js"&gt;host&lt;/a&gt;, which loads the tasks, filters them based on the invocation schedule and, if any match, runs them via the &lt;a href="https://github.com/peasey/lambda-surf/blob/main/src/integration/tasks/runner.js"&gt;task runner&lt;/a&gt; and awaits the results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const ruleName = resourceParser.parse({ resource: event.resources[0] })
if (ruleName) {
  const rule = await rules.byName({ name: ruleName })
  if (rule) {
    log.info(
      `invocation schedule is ${rule.schedule.type}(${rule.schedule.value} ${rule.schedule.unit})`,
    )
    log.info('loading tasks')
    const tasks = await taskProvider.tasks()
    if (tasks.length &amp;gt; 0) {
      log.info(`loaded ${tasks.length} tasks`)
      const scheduledTasks = taskFilter(tasks).schedule(rule.schedule).select()
      log.info(`running ${scheduledTasks.length} scheduled tasks`)
      result.tasks = await runner.run({ tasks: scheduledTasks })
      result.tasks.total = tasks.length
      result.completed = true
      log.info('done')
    }
  } else {
    log.info('could not parse the schedule')
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The host augments the result from the task runner with the total tasks provided to the runner and signals that the process completed successfully.&lt;/p&gt;

&lt;h2&gt;
  
  
  Task runner
&lt;/h2&gt;

&lt;p&gt;The first thing the task runner does is map over all the provided tasks and run them, adding any successfully run tasks and their results to a list of successful runs, and the failed tasks and their errors to a list of failed runs, which are returned along with a count of the tasks run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const result = {
  run: 0,
  succeeded: [],
  failed: [],
}

const promises = tasks.map(async (task) =&amp;gt; {
  try {
    log.info(`running ${task.name} task`)
    result.run += 1
    const taskResult = await task.run()
    result.succeeded.push({ task, result: taskResult })
  } catch (err) {
    log.error(`error running ${task.name} task`, err)
    result.failed.push({ task, result: err })
  }

  return result
})

await Promise.all(promises)

return result

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the task runs are complete, the task runner determines which tasks should have notifications and sends them via the notifier.&lt;/p&gt;
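A minimal sketch of that decision step might look like this. The `toNotify` name is mine, not the repo's, and I'm assuming failures always notify, per the design goals:

```javascript
// decide which task results should be passed to the notifier:
// successes are filtered through each task's shouldNotify,
// failures are assumed to always be worth a notification
const toNotify = (result) => {
  const successes = result.succeeded.filter((s) => s.task.shouldNotify(s.result))
  const failures = result.failed
  return { successes, failures }
}
```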

&lt;h2&gt;
  
  
  Notifier
&lt;/h2&gt;

&lt;p&gt;In this case, the notifier is sending the notifications via Slack. First, each task result is summarised into a block of text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;text: `&amp;lt;${success.task.url}|${success.task.name}&amp;gt;\n${success.task.emoji} ${success.result}`
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Failed tasks are summarised similarly, except an ❗ emoji is used.&lt;/p&gt;

&lt;p&gt;The task result summaries (for success and failures) are sent in a single Slack message, with each summary in a separate block and interspersed with dividers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const message = {
  blocks: [],
}

const toBlock = (summary) =&amp;gt; ({
  type: 'section',
  text: {
    type: 'mrkdwn',
    text: summary.text,
  },
})

const blocks = summaries.map(toBlock)

const divider = {
  type: 'divider',
}

message.blocks = intersperse(blocks, divider)

return message
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The message is then sent to the Slack Webhook endpoint configured in the environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const endpoint = process.env.SLACK_ENDPOINT
...
const response = await fetch(endpoint, {
  method: 'POST',
  body: JSON.stringify(message),
  headers: { 'Content-Type': 'application/json' },
})
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s the gist of it, time for some demos.&lt;/p&gt;

&lt;h1&gt;
  
  
  Demos
&lt;/h1&gt;

&lt;p&gt;I have two demos for this tool. The first demo is of a test I created to ensure scheduled events worked with tasks as expected. The second is a more practical example of some real-world tasks: a daily check for rumours about my football club (Newcastle United) and a topical/seasonal example, checking stock availability for an Xbox Series X.&lt;/p&gt;

&lt;h2&gt;
  
  
  Schedule task runner
&lt;/h2&gt;

&lt;p&gt;I set up this demo to test the scheduled running of tasks; it consists of 4 tasks that are scheduled to run every &lt;a href="https://github.com/peasey/lambda-surf/blob/main/src/test-tasks/minutes_5.js"&gt;5 minutes&lt;/a&gt;, &lt;a href="https://github.com/peasey/lambda-surf/blob/main/src/test-tasks/minutes_10.js"&gt;10 minutes&lt;/a&gt;, &lt;a href="https://github.com/peasey/lambda-surf/blob/main/src/test-tasks/hour_1.js"&gt;once an hour&lt;/a&gt; and &lt;a href="https://github.com/peasey/lambda-surf/blob/main/src/test-tasks/hours_2.js"&gt;every 2 hours&lt;/a&gt;. The tasks don’t do much other than return some text detailing that they ran, but each has a number emoji so I can see if it’s working correctly:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/eYbmezjPU7Q"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Footy gossip and Xbox Series X stock checks
&lt;/h2&gt;

&lt;p&gt;Examples of some tasks I’m using right now are scraping any rumours about Newcastle United from the &lt;a href="https://www.bbc.co.uk/sport/football/gossip"&gt;BBC football gossip page&lt;/a&gt;, which I run on a daily schedule, and checking the &lt;a href="https://www.xbox.com/en-GB/consoles/xbox-series-x#purchase"&gt;Xbox website&lt;/a&gt; for stock availability of the Series X, which I run on an hourly schedule.&lt;/p&gt;

&lt;h2&gt;
  
  
  Footy gossip
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/peasey/lambda-surf/blob/main/src/tasks/footy-gossip.js"&gt;This task&lt;/a&gt; loads the gossip page, finds all the individual paragraphs and applies a regular expression (rumourMatcher) to filter paragraphs that contain the words Newcastle or Toon:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const rumourMatcher = /(Newcastle|Toon)/
...
const page = await browser.newPage()

await page.goto(url)
const allRumours = (await page.$$('article div p')) || []

log.info(`found ${allRumours.length} total rumours...`)

const text = await Promise.all(
  [...allRumours].map((rumour) =&amp;gt;
    rumour.getProperty('innerText').then((item) =&amp;gt; item.jsonValue()),
  ),
)

const matchedRumours = text.filter((rumour) =&amp;gt; rumour.match(context.rumourMatcher))

log.info(`found ${matchedRumours.length} matching rumours...`)

result = matchedRumours.length &amp;gt; 0 ? matchedRumours.join(`\n\n`) : 'No gossip today.'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Any matching rumours are concatenated together with some spacing lines, and if none are matched the text ‘No gossip today.’ is returned. The task is configured with a football emoji.&lt;/p&gt;

&lt;h2&gt;
  
  
  Xbox Series X stock availability
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/peasey/lambda-surf/blob/main/src/tasks/xbox.js"&gt;This task&lt;/a&gt; loads the stock availability page for the standalone Xbox Series X, finds all the retailers, extracts the retailer name (or domain) from the alt text of the logo image and the stock availability text:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const page = await browser.newPage()

await page.goto(url)
const retailerElements = (await page.$$('div.hatchretailer')) || []

log.info(`found ${retailerElements.length} retailers...`)

const retailerName = async (retailer) =&amp;gt;
  retailer.$eval(
    `span.retlogo img`,
    (element) =&amp;gt; element.getAttribute('alt').slice(0, -' logo'.length), // trim ' logo' off the end of the alt text to get the retailer name
  )

const retailerStock = async (retailer) =&amp;gt;
  retailer.$eval(`span.retstockbuy span`, (element) =&amp;gt; element.innerHTML)

const hasStock = (retailers) =&amp;gt;
  retailers.reduce((acc, curr) =&amp;gt; {
    if (curr.stock.toUpperCase() !== 'OUT OF STOCK') {
      acc.push(curr)
    }

    return acc
  }, [])

const retailers = await Promise.all(
  [...retailerElements].map(async (retailer) =&amp;gt; ({
    name: await retailerName(retailer),
    stock: await retailerStock(retailer),
  })),
)

const retailersWithStock = hasStock(retailers)

result =
  retailersWithStock.length &amp;gt; 0
    ? retailersWithStock.map((retailer) =&amp;gt; `${retailer.name} (${retailer.stock})`).join(`\n\n`)
    : 'No stock.'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I don’t know what the text is when there is stock, so I test the stock availability text for anything that isn’t ‘OUT OF STOCK’ to find retailers that &lt;em&gt;might&lt;/em&gt; have stock. As before, any retailers with potential stock are concatenated together with some spacing lines, and if none are matched the text ‘No stock.’ is returned. The task is configured with a joystick emoji.&lt;/p&gt;

&lt;p&gt;Here are the tasks in action:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/HmkEvOw8vFk"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Note: I changed the schedules to 1 minute to quickly demo the tasks running.&lt;/p&gt;

&lt;h1&gt;
  
  
  Wrapping up
&lt;/h1&gt;

&lt;p&gt;Well, if you didn’t unwrap an Xbox Series X for Xmas, now you can be one of the first to know when they’re available again. I’ve shown you some fun examples of how you can use this technology; it’s especially useful when you want to act on data that isn’t available via other means, such as an alert or API. There are loads of things you can do, for fun or profit; I'll leave it to your imagination - the world wide web is your oyster.&lt;/p&gt;

&lt;p&gt;The original title of this article (Using AWS Lambda and Slack to browse the web, so you don't have to) was published on &lt;a href="https://blog.peasey.co.uk/blog/using-aws-lambda-and-slack-to-browse-the-web-so-you-dont-have-to/"&gt;my blog&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>node</category>
      <category>aws</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Delivering APIs at the edge with Cloudflare Workers</title>
      <dc:creator>Peasey</dc:creator>
      <pubDate>Mon, 16 Nov 2020 14:53:10 +0000</pubDate>
      <link>https://dev.to/peasey/delivering-apis-at-the-edge-with-cloudflare-workers-225i</link>
      <guid>https://dev.to/peasey/delivering-apis-at-the-edge-with-cloudflare-workers-225i</guid>
      <description>&lt;h1&gt;
  
  
  TL;DR
&lt;/h1&gt;

&lt;p&gt;The background is given about why Cloudflare Workers were chosen to deliver an API, there's an exploration phase covering constraints, architecture, development, delivery and operations aspects, followed by an implementation phase with demo videos covering using Node.js and VS Code for local development and debugging, logical Cloudflare environments, blue/green deployments, middleware and routing, and observability.&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;While we were looking at solutions for a new service, we faced uncertainty over some requirements and whether they could be met with a third-party solution we’d found. We also considered whether we should build a solution ourselves or wrap the third-party solution to plug any requirement gaps. We decided that the most likely outcomes would require us to build an API of some description. We made good progress on an innovative approach to building APIs using Cloudflare Workers, so we thought we’d share the approach.&lt;/p&gt;

&lt;p&gt;This article is a summary of a series of posts I wrote on &lt;a href="https://blog.peasey.co.uk/blog/delivering-apis-at-the-edge-with-cloudflare-workers/"&gt;my blog&lt;/a&gt; about this; there’s a GitHub repo accompanying most of the posts, so I’ll link to the relevant posts for those who want a deeper dive.&lt;/p&gt;

&lt;h1&gt;
  
  
  Our high-level API requirements
&lt;/h1&gt;

&lt;p&gt;At the time, our primary concern was the lack of Open ID Connect integration with the third-party solution. We wanted to ensure only end-users that had been authenticated with our identity provider could use the service.&lt;/p&gt;

&lt;p&gt;We also needed to store a small amount of data and some processing logic for each user that wasn’t currently configurable with the third-party solution.&lt;/p&gt;

&lt;p&gt;We knew that any solution had to be highly available and capable of handling the demand of our global user base.&lt;/p&gt;

&lt;p&gt;In line with our design guidelines, we wanted to keep costs and operational complexity to a minimum and leverage serverless technology where possible.&lt;/p&gt;

&lt;p&gt;Finally, in line with our CI/CD guidelines, we wanted to automate everything and ensure the solution was always up.&lt;/p&gt;

&lt;h1&gt;
  
  
  Why Cloudflare Workers?
&lt;/h1&gt;

&lt;p&gt;Good question. Originally, we looked at a more typical serverless architecture in AWS using API Gateway and Lambda functions. The new HTTP API type had just been introduced to API Gateway and we were weighing up the pros and cons of choosing that over the REST API type. As a team, we’d also recently had a frustrating experience trying to automate the delivery of multi-region zero downtime (blue/green deployments) architectures with the serverless tech in AWS.&lt;/p&gt;

&lt;p&gt;It just felt like there should be a simpler way to deploy highly available and scalable APIs using serverless technology.&lt;/p&gt;

&lt;p&gt;Another team had recently used Cloudflare Workers to process HTTP headers on requests before they hit their API and we thought that was an interesting approach to running code with global availability, scale and performance, and might offer an interesting solution for the API “wrapper” architecture we were considering, without the headache of multi-region architectures and other deployment complexity.&lt;/p&gt;

&lt;p&gt;We decided to commit some time to explore the idea.&lt;/p&gt;

&lt;h1&gt;
  
  
  Exploration
&lt;/h1&gt;

&lt;p&gt;Cloudflare Workers weren’t specifically designed to deliver APIs, so we needed to focus our attention on the following to test the feasibility of the idea:&lt;/p&gt;

&lt;h2&gt;
  
  
  Runtime constraints
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://developers.cloudflare.com/workers/platform/limits"&gt;Workers platform limits&lt;/a&gt; are published; we have an enterprise agreement, so we’re subject to the “bundled” limits. For us, the constraints of note are:&lt;/p&gt;

&lt;h3&gt;
  
  
  CPU runtime
&lt;/h3&gt;

&lt;p&gt;At first glance, 50ms seems low, but it's important to note that this is CPU time you use on the edge servers per request, not your request duration. So, while your Worker is waiting for asynchronous I/O to complete, that time doesn't count towards your CPU usage.&lt;/p&gt;

&lt;p&gt;Interestingly, not long after we’d finished looking at this, Cloudflare &lt;a href="https://blog.cloudflare.com/introducing-workers-unbound/"&gt;announced Workers Unbound&lt;/a&gt; with the CPU restriction removed altogether, which I think is confirmation that Workers are being used for increasingly more complex use cases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Programming environment
&lt;/h3&gt;

&lt;p&gt;You have two options for programming Workers: JavaScript or a WebAssembly compatible language. A quick look at both approaches showed that the JavaScript approach seemed more mature and benefited from better community engagement and tooling support. &lt;/p&gt;

&lt;p&gt;The Worker JavaScript environment is aligned to Web Workers, so writing JavaScript for Workers is more akin to writing a Worker in a browser than a server-side environment like Node.js. This means care needs to be taken when adding dependencies to ensure they are compatible with the &lt;a href="https://developers.cloudflare.com/workers/runtime-apis"&gt;runtime APIs&lt;/a&gt;. As an example, you can’t use the standard AWS JavaScript SDK as it doesn’t use the Fetch API for HTTP.&lt;/p&gt;
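For example, a Worker-style request handler is just an async function from a Request to a Response that does outbound HTTP via the Fetch API. The URL below is hypothetical, and `fetchImpl` is injectable only so the sketch can run outside the Workers runtime; in a real Worker you'd use the global fetch:

```javascript
// a Worker handler: async Request -> Response, using the Fetch API
// for outbound HTTP; in the Workers runtime this would be wired up
// with addEventListener('fetch', (event) => event.respondWith(...))
async function handleRequest(request, fetchImpl = fetch) {
  // hypothetical downstream API call via fetch (you can't use SDKs
  // that rely on Node.js networking primitives here)
  const upstream = await fetchImpl('https://api.example.com/data', {
    headers: { Accept: 'application/json' },
  })
  const data = await upstream.json()
  return new Response(JSON.stringify(data), {
    status: upstream.status,
    headers: { 'Content-Type': 'application/json' },
  })
}
```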

&lt;h3&gt;
  
  
  Worker script size
&lt;/h3&gt;

&lt;p&gt;The maximum size for a Worker script is 1MB. This shouldn’t be an issue when using &lt;a href="https://webpack.js.org/"&gt;webpack&lt;/a&gt; to bundle your JavaScript, and if you use a (smaller) script per Worker rather than sharing a (large) script across all Workers.&lt;/p&gt;

&lt;p&gt;We did see an issue with this when we added the &lt;a href="https://momentjs.com/"&gt;moment package&lt;/a&gt; to perform some date processing - the default package size is very large due to the locale files, but &lt;a href="https://github.com/jmblog/how-to-optimize-momentjs-with-webpack"&gt;you can optimise it&lt;/a&gt; (or just replace it with something else).&lt;/p&gt;

&lt;p&gt;Note: the script size limit is no longer 1MB; it was recently increased to 25MB.&lt;/p&gt;

&lt;h2&gt;
  
  
  API architecture and routing
&lt;/h2&gt;

&lt;p&gt;When building APIs, your service/framework typically allows you to define API routes based on properties of the HTTP request. For RESTful APIs, the HTTP method and path are typically used to map requests to resource handlers. Popular API frameworks such as &lt;a href="https://expressjs.com/en/guide/using-middleware.html"&gt;Express&lt;/a&gt; and &lt;a href="https://docs.microsoft.com/en-us/aspnet/core/fundamentals/middleware/"&gt;ASP.NET Core&lt;/a&gt; allow you to define middleware that enables you to factor out common tasks into pipelines that can be applied in sequence to multiple API routes.&lt;/p&gt;

&lt;p&gt;The route matching capabilities in Cloudflare Workers are quite basic. You can use a wildcard (*) in matching patterns but only at the beginning of the hostname and the end of the path, and there's no support for parameter placeholders. So, the following are ok:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;*api.somewhere.com/account*
api.somewhere.com/account/action*
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But these aren’t:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;api.somewhere.com/*/account*
api.somewhere.com/account/:id/action
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last example above is a valid route; it just won't do what you're probably trying to do, i.e. use :id as a placeholder for any value and provide that value in an easily accessible way in the Worker.&lt;/p&gt;

&lt;p&gt;Also, note that the valid examples don't include the trailing slash of the path before the wildcard; this is so the pattern still matches requests to the root of said path/resource (with or without the trailing slash).&lt;/p&gt;

&lt;p&gt;This all means we must move the API route handling logic into our Worker, as you would with frameworks like Express:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/account/:id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;readAccount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nx"&gt;readAccount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above code configures Express to run the &lt;strong&gt;readAccount&lt;/strong&gt; function on the GET method for paths that match &lt;strong&gt;/account/:id&lt;/strong&gt; in the HTTP request (where &lt;strong&gt;:id&lt;/strong&gt; is a placeholder for an arbitrary value).&lt;/p&gt;

&lt;h2&gt;
  
  
  Development experience
&lt;/h2&gt;

&lt;p&gt;When developing applications/services, engineers want fast local feedback cycles to quickly iterate on their work and deliver efficiently. Working with cloud services can significantly slow down that cycle while you're waiting for code to deploy and execute.&lt;/p&gt;

&lt;p&gt;Cloudflare provides the &lt;a href="https://developers.cloudflare.com/workers/cli-wrangler"&gt;wrangler CLI&lt;/a&gt; to support local development and publishing of Workers; its &lt;strong&gt;dev&lt;/strong&gt; mode aims to enable a faster local feedback cycle by listening for requests on a local server.&lt;/p&gt;

&lt;p&gt;However, the ability to easily debug the code using local development tools such as VS Code is key to effective and efficient development.&lt;/p&gt;

&lt;p&gt;It’s also worth considering the consistency of tooling between local development and CI/CD processes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Delivery experience
&lt;/h2&gt;

&lt;p&gt;Deliverability of the API is crucial. From the outset, we want to know how we're going to provision resources in environments and how we can deploy and roll-back/forward/sideways with zero downtime to ensure high availability.&lt;/p&gt;

&lt;p&gt;We're also going to deploy other services in AWS that we’ll be integrating with, so ideally, we’ll have a consistent tooling experience for our CI/CD processes across different service providers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operations experience
&lt;/h2&gt;

&lt;p&gt;Once the API is deployed, we want to keep an eye on it and make sure we can react to any issues.&lt;/p&gt;

&lt;p&gt;Cloudflare offers some basic Worker metrics you can periodically query via their &lt;a href="https://developers.cloudflare.com/analytics/graphql-api/tutorials/querying-workers-metrics"&gt;GraphQL API&lt;/a&gt;, but it won’t give you an API-centric view, or the ability to easily trigger alerts, so some custom metrics will be required to monitor the API effectively.&lt;/p&gt;

&lt;p&gt;By default, log messages in Workers are ephemeral and simply sent to the standard output/error streams. This is ok to support local development and debugging in the Cloudflare workers.dev dashboard, but it would be useful to persist these logs from production workloads to support potential troubleshooting scenarios.&lt;/p&gt;

&lt;h1&gt;
  
  
  Implementation
&lt;/h1&gt;

&lt;p&gt;After a phase of exploration, we had an idea of how we could implement it in a way that would tie all of the above together and enable a global serverless API that was cost-effective to run, highly available, scalable, and easy to deliver. So, we built a proof of concept that incorporated the following elements:&lt;/p&gt;

&lt;h2&gt;
  
  
  Serverless framework
&lt;/h2&gt;

&lt;p&gt;From a delivery point of view, we decided to use the &lt;a href="https://www.serverless.com/"&gt;Serverless framework&lt;/a&gt; to provide a common approach to provisioning and deploying our Cloudflare and AWS resources, both locally and from our CI/CD processes.&lt;/p&gt;

&lt;p&gt;The AWS provider in the Serverless framework is an abstraction over CloudFormation and other AWS service APIs, and the Cloudflare provider is an abstraction over the Cloudflare APIs:&lt;/p&gt;

&lt;p&gt;&lt;a href="//images.ctfassets.net/h2qjuv93yepw/4ODwSApyCgu74caAw5HEev/7b9aae8b34769e7a0410cdffd9d1c0f3/Serverless_framework.png" class="article-body-image-wrapper"&gt;&lt;img src="//images.ctfassets.net/h2qjuv93yepw/4ODwSApyCgu74caAw5HEev/7b9aae8b34769e7a0410cdffd9d1c0f3/Serverless_framework.png" alt="Illustration of multi-provider architecture in the Serverless framework"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The plugin model for the Serverless framework allows you to augment/extend the capabilities of each provider where there are gaps in the framework, or if you want to provide custom functionality:&lt;/p&gt;

&lt;p&gt;&lt;a href="//images.ctfassets.net/h2qjuv93yepw/1l2ggaxjfIcu3Na7mHhrQZ/861083b948f723411ada404fa099327f/Serverless_framework_plugins.png" class="article-body-image-wrapper"&gt;&lt;img src="//images.ctfassets.net/h2qjuv93yepw/1l2ggaxjfIcu3Na7mHhrQZ/861083b948f723411ada404fa099327f/Serverless_framework_plugins.png" alt="Illustration of plugin augmentation architecture in the Serverless framework"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For instance, we wrote a plugin that would hydrate KV (Cloudflare’s key/value data store) with data such as signing certificates and reference data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Blue/Green deployments
&lt;/h2&gt;

&lt;p&gt;While exploring Cloudflare Workers, the simplicity of the routing capability struck us as a great way to flexibly and quickly change the code that would run for requests to a given endpoint. The idea was to use this flexibility to enable blue/green deployments for our API by using state embedded in a naming convention of the Workers and dynamically update the Worker route mappings at the point of deployment.&lt;/p&gt;

&lt;p&gt;By creating a Serverless plugin we could hook into the &lt;strong&gt;before:deploy&lt;/strong&gt; hook to inspect the current Worker route mappings and determine the current slot, and then pre-process the template to configure it for deployment to the next slot. We could do the same for the &lt;strong&gt;before:remove&lt;/strong&gt; hook to ensure the correct resources were removed when required.&lt;/p&gt;

&lt;p&gt;In addition to those hooks, we could create plugin commands that are actionable from the Serverless CLI to activate and rotate slots by calling the appropriate Cloudflare APIs.&lt;/p&gt;

&lt;p&gt;Those plugin commands would be available locally and in CI/CD processes, so the rotate slot command could be executed at the end of a Continuous Deployment process, or via an approval trigger after a Continuous Delivery process.&lt;/p&gt;

&lt;p&gt;Watch a demo of blue/green deployments using the Serverless framework:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/s5fzAPlbheQ"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;You can read more about blue/green deployments with the Serverless framework and details on accessing the code in the &lt;a href="https://blog.peasey.co.uk/blog/blue-green-deployments-for-cloudflare-workers"&gt;blog post&lt;/a&gt; on the subject.&lt;/p&gt;

&lt;h2&gt;
  
  
  Node.js and VS Code
&lt;/h2&gt;

&lt;p&gt;The dev command in the wrangler CLI enables you to send HTTP requests to an instance of your Worker running locally, but to be honest we didn't find the mapping of Workers to scripts and routes in the required wrangler.toml file to be as intuitive, flexible or extensible as it is with the Serverless framework. We also struggled to find a way to easily launch (i.e. hit F5) into a debugging session with VS Code when using wrangler.&lt;/p&gt;

&lt;p&gt;Since we preferred the Serverless framework for provisioning and deploying anyway, we decided to design a development experience that would allow us to use VS Code and Node.js to build and debug our API without using wrangler.&lt;/p&gt;

&lt;p&gt;To do that we embedded the principles of &lt;strong&gt;substitutable dependencies&lt;/strong&gt; and &lt;strong&gt;substitutable execution context&lt;/strong&gt; into our design.&lt;/p&gt;

&lt;p&gt;Substitutable dependencies is an inversion of control technique that involves identifying the specific runtime features you depend on in a given execution context (Cloudflare Workers) that may require an alternative implementation in another execution context (Node.js), and making sure you have a mechanism for substituting those dependencies (a form of dependency injection). An example is environment variables: in Node.js you access process.env, while in Cloudflare Workers they are accessible in the global scope.&lt;/p&gt;
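
&lt;p&gt;As a simple illustration of the principle (the function and variable names are just for this example):&lt;/p&gt;

```javascript
// Illustrative sketch of a substitutable configuration dependency:
// passing the execution context's global scope in lets the same code
// resolve settings from process.env in Node.js, or from global-scope
// bindings in a Cloudflare Worker - and makes it trivially mockable.
function createConfig(scope) {
  return {
    get(name) {
      // Node.js execution context: environment variables on process.env
      if (scope.process?.env) return scope.process.env[name]
      // Cloudflare Workers execution context: bindings are globals
      return scope[name]
    },
  }
}

// In either context, the caller just passes its own global scope in.
const config = createConfig(globalThis)
```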

&lt;p&gt;Substitutable execution context follows on from the principle of substitutable dependencies and is the principle that your code should be appropriately encapsulated so that it is runnable in any execution context, with minimal integration to acquire input and generate output. Practically speaking this involves identifying the entry and exit points of your execution context and ensuring as much of your code as possible is contained within portable abstractions. This allows you to test most of your application code irrespective of the target execution context, and for those thin layers of integration, you can use appropriate mocks and integration tests at appropriate points in your delivery pipeline.&lt;/p&gt;

&lt;p&gt;With appropriate abstractions in place for configuration etc and a substitution mechanism that took advantage of the global scope used in Cloudflare Workers, we were able to easily run and test our API resources locally in Node.js. Since we were able to run in a Node.js process, this meant we could create a debug launch configuration in VS Code that allowed us to easily debug via the debugging tools or by hitting F5.&lt;/p&gt;

&lt;p&gt;Watch a demo of Worker debugging in VS Code:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/sYa_4xVDo6U"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Logical environments
&lt;/h2&gt;

&lt;p&gt;The approach above enabled us to iterate quickly while working locally, but we wanted a way to test the integration of our code into Cloudflare Workers while working locally before committing to the shared repo. When we do commit to the shared repo, we want to have CI/CD processes running on our commits and pull requests (PRs) that can deploy our Workers and run integration tests. Having a separate Cloudflare account per developer and CI/CD process isn't feasible, especially when premium features are required, and we share resources such as DNS records/TLS certs.&lt;/p&gt;

&lt;p&gt;Enter the logical environment. This is a concept that allows multiple deployments of the same resources to exist in the same physical environment. The concept follows the blue/green deployments approach where an environment label forms part of the naming convention for the routes and Worker scripts and is dynamically embedded at the point of deployment. We modified the Serverless plugin to include the concept of an environment.&lt;/p&gt;

&lt;p&gt;Practically speaking this means that each engineer can have a private local environment file (.env) that contains an environment identifier specific to them, which ensures any resources they deploy are uniquely namespaced to them. Likewise, CI/CD processes can set the environment identifier appropriately to create resources for specific purposes, and then remove them at the end of a lifecycle (such as closing/merging a PR).&lt;/p&gt;
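
&lt;p&gt;Illustratively (this naming convention is a simplified example, not our exact scheme):&lt;/p&gt;

```javascript
// Simplified example of logical-environment namespacing: an
// environment label (from a developer's .env file, or set by CI/CD,
// e.g. per pull request) is embedded in the Worker name so multiple
// deployments can coexist in one Cloudflare account.
function namespaced(baseName, env) {
  return env === 'production' ? baseName : `${baseName}-${env}`
}
```

&lt;p&gt;The same label would be applied to the Worker routes, so each logical environment is fully addressable and removable on its own.&lt;/p&gt;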

&lt;p&gt;Watch a demo of a logical environment being used for local development:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/3rEdbQ64x7w"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Watch a demo of a logical environment being used for a GitHub Pull Request review:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/3UNdBhz7Kqo"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;You can read more on using Node.js, VS Code and logical environments and accessing the code in the &lt;a href="https://blog.peasey.co.uk/blog/enhancing-the-development-experience-for-cloudflare-workers"&gt;blog post&lt;/a&gt; on the subject.&lt;/p&gt;

&lt;h2&gt;
  
  
  Routing and Middleware
&lt;/h2&gt;

&lt;p&gt;While the simplicity of the Workers routing is great for enabling use cases like zero-downtime deployments, it’s not great for mapping HTTP requests to API endpoints – but then, Cloudflare Workers weren’t designed to be an API gateway.&lt;/p&gt;

&lt;p&gt;The solution is not so different from how you might do it in other execution contexts, such as containers if you aren’t using an API gateway - middleware.&lt;/p&gt;

&lt;p&gt;We considered the feasibility of running existing middleware frameworks like Express in a Worker, but they’re too dependent on the Node.js runtime and/or would require extensive customisation/adaptation, and would be unlikely to fit within the 1MB script size limit.&lt;/p&gt;

&lt;p&gt;Instead, we borrowed concepts such as route matching and found lightweight modules we could integrate and adapt to enable modular asynchronous pipelines to handle different combinations of HTTP methods and paths.&lt;/p&gt;

&lt;p&gt;Watch a demo of middleware with authorisation and validation middleware responding accordingly:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/L5QQS8yVuI0"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;You can read more on the middleware architecture and accessing the code in the &lt;a href="https://blog.peasey.co.uk/blog/a-middleware-architecture-for-cloudflare-workers"&gt;blog post&lt;/a&gt; on the subject.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS CloudWatch Logs and Metrics
&lt;/h2&gt;

&lt;p&gt;Since part of our solution was going to be in AWS anyway, we decided that CloudWatch would be a good option for observability. There’s an impedance mismatch between a global solution like Cloudflare Workers and regional solutions in AWS, but the cross-region reporting capabilities of CloudWatch gave us confidence we could have a global solution to observability if we implemented failure detection and multi-region capabilities in our Workers (although we only implemented a single region for the proof of concept).&lt;/p&gt;

&lt;p&gt;There were three options for integrating with AWS CloudWatch (these are also relevant for other AWS services):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Direct from Cloudflare Workers to AWS Service APIs, but this required implementing the AWS v4 request signing process with CPU intensive crypto functions.&lt;/li&gt;
&lt;li&gt;Via API Gateway, a Lambda function and the AWS SDK, but the cost of running Lambda was orders of magnitude higher than the cost to run the entire API in Cloudflare.&lt;/li&gt;
&lt;li&gt;Via API Gateway but mapped directly to the AWS Service APIs, i.e. no Lambda.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We chose the third option as it offered minimal cost and there was no need for CPU intensive crypto in our Workers, balanced against a little bit of complexity to set up the API Gateway mappings.&lt;/p&gt;

&lt;p&gt;For logs, we wanted the logger to be easily accessible to all code and for log messages to go to standard output regardless of the execution context. When running in Cloudflare, we also wanted the messages to be persisted so they can be flushed to an observability endpoint at the end of the request. We created a logging abstraction that was substitutable to handle those requirements.&lt;/p&gt;

&lt;p&gt;For metrics, we were only interested in creating/seeing them when running in Cloudflare. Most of the metrics could be derived from data in the original request or the response; the exception was duration, for which we needed to track the start and end time of the request. We created a substitutable observability abstraction that encapsulated the steps to create the stream, log messages and metrics.&lt;/p&gt;

&lt;p&gt;The logs and metrics are asynchronously dispatched to the observability endpoint at the end of each Cloudflare Worker request.&lt;/p&gt;
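
&lt;p&gt;As a rough sketch of the logging abstraction (names are illustrative):&lt;/p&gt;

```javascript
// Illustrative sketch of the substitutable logger: messages always go
// to standard output, and when persistence is enabled (i.e. running in
// a Worker) they are also buffered so they can be flushed to the
// observability endpoint at the end of the request.
function createLogger({ persist = false } = {}) {
  const buffer = []
  return {
    log(message) {
      console.log(message)
      if (persist) buffer.push({ time: Date.now(), message })
    },
    // Drains the buffer; a Worker would dispatch the returned entries
    // asynchronously, e.g. via event.waitUntil(), after responding.
    flush() {
      return buffer.splice(0)
    },
  }
}
```

&lt;p&gt;In Node.js the logger is created without persistence, so local development just sees standard output.&lt;/p&gt;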

&lt;p&gt;Watch a demo of observability for Cloudflare Workers using AWS CloudWatch:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/uDSNiT_tYcE"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;You can read more on observability and accessing the code in the &lt;a href="https://blog.peasey.co.uk/blog/api-observability-for-cloudflare-workers"&gt;blog post&lt;/a&gt; on the subject.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion and recommendations
&lt;/h1&gt;

&lt;p&gt;It took a little bit of effort to create an ideal development, delivery and operations experience for using Cloudflare Workers as an API. I think in total we spent 1-2 months exploring and implementing it, and at the end of that, we had a good slice of the API ready to go.&lt;/p&gt;

&lt;p&gt;My recommendation to Cloudflare would be to provide local development tooling that can be decoupled from wrangler and easily integrated into local development and debugging workflows. It would be useful to allow more complex route matching too.&lt;/p&gt;

&lt;p&gt;I love the simplicity of deploying Cloudflare Workers and the use cases they open up. Given their global scale and performance characteristics, I think they’re perfect for so-called “wrapper” APIs, or abstraction layers, that enable you to mitigate vendor lock-in, plug feature gaps and augment the vendor offering, or even provide a short to long term migration strategy from a vendor-based solution to a bespoke one. You could even just use them as a filter layer for authentication, authorisation and validation for other APIs; that would remove a lot of the duplication and deployment trade-offs you get with some other API technologies.&lt;/p&gt;

&lt;p&gt;Edge network serverless computing could be the next big thing, but a major part of that is having global data persistence solutions. Not long after we’d completed our work on this, Cloudflare announced the &lt;a href="https://developers.cloudflare.com/workers/learning/using-durable-objects"&gt;“Durable Objects”&lt;/a&gt; beta - a new way of thinking about persistence, and a step in that direction. There are also services like &lt;a href="https://fauna.com/"&gt;Fauna&lt;/a&gt; emerging to offer solutions in that space. These are exciting times for the way we think about cloud computing. I think the ultimate experience should be to simply deploy code to a cloud service and have it run performantly at scale, near your end-users, without having to choose regions or weigh the trade-offs of multi-region architectures. That's the dream, and I don't think we're very far away.&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>javascript</category>
      <category>webdev</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
