Timothy Ng

Posted on Dec 19, 2020

50% Faster Testing with Mocha's Parallel Mode

#node #testing #webdev #javascript

Hey all! I originally published this post to LeaseLock's Engineering Blog, but I wanted to share it with the community here as well. In this post we

This article references features of the Mocha testing library available from v8.2.0 onwards.

At LeaseLock, we take pride in our codebase's ~93% test coverage. Despite being a small team, we rarely introduce new functionality without accompanying tests - this rule has served us well by keeping us away from silly mistakes. At the time of writing, we have just over 3,000 test cases in our test suite powered by Mocha and Chai.

A Good Problem to Have

While most of our tests are rapid-fire unit tests, there are a significant number of integration and end-to-end tests that hit our test database. As one would expect, these I/O bound tests significantly slow down the overall runtime of our tests.

From start to finish, our test suite takes about 2 minutes to run, give or take a few seconds depending on hardware. It's not terrible, but it will quickly become a problem in our high-growth environment as we bring on more engineers and build out new features.

A relevant xkcd, except we'd be saying, "My tests are running." (source)

Acknowledging that our test suite was only going to get slower, we looked to Mocha's v8 major release, which introduced parallel mode by utilizing worker pools.

Just Add the `--parallel` Flag

If only it were that easy.

By running our tests serially, we were able to make the nice assumption that exactly one test case was accessing the database at a given moment.

With multiple worker processes chipping away at our test suite, contention between two or more test cases for the same database table is bound to happen.

In parallel mode, we faced the challenge of preserving the aforementioned one-connection-at-a-time guarantee.

What are the chances that multiple tests compete for the same database table at the same time? (Hint: Pretty likely.)

Concurrency Woes

Core to arriving at our solution was understanding a few things about Mocha's parallel mode:

We can control the number of worker processes that Mocha spawns via the --jobs flag. Without this flag, Mocha defaults to `(num CPU cores - 1)`.
Each worker process is a Node child_process.
Workers run test suites file-by-file, but the order in which files get processed - and by which worker - is arbitrary. (In other words, each test file must run successfully in isolation.)
Mocha's lifecycle hooks can be used to bootstrap our test environment. We can use global fixtures to run setup and teardown exactly once. On the other hand, we can use root hook plugins to run beforeAll before each test file. (Note: the behavior of root hooks varies between parallel and serial modes, but for this article, we are only concerned with the parallel case.)
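
As a rough sketch of how these options fit together, a `.mocharc.js` could look something like the following. The two file paths under `require` are hypothetical placeholders, and the `jobs` value simply spells out the default described above (floored at 1 here as a safety assumption).

```javascript
// .mocharc.js: a minimal sketch, not our actual configuration.
const os = require('os');

const config = {
  // Run test files across a pool of worker processes.
  parallel: true,
  // Mirrors the documented default of (CPU cores - 1); the floor of 1
  // is an assumption added for single-core machines.
  jobs: Math.max(os.cpus().length - 1, 1),
  // Hypothetical paths: one file exporting global fixtures,
  // one exporting root hooks.
  require: ['./test/global-fixtures.js', './test/root-hooks.js'],
};

module.exports = config;
```

The same settings can be passed on the command line with `--parallel`, `--jobs`, and `--require`.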

With these points in mind, we concluded that we could assign a dedicated database to each worker process.

The idea was simple: for each worker that Mocha spawns, we'd want to create a copy of the test database that only that worker should connect to. With this design, we'd prevent contention between multiple worker processes by eliminating concurrent access to the same test database.

Since each worker runs tests serially, having a dedicated database for each worker removes the issue of concurrent access to the test database.

From here, all we had to do was find the right places to bootstrap the databases. A few questions stood out when we first approached this solution:

How would we bootstrap database copies? Do we have to run our migrations on each database we spin up?
How can we force the tests in a worker process to connect to the worker's dedicated database copy?

The Brewing Method

The Mocha library provides hooks into its lifecycle in the form of global fixtures and root hook plugins. We used these hooks to bootstrap our test databases in the appropriate stages of Mocha's lifecycle.

Using global fixtures, whose mochaGlobalSetup and mochaGlobalTeardown functions are guaranteed to fire exactly once per run, we do two things: 1) spin up a Docker container running the Postgres engine, and 2) create a template database that can be copied for each worker process.

Having the Postgres databases in a Docker container provides a nice ephemeral environment - perfect for ensuring a clean slate between test runs.

To save us from having to run our schema migrations every time we spin up a database for a worker process, we create a template database so we can simply run createdb --template my_template test_db_1 to stand up a new database with the most up-to-date schema.
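
To make the template copy concrete, here is a minimal sketch of wrapping that `createdb` call in Node. The function names are made up for illustration, and it assumes connection details are supplied through the standard libpq environment variables (`PGHOST`, `PGPORT`, `PGUSER`, and so on).

```javascript
const { execFileSync } = require('child_process');

// Build the argument list for `createdb`, copying an existing template.
// Both names are supplied by the caller; nothing here is specific to
// our codebase.
function createdbArgs(templateName, dbName) {
  return ['--template', templateName, dbName];
}

// Run `createdb` with those arguments. Assumes the `createdb` binary is
// on PATH and that PGHOST/PGPORT/PGUSER point at the test container.
function createDatabaseFromTemplate(templateName, dbName) {
  execFileSync('createdb', createdbArgs(templateName, dbName), {
    stdio: 'inherit',
  });
}

module.exports = { createdbArgs, createDatabaseFromTemplate };
```

Calling `createDatabaseFromTemplate('my_template', 'test_db_1')` would run the same command shown above. One caveat worth knowing: Postgres refuses to copy a template database while other sessions are connected to it, so any connection used to run migrations should be closed first.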

Our global fixtures file, passed to Mocha via --require, looked roughly like this:
https://gist.github.com/timorthi/13228a9ec10de4f9bbe486c0c864c7ba

Great! Now that we had a database engine running during our tests, we had to actually create the databases for each worker process.

Our problems were two-fold:

First, our codebase relies on environment variables to fetch database connections. We needed to ensure that the worker process started up with the correct environment variables to connect to its dedicated database.

Second, there aren't any hooks for when a worker process is spawned by Mocha. We needed a way to create the worker's dedicated database exactly once per worker, but had no Mocha hook to do so.

These issues are closely intertwined. If we can't hook into the worker-spawning process, how can we provide each worker process with the correct environment, or spin up its database efficiently?

A Blank Slate Each Time

Mocha creates child processes with the workerpool library which sits atop the child_process module. At the end of the day, each new Mocha worker is just a fork() call.

Workers share no state with one another or with their parent, so each can be manipulated freely without worrying about contaminating other environments.

A child process's memory space is isolated from sibling and parent Node processes. This takes care of both the aforementioned problems. First, regarding the environment variables, we can safely edit the process.env property within a worker. Second, we can manipulate the global state within our code to maintain a flag on whether a database for a given worker process had already been created.
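
The environment isolation is easy to check outside of Mocha. The sketch below uses `spawnSync` to start a plain Node child as a stand-in for a Mocha worker (it is not how Mocha itself spawns workers), and `DEMO_DB_NAME` is a hypothetical variable name used only for this demonstration.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical variable name, used only for this demonstration.
const VAR = 'DEMO_DB_NAME';
delete process.env[VAR];

// The child sets the variable in its own process.env and prints it.
const child = spawnSync(
  process.execPath,
  ['-e', `process.env.${VAR} = 'test_db_child'; console.log(process.env.${VAR});`],
  { encoding: 'utf8' }
);

const seenByChild = child.stdout.trim(); // value inside the child
const seenByParent = process.env[VAR];   // the parent's own copy

console.log({ seenByChild, seenByParent });
```

The child sees the value it wrote, while the parent's `process.env` is left untouched, which is the property the per-worker setup relies on.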

We opted to use the pid as the unique identifier for each database copy and wrote a hooks file around it, also passed to Mocha via --require.

Future Hours Saved

With this setup, we are now able to run our full test suite in parallel.

With some tuning of the number of workers - 4 seems to be a good number for our team's hardware - we've seen anywhere from a 30% to 60% improvement in overall runtime, saving us precious minutes daily in our development loop. An added benefit is that our CI build times are down too!

In addition to the initial gains in performance, we're excited to see what happens as we increase the number of test suites in our codebase. In theory, if we run Mocha with a parallelism of N, it would take N new test files for the runtime to increase as much as 1 new test file would in serial mode.
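
To put rough numbers on that intuition, here is a small idealized model. It assumes every test file takes the same amount of time and ignores worker startup cost, so it is an upper bound on the benefit rather than a prediction.

```javascript
// Idealized wall-clock time: each file goes to whichever worker
// frees up first. Ignores startup cost and uneven file durations.
function wallClock(fileDurations, workers) {
  const busyUntil = new Array(workers).fill(0);
  for (const duration of fileDurations) {
    // Pick the worker that becomes free earliest.
    const next = busyUntil.indexOf(Math.min(...busyUntil));
    busyUntil[next] += duration;
  }
  return Math.max(...busyUntil);
}

// `count` files, each taking one unit of time.
const files = (count) => new Array(count).fill(1);

console.log('serial:   ', wallClock(files(8), 1), wallClock(files(16), 1));
console.log('4 workers:', wallClock(files(8), 4), wallClock(files(16), 4));
```

Under these assumptions, adding 8 files costs 8 extra units of time in serial mode but only 2 with 4 workers.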

In a perfect world…

We've kept things simple here at LeaseLock, so the only data store that our tests interact with is the Postgres database. As the codebase grows, we'll inevitably add more data stores or external services that need to be tested end-to-end. When that happens, we'll be sure to take our learnings from this iteration of test parallelization and apply them as needed.

If you're interested in tackling problems like this with us, visit our careers page for information about available roles. If you don't see the role you're looking for, you can also reach out to us directly at talent@leaselock.com.

Top comments (2)

Olivier Guimbal • Dec 20 '20

For your use case, I'd suggest to have a look at this lib I recently released 😇, which kinda aims to solve your problem without needing parallel execution. Nor docker. Nor an actual database (for what it's worth: we're running around a thousand integration test - which I think is about 30k requests- in less than a minute using this lib).

I wrote about it here.

Timothy Ng • Dec 20 '20

This looks pretty neat, thanks for sharing!
