If you are writing a moderately complex web application, you will eventually run into changes in one place that have unintended consequences elsewhere in the application. These changes are inevitable as an application ages, and unit testing will not save you. Tests that exercise the full application’s web of dependencies are the best path to assuring success. In addition, as the application evolves, these tests need to be easy to change, and they should not fail when irrelevant implementation details change under the hood.
In my most recent employment at Citrine Informatics, we adopted Cypress (https://cypress.io) as our testing framework for both integration and end-to-end testing. There’s no question: It transformed our working dynamic. Our certainty that we are building the right thing, and our certainty that it will work, both went way up. Along the way, we learned a few subtle tricks to keep Cypress stable in both local and CI environments. We also learned how powerful the right testing approach can be in steering product development toward an optimal user experience. All of this is possible with minimal disruption of developer work to craft the tests themselves, and that is where Cypress shines compared to other testing approaches.
Cypress has a wealth of plugins, and a commercial dashboard that makes it easy to run tests in parallel and inspect results in real time. It takes a screenshot by default on test failure, which is something that has to be manually configured for Puppeteer and friends.
Prior to using Cypress, we at Citrine did not yet have an end-to-end test suite, as the web interface to our platform was brand new. We did have some Jest unit tests, and toyed briefly with a react-testing-library/nock-based framework for mocking out a server as a custom integration test framework. Long story short: don’t do this. It’s theoretically possible, but a nightmare to debug when something fails to work. Instead, write integration tests in an environment where you can see the app as the test runs!
In the 9 months since adopting Cypress, we have learned a ton, and our testing suite has evolved to a mature stage where our tests are now remarkably stable in both an end-to-end test environment against a live server, and an integration test environment using a mocked-out server. Writing new tests for features, or modifying existing tests for changes to existing features is fast, and supports an agile iteration that includes input from product, design and developers.
When we first adopted Cypress, we tended to call its built-in selection and assertion functionality directly inside each test.
Soon after, QA guru Jeff Nyman (check out his extensive blog on testing at https://testerstories.com/author/Administrator/) recommended we take a look at using “page objects” to abstract out the elements on a page. Our first attempts looked like:
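The original snippet is not preserved here. As a rough sketch of the page-object idea, assuming a hypothetical login page with `data-test` selectors, a page object might look something like the following. The `CyLike` and `ChainLike` interfaces are stand-ins invented so the example runs without Cypress; they only mirror the shape of the `cy.visit`, `cy.get`, `.type`, `.click`, and `.should` commands.

```typescript
// Hypothetical stand-ins for the small part of the Cypress command surface
// used below, so that this sketch type-checks and runs without Cypress.
interface ChainLike {
  click(): ChainLike;
  type(text: string): ChainLike;
  should(assertion: string, value?: string): ChainLike;
}

interface CyLike {
  visit(url: string): unknown;
  get(selector: string): ChainLike;
}

// Page object: the only place that knows the selectors for the login page.
// The route and the data-test selectors are invented for illustration.
class LoginPage {
  private readonly cy: CyLike;

  constructor(cy: CyLike) {
    this.cy = cy;
  }

  open(): this {
    this.cy.visit("/login");
    return this;
  }

  logInAs(email: string, password: string): this {
    this.cy.get("[data-test=email]").type(email);
    this.cy.get("[data-test=password]").type(password);
    this.cy.get("[data-test=submit]").click();
    return this;
  }

  expectError(message: string): this {
    this.cy.get("[data-test=login-error]").should("contain", message);
    return this;
  }
}
```

A test then reads as `new LoginPage(cy).open().logInAs(email, password)`, and a selector change touches one class instead of every test. A real page object would more likely call Cypress’s global `cy` directly than receive it as a parameter; the injection here only keeps the sketch self-contained.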
This worked pretty well for us. However, Jeff was gentle, but persistent: things could work better. At this point, our requirements were loosely spelled out in Jira tickets, and our tests were basically hidden from the product team, as something that we coded on our own. Once a ticket was closed, the requirements would disappear into the vacuum of things-you-can’t-find-in-Jira-by-searching-for-them. If something seemed weird in the app, there was no single place to point to that said “this is how it should work.” Directly pinging someone to see if they knew the answer was the best way to get this info, and occasionally, two different people would give opposing answers.
As a developer, this is frustrating. As a company, this is downright dangerous: your customers will definitely notice if you listen to the wrong person and “fix” expected behavior!
At this point, Jeff’s constant refrain of “eventually, we’ll have executable feature specs” began to make sense. Writing vague requirements in a Jira ticket often sent developers back to the beginning to address a necessary requirement that only became clear once the feature was finished. There was a better way. We could write our specs in a clear format, one clear enough that it could serve both as requirements and as the inputs used to run automated tests. The language would allow the spec to be run manually (a person reading the spec and doing what it says) or automatically by a testing framework.
We chose to implement this by porting Jeff’s Testable framework into TypeScript, and by adapting Cypress to use the cypress-cucumber-preprocessor plugin to directly run feature specifications written in the Gherkin dialect as tests. Since then, we have gradually migrated our existing tests over to this new format, and written several new tests as new features have been built.
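To make “executable feature spec” concrete, here is a toy sketch of the underlying idea: plain-language steps are matched against patterns, and the matching function runs. This is not how cypress-cucumber-preprocessor or Testable is implemented, and the scenario wording, step patterns, and function names are all invented for illustration.

```typescript
type StepFn = (...args: string[]) => void;

interface StepDef {
  pattern: RegExp; // matched against the step text after the Gherkin keyword
  run: StepFn;     // receives the pattern's capture groups
}

// Hypothetical feature text; the wording is invented for this example.
const featureText = `
Feature: Dataset search
  Scenario: Searching by name
    Given the user is on the datasets page
    When the user searches for "steel"
    Then 2 results are shown
`;

// Runs every Given/When/Then line against the first matching step definition
// and returns the step texts that were executed, in order.
function runFeature(text: string, steps: StepDef[]): string[] {
  const executed: string[] = [];
  for (const rawLine of text.split("\n")) {
    const keyword = /^(Given|When|Then|And|But)\s+(.+)$/.exec(rawLine.trim());
    if (keyword === null) continue; // Feature:, Scenario:, and blank lines
    const stepText = keyword[2] ?? "";
    let ran = false;
    for (const step of steps) {
      const groups = step.pattern.exec(stepText);
      if (groups === null) continue;
      step.run(...groups.slice(1));
      ran = true;
      break;
    }
    if (!ran) throw new Error(`No step definition matches: ${stepText}`);
    executed.push(stepText);
  }
  return executed;
}

// Step definitions record what they would do instead of driving a browser.
const actions: string[] = [];
const stepDefs: StepDef[] = [
  { pattern: /^the user is on the (\w+) page$/, run: (page) => { actions.push(`visit ${page}`); } },
  { pattern: /^the user searches for "([^"]*)"$/, run: (term) => { actions.push(`search ${term}`); } },
  { pattern: /^(\d+) results are shown$/, run: (count) => { actions.push(`expect ${Number(count)} results`); } },
];

const executedSteps = runFeature(featureText, stepDefs);
console.log(executedSteps.length, actions);
```

The same feature text can be read by a person as a manual test script, which is the property that lets it double as the requirements document.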
I’m not going to lie: setting up a testing framework with Cypress that is both easy to develop locally and easy to run on continuous integration was really difficult. First, we had to figure out how to coax Cypress to work in a CI environment. We use Jenkins, but the same issues would apply to Travis or CircleCI. Our app runs on an Alpine Linux container. Alpine can’t run Electron effectively, so we couldn’t just install Cypress inside of our app. Additionally, porting our app to run inside a pre-built Cypress container was not leading us to the happy place, as the Alpine extensions we need do not map 1:1 to the containers Cypress runs in.
Ultimately, the solution that works is to take advantage of package.json’s optionalDependencies field. By placing Cypress and all of its extensions in optional dependencies, we can use this with a simple shell script to extract the optional dependencies and make a custom package.json containing only them. When using the app locally, we can install Cypress as well as the app and development dependencies with:
yarn install --frozen-lockfile
(npm ci is the npm equivalent)
In CI, we can build the app with:
yarn install --frozen-lockfile --ignore-optional
(npm ci --no-optional is the npm equivalent)
and then we can use our custom package.json to copy over our Cypress tests and install the extensions we need inside the extended Cypress container.
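The extraction step can be sketched as follows. Our version was a shell script; this TypeScript function shows the same transformation on an in-memory manifest. The function name, the sample package names and versions, and the choice to list the extracted packages under `dependencies` in the generated file are all assumptions made for illustration.

```typescript
interface PackageJson {
  name?: string;
  version?: string;
  private?: boolean;
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
  optionalDependencies?: Record<string, string>;
}

// Builds a manifest that contains only the app's optional dependencies.
// Assumption: they are listed under "dependencies" so that a plain install
// inside the Cypress container picks them up.
function cypressOnlyManifest(pkg: PackageJson): PackageJson {
  return {
    name: `${pkg.name ?? "app"}-cypress`,
    version: pkg.version ?? "0.0.0",
    private: true,
    dependencies: { ...(pkg.optionalDependencies ?? {}) },
  };
}

// Hypothetical app manifest; names and versions are placeholders.
const appManifest: PackageJson = {
  name: "my-app",
  version: "1.2.3",
  dependencies: { react: "^16.13.0" },
  devDependencies: { jest: "^25.1.0" },
  optionalDependencies: {
    cypress: "^4.0.0",
    "cypress-cucumber-preprocessor": "^2.0.0",
  },
};

const cypressManifest = cypressOnlyManifest(appManifest);
console.log(JSON.stringify(cypressManifest, null, 2));
```

Writing the result to a separate directory keeps the app image free of Cypress, while the test image installs only what the tests need.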
Additionally, to get the two containers to communicate with one another, we used docker run to run the app and Cypress in separate containers that share the same network. Recently, we switched to a docker-compose environment that allows us to run all of the containers in parallel without needing to use Jenkins scripts.
With this basic insight, the stability of the tests improved dramatically. We still had some flake, however, and addressed it with these changes (now obvious in retrospect):
- Don’t record Cypress videos, only store 2 test runs in memory, and turn off Cypress watching for test file changes in CI.
- Increase the memory size available to Cypress using NODE_OPTIONS=--max-old-space-size=4096 as a prefix to the cypress run command.
- Run the application in a uniquely named Docker container (use the CI build number as a suffix to the app name)
- Run both the application and the Cypress container in a uniquely named network (use the CI build number as a suffix to the app name)
- In CI, set CYPRESS_BASE_URL to the unique container name (https://app123:8080 for Jenkins build number 123, for example)
- Set Docker’s IPC mode to host so that Cypress can use the host’s shared memory (https://docs.cypress.io/guides/guides/continuous-integration.html#In-Docker)
- Don’t start Cypress until the webpack build has actually completed
- Fix the webpack build to never rely upon hot reload or file system watching
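Several of the items above come down to deriving per-build names and settings. A minimal sketch of that derivation follows, assuming a hypothetical `planCiRun` helper; the `-net` infix in the network name is an assumption, while the base URL format, the memory flag, and the config overrides follow the values given in this article.

```typescript
interface CiRun {
  container: string;
  network: string;
  env: Record<string, string>;
  cypressArgs: string[];
}

// Overrides applied only in CI; local runs keep the config file defaults.
const ciConfigOverrides = {
  video: false,
  watchForFileChanges: false,
  numTestsKeptInMemory: 2,
};

// Derives unique names and settings from the CI build number so that
// concurrent builds on the same host cannot collide.
function planCiRun(appName: string, buildNumber: number, port: number): CiRun {
  const container = `${appName}${buildNumber}`;
  const configFlag =
    "--config=" +
    Object.entries(ciConfigOverrides)
      .map(([key, value]) => `${key}=${value}`)
      .join(",");
  return {
    container,
    network: `${appName}-net${buildNumber}`, // naming scheme is an assumption
    env: {
      CYPRESS_BASE_URL: `https://${container}:${port}`,
      NODE_OPTIONS: "--max-old-space-size=4096",
    },
    cypressArgs: ["run", configFlag],
  };
}

const ciRun = planCiRun("app", 123, 8080);
console.log(ciRun);
```

The returned values would then be passed to whatever starts the containers, whether that is a Jenkins script or a docker-compose file.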
The webpack issues took us MONTHS to fully figure out, because 95+% of the time, the tests worked just fine, and the error messages were very cryptic, often referring to a sockjs endpoint.
The most significant changes to improve flake were to move all mocking out of the app, and out of Cypress, and instead use webpack dev server’s before option to implement a fake server.
Let’s start with the changes to the webpack configuration that reduced flakiness.
First, determining when the app is built required adding a webpack build plugin that sets a flag when the app has finished building.
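A minimal sketch of such a plugin, assuming webpack’s `done` compiler hook and a hypothetical shared `buildStatus` object. The `CompilerLike` and `StatsLike` interfaces are structural stand-ins so the example runs without webpack installed, and treating a build with errors as “not ready” is a choice made here, not something taken from our original plugin.

```typescript
// Structural stand-ins for the parts of webpack's Compiler and Stats used here.
interface StatsLike {
  hasErrors(): boolean;
}

interface CompilerLike {
  hooks: {
    done: { tap(name: string, callback: (stats: StatsLike) => void): void };
  };
}

// Hypothetical shared flag that the dev-server configuration can read later.
const buildStatus = { ready: false };

class BuildReadyPlugin {
  apply(compiler: CompilerLike): void {
    // The "done" hook fires each time a compilation finishes.
    compiler.hooks.done.tap("BuildReadyPlugin", (stats) => {
      buildStatus.ready = !stats.hasErrors();
    });
  }
}
```

In a webpack config this would be registered with `plugins: [new BuildReadyPlugin()]`.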
Then we use this in the webpack dev server before option to enable a health check endpoint.
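A sketch of the health check, assuming a hypothetical `/health` path and a `buildFlag` object that a build-complete plugin flips to `true`. The `before` option receives the dev server’s Express app; `AppLike` and `ResponseLike` are stand-ins so the example runs without Express. Later major versions of webpack-dev-server replaced `before` with differently named options, so check the version you are using.

```typescript
// Structural stand-ins for the Express app and response objects.
interface ResponseLike {
  status(code: number): ResponseLike;
  send(body: string): unknown;
}

interface AppLike {
  get(path: string, handler: (req: unknown, res: ResponseLike) => void): unknown;
}

// Hypothetical flag, assumed to be set to true by a build-complete plugin.
const buildFlag = { ready: false };

// Value for devServer.before: registers GET /health on the dev server's app.
function registerHealthCheck(app: AppLike): void {
  app.get("/health", (_req, res) => {
    if (buildFlag.ready) {
      res.status(200).send("ok");
    } else {
      // 503 tells the caller to keep waiting rather than start the tests.
      res.status(503).send("building");
    }
  });
}

const devServerConfig = { before: registerHealthCheck };
```

Anything that can make an HTTP request can now tell whether the bundle is ready, without parsing webpack’s console output.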
Finally, we can use a small shell script that fits into a single package.json script line to wait for the server.
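Our version was a shell one-liner; the same polling logic is sketched here in TypeScript instead. The function name and parameters are hypothetical, and the probe is injected so that the loop does not depend on any particular HTTP client.

```typescript
// Polls `probe` until it resolves to true, waiting `delayMs` between attempts.
// Resolves with the number of attempts used; rejects if the limit is reached.
async function waitUntilReady(
  probe: () => Promise<boolean>,
  maxAttempts: number,
  delayMs: number,
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // A rejected probe (for example, connection refused) counts as "not ready".
    const ok = await probe().catch(() => false);
    if (ok) return attempt;
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Server not ready after ${maxAttempts} attempts`);
}
```

On Node 18 or later, a probe could be as simple as `() => fetch(healthUrl).then((r) => r.ok)`, where `healthUrl` points at the health check endpoint. Cypress is started only after the returned promise resolves.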
Next, disabling watching/hot reload turned out to be tougher than we expected. Our app uses a webpack vendor build when running in development, and we had to disable it on both the app and the vendor build. Much of this problem went away when we discovered we could easily run the production build of our app using webpack dev server, and still intercept API calls in order to proxy to our fake server.
With these changes, a large majority of test failures disappeared.
When we first enabled Cypress, we enabled recording of screencast videos and upload on failure to the Cypress dashboard. Unfortunately, the test videos tend to drop a minute of frames at a time, which rendered them essentially as massive, multi-minute screenshots. In addition, they could add 5 minutes of run time on each failed test as the video was compressed for upload. They never provided context that a screenshot and local reproduction couldn’t. With the stability improvements listed above, we found it was easier to simply reproduce the test failure locally and rely upon the screenshot to determine what was going wrong.
In CI, we pass these options to Cypress on the command line to override the behavior we prefer locally:
--config=video=false,watchForFileChanges=false,numTestsKeptInMemory=2
Of course, you could opt to make a duplicate configuration file for CI that contains these changes instead, but we found that it was simpler for maintenance to pass in the option above, so that we could have a single configuration file for the other options.
Additionally, when we first began, we tried to enable code coverage, but found that even with Docker volumes set up to write the coverage data outside the running container, we couldn’t get it to successfully write out coverage info in the CI environment. Ultimately, we solved the problem in a different way: instead of relying on a raw metric of lines of code executed, we use our feature specs to determine coverage of critical user paths. The specs either have a test or they don’t, which gives us much more confidence in the coverage of tests than the numbers ever could. Code coverage can’t tell you whether your test is relevant, or whether it is truly testing the feature.
Why would you ever want to mock your API? First, if developing a feature against an API that doesn’t yet exist, you need a way to write code that will work when the production server supports the new API call. Next, when writing new frontend code, you will want to isolate variables: if a test fails, it should only be because of your code, not because of a network glitch contacting a live server. Last, if your live development server is in a broken state, this should not block all of the frontend development. Additionally, with a mock API, you can develop against and robustly test edge cases such as the internet going down mid-request, an object in an error state that rarely happens, etc.
When should you not mock the API? When your goal is to test the interface between the frontend and the API, you should always hit a live endpoint. These tests tend to be slower than the mocked API tests, and should generally be a deployment gate, rather than a pull request gate.
At Citrine, we started out by using a mechanism to automatically record network calls, and then use Cypress’s built-in mocking to serve them up when the test runs. This worked great at first, but we quickly ran into some annoying problems.
- If the test was recorded based on the local development server state (ours were), then whenever anything in this state is modified, the entire test has to be re-recorded. Or worse, the test gets stale, never running against current API data.
- If a single new API request is added to each app run, ALL of the tests must be re-recorded. This introduces required, but irrelevant changes into a pull request.
- As the app grows, there is a lot of duplication in the mocks. At our peak we were storing 91 MB of recorded mock API data. When we moved to a fake server, that same data was representable with 31 MB of storage.
To solve these issues, we use fakes instead of mocks. We wrote a server that reacts to requests the same way our actual API does, but instead of doing real work, it returns sample JSON files we scraped from the API server.
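The core of a fake like this is a routing table that maps requests to stored sample data. The following is a minimal sketch of that idea only; the endpoint paths, fixture keys, and sample records are invented, and a real version would wrap this logic in Express handlers and load the JSON from files on disk.

```typescript
interface FakeResponse {
  status: number;
  body: unknown;
}

interface FakeRoute {
  method: string;
  pattern: RegExp;
  // Maps the pattern's capture groups to a fixture key.
  fixtureKey: (params: string[]) => string;
}

// Returns a handler that answers requests from stored fixtures. Each fixture
// is stored once and can back any number of tests.
function createFakeApi(fixtures: Map<string, unknown>, routes: FakeRoute[]) {
  return (method: string, path: string): FakeResponse => {
    for (const route of routes) {
      if (route.method !== method) continue;
      const match = route.pattern.exec(path);
      if (match === null) continue;
      const key = route.fixtureKey(match.slice(1));
      if (fixtures.has(key)) return { status: 200, body: fixtures.get(key) };
      return { status: 404, body: { error: `No fixture for ${key}` } };
    }
    return { status: 404, body: { error: `No route for ${method} ${path}` } };
  };
}

// Hypothetical fixtures and endpoints, for illustration only.
const sampleFixtures = new Map<string, unknown>([
  ["datasets", [{ id: "1", name: "steel" }]],
  ["datasets/1", { id: "1", name: "steel" }],
]);

const fakeApi = createFakeApi(sampleFixtures, [
  { method: "GET", pattern: /^\/api\/datasets$/, fixtureKey: () => "datasets" },
  { method: "GET", pattern: /^\/api\/datasets\/(\w+)$/, fixtureKey: ([id]) => `datasets/${id}` },
]);

console.log(fakeApi("GET", "/api/datasets/1"));
```

Because responses are computed from the request rather than replayed from a recording, adding one new API call to the app means adding one route, not re-recording every test.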
In our first successful implementation, we used the excellent Mock Service Worker package. This works great locally, but in a CI environment, it was incredibly flaky. We suspect (but were never able to confirm) that the service worker buckled under the weight of 31 MB of faked data. As soon as we moved to a server-side approach, the app got snappy, and our tests became completely stable.
We considered using Mock Service Worker’s node interface, but ultimately this seemed like an extra step - it’s not that hard to write an actual Express server, and this way we could have full access to the server in every environment except Storybook. Since the only reason we would need to access network requests would be to develop pages that make network requests, we decided to limit Storybook to components, and use the fake server for developing new features and tests locally.
The drawbacks to this approach? Extra effort is needed to write a downloader script and the mock server to consume the resources, and some time debugging the mock server. Now that it is working, we have a very stable system for extracting new data, and a very stable system for adding new endpoint functionality.
The last element of Citrine’s testing strategy is a more unusual approach that in retrospect seems obvious. We use the same feature specs as the source for both our end-to-end tests and our integration tests. With the use of the @ui-integration-only tag, we can flag tests that should only run in the integration environment, such as tests that rely upon unusual error states, or ephemeral network failures. Otherwise, the same tests can run either against a live development server or against our fake server.
In this way, we have a system that runs as a pull request gate using the fake server, and the same system runs as a post-deployment gate against the live server.
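The selection rule itself is small. The sketch below only illustrates the logic; in practice the Cucumber plugin’s own tag filtering does this work, and the scenario names and the `scenariosFor` helper here are hypothetical.

```typescript
type TestEnvironment = "integration" | "end-to-end";

interface Scenario {
  name: string;
  tags: string[];
}

const INTEGRATION_ONLY_TAG = "@ui-integration-only";

// Integration runs (fake server) execute everything; end-to-end runs (live
// server) skip scenarios that depend on states only the fake can produce.
function scenariosFor(env: TestEnvironment, scenarios: Scenario[]): Scenario[] {
  if (env === "integration") return scenarios;
  return scenarios.filter((s) => !s.tags.includes(INTEGRATION_ONLY_TAG));
}

// Hypothetical scenarios, for illustration only.
const allScenarios: Scenario[] = [
  { name: "Search datasets by name", tags: [] },
  { name: "Show a retry banner when the network drops", tags: [INTEGRATION_ONLY_TAG] },
];

console.log(scenariosFor("end-to-end", allScenarios).map((s) => s.name));
```

Everything without the tag runs in both environments, so a spec is written once and exercised twice.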
Recently, we had a configuration issue bring down our development server for a few hours. Because our PR gate did not depend on the live server, we were not blocked in the development of features. Our end-to-end tests can catch breaking changes in the backend API or the backend data before they metastasize into real problems.
Writing new tests or reorganizing existing tests is fast and focuses on the way the customer will actually use the app. We have already caught and fixed a few UX problems simply by trying to write a test and discovering it was hard to do it well. In short, Citrine’s feature work is in really good shape.
Citrine is hiring! Come work on their cutting-edge platform for Materials Informatics to see this well-oiled machine from the inside. https://citrine.io