Eugene Kazakov

Posted on Aug 13, 2024

Cypress and Our Testing Pyramid

#cypress #autotest #e2e #aws

Automated tests are expected to be stable and fast. In this article, I will discuss our strategy for optimizing the testing pyramid, why we chose Cypress, the approaches we developed for writing tests, and how we run the tests on AWS infrastructure.

Introduction to Cypress

Setting aside the obvious definition that Cypress is a JavaScript testing framework, the important point is that it always works through a real browser. The browser may run headless rather than on screen, but it is there, and it opens a special Cypress application made up of several frames: the product under test is loaded in one frame, and the tests run in another. The test code is written in JavaScript, so it can execute directly in the browser, where JavaScript is natively supported.

All the interactions performed in the tests, such as filling out forms and clicking, are therefore carried out through the browser's JavaScript API.

Advantages of Cypress

No Selenium WebDriver

The key difference between Cypress and the libraries and frameworks we've used before is the absence of the main component, Selenium.

Selenium WebDriver is a third-party Java-based service that interacts with the browser via the WebDriver protocol. This imposes certain limitations on how the browser can be controlled within the protocol. Network interactions also contribute to the test execution time.

Selenium was originally designed not specifically for testing, but as a general browser automation tool. Cypress, in contrast, is focused on solving a specific task — namely, creating end-to-end (e2e) tests for web application interfaces.

All-in-One

Cypress doesn’t need to be assembled from different parts — it comes with all the modern "batteries included":

BDD syntax (inherited from Mocha): describe(), context(), it().
As well as hooks: before(), beforeEach().
This Domain-Specific Language (DSL) is familiar to those who have already written unit tests in JavaScript.
Assertion library (inherited from Chai). For example:
expect(name).to.not.equal("Jane"). Negative assertions deserve a note: expecting that an element does not exist is not the same as looking for the element and expecting that lookup to fail. If the element is absent, that's good; there is no need to double-check it, and the test should move on.
A good framework handles this for you. In our previous custom library, we had to implement it ourselves.
Intercepting, spying, and mocking browser requests to the backend.

Development Experience

The main advantage of Cypress is its excellent development experience. You can write your first test for your project (regardless of the language the project is written in) in about 10 minutes. All you need to do is add one dependency to your package.json (npm install cypress), read the documentation on where to place the files (cypress/integration/login.spec.js), and write 5 lines of code:

describe('Login', () => {
    it('should log in with credentials', () => {
        cy.visit('/login');
        cy.get('[name=login_name]').type(Cypress.env('login'));
        cy.get('[name=passwd]').type(Cypress.env('password'));
        cy.get('[name=send]').click();
        cy.get('.main-header').should('be.visible');
    });
});

You get an actual test that visits the login page, fills out the form, clicks the button, and checks the result.

In the browser screenshot, you can see that all test steps are logged. But it's not just a log — it's a navigation tool that allows you to return to any point after the test has run and see what was happening in the browser. For example, you can view snapshots before and after an Ajax request.

A nice touch is that every cy.get() ensures the page has loaded and makes several attempts to find the element. Each year, web application interfaces become more complex. The resulting HTML is not generated server-side but in the browser. This is done asynchronously and with various component libraries. It is becoming increasingly difficult to determine exactly when a particular interface element will appear on the screen.
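To make the retry behaviour concrete, here is a minimal sketch of a poll-until-found loop in plain JavaScript. It is not Cypress's actual implementation; retryUntil, its probe callback, and the timeout and interval defaults are names and values assumed for illustration only.

```javascript
// Polls `probe` until it returns a value or the timeout expires.
// Illustrative only: this is not how Cypress is implemented internally.
async function retryUntil(probe, { timeout = 4000, interval = 50 } = {}) {
    const deadline = Date.now() + timeout;
    for (;;) {
        const result = probe();
        if (result !== null && result !== undefined) {
            return result; // found: stop waiting immediately
        }
        if (Date.now() >= deadline) {
            throw new Error(`Timed out after ${timeout} ms`);
        }
        await new Promise((resolve) => setTimeout(resolve, interval));
    }
}

// Usage: a fake "element" that only appears on the third poll.
let polls = 0;
retryUntil(() => (++polls >= 3 ? 'element' : null), { interval: 10 })
    .then((found) => console.log(`${found} found after ${polls} polls`));
```

The wait ends as soon as the probe succeeds, so a fast page costs almost nothing, while a slow one still gets the full timeout before the test fails.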

One of the best practices suggests that you should never write a timeout like "wait for 2 seconds." All timeouts should wait for something tangible, such as the completion of an Ajax request. You can subscribe to an event that occurs in the product's code. For example, when an event from the backend arrives via a WebSocket, a specific listener on the frontend is triggered.

All Cypress documentation and best practices are available on one site, docs.cypress.io. It's worth mentioning the excellent quality of the documentation, along with the master classes that the Cypress development team offers and makes available to the public.

Additionally, one pattern to move away from is PageObject. It was considered essential for a long time, but it's no longer necessary for new tests.

We'll return to our own established best practices a little later, but for now, let's take a break and talk about the testing pyramid, why we're doing all this, and what our goal is.

Testing Pyramid

When discussing the testing pyramid, the anti-pattern of the "inverted pyramid" or the "ice cream cone" is often mentioned. In this scenario, the number of unit tests at the bottom level is close to zero. Personally, I find this situation unlikely for a mature project because it would mean that the developers have entirely abandoned writing the simplest tests — so where did the complex end-to-end (e2e) tests come from then?

In any case, this doesn't apply to us — we have several thousand PHPUnit tests with around 20% code coverage.

At the same time, we also have several thousand Selenium-based e2e tests, which test all possible product configurations. These tests take a lot of time (we managed to optimize the subset that runs on every commit down to 40-60 minutes), have rather low reliability (there is a 30-40% chance that a run fails even though the commit did not cause the failure), and cover about 30% of the code.

So, our situation resembles an hourglass — we lack a middle layer in testing, where integration tests verify components of the system independently of each other. This narrow "neck" of the hourglass is what we want to fill using Cypress. Additionally, we want to address the existing e2e tests to "sharpen" the top of the pyramid. The key point here is that Cypress is not a replacement for the old framework; we don't want to simply rewrite all the tests using Cypress — otherwise, we'll remain stuck with the "ice cream cone" structure. The goal of the tests remains to check for regression in the product, but to do so at a different level, enabling faster execution, earlier results, and easier maintenance.

Our Approach to Writing Tests

The project in question is the Plesk control panel, which provides users with an interface for managing website hosting. The panel's functionality is accessible not only through the UI but also via API and CLI, which are used for automation.

We started with the following assumptions:

Cypress tests are purely UI-focused: we do not include tests where steps are executed through the API or CLI.
No additional validation beyond UI: for example, if we are testing domain creation, we do not send requests to verify the web server or DNS. We consider the test successful if a green message in the UI indicates that the domain was created successfully. This approach helps us avoid setting up complicated prerequisites and writing extensive test scenarios.
Automate only positive scenarios at the initial stage: negative scenarios do not provide value to the customer but take up valuable time for test execution. Therefore, we shift these scenarios to the lower part of the pyramid — they are generally easy to check with unit tests.

Our experience with Cypress, combined with the official recommendations, has led us to adopt the following set of practices:

Reset Product State

We reset the product state to its original condition before running each test suite (Cypress recommends doing this before each test, but we use a simplified version). We create a database dump and restore it before running each test suite/spec. This takes approximately 5 seconds.

before(cy.resetInstance);
//  => test_helper --reset-instance
//       => cat /var/lib/psa/dumps/snapshot.sql | mysql

Use Fixtures

Instead of using real objects as test prerequisites, we use fixtures — saved structures that contain the required database state. For example, to perform certain tests, a domain must be present. Instead of creating a real domain, we recreate all the necessary records in the database without touching the file system or other system services. This takes less than a second (for comparison, creating a full domain would take about 30 seconds).

cy.setupData(subscription).as('subscription');
//  => test_helper --setup-data < {domains: [{ id: 1, name: "example.com" }]}

Such objects cannot support complete user scenarios, but they are sufficient for UI testing.

Use Direct URLs

Instead of navigating, we access the necessary UI locations through direct URLs. We call our custom login command to create a session and then navigate directly to the desired page.

beforeEach(() => {
    cy.login();
    cy.visit('/admin/my-profile/');
});

In our old framework, we would use PageObject to log into the main menu and then navigate from there to the required element. Here, this is not needed since we are testing only the specific page. The only duplication is the login command, but this does not seem to be an issue.

Frontend Without Backend

Sometimes it is difficult to set up conditions for a specific state that we want to test. For example, testing available updates is much easier by providing a prepared (fake) response for an Ajax request than by setting up the infrastructure for updates.

const lastChecked = 'Jan 29, 2021 04:42 PM';
cy.intercept('POST', '/admin/home/check-for-updates', {
    status: 'success',
    lastChecked,
    newVersion: null,
    whatsNewUrl: null,
}).as('checkForUpdates');

cy.get('[data-name="checkForUpdates"]').click();
cy.wait('@checkForUpdates');
cy.get('[data-name="lastCheckedDate"]').should('contain', lastChecked);

While not all data is delivered via Ajax and the frontend is not yet a fully-fledged SPA, we are moving in that direction. This approach to frontend testing using pre-prepared backend responses seems most promising, as it allows us to not run the backend at all and speeds up test execution.

Test Stability

When you start writing Cypress tests, you'll likely find that many tests unexpectedly become flaky, meaning they sometimes pass and sometimes fail. To avoid such instability, we use the following practices.

Wait for Ajax Requests

Many forms in our product are submitted via Ajax requests without a page transition. To ensure that a test passes consistently, you need to intercept these requests and wait for their completion. Using Cypress, we only check what happens in the UI and wait for the specific message we need.

In the example below, we intercept the client creation request, click the button, wait for the request to complete, and only then check the message indicating that the client has been created.

cy.intercept('POST', '/admin/customer/create').as('customerCreate');
cy.get('[name=send]').click();
cy.wait('@customerCreate');
cy.get('.msg-box.msg-info').should('be.visible');

Wait for the Loading Indicator to Disappear

In some parts of our interface, background operations like updating a list are accompanied by an animated loading indicator ("spinner"). On these pages, after an Ajax request completes, Cypress might throw an error like "element has been detached from the DOM" when trying to click on list elements. To prevent this, we add an extra line after the Ajax request to check that the loading indicator is no longer visible.

cy.get('.ajax-loading').should('not.be.visible');

We hope one day this issue will be fixed on the Cypress side so we won't have to monitor it manually.
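Until then, the two waits can be bundled into one custom command so that specs do not repeat them. This is a sketch rather than our production code: waitForAjax is a hypothetical name, and the .ajax-loading selector is taken from the example above.

```javascript
// Waits for an aliased request to finish, then for the spinner to disappear.
function waitForAjax(alias, spinnerSelector = '.ajax-loading') {
    cy.wait(`@${alias}`);
    cy.get(spinnerSelector).should('not.be.visible');
}

// Register the helper as a custom command only when running inside Cypress.
if (typeof Cypress !== 'undefined') {
    Cypress.Commands.add('waitForAjax', waitForAjax);
}

// Usage in a spec (the alias is created with cy.intercept(...).as(...)):
//   cy.intercept('POST', '/admin/customer/create').as('customerCreate');
//   cy.get('[name=send]').click();
//   cy.waitForAjax('customerCreate');
```

Keeping the spinner check in one place also means it can be removed in a single edit if the detached-element issue goes away.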

Results

After the first iteration (3 man-months), we achieved the following results:

335 Cypress tests (split across 84 specs)
The pipeline completes in 35-40 minutes, with the tests themselves taking 20 minutes
The pipeline runs on every pull request in blocking mode (i.e., merging is not allowed without passing tests)
Confidence level over 95% (meaning the probability of flaky failures is below 5%)
Interface coverage at 35% (details to follow)

After 3 years (with this work done mostly in the background), we are in the following state:

1242 Cypress tests (in 222 specs)
The pipeline runs in 12 parallel threads and completes in the same 30-40 minutes.

Pipeline for Running Tests

During the development process, the pipeline for running tests went through several stages of evolution. The requirement was to keep the duration under an hour, as longer wait times for merging pull requests would cause delays.

As with most of our tasks, the pipeline is executed in Jenkins and stored in a Jenkinsfile along with the project's code.

Linear Pipeline

In the first iteration, we created a simple linear pipeline.

We run a Docker container with Plesk in the background and wait for it to become accessible on the local network. Then, we launch another container with Cypress and the test code; it connects to Plesk and runs all the tests while we wait for it to finish (without detaching).

We ran the tests on a machine with 12 cores, which we also use for building Plesk and several of its services. During the workday, we often have 20-30 builds. As a result, the load average reached 20, and many neighboring processes stalled. We limited the number of concurrent builds to 3-5, but this wasn't enough, and our colleagues continued to complain about the load.

So, we moved the test runs to a dedicated server in AWS with 4 cores within a VPC that has access to our office network. This solved the issue of noisy neighbors, but the test builds continued to wait in the queue for a long time, occasionally exceeding the timeout.

Pipeline with Parallel Steps

To speed up the process, we decided to use the Jenkins EC2 Fleet plugin, which provisions a Jenkins slave node on demand from an Auto Scaling group in AWS and terminates nodes after a period of inactivity. This way, we pay for rented resources only when they are needed.

Switching to spot instances allowed us to significantly reduce costs: instead of $150 per month for an on-demand c5.xlarge, we started spending around $60 for c5.xlarge and the more powerful c5.2xlarge.

Most importantly, we can run as many concurrent builds as we need.

Launching a new node takes about 2 minutes. We made several parallel pipeline steps so that during this time, we could build the product and be ready to install it in Docker on the new node.

However, as the number of our tests grew, the pipeline time inevitably increased along with it, so we needed to find new ways to speed it up.

Pipeline with Parallel Tests

Cypress offers a paid feature for parallel test execution using the Cypress Dashboard. However, we opted for a simpler and free approach — listing the test files when launching the container. The first container runs all the even-numbered files, while the second runs all the odd-numbered ones.

cypress run --spec "$(find cypress/integration -type f -name '*.js' | sort | awk -v runner="$RUNNER" -v total="$TOTAL_RUNNERS" '(NR - runner) % total == 0' | paste -sd, -)"

This resulted in a matrix build, where each axis launches its own container with Plesk and a specific set of tests.

As a result, the entire pipeline now fits within an acceptable 30-40 minutes, with each batch of tests taking about 20 minutes. As the number of tests increases, we will increase the number of parallel threads.
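The same round-robin split can be sketched in JavaScript, which is convenient if the spec list is computed in a Node script instead of in the shell. specsForRunner is a hypothetical helper; it mirrors the awk filter above and sorts the list first, on the assumption that every runner should see the files in the same order.

```javascript
// Returns the specs assigned to one runner out of `totalRunners`.
// Positions are 1-based to match awk's NR, so with two runners
// runner 0 gets the even-numbered files and runner 1 the odd-numbered ones.
function specsForRunner(specs, runner, totalRunners) {
    return [...specs]
        .sort() // identical order on every runner
        .filter((_, index) => (index + 1 - runner) % totalRunners === 0);
}

// Usage: build the value for `cypress run --spec`.
const allSpecs = [
    'cypress/integration/domains.spec.js',
    'cypress/integration/login.spec.js',
    'cypress/integration/profile.spec.js',
];
console.log(specsForRunner(allSpecs, 0, 2).join(','));
console.log(specsForRunner(allSpecs, 1, 2).join(','));
```

Every file lands on exactly one runner, so adding a runner only requires changing the total. The split is by file count, not by duration, so a few slow specs can still make one batch noticeably longer than the others.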

Measure URL Coverage

Our project involves many different programming languages, and code coverage analysis has always been a challenging topic because collecting data requires special builds and merging reports from multiple configurations.

For analyzing UI test coverage, we decided to use product analytics and compare the data from test installations with that from real users. We already had a service similar to Google Analytics for collecting user metrics, and the test data was being stored separately but remained unused. From the many metrics available, we filtered out events related to visited URLs of the product, started saving this data in a format that was convenient for us in a database, and began generating reports on the visited URLs.

Based on the data collected, through all automated and manual testing within the company, we cover about 60% of the URLs that real users visit over a month. Our old tests cover around 25%, while the new Cypress tests have already reached 35%.

This statistic helps us plan further testing — for example, we plan to prioritize automating the more frequently visited pages first.
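The comparison itself is simple set arithmetic. The sketch below is an assumption about how such a report could be computed, not our actual reporting code: normalizeUrl, urlCoverage, and the rule that numeric path segments are collapsed into :id are all illustrative.

```javascript
// Drops the query string and trailing slashes, and collapses numeric path
// segments so that /admin/domain/12 and /admin/domain/34 count as one page.
function normalizeUrl(path) {
    return path
        .split('?')[0]
        .replace(/\/\d+(?=\/|$)/g, '/:id')
        .replace(/\/+$/, '') || '/';
}

// Share of distinct user-visited URLs that tests also visited, plus the
// uncovered URLs ordered by how often real users open them.
function urlCoverage(userVisits, testVisits) {
    const tested = new Set(testVisits.map(normalizeUrl));
    const userCounts = new Map();
    for (const url of userVisits) {
        const key = normalizeUrl(url);
        userCounts.set(key, (userCounts.get(key) || 0) + 1);
    }
    const uncovered = [...userCounts]
        .filter(([url]) => !tested.has(url))
        .sort((a, b) => b[1] - a[1])
        .map(([url, visits]) => ({ url, visits }));
    const covered = userCounts.size - uncovered.length;
    return {
        coverage: userCounts.size ? covered / userCounts.size : 0,
        uncovered,
    };
}
```

The uncovered list, sorted by visit count, doubles as a backlog: the pages at the top are the ones real users open most often and no test touches yet.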

Conclusion

Cypress is appealing due to its quick setup for writing tests and convenient debugging tools. However, it's important to remember that the speed and stability of tests depend on how they are written: lightweight fixtures and prepared backend responses can significantly speed up tests, while state resets help avoid unintended interactions between tests.

Cypress enables both comprehensive end-to-end (e2e) tests that simulate user scenarios on a real product and integration testing of individual frontend components. It is better to define the goals and agree on the rules of engagement for the entire team in advance.

Additionally, it's good to know that running tests consumes a significant amount of CPU resources, so it's essential to assess the number of parallel executions and plan for infrastructure scaling ahead of time.
