DEV Community

Roy Klein

3 Rules for Turbo-Charged E2E Tests

Slow and brittle tests are killing your productivity and blocking the adoption of advanced practices like Trunk-Based Development and Continuous Integration (CI). Yet in many projects, test suites take north of 30 minutes to complete while also failing randomly, requiring reruns that effectively double an already slow runtime, or worse.

In my previous article, I shared a strategy for mindfully pushing most tests down the Testing Pyramid with minimal impact to quality and coverage. Still, we need E2E tests because they're essential for ensuring the system truly works when it matters most: when users interact with it. The challenge is how to keep these tests from turning our CI pipelines into just "I" pipelines (losing the "Continuous" to slowness).

In this article, I will share my 3 rules for writing blazing-fast E2E tests, and a real-world benchmark showing how fast they can get.

Two Axes of E2E

The first step in preventing slow and brittle E2E suites is recognizing that there are two distinct perspectives on what "end-to-end" means:

  • User end-to-end: Verifies a complete user workflow. In an E-Commerce site, such a flow might be: Logging in, searching for a product, adding it to the cart, checking out, and receiving confirmation.

  • System end-to-end: Verifies one behavior at a time using the full system (no mocks). For example, one test would be verifying that login works, another test verifying the search functionality, etc., and each of those tests would use the actual authentication provider, a real database, and the genuine UI - just like a real user would.

While User E2E tests have their place in a testing strategy, they are inherently slow and brittle. System E2E tests, by contrast, can be surprisingly fast, which is why the first rule is:

Rule #1: In Your CI, Focus on System E2E, Not User E2E

When we talk about lower-level tests, we naturally focus on the System axis: in a Unit Test, we test one component of the system (a single function or class) at a time, with the rest fully mocked. With Integration Tests, we check one functionality at a time, handled by several components of the system together (e.g. by calling an API endpoint directly with a database attached).

In both cases, we check one scenario at a time - one function call, or one API call - rather than a flow. We distinguish between Unit and Integration on the System Axis, not the User Axis.

But when moving to the E2E part of the pyramid, we slide forward on the System axis while also making a sudden, implicit jump on the User axis, writing tests that check not only the entire system but ALSO the entire user flow.

Note that I'm not advocating to abandon User E2E tests altogether - I will later advise on what their place should be in your testing strategy.

Two types of E2E on an axis system. The red arrow represents the "jump": moving to E2E tests that check entire flows, while skipping E2E tests that check one behavior at a time, like our unit & integration tests do.

Thinking in Given-When-Then

Rule #1 establishes that System E2E tests check one feature at a time. To do that, each test needs the system to be in the "right state" to operate on. Using the E-Commerce example, we would have one test for logging in, another for searching for a product, another for adding it to a cart, and another for checking out. The rub is that most of these tests require the system to be in a specific state - with the user already logged in, for example. How do we achieve that for each test?

The most obvious way is to use the UI to log in, just as we did when testing the login itself. But this approach has severe drawbacks:

  1. It's extremely slow. For anything but the most shallow features, virtually the entire run time of each test would be spent on setting up rather than on testing.
  2. It's brittle. The more actions performed via the UI - the more opportunities there are for UI flakiness to creep in. UI setup will fail from time to time, eroding our confidence in our test suite. 
  3. It's highly redundant. By testing one thing at a time, we're aiming towards a state where one broken feature means one broken test - so we can easily triangulate problems when they occur. If, for example, most of our tests use the UI to log in, then a broken login form - one bug - would lead to many unrelated tests failing, obscuring where the problem is. We want a one-to-one bug-to-test-failure relationship as much as possible.

Atomic tests (those that check one thing per test) are composed of three parts. In Gherkin terms, they are:

Given: Set up the system to be in the state needed for our test
When: Activate the system with one action (or as few as possible)
Then: Validate that the new state of the system corresponds with what we expected the action to do

Rule #2: Perform the "Setup" phase ("Given") with quick programmatic calls

The key to Rule #2 is understanding that the Given part is not actually part of the test - it's just the setup. We want the setup to run as quickly as possible, so we use programmatic access to our system (authentication, DB population, etc.), letting our tests spend as little time as possible setting up.

For example, our checkout test needs a logged-in user with an item in the cart. This is our Given. To get the system into this state, we would first make a programmatic call to our Auth provider to create a signed-in session with a token, then another call to the Backend with that token to add an item to the signed-in user's cart. These calls are orders of magnitude faster than using the UI to achieve the same results.

Then we perform the action and validate via the UI that the system is in the intended new state (the When/Then parts of the test).
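The programmatic Given described above could be sketched as a small helper. Everything here is hypothetical - the endpoint URLs, payload shapes, and field names are assumptions, not a real provider's API; the point is that the setup is two fast API calls instead of a UI walkthrough:

```typescript
// Minimal response shape the helper relies on.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ json(): Promise<any> }>;

// Creates a signed-in session and seeds the cart via direct API calls,
// skipping the UI entirely. `fetchImpl` is injectable so the helper can
// be exercised with a stub.
export async function givenSignedInUserWithCartItem(
  baseUrl: string,
  productId: string,
  fetchImpl: FetchLike = fetch,
): Promise<{ token: string }> {
  // 1. Ask the auth provider for a session token (hypothetical endpoint).
  const authRes = await fetchImpl(`${baseUrl}/auth/test-session`, { method: "POST" });
  const { token } = await authRes.json();

  // 2. Seed the cart directly through the backend API, using that token.
  await fetchImpl(`${baseUrl}/api/cart/items`, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify({ productId }),
  });

  return { token };
}
```

A test would call this helper once in its Given phase and then drive only the single action under test through the UI.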

Breakdown of which part of the test we're performing via API, and which via the UI. Test text taken from: https://www.subject7.com/gherkin-behavior-driven-testing-hype-or-not/

A mock of how a System E2E test might look in Playwright code
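In the same spirit, here is a hedged sketch of a System E2E checkout test in Playwright style. The selectors, URLs, and the setup helper are hypothetical; in a real suite `page` (and the test/expect wrappers) come from '@playwright/test', but a tiny `Page` interface stands in here so the structure is self-contained:

```typescript
// Minimal stand-in for the subset of Playwright's Page API used below.
interface Page {
  goto(url: string): Promise<void>;
  click(selector: string): Promise<void>;
  textContent(selector: string): Promise<string | null>;
}

// Stand-in for the programmatic setup described earlier: in real code this
// calls the auth provider and backend APIs directly, then injects the
// session token into the browser context.
async function givenSignedInUserWithCartItem(): Promise<void> {}

// One behavior, one test: checkout only. Login and cart state come from
// the fast programmatic Given, not from the UI.
export async function checkoutTest(page: Page): Promise<string | null> {
  // Given: user is signed in with one item in the cart (programmatic).
  await givenSignedInUserWithCartItem();

  // When: perform the single action under test, via the UI.
  await page.goto("/cart");
  await page.click("button#checkout");

  // Then: validate the new state through the UI.
  return page.textContent("#order-confirmation");
}
```

Note how the UI is only touched for the When/Then phases; everything before that is API calls.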

Leverage Parallelization

Just as our unit and integration tests are written to be isolated, runnable in any order, and in parallel, the same should apply to our System E2E tests. With fully parallelized tests, you can cut the total runtime simply by adding more workers.

Rule #3: Make your System E2E tests parallelizable
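One practical prerequisite for parallel-safe tests is that workers never share mutable state, such as user accounts or DB rows. A hypothetical helper can derive collision-free fixtures from the worker index (Playwright exposes one via `testInfo.parallelIndex`); the naming scheme below is an assumption, not a convention from the article:

```typescript
// Derives per-worker, per-test fixture data so parallel workers never
// contend over the same account or database rows.
export function uniqueUser(workerIndex: number, testName: string) {
  // Slugify the test name so it is safe inside an email local part.
  const slug = testName.toLowerCase().replace(/[^a-z0-9]+/g, "-");
  return {
    email: `e2e-w${workerIndex}-${slug}@example.test`,
  };
}
```

Each test creates its own user in its Given phase with data namespaced this way, so adding workers never introduces cross-test interference.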

Real-World Test Case

Even while following the first two rules, the CI runtime of one of my projects (a React Native app) started edging toward 10 minutes on a single GitHub runner, which makes trunk-based development and atomic commits slow and awkward. With a single commit taking me about 3–5 minutes to write, and with a CI run taking double that, I started hesitating to push each commit immediately - a red flag.

Fortunately, we had been writing the System E2E tests to run in parallel from the start, so switching to a self-hosted runner with multiple workers was quick and painless, and improved the runtime dramatically (as a neat bonus, a self-hosted runner avoids reinstalling static dependencies on each run, like Docker, Playwright browsers, etc., further reducing total CI time).

Here's the benchmark of Playwright running 80 System E2E tests:

GitHub Runner:
1 worker: Total: 8:30 minutes, Tests: 5:50 minutes
Self-hosted Runner:
1 worker: Total: 6:53 minutes, Tests: 5:00 minutes
2 workers: Total: 4:12 minutes, Tests: 3:00 minutes
3 workers: Total: 3:22 minutes, Tests: 2:00 minutes
4 workers: Total: 2:30 minutes, Tests: 1.4 minutes

80 E2E benchmark with 1–3 workers

Note: With 4 workers, tests were running so rapidly that the Auth provider sometimes rejected calls due to rate limiting, causing an occasional increase in runtime, so we settled on 3 workers. Removing one bottleneck can expose another, often unexpected one, so benchmarking is important.

The faster the CI runs, the more granular you can make your pushes, letting you catch mistakes and integration problems before they have a chance to pile up on one another.

Mitigating Implementation Binding

One downside of following these rules is that we give up ground on an important principle of BDD - that our tests fail only when behavior changes, not when implementation details change. By making direct calls to our APIs, we expose our tests to failures when we modify those APIs, which isn't ideal. In essence, to accelerate our Given stage, we bind it to the architecture.

Every approach has an upside and a downside, and if we're getting blazing-fast tests, I'd call this a fair tradeoff. However, we can mitigate it by placing an abstraction layer between the tests and the architecture - for example, for our authentication actions, a helper class would interact with the authentication provider and be called by the tests. If we change our Auth provider, we have one place to fix instead of going through all our tests one by one.
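Such an abstraction layer could be sketched as follows. The interface, class, and endpoint are hypothetical names invented for illustration; the design point is that tests depend on `AuthActions`, so swapping providers means rewriting one implementation, not every test:

```typescript
// Tests program against this interface, never against a concrete provider.
export interface AuthActions {
  createSession(email: string): Promise<string>; // returns a session token
}

// Implementation bound to the current (hypothetical) auth provider's HTTP
// API. If the provider changes, only this class changes.
export class HttpAuthActions implements AuthActions {
  constructor(
    private baseUrl: string,
    private fetchImpl: typeof fetch = fetch,
  ) {}

  async createSession(email: string): Promise<string> {
    const res = await this.fetchImpl(`${this.baseUrl}/auth/test-session`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ email }),
    });
    const { token } = (await res.json()) as { token: string };
    return token;
  }
}
```

A test's Given phase would then read `await auth.createSession(user.email)` regardless of which provider sits behind it.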

What To Do About User E2E tests

To validate that our app provides a cohesive user experience, we still need tests that run through a longer flow. Unfortunately, this validation is slow and brittle by nature. Therefore, I'd avoid including User E2E tests in the normal CI pipeline.

Instead, consider the following options:

  1. Nightly: Schedule these tests once a day, preferably with sufficient retries to minimize the impact of UI-related flakiness.
  2. Pre-Deployment: Run them before deployments. 
  3. Post-Deployment: If you practice Continuous Deployment (CD), consider running them post-deployment instead, with an auto rollback if they fail.
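The nightly option can be expressed as a scheduled workflow. This is a hypothetical GitHub Actions fragment - the workflow name, the `@user-e2e` tag convention, and the job steps are assumptions - while the `schedule`/`cron` trigger and Playwright's `--retries` and `--grep` flags are standard:

```yaml
# Hypothetical nightly run of the slow User E2E suite, kept out of the
# per-commit CI pipeline.
name: nightly-user-e2e
on:
  schedule:
    - cron: "0 3 * * *" # every day at 03:00 UTC
jobs:
  user-e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Retries absorb UI-related flakiness in long flows.
      - run: npx playwright test --grep @user-e2e --retries=2
```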

Bonus rule: Keep the inherently slow User E2E tests out of your CI pipeline

By keeping these tests out of your CI pipeline, you accelerate day-to-day development while still benefiting from full user-flow validations at regular intervals and critical moments.

A quick and robust suite of System E2E tests provides most of the confidence, while User E2E tests provide the final seal of approval without becoming a bottleneck.

Summary

Speed, reliability, and coverage in testing directly impact development velocity and software quality. Sadly, E2E tests often become the slow, fragile part of the pipeline. But by adopting straightforward principles - familiar from our unit and integration test patterns - you can make many of your E2E tests surprisingly fast and stable.

  1. Focus your CI pipeline on System E2E tests that validate one feature at a time.
  2. Accelerate their setup with programmatic calls.
  3. Run them in parallel.
  4. Bonus rule: Relegate longer-flow User E2E tests to scheduled & pre/post-deployment runs.

By following these rules, you'll maintain both speed and quality, ensuring that E2E testing supports your development process instead of hindering it.
