Tomas Fernandez for Semaphore

Posted on Mar 29, 2022 • Originally published at semaphoreci.com

The Testing Pyramid: How to Structure Your Test Suite


For over a decade, the testing pyramid has been helping developers plan automated software tests. In this article, we’ll explore what makes up the pyramid, how it is helpful, and what alternative “shapes” there are.

What is the Testing Pyramid?

Introduced by Mike Cohn in his book Succeeding with Agile (2009), the pyramid is a metaphor for thinking about testing in software. It’s an idea that has caught on so strongly that, to this day, it’s still the industry standard in engineering circles.

The pyramid is a visual guide to organizing automated tests. It consists of three distinct layers:

The base of the pyramid consists of unit tests. A unit is a small logical piece of code: it can be a function, a class, or even a method in a class. A unit test only checks that said unit behaves as the developer intended. By calling the tested code directly and evaluating its output, a developer can write a unit test without depending on any other components, services, or the UI.
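To make this concrete, here is a minimal sketch in Python using the standard library's `unittest` module. The `apply_discount` function and its rules are hypothetical, invented only for illustration. The point is that each test calls the unit directly and checks its result, with no database, network, or UI involved.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical unit under test: return `price` reduced by `percent` percent."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTest(unittest.TestCase):
    # Each test calls the unit directly: no database, network, or UI is touched.

    def test_applies_percentage(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_discount_keeps_price(self):
        self.assertEqual(apply_discount(99.99, 0), 99.99)

    def test_rejects_invalid_percentage(self):
        with self.assertRaises(ValueError):
            apply_discount(10.0, 150)
```

Saved as, say, `test_discount.py`, the tests can be run with `python -m unittest test_discount`. Because nothing outside the function is involved, tests like these run in milliseconds.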

One level above, in the middle of the pyramid, we find integration tests, which are called “service tests” in Mike’s book. Integration in this context refers to testing how different components of the system work together. For instance, an integration test might check whether a model in the code can correctly exchange data with the database, or whether a method can retrieve information from an API. No UI interactions are needed, as integration tests can call the code directly at the interfaces.
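For comparison, the sketch below is an integration test written under a few assumptions: a hypothetical `UserRepository` class plays the role of the model, and an in-memory SQLite database (Python's built-in `sqlite3` module) stands in for the production database. Unlike the unit test, it exercises a real database engine, so it can catch problems at the boundary, such as a database constraint the code doesn't expect.

```python
import sqlite3
import unittest
from typing import Optional


class UserRepository:
    """Hypothetical model layer that persists users in a SQL database."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users ("
            "id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)"
        )

    def add(self, email: str) -> int:
        cur = self.conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        self.conn.commit()
        return cur.lastrowid

    def find_email(self, user_id: int) -> Optional[str]:
        row = self.conn.execute(
            "SELECT email FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


class UserRepositoryIntegrationTest(unittest.TestCase):
    def setUp(self):
        # A real (in-memory) database engine rather than a mock: the test
        # crosses the interface between our code and SQLite.
        self.conn = sqlite3.connect(":memory:")
        self.repo = UserRepository(self.conn)

    def tearDown(self):
        self.conn.close()

    def test_round_trip(self):
        user_id = self.repo.add("ada@example.com")
        self.assertEqual(self.repo.find_email(user_id), "ada@example.com")

    def test_database_enforces_unique_email(self):
        self.repo.add("ada@example.com")
        with self.assertRaises(sqlite3.IntegrityError):
            self.repo.add("ada@example.com")
```

A real project would more likely point such tests at the same database engine it uses in production; SQLite simply keeps this example self-contained.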

At the top of the pyramid, we find end-to-end (E2E) tests. Also known as UI tests, E2E is testing in its most intuitive sense: use the application and see if it works. But instead of having a human conduct the tests, E2E tests are entirely automated. Every user interaction is mimicked; an E2E test can click buttons, type values, and evaluate what the UI is showing.

As you can see, the three types of tests have very different scopes:

Unit tests can only find logical errors at the most fundamental level. They are fast and require very few resources to run.
Integration tests verify that services and databases work well together with the code and the classes you’ve written. They can only find problems at the interfaces where two or more components meet.
E2E tests depend on the complete application being able to start. These are the most comprehensive type of tests we have and, accordingly, need the most computing resources and time to run.

So, why a pyramid?

To understand how the pyramid got its shape, we must understand the intricacies of each type of test.

Unit tests are small and therefore easy to write and maintain. Because they test very narrow parts of the code, we need plenty of them. This is usually not a problem because unit tests are light enough that we can run thousands of them in a few seconds.

E2E tests are at the far end of the spectrum. They are complex to write, difficult to maintain, need plenty of resources, and are slow to run. But, since we can cover a lot of the application with a few E2E tests, we need fewer of them.

In the middle, we find integration tests. In terms of complexity, they are comparable to unit tests. But we don’t need as many of them, since we are only interested in testing the “edges” of the application, where components meet. Integration tests need more resources to run than unit tests, but they remain in the same order of magnitude.

Hopefully, you now understand why the pyramid has its shape: the width of each layer represents the ideal relative quantity of each kind of test. In other words, the pyramid says we should have a few end-to-end tests, a decent number of integration tests, and a swarm of unit tests.

As you move up the pyramid, tests get more complex and cover a larger portion of the codebase. At the same time, the effort of writing, running, and maintaining them increases. The pyramid illustrates an ideal ratio that maximizes the chance of finding bugs with the least work.

The forces shaping the pyramid

The nature of software development often makes the pyramid appear spontaneously, even when developers didn’t consciously set out to build one. Why does this happen?

It’s challenging to write E2E tests when a project is just starting. Unless the development team adopts a practice such as behavior-driven development (BDD) and writes acceptance tests from the beginning, most E2E tests will be written only once a basic prototype or minimum viable product is in place. By then, developers will have had plenty of opportunities to write unit and integration tests.

A second factor that shapes the pyramid is speed. The faster the test suite is, the more often developers run it. Slow tests hurt the vital feedback loop needed for a productive environment.

Tests at the bottom of the pyramid are the fastest, so developers tend to write more of them. Conversely, E2E tests are slow and thus used more sparingly. As a result, a large web app can have thousands of unit tests, hundreds of integration tests, and a few dozen E2E tests.

Test type	Typical time per test (order of magnitude)
Unit test	0.001 - 0.01 s
Integration test	1 s
E2E test	10 s
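To see how these orders of magnitude add up, the sketch below estimates the sequential run time of a pyramid-shaped suite. The per-test durations come from the table (taking the upper bound of 0.01 s for unit tests); the test counts are assumptions chosen to match the proportions of thousands of unit tests, hundreds of integration tests, and a few dozen E2E tests.

```python
# Per-test durations in seconds, taken from the table's orders of magnitude.
DURATION_S = {"unit": 0.01, "integration": 1.0, "e2e": 10.0}

# Hypothetical pyramid-shaped suite: thousands / hundreds / dozens of tests.
COUNTS = {"unit": 3000, "integration": 300, "e2e": 30}


def sequential_runtime(counts, durations):
    """Return (seconds per layer, total seconds) if every test runs one after another."""
    per_layer = {layer: counts[layer] * durations[layer] for layer in counts}
    return per_layer, sum(per_layer.values())


per_layer, total = sequential_runtime(COUNTS, DURATION_S)
for layer, seconds in per_layer.items():
    print(f"{layer:<12}{seconds:>8.1f} s")
print(f"{'total':<12}{total:>8.1f} s  (~{total / 60:.1f} min)")
```

With these assumed numbers, the suite takes about 630 seconds (10.5 minutes) when run sequentially, and the 3,000 unit tests account for only 30 of those seconds. The slower layers dominate the total run time even though they contain far fewer tests, which is why unit tests can afford to be so plentiful.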

The testing pyramid is the best-known model for designing automated test suites. But is it the only one? Certainly not.

Testing Frontends with the Testing Trophy

The testing pyramid dates back to 2009. To put things into context, Ruby on Rails was on its second major version and Node.js was just being created. Internet Explorer and Adobe Flash were still relevant. MySpace had just peaked and Facebook was only getting started.

Rich frontend frameworks like React or Angular were still far away on the horizon.

Technology has changed so much that many people feel a different approach is needed. Kent C. Dodds is one of them: he proposed the Testing Trophy as an alternative way to structure tests in frontend development.

Caption: “Write tests, not too many, mostly integration.”
Credit: Kent C. Dodds at testingjavascript.com

The Testing Trophy reorders priorities. Integration tests are king, since most modern UIs rely on backend components and are difficult to test in isolation.

Compared to the pyramid, unit tests take a back seat, and the base of the trophy is formed by static analysis tools such as ESLint and JSHint. These scan the code, without running it, to offer suggestions and flag potential problems such as the use of unsafe statements or violations of variable naming rules.
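ESLint and JSHint target JavaScript, but the underlying idea is language-agnostic. As a rough illustration, the toy checker below uses Python's standard `ast` module to parse, but never execute, a snippet of code and flag the two kinds of issues just mentioned: a call to the unsafe built-in `eval()` and variable names that break a snake_case naming rule. It is a sketch of the static-analysis technique, not a description of how ESLint or JSHint work internally.

```python
import ast
import re

SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")


def lint(source: str) -> list[str]:
    """Toy static checker: report eval() calls and non-snake_case variable names."""
    problems = []
    tree = ast.parse(source)  # the source is parsed into a syntax tree, never executed
    for node in ast.walk(tree):
        # Unsafe statement: a direct call to the built-in eval().
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == "eval":
            problems.append(f"line {node.lineno}: avoid eval()")
        # Naming rule: names being assigned to must be snake_case.
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store) and not SNAKE_CASE.match(node.id):
            problems.append(f"line {node.lineno}: variable '{node.id}' is not snake_case")
    return problems


sample = "userInput = input()\nresult = eval(userInput)\n"
for problem in lint(sample):
    print(problem)
```

For the sample snippet, the checker reports the name `userInput` on line 1 and the `eval()` call on line 2, without ever running `input()`.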

The trophy is crowned by E2E tests, which take up roughly the same share of the suite as they do in the pyramid.

The Test Matrix

One thing that’s often left out of the equation when discussing the pyramid is confidence. Which type of test gives you the most confidence? The only kind of test that can genuinely validate the application’s usability is an E2E test.

What’s stopping us from writing a lot more E2E tests? The typical answer is cost: E2E tests are rarely seen as worth the time and maintenance they require. But not everyone agrees with this argument. Gleb Bahmutov and Roman Sandler proposed the Testing Matrix as an alternative device for planning a testing strategy.

In the matrix, effort increases from left to right and confidence rises from bottom to top. The best place to be is the green quadrant, where confidence is high and effort is low.

Most software projects start in the low-effort, low-confidence yellow zone.

Tests are added at every level as the project matures and new features arrive. The effort needed to keep the test suite healthy steadily grows as one or more of the effort categories increase. A team that neglects the upkeep of its test suite may soon find itself in the red zone.

How can you increase confidence and reduce effort? The answer is to periodically reevaluate the characteristics of your tests in the following five categories:

Installation: the effort involved in installing and setting up the test framework.
Writing: the complexity of writing tests and the skill level of the developers for a given framework.
Running: the difficulty of running the test suite and CI/CD performance.
Debugging: how easy it is to find and fix a problem when a test fails.
Maintenance: how much effort is required to maintain a test throughout the project’s lifetime.

Unit tests may be the best investment at the start of a project. But once features have stabilized, you may need to rebalance the mix by adding more E2E tests and removing some tests from the other layers. This should increase confidence while reducing, or at least maintaining, the effort level.

Beware of dogmas

The pyramid tells us to limit E2E tests due to speed, cost, and maintenance concerns. But this does not hold in every situation. As Gleb Bahmutov remarks in an episode of Semaphore Uncut, we can imagine scenarios where E2E tests are easy to maintain:

“As for end-to-end tests, they operate like a user. By definition, you test through the public interface of your website. If you change implementation under the hood, you can swap your whole backend. The test should not be concerned. The maintenance should actually be much lower.”

— Gleb Bahmutov, Semaphore Uncut

Every team, every project, and every organization is different. As requirements change, a team may decide to rebalance its test suite. The flexibility to stop, reevaluate the cost-benefit equation, and adjust as needed is critical to reaching the low-effort, high-confidence zone.

The role of CI/CD in your testing suite

All the “shapes” discussed in this post are valuable models. But none of them should be blindly followed.

As the test suite grows and is rebalanced, CI/CD pipelines must also adapt. While the project is still young, you can keep the process in the low-effort quadrant by running tests in sequence and putting the most fundamental jobs at the beginning of the pipeline. For instance, if your suite consists mostly of unit tests, running them first will help you fail fast.
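In a real pipeline, stage ordering is configured in your CI/CD tool. Purely as an illustration of the fail-fast idea, the Python sketch below runs stages in order and stops at the first one that fails. The `tests/unit`, `tests/integration`, and `tests/e2e` directories are hypothetical, as is the assumption that each layer can be discovered with `python -m unittest discover`.

```python
import subprocess
import sys

# Hypothetical stage layout: cheapest, most fundamental checks first.
PIPELINE = [
    ("unit", [sys.executable, "-m", "unittest", "discover", "-s", "tests/unit"]),
    ("integration", [sys.executable, "-m", "unittest", "discover", "-s", "tests/integration"]),
    ("e2e", [sys.executable, "-m", "unittest", "discover", "-s", "tests/e2e"]),
]


def run_fail_fast(stages):
    """Run each stage's command in order and stop at the first non-zero exit code.

    Returns the name of the failing stage, or None if every stage passed.
    """
    for name, command in stages:
        print(f"--> {name}")
        completed = subprocess.run(command)
        if completed.returncode != 0:
            print(f"stage '{name}' failed; skipping the remaining stages")
            return name
    return None
```

Calling `run_fail_fast(PIPELINE)` returns `None` when every stage passes, or the name of the first failing stage. Because unit tests come first, a broken function is reported within seconds, before the slower stages spend any time running.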

Later, when the CI pipeline begins to struggle under a growing number of integration and E2E tests, you will find that it becomes too slow unless you start parallelizing some workloads.
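One common way to parallelize is to split test files across several CI jobs so that each job gets a similar share of the total run time. The sketch below uses a greedy longest-processing-time-first heuristic: sort files from slowest to fastest and always hand the next file to the least-loaded job. The file names and timings are hypothetical; in practice they would come from previous runs, and some CI tools offer their own test-splitting features.

```python
import heapq

# Hypothetical per-file durations in seconds, e.g. collected from earlier CI runs.
timings = {
    "checkout_e2e": 240,
    "signup_e2e": 180,
    "search_e2e": 120,
    "api_integration": 90,
    "db_integration": 60,
    "unit": 30,
}


def split_by_duration(durations: dict[str, float], jobs: int) -> list[list[str]]:
    """Assign test files to `jobs` parallel buckets, balancing total run time.

    Greedy longest-processing-time-first: slowest files are placed first,
    each into the bucket with the smallest accumulated duration so far.
    """
    heap = [(0.0, i) for i in range(jobs)]  # (accumulated seconds, bucket index)
    buckets: list[list[str]] = [[] for _ in range(jobs)]
    for name, seconds in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, i = heapq.heappop(heap)  # least-loaded bucket so far
        buckets[i].append(name)
        heapq.heappush(heap, (load + seconds, i))
    return buckets


BUDGET_S = 10 * 60  # the 10-minute feedback target
buckets = split_by_duration(timings, jobs=3)
for i, bucket in enumerate(buckets):
    print(f"job {i}: {bucket} ({sum(timings[n] for n in bucket)} s)")
wall_clock = max(sum(timings[n] for n in b) for b in buckets)
print(f"wall clock ~{wall_clock} s, within budget: {wall_clock <= BUDGET_S}")
```

With these made-up timings, each of the three jobs ends up with 240 seconds of work, so the wall-clock time drops from 720 seconds sequentially to about 240 seconds, well under the 10-minute budget.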

As your pipeline’s duration approaches the critical 10-minute mark, you will have to reorganize it and optimize any slow tests to keep the vital feedback loop fast and nimble.

Final thoughts

The testing pyramid offers such clear benefits that it has survived for more than a decade. It introduced the healthy habit of thinking about testing and established a common vocabulary across the industry.

Yet, the pyramid is not as fresh as it used to be. New practices, technologies, and cultural changes mean that the pyramid makes less sense than before. Cracks have begun to show. As a result, alternative models have appeared and will continue to appear.

In the end, you have to decide which approach is best for your project. What does your test suite look like?
Thanks for reading.