
Bartek Żyliński

Originally published at pasksoftware.com

Test Pyramid: Best Practices For A Reliable Test Suite

Testing our code is essential for maintaining its quality. In the long term, tests are crucial to ensure that we have maintainable software at all. Today I will dive into the Test Pyramid and present a way to structure your tests to get the most out of them. If you want to know other best practices for tests, check FIRST.

However, before we dive into the Test Pyramid, let’s take a look at different types of tests that we have.

Tests Taxonomy

  • Unit Test

The simplest kind of test, intended to verify the correctness of a single method or function in isolation.

  • Integration Test

Verifies the interaction between the different modules of our application, usually one interaction at a time, identifying issues at the interfaces between integrated parts.

  • E2E Tests

High-level tests that verify the correctness of a whole flow, from providing input to validating output on the opposite end. They validate whether the application works well as a whole.

  • Smoke Tests

Very simple tests that run on an up-and-running system, usually just after deploying a new version, to ensure that the most critical features are working as expected—a kind of sanity check of our system.

  • Contract Tests

Validate if two sides of some arbitrary interaction are compatible with one another. They check whether the responses from one side of the interaction match the expectations of the other, and vice versa.

  • Performance Tests

This type of test verifies if the performance of our applications meets the requirements, usually done on a setup as similar to production as possible and in the scope of the whole system.

  • Pen-Test/Security

A very diverse catch-all term for all the checks and tests that verify the security of our system.

  • Chaos Testing/Engineering

It is more an approach than an actual test. Chaos Engineering is aimed at testing system resilience by extreme measures. It works by introducing unpredictable but intentional and traceable failures into the working environment.
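Of the types above, contract tests are probably the least self-explanatory, so here is a minimal sketch of the idea. Everything in it is an assumption made for illustration: the `CONSUMER_CONTRACT` fields, the `provider_response` stand-in, and the pytest-style test function. Real projects would more likely use a dedicated contract-testing tool.

```python
# Hypothetical contract: the fields (and types) a consumer expects from a provider.
CONSUMER_CONTRACT = {"id": int, "email": str, "active": bool}


def provider_response() -> dict:
    """Stand-in for the provider side; a real test would call the actual handler."""
    return {"id": 7, "email": "a@example.com", "active": True, "created": "2024-01-01"}


def contract_violations(response: dict, contract: dict) -> list[str]:
    """Return human-readable mismatches; an empty list means the sides are compatible."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(f"wrong type for {field}: {type(response[field]).__name__}")
    return problems


def test_provider_honours_consumer_contract():
    # Extra fields on the provider side (like "created") are allowed.
    assert contract_violations(provider_response(), CONSUMER_CONTRACT) == []
```

The check only looks at what the consumer relies on, so the provider stays free to add fields without breaking the contract.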

These are not all the types of tests out there; the exact list depends on whom you ask and how far into categorizing you are willing to go. I believe that the types mentioned above are the most crucial and the most reasonable ones, and we will focus on them in today’s text.

Original Test Pyramid

The Test Pyramid is a concept used to visually describe the test setup to which a system should aspire. It consists of different types of tests, sorted so that the base is the test type with the highest number of tests. Moving up the pyramid, each level represents a type with fewer tests in the overall set.

In my opinion, the best representation of this test pyramid is presented by Robert C. Martin in his book, The Clean Coder: A Code of Conduct for Professional Programmers.

[Figure: Uncle Bob's test pyramid]

Basically, we should have a high number of unit tests as a base, while having only a small set of integration and E2E tests. Performance and security tests are included under System tests.
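The shape rule itself is simple enough to express in a few lines. The sketch below is only an illustration: the layer names and test counts are made up, and `is_pyramid_shaped` is a hypothetical helper, not part of any tool.

```python
def is_pyramid_shaped(counts_bottom_up: list[tuple[str, int]]) -> bool:
    """True when each layer has no more tests than the layer beneath it."""
    sizes = [count for _, count in counts_bottom_up]
    return all(upper <= lower for lower, upper in zip(sizes, sizes[1:]))


# Hypothetical test counts, listed from the base of the pyramid upwards.
suite = [("unit", 420), ("integration", 60), ("e2e", 12)]
inverted = [("unit", 10), ("integration", 40), ("e2e", 90)]

print(is_pyramid_shaped(suite), is_pyramid_shaped(inverted))
```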

This approach has a few good points like:

  • Fast and cost-effective feedback

Unit tests are fairly easy to set up and, at least by the book, should run quickly, reducing the feedback loop for the developer.

  • It is CI/CD friendly

Having fewer complex tests like E2E and integration tests should make it simpler to set up CI/CD jobs. Besides, CI runs faster with fewer integration and E2E tests.

  • Reliability

Unit, component and integration tests are less flaky and less complex than full E2E tests. Thus, there is a smaller chance of non-deterministic errors when introducing new tests or changing our test environment.

Additionally, as a whole, the Test Pyramid provides a clear and ready framework on how one should structure tests to get a more reliable system.

Still, while having all these benefits, it is not free of drawbacks, which I will describe in the following paragraph.

Why It Is Not Enough

Well, the first and most important problem with the original test pyramid is the over-reliance on Unit Tests. Such over-reliance introduces a set of problems to our application:

  1. Striving for high unit test coverage in your application is not necessarily a good idea. While unit tests are fast and easy to build, it is very easy to dig too deep into unit testing your code and tie the tests to implementation details. In such a case, any further change to the component may require a lot of additional work.

  2. Unit tests are not suitable for every phase of the project life cycle. Sometimes writing proper unit tests may not be possible at all, so you will have to rely heavily on mocks.

  3. The current shape of the pyramid can give a false sense of security, as you have only a few tests that actually test the “living, breathing” system. While everything may appear right at the unit and integration levels, the parts may not work correctly together as a whole.

  4. In its current shape, the pyramid leaves little room for non-functional tests, like security tests or performance tests. It also does not mention contract or smoke tests.

Last but not least, remember that the test pyramid is a concept, and as with every concept, there is no need to blindly adhere to it if it does not make sense for you. Remove one or more layers of the pyramid if they do not fit your case.

Test Pyramid Per Use Case

If the original test pyramid is not enough, and I still want to have some guidelines for tests, what then? Well, let’s throw the test pyramid away and just make a priority list of tests. Let’s iterate from the most to the least important type of tests that you need to have. Additionally, let’s make it on a case-by-case basis.

Change Heavy

Let’s start with the change-heavy case. It does not have to be a startup; it can be any type of greenfield project or just a new service. Here you can go with even zero tests; you probably need velocity and quick customer feedback, not tests. You need the freedom to break things and rebuild them quickly, not to rewrite all the tests from the ground up.

Here I would recommend focusing on E2E tests for the paths that are the most crucial for you: the paths that are your main selling points and competitive advantages. While E2E tests can get in the way when you need more velocity, I believe such a setup will benefit you the most and will give you feedback on the operation of your most important parts.

I would recommend some unit tests if you have some algorithm-heavy or complex logic inside your codebase, especially if it is crucial for your operations and impacts customers directly.
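As an illustration of what "logic-heavy and customer-facing" might mean, here is a small sketch. The `apply_discount` pricing rule is entirely hypothetical, and the tests are written as plain pytest-style functions, assuming a pytest-like runner discovers them.

```python
def apply_discount(total_cents: int, loyalty_years: int) -> int:
    """Hypothetical pricing rule: 5% off per loyalty year, capped at 25%."""
    if total_cents < 0 or loyalty_years < 0:
        raise ValueError("total and loyalty years must be non-negative")
    percent = min(loyalty_years * 5, 25)
    return total_cents - (total_cents * percent) // 100


def test_discount_grows_with_loyalty():
    assert apply_discount(10_000, 0) == 10_000
    assert apply_discount(10_000, 2) == 9_000


def test_discount_is_capped():
    assert apply_discount(10_000, 5) == 7_500
    assert apply_discount(10_000, 40) == 7_500


def test_negative_input_is_rejected():
    try:
        apply_discount(-1, 1)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

Tests like these stay valid as long as the rule itself does not change, which makes them cheap to keep even in a fast-moving codebase.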

What is more, I would suggest doing some performance tests before going live; going viral on day one because your service collapsed under load is probably not a desired result.

If, by some miracle, you still have time to spare, set up some monitoring for the service. Trust me, it will be worth the time and the effort.

Stable

In contrast to the change-heavy case, where everything may need to be changed and rewritten from scratch, here we have a system without such events, or at least without frequent ones. Changes are infrequent, or they impact only a small subset of features.

In such a case, I would recommend going with the following structure: required integration tests, E2E tests, smoke tests, maybe security and performance tests, and consider contract tests if you are exposing an API.

Following such a structure will give you:

  • Real-life guarantees as to your system’s operations.
  • Freedom to change underlying implementation without the need to change your tests.
  • A tool for finding problems in your integrations with 3rd party providers.
  • A tool to quickly ensure your system is working correctly after deploying the system.
  • A lot of insight from security and performance tests.
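The quick post-deployment check from the list above could look roughly like the sketch below. The paths and base URL are placeholders, and `smoke_check` and `http_status` are hypothetical helpers; the `fetch` parameter exists only so the check can be exercised without a running service.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status for a GET request, or 0 if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # no answer at all


def smoke_check(base_url: str, paths: list[str],
                fetch: Callable[[str], int] = http_status) -> dict[str, bool]:
    """Map each critical path to True when it answered with HTTP 200."""
    return {path: fetch(base_url + path) == 200 for path in paths}
```

Run right after a deployment, something like `smoke_check("https://example.invalid", ["/health", "/login"])` gives a fast yes/no answer per critical path without asserting anything about business logic.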

Service Oriented Architecture

This case is kind of a tricky one, as different services may be owned by different teams, and in general, it should be their decision how they want to test their component. However, I believe that there should be a recommendation or best practice to have contract tests for every component that exposes any type of API. Following this will give you extra guarantees after any type of change in one of your services.

If your design is mature enough, you can try introducing chaos engineering and see what results it yields. System-wide pen-tests can also be a good idea, and are better done collectively rather than per service, since some problems only show up in the system as a whole.

Besides that, I would recommend having system-wide requirements for observability: maybe some preset dashboards, alerts, and system-wide best practices. I think that it will give the teams a framework they can easily adapt to their unique cases.

As for the individual services, I would not recommend anything specific; pick the tests that suit your use case best.

Monolith

This case is a kind of mix of all the previous ones. I recommend choosing your approach based on how frequent the changes are and what is changing. Remember to take into consideration the coupling between different components inside the monolith.

If you frequently change the inside of the monolith, not the interface, then go for E2E tests. On the other hand, if you frequently revise the API, then go for whatever is as close to unit tests as you can get. Do the same if you cannot set up E2E tests in any way, or if doing so is too complex to be actually worth it.

If there is a high coupling between different components, or the boundaries between them are blurry, maybe try writing something akin to “E2E tests” on a higher component level.

If it is not there yet, try to set up well-defined logs, metrics, and possibly alerts, on as close to a per-component basis as possible.

Test Pyramid Common Parts

Besides the structures I mentioned before, there are a couple of other tools that may help you build more reliable systems. Not all of them are mandatory, except maybe monitoring (this one, in my opinion, is a must-have). Pick the ones that you think will help you.

However, try to think through all of them; I believe that it will be time well spent regardless.

Performance Tests

While not all systems and modules have strict performance requirements, it may be beneficial to have some performance tests.

They can provide additional insights for our product or business:

  • We know how far we can scale if the need arises at some point.
  • We can notice that some feature negatively impacts our performance.

I know it may not be the most crucial part for non-critical systems. However, at least we know about the issue and can make a decision on what to do with it instead of just letting it through.
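Even a very small latency check can surface a feature that degrades performance. The sketch below is a simplification under stated assumptions: it measures a single in-process call rather than a production-like setup, the nearest-rank percentile and the p95 budget are my choices for illustration, and all function names are hypothetical.

```python
import math
import time
from typing import Callable


def measure_latencies(operation: Callable[[], object], runs: int = 50) -> list[float]:
    """Call the operation repeatedly and return each duration in milliseconds."""
    samples = []
    for _ in range(runs):
        started = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - started) * 1000)
    return samples


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def within_budget(samples: list[float], p95_budget_ms: float) -> bool:
    """True when the 95th percentile latency stays within the agreed budget."""
    return percentile(samples, 95) <= p95_budget_ms
```

A check such as `within_budget(measure_latencies(handler), p95_budget_ms=200)` can then run in CI; the budget value is whatever your own requirements say, not a recommendation.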

Pen-tests / Security Tests

Again, as with performance tests, not all services and systems require these. Nevertheless, it may be beneficial to at least entertain the idea. You may find some interesting insights along the way. The exact scope and scale greatly depend on a number of factors. If you want to know more about security, I write on this topic in more detail elsewhere.

ArchUnit Tests

I think that for all four cases it may be worth trying to write some tests in ArchUnit fashion, at least once your code structure stabilizes. While it may seem like wasted time, it will help you keep your code in shape for longer.
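ArchUnit itself is a Java library, so the sketch below is not its API. It is a rough Python analogue of the same idea, built on the standard `ast` module: a test that fails when one layer imports from a layer it should not know about. The layer names (`web`, `persistence`) and the sample source are assumptions made up for the example.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Collect the top-level names of every module imported by the given source code."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports, which stay inside the package
            found.add(node.module.split(".")[0])
    return found


def violates_layering(source: str, forbidden: set[str]) -> set[str]:
    """Return the forbidden packages that the source imports (empty set means OK)."""
    return imported_modules(source) & forbidden


# Hypothetical rule: the domain layer must not depend on web or persistence code.
DOMAIN_SOURCE = "from decimal import Decimal\nimport persistence.orders\n"
print(violates_layering(DOMAIN_SOURCE, {"web", "persistence"}))
```

In a real project the source would be read from the files of the layer under test, and the test would assert that the returned set is empty.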

Observability

Tests are not the only thing that you will need to create robust systems. The whole infrastructure part around your system may be even more crucial than the tests in ensuring flawless operation of your systems.

As an addition to your tests, you should also have good logging, metrics, and possibly alerts. They will give you additional insight into the operations of your systems. They will also polish some rough edges around your tests and may help identify some bottlenecks not caught in the tests.
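To make "logging and metrics" a bit more concrete, here is a minimal sketch using only the standard library. The `Counter` is a stand-in for whatever metrics client you actually use, and the `observed` decorator, the `orders` logger name, and `place_order` are all hypothetical.

```python
import functools
import logging
import time
from collections import Counter

logger = logging.getLogger("orders")  # hypothetical service name
metrics = Counter()  # stand-in for a real metrics client


def observed(operation_name: str):
    """Decorator that logs duration and counts successes and failures per operation."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception:
                metrics[f"{operation_name}.failure"] += 1
                logger.exception("%s failed", operation_name)
                raise
            metrics[f"{operation_name}.success"] += 1
            elapsed_ms = (time.perf_counter() - started) * 1000
            logger.info("%s took %.1f ms", operation_name, elapsed_ms)
            return result
        return wrapper
    return decorator


@observed("place_order")
def place_order(quantity: int) -> int:
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return quantity * 2
```

An alert is then just a rule on top of these numbers, for example on the ratio of `place_order.failure` to `place_order.success`.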

Chaos Engineering/Testing

Probably the most complex concept to implement correctly. Deliberately introducing any type of disruption or failure into an otherwise perfectly working system may not seem like the brightest idea, but it can help identify weaknesses and problems that will not show up in any other way.

However, this type of “test” is very, very complex. Introducing failures, intentional or not, is never fully safe. Before going head-on with this, double-check that your software and infrastructure are actually ready to survive it.
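At a very small scale, the idea of intentional and traceable failures can be sketched as a wrapper that makes a fraction of calls fail on purpose. This is a toy, in-process illustration and not how dedicated chaos tooling works; `with_chaos`, `InjectedFailure`, and `call_with_retry` are hypothetical names, and the retry loop stands in for whatever resilience mechanism you want to exercise.

```python
import logging
import random

logger = logging.getLogger("chaos")


class InjectedFailure(RuntimeError):
    """Marks a failure that was introduced on purpose, so it is traceable in logs."""


def with_chaos(func, failure_rate: float, rng: random.Random):
    """Wrap func so that a fraction of calls fail deliberately before reaching it."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            logger.warning("injecting failure into %s", func.__name__)
            raise InjectedFailure(func.__name__)
        return func(*args, **kwargs)
    return wrapper


def call_with_retry(func, attempts: int = 3):
    """The resilience mechanism under test: retry a few times, then give up."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except InjectedFailure:
            if attempt == attempts:
                raise
```

Passing a seeded `random.Random` keeps an experiment reproducible, and the dedicated exception type makes it easy to tell injected failures from real ones.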

Test Pyramid Trade-off & Considerations

Before we jump to the conclusion, there are a couple of trade-offs and assumptions that I think you should take into consideration while picking the tests that you want to use:

  1. Time limits

One of the considerations when picking which tests to focus on is time restrictions. If you have very strict limits on how long your tests can run, then focusing on unit tests and some integration tests would be better than going for a full E2E test set, and vice versa.

  2. Integration tests

In my opinion, a database is not a good case for integration tests nowadays. Integration tests should be used only for 3rd-party services that have complex behavior and cannot be easily tested in E2E tests. If you have such dependencies in your system, then that is, in my opinion, the only valid reason to write integration tests. The database layer can be tested in the E2E test layer.

  3. Unit tests

I believe that unit tests should only cover the algorithm/logic-heavy pieces of code. There is no point in trying to reach higher coverage tiers with unit tests. In my opinion, it is better to focus on E2E tests. Sometimes, especially for poorly designed architectures, writing actual unit tests is much harder than it looks.

  4. Setup complexity

In some cases, it may not be an option to create E2E or unit tests. In such a case, pick the one that is easier to set up and maintain and gives you more reliability. It may also be reasonable to change your architecture/design to be more testable.

  5. Over-reliance on mocks

While writing any type of test, be careful not to overuse mocking and/or stubbing. You can easily start testing mock and stub behaviors instead of the actual code.

  6. Test implementation

For unit tests, do not go too deep into testing internal behavior. Try to test interfaces, not the content of your methods. For E2E tests, try to use as many of the actual components as you can. Do not write your own stubs until you have to; Testcontainers may come in very handy here.
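The points about mocks and about testing interfaces can be illustrated together. In the sketch below, a small hand-written fake with real behavior replaces a mock, and the test asserts only on what `register` returns, not on which store methods were called. Both `InMemoryUserStore` and `register` are hypothetical names invented for the example.

```python
class InMemoryUserStore:
    """Hand-written fake with real behaviour, used in place of a mock."""

    def __init__(self):
        self._emails = set()

    def exists(self, email: str) -> bool:
        return email in self._emails

    def add(self, email: str) -> None:
        self._emails.add(email)


def register(store, email: str) -> bool:
    """Hypothetical code under test: register an email once, case-insensitively."""
    normalized = email.strip().lower()
    if store.exists(normalized):
        return False
    store.add(normalized)
    return True


def test_duplicate_registration_is_rejected():
    store = InMemoryUserStore()
    assert register(store, "Ann@Example.com") is True
    # Same address with different casing and whitespace must count as a duplicate.
    assert register(store, " ann@example.com ") is False
```

If `register` is later refactored to call the store differently, this test keeps passing as long as the observable behavior stays the same, which is harder to achieve with a test that verifies individual mock calls.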

Summary

Let’s start with a table to show concepts from previous paragraphs in a clear and concise manner.

Per Type Of Environment You Want To Run Your Tests

Type Base Optional
Change Heavy - E2E for crucial parts of API
- Good observability pipeline (from logs to alerts)
- Smoke tests for crucial paths
- Performance tests for crucial parts
- Security tests
- Unit tests for logic/algorithm-heavy parts
- Integration tests for 3rd-party services
Stable - E2E
- Good observability pipeline (from logs to alerts)
- Integration tests for 3rd-party services
- Unit tests for logic/algorithm-heavy parts
- Performance tests for crucial parts
- Security tests
- Consider if you need Smoke Tests and their scope
Service Oriented Architecture - Contract tests for services exposing APIs used by other services
- Choose exact test setup per service
- Design base observability approaches for each team to adopt and extend
- System-wide Performance tests
- System-wide Security tests
- Consider Chaos Engineering
Monolith - Pick the tests that are easier to set up and maintain
- Good observability pipeline (from logs to alerts)
- Smoke tests
- System-wide Performance tests
- System-wide Security tests

Per Test Type

| Test Type / Environment | Change Heavy | Stable | Service Based | Monolith |
| --- | --- | --- | --- | --- |
| Unit | No | Logic heavy methods | Per service basis | Depends on the setup cost |
| Integration | Consider for 3rd party service | 3rd party service | Per service basis | 3rd party service |
| E2E | For critical paths | Mandatory | Per service basis | Depends on the setup cost |
| Contract | No | When and where applicable | Recommended for all services | No |
| Performance | For consideration | Yes | Per service basis | System wide |
| Smoke | Consider for critical path | Consider for critical path | Per service basis | Consider for critical path |
| Security | For consideration | Yes | System wide | System wide |
| Observability | Yes | Yes | Predefined rules | Yes |

It is not a silver bullet for every case; there is no such thing. Everything here is based on different trade-offs, some of which are mentioned in the sections above.

My final recommendation is: Just write the best tests that you can, given your design and possibilities.

Thank you for your time.

Blog Test Pyramid: Best Practices For A Reliable Test Suite from Pask Software.
