Written by Ramón Guijarro
If only because of this complexity, we should be testing our web applications to make sure that they behave as expected. As developers, we want to increase our confidence in the software we write, and that’s what tests provide. But making sure our webapps work is even more important if we think about the fact that they’re the entry point to our products. It can even be argued that your webapp is your product since it’s the thing your users are actually using. When users think about your product, they think about your UI, and all they care about is accomplishing tasks through it. So how do we go about testing it?
From a conceptual standpoint, besides picking specific testing tools and technologies, one of the first questions that arise is at which level we need to be testing and how many tests of each kind we should be writing. The classic test pyramid quickly comes to mind as an answer, but let’s see how it applies to modern component-based web applications — the kind built with libraries like React or Vue.
Following the test pyramid will get us writing lots of unit tests, as those constitute its base. In our webapp context, they’re usually interpreted as testing a single component completely isolated from the rest of the tree, mocking out all of its dependencies and subcomponents. However, this kind of test doesn’t very accurately reflect how people will actually use our app.
That last statement is especially true for those who believe that a test that writes to the DOM or receives user input cannot be considered a unit test but an integration test, because there is I/O involved or side effects in general. But while you can call it an integration test all you want, it’s probably not the kind of integration test the pyramid is referring to, as it is not noticeably more expensive to run, so the advice to write fewer of them arguably doesn’t apply.
The technologies used for unit and integration tests of components are in fact usually the same — typically, a test runner like Jest that uses an emulated browser-like environment under the hood — so sometimes the distinction between them only comes down to who you’re asking.
On the other hand, end to end tests at the top of the pyramid much better reflect how users interact with the app. For web applications, these rely on tools that run your tests in an actual web browser, instead of an emulated DOM like the ones we just mentioned. This fact makes them conceptually different and forces you to test from the end user’s point of view. You can think of them as manual tests that are automated.
Historically, end to end tests have been slow, prone to flakiness, and hard to debug. However, recent testing tools and frameworks, like Cypress or Puppeteer, are improving all of these aspects to the point where some people even advocate inverting the test pyramid altogether (something generally considered an antipattern) on the basis that we ought to be testing exactly what the user is experiencing.
I personally wouldn’t go as far as inverting the test pyramid, but what is definitely becoming clearer nowadays is that our webapps could benefit from raising the level at which we typically write tests, moving it above single isolated components. Let’s see why.
Testing our components in isolation and mocking everything around them is not only a poor reflection of their real-world usage — hence not providing that much value in terms of confidence — but it almost inevitably leads to coupling our tests with their implementation details. This can get particularly bad with libraries such as Enzyme, which lets developers select nodes based on component names, arbitrarily modify their internal state, and skip rendering of all children altogether with shallow rendering.
Under this approach, it’s common that making almost any change to a component will break its tests, even if its API stays the same. If end users or consumers of the component would not notice the changes, why should tests fail? Also, fixing the tests will sometimes force you to basically rewrite them from scratch. This means that those tests actually hinder your ability to refactor, and because they fail on almost any change, they are of little use for telling real regressions apart from harmless edits. So what’s their value then?
A lot of components in our applications are meant to be working in conjunction with others to form a larger component, a certain screen, or a feature. Features are what users care about and how we should measure confidence in our app. So instead of testing the individual components at the leaves of the tree (what would usually be referred to as unit testing them), look for the higher-level components that constitute true units in terms of features and test these without mocking their children. You will cover the real use case, the tests will take less effort to write and maintain, and you will be able to refactor all the subcomponents without breaking multiple tests.
To ensure that you’re testing the same way your users would use the app, it’s a good idea to rely on tools like Testing Library, since it gives you utilities to query for nodes similarly to how users would find them. And because its preferred queries are based on ARIA roles and accessible names, testing with it will push you to improve the accessibility of your app as a bonus.
This approach allows us to write tests that closely imitate real user interactions and are resilient to changes, just as end to end tests would do, which is exactly what we want. But they have the benefit of running much faster, since they’re not using a real browser.
So if we can get a similar level of confidence with integration tests, what are end to end tests good for then? An excellent use for them in our context is smoke testing. Build processes of modern webapps have many moving pieces and involve sophisticated tools like transpilers, bundlers or polyfills, with non-trivial configurations that are often different for development and production. That means that your production build could fail whilst the development one is working fine. So a simple test that opens your webapp in a browser and checks that it loads actually gives you quite some value for little investment.
Another use case for end to end tests is to cover the happy path of your most important user flows, exercising real APIs (either production ones or those in testing or staging environments) instead of having network requests mocked. These tests might seem redundant, as well as more prone to exhibit flakiness since they hit real backend services, all on top of being slower. That’s why it’s advisable not to have a ton of them and maybe only run them before a deployment or as a nightly process. But they’re still relevant since they emulate the usage of your app in the most realistic way of all automated types of testing.
We’ve discussed an approach to tests based on emulated DOM technology that tries to get as close as possible to the benefits of end to end browser-based tests, and we’ve seen how too much mocking goes against our goals in this area. But you will still need and want to mock some things in your non-browser tests. So let’s briefly discuss what to mock and how.
You usually won’t want to render your whole app, but at the same time, you’ll want whatever global mechanisms you have in place to be available in your tests. This way, you can confidently rely on those in your code knowing that your tests won’t fail. For example, if you’re using React you will probably have some top-level context providers; it can be a good idea to write mocked versions of them and mount them in all your tests.
In a similar vein, you will want some functions used across your app to always be mocked. A typical example is date formatters: if you ever change the way you format dates in your app, you don’t want to have to modify assertions in every test under the sun. You can write unit tests for those functions to check that the formatting works as expected, and then have global mocks return a constant to ignore these in the rest of your tests. Testing frameworks like Jest allow you to define global mocks once for your own modules or third party dependencies. And you can always restore the original implementation for a particular test if you need to.
You will also want to mock network requests in your integration tests to ensure that they run fast and are not flaky. You can do that in exactly the same way, as long as all your requests eventually go through the same module. The key is to always encapsulate core functionality like this in reusable modules and consistently rely on them. This will not only make testing easier but improve the architecture of your application, avoiding duplication and reducing the risk of diverging implementations and duplicated errors.
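As a rough sketch of the "single module" idea, the snippet below funnels every request through one small client whose transport can be swapped for a fake in tests. Note that it uses plain dependency injection rather than Jest’s module mocks, so it illustrates the principle and not the exact setup described above. The names `createApiClient`, `createFakeTransport`, and the `/todos` endpoint are hypothetical, and the real transport (which would wrap `fetch` or a similar client) is deliberately left out.

```typescript
// A transport performs the actual request. The real one would wrap fetch or
// a similar HTTP client; it is omitted so the sketch has no dependencies.
type Transport = (path: string) => Promise<unknown>;

interface ApiClient {
  getTodoTitles(): Promise<string[]>;
}

// Every request in the app is assumed to go through this one client.
function createApiClient(transport: Transport): ApiClient {
  return {
    async getTodoTitles(): Promise<string[]> {
      const data = await transport("/todos"); // hypothetical endpoint
      const items: unknown[] = Array.isArray(data) ? data : [];
      const titles: string[] = [];
      for (const item of items) {
        if (typeof item === "object" && item !== null && "title" in item) {
          const title = (item as { title: unknown }).title;
          if (typeof title === "string") titles.push(title);
        }
      }
      return titles;
    },
  };
}

// Test helper: answers from a canned table and records what was requested.
function createFakeTransport(responses: Record<string, unknown>) {
  const requestedPaths: string[] = [];
  const transport: Transport = (path) => {
    requestedPaths.push(path);
    return Promise.resolve(responses[path]);
  };
  return { transport, requestedPaths };
}
```

In an integration test, the component tree would receive a client built with `createFakeTransport`, so no test ever touches the network and all of them share one place where responses are defined.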
Finally, there are legitimate reasons to mock components in a particular test, besides top-level or global ones. For example, you might want to have actual unit tests for some core components. That’s fine, but instead of using techniques like shallow rendering to completely prevent rendering all child components, which might unknowingly hide errors from you, explicitly mock only what you need. Again, Jest makes it easy with module mocks and mock functions.
The easiest tests to maintain are the ones that you don’t need to write. Static analysis tools can automatically catch a lot of bugs for us, effectively saving us from writing certain kinds of tests. These tests tend to be repetitive and cumbersome, so help is even more welcome.
A common source of bugs in modern web applications is the incorrect use of components by other consumers, like forgetting to set some mandatory property, or using a mismatching value for it (e.g., a string when a number is expected). This can be addressed with mechanisms like PropTypes in React, but these only warn you at runtime during development, only if the particular component happens to be mounted, and only via the browser console, so you can miss the warning anyway. A better alternative is to declare types for properties and state of components with tools like TypeScript or Flow, which perform static checking of the types to ensure that these issues are all caught at build time. They will also support you while making changes in your components and allow you to refactor with greater confidence.
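To make the difference concrete, here is a minimal TypeScript sketch. `PriceTag` and its props are hypothetical, and the "component" is a plain function returning a markup string so the example doesn’t depend on React or any other library. The two lines marked with `@ts-expect-error` are the mistakes mentioned above; the compiler rejects them at build time, whether or not the component is ever mounted.

```typescript
// Props for a hypothetical PriceTag component.
interface PriceTagProps {
  amount: number; // mandatory
  currency: string; // mandatory
  label?: string; // optional
}

// Stand-in for a component: a plain function from props to markup.
function PriceTag({ amount, currency, label }: PriceTagProps): string {
  const text = `${amount.toFixed(2)} ${currency}`;
  return label ? `<span>${label}: ${text}</span>` : `<span>${text}</span>`;
}

// Correct usage type-checks and renders as expected.
const rendered = PriceTag({ amount: 9.5, currency: "EUR", label: "Total" });

// @ts-expect-error a string where a number is expected fails the build
const wrongType: PriceTagProps = { amount: "9.5", currency: "EUR" };

// @ts-expect-error the mandatory `currency` property is missing
const missingProp: PriceTagProps = { amount: 9.5 };
```

Without the `@ts-expect-error` comments, each of the last two declarations would stop the build with a type error, which is the kind of bug we would otherwise need a test (or a lucky glance at the console) to find.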
We can leverage this ability to declare our own types to make our applications safer in some other ways. For example, we can write factory functions that receive data fetched from our backend APIs and process and sanitize it, returning objects safe to use in our components — what we’d call models. And we can define custom types for the shape of these models and export them to use as prop types of our components. This can help us gracefully handle mistakes in backend responses so that our UI doesn’t break, while still being able to log custom errors to a monitoring service like Sentry to be aware of the issues.
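A model factory along these lines could look like the sketch below. Everything in it is illustrative: the `UserModel` shape, the field names, and the fallback values are assumptions, and `reportError` is a hypothetical callback standing in for a call to whatever monitoring service you use.

```typescript
// Shape that components can rely on. In a real codebase this type would be
// exported and reused as (part of) the props type of the components.
interface UserModel {
  id: string;
  name: string;
  age: number | null;
}

// Hypothetical hook for logging to a monitoring service.
type ReportError = (message: string) => void;

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Turns an untrusted backend response into a model that is safe to render.
function createUserModel(raw: unknown, reportError: ReportError): UserModel {
  if (!isRecord(raw)) {
    reportError("user: response is not an object");
    return { id: "", name: "Unknown user", age: null };
  }

  const rawId = raw["id"];
  const rawName = raw["name"];
  const rawAge = raw["age"];

  let id = "";
  if (typeof rawId === "string" || typeof rawId === "number") {
    id = String(rawId);
  } else {
    reportError("user: missing or invalid id");
  }

  let name = "Unknown user";
  if (typeof rawName === "string" && rawName.trim() !== "") {
    name = rawName.trim();
  } else {
    reportError("user: missing or invalid name");
  }

  // Age is treated as optional: absent is fine, a wrong type is reported.
  let age: number | null = null;
  if (typeof rawAge === "number" && Number.isFinite(rawAge)) {
    age = rawAge;
  } else if (rawAge !== undefined && rawAge !== null) {
    reportError("user: invalid age");
  }

  return { id, name, age };
}
```

The design choice here is that the factory never throws: a malformed response degrades to harmless defaults in the UI, while every problem is still reported so the team finds out about it.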
We can go even further down this road by using tools that automatically generate type definitions for our API responses. This allows us to use the generated types directly in our components without any extra work, providing a solid safety layer. Since GraphQL has a type system at its core, APIs based on it are particularly well-suited for this. Tools like GraphQL Code Generator can generate TypeScript and Flow types from a GraphQL schema, and next-generation systems like Prisma go one step further and generate them from the database itself.
Writing tests for our web frontend code is important because it’s the part of our product that our users will be directly interacting with. We want to simulate that interaction as faithfully as possible, so browser-based tests would be the way to go, but they’re too slow to use at scale. However, we can get pretty close with non-browser-based tests if we write them at a high enough level, use selectors based on accessibility roles, and make sensible use of mocking. Adding a static type checker to the mix will further increase our confidence and prevent some common bugs.
So go ahead and write some tests for your webapp. Your users will unknowingly thank you.