Flaky tests are non-deterministic tests in your test suite. They pass or fail intermittently without any change to the code, which makes test results unreliable.
Why are flaky tests bad?
- Developer productivity goes down as test results become inaccurate and trust in the test suite decreases.
- Multiple, unrelated commits cause similar errors, making maintenance difficult.
- Legitimate issues may get ignored due to a high number of false positives.
- Repetitive work is required to determine if bugs exist at all.
- Diagnosis takes longer because the error can be in either the test or the code.
- Users become dissatisfied when bugs end up in production.
Let's look at five common causes of flaky tests in your build pipeline and what you can do about each.
One of the most common reasons tests become flaky is concurrency. These failures occur when developers make incorrect assumptions about the ordering of operations between threads. For instance, one test thread might assume a particular state for a shared resource, such as data or memory, while another thread is changing it.
For example, test 2 might assume test 1 passes and use test 1’s output as an input for itself. Or test 2 might assume that test 1 leaves a data variable in state x, but test 1 may not always do that – causing test 2 to fail. Tests can also be flaky if they do not correctly acquire and release shared resources between them.
- Use synchronization blocks between tests.
- Change the test to accept a wider range of behaviors.
- Remove dependencies between tests.
- Explicitly set static variables to their default value.
- Use resource pools - your tests can acquire and return resources to the pool.
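As a minimal sketch of the "explicitly set static variables to their default value" advice, the example below uses Python's built-in `unittest` module. The `Counter` class and its `total` attribute are hypothetical stand-ins for any shared state in your own code. Resetting that state in `setUp` means neither test depends on the other having run first.

```python
import unittest


class Counter:
    """Hypothetical class with class-level (static) state shared by all tests."""
    total = 0

    @classmethod
    def add(cls, n):
        cls.total += n
        return cls.total


class CounterTest(unittest.TestCase):
    def setUp(self):
        # Reset the static variable so every test starts from the default value
        Counter.total = 0

    def test_add_once(self):
        self.assertEqual(Counter.add(2), 2)

    def test_add_twice(self):
        Counter.add(2)
        self.assertEqual(Counter.add(3), 5)
```

Without the reset in `setUp`, the result of each test would depend on which test ran before it, which is the kind of hidden dependency described above.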
The less control you have over your test environment, the less predictable your tests become. Flaky tests can occur when your test suite depends on unreliable third-party APIs or on functionality maintained by another team. These tests may fail intermittently due to third-party system errors, unreliable network connections, or third-party contract changes.
- Use test stubs or test doubles to replace the third-party dependency. Your regular tests can talk to the double instead of the external source.
- Test doubles will not detect API contract changes. You will need to develop a separate suite of integration contract tests for this.
- Contract tests can be run separately and need not break the build the way other tests do. They can be run less frequently and be actioned independently of other bugs.
- Communicate with the third-party provider to discuss the impact of changes made by them on your system.
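The test-double idea above can be sketched with Python's standard `unittest.mock` module. The function `convert_usd_to_eur` and the client method `get_rate` are hypothetical names chosen for illustration. They do not belong to any real third-party API.

```python
from unittest.mock import Mock


def convert_usd_to_eur(amount_usd, rates_client):
    """Hypothetical code under test: converts using a rate from a third-party client."""
    rate = rates_client.get_rate("USD", "EUR")
    return round(amount_usd * rate, 2)


def test_convert_uses_stubbed_rate():
    # The Mock stands in for the real third-party client, so no network call is made
    stub_client = Mock()
    stub_client.get_rate.return_value = 0.5

    assert convert_usd_to_eur(10, stub_client) == 5.0
    stub_client.get_rate.assert_called_once_with("USD", "EUR")
```

Because the stub always returns the same rate, this test cannot fail due to a network outage or a provider error. As noted above, it also cannot detect a contract change, so a separate contract test against the real service is still needed.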
Test infrastructure failures are another common cause of flaky tests. These include network outages, database issues, and continuous integration (CI) node failures.
- These issues are typically easier to spot than others. Your debugging process can check these first before attempting to find other causes.
- Write fewer end-to-end tests and more unit tests.
- Run tests on real devices instead of emulators or simulators.
UI tests are used to test visual logic, browser compatibility, animation, and so on. Because they start at the browser level, they can be very flaky for a variety of reasons, ranging from missing HTML elements and cookie changes to actual system issues. If you visualize your test suite as a pyramid, UI tests are at the top. They should only occupy a small portion of your test portfolio because they are brittle, expensive to maintain, and time-consuming to run.
- Don’t use UI tests to test back-end logic.
- Capture the network layer using the Chrome DevTools Protocol (CDP). CDP allows tools to inspect, debug, and profile Chromium, Chrome, and other Blink-based browsers.
Not following good test writing practices can result in a large number of flaky tests in your pipeline. Some common mistakes include:
- Not adopting a testing framework even as code complexity and team size increase.
- Caching data. Over time, cached data may become stale, affecting test results.
- Using random number generators without accounting for the full range of possibilities.
- Using floating-point operations without paying attention to underflows and overflows.
- Making assumptions about the order of elements in an unordered collection.
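- Using sleep statements to make your test wait for a state change. Sleep statements are imprecise and one of the biggest causes of flaky tests. It is better to replace them with a polling wait such as a waitFor() function, which returns as soon as the expected condition is met and fails only after a timeout.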
The following practices help you avoid these mistakes:
- Treat automation testing like any other software development effort. Make testing a shared responsibility between developers and analysts.
- Use tools to monitor test flakiness. If the flakiness is too high, the tool can quarantine the test (removing it from the critical path) and help resolve issues faster.
- Start all tests in a known state.
- Avoid hardcoding test data.
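To illustrate the point about sleep statements, here is a minimal sketch of a polling wait in Python. The helper name `wait_until` and its parameters are assumptions made for this example. Many test frameworks ship their own equivalent, which you should prefer when one is available.

```python
import threading
import time


def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


def test_background_job_completes():
    # Hypothetical background job that finishes after roughly 0.1 seconds
    finished = threading.Event()
    threading.Timer(0.1, finished.set).start()

    # Returns as soon as the event is set instead of always sleeping a fixed time
    assert wait_until(finished.is_set, timeout=2.0)
```

A fixed `time.sleep(0.2)` would pass on a fast machine and fail on a slow one. The polling version waits only as long as needed and fails with a clear error once the timeout is exceeded.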
Unfortunately, no silver bullet eliminates flakiness entirely. Even high-performing teams like Google have reported some level of flakiness in about 16% of their tests.
The best way to deal with the issue is to monitor test health and to have both short-term and long-term mitigation strategies in place. If flaky tests are a severe problem for your team, or if this is a general topic of interest, email us at firstname.lastname@example.org to get an invite to our private beta group for Flaky Bot, a tool to help manage flaky test infrastructure better.