Pros and Cons of Quarantined Tests

Mark Lapierre · Originally published at marklapierre.net · 3 min read

Flaky tests, i.e., those that only fail sometimes, are the bane of any end-to-end automated test suite.

Another type of problem test is one that fails every time, but which tests something deemed not important enough to fix right now. If you have to ignore some failed tests, sooner or later you're going to ignore one you should have paid attention to. Or worse, you might decide to ignore them all, because clearly no one is fixing the bugs.

If a test is broken, fixing it should always be the first course of action, if possible. But what if some other task has a higher priority? If you’re confident that the problem is the test and not the software being tested, it might be reasonable to allow the test to keep failing, at least temporarily.

When you frequently ignore some failing tests, the whole suite is at risk of being seen as unreliable. A common way to prevent that is to quarantine the flaky/failing tests. Quarantine in this context refers to isolating the troublesome tests from the rest of the test suite. Not for fear of contagion, except in the sense of the negative impact they can have on the perception of the rest of the tests.

I think I first came across the concept in an article by Martin Fowler. It’s a great read on the topic of flaky tests and how to identify and resolve the causes of their flakiness. This post isn’t about how to fix them so check out that article if you’re after that kind of info.

More recently, an article on the Google Testing Blog mentioned the same technique for dealing with the same types of troublesome tests.

Even though quarantining tests can be a good temporary solution, if you don't fix the tests (or the bugs) you can end up in the situation I mentioned before: a few failing tests create the impression that the entire suite is unreliable, which can amount to a death sentence for the suite.

My team and I try to avoid that death sentence in a few ways:

  1. Report quarantined test results separately from the rest of the test suite.

    That way everyone can see the results of the reliable tests and know that a failure there is something that should be looked at immediately. We don’t have to try to identify the “true” failures among the flaky ones.

  2. Tag quarantined tests with the reason they're quarantined.

    So flaky tests get tagged as such. Failing tests that aren’t going to get fixed for a while get reported and tagged with the issue number. Comments can be added if the tag isn’t sufficient. This isn’t enough to rescue a quarantined test from oblivion, but it can help avoid the potential problem of losing track of why a test was quarantined.

  3. Schedule a regular review of quarantined tests.

    If it’s not scheduled it’s not likely to happen. Failing tests can be assigned to someone to fix if priorities change, and time can be invested in fixing a flaky test if we decide it’s more important than we first thought.

  4. Delete the test.

    If any test stays in quarantine for a long time, it's worth rethinking the value the test provides. Maybe it turns out that unit tests, or even exploratory tests, provide enough coverage. Or the test might cover a part of the software that rarely changes, or that doesn't get much use; in that case a regression there isn't a big deal. We might @Ignore the test and leave a comment explaining why, instead of deleting it, if it seems likely someone might decide to write the test again.
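Points 1 and 2 above can be sketched in plain Java. Everything here is illustrative, not our actual setup: the `@Quarantined` annotation, its field names, the example tests, and the issue number are all made up, and a real JUnit 5 suite would more likely use `@Tag("quarantine")` plus build-tool filtering.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QuarantineReport {
    // Hypothetical marker; real suites might use JUnit 5's @Tag("quarantine") instead.
    @Retention(RetentionPolicy.RUNTIME)
    public @interface Quarantined {
        String reason();           // e.g. "flaky" or "known bug"
        String issue() default ""; // tracker reference, for failing-but-low-priority tests
    }

    // Stand-in test class: one reliable test, one quarantined known failure.
    public static class ExampleTests {
        public void loginTest() { /* always passes */ }

        @Quarantined(reason = "known bug", issue = "PROJ-1234") // illustrative issue number
        public void discountTest() { throw new AssertionError("fails until the bug is fixed"); }
    }

    // Run every *Test method and report quarantined results separately, so a
    // failure in the "reliable" bucket always means "look at this now".
    public static Map<String, List<String>> run() throws Exception {
        Map<String, List<String>> report = new LinkedHashMap<>();
        report.put("reliable", new ArrayList<>());
        report.put("quarantined", new ArrayList<>());
        for (Method m : ExampleTests.class.getDeclaredMethods()) {
            if (!m.getName().endsWith("Test")) continue;
            String outcome;
            try {
                m.invoke(new ExampleTests());
                outcome = m.getName() + ": PASS";
            } catch (ReflectiveOperationException e) {
                outcome = m.getName() + ": FAIL";
            }
            Quarantined q = m.getAnnotation(Quarantined.class);
            if (q == null) {
                report.get("reliable").add(outcome);
            } else {
                // Carry the reason/issue into the report so we don't lose track of why.
                report.get("quarantined").add(outcome + " [" + q.reason()
                        + (q.issue().isEmpty() ? "" : ", " + q.issue()) + "]");
            }
        }
        return report;
    }

    public static void main(String[] args) throws Exception {
        run().forEach((bucket, results) -> System.out.println(bucket + ": " + results));
    }
}
```

The point of the two buckets is social as much as technical: anything red in "reliable" is actionable immediately, while "quarantined" is informational until the scheduled review.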

How do you deal with flaky or failing tests that don’t get fixed quickly?

Discussion (4)

Josh Cheek

Hmm. I absolutely hate tests that fail when nothing is wrong. My best tool is to reduce the strength of the assertion (at the most extreme end, turn it into a smoke test: run the code and assert nothing other than it doesn't explode). If it's based on something volatile, I might generate the data or assert "it seems to have the things I expect" rather than "it matches some fixture".
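A minimal sketch of that weakest-assertion idea (the `renderReport` function and its output format are made up for illustration):

```java
public class SmokeSketch {
    // Hypothetical system under test: the output contains a volatile timestamp,
    // so asserting an exact fixture match would be flaky.
    public static String renderReport() {
        return "Report generated at " + java.time.Instant.now();
    }

    // Weakened assertion: the code ran and produced roughly the right shape,
    // rather than exactly matching a stored fixture.
    public static boolean looksLikeReport(String out) {
        return out != null && out.startsWith("Report generated");
    }

    public static void main(String[] args) {
        String out = renderReport(); // would throw here if the code "explodes"
        if (!looksLikeReport(out)) {
            throw new AssertionError("unexpected report output: " + out);
        }
        System.out.println("smoke test passed");
    }
}
```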

For flaky tests, it's a question of why they're flaky. If it's some async thing, maybe add a callback so the assertion doesn't happen until the async thing has occurred.

Fortunately, I've not been on a project where a consistently failing test is an acceptable thing to have.

But yeah, I've deleted tests that fail / removed environments that cause the tests to fail, when I can't easily get into that environment to test them myself, but know that they generally work (eg some specific version of the interpreter on an OS I don't have easy access to).

Mark Lapierre (Author)

Yeah, I hate those tests too. Fortunately, we don't have too many of those anymore and we typically fix them quickly.

Most of our quarantined tests are legitimate failures, but the bug is a low priority. Honestly, I'd prefer not to run the test at all until the bug is fixed since we know it's going to fail. But this way we get some measure of part of our technical debt.

ran tene

This is how we deal with flaky tests:
blogs.dropbox.com/tech/2018/05/how...

Mark Lapierre (Author)

I like it! It's something we could do pretty easily too. We use JUnit, so we could distinguish between real failures, which throw an AssertionError, and other failures, which throw some type of Exception.
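A rough plain-Java sketch of that distinction (`classify` is a made-up helper, not a JUnit API; JUnit 4 applies the same convention when it splits results into failures and errors):

```java
public class FailureClassifier {
    // AssertionError => "failure": the expectation itself didn't hold.
    // Any other runtime exception => "error": something else broke,
    // which is often where flaky environments show up.
    public static String classify(Runnable test) {
        try {
            test.run();
            return "pass";
        } catch (AssertionError e) {
            return "failure";
        } catch (RuntimeException e) {
            return "error";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(() -> {}));                                        // pass
        System.out.println(classify(() -> { throw new AssertionError("boom"); }));     // failure
        System.out.println(classify(() -> { throw new IllegalStateException("db"); })); // error
    }
}
```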
