Mutation Testing: delicious concept you will rarely use in practice

Mutation Testing

Code coverage is a fairly standard metric, and many of us use it to assess unit testing. After all, if certain lines of code are never called by tests, it is likely the scenarios that use them are not fully tested.

However, can the fact that a line of code is called by a test guarantee that the covered logic is tested thoroughly? At this stage, you face the need for a qualitative assessment of tests rather than a quantitative one.

The idea of Mutation Testing is to make small, automated changes to the tested code, creating variants of behavior that the developer did not anticipate - mutants. Your unit tests are then run against each mutation: if a test fails, the mutant is detected (and "killed" in MT terms) and the test is strong enough; if not (the mutant "survived"), the test does not cover all possible variants of program behavior and is potentially vulnerable.

Pitest

The Mutation Testing concept became popular many years ago and is now implemented in several frameworks. One of the most successful and popular for Java is Pitest.

Comparison of Java MT systems (from https://pitest.org/java_mutation_testing_systems/)

To make it work, all you need is some code with tests over it, plus the Pitest dependencies in your Maven or Gradle config.

During the execution the plugin:

  1. Detects covered lines of code (there is no sense in mutating uncovered code)
  2. Copies corresponding class files in memory (no changes in a working directory by default)
  3. Mutates specific places in the copies according to a configurable list of allowed mutators
  4. Runs your tests over them and provides a report in different formats

Talking about changes to the source code, we consider a specific, allowed list of mutators: there is no sense in changing the code in a literally random way and producing syntax or structural errors. But if you replace:

  • a plus with a minus in math expressions;
  • true with false in logical ones;
  • a reference to an object with null in a return statement;
  • etc.

such replacements let you probe the quality of your tests safely.
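
For illustration, here is a hand-written sketch of what a couple of such mutants could look like at the source level (the PriceCalculator class and its values are invented for this post; Pitest actually applies such changes to bytecode copies in memory, not to source files):

class PriceCalculator {

    // Original: adds a fixed fee to the price.
    int withFee(int price) {
        return price + 5;
    }

    // Mutant: "+" replaced with "-".
    // A test asserting that withFee(10) == 15 fails on this mutant, so the mutant is killed.
    int withFeeMutant(int price) {
        return price - 5;
    }

    // Original: returns a label.
    String label() {
        return "price";
    }

    // Mutant: the returned reference replaced with null.
    // A test that only checks that the call does not throw would let this mutant survive.
    String labelMutant() {
        return null;
    }
}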

The full list of mutators implemented in Pitest can be found here.

Practice! Let's write some code

Let's create a simple Maven project with only one class, tools/IntComparator.java.
Then we need to implement a single method that receives two int arguments, compares them and returns "first" if the first is greater and "second" otherwise:

package tools;

class IntComparator {

    String max(int a, int b) {
        if (a > b) {
            return "first";
        } else {
            return "second";
        }
    }
}

Write a test

We will use JUnit for testing our application so we need to add a dependency to pom.xml:

    <dependencies>
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter-engine</artifactId>
            <version>5.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

After that let's create a test.
Our testing scenarios are quite simple:
1) max(5, 3) -> "first"
2) max(3, 5) -> "second"

package tools;

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class IntComparatorTest {

    private final IntComparator intComparator = new IntComparator();

    @Test
    void testMax() {
        assertEquals("first", intComparator.max(5, 3));
        assertEquals("second", intComparator.max(3, 5));
    }
}

Now, if you run it you will see that the test passes successfully. Moreover, if you calculate code coverage, it will be 100%.

(Screenshot: coverage report showing 100% line coverage for IntComparator)

Does it mean that all cases are covered?
No.

Let's mutate!

Add the pitest dependency to the dependencies section of pom.xml:

<dependency>
    <groupId>org.pitest</groupId>
    <artifactId>pitest-parent</artifactId>
    <version>1.11.5</version>
    <type>pom</type>
</dependency>

and the plugin to the build section:

    <build>
        <plugins>
            <plugin>
                <groupId>org.pitest</groupId>
                <artifactId>pitest-maven</artifactId>
                <version>1.11.5</version>
                <dependencies>
                    <dependency>
                        <groupId>org.pitest</groupId>
                        <artifactId>pitest-junit5-plugin</artifactId>
                        <version>1.1.1</version>
                    </dependency>
                </dependencies>
                <configuration>
                    <targetClasses>
                        <param>tools.*</param>
                    </targetClasses>
                    <targetTests>
                        <param>tools.*</param>
                    </targetTests>
                </configuration>
            </plugin>
        </plugins>
    </build>

And after that, let's run our new plugin to start MT:

mvn clean verify pitest:mutationCoverage 

This command generates two reports: one in text format printed to the console and one as an .html file in your target folder.

================================================================================
- Mutators
================================================================================
> org.pitest.mutationtest.engine.gregor.mutators.ConditionalsBoundaryMutator
>> Generated 1 Killed 0 (0%)
> KILLED 0 SURVIVED 1 TIMED_OUT 0 NON_VIABLE 0 
> MEMORY_ERROR 0 NOT_STARTED 0 STARTED 0 RUN_ERROR 0 
> NO_COVERAGE 0 
--------------------------------------------------------------------------------
> org.pitest.mutationtest.engine.gregor.mutators.returns.EmptyObjectReturnValsMutator
>> Generated 2 Killed 2 (100%)
> KILLED 2 SURVIVED 0 TIMED_OUT 0 NON_VIABLE 0 
> MEMORY_ERROR 0 NOT_STARTED 0 STARTED 0 RUN_ERROR 0 
> NO_COVERAGE 0 
--------------------------------------------------------------------------------
> org.pitest.mutationtest.engine.gregor.mutators.NegateConditionalsMutator
>> Generated 1 Killed 1 (100%)
> KILLED 1 SURVIVED 0 TIMED_OUT 0 NON_VIABLE 0 
> MEMORY_ERROR 0 NOT_STARTED 0 STARTED 0 RUN_ERROR 0 
> NO_COVERAGE 0 
--------------------------------------------------------------------------------
================================================================================
- Timings
================================================================================
> pre-scan for mutations : < 1 second
> scan classpath : < 1 second
> coverage and dependency analysis : < 1 second
> build mutation tests : < 1 second
> run mutation analysis : < 1 second
--------------------------------------------------------------------------------
> Total  : < 1 second
--------------------------------------------------------------------------------
================================================================================
- Statistics
================================================================================
>> Line Coverage (for mutated classes only): 4/4 (100%)
>> Generated 4 mutations Killed 3 (75%)
>> Mutations with no coverage 0. Test strength 75%
>> Ran 4 tests (1 tests per mutation)
Enhanced functionality available at https://www.arcmutate.com/
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------

Here we can see the list of mutators that were used against our code: the mutants produced by NegateConditionalsMutator and EmptyObjectReturnValsMutator were successfully killed.
It means that Pitest negated the condition in line 6 of IntComparator.java (> became <=), and the first assertion in our test failed because the mutant returns "second" instead of "first".
For the other mutator, the code returns "" instead of the expected string, which was also detected and killed.
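
To make the report concrete, the two kinds of killed mutants are roughly equivalent to the following hand-written sketch (the method names are mine for illustration; Pitest mutates class-file copies in memory and never produces source like this):

class IntComparatorMutants {

    // NegateConditionalsMutator: the condition a > b is negated to a <= b.
    // Now max(5, 3) returns "second", so assertEquals("first", ...) fails and the mutant is killed.
    String maxWithNegatedCondition(int a, int b) {
        if (a <= b) {
            return "first";
        } else {
            return "second";
        }
    }

    // EmptyObjectReturnValsMutator: a returned String is replaced with "".
    // The mutant of the first branch is shown; the second return gets the same treatment.
    // Both assertions expect non-empty strings, so both of these mutants are killed.
    String maxWithEmptyReturn(int a, int b) {
        if (a > b) {
            return "";
        } else {
            return "second";
        }
    }
}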

(Screenshot: Pitest .html report)

In the .html report we can also find a list of active mutators.

As for the mutant generated by ConditionalsBoundaryMutator that survived: this mutator changes condition boundaries.
The condition was changed from a > b to a >= b. That mutant survived because the test does not check the expected result for the case when a == b.
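
In source terms, the surviving mutant is equivalent to something like this (again a hand-written sketch, not actual Pitest output):

// ConditionalsBoundaryMutator: a > b becomes a >= b.
// max(5, 3) and max(3, 5) behave exactly as before, so the existing assertions pass and the
// mutant survives. Only an input with a == b can tell the versions apart: for max(4, 4) the
// original returns "second" while this mutant returns "first".
String max(int a, int b) {
    if (a >= b) {
        return "first";
    } else {
        return "second";
    }
}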

Kill the mutant!

Simply add the missing assertion to the unit test:

@Test
void testMax() {
    assertEquals("first", intComparator.max(5, 3));
    assertEquals("second", intComparator.max(3, 5));
    assertEquals("second", intComparator.max(4, 4));
}

And re-run the pitest plugin:

(Screenshot: Pitest report after the fix, all mutants killed)

Great! MT clearly found the vulnerability and helped our test become better. So why hasn't the use of MT become an industry standard yet?

Real-life problems

False positives

I'm going to change the example above just a little to demonstrate a problem.
From now on, the tested method will return an int result instead of a String. At the same time, we will keep all the assertions in the test.

IntComparator:

int max(int a, int b) {
    if (a > b) {
        return a;
    } else {
        return b;
    }
}

IntComparatorTest:

@Test
void testMax() {
    assertEquals(5, intComparator.max(5, 3));
    assertEquals(5, intComparator.max(3, 5));
    assertEquals(4, intComparator.max(4, 4));
}

Check the results:

(Screenshot: Pitest report showing one surviving mutant)

What happened? All the cases were covered! But again we have a mutant that survived. It does not matter to the test whether a or b was returned, because both hold the same primitive value.
This is a typical example of a false positive: the tool demands a stricter test than you really need. Searching the internet, you will find that this is quite a popular complaint.
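
For illustration, the surviving mutant of the int version looks roughly like this (hand-written sketch):

// ConditionalsBoundaryMutator again: a > b becomes a >= b.
// For max(4, 4) the original returns b and the mutant returns a, but both are just the value 4,
// so no assertion on the return value can ever distinguish them. The mutant is behaviorally
// equivalent to the original code, and the report flags a weakness that no new assertion can fix.
int max(int a, int b) {
    if (a >= b) {
        return a;
    } else {
        return b;
    }
}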


Performance

Mutation Testing is not fast. This is not a secret and there is no need to have illusions about it. Pitest states directly on its landing page that it is fast, but immediately explains what "fast" means in this context.

(Screenshot: the "fast" note from the Pitest landing page)

By experimenting with different sets of mutators, using several threads, or relying on optimizations within the engine (for example, avoiding the creation of useless subsumed mutants), you can achieve a decent reduction in the time it takes to perform mutation testing. However, it will still be significant and will add minutes or tens of minutes to the testing process.

I also recommend this post about the real-life experience of MT and performance analysis in particular.

Recommended usage

In Dec 2016, Java Magazine published an article by Henry Coles, the author of Pitest, titled "Mutation Testing: Automate the Search for Imperfect Tests". In the article, he discusses the possibility of using MT on real and large projects and, to sum up the main idea in one sentence, it would be:

The only code you need to perform mutation testing on is code that you've just written or changed.

Then this idea was developed and described in the article "Don't let your code dry", which is now posted on the Pitest blog.

Speaking more about implementation in the context of Pitest, the author explains how incremental testing can be implemented in practice: by launching the plugin locally (which, by the way, integrates with Git and can limit itself to files that have been added or changed) or by running MT during pull request analysis.

Thus, it can be said that adding mutation testing to the entire project's CI pipeline is probably not the best idea. On the other hand, using an incremental approach will minimize the impact of the problem factors described above and make mutation testing a more interesting instrument.

Conclusion

Mutation Testing is not a silver bullet!
We use the word strong to describe tests that achieve high mutation coverage because they are able to detect and kill most mutants, indicating that they are effective at catching potential bugs in the code.

However, even if tests are strong, it does not mean that they are good. Will they pass if the implementation changes but the behavior stays the same? Do they smell? Are they quick, and do they avoid imposing unnecessary overhead?

Overall, while strong tests are a good indicator of test effectiveness, other factors should be considered to ensure that the tests are reliable and comprehensive.

My personal problem with regard to Mutation Testing turned out to be inflated expectations. The concept itself seems very exciting, but the more you dive into the details, the more you notice limitations. The main one, which is always on the surface, is that testing tools do not know the logic of your application, and no matter what variability or prediction methods they use, they only provide you with results for analysis. The final decision on the quality of tests is up to the developer or the team. And Mutation Testing will work effectively as long as you are willing to pay attention to its reports and process each case it identifies.

Would I use MT? For small projects based on algorithmic calculations, such as a core library for some product that requires high code quality - absolutely yes. For projects where quality requirements are not so critical - probably not.

In any case, I will closely monitor the evolution of this idea and in particular the Pitest project. It is already a mature product integrated with Git, Eclipse, IntelliJ, Sonar, etc. I am also inspired by the level of activity and support for the project from its author and community.

P.S.

Many are expecting the arrival of AI in development. Solutions that understand context and business logic will take analysis and testing tools to a new level. I think that Pitest is one of those projects that could unexpectedly evolve and become much more popular in the near future. I'm looking forward to it.
