DEV Community

Jack Steel

Mutation Testing - A .NET Developer's Guide

Introduction

Mutation Tests allow us to evaluate and quantify how good our Unit Tests are. Your true 'test effectiveness score' could be thought of as Code Coverage Percentage * Mutation Score Percentage: you can have 100% code coverage, but if your mutation score is 0% then all of those fancy tests are most likely worthless. In other words, Mutation Tests are tests for your tests.

Let's look at an example to illustrate how this works in practice.

Consider the following code

public int Add(int a, int b){
    return a + b;
}

and this Unit Test

[Theory]
[InlineData(0,0,0)]
[InlineData(2,2,4)]
public void CanAdd(int a, int b, int expectedResult){
    var result = Add(a, b);
    Assert.Equal(expectedResult, result);
}

Some of you may have already spotted the problem with this test (gold star for you!).
Let's run the test:

0 + 0 = 0 PASS
2 + 2 = 4 PASS

Awesome, looks good to me!

Now imagine someone messes up a find-and-replace, or somehow makes the mistake of changing the Add method's code to this:

public int Add(int a, int b){
    return a * b;
}

At first glance, you might assume that this is an obvious mistake that would easily be caught by unit tests failing and quickly fixed by the developer.
However, if we run the unit tests again:

0 * 0 = 0 PASS
2 * 2 = 4 PASS

All our tests are still passing, but the behaviour of the system has fundamentally changed and in this particular case is definitely broken.

This is exactly what Mutation Tests do: they make small changes to the source code (each changed version is called a Mutant) and then run our Unit Tests against the changed code. If the Mutation causes a test to fail, then we've successfully Killed that Mutant (this is good); however, if all our tests still pass, then the Mutant has escaped and Survived (this is bad). The above example is somewhat unrealistic and trivial, but I hope you can see how this concept extends to much more complex systems.
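To make the Killed/Survived idea concrete, here is a minimal, self-contained C# sketch that mimics by hand what a mutation testing tool automates. It doesn't use Stryker or xUnit: `MutationDemo`, `AddMutant` and `IsKilled` are hypothetical names, the single mutant is written out manually, and each test case is just an (a, b, expected) tuple. It shows that the two original test cases let the `*` mutant survive, while adding a case such as (2, 3, 5) kills it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hand-rolled illustration of what a mutation testing tool automates.
// Stryker generates mutants and runs your real test suite for you; here the
// single mutant and the test cases are written out manually.
public static class MutationDemo
{
    // The original implementation.
    public static int Add(int a, int b) => a + b;

    // The mutant: + swapped for *.
    public static int AddMutant(int a, int b) => a * b;

    // A mutant is Killed if at least one test case fails against it;
    // if every case still passes, it Survived.
    public static bool IsKilled(
        Func<int, int, int> implementation,
        IEnumerable<(int A, int B, int Expected)> testCases) =>
        testCases.Any(c => implementation(c.A, c.B) != c.Expected);

    public static void Main()
    {
        var originalCases = new[] { (0, 0, 0), (2, 2, 4) };
        var improvedCases = new[] { (0, 0, 0), (2, 2, 4), (2, 3, 5) };

        // False: 0 * 0 == 0 and 2 * 2 == 4, so the mutant survives.
        Console.WriteLine($"Original cases kill the mutant: {IsKilled(AddMutant, originalCases)}");

        // True: 2 * 3 == 6, not 5, so the mutant is killed.
        Console.WriteLine($"Improved cases kill the mutant: {IsKilled(AddMutant, improvedCases)}");
    }
}
```

In the real test suite, the equivalent fix is simply one more `[InlineData(2,3,5)]` line on the `CanAdd` theory.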

Enter Stryker

There are many different tools for doing mutation testing, but Stryker Mutator is the one most commonly used for .NET.
You can read more about it here
and try out an interactive example similar to the scenario described above here.

Stryker has many mutation types (you can see a list of them all here). They range from simple operator swaps, like switching + for * in the example above, all the way up to removing entire blocks of code or altering Regular Expression syntax.
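Operator swaps are rarely as blatant as + to * in real code. A more typical survivor is a boundary change such as `>=` becoming `>`. The sketch below is illustrative only: the `IsAdult` method and the mutant names are hypothetical, and the mutants are written by hand rather than generated by Stryker. It shows that tests at "typical" values kill a negated-condition mutant but miss the boundary mutant, until a test at the boundary itself is added.

```csharp
using System;
using System.Linq;

// Hypothetical method and hand-written mutants, for illustration only.
public static class BoundaryMutantDemo
{
    // Original: 18 or over counts as an adult.
    public static bool IsAdult(int age) => age >= 18;

    // Equality-operator mutant: >= swapped for >.
    public static bool BoundaryMutant(int age) => age > 18;

    // Negated-condition mutant.
    public static bool NegatedMutant(int age) => !(age >= 18);

    // Assumes each test asserts the original (correct) result for its input,
    // so a mutant is killed when it disagrees with IsAdult on any tested age.
    public static bool IsKilled(Func<int, bool> mutant, int[] testedAges) =>
        testedAges.Any(age => mutant(age) != IsAdult(age));

    public static void Main()
    {
        int[] typicalAges = { 10, 30 };
        int[] withBoundary = { 10, 30, 18 };

        Console.WriteLine(IsKilled(NegatedMutant, typicalAges));   // True
        Console.WriteLine(IsKilled(BoundaryMutant, typicalAges));  // False: it survives
        Console.WriteLine(IsKilled(BoundaryMutant, withBoundary)); // True: 18 > 18 is false
    }
}
```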

Making It Useful

Mutation Tests are, by their very nature, slow: every mutant means running (at least part of) your test suite again. For small libraries this probably isn't a big deal: adding 30 seconds to your pipeline is no problem (yes, I'm assuming you've got a test pipeline). However, if we want this to scale to enterprise software (and we do), you're gonna have a lot more than 30 seconds of run time for mutation tests.

The largest service I had access to run this against generates about 3,000 mutations and, when running in a pipeline, takes about 4 hours to complete. If you're doing multiple releases per day then adding even 30 minutes to a build pipeline is a major problem, not to mention tying up one of your build agents for a significant amount of time. You'll quickly get complaints from other devs that their changes are being delayed by your multiple-hour pipelines. Our target for pipeline duration is about 15 minutes: a change should take no longer than that to go from being merged to running in Production, so clearly we can't run mutation tests as a quality gate on deployments.

If we can't put this in as a deployment quality gate what are our options?

Option 1 - Run it Locally

We can make it a dev-run process: each dev should run mutation tests locally when they're happy with their change and ready to merge, and we hope that they pay attention to the results if they introduce code that drops the mutation score off a cliff.

This has the obvious downside of relying on people to follow the process (in my experience, this would likely be forgotten after a week or two) and now we're back to square one.

This is made worse by mutation tests being slow: no one wants to have to remember to run another tool locally, especially if that tool can pin their CPU for an hour.
Relevant XKCD
Although I do like the idea of this excuse, I don't think it's worth a return to extremely long compilation jobs (sorry Rust™ devs).

Another problem with this approach is observability. Unless we add another step to the process that dumps the report somewhere shared for historical stats, we lose all ability to build pretty graphs, and isn't that half the point??

Option 2 - A new Pipeline

No one ever said we had to have one test pipeline per service. If we split the mutation tests into their own pipeline, one that just runs mutation tests and pipes the results off to other services for analysis, then that might be the best we can do until someone comes up with some sort of quantum-everything-at-once-mutation-tests that makes them Blazingly Fast™.

We still can't use them as a quality gate on deployments, but we can use them as a quality alert: if the mutation score takes a nosedive over a given timeframe, we can alert the relevant team that they need to take a look at it.
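What counts as "taking a nosedive" is up to you. As one possible rule, here's a C# sketch that alerts when the latest score is more than a given number of points below the best score recorded within a recent window. Everything here (`MutationScoreAlert`, `ScoreSample`, the 28-day window and the 10-point drop) is a hypothetical choice, not something Stryker provides; it assumes your scheduled pipeline stores one score per run somewhere you can read back.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical alerting rule; none of these names come from Stryker.
public static class MutationScoreAlert
{
    public record ScoreSample(DateTime RunAt, double Score);

    // Returns true when the latest score is more than maxDropPoints below the
    // best score recorded in the window leading up to it.
    public static bool ShouldAlert(IReadOnlyList<ScoreSample> history, TimeSpan window, double maxDropPoints)
    {
        if (history.Count < 2) return false; // nothing to compare against yet

        ScoreSample latest = history.MaxBy(s => s.RunAt)!;
        var earlier = history
            .Where(s => s.RunAt < latest.RunAt && s.RunAt >= latest.RunAt - window)
            .ToList();
        if (earlier.Count == 0) return false;

        double best = earlier.Max(s => s.Score);
        return best - latest.Score > maxDropPoints;
    }

    public static void Main()
    {
        // Three weekly Sunday-night runs (made-up scores).
        var history = new List<ScoreSample>
        {
            new(new DateTime(2024, 1, 7), 72),
            new(new DateTime(2024, 1, 14), 70),
            new(new DateTime(2024, 1, 21), 55),
        };

        // Best in the preceding 4 weeks is 72 and the latest is 55: a 17-point drop.
        Console.WriteLine(ShouldAlert(history, TimeSpan.FromDays(28), maxDropPoints: 10)); // True
    }
}
```

Comparing against the best recent score (rather than only the previous run) means a slow slide over several weeks still triggers the alert.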

We can run this new pipeline on any schedule that makes sense for your application. If you're making changes frequently, maybe you want to run it nightly; if your services can go a while without much active development, then weekly or even less often may be more appropriate. We've gone with weekly on Sunday nights for now, though that may change in the future as we see how the stats change over time. This way devs get to log in on Monday morning and be greeted with some (hopefully green) stats and pretty graphs.

If any of this sounds interesting and you want to give it a go, take a look at the sections below for Setting Up Stryker For The First Time or Using Stryker.

Setting Up Stryker For The First Time

This section covers setting up Stryker for the first time on a new or existing project. You should only need to do this once per project; after that it should Just Work™.

Note: All of the commands in this guide are run from your solution's root directory (the same directory your .sln file is in).

We're going to install Stryker as a local tool in the project's tool manifest, which ensures that everyone else can easily install and update Stryker on that project. First, create the manifest:

> dotnet new tool-manifest

Great, that should have created a dotnet-tools.json file, most likely in the .config directory.
Check this file into source control.

We can now actually install Stryker locally with the following command:

> dotnet tool install dotnet-stryker

You should get a success message about Stryker being installed and added to the tool manifest.

Finally, we want to configure Stryker to run correctly on our project. In the same directory as your .sln file, add a new file named stryker-config.json. Then paste in the content below, and I'll walk you through what needs to be changed and what you can configure to your preference.
Check this file into source control.

{
  "stryker-config": {
    "solution": "YourSolution.sln",
    "target-framework": "net6.0",
    "mutation-level": "Complete",
    "reporters": [ "html", "progress", "markdown" ],
    "additional-timeout": 5000,
    "thresholds": {
      "high": 70,
      "low": 30,
      "break": 1
    },
    "coverage-analysis": "perTestInIsolation"
  }
}

Configuration Explanation

Solution

This MUST be the exact name of your .sln file. It must not include any path separators (/) or parent-directory references (..): it is not the file path to your solution, just the file name.

Target-Framework

If your project targets multiple frameworks and this option is not set, Stryker will pick a random build target from your options each time it runs.

Mutation-Level

This defines how much mutation you want Stryker to do. For the best results, this should be set to Complete or Advanced when committing to source control. However, you may want to lower it to Basic or Standard when running locally to save yourself time (as long as the mutation you're investigating is included at those levels). See here for more details on the possible options.

Reporters

This array defines the report formats you want to generate. You probably want html for user-friendliness and progress for visibility while it runs, and then maybe json if you're doing anything fancy with the output, like we are.

Additional Timeout

Some mutations can cause infinite loops, so Stryker cancels any test run that takes too long. The timeout is initialTestTime + additionalTimeout (in milliseconds); if you run into problems with tests whose running time varies significantly, you may want to increase this setting.

Thresholds

This defines our expectations for the mutation score on this project, much like code coverage thresholds you can set high, low, and break. If the mutation score is below the break threshold then Stryker will exit with a non-zero exit code and fail the pipeline.
They also define the boundaries used for colour coding the mutation report.
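As a rough sketch of how the three thresholds partition a score, assuming the values from the example config above (high 70, low 30, break 1). The band names are made up for illustration, and the exact boundary handling (whether a score equal to a threshold counts as above or below it) is an assumption, so check the Stryker.NET documentation if it matters to you.

```csharp
using System;

// Illustrative only: band names are invented, and whether a score exactly on
// a threshold falls above or below it is an assumption.
public static class ThresholdDemo
{
    public enum Band { AboveHigh, BetweenLowAndHigh, BelowLow }

    public static Band Classify(double score, double high, double low) =>
        score >= high ? Band.AboveHigh
        : score >= low ? Band.BetweenLowAndHigh
        : Band.BelowLow;

    // Below "break", Stryker exits with a non-zero code and fails the pipeline.
    public static bool BreaksBuild(double score, double breakThreshold) => score < breakThreshold;

    public static void Main()
    {
        // Thresholds from the example config: high 70, low 30, break 1.
        foreach (double score in new[] { 85.0, 50.0, 20.0, 0.0 })
        {
            Console.WriteLine($"{score}%: {Classify(score, 70, 30)}, fails pipeline: {BreaksBuild(score, 1)}");
        }
    }
}
```

Note how low the example break threshold is: only a score of essentially zero fails the pipeline, while high and low mostly drive the colour coding.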

Coverage-Analysis

You probably want to use perTestInIsolation for most cases.
You can read the details of what each setting does here.
perTestInIsolation gives accurate per-test coverage information, so each mutant is only run against the tests that cover it, at the expense of a longer start-up analysis time.

Using Stryker

Once Stryker has been set up and your pipeline and configuration are merged to the main branch, you (and any other devs) can use it to your heart's content.

Locally

If you're cloning a project that includes mutation test configuration for the first time, you'll need to run the following command to install Stryker locally for that project:

> dotnet tool restore

Then, to run Stryker, make sure your terminal's working directory is the same directory your .sln file is in (see the note above) and run the command:

> dotnet stryker

Easy as that: you should soon see logs of Stryker analysing your project.

If you're trying to diagnose a configuration issue or a problem getting Stryker working, I recommend running with trace log verbosity:

> dotnet stryker --verbosity trace --log-to-file

Gotchas and Good-to-Knows

Mutant States and Scores

There are several states a mutant can end up in. However, for the mutation score calculation, only a few are relevant.
The mutation score formula is Score = DetectedMutants / (DetectedMutants + UndetectedMutants),
where DetectedMutants is the number of Mutants that were Killed or Timed Out and UndetectedMutants is the number of Mutants that Survived or had No Coverage.
This means that No Coverage mutants lower the overall mutation score: if you have excluded any tests from being run during Mutation Testing, then any code they cover can still be mutated and produce a No Coverage mutant.
If you have a lot of these types of tests and can't cover the code with Unit Tests that can be run in a mutation context, then keep this in mind when evaluating scores. Look more closely at the Survived, Killed, and No Coverage counts in the report rather than just the overall percentage score.
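A quick C# sketch of the formula above, using made-up counts. `ScoreOnCoveredCode` is an extra variant (my own illustration, not a Stryker API) that leaves No Coverage mutants out of the denominator; comparing the two numbers shows how much of the gap comes from untested code rather than weak tests.

```csharp
using System;

public static class MutationScore
{
    // Score = Detected / (Detected + Undetected), as a percentage.
    // Detected = Killed + Timed Out; Undetected = Survived + No Coverage.
    public static double Score(int killed, int timedOut, int survived, int noCoverage)
    {
        int detected = killed + timedOut;
        int undetected = survived + noCoverage;
        int total = detected + undetected;
        return total == 0 ? double.NaN : 100.0 * detected / total; // NaN: no mutants, score undefined
    }

    // Illustrative variant that ignores No Coverage mutants, so it only judges
    // code your tests actually execute.
    public static double ScoreOnCoveredCode(int killed, int timedOut, int survived) =>
        Score(killed, timedOut, survived, noCoverage: 0);

    public static void Main()
    {
        // Made-up counts: 60 Killed, 5 Timed Out, 15 Survived, 20 No Coverage.
        Console.WriteLine(Score(60, 5, 15, 20));         // value 65: 65 of 100 mutants detected
        Console.WriteLine(ScoreOnCoveredCode(60, 5, 15)); // value 81.25: 65 of 80 covered mutants detected
    }
}
```

Here the 20 No Coverage mutants account for most of the difference between 65% and 81.25%, which is exactly the situation where the raw counts tell you more than the headline percentage.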

Failures in CI

Sometimes your pipeline run may fail due to an error along the lines of "System.Attribute is not defined". This is an ongoing (and extremely annoying) bug with Stryker or one of its dependencies (https://github.com/stryker-mutator/stryker-net/issues/931) with an unknown root cause. Some of these options may help you get the pipeline working again:

  • Set disable-mix-mutants to true
  • Turn on trace log verbosity in the config file, check which files are logged in relation to this error, then ignore mutations in those files using the mutate configuration option

If none of those help, the error may be intermittent enough that you can just re-run the pipeline a couple of times. If the pipeline consistently fails, check which Stryker version is being used: this problem has been observed with version 3.10.0, and hopefully it will be fixed in later versions (which may be released by the time you read this).

In Closing

Mutation tests are extremely helpful for ensuring you maintain a high quality of tests over time. Requirements change, old tests get forgotten about, people leave and the context of features can be lost. It is easier than you'd expect to end up with a collection of tests that fly under the radar, always passing but no longer actually testing what they should. In the best case, this adds some unnecessary computation to your test runs; in the worst case, these false positives give you false confidence in your changes.

Mutation testing can help catch bad tests. They will probably never be fast enough to put in as a quality gate on any service that you want to release hyper-frequently but they provide a useful metric to objectively evaluate the quality of your tests. Often we accept sub-par code in tests when reviewing code changes because "they're just tests"; mutation testing can help you find out if any actually problematic code has made it through review because of this more lax approach.
