Five Reasons to Take Code Coverage With a Grain of Salt
scottshipp Oct 31 '17
Code coverage is important. I encourage anyone to measure it, and I think it's an easy and valuable practice for any software team to keep. I am also not one of those people who think that unit testing is pointless, or who mean integration testing when they say unit testing.
As a working software engineer, I love having high code coverage. My job is far more enjoyable in a codebase with high code coverage, because the amount of yak shaving I have to do during implementation goes down considerably.
But here's the thing about code coverage: I really have to make sure to take it with a grain of salt. Seeing a high number can almost be like having a couple shots of whiskey. Inhibitions are lost a little too easily. Discipline goes out the door a bit. Because, hey, the tests will catch me, right?
Wrong! Here are five reasons why code coverage can mislead even the best of us.
1. The type of code coverage used makes a huge difference in what a coverage number means.
I don't know about you, but everywhere I've worked, it's been a chore to get the code coverage somewhere decent, because I've almost always inherited a project from someone, or some team, that left few or no unit tests behind. In that case, the team usually agrees to start by measuring statement (or line) coverage.
Statement coverage is low value, though, because it only records whether a line ran, not whether every outcome of each decision was exercised. Some might argue that it is not a good measure at all, and I might agree, depending on the day. Branch and path coverage are both better at revealing places where key customer scenarios are left untested.
In short, code coverage should be a general indication of whether or not actual users have the functionality they need. Too often, high statement coverage is not matched to high user value. I recommend using branch coverage.
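To make the difference between statement and branch coverage concrete, here is a minimal sketch in plain Java. The `ShippingCalculator` class, its free-shipping rule, and the check methods are all invented for illustration, and the checks throw `AssertionError` directly instead of assuming any particular test framework.

```java
// Hypothetical example: the class name and the free-shipping rule are invented.
class ShippingCalculator {
    // Returns the shipping cost in cents; orders of 5000 cents or more ship free.
    static int shippingCents(int orderTotalCents) {
        int cost = 599;
        if (orderTotalCents >= 5000) {
            cost = 0;
        }
        return cost;
    }
}

class ShippingCalculatorChecks {
    // Executes every statement in shippingCents, so statement coverage is 100%,
    // but it only takes the "true" outcome of the if: branch coverage is 50%.
    static void freeShippingForLargeOrders() {
        if (ShippingCalculator.shippingCents(6000) != 0) {
            throw new AssertionError("large orders should ship free");
        }
    }

    // Takes the "false" outcome, which statement coverage never asked for.
    static void standardShippingForSmallOrders() {
        if (ShippingCalculator.shippingCents(1000) != 599) {
            throw new AssertionError("small orders should pay standard shipping");
        }
    }
}
```

With only the first check in place, a statement coverage report would show nothing missing, even though the scenario most customers hit (a small order) is never exercised. A branch coverage report would flag it.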
2. The type of application being tested can really alter what the number means.
What percentage should a software team aim for? 75%? 80%? 95%? While that's always a hard question to answer in its own right, I've learned it actually should vary depending on the application. Some applications are purely bridges between cooperating services. Others are tied closely to or even embedded in hardware. Still others are more about data than behavior.
For example, there are applications that are pretty much only a bunch of data classes and an interface to a data store. Much of the code is what I would call POJOs (because I'm currently a Java developer) or what might be called POCOs or POPOs or POOs in other languages.
Should data classes be covered just to meet a target coverage number? Proceed with caution! The options available are to cover them with low-value tests that then have to be maintained, to exclude them from code coverage entirely, or to leave them in the measurement uncovered and accept a lower number. I generally try to get things organized in a way that I can add some package exclusions to my coverage measurement tool and leave these out of the equation. But whichever option I choose, my code coverage number then doesn't tell the full story.
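As a rough illustration of the first option, this is what a low-value test over a data class tends to look like. `Customer` and its fields are hypothetical, and the check is written in plain Java rather than against a specific test framework.

```java
// Hypothetical data class of the kind described above.
class Customer {
    private String name;
    private String email;

    String getName() { return name; }
    void setName(String name) { this.name = name; }
    String getEmail() { return email; }
    void setEmail(String email) { this.email = email; }
}

class CustomerChecks {
    // Brings Customer to full coverage, yet only proves that field
    // assignment works. It still has to be updated whenever a field changes.
    static void gettersReturnWhatSettersStored() {
        Customer customer = new Customer();
        customer.setName("Ada");
        customer.setEmail("ada@example.com");
        if (!"Ada".equals(customer.getName())) {
            throw new AssertionError("name was not stored");
        }
        if (!"ada@example.com".equals(customer.getEmail())) {
            throw new AssertionError("email was not stored");
        }
    }
}
```

The coverage number goes up, but nothing a user cares about has been verified, which is why excluding such classes from measurement is often the more honest choice.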
3. Tests can execute lines of code but ultimately verify nothing.
A lot of the time, programmers learn to write unit tests by reading the examples given in the documentation for whatever testing or mocking framework they are using. A lot of that documentation is rightly oriented around how to use the framework features, and doesn't teach how to test. This leaves the door open for the common failure to actually test something.
Take, for example, the documentation for the popular JS testing framework Jasmine. Most of the guide gives tautological examples: tests that prove only what is written inside the test code itself. That's good for letting you quickly understand the available features of Jasmine, and bad for more novice developers who might mistakenly think that's a way to test. Soon, they're writing tests that verify nothing.
Even experienced developers occasionally lapse and write tests that verify nothing. And a code coverage number doesn't reveal that! So unless I actually remain disciplined and take the time to verify that my tests exercise the code and verify the result, my tests have high coverage but low value. On top of measuring code coverage, software teams should have careful code reviews that result in high-quality tests.
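The sketch below shows three checks against a hypothetical `Cart` class: one that covers the code without verifying anything, one that is tautological, and one that verifies the result. All names are invented for illustration, and plain Java is used instead of a test framework.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class under test.
class Cart {
    private final List<Integer> pricesCents = new ArrayList<>();

    void add(int priceCents) {
        pricesCents.add(priceCents);
    }

    int totalCents() {
        int total = 0;
        for (int price : pricesCents) {
            total += price;
        }
        return total;
    }
}

class CartChecks {
    // Executes every line of Cart, so coverage is 100%, but checks nothing.
    // It would still pass if totalCents() returned the wrong number.
    static void coversButVerifiesNothing() {
        Cart cart = new Cart();
        cart.add(250);
        cart.add(750);
        cart.totalCents(); // result is discarded
    }

    // Tautological: asserts only on values created inside the test itself.
    static void tautological() {
        int expected = 250 + 750;
        if (expected != 1000) {
            throw new AssertionError("arithmetic is broken");
        }
    }

    // Same coverage as the first check, but fails if the behavior is wrong.
    static void verifiesTheResult() {
        Cart cart = new Cart();
        cart.add(250);
        cart.add(750);
        if (cart.totalCents() != 1000) {
            throw new AssertionError("expected a total of 1000 cents");
        }
    }
}
```

A coverage report cannot tell the first and third checks apart. Only a reviewer reading the test code can.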
4. Code coverage doesn't indicate how well the application is designed.
Can I get good code coverage in a poorly designed application? Absolutely! Classes can be poorly designed but still testable. Making code testable usually does improve its design, but there are also pitfalls galore. For example, one way to make a class testable is to make class members public to have a way to set test doubles. Hmmm...not the best idea.
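Here is a small sketch of that pitfall, next to one common alternative (constructor injection). The `TimeSource` interface and both session classes are hypothetical. Both versions can reach the same coverage, so the number says nothing about which design is better.

```java
// Hypothetical collaborator, invented for illustration.
interface TimeSource {
    long nowMillis();
}

// Testable, but the public mutable field lets any caller swap the
// time source at any moment, not just tests.
class ExposedSession {
    public TimeSource timeSource = System::currentTimeMillis;

    boolean isExpired(long startedAtMillis, long ttlMillis) {
        return timeSource.nowMillis() - startedAtMillis >= ttlMillis;
    }
}

// Equally testable: the dependency is supplied once and then fixed.
class InjectedSession {
    private final TimeSource timeSource;

    InjectedSession(TimeSource timeSource) {
        this.timeSource = timeSource;
    }

    boolean isExpired(long startedAtMillis, long ttlMillis) {
        return timeSource.nowMillis() - startedAtMillis >= ttlMillis;
    }
}

class SessionChecks {
    // Both designs are fully covered by the same style of check.
    static void bothDesignsAreCovered() {
        TimeSource fixed = () -> 10_000L; // test double with a fixed time

        ExposedSession exposed = new ExposedSession();
        exposed.timeSource = fixed; // relies on the public field
        if (!exposed.isExpired(0L, 5_000L)) {
            throw new AssertionError("exposed session should be expired");
        }

        InjectedSession injected = new InjectedSession(fixed);
        if (!injected.isExpired(0L, 5_000L)) {
            throw new AssertionError("injected session should be expired");
        }
    }
}
```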
Unfortunately, I think a myth has been passed around that testable code is well-designed code. My guess is that it is a case of the affirming-the-consequent fallacy. For many years now, a lot of authors and speakers have rightly pointed out that when code is well-designed, it is testable. In daily practice, I think developers swap these two around and begin thinking that if code is testable, it must be well-designed. Beware of the fallacy!
5. Code coverage doesn't mean the tests are maintainable.
I don't know what anyone else has seen, but I have watched a couple of pretty good codebases slowly lose code coverage due to unmaintainable tests. The sad truth about test code is that it often gets second-class citizen status, and programmers tend to be less disciplined about what goes on in there. Soon the tests are unreadable, flaky, too hard to extend, and worse. Without the discipline to bring them back to a good state, a death spiral is soon underway.
It's important to remember that tests can be helpful, but only as long as they ultimately contribute to the team's velocity. A test suite that sucks up the team's time and energy instead might as well just get marked "ignore." So take care to maintain discipline even when code coverage is high, because the avalanche might be coming.
Testing is a lot of things, but easy isn't one of them. Go ahead and measure code coverage. Go ahead and shoot for a target coverage number. Go ahead, even put a gate on pull requests that requires code coverage to stay at or above your target. But watch out for the feeling of overconfidence that this might provide. Code coverage isn't everything, and a lot can still go wrong. Good discipline, and good practices like code reviews, are still more valuable than code coverage.