Discussion on: One Web Architect's Core Beliefs

Lovely to see you post here. As you know, it is inevitable that I shall disagree with one of your thoughts here. :-)


def multiply(a, b):
    return 4

def test_multiply():
    assert multiply(2, 2) == 4
    # 100% test coverage!

I'd love to have 100% automated check coverage, but I will never break my back trying to do it. It's lovely if your code falls into that form, but if it does, it probably means that you're just not doing the things that are hard to build automated checks around. That's great. Lucky you.

I think that Software Engineers, and programmers in general, are terrible about their approach to software quality. "100% test coverage" seems to be feeding that mentality rather than in opposition to it. Software Quality is about delivering software to the customer that does what they need, expect, and want. Automated checks form the middle third of that process at best - at worst, they're completely disconnected from it.

A 100% coverage target assumes that the only valuable testing is automated checks. Manual testing - and more broadly, anything that doesn't run through in a couple of minutes within a development environment or CI pipeline - is discounted. That's a shame, because manual test scripts, automated large-scale integration tests, and so on are all great too (and catch things unit tests can't), but it's hard to extract the Holy Metric from them.
It also puts an unhealthy focus on metrics: we fixate on the numbers, and not the result. If we focus too much on the metrics, we risk losing sight of the goal, and quality actually goes down.

I find automated checks written by the developer who also wrote the code fall regularly into two pitfalls:

They test the same assumptions that the developer had when writing the code.
They often don't check nearly as much as they cover, because given a metric, we get excited over the number, and not the outcome.

In particular, Library and Backend developers have a tendency to write APIs in order to make them easy to check, rather than easy to use without error.

In the past couple of weeks at work, I've found two cases where the code wasn't actually checked in any useful way.

In one case, the developer had incorporated the same error into both their test and the code it was testing. The code in question had 100% test coverage, and therefore passed CI, but would never have worked in reality. Luckily, it would have been caught when it stopped the Dev environment working. The fixed code (and test) had a new, more subtle bug in it that might have worked in Dev, Stage, and Prod. Or might have failed in one of them. We switched it out for a library.

In the second case, the developer had a check for a particular problem - and since I know you'll know this one, I can tell you it was that an XMPP client library was choking on milliseconds in timestamps in MAM, and the test ensured they weren't present. Only it turned out the test was wrong - it tested that it was a valid ISO 8601 timestamp, but passed whether or not it had milliseconds. It didn't, as it happens, but the fact the test was wrong went entirely unnoticed.
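To make that failure mode concrete, here is a rough sketch. It is not the real test or the real library code: the helper names `is_valid_timestamp` and `has_no_fraction`, and the sample timestamps, are made up for illustration. A validity-only check is satisfied by both forms of the timestamp, so it says nothing about milliseconds; a check that looks at the fractional part does.

```python
from datetime import datetime

def is_valid_timestamp(stamp: str) -> bool:
    """Weak check: is this a parseable ISO 8601 date-time at all?"""
    try:
        # fromisoformat() only accepts a trailing "Z" on newer Pythons,
        # so swap it for an explicit UTC offset first.
        datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False

def has_no_fraction(stamp: str) -> bool:
    """Stricter check: parseable, and no fractional seconds at all."""
    if not is_valid_timestamp(stamp):
        return False
    parsed = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    # ".000" parses to zero microseconds, so also look at the text itself.
    return parsed.microsecond == 0 and "." not in stamp

# The weak check is happy either way...
assert is_valid_timestamp("2021-02-03T10:15:30Z")
assert is_valid_timestamp("2021-02-03T10:15:30.123Z")

# ...only the stricter one tells the two apart.
assert has_no_fraction("2021-02-03T10:15:30Z")
assert not has_no_fraction("2021-02-03T10:15:30.123Z")
```

Both versions would show the same line coverage, which is rather the point.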

That latter bug was mine, and the reason why the code worked anyway was because, of course, I'd eyeballed the result and then tested it - manually - against the app. Manual testing is, and always will be, the gold standard of "Does it actually work?" - it's just lengthy and painstaking to do.

Ultimately, bugs creep into code at a reasonably constant rate by volume of code. If you have more test code, the statistics say that you'll have more bugs - we hope that the increased introspection of the overall code will mean fewer bugs, not more, but the bugs are just as capable of falling in the test code as in the code under test, after all. And who tests the automated check code? And how? (And, moreover, what coverage does it have?)

So am I about to argue that we shouldn't have automated checks at all? Far from it, of course. I expect automated checks to cover about 70%-90% of the codebase. Higher if it's possible, though I find that above 90% we're reaching into diminishing returns. YMMV, and if you can comfortably get it higher, do.

As an aside, my security labelling library is around 93%, with every line that's not covered examined (they're usually overly-defensive conditionals), and a set of test vectors that's had at least some independent verification. (And, more fun, the test suite is data-driven, so you can provide your own test vectors.) I'm totally happy with that. I could spend days getting the last 7%, but ... why?
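For anyone who hasn't met the idea, "data-driven" can be sketched roughly like this. It is not the labelling library's actual suite: the `check_vectors` helper, the JSON layout, and the toy `normalise` function standing in for the code under test are all assumptions made for the example.

```python
import json

def normalise(label: str) -> str:
    """Toy stand-in for the code under test (hypothetical)."""
    return " ".join(label.split()).upper()

# Hypothetical vector format: a JSON list of input/expected pairs.
BUILTIN_VECTORS = """
[
  {"input": "  top   secret ", "expected": "TOP SECRET"},
  {"input": "restricted", "expected": "RESTRICTED"}
]
"""

def check_vectors(vectors_json: str) -> list:
    """Run every vector and return the ones that failed."""
    failures = []
    for vector in json.loads(vectors_json):
        actual = normalise(vector["input"])
        if actual != vector["expected"]:
            failures.append((vector["input"], vector["expected"], actual))
    return failures

# The shipped vectors pass; anyone can feed in their own JSON the same way.
assert check_vectors(BUILTIN_VECTORS) == []
```

The checking logic is written once, and the vectors can come from someone who never saw the implementation, which helps with the "same assumptions" problem.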

Let's face it, the important metric isn't coverage, but checking, and we don't have convenient tooling for that. We just use coverage as a proxy.

So more important to me than coverage are two things:

Any bug that gets past the development stage should be - if at all possible - baked into an automated check.
A good automated check suite is, first and foremost, a developer's aid. It should be a really convenient way of getting a first pass over your code, a quick way of reproducing a bug, and of driving a deep part of your code through a debugging session easily.
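As a minimal sketch of the first rule, take the `multiply` example from the top of this comment and suppose someone reports that `multiply(2, 3)` returns 4. The report itself becomes a permanent check alongside the fix. The scenario and the regression test's name are made up for illustration.

```python
def multiply(a, b):
    # Fixed implementation; the earlier version always returned 4.
    return a * b

def test_multiply():
    # The original check. It passed before the fix too, which is
    # exactly why it never caught the bug.
    assert multiply(2, 2) == 4

def test_multiply_regression_2_times_3():
    # Reproduces the (hypothetical) bug report exactly, so the same
    # failure can't quietly come back.
    assert multiply(2, 3) == 6

test_multiply()
test_multiply_regression_2_times_3()
```

The same test doubles as the quick reproduction for the second rule: run it under a debugger and you land in the code that matters.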

These two rules will naturally drive coverage up. But that will happen without arbitrary rules. They'll also save you loads of time.

That all said, some software really can be checked exhaustively. Model verification and formal proofs are great if you can use them.

But when you can't, I truly believe that instead of chasing that last 7% of coverage, you'd be more efficient working through a manual test script.

Nathan Fritz • Feb 3 '21 • Edited

We've disagreed on this before. And to be fair, I'm not afraid to put coverage exemptions around conditionals that are a little over-defensive and configuration-specific lines. That said, I agree that many types of projects don't need 100% test coverage. Data-centered open source libraries and API services are most of what I write, and in those cases, I find it advantageous to have 100% coverage, at least on paper.

My games usually have little to no tests, as the desired behavior is an emergent property of several systems working together, and so it makes more sense to manually test. Even in the web world, most front-end UI rendering logic isn't worth testing, nor is it worth re-testing your APIs from a set of client tests.

In any case, with APIs and open source libraries, I find bugs by working toward 100% coverage, and I'm more confident to do refactoring later. It's also good marketing to legitimately have that 100% coverage badge on an open source library.

We can continue to disagree on this, but I think we understand each other's views pretty well. Hopefully, someday, we get to work together and butt heads on issues like this for code that we share.

Dave Cridland • Feb 3 '21

I wonder, though, whether 100% coverage - and coverage in general - is a measure of how easily the software could be checked automatically, rather than how well it is. I think the thing that winds me up so much about coverage as a metric is that distinction seems to be missed.