How do you measure and discuss the less measurable things about testing code?


Test coverage metrics have a hard time dealing with the infinite complexity of edge cases in code. So even if we have 100% coverage, it's still really easy to see how we don't and can't test every possible path. So how do you think about these circumstances, how does your team talk about these issues, and is there anything you measure in addition to test coverage that helps account for edge paths/cases?

And if anyone has any thoughts about how this conversation differs between strongly typed and weakly typed languages, I'd love to hear that described a bit.

 

I'm a huge fan of full CI/CD, so my key success metrics are deployment frequency and bad deployment rate (tracking bugs making it to production). I think these are better metrics to track than code coverage, because they get to the heart of what testing is--risk management.

That said, I don't ignore coverage. But I consider it a highly imprecise metric. 80% vs 90% doesn't tell me much. 80% vs 10% means something.
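To make the two metrics above concrete, here is a minimal sketch of how they could be computed from a deployment log. The `Deployment` record, its fields, and the sample data are hypothetical and only for illustration; a real pipeline would pull this from its CI/CD and incident tracking.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    day: int               # hypothetical: day number within the observation window
    caused_incident: bool  # hypothetical: True if a production bug was traced to this deploy


def deployment_frequency(deploys, window_days):
    """Average number of deployments per day over the window."""
    return len(deploys) / window_days


def bad_deployment_rate(deploys):
    """Fraction of deployments that let a bug reach production."""
    if not deploys:
        return 0.0
    return sum(d.caused_incident for d in deploys) / len(deploys)


# Made-up sample: four deployments in a ten-day window, one of them bad.
deploys = [
    Deployment(day=1, caused_incident=False),
    Deployment(day=3, caused_incident=True),
    Deployment(day=4, caused_incident=False),
    Deployment(day=9, caused_incident=False),
]

print(deployment_frequency(deploys, window_days=10))  # 0.4
print(bad_deployment_rate(deploys))                   # 0.25
```

With only a handful of deployments the rate swings a lot from one bad deploy, so the trend over time probably matters more than any single value.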

 

I like this thought. Would these ideas apply much to a small team that doesn't really have enough overall activity to develop "rates" yet? Would love to take a small step in this direction for the future.

 

I apply this with 2-pizza teams, so it doesn't have to be big teams. But if you're talking about 2-3 people, I prefer agile's "people over process" and just poll the group, asking how they'd rate quality, what's working, where the gaps are, etc.

I am with James.

I just published "Error budget: Google's solution for innovating at a sustainable pace", which explains how to know whether you are going too fast.

A couple of tools that you may also want to take into consideration:

 

I think a better approach is to look at feature coverage: how many of the features are actually tested. That provides better customer-facing metrics than code coverage.

 

Would you "track" feature coverage in any way, or is it more an approach for talking about testing with the team in general?

 

This is an approach for talking about testing with the team, but one which relates more to today's framework-heavy development approach with its fast user feedback cycles.

It is a different metric to track, and I have no idea how to do it yet, but I am actively looking for an opportunity to try it with a mature team that has worked before.

Yeah, I like the idea too. It might also have a nice byproduct of forcing you to name features and assign value to them. Testing in general forces you to think through the value and arrangement of code, but features kind of get lost and messy, and we might support features not delivering value, but we don't really draw the line.

And features could also be proxied by user stories, which have an element of value, and acceptance criteria as part of their documentation.

This can also be tied to metrics on how features are used, to get better insight.

 

I know answering a question with a question is bad form, but I've found the world of testing has a lot of different meanings for the same word. So the first step I take is to level-set to make sure we're all on the same page.

What do you mean by "test coverage"? Do you mean unit testing or automated functional checks or "manual" testing? If manual testing, do you mean using step-by-step test cases or exploratory testing based on user stories?

What does "Test Coverage" include? What actionable feedback does it give as a metric? Are you measuring test coverage just because you always have or does it have actual business value?

The testing world is shifting and evolving along with the development world. What was once standard practice is found to be no longer valuable when you take a critical look at it. Test coverage is one of those metrics that seem useful but are usually a waste of time.

(edit - word clean up)

 

Testing is hard because there are not many books or articles that show you how to properly test things. Especially for non-trivial stuff like concurrency or validation or not producing memory leaks or not having side effects.

 

While there aren't as many books on testing as there are on development, there is quite a lot of information out there. There is value in developers learning the core of what testing is as a skill. There's more to doing it well than most people think.

 

I mainly measure test coverage in terms of use-case coverage. Are we covering the primary app features, and their combinations with other primary features?

For algorithmic modules I try to cover logical edge-cases but rely heavily on code review to ensure correctness.

I place no value on actual code coverage. It's a pointless number to chase and in no way relates to the use-case coverage. In terms of use-case coverage there is no actual upper limit to coverage, unless you have a trivial app. Orthogonal features have an infinite number of ways to be combined. It's just foolish to think test cases can somehow cover all of them.
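As a rough back-of-the-envelope illustration of the combination problem: even under the simplifying assumption that each feature is an independent switch with a fixed number of settings, the count of distinct configurations is the product of those settings. That number is finite, but it grows exponentially, and it ignores ordering, timing and input data, which only push it further. The function name below is made up for this sketch.

```python
import math


def configuration_count(options_per_feature):
    """Number of distinct configurations when every feature is chosen independently."""
    return math.prod(options_per_feature)


# Ten independent on/off features.
print(configuration_count([2] * 10))  # 1024

# Thirty independent on/off features.
print(configuration_count([2] * 30))  # 1073741824
```

At one second per test case, the thirty-feature example alone would take roughly 34 years to run exhaustively.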

I wait for complex edge cases to arrive and be reported by users before worrying about them. It's a game of priority. I write good code and hope it's generally correct. Realistic priorities don't allow me to ensure it works 100%.

 

This is difficult to answer without knowing what kind of problems you're trying to avoid or how the code (product?) is being modelled.

Can you shed any light on that?

 

Yes, I think part of the problem is that one has to get down to specifics in order to address the topics the OP wants to know about. Unit testing in Java is very different from unit testing browser-side JavaScript, to pick an extreme example.

 

Test coverage is a good metric if you don't have any other metrics for code quality. As other metrics are developed, I try to let code coverage slip to the background.

Of those "other metrics" I primarily rely on static analysis feedback either from CI or from a service like Bugsnag. Static analysis outs conditions like unreachable code, missing-or-extra-parameters — unit tests won't help with finding issues like that.

 

"Test Coverage" actually has several meanings, and they are quite different: see en.wikipedia.org/wiki/Code_coverage.

My IDE reports lines covered, which is a nearly meaningless measure. I can write a "test" that invokes a large amount of code and not assert anything and my coverage will be extensive (because so many lines of code were executed).

The folks working in safety-critical circles have tools that test for the different kinds of coverage mentioned in the Wikipedia article. I'm in web dev so I've never taken things to that level. But if your project needs that kind of rigor, the tools do exist.

Steve McConnell has some nice stuff about testing in Code Complete (both 1 and 2) that shows you how to write tests that systematically exercise more paths through your code. It's probably overkill in many circumstances but it shows how a function with '100% coverage' can be far from 100% tested.
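Here is a small sketch of that gap. It is my own toy example, not one taken from Code Complete. The two tests below execute every line and take every branch in both directions, yet one of the four paths through the function is never run, and that path crashes.

```python
def scale(x, reset, invert):
    factor = 2
    if reset:
        factor = 0
    if invert:
        return x / factor
    return x * factor


def test_reset():
    # Covers the "reset" branch and the multiply return.
    assert scale(4, reset=True, invert=False) == 0


def test_invert():
    # Covers the "invert" branch and the divide return.
    assert scale(4, reset=False, invert=True) == 2.0


# Both tests pass and every line has been executed.
test_reset()
test_invert()

# The untested path: reset and invert together divide by zero.
# scale(4, reset=True, invert=True)  # raises ZeroDivisionError
```

Two independent conditions give four paths, and line or branch coverage can be satisfied with two of them.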

Lots of edge cases don't actually matter or aren't reachable from within your program because of some validation or restriction coming from above the code in question.

For example, I can imagine writing a function that does not behave correctly if it is passed a null. But when you look at the actual code, the place that calls it makes it impossible to call that function with null.

So in theory, you could argue that you have a defect. And you could probably write a unit test that shows the defect. But in practice, it doesn't matter and you're wasting valuable time working on this valid, yet irrelevant defect.
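A hypothetical sketch of that situation, with made-up names: `normalize_name` would blow up if it were handed `None`, but its only caller rejects missing names first, so the failing input never reaches it.

```python
def normalize_name(name):
    # Would raise AttributeError if name were None.
    return name.strip().lower()


def handle_signup(form):
    # The caller validates first, so normalize_name never sees None.
    name = form.get("name")
    if not name:
        raise ValueError("name is required")
    return normalize_name(name)


print(handle_signup({"name": "  Ada "}))  # ada
```

A unit test calling `normalize_name(None)` directly would demonstrate the defect, even though no request coming through `handle_signup` can trigger it.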

You could argue that you don't know how that function will be called (or how the calling code could be modified) in the future. And that's true, but of all the priorities calling for my attention, is this the best place to invest my time? Every team has to decide for itself what the priorities are for its project.

How does my team handle these issues?

  • we focus on writing simple, readable code, unit testing (work in progress), and performing rigorous code reviews
  • we track defects caught in production instead of coverage because we think it has a higher signal-to-noise ratio
 

On this topic, I recently read an interesting article on how trying to achieve 100% coverage can actually become counter-productive: labs.ig.com/code-coverage-100-perc...

 

If I can clarify anything about the question, let me know. I mostly just want to discuss code coverage a bit. Feel free to go on a tangent.

Ben Halpern
A Canadian software developer who thinks he’s funny.