Discussion on: The broken promise of static typing

View post

Replies for: I find a troubling back-of-the-envelope correlation between language popularity/adoption level and number of bugs found. I think some of this can b...

Popularity may screw the data, but feel free to compare just the "experienced enthusiasts" languages: Scala/Haskell/F#/Clojure/Erlang/Go.

Martin Hoffmann • Jun 6 '17 • Edited

This is however a well known bias in data analysis. There might be a hidden phenomenon that explain most of the correlation.

In case of geography for instance, you must always be careful of not redrawing a simple population map. Because high number of occurence often happen in highly populated place.

In your case repository with the most contribution are logically thoses that contain also the more bug reports. So by calculating your bug density by dividing against the number of repo what you might actually mesure is the number of active contributors.

You could check this by testing if charting the number of contributors produce a similar graph to your indicator.

And to overcome this you might divide number of bug by the number of active contributors to each project. Also you might need to filter a specific timespan because older project expectedly have more reported bug. So only number of issue and number of unique contributors from last year shall be taken in account.

By implementing these changes you could have a more robust indicator using the same source.

Dan Lebrero • Jun 6 '17

Interesting points. I will see if I can get that data.

Thanks for the idea!

Kris Nuttycombe • Jun 6 '17

Another thing you might look at along these lines is to subtract bug reports submitted by contributors to the project, so as to try to distinguish (if imperfectly) between bugs discovered by users and bugs logged by those who are developing the project. For example, in a Haskell project it may be considered a bug if an invalid state is representable given the type signature even if that bug is never encountered as a runtime error, whereas in a Clojure project this isn't even a concept. However, this sort of "bug" is unlikely to be reported by someone who's simply a consumer of a library, so maybe excluding contributors (perhaps over some threshold?) can help to filter out issues that may not affect end-users.