Discussion on: The broken promise of static typing

Great topic! I think your data analysis is fatally flawed, though. The only way to really figure this out is to give similarly experienced programmers a task and see how many errors the solutions in statically typed languages contain versus the solutions in dynamically typed languages. Unfortunately, I'm not aware of any research on that exact topic.

I think Uncle Bob's oversimplified things. On the web, where speed is important and bugs usually aren't very costly, dynamic languages will win.

But in other domains where bugs can be expensive or cost lives (avionics, nuclear power plants, pacemakers, etc.), we might want to use languages and tools that help us ensure there are no bugs (or essentially no bugs). 100% test coverage of a dynamic language that Uncle Bob's talking about doesn't mean there are no bugs. And it certainly doesn't mean you've covered every execution path of the code.
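
To make the coverage point concrete, here is a minimal Python sketch. The function and its tests are hypothetical, not taken from the article. The two tests execute every line and both outcomes of each branch, yet one of the four paths through the function is never run, and that is the one that fails.

```python
def scale(value, halve, use_zero):
    """Hypothetical function with two independent branches (four paths)."""
    divisor = 2
    if halve:
        divisor = 4
    if use_zero:
        divisor -= 2  # harmless when halve is True, fatal when it is False
    return value / divisor


# Together these two tests execute every line and take every branch
# both ways, so a coverage tool would report 100%.
assert scale(8, halve=True, use_zero=True) == 4.0    # divisor: 4 - 2 = 2
assert scale(8, halve=False, use_zero=False) == 4.0  # divisor: 2


def untested_path_fails():
    """Exercise the one path the tests above never reach."""
    try:
        scale(8, halve=False, use_zero=True)  # divisor: 2 - 2 = 0
    except ZeroDivisionError:
        return True
    return False
```

Calling `untested_path_fails()` returns `True`: the combination the tests skipped divides by zero, even though the coverage report looked perfect.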

If you look at the software written in Spark/Ada, you can see some really low defect rates. These defect rates are well below anything you could hope to achieve in a dynamic language using TDD, and there is data to back that up. But you end up trading speed for correctness.

Kirill Shestakov • Jun 6 '17

I like how you go on to say that the analysis is "fatally flawed", but don't explain how. If the author did some kind of data manipulation that favored one language (or paradigm) over another, that would be a sign of being "fatally flawed", but it seems like there's no explanation for the observed data other than the author's conclusion. Obviously, it doesn't meet scientific standards for being conclusive. However, why should we prefer the opposite statement (that static types are more bug-free) by default over the author's statement (that more simplistic languages are more bug-free)? Clearly, the data is in favor of the latter statement.

"100% test coverage of a dynamic language that Uncle Bob's talking about doesn't mean there are no bugs. And it certainly doesn't mean you've covered every execution path of the code."
This seems to be very far from what the author was talking about. I think you might have misunderstood the article.

Remy Goldschmidt • Jun 6 '17

I'm not the GP, but I'm guessing that he was thinking of the fact that this could simply mean that more bugs are reported on F# projects, not that more bugs exist.

Josh Freckleton • Jun 7 '17

"fatally flawed"

Here are some alternate explanations of the data, all hypotheticals that should be considered before OP's conclusion is accepted as an accurate interpretation of his data:

Practitioners of different languages have different reporting habits: they call different things "bugs", they report with varying frequencies, they tend to not care about reporting bugs as much as building the next feature, etc.
Bugs are different sizes, so while Haskell and Python might both have "1 bug", the cost of that bug could vary wildly.
There's a ratio of "bugs per feature", so more productive languages show up as more buggy.
Bugs are labeled differently, i.e. perhaps Haskell projects tend to have nice "bug" labels just because static typists are more OCD about it, whereas a Python project might have a million bugs, but no one labeled them as such. (related to my bullet #1)
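
A small worked example of the "bugs per feature" point, using invented numbers. The project names and counts below are hypothetical and are not taken from the article's data. The sketch only shows that the ranking can flip depending on whether you count raw bug issues or normalize by output.

```python
# Hypothetical counts, invented purely for illustration.
projects = {
    "project_a": {"bug_issues": 120, "features_shipped": 400},
    "project_b": {"bug_issues": 60, "features_shipped": 100},
}


def bugs_per_feature(stats):
    """Normalize the bug count by how much the project actually shipped."""
    return stats["bug_issues"] / stats["features_shipped"]


# Ranked by raw bug count, project_a looks worse (120 vs 60).
worst_by_count = max(projects, key=lambda name: projects[name]["bug_issues"])

# Ranked by bugs per feature, project_b looks worse (0.6 vs 0.3).
worst_by_ratio = max(projects, key=lambda name: bugs_per_feature(projects[name]))

print(worst_by_count, worst_by_ratio)  # prints "project_a project_b"
```

Neither ranking is "the right one" here. The point is only that the conclusion depends on a denominator the raw issue counts don't contain.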

I agree with others in the comments: in order to establish causal relationships, one would need to construct an appropriate experiment. Double-blind-placebo-controlled-randomized might be a bit tough to construct, although the closer to that one could get, the better.

Perhaps one could construct a randomized crossover though, and that would finally lend some actionable insights into the problem?

Dan Lebrero • Jun 6 '17

Hey Blaine,

Thanks a lot for your comments.

I am not aware of any proper study either. In the article I link to the best source that I found, which contains a list of studies and a summary of each. Unfortunately nothing conclusive there either.

You are right about context being important. The same bug in two different contexts can have very different consequences.

I also agree that code coverage is no proof of 0 bugs, but neither is static typing.

I would love to see some proper studies to back up your statement about Spark/Ada ;)

I think you really nail it with "trading speed for correctness", a very important tradeoff.

Blaine Osepchuk • Jun 6 '17

Hey Dan,

The best source I have off the top of my head is a great talk by Martyn Thomas: [youtu.be/03mUs5NlT6U]

The whole talk is fascinating but I'll point you to the juicy bits in case you're in a hurry.

2 minutes: defect rate of 810 experienced software developers on > 8000 applications

19:22 minutes: defect rates of 5 projects that used 'correct by construction' software development techniques

32:33 minutes: productivity, cost, defects of the Tokeneer project (zero critical failures found after extensive testing by the NSA).

33:51 minutes: The NSA gave interns with no experience with these techniques the job of adding features to the Tokeneer project and they had amazing results (NSA conclusions at 36 minutes)

38:54 minutes: discussion of a few real-world safety-critical projects developed with these techniques (including defect rates which are fractions of the defect rates for typical projects)

I'm mostly a web guy but I'm really interested in this stuff. I've done a bunch of reading and I'm just beyond a "Hello World" example in Spark (the learning curve is pretty steep compared to picking up Java or something like that).

Anyway my inexperience with Spark/Ada prevents me from being able to tell how honest Thomas is about the benefits and drawbacks of this approach but I'm intrigued all the same.

Cheers.

Dan Lebrero • Jun 7 '17

Hi Blaine,

The talk was really fascinating. Thanks a lot for sharing.

Some thoughts after watching it:

The numbers are pretty impressive.
I don't trust anything that comes from the NSA, but I can trust the other examples ;)
I wouldn't recommend an agile methodology for building aircraft controller software either. I don't want to imagine what "iterating" would mean.
I loved the reasons why SPARK is not being adopted (min 48:30). We are so closed-minded!
Interesting that they removed features from Ada to make it simpler and verifiable. It somehow reinforces my belief that simpler is key to more reliable software.
I am really intrigued about "bounded resource (space and time) requirements."
Did you notice on the SHOLIS slide (min 42:20) the bullet point "Demonstrated low value of unit testing when formal methods used"? Interesting!

You just added another TODO in my very long list of things that I do not know about.

I think it is very laudable for you to be interested and to be learning something so unusual. I recently wrote about my experience with Clojure on my personal blog, maybe you can relate to it.

Thanks again,

Daniel

Blaine Osepchuk • Jun 7 '17 • Edited

Hey Dan,

I can relate to your Clojure experience. The Pragmatic Programmer (remember that book?) was right. Learning another language or paradigm affects how you program and how you think about solving problems.

Martyn Thomas has a whole series of lectures and they are all interesting. You might want to check out:

How can Software be so Hard?: youtu.be/VfRVz1iqgKU
Safety-Critical Systems: youtu.be/E0igfLcilSk

Anyway, I really got interested in this stuff because I'm working in a code base that is full of bugs (who isn't, right?) and I just thought there has to be a better way to develop software so I started asking myself how 'they' make software for safety critical applications that doesn't break and isn't full of bugs.

The traditional advice is to turn up the compiler/interpreter warnings. Then you add static analysis. And now in PHP 7.1 you have optional strong typing so you convert your code base to run on PHP 7.1 and you do some of that. And you write unit tests. And once you're good at that you switch to TDD.

And all that stuff is good. It's really good in fact but it doesn't help you if you missed a requirement or a whole class of requirements. It also doesn't help if your requirements are ambiguous or contradictory.

So what we're trying to do is get really fast feedback. If we've got something wrong, we want to fix it as soon as possible because the longer that wrong thing is in your system the more it will cost to fix it. And the next step after everything I mentioned might be formal methods and mathematically verified software. I think of it as an uber-static analyzer in that it automatically verifies certain properties of your code (and annotations).

So you can spend your time writing tests and hoping you catch things or you can spend your time annotating your code in Spark/Ada and let the tools prove it works or you can ship buggy software, which in some cases is the right thing to do.
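
As a rough illustration of what "annotating your code" means, here is a Python sketch with a precondition and a postcondition written as assertions. It is only an analogy: Python checks these at run time, on whatever inputs you happen to try, whereas the SPARK tools aim to prove contracts like these for all inputs before the program runs. The function name is hypothetical.

```python
def clamp_index(index: int, length: int) -> int:
    """Return an index guaranteed to be valid for a buffer of `length` items."""
    # Precondition: the buffer must not be empty.
    assert length > 0, "precondition violated: length must be positive"

    result = min(max(index, 0), length - 1)

    # Postcondition: the result can never overrun the buffer.
    assert 0 <= result < length, "postcondition violated"
    return result


buffer = [10, 20, 30]
print(buffer[clamp_index(7, len(buffer))])   # index clamped to 2, prints 30
print(buffer[clamp_index(-4, len(buffer))])  # index clamped to 0, prints 10
```

With run-time assertions you still only learn about a violation when a test (or a user) triggers it, which is exactly the gap a prover is meant to close.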

The real question for me is which, if any, of these tools and disciplines are appropriate for my role as a web developer?

Most projects spend more than 50% of their budgets testing and fixing defects. Could we spend a fraction of that money up front and do it right the first time by writing software with formal proofs? I don't know the answer yet but I'm working on it.

Cheers.

Dan Lebrero • Jun 8 '17

I will add those two videos to my list for my weekend.

Very interesting thoughts. Let us know how it goes!

Thanks,

Daniel

mjworsley • Jun 25 '17

So it's been a long time since I last coded in SPARK (nearly 10 years now) but it's worth noting a few things:

1) The really low defect rates reported for systems coded in SPARK aren't simply due to the language features, but also down to the "Correctness by Construction" approach, which emphasises getting things right from the high-level requirements all the way through formal specs and into coding, information flow analysis and proofs -- the sooner you find and eliminate the bugs, the less costly removing them is. The language greatly aids this approach due to its static analysis capabilities, but you can improve the defect rates in any language by following a similar approach (not going to get them as low though)

2) By getting rid of the bugs early, you are minimising re-work and (importantly) re-verification when removing them at a later date, so the "speed for correctness" tradeoff isn't as large as you might otherwise expect. Certainly in the domains where you tend to find SPARK (or normal Ada) being used, the cost of testing required for a similar confidence level in other languages can exceed that of the V&V for SPARK.

3) A lot of the applications that demand really low defect rates are aerospace, defence, etc. You'll see more statically typed languages in this arena because of their amenability to verification, but you are unlikely to see these projects pop up on GitHub. That's an understandable limitation of the approach in the original post.

4) There's some good info on this set of slides from Rod Chapman of Praxis about real-world applications, including defects per KLOC: asq509.org/ht/a/GetDocumentAction/... (NB: Praxis developed SPARK from earlier work at the University of Southampton, and is now part of Altran)

5) Even proof of partial correctness doesn't negate the need for testing. Proof of freedom from run-time exceptions (e.g. demonstrably no buffer overruns) is less time consuming, but of great value.

Finally, I believe that Tony Hoare's quote was also used in the preface of "High Integrity Software: The SPARK Approach to Safety and Security" which is pretty much the text for SPARK :-)

Dan Lebrero • Jun 25 '17

Thanks a lot for sharing your experience with SPARK!

I agree that bugs are one of the worst cases of wasted time, especially if, by the time they are found, we have already context-switched, which is usually the case, sometimes by many weeks.

Given that you are not using SPARK anymore, may I ask what has been your experience since then? Have you tried to convince your teams to use it?

Cheers,

Dan