Compilers and Type Checking: Good or Bad?

Nested Software on October 24, 2018

This is a question I've thought about a bit over the years. I know it's one of those things that pushes people's buttons, so I hope here we can h...
 

Compile-time checks are worth their weight in gold. They immediately free you from worrying about whole classes of bugs, like passing the wrong datatype to a function. Languages like Haskell let you be more expressive with your types without cluttering your code with statements like `A a = new A();`. Even Java has become less verbose: Java 10+ supports local variable type inference, so the statement above becomes `var a = new A();`.

More importantly, stronger type systems allow you to prove mathematical properties about your code. Philip Wadler's *Propositions as Types* is a good read on this approach.
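For a tiny taste of the idea (a minimal Lean 4 sketch of my own, not from the paper): the logical proposition A → (B → A) and the type of a function that takes an A and a B and returns the A are the same thing, so writing the function *is* writing the proof.

```lean
-- Propositions as types in one line: the program is the proof.
-- A term of type A → B → A proves the proposition A → (B → A).
theorem k {A B : Prop} : A → B → A :=
  fun a _ => a
```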

Libraries like QuickCheck allow you to do property-based testing, greatly enhancing test coverage. We should still write unit tests as a sanity check.

If you are a Python user, Python 3 has optional type annotations which can be checked by tools like mypy and do not affect the runtime behaviour of the code. There is also Hypothesis, which is inspired by QuickCheck and lets you do property-based testing.
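For concreteness, here's a minimal sketch of both ideas in one file (the `mean` function is invented for illustration): mypy checks the annotations statically, and Hypothesis hammers the property with generated inputs.

```python
from hypothesis import given, strategies as st

def mean(xs: list[int]) -> float:
    # Annotations cost nothing at run time, but mypy checks every caller.
    return sum(xs) / len(xs)

# mean("oops")  # mypy: Argument 1 to "mean" has incompatible type "str"

@given(st.lists(st.integers(-10**6, 10**6), min_size=1))
def test_mean_is_bounded(xs: list[int]) -> None:
    # Hypothesis generates many random lists and shrinks any failure
    # to a minimal counterexample.
    assert min(xs) <= mean(xs) <= max(xs)

# Run with pytest, or just call test_mean_is_bounded() directly.
```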

That said, static types and compile time checking are not a magic bullet for eliminating all bugs. A sufficiently motivated programmer can write bad code in any language :P.

Statically typed languages involve a learning curve: you have to wrestle with the compiler/type checker to get your code to build. The main takeaway for me, after working with both statically and dynamically typed languages, is that the type system is not something to work around but a great tool for writing robust code.

 

Static analysis tools have their place of course, but I am uncertain of their broader applicability. Up until now I think they have been more of a niche technology. As an example, a friend of mine developed an ultra-low-latency logging module for a high-frequency trading system. They told me they did not believe it was practical to use existing proof methods to formally verify this system's correctness with respect to its critical sections. Doing so would have required simplifications that would have reduced performance. They did reason about the logic rigorously and ran millions of tests with randomly generated data to look for synchronization problems, but there was no formal proof.

On the other hand, often the only measure of correctness is that the application implements the business logic correctly. Since this is an arbitrary set of rules, there is no way to formally prove such logic to be correct - it becomes a case of infinite regress.

Types can be helpful for static analysis, but I don't think that they address these kinds of fundamental issues.

 

I agree with you on some points, e.g. that, as of today, it is difficult to use more formal methods in developing programs. Many formal proof systems have a steep learning curve and require an academic background just to understand the documentation. Also, many formal systems are themselves implemented using the programming languages/libraries du jour, which may have bugs of their own.

An issue with static analysis of high-performance systems is that the code you write and the final assembly can be very different, especially with C/C++ and the incredible optimizing abilities of modern compilers. AFAIK, gcc/llvm both take advantage of undefined behavior in the C standard to perform many optimizations.

Static types are not the only kind of formal system, though. Amazon used TLA+ to discover some very weird edge cases in S3. As systems become more complex in terms of code size and deployment size, formal methods offer abstractions to reason about them better.

I prefer statically typed languages for bigger projects and use languages like Python for smaller tasks. I have seen some horrible codebases in Python which made gratuitous use of `eval` and monkey-patching instead of spending some time on a better data structure.

I am learning some Haskell these days. So we'll see if my opinion on static types becomes more radical :P

I'm in the process of learning some Haskell too! :D

I hadn't heard of TLA+ before - thank you for that. It sounds a bit like a souped-up version of property-based testing. The difference is, I guess, that property-based testing relies on roughly sampling the space of possible inputs at runtime, whereas model checking would cover it completely. I get the impression that one would still have to manually specify the number of steps for the state machine to cover, though, and of course one would also have to be confident that the invariants as specified match the real world.
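To make that contrast concrete, here's a toy sketch in plain Python (not TLA+, and the two-process lock protocol is invented): enumerate *every* state reachable within a chosen number of steps and assert the invariant in each, instead of sampling inputs randomly.

```python
# Toy bounded model checking: two processes share one lock;
# the invariant to check is mutual exclusion.
State = tuple[str, str, int]  # (pc of process 0, pc of process 1, lock owner or -1)

INITIAL: State = ("idle", "idle", -1)

def successors(s: State) -> list[State]:
    out: list[State] = []
    for i in (0, 1):
        pc0, pc1, lock = s
        pcs = [pc0, pc1]
        if pcs[i] == "idle":
            pcs[i] = "waiting"
        elif pcs[i] == "waiting" and lock == -1:
            pcs[i], lock = "critical", i   # atomically acquire the lock
        elif pcs[i] == "critical":
            pcs[i], lock = "idle", -1      # release the lock
        else:
            continue                       # process i is blocked
        out.append((pcs[0], pcs[1], lock))
    return out

def invariant(s: State) -> bool:
    return not (s[0] == "critical" and s[1] == "critical")

def check(max_depth: int) -> None:
    # Unlike random sampling, this visits EVERY state reachable within
    # max_depth steps -- the bound you still have to pick by hand.
    seen: set[State] = {INITIAL}
    frontier: list[State] = [INITIAL]
    for depth in range(max_depth + 1):
        for s in frontier:
            assert invariant(s), f"invariant violated at depth {depth}: {s}"
        nxt: list[State] = []
        for s in frontier:
            for t in successors(s):
                if t not in seen:
                    seen.add(t)
                    nxt.append(t)
        frontier = nxt
    print(f"invariant holds in all {len(seen)} states explored")

check(max_depth=10)
```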

 

I started off my professional career in dynamic languages. Nowadays I tend to prefer static types. I agree that there is a lot of noise and overhead in most places where static types are used. I found a good balance with typed functional programming, utilizing concise type definitions and inference (ML-family languages tend to have these). When used with pure functions, this kind of typing model feels like you have refactoring superpowers. Yes, the compiler complains and shows squiggles on the screen. But they are more like a to-do list of things to fix, which is pretty helpful. Even so, wrangling types can be mildly annoying at times, but I believe the payoff is worth it.

I think too, preference/personality comes into play. Some people might like to experiment first, and dynamic languages offer more flexibility there. But for me, I work through the ideas first in my head or on paper. Then I often write the types and try to make bad states impossible before writing functions for those types. Just different strategies toward the same goal.
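The "make bad states impossible" idea works even outside ML-family languages. A sketch of the same pattern in Python with mypy (all names invented): each state is its own type carrying only the data valid for it, so "connected but no session" simply can't be constructed.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Disconnected:
    last_error: Optional[str] = None

@dataclass
class Connecting:
    attempt: int

@dataclass
class Connected:
    session_id: str

# A tagged union instead of one class with a pile of optional fields.
ConnState = Union[Disconnected, Connecting, Connected]

def describe(state: ConnState) -> str:
    # mypy narrows the type in each branch, so touching session_id
    # in the wrong state is flagged before the code ever runs.
    if isinstance(state, Connected):
        return f"connected as {state.session_id}"
    if isinstance(state, Connecting):
        return f"connecting (attempt {state.attempt})"
    return f"disconnected ({state.last_error or 'no error'})"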

 

> I think too, preference/personality comes into play.

I'm guessing that the application domain also comes into play. Maybe the more "algorithmic" a given problem is, the more useful these constructs become.

 

I mostly deal with business software. It has rules, but I would not characterize it as algorithmic. The primary benefit I see in how I do things is the refactor-ability of the code. This leads to easier maintenance for the life of the product, which means being able to keep quality high and add high-impact features at lower risk. The only domain where I would be hesitant to use typed FP idiomatically would be one where low-level performance is paramount. But even then I would almost certainly still use types (and more procedural code) to control memory structure and usage very carefully.

When I program in Python, I just do searches across files to make changes and then run automated tests to confirm that the refactoring hasn't broken anything. I've used products like JetBrains IDEs before, and the IDE support for various things can be a nice touch for sure. I don't know that it really makes a massive difference to me though. I feel like maybe one kind of gets used to whatever one is doing.

Looking at signatures in languages like Java, C#, and TypeScript can be unpleasant, especially when there are all kinds of generics expressions with nested angle brackets and so on. That's my biggest pet peeve with explicit typing. A secondary issue for me is that without duck typing, it can sometimes be a pain to add functionality, because one has to adjust the types accordingly. I do rather like duck typing, where things just work if the call can be made. That said, type inference can in some cases make things look like duck typing - for every call with a given concrete type, a specialized implementation gets compiled into the executable.
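Incidentally, Python's own typing module has a middle ground here: `typing.Protocol` gives you duck typing that mypy can verify structurally, with no interface declared at the implementing class. A small sketch:

```python
from typing import Protocol

class SupportsQuack(Protocol):
    def quack(self) -> str: ...

class Duck:
    def quack(self) -> str:
        return "quack"

class Robot:  # no relation to Duck, no interface declared
    def quack(self) -> str:
        return "beep"

def make_noise(x: SupportsQuack) -> str:
    # mypy accepts Duck and Robot purely because their shape matches --
    # duck typing, but checked before the program runs.
    return x.quack()

print(make_noise(Duck()), make_noise(Robot()))
```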

My only experience doing FP "all the time" so far has been with Haskell. I do like a lot of things in Haskell, but I don't know what it would be like to build real software in it. I've only written toy programs to get a basic sense of how to do things. I'm not sure if the type declarations also would tend to look as messy as they do in the languages that use the angle bracket notation for generics. As you mentioned, performance may or may not be a problem. A lot of business software involves doing data transformations on fairly complicated structures. I wonder how well a language like Haskell could handle that sort of thing.

I don't use a lot of IDE tooling as far as refactoring code or inserting snippets. As I refactor (while typing in F#), the compiler tells me the other things I have broken. It's kind of a to-do list of places to touch. But I do the fixes myself to ensure they make sense. If it gets to be too much pain to do this, then rather than reaching for better tooling, I take it as a signal that my design should be re-examined or organized better.

I agree about standard typed OO languages. There ends up being a lot of effort spent on just the type system... perhaps more than it saves, even when properly applying SOLID principles. So many interfaces and classes. Note that type annotations are optional in TypeScript... it's basically just JS with optional extras.

Haskell is an interesting one among the ML family. I would consider Haskell "extremely typed" when used in a way that is idiomatic for it. I tend to gravitate toward more moderate languages like F#, OCaml, or Elm. My specific experiences are with F# and Elm.

The typing overhead seems heavier in Elm, mainly because of the MVU pattern it uses. But Elm taught me a very valuable lesson about the benefits of pure functions. Over time, we have literally gutted our Elm code base, removing hundreds of lines of code overall, and drastically reducing the possibility of state-based bugs. Since all Elm code is pure (no side effects), refactoring has little risk... because once it compiles, it won't crash. (Refactors still take effort though.) I confess that we don't even write tests for our Elm UI -- it just hasn't seemed necessary. We do sometimes discover bugs, but so far blocking bugs are found, fixed, tested, and deployed within an hour. We have had regressions before, but usually it is because we didn't "make bad states impossible"... taking some extra time to adjust the types can literally prevent the need for that test.

F# is an impure language, so you have to practice some discipline to keep functions side-effect-free. But it carries the same benefits when you do. Pure functions are especially nice to use in domain logic... makes testing really easy too. We extensively test domain logic, verifying both success and failure scenarios behave as expected.

For perf, I was thinking more of code that is close to the metal, like device drivers or file systems. Perf for business purposes is fine in any of them, and they have knobs you can turn for specific cases. For example, if you look at this challenge, I wrote entirely procedural F#. And the author's Haskell version used some impure libraries to get about the same perf as mine (maybe better if both were run on the same hardware). Both performed better than a naive C implementation.

 

I do development in both statically and dynamically typed languages; speaking personally, I really value the added documentation that types provide beyond a certain program size.

Static types also can enhance a language's tooling; look at Hoogle for Haskell, Pursuit for PureScript, or Fancy Search for Elm, for example. I can ask these type-directed search engines "hey, is there a function that takes a String and returns an Int?", and Hoogle will tell me that `length` satisfies those requirements. Idris' type-directed search goes a step further - I can ask it "hey, is there a function that takes a vector of length N and a vector of length M and returns a vector of length N+M?", and it'll tell me that `++` is how you append two vectors. It's a great tool for exploring a new language! That being said, there are similar things in some dynamic languages - Smalltalk's method finder comes to mind - but they seem more common in statically typed languages, especially descendants of ML.

> It seems that one can tend to spend more time fighting with the borrow checker than actually solving real problems.

I definitely had this reaction when I started playing with Rust last year! But as a sometimes C programmer, I began to realize that the problems the borrow checker is pointing out - use-after-free bugs and shared mutable state - are definitely real problems worth solving! I didn't see a lot of false positives from the borrow checker during my experimentation, but I know there are blind spots (e.g. IIRC it's impossible to write a doubly linked list in safe Rust).

On a side note, if anyone from dynamic land is looking to dip their toe into static waters, especially if you know JavaScript, I highly recommend checking out Elm. Its compiler is very friendly and helps you along, and I feel like the speed of development compares to that of JS, even though it's statically typed.

 

Thank you - I do intend to try Elm at some point, though I haven't yet. I've heard very good things about its error reporting.

 

As a Java fan myself, your arguments sound so strange and foreign! It's like you're looking at everything I love about compilers and type safety in a mirror 😉

That being said, Kotlin actually does let you opt out of type checking (via its dynamic type) when compiling to JavaScript, which might ease you toward the dark side

 

It seems that in Python, type annotations may be partial, checked or unchecked, and either written inline in the source or kept in a separate stub file.

So long as it doesn't restrict the dynamism, I approve.
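For anyone curious what the separate-file option looks like: checkers like mypy read `.pyi` stub files alongside the unannotated source, and since stubs never execute, the runtime dynamism is untouched. A hypothetical example (module name invented):

```python
# shapes.py -- the runtime module stays completely dynamic:
def area(width, height):
    return width * height

# shapes.pyi -- a separate stub file that mypy reads instead;
# it never runs, so runtime behaviour is unchanged:
def area(width: float, height: float) -> float: ...
```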

 

Yes, that's definitely interesting. I feel like if we're using type annotations in Python, we really should be checking them with something like mypy. The worst thing, kind of the worst of both worlds, would be to have type annotations that aren't actually in sync with what the code is doing!
