re: Compilers and Type Checking: Good or Bad? VIEW POST

VIEW FULL DISCUSSION
 

Compile time checks are worth their weight in gold. They straight away allow you to not worry about many classes of bugs like passing wrong datatype to your functions. Languages like Haskell allow you to be more expressive with your types without having to clutter your code with statements like A a = new A();. Even Java has become less verbose with Java 9+ allowing for local type inference, so statements like the former become var a = new A();

More importantly, stronger type systems allow you to prove mathematical properties about your code. Propositions as types is a good read about this approach.

Libraries like Quickcheck allow you to do property based testing, greatly enhancing test coverage. We should still write unit tests as a sanity check.

If you are a Python user, Python 3 has optional type annotations which can be checked by tools like mypy and do not affect the run-time of code. There is also Hypothesis which is based on Quickcheck and allows you to do property based testing.

That said, static types and compile time checking are not a magic bullet for eliminating all bugs. A sufficiently motivated programmer can write bad code in any language :P.

Static typed languages involve a learning curve where you have to wrestle with the compiler/type checker to get your code working. The main takeaway for me after working with static and dynamically typed languages is that the type system is not a thing to work around but a great tool that allows for writing robust code.

 

Static analysis tools have their place of course, but I am uncertain of their broader applicability. Up until now I think they have been more of a niche technology. As an example, a friend of mine developed an ultra low latency logging module for a high frequency trading system. They told me that they did not believe it was practical to use existing methods of proof to formally analyze this system for correctness with respect to critical sections. To do so would have required simplifications to be made that would have reduced the performance. They did think about the logic rigorously and ran millions of tests with randomly-generated data to look for synchronization problems, but there was not a formal proof.

On the other hand, often the only measure of correctness is that the application implements the business logic correctly. Since this is an arbitrary set of rules, there is no way to formally prove such logic to be correct - it becomes a case of infinite regress.

Types can be helpful for static analysis, but I don't think that they address these kinds of fundamental issues.

 

I agree with you on some points i.e as of today, it is difficult to use more formal methods in developing programs. Many formal proof systems have a steep learning curve and require academic backgrounds to even understand the documentation. Also, many formal systems themselves are implemented using your programming language / libraries du jour which may have bugs in their own implementations.

An issue with static analysis of high-performance systems is that the code you write and the final assembly code can be very different especially with C/C++ and the incredible optimizing abilities of modern-day compilers. AFAIK, gcc/llvm both take advantage of undefined behavior in C standard to perform many optimizations.

Static types are not the only kind of formal systems though. Amazon used TLA+ to discover some very weird edge cases in S3. As systems become more complex in terms of code size and deployment size, formal methods offer some abstractions to reason about them better.

I prefer statically typed languages for bigger projects and use languages like Python for smaller tasks. I have seen some horrible code-bases in Python which made gratuitous use of eval and monkey-patching instead of spending some time to create a better data structure.

I am learning some Haskell these days. So we'll see if my opinion on static types becomes more radical :P

I'm in the process of learning some Haskell too! :D

I hadn't heard of TLA before - thank you for that. It sounds a bit like a souped up version of property-based testing. The difference is, I guess, that property-based testing relies on roughly sampling the space of possible inputs at runtime whereas model checking would cover them completely. I get the impression that one would still have to manually specify the number of steps for the state machine to cover though, and of course one would also have to be confident that the invariants as specified match the real world.

code of conduct - report abuse