Type systems: dynamic versus static, strong versus weak

#compiler #type #rust #clojure

Almost every practical programming language has a type system that specifies how to assign types to various constructs in the language and how constructs of those types interact with each other. Most programmers characterize type systems with two sets of properties. One has to do with when the rules of the type system are enforced (also known as type checking): dynamically or statically. The other has to do with how much of a safety guarantee the type system provides: strong or weak.

This is a confusing topic. I've run into articles on several otherwise reputable websites claiming that static typing means variables must be declared before use and that dynamic typing means they need not be. This is very misleading. Haskell is a statically-typed language, yet programmers don't need to declare the type of each name¹. Types are inferred from how the names are used. Its optional type annotations are mostly for the benefit of human readers rather than the compiler. I've also seen articles that equate dynamic with weak and static with strong. This is also wrong. A dynamically-typed language can be more strongly typed than a statically-typed language.
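Rust, which appears later in this article, is also statically typed with inference, so the same point can be sketched there. This is only an illustration: the function double and the variable names are made up for the example.

```rust
// Hypothetical helper: only the function signature carries annotations.
fn double(x: i32) -> i32 {
    x * 2
}

fn main() {
    // No annotations on these names; the compiler infers their types.
    let n = 21; // inferred as i32 because it is passed to `double`
    let doubled = double(n); // inferred as i32 from the return type
    let words = vec!["static", "dynamic"]; // inferred as Vec<&str>

    println!("{} {}", doubled, words.len());
}
```

Every name here still has exactly one type at compile time, even though none of the let bindings spells it out.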

To understand the distinction, we need to separate names and values. A name is the identifier you use in a program to refer to entities like objects and functions. Values are the entities themselves. A name can refer to different values at different times, and a value can be referred to by different names.

In statically-typed languages, names have types, and the type of a name typically cannot change within its scope. A name of a certain type can only refer to values of that type. In a statement like a = sum(b, c), the names a, b, c, and sum all have types. The compiler checks whether the types of b and c match the parameter types of sum and whether the result type of sum matches the type of a. If not, it emits a type error and refuses to compile the program.
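A minimal Rust sketch of that check, assuming a hypothetical sum that takes and returns i32 (the article does not define one):

```rust
// Hypothetical `sum`, standing in for the function in the statement above.
fn sum(x: i32, y: i32) -> i32 {
    x + y
}

fn main() {
    let b: i32 = 1;
    let c: i32 = 2;
    let a: i32 = sum(b, c); // argument and result types all match

    // Uncommenting the next line makes compilation fail with a
    // "mismatched types" error, because "2" is a &str, not an i32.
    // let d: i32 = sum(b, "2");

    println!("{}", a);
}
```

The rejected line never produces a runnable program, which is the defining trait of static checking.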

In dynamically-typed languages, names themselves do not have types, but the values they refer to do. In the last example, a = sum(b, c), the language runtime looks at the values of b and c when the program runs and makes sure they are of types to which sum can be applied. For example, if sum(b, c) is implemented as b + c, the runtime checks whether b and c refer to values of types for which the operator + is defined. If not, it throws an exception.
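To keep the examples in one language, here is a rough Rust model of what such a runtime does. It is an assumption-laden sketch, not how any particular language is implemented: the Value enum and this sum are invented for illustration, and an Err stands in for the thrown exception.

```rust
// A value that carries its own type tag, as values do in a dynamic language.
#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
}

// Hypothetical runtime `sum`: inspects the tags when called, not before.
fn sum(b: &Value, c: &Value) -> Result<Value, String> {
    match (b, c) {
        (Value::Int(x), Value::Int(y)) => Ok(Value::Int(x + y)),
        _ => Err(format!("+ is not defined for {:?} and {:?}", b, c)),
    }
}

fn main() {
    // The same name can refer to values of different types over time.
    let mut b = Value::Int(1);
    let c = Value::Int(2);
    println!("{:?}", sum(&b, &c));

    b = Value::Text(String::from("one"));
    println!("{:?}", sum(&b, &c)); // the mismatch surfaces only now, at run time
}
```

The name b is not tied to integers or text; only the value it currently holds is, and the check happens each time sum is called.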

Let's turn to the strength of type safety by looking at some examples.

Both C and Rust are statically-typed, but they provide different levels of type safety. Consider the following C program:

#include <stdio.h>

int main() {
  int numbers[] = {0, 1, 2};
  printf("%d", numbers[6]);
  return 0;
}

It has an obvious problem, but it compiles successfully. Depending on which compiler you use, you might see a warning, but that is just a heuristic added by the compiler to assist programmers. The C standard does not require the compiler to reject the program; reading past the end of the array is undefined behavior at run time. When I run this program on my Mac, it prints 32766%, which is just whatever gibberish happened to be at that memory location. If this were a complex program, it would likely be a frustrating bug.

The following Rust program attempts to do the same thing:

fn main() {
    let numbers = [0, 1, 2];
    println!("{}", numbers[5]);
}

But when it is compiled, the following error is emitted. It won't get a chance to run.

error: index out of bounds: the len is 3 but the index is 5
 --> array.rs:3:20
  |
3 |     println!("{}", numbers[5]);

This is because in Rust, the length is part of the type of an array, so [0, 1] and [0, 1, 2] actually have different types. In this case the compiler can detect an illegal access to the array just by looking at its type and the constant index. To verify this, add a line to the Rust code:

fn main() {
    let numbers = [0, 1, 2];
    let foo: () = numbers;  // <- add
    println!("{}", numbers[5]);
}

You will see the following error from the compiler:

error[E0308]: mismatched types
 --> array.rs:3:19
  |
3 |     let foo: () = numbers;
  |                   ^^^^^^^ expected (), found array of 3 elements
  |
  = note: expected type `()`
             found type `[{integer}; 3]`

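This is an intentional type mismatch error that we added to reveal the type of numbers. The last line says it's [{integer}; 3], in which 3 is the length of the array. For variable-size vectors, the length is not part of the type, so the check moves to run time: indexing with numbers[5] panics instead of reading arbitrary memory, and the get method returns an Option that forces programmers to handle the possibility of an out-of-bounds access.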
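For a vector, whose length is known only at run time, the standard library's get method returns an Option rather than a bare element. A small sketch, with a made-up helper named describe:

```rust
// Hypothetical helper: look up an index without risking a panic.
fn describe(numbers: &[i32], index: usize) -> String {
    match numbers.get(index) {
        Some(n) => format!("numbers[{}] = {}", index, n),
        None => format!("index {} is out of bounds (len {})", index, numbers.len()),
    }
}

fn main() {
    let numbers = vec![0, 1, 2];
    println!("{}", describe(&numbers, 1));
    println!("{}", describe(&numbers, 5));
    // By contrast, `numbers[5]` on a Vec compiles but panics when it runs.
}
```

Because the result is an Option, the code cannot reach the element without also saying what happens when there is none.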

The infamous type systems of Perl and PHP have tormented countless programming souls. The following is copied verbatim from the official PHP manual:

$foo = 1 + "10.5";                // $foo is float (11.5)
$foo = 1 + "-1.3e3";              // $foo is float (-1299)
$foo = 1 + "bob-1.3e3";           // $foo is integer (1)
$foo = 1 + "bob3";                // $foo is integer (1)
$foo = 1 + "10 Small Pigs";       // $foo is integer (11)
$foo = 4 + "10.2 Little Piggies"; // $foo is float (14.2)
$foo = "10.0 pigs " + 1;          // $foo is float (11)
$foo = "10.0 pigs " + 1.0;        // $foo is float (11)

PHP and Perl are extremely lenient about type mismatches and go to great lengths to massage arguments into whatever types are required. For quick-and-dirty scripts, this may let the programmer get things done with as little code as possible, but in large projects it is good at burying bugs. Most other languages in wide use today require explicit conversion between unrelated types, whether they are dynamically typed or statically typed. For example, in Clojure, you'd need to call Integer/parseInt to parse a string to an integer.
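Rust takes the same explicit approach. In the sketch below, add_text is a hypothetical helper; str::parse is the standard library call doing the conversion, and it rejects input that is not entirely an integer.

```rust
use std::num::ParseIntError;

// Hypothetical helper: the string must be converted explicitly.
fn add_text(n: i32, text: &str) -> Result<i32, ParseIntError> {
    let parsed: i32 = text.parse()?; // fails unless the whole string is an integer
    Ok(n + parsed)
}

fn main() {
    // `1 + "10"` does not compile in Rust; there is no implicit conversion.
    println!("{:?}", add_text(1, "10"));
    println!("{:?}", add_text(1, "10 Small Pigs"));
}
```

Where PHP quietly turns "10 Small Pigs" into 10, the conversion here reports an error that the caller has to deal with.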

Both static type checking and a strong type system help to uncover bugs as early as possible, usually at the expense of making programs more verbose, but they are different things.

¹ Haskell does not have variables whose values "vary", so I avoid using the term here.

Top comments (3)

Vince Ramces Oliveros • Apr 4 '19

I was never a fan of dynamically and weakly typed languages. I don't really want to be quirky with code that will eventually turn into spaghetti.
Great article. I wish you had shown a graph of the learning curve and complexity of statically and dynamically typed languages.

Van Ly • Apr 4 '19

Put simply, static type checking means being able to discover type mismatches at compile time, while dynamic type checking can only detect them at run time. However, dynamic type checking can be supplemented with a linting tool that enforces stricter typing.

Clojure is a compiled language. It is dynamic because it can compile code on the fly. Whether it is strong or weak depends on how you use it.