Gabi

Posted on Aug 15, 2023

Driving Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism in Rust

#rust #haskell #polymorphism

The main goal of this article is making some comments about mb64 implementation of
the Complete and Easy.. paper, but implementing it in pure
rust code with some optimizations, like de bruijin levels and indexes!

Credits

mb64 for this implementation of "Complete and Easy" paper
Sofia for her own implementation of this paper too

Mentions

Practical type inference for arbitrary-rank types

Problems with the paper

The paper uses an ordered context, which is pretty slow to deal with in real world scenarios, and we should get rid of it.

This presupposes that we should use a list of tuples, which is pretty slow to deal with, and we should use a
different data structure, like a hashmap, which is pretty fast to deal with, and we can use the im_rc crate for

Nominal typing, nominal typing makes harder to unify forall expression, and this tutorial presents an idea implementing debruijin indexes and levels

Required Dependencies

The main dependencies are going to be the crates:

thiserror for full-featured error handling in rust
im_rc for immutable data structures with rc
ariadne for presenting language's errors in the stderr or stdout
fxhash for a fast hash implementation

Cool readings

I have made a tutorial about Hindley Milner without any extension, if you don't know anything about type systems, I
highly recommend you to read the following:

My tutorial on Hindley Milner which implements Algorithm W and some another useful things for PL theory in Kotlin.
This response on PL theory stackexchange which explains important stuff type system's notation

Bidirectional Type Systems

Bidirectional type checking is a quite popular technique for implementing type systems with complex features, like
Higher-Rank Polymorphism itself, among other features, like Dependent Type Systems. It's constituted of synthesizing a
type and, checking it against another for comparing two types.

For example, the following expression: 10 : Int, the number 10 synthesizes the type Int, and the type
annotation checks the synthesized type Int against Int specified, and if the wrong type was specified,
like 10 : String, obviously, 10 can't have type String, so the compiler will try to check 10 against String,
and fail at this.

Hindley-Milner

This type checker extends Hindley-Milner implementation, I suggest you read
the Wikipedia's page about hindley milner
implementation.

Our implementation relies on the base of Algorithm J, which is mutable and pretty fast, but it's not pure.

The Hindley-Milner rules we are going to use are the "generalization" and the "unification", to first, create types
without any type annotation, and second and respectively to check if two types are equal, and if they are not, we should
unify them.

Higher-Rank Polymorphism

Higher-Rank Polymorphism is a feature that allows us to write polymorphic functions as arguments, like the following:

let id : ∀ 'a. 'a -> 'a = ... in
let f : (∀ 'a. 'a -> 'a) -> Int = ... in
f id

Pseudo language example which we can pass a polymorphic function as argument

This type of polymorphism is called "Rank-2 Polymorphism", because we are passing in a polymorphic function as argument,
it's different from a naive hindley milner implementation, because we can't pass polymorphic functions as arguments in
hindley milner, because it's not a higher-rank polymorphic type system:

let id : ∀ 'a. 'a -> 'a = ... in
let f : (∀ 'a. 'a -> 'a) -> Int = ... in
f id

The let f part is not valid in a naive hindley milner implementation, because we can't pass polymorphic functions as
arguments

Unification and Subsumption

Unification is the process of checking if two types are equal, and if they are not, we should unify them, for example,
the following expression: 10 : Int, the number 10 synthesizes the type Int, and the type annotation checks the
synthesized type Int against Int specified, and if the wrong type was specified, like 10 : String, obviously, 10
can't have type String, so the compiler will try to check 10 against String, and fail at this.

But "Subsumption" is another term we are going to use, it's the process of checking if a type is a subtype of another,
and the process is called "polymorphic subtyping", for example, the following expressions:

let k : ∀ 'a. ∀ 'b. a -> b -> b = ... in
let f : (Int -> Int -> Int) -> Int = ... in
let g : (∀ 'a. 'a -> 'a) -> Int = ... in
(f k, g k)

Pseudo language example

So, the application (f k) is valid in any ML language, or haskell, etc... Because they are the same type when unified,
but when trying to apply (g k), the compiler will try to unify ∀ 'a. 'a -> 'a with Int -> Int -> Int, and fail at
this, so we need to implement the so-called "subsumption".

The type of ∀ 'a. 'a -> 'a is more "polymorphic" than ∀ 'a. ∀ 'b. a -> b -> b, so the expressions f k and g k
should be valid.

De Bruijin Indexes and Levels

De Bruijin indexes and levels are a way to represent type variables in a type system, for example, the following
expression:

let id : ∀ 'a. a -> a = ... in
id 10

The variable 'a should be replaced by the de bruijin index 0, which is the first variable in the scope

This technique is used to create rigid type variables in some scopes, like if we have a "hole" of a type, and we
increase the de bruijin level of the scope, we won't be able to substitute the variable with another value in our
internal code, for example, the following expression.

The algorithm

The algorithm is based on the paper, but with some modifications, like the de bruijin indexes and levels, and some
optimizations, like the im_rc crate, which is a immutable data structure with rc pointers, and the fxhash crate,
which is a fast hash implementation.

The full code implementation is here.

DEV Community