Max Heiber

Posted on Jan 23, 2021

Function Intersection Types Considered Surprising

#typescript #erlang #programming #statictypes

Function intersection types sound simple, but get less intuitive the deeper one looks.

Each of the following surprised me, and has its own section in this post:

When "or" means "intersection"
Applying an Intersection
What Erlang Does
What TypeScript Does

If you're short on time, skip to the last section, since what TypeScript does is most interesting.

Key:

& is intersection
| is union
x: T is a variable x of type T
'a1' in a type means a singleton type inhabited by only the value 'a1'.

When "or" means "intersection"

Here's an Erlang function that:

given an 'a1' returns 'r1'
given an 'a2' returns 'r2'

Mnemonic: "a" for "argument", "r" for "return"

func('a1') -> 'r1';
func('a2') -> 'r2'.

We can give it a type specification like this:

-spec func('a1') -> 'r1'; ('a2') -> 'r2'.
func('a1') -> 'r1';
func('a2') -> 'r2'.

If you ask someone whether ; means "union" or "intersection" here, they are very likely to say "union." I made this mistake, too. There seems to be "or" reasoning going on: either the first function cluase is used, or the second is used.

But here are some reasons for thinking the function is not a union: in general, one cannot use a union of T and U in a way that takes advantage of its T-ness without first proving that it is a T. Here's a TypeScript example, if I have a variable x of type number | string[], I can't do much with it until I figure out what it is. I have to do something like if (Array.isArray(x)) { /* now I can treat x as an array }. If I have a y of type number[] | string[] I can safely use y.concat because that doesn't take advantage of y's number[]-ness or string[]-ness, it uses a method that's defined for both branches of the union.

So func is an intersection, not a union, otherwise we wouldn't be able to call it on a1 without first proving that the function isn't defined on a2.

See how intuitive this is? It gets better!

Applying an Intersection

Ideally, a type checker will be able to figure out the type of an expression like f(x) based on the type of f and the type of x.

The typical rule for this sort of thing is:

x: T  and f: T -> U
-------
f(x): U

That is, "The type of f(x) is the return type of f, assuming x has the same type as f's parameter."

But the type of func above is the intersection of 'a1' -> 'r1' and 'a2' -> 'r2', which doesn't clearly tell us what to do when applying the function: it doesn't have the same shape as T -> U in our function application rule.

One can make an argument for massaging the type of func into the slightly surprising:

('a1' | 'b1') -> ('r1' & 'r2')

Or, more generally:

F1: A1 -> R1
F2: A2 -> R2
----------
(F1 & F2): (A1 | A2) -> (R1 & R2)

That is: "The type of a function intersection is a function type from the union of the parameter types to the intersection of the return types"

I'll show how the function intersection rule above is sensible, then show how it's not.

Some theoretical reasons for liking this rule are:

For any types T and U, both T and U should be subtypes of T & U. Then rule above is the only one that fits. This explanation is all you need if variance is super-intuitive to you and you remember that functions are contravariant in their inputs and covariant in their outputs.
Think of a function as like a burrito maker. Suppose my birthday is coming up and I ask you for a machine that accepts flour and turns it into burritos. And you give me a machine that can turn either flour or corn into things that are both burritos and delicious. Then I get what I wanted for my birthday, a strained metaphor.

To make things more concrete, the signature does the right thing for examples like the following. Given a spec:

-spec f('a') -> 1 | 2 | 3
      ;('b') -> 2 | 3.

Then the rule allows the following example implementation of function f:

f(a) -> 2;
f(b) -> 3.

At this point, I hope something still seems fishy.

Intersecting the return types is weird!! What was the point of having the left side of the intersection in the spec say the function could return 1 if it can't actually return 1?

Put differently, the rule for how to intersect function types makes one wonder why one would bother writing specs with functions intersections at all. By the rule above, the spec for f has the same meaning as this much simpler spec:

-spec f('a' | 'b') -> 2 | 3.

The rule for function intersections also seems to say the wrong thing about cases like our original example:

-spec func('a1') -> 'r1'; ('a2') -> 'r2'.
func('a1') -> 'r1';
func('a2') -> 'r2'.

The function intersection rule says that the function above is ('a1' | 'a2') -> ('r1' & 'r2'). But what on Earth is an 'r1' & 'r2'? Neither Erlang nor TypeScript has any values of type 'r1' & 'r2'.

Perhaps the spec for func doesn't need massaging into T -> U form. Instead, we can tweak the function application rule, splitting it into two:

Rule 1

f: (A1 -> R1) & (A2 -> R2)
x: A1
-------------
f(x): R1

Rule 2

f: (A1 -> R1) & (A2 -> R2)
x: A2
-------------
f(x): R2

That is, in order to calculate the return type of a function F applied to argument x, use the branch of the intersection with a parameter that matches the type of x. Put more verbosely:

if the type of x matches the parameter type of the left side of the function intersection, the return type of the function application is the return type of the left side of the function intersection
if the type of x matches the parameter type of the right side of the function intersection, the return type of the function application is the return type of the right side of the function intersection

This is a bit annoying in the implementation of the type checker, but a deeper problem is that we haven't yet said what to do when both of our new function application rules match. What happens when A1 and A2 overlap? I'll say more about this in the next section.

What Erlang Does

The de-facto type-checker-like tool for Erlang is Dialyzer. Erlang has very clear runtime semantics for functions with multiple heads like func and f above: go through them one by one, using the first function head that matches. But it turns out that Dialyzer doesn't care about the order of function types in an intersection type!

Nor does Dialyzer take the other option we discussed above of unioning the parameter types and intersecting the return types. Dialyzer does something closer to:

F1: A1 -> R1
F2: A2 -> R2
---------------
(F1 & F2): (A1 & A2) -> (R1 & R2)

You might wonder why Dialyzer takes the intersection of the parameter types rather than the union. There's a long answer I don't have space for here, but part of the story is that Dialyzer is more of a bug-finder (or discrepancy analyzer) than a type checker, so its rules are intentionally unsound.

This description is misleading, see Update 2.

What TypeScript Does

Here is our original example, translated to TypeScript:

declare const func:  ((arg: 'a1') => 'r1')
                   & ((arg: 'a2') => 'r2')

const x = func('a1') // type of `x` is 'r1'
const y = func('a1') // type of `y` is 'r2'

playground link

TypeScript function intersections are handled like in our Rule 1 and Rule 2: when applying a function intersection to x, pick the branch of the intersection with a parameter that matches the type of x. That's how TS is able to assign different types to x and y.

The interesting bit is that TypeScript designers had to make a choice about what happens when the parameter types of F1 and F2 overlap. Options include:

Try Rule 1 and Rule 2 in order, applying the first rule that matches
OR reject outright attempts to intersect functions with overlapping domains

The first option sounds OK at first, but it has the really surprising consequence that A & B is not equivalent to B & A. These are noncommutative intersections!

The second option sounds great, but would not be practical for TypeScript. That's because TypeScript can't generally tell, for a given x: T and y: U whether or not the x is also a U.

TypeScript makes a really interesting choice here. TS intersections both are and are not commutative! TypeScript plays fast and loose with equality, which is unsound (issue #42204), but seems to work OK in practice.

Top comments (1)

Max Heiber • Jan 24 '21 • Edited

Update 1:

Stavros Aronis kindly shared a link to their experimental work on better intersection types for Dialyzer. This version of Dialyzer caught 22 discrepancies in OTP, and got better behavior for behaviors overall.

After the front matter, the paper is in English

Update 2: My description of what Dialyzer does left out a lot. Here's a fuller picture of how Dialyzer currently handles functions with specs that contain intersections:

-spec baz(integer()) -> boolean(); (float()) -> float().
baz(1) -> true;
baz(1.0) -> 1.0.

Find the success typing. In this case it is number() -> 'true' | float(). Dialyzer doesn't currently infer intersection types for functions, as far as I can tell.
Collapse the intersections in the spec by taking the union of the domains and union of the ranges. Then we get number() -> (boolean() | float()). Dialyzer seems to always immediately collapse intersections in specs, so the only motivation for writing specs with intersections (for now) is for better documentation for humans.
The final type is the intersection of the specced type and the success typing. Which is done by intersecting the domains and ranges, respectively. So we end up with number() -> 'true' | float().

Update 3

Update 2 was wrong.

Dialyzer does remember and use the mapping of arg types to return types for function specs with intersections. The action happens in dialyzer_contracts:get_contract_return(C :: contract, ArgTypes). The collapsing into union types I described in the previous update also happens, as does the pairwise intersectioning of domain and and range I wrote about in the post. So there isn't one answer to "how does Dialyzer handle function intersections?"

running typer on a sample.erl with these contents:

-module(sample).
-export([main/1]).

main(_Args) ->
    bar(10) + bar(20).

-spec bar(10) -> 10; (20) -> 20.
bar(X) -> unknown_module:f().

generates:

-spec main(_) -> 30.
-spec bar(10) -> 10
    ; (20) -> 20.