The function not dispatched

Reading the Julia language's documentation, I came across the claim that, unlike many traditional languages, Julia supports multiple dispatch of functions (from here on, MD), as opposed to traditional OOP languages' single dispatch, or even the more modern function overloading (onwards, FO).

I know what single dispatch is; anyone who has ever "dotted into" a method call in an OO language knows what single dispatch is.

I also know what function overloading is (thank you Elixir for that.)

But multiple dispatch?

So I did some digging and now would like to post about it too, adding my own two cents. 😉
I'll start with single dispatch, for completeness' sake.

You had one task

As mentioned earlier, if you've ever "dotted into" a method call in a dynamic OO language - congratulations, you've executed a single-dispatch operation.

What single dispatch simply means is that when a method is called, the language runtime knows exactly which implementation to call, because for any given type there is only ever one such implementation!

Furthermore, even if there are several similarly named methods across the system, all taking the same number of arguments with the same types (the same argument shape), the runtime still knows which unique method to call.

How? Thanks to the method's implicit first argument - the receiving object, the object we "dot into".

For example, in Ruby:

[1, 2, 3].reverse # => [3, 2, 1]
"Hi".reverse # => "iH"

It looks like we're calling the same method, reverse, with the same shape, i.e., no arguments at all, on both a string instance and an array instance, yet Ruby's interpreter is able to discern which implementation to use for each.

This is because the receiver, a string in one case and an array in the other, dictates which implementation gets called!

Moreover, since Ruby allows "monkey patching", we could try to "get creative" and do:

class String
    def reverse
        ""
    end
end

One might now ask which method is dispatched when "A string".reverse is called. The answer is that the definition loaded last wins, so in this case our patched version is the one that's called, and it would return the empty string.

There is only ever one possible method implementation for any given method name for a given type!

By "qualifying" the method with its receiver, the instance being invoked, we let the compiler know exclusively which implementation to dispatch.

Most of the popular dynamic OO languages support only single dispatch: JavaScript (TypeScript adds overload signatures on top, but since TS is eventually compiled down to plain JS, where only one implementation exists at runtime, relying on them is risky and error-prone in subtle, hard-to-catch ways), Python, and Ruby. Go, though compiled, supports only single dispatch as well.

On the other hand, languages such as Java and C#, for example, support a form of function dispatching where the compiler has to decide which implementation is the correct one: function overloading.

Two methods diverged in a code

Suppose I'm unhappy with the previous languages' inability to let me define just one function, but one that can take several different argument shapes and return different outputs according to the arguments provided.

Put another way, I want to define a "family" of identically named functions, differing in their argument shapes and outputs.

For example, I want to define a custom type, i.e., a class, say Sum, with a member method sum that sums its arguments. I also want the function to operate only on the following argument shapes:

  • Two integers
  • Two floats
  • An integer and a float (or vice-versa, a float and an integer)
  • A single integer
  • A single float

Generics (AKA "parametric polymorphism") won't do.

A generic function sum<'T> would, by definition, allow 'T to be any type: a string, a tuple of an arbitrary shape, a custom type, etc.

There can't be a single implementation that supports all possible inputs.

What I want is actually the diametric opposite of a generic function - I want a specialized function that only allows a very specific enumeration of argument shapes, allowing the compiler to give me compile-time type-safety guarantees!
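
To make this concrete, here's a minimal sketch in Julia, the language we'll meet properly below (generic_sum is a name I made up for illustration): an unconstrained function accepts any type that happens to support +, which is exactly what we don't want here.

generic_sum(x, y) = x + y    # no constraints: any type with a + method slips through

generic_sum(1, 2)            # => 3
generic_sum(1.5, 2.5)        # => 4.0
generic_sum([1, 2], [3, 4])  # => [4, 6] - vectors were never on our list!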

You may also have noted that, just as I was talking about dynamic languages when discussing single dispatch (with Go being the notable exception), I am now speaking of "compile-time guarantees", since we're talking about function overloading.

That's because you only get FO in compiled languages that support static typing. (Elixir is a notable outlier: its multiple function clauses look a lot like FO, but Elixir is dynamically typed, and the matching clause is selected at runtime by pattern matching rather than by a compiler.)

Luckily, in the languages listed above (and many others), we can implement the following (example in Java):

class Sum {
    public int sum(int x, int y) {return (x + y);}

    public double sum(double x, double y) {return (x + y);}

    public double sum(int x, double y) {return ((double) x + y);}

    public double sum(double x, int y) {return (x + (double) y);}

    public int sum(int x) {return x;} // Well... technically, it is

    public double sum(double x) {return x;}
}

// And use it...
class Main {
    public static void main(String[] args) {
        Sum s = new Sum();
        System.out.println(s.sum(1, 2)); // => Prints 3
        System.out.println(s.sum(1.0, 2.0)); // => Prints 3.0
        System.out.println(s.sum(5)); // => Prints 5
        System.out.println(s.sum(7.2)); // Prints 7.2
        // But...
        System.out.println(s.sum(1, 2, 3)); // => Compile error - no suitable method found: too many arguments
        System.out.println(s.sum("Hello, ", "world!")); // => Compile error - no suitable method found: wrong argument types
    }
}

This method does exactly what we defined earlier:

  • It sums its arguments
  • It only accepts arguments of the specific shapes we enumerated
    • Note how we had to define, and implement, the cases for "integer-and-float" versus "float-and-integer" separately!

The compiler knows which implementation of the function to dispatch not by its first argument, the receiver s (it's the same one for all these calls), but rather by looking at the arguments provided and dispatching the implementation that takes that argument shape.

Now, contrast that with Julia's multiple dispatch:

function sum(x::Int, y::Int)
    x + y
end

function sum(x::Float64, y::Float64)
    x + y
end

function sum(x::Int, y::Float64)
    float(x) + y
end

function sum(x::Float64, y::Int)
    x + float(y)
end

function sum(x::Int)
    x
end

function sum(x::Float64)
    x
end
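
If you load these definitions into a Julia REPL, dispatch behaves exactly as advertised. A quick sketch of a session (note that defining our own sum like this shadows Julia's built-in Base.sum):

sum(1, 2)      # => 3    (dispatches to the Int, Int method)
sum(1.0, 2.5)  # => 3.5  (Float64, Float64)
sum(1, 2.5)    # => 3.5  (Int, Float64)
sum(7)         # => 7
sum("a", "b")  # ERROR: MethodError: no method matching sum(::String, ::String)

methods(sum)   # lists all six methods registered for sum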

Are you kidding me?

Yes, I admit, it does kinda seem like I'm making a fool of you.

I mean, other than the expected syntax differences, those implementations are virtually identical, right down to having to define, and implement, a separate case for the int-and-float versus float-and-int shapes.

But you shouldn't be surprised.

After all, conceptually, FO and MD were developed to do the same thing: allow us to define a finite family of implementations of the same function, according to its possible inputs.

So, what is the difference?!

The answer, and the root difference, lies in how each of these two features is implemented, at what stage of code execution, and at what cost.

Slow and steady wins the race?

As mentioned, FO is only possible in compiled languages with a static type system.

When the compiler encounters a method for the first time, it compiles it and keeps a reference to the compiled form in its function lookup table.

In this context, what makes a function encounter a "first" is the combination of the function's name (but not its receiver) and its inputs' shape.

A class is allowed to have several implementations of the same method, as long as their input shapes differ.

The compiler is dutiful and obedient: whenever it sees a method that's not already in the lookup table, it compiles the method.

Simple.

And costly: it may well be the case that a certain implementation is never actually called.

Yet the compiler took the time to compile it, and now holds a reference to it in its memory.

Both time and memory have been effectively wasted!

Even though the compiler is what's referred to as "Ahead of Time" (AOT), meaning, among other things, that it sees all the code at compile time and knows whether a method is, or is not, ever invoked - it doesn't matter.

Everything is compiled.

Fast and loose

With multiple dispatch things are vastly different!

To begin with, MD is found in dynamic languages, which should make sense, as we just mentioned that the compiler of a compiled language "assimilates" all code at compile time.

Since dynamic languages process code as they encounter it, rather than all of it ahead of time, nothing is pre-compiled or pre-computed.

When the Julia runtime encounters a function call, it checks its lookup table (the method table) for a function with the same name and the same argument shape.

If it isn't found in the table, the function is compiled in a manner known as "Just in Time" (JIT), which is exactly what it sounds like: it's compiled just in time to be used.

A reference to this compiled instance is then stored in the lookup table, so the next time the same function is called, performance is comparable to an AOT-compiled function call.

And, just to make it abundantly clear: only when the runtime encounters a new function, or a new argument shape, for the first time does it compile anything. As long as we're only calling functions already encountered, or functions from the core library, nothing gets compiled, and method lookup is O(1), as the compiled methods are already in the runtime's lookup table.
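
You can watch this happen in a Julia REPL with the @time macro. A minimal sketch (square is an arbitrary example function; exact timings vary by machine, but the shape of the results doesn't):

square(x) = x^2 + 1

@time square(3)    # first call: the time includes JIT-compiling square for Int
@time square(4)    # same shape again: already compiled, dramatically faster
@time square(3.0)  # new argument shape (Float64): triggers a fresh compilation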

They STILL look like the same picture

So, what's the gain? It still sounds like a "to-may-to/to-mah-to" argument.

Let me say it again, and if it still doesn't click for you, think it over for a second: in MD languages, no work is done until a function is first encountered. No CPU cycles are spent to tokenize, parse, compile, or store a reference to a function that may never even be called, and no memory is wasted holding a reference to a function that, from the runtime's point of view, doesn't even exist.

Yes, there is a price to pay on the first encounter with a function: JIT compilation is usually a bit pricier than AOT compilation, because a JIT compiler lacks the whole-program knowledge that AOT compilers have. But that's a rather small price to pay, as is evident from Julia's performance, which routinely matches that of C - though that's also due to Julia's extremely smart design and implementation, employing every performance-optimizing trick in the book on top of MD.

And then again, Julia programs don't have to wait for compilation to finish. Or start, for that matter. They just run... 🤪

Another forte of MD is its built-in support for "monkey patching" - extending code we don't have direct control over, e.g., a library:

Assume a library we use exports a given function utilFunc. The function, however, is defined only for certain argument shapes, and we want to use it with our own argument shapes.

In languages like C# we'd have to use "extension methods", a mechanism that was invented exactly for this purpose, and like all workarounds it feels bolted onto the language's syntax and is verbose to code.

In Julia we simply add, in our own code, something along the lines of:

function utilFunc(our::Int, arguments::Float64, list::String)
    # Implementation goes here
end

Et voilà. That simple.

We don't need the library's source; in fact, the only reason we're naming our function the same as the library's is that they serve the same purpose, so it makes sense.

In actuality, and unlike extension methods in C#, our function could've been given whatever name we wanted, and it would work the same. Zero coupling!
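
And if we do want calls that go through the library's own function to pick up our new method, Julia lets us add a method to it directly by qualifying the name. A self-contained sketch (UtilLib is a hypothetical stand-in for the third-party library):

module UtilLib                # hypothetical stand-in for a third-party library
export utilFunc
utilFunc(x::Int) = x + 1      # the only shape the library ships with
end

using .UtilLib

# Add a method for a shape the library never anticipated - no library source needed:
UtilLib.utilFunc(x::String) = uppercase(x)

utilFunc(1)     # => 2     (the library's method)
utilFunc("hi")  # => "HI"  (ours, selected by argument shape)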

Why don't they build the entire airplane from the materials of the black box?

A question comes to mind: "if MD is so good, why don't all programming languages implement it? Especially if, as Julia proves, you can have a dynamic, JIT-compiled language and get C-like performance..."

And the answer is twofold.

First, as mentioned, Julia's performance is not only due to MD. Sure, it helps, but under Julia's hood there is some remarkable software engineering. Julia's maintainers took every page out of the "How to make computers faster" book... and then wrote a few more chapters themselves.

Second, safety.

For the small(?) price of AOT compilation and its consequences, statically typed languages offer the developer something an interpreted language simply can't: compile-time type safety!

It really is that simple. Personally, I'm more than willing to pay the "AOT tax" when using my beloved F#, knowing the compiler will smack me on the nose any time I try to do something stupid in my code.

Is it worth it? I think so. Others may disagree. I like my types strong and static.

Fin

That's it.

Not quite two sides of the same coin, FO and MD are two ways to implement the same concept in differing environments.

I hope you now have a better understanding of the difference between multiple dispatch and function overloading than when you started reading this post.
