Functions

#computerscience #compilers #interpreter #learning

Well, It's been a minute. I'd finished the interpreter a long time ago, but had put off blogging it because a thousand different things came up these past few months. Anyhow, it now takes on the ability to handle functions - user defined, native, and nested - and even deal with shadowing, wherein a variable in a local scope "eclipses" the same (existing) variable in an outer scope. We'll also enter proper compiler territory by creating a "resolver" phase.

What I built: Commits 3da669d, 930014e, 5adb148, and 9d534f3

What I understood:

1) Updating the Grammar:

The function call takes the highest precedence here, above the unary rule.
We also create an object (class) called Callable that helps resolve whether or not something can be used as a function.
Ex: A string isn't callable, while classes and functions (user-defined and native) are. Python does something really cool by allowing all its datatypes to be callable, because they're really objects that're being called.
If the "callee" (entity calling the function) doesn't belong to LoxCallable, Lox raises an error.
The parser also allows for currying or chained function calls through (( "(" arguments? ")" )*, allowing for function(arg)(arg)(arg)

2) Arguments:

We also check for arity, wherein the number of parameters in the function definition must equal the number of arguments passed in the call.

3) Native Functions:

We create clock() in Java, which Lox can access.
This is an anonymous class implementing LoxCallable, instantiated and bound to a variable "clock" in the interpreter's globals environment.
It returns the system time in seconds, for us to perform benchmarks.

4) Function Declaration:

Every time the interpreter encounters fun, it creates a callable object out of it (through LoxCallable) and binds it to the function's name in a new environment created.
In this local environment, the function's arguments are bound to the parameters mentioned in the declaration.
Every time the interpreter encounters a function call, a new environment is created. This is to allow for recursion, which wouldn't be possible if an environment existed only for declarations.
Once the function's body is completed, this local environment is destroyed and the execution environment returns to the state before the call.

5) Return:

The interpreter looks for a return statement followed by an optional expression.
If no return is mentioned, nil is implicitly returned.
When return is encountered, the interpreter must immediately jump out of all its (current) environments and revert to the original function call site.
Instead of a complex stack to do so, we use a simple exception class called Return, which evaluates the return value (if given) and wraps it, finally returning it as the function's result.
This is kept inside a try-catch block, that looks for the return "exception", which if not found, causes the full function to be parsed and nil returned implicitly.

6) Scoping:

So far, Lox was able to support dynamic scoping wherein a called function's scope was always set to the global environment, and not the environment it was defined in.
This hindered the called function's access to variables in the environment it was defined in, once the environment ended (when the function finished executing).
Basically, a live reference to a mutable scope was being captured, when it should have captured the lexical environment it was defined in, instead of deferring name resolution to the call site.
To explain this with an example:

fun getName(name) {
     var person = name;
     fun greet() {
          print “Hello, “ + person + “!”;
     }
     return greet;
}
var momo = getName("Momo”);
var chutney = getName("Chutney”);

momo(); //Should give “Hello, Momo!” ; Instead, it gives an error: “Undefined variable!”
chutney(); //Should give “Hello, Chutney!”

This is because once getName("Momo") finishes, the variableperson` is destroyed, as the function's scope ended.
When momo() is called, it tries to find person in its parent scope, which has now become the global scope (which doesn't contain the variable person) → So error!
We deal with this issue of dynamic scoping, by using lexical scoping.
Here, when a function is first declared, we close over and capture the environment surrounding the function (inner function holds a live reference to the outer one, which the garbage collector can't destroy despite the completion of the function's execution).
When the function is called, this captured environment becomes the call's parent instead of globals.

7) Resolver:

Implementing closures through lexical scoping gave rise to another issue.
Because a closure holds a live reference to a "mutable" map representing the current scope, its captured view of the current scope can change based on code that runs later in the same block.
This shouldn't be the case. Because Lox has lexical/static scope, a variable's usage must always "resolve" to the same declaration throughout the program's execution.
Even a redeclaration/redefinition of the variable later in the code shouldn't influence behaviour written with the original/declared value in mind.
Let's understand this through an example:

var a = "one"; { fun showA() { print a; } showA(); var a = "two"; showA(); }

In this code, the variable a belongs to the global scope, with its value being "one."
The function showA creates a closure that captures the block/environment it is declared in.
When showA() is called, a is first searched for in the block, and when not found, is searched for in the environment above (i.e., global). Hence, the output for print a is "one."
But, a is redeclared to have the value "two" in the same block (i.e., in the same environment captured by showA())
So, when showA() is called after the redeclaration, because the scope was mutable, the output is "two" when it should've been "one" again.
The resolver, added after the parser and before the interpreter, thus uses a concept called "Persistent Data Structures" wherein the original data structure can't be tampered with directly.
Any change to an existing structure creates a newer structure that contains the original information along with the modification.
To give a parallel, this is how blockchain works. If you're familiar with it, you must know that every time a block is modified in a ledger, a new block is created referencing the older one's hash, while also containing the change.
When the resolver enters a block and sees a variable being "used" (after its declaration), it searches upward from the function's scope.

Scope 0 (the function's scope):ais not found Scope 1 (block scope):ais not found (asvar a = "two"hasn't been defined yet in the resolver's static pass) Scope 2 (global scope):ais found

Because the variable is found 2 hops away from the innermost scope (of its use), the resolver annotates that "use" or node with the number 2.
When the interpreter encounters print a, instead of searching for the variable, it just uses the given number of hops to jump to the declaration.

What's next: We're going to keep it Classy! 💅
(Hehe, classes and inheritance, in case you didn't get it)

Musings:
I’ve had the opportunity to witness some of the most wonderful and bewildering of things these past few months. Feels like I saw the full spectrum of life, much like Shakespeare’s seven ages of man. In a way, it really humbled me. For the longest time, I was afraid of letting go… of my beliefs, interests, and even some dreams, because I thought I’d lose my very identity. Now, I’ve learnt that much like functions, it’s okay to have a template of non-negotiables, upon which we can let life keep adding her own “implementations.” (I absolutely HAD to use that analogy, pleeease excuse me). Change and unpredictability don’t seem so daunting anymore, and maybe (like 0.05%) are good too. I suppose c’est la vie.

Top comments (0)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.