Well, it's been a while. Hyderabad's weather has been erratic lately and for a brief, brief time - so has my consistency towards this project. But I'm back!
Anyhoo, the second stage of the Interpreter is the Parser, which accepts the tokens generated by the Scanner and creates a syntax-tree out of them. This tree is based on certain specified rules - the Context Free Grammar.
As against regular language, which groups similar language into tokens but cannot adequately deal with deeply nested expressions, CFG allows for infinite possibilities of string/code (called derivations, because they’re “derived” from rules).
But before we create the Parser itself, we need to create and define these rules, basis which trees can be constructed.
What I built: Commit 3264f2b
What I understood:
1) Introducing Trees and Grammar
- Trees are made of terminals and non-terminals (references to other rules).
- The Scanner emits Tokens which are consumed by the Parser, and grouped based on the grammar's rules, to produce nodes for the tree.
- Literals (string/numeric) are often leaf or end nodes, and operators - the internal nodes.
- Rules are also capable of referring to themselves, allowing for recursion and hence infinite possibilities.
- Based on Robert's breakfast example to explain how rules are defined, I created my own example as well.
- Terminals are enclosed in double-quotes and are in blue, while non-terminals are in black. Grouping/OR/repetitions are in purple.
- We can pick any of the options for a rule.
-
Ex: I choose the first rule in chilling:
chilling → do activity;
- I now go to the non-terminal do and choose any of its rules. Let's say:
do → "watch";
- Because watch is a terminal, I have no other rule to refer to through it. So in chilling, I replace do with watch, making it:
chilling → "watch" activity;
- I now go to the nonterminal activity and choose a rule.
- I decide to pick music (movie would have been a more appropriate choice for "watch", but I want to demonstrate the flexibility CFG offers).
activity → music;
- In music, which is a non-terminal, I choose:
music → "piano";
- Because there's nowhere to go beyond "piano," I have:
activity → "piano";
- Finally, going back to the first rule we started out with, we have a full-fledged instruction ready, based only on a few grammar rules:
chilling → "watch piano";
2) Lox's actual grammar
- We create a base class (or rule) called
Expr
(expression), that refers to other subclasses (also rules), which can again refer toExpr
. This recursion is important because expressions generally consist of other expressions, which ultimately contain operators and operands. - Because
Expr
and its subclasses will have a similar, repetitive template, we metaprogram to have code write code for us. - We do so by creating a general template that takes from a description and creates the relevant subclass using a for-loop.
defineAst(outputDir, "Expr", Arrays.asList(
"Binary: Expr left, Token operator, Expr right",
"Grouping: Expr expression",
"Literal: Object value",
"Unary: Token operator, Expr right"
));
...
for (String type : types) {
String className = type.split(":")[0].trim();
String fields = type.split(":")[1].trim();
defineType(writer, baseName, className, fields);
}
3) The Expression Problem
- This is a huge issue programmers face when they wish to have multiple (sub)classes with similar behaviour (methods).
- In our situation, we deal with various subclasses like Binary, Unary, Literal, etc. There are some common methods necessary for them all.
- Ex: Printing or computing the expression
- One way to deal with this would be by adding a print method for each subclass, i.e.
printBinary()
,printUnary()
,computeBinary()
,computeUnary()
, etc. - The issue with this approach is when we'd like to add a new method - say
typeCheck()
- we'll have to update every subclass to implement its version of this new method, making the entire code bulky. - Another way to resolve this is by using a common method that checks which subclass it must operate on, through if-else. But that is also problematic. What if we decide to add a new type/subclass? We'll have to update every (common) method to accommodate this new type.
4) Classes' Behaviour - What should it be like?
- The
Expr
class and its subclasses are merely data structures that must be passed between phases of the Interpreter. - They must not have any "behaviour" (methods) of their own, lest we violate the 'Separation of Concerns' principle, which says that each part of the program must focus on one responsibility only.
- Because if we start mixing multiple responsibilities into one class, that class becomes harder to maintain, reuse, and extend without drastically altering it.
5) Solution: The Visitor-Design Pattern
- This is a standard approach for solving the Expression Problem in object-oriented languages like Java.
- I see this as a bidirectional communication method, that overcomes the issues of the unidirectional approaches mentioned earlier.
- Here, we can extend the behaviour of (sub)classes while preserving their inherent structure, by simply creating a new visitor.
- We create a Visitor 'interface' with a list of
visitSubclass()
methods, which aren't yet defined. - The main/base class has an
accept()
method that's extended and overridden by each of its subclasses. - We then create a new class for a new behaviour. This class implements the Visitor interface, and is therefore the Visitor class.
- When I say "implements" the visitor interface, we just override the interface's undefined methods and give them specific behaviour.
- When this new class' object is created, it accepts an expression from the parser. This is the Visitor object.
- Depending on the expression's type (subclass), the
accept()
method relevant to the type is called, with the class' object passed as an argument. - The subclass'
accept()
now receives the Visitor object and goes on to call thevisitSubclass()
method associated with it, while also passing the subclass object as an argument. This is for giving visitSubclass() access to the subclass' fields. - Remember when I said we overrode the interface's
visitSubclass()
method in the new class and gave it specific behaviour? That behaviour (method) gets implemented here.
What I didn't understand: The Visitor-Design Pattern
If the previous section completely went over your head, don't worry. It did for me too. I took two days to figure this out. 😁
- Let's understand this through an example. Assume that the behaviour we wish to implement here is of printing the expression. And assume that we need to print a binary expression.
- In our
Expr
class, we create the Visitor interface, with methods likevisitBinary()
,visitUnary()
,visitLiteral()
,visitGrouping()
, etc. Note that these methods are undefined here. - We create an
accept()
method forExpr
that is also undefined. - In each subclass extending
Expr
, we overrideExpr
’saccept()
, such that it calls thevisitSubclass()
method associated with it. For example, theBinary
subclass’saccept()
will callvisitBinaryExpr(this)
.
abstract class Expr {
abstract <R> R accept(Visitor<R> visitor);
interface Visitor<R> {
R visitBinaryExpr(Binary expr);
R visitGroupingExpr(Grouping expr);
R visitLiteralExpr(Literal expr);
R visitUnaryExpr(Unary expr);
}
static class Binary extends Expr {
Binary(Expr left, Token operator, Expr right) {
this.left = left;
this.operator = operator;
this.right = right;
}
@Override
<R> R accept(Visitor<R> visitor) {
return visitor.visitBinaryExpr(this);
}
final Expr left;
final Token operator;
final Expr right;
}
}
-
this
is passed as an argument because the subclass is giving the visitor object full access to itself, and therefore all its fields. - We now create a new class called
PrintExpr
that implementsExpr
's Visitor interface. - It then overrides all the
visitSubclass()
methods to print the expression associated with each subclass. -
visitBinaryExpr()
will be overridden to return/print the whole binary expression. - We can create as many behaviours as we like without changing a single thing about the subclasses.
class PrintExpr implements Expr.Visitor<String> {
String print(Expr expr) {
return expr.accept(this);
}
@Override
public String visitBinaryExpr(Expr.Binary expr) {
return expr.left.accept(this) + " " + expr.operator.lexeme + " " + expr.right.accept(this);
}
public static void main(String[] args) {
Expr expression = new Expr.Binary(
new Expr.Unary(
new Token(TokenType.MINUS, "-", null, 1),
new Expr.Literal(123)
),
new Token(TokenType.STAR, "*", null, 1),
new Expr.Grouping(
new Expr.Literal(45.67)
)
);
System.out.println(new PrintExpr().print(expression));
}
}
- If I had to explain it in a very crude way: Let's say there's a census being conducted, for which an officer must visit your house. They are supposed to count members of your house, and cannot visit without an invitation. Maybe they'll also issue certain ID cards. (Both these - counting and issuing cards - are behaviours)
- The Census Office creates an interface that sends out mails informing families of the census. (This is the Visitor Interface and families/houses are subclasses)
- When this mail is received by a house (liken this to the visitor object passing itself), it is supposed to accept the visitor (
accept()
). - The house is also supposed to send a mail back to the Census office saying "Yes, we agree to allow the officer inside." (This is the subclass giving the visitor object access to itself through
this
while callingvisitSubclass()
) - The officer then arrives and either counts or issues an ID card (depending on the class we've created to implement the behaviour).
What's next: For real, the Parser. Like "Parser" Parser. Proper Parser.
Musings:
Nothing interpreter related, honestly. But I had a really good time today. Spent about 15-20 minutes in the balcony looking at the city. I really, really love balconies. There's no such thing as a bad balcony, I guess?
You see, they show us a world beyond the comfort of wherever we are. They offer us a phenomenal vantage point from where we can watch countless little dramas unfold; from where we can both appreciate man's incredible skill in creating these chaotically beautiful structures, and also find ourselves amused at how it's the lone, calm green tree breaking through the maze of streets, that we find ourselves most in awe of.
Top comments (0)