Aaaand she's here! Last time, we created the rules/grammar basis which the parser converts tokens into syntax-tree. Now, we create association rules, i.e. we define the order in which expressions must be parsed to generate the tree.
Recursive descent parsing is used here, wherein the parser starts from the outermost/top-level rule with lowest precedence and traverses through subexpressions to make it to the tree's bottom, where it finds literals (which happen to have the highest precedence).
What I built: Commit 5750470
What I understood:
1) Association Rules
- The rules we create must be defined as functions which call other functions of higher precedence, until we reach literals.
- We create a
Parser
class with a list of tokens and a current pointer for noting the position. - We also use a
match()
function to look for certain tokens after the current one. This is the parser anticipating code based on the function it's currently in.
private boolean match(TokenType... types) {
for (TokenType type : types) {
if (check(type)) {
advance();
return true;
}
}
return false;
}
- A
ParseError
class (extending Java'sRuntimeError
class) is created to handle errors. -
parse()
is defined to callexpression()
-
expression()
callsequality()
, which callscomparison()
, which callsterm()
, which callsfactor()
, which further callsunary()
, which ultimately calls either itself orprimary()
-
primary()
checks forLITERAL
,TRUE
,FALSE
, andLEFT-PAREN
- When we encounter
LEFT-PAREN
, we useconsume()
toadvance()
when theRIGHT-PAREN
is found. Else, a custom error is reported: "Expect ')' after expression."
2) Error Handling
- We use a method called 'Panic Mode Error Recovery,' which uses a concept called synchronisation to find the nearest acceptable token to restart parsing from, whenever the parser encounters an unanticipated/wrong token.
-
Ex: Tokens like
CLASS
,VAR
,FUN
,FOR
,IF
, andWHILE
are good tokens to restart parsing from, because they mark the beginning of a new statement. - This is the parser "panicking" and looking for a safe and graceful exit from the error line.
- It also prevents the parser from reporting a cascade of errors associated with the initial error.
- Ex: If the parser sees:
var a = 1 + ;
var b = 2;
- It notes that after
+
, we have a;
, when we should have another operand. This is an error. - Without panic-mode error recovery, it would report a cascading errors like:
Error on line 1: Expected an expression after '+'.
Error on line 2: Unexpected token 'Var'
Error on line 2: Missing expression.
Error on line 2: Unexpected semicolon.
- With panic-mode,
var
in the second line, is according tosynchronisation()
, a safe place to resume parsing from. So error reporting looks like:
Error on line 1: Expected an expression after '+'.
What I didn't understand: How the hierarchy of association rules works. This is fundamentally based on the rules of precedence and association we define, and uses match()
to look for tokens, and then call the next appropriate rule (function).
Let's say we have the expression: 5-3*2
- First,
parse()
is called, which callsexpression()
, which callsequality()
, and so on until we reachprimary()
- Now,
primary()
recognises that the first character5
is aLiteral
and hence creates an object for it. -
primary()
returnsLiteral(5)
tounary()
, which returns it tofactor()
, which returns it toterm()
- Now that the pointer is currently with the second character,
term()
finds a match when it sees-
. Now that we have an operator, the right-operand must be found. - Hence,
term()
callsfactor()
again, and so on, untilprimary()
finds and returns3
as anotherLiteral
object. - This object now reaches
factor()
, which finds a match in*
, and decides that3*
needs a right-operand. - Again,
factor()
callsunary()
, which callsprimary()
, which finds and returns2
as aLiteral
object. - This object reaches
factor()
, which now has a full binary expression. It then creates aBinary
object likeBinary(3*2)
, and returns it toterm()
- Now,
term()
has its right-operand for5-
, and finally constructs a full binary expression, i.e.Binary(5-(3*2))
- In this endeavour, by giving
factor()
higher precedence overterm()
, we ended up preserving the BODMAS rule. - And lo! The binary expression is returned all the way to the top of the stack until
parse()
returns it too. It is from this expression/object that we create a syntax tree.
What's next:
We’re going to evaluate the expressions we constructed the tree for! We're also going to extend the parser to handle not just expressions, but also statements - which is what actual code looks like.
Musings:
In my opinion, concepts are best understood through examples and metaphors. Unless we really engage with the subject at hand, we cannot internalise what's being taught. And we do this best by trying to use the pool of knowledge we have through our own experiences in life.
I once read in the Upanishads that the best process to learn anything is by shravana (listening to or reading from a credible guru or text), manana (contemplating through deep thought, analysis, questioning, and even explaining it to another), and finally nididhyasana (meditating upon the knowledge gained from the previous two stages, to transform it from a concept into a lived reality).
While I’m not as thorough with this process as I’d like to be, I do try my best to understand a concept through various perspectives. It also gives me the opportunity to apply these learnt formulae to problems beyond computer-science.
Top comments (0)