Antlr4 parser generator Java tips

Tianyi Liu — Tue, 09 Mar 2021 05:09:31 +0000

I'm new to Antlr (and to parser generators in general ;_;), so it took me a while to get familiar with the parser, lexer and visitor. Here I note down some tips that might be helpful in the future.

Avoid multiple lexer rules that match the exact same pattern; otherwise all but the first of them will never be matched. In my grammar, a tuple has two nodes and a weight, all of which are numbers. Since they are different elements, I thought I should use a separate lexer rule for each of them, like the following:

tuple
    : NODEFROM WEIGHT NODETO
    ;
NODEFROM: [0-9]+;
WEIGHT: [0-9]+;
NODETO: [0-9]+;

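This causes mismatches: when several lexer rules match the same text with the same length, the lexer always picks the one defined first, so the other two rules never produce a token and the parser cannot match them. To avoid this, it's better to use a single lexer rule for integers and reference it from the parser rules: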

tuple
    : nodefrom weight nodeto
    ;
nodefrom: INT;
weight: INT;
nodeto: INT;
INT: [0-9]+;
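
To see why duplicated rules go wrong, here is a toy Java model of how a lexer picks a rule. It is not Antlr's real implementation, only a sketch of two behaviours Antlr documents: the longest match wins, and a tie goes to the rule defined first. The class and method names are made up for illustration, and the three duplicated rules are written with uppercase names, as Antlr requires for lexer rules.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of lexer rule selection; NOT Antlr's real implementation.
class LexerTieBreakDemo {

    // Returns the name of the rule chosen for each token in the input.
    // Rules are tried in definition order (the map's insertion order).
    static List<String> tokenTypes(Map<String, Pattern> rules, String input) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++; // skip whitespace between tokens
                continue;
            }
            String bestRule = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Strict '>' means a later rule with an equally long match never wins.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestRule = rule.getKey();
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("no rule matches at position " + pos);
            }
            types.add(bestRule);
            pos += bestLen;
        }
        return types;
    }

    // The three duplicated rules from the first version of the grammar.
    static Map<String, Pattern> duplicatedRules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("NODEFROM", Pattern.compile("[0-9]+"));
        rules.put("WEIGHT", Pattern.compile("[0-9]+"));
        rules.put("NODETO", Pattern.compile("[0-9]+"));
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(tokenTypes(duplicatedRules(), "1 5 2"));
    }
}
```

Running it on the input 1 5 2 prints [NODEFROM, NODEFROM, NODEFROM]: the second and third rules are never chosen, so a parser rule that expects three different token types cannot match.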

When a rule covers several operations, we need to add a label (#name) to each alternative, so that each one gets its own visitor method. For example,

expr
    : expr '*'         #aster
    | expr '.' expr    #concate 
    | expr '+' expr    #plus
    | tuple            #tup
    | '(' expr ')'     #parens
    ;

This is part of my parser, which uses 3 operations: star, concatenation, and plus. With the labels #aster, #concate and #plus, Antlr generates the visitor methods visitAster(), visitConcate() and visitPlus() (and likewise visitTup() and visitParens() for the other two labels), so we can implement each operation separately. If we don't add these labels, only one method is generated for the rule, which is visitExpr().
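
Here is a rough Java sketch of what "one visitor method per label" looks like in practice. The classes below are hand-written stand-ins, not what Antlr actually generates; they only mimic the shape of a visitor with one method per labeled alternative. TraceVisitor and trace are hypothetical names, and the tree is built by hand instead of being parsed.

```java
// Hand-written stand-ins that mimic the one-method-per-label shape.
// These are NOT the classes Antlr generates.
class LabelVisitorDemo {

    // One visit method per label in the expr rule.
    interface ExprVisitor<T> {
        T visitAster(Aster ctx);
        T visitConcate(Concate ctx);
        T visitPlus(Plus ctx);
        T visitTup(Tup ctx);
        T visitParens(Parens ctx);
    }

    interface Expr {
        <T> T accept(ExprVisitor<T> visitor);
    }

    static final class Aster implements Expr {
        final Expr inner;
        Aster(Expr inner) { this.inner = inner; }
        public <T> T accept(ExprVisitor<T> visitor) { return visitor.visitAster(this); }
    }

    static final class Concate implements Expr {
        final Expr left, right;
        Concate(Expr left, Expr right) { this.left = left; this.right = right; }
        public <T> T accept(ExprVisitor<T> visitor) { return visitor.visitConcate(this); }
    }

    static final class Plus implements Expr {
        final Expr left, right;
        Plus(Expr left, Expr right) { this.left = left; this.right = right; }
        public <T> T accept(ExprVisitor<T> visitor) { return visitor.visitPlus(this); }
    }

    static final class Tup implements Expr {
        final String text;
        Tup(String text) { this.text = text; }
        public <T> T accept(ExprVisitor<T> visitor) { return visitor.visitTup(this); }
    }

    static final class Parens implements Expr {
        final Expr inner;
        Parens(Expr inner) { this.inner = inner; }
        public <T> T accept(ExprVisitor<T> visitor) { return visitor.visitParens(this); }
    }

    // Hypothetical visitor: records which label was chosen at each node.
    static final class TraceVisitor implements ExprVisitor<String> {
        public String visitAster(Aster ctx) {
            return "aster(" + ctx.inner.accept(this) + ")";
        }
        public String visitConcate(Concate ctx) {
            return "concate(" + ctx.left.accept(this) + ", " + ctx.right.accept(this) + ")";
        }
        public String visitPlus(Plus ctx) {
            return "plus(" + ctx.left.accept(this) + ", " + ctx.right.accept(this) + ")";
        }
        public String visitTup(Tup ctx) {
            return "tup";
        }
        public String visitParens(Parens ctx) {
            return ctx.inner.accept(this); // parentheses only group, so pass through
        }
    }

    static String trace(Expr tree) {
        return tree.accept(new TraceVisitor());
    }

    public static void main(String[] args) {
        // Hand-built tree: a plus whose right operand is a starred tuple.
        Expr tree = new Plus(new Tup("t1"), new Aster(new Tup("t2")));
        System.out.println(trace(tree)); // prints plus(tup, aster(tup))
    }
}
```

Each operation lives in its own method, so changing how star works does not touch the code for plus or concatenation. With a single visitExpr(), the same logic would need a chain of checks to find out which alternative was matched.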

When implementing the methods of the visitor class, we only need to focus on the ones related to the values we want to compute while traversing the parse tree. For example, in my parse tree each node is a tuple (Node1, Weight_1, Weight_2, Node2), and there is a visit method for each value. If the visitor I create only requires Weight_1 for its computation, then I only need to implement visitWeight_1().

I find the visitor class really convenient: Antlr generates a method for each parser rule automatically, and you can get a child of a node simply by calling ctx.ValueName(). So when I write the code for visitor methods, I always run grun Calc prog -gui calc.txt to open a visualized parse tree and check the relationships between the nodes.

In the part of the parse tree I'm using, a tuple has 6 children: the four values (nodef, wc, wd, nodet) plus the two parentheses. So if I want to get the value of wc inside the function visitTuple(Parser.TupleContext ctx), I can write something like ctx.wc().
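
A small sketch of that access pattern follows. The context classes are hand-written stand-ins, not Antlr's generated ones; they only copy the idea of one accessor per child rule and a getText() method on each child. It also assumes wc holds an integer, which is my assumption for the example.

```java
// Hand-written stand-ins that mimic the shape of generated context classes.
class TupleAccessDemo {

    // Stand-in for a child context, such as the one ctx.wc() returns.
    static final class ValueContext {
        private final String text;
        ValueContext(String text) { this.text = text; }
        String getText() { return text; }
    }

    // Stand-in for the tuple context: one accessor per child rule.
    static final class TupleContext {
        private final ValueContext nodef, wc, wd, nodet;
        TupleContext(String nodef, String wc, String wd, String nodet) {
            this.nodef = new ValueContext(nodef);
            this.wc = new ValueContext(wc);
            this.wd = new ValueContext(wd);
            this.nodet = new ValueContext(nodet);
        }
        ValueContext nodef() { return nodef; }
        ValueContext wc() { return wc; }
        ValueContext wd() { return wd; }
        ValueContext nodet() { return nodet; }
    }

    // A visit method that only needs wc and ignores the other children.
    // Assumes wc is an integer (an assumption made for this example).
    static int visitTuple(TupleContext ctx) {
        return Integer.parseInt(ctx.wc().getText());
    }

    public static void main(String[] args) {
        TupleContext ctx = new TupleContext("1", "5", "7", "2");
        System.out.println(visitTuple(ctx)); // prints 5
    }
}
```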

Still working...

DEV Community: Tianyi Liu
