
Vicente Maldonado

Posted on • Originally published at Medium on

Python Lark Parser introduction

Lark is a Python parsing library. Unlike parser generators such as Yacc, it doesn’t generate a source code file from a grammar; the parser is built dynamically at run time. Let’s see how it works. You import Lark:

from lark import Lark

then specify the grammar:

grammar = """
start: WORD "," WORD "!"
%import common.WORD
%ignore " "
"""

The grammar can be a Python string or read from a separate file. After that, just create a Lark class instance, initializing it with the grammar:

parser = Lark(grammar)

Now you are ready to parse:

def main():
    print(parser.parse("Hello, world!"))
    print(parser.parse("Adios, amigo!"))

if __name__ == '__main__':
    main()

parser.parse returns a Tree instance containing the parse tree:

Tree(start, [Token(WORD, 'Hello'), Token(WORD, 'world')])
Tree(start, [Token(WORD, 'Adios'), Token(WORD, 'amigo')])

That’s it, clean and simple. It’s up to you to decide what to do with the parsed string. Let’s see where we can go from there. Here is an example of a simple arithmetic expression parser:

from lark import Lark

grammar = """
start: add_expr
     | sub_expr

add_expr: NUMBER "+" NUMBER

sub_expr: NUMBER "-" NUMBER

%import common.NUMBER
%ignore " "
"""

The grammar ignores spaces. Also note that grammar terminals are written in uppercase letters (NUMBER) while grammar rules are written in lowercase letters (start, add_expr and sub_expr). %import and %ignore are directives. You can find the grammar reference in the Lark documentation. We can import definitions from other grammars, in this case common.lark (which just contains some useful definitions). The above grammar will successfully parse addition and subtraction expressions, like:

1+1
2-1
3 - 2

and nothing else. Next, create the Lark object:

parser = Lark(grammar)

and we are ready to parse:

def main():
    print(parser.parse("1+1"))
    print(parser.parse("2-1"))
    print(parser.parse("3 - 2"))

if __name__ == '__main__':
    main()

The output is as expected:

Tree(start, [Tree(add_expr, [Token(NUMBER, '1'), Token(NUMBER, '1')])])
Tree(start, [Tree(sub_expr, [Token(NUMBER, '2'), Token(NUMBER, '1')])])
Tree(start, [Tree(sub_expr, [Token(NUMBER, '3'), Token(NUMBER, '2')])])

Note that the arithmetic parser above still just prints the parse tree, as before. Let’s transform it into something more useful:

from lark import Lark, Transformer

grammar = """
start: add_expr
     | sub_expr

add_expr: NUMBER "+" NUMBER -> add_expr

sub_expr: NUMBER "-" NUMBER -> sub_expr

%import common.NUMBER
%ignore " "
"""

The -> add_expr and -> sub_expr on the right-hand side of the grammar rules are aliases. They name the transformer methods that are called once a rule has been matched. (Here they repeat the rule names, which Lark would use by default anyway.) Let’s write them:

class CalcTransformer(Transformer):

    def add_expr(self, args):
        return int(args[0]) + int(args[1])

    def sub_expr(self, args):
        return int(args[0]) - int(args[1])

For instance, when parsing

2-1

args[0] will contain the token "2" and args[1] the token "1". In our transformer methods we convert both to integers and add or subtract them, returning the result. Now create the Lark object:

parser = Lark(grammar, parser='lalr', 
    transformer=CalcTransformer())

Passing a transformer to the Lark constructor only works with the LALR parser (parser='lalr'); the tree is then transformed while the input is being parsed. We are finally ready to parse:

def main():
    print(parser.parse("1+1"))
    print(parser.parse("2-1"))
    print(parser.parse("3 - 2"))

if __name__ == '__main__':
    main()

The output is now:

Tree(start, [2])
Tree(start, [1])
Tree(start, [1])

Better? 1+1 is 2, 2-1 is 1 and 3-2 is also 1.

Of course, this is just scratching the surface. If you are interested, you can find the full examples on GitHub.
