Lark is a Python parsing library. Unlike parser generators such as Yacc, it doesn’t generate a source code file from a grammar; the parser is built dynamically at run time. Let’s see how it works. First, import Lark:
from lark import Lark
then specify the grammar:
grammar = """
start: WORD "," WORD "!"
%import common.WORD
%ignore " "
"""
The grammar can be a Python string or read from a separate file. After that, just create a Lark class instance, initializing it with the grammar:
parser = Lark(grammar)
and you are ready to parse:
def main():
    print(parser.parse("Hello, world!"))
    print(parser.parse("Adios, amigo!"))

if __name__ == '__main__':
    main()
parser.parse returns a Tree instance containing the parse tree:
Tree(start, [Token(WORD, 'Hello'), Token(WORD, 'world')])
Tree(start, [Token(WORD, 'Adios'), Token(WORD, 'amigo')])
That’s it, clean and simple. It’s up to you to decide what to do with the parsed string. Let’s see where we can go from there. Here is an example of a simple arithmetic expression parser:
from lark import Lark
grammar = """
start: add_expr
     | sub_expr
add_expr: NUMBER "+" NUMBER
sub_expr: NUMBER "-" NUMBER
%import common.NUMBER
%ignore " "
"""
The grammar ignores spaces. Also note that terminals are written in uppercase letters (NUMBER), while rules are written in lowercase letters (start, add_expr and sub_expr). %import and %ignore are directives; you can find the full grammar reference in the Lark documentation. %import pulls in definitions from other grammars, in this case common.lark, which ships with Lark and contains some useful predefined terminals. The above grammar will successfully parse addition and subtraction expressions, like:
1+1
2-1
3 - 2
and nothing else. Next, create the Lark object:
parser = Lark(grammar)
and we are ready to parse:
def main():
    print(parser.parse("1+1"))
    print(parser.parse("2-1"))
    print(parser.parse("3 - 2"))

if __name__ == '__main__':
    main()
The output is as expected:
Tree(start, [Tree(add_expr, [Token(NUMBER, '1'), Token(NUMBER, '1')])])
Tree(start, [Tree(sub_expr, [Token(NUMBER, '2'), Token(NUMBER, '1')])])
Tree(start, [Tree(sub_expr, [Token(NUMBER, '3'), Token(NUMBER, '2')])])
So far the arithmetic parser just prints the parse tree, as before. Let’s transform it into something more useful:
from lark import Lark, Transformer
grammar = """
start: add_expr
     | sub_expr
add_expr: NUMBER "+" NUMBER -> add_expr
sub_expr: NUMBER "-" NUMBER -> sub_expr
%import common.NUMBER
%ignore " "
"""
The -> add_expr and -> sub_expr parts on the right-hand side of the grammar rules are aliases: they name the tree node produced when the rule matches, and a transformer calls the method with the same name for each such node. Let’s write those methods:
class CalcTransformer(Transformer):
    def add_expr(self, args):
        return int(args[0]) + int(args[1])

    def sub_expr(self, args):
        return int(args[0]) - int(args[1])
Each method receives the children of the matched node as a list. For instance, when parsing
2-1
args[0] will contain "2" and args[1] will contain "1". In our transformer methods we convert both to integers and add or subtract them, returning the result. Now create the Lark object:
parser = Lark(grammar, parser='lalr',
              transformer=CalcTransformer())
Passing a transformer to the Lark constructor only works with the LALR parser, hence parser='lalr'. We are finally ready to parse:
def main():
    print(parser.parse("1+1"))
    print(parser.parse("2-1"))
    print(parser.parse("3 - 2"))

if __name__ == '__main__':
    main()
The output is now:
Tree(start, [2])
Tree(start, [1])
Tree(start, [1])
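Better? 1+1 is 2, 2-1 is 1 and 3-2 is also 1. The result is still wrapped in a start node because the transformer defines no method for that rule.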
Of course, this is just scratching the surface. If you are interested, you can find the full examples on GitHub.