DEV Community

loading...
Cover image for Advent of Code 2020: Day 19 abusing PEG grammar in Python the way it's not supposed to

Advent of Code 2020: Day 19 abusing PEG grammar in Python the way it's not supposed to

meseta profile image Yuan Gao Updated on ・3 min read

Another quick one today.

The Challenge Part 1

Link to challenge on Advent of Code 2020 website

Today's challenge provides a list of rules to parse and apply to messag... wait why does this seem so familiar? Oh, it's because the way the challenge describes rules is exactly the same way PEG grammars are written, which we just used in Day 18 (and Day 07, and Day 04).

Essentially the challenge wants you to implement expression grammar, the exact thing we were writing to solve previous day's challenges whenever we used PEG.

An example of the rules given was:

0: 4 1 5
1: 2 3 | 3 2
2: 4 4 | 5 5
3: 4 5 | 5 4
4: "a"
5: "b"
Enter fullscreen mode Exit fullscreen mode

PEG

So this got me thinking... these rules look a LOT like PEG, what if we just turned it into PEG? The particular flavour of PEG that Parsimonious uses would look like this:

RULE0 = RULE4 RULE1 RULE5
RULE1 = ((RULE2 RULE3) / (RULE3 RULE2))
RULE2 = ((RULE4 RULE4) / (RULE5 RULE5))
RULE3 = ((RULE4 RULE5) / (RULE5 RULE4))
RULE4 = "a"
RULE5 = "b"
Enter fullscreen mode Exit fullscreen mode

So, we can just write a function that does this conversion!

import re
def convert(line):
    line = line.strip().replace(":", " =")
    line = re.sub(r"(\d+)", r"RULE\1", line)
    line = re.sub(r"= (.*) \| (.*)$", r"= ((\1) / (\2))", line)
    return line
Enter fullscreen mode Exit fullscreen mode

Now, we can convert the challenge rules to PEG, and have Parsimonious use it:

from parsimonious.grammar import Grammar

data = open("input.txt").read().splitlines()
rules = data[:137]

rules.sort(key = lambda x: int(x.split(":")[0]))
converted = "\n".join(map(convert, rules))
grammar = Grammar(converted)
Enter fullscreen mode Exit fullscreen mode

With grammar now containing the challenge rules, we can run it on inputs:

grammar.parse("aaaabb")
Enter fullscreen mode Exit fullscreen mode

If this is successful, it returns a bunch of node information. If there's a string that isn't valid, it raises a ParseError

So we just need to wrap this in a function that catches this ParseError and returns True if successful, False if not:

from parsimonious import ParseError
def parse(line):
    try:
        grammar.parse(line)
    except ParseError:
        return False
    return True
Enter fullscreen mode Exit fullscreen mode

That's it! The whole code is:

from parsimonious.grammar import Grammar
from parsimonious import ParseError
import re

def convert(line):
    line = line.strip().replace(":", " =")
    line = re.sub(r"(\d+)", r"RULE\1", line)
    line = re.sub(r"= (.*) \| (.*)$", r"= ((\1) / (\2))", line)
    return line

data = open("input.txt").read().splitlines()
rules, messages = data[:137], data[138:]

rules.sort(key = lambda x: int(x.split(":")[0]))
converted = "\n".join(map(convert, rules))
grammar = Grammar(converted)

def parse(line):
    try:
        grammar.parse(line)
    except ParseError:
        return False
    return True

print("sum", sum(map(parse, messages)))
Enter fullscreen mode Exit fullscreen mode

Surprisingly short for this late. But perhaps we did something we weren't supposed to. Sneaky.

The Challenge Part 2

In the second part of the challenge, a couple rules are switched over to create loops. Since we've sorted the rules anyway, swapping them out is easy:

rules[8] = "8: 42 | 42 8"
rules[11] = "11: 42 31 | 42 11 31"
Enter fullscreen mode Exit fullscreen mode

Finally, there's a slight issue with how PEG deals with recursive matching, and it's a similar problem to the one discussed in the challenge text itself, and so we just have to insert a little manual workaround on rule 31 due to the way it tries to consume the entire message and fail to return.

The entire code looks like this:

from parsimonious.grammar import Grammar
from parsimonious import ParseError
import re

def convert(line):
    line = line.strip().replace(":", " =")
    line = re.sub(r"(\d+)", r"RULE\1", line)
    line = re.sub(r"= (.*) \| (.*)$", r"= ((\1) / (\2))", line)
    return line

data = open("input.txt").read().splitlines()
rules, messages = data[:137], data[138:]

rules.sort(key = lambda x: int(x.split(":")[0]))
rules[8] = "8: 42 | 42 8"
rules[11] = "11: 42 31 | 42 11 31"
converted = "\n".join(map(convert, rules))
grammar = Grammar(converted)

def parse(line):
    try:
        grammar.parse(line)
    except ParseError:
        if "RULE31" in repr(err):
            return True
        return False
    return True

print("sum", sum(map(parse, messages)))
Enter fullscreen mode Exit fullscreen mode

I'm not too happy with the RULE31 hack, I wonder if other PEG grammars or EBNF parsers would have the same issue. However, the solution was quick, and I didn't have to write any parsers by hand.

Onward!

Discussion (0)

pic
Editor guide