DEV Community

Cover image for Advent of Code 2020: Day 21 with Python sets
Yuan Gao
Yuan Gao

Posted on • Updated on

Advent of Code 2020: Day 21 with Python sets

After the "monster" of Day 20, this challenge was refreshing.

The Challenge Part 1

Link to challenge on Advent of Code 2020 website

This challenge is again about set theory. The wording of the challenge make it difficult to parse, but ultimately what it comes down to is we can generate lots of lists of ingredients, and we know the allergen is in all of them. In other words, the allergen is the ingredient at the intersection sets consisting of ingredients known to contain it.

Importing the data

Some light regex work imports the data:

import re
rule = re.compile("^((?:\w+ )+)\(contains (.*?)\)$")
data = open("input.txt").read().splitlines()

for line in data:
    ing_str, all_str = rule.match(line).groups()

    ingredients = ing_str.strip().split(" ")
    allergens = all_str.split(", ")
Enter fullscreen mode Exit fullscreen mode

Here we use two matching groups to match the chunks of ingredients, and chunks of allergens. There's also a non-matching group in there for good measure (not really needed, I already wrote it, and just left it in there).

Creating candidate ingredients

We know that each list of ingredients contains some allergens. So for each allergen, we start collecting the sets of ingredients known to contain it. We also collect a master list of all ingredients at the same time (needed later)

from collections import defaultdict
all_candidates = defaultdict(list)
all_ingredients = []
for line in data:
    ing_str, all_str = rule.match(line).groups()

    ingredients = ing_str.strip().split(" ")
    allergens = all_str.split(", ")

    for allergen in allergens:
        all_candidates[allergen].append(set(ingredients))

    all_ingredients.extend(ingredients)
Enter fullscreen mode Exit fullscreen mode

Output of all_candidates (from demo input)

defaultdict(list,
            {'dairy': [{'kfcds', 'mxmxvkd', 'nhms', 'sqjhc'},
              {'fvjkl', 'mxmxvkd', 'sbzzf', 'trh'}],
             'fish': [{'kfcds', 'mxmxvkd', 'nhms', 'sqjhc'},
              {'mxmxvkd', 'sbzzf', 'sqjhc'}],
             'soy': [{'fvjkl', 'sqjhc'}]})
Enter fullscreen mode Exit fullscreen mode

Next, we can narrow down our sets - we know that for each allergen and the sets of ingredients to contain it, that allergen must be one of the ingredients that appear in all sets. So to do this, we do a set intersection between all sets to find the ingredients common to all. We repeat this for every allergen:

candidates = {}
for allergen_name, ingredient_list in all_candidates.items():
    candidates[allergen_name] = set.intersection(*ingredient_list)
Enter fullscreen mode Exit fullscreen mode

Output from demo input

{'dairy': {'mxmxvkd'}, 'fish': {'mxmxvkd', 'sqjhc'}, 'soy': {'fvjkl', 'sqjhc'}}
Enter fullscreen mode Exit fullscreen mode

For example, fish is probably one of {'mxmxvkd', 'sqjhc'}

Counting ingredients not on the list

The challenge output wants ingredients that aren't allergens. So we take all of our candidates, which we can get at once using candidates.values(), combine them together using set.union(*candiadtes.values()), and then find the difference between our list of all ingredients: set(all_ingredients).difference(set.union(*candidates.values()))

However, once we find this ingredient list that aren't candidates for allergens, we need to count their occurrences inside our list of all ingredients. The full code for that fits on one line:

print("appearances", sum(all_ingredients.count(ingredient) for ingredient in set(all_ingredients).difference(set.union(*candidates.values()))))
Enter fullscreen mode Exit fullscreen mode

The full code for Part 1 is:

import re
from collections import defaultdict
rule = re.compile("^((?:\w+ )+)\(contains (.*?)\)$")

data = open("demo.txt").read().splitlines()
all_candidates = defaultdict(list)
all_ingredients = []
for line in data:
    ing_str, all_str = rule.match(line).groups()

    ingredients = ing_str.strip().split(" ")
    allergens = all_str.split(", ")

    for allergen in allergens:
        all_candidates[allergen].append(set(ingredients))

    all_ingredients.extend(ingredients)

candidates = {}
for allergen_name, ingredient_list in all_candidates.items():
    candidates[allergen_name] = set.intersection(*ingredient_list)

print("appearances", sum(all_ingredients.count(ingredient) for ingredient in set(all_ingredients).difference(set.union(*candidates.values()))))
Enter fullscreen mode Exit fullscreen mode

The Challenge Part 2

In the second part of the challenge, we now must take our candidates and whittle them down using a process of elimination, this is similar to what we had to do in Day 16, though instead of using recursion, we can do it with this loop in which we keep removing known allergens that we have confirmed from the candidates until a candidate is left with a single allergen, at which point we mark it as known (by writing it into the confirmed dict)

confirmed = {}

while len(confirmed) < len(candidates):
    for allergen in candidates:
        new_set = candidates[allergen].difference(confirmed.values())

        if len(new_set) == 1:
            confirmed[allergen] = new_set.pop()

        candidates[allergen] = new_set
Enter fullscreen mode Exit fullscreen mode

Then, as the challenge requires us to print them using an alphabetical list of the ingredients that contain them:

print(",".join(v for k, v in sorted(confirmed.items(), key=lambda item: item[0])))
Enter fullscreen mode Exit fullscreen mode

The full code for this part is (including chunks of part 1 needed):

import re
from collections import defaultdict
rule = re.compile("^((?:\w+ )+)\(contains (.*?)\)$")

data = open("demo.txt").read().splitlines()
all_candidates = defaultdict(list)
all_ingredients = []
for line in data:
    ing_str, all_str = rule.match(line).groups()

    ingredients = ing_str.strip().split(" ")
    allergens = all_str.split(", ")

    for allergen in allergens:
        all_candidates[allergen].append(set(ingredients))

    all_ingredients.extend(ingredients)

candidates = {}
for allergen_name, ingredient_list in all_candidates.items():
    candidates[allergen_name] = set.intersection(*ingredient_list)

confirmed = {}

while len(confirmed) < len(candidates):
    for allergen in candidates:
        new_set = candidates[allergen].difference(confirmed.values())

        if len(new_set) == 1:
            confirmed[allergen] = new_set.pop()

        candidates[allergen] = new_set

print(",".join(v for k, v in sorted(confirmed.items(), key=lambda item: item[0])))
Enter fullscreen mode Exit fullscreen mode

Onward!

Top comments (0)