Yuan Gao

Posted on Dec 21, 2020 • Updated on Dec 22, 2020

Advent of Code 2020: Day 21 with Python sets

#adventofcode #python

After the "monster" of Day 20, this challenge was refreshing.

The Challenge Part 1

Link to challenge on Advent of Code 2020 website

This challenge is again about set theory. The wording of the challenge make it difficult to parse, but ultimately what it comes down to is we can generate lots of lists of ingredients, and we know the allergen is in all of them. In other words, the allergen is the ingredient at the intersection sets consisting of ingredients known to contain it.

Importing the data

Some light regex work imports the data:

import re
rule = re.compile("^((?:\w+ )+)\(contains (.*?)\)$")
data = open("input.txt").read().splitlines()

for line in data:
    ing_str, all_str = rule.match(line).groups()

    ingredients = ing_str.strip().split(" ")
    allergens = all_str.split(", ")

Here we use two matching groups to match the chunks of ingredients, and chunks of allergens. There's also a non-matching group in there for good measure (not really needed, I already wrote it, and just left it in there).

Creating candidate ingredients

We know that each list of ingredients contains some allergens. So for each allergen, we start collecting the sets of ingredients known to contain it. We also collect a master list of all ingredients at the same time (needed later)

from collections import defaultdict
all_candidates = defaultdict(list)
all_ingredients = []
for line in data:
    ing_str, all_str = rule.match(line).groups()

    ingredients = ing_str.strip().split(" ")
    allergens = all_str.split(", ")

    for allergen in allergens:
        all_candidates[allergen].append(set(ingredients))

    all_ingredients.extend(ingredients)

Output of all_candidates (from demo input)

defaultdict(list,
            {'dairy': [{'kfcds', 'mxmxvkd', 'nhms', 'sqjhc'},
              {'fvjkl', 'mxmxvkd', 'sbzzf', 'trh'}],
             'fish': [{'kfcds', 'mxmxvkd', 'nhms', 'sqjhc'},
              {'mxmxvkd', 'sbzzf', 'sqjhc'}],
             'soy': [{'fvjkl', 'sqjhc'}]})

Next, we can narrow down our sets - we know that for each allergen and the sets of ingredients to contain it, that allergen must be one of the ingredients that appear in all sets. So to do this, we do a set intersection between all sets to find the ingredients common to all. We repeat this for every allergen:

candidates = {}
for allergen_name, ingredient_list in all_candidates.items():
    candidates[allergen_name] = set.intersection(*ingredient_list)

Output from demo input

{'dairy': {'mxmxvkd'}, 'fish': {'mxmxvkd', 'sqjhc'}, 'soy': {'fvjkl', 'sqjhc'}}

For example, fish is probably one of {'mxmxvkd', 'sqjhc'}

Counting ingredients not on the list

The challenge output wants ingredients that aren't allergens. So we take all of our candidates, which we can get at once using candidates.values(), combine them together using set.union(*candiadtes.values()), and then find the difference between our list of all ingredients: set(all_ingredients).difference(set.union(*candidates.values()))

However, once we find this ingredient list that aren't candidates for allergens, we need to count their occurrences inside our list of all ingredients. The full code for that fits on one line:

print("appearances", sum(all_ingredients.count(ingredient) for ingredient in set(all_ingredients).difference(set.union(*candidates.values()))))

The full code for Part 1 is:

import re
from collections import defaultdict
rule = re.compile("^((?:\w+ )+)\(contains (.*?)\)$")

data = open("demo.txt").read().splitlines()
all_candidates = defaultdict(list)
all_ingredients = []
for line in data:
    ing_str, all_str = rule.match(line).groups()

    ingredients = ing_str.strip().split(" ")
    allergens = all_str.split(", ")

    for allergen in allergens:
        all_candidates[allergen].append(set(ingredients))

    all_ingredients.extend(ingredients)

candidates = {}
for allergen_name, ingredient_list in all_candidates.items():
    candidates[allergen_name] = set.intersection(*ingredient_list)

print("appearances", sum(all_ingredients.count(ingredient) for ingredient in set(all_ingredients).difference(set.union(*candidates.values()))))

The Challenge Part 2

In the second part of the challenge, we now must take our candidates and whittle them down using a process of elimination, this is similar to what we had to do in Day 16, though instead of using recursion, we can do it with this loop in which we keep removing known allergens that we have confirmed from the candidates until a candidate is left with a single allergen, at which point we mark it as known (by writing it into the confirmed dict)

confirmed = {}

while len(confirmed) < len(candidates):
    for allergen in candidates:
        new_set = candidates[allergen].difference(confirmed.values())

        if len(new_set) == 1:
            confirmed[allergen] = new_set.pop()

        candidates[allergen] = new_set

Then, as the challenge requires us to print them using an alphabetical list of the ingredients that contain them:

print(",".join(v for k, v in sorted(confirmed.items(), key=lambda item: item[0])))

The full code for this part is (including chunks of part 1 needed):

import re
from collections import defaultdict
rule = re.compile("^((?:\w+ )+)\(contains (.*?)\)$")

data = open("demo.txt").read().splitlines()
all_candidates = defaultdict(list)
all_ingredients = []
for line in data:
    ing_str, all_str = rule.match(line).groups()

    ingredients = ing_str.strip().split(" ")
    allergens = all_str.split(", ")

    for allergen in allergens:
        all_candidates[allergen].append(set(ingredients))

    all_ingredients.extend(ingredients)

candidates = {}
for allergen_name, ingredient_list in all_candidates.items():
    candidates[allergen_name] = set.intersection(*ingredient_list)

confirmed = {}

while len(confirmed) < len(candidates):
    for allergen in candidates:
        new_set = candidates[allergen].difference(confirmed.values())

        if len(new_set) == 1:
            confirmed[allergen] = new_set.pop()

        candidates[allergen] = new_set

print(",".join(v for k, v in sorted(confirmed.items(), key=lambda item: item[0])))

Onward!

DEV Community

Advent of Code 2020: Day 21 with Python sets

The Challenge Part 1

Importing the data

Creating candidate ingredients

Counting ingredients not on the list

The Challenge Part 2

Top comments (0)

Read next

VSCode extensions are not removed from file system even after uninstalling, I created a solution !

Streamlit Part 5: Mastering Data Visualization and Chart Types

Algorithm to determine if brackets are balanced

🚢Exploratory Data Analysis on the Titanic Dataset🚢