# Advent of Code 2021: Day 03 with Python, and more numpy

The data is a big list of binary values, but we actually need to load the data as a big matrix of `1` or `0` to be able to efficiently slice the matrix while doing calculations.

To do this, we can have numpy fix the data for us. A regular import of lines using "U" format code for strings, imports the data as an array of strings. We also want to know how many bits are in each value, which we get from the first entry.

``````lines = np.loadtxt("day_3.txt", "U")
bits = int(len(lines))
``````

Resulting in:

``````array(['011010010110', '101110100110', '011011011110', '100011010001',
'011011100111', '011000100101', '101101011001', '010001001001',
'100010110111', '001110110101', '100111011001',
``````

Then, we can make a `view` of the data, which will give us a single large array containing one character per entry, and we also cast it to `int` type:

``````lines.view('U1').astype(int)
``````

Resulting in:

``````array([0, 1, 1, ..., 1, 1, 0])
``````

Finally, we need to put this back into the right shape that it was in before, so we `reshape` it. The final import code is:

``````lines = np.loadtxt("day_3.txt", "U")
bits = int(len(lines))
data = lines.view('U1').astype(int).reshape(lines.shape, bits)
``````

## Binary array to int conversion

For both parts of the question, we need to convert an array of bits, like `[0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0]` into a number like `1602`. There are several ways of doing it, the way I've chosen is to multiply the array by another array that is made up of powers of 2: `[..., 32, 16, 8, 4, 2, 1]`. So I will pre-calculate `pow2` early on in the code:

``````pow2 = 1<<np.arange(bits)[::-1]
``````

Other methods involve converting the array into a string, and using python's built-in `int()` function to convert it. `int` has base-conversion abilities:

``````bin2dec = lambda x: int("".join(map(str,x.astype(int))),2)
``````

Finally, numpy actually has a function to pack bits back into bytes, however it's a little awkward to use as you need to pad it out to the correct number of values for your eventual type.

``````bin2dec = lambda x: np.packbits(np.pad(x,[32-bits, 0])).view('>u4')
``````

I'll be using the `pow2` method, precalculating it, and doing `pow2.dot(x)` any time I want to convert `x` to a number.

## Part 1

Part 1 requires counting the number of `0` or `1` in each column and comparing them to see which one is more. Since numpy can directly tell us all the elements that have a given value, and it can also do this per axis.

So the number of `1` in every bit position is:

``````ones = (data == 1).sum(axis=0)
``````

The result is a count for each column:

``````array([483, 487, 485, 467, 513, 509, 492, 532, 508, 497, 490, 506])
``````

Doing the same for zeros, and then comparing the two produces an array of true/false, and this can be turned back into a number my doing a dot product against `pow2` which was precalculated earlier.

``````ones = (data == 1).sum(axis=0)
zeros = (data == 0).sum(axis=0)
result = pow2.dot(ones > zeros) * pow2.dot(ones < zeros)
``````

## Part 2

Part 2 is...long. I don't even know if I can paraphrase the problem exactly. The idea is to take each column of bits successively, and test to see if each row has a bit value that is the same as the majority of other bits, and discard the value if not. Repeating for the minority case.

To achieve this, we loop through the columns using a `for` loop:

``````for idx in range(bits):
column = data[:, idx]
``````

And then for this columm of pixels, we figure out whether there are more `1` than `0` by taking the sum of the column, and seeing if that number is greater than half the size of the column:

``````column.sum()*2 >= column.size
``````

This will return `True` if the `1`s are more common than the `0`s. Since we need to then collect all the entries that match this value, we can do:

``````column == (column.sum()*2 >= column.size)
``````

This will return an array of `True` or `False` depending on whether the value in column is the same as the majority value or not.

Then, we can select just these entries out of our `data` matrix, and discard the rest, by using this array as the subscript when accessing the data matrix:

``````data[column == (column.sum()*2 >= column.size)]
``````

We have to apply this algorithm iteratively over the columns of the array, and stop when `data` gets to 1 row

``````for idx in range(bits):
column = data[:, idx]
data = data[column == (column.sum()*2 >= column.size)]
if len(data) == 1:
break
``````

However, we have to repeat this for the minority case as well. So the final code looks like:

``````a = b = data
for idx in range(bits):
acol = a[:, idx]
bcol = b[:, idx]
a = a[acol == (acol.sum()*2 >= acol.size)] if len(a) > 1 else a
b = b[bcol == (bcol.sum()*2 < bcol.size)] if len(b) > 1 else b
result = pow2.dot(a == 1) * pow2.dot(b == 1)
``````

## Full Code

``````import numpy as np

bits = int(len(lines))
data = lines.view('U1').astype(int).reshape(lines.shape, bits)
pow2 = 1 << np.arange(bits)[::-1]

ones = (data == 1).sum(axis=0)
zeros = (data == 0).sum(axis=0)
result = pow2.dot(ones > zeros) * pow2.dot(ones < zeros)
print("Part 1 result:", result)

a = b = data
for idx in range(bits):
acol = a[:, idx]
bcol = b[:, idx]
a = a[acol == (acol.sum()*2 >= acol.size)] if len(a) > 1 else a
b = b[bcol == (bcol.sum()*2 < bcol.size)] if len(b) > 1 else b
result = pow2.dot(a) * pow2.dot(b)
print("Part 2 result:", result)
``````