loading...

TIL: Python Comprehensions Make Matrices Work

rsdesoto profile image Ry ・4 min read

Hello world!

First, a super brief introduction: my name is Ry, and I’m midway through a web development bootcamp in Chicago. I was a consultant for the last 5+ years; I wrote a ton of code, but primarily for data analysis/data manipulation in SAS and bending various Microsoft products to my will. Now I’m doing this bootcamp and also boning up on my Python in the hopes of doing some kind of software development/data engineer/????? something other than consulting. I’m still a neophyte (at best) at anything that isn’t SAS or VBA, but I’m working hard and learning a lot and getting better! I'm going to be using this as a blog for things I learned, cool stuff I ran into, projects I'm particularly excited about, etc.

As part of my efforts to git gud, I’ve been doing Advent of Code for the first time! Which is how I ended up finding out something really weird about how Python handles creating lists of lists.

The Problem

Day 3 of Advent of Code involves a bunch of people trying to divide up a sheet of cloth, but they all have different ideas of where to cut and how much to cut. To translate that into less cute and story-relevant terms, I solved this by creating a 2D matrix and finding the coordinates of all overlapping sections within the matrix from the coordinates provided.

Essentially, what I wanted to create was as follows:

[[0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0]]

where each 0 represents a single square inch of fabric.

The Solution: Take 1

I cycled through all the coordinates provided in the input; at each square inch covered by a slice, I added +1 to the corresponding 0 in my matrix. By the end, every element in the list with a value of 2 or more was covered by overlapping slices.

This is all quite straightforward; once I had gotten through the spatial reasoning required to understand the question in the first place, writing the code to cycle through everything was relatively trivial. What took me ages, however, was creating the matrix in the first place.

My first thought was to make a blank list. Or:

sleeve = [0] * 5
shirt = [sleeve] * 5

which provides you with a list like so:

[[0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0]]

To make sure my coordinates were working correctly (I don’t believe in regex if I don’t really literally have to), I typed in what I naively thought would work to change a single cell—

shirt[2][1] = 500

Which, logically, provides:

[[0, 500, 0, 0, 0],
 [0, 500, 0, 0, 0],
 [0, 500, 0, 0, 0],
 [0, 500, 0, 0, 0],
 [0, 500, 0, 0, 0]]

…Sure! Okay! Maybe there’s something funny as a result of multiplying sleeve by 5?

Let’s make this out of whole cloth, then —

The Solution: Take 2

shirt = [[0] * 5] * 5

[[0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0]]

so now it’s all be done in one step. Nothing weird could possibly have happened around the definition of "sleeve", right?

[[0, 500, 0, 0, 0],
 [0, 500, 0, 0, 0],
 [0, 500, 0, 0, 0],
 [0, 500, 0, 0, 0],
 [0, 500, 0, 0, 0]]

…okay, sure! Let's try something else --

The Solution: Third Time's The Charm?

sleeve = []
shirt = []

for j in range(5):
    for i in range(1):
        sleeve.append(0)
    shirt.append(arr)

[[ 0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0],
  [0, 0, 0, 0, 0]]

shirt[2][1] = 500

[[ 0, 500, 0, 0, 0],
  [0, 500, 0, 0, 0],
  [0, 500, 0, 0, 0],
  [0, 500, 0, 0, 0],
  [0, 500, 0, 0, 0]]

here we are again

At this point, I went to the internet and asked what the heck is going on. The answer was, apparently, “use NumPy”, but I am stubborn and determined to figure it out on my own (mistake!).

I — actually still don’t quite know what’s going on. I suspect it has to do with pointers? And somehow the list “sleeve”, when it’s multiplied into the “shirt” list, is just copying in the single sleeve array instead of making copies of sleeve; I haven’t really managed to find any explanation, though, so if you know a ton about python and pointers and how this all works, please (please, for the love of god, this is driving me up the walls) let me know the explanation!

(Is it a pointer? I got the flu during the one (1) CS course I took in college and missed pointer week and I assume all my suffering is a result of that)

Anyway — comprehensions were introduced to me as, “basically a for loop, with some minor differences, which you don’t have to worry about”. Turns out the minor differences you don’t have to worry about are making the damn thing work.

The Solution: Sure Why Not!!!

shirt = [[0 for i in range(1000)] for j in range(1000)] 

SEVERAL HOURS OF MY LIFE I WILL NEVER GET BACK.

w h y .

Discussion

pic
Editor guide
Collapse
rhymes profile image
rhymes

Hi Ry! I laughed reading this so it's a good start :-)

What you're experiencing is a combination of how Python syntax works and the fact that lists are mutable objects.

In the first example:

sleeve = [0] * 5
shirt = [sleeve] * 5

you're telling Python: "make sleeve a list of five numbers and then make shirt a list of five sleeves". So Python is going to copy the object pointed by the variable sleeve inside shirt five time, but it's the same object. You can easily see it like this:

>>> sleeve = [0] * 5
>>> shirt = [sleeve] * 5
>>> shirt
[[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
>>> [print(id(l)) for l in shirt]
4384048520
4384048520
4384048520
4384048520
4384048520

id() is a function that returns the memory identity (address) of the object. As you can see, it's all the same object

The second example is basically the same, you're telling Python to multiply by five times the same list:

>>> shirt = [[0] * 5] * 5
>>> [print(id(l)) for l in shirt]
4387549256
4387549256
4387549256
4387549256
4387549256

In the third example you're actually building a new list for each iteration in the list comprehension, so it works:

>>> shirt = [[0 for i in range(5)] for j in range(5)]
>>> [print(id(l)) for l in shirt]
4387549896
4387550472
4387550344
4387549704
4387205896

:-)

Collapse
rsdesoto profile image
Ry Author

! This makes significantly more sense -- thank you!

Why does the first [0]*5 not run into the same issues? Is it because this is updating a single list?

Collapse
rhymes profile image
rhymes

Why does the first [0]*5 not run into the same issues? Is it because this is updating a single list?

That's because Python cheats a little bit. I don't know the exact list of optimizations but you're creating a list of integers so I think it reuses the same object in memory. An integer is an immutable object, it detects you're creating a list of 5 integers, so it just fills up a list of 5 zeroes. You can see what's going on:

>>> s
[0, 0, 0, 0, 0]
>>> [id(x) for x in s]
[4501366592, 4501366592, 4501366592, 4501366592, 4501366592]

they all are the same object in memory but since you can't change it, it's fine.

Thread Thread
rsdesoto profile image
Ry Author

Ooooohh, that makes sense -- thank you!!

Collapse
veky profile image
Veky

The answer is really simple: how many lists do you make? If you really want a 5x5 zero matrix, it's quite obvious you need 6 lists. If you make only 2 of them, some (many) of them are going to be the same.