Understanding how Python's list comprehensions work under the hood

#python #programming #koan

List Comprehensions

List comprehensions are one of the most popular features in Python, and they allow us to write idiomatic (or Pythonic) code.

But, as the student discovered, although the end result may be the same in the above example, the process taken to arrive at it is very different.

In fact, Python 3.12 significantly changed how list comprehensions are implemented.

Let us begin at the river.

Part 1: The Inner World of the For Loop

In Koan 4 we learnt that Python has 4 scopes (Local, Enclosing, Global and Builtin). However, there is also another “hidden” scope. Consider this loop:
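A minimal sketch of such a loop, assuming it iterates over range(3) so that the last value assigned is 2:

```python
x = 10

# The loop variable has the same name as the outer x
for x in range(3):
    pass

print(x)  # prints 2, not 10
```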

The loop above reassigns the value of x in the surrounding scope, because a for loop does not create a scope of its own. After the loop runs, x is no longer 10. Its value is now 2. The loop's variable x "leaked" into the surrounding code.

Now consider the same code implemented as a list comprehension:
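A minimal sketch of the comprehension version, again assuming range(3); the name values is only there to hold the result:

```python
x = 10

# The x inside the comprehension is separate from the outer x
values = [x for x in range(3)]

print(x)       # prints 10
print(values)  # prints [0, 1, 2]
```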

This code prints 10. The x in the comprehension is a new x. It does not touch the outer x. The list comprehension creates its own hidden scope.

Part 2: Peering into the water

But how does Python ensure that a list comprehension has a distinct scope? To observe what is happening, we must first gather some tools to help us peek inside the code.

When you run a Python script, the CPython interpreter performs two main steps:

Compilation: The source code (the .py file you wrote) is compiled into a stream of bytecode instructions. For imported modules, this bytecode is cached in a .pyc file (pyc stands for "Python Compiled") inside a __pycache__ directory. The script you run directly is compiled in memory and is not cached.
Execution: The CPython virtual machine (a program written in C) reads and executes the bytecode instructions one by one. When a module is imported again, the cached .pyc file is loaded directly if the source file has not changed. Otherwise, the bytecode is re-created.
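To see the caching step for yourself, the sketch below writes a throwaway module to a temporary directory and asks the standard library to compile it. The file name example.py and its contents are made up for illustration, and the exact .pyc name depends on your interpreter version.

```python
import importlib.util
import pathlib
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module, created only for this demonstration
    source_path = pathlib.Path(tmp) / "example.py"
    source_path.write_text("x = 1 + 2\n")

    # Where CPython would cache the bytecode for this source file
    expected_cache = importlib.util.cache_from_source(str(source_path))

    # Compile the source and write the .pyc file explicitly
    written = py_compile.compile(str(source_path), doraise=True)

    # Check while the temporary directory still exists
    cache_exists = pathlib.Path(written).exists()

    print(expected_cache)
    print(written)
```

Both printed paths should point into a __pycache__ directory next to the source file.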

Inspecting the bytecode can inform us about how the program is run by the interpreter. However, the bytecode is stored in an efficient binary format that is not human readable.

To help us, we can use two functions:

compile() to compile a Python source code string into a code object (which contains the bytecode), with the following arguments:

* _**source**_ : the source code to compile, given as a string.

* _**filename**_ : the filename from which the source code was read. If we are using the REPL, then it can be any string.

* _**mode**_ : use “exec” for a sequence of statements (such as a whole module), “eval” for a single expression, and “single” for a single interactive statement.

dis.dis to disassemble the bytecode and print it in a human-readable format.
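Here is a small sketch of the two tools working together. The source strings and the "<example>" filename are placeholders chosen for illustration, and the disassembly that dis.dis prints will differ between Python versions.

```python
import dis

source = "[x for x in range(3)]"

# "eval" mode: the source is a single expression
expr_code = compile(source, "<example>", "eval")
result = eval(expr_code)
print(result)  # prints [0, 1, 2]

# "exec" mode: the source is a sequence of statements
stmt_code = compile("x = 10\ny = x + 1", "<example>", "exec")
namespace = {}
exec(stmt_code, namespace)
print(namespace["y"])  # prints 11

# A code object wraps the raw bytecode, which is plain bytes
print(type(expr_code.co_code))  # prints <class 'bytes'>

# Print a readable listing of the compiled expression
dis.dis(expr_code)
```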

The output of dis.dis typically consists of several columns:

Line Number: This is the line number from the original Python source code. Note that a single line of Python code can compile into many bytecode instructions.
Offset: This is the byte offset of the instruction within the bytecode stream. It's used by jump instructions to navigate the code.
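Instruction: This is the name of the bytecode instruction (e.g., LOAD_CONST, CALL_FUNCTION, RETURN_VALUE). Instruction names vary between Python versions: CALL_FUNCTION, for example, was replaced by CALL in Python 3.11.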
Argument: This is a value or a reference used by the instruction. It might be an index into a table of constants, variables, or names.
Argument Mnemonic: This provides a human-readable name for the argument, making it easier to understand what the instruction is doing (e.g., (range), (3), (None)).
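The same columns can be read programmatically with dis.get_instructions, which yields one Instruction object per bytecode instruction. The sketch below prints the offset, instruction name, argument and argument mnemonic for a one-line program; the source string is an arbitrary example, and the instructions you see will depend on your Python version.

```python
import dis

code = compile("total = len(range(3))", "<example>", "exec")

rows = []
for ins in dis.get_instructions(code):
    # offset   -> byte offset of the instruction
    # opname   -> instruction name
    # arg      -> raw numeric argument (None if the instruction takes none)
    # argrepr  -> human-readable form of the argument
    rows.append((ins.offset, ins.opname, ins.arg, ins.argrepr))
    print(f"{ins.offset:>4} {ins.opname:<20} {str(ins.arg):>5} {ins.argrepr}")
```

The line number column is left out here because the attribute that carries it changed in Python 3.13.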

We now have all the tools we need to peek inside the list comprehension, and to investigate why Python < 3.12 and Python >= 3.12 behave so differently.

Until next time, may your stillness reveal the unseen flow.

Continue to Part 2
