A peek inside Python — The How What and Why of Bytecode

Lachlan Eagling ・2 min read

In this post, we will dive head-first into what Python bytecode is, how to view it, and how to read and understand it.

[Cover Photo by Jeppe Hove Jensen on Unsplash]

What is Python bytecode?

Like many people who have worked with Python, you may only be used to seeing Python source code (.py files), but what has to happen to that source code before a CPU can execute it?

Like many other interpreted programming languages, Python first compiles its source code to an intermediate bytecode format. The Python runtime's virtual machine then executes that bytecode one instruction at a time, rather than translating it ahead of time into native CPU instructions. For imported modules, the bytecode is cached in .pyc files inside the __pycache__ directory, and the runtime loads those files on later runs instead of recompiling the source.
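The compile step can be seen directly with the built-in compile() function. This is a minimal sketch; the source string and the "<example>" filename are made up for illustration.

```python
# Compile a small piece of source code into a code object.
source = "total = 2 + 3"
code_obj = compile(source, "<example>", "exec")

# The code object carries the raw bytecode as a bytes string.
print(type(code_obj.co_code))
print(code_obj.co_code)

# The runtime executes the code object, not the source text.
namespace = {}
exec(code_obj, namespace)
print(namespace["total"])  # prints 5
```

The raw bytes are not meant to be read by hand, which is where the dis module comes in.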

How to view the generated bytecode.

The Python standard library provides the dis module, which exposes an API for disassembling compiled Python bytecode into human-readable instructions.

Official Docs: dis — Disassembler for Python bytecode.

We can utilise the dis(obj) function within this module to print out the disassembled bytecode of the object passed in as an argument.

Below is an example of a simple hello_world() function which has been disassembled using the dis() function.
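A minimal sketch of such a function, assuming it simply prints a greeting (the body of hello_world() here is an assumption). The exact instructions printed vary between Python versions.

```python
import dis


def hello_world():
    # Assumed body: the post only names the function.
    print("Hello, World!")


# Print the disassembled bytecode of the function to stdout.
dis.dis(hello_world)
```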

Reading and understanding Python bytecode

The bytecode output is composed of the following properties.

  • The line number of the Python code that the current block of bytecode corresponds to.
  • The byte offset of the instruction within the bytecode sequence.
  • The opcode of the instruction.
  • The oparg, which is the argument for the opcode where applicable.
  • Where possible, the resolved oparg value.
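The same properties can be read programmatically with dis.get_instructions(), which yields one Instruction object per bytecode instruction. The sketch below reuses an assumed hello_world() function and prints the offset, opcode name, oparg, and resolved oparg for each instruction.

```python
import dis


def hello_world():
    # Assumed body, used only to have something to disassemble.
    print("Hello, World!")


# Collect (offset, opname, oparg, resolved oparg) for every instruction.
rows = [
    (instr.offset, instr.opname, instr.arg, instr.argrepr)
    for instr in dis.get_instructions(hello_world)
]

for offset, opname, arg, argrepr in rows:
    # arg is None for instructions that take no argument.
    print(f"{offset:>4} {opname:<20} {arg!s:<6} {argrepr}")
```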

Let's step through a simple (perhaps somewhat contrived) example and outline what each bytecode instruction does.

The following function simply takes two arguments, x & y, and returns the sum of the two provided arguments.
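A sketch of that function and its disassembly. The name add comes from the walkthrough; the one-line body is an assumption. On CPython 3.10 and earlier the listing shows BINARY_ADD, while 3.11 and later show BINARY_OP for the same step, and newer releases may print additional or combined instructions.

```python
import dis


def add(x, y):
    # Return the sum of the two arguments.
    return x + y


# Disassemble the function; output depends on the Python version.
dis.dis(add)

# The local variable names that the LOAD_FAST opargs index into.
print(add.__code__.co_varnames)  # prints ('x', 'y')
```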

  1. The first two LOAD_FAST instructions push the x & y arguments passed to the add function onto the evaluation stack. The oparg of each LOAD_FAST instruction is the index of the local variable to load; the matching names are listed in the code object's co_varnames tuple.

  2. The BINARY_ADD instruction then pops the top two items from the evaluation stack (x & y), adds them, and pushes the result back onto the top of the stack. On CPython 3.11 and later, this instruction appears as BINARY_OP with a + argument.

  3. RETURN_VALUE then returns the value from the top of the evaluation stack to the caller and exits the function.
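The three steps above can be mimicked with a toy stack machine. This is only an illustration of the idea, not how CPython is implemented; run_bytecode and the tuple-based instruction format are hypothetical names invented for this sketch.

```python
def run_bytecode(instructions, local_values):
    """Execute a tiny subset of opcodes against an evaluation stack."""
    stack = []
    for opname, oparg in instructions:
        if opname == "LOAD_FAST":
            # Push the local variable at index oparg onto the stack.
            stack.append(local_values[oparg])
        elif opname == "BINARY_ADD":
            # Pop the top two items, add them, push the result.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            # Hand the top of the stack back to the caller.
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode: {opname}")
    raise RuntimeError("bytecode ended without RETURN_VALUE")


# The instruction sequence described in the walkthrough for add(x, y).
program = [
    ("LOAD_FAST", 0),       # x
    ("LOAD_FAST", 1),       # y
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

print(run_bytecode(program, [2, 3]))  # prints 5
```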

Discussion

Hindemost-Woo

You know what. A lot of things make sense after that. Cheers

Abdur-Rahmaan Janhangeer

Q: How do you disassemble code outside functions?

Lachlan Eagling Author

Excellent question. As far as I know, the only way is to save your source code in a .py file and disassemble the entire file in the terminal.

E.g. run the following command in the directory where the source file is saved:

python3 -m dis your_code.py
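A possible alternative that avoids a separate file: dis.dis() also accepts a string of source code, or a code object produced by compile(). The snippet and the "<snippet>" filename below are made up for illustration.

```python
import dis

# Module-level code, i.e. code that is not inside any function.
source = """
greeting = "hi"
print(greeting)
"""

# dis.dis() compiles a source string before disassembling it.
dis.dis(source)

# Equivalent, with the compile step made explicit.
code_obj = compile(source, "<snippet>", "exec")
dis.dis(code_obj)
```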
Govinda Malavipathirana

Is this bytecode physically generated? Where is it stored?

Lachlan Eagling Author

Hi Govinda,

Apologies about the slow reply, have been very busy over the holiday period.

Yes, the bytecode is physically generated into files with the extension .pyc, which are stored in the __pycache__ directory. These files are what the Python runtime actually executes when a program runs.

Govinda Malavipathirana

Hi Lachlan,

Thanks for replying. As far as I know, '.pyc' files are created only when we import modules. So let's say I don't import anything and write everything inside the same module. What happens to the 'physical bytecode' or '.pyc'? Is it even created, and where can I find it?

Lachlan Eagling Author

Hi Govinda,

You are right. If no modules are imported, I believe the bytecode is generated on the fly and not actually saved to disk at all.
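If a .pyc file is wanted for a script that is never imported, one option is to compile it explicitly with the standard library's py_compile module. This is a sketch; the file name your_code.py and its contents are placeholders.

```python
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Write a throwaway script to compile.
    script = os.path.join(tmp, "your_code.py")
    with open(script, "w") as fh:
        fh.write("print('hello')\n")

    # Compile the script; the return value is the path of the .pyc file,
    # which by default sits in a __pycache__ directory next to the source.
    pyc_path = py_compile.compile(script, doraise=True)
    print(pyc_path)

    pyc_exists = os.path.exists(pyc_path)
    print(pyc_exists)
```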

Rafael Acioly

the second code example is the same as the first one 😅

Lachlan Eagling Author

Looks like something is going awry loading the gists from GitHub when the post loads. When refreshing the page, the same one sometimes loads into both at random 🤦‍♂️.

Will update the post and hardcode the examples rather than loading from embedded gists if this keeps happening.