Introduction
I first started writing this on github.io, but then a group of virtual gangsters beat me up and made me realize there was no point in using that when this fantastic website exists. So here it is. As the title says, this is a raw Python bytecode tutorial. I hope you enjoy it (because there is a second part...).
Prerequisites
- Basic knowledge of Python
- Know what a bytes object is
- Know the concept of a stack
What is Python?
Python is a multi-paradigm, interpreted programming language. It supports polymorphism, object-oriented programming (OOP) and imperative programming.
How does it work?
Python, as already mentioned, is an interpreted language. This means your code passes through an interpreter that sits between what you write and what the computer actually does. Python does not generate machine code as a C or C++ program would; instead it works more or less like Java: the source is translated into bytecode, and a virtual machine interprets that bytecode. The default interpreter is CPython, which is responsible for executing the bytecode on your computer. Here we are not going to use compilers, but rather language implementations, basically interpreters that interpret (forgive the redundancy) the written code after translating it into bytecode. There is a wide variety of these, e.g. IronPython (C# implementation), Jython (Java implementation) and MicroPython (C implementation optimized to run on microcontrollers).
Here is a schematic of how Python works and the steps that the interpreter takes to run the code that you wrote.
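The same steps can be observed from inside Python. The snippet below is only a minimal sketch, not how CPython is driven internally: it uses the built-in compile() to turn source text into a code object and exec() to hand that object to the virtual machine. The file name "<example>" and the variable names are placeholders chosen for this illustration.

```python
source = "result = 2 + 3"

# Step 1: source text -> code object (this is where the bytecode is produced)
code_obj = compile(source, "<example>", "exec")

# The raw bytecode lives in co_code, as a bytes object
print(type(code_obj).__name__)          # code
print(type(code_obj.co_code).__name__)  # bytes

# Step 2: the virtual machine executes the code object
namespace = {}
exec(code_obj, namespace)
print(namespace["result"])  # 5
```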
How to create USABLE bytecode
We have two things to work with. First, stripped bytecode, that is, raw bytes representing opcodes and their arguments. Second, CodeType, a data type in Python that lets us build bytecode that is suitable and usable. To assemble, you also have to know how to disassemble, so we are going to use the dis module, which disassembles functions, files and code.
import dis

def sum(x, y):
    return x + y

dis.dis(sum)
The output of that snippet (on Python 3.7, the version used in this tutorial) is as follows:
1.   4           0 LOAD_FAST                0 (x)
2.               2 LOAD_FAST                1 (y)
3.               4 BINARY_ADD
4.               6 RETURN_VALUE
As we can see, all of that is bytecode. Now for the explanation.
I numbered the lines of the output to make this explanation easier.
Each instruction in Python has a specific opcode (operation code). In this case we use three: LOAD_FAST, BINARY_ADD and RETURN_VALUE. Here is what each one does.
- LOAD_FAST: Pushes a local variable onto the top of the stack (TOS, Top Of Stack).
- BINARY_ADD: Pops the two values at the top of the stack, adds them and pushes the result back onto the top of the stack.
- RETURN_VALUE: Returns the value that is at TOS to the caller.
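To make the stack behaviour concrete, here is a toy evaluator for just these three opcodes. It is a sketch for illustration only, not CPython's real evaluation loop; the run function and the tuple-based instruction format are made up for this example.

```python
def run(instructions, local_vars):
    """Toy evaluator for LOAD_FAST, BINARY_ADD and RETURN_VALUE."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_FAST":
            # Push the local variable stored at index `arg`
            stack.append(local_vars[arg])
        elif opname == "BINARY_ADD":
            # Pop two values, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            # Hand the value at the top of the stack back to the caller
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode: {opname}")
    raise RuntimeError("program ended without RETURN_VALUE")

program = [
    ("LOAD_FAST", 0),
    ("LOAD_FAST", 1),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
print(run(program, [213, 3]))  # 216
```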
Now that we have explained the opcodes, we can get an idea of how our code works internally, but there are still doubts, annoying but necessary doubts, like these: "What is the 4 on the left side, at the beginning of the first line?", "What are the numbers to the left of the opcodes?", "Why does a 0 appear to the right of the first LOAD_FAST, and a 1 next to the second?", "Wouldn't we want to load x and y to add them, instead of 0 and 1?".
Well, I will answer in order.
- The 4 is the line of the source file where the disassembled bytecode begins.
- These numbers are the byte offset of each instruction within the bytecode.
- The 0 and the 1 are indices. The local variables of the code are stored in a list (array), and the argument of LOAD_FAST is an index into it. The dis module also shows which variable each index refers to, to the right of the number (hence the 0 (x) and 1 (y)).
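The offsets and indices described above can also be read programmatically. This is a small sketch using dis.get_instructions and the co_varnames attribute of the code object; the exact instructions printed depend on your Python version (newer versions use different opcodes than the 3.7 output shown in this tutorial), so no instruction output is listed here.

```python
import dis

def sum(x, y):
    return x + y

code = sum.__code__

# The "list (array)" of local variables: index 0 is x, index 1 is y
print(code.co_varnames)  # ('x', 'y')

# Each instruction carries its byte offset, name, raw argument and a readable argument
instructions = list(dis.get_instructions(sum))
for ins in instructions:
    print(ins.offset, ins.opname, ins.arg, ins.argrepr)
```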
How do we re-create our function from bytecode?
The first thing we do is import CodeType and FunctionType (to turn the code object into a function) from the types module (https://docs.python.org/3/library/types.html#module-types).
import dis
from types import CodeType, FunctionType

def sum(x, y):
    return x + y
After this, we are going to create our code object:
import dis
from types import CodeType, FunctionType

def sum(x, y):
    return x + y

# This will be explained later, these are flags
CO_OPTIMIZED = 0x0001
CO_NEWLOCALS = 0x0002
CO_NOFREE = 0x0040

# Note: this is the CodeType signature of Python 3.7; other versions take different arguments
my_code = CodeType(
    2,  # argcount
    0,  # kwonlyargcount
    2,  # nlocals
    2,  # stacksize
    (CO_OPTIMIZED | CO_NEWLOCALS | CO_NOFREE),  # flags
    bytes([124, 0, 124, 1, 23, 0, 83, 0]),  # codestring
    (0,),  # constants
    (),  # names used by the code, e.g. globals (names)
    ('x', 'y'),  # variable names (varnames)
    'blog_no_name',  # filename
    'crafted_sum',  # name (code name / function)
    9,  # firstlineno (first line where this code appears)
    b'',  # lnotab
    (),  # freevars
    (),  # cellvars
)

_sum = FunctionType(my_code, {})
result = _sum(213, 3)
print(result)

# Expected output
# 216
Well, well... Many new things appear. We will explain these arguments right now.
CodeType: argcount, kwonlyargcount, nlocals, stacksize, flags, codestring, constants, names, varnames, filename, name, firstlineno, lnotab, freevars, cellvars
Argument | Description |
---|---|
argcount | Number of positional arguments |
kwonlyargcount | Number of keyword-only arguments |
nlocals | Number of local variables (in this case 2, x and y) |
stacksize | Maximum number of stack slots the code needs, not bytes (in this case 2, because x and y each occupy one slot before they are added) |
flags | The flags determine some conditions of the bytecode; you can be guided by this reference. We are going to delve into flags in a more advanced tutorial. |
codestring | A bytes object containing the instruction sequence; here 124 means LOAD_FAST, 23 BINARY_ADD and 83 RETURN_VALUE |
constants | A tuple with the values of the constants used by the code (such as integers, False, True, None ...) |
names | A tuple containing the names used by the bytecode (globals, attributes ...) |
varnames | A tuple with the names of the local variables, arguments first |
filename | This string represents the name of the file; when this value is not used it can be any string |
name | Name of the code object or function |
firstlineno | The first source line of the code; relevant when the code comes from a real file, otherwise it can be any integer |
lnotab | A mapping between bytecode offsets and source line numbers; if you are not interested in line information, you can use b'' |
freevars | Names of variables captured from an enclosing scope; used in closures, I will explain these in an advanced tutorial |
cellvars | Names of local variables that are referenced by nested functions (closures) |
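Most of these arguments have a matching read-only attribute (prefixed with co_) on any existing code object, so you can check what the compiler chose for a real function. The snippet below is a sketch of that idea; it uses the CO_OPTIMIZED and CO_NEWLOCALS constants from the inspect module instead of hand-written values, and it leaves out the expected values for the fields that can change between Python versions.

```python
import inspect

def sum(x, y):
    return x + y

code = sum.__code__

print(code.co_argcount)        # 2
print(code.co_kwonlyargcount)  # 0
print(code.co_nlocals)         # 2
print(code.co_varnames)        # ('x', 'y')
print(code.co_names)           # ()

# These two depend on the Python version, so no expected value is given
print(code.co_stacksize)
print(hex(code.co_flags))

# Individual flags are tested with a bitwise AND
print(bool(code.co_flags & inspect.CO_OPTIMIZED))  # True
print(bool(code.co_flags & inspect.CO_NEWLOCALS))  # True
```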
Two last things to note before moving on to FunctionType. The first is that the byte that follows each opcode (e.g. the 0 in [124, 0, ...]) is its argument. The second is that the bytecode can vary from version to version. To find out what the codestring looks like on your version, you can use the following snippet:
def sum(x, y):
    return x + y

print(sum.__code__.co_code)
# Expected output in Python 3.7.9 (the version I use)
# b'|\x00|\x01\x17\x00S\x00'
# Printable bytes are shown as ASCII characters (chr(124) is the character |, chr(83) is S)
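To read those bytes without the ASCII noise, you can split them into (opcode, argument) pairs yourself. This sketch relies on the fact that since Python 3.6 every instruction occupies two bytes, and uses the dis.opname table to translate opcode numbers into names. On Python 3.11 and later you will also see extra entries (such as RESUME and CACHE) that do not exist in the 3.7 output, so the printed result is not listed here.

```python
import dis

def sum(x, y):
    return x + y

raw = sum.__code__.co_code

# Two bytes per instruction: the opcode, then its argument
pairs = [(dis.opname[raw[i]], raw[i + 1]) for i in range(0, len(raw), 2)]

for name, arg in pairs:
    print(name, arg)
```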
"Crafting" the function
We are going to use FunctionType now.
FunctionType: code, globals, name, argdefs, closure
Argument | Description |
---|---|
code | Code object (that is, a CodeType) |
globals | A dictionary containing the globals, in the form `{"Name": Value}`; that way, Name becomes an identifier that the code can access as if it were a global variable |
name (Optional) | Overrides the name stored in the code object |
argdefs (Optional) | A tuple that specifies the values of the default arguments |
closure (Optional) | A tuple of cells that supplies the bindings for the freevars |
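Here is a small sketch of the globals, name and argdefs arguments in action. To stay independent of the Python version, it borrows the code object of a normally defined function instead of hand-assembling one; the scale function, the factor global and the name crafted_scale are all invented for this example.

```python
from types import FunctionType

def scale(x, y):
    # 'factor' is not defined anywhere here: it is looked up in the function's globals
    return (x + y) * factor

crafted = FunctionType(
    scale.__code__,   # code
    {"factor": 2},    # globals: makes 'factor' visible to the code
    "crafted_scale",  # name: overrides the name stored in the code object
    (10,),            # argdefs: default value for the last argument, y
)

print(crafted(1, 2))     # 6
print(crafted(1))        # 22
print(crafted.__name__)  # crafted_scale
```

Note that the original scale cannot be called directly unless a global called factor exists in the module; the crafted function works because it carries its own globals dictionary.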
Once this is clear, all that is left is to build a FunctionType from our code object (my_code) and call it.
import dis
from types import CodeType, FunctionType

def sum(x, y):
    return x + y

# This will be explained later, these are flags
CO_OPTIMIZED = 0x0001
CO_NEWLOCALS = 0x0002
CO_NOFREE = 0x0040

# Note: this is the CodeType signature of Python 3.7; other versions take different arguments
my_code = CodeType(
    2,  # argcount
    0,  # kwonlyargcount
    2,  # nlocals
    2,  # stacksize
    (CO_OPTIMIZED | CO_NEWLOCALS | CO_NOFREE),  # flags
    bytes([124, 0, 124, 1, 23, 0, 83, 0]),  # codestring
    (0,),  # constants
    (),  # names used by the code, e.g. globals (names)
    ('x', 'y'),  # variable names (varnames)
    'blog_no_name',  # filename
    'crafted_sum',  # name (code / function name)
    9,  # firstlineno (first line where this code appears)
    b'',  # lnotab
    (),  # freevars
    (),  # cellvars
)

_sum = FunctionType(my_code, {})
result = _sum(213, 3)
print(result)

# Expected output
# 216
That is all for now. Later I will upload another tutorial explaining closures.