ArtyomKaltovich

Posted on Nov 29, 2020

How The Integer Type is Implemented in CPython

#algorithms #python

It would seem that you can't write an article about an ordinary integer type? However, everything is not so simple here, and the integer type is not so obvious.

If you're wondering why x * 2 is faster than x << 1.

And how to do the following trick:

>>> 42 == 4
True
>>> 42
4
>>> 1 + 41
4

Then you should read this article.

Integer Implementation

There are no primitive types in python - all variables are objects and are allocated in heap. At the same time, integers are represented by only one type (we do not consider decimal) - PyLongObject. It is implemented in the files longobject.h, longintrepr.h and longobject.c, you can see it on CPython project github page.

struct _longobject {
    PyObject_VAR_HEAD // standard header for variable length objects
    digit ob_digit [1]; // array of numbers
};

Any object in CPython includes two fields: ob_refcnt - a counter of references to an object and ob_type - a pointer to the type of an object; to objects that can change their length, the ob_size field is added - the current allocated size (actually used space can be less).

Thus, the integer type is represented by an array of variable length from individual digits, so python supports long arithmetic out of the box, in the second version of the language there was a separate type of "ordinary" integers. "Long" integers were created using the letter L or, if the result of operations between "ordinary" integers caused an overflow, in the third version it was decided to make every integer long by default.

# python2
num1 = 42
print num1, type (num1) # 42 <type 'int'>
num2 = 42L
print num2, type (num2) # 42 <type 'long'>
num3 = 2 ** 100
print num3, type (num3) # 1267650600228229401496703205376 <type 'long'>

The integer value stored by the _longobject can be represented by the next equation:

The digit is either 32-bit unsigned type (uint32_t) on 64-bit systems or a 16-bit unsigned short type on 32-bit systems.

The algorithms used in the implementation impose strict restrictions on SHIFT parameter from the previous formula, in particular, it must be divisible by 15, so for now CPython supports two values: 30 and 15, respectively for 64 and 32-bit systems. For negative values ob_size has a negative value too, zero is set by a number with ob_size = 0.

Arithmetic Operations

When performing arithmetic operations, the interpreter checks the current length of the operands, if they both consist of one digit, then standard arithmetic is performed.

static PyObject *
long_add (PyLongObject * a, PyLongObject * b)
{
    ...
    // both operands fit into one bit
    if (Py_ABS (Py_SIZE (a)) <= 1 && Py_ABS (Py_SIZE (b)) <= 1) {
    // sum them up and return the result
        return PyLong_FromLong (MEDIUM_VALUE (a) + MEDIUM_VALUE (b));
    }
    ...
};

Multiplication has a similar structure, in addition, the interpreter implements the Karatsuba algorithm and fast squaring, but they are not performed with every "long multiplication", but only for sufficiently large numbers, the number of digits in which are set by two constants:

// in case of unequal operands
#define KARATSUBA_CUTOFF 70
// in case of squaring
#define KARATSUBA_SQUARE_CUTOFF (2 * KARATSUBA_CUTOFF)

static PyObject *
long_mul (PyLongObject * a, PyLongObject * b)
{
    ...
    // both operands fit into one bit
    if (Py_ABS (Py_SIZE (a)) <= 1 && Py_ABS (Py_SIZE (b)) <= 1) {
    // stwodigits is a signed type that can hold two digits
    // int64_t on 64-bit systems and long on 32-bit systems.
        stwodigits v = (stwodigits) (MEDIUM_VALUE (a)) * MEDIUM_VALUE (b);
        return PyLong_FromLongLong ((long long) v);
    }
    ...

    // "School" long multiplication O(N ^ 2) if both operands are small enough.
    i = a == b ? KARATSUBA_SQUARE_CUTOFF: KARATSUBA_CUTOFF;
    if (asize <= i) {
        if (asize == 0)
            return (PyLongObject *) PyLong_FromLong (0);
        else
            return x_mul (a, b);
    }
    ...
};

Logical Operations and Comparisons

However, shift instructions do not have a check for short integers, they are always executed as long arithmetic. Therefore, multiplying by 2 turns out to be faster than shifting. It is interesting to note that division is still slower than right shift. Right shift also doesn't check for single-digit integers. But division is more complicated: there is only one function for both the quotient and remainder calculation. NULL is passed to the function, if you need to calculate only one, so, the function must check this case too, all this increases the overhead.

The comparison function also has no special case for "short" integers.

static int
long_compare(PyLongObject *a, PyLongObject *b)
{
    Py_ssize_t sign;

    // the case, of different size integers
    // just define the longest one with regards to the signs
    if (Py_SIZE(a) != Py_SIZE(b)) {
        sign = Py_SIZE(a) - Py_SIZE(b);
    }
    else {
        Py_ssize_t i = Py_ABS(Py_SIZE(a));
        // run cycle for integers of equal length
        // for "short" integers the number of iterations will be
        // less, no other optimizations are applied
        while (--i >= 0 && a->ob_digit[i] == b->ob_digit[i]);
        if (i < 0)
            sign = 0;
        else {
            sign = (sdigit)a->ob_digit[i] - (sdigit)b->ob_digit[i];
            if (Py_SIZE(a) < 0)
                sign = -sign;
        }
    }
    return sign < 0 ? -1 : sign > 0 ? 1 : 0;
}

Arrays (lists) of numbers

When creating a variable of an integer type, the interpreter must allocate a sufficient area in dynamic memory, then set the reference counter (type ssize_t), a pointer to the PyLongObject type, the current size of the bit array (also ssize_t) and initialize the array itself. For 64-bit systems, the minimum structure size is: 2 * ssize_t + pointer + digit = 2 * 8 + 8 + 4 = 28 bytes. Additional problems appear when creating lists of numbers: since numbers are not a primitive type, and lists in python store references to objects, they are not sequentially stored in heap.

This placement slows down iterating over the array: in fact, random access is performed, which makes it impossible to predict transitions.

In order to avoid unnecessary allocations of dynamic memory, which slows down the execution time and contributes to memory fragmentation, an optimization is implemented in CPython: at the startup stage, an array of small integers is preallocated: from -5 to 256.

#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS 257 // As expected in python, the right border is not included.
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS 5
#endif

As a result, a list of numbers in cpython is represented in memory like this:

It is possible to reach the preselected list of small integers from the script, armed with this article and the standard ctypes module:

Disclaimer: The following code is supplied as is, the author assumes no responsibility and cannot guarantee the state of the interpreter, as well as the mental health of you and your colleagues, after running this code.

import ctypes

# PyLongObject structure
class IntStruct (ctypes.Structure):
    # declaration of fields
    _fields_ = [("ob_refcnt", ctypes.c_long),
                ("ob_type", ctypes.c_void_p),
                ("ob_size", ctypes.c_long),
                ("ob_digit", ctypes.c_int)]

    def __repr __ (self):
        return (f"IntStruct (ob_digit = {self.ob_digit}, "
                f"ob_size = {self.ob_size}, "
                f"refcount = {self.ob_refcnt})")


if __name__ == '__main__':
    # get the address of number 42
    int42 = IntStruct.from_address (id (42))
    print (int42)

    int_minus_2 = IntStruct.from_address (id (-2))
    print (int_minus_2) # ob_digit = 2, ob_size = -1

    # change the value in the list of preallocated numbers
    int42.ob_digit = 4
    print (4 == 42) # True
    print (1 + 41) # 4

This list contains honest PyLongObjects, so, they have a reference counter, for example, you can find out how many zeros your script and the interpreter itself are using.

It follows from the internal implementation of integers that arithmetic in cpython cannot be fast, while other languages iterate sequentially over an array, read numbers into registers and directly call several processor instructions, cpython rushes around all memory, executing rather complex methods. In the simplest case of one-bit integers, in which it would be enough to call one instruction, the interpreter must compare the sizes, then create an object in dynamic memory, fill in all the service fields, and return a pointer to it, moreover, these actions require GIL locking. Now you understand why the numpy library was created and why it is so popular.

Boolean Type

I would like to finish the article about an integer type in cpython, suddenly, with information about a boolean type.

The fact is that for a long time in python there was no separate type for boolean variables, all logical functions returned 0 and 1. However, they decided to introduce a new type. At the same time, it was implemented as a child type to an integer.

PyTypeObject PyBool_Type = {
    PyVarObject_HEAD_INIT (& PyType_Type, 0) // the length of the boolean type cannot be changed
                                            // however, since it inherits from integer
                                            // there is already an ob_size field
    "bool", // type name
    sizeof (struct _longobject), // size in memory
    ...
    & PyLong_Type, // base type
    ...
    bool_new, // constructor
};

Moreover, each of the values of the boolean type is a singleton, the boolean variable is a pointer to an instance of True or False (None is implemented in a similar way).

struct _longobject _Py_FalseStruct = {
    PyVarObject_HEAD_INIT (& PyBool_Type, 0)
    {0}
};

struct _longobject _Py_TrueStruct = {
    PyVarObject_HEAD_INIT (& PyBool_Type, 1)
    { 1 }
};

static PyObject *
bool_new (PyTypeObject * type, PyObject * args, PyObject * kwds)
{
    PyObject * x = Py_False;
    long ok;

    if (! _PyArg_NoKeywords ("bool", kwds))
        return NULL;
    if (! PyArg_UnpackTuple (args, "bool", 0, 1, & x))
        return NULL;
    ok = PyObject_IsTrue (x);
    if (ok <0)
        return NULL;
    return PyBool_FromLong (ok);
}

PyObject * PyBool_FromLong (long ok)
{
    PyObject * result;

    if (ok)
        result = Py_True;
    else
        result = Py_False;
    Py_INCREF (result);
    return result;
}

PyObject_IsTrue (x) is a tricky mechanism for calculating a boolean value, you can see it in the documentation.

This inheritance leads to some funny effects, like the exact equality of True and 1, the impossibility of having both True and 1 as keys in the dictionary and the set, and the admissibility of arithmetic operations on a Boolean type:

>>> True == 1
True
>>> {True: "true", 1: "one"}
{True: 'one'}
>>> True * 2 + False / (5 * True) - (True << 3)
-6.0
>>> False < -1
False

The python language is great for its flexibility and readability, however, it must be borne in mind that all this has a price, for example, in the form of unnecessary abstractions and overhead, which we often do not think or guess about. I hope this article allowed you to dispel a little the "fog of war" over the source code of the interpreter, perhaps even encourage you to study it. The interpreter code is very easy to read, almost like the code in python itself, and studying it will allow you not only to learn how the interpreter is implemented, but also interesting algorithmic and system solutions, as well as, possibly, write more efficient code or become a developer of cpython itself.

DEV Community