Kanak Tanwar

Posted on Feb 13

reducing memory cost of boolean in python

#python #programming

while scrolling through twitter i came across a tweet about how python boolean variable takes up 192 bits. this was new to me, i had never really thought about how much space python variables actually take(if this was a concern i should not have been using python at all).

so i researched it a bit and though it wasn't entirely correct, it wasn't entirely wrong either (explanation below). this prompted me into looking more into how bools are represented in python and how (possibly using underhanded means) can we reduce the space taken by booleans.

booleans in python

booleans are simple. just 2 values - TRUE or FALSE. theoretically they can be represented by just 1 bit (0 or 1).

so why does python use 192 bits (on 32 bit systems) and 224 bits (on 64 bit systems)?

the True and False values in cpython are not primitives, but a python object (a subclass of int actually, which is also a python object and not a primitive).
the catch however is that the bool class is a singleton class (there exists only a single object which is shared by all the references)
this means that the 28 byte size is actually just paid once. apart from this constant cost however, we use an extra 8 bytes used by pointer references to the bool objects.

how small can booleans get

now that we have established that storing a boolean inside a container in CPython costs ~8 bytes per value due to pointer references, let us see how low we can go.

the theoritical limit will obviously be 1 bit (0 or 1), so let's try to reach that benchmark

results

i have measured most of the sizes by taking average over a list of values as that is what gives the actual image of how large the data is after removing the constant overhead

Representation	Storage cost per boolean (approx)
Python `bool` (in list/container)	8 bytes
`bytearray`	1 byte
`int` bitset	~0.13 bytes (~1.05 bits)

bool datatype (baseline)

True (or False) singleton object is of size 28 bytes.

a pointer to bool objects take 8 bytes of space. so the average value of using bool in python is ~8 bytes.

bytearray

bytearray in python allows us to storing raw bytes. so the smallest unit in a bytearray takes up only 1 byte.

bitset (kind of)

python does not have bitset like c++, but we can use int as a replacement. python int range is not fixed, so we can have a large number of bits.

True and False can be represented by 1 and 0 in an integer like 0b110100101.

the amortized cost of per bit when using int comes out to be ~0.13 bytes = ~1.05 bit (pretty close to the theoretical minimum)

conclusion

python isn’t designed for memory-dense primitives, but with a little bit of abuse you can get surprisingly close to the theoretical 1-bit limit.

should you do this in real code? probably not. if something like this is a concern you shoudln't be using python at all :).
but if you still do, here's how you can take advantage of it.

DEV Community