while scrolling through twitter i came across a tweet about how python boolean variable takes up 192 bits. this was new to me, i had never really thought about how much space python variables actually take(if this was a concern i should not have been using python at all).
so i researched it a bit and though it wasn't entirely correct, it wasn't entirely wrong either (explanation below). this prompted me into looking more into how bools are represented in python and how (possibly using underhanded means) can we reduce the space taken by booleans.
booleans in python
booleans are simple. just 2 values - TRUE or FALSE. theoretically they can be represented by just 1 bit (0 or 1).
so why does python use 192 bits (on 32 bit systems) and 224 bits (on 64 bit systems)?
the True and False values in cpython are not primitives, but a python object (a subclass of int actually, which is also a python object and not a primitive).
the catch however is that the bool class is a singleton class (there exists only a single object which is shared by all the references)
this means that the 28 byte size is actually just paid once. apart from this constant cost however, we use an extra 8 bytes used by pointer references to the bool objects.
how small can booleans get
now that we have established that storing a boolean inside a container in CPython costs ~8 bytes per value due to pointer references, let us see how low we can go.
the theoritical limit will obviously be 1 bit (0 or 1), so let's try to reach that benchmark
results
i have measured most of the sizes by taking average over a list of values as that is what gives the actual image of how large the data is after removing the constant overhead
| Representation | Storage cost per boolean (approx) |
|---|---|
Python bool (in list/container) |
8 bytes |
bytearray |
1 byte |
int bitset |
~0.13 bytes (~1.05 bits) |
bool datatype (baseline)
True(orFalse) singleton object is of size 28 bytes.
a pointer to bool objects take 8 bytes of space. so the average value of using bool in python is ~8 bytes.
bytearray
bytearray in python allows us to storing raw bytes. so the smallest unit in a bytearray takes up only 1 byte.
bitset (kind of)
python does not have bitset like c++, but we can use int as a replacement. python int range is not fixed, so we can have a large number of bits.
True and False can be represented by 1 and 0 in an integer like 0b110100101.
the amortized cost of per bit when using int comes out to be ~0.13 bytes = ~1.05 bit (pretty close to the theoretical minimum)

conclusion
python isnβt designed for memory-dense primitives, but with a little bit of abuse you can get surprisingly close to the theoretical 1-bit limit.
should you do this in real code? probably not. if something like this is a concern you shoudln't be using python at all :).
but if you still do, here's how you can take advantage of it.



Top comments (0)