Originally published at Pythongasm
Introduction
Behind Python’s simplicity lies a series of thoughtful decisions that make it so user-friendly.
In this article, we will look at some of these decisions, and understand how memory is managed in CPython.
is operator
Let’s start with the is operator.
We'll be using shell/IDLE for all these operations:
>>> a = [1,2,3]
>>> b = [1,2,3]
>>> a is b
False
This is simply because a and b point to two different locations in memory.
We can verify this using id().
>>> id(a) == id(b)
False
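As a small sketch of the difference between equality and identity, the same two lists can be compared three ways. The names a and b are the ones used above; nothing else is assumed.

```python
# Two lists with equal contents, created separately.
a = [1, 2, 3]
b = [1, 2, 3]

print(a == b)          # True: the contents are equal
print(a is b)          # False: they are two distinct objects
print(id(a) == id(b))  # False: same check, expressed through id()
```

In short, == asks "do these look the same?", while is asks "are these the very same object?".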
If we look at its implementation, we can confirm that the is operator does nothing more than compare the memory addresses of two objects:
int Py_Is(PyObject *x, PyObject *y)
{
    return (x == y);
}
Source: CPython on GitHub
Use the is operator instead of comparing memory addresses using id(), which is a costlier operation because it needs two function calls plus an integer comparison.
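A rough way to see the cost difference is to time both forms with timeit. This is only a sketch: the iteration count is an arbitrary choice, and the absolute numbers depend on the machine and Python version.

```python
import timeit

a = [1, 2, 3]
b = [1, 2, 3]
env = {"a": a, "b": b}  # names visible to the timed statements

# Time the identity operator against the equivalent id() comparison.
t_is = timeit.timeit("a is b", globals=env, number=200_000)
t_id = timeit.timeit("id(a) == id(b)", globals=env, number=200_000)

print(f"a is b         : {t_is:.4f}s")
print(f"id(a) == id(b) : {t_id:.4f}s")
```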
Now, if we do b = a, then b starts pointing to the same memory as a, hence a is b is True.
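A minimal sketch of what sharing the same memory means in practice: after b = a, a change made through one name is visible through the other.

```python
a = [1, 2, 3]
b = a            # b now refers to the same list object as a

print(a is b)    # True

b.append(4)      # mutate the list through b
print(a)         # [1, 2, 3, 4]: a sees the change, there is only one list
```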
This is also true for all singletons in Python:
>>> a = None
>>> b = None
>>> a is b
True
Singletons are objects that exist only once in memory. If a variable is bound to a singleton object, it points to that one and only object.
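Because None is a singleton, identity is the usual way to test for it. The sketch below uses a hypothetical helper, describe(), purely for illustration.

```python
# Even "constructing" NoneType again yields the one existing None object.
a = None
b = type(None)()
print(a is b)  # True


def describe(value=None):
    """Hypothetical helper: report whether a value was supplied."""
    if value is None:  # identity check against the None singleton
        return "no value given"
    return f"got {value!r}"


print(describe())    # no value given
print(describe(0))   # got 0
```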
However, integers are not singletons:
>>> a = 1234
>>> b = 1234
>>> a is b
False
Great, let’s try that again with a different integer:
>>> a = 5
>>> b = 5
>>> a is b
True
Oops, this looks weird. Actually, Python treats small integers a bit differently. When you start a Python environment, some integers are already allocated in memory. Hence, when you do a = 5, Python simply points a at an already allocated object. The same goes for b = 5.
“Small integers” here means integers in the range [-5, 256].
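One way to probe this cache is to build the same integer twice at runtime and compare identities. The helper is_cached() below is a hypothetical name, and the results are CPython implementation details, so treat this as a sketch rather than guaranteed behaviour.

```python
def is_cached(n):
    """Hypothetical probe: build n twice at runtime and compare identities.

    int(str(n)) creates the value at runtime, which avoids the constant
    sharing that happens when the same literal appears twice in one file.
    """
    return int(str(n)) is int(str(n))


for n in (-5, 5, 256, 1234):
    print(n, is_cached(n))

# Scan a window and report which values came back as shared objects.
shared = [n for n in range(-10, 300) if is_cached(n)]
print(shared[0], "to", shared[-1])
```

On CPython, the values inside the cached range come back as the same object, while 1234 does not.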
IDLE vs .py file
We know that integers are immutable objects (can’t be changed at a memory level), so doesn’t it make sense to reuse all immutable objects?
They are not going to change anyway, so why not just allocate some space when an object is created for the first time and, whenever a new assignment is made with that value, simply point to that address?
That’s what Python often does. This is called interning.
Let’s rerun the same code in a .py file:
a = 1234
b = 1234
k = a is b
print(k)  # True
This shouldn’t be surprising: when we run a .py file, Python compiles the whole file as one unit (unlike the shell or IDLE, where each statement is compiled and executed separately), so it can reuse a single object for equal constants such as 1234.
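The difference between the two modes can be imitated from a single script with exec(). This is only a sketch: the dictionaries ns_single and ns_split are illustrative names, and the outcome is a CPython detail that can vary between builds.

```python
# Case 1: both assignments compiled together, like a .py file.
ns_single = {}
exec("a = 1234\nb = 1234\nsame = a is b", ns_single)
print(ns_single["same"])  # True on CPython: one shared constant

# Case 2: each assignment compiled on its own, like separate shell lines.
ns_split = {}
exec("a = 1234", ns_split)
exec("b = 1234", ns_split)
separate = ns_split["a"] is ns_split["b"]
print(separate)  # usually False on CPython: two separate int objects
```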
Interning responsibly
However, we shouldn’t count on this behaviour of interning. This doesn’t mean the memory management is weird and unpredictable.
The goal here is efficiency, regardless of how it is achieved — interning or not.
In some cases, the time spent on interning is a good trade-off. In others it is not, for example when you have a very large string that probably isn’t going to be reused or compared; there it’s better to save the time.
Since this is subjective and depends on the context, Python lets you choose to intern a string manually with its sys.intern() API.
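A minimal sketch of manual interning, assuming two equal strings that are built at runtime (so the compiler has no chance to share them):

```python
import sys

# Build two equal strings at runtime.
a = "".join(["memory ", "management"])
b = "".join(["memory ", "management"])

print(a == b)  # True: same characters
print(a is b)  # typically False on CPython: two separate objects

# sys.intern() returns the single canonical copy for a given string value.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True
```

Once strings are interned, comparing them can be as cheap as an identity check, which is the trade-off the section describes: a little work up front in exchange for faster comparisons later.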
Conclusion
We saw some choices Python makes to keep things efficient. These are implementation-level details, not guarantees your code should rely on; the takeaway is an understanding of the low-level design.
For example, interning -5 is a choice, and so is interning all strings with __len__() = 1. These might change over time.
There’s a lot more to this topic, e.g. garbage collection, which will be covered in upcoming articles.
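The single-character case can be observed with chr(), which builds the string at runtime. This reflects current CPython behaviour only and is shown as an illustration, not something to depend on.

```python
# Two one-character strings created independently at runtime.
x = chr(97)
y = chr(97)

print(x, y)    # a a
print(x is y)  # True on CPython, which reuses one-character strings like "a"
```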