This is the third of a three-part series which covers various aspects of Python's memory management. It started life as a conference talk I gave in 2021, titled 'Pointers? In My Python?', and the most recent recording of it can be found here.
Check out Part 1 and Part 2 of the series - or read on for a discussion of object lifetimes, reference counting, and garbage collection in CPython!
How CPython can tell when you're done with an object, and what happens next
We ended Part 2 by asking a question: once we've created an object x, how and why does its 'lifetime' end? In this article, we'll learn the answer by exploring how CPython frees objects from memory. CPython isn't the only implementation of Python - for example, there's Skulpt, which Anvil uses to run Python in the browser - but it's the one we'll focus on in this article.
Part 2 also explored the weird and wonderful things that can happen when you override the __eq__ magic method in Python. Now, in Part 3, we're going to look at doing the same thing with a different magic method: __del__.
The __del__ magic method
The __del__ magic method, also called the finaliser of an object, is a method that is called right before an object is about to be removed from memory. It doesn't actually do the work of removing the object from memory - we'll see how that happens later. Instead, this method is meant to be used to do any clean-up work that needs to happen before an object is removed - for example, closing any files that were opened by the object when it was created.
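Here is a sketch of that kind of clean-up. ManagedFile is a hypothetical class, not a standard library type. It opens a file when it's created and closes it in its finaliser. It's for illustration only. In real code, a with statement is usually a more predictable way to make sure a file gets closed.

```python
import os
import tempfile


class ManagedFile:
    """Hypothetical example: owns an open file and closes it in __del__."""

    def __init__(self, path):
        self.file = open(path, "w")

    def write(self, text):
        self.file.write(text)

    def __del__(self):
        # Clean-up only: CPython releases the object's own memory afterwards.
        file = getattr(self, "file", None)  # missing if open() failed
        if file is not None and not file.closed:
            file.close()


path = os.path.join(tempfile.mkdtemp(), "example.txt")
managed = ManagedFile(path)
managed.write("hello")
handle = managed.file  # our own pointer to the file object, for inspection
del managed            # the last reference to the ManagedFile goes away
print(handle.closed)   # True on CPython, where the finaliser runs right away
```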
We're going to be using the following class as an example throughout this section:
class MyNamedClass:
    def __init__(self, name):
        self.name = name
    def __del__(self):
        print(f"Deleting {self.name}!")
This is just a class that'll let us know when one of its instances is about to be removed from memory - or, more specifically, when Python expects to immediately remove the class instance from memory (this won't always be true, as we'll see!).
In the above example, we've defined our class to take a name input when initialised, and when the finaliser is called, it'll let us know by printing the name of the instance in question. That way, we can get a bit of insight into which of these objects are being removed from memory, and when.
So, when will CPython decide to remove an object from memory? There are (as of CPython 3.10) two ways this happens: Reference Counting and Garbage Collection.
Reference counting in CPython
If we have a pointer to an object in Python, that's a reference to that object. For a given object a, CPython keeps track of how many other things point at a. If that counter reaches zero, it's safe to remove that object from memory, since nothing else is using it. Let's see an example:
>>> jane = MyNamedClass("Jane")
>>> del jane
Deleting Jane!
Here we create a new object (MyNamedClass("Jane")) and create a pointer that points at it (jane =). Then, when we del jane, we remove that reference, and the MyNamedClass instance now has a reference count of 0. So, CPython decides to remove it from memory - and, right before that happens, its __del__ method is called, which prints out the message we see above.
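CPython lets us look at this counter with sys.getrefcount from the standard library. The exact numbers vary between versions and contexts. The function also counts one extra reference: the temporary one held by its own argument. So this sketch only compares counts before and after adding and removing a name. Thing is a stand-in class.

```python
import sys


class Thing:
    pass


obj = Thing()
before = sys.getrefcount(obj)  # includes the temporary reference from the call

alias = obj                    # a second name pointing at the same object
with_alias = sys.getrefcount(obj)

del alias                      # remove that name again
after = sys.getrefcount(obj)

print(with_alias - before)  # 1: one extra pointer means one extra reference
print(after - before)       # 0: back where we started
```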
If we create multiple references to an object, we'll have to get rid of all of them in order for the object to be removed:
>>> bob = MyNamedClass("Bob")
>>> bob_two = bob # creating a new pointer to the same object
>>> del bob # this doesn't cause the object to be removed...
>>> del bob_two # ... but this does
Deleting Bob!
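Printing from __del__ is one way to watch this happen. Another is a weak reference from the standard library's weakref module. A weak reference lets us check whether an object still exists without adding to its reference count. Here is a sketch of the same bob/bob_two example, using a stand-in Person class.

```python
import weakref


class Person:
    def __init__(self, name):
        self.name = name


bob = Person("Bob")
bob_two = bob               # a second pointer to the same object
watcher = weakref.ref(bob)  # weak references don't add to the count

del bob
still_alive = watcher() is not None  # bob_two still points at the object

del bob_two
freed = watcher() is None            # the count hit zero, so CPython freed it

print(still_alive, freed)  # True True
```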
Of course, our instances of MyNamedClass could themselves contain pointers - after all, they're arbitrary Python objects, and we can add whatever attributes we like to them. Let's see an example:
>>> jane = MyNamedClass("Jane")
>>> bob = MyNamedClass("Bob")
>>> jane.friend = bob # now the "Jane" object contains a pointer to the "Bob" object...
>>> bob.friend = jane # ... and vice versa
What we've done in the above code snippet is set up some cyclic references. The object whose name is Jane contains a pointer to the one whose name is Bob, and vice versa. Where this gets interesting is when we do the following:
>>> del jane
>>> del bob
We've now removed the pointers that go from the namespace to the objects. We can't access those MyNamedClass objects at all any more - but we didn't get the print message telling us they're about to be deleted. This is because there are still references to these objects, contained within each other, and therefore their reference counts are not 0.
What we've created here is a cyclic isolate: a structure where each object has at least one reference within the cycle, keeping it alive, but none of the objects in the cycle can be accessed from the namespace.
Below is a visual representation of what's going on when we create a cyclic isolate.
To begin, we create our two objects, each of which also has a name in the namespace.
Next, we connect our two objects by adding a pointer from each to the other.
Finally, we remove the pointers from the namespace by removing both of the original names for our objects. At this point, the two objects are inaccessible from the namespace, but each contains a pointer to the other so their reference counts are not zero.
So, clearly, reference counting on its own isn't sufficient for keeping the working memory of your runtime free of useless, irretrievable objects. This is where CPython's Garbage Collector comes in!
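This sketch shows the problem and the fix side by side without relying on print statements. It uses a weak reference from the standard library's weakref module, which tells us whether an object still exists without adding to its reference count. Person is a stand-in class. gc.disable, gc.collect and gc.enable are real functions from the gc module, which we'll explore below.

```python
import gc
import weakref


class Person:
    def __init__(self, name):
        self.name = name


gc.disable()  # stop automatic collection so it can't interfere with the demo
try:
    jane = Person("Jane")
    bob = Person("Bob")
    jane.friend = bob
    bob.friend = jane
    jane_ref = weakref.ref(jane)

    del jane
    del bob
    # Unreachable from our code, but each object still points at the other,
    # so reference counting alone can't free them.
    survived_del = jane_ref() is not None

    gc.collect()  # run the cycle detector by hand
    freed_by_gc = jane_ref() is None
finally:
    gc.enable()

print(survived_del, freed_by_gc)  # True True
```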
Collecting garbage in CPython
CPython's Garbage Collector (or GC for short) is CPython's built-in way to get around the problem of cyclic isolates that we just encountered. It's enabled by default and kicks in automatically every now and then as objects are allocated, so you don't have to worry about cyclic isolates clogging up your memory.
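The gc module exposes the settings behind this 'every now and then'. The real gc functions below report three things: whether automatic collection is switched on, the thresholds that decide when it kicks in, and the current counters compared against those thresholds. The exact numbers differ between CPython versions, so this sketch doesn't predict them.

```python
import gc

enabled = gc.isenabled()       # True unless something has called gc.disable()
thresholds = gc.get_threshold()  # when automatic collection is triggered
counts = gc.get_count()        # the counters compared against those thresholds

print(enabled, thresholds, counts)
```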
The garbage collector is designed to find and remove cyclic isolates from CPython's working memory. It does this in the following way:
- It detects cyclic isolates
- It calls the finalisers (the __del__ methods) on each object in the cyclic isolate
- It removes the pointers from each object (thus breaking the cycle) - only if the cycle is still isolated after step 2 (more on this later!)
After this process is complete, every object that was previously in the cycle will now have a reference count of 0, and therefore will be removed from memory.
Although it works automatically, we can actually import it as a module from the standard library. Let's do that, so we can take an explicit look at how it works!
>>> import gc
Detecting cyclic isolates
CPython's garbage collector keeps track of various objects that exist in memory - but not all of them. We can instantiate some objects and see whether the garbage collector cares about them:
>>> gc.is_tracked("a string")
False
>>> gc.is_tracked(["a", "list"])
True
If an object can contain pointers, that gives it the ability to form part of a cyclic isolate structure - and that's what the garbage collector exists to detect and dismantle. Such objects in Python are often called 'container objects'.
So, the garbage collector needs to know about any object that has the potential to exist as part of a cyclic isolate. Strings can't, so "a string" isn't tracked by the garbage collector. Lists (as we've seen) are able to contain pointers, and therefore ['a', 'list'] is tracked.
Any instance of a user-defined class will also be tracked by the garbage collector, as we can always set arbitrary attributes (pointers) on them.
>>> jane = MyNamedClass("Jane")
>>> gc.is_tracked(jane)
True
So, the garbage collector knows about all the objects that could potentially form a cyclic isolate. How does it know if one has formed? Well, it also knows about all the pointers in each of those objects, and where they point. We can see this in action:
>>> my_list = ["a", "list"]
>>> gc.get_referents(my_list)
['list', 'a']
The get_referents function takes an object and returns a list of the objects it contains pointers to (its referents). Under the hood, it relies on each object type's traversal method, the same machinery the garbage collector uses to follow pointers. So, the list above contains pointers to each of its elements, which are both strings.
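gc.get_referents only reports an object's direct referents. To find everything reachable from an object, you have to keep following pointers yourself. This sketch shows the difference with a nested list.

```python
import gc

inner = ["nested"]
outer = ["a", inner]

referents = gc.get_referents(outer)
# The outer list points directly at its two elements...
points_at_inner = any(r is inner for r in referents)
# ...but not at the string inside the inner list: that's one more hop away.
points_at_nested = any(r == "nested" for r in referents)

print(points_at_inner, points_at_nested)  # True False
```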
Let's take a look at the get_referents function in the context of a cycle of objects (not yet a cyclic isolate, though, since these objects can still be accessed from the namespace):
>>> jane = MyNamedClass("Jane")
>>> bob = MyNamedClass("Bob")
>>> jane.friend = bob
>>> bob.friend = jane
>>> gc.get_referents(bob)
[{'name': 'Bob', 'friend': <__main__.MyNamedClass object at 0x7ff29a095d60>}, <class '__main__.MyNamedClass'>]
In this cycle, we can see that the object pointed to by bob contains pointers to the following: a dictionary of its attributes, containing bob's name (Bob) and its friend (the MyNamedClass instance also pointed at by jane). The bob object also has a pointer to the class object itself, which is how bob.__class__ can return that class object.
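When the garbage collector runs, it checks whether every object it knows about (that is, anything that returns True when you call gc.is_tracked on it) is reachable from the namespace. You can picture this as following all the pointers from the namespace, then the pointers within the objects those point to, and so on, until it builds up an entire view of everything that's accessible from code. (In practice, CPython reaches the same answer from the other direction. It subtracts the references that tracked objects hold to one another from their reference counts. Any object left with a positive count must be referenced from somewhere outside, so it, and everything it points to, is reachable.)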
If, after doing this, the GC finds that there exist objects which aren't reachable from the namespace, then it can clear those objects up.
Remember, any objects that are still in memory must have a non-zero reference count, or else they'd have been removed due to reference counting. For objects to be unreachable and yet still have a non-zero reference count, they have to be part of a cyclic isolate, which is why we care so much about the possibility of these occurring.
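The reachability check can be sketched as a toy model. This is a heavily simplified illustration of the reference-count-subtraction idea, not CPython's real implementation. Objects are plain strings here. The hypothetical find_unreachable function is given each object's reference count and the names of the objects it points at.

```python
def find_unreachable(refcounts, edges):
    """Toy model of CPython's cycle detection (heavily simplified).

    refcounts: object name -> its total reference count
    edges:     object name -> names of the tracked objects it points at
    """
    # Step 1: copy every reference count into a scratch counter.
    gc_refs = dict(refcounts)
    # Step 2: subtract each reference that comes from another tracked object.
    for referents in edges.values():
        for referent in referents:
            gc_refs[referent] -= 1
    # Step 3: a positive count means something outside the tracked objects
    # (such as a name in the namespace) points here, so the object is
    # reachable, and so is everything it points at.
    reachable = set()
    stack = [obj for obj, count in gc_refs.items() if count > 0]
    while stack:
        obj = stack.pop()
        if obj not in reachable:
            reachable.add(obj)
            stack.extend(edges.get(obj, []))
    # Whatever is left over is only referenced from inside a cyclic isolate.
    return set(refcounts) - reachable


cycle = {"jane": ["bob"], "bob": ["jane"]}

# After `del bob` but before `del jane`: jane is pointed at by bob and by a
# name in the namespace; bob is only pointed at by jane.
print(find_unreachable({"jane": 2, "bob": 1}, cycle))  # set()

# After `del jane` too: each object is only pointed at by the other.
print(sorted(find_unreachable({"jane": 1, "bob": 1}, cycle)))  # ['bob', 'jane']
```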
Let's return to our cycle of friends, jane and bob, and turn that cycle into a cyclic isolate by removing the pointers from the namespace:
>>> del jane
>>> del bob
Now, we've got ourselves into the exact situation that the garbage collector exists to fix. We can trigger garbage collection manually by calling gc.collect():
>>> gc.collect()
Deleting Bob!
Deleting Jane!
4
By default, the garbage collector will perform this action automatically every so often (as more and more objects are created and destroyed within the CPython runtime).
The output that we see in the code snippet above contains the print statements from our MyNamedClass's __del__ method, and at the end there's a number - in this case, 4. This number is the return value of gc.collect() itself, and it tells us how many unreachable objects the garbage collector found and removed.
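You might think that only 2 objects (our two MyNamedClass instances) were removed, but each of them also had its own dictionary of attributes (the one we saw in the gc.get_referents output earlier, holding its name and friend). Those dictionaries are container objects that the garbage collector tracks, and they were part of the isolate too, bringing the total to 4 objects. The name strings don't count towards this number: as we saw with gc.is_tracked, strings aren't tracked by the garbage collector.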
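We can check this counting rule with a cycle of two plain lists holding strings, using only the real gc functions. On an otherwise quiet interpreter we'd expect a small count covering just the two lists. Any other garbage lying around would also be counted, though, so treat the exact number as indicative.

```python
import gc

gc.disable()  # keep automatic collection out of the way while we set things up
try:
    gc.collect()  # clear out any garbage that already exists

    first = ["Jane"]
    second = ["Bob", first]
    first.append(second)  # the two lists now point at each other
    del first, second     # ...and nothing else points at them

    found = gc.collect()
finally:
    gc.enable()

# The lists are counted; the strings inside them aren't tracked by the
# garbage collector, so they never contribute to this number.
print(found)
```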
Finalisers behaving badly
Earlier, we mentioned that the garbage collector works in a 3-step process: detecting cyclic isolates, calling the finalisers on each object in the cycle, then breaking the cycle by removing the pointers between the objects... if the cycle still remains isolated at this point. Now, the only way that the cycle could go from being isolated to not-isolated between the first and third step is if the finalisers do something to make that happen.
Let's define a class that does just that:
class MyBadClass:
    def __init__(self, name):
        self.name = name
    def __del__(self):
        global person  # create an externally accessible pointer...
        person = self  # ... and point it at the object about to be removed
        print(f"Deleting {self.name}!")
In this class's finaliser, a global variable is created. That means that even if an instance of MyBadClass becomes inaccessible from the namespace (as part of a cyclic isolate, for example), it can still 'reach out' into the namespace, create a pointer there, and point that pointer at itself - thus de-isolating itself.
>>> jane = MyBadClass("Jane")
>>> bob = MyBadClass("Bob")
>>> jane.friend = bob
>>> bob.friend = jane
>>> del jane
>>> del bob
To see this in action, we set up the cyclic isolate structure, as we've done before with other (more well-behaved) classes. Then, we trigger garbage collection:
>>> gc.collect()
Deleting Bob!
Deleting Jane!
0
We see the print statements from each instance's __del__ method, but after that, gc.collect() returns 0. That means that no objects were removed from memory - and that's because, after the garbage collector caused the finalisers to be called, it checked to make sure that the cycle was still isolated.
If the cycle were still isolated, then the garbage collector could safely remove all the pointers linking up the objects, reducing their reference counts to 0. But, in this case, the cycle was no longer isolated, and so the garbage collector doesn't break the links between the objects in it.
So, if we got rid of the jane and bob pointers, how can the cycle still be accessed from the namespace? The answer is the global person variable that was created in the finaliser. Let's take a look at it:
>>> person
<__main__.MyBadClass object at 0x7ff29a095d60>
>>> person.name
'Jane'
>>> person.friend.name
'Bob'
We can see that the object pointed to by person is the same one that had previously been pointed to by jane. This makes sense if you look at the above output from calling gc.collect(): the print statement that appeared last was the one for the Jane object, and therefore that was the object that set person = self most recently.
In other words, the two objects have had their original pointers jane and bob removed - but when their finalisers are called, a new external pointer from the namespace is created, meaning that the cycle is no longer isolated and shouldn't be removed by the GC.
Doing this sort of thing can create strange results, because it means you can access objects whose finalisers have already run - and that probably means they've cleaned themselves up in a way that means you shouldn't be interacting with them again. For example, their finalisers may have closed a file that other methods on the object assume is still open. Once again: overriding magic methods is serious business!
So, does MyBadClass break garbage collection entirely? The answer is no, and that's because of a very important property of finalisers: they can only be called once per object. After bob's __del__ method has been called once (when it was triggered by the call to gc.collect()), it's done, and can never be executed again. That means we can do the following:
>>> del person
>>> gc.collect()
4
We don't see the "Deleting Jane!" and "Deleting Bob!" messages, because those are printed by the objects' finalisers - and those have already been called once, and can't be called again. But, since the person pointer has been removed, the cycle is isolated again; and, because that person pointer won't be recreated by a finaliser, the garbage collector can safely go ahead and remove the pointers linking our two MyBadClass instances.
Then the garbage collector finishes its work, and gc.collect() returns 4 to let us know that those two objects and their attribute dictionaries have been removed from memory - and all is well in the world (of our CPython interpreter, at least!) again!
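The once-only rule applies outside garbage collection too. In this sketch, Phoenix is a made-up class whose finaliser resurrects its object by storing it in a class-level list. When that last pointer goes away, CPython frees the object without calling __del__ a second time. This relies on CPython's finalisation behaviour since Python 3.4 (PEP 442).

```python
class Phoenix:
    """Hypothetical class whose finaliser resurrects the object."""

    revived = []   # class-level list that the finaliser stores objects in
    del_calls = 0  # how many times __del__ has run

    def __del__(self):
        Phoenix.del_calls += 1
        Phoenix.revived.append(self)  # a new pointer: the object lives on


bird = Phoenix()
del bird                     # count hits 0, __del__ runs and resurrects it
print(Phoenix.del_calls)     # 1
print(len(Phoenix.revived))  # 1: the object is still alive

Phoenix.revived.clear()      # drop the only remaining pointer
print(Phoenix.del_calls)     # still 1: the finaliser is not run again
```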
So what have we learned?
Let's recap! These three articles have been a whistle-stop tour of how Python handles objects in memory. We've looked at how pointers work, why pointer aliasing happens, and whether you'll need to use =, copy or deepcopy. We've also seen what object IDs are, how the is operator uses them, and how we can override the __eq__ magic method to define our own equality conditions and make == do whatever we want. Finally, we've covered object lifetimes, the __del__ magic method, and how CPython frees objects from memory when they're not needed any more, using reference counting and garbage collection.
Now that you know all of these things, you can go forth and write better Python code!
See this article as a talk
Various recorded versions of this talk are available at the following links:
More about Anvil
If you're new here, welcome! Anvil is a platform for building full-stack web apps with nothing but Python. No need to wrestle with JS, HTML, CSS, Python, SQL and all their frameworks – just build it all in Python.