DEV Community

Dima Danilov

Originally published at Medium

Python garbage collector and the other magic

Once upon a time, at the company where I work, a bug in production caused a "Too many open files" error.

I was able to locate the class responsible for the issue: a class that works with os.pipe file descriptors. Let’s call it OsPipeHolder. A simplified version of the code (Python 2) is shown below:

#Listing #1

#!/usr/bin/python
import os


class OsPipeHolder(object):
    def __init__( self ):
        read, write = os.pipe()
        self._read = os.fdopen( read, "r" )
        self._write = os.fdopen( write, "w" )
        self.isClosed = self.is_closed

    def is_closed(self):
        # Closed means both ends have been released (close() sets them to None)
        return self._read is None and self._write is None

    def close(self):
        self._read.close()
        self._write.close()
        self._read = None
        self._write = None

    def __del__(self):
        print "You've deleted me!!!"
        if not self.is_closed():
            self.close()


if __name__ == "__main__":
    pipe = OsPipeHolder()
    del pipe

The problem was that the application retried in the event of a failure and created a new instance of the OsPipeHolder class on every retry.

As you can see from the code above, there is a __del__ method that should run when the object is destroyed.

But for no apparent reason it is never called, and the file descriptors are left open:

$ ./OsPipeHolder.py
$

As the reader may know, CPython destroys an object as soon as its reference count drops to zero, and the cyclic garbage collector takes care of objects that are kept alive only by reference cycles.
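To make the reference-counting rule concrete, here is a minimal sketch, assuming CPython (other interpreters may delay finalization). The Tracked class and its log list are hypothetical names used only for illustration, and the snippet is written so that it should run on both Python 2 and Python 3.

```python
class Tracked(object):
    """Hypothetical class that records when its finalizer runs."""
    log = []

    def __del__(self):
        Tracked.log.append("deleted")


first = Tracked()
second = first          # a second reference to the same object

del first               # one reference remains, so nothing happens yet
log_after_first_del = list(Tracked.log)

del second              # last reference gone: CPython finalizes the object now
log_after_second_del = list(Tracked.log)

print(log_after_first_del)
print(log_after_second_del)
```

The finalizer only runs once the last name bound to the object is deleted, which is why an extra, hidden reference is enough to keep __del__ from being called.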

Although we create only one instance of the object (pipe = OsPipeHolder()) and do not copy it anywhere else, let's verify the number of references with sys.getrefcount:

#Listing #2

import os
import sys

class OsPipeHolder(object):
...

if __name__ == "__main__":
    pipe = OsPipeHolder()
    #we need -1 since passing an object to getrefcount
    #creates an additional reference
    print "Refcount:", (sys.getrefcount(pipe) -1)               
    del pipe

Run results #2:

$ ./OsPipeHolder_refcount.py
Refcount: 2

As you can see, there are two references to our single OsPipeHolder instance!
So maybe there is an internal reference inside the object itself.

To check this, I decided to print information about all the attributes of OsPipeHolder:

#Listing #3

#!/usr/bin/python
import os
import sys

class OsPipeHolder(object):
...

if __name__ == "__main__":
    pipe = OsPipeHolder()
    #we need -1 since passing an object to getrefcount
    #creates an additional reference
    print "Refcount:", (sys.getrefcount(pipe) -1)          
    for i, attribute in enumerate(dir(pipe)):
        msg = "%d. Attribute name: %s\tinfo: %s" % (i, attribute, getattr(pipe, attribute))
        print msg

    del pipe

Run results #3:

$ ./OsPipeHolder.py
Refcount: 2
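0. Attribute name: __class__    info: <class '__main__.OsPipeHolder'>
1. Attribute name: __del__      info: <bound method OsPipeHolder.__del__ of <__main__.OsPipeHolder object at 0x...>>
...
21. Attribute name: close       info: <bound method OsPipeHolder.close of <__main__.OsPipeHolder object at 0x...>>
22. Attribute name: isClosed    info: <bound method OsPipeHolder.is_closed of <__main__.OsPipeHolder object at 0x...>>
23. Attribute name: is_closed   info: <bound method OsPipeHolder.is_closed of <__main__.OsPipeHolder object at 0x...>>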

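At first glance, everything looks good, but entry 22 of the output shows that isClosed is a bound method for is_closed.
A bound method keeps a reference to its instance, so storing it in an instance attribute makes the object reference itself.
This is the inner reference cycle we were looking for!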
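The extra reference can be observed directly. The following is a small sketch, assuming CPython; Holder is a hypothetical stand-in for OsPipeHolder without the pipe, and the snippet compares reference counts before and after the assignment rather than relying on absolute values.

```python
import sys


class Holder(object):
    """Hypothetical stand-in for OsPipeHolder, without the pipe."""

    def is_closed(self):
        return False


holder = Holder()
count_before = sys.getrefcount(holder)

# Storing a bound method on the instance: holder -> bound method -> holder.
holder.isClosed = holder.is_closed
count_after = sys.getrefcount(holder)

extra_refs = count_after - count_before
points_back = holder.isClosed.__self__ is holder

print(extra_refs)
print(points_back)
```

The __self__ attribute of the bound method is the instance itself, which accounts for the additional reference reported by sys.getrefcount.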

Let’s comment out the self.isClosed assignment:

#Listing #4

#!/usr/bin/python
import os
import sys

class OsPipeHolder(object):
    def __init__( self ):
        read, write = os.pipe()
        self._read = os.fdopen( read, "r" )
        self._write = os.fdopen( write, "w" )
        #self.isClosed = self.is_closed
...

Run results #4:

$ ./OsPipeHolder.py
Refcount: 1
You've deleted me!!!

Yay, our __del__ method was finally called!

But we still have two issues:

  1. We cannot remove isClosed since it was added for backward compatibility purposes
  2. Python is supposed to handle reference cycles easily!

Let’s start with the first.

I found the solution in the Python sources.
To create an alias for a method, you simply declare isClosed as a "class-level attribute" (the isClosed = is_closed line in the class body).

#Listing #5

#!/usr/bin/python
import os
import sys

class OsPipeHolder(object):
    def __init__( self ):
        read, write = os.pipe()
        self._read = os.fdopen(read, "r")
        self._write = os.fdopen(write, "w")

    def is_closed(self):
        # Closed means both ends have been released (close() sets them to None)
        return self._read is None and self._write is None

    isClosed = is_closed

    def close(self):
        self._read.close()
        self._write.close()
        self._read = None
        self._write = None

    def __del__(self):
        print "You've deleted me!!!"
        if not self.is_closed():
            self.close()

if __name__ == "__main__":
    pipe = OsPipeHolder()
    print "Refcount:", (sys.getrefcount(pipe) -1)
    del(pipe)

Run results #5:

$ ./OsPipeHolder.py
Refcount: 1
You've deleted me!!!

As you can see, we have only one reference to the instance, and the garbage collector calls our __del__ method!
But why isn’t it called in the original code?
Let's take a look at how the object is referenced.

There is one reference (Ref #1) to the object from the stack, the pipe variable, and another inner reference (Ref #2) held by the bound method stored in isClosed.
After Ref #1 is deleted and no references from the stack remain, the garbage collector is supposed to call the __del__ method despite the inner reference.
Why does that not happen?

Python documentation is your best friend and has an answer for everything!

A list of objects which the collector found to be unreachable but could not be freed (uncollectable objects).
By default, this list contains only objects with __del__() methods.
Objects that have __del__() methods and are part of a reference cycle
cause the entire reference cycle to be uncollectable,
including objects not necessarily in the cycle but reachable only from it.

Python docs: https://docs.python.org/2/library/gc.html#gc.garbage

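As you see, the __del__ method itself was the root cause!
In Python 2, the collector does not free a reference cycle that contains an object with a __del__ method, because it cannot guess a safe order in which to run the finalizers.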
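The behaviour can be probed with gc.garbage and a weak reference. This is only a sketch under stated assumptions: Holder is a hypothetical class that reproduces the self-referencing alias plus a finalizer, and the outcome depends on the interpreter version. On Python 2, which this post describes, the instance is expected to end up in gc.garbage. Since Python 3.4 (PEP 442) the collector can finalize and free such cycles, so there the list should stay empty.

```python
import gc
import weakref


class Holder(object):
    """Hypothetical class with the same self-referencing alias and a finalizer."""

    def __init__(self):
        self.isClosed = self.is_closed   # reference cycle through a bound method

    def is_closed(self):
        return False

    def __del__(self):
        pass


gc.disable()                 # keep automatic collections from interfering
holder = Holder()
probe = weakref.ref(holder)  # lets us observe the object without keeping it alive

del holder
alive_after_del = probe() is not None   # the cycle keeps the refcount above zero

gc.collect()
stuck = [obj for obj in gc.garbage if isinstance(obj, Holder)]
freed_by_collector = probe() is None
gc.enable()

print(alive_after_del)
print(len(stuck))
print(freed_by_collector)
```

In both cases del alone does not destroy the object, because the bound method still points back at it; the difference between versions is only in what the cycle collector does afterwards.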

Summary

  • Don’t create method aliases through self: self.alias = self.method_name stores a bound method on the instance and creates a reference cycle. Declare the alias at class level instead (alias = method_name).
  • Remember the Zen of Python: “Explicit is better than implicit.” Call the close method explicitly or use the with statement.
  • Python is not C++, so don’t rely on __del__ for RAII-style cleanup.
  • Think in the Pythonic way and read the docs! :)
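The first two points can be combined in a short sketch. This is one possible shape, not necessarily how the production code was fixed: the __enter__ and __exit__ methods are additions that do not appear in the listings above, and close is made safe to call more than once.

```python
import os


class OsPipeHolder(object):
    def __init__(self):
        read, write = os.pipe()
        self._read = os.fdopen(read, "r")
        self._write = os.fdopen(write, "w")

    def is_closed(self):
        return self._read is None and self._write is None

    isClosed = is_closed  # class-level alias: no per-instance reference cycle

    def close(self):
        if self.is_closed():
            return  # already closed, nothing to do
        self._read.close()
        self._write.close()
        self._read = None
        self._write = None

    # Context-manager protocol (an addition for this sketch)
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions raised inside the block


with OsPipeHolder() as pipe:
    open_inside = not pipe.isClosed()

closed_after = pipe.is_closed()
print(open_inside)
print(closed_after)
```

With the with statement the descriptors are released at a well-defined point, even if the block raises, so nothing depends on when or whether the garbage collector runs.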

P.S. I originally posted this on Medium.
