DEV Community

Prashant Sharma
Prashant Sharma

Posted on • Updated on • Originally published at sharmapacific.in

Garbage Collection in Python

In languages like C or C++, the programmer is responsible for dynamic allocation and deallocation of memory on the heap. But in python programmer does not have to preallocate or deallocate memory.

Python uses following garbage collection algorithms for memory management -

  • Reference counting

  • Cycle-detecting algorithm (Circular References)

Reference counting

Reference counting is a simple procedure where referenced objects are deallocated when there is no reference to them in a program.
In short, When the reference count becomes 0, the object is deallocated(frees its allocated memory).

Let's have a look at below example -

def calculate_sum(num1, num2):
    total = num1 + num2
    print(total)
Enter fullscreen mode Exit fullscreen mode

In the above example, We have three local references num1, num2 and total. Here total is different from num1 and num2 because it only has reference inside the block thus its reference count is 1, num1 and num2 referenced outside the block so maybe their reference count more than one.

So, here when the function has finished execution the referenced count of total reduced to 0. Since it tracked by the garbage collector.

The garbage collector finds out that the total is no longer referenced(reference count field reaches 0) and frees its allocated memory.

The Variables, which are declared outside of functions, such variables do not get destroyed even after function has finished execution.

We can do the manual deletion also using the del statement. del statement removes a variable and its reference. When the reference count reaches 0, it will be collected by the garbage collector.

The reference counting algorithm has some issues also, such as circular references -

Circular References

A reference cycle occurs when one or more objects are referencing each other.

Alt Text

As you can see in the above image, the list object is pointing to itself, and object1 and object2 are pointing to each other. The reference count for such objects is always at least 1.

Let's go to the practical -

import gc

gc.set_debug(gc.DEBUG_SAVEALL) 

lst = []
lst.append(lst)

lst_address = id(lst)

del lst

object_1 = {}
object_2 = {}
object_1['obj2'] = object_2
object_2['obj1'] = object_1

obj_address = id(object_1)

del object_1, object_2
Enter fullscreen mode Exit fullscreen mode

In the above example, the del statement removes the variable and its reference to the objects.

Let's check-it deleted variables using gc.collect, gc.collect saves to gc.garbage instead of deleting.

>>> gc.collect()
3
Enter fullscreen mode Exit fullscreen mode

When we delete a variable, we only delete the __main__ reference. Now we don’t have access to lst, object_1 and object_2 at all, but these variables still have 1 reference, it means reference count is 1, the reference count algorithm will not collect it.

Check the reference count as below-

import sys
print(sys.getrefcount(obj_address))
print(sys.getrefcount(lst_address))

2
2
# 1 from the variable and 1 from getrefcount
Enter fullscreen mode Exit fullscreen mode

Multiply this number by 1 Million objects and you may have absolutely a serious memory leak issue.

For this kind of Reference cycle, Python has another algorithm specially dedicated to discovering and destroying circular references. It is also the only controllable part of Python’s GC

Summary

Python has 2 Garbage Collection algorithms. One for dealing with reference count, When the reference count reaches 0, it removes the object and frees its allocated memory. The other is the Cycle-detecting algorithm which discovers and destroys circular references.

I hope that you now have a fair understanding of the Garbage Collection algorithms in python.

If you have any suggestions on your mind, please let me know in the comments.

Reference

https://rushter.com/blog/python-garbage-collector/

Top comments (2)

Collapse
 
juancarlospaco profile image
Juan Carlos

If you are interested to play with GC, using Nim you can play with 6 GC,
choose and customize, including Rust-like one, Real-Time, Go lang one, No GC (manual).

For Python it works like Cython basically, it has a --gc: CLI param, that can choose the GC, also a GC_statistics() that print info,
you can turn on and off the GC on the fly from the code and more.
Some GC that books say are slow, are actually pretty fast in practice.
Good post, interesting.

Collapse
 
sharmapacific profile image
Prashant Sharma

@juan Thanks for sharing, Sure I will play with GC.