Introduction
Programs run on data. Data lives in Memory. Memory needs management.
In simple words, Garbage Collection is an automated process which aids memory management by identifying and reclaiming memory occupied by objects that are no longer in use by a program, preventing memory leaks and ensuring efficient resource utilization.
A World Without GC
Not all programming language support GC. Without a Garbage Collector, the programmer is responsible for "freeing up" memory which is not being used anymore. This is known as Manual Memory Management.
If you are familiar with C/C++, you must have heard/read about malloc()
and free()
or new
and delete
. These are constructs which can be used by the programmer to manage the lifecycle of objects, from creation to deletion.
However, programmers are only human, and even with the best of intentions, manual memory management can often lead to serious issues. Two of the most common such issues are
- Memory Leaks : This happens when you create an object on the heap but forget to delete it. Eventually, the heap gets filled with unused objects and any new request for allocating memory on the heap throws a Memory Limit Exceeded error.
- Dangling Pointers : This happens when you delete an object prematurely even when some other pointers still reference it. Depending upon the programming language, this can manifest differently. Most severely, it can cause intermittent issues which may survive QA and pop up unexpectedly in Production.
That being said, while Manual Memory Management has its drawbacks, it may be desirable when you want to build applications with super critical memory/latency needs. In such use cases, using manual memory management, the developer can have maximum control over the lifecycle of objects, and can decide when to create/delete them so as to optimise memory/latency performance.
Popular GC Algorithms
Modern programming languages like Java and Python support Garbage Collection. Often, The Garbage Collector runs in separate thread(s) than the main program thread(s) (also called as mutator
threads), and every time it executes, it works towards collecting the garbage. While the goal of Garbage Collectors remains the same, they can be implemented in many different ways. Two popular Garbage Collection algorithms are
- Mark And Sweep (Java) : This is classic algorithm which is used by Garbage Collectors for reclaiming memory occupied by "unreachable" or "dead" objects in the heap. It works in 2 phases. In the "Mark" phase all "reachable" or "live" objects are marked by traversing the object reference graph starting from the known root references. In the "Sweep" phase the heap is traversed and all memory which is not marked is reclaimed for reuse.
- Reference Counting (Python) : This is an algorithm which keeps track of the reference count of each object to track garbage. Every time a reference to an object is created, the reference count for that object is incremented. Similarly, every time a reference to an object is deleted or replaced, the reference count for that object is decremented. When the reference count of an object becomes 0, it is considered eligible for garbage collection.
Both of these have their pros and cons. For example
Mark and Sweep might cause "Heap Fragmentation" where the used heap memory becomes divided into small non-contiguous blocks. This makes it difficult to find space which may be needed for a Large contiguous data structure (like an Array). A variation of the Mark and Sweep, known as Mark, Sweep and Compact deals with Fragmentation.
Reference Counting cannot collect circular data structures (like circular linked lists) because each node has a reference count of 1 even after the original reference to the "head" has been removed. This inability of Reference Counting to collect circularly referenced structures can cause Memory Leaks. Due to this, Python versions 3.4 and above also include a cyclic garbage collector to handle circular references and objects that are not properly cleaned up by reference counting alone.
Top comments (0)