Kinanee Samson

Understanding Garbage Collection: How Modern Programming Languages Manage Memory

You just had a brand new idea for an awesome project, and you believe it is going to take the world by storm. Users and developers alike are not prepared for the next big thing in tech. So you open your favorite editor, I'm guessing VS Code, unless of course you are already on the vibe train and you're using one of those over-hyped VS Code AI forks (which are still VS Code by the way, just with a lot of AI packed in).

Let's assume you decide to use a modern programming language like JavaScript or Go, and you go about building the awesome new project you are going to abandon in two weeks when you get another awesome idea. You create variables, arrays and objects, soaking up every available memory location on your machine. Your program works, or just barely; you don't know how or why it works, but you do know that it works.

But let's pause for a moment and ask: how come I am not using up all the available memory despite creating so many variables and objects? Somehow your app runs smoothly with little to no performance overhead. You ask yourself again: how come the objects in my code are not exhausting the available memory? Well, this is because of the very awesome garbage collector that sits at the heart of your favorite modern programming language, and it is going to be our focus today.

We are going to look at garbage collection: what it is, how it works from top to bottom, and why it is so useful that it is found in almost every modern programming language. If that sounds good, absolutely smash the like button on the video and subscribe to the channel to get all of our future videos as they drop. Make sure you stick around to the end, because I have a challenging question for you.

If you prefer watching video content, then you can watch the video on YouTube.

Garbage collection (GC) is a form of automatic memory management that involves automatically reclaiming memory in a computer program that is no longer being used. It frees developers from manually tracking and releasing memory, reducing the risk of memory leaks.

The garbage collector attempts to reclaim memory that was allocated by the program, but is no longer referenced; such memory is called garbage. Garbage collection was invented by American computer scientist John McCarthy around 1959 to simplify manual memory management in Lisp.

Before the introduction of the garbage collector, a developer was tasked with hunting down objects that had gone out of scope in a program and de-allocating them from memory like a caveman. Garbage collection is not all sweet and no sour, though: it may take a significant proportion of a program's total processing time and affect performance as a result.

We have already established that most modern programming languages incorporate a garbage collector, either as part of the language specification (e.g., Java, C#, Go, and JavaScript) or effectively required for a practical implementation; these are said to be garbage-collected languages.

Other languages, such as C and C++, were designed for use with manual memory management. Some languages, like Ada, Modula-3, and C++, allow garbage collection and manual memory management to co-exist in the same application by using separate heaps for collected and manually managed objects.

Garbage Collection Strategies

We will talk about two major types of garbage collectors, namely tracing garbage collectors and reference counting garbage collectors.

Tracing

Tracing garbage collection is the foundation of modern JavaScript garbage collectors. It's a strategy that identifies and reclaims memory by determining which objects are still reachable from known root references. This is the most common type of garbage collection – so much so that "garbage collection" often refers to the tracing method.

Tracing collectors are so called because they trace through the working set of memory. These garbage collectors perform collection in cycles. It is common for cycles to be triggered when there is not enough free memory for the memory manager to satisfy an allocation request. But cycles can also be requested by the mutator (the running program itself) directly or run on a time schedule.

Reachability

When talking about tracing garbage collectors we need to talk about reachability. Reachability determines which objects stay in memory and which get garbage collected. An object is reachable if it is referenced by at least one variable in the program, either directly or through references from other reachable objects. An object can be reachable in only two ways:

  • It is a root: a global variable, or an object referenced from anywhere in the call stack (that is, from the local variables and parameters of the functions currently being invoked).

  • It is referenced by another reachable object; anything referenced from a reachable object is itself reachable (more formally, reachability is a transitive closure).
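
To make reachability concrete, here is a minimal JavaScript sketch (the variable names are invented for illustration) showing that an object stays alive only while some chain of references from a root still leads to it:

```javascript
// A global variable is a GC root, so everything reachable from it stays alive.
let user = {
  name: "Ada",
  profile: { bio: "Likes compilers" } // reachable transitively through `user`
};

let alias = user.profile; // a second path to the profile object

user = null; // the outer { name, profile } object is no longer reachable...
// ...but the profile object is still reachable through `alias`,
// so only the outer object becomes garbage.

alias = null; // now the profile object is unreachable too and can be collected
```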

Algorithms Using Tracing

Mark and Sweep

Mark-and-sweep is the core algorithm behind JavaScript's garbage collection. It solves the critical flaw of reference counting by handling circular references naturally. In the naive mark-and-sweep method, each object in memory has a flag (typically a single bit) reserved for garbage collection use only. This flag is always cleared, except during the collection cycle.

Phase 1: Marking (Identifying Reachable Objects)

The first stage is the mark stage which does a tree traversal of the entire 'root set' and marks each object that is pointed to by a root as being 'in-use'. All objects that those objects point to, and so on, are marked as well, so that every object that is reachable via the root set is marked.

Phase 2: Sweeping (Reclaiming Memory)

In the second stage, the sweep stage, all memory is scanned from start to finish, examining all free or used blocks; those not marked as being 'in-use' are not reachable by any roots, and their memory is freed. For objects which were marked in-use, the in-use flag is cleared, preparing for the next cycle.
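
Here is a toy JavaScript simulation of the two phases, assuming a pretend heap represented as an array of records with a mark bit and child indices; a real engine works on raw memory, but the shape of the algorithm is the same:

```javascript
// Toy heap: each "object" carries a mark bit and a list of references
// (indices into the heap array). Purely a model, not an engine's layout.
const heap = [
  { id: "A", marked: false, refs: [1] }, // A -> B
  { id: "B", marked: false, refs: [2] }, // B -> C
  { id: "C", marked: false, refs: [] },
  { id: "D", marked: false, refs: [4] }, // D -> E, but nothing roots D
  { id: "E", marked: false, refs: [3] }, // E -> D (a cycle, still garbage)
];

const roots = [0]; // pretend only object A is referenced from the root set

// Phase 1: mark everything reachable from the roots.
function mark(index) {
  const obj = heap[index];
  if (obj.marked) return;  // already visited
  obj.marked = true;
  obj.refs.forEach(mark);  // recurse into children
}
roots.forEach(mark);

// Phase 2: sweep the whole heap, freeing unmarked objects and
// clearing the mark bit on survivors to prepare for the next cycle.
for (const obj of heap) {
  if (obj.marked) {
    obj.marked = false;
  } else {
    console.log(`freeing ${obj.id}`); // D and E are freed despite their cycle
  }
}
```

Notice that D and E are reclaimed even though they reference each other; this is the circular-reference case that plain reference counting (covered later) cannot handle.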

This method has several disadvantages, the most notable being that the entire system must be suspended during collection; no mutation of the working set can be allowed. This can cause programs to 'freeze' periodically (and generally unpredictably), making some real-time and time-critical applications impossible.

Tri-color marking

This is a specific technique used within the mark phase of mark-and-sweep that categorizes objects into three colors. Because of the performance problems associated with a naive mark-and-sweep GC, most modern tracing garbage collectors implement some variant of the tri-color marking abstraction. Three sets are created – white, grey and black:

  • The white set, or condemned set, is the set of objects that are candidates for having their memory recycled.
  • The grey set contains objects that have been discovered but whose children haven't been fully scanned yet.
  • The black set is the set of objects that have been visited and whose children have all been visited as well, so they are confirmed as live.

In many algorithms, initially the black set starts as empty, the grey set is the set of objects which are directly referenced from roots and the white set includes all other objects. Every object in memory is at all times in exactly one of the three sets.

The algorithm proceeds as follows:

  • Pick an object o from the grey set
  • Move each white object that o references to the grey set. This ensures that neither this object nor any object it references can be garbage-collected.
  • Move o to the black set
  • Repeat the last three steps until the grey set is empty.

When the grey set is empty, the scan is complete; the black objects are reachable from the roots, while the white objects are not and can be garbage-collected.
Since all objects not immediately reachable from the roots are added to the white set, and objects can only move from white to grey and from grey to black, the algorithm preserves an important invariant – no black objects reference white objects. This ensures that the white objects can be freed once the grey set is empty. This is called the tri-color invariant.
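
Below is a small JavaScript sketch of the tri-color pass over a made-up object graph; the three sets mirror the steps above, although a real collector interleaves this work with the running program:

```javascript
// Objects are keyed by name; `refs` lists the names each one points to.
const objects = {
  A: { refs: ["B"] },
  B: { refs: ["C"] },
  C: { refs: [] },
  D: { refs: ["E"] }, // D and E form an unreachable cycle
  E: { refs: ["D"] },
};
const roots = ["A"];

const white = new Set(Object.keys(objects)); // initially everything is white
const grey = new Set();
const black = new Set(); // initially empty

// Objects directly referenced from the roots start out grey.
for (const name of roots) {
  white.delete(name);
  grey.add(name);
}

// Repeatedly pick a grey object, grey its white children, then blacken it.
while (grey.size > 0) {
  const name = grey.values().next().value;
  grey.delete(name);
  for (const child of objects[name].refs) {
    if (white.has(child)) {
      white.delete(child);
      grey.add(child);
    }
  }
  black.add(name);
}

console.log("live:", [...black]);    // A, B, C
console.log("garbage:", [...white]); // D, E
```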

The tri-color method has an important advantage – it can be performed "on-the-fly", without halting the system for significant periods of time. The system can perform garbage collection periodically, rather than as needed, and the need to touch the entire working set on each cycle is avoided.

Non-Moving vs Moving Systems

The garbage collector may simply discard the unreachable objects and leave the rest of the heap as it is; this kind of collector is known as a non-moving garbage collector.

In other systems the garbage collector moves some or all of the reachable objects into a new area of memory and updates all references to those objects; this is known as a moving garbage collector.

Why the distinction? On paper the moving GC may appear to be doing too much compared to the non-moving GC; however, the moving approach leads to several performance advantages, both during the garbage collection cycle itself and during program execution, some of which are:

  • The space that was vacated when the moving GC copied the reachable objects into a new region is free space without the system doing any extra work. In contrast, a non-moving GC has to visit the memory occupied by every unreachable object and mark it as free.
  • It lets the GC allocate new objects rapidly, because moving naturally creates large contiguous regions of free memory; all the system needs to do is bump a free-memory pointer to hand out more memory. A non-moving approach will eventually lead to a fragmented heap, which in turn requires expensive searches through the list of free blocks in order to allocate new objects (see the sketch after this list).
  • Objects can be moved very close to the objects they refer to in memory, increasing the chance that they will be located in the same cache line or virtual memory page. This can significantly speed up access to these objects through these references.
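
To illustrate the allocation point in particular, here is a rough JavaScript sketch contrasting bump-pointer allocation (which a moving, compacting collector makes possible) with first-fit allocation from a free list (what a non-moving collector typically relies on); the sizes and helper names are invented for the example:

```javascript
// After a moving/compacting collection, live objects sit at the start of the
// region, so allocating is just advancing a single pointer. Sizes are bytes.
const region = { size: 1024, free: 0 }; // `free` points at the first unused byte

function bumpAllocate(bytes) {
  if (region.free + bytes > region.size) return null; // would trigger a GC cycle
  const address = region.free;
  region.free += bytes; // the whole "allocation" is one addition
  return address;
}

// A non-moving collector instead searches a free list of holes left behind
// by dead objects, and that list fragments over time.
const freeList = [
  { address: 128, bytes: 32 },
  { address: 512, bytes: 96 },
];

function freeListAllocate(bytes) {
  const i = freeList.findIndex(block => block.bytes >= bytes); // first fit
  if (i === -1) return null;
  const block = freeList.splice(i, 1)[0];
  if (block.bytes > bytes) {
    // return the unused tail of the block to the free list
    freeList.push({ address: block.address + bytes, bytes: block.bytes - bytes });
  }
  return block.address;
}

console.log(bumpAllocate(64), bumpAllocate(64)); // 0, 64
console.log(freeListAllocate(40));               // 512 (the 32-byte hole is too small)
```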

It is not all rosy with moving systems, because their implementation leaves little room for pointer arithmetic, and all references to objects have to be managed by the garbage-collected environment. This is because any raw pointer to an object would be invalidated if the garbage collector moved that object.

Reference Counting

Reference counting is an alternative approach to automatic memory management where each object tracks how many references point to it. When references drop to zero, the object is immediately collected. The main advantage of the reference counting over tracing garbage collection is that objects are reclaimed as soon as they can no longer be referenced, and in an incremental fashion.

In real-time applications or systems with limited memory, this is important for maintaining responsiveness. Reference counting performance does not deteriorate as the total amount of free space decreases. Reference counting has three main disadvantages compared to tracing garbage collection:

  • The frequent updates it involves are a source of inefficiency. Reference counting requires every memory-managed object to reserve space for a reference count. In tracing garbage collectors, this information is stored implicitly in the references that refer to that object, saving space.
  • Reference counting can't handle reference cycles, an object which refers directly or indirectly to itself. A mechanism relying purely on reference counts will never consider cyclic chains of objects for deletion, since their reference count is guaranteed to stay nonzero. Methods for dealing with this issue exist but can also increase the overhead and complexity of reference counting.
  • In a concurrent setting, all updates of the reference counts and all pointer modifications must be atomic operations, which incurs an additional cost.
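
To make the mechanics, and the cycle problem, concrete, here is a hand-rolled JavaScript model of reference counting. JavaScript engines do not expose anything like this, so the retain/release helpers are purely illustrative:

```javascript
function makeObject(name) {
  return { name, refCount: 0, refs: new Set() };
}

function retain(obj) { obj.refCount += 1; }

function release(obj) {
  obj.refCount -= 1;
  if (obj.refCount === 0) {
    console.log(`${obj.name} freed immediately`); // reclaimed as soon as unused
    for (const child of obj.refs) release(child); // drop references it held
  }
}

// Normal case: dropping the last reference reclaims the object right away.
const doc = makeObject("document");
retain(doc);  // one owner
release(doc); // "document freed immediately"

// Cycle case: two objects referencing each other never reach zero.
const a = makeObject("a");
const b = makeObject("b");
retain(a); retain(b);     // owned by the program
a.refs.add(b); retain(b); // a -> b
b.refs.add(a); retain(a); // b -> a
release(a); release(b);   // the program drops its references...
// ...but each object still holds the other, so both counts stay at 1: a leak.
console.log(a.refCount, b.refCount); // 1 1
```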

Implementation Strategies

Weighted reference counting

In weighted reference counting, each reference is assigned a weight, and each object tracks not the number of references referring to it, but the total weight of the references referring to it. The first reference to a newly created object starts with a large weight, and whenever a reference is copied, its weight is split in two: one half stays with the original reference and the other half goes to the new copy.

Since the total weight does not change, the object's reference count does not need to be updated. Destroying a reference decrements the total weight by the weight of that reference. When the total weight becomes zero, all references have been destroyed. Weighted reference counting was independently devised by Bevan[12] and Watson & Watson[13] in 1987.
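
A rough JavaScript sketch of the idea, with an invented starting weight and helper names, might look like this:

```javascript
function makeObject(name) {
  return { name, totalWeight: 0 };
}

// The first reference to an object carries the full starting weight.
function makeReference(obj, weight = 1 << 16) {
  obj.totalWeight += weight;
  return { target: obj, weight };
}

// Copying a reference splits its weight in half; the object's total weight
// is untouched, so the object itself never needs to be updated.
function copyReference(ref) {
  const half = ref.weight / 2;
  ref.weight = half;
  return { target: ref.target, weight: half };
}

// Destroying a reference subtracts its weight from the object's total.
function dropReference(ref) {
  ref.target.totalWeight -= ref.weight;
  if (ref.target.totalWeight === 0) {
    console.log(`${ref.target.name} can be reclaimed`);
  }
}

const obj = makeObject("shared");
const r1 = makeReference(obj); // weight 65536, total 65536
const r2 = copyReference(r1);  // r1 and r2 each weigh 32768, total unchanged
dropReference(r1);             // total 32768
dropReference(r2);             // total 0 -> "shared can be reclaimed"
```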

Deferred (indirect) reference counting

Deferred reference counting (DRC) is an optimization of reference counting (RC) that mitigates its main source of overhead by delaying the update of reference counts for certain references, particularly those stored on the stack.

Instead of updating counts for every reference change, it postpones these updates until a designated "reconciliation phase". This can significantly improve performance, especially in languages where stack references are common.
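
As a very rough model (the zero-count table and helper names here are assumptions for illustration, not how any particular runtime implements it), deferred reference counting might be sketched like this:

```javascript
// Heap-to-heap references update counts immediately; stack references are
// ignored until a reconciliation phase scans the stack.
const zeroCountTable = new Set(); // objects whose counted references hit zero

function makeObject(name) {
  const obj = { name, heapRefCount: 0 };
  zeroCountTable.add(obj); // starts with no counted (heap) references
  return obj;
}

function addHeapRef(obj) {
  obj.heapRefCount += 1;
  zeroCountTable.delete(obj);
}

function dropHeapRef(obj) {
  obj.heapRefCount -= 1;
  if (obj.heapRefCount === 0) zeroCountTable.add(obj); // maybe garbage, maybe not
}

// Reconciliation: anything in the zero-count table that the stack does not
// reference is truly unreachable and can be freed.
function reconcile(stackRefs) {
  for (const obj of zeroCountTable) {
    if (!stackRefs.has(obj)) {
      console.log(`${obj.name} reclaimed during reconciliation`);
      zeroCountTable.delete(obj);
    }
  }
}

const temp = makeObject("temp");   // only ever referenced from the "stack"
const shared = makeObject("shared");
addHeapRef(shared);                // some heap object points at `shared`

reconcile(new Set([temp]));        // `temp` is still on the stack: nothing freed
reconcile(new Set());              // the frame is gone: "temp reclaimed during reconciliation"
```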

Some drawbacks of deferred reference counting are:

  • It requires compiler support to identify and track references, especially those stored in registers and on the stack.
  • This strategy cannot reclaim objects with cyclical references (where objects refer to each other in a loop).
  • It may require a stop-the-world reconciliation phase, which can pause execution.
