Garbage collection - explain it like I'm 5 request (language agnostic)

Mike Oram ・1 min read

I have been looking for an article about garbage collection for junior developers that gives a high-level overview of what it is, why we need it and how it works, but my search has been unsuccessful.

So I invite everyone to post their best "explain-it-like-I'm-five" descriptions of garbage collection in the comments. Anything more detailed, aimed at a junior dev, would be great too!



Imagine your computer memory is a collection of storage units.

Your operating system is the owner of these storage units.
Your computer program is a customer that will rent these units.

Your program will store all kinds of things there: buttons, images, text, etc.
Whenever your program needs more storage space, it asks your operating system for the key to a free unit.
Your operating system will find a free unit and provide your program with a key to access it.

Your operating system cannot automatically take these keys from your program.
Your program must manage all these keys by itself; it must return each key when it no longer needs that unit.
This is cumbersome for programmers because it means we need to keep track of all these keys (and there can be a lot of them). Forget to return a key, and that unit stays rented for nothing: this is called a memory leak.
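To see why the bookkeeping hurts, here is a toy model of the analogy in Python. It does not touch real memory: `Landlord`, `rent_unit` and `return_key` are hypothetical names standing in for the operating system and its keys. A program that forgets to return keys eventually finds no free units left.

```python
# Toy model of the storage-unit analogy; nothing here manages real memory.
# Landlord, rent_unit and return_key are hypothetical names for illustration.
class Landlord:
    def __init__(self, total_units: int) -> None:
        self.free_units = list(range(total_units))  # units nobody is renting
        self.rented: set[int] = set()               # keys currently handed out

    def rent_unit(self) -> int:
        if not self.free_units:
            raise MemoryError("no free storage units left")
        key = self.free_units.pop()
        self.rented.add(key)
        return key

    def return_key(self, key: int) -> None:
        self.rented.remove(key)
        self.free_units.append(key)


landlord = Landlord(total_units=3)

# A careful program returns every key it no longer needs...
key = landlord.rent_unit()
landlord.return_key(key)

# ...but each forgotten return_key "leaks" a unit for good.
for _ in range(3):
    landlord.rent_unit()  # the key is thrown away and never returned

try:
    landlord.rent_unit()
except MemoryError as err:
    print("out of units:", err)
```

In a language without a garbage collector, every rented unit needs a matching return written by hand, in every code path.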

One solution to the key problem is the garbage collector (GC).
The garbage collector is your program's janitor.
It regularly inspects all rented storage units to check whether your program can still get to them.
If the garbage collector finds a unit that your program can no longer get to, it automatically gives the key back so the unit can be rented out again.

This happens automatically, without programmers even having to think about it.
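How the janitor decides what is "in use" differs between languages. One classic strategy is mark-and-sweep: start from what the program can reach directly (its "roots", such as local and global variables), follow every reference and mark what is reachable, then sweep away everything left unmarked. Below is a toy simulation in Python over a made-up heap of named objects; the names are hypothetical, and real collectors work on raw memory and are far more sophisticated.

```python
# Toy mark-and-sweep collector over a hypothetical heap of named objects.
def mark(roots: list[str], references: dict[str, list[str]]) -> set[str]:
    """Return every object reachable from the roots by following references."""
    marked: set[str] = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue  # already visited; this also stops us looping on cycles
        marked.add(obj)
        stack.extend(references.get(obj, []))
    return marked


def sweep(heap: set[str], marked: set[str]) -> set[str]:
    """Return the objects that were not marked: the garbage to reclaim."""
    return heap - marked


# Each key lists the objects it refers to.
references = {
    "window": ["button", "image"],
    "button": ["label"],
    "old_image": ["thumbnail"],  # nothing refers to old_image any more
    "a": ["b"],                  # a and b refer to each other,
    "b": ["a"],                  # but nothing else refers to them
}
heap = {"window", "button", "image", "label", "old_image", "thumbnail", "a", "b"}
roots = ["window"]  # what the program can still reach directly

garbage = sweep(heap, mark(roots, references))
print(sorted(garbage))  # ['a', 'b', 'old_image', 'thumbnail']
```

Note that `a` and `b` are collected even though they point at each other: what matters is whether the program can reach them from its roots, not whether anything at all points at them.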


It's all about understanding how memory works.

Just as in physics we measure mass in grams, in computing we measure size in bytes. Every file you download and every character you print takes a specific amount of your computer's storage. These are basics you probably already know.
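To make "every character takes storage" concrete, this Python snippet counts the bytes a few strings need, assuming they are stored as UTF-8. Plain English letters take one byte each, while many other characters take two or more. This counts only the encoded text; the object holding it in memory needs some extra bookkeeping bytes on top.

```python
# How many bytes does some text take when encoded as UTF-8?
sizes = {text: len(text.encode("utf-8")) for text in ["hello", "héllo", "日本"]}

for text, size in sizes.items():
    print(f"{text!r}: {len(text)} characters, {size} bytes")
```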

Now, there are two types of storage in your computer: primary and secondary. Primary storage, usually known as memory or RAM, can be accessed directly by the processor, which makes it fast; the trade-off is its comparatively low capacity.
Secondary storage is your hard drive (or SSD); it's where all your files and applications are stored. The difference is that the processor cannot access it directly, so its data must first be loaded into primary storage, which makes it significantly slower to work with. On the other hand, its capacity is so much larger that you rarely need to worry about it.

Now, when you write code, you have variables that contain data, and that data has a size and therefore consumes memory. As the number of variables grows, so does your program's memory consumption, and when that consumption reaches the limit, the program can crash (typically with an out-of-memory error), because there is nowhere left to store the data.
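You can watch this growth happen. The sketch below uses Python's standard `tracemalloc` module, which tracks memory allocated through Python; the item count is arbitrary and the exact figure printed will vary between machines and Python versions. It stops well short of the limit; if a real program kept going, Python would eventually report a `MemoryError`.

```python
import tracemalloc

tracemalloc.start()  # begin tracking memory allocated by Python code

data = []
before, _ = tracemalloc.get_traced_memory()  # (current, peak) in bytes
for i in range(100_000):
    data.append(f"item {i}")  # every new string needs more memory
after, _ = tracemalloc.get_traced_memory()

print(f"memory in use grew by about {(after - before) // 1024} KiB")
tracemalloc.stop()
```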

That's where garbage collection comes in. In short, it gets rid of data that is no longer used and so frees up memory that can then be used for other variables. It works differently in different languages, but the basic idea is to reclaim memory as soon as no part of the program can reach that data any more. For example, a function's local variables go out of scope when the function returns, so anything only they referred to becomes garbage (one of the reasons why it's a good idea to split code into smaller functions).
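In a garbage-collected language you can observe this directly. The Python sketch below, using the standard `weakref` and `gc` modules, creates an object inside a function and checks after the function returns whether it is still alive. `Report` and `build_report` are made-up names. CPython mostly frees memory through reference counting, with a cycle-detecting collector on top, so other languages and runtimes may reclaim objects at different moments.

```python
import gc
import weakref


class Report:
    """A stand-in for any object a function creates (hypothetical example)."""

    def __init__(self, text: str) -> None:
        self.text = text


def build_report() -> weakref.ref:
    report = Report("quarterly numbers")  # only this local variable refers to it
    # A weak reference lets us watch the object without keeping it alive.
    return weakref.ref(report)


watcher = build_report()
gc.collect()  # CPython usually frees it already; this lets other runtimes catch up
print(watcher() is None)  # True: nothing can reach the Report, so it was reclaimed

kept = Report("still needed")
kept_watcher = weakref.ref(kept)
gc.collect()
print(kept_watcher() is None)  # False: the variable `kept` still refers to it
```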