Discussion on: How does your language handle memory?

View post

I know Python a little better than I know Ruby and I'm not a GC expert so I'm going to keep it as easy to understand (for me too :D) as I can:

Python

Python has garbage collection as many other high level languages. Its GC uses a reference counting algorithm AND a generational algorithm.

Let's say variables are "mere" labels to some value in memory. Each variable pointing to the same value is a reference. When there are no variables pointing to that memory space the GC is free to collect the memory occupied by that value.

So, for example:

class Foo:
  pass

a = Foo()
b = a
c = b

In this case, in memory, there are three pointers to the istance of Foo(), you can verify that by using the id() function:

>>> class Foo: pass
...
>>> a = Foo()
>>> b = a
>>> c = b
>>> id(a), id(b), id(c)
(4468012592, 4468012592, 4468012592)

As you can see all three point to the same object, which means it has a reference count of 3.

If I were to delete all three references, for example by using del:

>>> del a; del b; del c

at that point in memory the GC can (optionally) decide to collect the space occupied by the instance of Foo() because the reference count of that object is 0.

Primitives types like numbers have only one instance of the value in memory, so they are not tracked by the GC.

As you can see variables that hold numbers with the same value point to the same object in memory:

>>> a = 1
>>> b = 1
>>> c = 1
>>> id(a), id(b), id(c)
(4464022176, 4464022176, 4464022176)
>>> import gc
>>> gc.is_tracked(a)
False

Objects instead are tracked by the GC:

>>> gc.is_tracked(Foo())
True

There's also a way to know how many references an object has:

>>> a = Foo()
>>> b = a
>>> c = a
>>> sys.getrefcount(a)
4

It says 4 because it's 3 + the one created to run .getrefcount()

Reference counting doesn't solve all the problems though, if you have circular references (two objects containing pointers to each other, sort of like a double linked list) RC won't help because it can't detect them.

So Python has an additional GC, the generational one (see my below explanation for Ruby) to fix the issue.

To recap:

Reference counting as an automatic GC
Generational GC as an optional GC

Resources:

Things you need to know about garbage collection in Python
gc module
Gargabe collection for Python - This page is at least 17 years old!!!

Ruby

Ruby used to have a simple mark & sweep algorithm, which means: cycle through all objects, mark all of them as living and sweep away those who are not living (those that are not reachable).

Since M&S can be quite slow and low performing (traversing all the allocated memory each cycle!!) they changed in Ruby 2.2 with a generational algorithm.

The difference is that generational GCs divide up the memory in spaces (generations) divided by age. The youngest objects are in one space, the older objects in another and so on.

This means that the GC can decide where to concentrate its resources, usually there are way more short lived objects than they are long (especially if you have a functional programming style ;-)) so that's where the GC starts.

It frees up the memory of the youngest generation before it moves into the space of the older ones.

So when the GC pauses to collect from the "young guns" it's faster than having to go through all of them each time.

Resources:

Incremental Garbage Collection in Ruby 2.2

GCs are way more complicated than my explanation but I hope it helps :-D