A while ago I read a nice story about how Instagram disabled garbage collection for their Python apps, and memory usage dropped while performance improved by 15%. This seems counter-intuitive at first, but it uncovers amazing details about Python (namely, CPython) memory management.
Instagram disabling garbage collection FTW!
Instagram is a Python/Django app that runs on uWSGI.
To run a Python app, the uWSGI master process forks and launches the app in child processes. This should leverage the Copy-on-Write (CoW) mechanism in Linux: memory is shared among the processes as long as it's not modified. Shared memory is good because it doesn't waste RAM (it's shared, after all) and it improves the cache hit ratio, because multiple processes read the same memory. Apps launched by uWSGI are mostly identical, since they run the same code, so there should be a lot of memory shared between the uWSGI master and child processes. But, instead, shared memory was dropping at the start of each process.
At first, they thought it was because of reference counting: every read of an object, including immutable ones like code objects, causes a write to that object's memory to update its reference counter, which breaks CoW sharing. But disabling reference counting didn't prove that, so they went for profiling!
With the help of perf, they found out that it was the garbage collector, specifically the `collect` function, that caused most of the page faults.
So they decided to disable the garbage collector, since reference counting would still be there to free memory. CPython provides the `gc` module that lets you control garbage collection. The Instagram folks found that it's better to use `gc.set_threshold(0)` instead of `gc.disable()`, because some library (like msgpack in their case) can re-enable it, while `gc.set_threshold(0)` sets the collection threshold to zero, effectively disabling collection, and it's also immune to any subsequent `gc.enable()` calls.
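Here is a minimal sketch of what that startup tweak looks like (their post doesn't show exactly where they put it, so treat this as an illustration):

```python
import gc

# Setting the generation-0 threshold to zero means collection is never
# triggered by allocation counts, which effectively disables the cyclic GC.
# Unlike gc.disable(), a later gc.enable() call (e.g. from a library such as
# msgpack) doesn't bring collection back, because the threshold stays at 0.
gc.set_threshold(0)

print(gc.get_threshold())  # e.g. (0, 10, 10)
```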
This worked, but garbage collection was still triggered at the exit of each child process and thrashed the CPU for a whole minute, which is useless because the process was about to be replaced by a new one anyway. This can be avoided in two ways:
- Adding `atexit.register(os._exit, 0)`. This tells your Python program to hard-exit at shutdown without any further cleanup (see the sketch after this list).
- Using the `--skip-atexit-teardown` option in recent uWSGI.
With all these hacks in place, the following now happens:
- The uWSGI master process launches a handful of children for the application
- GC is disabled while a child is starting up, so it's not causing a lot of page faults, preserving CoW, allowing the master and children to share much more memory, and providing a much higher CPU cache hit ratio
- When a child dies, it does its own cleanup but skips the final GC, saving shutdown time and preventing the useless CPU thrashing
(Python) memory management
What I've discovered from this story is that CPython has an interesting scheme for automatic memory management: it uses reference counting to release memory that is no longer used and a tracing generational garbage collector to fight reference cycles.
So this is how reference counting works. Each object in Python has a reference counter (`ob_refcnt` in the `PyObject` struct) - a special field that is incremented when the object gains a reference (e.g. it's added to a list or passed to a function) and decremented when a reference goes away. When the counter drops to zero, the object is released by the runtime.
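You can watch the counter change with `sys.getrefcount` (the reported value is one higher than you might expect, because the call itself holds a temporary reference to its argument):

```python
import sys

obj = object()
print(sys.getrefcount(obj))  # e.g. 2: the 'obj' name plus the temporary argument reference

container = [obj]            # adding the object to a list increments its counter
print(sys.getrefcount(obj))  # e.g. 3

del container                # dropping the list decrements it again
print(sys.getrefcount(obj))  # back to e.g. 2
```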
Reference counting is a very nice and simple method for automatic memory management. It's deterministic and avoids any background processing, which makes it more efficient on low-power systems such as mobile devices.
But, unfortunately, it has some really bad flaws. First, it adds overhead for storing the reference counter in every single object. Second, for multithreaded apps ref counting has to be atomic and thus must be _synchronized between CPU cores_, which is slow. And finally, references can form cycles which prevent the counters from ever reaching zero, so such cyclic objects remain allocated forever.
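A tiny example of the cycle problem: with the cyclic collector out of the picture, reference counting alone never reclaims these two objects.

```python
import gc

gc.disable()                 # leave only reference counting in play

class Node:
    def __init__(self):
        self.other = None

a, b = Node(), Node()
a.other, b.other = b, a      # a and b now reference each other

del a, b                     # no external references remain, but each object's
                             # counter is still 1 because of the cycle, so
                             # reference counting alone never frees them

# A manual run of the cyclic collector finds and frees them; it returns the
# number of unreachable objects it found (the two nodes plus their attribute dicts).
print(gc.collect())
```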
Anyway, CPython uses reference counting as the main method for memory management, and the drawbacks are not that scary in most cases. The memory overhead for storing ref counters is not really noticeable - even for a million objects it would be only about 8 MB (the ref counter is a `Py_ssize_t`, which is 8 bytes on 64-bit systems). Synchronization of ref counting is not needed because CPython has the Global Interpreter Lock (GIL).
The only problem left is fighting cycles. That's why CPython periodically invokes a tracing garbage collector. CPython's GC is generational, i.e. it has 3 generations - 0, 1 and 2, where 0 is the youngest generation where all objects are born and 2 is the oldest generation where objects live until the process exits. Objects that survive a collection get promoted to the next generation.
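The `gc` module exposes the thresholds and per-generation counters, so you can observe this yourself (the exact numbers vary between runs and CPython versions):

```python
import gc

print(gc.get_threshold())  # e.g. (700, 10, 10): a gen-0 collection is triggered once
                           # allocations outnumber deallocations by ~700

print(gc.get_count())      # e.g. (364, 2, 1): new objects since the last gen-0 pass,
                           # and how many times the younger generation has been
                           # collected since gen 1 / gen 2 last ran

gc.collect(0)              # collect only the youngest generation; survivors are
                           # promoted to generation 1
gc.collect()               # full collection across all three generations
```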
The idea of dividing objects into generations is based on the heuristic that most allocated objects are short-lived, so the GC should try to free them more frequently than longer-lived objects, which usually live forever.
All of this might seem complicated, but I think it's a good tradeoff for CPython to employ such a scheme. Some might say - why not leave only the GC, like most languages do? Well, GC has its own drawbacks. First, it would have to run in the background, which is not really possible in CPython because of the GIL, so GC is a stop-the-world process. And second, because GC happens in the background, the exact moment an object is released is nondeterministic.
So I think for CPython it’s a good balance to use ref counting and GC to complement each other.
In the end, CPython is not the only language/runtime that uses reference counting. Objective-C and Swift have compile-time automatic reference counting (ARC). Remember that ref counting is more deterministic, which is a huge win for iOS devices.
Rust also offers reference counting via its `Rc` and `Arc` types.
C++ has smart pointers (like `std::shared_ptr`), which are basically objects with reference counters; the managed object is destroyed as soon as the count drops to zero.
Many other languages, like Perl and PHP, also use reference counting for memory management.
But, yeah, most languages now are based on a pure tracing GC:
- Java/JVM
- C#/CLR
- Go
- Haskell/GHC
- Ruby/MRI
- Many others like Lisp
Conclusion
CPython has an interesting scheme for managing memory - object lifetimes are managed by reference counting, and to fight cycles it employs a tracing garbage collector.