Memory allocation in games
In my previous article I touched on this question, but it wasn't the primary topic. Today we discuss garbage collection (GC) in games and in graphics/physics simulation software, where program speed matters most.
Where do best practices come from?
Almost everywhere you can hear or read this opinion, presented as fact: "Avoid lots of allocations, reuse objects you have already allocated, use object pools."
Those arguments, I strongly suspect, come from the world of programming languages without a default GC (C, C++, Rust and others), where a memory allocation is, by default, a malloc call, which can be slow.
Today we focus on languages with a GC, such as C#, Java and Python (missed your favorite language? Ask for it in the comments!).
In those languages (C#/Java/Python), the advice to avoid allocations and reuse memory through object pools leads to performance losses and unnecessary code complexity, and this article explains why.
Object pools vs GC
What is an object pool? It is an optimization pattern meant to save the time spent allocating and releasing memory for heavily used objects (a concrete sketch follows the list below). Applying this pattern in a language with a GC leaves you with lots of long-living objects on the heap and almost no short-living ones, which has several negative effects:
Object pool
- longer garbage collections for everything outside the pool, because the heap is full of old objects
- heap fragmentation
- objects constantly moving in and out of the pool land in scattered memory locations, increasing CPU cache misses
- no predictable allocation patterns, so the GC can't optimize
- overhead of adding, removing and searching for objects in the pool
- memory reallocation whenever the pool has to grow
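To make that overhead concrete, here is a minimal sketch of the pattern in Java (Bullet and BulletPool are hypothetical names, just for illustration, not code from my demo):

import java.util.ArrayDeque;

// Hypothetical pooled object: a projectile in a game.
class Bullet {
    double x, y, vx, vy;
    void reset(double x, double y, double vx, double vy) {
        this.x = x; this.y = y; this.vx = vx; this.vy = vy;
    }
}

// Minimal pool: acquire() reuses an old object when one is available.
class BulletPool {
    private final ArrayDeque<Bullet> free = new ArrayDeque<>();

    Bullet acquire(double x, double y, double vx, double vy) {
        // Bookkeeping on every acquire: check the pool, pop or allocate.
        Bullet b = free.isEmpty() ? new Bullet() : free.pop();
        b.reset(x, y, vx, vy); // must re-initialize every field by hand
        return b;
    }

    void release(Bullet b) {
        free.push(b); // more bookkeeping; forgetting this call leaks objects
    }
}

Every Bullet the pool keeps alive survives collection after collection, so a generational GC promotes it to the old generation: exactly the long-living population described above.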
In practice, the results I observed were exactly what is described here: lower performance, lower FPS.
Now, what if we don't use an object pool at all and just let the GC do its work? The picture is completely opposite (a contrasting sketch follows the list):
GC
- shorter garbage collections, because most objects are young
- no heap fragmentation
- objects are allocated sequentially, which is perfect for the CPU cache
- young objects are born and die together, forming predictable patterns the GC is built for
- allocation is usually just bumping the heap pointer, which is very fast
- very flexible: no need to control the number of live objects
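For contrast, a minimal sketch of the no-pool version (again, the names are hypothetical): short-lived objects are created fresh and dropped immediately, which is exactly the pattern a generational GC is designed around:

// Hypothetical immutable value object, allocated fresh every frame (Java 16+ record).
record Bullet(double x, double y, double vx, double vy) {

    // Returns a new Bullet instead of mutating this one; the old instance
    // becomes garbage right away and dies young in the next collection.
    Bullet step(double dt) {
        return new Bullet(x + vx * dt, y + vy * dt, vx, vy);
    }
}

A young collection copies only the survivors, so its cost is proportional to the live data, not to the amount of garbage produced; allocating and dropping millions of such objects per second is close to free.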
Why doesn't it work in Unity, Godot or Unreal Engine?
My advice works in the context of developing the whole system with a single approach. Popular game engines such as Unity, Godot and Unreal Engine provide ready-to-use systems optimized for common practices. The Unity engine keeps references to all active instances of GameObject, Component and other UnityEngine.Object descendants, and destroying those objects means removing the engine's references to them. Those objects are expected to live long, respond to engine events and update their internal state accordingly. So the Unity engine resembles one large object pool in terms of the negative performance effects described above.
Unreal Engine and Godot are both implemented in C++, so each of them manages memory in its own way, optimized for the most common use cases, and lots of allocations will definitely reduce performance there.
Real evidence
Here is an excerpt from the GC logs of one of my game demos, made with Clojure and Java.
These logs demonstrate performance on a 25-vs-25 battle scene, 50 units in total.
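For reference, this log format comes from the JVM's unified logging (JDK 9+). A minimal way to produce it yourself (the jar name here is hypothetical):

java -Xlog:gc -jar game-demo.jar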
[50,037s][info][gc] GC(62) Pause Young (Normal) (G1 Evacuation Pause) 447M->213M(505M) 7,961ms
[51,514s][info][gc] GC(63) Pause Young (Normal) (G1 Evacuation Pause) 448M->213M(505M) 4,246ms
[52,978s][info][gc] GC(64) Pause Young (Normal) (G1 Evacuation Pause) 448M->213M(505M) 3,502ms
[54,363s][info][gc] GC(65) Pause Young (Normal) (G1 Evacuation Pause) 449M->213M(505M) 3,223ms
[55,727s][info][gc] GC(66) Pause Young (Normal) (G1 Evacuation Pause) 448M->213M(505M) 2,372ms
[57,044s][info][gc] GC(67) Pause Young (Normal) (G1 Evacuation Pause) 448M->213M(505M) 2,780ms
[58,419s][info][gc] GC(68) Pause Young (Normal) (G1 Evacuation Pause) 448M->213M(505M) 2,282ms
[59,752s][info][gc] GC(69) Pause Young (Normal) (G1 Evacuation Pause) 449M->213M(505M) 3,023ms
[61,066s][info][gc] GC(70) Pause Young (Normal) (G1 Evacuation Pause) 449M->213M(505M) 3,137ms
[62,388s][info][gc] GC(71) Pause Young (Normal) (G1 Evacuation Pause) 449M->213M(505M) 2,975ms
[63,694s][info][gc] GC(72) Pause Young (Normal) (G1 Evacuation Pause) 449M->213M(505M) 2,451ms
[65,016s][info][gc] GC(73) Pause Young (Normal) (G1 Evacuation Pause) 449M->213M(505M) 4,378ms
[66,620s][info][gc] GC(74) Pause Young (Normal) (G1 Evacuation Pause) 449M->213M(505M) 3,651ms
[67,930s][info][gc] GC(75) Pause Young (Normal) (G1 Evacuation Pause) 449M->213M(505M) 2,560ms
As you can see, a young collection happens roughly every 1.3-1.5 seconds, generally takes ~3ms and reclaims ~235MB of garbage (448MB before the pause minus 213MB after it). That also means those ~235MB were allocated during the preceding ~1.4 seconds. So what do we have?

- the game allocates GC-managed memory at a rate of roughly 170MB/s
- a young collection runs about once every 1.4 seconds
- GC takes roughly 0.2% of total time
- each collection reclaims ~235MB of garbage
- peak memory consumption stays at 505MB (perfect for systems with limited RAM)
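For transparency, the arithmetic behind those numbers, taken straight from the log lines above:

garbage per cycle:  448MB - 213MB ≈ 235MB
allocation rate:    235MB / ~1.4s ≈ 170MB/s
GC share of time:   ~3ms / ~1400ms ≈ 0.2%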
You might expect that I used some 16-core 5GHz CPU to achieve these results, but no, here is the actual one:
lulook@lulook:~$ lscpu | grep "Model name"
Model name:          Intel(R) Core(TM) m3-8100Y CPU @ 1.10GHz
Repeating my previous article
Most programming advice works only in context, so don't take it on faith; always test it against your own use cases.
The limits everyone talks about may exist only because of poor system design or premature optimization in one context; step outside that context, and the rules change.