Heads up! This blog post is a reading log I'm writing. I'm spending 30 minutes a day reading The Book of the Runtime.
I left off the last reading log just as it started to cover how the allocator component of the garbage collector in the CLR works. Continuing from there, the book clarifies that when objects are allocated, they are categorized as large or small objects depending on their size. The distinction is important because an object's size affects how difficult it is to garbage collect.
As I recall, this type of categorization is pretty common across GC implementations, so the CLR is not unique here. Correct me if I completely imagined things in my programming language design courses.
After this, the book introduced two new keywords: an allocation context and an allocation quantum. Oh boy! Did the word quantum just get dropped on us? I'm gonna be real with y'all. I got a little bit lost in this lingo.
- Allocation contexts are smaller regions of a given heap segment that are each dedicated for use by a given thread. On a single-processor (meaning 1 logical processor) machine, a single context is used, which is the generation 0 allocation context.
- The allocation quantum is the size of memory that the allocator allocates each time it needs more memory, in order to perform object allocations within an allocation context. The quantum is typically 8k, and the average size of managed objects is around 35 bytes, enabling a single allocation quantum to be used for many object allocations.
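To make sure I understood the relationship between the two, I sketched out how I picture the mechanics: each thread bump-allocates inside its own context, and when the context runs dry it grabs another quantum from the heap. All the names and structure here are mine, a toy model only, not the actual CLR allocator.

```python
QUANTUM = 8 * 1024  # the allocation quantum: memory grabbed per refill

class AllocationContext:
    """One per thread: a private slice of the heap to bump-allocate from."""
    def __init__(self):
        self.pointer = 0  # next free offset within the current quantum
        self.limit = 0    # end of the current quantum

    def allocate(self, size):
        if self.pointer + size > self.limit:
            # Context exhausted: ask the heap for a fresh quantum.
            self.pointer = reserve_quantum()
            self.limit = self.pointer + QUANTUM
        address = self.pointer
        self.pointer += size  # "bump": allocation is just pointer arithmetic
        return address

_next_free = 0
def reserve_quantum():
    """Stand-in for the heap handing out the next 8k region."""
    global _next_free
    start = _next_free
    _next_free += QUANTUM
    return start
```

The nice property this model shows is that most allocations are just a comparison and an addition on thread-private state, with no locking needed until the quantum runs out.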
OK, so let's break down the first one. Allocation contexts are a per-thread concept. In multi-threaded processes, each thread has its own allocation context. On single processors, only one allocation context is used. This part seemed incomplete to me; it seems like the documentation is trying to draw our attention to a special point here, but I might be missing it. A single-processor machine cannot have multiple threads running at the same time, so there's no need to manage separate allocation contexts for each thread's memory. This seemed obvious to me, but maybe there is more to it?
In any case, the "allocation quantum" bit is where things get interesting. First of all, this phrase made me laugh out loud because it sounds rather silly if you say it a few times fast:

> is the size of memory that the allocator allocates each time it needs more memory,
I think I was trying too hard to read more into the allocation quantum than there was. Ultimately, I would rephrase the description as follows. Let me know if I'm missing something here.
The allocation quantum is a unit that represents the amount of memory that is allocated whenever the allocator requests more memory to allocate new objects. This unit is typically 8k (8,192 bytes).
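The arithmetic behind the "many allocations per quantum" claim checks out, using the book's own numbers:

```python
# How many average-sized managed objects fit in one allocation quantum,
# using the book's figures: an 8k quantum and ~35-byte average objects.
quantum = 8 * 1024        # 8,192 bytes per quantum
avg_object = 35           # average managed object size, per the book
objects_per_quantum = quantum // avg_object
print(objects_per_quantum)  # → 234
```

So a single quantum covers a couple hundred typical allocations before the allocator has to go back for more memory.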
Speaking of 8k, that's a great segue into the next section of the book. It makes the point that the allocation quantum is small enough that large objects probably won't fit in it. As a result, large objects are allocated directly onto the GC heap and not into an allocation context.
I was unconvinced as to the benefits of this, but the book goes on to clarify that a lot of the benefits of the allocation context really come into play when dealing with small objects. I'll avoid retelling the list of benefits here since I actually found the list in the book to be rather easy reading. Long story short: the architecture helps keep memory close and tidy, which makes clean-ups easier.
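The small-versus-large routing described above can be sketched in a few lines. If I remember right, the CLR's actual cutoff is 85,000 bytes, though the book hasn't named a number at this point; everything else here (the function name, the stand-in heap lists) is made up for illustration.

```python
# A toy sketch of routing allocations by size. The 85,000-byte cutoff
# is the CLR's large-object threshold as I recall it; the rest is
# invented for illustration.
LARGE_OBJECT_THRESHOLD = 85_000

small_object_allocations = []  # stand-in for a thread's allocation context
large_object_allocations = []  # stand-in for the GC heap's large object area

def allocate(size):
    if size >= LARGE_OBJECT_THRESHOLD:
        # Large objects skip the allocation context entirely and go
        # straight onto the GC heap.
        large_object_allocations.append(size)
        return "large"
    # Small objects are bump-allocated inside the thread's context.
    small_object_allocations.append(size)
    return "small"
```

For example, `allocate(35)` lands in the small path, while something like a 100,000-byte array would be routed straight to the heap.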
The next portion of the book dives into the allocator's counterpart: the collector. The book lists the design goals of the collector, the first two of which are contradictory (that might be the wrong word, maybe mutually exclusive, no....that's not it, hmmmm....):
- The garbage collection process has to happen frequently enough that there is not a lot of allocated but unused memory on the heap.
- The garbage collection process has to happen infrequently enough to not use too much CPU.
At odds! That's the phrase I was looking for earlier. Wow... journaling this stuff out is a vocabulary challenge. So we need to find a balance where we run a GC just enough. This is a pretty standard goal for GC implementations. The book lists another goal for the GC: if a GC cycle does happen, it should reclaim as much memory as possible. No point taking up CPU cycles to remove a small bit of memory!
With these goals in mind, I've reached the end of today's 30 minutes. I'll be picking it up tomorrow with a dive into the logical and physical representations of the managed heap.