Darius Juodokas

Posted on Jun 9, 2022 • Edited on Jun 16, 2022

JVM. Garbage Collector

#java #jvm #gc #memory

Meet the JVM Memory Manager (a.k.a. Garbage Collector, a.k.a. GC)

In a native application, your code asks the OS to allocate memory for its work. In a (pseudo)interpreted application you ask the interpreter/engine/VM to give you some memory. If you recall, JVM decouples application code from the platform nuances, so that the developer would not have to worry about the HOWs. The developer should only be concerned about the application code and leave JVM tuning for the middleware specialists.

Now, since the JVM has its own approach to memory management (memory pools), it creates a great opportunity to tweak things, but also a great problem: how to manage things internally. The JVM has this concept of a Garbage Collector (GC). It's "a guy" who oversees all the memory. Call it the JVM Memory Manager if it suits you better.

As with any manager IRL, management is a role. You can have many managers, each with their own style and approach. Despite their differences, they all have the same goal: to make things happen without a crash. The JVM GC is also a role, and there are multiple GC implementations out there, each with its own approach to achieving its goals. But they all have the same goals and the same responsibilities.

GC responsibilities

When we are talking about GC, we are talking about GC-managed memory pools. Basically, there's one major pool managed by GC: the Heap. Different sources (even the Oracle docs) disagree about the location of PermGen/Metaspace (classloaders); nevertheless, this region is also partially managed by GC. In this series of posts, I'll be referring to Heap when talking about GC-managed memory, unless stated otherwise.

Memory allocation

All the memory allocations in Heap are proxied through the GC. Your code asks GC to allocate memory for an object, GC picks the best spot for it, allocates, and returns you a reference to that sweet spot. GC is very much familiar with the generational Heap layout (YoungGen/OldGen/Eden/Survivors) and it's entirely up to the GC to choose what to do with all the memory in its possession, how, and when.

Memory cleanup

JVM is a very tight ship, perhaps even understaffed. Every crew member has to do their part outstandingly to stay afloat AND to deliver the cargo undamaged. GC is in charge of memory management, and it's the GC's responsibility to make sure there is enough memory to use when needed. Preferably, GC's actions themselves should consume as few resources as possible and impact the ship's journey as little as possible. The goal is to always have available memory when required, keep the used memory undamaged, and not slow down the ship or the crew.

Now let's return from the analogy. GC is also in charge of memory cleanup. It keeps track of all the objects allocated in the memory and discards them once these objects are no longer needed. However, operating on a pool of references that's actively used in real-time is nothing short of heart surgery. To put it mildly, it's a complicated task.

Memory reservation/release (from the OS perspective)

JVM has this concept of "Committed Memory". Imagine memory as a sandbox. You want to build a castle in a hostile sandbox (other kids want to play too!).

You build a sand wall and reserve some space for your castle
You build your castle
You also want to build other buildings, but you're now too close to the wall. So you look around, find an unused spot in the sandbox, and move part of your wall in that direction. Then you rebuild the missing segments of that wall. Hooray! Now you have room for your stables!
You build other buildings
You accidentally tear down part of your castle. Rebuilding it with dry sand is a pain, so you decide - what the hell - let's make a smaller castle.
Now you have plenty of unused space in your territory. Other kids want to play too... So you shrink your territory by moving a part of your wall inwards, this way releasing plenty of space for other kids' toy trucks to move around.

All the kids are playing in the same sandbox. Edges of the sandbox represent system memory bounds. You can't build a castle outside the sandbox.
Your wall is GC's committed memory. It reserves a good share of the system memory and allows you to allocate objects in that region. If you have very few objects, there's no need to have a kingdom this large, so the GC shrinks the committed memory to more manageable and reasonable levels. If your kingdom is back to the golden age - the GC grows the Committed area back up and beyond.

You can tell the GC how to recognize whether your application is in the golden age or in dark times.

-XX:MinHeapFreeRatio (e.g. -XX:MinHeapFreeRatio=10) will tell the GC to commit more memory if there's only this much (%) left unused. In this example, as soon as 90% of committed memory is consumed, GC is allowed to grow its committed region.
-XX:MaxHeapFreeRatio (e.g. -XX:MaxHeapFreeRatio=60) will tell the GC to uncommit memory if this much (%) is unused. In this example, if only 40% of committed memory is actually used, GC is allowed to shrink its committed region, releasing memory to the OS.

Setting those flags doesn't mean the GC will necessarily do as told. Some GCs might ignore those flags, others - use them more as guidelines than rules (just like Hector Barbossa said).
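To see where a running JVM stands relative to those two thresholds, you can compute the free share of the committed heap yourself. Below is a minimal sketch using the standard `java.lang.Runtime` API. The class and method names (`HeapFreeRatio`, `freePercent`) are made up for illustration, and the number is only an approximation of what a GC looks at internally.

```java
class HeapFreeRatio {

    /** Percentage of the committed heap that is currently unused. */
    static double freePercent(long committedBytes, long freeBytes) {
        return 100.0 * freeBytes / committedBytes;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory(); // heap currently committed by the JVM
        long free = rt.freeMemory();       // unused part of the committed heap
        long max = rt.maxMemory();         // upper bound the heap may grow to

        System.out.printf("committed=%d MiB, free=%d MiB, max=%d MiB%n",
                committed >> 20, free >> 20, max >> 20);
        System.out.printf("free ratio=%.1f%%%n", freePercent(committed, free));
    }
}
```

If the printed ratio stays well above your MaxHeapFreeRatio and the committed size never drops, your GC is most likely treating the flag as a guideline.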

While this additional complexity may seem unnecessary, it actually is an excellent optimization technique. In fact, some GCs usually ignore the flags suggesting when to shrink the Heap and don't shrink it at all. That's because of how applications allocate memory in RAM. Now, RAM is solely managed by the OS. Any application that wants a sliver of memory must ask the OS to be so kind and provide it. Asking the OS to do anything means the application has to issue a syscall. Syscalls are functions of the OS API that applications can invoke to make the OS do something for them. The problem here is that:

issuing a syscall means crossing from userspace to kernelspace and returning some value back from kernelspace to userspace. Each syscall invocation runs through some checks and validations, which are time-consuming.
all the syscalls are synchronous/blocking, meaning the application has to wait for the syscall to return the value from the OS

Each native memory allocation may require at least 1 syscall (malloc() is a library function, but underneath it obtains memory through syscalls such as mmap()), and each time some memory block is no longer needed, releasing it back to the OS takes another one (free(), which may end up in munmap()). Knowing how many objects a JVM creates each second, all these operations could add up to visible delays. In order to prevent that, GCs tend to avoid those syscalls as much as possible: they allocate memory only once and hardly ever release it back to the OS. When an object is collected, instead of calling free(), GC flags that object's region as FREE in its own books with the intention to reuse this memory for other objects. This way the GC ensures that:

there's no unnecessary overhead in calling the OS API
the GC will have an available block of memory for the JVM when it needs one, and other applications leaking memory on the same server will not affect java's performance (i.e. if other processes consume all the available RAM, JVM will still be able to operate within the bounds of memory it has already reserved (committed))
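The "flag it as FREE in its own books" idea can be sketched as a toy arena. This is not how any real GC is implemented: `RecyclingArena` and its methods are hypothetical names, the blocks are fixed-size for simplicity, and a plain byte array stands in for the memory obtained from the OS.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy arena: memory is reserved once and recycled internally, never handed back. */
class RecyclingArena {
    private final byte[] memory;   // stands in for the memory reserved from the OS
    private final Deque<Integer> freeBlocks = new ArrayDeque<>(); // the "books"

    RecyclingArena(int blocks, int blockSize) {
        this.memory = new byte[blocks * blockSize]; // the single up-front reservation
        for (int i = 0; i < blocks; i++) {
            freeBlocks.add(i * blockSize);
        }
    }

    /** Returns the offset of a free block, or -1 if the arena is exhausted. */
    int allocate() {
        Integer offset = freeBlocks.poll();
        return offset == null ? -1 : offset;
    }

    /** Marks the block as FREE again; nothing is returned to the "OS". */
    void release(int offset) {
        freeBlocks.push(offset);
    }

    int freeBlockCount() {
        return freeBlocks.size();
    }

    int reservedBytes() {
        return memory.length;
    }

    public static void main(String[] args) {
        RecyclingArena arena = new RecyclingArena(4, 64);
        int a = arena.allocate();
        arena.allocate();
        arena.release(a);          // "collected": only the books change
        int c = arena.allocate();  // the freed block is reused right away
        System.out.println("reused the same block: " + (a == c));
        System.out.println("reserved bytes never changed: " + arena.reservedBytes());
    }
}
```

From the outside (the OS perspective) the arena always occupies the same amount of memory, no matter how many blocks are in use. That is exactly the plateau that OPS see in their graphs.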

This behaviour oftentimes causes headaches for OPS: the java process memory consumption grows very high and then plateaus - it looks like the JVM might have a memory leak or, as many falsely believe, the JVM is incredibly memory-hungry / memory-inefficient. These are but common misconceptions based on a lack of knowledge of how JVM/GC works and why it does things the way it does. I hope I've managed to clear this part up: it's an optimization technique.

GC tuning

As of today, there are 7 GC implementations:

SerialGC
ParallelGC (default up through jre8)
CMS
G1 (default for jre9 onwards)
EpsilonGC
ShenandoahGC
ZGC

All of them approach the problem very differently and use different algorithms to maintain JVM's memory. For this reason, it is very difficult to simply write down instructions on how to tune the GC properly. Each GC has its own knobs and levers, each GC is tuned differently.
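Before touching any knobs, it helps to confirm which collector your JVM is actually running. A minimal sketch using the standard `java.lang.management` API follows; the class name `ActiveCollectors` is made up for illustration. The bean names it prints depend on the selected GC (for example, G1 reports beans such as "G1 Young Generation").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectors {

    /** Names of the collector beans registered in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d, time=%d ms, pools=%s%n",
                    gc.getName(),
                    gc.getCollectionCount(),   // collections so far (-1 if undefined)
                    gc.getCollectionTime(),    // accumulated collection time in ms
                    String.join(", ", gc.getMemoryPoolNames()));
        }
    }
}
```

Run it with different selection flags (e.g. -XX:+UseSerialGC, -XX:+UseParallelGC, -XX:+UseG1GC) to see how the set of beans and the memory pools they manage change.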

There still are some things in common that could help you tune GCs.

All the generational GCs (not all GC implementations are generational) have a concept of MinorGC (YoungGen collections) and MajorGC (OldGen collections). It's the MajorGCs that are usually the problem, as they slow down or stop the JVM.
As soon as OldGen gets full, a MajorGC is triggered. As soon as either of the YoungGen's pools is full, a MinorGC is triggered.
You can configure JVM aiming for either of the modes (assuming '.....' is application runtime and '##' is GC pauses):
- Throughput mode: aims for the highest overall throughput of the application (the most work done per unit of time). The tradeoff usually is longer GC pauses. Graphically this looks like this:
```
....................#######.................######...............#######....
```
- Short-pause times mode: aims to keep pauses during GCs as short as possible. The tradeoff is somewhat lower overall throughput (e.g. GC is cleaning garbage while the application does its work, slowing the application a little bit). This mode also requires more memory for bookkeeping. Graphically this looks like this:
```
.#....#.....##.....###...#......#.....#.......##.....######.......#.....#...
```
You can choose different GCs for YoungGen and OldGen (although not all the possible combinations are available).
Larger YoungGen causes fewer MinorGCs and fewer objects promoted to OldGen. However, the OldGen itself will be smaller and you're likely to see more MajorGCs unless the objects do not live long enough to be promoted to OldGen.
Survivor regions that are too small might cause large objects (e.g. large collections) to be promoted directly to the OldGen. If an application creates lots of large collections and discards them quickly, you might want to keep an eye on the Survivor regions.

Challenges

Finding unused objects

An unused object (or a no longer needed object) is an object that is no longer referred to. The problem is: referred from what? From another object? True. But what does that other object have to be referred to by? See where this is going?

Where is the starting node of the graph?

There are a few. And they are called GC Roots.

Classes loaded by the system class loader (not custom class loaders)
Live threads
Local variables and parameters of the currently executing methods
Local variables and parameters of JNI methods
Global JNI references
Objects used as a monitor for synchronization
Objects held from garbage collection by JVM for its own purposes

GC traverses all the references starting with each GC root, and finds and marks all the objects that are still being used. This is called the marking phase of collection.

Once all the live objects are tagged, GC scans all the objects again and removes the ones that do not have the mark. This is the sweeping phase of collection.

Since the graph is in use by the application, it's very difficult to mark all the objects in an ever-changing graph. This is why GCs tend to make marking a Stop-The-World phase, during which the application is stopped and only GC threads are running. This is a classical GC behavior, which some GCs implement as-is, while others augment it in some way.
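The two phases can be sketched on a toy object graph. This is an illustration of the classical mark-and-sweep idea only, not the code of any real collector; `Node`, `mark` and `sweep` are hypothetical names, and the "heap" is just a list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class MarkSweepSketch {

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>(); // outgoing references
        boolean marked;

        Node(String name) { this.name = name; }
    }

    /** Marking phase: flag everything reachable from the GC roots. */
    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (n.marked) continue;   // already visited (also handles cycles)
            n.marked = true;
            pending.addAll(n.refs);
        }
    }

    /** Sweeping phase: drop unmarked objects and reset marks; returns the number removed. */
    static int sweep(List<Node> heap) {
        int removed = 0;
        for (Iterator<Node> it = heap.iterator(); it.hasNext(); ) {
            Node n = it.next();
            if (n.marked) {
                n.marked = false;     // get ready for the next cycle
            } else {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);                // c and d refer to each other, but no root reaches them
        List<Node> heap = new ArrayList<>(Arrays.asList(a, b, c, d));

        mark(Collections.singletonList(a));
        System.out.println("removed: " + sweep(heap));
    }
}
```

Note that c and d are collected even though they reference each other: what matters is reachability from a root, not whether something points at the object.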

Fragmentation

Suppose you have a memory region. # represents contiguous used memory blocks, and $ represents memory blocks that can be collected.

|#$##$$##    $#$#$#$  #$$$# ##$#$$##      |

After collecting garbage, the same layout looks like this:

|# ##  ##     # # #   #   # ## #  ##      |

It's spotty. It's sparse. The memory became fragmented. Now JVM has to keep track of all the free regions and decide which new objects fit into which slots best.

Another problem with fragmentation is a premature OutOfMemoryError. Suppose, after the collection you want to allocate 2 contiguous blocks of 6x@ size (large arrays). You can place the first block at the end:

|# ##  ##     # # #   #   # ## #  ##@@@@@@|

But where does the second one go? There's plenty of free space in the memory, but there are no slots to fit the second 6x@ long block. This situation yields a premature OutOfMemoryError.

To deal with such situations, GCs should be able to compact the remaining memory blocks, preferably into a single contiguous block. Like this:

|###############                          |

Now you can easily allocate four 6x@ memory blocks without a fuss. This defragmentation in JVM terms is called compaction of the memory. Different GC implementations compact memory in different ways: some use a mark-and-copy algorithm, others use mark-sweep-compact, and so on.
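The effect of compaction on the largest allocatable block can be reproduced with the same picture. In this sketch the memory is a plain char array and compaction is a simple "slide everything to the left"; the names (`CompactionSketch`, `largestFreeRun`, `compact`) are hypothetical, and real collectors also have to update every reference to a moved object, which is skipped here.

```java
class CompactionSketch {

    /** Length of the longest contiguous run of free (' ') cells. */
    static int largestFreeRun(char[] mem) {
        int best = 0, run = 0;
        for (char cell : mem) {
            run = (cell == ' ') ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    /** Slides all used ('#') cells to the left, leaving one free block at the end. */
    static void compact(char[] mem) {
        int next = 0;                 // where the next live cell should land
        for (int i = 0; i < mem.length; i++) {
            if (mem[i] == '#') {
                mem[i] = ' ';
                mem[next++] = '#';
            }
        }
    }

    public static void main(String[] args) {
        // The layout from the example: '#' is live, '$' is garbage, ' ' is free.
        String before = "#$##$$##    $#$#$#$  #$$$# ##$#$$##      ";
        char[] mem = before.replace('$', ' ').toCharArray(); // "collect" the garbage

        System.out.println("largest free run after sweep:   " + largestFreeRun(mem));
        compact(mem);
        System.out.println("largest free run after compact: " + largestFreeRun(mem));
        System.out.println("|" + new String(mem) + "|");
    }
}
```

The total amount of free memory is the same before and after `compact`; only the largest contiguous run changes, and that is what decides whether a large array can be allocated.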

Another source of fragmentation is Local Allocation Buffers (Promotion and Thread): PLABs and TLABs.

YoungGen (Eden) is fragmented by TLABs: Eden is divided into chunks of various sizes (e.g. 5xsize=1, 17xsize=2, 7x3, 83x4, etc... - the number of chunks of each size is estimated based on statistics), and each thread reserves several such memory chunks for future allocations. If all threads only allocate objects of size=4, they will eventually exhaust all the available chunks and will only be left with smaller ones. While larger ones can be split to fit a smaller object, the smaller chunks are not large enough to fit such objects and cannot be joined together. As a result, Eden still has plenty of free space (smaller chunks), but it is unable to allocate a contiguous memory block for another large object. MinorGC is invoked, and statistics are updated.

OldGen (and YoungGen Survivors) is fragmented by PLABs. With each MinorGC some objects will probably be promoted to the higher memory pool (Eden->survivor; survivor->OldGen). It's all great when the promotion is done with just 1 thread. However, when there are multiple threads, we want them to avoid locking each other. Just like TLABs avoid locking in Eden (each thread has its own isolated set of memory chunks in Eden), PLABs avoid locking in OldGen and Survivors. When an object is promoted, a thread moves it to a region in the higher memory pool, that is dedicated to that particular thread. This way threads do not compete for memory regions and promotion is very quick. However, as each thread has some memory regions preallocated (of different sizes), it's only natural that some regions will remain unused. That's fragmentation.
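The trade-off behind local allocation buffers (no locking, at the price of some unused space) can be sketched like this. It is a deliberately simplified model and not HotSpot's actual TLAB/PLAB code: each thread gets a single buffer rather than chunks of various sizes, and all names (`SharedPool`, `LocalBuffer`, `reserveChunk`) are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LocalBufferSketch {

    /** Shared pool: handing out a whole chunk is the only step threads contend on. */
    static final class SharedPool {
        private final int capacity;
        private final AtomicInteger top = new AtomicInteger();

        SharedPool(int capacity) { this.capacity = capacity; }

        /** Reserves chunkSize cells; returns the start offset, or -1 if the pool is full. */
        int reserveChunk(int chunkSize) {
            while (true) {
                int start = top.get();
                if (start + chunkSize > capacity) return -1;
                if (top.compareAndSet(start, start + chunkSize)) return start;
            }
        }
    }

    /** Per-thread buffer: bump-pointer allocation, no synchronization needed. */
    static final class LocalBuffer {
        private final int end;
        private int next;

        LocalBuffer(int start, int size) {
            this.next = start;
            this.end = start + size;
        }

        /** Returns the offset of the new object, or -1 if it does not fit in this buffer. */
        int allocate(int size) {
            if (next + size > end) return -1;
            int offset = next;
            next += size;
            return offset;
        }

        /** Cells reserved for this thread but not used; other threads cannot touch them. */
        int unused() { return end - next; }
    }

    public static void main(String[] args) {
        SharedPool pool = new SharedPool(100);
        LocalBuffer first = new LocalBuffer(pool.reserveChunk(40), 40);
        LocalBuffer second = new LocalBuffer(pool.reserveChunk(40), 40);

        first.allocate(30);
        second.allocate(30);
        // 40 cells are free in total (10 + 10 in the buffers, 20 in the pool),
        // yet a 25-cell object fits nowhere:
        System.out.println(first.allocate(25) + " " + second.allocate(25)
                + " " + pool.reserveChunk(25));
    }
}
```

The leftovers in each buffer are the fragmentation described above: they are free, but reserved for a thread that may never need them.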

Generational collection

Generational garbage collectors choose which memory regions to clean up first. It's been observed that >90% of newly created objects are only required "here" and "now", and they are no longer needed soon after. This means that if there was a way to identify them as soon as possible, we could remove >90% of no longer needed memory blocks. And that's what generational collectors do.

Minor collection

New objects are allocated in the Eden - a part of the YoungGen. Once Eden fills up, a GC is invoked. During minor collection the GC performs several steps:

collect Survivor region
- clean up the active Survivor region from dead objects
- find all the objects in the active Survivor region that have survived N minor collections and move them to the OldGen (N can be set with -XX:MaxTenuringThreshold=N)
- move (hard-copy) all the remaining objects from the active Survivor region to the second Survivor region
- clean the active Survivor region
- deactivate the active Survivor region and activate the second Survivor region (swap them)
collect the Eden region
- identify objects that are no longer needed and remove them
- move all the remaining objects to the currently active Survivor region
  - if there is not enough room in the Survivor region, overflow to the OldGen (i.e. whatever doesn't fit in Survivor, move them directly to the OldGen)

MinorGC is very effective, as most of the garbage is collected right there. Whatever is not collected upon Eden collection is likely to become unneeded soon and be collected before it is copied from one Survivor to another. TenuringThreshold gives surviving objects more time to become irrelevant while they are still in the YoungGen. If an object survives more collections than set with the MaxTenuringThreshold parameter, the object is considered old and is promoted to the OldGen during the Minor GC.
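The steps of a minor collection, including tenuring and Survivor overflow, can be simulated with three lists. This is a sketch of the idea only. The names (`MinorGcSketch`, `Obj`, `minorCollect`) are hypothetical, liveness is passed in as a ready-made set instead of being discovered by marking, and the Survivor capacity is counted in objects rather than bytes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class MinorGcSketch {

    static final class Obj {
        final String name;
        int age;                       // number of minor collections survived so far

        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> activeSurvivor = new ArrayList<>();
    final List<Obj> oldGen = new ArrayList<>();
    final int maxTenuringThreshold;    // plays the role of -XX:MaxTenuringThreshold
    final int survivorCapacity;        // assumed limit, in objects

    MinorGcSketch(int maxTenuringThreshold, int survivorCapacity) {
        this.maxTenuringThreshold = maxTenuringThreshold;
        this.survivorCapacity = survivorCapacity;
    }

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    /** One minor collection; 'live' holds the objects that are still reachable. */
    void minorCollect(Set<Obj> live) {
        List<Obj> target = new ArrayList<>();       // the second, empty Survivor region

        // 1. Survivor: skip dead objects, promote the old ones, copy the rest.
        for (Obj o : activeSurvivor) {
            if (!live.contains(o)) continue;
            if (o.age >= maxTenuringThreshold) {
                oldGen.add(o);                      // survived N collections: promote
            } else {
                o.age++;
                copy(o, target);
            }
        }

        // 2. Eden: skip dead objects, move the rest to the Survivor region.
        for (Obj o : eden) {
            if (!live.contains(o)) continue;
            o.age = 1;
            copy(o, target);
        }

        eden.clear();
        activeSurvivor = target;                    // swap the Survivor regions
    }

    /** Copies into the Survivor region, overflowing to OldGen when it is full. */
    private void copy(Obj o, List<Obj> target) {
        if (target.size() < survivorCapacity) {
            target.add(o);
        } else {
            oldGen.add(o);
        }
    }
}
```

With a threshold of 2, an object that stays live is copied between Survivors twice and lands in OldGen on the third collection. With a Survivor capacity of 1, the second live object from Eden overflows straight to OldGen, regardless of its age.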

Typically, MinorGC is a completely or mostly StopTheWorld collection, but the small amount of data makes those pauses short enough and mostly irrelevant.

Major collection

OldGen is a large portion of Heap and it's also collected by the GC. OldGen collections are called Major collections. Normally, it takes time for OldGen to fill up, as most of the garbage is collected with MinorGC. However, as the JVM runs, the following factors tend to eventually fill the OldGen up as well:

Each Minor Collection potentially moves some objects to the OldGen.
Some large objects that do not fit in Eden are allocated directly in the OldGen.
Live objects not collected during Eden collection are moved to the active Survivor region and, if the Survivor fills up, it overflows to the OldGen.
Garbage objects in Eden with their finalize() methods overridden are not released during MinorGC - instead they are moved to the active Survivor region, which is likely to fill up and overflow to OldGen.

Typically, OldGen collections are partially or completely StopTheWorld collections. It's the amount of data that makes Major collection pauses lengthy.

Full collection

This is not an official term. However, it's actively used (and feared) in the industry. A Full collection is nothing but a Minor Collection followed by a Major Collection. Some GCs manage to merge both the phases and execute them concurrently, while others collect one region first and then collect another. Since the Full collection collects everything (YoungGen and OldGen), it's the longest collection. Usually, it's triggered when the Minor collection tries to promote survivors to the OldGen but there is not enough space, so it triggers the Major collection to free up some of the OldGen space.

What's causing the collections?

There's a good enough summary of GC causes on this Netflix GitHub page (since the time of writing, it has moved to the Atlas docs). Should this page ever disappear, here's a copy-paste of its contents:

The various GC causes aren't well documented. The list provided here comes from the gcCause.cpp file in the JDK and we include some information on what these mean for the application.

System.gc_

Something called System.gc(). If you are seeing this once an hour it is likely related to the RMI GC interval. For more details see:

Unexplained System.gc() calls due to Remote Method Invocation (RMI) or explicit garbage collections

sun.rmi.dgc.client.gcInterval
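If you want to see these causes from inside your own application, GC notifications carry them. The sketch below relies on `com.sun.management.GarbageCollectionNotificationInfo`, which is available in HotSpot/OpenJDK builds but is not part of the Java SE standard API, so treat it as an assumption about your runtime. The class name `GcCauseLogger` and the `seenCauses` list are made up for illustration.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

class GcCauseLogger {

    /** Causes observed so far (filled in asynchronously by the listener). */
    static final List<String> seenCauses = new CopyOnWriteArrayList<>();

    /** Subscribes to GC notifications of every collector bean that emits them. */
    static void install() {
        NotificationListener listener = (notification, handback) -> {
            if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                    .equals(notification.getType())) {
                return;
            }
            GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                    .from((CompositeData) notification.getUserData());
            seenCauses.add(info.getGcCause());
            System.out.printf("%s (%s): cause=%s%n",
                    info.getGcName(), info.getGcAction(), info.getGcCause());
        };
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (gc instanceof NotificationEmitter) {
                ((NotificationEmitter) gc).addNotificationListener(listener, null, null);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        System.gc();         // an explicit collection, unless disabled by JVM flags
        Thread.sleep(1000);  // notifications are delivered asynchronously
    }
}
```

The same listener will report the other causes from this list as they occur, which makes it a cheap way to find out what is actually triggering collections in a given service.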

FullGCAlot

Most likely you'll never see this value. In debug builds of the JDK there is an option, -XX:+FullGCALot, that will trigger a full GC at a regular interval for testing purposes.

ScavengeAlot

Most likely you'll never see this value. In debug builds of the JDK there is an option, -XX:+ScavengeALot, that will trigger a minor GC at a regular interval for testing purposes.

Allocation_Profiler

Before java 8 you would see this if running with the -Xaprof setting. It would be triggered just before the JVM exits. The -Xaprof option was removed in java 8.

JvmtiEnv_ForceGarbageCollection

Something called the JVM tool interface function ForceGarbageCollection. Look at the -agentlib param to java to see what agents are configured.

GCLocker_Initiated_GC

The GC locker prevents GC from occurring when JNI code is in a critical region. If GC is needed while a thread is in a critical region, then it will allow them to complete, i.e. call the corresponding release function. Other threads will not be permitted to enter a critical region. Once all threads are out of critical regions a GC event will be triggered.

Heap_Inspection_Initiated_GC

GC was initiated by an inspection operation on the heap. For example, you can trigger this with jmap:

$ jmap -histo:live <pid>

Heap_Dump_Initiated_GC

GC was initiated before dumping the heap. For example, you can trigger this with jmap:

$ jmap -dump:live,format=b,file=heap.out <pid>

Another common example would be clicking the Heap Dump button on the Monitor tab in VisualVM.

WhiteBox_Initiated_Young_GC

Most likely you'll never see this value. Used for testing hotspot, it indicates something called sun.hotspot.WhiteBox.youngGC().

No_GC

Used for CMS to indicate concurrent phases.

Allocation_Failure

Usually this means that there is an allocation request that is bigger than the available space in the young generation and will typically be associated with a minor GC. For G1 this will likely be a major GC and it is more common to see G1_Evacuation_Pause for routine minor collections.

On Linux the JVM will trigger a GC if the kernel indicates there isn't much memory left via mem_notify.

Tenured_Generation_Full

Not used?

Permanent_Generation_Full

Triggered as a result of an allocation failure in PermGen. Pre Java 8.

Metadata_GC_Threshold

Triggered as a result of an allocation failure in Metaspace. Metaspace replaced PermGen and was added in java 8.

CMS_Generation_Full

Not used?

CMS_Initial_Mark

Initial mark phase of CMS, for more details see Phases of CMS. Unfortunately, it doesn't appear to be reported via the mbeans and we just get No_GC.

CMS_Final_Remark

Remark phase of CMS, for more details see Phases of CMS. Unfortunately, it doesn't appear to be reported via the mbeans and we just get No_GC.

CMS_Concurrent_Mark

Concurrent mark phase of CMS, for more details see Phases of CMS. Unfortunately, it doesn't appear to be reported via the mbeans and we just get No_GC.

Old_Generation_Expanded_On_Last_Scavenge

Not used?

Old_Generation_Too_Full_To_Scavenge

Not used?

Ergonomics

This indicates you are using the adaptive size policy, -XX:+UseAdaptiveSizePolicy and is on by default for recent versions, with the parallel collector (-XX:+UseParallelGC). For more details see The Why of GC Ergonomics.

G1_Evacuation_Pause

An evacuation pause is the most common young gen cause for G1 and indicates that it is copying live objects from one set of regions, young and sometimes young + old, to another set of regions. For more details see Understanding G1 GC Logs.

G1_Humongous_Allocation

A humongous allocation is one where the size is greater than 50% of the G1 region size. Before a humongous allocation, the JVM checks if it should do a routine evacuation pause without regard to the actual allocation size, but if triggered due to this check the cause will be listed as humongous allocation. This cause is also used for any collections used to free up enough space for the allocation.
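The 50% rule is easy to check on paper for your own data structures. A tiny sketch, following the wording above ("greater than 50%"): the class name `HumongousCheck` is made up, the 1 MiB region size is just an example value (G1 picks the region size based on the heap size unless you set it with -XX:G1HeapRegionSize), and object headers are ignored.

```java
class HumongousCheck {

    /** True if an allocation of objectBytes is larger than half of a G1 region. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    public static void main(String[] args) {
        long region = 1024 * 1024;                   // example: 1 MiB regions
        long smallArray = 400 * 1024;                // roughly a 400 KiB byte[]
        long largeArray = 600 * 1024;                // roughly a 600 KiB byte[]

        System.out.println("400 KiB humongous? " + isHumongous(smallArray, region));
        System.out.println("600 KiB humongous? " + isHumongous(largeArray, region));
    }
}
```

If an application routinely allocates arrays just above the threshold, a larger region size is one of the first things worth experimenting with.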

Last_ditch_collection

For perm gen (java 7 or earlier) and metaspace (java 8+) the last-ditch collection will be triggered if an allocation fails and the memory pool cannot be expanded.

ILLEGAL_VALUE_-last_gc_cause-_ILLEGAL_VALUE

Included for completeness, but you should never see this value.

unknown_GCCause

Included for completeness, but you should never see this value.

