DEV Community

jvmind

Debugging a C2 JIT Compiler Infinite Loop on AArch64

tl;dr: A production Java 8 service on AArch64 experienced 100% CPU on a single core caused by a C2 JIT compiler infinite loop. The root cause was a cycle in C2's Ideal Graph triggered by dead code elimination of MemBarCPUOrder nodes. A verified workaround: -XX:CompileCommand=exclude,org.springframework.core.MethodParameter::getParameterType


Prologue

A mysterious CPU spike appeared on a production Java 8 service running on OpenJDK 8u442 on AArch64 processors.

The symptom: one core was pinned at 100%, the entire application became sluggish, and top showed C2 CompilerThread0 as the culprit.

The issue could not be reproduced on demand. It only surfaced after a sustained mixed workload, which made it particularly hard to diagnose.


1. The Flame Graph Revelation

We started with a flame graph taken during the incident:

  • About 80% of samples landed inside MemNode::can_see_stored_value
  • About 21% were in MergeMemNode::memory_at

Both functions sit deep inside the OpenJDK C2 compiler. A compiler thread that spends nearly all of its time in these two functions for the whole incident points strongly to a loop inside C2 that never terminates.


2. The Suspect Loop

A careful reading of memnode.cpp (from the OpenJDK 8u442 source) turned up a while loop that can spin without ever exiting:

while (current->is_Proj()) {
    int opc = current->in(0)->Opcode();
    if (opc == Op_MemBarRelease || opc == Op_StoreFence || 
        opc == Op_MemBarAcquire || opc == Op_MemBarCPUOrder ...) {
        Node* mem = current->in(0)->in(TypeFunc::Memory);
        if (mem->is_MergeMem()) {
            MergeMemNode* merge = mem->as_MergeMem();
            Node* new_st = merge->memory_at(alias_idx);
            if (new_st == merge->base_memory()) {
                current = new_st;
                continue;  // ← INFINITE LOOP RISK
            }
            result = new_st;
        }
    }
    break;
}

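The critical pattern is the continue inside the while, combined with current = new_st. The loop only makes progress if each step moves to a node it has not visited yet. If new_st is current itself, or any node whose memory chain leads back to current, the loop never terminates.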
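To make the failure mode concrete, here is a minimal Java sketch of the same kind of walk over a toy graph. ToyNode and MemoryChainWalk are made-up names for illustration, not HotSpot classes, and the single next edge is an assumed simplification of the Proj → barrier → MergeMem → base_memory hop. Unlike the loop above, this version remembers the nodes it has visited, so a cycle is reported instead of spinning forever. It is not meant to represent the actual upstream fix.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Toy stand-in for a C2 memory node. Hypothetical type, NOT the real HotSpot Node.
final class ToyNode {
    final String name;
    ToyNode next; // simplified "follow the memory input" edge; null ends the chain

    ToyNode(String name) {
        this.name = name;
    }
}

final class MemoryChainWalk {
    // Follows next edges the way the loop above follows current = new_st.
    // Returns the last node of the chain, or null if a node is visited twice (a cycle).
    static ToyNode walk(ToyNode start) {
        // Identity-based set: nodes are compared by reference, as graph nodes should be.
        Set<ToyNode> seen = Collections.newSetFromMap(new IdentityHashMap<ToyNode, Boolean>());
        ToyNode current = start;
        while (current.next != null) {
            if (!seen.add(current)) {
                return null; // already visited: without this check the loop would never exit
            }
            current = current.next;
        }
        return current;
    }
}
```

With an acyclic chain, walk returns the terminal node. Once the last node is wired back to the first, the unguarded version of this loop would spin exactly like the C2 thread did, while this one returns null.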


3. Why Only AArch64?

We tried to reproduce on x86 – nothing. On AArch64 – the hang appeared after sustained operation.

Memory Model:

  • x86 (TSO) is strongly ordered. C2 rarely inserts MemBarCPUOrder barriers.
  • AArch64 (weak memory model) requires explicit barriers for safe publication.

The bug only fires when:

  1. There is a loop that creates objects and writes to a volatile field.
  2. C2 performs dead code elimination on a path inside that loop.
  3. The cleanup folds a MergeMem node and makes its base_memory point back to the loop's own Proj node.

This creates a cycle in the data-flow graph:

Proj → MemBarCPUOrder → MergeMem → base_memory → Proj (again)
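For readers who want a picture of the source-level shape described by conditions 1 and 2, here is a small Java sketch. It is an assumption-laden illustration, not a reproducer: VolatilePublishLoop and Holder are hypothetical names, and we have no evidence that this exact code triggers the hang. It only shows a loop that allocates, contains a branch the compiler may prove dead, and reads and writes a volatile field.

```java
// Hypothetical illustration of the code shape only; not a known reproducer of the C2 hang.
final class VolatilePublishLoop {
    // Made-up holder type for the example.
    static final class Holder {
        final int value;

        Holder(int value) {
            this.value = value;
        }
    }

    private volatile Holder latest;

    int run(int iterations) {
        int sum = 0;
        for (int i = 0; i < iterations; i++) {
            Holder h = new Holder(i); // allocation inside the loop
            if (i < 0) {              // never taken, since i starts at 0 and only grows
                sum -= h.value;       // a candidate for dead code elimination
            }
            latest = h;               // volatile store
            sum += latest.value;      // volatile load
        }
        return sum;
    }
}
```

Calling new VolatilePublishLoop().run(n) simply sums 0 through n-1; the interesting part is what the JIT has to do with the barriers around the volatile accesses once the dead branch is removed.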


4. Root Cause Summary

Root cause: In OpenJDK 8u442 on AArch64, C2's dead-code elimination can create a data-flow cycle:

Proj → MemBarCPUOrder → MergeMem → base_memory → same Proj

Trigger: org.springframework.core.MethodParameter.getParameterType() — a common Spring method that combines volatile accesses and potential object creation in a hot path.

Why 8u442: The upstream fix was not backported into this update.


5. Production Workaround (Verified)

For teams that cannot rebuild the JDK:

-XX:CompileCommand=exclude,org.springframework.core.MethodParameter::getParameterType

This excludes the method from JIT compilation, so it keeps running in the interpreter and C2 never processes it. The performance impact is negligible in most Spring applications, because getParameterType is rarely a significant share of CPU time even when it is called often.
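It is easy to add the flag to one deployment descriptor and miss another, so a startup check can help. The sketch below uses the standard RuntimeMXBean.getInputArguments() call to see whether the running JVM was started with the exclusion. CompileCommandCheck is a hypothetical helper name, and the check is an assumption-limited one: it only sees flags passed as JVM arguments, not commands loaded from a compiler command file.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical startup check; the class and method names are made up for illustration.
final class CompileCommandCheck {
    static final String EXPECTED =
        "-XX:CompileCommand=exclude,org.springframework.core.MethodParameter::getParameterType";

    // True if the given JVM argument list contains the exact exclusion flag.
    static boolean hasExclusion(List<String> jvmArgs) {
        return jvmArgs.contains(EXPECTED);
    }

    // Checks the arguments the current JVM was actually started with.
    static boolean currentJvmHasExclusion() {
        return hasExclusion(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

A service could log a warning at startup when currentJvmHasExclusion() returns false on an affected JDK build.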


6. Full GDB Command Sequence

For reference, here is the complete GDB workflow:

# 1. Find container PID on host
docker inspect <container> --format '{{.State.Pid}}'

# 2. Allow ptrace (if needed; requires root on the host)
echo 0 > /proc/sys/kernel/yama/ptrace_scope

# 3. Capture core
gcore -o /data/coredump/hang <PID>

# 4. Analyze with GDB
gdb -q \
  -ex "set sysroot /proc/<PID>/root" \
  -ex "thread apply all bt 3" \
  -ex "quit" \
  /proc/<PID>/root/<path-to-jdk>/bin/java \
  /data/coredump/hang.<PID> > /tmp/bt.txt 2>&1
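On a JVM with hundreds of threads, the backtrace file gets long. A small filter helps find the threads of interest. The sketch below assumes GDB's usual layout for "thread apply all bt", where each thread starts with a line beginning "Thread " and each frame line begins with "#". BacktraceFilter is a hypothetical helper, not part of any tool mentioned here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for scanning GDB "thread apply all bt" output.
final class BacktraceFilter {
    // Returns the "Thread N (...)" header of every thread with a frame mentioning symbol.
    static List<String> threadsMentioning(List<String> lines, String symbol) {
        List<String> hits = new ArrayList<>();
        String header = null;     // header of the thread currently being read
        boolean reported = false; // avoids adding the same thread twice
        for (String line : lines) {
            if (line.startsWith("Thread ")) {
                header = line;
                reported = false;
            } else if (header != null && !reported
                    && line.startsWith("#") && line.contains(symbol)) {
                hits.add(header);
                reported = true;
            }
        }
        return hits;
    }
}
```

For example, BacktraceFilter.threadsMentioning(Files.readAllLines(Paths.get("/tmp/bt.txt")), "can_see_stored_value") lists the threads whose captured frames include that function.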

Conclusion

Key Findings:

  1. Root cause: C2's Ideal Graph can form a cycle when dead code elimination removes nodes in the object allocation path.
  2. Architecture specificity: The hang appeared only on AArch64 in our testing. We attribute this to the additional memory-barrier nodes, including MemBarCPUOrder, that C2 emits for a weakly ordered architecture.
  3. Trigger: org.springframework.core.MethodParameter.getParameterType().

Lessons Learned:

  • Flame graphs are the first line of defense.
  • Container debugging requires creative tooling — host gcore + sysroot works where in-container debugging fails.
  • When debugging intermittent JVM issues, capture core dumps early.

Final Recommendation:

If you run Spring Boot on AArch64 and see unexplained 100% CPU from C2 CompilerThread:

  1. Profile with async-profiler to confirm it's in can_see_stored_value
  2. Add the exclusion: -XX:CompileCommand=exclude,org.springframework.core.MethodParameter::getParameterType
  3. Capture a core dump and verify the cycle using CLHSDB
  4. Report to your JDK vendor with the evidence

Acknowledgments

The method for extracting ciMethod information from core dumps was adapted from Vladimir Sitnikov's excellent 2018 article on analyzing stuck C2 compilations.


This investigation was conducted on OpenJDK 8u442 running on AArch64 processors.


Tags: java, jvm, debugging, performance, aarch64
