DEV Community: jvmind

The Butterfly Effect of a Compiler: Hunting an aarch64-only JVM SIGSEGV Down to the Source Line

jvmind — Thu, 09 Jul 2026 02:35:34 +0000

A 22-kilobyte truncated ZIP file crashes an entire JVM — but only on aarch64, only with the Maven artifact, and only when built by gcc 4.9.4. A journey from a hs_err log down to the exact source line, through cross-compilation, byte-level reproduction, and the ghost of a ten-year-old compiler.

The Scene

It begins, as these stories often do, with a SIGSEGV that kills the entire Java process. A service running on an aarch64 machine calls into SevenZipJBinding — the popular JNI binding around the 7-Zip/p7zip C++ library — to open a ZIP archive. For a well-formed archive, everything is fine. But feed it a truncated ZIP — one with local file headers but no End Of Central Directory (EOCD) record — and the JVM vanishes in a puff of native code:

# SIGSEGV (0xb) at pc=0x...2038, pid=..., tid=...
# Problematic frame:
# C  [lib7-Zip-JBinding.so+0x102038]
#    Java_net_sf_sevenzipjbinding_SevenZip_nativeOpenArchive+0x9b4
#
# Core dump written. Default location: /work/core or core.5856

Three things make this case genuinely hard:

It only happens on aarch64. The x86_64 artifact is unaffected.
It only happens with the Maven-published artifact. Rebuilding the library from source — even with the same gcc version — does not reproduce it.
The crash address is stable (lib7-Zip-JBinding.so+0x102038), but the binary is stripped, so addr2line returns ??:0.

What follows is the story of how we closed all three gaps — identifying the toolchain, byte-for-byte reproducing the crash, and finally resolving the faulting instruction to a single line of C++.

Step 1 — Reading the Wreckage

The HotSpot fatal error log (hs_err_pid*.log) gives us the crash register state and a fragment of disassembly:

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0

Registers:
R0=0x0053005797794eac    R29=0x00000055039fdb30   ...

Instructions: (pc=0x...2038)
0x...2028: e2 03 15 aa a9 80 ff 97 f4 03 00 aa a0 01 66 9e
0x...2038: 16 04 40 f9 56 23 00 b4 15 08 40 f9 e0 03 15 aa

The log reports si_addr=0x0, which at first glance reads as a null dereference. Let's decode the faulting instruction to see which pointer was actually bad. The bytes at 0x...2038 are 16 04 40 f9, which as a little-endian word is 0xf9400416, an A64 LDR (immediate, unsigned offset):

ldr x22, [x0, #8]      ; load 8 bytes from address (x0 + 8)

The instruction just before it (a0 01 66 9e = 0x9e6601a0) is:

fmov x0, d13          ; move FP register d13 into GP register x0

So the faulting instruction reads [d13 + 8], and d13 holds an invalid value. This is unusual: a floating-point register is being used to hold a pointer. The compiler (gcc 4.9) has decided to spill a pointer into a callee-saved FP register (d8–d15) to relieve register pressure on the GP file.
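The two decodings above can be checked mechanically. The following is a minimal sketch, assuming only the two encodings discussed here (64-bit LDR with unsigned immediate offset, and FMOV from a D register to an X register); the class and method names are made up for illustration, and register number 31 (SP/XZR) is not special-cased.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class A64Decode {
    /** Assembles a 32-bit instruction word from four bytes in memory (little-endian) order. */
    static int word(int b0, int b1, int b2, int b3) {
        byte[] raw = {(byte) b0, (byte) b1, (byte) b2, (byte) b3};
        return ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** Decodes a 64-bit LDR (immediate, unsigned offset); returns null for any other encoding. */
    static String decodeLdrX(int insn) {
        if ((insn & 0xFFC00000) != 0xF9400000) return null; // top 10 bits select this form
        int imm12 = (insn >>> 10) & 0xFFF;                  // offset, scaled by the 8-byte access size
        int rn = (insn >>> 5) & 0x1F;                       // base register
        int rt = insn & 0x1F;                               // destination register
        return "ldr x" + rt + ", [x" + rn + ", #" + (imm12 * 8) + "]";
    }

    /** Decodes FMOV Xd, Dn (FP register to general register); returns null otherwise. */
    static String decodeFmovXFromD(int insn) {
        if ((insn & 0xFFFFFC00) != 0x9E660000) return null;
        int rn = (insn >>> 5) & 0x1F;                       // source D register
        int rd = insn & 0x1F;                               // destination X register
        return "fmov x" + rd + ", d" + rn;
    }

    public static void main(String[] args) {
        System.out.println(decodeFmovXFromD(word(0xa0, 0x01, 0x66, 0x9e)));
        System.out.println(decodeLdrX(word(0x16, 0x04, 0x40, 0xf9)));
    }
}
```

Feeding it the eight bytes ending the hs_err dump reproduces the fmov/ldr pair shown above, which is a cheap sanity check before trusting a hand decode.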

Step 2 — Disassembling the Maven Artifact

We pull the .so out of the jar and disassemble around the crash site with a cross-architecture objdump:

$ unzip -p sevenzipjbinding-linux-arm64-16.02-2.01.jar Linux-arm64/lib7-Zip-JBinding.so \
  | objdump ... -d -C --start-address=0x101f00 --stop-address=0x102200

102014: bl   jni::JMethod::initMethodID
102018: ldr  x21, [x21, #24]
10201c: cbz  x21, 10268c
102020: mov  x0, x27               ; JNIEnv*
102024: mov  x1, x22               ; jclass
102028: mov  x2, x21               ; jmethodID
10202c: bl   JNIEnv_::NewObject    ; --- create Java object ---
102030: mov  x20, x0               ; save return value
102034: fmov x0, d13               ; x0 = d13 (this pointer)        ← ★
102038: ldr  x22, [x0, #8]         ; *** SEGV: read this->field@8 *** ← CRASH
10203c: cbz  x22, 1024a4           ; NULL check

The crash is right after a NewObject call. The code loads a value from d13, then dereferences it. This looks like the textbook pattern of "forgot to check for a pending JNI exception" — and indeed, running with -Xcheck:jni had already warned us:

WARNING in native method: JNI call made without checking exceptions
when required to from CallStaticObjectMethodV

But that tempting narrative turns out to be wrong. Notice the crash dereferences d13, not the NewObject return value (which was saved into x20 and never touched). So this is not a "used a NULL JNI return value" story. We need to know what d13 actually is.

Step 3 — Tracing `d13` Back to Its Birth

Disassembling the whole nativeOpenArchive function from its start (0x101684) and grepping for d13 reveals its entire life:

101694: stp  d12, d13, [sp, #128]     ; prologue: save callee-saved d13
...
1017b8: bl   operator new(0x18)        ; new JNIEnvInstance (24 bytes)   ← ★
1017bc: add  x2, x29, #0xd0
1017cc: fmov d13, x2                  ; *** d13 = address of the new object ***
...
101df0: fmov x0, d13                   ; used (still valid here)
101dfc: fmov x0, d13                   ; used (still valid here)
102034: fmov x0, d13                   ; *** used again, now corrupted *** ← CRASH

d13 is assigned exactly once, at 0x1017cc, immediately after a new(0x18). It holds the address of a freshly allocated object — specifically, a JNIEnvInstance, the wrapper the library uses to scope JNI calls within this function. After the assignment, d13 is only read, never written again.

Here's the contradiction. d13 is callee-saved (AAPCS64 reserves the low 64 bits of d8–d15 as callee-saved). By the ABI, every bl we call between 0x1017cc and 0x102034 — including initMethodID and NewObject — must preserve it. Yet by 0x102034 it is garbage. Something along that call chain corrupted a callee-saved register that held a this pointer.
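The "grep for d13" step can be made slightly more systematic by labelling each hit as a read or a write. This is a rough sketch, not a tool used in the investigation: RegisterTrace and classify are hypothetical names, the input is assumed to look like the objdump lines quoted above, and the classification rules (stores only read, ldp writes two registers, otherwise the first operand is the destination) are a simplification that ignores registers inside memory operands.

```java
final class RegisterTrace {
    /** Returns "write", "read", or null if the line does not use the register as a plain operand. */
    static String classify(String disasmLine, String reg) {
        int colon = disasmLine.indexOf(':');
        if (colon < 0) return null;                          // no "address:" prefix
        String body = disasmLine.substring(colon + 1);
        int comment = body.indexOf(';');
        if (comment >= 0) body = body.substring(0, comment); // drop trailing annotations
        String[] parts = body.trim().split("\\s+", 2);
        if (parts.length < 2) return null;                   // mnemonic without operands
        String mnemonic = parts[0];
        String[] operands = parts[1].split(",");
        int hit = -1;
        for (int i = 0; i < operands.length; i++) {
            if (operands[i].trim().equals(reg)) { hit = i; break; }
        }
        if (hit < 0) return null;
        if (mnemonic.startsWith("st")) return "read";        // stores only read their registers
        int destinations = mnemonic.equals("ldp") ? 2 : 1;   // ldp writes its first two operands
        return hit < destinations ? "write" : "read";
    }
}
```

Run over the whole function, a single "write" followed only by "read" lines is the pattern described above: one assignment at 0x1017cc and no later redefinition inside nativeOpenArchive itself.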

Step 4 — The Toolchain Fingerprint

The artifact is stripped, so addr2line gives us nothing:

$ addr2line -e lib7-Zip-JBinding.so -f -C 0x102038
?? ??:0

We need to rebuild with debug info — but every rebuild we tried (gcc 11, gcc 15, even Linaro gcc 4.9.4) did not crash. The bug refuses to be reborn from source. So our first question became: what exactly built the Maven artifact?

The ELF .comment section answers. It's written by gcc and usually survives stripping:

$ readelf -p .comment lib7-Zip-JBinding.so
String dump of section '.comment':
  [     0]  GCC: (crosstool-NG ) 4.9.4

gcc 4.9.4, via crosstool-NG. gcc 4.9 was one of the earliest gcc release series with an aarch64 backend (the port first shipped in gcc 4.8), and that backend was still immature; parking pointers in callee-saved FP registers is exactly the kind of register-allocation choice it made. Later gcc releases reworked the aarch64 backend substantially, which is why every modern rebuild we tried dodges the crash.
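When readelf is not at hand (for example inside a minimal container), the same fingerprint can be pulled out with a plain byte scan. The sketch below is an assumption-laden stand-in for readelf: it does not parse ELF section headers at all, it simply collects every NUL-terminated string that starts with "GCC: ", much like strings piped through grep. The class and method names are invented for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class CommentScan {
    /** Returns every NUL-terminated ASCII string in data that starts with "GCC: ". */
    static List<String> gccIdents(byte[] data) {
        byte[] needle = "GCC: ".getBytes(StandardCharsets.US_ASCII);
        List<String> found = new ArrayList<>();
        for (int i = 0; i + needle.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < needle.length; j++) {
                if (data[i + j] != needle[j]) { match = false; break; }
            }
            if (!match) continue;
            int end = i;
            while (end < data.length && data[end] != 0) end++;  // strings in .comment are NUL-terminated
            found.add(new String(data, i, end - i, StandardCharsets.US_ASCII));
            i = end;                                            // resume scanning after this string
        }
        return found;
    }
}
```

A call such as gccIdents(Files.readAllBytes(Path.of("lib7-Zip-JBinding.so"))) would be the typical use; several entries in the result would hint at objects built by different compilers.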

Step 5 — Why "Same gcc" Still Doesn't Reproduce

We tried rebuilding with Linaro's prebuilt gcc 4.9.4 aarch64 toolchain. Same gcc version, same -O3. It did not crash. Same compiler, different result — how?

Because crosstool-NG and Linaro ship different binutils, glibc, and patch sets. Code that relies on undefined behavior is exquisitely sensitive to such differences: a slightly different instruction stream shifts the register allocator's decisions. The crash lives in the intersection of "gcc 4.9.4's aarch64 backend" and "the specific crosstool-NG configuration used to build the Maven-published artifact." Reproducing it required reproducing the toolchain, not just the compiler.

Step 6 — Finding the Exact Toolchain

Where did crosstool-NG 4.9.4 come from? The project's README pointed to DockCross — cross-compilation toolchains shipped as Docker images. Digging into DockCross's git history, we found the smoking gun:

commit 37c54a3  Thu Apr 16 2020
  [linux-arm64] bump up the version of gcc to 8
  linux-arm64 is currently using gcc 4...

And the config just before that commit:

CT_CC_GCC_VERSION="4.9.4"
CT_CC_GCC_V_4_9_4=y

SevenZipJBinding 16.02-2.01 was released in January 2020 — three months before DockCross bumped arm64 to gcc 8. So the release was built with dockcross/linux-arm64 at gcc 4.9.4. The .comment string is an exact match. We had the toolchain.

Step 7 — Byte-for-Byte Reproduction

Old Docker Hub tags get garbage-collected, but the tag dockcross/linux-arm64:20200119-1c10fb2 was still pullable. Building SevenZipJBinding with it produced a .so whose .comment matched, and then — finally — the crash came back:

Property	Maven artifact	Our dockcross rebuild
Crashes on bad.zip?	Yes	Yes
Crash offset	`+0x102038`	`+0x102038`
Faulting instruction bytes	`9e6601a0 f9400416`	`9e6601a0 f9400416`
`.comment`	`crosstool-NG 4.9.4`	`crosstool-NG 4.9.4`

Byte-identical at the crash site. We now had a crashing binary whose source we controlled.

Step 8 — The Source Line

One more rebuild, this time adding -g3 -ggdb to the Release flags (keeping -O3 so the codegen didn't shift). The crash offset stayed at 0x102038, and addr2line finally spoke:

$ addr2line -e lib7-Zip-JBinding-dockcross-g.so -f -C -i -p 0x102038

JNIEnvInstance::exceptionCheck()
  at jbinding-cpp/JBindingTools.h:337
  (inlined by) JavaToCPPSevenZip.cpp:291

The source confirms the disassembly story. In JavaToCPPSevenZip.cpp:

// line 290 - create the Java InArchiveImpl object
jobject inArchiveImplObject = jni::InArchiveImpl::_newInstance(env);  // -> NewObject
// line 291 - check whether NewObject threw
if (jniEnvInstance.exceptionCheck()) {                                  // *** CRASH ***
    archive->Close();
    ...
}

And exceptionCheck() is inlined from JBindingTools.h:

bool exceptionCheck() {
    if (_jniNativeCallContext) {        // line 337 - reads this->_jniNativeCallContext
        return _jniNativeCallContext->exceptionCheck(_env);
    }
    return _jbindingSession.exceptionCheck(_env);
}

The member access this->_jniNativeCallContext compiles to ldr x, [this, #8] — and this is d13. Everything lines up. The full causal chain:

A truncated ZIP (no EOCD) is opened; the archive ends up in an inconsistent state.
_newInstance(env) calls NewObject to create the Java InArchiveImpl.
During that call, the JNIEnvInstance::this pointer — held in callee-saved FP register d13 — gets corrupted (the physical-hardware crash gives si_addr=0x24, so d13 became 0x1c).
The inlined exceptionCheck() reads this->_jniNativeCallContext via fmov x0, d13 followed by ldr x22, [x0, #8], dereferences the corrupted this, and faults.

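gcc 4.9.4's decision to keep this in d13 across a JNI call is the load-bearing choice. It is legal under the ABI, but it makes the code depend on every callee preserving d13, and that is where things go wrong. The modern gcc builds we tried use a GP callee-saved register instead, which is why the bug only shows up in this one binary.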

The lesson: "Build with the latest compiler" is not just about new features — for UB-laden legacy code, it can be the difference between a flawless release and a JVM that vanishes on one architecture only. A compiler bug from 2014, frozen into a published artifact, can lie dormant for years until someone hands it the wrong 22 KB of input.

The Fix

For end users, an immediate workaround is to validate the ZIP before handing it to SevenZipJBinding and reject anything without an EOCD record; that keeps the truncated input that triggers this crash away from the native code. For the project itself, the clean fix is to rebuild the aarch64 artifact with a modern gcc; we confirmed that gcc 8 (current DockCross), gcc 11, gcc 15, and even Linaro 4.9.4 all avoid the crash.
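The pre-validation workaround can be sketched as follows. This is a minimal example under stated assumptions: the archive fits in memory (true for a 22 KB file), only the classic EOCD record is checked (signature bytes 50 4B 05 06, a 22-byte fixed part, and a trailing comment of up to 65,535 bytes), and Zip64 or central-directory consistency is not examined. ZipEocdCheck and hasEocd are illustrative names, not part of SevenZipJBinding.

```java
final class ZipEocdCheck {
    private static final int EOCD_MIN = 22;         // fixed part of the EOCD record
    private static final int MAX_COMMENT = 0xFFFF;  // comment length is a 16-bit field

    /** Returns true if data ends with a plausible End Of Central Directory record. */
    static boolean hasEocd(byte[] data) {
        if (data.length < EOCD_MIN) return false;
        int lowest = Math.max(0, data.length - EOCD_MIN - MAX_COMMENT);
        // Scan backwards: the record sits at the very end, followed only by its comment.
        for (int pos = data.length - EOCD_MIN; pos >= lowest; pos--) {
            boolean signature = data[pos] == 0x50 && data[pos + 1] == 0x4B
                    && data[pos + 2] == 0x05 && data[pos + 3] == 0x06;
            if (!signature) continue;
            // Little-endian comment length at offset 20 must account for every remaining byte.
            int commentLen = (data[pos + 20] & 0xFF) | ((data[pos + 21] & 0xFF) << 8);
            if (pos + EOCD_MIN + commentLen == data.length) return true;
        }
        return false;
    }
}
```

A caller would read the file, call hasEocd, and skip the native open (or report a corrupt archive) when it returns false. The check is specific to ZIP; other formats handled by the library need their own validation.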

Appendix — The Tools That Got Us There

hs_err_pid*.log — HotSpot's register dump and the raw faulting bytes were the foundation.
readelf -p .comment — the toolchain fingerprint that named gcc 4.9.4.
DockCross git history (git log -S "4.9" -- linux-arm64) — pinned the exact image used at release time.
dockcross/linux-arm64:20200119-1c10fb2 — the time machine that reproduced the binary.
gcc -g3 -ggdb plus addr2line -f -C -i -p — turned bytes into a source line.

Reproduce It Yourself

All scripts used in this investigation — the dockcross reproducer, the addr2line harness, and an A/B verification that proves the fix (gcc 4.9.4 crashes, gcc 12+ does not) — are open source:

🔗 github.com/jvmind/jvm-deep-dives → case-01-sevenzipjbinding-aarch64-sigsegv/

The scripts are Docker-based and run end-to-end on a regular x86_64 machine (they use QEMU user-mode to run the arm64 JVM; see the repo's README for the cross-architecture setup guide). Clone it and:

cd case-01-sevenzipjbinding-aarch64-sigsegv
./verify_fix.sh    # builds both toolchains, runs the crash test on each

Written after a multi-day debugging session. All disassembly, register values, and reproduction steps are verbatim from the actual investigation.

Can AI Diagnose JVM Incidents? Correlating GC Logs, Thread Dumps, and Heap Dumps

jvmind — Mon, 06 Jul 2026 01:34:28 +0000

Traditional JVM analysis tools parse logs.

JVMind runs a reasoning loop.

Instead of treating GC logs, thread dumps, and heap dumps as independent artifacts, JVMind’s AI Agent orchestrates a multi-step investigation process using a ReAct-style tool-calling architecture.

This article explains how that works under the hood.

The Core Problem: JVM Diagnostics Are Multi-Dimensional

A typical JVM incident involves multiple dimensions:

Memory allocation behavior (GC log)
Object retention structure (Heap dump)
Thread state and execution pattern (jstack)
Application execution logic

Most tools parse one of these dimensions.

But real root causes live in the intersections.

To detect those patterns automatically, JVMind uses a structured reasoning pipeline.

High-Level Architecture

JVMind is composed of four layers:

Artifact Parsers (deterministic analyzers)
Structured Signal Extractors
ReAct-based AI Agent
Evidence Correlation Engine

Step 1 — Deterministic Parsing Layer

Before any AI reasoning begins, JVMind parses each artifact into structured data.

GC Log Analysis Interface

The GC report interface provides:

Allocation rate
GC frequency
Full GC ratio
Pause distribution
Throughput
Heap occupancy trend
Reclamation efficiency

The output is converted into structured signals like:

high_full_gc_ratio
low_reclaim_efficiency
allocation_rate_exceeds_heap_capacity

Thread Dump (jstack) Analysis Interface

The jstack analyzer extracts:

Thread state distribution
Deadlocks
Lock chains
BLOCKED hotspots
RUNNABLE-heavy CPU scenarios
Executor patterns
Stack frame summaries

This produces signals such as:

executor_waiting_pattern
high_runnable_ratio
no_deadlock_detected

Heap Dump Analysis Interface

The heap analyzer extracts:

Object count by class
Retained size
Dominator tree
Top memory consumers
Leak suspects
Ownership chains

From this, structured signals are generated:

heap_domination_by_single_class
large_number_of_virtual_threads
dominant_executor_related_objects

Step 2 — Signal Structuring

Instead of letting the AI read raw logs, JVMind feeds it structured signals.

Each signal includes:

Metric value
Severity
Context
Supporting evidence
Confidence level

This limits the room for hallucination.

The AI agent reasons only on verified extracted facts.
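To make the idea of a structured signal concrete, here is one possible shape for it. This is purely illustrative and not JVMind's actual data model: the Signal record, the Severity enum, the 0.25 threshold, and the fixed confidence of 1.0 are all assumptions; only the five attributes and the signal name high_full_gc_ratio come from the text above.

```java
import java.util.List;
import java.util.Optional;

final class Signals {
    enum Severity { INFO, WARNING, CRITICAL }

    /** One extracted fact: metric value, severity, context, supporting evidence, confidence. */
    record Signal(String name, double metricValue, Severity severity,
                  String context, List<String> evidence, double confidence) {
        Signal {
            if (confidence < 0.0 || confidence > 1.0) {
                throw new IllegalArgumentException("confidence must be in [0, 1]");
            }
            evidence = List.copyOf(evidence);  // immutable copy so evidence cannot change later
        }
    }

    /** Emits high_full_gc_ratio when the ratio exceeds an (assumed) 0.25 threshold. */
    static Optional<Signal> highFullGcRatio(int fullGcs, int totalGcs) {
        if (totalGcs <= 0) return Optional.empty();
        double ratio = (double) fullGcs / totalGcs;
        if (ratio <= 0.25) return Optional.empty();
        return Optional.of(new Signal("high_full_gc_ratio", ratio, Severity.CRITICAL, "gc-log",
                List.of(fullGcs + " full GCs out of " + totalGcs + " collections"), 1.0));
    }
}
```

The point of the design is that the agent only ever sees values like this one, each carrying its own evidence string, rather than raw log lines.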

Step 3 — ReAct-Based AI Agent

JVMind uses a ReAct-style reasoning loop.

ReAct = Reason + Act.

Instead of generating one large answer, the agent:

Observes signals
Forms a hypothesis
Calls analyzer tools
Gathers more evidence
Refines reasoning
Produces structured diagnosis
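The steps above can be sketched as a small loop. This is a toy skeleton under heavy assumptions, not JVMind's agent: the "reasoning" is a hypothetical Policy interface standing in for the model, tools are plain suppliers returning a signal name, and the loop is capped by maxSteps so it always terminates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

final class ReactLoopSketch {
    /** Chooses the next tool from what has been observed so far, or null to stop. */
    interface Policy {
        String nextTool(List<String> observations);
    }

    /** Alternates "reason" (policy) and "act" (tool call) for at most maxSteps rounds. */
    static List<String> run(Map<String, Supplier<String>> tools, Policy policy, int maxSteps) {
        List<String> observations = new ArrayList<>();
        for (int step = 0; step < maxSteps; step++) {
            String toolName = policy.nextTool(observations);   // reason: decide what to look at next
            if (toolName == null) break;                       // enough evidence gathered
            Supplier<String> tool = tools.get(toolName);
            if (tool == null) throw new IllegalArgumentException("unknown tool: " + toolName);
            observations.add(toolName + " -> " + tool.get());  // act: call the analyzer, keep the evidence
        }
        return observations;
    }
}
```

The returned list doubles as the visible trace: each entry records which tool was called and what it reported, in order.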

Agent Execution Interface

The Agent UI shows:

Observations
Tool calls
Intermediate reasoning
Evidence collection
Final structured diagnosis

This transparency is critical.

The diagnosis is not a black box.

It shows how conclusions are reached.

Example: Virtual Thread Explosion

Let’s walk through the correlation process.

Phase 1 — GC Observation

Signals:

266 GCs in 23 seconds
107 Full GCs
Throughput 15.7%
Full GC reclaims zero bytes
Allocation rate 157 MB/s
Heap size 128 MB

Agent hypothesis:

Severe allocation pressure with low reclamation efficiency.

Phase 2 — Heap Correlation

Signals:

191,782 VirtualThread instances
74% heap retained
Dominator tree confirms VT ownership

Agent refines hypothesis:

Massive virtual thread accumulation dominating heap.

Phase 3 — Thread Behavior Analysis

Signals:

No deadlocks
Executor waiting in awaitTermination()
51/56 threads RUNNABLE

Agent refines hypothesis:

Submission rate likely exceeds completion rate.

Phase 4 — Cross-Dimension Validation

The agent cross-validates:

Allocation rate vs heap size
Virtual thread count vs retained size
Executor state vs object retention

Final conclusion:

Unbounded virtual thread creation under an undersized heap caused a GC death spiral and OOM.

Structured Root Cause Output

Instead of free text, JVMind outputs:

Executive summary
Cross-artifact evidence
Root cause chain
Confidence level
Supporting metrics
Remediation recommendations

This makes the output production-ready.

Why ReAct Is Critical

Without tool-calling:

The model might blame GC
Or misinterpret RUNNABLE threads
Or miss heap dominance patterns

The ReAct loop enforces:

Evidence collection
Hypothesis validation
Cross-checking signals
Iterative refinement

It mimics how senior JVM engineers reason.

Hallucination Control

JVMind limits hallucination through:

Deterministic parsing
Structured signals
Evidence linking
No free-form log interpretation
Confidence scoring

Every conclusion is tied to specific metrics and objects.

Why This Approach Matters

Manual JVM troubleshooting requires:

Context switching between tools
Deep JVM internals knowledge
Hours of cross-referencing

JVMind compresses that process into:

Parsing
Signal extraction
ReAct reasoning
Evidence-backed diagnosis

It does not replace engineers.

It accelerates them.

Final Thought

JVM incidents are interaction failures.

GC behavior.

Thread lifecycle.

Heap structure.

Application logic.

JVMind’s ReAct-based AI Agent reasons across these dimensions instead of summarizing them individually.

That is the difference between reading logs and diagnosing systems.

Try JVMind

Upload:

GC logs
Thread dumps
Heap dumps

And let JVMind correlate them automatically.

Demo available.

No signup required.

https://jvmind.io

Virtual Thread OOM – A Case Study in Missing Backpressure

jvmind — Fri, 03 Jul 2026 07:02:54 +0000

TL;DR: A missing Semaphore in a virtual thread stress test led to 1,076 MB/s allocation rate, 324 Full GCs releasing 0 bytes, and 150k new virtual threads in 1 second. Virtual threads are powerful, but submit() is non-blocking – you must manage the creation rate.

The Incident

A stress test using JDK 26 virtual threads (Loom) ran for ~134 seconds before OOM. The application was using Executors.newVirtualThreadPerTaskExecutor() to submit short-lived tasks.

GC Report – What the Logs Showed

Metric	Value	Note
Heap Size	2 GB (G1)	–
Allocation Rate	1,076 MB/s	2GB heap filled in <2 seconds
Total GC Events	607	~4.5 GCs per second
Full GC Count	324 (53%)	More Full GCs than Young GCs
Full GC Total Pause	118,114 ms	88% of runtime spent in Full GC
Application Throughput	~10%	JVM barely ran the app
Top Full GC Releases	`2047MB → 2047MB`	0 bytes released

The "0 bytes released" pattern across 324 Full GCs is the critical signal: everything in the heap was alive.
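The derived figures in the table follow from a few divisions, which the sketch below reproduces from the raw values reported in this section. The runtime of 134 seconds is the approximate figure quoted in the incident description, so the results are approximate as well; the class and method names are illustrative.

```java
final class GcReportMath {
    /** Seconds needed to fill a heap of heapMb at a steady allocation rate. */
    static double secondsToFill(double heapMb, double allocMbPerSec) {
        return heapMb / allocMbPerSec;
    }

    /** Fraction that part represents of whole. */
    static double share(double part, double whole) {
        return part / whole;
    }

    public static void main(String[] args) {
        double runtimeSec = 134;  // approximate test duration before the OOM
        System.out.printf("heap fill time:     %.2f s%n", secondsToFill(2048, 1076));
        System.out.printf("GC events per sec:  %.2f%n", 607 / runtimeSec);
        System.out.printf("Full GC share:      %.1f %%%n", 100 * share(324, 607));
        System.out.printf("time in Full GC:    %.1f %%%n", 100 * share(118_114, runtimeSec * 1000));
    }
}
```

The last figure is the one that matters most: with nearly nine tenths of wall-clock time spent in Full GC pauses, only about a tenth is left for the application.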

Thread Dump Analysis

Two jstack dumps taken 1 second apart:

jstack-1: Virtual thread #3342482
jstack-2: Virtual thread #3492195
+150,000 virtual thread IDs in 1 second

Carrier thread stack:

ForkJoinPool-1-worker-1 (daemon)
  → Carrying virtual thread #3342482 / #3492195
     at ConcurrentHashMap.sumCount / isEmpty
     at ThreadPerTaskExecutor.tryTerminate / taskComplete
     at VirtualThread.run

Thread state distribution: 64 total platform threads, 49 carrying virtual threads, 0 BLOCKED, 0 deadlocks – consistent with a virtual thread explosion.

The Offending Code

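// imports: java.util.concurrent.ExecutorService, java.util.concurrent.Executors
private void submitTasks(int threadCount, int workMs) {
    try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
        while (running) {  // ← no sleep, no rate limiting
            for (int i = 0; i < threadCount; i++) {  // ← 10,000 per round
                executor.submit(() -> {  // ← non-blocking, returns instantly
                    try {
                        Thread.sleep(workMs);  // ← each thread lives ≥10ms
                        // do work
                    } catch (InterruptedException e) {
                        // sleep() throws a checked exception, so the Runnable must handle it
                        Thread.currentThread().interrupt();
                    }
                });
            }
        }
    }
}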

The cascade:

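The loop submits 10,000 tasks per round with nothing to slow it down; the thread-ID delta between the two dumps puts the observed rate at roughly 150,000 submit() calls per second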
Each virtual thread carries Continuation stack + ThreadLocal + task object
Each thread lives at least 10ms → unbounded accumulation
2GB heap fills in <2 seconds
Young GCs (205) release ~8MB avg → can't keep up
G1 falls back to Full GC → 324 Full GCs, all release 0 bytes
JVM spends 88% of time in GC → OOM

Why Full GC Released 0 Bytes

Full GC traces:

[Full GC (G1 Evacuation Pause) 2047M->2047M, 0.362s]
[Full GC (G1 Evacuation Pause) 2047M->2047M, 0.371s]
[Full GC (G1 Evacuation Pause) 2047M->2047M, 0.358s]

All virtual threads and Continuations were alive – referenced by ForkJoinPool and the ThreadPerTaskExecutor. With all objects reachable from GC roots, the collector had nothing to reclaim.

The Fix – Adding Backpressure

Option 1: Semaphore (Recommended)

Semaphore sem = new Semaphore(maxPending);
while (running) {
    for (int i = 0; i < threadCount; i++) {
        sem.acquire();  // blocks when limit exceeded
        executor.submit(() -> {
            try { /* do work */ } finally { sem.release(); }
        });
    }
}
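A fuller version of the semaphore pattern, with the exception handling the fragment above leaves out, might look like the sketch below. It is an illustration rather than the code from the stress test: BoundedSubmitter, runBounded, and the 1 ms sleep are assumptions, and the method takes any ExecutorService so the same code works with a virtual-thread-per-task executor or a conventional pool. It also tracks the peak number of in-flight tasks, which makes the cap observable.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedSubmitter {
    /** Submits tasks jobs with at most maxPending in flight; returns the peak in-flight count. */
    static int runBounded(ExecutorService executor, int tasks, int maxPending)
            throws InterruptedException {
        Semaphore permits = new Semaphore(maxPending);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            permits.acquire();                           // producer blocks here once the cap is reached
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            try {
                executor.submit(() -> {
                    try {
                        Thread.sleep(1);                 // stand-in for real work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release();               // always hand the permit back
                    }
                });
            } catch (RuntimeException e) {               // e.g. rejected submission: do not leak the permit
                inFlight.decrementAndGet();
                permits.release();
                throw e;
            }
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        return peak.get();
    }
}
```

Passing Executors.newVirtualThreadPerTaskExecutor() (JDK 21 or later) with, say, maxPending set to a few thousand keeps the number of live virtual threads bounded regardless of how fast the producer loop spins.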

Option 2: Add Sleep in the Loop

while (running) {
    for (int i = 0; i < threadCount; i++) {
        executor.submit(...);
    }
    Thread.sleep(workMs);  // rate limit submissions
}

Option 3: Bounded Queue + CallerRunsPolicy

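// maxWorkers is an illustrative bound (not from the original code). It must be finite:
// with Integer.MAX_VALUE, a full queue spawns new threads and the rejection policy never fires.
BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(10000);
ThreadPoolExecutor executor = new ThreadPoolExecutor(
    maxWorkers, maxWorkers, 60, TimeUnit.SECONDS,
    queue,
    Thread.ofVirtual().factory(),
    new ThreadPoolExecutor.CallerRunsPolicy()  // once workers and queue are full, the submitter runs the task itself
);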

Key Lessons

Virtual threads are not fire-and-forget – submit() is non-blocking by design, which means you must manage the creation rate
Full GC releasing 0 bytes is a strong diagnostic signal – it means nearly everything in the heap is still reachable, whether from a thread explosion, a leak, or a live set that simply exceeds the heap
The virtual thread IDs in jstack don't lie – 150k new IDs in 1 second is a clear warning
Backpressure is not optional – without it, any fast producer can overwhelm the JVM

Data Mapping – GC + Threads + Code

GC/Thread Symptom	Root Cause in Code
Allocation rate 1,076 MB/s	`while` no sleep + non-blocking `submit()`
324 Full GCs, 0 bytes released	All virtual threads alive, GC roots reachable
150k new threads in 1 second	Unthrottled `submit()` loop, 10,000 tasks per round
Throughput ~10%	88% of time in Full GC

Tool Note

Both analyses were performed using a JVM analysis tool I'm building – it parses GC logs, correlates with thread dumps, and extracts root cause patterns. The tool helped identify these issues in minutes rather than hours.

JDK 26 G1 GC Dual Card Tables – A Benchmark Story

jvmind — Mon, 29 Jun 2026 09:42:23 +0000

TL;DR: JDK 26's G1 write barrier optimization (Dual Card Tables) delivers ~2.4x faster write barrier operations, but aggregate GC metrics can be misleading if you don't account for the application doing more work.

Background

The Dual Card Tables work landed in JDK 26, promising 5-15% throughput improvements for G1 GC. I wanted to understand how this behaves under a write-barrier-heavy workload, so I ran a controlled benchmark comparing JDK 25 vs JDK 26 G1.

Benchmark Setup

Workload: Write-barrier-heavy allocation test (storing newly allocated Objects into a fixed array)
Heap: 2GB, G1 GC
Runtime: ~31 seconds per test
JDKs: 25 vs 26 (both with G1)

Initial Observations (Misleading)

Metric	JDK 25	JDK 26	Change
GC Events	75	168	+124%
Total Pause Time	1.78s	3.26s	+83%
Throughput	94.30%	89.50%	-4.8 p.p.
Allocation Rate	2,874 MB/s	6,587 MB/s	+129%

On the surface, JDK 26 looked worse: more GC events, more total pause time, lower throughput. But this was a measurement artifact.

The Critical Data Point

The benchmark's raw output told a different story:

JDK	Result (ms/op)	Iterations
25	0.055 ± 0.013	54
26	0.023 ± 0.003	129

JDK 26 executes the same write-barrier operation in less than half the time – ~2.4x faster.

What Actually Happened

The allocation rate spike (2,874 → 6,587 MB/s) wasn't a regression. It was a consequence of the application running faster:

Allocation Rate = Allocated Bytes / Application Runtime

When the write barrier becomes faster, the application spends less time on barrier operations and more time actually doing work – so it allocates more bytes in the same wall-clock time. More allocations → more garbage → more GC events → more total pause time.

The "throughput regression" was actually a sign of throughput improvement.
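One way to remove the distortion is to normalize the aggregate GC numbers by the amount of work done. The sketch below does this with the figures from the two tables, under the assumption that the "Iterations" column counts equally sized units of completed work in each run; PerWorkMetrics and perUnit are illustrative names.

```java
final class PerWorkMetrics {
    /** Divides an aggregate total by the number of completed work units. */
    static double perUnit(double total, int units) {
        return total / units;
    }

    public static void main(String[] args) {
        // Raw values from the tables: JDK 25 first, JDK 26 second.
        double pause25 = 1.78, pause26 = 3.26;  // total pause time in seconds
        int events25 = 75, events26 = 168;      // GC event counts
        int iters25 = 54, iters26 = 129;        // completed benchmark iterations

        System.out.printf("pause per iteration:  %.1f ms vs %.1f ms%n",
                1000 * perUnit(pause25, iters25), 1000 * perUnit(pause26, iters26));
        System.out.printf("events per iteration: %.2f vs %.2f%n",
                perUnit(events25, iters25), perUnit(events26, iters26));
        System.out.printf("per-op speedup:       %.2fx%n", 0.055 / 0.023);
    }
}
```

Seen per iteration, both pause time and GC event count go down on JDK 26, which is the opposite of what the raw totals suggest.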

Corrected Conclusion

Dimension	JDK 26 vs JDK 25
Write barrier performance	✅ ~2.4x faster
Single-pause latency	✅ Better across all percentiles
Effective throughput	✅ Significantly higher
GC events (count)	⚠️ Higher (because of more work)
Total pause time	⚠️ Higher (because of more work)

Key Takeaway

Aggregate GC metrics like "total pause time" or "throughput percentage" are not absolute measures of performance. They must be interpreted in context. JDK 26's G1 optimization is a clear win – it made the application run faster, which created more garbage, which triggered more GC activity.

Benchmark Code

// Simplified version – full code available on request
public class WriteBarrierBench {
    private static final int ARRAY_SIZE = 10000;
    private final Object[] array = new Object[ARRAY_SIZE];
    private volatile long blackhole;

    private void storeReferences() {
        for (int i = 0; i < array.length; i++) {
            array[i] = new Object();  // triggers write barrier
        }
        blackhole += array.length;     // prevents optimization
    }

    // ... measurement harness with warmup, iterations, etc.
}

Methodology Note

The benchmark uses a volatile long blackhole to prevent dead code elimination
Warmup iterations are included to allow JIT compilation
A bash harness controls JDK switching and GC logging
The test is controlled (single workload pattern) – results may not generalize to all allocation profiles

Open Questions

How does this scale with different heap sizes?
What does the behavior look like on other GC algorithms (Parallel, ZGC)?
Is there a direct way to measure write barrier overhead independently?

Debugging a C2 JIT Compiler Infinite Loop on AArch64

jvmind — Fri, 26 Jun 2026 14:45:47 +0000

tl;dr: A production Java 8 service on AArch64 experienced 100% CPU on a single core caused by a C2 JIT compiler infinite loop. The root cause was a cycle in C2's Ideal Graph triggered by dead code elimination of MemBarCPUOrder nodes. A verified workaround: -XX:CompileCommand=exclude,org.springframework.core.MethodParameter::getParameterType

Prologue

A mysterious CPU spike appeared on a production Java 8 service running on OpenJDK 8u442 on AArch64 processors.

The symptom: one core was pinned at 100%, the entire application became sluggish, and top showed C2 CompilerThread0 as the culprit.

This issue was not immediately reproducible on demand. It only surfaced after sustained mixed workload, making it particularly challenging to diagnose.

1. The Flame Graph Revelation

We started with a flame graph taken during the incident:

80% of samples landed inside MemNode::can_see_stored_value
21% were in MergeMemNode::memory_at

These two functions, deep in the OpenJDK C2 compiler, were burning the cycles. A compiler thread that spends essentially all of its samples in the same two functions strongly suggests a loop inside C2 that never exits.

2. The Suspect Loop

A careful reading of memnode.cpp (from OpenJDK 8u442 source) revealed a potential while loop that could spin without exit:

while (current->is_Proj()) {
    int opc = current->in(0)->Opcode();
    if (opc == Op_MemBarRelease || opc == Op_StoreFence || 
        opc == Op_MemBarAcquire || opc == Op_MemBarCPUOrder ...) {
        Node* mem = current->in(0)->in(TypeFunc::Memory);
        if (mem->is_MergeMem()) {
            MergeMemNode* merge = mem->as_MergeMem();
            Node* new_st = merge->memory_at(alias_idx);
            if (new_st == merge->base_memory()) {
                current = new_st;
                continue;  // ← INFINITE LOOP RISK
            }
            result = new_st;
        }
    }
    break;
}

The continue inside the while combined with current = new_st is the critical pattern. If new_st equals current, the loop never terminates.
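The failure mode is the classic "walk a linked structure that turns out to be cyclic". The sketch below illustrates it in Java with a made-up Node type, where next stands in for "follow base_memory to the next candidate"; it shows how a visited set turns an endless walk into a detectable condition. It is not the upstream HotSpot fix and does not model C2's node classes.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class ChainWalk {
    static final class Node {
        final String name;
        Node next;                                  // stand-in for the next memory state to inspect
        Node(String name) { this.name = name; }
    }

    /** Follows next pointers; returns the number of nodes visited, or -1 if a node repeats. */
    static int visitOrDetectCycle(Node start) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        int visited = 0;
        for (Node current = start; current != null; current = current.next) {
            if (!seen.add(current)) return -1;      // already visited: an unguarded loop would spin forever
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        Node proj = new Node("Proj");
        Node membar = new Node("MemBarCPUOrder");
        Node merge = new Node("MergeMem");
        proj.next = membar;
        membar.next = merge;
        merge.next = proj;                          // base_memory points back at the starting Proj
        System.out.println(visitOrDetectCycle(proj));
    }
}
```

Without the seen set, the for loop in this example never terminates, which mirrors what the compiler thread was doing at 100% CPU.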

3. Why Only AArch64?

We tried to reproduce on x86 – nothing. On AArch64 – the hang appeared after sustained operation.

Memory Model:

x86 (TSO) is strongly ordered. C2 rarely inserts MemBarCPUOrder barriers.
AArch64 (weak memory model) requires explicit barriers for safe publication.

The bug only fires when:

There is a loop that creates objects and writes to a volatile field.
C2 performs dead code elimination on a path inside that loop.
The cleanup folds a MergeMem node and makes its base_memory point back to the loop's own Proj node.

This creates a cycle in the data-flow graph:

Proj → MemBarCPUOrder → MergeMem → base_memory → Proj (again)

4. Root Cause Summary

Root cause: In OpenJDK 8u442 on AArch64, C2's dead-code elimination can create a data-flow cycle:

Proj → MemBarCPUOrder → MergeMem → base_memory → same Proj

Trigger: org.springframework.core.MethodParameter.getParameterType() — a common Spring method that combines volatile accesses and potential object creation in a hot path.

Why 8u442: The upstream fix was not backported into this update.

5. Production Workaround (Verified)

For teams that cannot rebuild the JDK:

-XX:CompileCommand=exclude,org.springframework.core.MethodParameter::getParameterType

This excludes the method from JIT compilation entirely, so it runs in the interpreter and never reaches the C2 code path that hangs. In most Spring applications the overhead is small, but it is worth measuring if getParameterType sits on a request path.

6. Full GDB Command Sequence

For reference, here is the complete GDB workflow:

# 1. Find container PID on host
docker inspect <container> --format '{{.State.Pid}}'

# 2. Allow ptrace (if needed)
echo 0 > /proc/sys/kernel/yama/ptrace_scope

# 3. Capture core
gcore -o /data/coredump/hang <PID>

# 4. Analyze with GDB
gdb -q \
  -ex "set sysroot /proc/<PID>/root" \
  -ex "thread apply all bt 3" \
  -ex "quit" \
  /proc/<PID>/root/<path-to-jdk>/bin/java \
  /data/coredump/hang.<PID> > /tmp/bt.txt 2>&1

Conclusion

Key Findings:

Root cause: C2's Ideal Graph can form a cycle when dead code elimination removes nodes in the object allocation path.
Architecture specificity: The bug only manifests on AArch64 because only weak memory models require the MemBarCPUOrder nodes.
Trigger: org.springframework.core.MethodParameter.getParameterType().

Lessons Learned:

Flame graphs are the first line of defense.
Container debugging requires creative tooling — host gcore + sysroot works where in-container debugging fails.
When debugging intermittent JVM issues, capture core dumps early.

Final Recommendation:

If you run Spring Boot on AArch64 and see unexplained 100% CPU from C2 CompilerThread:

Profile with async-profiler to confirm it's in can_see_stored_value
Add the exclusion: -XX:CompileCommand=exclude,org.springframework.core.MethodParameter::getParameterType
Capture a core dump and verify the cycle using CLHSDB
Report to your JDK vendor with the evidence

Acknowledgments

The method for extracting ciMethod information from core dumps was adapted from Vladimir Sitnikov's excellent 2018 article on analyzing stuck C2 compilations.

This investigation was conducted on OpenJDK 8u442 running on AArch64 processors.

Tags: java, jvm, debugging, performance, aarch64
