<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: jvmind</title>
    <description>The latest articles on DEV Community by jvmind (@jvmind-devel).</description>
    <link>https://dev.to/jvmind-devel</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4004174%2F7ba56dc1-08a2-4431-8462-b6ffd863f577.png</url>
      <title>DEV Community: jvmind</title>
      <link>https://dev.to/jvmind-devel</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jvmind-devel"/>
    <language>en</language>
    <item>
      <title>JDK 26 G1 GC Dual Card Tables – A Benchmark Story</title>
      <dc:creator>jvmind</dc:creator>
      <pubDate>Mon, 29 Jun 2026 09:42:23 +0000</pubDate>
      <link>https://dev.to/jvmind-devel/jdk-26-g1-gc-dual-card-tables-a-benchmark-story-24a9</link>
      <guid>https://dev.to/jvmind-devel/jdk-26-g1-gc-dual-card-tables-a-benchmark-story-24a9</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;: In this benchmark, JDK 26's G1 write barrier optimization (Dual Card Tables) ran a write-barrier-heavy operation ~2.4x faster than JDK 25, but aggregate GC metrics can be misleading if you don't account for the application doing more work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Background
&lt;/h3&gt;

&lt;p&gt;The Dual Card Tables work landed in JDK 26, promising 5-15% throughput improvements for G1 GC. I wanted to understand how this behaves under a write-barrier-heavy workload, so I ran a controlled benchmark comparing JDK 25 vs JDK 26 G1.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmark Setup
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Workload&lt;/strong&gt;: Write-barrier-heavy allocation test (storing newly allocated Objects into a fixed array)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heap&lt;/strong&gt;: 2GB, G1 GC&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime&lt;/strong&gt;: ~31 seconds per test&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JDKs&lt;/strong&gt;: 25 vs 26 (both with G1)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Initial Observations (Misleading)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;JDK 25&lt;/th&gt;
&lt;th&gt;JDK 26&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GC Events&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;td&gt;168&lt;/td&gt;
&lt;td&gt;+124%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total Pause Time&lt;/td&gt;
&lt;td&gt;1.78s&lt;/td&gt;
&lt;td&gt;3.26s&lt;/td&gt;
&lt;td&gt;+83%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Throughput&lt;/td&gt;
&lt;td&gt;94.30%&lt;/td&gt;
&lt;td&gt;89.50%&lt;/td&gt;
&lt;td&gt;-4.8 p.p.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Allocation Rate&lt;/td&gt;
&lt;td&gt;2,874 MB/s&lt;/td&gt;
&lt;td&gt;6,587 MB/s&lt;/td&gt;
&lt;td&gt;+129%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On the surface, JDK 26 looked worse: more GC events, more total pause time, lower throughput. But this was a measurement artifact: the two runs did very different amounts of work in the same ~31 seconds.&lt;/p&gt;
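
&lt;p&gt;For reference, the throughput row can be approximately reproduced from the other numbers. The sketch below assumes that "throughput" means the share of wall-clock time not spent in GC pauses, and it treats the ~31 second runtime from the setup as exactly 31 s. Both are assumptions, and the class and method names are illustrative only.&lt;/p&gt;

```java
final class GcThroughput {
    // Percentage of wall-clock time not spent in GC pauses (assumed definition).
    static double throughputPercent(double totalPauseSeconds, double runtimeSeconds) {
        return 100.0 * (1.0 - totalPauseSeconds / runtimeSeconds);
    }

    public static void main(String[] args) {
        double runtime = 31.0; // assumption: "~31 seconds" treated as exact
        System.out.printf("JDK 25: %.1f%%%n", throughputPercent(1.78, runtime));
        System.out.printf("JDK 26: %.1f%%%n", throughputPercent(3.26, runtime));
    }
}
```

&lt;p&gt;With these inputs the results land close to the 94.30% and 89.50% in the table, which suggests the throughput figure is driven by total pause time alone and says nothing about how much work the application completed.&lt;/p&gt;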

&lt;h3&gt;
  
  
  The Critical Data Point
&lt;/h3&gt;

&lt;p&gt;The benchmark's raw output told a different story:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;JDK&lt;/th&gt;
&lt;th&gt;Result (ms/op)&lt;/th&gt;
&lt;th&gt;Iterations&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;0.055 ± 0.013&lt;/td&gt;
&lt;td&gt;54&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.023 ± 0.003&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;129&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

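&lt;p&gt;&lt;strong&gt;JDK 26 executes the same write-barrier operation in less than half the time&lt;/strong&gt; – 0.055 / 0.023 ≈ 2.4x faster. It also completed 129 iterations against 54 for JDK 25, which is the same ~2.4x ratio.&lt;/p&gt;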

&lt;h3&gt;
  
  
  What Actually Happened
&lt;/h3&gt;

&lt;p&gt;The allocation rate spike (2,874 → 6,587 MB/s) wasn't a regression. It was a consequence of the application running faster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Allocation Rate = Allocated Bytes / Application Runtime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the write barrier becomes faster, the application spends less time on barrier operations and more time actually doing work – so it allocates more bytes in the same wall-clock time. More allocations → more garbage → more GC events → more total pause time.&lt;/p&gt;
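
&lt;p&gt;One way to make the comparison fair is to normalize GC cost by the work done. The sketch below divides total pause time by the volume allocated, using the figures from the first table. It assumes both runs cover the same 31 s runtime (which then cancels out of the ratio); the names are illustrative, not part of the benchmark.&lt;/p&gt;

```java
final class NormalizedGcCost {
    // Pause milliseconds per MB allocated, given the total pause time, the
    // allocation rate in MB/s, and the runtime the rate was measured over.
    static double pauseMsPerMb(double pauseSeconds, double allocMbPerSec, double runtimeSeconds) {
        double allocatedMb = allocMbPerSec * runtimeSeconds;
        return 1000.0 * pauseSeconds / allocatedMb;
    }

    public static void main(String[] args) {
        double runtime = 31.0; // assumption: same runtime for both JDKs
        double jdk25 = pauseMsPerMb(1.78, 2874.0, runtime);
        double jdk26 = pauseMsPerMb(3.26, 6587.0, runtime);
        System.out.printf("JDK 25: %.4f ms/MB%n", jdk25);
        System.out.printf("JDK 26: %.4f ms/MB%n", jdk26);
        System.out.printf("JDK 26 / JDK 25: %.2f%n", jdk26 / jdk25);
    }
}
```

&lt;p&gt;The ratio comes out near 0.8: per megabyte allocated, JDK 26 spends roughly 20% less time paused, even though its total pause time is higher.&lt;/p&gt;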

&lt;p&gt;&lt;strong&gt;The "throughput regression" was actually a sign of throughput improvement.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Corrected Conclusion
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;JDK 26 vs JDK 25&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Write barrier performance&lt;/td&gt;
&lt;td&gt;✅ &lt;strong&gt;~2.4x faster&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single-pause latency&lt;/td&gt;
&lt;td&gt;✅ Better across all percentiles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effective throughput&lt;/td&gt;
&lt;td&gt;✅ &lt;strong&gt;Significantly higher&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GC events (count)&lt;/td&gt;
&lt;td&gt;⚠️ Higher (because of more work)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total pause time&lt;/td&gt;
&lt;td&gt;⚠️ Higher (because of more work)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
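
&lt;p&gt;The single-pause row is at least consistent with the aggregate numbers: dividing total pause time by event count gives a lower mean pause on JDK 26. The sketch below checks only the mean; the percentiles would need the per-event GC log, which is not reproduced here. The class and method names are illustrative.&lt;/p&gt;

```java
final class MeanPause {
    // Mean pause per GC event in milliseconds.
    static double meanPauseMs(double totalPauseSeconds, int gcEvents) {
        return 1000.0 * totalPauseSeconds / gcEvents;
    }

    public static void main(String[] args) {
        System.out.printf("JDK 25: %.1f ms per pause%n", meanPauseMs(1.78, 75));
        System.out.printf("JDK 26: %.1f ms per pause%n", meanPauseMs(3.26, 168));
    }
}
```

&lt;p&gt;That works out to roughly 23.7 ms per pause on JDK 25 against roughly 19.4 ms on JDK 26.&lt;/p&gt;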

&lt;h3&gt;
  
  
  Key Takeaway
&lt;/h3&gt;

&lt;p&gt;Aggregate GC metrics like "total pause time" or "throughput percentage" are not absolute measures of performance. They must be interpreted in context. JDK 26's G1 optimization is a clear win – it made the application run faster, which created more garbage, which triggered more GC activity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmark Code
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Simplified version – full code available on request&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WriteBarrierBench&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="no"&gt;ARRAY_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="no"&gt;ARRAY_SIZE&lt;/span&gt;&lt;span class="o"&gt;];&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;volatile&lt;/span&gt; &lt;span class="kt"&gt;long&lt;/span&gt; &lt;span class="n"&gt;blackhole&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;storeReferences&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;  &lt;span class="c1"&gt;// triggers write barrier&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;blackhole&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;array&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;// prevents optimization&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// ... measurement harness with warmup, iterations, etc.&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Methodology Note
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The benchmark uses a &lt;code&gt;volatile long blackhole&lt;/code&gt; to prevent dead code elimination&lt;/li&gt;
&lt;li&gt;Warmup iterations are included to allow JIT compilation&lt;/li&gt;
&lt;li&gt;A bash harness controls JDK switching and GC logging&lt;/li&gt;
&lt;li&gt;The test is controlled (single workload pattern) – results may not generalize to all allocation profiles&lt;/li&gt;
&lt;/ul&gt;
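
&lt;p&gt;A minimal, self-contained sketch of what such a harness could look like is shown below. It is not the full benchmark: the warmup and measurement counts, the &lt;code&gt;measureMsPerOp&lt;/code&gt; method, and the class name are assumptions made for illustration, and a dedicated harness such as JMH would give more trustworthy numbers.&lt;/p&gt;

```java
final class WriteBarrierHarness {
    private static final int ARRAY_SIZE = 10_000;
    private final Object[] array = new Object[ARRAY_SIZE];
    private volatile long blackhole;

    // Same shape as the benchmark body: every reference store into the
    // array goes through the GC write barrier.
    private void storeReferences() {
        for (int i = 0; i != array.length; i++) {
            array[i] = new Object();
        }
        blackhole += array.length; // volatile write keeps the loop observable
    }

    // Runs warmupOps untimed calls, then ops timed calls, and returns the
    // average time per timed call in milliseconds.
    double measureMsPerOp(int warmupOps, int ops) {
        for (int i = 0; i != warmupOps; i++) {
            storeReferences(); // warmup so the JIT can compile the loop
        }
        long start = System.nanoTime();
        for (int i = 0; i != ops; i++) {
            storeReferences();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / ops;
    }

    long blackhole() {
        return blackhole;
    }

    public static void main(String[] args) {
        WriteBarrierHarness bench = new WriteBarrierHarness();
        // Counts are arbitrary illustration values, not the ones used above.
        double msPerOp = bench.measureMsPerOp(20_000, 100_000);
        System.out.printf("%.3f ms/op (blackhole=%d)%n", msPerOp, bench.blackhole());
    }
}
```

&lt;p&gt;Running the same class under each JDK with identical heap and GC flags gives a per-operation time that can be compared directly, independent of how many GC events occurred.&lt;/p&gt;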

&lt;h3&gt;
  
  
  Open Questions
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;How does this scale with different heap sizes?&lt;/li&gt;
&lt;li&gt;What does the behavior look like on other GC algorithms (Parallel, ZGC)?&lt;/li&gt;
&lt;li&gt;Is there a direct way to measure write barrier overhead independently?&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>java</category>
      <category>performance</category>
      <category>programming</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>jvmind</dc:creator>
      <pubDate>Fri, 26 Jun 2026 15:09:31 +0000</pubDate>
      <link>https://dev.to/jvmind-devel/-3mad</link>
      <guid>https://dev.to/jvmind-devel/-3mad</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/jvmind-devel/debugging-a-c2-jit-compiler-infinite-loop-on-aarch64-2mbc"&gt;Debugging a C2 JIT Compiler Infinite Loop on AArch64&lt;/a&gt;&lt;/p&gt;
</description>
    </item>
    <item>
      <title>Debugging a C2 JIT Compiler Infinite Loop on AArch64</title>
      <dc:creator>jvmind</dc:creator>
      <pubDate>Fri, 26 Jun 2026 14:45:47 +0000</pubDate>
      <link>https://dev.to/jvmind-devel/debugging-a-c2-jit-compiler-infinite-loop-on-aarch64-2mbc</link>
      <guid>https://dev.to/jvmind-devel/debugging-a-c2-jit-compiler-infinite-loop-on-aarch64-2mbc</guid>
      <description>&lt;p&gt;&lt;strong&gt;tl;dr&lt;/strong&gt;: A production Java 8 service on AArch64 experienced 100% CPU on a single core caused by a C2 JIT compiler infinite loop. The root cause was a cycle in C2's Ideal Graph triggered by dead code elimination of MemBarCPUOrder nodes. A verified workaround: &lt;code&gt;-XX:CompileCommand=exclude,org.springframework.core.MethodParameter::getParameterType&lt;/code&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Prologue
&lt;/h2&gt;

&lt;p&gt;A mysterious CPU spike appeared on a production Java 8 service running on OpenJDK 8u442 on AArch64 processors.&lt;/p&gt;

&lt;p&gt;The symptom: one core was pinned at 100%, the entire application became sluggish, and &lt;code&gt;top&lt;/code&gt; showed &lt;code&gt;C2 CompilerThread0&lt;/code&gt; as the culprit.&lt;/p&gt;

&lt;p&gt;This issue was not immediately reproducible on demand. It only surfaced after sustained mixed workload, making it particularly challenging to diagnose.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. The Flame Graph Revelation
&lt;/h2&gt;

&lt;p&gt;We started with a flame graph taken during the incident:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;80% of samples landed inside MemNode::can_see_stored_value&lt;/li&gt;
&lt;li&gt;21% were in MergeMemNode::memory_at&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These two functions, deep in the OpenJDK C2 compiler, were burning the cycles. Sustained time in the same two functions pointed to an infinite loop inside the C2 compiler rather than a merely slow compilation.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. The Suspect Loop
&lt;/h2&gt;

&lt;p&gt;A careful reading of memnode.cpp (from OpenJDK 8u442 source) revealed a potential while loop that could spin without exit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;is_Proj&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;opc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;Opcode&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;opc&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Op_MemBarRelease&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;opc&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Op_StoreFence&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; 
        &lt;span class="n"&gt;opc&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Op_MemBarAcquire&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;opc&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Op_MemBarCPUOrder&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Node&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;mem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;in&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypeFunc&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mem&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;is_MergeMem&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;MergeMemNode&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;merge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mem&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;as_MergeMem&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
            &lt;span class="n"&gt;Node&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;new_st&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;merge&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;memory_at&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;alias_idx&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_st&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;merge&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;base_memory&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_st&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// ← INFINITE LOOP RISK&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;new_st&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



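&lt;p&gt;The &lt;code&gt;continue&lt;/code&gt; inside the &lt;code&gt;while&lt;/code&gt;, combined with &lt;code&gt;current = new_st&lt;/code&gt;, is the critical pattern. The loop keeps no record of the nodes it has visited, so if &lt;code&gt;new_st&lt;/code&gt; is &lt;code&gt;current&lt;/code&gt; itself, or leads back to it, the loop never terminates.&lt;/p&gt;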




&lt;h2&gt;
  
  
  3. Why Only AArch64?
&lt;/h2&gt;

&lt;p&gt;We tried to reproduce on x86 – nothing. On AArch64 – the hang appeared after sustained operation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory Model&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;x86 (TSO) is strongly ordered. C2 rarely inserts MemBarCPUOrder barriers.&lt;/li&gt;
&lt;li&gt;AArch64 (weak memory model) requires explicit barriers for safe publication.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The bug only fires when:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;There is a loop that creates objects and writes to a volatile field.&lt;/li&gt;
&lt;li&gt;C2 performs dead code elimination on a path inside that loop.&lt;/li&gt;
&lt;li&gt;The cleanup folds a MergeMem node and makes its base_memory point back to the loop's own Proj node.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This creates a cycle in the data-flow graph:&lt;/p&gt;

&lt;p&gt;Proj → MemBarCPUOrder → MergeMem → base_memory → Proj (again)&lt;/p&gt;
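
&lt;p&gt;To illustrate why such a cycle is fatal for a walk that only follows pointers, here is a small Java model. It is not HotSpot code: the &lt;code&gt;Node&lt;/code&gt; class, the single &lt;code&gt;next&lt;/code&gt; edge, and the helper names are simplifications invented for this sketch, and the check used (Floyd's tortoise-and-hare) is a generic cycle-detection technique, not the upstream fix.&lt;/p&gt;

```java
final class Node {
    final String name;
    Node next; // the single edge a memory walk would follow in this toy model

    Node(String name) {
        this.name = name;
    }
}

final class MemoryChain {
    // Floyd's cycle detection: returns true if following next from start
    // would never reach the end of the chain.
    static boolean hasCycle(Node start) {
        Node slow = start;
        Node fast = start;
        while (true) {
            if (fast == null || fast.next == null) {
                return false; // reached the end of the chain
            }
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) {
                return true; // the two walkers met, so the chain loops
            }
        }
    }

    // Builds Proj -> MemBarCPUOrder -> MergeMem, optionally closing the loop
    // by pointing the last edge (standing in for base_memory) back at Proj.
    static Node build(boolean foldBackToProj) {
        Node proj = new Node("Proj");
        Node membar = new Node("MemBarCPUOrder");
        Node merge = new Node("MergeMem");
        proj.next = membar;
        membar.next = merge;
        merge.next = foldBackToProj ? proj : null;
        return proj;
    }

    public static void main(String[] args) {
        System.out.println("healthy graph has cycle: " + hasCycle(build(false)));
        System.out.println("folded graph has cycle:  " + hasCycle(build(true)));
    }
}
```

&lt;p&gt;A walk with no such check, like the &lt;code&gt;while&lt;/code&gt; loop in memnode.cpp, simply keeps following the edges of the folded graph and never returns.&lt;/p&gt;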




&lt;h2&gt;
  
  
  4. Root Cause Summary
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Root cause&lt;/strong&gt;: In OpenJDK 8u442 on AArch64, C2's dead-code elimination can create a data-flow cycle:&lt;/p&gt;

&lt;p&gt;Proj → MemBarCPUOrder → MergeMem → base_memory → same Proj&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trigger&lt;/strong&gt;: org.springframework.core.MethodParameter.getParameterType() — a common Spring method that combines volatile accesses and potential object creation in a hot path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why 8u442&lt;/strong&gt;: The upstream fix was not backported into this update.&lt;/p&gt;




&lt;h2&gt;
  
  
  5. Production Workaround (Verified)
&lt;/h2&gt;

&lt;p&gt;For teams that cannot rebuild the JDK:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;-XX:CompileCommand=exclude,org.springframework.core.MethodParameter::getParameterType&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This excludes the method from JIT compilation, so it runs in the interpreter and C2 never gets to compile it. The performance impact should be small in most Spring applications, where getParameterType accounts for little of the total runtime.&lt;/p&gt;
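
&lt;p&gt;After a rollout it can be worth confirming that the running JVM actually received the flag. The sketch below is one way to do that from inside the process, using the standard &lt;code&gt;RuntimeMXBean.getInputArguments()&lt;/code&gt; call. The class and method names are made up for illustration, and it only matches the exact flag string, so an exclusion supplied another way (for example through a compile command file) would not be detected.&lt;/p&gt;

```java
import java.lang.management.ManagementFactory;

final class CompileCommandCheck {
    static final String EXCLUDE =
        "-XX:CompileCommand=exclude,org.springframework.core.MethodParameter::getParameterType";

    // True if the given JVM arguments contain the exclusion flag verbatim.
    static boolean hasExclusion(String[] jvmArgs) {
        for (String arg : jvmArgs) {
            if (EXCLUDE.equals(arg)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments the JVM was started with, as reported by the runtime bean.
        String[] jvmArgs = ManagementFactory.getRuntimeMXBean()
                .getInputArguments()
                .toArray(new String[0]);
        System.out.println("exclusion active: " + hasExclusion(jvmArgs));
    }
}
```

&lt;p&gt;The same check could be exposed through a health endpoint or logged once at startup, so a missing flag is noticed before the hang reappears.&lt;/p&gt;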




&lt;h2&gt;
  
  
  6. Full GDB Command Sequence
&lt;/h2&gt;

&lt;p&gt;For reference, here is the complete GDB workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Find container PID on host&lt;/span&gt;
docker inspect &amp;lt;container&amp;gt; &lt;span class="nt"&gt;--format&lt;/span&gt; &lt;span class="s1"&gt;'{{.State.Pid}}'&lt;/span&gt;

&lt;span class="c"&gt;# 2. Allow ptrace (if needed)&lt;/span&gt;
&lt;span class="nb"&gt;echo &lt;/span&gt;0 &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /proc/sys/kernel/yama/ptrace_scope

&lt;span class="c"&gt;# 3. Capture core&lt;/span&gt;
gcore &lt;span class="nt"&gt;-o&lt;/span&gt; /data/coredump/hang &amp;lt;PID&amp;gt;

&lt;span class="c"&gt;# 4. Analyze with GDB&lt;/span&gt;
gdb &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-ex&lt;/span&gt; &lt;span class="s2"&gt;"set sysroot /proc/&amp;lt;PID&amp;gt;/root"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-ex&lt;/span&gt; &lt;span class="s2"&gt;"thread apply all bt 3"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-ex&lt;/span&gt; &lt;span class="s2"&gt;"quit"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  /proc/&amp;lt;PID&amp;gt;/root/&amp;lt;path-to-jdk&amp;gt;/bin/java &lt;span class="se"&gt;\&lt;/span&gt;
  /data/coredump/hang.&amp;lt;PID&amp;gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /tmp/bt.txt 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Key Findings&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Root cause: C2's Ideal Graph can form a cycle when dead code elimination removes nodes in the object allocation path.&lt;/li&gt;
&lt;li&gt;Architecture specificity: The bug only manifests on AArch64 because only weak memory models require the MemBarCPUOrder nodes.&lt;/li&gt;
&lt;li&gt;Trigger: org.springframework.core.MethodParameter.getParameterType().&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Lessons Learned&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flame graphs are the first line of defense.&lt;/li&gt;
&lt;li&gt;Container debugging requires creative tooling — host gcore + sysroot works where in-container debugging fails.&lt;/li&gt;
&lt;li&gt;When debugging intermittent JVM issues, capture core dumps early.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Final Recommendation&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;If you run Spring Boot on AArch64 and see unexplained 100% CPU from C2 CompilerThread:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Profile with async-profiler to confirm it's in can_see_stored_value&lt;/li&gt;
&lt;li&gt;Capture a core dump while the process is still hung, and verify the cycle using CLHSDB&lt;/li&gt;
&lt;li&gt;Restart with the exclusion: &lt;code&gt;-XX:CompileCommand=exclude,org.springframework.core.MethodParameter::getParameterType&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Report to your JDK vendor with the evidence&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Acknowledgments
&lt;/h2&gt;

&lt;p&gt;The method for extracting ciMethod information from core dumps was adapted from Vladimir Sitnikov's excellent 2018 article on analyzing stuck C2 compilations.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This investigation was conducted on OpenJDK 8u442 running on AArch64 processors.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags&lt;/strong&gt;: java, jvm, debugging, performance, aarch64&lt;/p&gt;

</description>
      <category>java</category>
      <category>jvm</category>
      <category>performance</category>
      <category>aarch64</category>
    </item>
  </channel>
</rss>
