DEV Community

jvmind


JDK 26 G1 GC Dual Card Tables – A Benchmark Story

TL;DR: In this benchmark, JDK 26's G1 write barrier optimization (Dual Card Tables) made a write-barrier-heavy operation run ~2.4x faster, but aggregate GC metrics can be misleading if you don't account for the application doing more work.

Background

The Dual Card Tables work landed in JDK 26, promising 5-15% throughput improvements for G1 GC. I wanted to understand how this behaves under a write-barrier-heavy workload, so I ran a controlled benchmark comparing JDK 25 vs JDK 26 G1.

Benchmark Setup

  • Workload: Write-barrier-heavy allocation test (storing newly allocated Object instances into a fixed-size array)
  • Heap: 2GB, G1 GC
  • Runtime: ~31 seconds per test
  • JDKs: 25 vs 26 (both with G1)

Initial Observations (Misleading)

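| Metric           | JDK 25     | JDK 26     | Change    |
|------------------|------------|------------|-----------|
| GC Events        | 75         | 168        | +124%     |
| Total Pause Time | 1.78s      | 3.26s      | +83%      |
| Throughput       | 94.30%     | 89.50%     | -4.8 p.p. |
| Allocation Rate  | 2,874 MB/s | 6,587 MB/s | +129%     |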

On the surface, JDK 26 looked worse: more GC events, more total pause time, lower throughput. But this was a measurement artifact.
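To see why the throughput figure moves with pause time alone, here is a minimal sketch of how such a percentage can be derived. It assumes throughput is defined as the share of wall-clock time not spent in GC pauses, and it uses the approximate 31-second runtime from the setup for both JDKs; the class and method names are illustrative, not part of the original tooling.

```java
// Sketch: deriving a GC "throughput" percentage from pause time and runtime.
// Assumption: throughput = 1 - (total pause time / wall-clock runtime).
final class GcThroughput {

    /** Share of wall-clock time not spent in GC pauses, as a percentage. */
    static double throughputPercent(double totalPauseSeconds, double runtimeSeconds) {
        return 100.0 * (1.0 - totalPauseSeconds / runtimeSeconds);
    }

    public static void main(String[] args) {
        double runtimeSeconds = 31.0; // approximate value from the benchmark setup

        System.out.println("JDK 25: " + throughputPercent(1.78, runtimeSeconds));
        System.out.println("JDK 26: " + throughputPercent(3.26, runtimeSeconds));
    }
}
```

Under these assumptions the results come out at roughly 94.3% and 89.5%, in line with the table. The metric only sees time lost to pauses; it says nothing about how much work the application completed in the remaining time.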

The Critical Data Point

The benchmark's raw output told a different story:

| JDK | Result (ms/op) | Iterations |
|-----|----------------|------------|
| 25  | 0.055 ± 0.013  | 54         |
| 26  | 0.023 ± 0.003  | 129        |

JDK 26 completes the same benchmark operation in less than half the time, roughly 2.4x faster (0.055 / 0.023 ≈ 2.4).
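The speedup can be cross-checked against the iteration counts, and the error margins give a rough sense of how firm it is. The sketch below assumes the ± values are symmetric error bounds on the mean, which the raw output does not state explicitly; the class name is illustrative.

```java
// Sketch: speedup from the reported ms/op values, with a cross-check.
// Assumption: "±" denotes a symmetric error bound around the mean.
final class Speedup {

    /** How many times faster the candidate is than the baseline. */
    static double ratio(double baselineMsPerOp, double candidateMsPerOp) {
        return baselineMsPerOp / candidateMsPerOp;
    }

    public static void main(String[] args) {
        // Point estimate from the mean times per operation.
        System.out.println("speedup (time):       " + ratio(0.055, 0.023));

        // Independent cross-check: iterations completed in the same run length.
        System.out.println("speedup (iterations): " + (129.0 / 54.0));

        // Pessimistic and optimistic bounds from the error margins.
        System.out.println("lower bound: " + ratio(0.055 - 0.013, 0.023 + 0.003));
        System.out.println("upper bound: " + ratio(0.055 + 0.013, 0.023 - 0.003));
    }
}
```

Both point estimates land near 2.39. Taking the margins at face value, the plausible range is about 1.6x to 3.4x, so "~2.4x" is best read as a central estimate for this one workload.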

What Actually Happened

The allocation rate spike (2,874 → 6,587 MB/s) wasn't a regression. It was a consequence of the application running faster:

Allocation Rate = Allocated Bytes / Application Runtime

When the write barrier becomes faster, the application spends less time on barrier operations and more time actually doing work – so it allocates more bytes in the same wall-clock time. More allocations → more garbage → more GC events → more total pause time.

The apparent regression in GC throughput (the share of time spent outside GC pauses) was actually a side effect of higher application throughput.
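One way to make the comparison fair is to normalize the GC figures by the amount of work done. The sketch below divides pause time by allocated volume and by event count, using the numbers from the first table. It assumes the allocation rate applies over the full ~31-second wall-clock run and treats 1 GB as 1000 MB; the class and method names are illustrative.

```java
// Sketch: normalizing GC metrics by work done instead of comparing totals.
// Assumptions: allocation rate holds over the whole ~31 s run; 1 GB = 1000 MB.
final class NormalizedGcMetrics {

    static double allocatedGb(double mbPerSecond, double runtimeSeconds) {
        return mbPerSecond * runtimeSeconds / 1000.0;
    }

    /** Milliseconds of GC pause per GB allocated. */
    static double pauseMsPerGb(double totalPauseSeconds, double allocatedGb) {
        return totalPauseSeconds * 1000.0 / allocatedGb;
    }

    /** Mean pause duration in milliseconds. */
    static double meanPauseMs(double totalPauseSeconds, int gcEvents) {
        return totalPauseSeconds * 1000.0 / gcEvents;
    }

    public static void main(String[] args) {
        double runtime = 31.0; // approximate, from the benchmark setup

        double gb25 = allocatedGb(2874, runtime);
        double gb26 = allocatedGb(6587, runtime);

        System.out.println("JDK 25 pause ms/GB: " + pauseMsPerGb(1.78, gb25));
        System.out.println("JDK 26 pause ms/GB: " + pauseMsPerGb(3.26, gb26));
        System.out.println("JDK 25 mean pause ms: " + meanPauseMs(1.78, 75));
        System.out.println("JDK 26 mean pause ms: " + meanPauseMs(3.26, 168));
    }
}
```

With these inputs, pause cost per GB allocated drops from about 20 ms to about 16 ms, and the mean pause from about 23.7 ms to about 19.4 ms. Because both runs use the same runtime, the relative comparison per GB does not depend on the exact runtime value.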

Corrected Conclusion

| Dimension                 | JDK 26 vs JDK 25                   |
|---------------------------|------------------------------------|
| Write barrier performance | ~2.4x faster                       |
| Single-pause latency      | ✅ Better across all percentiles   |
| Effective throughput      | Significantly higher               |
| GC events (count)         | ⚠️ Higher (because of more work)   |
| Total pause time          | ⚠️ Higher (because of more work)   |

Key Takeaway

Aggregate GC metrics like "total pause time" or "throughput percentage" are not absolute measures of performance. They must be interpreted relative to how much work the application completed. For this workload, JDK 26's G1 optimization is a clear win: it made the application run faster, which created more garbage, which triggered more GC activity.

Benchmark Code

// Simplified version – full code available on request
public class WriteBarrierBench {
    private static final int ARRAY_SIZE = 10000;
    private final Object[] array = new Object[ARRAY_SIZE];
    private volatile long blackhole;

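    private void storeReferences() {
        for (int i = 0; i < array.length; i++) {
            // Reference store into a heap array: this is the operation
            // that goes through the G1 write barrier.
            array[i] = new Object();
        }
        // Volatile write, intended to discourage dead code elimination.
        blackhole += array.length;
    }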

    // ... measurement harness with warmup, iterations, etc.
}
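The measurement harness itself is not shown in the post. As a rough idea of what a plain-Java harness around `storeReferences()` might look like, here is a hypothetical sketch using `System.nanoTime()`. The class name, the `measure` method and the iteration counts are assumptions for illustration, not the author's actual harness; a framework such as JMH would handle warmup and statistics more rigorously.

```java
// Hypothetical harness sketch; NOT the harness used for the results above.
final class WriteBarrierHarness {
    private static final int ARRAY_SIZE = 10_000;
    private final Object[] array = new Object[ARRAY_SIZE];
    private volatile long blackhole;

    private void storeReferences() {
        for (int i = 0; i < array.length; i++) {
            array[i] = new Object(); // reference store, goes through the write barrier
        }
        blackhole += array.length;   // volatile write to discourage dead code elimination
    }

    /** Runs warmup operations, then returns the mean time per operation in ms. */
    double measure(int warmupOps, int measuredOps) {
        for (int i = 0; i < warmupOps; i++) {
            storeReferences();       // lets the JIT compile the hot loop first
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredOps; i++) {
            storeReferences();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / measuredOps;
    }

    public static void main(String[] args) {
        WriteBarrierHarness harness = new WriteBarrierHarness();
        // Iteration counts are arbitrary illustrative values.
        double msPerOp = harness.measure(20_000, 100_000);
        System.out.println(msPerOp + " ms/op");
    }
}
```

A single mean like this hides variance; reporting a spread across several measured batches, as the ± values in the results do, gives a more honest picture.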

Methodology Note

  • The benchmark uses a volatile long blackhole to prevent dead code elimination
  • Warmup iterations are included to allow JIT compilation
  • A bash harness controls JDK switching and GC logging
  • The test is controlled (single workload pattern) – results may not generalize to all allocation profiles
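For readers who want to reproduce the event count and total pause time from a GC log, the following sketch sums pause lines from JDK unified logging output (as produced with `-Xlog:gc`). It assumes pause lines have the usual shape `GC(n) Pause ... 5.123ms`; the tool used to produce the tables in this post is not specified, so this is only one possible approach, and the class name and sample lines are made up for illustration.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: counting GC pauses and summing their durations from -Xlog:gc lines.
final class GcPauseSummary {
    // Matches completed pause lines such as:
    // [1.234s][info][gc] GC(3) Pause Young (Normal) (G1 Evacuation Pause) 102M->12M(2048M) 5.123ms
    private static final Pattern PAUSE =
            Pattern.compile("GC\\(\\d+\\) Pause .* (\\d+\\.\\d+)ms\\s*$");

    /** Returns a two-element array: {pause count, total pause time in ms}. */
    static double[] summarize(List<String> logLines) {
        int count = 0;
        double totalMs = 0.0;
        for (String line : logLines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                count++;
                totalMs += Double.parseDouble(m.group(1));
            }
        }
        return new double[] {count, totalMs};
    }

    public static void main(String[] args) {
        // Made-up sample lines in the unified logging format.
        List<String> sample = List.of(
                "[0.010s][info][gc] Using G1",
                "[1.234s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 102M->12M(2048M) 5.500ms",
                "[2.345s][info][gc] GC(1) Pause Young (Normal) (G1 Evacuation Pause) 110M->13M(2048M) 2.500ms");
        double[] summary = summarize(sample);
        System.out.println((int) summary[0] + " pauses, " + summary[1] + " ms total");
    }
}
```

Lines for concurrent phases do not contain `Pause` and are skipped, so the totals reflect stop-the-world pauses only.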

Open Questions

  • How does this scale with different heap sizes?
  • What does the behavior look like on other GC algorithms (Parallel, ZGC)?
  • Is there a direct way to measure write barrier overhead independently?
