TL;DR: JDK 26's G1 write barrier optimization (Dual Card Tables) delivers ~2.4x faster write barrier operations, but aggregate GC metrics can be misleading if you don't account for the application doing more work.
Background
The Dual Card Tables work landed in JDK 26, promising 5-15% throughput improvements for G1 GC. I wanted to understand how this behaves under a write-barrier-heavy workload, so I ran a controlled benchmark comparing JDK 25 vs JDK 26 G1.
Benchmark Setup
- Workload: Write-barrier-heavy allocation test (storing newly allocated Objects into a fixed array)
- Heap: 2GB, G1 GC
- Runtime: ~31 seconds per test
- JDKs: 25 vs 26 (both with G1)
Initial Observations (Misleading)
| Metric | JDK 25 | JDK 26 | Change |
|---|---|---|---|
| GC Events | 75 | 168 | +124% |
| Total Pause Time | 1.78s | 3.26s | +83% |
| Throughput | 94.30% | 89.50% | -4.8 p.p. |
| Allocation Rate | 2,874 MB/s | 6,587 MB/s | +129% |
On the surface, JDK 26 looked worse: more GC events, more total pause time, lower throughput. But none of these metrics is normalized by how much work the application did, so the comparison is misleading.
The Critical Data Point
The benchmark's raw output told a different story:
| JDK | Result (ms/op) | Iterations |
|---|---|---|
| 25 | 0.055 ± 0.013 | 54 |
| 26 | 0.023 ± 0.003 | 129 |
JDK 26 executes the same write-barrier operation in less than half the time – ~2.4x faster.
What Actually Happened
The allocation rate spike (2,874 → 6,587 MB/s) wasn't a regression. It was a consequence of the application running faster:
Allocation Rate = Allocated Bytes / Application Runtime
When the write barrier becomes faster, the application spends less time on barrier operations and more time actually doing work – so it allocates more bytes in the same wall-clock time. More allocations → more garbage → more GC events → more total pause time.
The drop in GC "throughput" (the share of wall-clock time spent outside GC pauses) was a side effect of higher application throughput (operations completed per second).
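As a rough cross-check, the allocation rate can be estimated from the ms/op results alone. The sketch below assumes that one benchmark op is one pass over the 10,000-element array and that a plain `java.lang.Object` occupies 16 bytes (the usual size on 64-bit HotSpot with default settings). Both are assumptions, not figures taken from the benchmark output, and the class and method names are illustrative.

```java
public class AllocationRateCheck {
    // Assumption: 16 bytes per java.lang.Object (64-bit HotSpot, default settings).
    static final int ASSUMED_OBJECT_BYTES = 16;
    // Assumption: one op allocates ARRAY_SIZE objects, as in the benchmark loop.
    static final int OBJECTS_PER_OP = 10_000;

    // Expected allocation rate in MiB/s if ops run back to back at the given ms/op.
    static double expectedMiBPerSec(double msPerOp) {
        double bytesPerOp = (double) ASSUMED_OBJECT_BYTES * OBJECTS_PER_OP;
        double opsPerSec = 1000.0 / msPerOp;
        return bytesPerOp * opsPerSec / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // ms/op values from the benchmark's raw output
        System.out.printf("JDK 25: ~%.0f MiB/s%n", expectedMiBPerSec(0.055));
        System.out.printf("JDK 26: ~%.0f MiB/s%n", expectedMiBPerSec(0.023));
    }
}
```

Under these assumptions the estimates land within a few percent of the allocation rates reported from the GC logs (2,874 and 6,587 MB/s), which supports reading the allocation-rate jump as a direct consequence of the shorter time per op.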
Corrected Conclusion
| Dimension | JDK 26 vs JDK 25 |
|---|---|
| Write barrier performance | ✅ ~2.4x faster |
| Single-pause latency | ✅ Better across all percentiles |
| Effective throughput | ✅ Significantly higher |
| GC events (count) | ⚠️ Higher (because of more work) |
| Total pause time | ⚠️ Higher (because of more work) |
Key Takeaway
Aggregate GC metrics like "total pause time" or "throughput percentage" are not absolute measures of performance. They must be read relative to how much work the application completed in the same time. For this workload, JDK 26's G1 optimization is a clear win: it made the application run faster, which created more garbage, which triggered more GC activity.
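One way to put the aggregate numbers in context is to normalize them by the work done. The sketch below uses only the figures from the first table plus the ~31 second runtime. The method names are illustrative, and "GB" here means 1,000 MB, which is an assumption about the units in the GC report.

```java
public class NormalizedGcMetrics {
    // Mean pause length in milliseconds: total pause time divided by GC event count.
    static double meanPauseMs(double totalPauseSec, int gcEvents) {
        return totalPauseSec * 1000.0 / gcEvents;
    }

    // Approximate GB allocated over the run (1 GB = 1,000 MB assumed).
    static double allocatedGb(double allocMbPerSec, double runtimeSec) {
        return allocMbPerSec * runtimeSec / 1000.0;
    }

    // Pause milliseconds paid per GB allocated.
    static double pauseMsPerGb(double totalPauseSec, double allocMbPerSec, double runtimeSec) {
        return totalPauseSec * 1000.0 / allocatedGb(allocMbPerSec, runtimeSec);
    }

    public static void main(String[] args) {
        double runtimeSec = 31.0; // approximate runtime per test, from the benchmark setup
        System.out.printf("JDK 25: mean pause %.1f ms, %.1f ms pause per GB, %.2f GCs per GB%n",
                meanPauseMs(1.78, 75), pauseMsPerGb(1.78, 2874, runtimeSec),
                75 / allocatedGb(2874, runtimeSec));
        System.out.printf("JDK 26: mean pause %.1f ms, %.1f ms pause per GB, %.2f GCs per GB%n",
                meanPauseMs(3.26, 168), pauseMsPerGb(3.26, 6587, runtimeSec),
                168 / allocatedGb(6587, runtimeSec));
    }
}
```

Normalized this way, the mean pause drops from about 23.7 ms to about 19.4 ms, pause time per GB allocated drops from about 20.0 ms to about 16.0 ms, and the number of collections per GB stays roughly flat. The aggregates cannot confirm the per-percentile picture, but they are consistent with it.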
Benchmark Code
// Simplified version – full code available on request
public class WriteBarrierBench {
    private static final int ARRAY_SIZE = 10000;
    private final Object[] array = new Object[ARRAY_SIZE];
    private volatile long blackhole;

    private void storeReferences() {
        for (int i = 0; i < array.length; i++) {
            array[i] = new Object(); // triggers write barrier
        }
        blackhole += array.length; // prevents optimization
    }

    // ... measurement harness with warmup, iterations, etc.
}
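The measurement harness is elided above. For readers who want something runnable, here is a minimal hand-rolled sketch of what such a harness could look like. It is not the harness used for the results in this post: the class name, the round counts, and the timing approach are all hypothetical, and a dedicated benchmarking framework would give more trustworthy numbers.

```java
public class WriteBarrierHarnessSketch {
    private static final int ARRAY_SIZE = 10_000;
    private static final Object[] ARRAY = new Object[ARRAY_SIZE];
    static volatile long blackhole;

    // Same store loop as the benchmark: allocate an object and store its reference.
    static void storeReferences() {
        for (int i = 0; i < ARRAY.length; i++) {
            ARRAY[i] = new Object(); // reference store, goes through the GC write barrier
        }
        blackhole += ARRAY.length; // visible side effect so the loop is not optimized away
    }

    // Times `ops` back-to-back calls and returns the mean in milliseconds per op.
    static double measureMsPerOp(Runnable op, int ops) {
        long start = System.nanoTime();
        for (int i = 0; i < ops; i++) {
            op.run();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / ops;
    }

    public static void main(String[] args) {
        // Hypothetical settings, not the ones used for the published numbers.
        int warmupRounds = 5;
        int measuredRounds = 10;
        int opsPerRound = 10_000;

        // Warmup rounds give the JIT time to compile the hot loop; results are discarded.
        for (int r = 0; r < warmupRounds; r++) {
            measureMsPerOp(WriteBarrierHarnessSketch::storeReferences, opsPerRound);
        }
        for (int r = 0; r < measuredRounds; r++) {
            double ms = measureMsPerOp(WriteBarrierHarnessSketch::storeReferences, opsPerRound);
            System.out.printf("round %d: %.4f ms/op%n", r, ms);
        }
    }
}
```

Note that the timed region includes any GC pauses that happen during a round, so the ms/op figure from a harness like this mixes barrier cost, allocation cost, and pause time.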
Methodology Note
- The benchmark uses a `volatile long blackhole` to prevent dead code elimination
- Warmup iterations are included to allow JIT compilation
- A bash harness controls JDK switching and GC logging
- The test is controlled (single workload pattern) – results may not generalize to all allocation profiles
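The original harness is a bash script and is not shown here. As an illustration of the same idea in Java, the sketch below builds one command line per JDK with G1 selected, a 2GB maximum heap, and unified GC logging to a per-JDK file. The install paths, log file names, and the assumption that `WriteBarrierBench` has a `main` method are all hypothetical.

```java
import java.util.List;

public class BenchLauncherSketch {
    // Builds the command line for one benchmark run.
    static List<String> buildCommand(String javaBinary, String gcLogFile, String mainClass) {
        return List.of(
                javaBinary,
                "-XX:+UseG1GC",                 // both runs use G1
                "-Xmx2g",                       // 2GB heap, as in the benchmark setup
                "-Xlog:gc*:file=" + gcLogFile,  // unified GC logging to a per-JDK file
                "-cp", ".",
                mainClass);
    }

    public static void main(String[] args) {
        // Hypothetical install locations; adjust to the local machine.
        String[][] jdks = {
                {"/opt/jdk-25/bin/java", "gc-jdk25.log"},
                {"/opt/jdk-26/bin/java", "gc-jdk26.log"},
        };
        for (String[] jdk : jdks) {
            List<String> cmd = buildCommand(jdk[0], jdk[1], "WriteBarrierBench");
            System.out.println(String.join(" ", cmd));
            // To actually launch the run instead of printing it:
            // new ProcessBuilder(cmd).inheritIO().start().waitFor();
        }
    }
}
```

Keeping the flags identical across both runs and changing only the `java` binary is what makes the comparison a controlled one.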
Open Questions
- How does this scale with different heap sizes?
- What does the behavior look like on other GC algorithms (Parallel, ZGC)?
- Is there a direct way to measure write barrier overhead independently?