HZ8 is a new line of my custom memory allocator, Hakozuna.
HZ8 is not an allocator designed to be the "fastest across all benchmarks."
The goal is to bring RSS back down to a low level once a workload finishes, while keeping throughput practical.
In short, HZ8 is characterized as follows:
HZ8:
- balanced low-RSS allocator
- practical throughput
- fail-closed ownership
- cross-thread free correctness
HZ8 consolidates what I learned from experimenting with HZ3, HZ4, HZ5, and HZ6, and I position it as the primary allocator line to choose for general use.
HZ8 Design Principles
In HZ8, we put a particular emphasis on the following points:
- Keeping RSS low.
- Preventing breakdown under remote-heavy workloads.
- Handling cross-thread frees safely.
- Making ownership and route determination fail-closed.
- Balancing practical speed and memory usage rather than aiming for the absolute fastest.
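As a rough illustration of what "fail-closed" can mean for ownership and route determination, here is a minimal sketch in C. Every name in it (`run_header`, `RUN_MAGIC`, `classify_free`, the `route` values) is hypothetical and chosen for this example; it does not show HZ8's actual data structures. The idea it illustrates is that a free is routed as local or remote only when the owner can be positively identified, and is rejected otherwise instead of guessing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not HZ8's real layout. */
enum route { ROUTE_LOCAL, ROUTE_REMOTE, ROUTE_REJECT };

#define RUN_MAGIC 0x485A3852u /* arbitrary tag marking a valid run header */

struct run_header {
    uint32_t magic;     /* must equal RUN_MAGIC for a run we own */
    uint32_t owner_tid; /* owning thread id; 0 means "no owner recorded" */
};

/* Decide how a free should be routed.
 * Fail-closed: if ownership cannot be proven, reject instead of guessing. */
static enum route classify_free(const struct run_header *run, uint32_t self_tid)
{
    if (run == NULL || run->magic != RUN_MAGIC || run->owner_tid == 0)
        return ROUTE_REJECT;

    /* Ownership is known: same thread frees locally, others go remote. */
    return run->owner_tid == self_tid ? ROUTE_LOCAL : ROUTE_REMOTE;
}
```

In this sketch a cross-thread free is simply the `ROUTE_REMOTE` case, so a free coming from the wrong thread never touches the owner's local state directly.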
The current default is HZ8-v2 / KeepRefill.
KeepRefill is a mechanism designed to avoid heavy empty/reactivate loops under remote-heavy workloads. When a medium run becomes empty, it retains the owner-local refill candidate rather than destroying it immediately.
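The retention idea can be sketched in C as follows. This is an assumption-laden toy model, not HZ8's implementation: `run`, `owner_cache`, `on_run_empty`, and `refill` are hypothetical names, the cache holds a single candidate, and the `released` flag stands in for whatever really happens when a run is destroyed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a medium run; not HZ8's real structure. */
struct run {
    size_t live_objects; /* objects currently allocated from this run */
    bool released;       /* stand-in for "destroyed / returned to the OS" */
};

/* Per-owner state: at most one empty run kept as a refill candidate. */
struct owner_cache {
    struct run *kept;
};

/* Called when the last object of a run has been freed. */
static void on_run_empty(struct owner_cache *oc, struct run *r)
{
    if (oc->kept == NULL) {
        oc->kept = r;       /* retain instead of destroying immediately */
        return;
    }
    r->released = true;     /* a candidate already exists: release this one */
}

/* Called when the owner needs a fresh run. Returns NULL when nothing was
 * kept, in which case the caller would take the slow path. */
static struct run *refill(struct owner_cache *oc)
{
    struct run *r = oc->kept;
    oc->kept = NULL;
    return r;
}
```

Keeping only one candidate is a choice made for this sketch: it bounds the extra memory held per owner while still letting a run that empties and is needed again shortly afterwards skip the destroy and reactivate cycle.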
Benchmark Results
The environment is Ubuntu 22.04.5 / Linux 6.8.0-90 / x86_64, RUNS=10, THREADS=16, ITERS=50000.
The representative results are as follows:
| Row | HZ8 ops/s | HZ8 post RSS | mimalloc ops/s | mimalloc post RSS | tcmalloc ops/s | tcmalloc post RSS |
|---|---|---|---|---|---|---|
| small_interleaved_remote90 | 12.023M | 2.91 MiB | 10.960M | 50.98 MiB | 23.900M | 32.94 MiB |
| main_interleaved_r90 | 6.048M | 4.57 MiB | 4.715M | 183.12 MiB | 12.178M | 90.31 MiB |
| medium_interleaved_r50 | 8.128M | 3.81 MiB | 4.151M | 162.54 MiB | 15.870M | 79.06 MiB |
tcmalloc shows strong throughput in many rows.
On the other hand, HZ8 ends with significantly lower post-workload RSS: 2.91 to 4.57 MiB in these rows, against 32.94 to 183.12 MiB for mimalloc and tcmalloc.
Therefore, the core proposition of HZ8 is as follows:
HZ8 is not intended to fully replace tcmalloc.
However, it is highly useful as an allocator that returns RSS to a low level while maintaining practical speed.
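The article does not state how post-workload RSS was sampled. One common way to read a process's resident set size on Linux is the second field of `/proc/self/statm`, which counts resident pages. The sketch below shows that approach in C; the helper names `statm_resident_pages` and `current_rss_bytes` are made up for this example and are not part of the HZ8 benchmark harness.

```c
#include <stdio.h>
#include <unistd.h>

/* Parse the second field (resident pages) of a /proc/<pid>/statm line.
 * Returns -1 if the line does not start with two integers. */
static long statm_resident_pages(const char *line)
{
    long size_pages, resident_pages;
    if (sscanf(line, "%ld %ld", &size_pages, &resident_pages) != 2)
        return -1;
    return resident_pages;
}

/* Current RSS of this process in bytes, or -1 on failure (for example on
 * systems without /proc). */
static long current_rss_bytes(void)
{
    char line[256];
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;

    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    if (ok == NULL)
        return -1;

    long pages = statm_resident_pages(line);
    if (pages < 0)
        return -1;
    return pages * sysconf(_SC_PAGESIZE);
}
```

Calling `current_rss_bytes()` once after the worker threads have joined and all objects have been freed gives a post-workload figure comparable in spirit to the "post RSS" columns above.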
Comparison: MT lane x remote%
Placing HZ3, HZ4, HZ5, HZ6, and HZ8 side by side makes HZ8's position easier to see. The values are throughput in ops/s.
| Lane | hz3 | hz4 | mimalloc | tcmalloc | Best HZ5 | HZ6 | HZ8 |
|---|---|---|---|---|---|---|---|
| main_r0 | 292.15M | 85.63M | 146.73M | 318.82M | 157.44M | 16.88M | 107.633M |
| main_r50 | 31.46M | 62.32M | 14.26M | 64.87M | 79.43M | 15.08M | 29.633M |
| main_r90 | 22.31M | 67.14M | 7.72M | 45.42M | 62.31M | 10.99M | 20.610M |
| guard_r0 | 318.98M | 156.68M | 258.19M | 375.71M | 149.00M | 189.48M | 224.750M |
| cross128_r90 | 2.78M | 27.66M | 3.52M | 7.21M | 22.39M | 6.38M | 37.342k |
HZ8 is not universally fast.
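In particular, cross128_r90 is the current bottleneck: HZ8 reaches only 37.342k ops/s there, while every other allocator in the table is in the millions.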
However, since HZ8 is a line focused heavily on keeping RSS low, it shouldn't be evaluated solely by throughput.
LargeDirect Experiment
To address the weakness in cross128_r90, we also tested an opt-in profile called LargeDirectOwned.
The result is evidence that the cross128_r90 bottleneck stems from the large/direct boundary.
cross128_r90:
- baseline: 62.940k ops/s
- LargeDirect candidate: 2.835M ops/s
- ratio: 45.048x
However, the RSS increases:
- peak RSS: 150.17 MiB -> 260.07 MiB
- post RSS: 107.04 MiB -> 190.61 MiB
Because peak and post RSS both grow by roughly 1.7x to 1.8x, LargeDirect is not enabled by default.
HZ8's default remains the balanced KeepRefill profile.
Summary
HZ8 is not the fastest allocator.
However, it has become an allocator with the following distinct characteristics:
HZ8:
- Practical speed
- Low post-workload RSS
- Resilience against remote-heavy workloads
- Cross-thread free correctness
- Fail-closed ownership
If speed is the sole metric, tcmalloc remains incredibly strong.
On the other hand, for workloads where returning RSS to a low level is critical, HZ8 occupies a very compelling position.
Moving forward, we plan to maintain the balanced line of HZ8 while advancing further speed-oriented research under HZ9.
Links
GitHub: https://github.com/hakorune/hakozuna
HZ8 paper / Zenodo: https://zenodo.org/records/21084279