CharmPic

Hakozuna: A High-Performance Memory Allocator with Advanced Pointer Tagging

Overview

In my project hakozuna, I am exploring different pointer header management strategies to optimize memory allocation performance. Specifically, I’ve been comparing two approaches: PTAG32 (a global tagging method) and S113 (an approach inspired by mimalloc).

Pointer Header Strategies

  • S113: Uses a mimalloc-style strategy for managing pointer metadata.
  • PTAG32: Uses a global tagging method.
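To make the contrast concrete, here is a minimal sketch of the two general lookup styles these names suggest. It is not hakozuna's actual code: the segment size, page size, table length, struct fields, and every function name are assumptions chosen for illustration. The first style finds metadata by masking the pointer down to an aligned segment base, the way mimalloc-style designs typically do. The second looks up a 32-bit tag in one process-wide table indexed by page number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, for illustration only (not taken from hakozuna). */
#define SEGMENT_SIZE  ((uintptr_t)1 << 22)  /* assumed 4 MiB aligned segments */
#define PAGE_SHIFT    16                    /* assumed 64 KiB pages */
#define TAG_TABLE_LEN ((size_t)1 << 16)     /* toy table: 65536 page slots */

/* Hypothetical per-segment header stored at the start of each segment. */
typedef struct segment_header {
    uint32_t size_class;
    uint32_t owner_thread;
} segment_header;

/* Header-in-segment style: the metadata address is derived from the
 * pointer itself by clearing the low bits, so no shared table is read. */
static inline segment_header *header_from_ptr(const void *p) {
    assert(p != NULL);
    return (segment_header *)((uintptr_t)p & ~(SEGMENT_SIZE - 1));
}

/* Global-table style: one 32-bit tag per page, shared by every thread. */
static uint32_t g_page_tags[TAG_TABLE_LEN];

static inline size_t page_index(const void *p) {
    /* The toy table wraps around; a real one would cover the address space. */
    return (size_t)(((uintptr_t)p >> PAGE_SHIFT) & (TAG_TABLE_LEN - 1));
}

static inline uint32_t tag_from_ptr(const void *p) {
    return g_page_tags[page_index(p)];
}

static inline void tag_set(const void *p, uint32_t tag) {
    g_page_tags[page_index(p)] = tag;
}
```

In this sketch, calling `tag_set(p, 7)` makes `tag_from_ptr(q)` return 7 for any `q` in the same 64 KiB page as `p`. The tag lives in memory separate from the block, in a table that all threads read and write. The masked header, by contrast, sits in the same segment as the allocation.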

Performance Insights

Based on my recent benchmarks, S113 currently outperforms PTAG32 in multi-threaded scenarios.

The throughput gap appears to come mainly from CPU cache misses. PTAG32 can be faster in single-threaded runs under certain conditions, but its cache miss rate rises under heavy multi-threaded workloads. I am currently investigating further optimizations for the PTAG32 approach.


Benchmark Results

MT Remote (R=90%, T=8)

This test measures operations per second across 8 threads.

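| Allocator | ops/s  | vs tcmalloc | vs mimalloc |
|-----------|--------|-------------|-------------|
| S113      | 62.81M | +56.2%      | +15.0%      |
| PTAG32    | 57.95M | +44.1%      | +6.1%       |
| mimalloc  | 54.63M | +35.9%      | -           |
| tcmalloc  | 40.20M | -           | -26.4%      |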
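The "vs" columns can be reproduced from the ops/s column as a relative difference against the baseline allocator. The helper below is a small sketch of that arithmetic; the function name `percent_vs` is hypothetical and not part of hakozuna.

```c
#include <assert.h>

/* Percent difference of `value` relative to `baseline`.
 * Both arguments are throughputs in the same unit (here, M ops/s). */
static double percent_vs(double value, double baseline) {
    assert(baseline > 0.0);
    return (value / baseline - 1.0) * 100.0;
}
```

For example, `percent_vs(62.81, 40.20)` evaluates to about 56.2, matching the S113 vs tcmalloc entry, and `percent_vs(40.20, 54.63)` evaluates to about -26.4, matching the tcmalloc vs mimalloc entry.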

Larson Benchmark (Scaling Efficiency)

The Larson benchmark highlights the scaling efficiency at 8 threads.

| Allocator | T=8     | Scaling Efficiency |
|-----------|---------|--------------------|
| S113      | 113.05M | 58.1%              |
| PTAG32    | 112.63M | 51.7%              |
| mimalloc  | 104.15M | 54.4%              |
| tcmalloc  | 105.34M | 50.6%              |
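The post does not define scaling efficiency. A common definition, assumed here, is throughput at N threads divided by N times the single-thread throughput. Under that assumption, the sketch below computes the efficiency and inverts it to estimate the single-thread throughput a table row would imply. Both function names are hypothetical, and the implied figures are estimates, not measurements.

```c
#include <assert.h>

/* Assumed definition: efficiency = T_N / (N * T_1), as a fraction in (0, 1]. */
static double scaling_efficiency(double t1, double tn, int threads) {
    assert(t1 > 0.0 && threads > 0);
    return tn / ((double)threads * t1);
}

/* Inverse of the above: single-thread throughput implied by a table row. */
static double implied_single_thread(double tn, int threads, double efficiency) {
    assert(efficiency > 0.0 && threads > 0);
    return tn / ((double)threads * efficiency);
}
```

With this definition, `implied_single_thread(113.05, 8, 0.581)` gives roughly 24.3M for S113 and `implied_single_thread(112.63, 8, 0.517)` gives roughly 27.2M for PTAG32. If the assumed definition matches the one used for the table, this would explain how two allocators with nearly equal T=8 throughput can differ in efficiency: the one with the higher single-thread baseline scores lower.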

Conclusion

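Currently, S113 leads both multi-threaded benchmarks. It is about 8% ahead of PTAG32 in MT Remote (62.81M vs 57.95M ops/s) and marginally ahead in Larson at T=8 (113.05M vs 112.63M), with the best scaling efficiency of the four allocators (58.1%). However, the potential of PTAG32 in specific single-threaded cases remains an interesting area for development.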

Check out the source code here:
https://github.com/hakorune/hakozuna
