Overview
In my project hakozuna, I am exploring different pointer header management strategies to optimize memory allocation performance. Specifically, I’ve been comparing two approaches: PTAG32 (a global tagging method) and S113 (an approach inspired by mimalloc).
Pointer Header Strategies
- S113: Uses a mimalloc-style strategy for managing pointer metadata.
- PTAG32: A global tagging method.
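To make the contrast concrete, here is a toy sketch of the two general lookup styles, with addresses modeled as plain integers. This is not hakozuna's actual layout: `SEGMENT_SIZE`, `PAGE_SHIFT`, and all function names are hypothetical, and I am assuming that "mimalloc-style" means finding metadata by masking the pointer down to an aligned segment base, while "global tagging" means a shared table of 32-bit tags keyed by address bits.

```python
# Toy model only: addresses are integers and every constant here is an assumption.
SEGMENT_SIZE = 4 * 1024 * 1024   # hypothetical segment alignment (power of two)
PAGE_SHIFT = 16                  # hypothetical 64 KiB granularity for the tag table


def segment_base(ptr):
    """Segment-local style: metadata sits at the start of the aligned segment,
    so the lookup is a single mask on the pointer."""
    return ptr & ~(SEGMENT_SIZE - 1)


# Global-table style: one 32-bit tag per region, shared by all threads.
global_tags = {}


def set_tag(ptr, tag):
    """Store a tag for the region containing ptr, truncated to 32 bits."""
    global_tags[ptr >> PAGE_SHIFT] = tag & 0xFFFFFFFF


def get_tag(ptr):
    """Look up the tag for the region containing ptr (0 if none was set)."""
    return global_tags.get(ptr >> PAGE_SHIFT, 0)
```

In the first style the metadata lives next to the memory it describes. In the second, every lookup goes through one shared structure.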
Performance Insights
In my recent benchmarks, S113 outperforms PTAG32 in multi-threaded scenarios.
The main cause of the throughput gap appears to be CPU cache misses. PTAG32 can be faster in single-threaded runs under certain conditions, but its cache miss rate climbs under heavy multi-threaded workloads. I am still researching further optimizations for the PTAG32 approach.
Benchmark Results
MT Remote (R=90%, T=8)
This test measures throughput in operations per second across 8 threads (T=8), with the remote ratio R set to 90%.
| Allocator | ops/s | vs tcmalloc | vs mimalloc |
|---|---|---|---|
| S113 | 62.81M | +56.2% | +15.0% |
| PTAG32 | 57.95M | +44.1% | +6.1% |
| mimalloc | 54.63M | +35.9% | - |
| tcmalloc | 40.20M | - | -26.4% |
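The two relative columns are simple ratios of the ops/s values. Here is a minimal sketch that recomputes them from the table; the helper name `relative_gain` is my own, and because the inputs are already rounded to two decimals, the last digit can differ from the table by 0.1.

```python
def relative_gain(ops, baseline_ops):
    """Percentage difference of `ops` relative to `baseline_ops`."""
    return (ops / baseline_ops - 1.0) * 100.0


# Throughput in millions of ops/s, copied from the table above.
results = {"S113": 62.81, "PTAG32": 57.95, "mimalloc": 54.63, "tcmalloc": 40.20}

for name, ops in results.items():
    vs_tcmalloc = relative_gain(ops, results["tcmalloc"])
    vs_mimalloc = relative_gain(ops, results["mimalloc"])
    print(f"{name:9s} {ops:6.2f}M  {vs_tcmalloc:+6.1f}%  {vs_mimalloc:+6.1f}%")
```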
Larson Benchmark (Scaling Efficiency)
The Larson benchmark shows throughput and scaling efficiency at 8 threads.
| Allocator | ops/s at T=8 | Scaling Efficiency |
|---|---|---|
| S113 | 113.05M | 58.1% |
| PTAG32 | 112.63M | 51.7% |
| mimalloc | 104.15M | 54.4% |
| tcmalloc | 105.34M | 50.6% |
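The post does not spell out how scaling efficiency is computed. A common definition, which I am assuming here rather than quoting from the benchmark scripts, is measured throughput at T threads divided by ideal linear scaling of the single-thread throughput. Both function names are hypothetical.

```python
def scaling_efficiency(ops_at_t, ops_at_1, threads):
    """Assumed definition: measured throughput over ideal linear scaling, in percent."""
    return ops_at_t / (threads * ops_at_1) * 100.0


def implied_single_thread(ops_at_t, efficiency_pct, threads):
    """Invert the assumed definition to get the T=1 throughput it would imply."""
    return ops_at_t / (threads * efficiency_pct / 100.0)


# Hypothetical numbers, not from the benchmark: 25M ops/s at T=1, 100M at T=8.
print(scaling_efficiency(100.0, 25.0, 8))  # 50.0
```

Under this definition, a higher efficiency at a similar T=8 throughput means a lower single-thread baseline. Applying `implied_single_thread` to the table rows gives values implied by the assumed formula, not measured ones.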
Conclusion
S113 currently leads both multi-threaded benchmarks. In MT Remote it is about 8% ahead of PTAG32. In Larson the raw throughput is nearly identical (113.05M vs 112.63M), but S113 scales better (58.1% vs 51.7%). PTAG32's potential in specific single-threaded cases remains an interesting area for development.
Check out the source code here:
https://github.com/hakorune/hakozuna