Engineering Notes: A work in progress.
Before diving into the Lane16-specific build, I discovered that the baseline version of hakozuna is already highly competitive with mimalloc. After fixing the debug logging and atomic bottlenecks, the raw performance results are as follows:
Full Benchmark Results (T=16, RUNS=5 Median)
| Size | Remote Rate (R) | Baseline (hakozuna) | mimalloc | hakozuna vs mimalloc |
|---|---|---|---|---|
| Small | 0 (Local) | 377M | 395M | -5% |
| Small | 50 | 221M | 180M | +23% |
| Small | 90 | 199M | 177M | +12% |
| Medium | 0 (Local) | 338M | 264M | +28% |
| Medium | 50 | 74.7M | 78.7M | -5% |
| Medium | 90 | 53.4M | 43.4M | +23% |
| Mixed | 50 | 100M | 104M | -4% |
| Mixed | 90 | 73.9M | 76.1M | -3% |
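
The last column can be reproduced from the two throughput columns. The sketch below assumes the percentage is the rounded relative difference `(hakozuna / mimalloc - 1) * 100`; that formula is my reading of the table rather than something the benchmark harness states, and the helper name `relative_diff_pct` is made up for illustration.

```python
def relative_diff_pct(candidate: float, reference: float) -> int:
    """Rounded percent by which `candidate` differs from `reference`."""
    return round((candidate / reference - 1.0) * 100.0)


# (size, remote rate, hakozuna, mimalloc), figures in M copied from the table above.
ROWS = [
    ("Small", 0, 377.0, 395.0),
    ("Small", 50, 221.0, 180.0),
    ("Small", 90, 199.0, 177.0),
    ("Medium", 0, 338.0, 264.0),
    ("Medium", 50, 74.7, 78.7),
    ("Medium", 90, 53.4, 43.4),
    ("Mixed", 50, 100.0, 104.0),
    ("Mixed", 90, 73.9, 76.1),
]

if __name__ == "__main__":
    for size, remote_rate, hakozuna, mimalloc in ROWS:
        diff = relative_diff_pct(hakozuna, mimalloc)
        print(f"{size:<6} R={remote_rate:<2} {diff:+d}%")
```

A positive value means hakozuna is ahead, a negative value means mimalloc is ahead.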
Head-to-Head Comparison Summary
| Scenario | The Winner | Performance Gain |
|---|---|---|
| R=0 (Pure Local) | Split | hakozuna leads at Medium (+28%); mimalloc leads at Small (-5%) |
| R=50 (Small) | hakozuna | Strong lead of +23% |
| R=50 (Med/Mixed) | mimalloc | Slight edge for mimalloc (-4% to -5%) |
| R=90 (Small/Med) | hakozuna | Significant lead of +12% to +23% |
| R=90 (Mixed) | Tie | Nearly identical performance (-3%) |
Key Takeaways
- Local Mastery: When memory is allocated and freed on the same thread (R=0), hakozuna outperforms mimalloc by 28% at medium sizes, although it trails by 5% at small sizes.
- High Remote Resilience: Even without the "Lane16" optimization, hakozuna handles remote frees (R=90) for small and medium objects more efficiently than mimalloc, leading by 12% and 23% respectively.
- Consistency: mimalloc holds a slight edge at R=50 for medium and mixed workloads and at R=90 for mixed workloads, but the gap never exceeds 5%, so hakozuna is always within striking distance or ahead.