DEV Community

CharmPic

## The Baseline Power: Performance Without Optimizations

Engineering Notes: A work in progress.

Before diving into the Lane16-specific build, I found that the baseline version of hakozuna is already competitive with mimalloc: it leads in four of the eight configurations below and trails by at most 5% in the other four. After fixing the debug logging and atomic bottlenecks, the raw performance results are as follows:

### Full Benchmark Results (T=16, RUNS=5, Median)

| Size | Remote Rate (R) | Baseline (hakozuna) | mimalloc | hakozuna vs mimalloc |
|---|---|---|---|---|
| Small | 0 (Local) | 377M | 395M | -5% |
| Small | 50 | 221M | 180M | +23% |
| Small | 90 | 199M | 177M | +12% |
| Medium | 0 (Local) | 338M | 264M | +28% |
| Medium | 50 | 74.7M | 78.7M | -5% |
| Medium | 90 | 53.4M | 43.4M | +23% |
| Mixed | 50 | 100M | 104M | -4% |
| Mixed | 90 | 73.9M | 76.1M | -3% |
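The percentage column can be recomputed from the two throughput columns. The following is a minimal sketch, assuming the column is the relative difference `hakozuna / mimalloc - 1` rounded to a whole percent; the names `RESULTS`, `relative_diff_percent`, and `diff_column` are illustrative and not part of the hakozuna benchmark suite.

```python
# Throughput figures copied from the results table, in millions (the "M" suffix).
# Each entry: (size class, remote rate R in percent, hakozuna, mimalloc).
RESULTS = [
    ("Small", 0, 377.0, 395.0),
    ("Small", 50, 221.0, 180.0),
    ("Small", 90, 199.0, 177.0),
    ("Medium", 0, 338.0, 264.0),
    ("Medium", 50, 74.7, 78.7),
    ("Medium", 90, 53.4, 43.4),
    ("Mixed", 50, 100.0, 104.0),
    ("Mixed", 90, 73.9, 76.1),
]


def relative_diff_percent(candidate: float, reference: float) -> float:
    """Return how far candidate is above (+) or below (-) reference, in percent."""
    return (candidate / reference - 1.0) * 100.0


def diff_column(results):
    """Recompute the 'hakozuna vs mimalloc' column, rounded to whole percent."""
    return [round(relative_diff_percent(hz, mi)) for _, _, hz, mi in results]


if __name__ == "__main__":
    for (size, rate, hz, mi), diff in zip(RESULTS, diff_column(RESULTS)):
        print(f"{size:<6} R={rate:<2} {hz:>6.1f}M vs {mi:>6.1f}M -> {diff:+d}%")
```

Under that assumption, the recomputed values agree with every row of the published column.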

### Head-to-Head Comparison Summary

| Scenario | Winner | Difference (hakozuna vs mimalloc) |
|---|---|---|
| R=0 (Small) | mimalloc | Slight edge for mimalloc (-5%) |
| R=0 (Medium) | hakozuna | Dominant lead of +28% |
| R=50 (Small) | hakozuna | Strong lead of +23% |
| R=50 (Med/Mixed) | mimalloc | Slight edge for mimalloc (-4% to -5%) |
| R=90 (Small/Med) | hakozuna | Significant lead of +12% to +23% |
| R=90 (Mixed) | Tie | Nearly identical performance (-3%) |
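A summary like this can be derived mechanically from the per-configuration percentages in the full results table. The sketch below groups configurations by remote rate and winner and reports the range of differences in each group. The 3% tie threshold is an assumption, chosen only because the -3% Mixed result at R=90 is labeled a tie while -4% is not; `DIFFS`, `winner`, and `summarize` are illustrative names.

```python
from collections import defaultdict

# Percent differences (hakozuna vs mimalloc) copied from the results table.
# Key: (size class, remote rate R in percent).
DIFFS = {
    ("Small", 0): -5, ("Small", 50): 23, ("Small", 90): 12,
    ("Medium", 0): 28, ("Medium", 50): -5, ("Medium", 90): 23,
    ("Mixed", 50): -4, ("Mixed", 90): -3,
}

# Assumption: a gap of 3% or less counts as a tie.
TIE_THRESHOLD = 3


def winner(diff: int) -> str:
    """Name the faster allocator for one configuration, or 'Tie'."""
    if abs(diff) <= TIE_THRESHOLD:
        return "Tie"
    return "hakozuna" if diff > 0 else "mimalloc"


def summarize(diffs):
    """Group configurations by (remote rate, winner) and report the diff range."""
    groups = defaultdict(list)
    for (size, rate), diff in diffs.items():
        groups[(rate, winner(diff))].append((size, diff))
    summary = {}
    for key, entries in groups.items():
        sizes = [size for size, _ in entries]
        values = [diff for _, diff in entries]
        summary[key] = (sizes, min(values), max(values))
    return summary


if __name__ == "__main__":
    for (rate, who), (sizes, lo, hi) in sorted(summarize(DIFFS).items()):
        print(f"R={rate:<2} {'/'.join(sizes):<13} {who:<9} {lo:+d}% to {hi:+d}%")
```

Grouping by winner rather than by remote rate alone keeps wins and losses at the same R from being averaged into a single misleading range.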

### Key Takeaways

  1. Local Mastery: When memory is allocated and freed on the same thread (R=0), hakozuna outperforms mimalloc by 28% at medium sizes, while mimalloc keeps a 5% edge at small sizes.
  2. High Remote Resilience: Even without the "Lane16" optimization, hakozuna handles remote-heavy workloads (R=90) for small and medium objects faster than mimalloc, by 12% and 23% respectively.
  3. Consistency: mimalloc holds a slight edge for medium and mixed workloads at R=50 and for mixed workloads at R=90, but the gap never exceeds 5%, so hakozuna (hz3) is always within striking distance or ahead.
