## The Baseline Power: Performance Without Optimizations


Engineering Notes: A work in progress.

Before diving into the Lane16-specific build, I found that the baseline version of hakozuna is already competitive with mimalloc. With the debug-logging and atomic bottlenecks fixed, the raw performance results are as follows:

### Full Benchmark Results (T=16, RUNS=5 Median)

| Size | Remote Rate (R) | Baseline (hakozuna) | mimalloc | hakozuna vs mimalloc |
|---|---|---|---|---|
| Small | 0 (Local) | 377M | 395M | -5% |
| Small | 50 | 221M | 180M | +23% |
| Small | 90 | 199M | 177M | +12% |
| Medium | 0 (Local) | 338M | 264M | +28% |
| Medium | 50 | 74.7M | 78.7M | -5% |
| Medium | 90 | 53.4M | 43.4M | +23% |
| Mixed | 50 | 100M | 104M | -4% |
| Mixed | 90 | 73.9M | 76.1M | -3% |
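
For anyone re-deriving the last column, here is a minimal sketch of the arithmetic. It assumes each cell is the median of the five runs and that the percentage is `(hakozuna / mimalloc - 1) * 100` rounded to the nearest integer; the post does not state the formula, but every row of the table is consistent with it. The function names `median_of_runs` and `relative_delta_percent` are hypothetical and not taken from the benchmark harness.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

// Median of five throughput samples (one per run).
// The array is taken by value, so the caller's data is left unsorted.
double median_of_runs(std::array<double, 5> runs) {
    std::sort(runs.begin(), runs.end());
    return runs[2];  // middle element of five sorted values
}

// Relative throughput of `candidate` versus `reference`, in percent,
// rounded to the nearest integer.
// Assumption: this matches how the "hakozuna vs mimalloc" column was derived.
int relative_delta_percent(double candidate, double reference) {
    return static_cast<int>(std::lround((candidate / reference - 1.0) * 100.0));
}
```

Calling `relative_delta_percent(221.0, 180.0)` reproduces the Small, R=50 row (+23%), and `relative_delta_percent(377.0, 395.0)` reproduces the Small, R=0 row (-5%).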

### Head-to-Head Comparison Summary

| Scenario | Winner | Margin (hakozuna vs mimalloc) |
|---|---|---|
| R=0 (Small) | mimalloc | Slight edge for mimalloc (-5%) |
| R=0 (Medium) | hakozuna | Dominant lead of +28% |
| R=50 (Small) | hakozuna | Strong lead of +23% |
| R=50 (Med/Mixed) | mimalloc | Slight edge for mimalloc (-4% to -5%) |
| R=90 (Small/Med) | hakozuna | Significant lead of +12% to +23% |
| R=90 (Mixed) | Tie | Nearly identical performance (-3%) |

### Key Takeaways

- **Local Performance:** When memory is allocated and freed on the same thread (R=0), hakozuna outperforms mimalloc by 28% at medium sizes, while mimalloc keeps a 5% edge at small sizes.
- **High Remote Resilience:** Even without the "Lane16" optimization, hakozuna handles remote frees at R=90 faster than mimalloc for small (+12%) and medium (+23%) objects. Mixed sizes are within 3%.
- **Consistency:** Where mimalloc leads (small at R=0, medium and mixed at R=50, mixed at R=90), the gap is only 3% to 5%, so hz3 is either ahead or within striking distance in every scenario.
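
To make the remote rate concrete, here is a simplified sketch of what R controls. It is not the benchmark used for the numbers above: the function `run_workload`, its parameters, and the `FreeCounts` struct are hypothetical, and it assumes R is the percentage of blocks freed by a thread other than the one that allocated them. For brevity the remote thread frees its blocks in one batch after the allocation loop, whereas a real benchmark would free concurrently and measure throughput.

```cpp
#include <cstddef>
#include <cstdlib>
#include <random>
#include <thread>
#include <vector>

// How many blocks were released by the allocating thread vs. another thread.
struct FreeCounts {
    std::size_t local = 0;
    std::size_t remote = 0;
};

// Hypothetical workload: the calling thread allocates `ops` blocks of `size`
// bytes. Each block is either freed immediately on the same thread (local)
// or, with probability remote_percent / 100, set aside for a second thread
// to free (remote).
FreeCounts run_workload(std::size_t ops, std::size_t size,
                        int remote_percent, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> roll(0, 99);
    std::vector<void*> handoff;  // blocks waiting for the remote thread
    FreeCounts counts;

    for (std::size_t i = 0; i < ops; ++i) {
        void* p = std::malloc(size);
        if (p == nullptr) break;  // out of memory: stop early
        if (roll(rng) < remote_percent) {
            handoff.push_back(p);  // will be freed by another thread
            ++counts.remote;
        } else {
            std::free(p);          // freed by the allocating thread
            ++counts.local;
        }
    }

    // Remote frees: a different thread releases memory it did not allocate.
    std::thread remote([&handoff] {
        for (void* p : handoff) std::free(p);
    });
    remote.join();
    return counts;
}
```

With `remote_percent` set to 0 every block takes the local path, which corresponds to the R=0 rows; at 90 roughly nine in ten blocks cross threads. Because the sketch goes through `std::malloc` and `std::free`, it exercises whichever allocator the process is actually using.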
