DEV Community

CharmPic


Hakozuna HZ5: Page/Run-First Sidecar Allocator Prototype Design Notes

Hakozuna HZ5: Design Notes for a Page/Run-First Sidecar Allocator Prototype

In this post, I will outline the design principles for HZ5, a new experimental prototype within Hakozuna, a memory allocator project implemented in C.

Up until now, Hakozuna has evolved primarily around the HZ3 and HZ4 allocator profiles. HZ5 is an extension of that lineage, but it isn't just a branch aimed at slightly accelerating the existing fast path.

In HZ5, we are revisiting the internal architecture of the allocator to explore the following directions:

  • Page/Run-first classification
  • Sidecar metadata
  • Fail-closed ownership determination
  • Descriptor-owned front-end
  • Page-oriented remote free
  • Profile-specific allocator lanes
  • Experimental build/benchmark paths for Linux and Windows

Rather than a finished product, HZ5 serves as a testbed for rethinking allocator design.


Differences Between HZ3, HZ4, and HZ5

Hakozuna is organized into several distinct allocator profiles:

First, HZ3 / ACE-Alloc is a profile primarily designed for local-heavy allocation workloads. It focuses on compact memory usage and O(1) pointer-to-bin lookups using PTAG32.

Next, HZ4 is a message-passing / remote-free profile. It is intended for remote-heavy workloads and scenarios with high thread counts.

In contrast, HZ5 is a page/run-first sidecar allocator prototype.

The differences can be summarized as follows:

HZ3 / HZ4:
  Polishing existing profiles.
  Optimizing small object paths and remote-free paths.

HZ5:
  Reimagining classification, ownership, and metadata layout.
  Designing around pages/runs and descriptors.


While HZ3 and HZ4 focus on "strengthening existing profiles," HZ5 is a prototype for "reconstructing the internal structure of the allocator from a different perspective."


HZ5 Design Principles

The core idea of HZ5 is to classify pointers by page/run and to consolidate that information into sidecar descriptors.

When performing a free, the allocator needs to determine at least the following:

  1. Does this pointer belong to this allocator? (Ownership)
  2. Which size class, run, or page does it belong to?
  3. Can it be returned locally, or should it be treated as a remote free?
  4. Which front-end or profile should it be dispatched to?

Instead of requiring this metadata to sit immediately adjacent to the pointer itself, HZ5 explores a design in which it lives in descriptors that correspond to pages/runs.

The flow looks like this:

user pointer
    |
    v
page / run lookup
    |
    v
sidecar descriptor
    |
    +--> owner
    +--> profile
    +--> size class
    +--> front-end policy
    |
    v
allocation / free dispatch

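The lookup path in the diagram can be sketched in C as follows. This is a minimal sketch under assumptions of my own, not HZ5's actual code: a single contiguous arena, 4 KiB pages, and a flat array of descriptor pointers indexed by page number. All `hz5_*` names, fields, and constants are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not HZ5's real layout: 4 KiB pages in a 64 MiB arena. */
#define HZ5_PAGE_SHIFT 12u
#define HZ5_PAGE_SIZE ((uintptr_t)1 << HZ5_PAGE_SHIFT)
#define HZ5_ARENA_PAGES 16384u

static_assert((HZ5_PAGE_SIZE & (HZ5_PAGE_SIZE - 1)) == 0,
              "page size must be a power of two");

typedef enum { HZ5_PROFILE_LOCAL, HZ5_PROFILE_REMOTE } hz5_profile;

/* Sidecar descriptor: lives outside the pages it describes. */
typedef struct {
    uint32_t owner;       /* id of the owning thread/heap */
    hz5_profile profile;  /* which front-end lane handles this page */
    uint16_t size_class;  /* size class of the slots in this page/run */
    uint8_t in_use;       /* 0 = not currently handed out by the allocator */
} hz5_desc;

typedef struct {
    uintptr_t base;                   /* start address of the arena */
    hz5_desc *desc[HZ5_ARENA_PAGES];  /* one descriptor pointer per page */
} hz5_page_map;

/* user pointer -> page index -> sidecar descriptor.
   Returns NULL for anything that is not a live page of this arena. */
static hz5_desc *hz5_lookup(const hz5_page_map *map, const void *ptr) {
    uintptr_t p = (uintptr_t)ptr;
    if (p < map->base) return NULL;
    uintptr_t page = (p - map->base) >> HZ5_PAGE_SHIFT;
    if (page >= HZ5_ARENA_PAGES) return NULL;
    hz5_desc *d = map->desc[page];
    if (d == NULL || !d->in_use) return NULL;
    return d;
}
```

A run that spans several pages would simply store the same descriptor pointer in each of its page slots, so any of its pages resolves to the run's descriptor. Returning NULL for anything outside the arena or unassigned keeps unknown pointers out of the free path.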

Crucially, HZ5 is not just about "placing metadata outside." By shifting metadata to a sidecar, we can centralize ownership, profile, and dispatch policy within the descriptor. This means that instead of each front-end making independent decisions, the descriptor dictates "how this pointer should be handled."
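To illustrate what it means for the descriptor to dictate handling, here is a hedged sketch of a free dispatch that reads the owner from the descriptor and either returns the block to a local list or pushes it onto a per-page remote list (one possible reading of "page-oriented remote free"). The descriptor fields, the thread-id scheme, and the lock-free list are assumptions for illustration, and all `hz5_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: a freed block is reused as a list node. */
typedef struct hz5_free_node { struct hz5_free_node *next; } hz5_free_node;

typedef struct {
    uint32_t owner;                        /* owning thread id */
    hz5_free_node *local_free;             /* touched only by the owner */
    _Atomic(hz5_free_node *) remote_free;  /* per-page list other threads push to */
} hz5_desc;

typedef enum { HZ5_FREED_LOCAL, HZ5_FREED_REMOTE } hz5_free_path;

/* The descriptor, not the caller, decides local vs. remote. */
static hz5_free_path hz5_dispatch_free(hz5_desc *d, void *ptr, uint32_t self) {
    assert(d != NULL && ptr != NULL);
    hz5_free_node *n = (hz5_free_node *)ptr;
    if (d->owner == self) {
        n->next = d->local_free;  /* owner thread: plain push, no atomics */
        d->local_free = n;
        return HZ5_FREED_LOCAL;
    }
    /* Remote thread: lock-free push onto this page's remote list. */
    hz5_free_node *head = atomic_load_explicit(&d->remote_free, memory_order_relaxed);
    do {
        n->next = head;
    } while (!atomic_compare_exchange_weak_explicit(&d->remote_free, &head, n,
                                                    memory_order_release,
                                                    memory_order_relaxed));
    return HZ5_FREED_REMOTE;
}

/* Owner takes the whole remote list in one exchange and moves it to local_free. */
static size_t hz5_drain_remote(hz5_desc *d) {
    hz5_free_node *n = atomic_exchange_explicit(&d->remote_free, NULL,
                                                memory_order_acquire);
    size_t count = 0;
    while (n != NULL) {
        hz5_free_node *next = n->next;
        n->next = d->local_free;
        d->local_free = n;
        n = next;
        count++;
    }
    return count;
}
```

Because remote threads only push and the owner takes the whole list with a single exchange, the push side needs no lock, and this pattern avoids the ABA problem that a pop-one-at-a-time stack would have.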


Why Sidecar Metadata?

One of the hardest parts of an allocator is the free(ptr) operation.

With malloc(size), the requested size is explicit. However, with free(ptr), only a pointer is provided. The allocator must recover the owner, size class, and return destination from that pointer alone.

Consequently, the placement of metadata is critical.

Some designs place metadata close to the user allocation, which can help locality and keep the implementation simple. HZ5, however, tests a per-page/run descriptor approach, in which the allocator looks up the descriptor from the page/run that contains the pointer.
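As a concrete case of recovering information from the pointer alone, the sketch below computes a slot index from a run descriptor and rejects pointers that fall outside the run or into the middle of a slot. It assumes a run is a contiguous range carved into equal-sized slots; the `hz5_run` type and its fields are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical run descriptor: a contiguous span carved into equal slots. */
typedef struct {
    uintptr_t base;    /* address of slot 0 */
    size_t slot_size;  /* bytes per slot, taken from the size class */
    size_t nslots;     /* number of slots in the run */
} hz5_run;

/* Recover the slot index for ptr. Returns false (leaving *out untouched)
   if ptr lies outside the run or does not point at the start of a slot. */
static bool hz5_slot_index(const hz5_run *run, const void *ptr, size_t *out) {
    assert(run->slot_size > 0);
    uintptr_t p = (uintptr_t)ptr;
    if (p < run->base) return false;
    uintptr_t off = p - run->base;
    size_t idx = (size_t)(off / run->slot_size);
    if (idx >= run->nslots) return false;          /* past the end of the run */
    if (off % run->slot_size != 0) return false;   /* interior pointer */
    *out = idx;
    return true;
}
```

A production allocator would usually avoid the division, for example by restricting slot sizes or using a precomputed reciprocal, but the range and alignment checks are the part that matters for classification.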

This design emphasizes:

  • Clarifying pointer classification at the page/run level.
  • Centralizing ownership logic within the descriptor.
  • Organizing policy dispatch for each profile.
  • Opting for a fail-closed approach for unknown pointers or ambiguous states.

In this context, "fail-closed" means that when ownership cannot be confirmed, the allocator takes the safe path rather than proceeding despite the uncertainty.
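A fail-closed ownership check can be sketched as a function that answers "yes" only when every piece of evidence agrees and treats everything else as "no" or "unsure". The page states, the magic tag, and the policy for each outcome (falling back versus refusing) are assumptions made for this example, not HZ5's documented behavior; all `hz5_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page states. Anything other than LIVE is "not ours to free". */
typedef enum { HZ5_PAGE_UNMAPPED = 0, HZ5_PAGE_LIVE, HZ5_PAGE_RETIRING } hz5_page_state;

/* Zero-filled sidecar memory must read as "not live". */
static_assert(HZ5_PAGE_UNMAPPED == 0, "zeroed descriptors must be unmapped");

#define HZ5_DESC_MAGIC 0x485A3521u  /* arbitrary tag for this sketch */

typedef struct {
    hz5_page_state state;
    uint32_t magic;  /* written when the descriptor is initialized */
} hz5_desc;

typedef enum { HZ5_OWN_YES, HZ5_OWN_NO, HZ5_OWN_UNSURE } hz5_ownership;

/* Fail-closed: only a positive, consistent answer counts as ownership. */
static hz5_ownership hz5_check_owner(const hz5_desc *d) {
    if (d == NULL) return HZ5_OWN_NO;                       /* no descriptor: foreign pointer */
    if (d->magic != HZ5_DESC_MAGIC) return HZ5_OWN_UNSURE;  /* stale or corrupted sidecar */
    if (d->state != HZ5_PAGE_LIVE) return HZ5_OWN_UNSURE;   /* page not live */
    return HZ5_OWN_YES;
}

/* Caller side: anything but YES is refused instead of guessed at. */
static int hz5_free_verdict(const hz5_desc *d) {
    switch (hz5_check_owner(d)) {
    case HZ5_OWN_YES:    return 0;   /* proceed to the normal free path */
    case HZ5_OWN_NO:     return -1;  /* e.g. hand off to another allocator */
    case HZ5_OWN_UNSURE: return -2;  /* e.g. report and leak rather than corrupt */
    }
    return -2;
}
```

Putting `HZ5_PAGE_UNMAPPED` at zero means a descriptor that was never initialized cannot be mistaken for an owned page, which is the fail-closed default expressed in the data layout itself.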

In high-speed allocators, there is a temptation to eliminate every extra branch or lookup. However, ownership logic is a high-impact area: a misclassified pointer can corrupt allocator state. In HZ5, the clarity of classification is as much a design goal as raw speed.


Current State of the Windows Port

While HZ5 began as an experiment on Linux, we are now establishing native Windows build and benchmark paths.

However, I want to be cautious here: HZ5 is still experimental. It is not yet at a stage where it can generally replace existing allocators across the Windows ecosystem.

Its current role is as follows:

  • Establishing the pipeline for native Windows builds.
  • Enabling profile-specific benchmarks on Windows.
  • Observing workload trends (remote-heavy, local-heavy, mixed, etc.).
  • Comparing how designs formulated on Linux translate to the Windows environment.

Allocator benchmarks vary wildly depending on the environment. Factors such as OS, compiler, thread count, allocation size distribution, remote-free ratios, and RSS management all shift the results. Therefore, it is safer to treat HZ5 results not as "globally faster on Windows," but as "exhibiting these tendencies in this specific profile/benchmark lane."


Notes on Interpreting Benchmarks

It is difficult to judge an allocator by a single number.

High throughput may come at the cost of excessive RSS. An allocator that excels in local-heavy workloads might struggle when remote frees increase. Performance with small objects might not translate to large allocations or mixed workloads.

As such, HZ5 evaluates benchmarks across several axes:

  • Local-heavy vs. Remote-heavy.
  • Small object focused vs. Mixed size.
  • Throughput vs. RSS.
  • Scalability with increased thread counts.
  • Consistency between Linux and Windows.
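As a small illustration of reporting more than one axis, the sketch below times a local-heavy malloc/free loop and records peak RSS next to throughput. It uses POSIX `clock_gettime` and `getrusage`, so it targets Linux (where `ru_maxrss` is reported in KiB); a Windows lane would need different APIs. The result struct and lane label are hypothetical, and the loop measures whichever `malloc` the binary ends up using.

```c
#define _XOPEN_SOURCE 700  /* clock_gettime, getrusage */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <time.h>

/* Hypothetical result record: throughput is never reported without RSS. */
typedef struct {
    const char *lane;    /* workload label */
    double ops_per_sec;  /* malloc/free pairs per second */
    long peak_rss_kib;   /* ru_maxrss, in KiB on Linux */
} hz5_bench_result;

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Local-heavy loop: every block is freed by the thread that allocated it. */
static hz5_bench_result bench_local_heavy(size_t iters, size_t size) {
    assert(iters > 0 && size > 0);
    double t0 = now_sec();
    for (size_t i = 0; i < iters; i++) {
        volatile char *p = malloc(size);
        if (p == NULL) abort();
        p[0] = (char)i;  /* volatile store keeps the malloc/free pair from being elided */
        free((void *)p);
    }
    double elapsed = now_sec() - t0;
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    hz5_bench_result r = { "local-heavy/small", (double)iters / elapsed, ru.ru_maxrss };
    return r;
}

static void hz5_print_result(const hz5_bench_result *r) {
    printf("%-20s %12.0f ops/s  peak RSS %ld KiB\n",
           r->lane, r->ops_per_sec, r->peak_rss_kib);
}
```

Running the same loop with different sizes, thread counts, and a cross-thread free pattern would fill in the other axes listed above.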

The danger in allocator evaluation is claiming it is the "best" based on a single successful case. HZ5 is not for making such claims. Rather, it is an experiment to see what happens when the allocator's structure is varied by profile.


Why Assign a DOI?

For this release, we have assigned HZ5 a DOI via Zenodo.

(Note: HZ3/HZ4 also has its own separate artifact DOI.)

GitHub repositories are updated daily, which is great for development but problematic for citing a specific version of a paper, source, or artifact. A Zenodo DOI allows us to freeze the deliverables at a specific point in time.

The HZ5 artifact includes:

  • HZ5 source code
  • Design notes
  • Benchmarks and reproducibility artifacts
  • English and Japanese paper PDFs

Separating the DOIs for HZ3/HZ4 and HZ5 was intentional. HZ3/HZ4 focuses on the maturity and comparison of existing profiles, while HZ5 represents a different design line as a sidecar allocator prototype. Keeping them separate makes the intent clearer for readers and researchers.


Future Outlook

HZ5 is not "finished"; it has simply reached a stage where its form is organized enough to be shared.

Moving forward, I aim to:

  • Refine the Windows HZ5 path.
  • Expand Linux/Windows benchmark coverage.
  • Categorize stability by profile.
  • Consolidate comparisons with HZ3/HZ4.
  • Synchronize papers, READMEs, and artifacts.
  • Further streamline reproducibility procedures.

The behavior of an allocator can change drastically based on minor implementation details. This is exactly why we need to clarify which profile is effective for which workload under what constraints, rather than just saying it's "fast."

HZ5 is the testbed for that inquiry.

We still need to verify the effectiveness of page/run-first classification, sidecar descriptors, fail-closed ownership, and profile-specific dispatch. However, with this milestone, HZ5 has transitioned from a "prototype folder in a repo" to a "research artifact with design intent and a DOI."

I look forward to growing it further, step by step.
