John Fleming

Posted on Dec 2

No Compromise Arena Allocation

#cpp #gamedev #performance #memorymanagement

When I first began developing my game engine, I decided to challenge myself by using only arena allocators. And while it provided some nice benefits like cache locality and near instant allocation and deallocation from arenas (after initially allocating the arena itself), it wasn't without its problems.

Temporaries Problem

Temporaries. Are. Everywhere. I was blind to this before using arenas because I would just use a vector for everything. Bad, I know, but It was nice, clean, and I never had to worry about it. Any time I needed intermediate arrays (for things like string building), a vector was always there for me, but now it wasn't, and now I needed a way to explicitly manage these temporaries.

First attempt: Separate Scratch Arenas

My initial solution to this was to pass a separate "scratch" arena by value that could be used to store intermediate values so that any changed bookkeeping would get lost at function return. For example:

void* someFunc(Arena* persistentArena, Arena scratchArena) {
    // temporary allocation, lost after function returns
    void* temp = allocate(&scratchArena, 64);

    // persistent allocation, lives beyond function
    return allocate(persistentArena, 128);
}

Since allocation in an arena is just updating the bookkeeping (i.e. an offset), this meant scratch arenas operated effectively like a stack. From what I found online, this most aligned with Chris Wellons' method of dealing with this issue.

This was nice because, for one, ownership was incredibly clear: the result lives in the arena passed by pointer, since that is the only possible way for the result to live longer than the arena. Additionally, if I only passed an arena by value, then I know for certain that the result doesn't live on the arena, because it can't (since all changed bookkeeping as a result of allocation is lost at function return).

Furthermore, if I wanted a function's results to live in a scratch arena instead of in a "persistent" arena (an arena passed by pointer, where allocations live longer than the function's stack frame as a consequence), then I could at any point just pass a pointer to my scratch arena instead of a pointer to my persistent arena, so that the called function's results will be lost after the "current" function returns (since the called function's results were placed in the scratch arena).

Nested Function Issue

The caveat to this is that if a function needs to have both a persistent and scratch arena, and if I made that function's persistent arena my scratch arena, then what do I pass for the function's scratch arena?

void* func(Arena* persistArena, Arena scratchArena) {
    void* temp = allocate(&scratchArena, 64);

    void* result = nestedFunc(
        &scratchArena,
        // what should be passed here?
    );

    return result;
}

There are 3 options I found: to make and pass a second scratch arena, to pass the persistent arena, or to use thread local storage.

Additionally, there were 2 ways I found to easily make a second scratch arena: to split the current scratch arena into two or to allocate a whole new scratch arena. I decided against the second strategy because I liked how arena allocation reduced dynamic allocations, and I decided against the first strategy because half-ing the scratch arena for each nested function would quickly reduce the available capacity of each nested scratch arena.

So the first option was a bust, but what about the second option? Well, it actually works pretty well, so long as the persistent arena just happened to be large enough to accommodate scratch space requirements (not likely in my case), and thats not even considering the possibility of it being allocated completely at the beginning of a function or filling up over time.

Okay, so were down to the last option: thread local storage. This is actually a pretty nice solution I read from Ryan Fleury's Untangling Lifetimes article, though it involves manually grabbing a scratch arena and checking if the passed persistent arena aliased the scratch arena (could happen if you call a function and pass the TLS scratch arena if the function result is just an intermediary / temporary).

This method also requires a manual "getScratch" and "releaseScratch" call at the beginning and end of functions that need to use the scratch arena. This was a bit frustrating, I had gotten really attached to the idea of the function's stack frame just naturally resetting the scratch arena, so going back to manually managing the reset felt bad. Additionally, I also removed the CRT, so I couldn't easily use the TLS method even if I was willing to manually manage the memory.

I was at an impasse. All of the methods seemed to have some sort of issue. Moreover, that check for if the arenas alias is actually really important to ensure you dont overwrite one arena by writing into another, which is an issue / safety concern with passing two separate arenas in general.

Exploring Dual Arenas

So I thought that maybe I shouldn't use 2 separate arenas, and instead searched about some dual arena methods and found a nice article by Stefan Reinalter. The idea is that scratch allocations are placed at the top of the arena growing downwards, while persistent allocations are placed at the bottom growing upwards. One added benefit of this approach to the previous ones is that arena space is completely maximized. This is because there's no case where a scratch allocation fails while there's available space for persistent allocations and vice versa, since the spaces for persistent and scratch allocations are combined into a single arena.

However, there was still the issue of manually managing the "scratch" space, i.e. the top of the arena, as well as the issue that I couldn't easily have a function place results into the scratch space, as I was able to with separate arenas.

Final Solution

To solve these issues, my engine uses an arena allocation strategy that takes elements from each strategy discussed so far. For starters, this is what the struct looks like:

struct ArenaFrame {
    Arena* pArena;
    uint32_t* pPersistOffset;
    uint32_t scratchOffset;
    bool isFlipped;
};

The idea is to use a single dual arena that is actually just a wrapper of 2 offsets into the same memory: a persistent offset pointer and a scratch offset value. This way, when you pass an ArenaFrame by value, any scratch allocations are automatically cleaned up when the function returns, while persistent allocations remain.

Additionally, to make ownership / effects clear, that function can take an ArenaFrame by rvalue reference, which makes it immediately obvious at the call site which pointer is being modified. For example:

void* func2(ArenaFrame&& frame) {
    // scratch allocation, cleaned up after return
    void* temp = scratchAlloc(&frame, 64);

    // persistent allocation, lives after return
    return alloc(&frame, 128); 
}

void* func1(ArenaFrame&& frame) {
    // temp lives in scratch space
    void* temp = func2(makeFrame(frame, &frame.scratchOffset));

    // final result lives in persistant space
    return func2(makeFrame(frame, frame.pPersistOffset));
}

furthermore, to allow a function to place results into scratch space, you can simply make the passed ArenaFrame's persist offset point to your own ArenaFrame's scratch offset, while setting the passed scratch offset from your ArenaFrame's persist offset, effectively flipping where persistent and scratch allocations go. Then, the isFlipped flag handles swapping the direction allocations grow in when this happens to ensure the expected behavior.

This is currently the method I use in my game engine, and so far it has been pretty painless. The API is extremely simple and explicit since you only ever need to pass a single ArenaFrame to any function, while the effects of a function are clear as day through the rvalue references. Here's a more complex example that shows how it all works together:

void* func2(ArenaFrame&& frame) {
    // Initial state: *frame.pPersistOffset = 0, frame.scratchOffset = 1000
    void* temp = scratchAlloc(&frame, 64);
    // After: *frame.pPersistOffset = 0, frame.scratchOffset = 936
    void* persist = alloc(&frame, 32);
     // After: *frame.pPersistOffset = 32, frame.scratchOffset = 936
    void* tempResult = anotherFunc(makeFrame(frame, &frame.scratchOffset));
    // After: *frame.pPersistOffset = 32, frame.scratchOffset = 904, 
    return alloc(&frame, 128);
    // After: *frame.pPersistOffset = 160, frame.scratchOffset = 904
}

void func1(ArenaFrame&& frame) {
    // Initial state: frame.scratchOffset = 1000, *frame.pPersistOffset = 0
    void* temp1 = func2(makeFrame(frame, frame.pPersistOffset));
    // After: frame.scratchOffset = 1000, *frame.pPersistOffset = 160
    void* temp2 = func2(makeFrame(frame, &frame.scratchOffset));
    // After: frame.scratchOffset = 840, *frame.pPersistOffset = 160
}

The stack naturally handles cleanup of temporaries without any manual management, there's no possibility of arena aliasing since there's only a single underlying memory block, and there's maximum space utilization through the dual-direction growth. Additionally, the combination of offset flipping with rvalue references makes it really clear at every function call exactly where returns live, as well as their lifetime.

So far, this method has helped simplify a lot of my allocation code, and hopefully it can be useful to you as well!

a full godbolt implementation

DEV Community

No Compromise Arena Allocation

Temporaries Problem

First attempt: Separate Scratch Arenas

Nested Function Issue

Exploring Dual Arenas

Final Solution

Top comments (0)

Read next

Senior Dev and Multi-threading

loop for and while.

Building a tiny type-safe typescript ECS (Entity-component-system)

C++: An Object-Oriented Programming Language