Rahul Shankar

Posted on Jan 30

Malloc-ation: How I Built My Own malloc from Scratch

#cpp #tutorial #programming #computerscience

Abstract:

malloc looks simple. You pass in the size you need, and it hands you a pointer.
But how does it actually do that?
That question sparked my curiosity, and long story short, I ended up implementing my own version of malloc — something closer to a raft than a cruise ship.
In this article, I share what I learned while building a custom malloc in C for Linux/UNIX systems. I walk through four increasingly complex versions of an allocator, each highlighting different trade-offs between speed, safety, and memory efficiency.
Src code link: https://github.com/RahulShankarV52/Memory_allocators_history

FYI

Before we begin, there are a few memory management concepts worth covering. The explanations below reflect my understanding and are simplified for clarity.

sbrk vs. mmap: sbrk and brk adjust the program break, effectively growing or shrinking the heap within the process’s existing address space. mmap, on the other hand, creates independent memory mappings that live outside the traditional heap. In practice, allocators often use sbrk for smaller allocations and mmap for larger ones.
The Hidden Tax (Metadata): When you request 100 bytes, the allocator usually reserves more. The extra space stores metadata — often in a header (and sometimes a footer) — containing information such as block size and allocation status. The user only ever sees a pointer to the usable payload.
Fragmentation:
- Internal: Wasting space inside a block (e.g., giving a 100-byte block for a 25-byte request).
- External: Having enough total free space, but it's scattered in tiny holes that are too small for a new request.
The System Call Tax: Interacting with the kernel is expensive. It’s far more efficient to request memory in larger chunks (e.g., 4 KB pages) and manage it in user space than to issue a system call for every small allocation.
Memory Alignment: Modern CPUs access memory in fixed-size chunks (commonly 8 or 16 bytes). Misaligned data can require multiple memory accesses, hurting performance. To avoid this, allocations are typically aligned to 16-byte boundaries.
Code snippets

Level 1: Bump allocator
The sbrk system call is used to increase the program break by a set size for each allocation
You cannot free individual blocks here, you have to free the entire allocation
It is fast but not robust.

#include <string.h>
#include <unistd.h>
//starting point of the dynamic allocation
void *mainHeap = NULL;

void *myalloc(int size) {
    // sbrk(size) returns the current program break 
    // that can then be accessed upto the required size
    void *allocated = sbrk(size);

    // fail handling
    if (allocated == (void *)-1) return NULL;

    // Setting the starting point if there wasnt one
    if (mainHeap == NULL) {
        mainHeap = allocated;
    }
    return allocated;
}
int myfree() {
    // If mainHeap is NULL, myalloc was never called; nothing to reset.
    if (mainHeap == NULL) {
        return -1;
    }

    // brk(addr) sets the Program Break 
    // to an ABSOLUTE address. By passing mainHeap, we instantly 
    // 'deallocate' everything we ever claimed.
    return brk(mainHeap);
}

int main() {
    const char *message = "Hello world";
    char *testmem = (char *)myalloc(strlen(message) * sizeof(char));
    strcpy(testmem, message);
    myfree();
    return 0;
}

Level2: Fixed Chunk Allocator

Unlike the bump allocator, this version allows individual allocations to be freed — but only for fixed-size objects.
Each iteration goes from logic model to full implementation
Unfortunately it suffers from internal and external fragmentation issues. #### v1: slow and basic using an array instead of actual memory

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NUMBEROFROUNDS 1000000
#define NUMBEROFOBJECTS 100
//value to be stored
typedef struct {
  int x, y, z;
} Vector3;
//storage unit
typedef struct {
  Vector3 obj;
  bool allocated;
} PoolObject;

// objectpool: basically where the memory is accessed from 
PoolObject PoolObjects[NUMBEROFOBJECTS] = {0};

//malloc implementation
Vector3 *GetAllocation(void) {
  for (int i = 0; i < NUMBEROFOBJECTS; i++) {
    if (PoolObjects[i].allocated == false) {
      PoolObjects[i].allocated = true;
      return &(PoolObjects[i].obj);
    }
  }
  return NULL; // if none available
}
//free implementation
void FreeAllocation(Vector3 *v) {
  PoolObject *returned = (PoolObject *)v;
  returned->allocated = false;
}

int main() {
  for (int i = 0; i < NUMBEROFROUNDS; i++) {
    int numobjs = rand() % NUMBEROFOBJECTS;
    Vector3 *vectors[numobjs];
    for (int j = 0; j < numobjs; j++) {
      vectors[j] = GetAllocation();
    }
    printf("round %d -- got %d vectors \n", i, numobjs);
    for (int j = 0; j < numobjs; j++) {
      FreeAllocation(vectors[j]);
    }
  }
}

v2: faster with freelist

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NUMBEROFROUNDS 1000000
#define NUMBEROFOBJECTS 100
//to prevent double free vulnerability
#define FLAG (struct PoolObject *)0x1

typedef struct {
  int x, y, z;
} Vector3;

typedef struct PoolObject {
  Vector3 obj;
  struct PoolObject *next; //replace bool with a pointer
} PoolObject;
// freelist will store all free pool objects
PoolObject *freelist = NULL;

// objectpool
PoolObject PoolObjects[NUMBEROFOBJECTS] = {0};

void initialize_pool() {
  //link all pool objects to the next one and freelist to
  //the first pool object as they are all free
  for (int i = 0; i < NUMBEROFOBJECTS - 1; i++) {
    PoolObjects[i].next = &(PoolObjects[i + 1]);
  }
  PoolObjects[NUMBEROFOBJECTS - 1].next = NULL;
  freelist = &(PoolObjects[0]);
}

//malloc implementation
Vector3 *GetAllocation(void) {
  if (freelist) {
    PoolObject *poolobj = freelist;
    freelist = freelist->next;
    poolobj->next = FLAG;
    return &(poolobj->obj);
  }

  return NULL; // if none available
}
//free implementation
void FreeAllocation(Vector3 *v) {
  PoolObject *returned = (PoolObject *)v;
  if (returned->next != FLAG) {
    printf("Error: Attempting to free free memory");
    return;
  }
  returned->next = freelist;
  freelist = returned;
}

int main() {
  initialize_pool();
  for (int i = 0; i < NUMBEROFROUNDS; i++) {
    int numobjs = rand() % NUMBEROFOBJECTS;
    Vector3 *vectors[numobjs];
    for (int j = 0; j < numobjs; j++) {
      vectors[j] = GetAllocation();
    }
    printf("round %d -- got %d vectors \n", i, numobjs);
    for (int j = 0; j < numobjs; j++) {
      FreeAllocation(vectors[j]);
    }
  }
}

v3: dynamic version using sbrk

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NUMBEROFROUNDS 1000000
#define NUMBEROFOBJECTS 100

#define FLAG (struct PoolObject *)0x1

typedef struct {
  int x, y, z;
} Vector3;

typedef struct PoolObject {
  Vector3 obj;
  struct PoolObject *next;
} PoolObject;

PoolObject *freelist = NULL;

#define CHUCKSIZE 10

bool grow_pool() {
  size_t bytes_needed = sizeof(PoolObject) * CHUCKSIZE;
  bytes_needed = (bytes_needed + 15) & ~15;
  PoolObject *PoolObjects = sbrk(bytes_needed);
  if (PoolObjects == (void *)-1) {
    return false;
  }
  for (int i = 0; i < CHUCKSIZE - 1; i++) {
    PoolObjects[i].next = &(PoolObjects[i + 1]);
    PoolObjects[i].obj = (Vector3){0, 0, 0};
  }
  PoolObjects[CHUCKSIZE - 1].next = freelist;
  PoolObjects[CHUCKSIZE - 1].obj = (Vector3){0, 0, 0};
  freelist = &(PoolObjects[0]);
  return true;
}

Vector3 *GetAllocation(void) {
  if (freelist == NULL) {
    if (!grow_pool()) {
      return NULL; // if none available
    }
  }

  PoolObject *poolobj = freelist;
  freelist = freelist->next;
  poolobj->next = FLAG;
  return &(poolobj->obj);
}

void FreeAllocation(Vector3 *v) {
  PoolObject *returned =
      (PoolObject *)((char *)v - offsetof(PoolObject, obj)); // more robust
  if (returned->next != FLAG) {
    printf("Error: Attempting to free free memory");
    return;
  }
  returned->next = freelist;
  freelist = returned;
}

int main() {
  for (int i = 0; i < NUMBEROFROUNDS; i++) {
    int numobjs = rand() % NUMBEROFOBJECTS;
    Vector3 *vectors[numobjs];
    for (int j = 0; j < numobjs; j++) {
      vectors[j] = GetAllocation();
    }
    printf("round %d -- got %d vectors \n", i, numobjs);
    for (int j = 0; j < numobjs; j++) {
      FreeAllocation(vectors[j]);
    }
  }
}

Level3: Dynamic sized Allocator

Allows for values of variable sizes to be stored in memory thus replacing the fixed size issue
Solves internal fragmentation by splitting existing blocks to required size during allocation and external fragmentation merging free blocks when freeing the memory
Uses a doubly linked list as the data structure and a security canary to detect corruption or overflow

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

// A security "Magic Number." 
#define CANARYNUMBER 0xBAD

typedef struct Block {
    size_t size;        // Size of the USER payload area
    bool is_free;       // Status of the block
    struct Block *next; // Next block in the physical heap
    struct Block *prev; // Previous block (allows backward merging)
    unsigned int canary;
} Block;

Block *heap_start = NULL;

// O(N) Linear Search: First fit algorithm
Block *find_free_block(size_t size) {
    Block *current = heap_start;
    while (current) {
        if (current->is_free && current->size >= size) {
            return current;
        }
        current = current->next;
    }
    return NULL;
}

void check_corruption(Block *block) {
    if (block->canary != CANARYNUMBER) {
        printf("Critical Error: Memory corrupted at %p\n", (void*)block);
    }
}

void *myalloc(size_t size) {
    if (size <= 0) return NULL;
    //Memory alignment
    size = (size + 15) & ~15;

    Block *myblock = find_free_block(size);
    //block is already present
    if (myblock) {
        // Splitting Logic: 
        //If the block we found is way bigger than what we need,
        // we "carve" it into two pieces so the leftover space isn't wasted.
        if (myblock->size >= (size + sizeof(Block) + sizeof(int))) {
            // Address math: 'myblock + 1' jumps over the header to the data.
            // We cast to (char*) to move precisely by 'size' bytes.
            Block *new_block = (Block *)((char *)(myblock + 1) + size);

            new_block->size = myblock->size - (size + sizeof(Block));
            new_block->is_free = true;
            new_block->canary = CANARYNUMBER;

            // Re-threading the doubly linked list
            new_block->next = myblock->next;
            if (new_block->next) {
                new_block->next->prev = new_block;
            }
            new_block->prev = myblock;
            myblock->next = new_block;

            myblock->size = size; // Original block shrinks to fit the request
        }

        myblock->is_free = false;
        check_corruption(myblock);
        return (void *)(myblock + 1); // Return pointer to data area
    }

    // Expansion: No suitable hole found, so we grow the heap using sbrk.
    size_t totalsize = sizeof(Block) + size;
    myblock = sbrk(totalsize);
    if (myblock == (void *)-1) return NULL;

    myblock->size = size;
    myblock->is_free = false;
    myblock->canary = CANARYNUMBER;
    myblock->next = NULL;

    // Add new memory to the end of our linked list
    if (heap_start == NULL) {
        heap_start = myblock;
        myblock->prev = NULL;
    } else {
        Block *current = heap_start;
        while (current->next) {
            current = current->next;
        }
        current->next = myblock;
        myblock->prev = current;
    }
    return (void *)(myblock + 1);
}

void myfree(void *ptr) {
    if (!ptr) return;

    // Step back by sizeof(Block) to reach the header
    Block *myblock = (Block *)ptr - 1;
    check_corruption(myblock);
    myblock->is_free = true;

    // Forward Coalescing: If the next block is free, swallow it.
    // [This Block (Free)] + [Next Block (Free)] = [One Big Free Block]
    while (myblock->next && myblock->next->is_free) {
        myblock->size = myblock->size + sizeof(Block) + myblock->next->size;
        myblock->next = myblock->next->next;
        if (myblock->next)
            myblock->next->prev = myblock;
    }

    // 5Backward Coalescing: If the previous block is free, it swallows us.
    if (myblock->prev && myblock->prev->is_free) {
        myblock->prev->size += sizeof(Block) + myblock->size;
        myblock->prev->next = myblock->next;
        if (myblock->next)
            myblock->next->prev = myblock->prev;
        myblock = myblock->prev;  // The pointer now starts at the merged head
    }
}

Level 4: True malloc(kinda but we roll with it)

Segregated Fit Allocator that achieves $O(1)$ search time by pre-sorting free blocks into dedicated "Bins" based on their size.
implements Boundary Tag Footers, creating a "bi-directional" map that allows instant merging with neighbors regardless of which bin they are in.
By organizing free memory into size-based bins and using boundary tags, this allocator reduces allocation time and enables faster coalescing.
OS alignment for better cpu call cycles and Memory poisoning to ensure freed data cannot be retrieved.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CANARYNUMBER 0xBAD
#define NUM_BINS 8
#define PAGE_SIZE 4096 // Standard OS page size

// The Footer sits at the end of every block
// so we can look 'backward' in memory to find the previous neighbor.
typedef struct {
    size_t size;
    bool is_free;
} Footer;

typedef struct Block {
    size_t size;        // Payload size
    bool is_free;
    struct Block *next; // Points to next block IN THE SAME BIN
    struct Block *prev; // Points to prev block IN THE SAME BIN
    unsigned int canary;
} Block;

// Segregated Bins: Instead of one long list, we have 8 mini-lists.
Block *bins[NUM_BINS] = {NULL};

// --- Logic: Size Classing ---
int get_bin_index(size_t size) {
    if (size <= 32)   return 0;
    if (size <= 64)   return 1;
    if (size <= 128)  return 2;
    if (size <= 256)  return 3;
    if (size <= 512)  return 4;
    if (size <= 1024) return 5;
    if (size <= 2048) return 6;
    return 7; // Catch-all for large blocks
}

// Every time a block's size or status changes, we must sync its footer.
void update_footer(Block *block) {
    Footer *f = (Footer *)((char *)(block + 1) + block->size);
    f->size = block->size;
    f->is_free = block->is_free;
}

// --- Logic: Bin Management ---
void add_to_bin(Block *block) {
    int idx = get_bin_index(block->size);
    block->next = bins[idx];
    block->prev = NULL;
    if (bins[idx]) bins[idx]->prev = block;
    bins[idx] = block;
}

void remove_from_bin(Block *block) {
    int idx = get_bin_index(block->size);
    if (block->prev) block->prev->next = block->next;
    else bins[idx] = block->next;
    if (block->next) block->next->prev = block->prev;
}

void *myalloc(size_t size) {
    if (size <= 0) return NULL;
    size = (size + 15) & ~15; // 16-byte alignment for CPU efficiency

    // SEARCH: Go directly to the correct bin.
    int start_bin = get_bin_index(size);
    Block *found = NULL;
    for (int i = start_bin; i < NUM_BINS; i++) {
        Block *curr = bins[i];
        while (curr) {
            if (curr->size >= size) { found = curr; break; }
            curr = curr->next;
        }
        if (found) break;
    }

    if (found) {
        remove_from_bin(found); // Unhook from its size category 
        //as it is not free anymore

        // SPLITTING: If the block is too big, carve out the excess.
        size_t needed = size + sizeof(Block) + sizeof(Footer);
        if (found->size >= (needed + 32)) {
            Block *new_b = (Block *)((char *)(found + 1) + size + sizeof(Footer));
            new_b->size = found->size - size - sizeof(Block) - sizeof(Footer);
            new_b->is_free = true;
            new_b->canary = CANARYNUMBER;
            update_footer(new_b);
            // Re-bin the leftover into a potentially smaller bin
            add_to_bin(new_b); 

            found->size = size;
        }
        found->is_free = false;
        update_footer(found);
        return (void *)(found + 1);
    }

    // PAGE-ALIGNED GROWTH: If bins are empty, ask the OS for a full page.
    size_t total_req = sizeof(Block) + size + sizeof(Footer);
    // Rounding up to 4096 minimizes expensive system calls.
    size_t page_aligned = (total_req + (PAGE_SIZE - 1)) & ~(PAGE_SIZE - 1);

    Block *new_block = sbrk(page_aligned);
    if (new_block == (void *)-1) return NULL;

    new_block->size = size;
    new_block->is_free = false;
    new_block->canary = CANARYNUMBER;
    update_footer(new_block);

    return (void *)(new_block + 1);
}

void myfree(void *ptr) {
    if (!ptr) return;
    Block *curr = (Block *)ptr - 1;
    check_corruption(curr);

    // SECURITY: Poisoning the memory with 0xAA. 
    // This makes 'Use-After-Free' bugs easy to spot in a debugger.
    memset(ptr, 0xAA, curr->size);
    curr->is_free = true;

    // PHYSICS-BASED COALESCING: Ignore the bins; look at the neighbors in RAM.

    // Forward: Just jump ahead by the current size to find the next header.
    Footer *curr_f = (Footer *)((char *)(curr + 1) + curr->size);
    Block *next_p = (Block *)((char *)curr_f + sizeof(Footer));
    if (next_p->canary == CANARYNUMBER && next_p->is_free) {
        remove_from_bin(next_p);
        curr->size += sizeof(Block) + next_p->size + sizeof(Footer);
    }

    // Backward: Look at the 8 bytes behind us—that's the previous block's Footer
    Footer *prev_f = (Footer *)((char *)curr - sizeof(Footer));
    if (prev_f->is_free && prev_f->size > 0) {
        Block *prev_p = (Block *)((char *)curr - sizeof(Footer) - prev_f->size - sizeof(Block));
        if (prev_p->canary == CANARYNUMBER) {
            remove_from_bin(prev_p);
            prev_p->size += sizeof(Block) + curr->size + sizeof(Footer);
            curr = prev_p;
        }
    }
    update_footer(curr);
    add_to_bin(curr);
}

Special case: Arena allocator:

Standalone Arena & Stack Allocator that bypasses the C standard library entirely by requesting raw memory pages directly from the Kernel via mmap.
operates as a Bump Allocator, using a single pointer that moves forward for $O(1)$ allocations, completely eliminating the need for expensive bin searches or metadata headers.
supports a reverting mechanism through "Markers," allowing you to "pop" the memory back to a previous state or "reset" the entire pool in a single CPU cycle.

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    uint8_t *buffer;   // Memory to store value
    size_t capacity;   // Total size
    size_t offset;     // where the next allocation goes
} Arena;

// A Marker is a 'Bookmark' that lets us roll back the offset to a specific spot.
typedef size_t Marker;
//creating the memory unit so that you can access it
Arena arena_create(size_t size) {
    Arena a = {0};

    // Page alignment
    size_t page_size = sysconf(_SC_PAGESIZE);
    a.capacity = (size + page_size - 1) & ~(page_size - 1);

    a.buffer = mmap(NULL, a.capacity, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (a.buffer == MAP_FAILED) {
        perror("mmap failed");
        a.buffer = NULL;
    }
    return a;
}

void *arena_alloc(Arena *a, size_t size) {
    // standard memory alignment
    size = (size + 15) & ~15;

    if (a->offset + size > a->capacity) {
        return NULL; 
    }

    // The 'Bump': Take the current offset, then move it forward by 'size'.
    void *ptr = &a->buffer[a->offset];
    a->offset += size;
    return ptr;
}

// Stack Functionality: The 'Accordion' effect
Marker arena_get_marker(Arena *a) { return a->offset; }

void arena_pop_to_marker(Arena *a, Marker m) {
    if (m <= a->offset) {
        a->offset = m;
    }
}

//  Arena Functionality: The 'Nuke' Reset
void arena_reset(Arena *a) { a->offset = 0; }

// Cleanup
void arena_destroy(Arena *a) {
    if (a->buffer) {
        munmap(a->buffer, a->capacity);
    }
}

This is still not a production-grade allocator, but it mirrors many of the core ideas used in real-world malloc implementations such as size segregation, boundary tags, and page-aligned growth.

Summary:

In conclusion, implementing my own malloc completely changed how I think about memory allocation. What seems like a simple API hides a web of trade-offs involving performance, safety, and fragmentation.

This project not only strengthened my understanding of system memory architecture, but also gave me a deeper appreciation for the engineering creativity behind the abstractions we use every day.

DEV Community

Malloc-ation: How I Built My Own malloc from Scratch

Abstract:

FYI

Code snippets

Level 1: Bump allocator