Most of us use malloc() without thinking about what happens underneath.
I decided to implement my own memory allocator in C to understand it better. This wasn’t for production use, just to learn how allocation, fragmentation, and concurrency actually behave in practice.
I also benchmarked it against glibc’s malloc to see where it stands.
Implementation Overview
My allocator currently includes:
- Thread-local cache
- Free lists (bins) for different size ranges
- Direct
mmapfor larger allocations - A custom
realloc()implementation
Benchmark Results
glibc malloc
Single-threaded:
- alloc + free (1M iterations): ~26 ms
- batch alloc/free:3.40ms/0.95ms
- mixed sizes: ~2.5 ms
Multi-threaded:
- 8 threads: ~57 ms
My Allocator
Single-threaded:
- alloc + free (1M iterations): ~83 ms
- batch alloc/free:1.46ms/0.50ms (faster than glibc)
- mixed sizes: ~126 ms
Multi-threaded:
- 8 threads: ~791 ms
What Worked
Batch allocation and free operations were faster than glibc.
This likely comes from:
- simpler logic in the fast path
- low per-operation overhead
So in very controlled scenarios, a simple allocator can outperform a general-purpose one.
Where It Struggled
Mixed Allocation Sizes
Performance dropped heavily when handling mixed sizes.
The main issue was my bin design:
- limited number of bins
- coarse grouping of sizes
This leads to:
- poor fit for requested sizes
- more fragmentation
- additional overhead during allocation
glibc avoids this with more refined size classes.
Multithreading
This was the biggest weakness.
Even with thread-local caches, I ran into issues:
- shared access to heap structures
- contention when falling back to global data
I tried:
- global locks
- per-bin locks
Both increased complexity, and debugging became harder.
realloc() Bug
The most difficult issue I faced was in realloc().
I initially made a mistake:
- allocating a new block using the old size
- instead of handling cases where the new size is smaller
This caused:
- memory corruption
- segmentation faults later in execution
The correct behavior:
- if
new_size <= old_size, shrink in place - only allocate a new block when expansion is required
Fixing this resolved the crashes.
Debugging Experience
At one point, I removed locking entirely because debugging became too difficult.
The issue turned out not to be concurrency, but incorrect logic in realloc().
Using gdb helped identify the exact failure point.
One key takeaway:
Allocator bugs often don’t crash immediately.
They corrupt memory and fail later, which makes debugging harder.
Key Takeaways
- Simple designs can perform well in specific cases, but don’t scale
- Handling mixed allocation sizes efficiently requires better size class design
- Thread-local caching helps, but doesn’t eliminate shared state
- Concurrency adds complexity, especially when debugging
- Tools like
gdbare essential for low-level debugging
Next Steps
If I continue working on this allocator, I plan to:
- improve size class handling
- introduce per-thread arenas
- reduce contention in shared structures
Final Thoughts
This project gave me a much clearer understanding of:
- how allocators manage memory
- why fragmentation and contention matter
- why production allocators are complex
It’s one thing to read about memory allocation, and another to implement it and deal with its edge cases.
If you're interested in systems programming, building a memory allocator is a worthwhile exercise.
Top comments (0)