https://www.youtube.com/watch?v=kcsVhdHKupQ
Every time you call malloc, something happens that most programmers never think about. You asked for a hundred bytes. You got a pointer. But where did those bytes come from? And what happens when you give them back?
## The Heap: Your Program's Scratch Space
Stack variables are easy. Allocate on function entry, gone on return. But malloc hands you memory from a completely different region — the heap. A giant pool where blocks get allocated and freed in any order.
Here's what most people miss: malloc doesn't just return a pointer to your data. It secretly stashes a header right before your allocation — a metadata block recording the size. That's how free knows how many bytes to reclaim. You never see it. But it's always there, eating a few extra bytes on every single allocation.
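A minimal sketch of the idea, built on top of the real `malloc`: a hypothetical `toy_malloc` prepends its own size header and hands the caller a pointer just past it. The names and the one-field header are illustrative; real allocators like glibc's ptmalloc use a different, more compact layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header an allocator might stash before each block.
   Real allocators record more (and pack it differently). */
typedef struct {
    size_t size;   /* payload size, so free() knows how much to reclaim */
} header_t;

/* Toy malloc: allocate header + payload, return a pointer past the header. */
void *toy_malloc(size_t size) {
    header_t *h = malloc(sizeof(header_t) + size);
    if (!h) return NULL;
    h->size = size;
    return h + 1;              /* caller sees only the payload */
}

/* Toy free: step back over the header to recover the metadata. */
void toy_free(void *p) {
    if (!p) return;
    header_t *h = (header_t *)p - 1;
    /* h->size tells us how many payload bytes this block holds */
    free(h);
}

/* How many bytes the caller asked for, recovered from the hidden header. */
size_t toy_usable_size(void *p) {
    return ((header_t *)p - 1)->size;
}
```

The `h + 1` / `(header_t *)p - 1` pointer arithmetic is the whole trick: the metadata lives at a fixed offset before the pointer you were given, which is why passing `free` a pointer that didn't come from `malloc` corrupts the heap.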
Where does the heap itself come from? On Linux, malloc calls one of two syscalls:
- `brk` — pushes the heap boundary forward. Used for small allocations.
- `mmap` — grabs an entirely new region of virtual memory. Used for large allocations.
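You can take the `mmap` path yourself. This sketch (Linux-specific, with hypothetical names `big_alloc`/`big_free`) asks the kernel directly for an anonymous region, the same syscall `malloc` falls back on for large requests:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Grab a fresh region straight from the kernel -- the route malloc
   takes for large allocations on Linux. */
void *big_alloc(size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Unlike free(), munmap needs the size back -- mmap keeps no header. */
void big_free(void *p, size_t size) {
    munmap(p, size);
}
```

Note that `munmap` needs the size passed back in, which is one reason `malloc` keeps that hidden header: `free` takes only a pointer.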
But here's the key insight: malloc doesn't call the OS every time. That would be painfully slow. Instead, it requests a large chunk upfront and carves it into smaller pieces. Malloc buys wholesale from the OS and sells retail to your program.
## Free Lists: Where Freed Memory Goes
When you call free, your memory doesn't go back to the operating system. It goes onto a free list — a linked list of available blocks. Next time you call malloc, it walks the list looking for something that fits.
The search strategy matters:
| Strategy | How It Works | Tradeoff |
|---|---|---|
| First fit | Grab the first block big enough | Fast, but wastes large blocks on small requests |
| Best fit | Find the smallest block that works | Less waste, but scans the entire list |
When malloc finds an oversized block, it splits it — takes what it needs, puts the leftover back as a new smaller block.
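A first-fit search with splitting can be sketched in a few dozen lines. This toy allocator (all names hypothetical) manages a fixed buffer: it walks the free list, takes the first block that fits, and if the block is oversized, carves the request off the front and returns the remainder to the list. No alignment handling, no thread safety, no coalescing — a sketch, not an implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy first-fit allocator with block splitting over a fixed buffer. */
typedef struct block {
    size_t size;          /* payload bytes available in this block */
    struct block *next;   /* next block on the free list */
} block_t;

static unsigned char heap[4096];
static block_t *free_list = NULL;

void heap_init(void) {
    free_list = (block_t *)heap;          /* one big free block to start */
    free_list->size = sizeof(heap) - sizeof(block_t);
    free_list->next = NULL;
}

void *ff_alloc(size_t size) {
    block_t **prev = &free_list;
    for (block_t *b = free_list; b; prev = &b->next, b = b->next) {
        if (b->size < size) continue;     /* first fit: keep walking */
        if (b->size >= size + sizeof(block_t) + 8) {
            /* split: take what we need, put the leftover back */
            block_t *rest = (block_t *)((unsigned char *)(b + 1) + size);
            rest->size = b->size - size - sizeof(block_t);
            rest->next = b->next;
            b->size = size;
            *prev = rest;
        } else {
            *prev = b->next;              /* close fit: hand over the block */
        }
        return b + 1;                     /* payload starts past the header */
    }
    return NULL;                          /* no block fits */
}
```

Swapping the `continue` for a "remember the smallest block that fits" pass turns first fit into best fit — same list, different walk.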
The reverse happens on free. If the block next door is also free, malloc coalesces them — merges them into one larger block. Without coalescing, your memory shatters into thousands of tiny unusable fragments.
Split on allocate. Merge on free.
This is the heartbeat of every memory allocator.
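The merge half of that heartbeat can be sketched independently. Here free blocks are modeled as `(offset, length)` ranges kept sorted by address (real allocators reach neighbors via boundary tags or footers instead); freeing a range merges it with any neighbor it touches. All names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* A free block, modeled as a range of the heap. */
typedef struct { size_t off, len; } range_t;

/* Insert (off, len) into ranges[] (sorted by off, n entries), coalescing
   with adjacent free neighbors. Returns the new entry count. */
size_t free_and_coalesce(range_t *ranges, size_t n, size_t off, size_t len) {
    size_t i = 0;
    while (i < n && ranges[i].off < off) i++;
    /* merge with the block just before us if it ends where we begin */
    if (i > 0 && ranges[i - 1].off + ranges[i - 1].len == off) {
        ranges[i - 1].len += len;
        /* the grown block may now touch the block after us too */
        if (i < n && ranges[i - 1].off + ranges[i - 1].len == ranges[i].off) {
            ranges[i - 1].len += ranges[i].len;
            for (size_t j = i; j + 1 < n; j++) ranges[j] = ranges[j + 1];
            n--;
        }
        return n;
    }
    /* merge with the block just after us if we end where it begins */
    if (i < n && off + len == ranges[i].off) {
        ranges[i].off = off;
        ranges[i].len += len;
        return n;
    }
    /* no neighbor touches us: insert as a new free block */
    for (size_t j = n; j > i; j--) ranges[j] = ranges[j - 1];
    ranges[i].off = off;
    ranges[i].len = len;
    return n + 1;
}
```

Freeing the gap between two free neighbors collapses all three into one block — exactly the case coalescing exists for.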
## Fragmentation: The Silent Killer
Even with splitting and merging, something terrible happens over time. After thousands of allocations and frees, your heap looks like Swiss cheese. Free blocks scattered everywhere.
This is external fragmentation. You might have two megabytes free total, but spread across a hundred tiny pieces. Need one contiguous megabyte? No single block is big enough. Two megs free, zero megs usable.
There's also internal fragmentation. You ask for 17 bytes, the allocator rounds up to 32 for alignment. Those 15 bytes? Wasted. Every allocation wastes a little. Across millions of allocations, it adds up.
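The rounding is simple bit arithmetic. This sketch assumes a 16-byte granularity — a common choice, but the actual step size is allocator-specific:

```c
#include <assert.h>
#include <stddef.h>

/* Round a request up to the next 16-byte boundary (granularity assumed). */
size_t round_up_16(size_t size) {
    return (size + 15) & ~(size_t)15;
}

/* Bytes lost to internal fragmentation for one request. */
size_t internal_waste(size_t size) {
    return round_up_16(size) - size;
}
```

With this rounding, a 17-byte request becomes a 32-byte block and 15 bytes go unused — the example from above, made concrete.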
This is why long-running programs — servers, databases, game engines — watch their memory usage slowly climb even when they're freeing everything correctly. The memory is technically free. It's just in the wrong shape.
## Memory Arenas: Scaling to 32 Cores
The original malloc had one free list protected by one lock. Every thread that wanted memory waited in line. On a 32-core server, 31 threads sit idle while one allocates. This is lock contention, and it's a performance cliff.
Modern allocators solved this with arenas — multiple independent heap regions, each with its own free list and lock:
- ptmalloc (glibc) — assigns each thread to its own arena (the arena count is capped at a multiple of the core count, so threads may share under pressure)
- jemalloc — adds size classes (separate free lists for 16B, 32B, 64B, 128B blocks). Ask for 20 bytes, go straight to the 32-byte list. No searching. Constant time.
- tcmalloc (Google) — adds thread-local caches. The most common sizes are cached per-thread with zero locks. Only when the cache runs empty does it touch the shared arena.
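The size-class lookup is why there's no searching. A sketch with illustrative classes (these exact numbers are not any real allocator's table): the request maps straight to one class, and the allocator goes to that class's free list:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative size classes, jemalloc/tcmalloc-style. */
static const size_t classes[] = { 16, 32, 64, 128, 256 };
#define NCLASSES (sizeof(classes) / sizeof(classes[0]))

/* Return the size class that serves `size`, or 0 for a "large" request
   that bypasses the class lists and goes straight to mmap. */
size_t size_class(size_t size) {
    for (size_t i = 0; i < NCLASSES; i++)
        if (size <= classes[i]) return classes[i];
    return 0;
}
```

Ask for 20 bytes, get the 32-byte class. Real allocators replace even this small loop with a table indexed by `size`, making the lookup branch-free and constant time.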
This is why modern programs can do millions of allocations per second. Not because malloc is simple — because decades of engineering made it brutally fast.
## Why You Should Care
Firefox switched from the system allocator to jemalloc and cut memory usage by 25%. Not by changing application code. Just by changing how memory blocks are managed.
Game engines pre-allocate everything at startup and use custom allocators during gameplay. One stray malloc in a render loop can cause a frame drop — at 60 FPS, each frame gets 16 milliseconds. A single lock contention can eat five of those.
And in languages like Python, Java, and Go? Malloc is still there, hidden under the garbage collector. Every object creation, every string concatenation, every list append is a malloc call underneath. The GC decides when to free. But malloc decides where things go.
Every variable you've ever created passed through something like this. Now you know what happens when it does.
Watch the full animated breakdown: malloc: Secret Memory Dealer in Your Code
Neural Download — visual mental models for the systems you use but don't fully understand.