DEV Community

moonlitpath
moonlitpath

Posted on

Implementing Bins - Phase 7 Mini Malloc

Before starting with phase 7, I would like to share this piece of knowlegde I gained that helped me connect with my my mini malloc project better-

Today, while I was discussing this project with my mentor, I realized a very important concept.

So, imagine this is the memory block:

+--------+--------+--------+--------+--------+--------+
| Alloc  |  Free  | Alloc  |  Free  | Alloc  |  Free  |
|  32B   |  64B   |  16B   | 128B   |  24B   |  48B   |
+--------+--------+--------+--------+--------+--------+
Enter fullscreen mode Exit fullscreen mode

Total memory = 312B

Let
A = Set of allocated blocks

F = Set of free blocks

Then
A ∪ F = Total memory chunk = Universal set
and
A ∩ F = ∅ = null set

Mathematically:
n(A) = 72
n(F) = 240
=> n(A U F) = 72 + 240 = 312
=> n(A ∩ F) = 312 - (72 + 240) = 0

I found this really interesting. We have all studies set theory in math. Seeing it being applied in real life is truly fascinating.

My mentor also taught me another important concept -- How do I ensure that the code I design is reliable or not?

My malloc code is reliable if and only if

A ∪ F = Total memory chunk AND A ∩ F = ∅

And this is correct. Logically, as well as mathematically. Think about it. A memory allocator code that does not fulfill this case leads to a segmentation fault (invalid memory access). There shouldn't be any lost memory, untracked by both sets. There shouldn't be an overlap between both sets. For all cases, whether the number of allocations was 1 or infinite, this condition must hold true.

Researchers, that design code, follow a similar mechanism.
They define-

  • Invariants: Conditions that must always hold
  • Preconditions: Conditions that must be true before the code runs
  • Postconditions: Conditions that must be true after the code runs

After that, they prove it mathematically or enforce them and then test them aggressively.

In systems programming, this is often called - reasoning with invariants.

So interesting right? :)

─────────𓇼 ⋆.˚ 𖦹 ⋆。°✩─────────



Let's start with Phase 7!

Issues with current freeing mechanism

Each time we want to free, we have to traverse the entire list.
Time Complexity: O(n)

Glibc maintans multiple free list based on size. These are called bins.

Bin type Size range Behavior
fastbins ≤80 bytes No coalescing (super fast), LIFO
smallbins 16-512 bytes Exact size buckets, FIFO
largebins >512 bytes Range-based buckets

Thus, when you do malloc(16) and then free it, you need to search one specific bin. Not entire LL.

─────────𓇼 ⋆.˚ 𖦹 ⋆。°✩─────────

Observing Issues:

To observe this issue in real life, I created a simple C program allocating small malloc blocks

int main()
{
        for(int i=0; i<1000000; i++) malloc(32);
        return 0;
}

Enter fullscreen mode Exit fullscreen mode

First, I ran it using my allocator

The program was taking O(n2) complexity.
Which means for 1 million iterations, it had to traverse a O(n) complexity each time. The amount of work to be done is crazy.

~/anu/programming/systems_progg/mini-malloc$ time LD_PRELOAD=./malloc_v6.so ./out
^C
real    32m17.768s
user    32m16.111s
sys 0m0.710s


Enter fullscreen mode Exit fullscreen mode

I checked it after 5 minutes, and decided to check it using top


    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND  
  11926 anu       20   0   19328  17976   1124 R  99.7   0.1   5:00.83 out      
   3977 anu       20   0 5319380 336968 140652 S   9.3   2.2   5:02.18 gnome-s+ 

Enter fullscreen mode Exit fullscreen mode

I let it run for 30 mintutes to see if it finishes, and it didnt. Observe that the virtual memory size has increased from 19MB to 43 MB. And the CPU% stayed above 90% throughout.
I decided to kill my program using Ctrl + C.


    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND  
  11926 anu       20   0   43168  41816   1124 R  90.9   0.3  32:06.04 out      
   3977 anu       20   0 5343972 340916 142616 S  18.2   2.2   5:24.09 gnome-s+ 

Enter fullscreen mode Exit fullscreen mode

Then, I ran the same program using glibc, and the difference is massive.

anu@laptop:~/anu/programming/systems_progg/mini-malloc$ time ./out

real    0m0.075s
user    0m0.039s
sys 0m0.035s

Enter fullscreen mode Exit fullscreen mode

What my allocator couldn't achive in 30 minutes, glibc's malloc achived in milliseconds.

This experiment depicts the importance of bins.

glibc uses different bins for different types of allocations.
For small allocations like this, data is stored in a fastbin. Which is based on LIFO principle. Memory can instantly be allocated and freed without having to search through the list.

Using bins, the search time for find_free_bin() should be reduced from O(n) to O(1).

─────────𓇼 ⋆.˚ 𖦹 ⋆。°✩─────────

How glibc implements fastbins

Glibc has a fastbin index macro, that is the core logic behind fastbins.

#define fastbin_index(sz) \
  ((((unsigned int)(sz)) >> (SIZE_SZ == 8 ? 4 : 3)) - 2)
Enter fullscreen mode Exit fullscreen mode

So, what's happening here?

  1. sz is an integer that contains the size to be allocated
  2. it is type casted to an unsigned bit
  3. then, we the bitwise right shift operator is used.
 Eg.   32 in binary              --> 0010 0000
       after bitwise right shift --> 2
Enter fullscreen mode Exit fullscreen mode

4.SIZE_SZ is malloc's variable that defines architecture type.

Architecture SIZE_SZ Bits shifted
64-bit 8 4
32-bit 4 3

5.So based on the architecture, bits are shifted
Now, shifting 4 bits to the right is exactly the same as dividing by 16 (24), but much faster since CPU does it in one clock cycle.

Why 16?
In 64 bit Linux, every chunk must be a multiple of 16 due to alignment requirement.
And all 16 bit numbers have last 4 numbers as 0s.
So, important information is never lost.

16  >> 4 = 1
32  >> 4 = 2
48  >> 4 = 3
64  >> 4 = 4
Enter fullscreen mode Exit fullscreen mode

Each time, you get clear integers.

6.And finally 2 is subtracted.
Why?
The minimum fastbin chunk size is 32, since it is the smallest meaningful allocation with metadata.
Thus, the index 0 must correspond to 32b, not 16b.

this is how indexes are stored

size  | binary       | >> 4 | - 2 | bin index | what's stored
------|--------------|------|-----|-----------|-------------------------
 32   | 0010 0000    |  2   |  0  |  bins[0]  | free list of 32B chunks
 48   | 0011 0000    |  3   |  1  |  bins[1]  | free list of 48B chunks
 64   | 0100 0000    |  4   |  2  |  bins[2]  | free list of 64B chunks
 80   | 0101 0000    |  5   |  3  |  bins[3]  | free list of 80B chunks
128   | 1000 0000    |  8   |  6  |  bins[6]  | free list of 128B chunks

Enter fullscreen mode Exit fullscreen mode

the -2 shifts the array so it starts at the correct place.

So, fastbins is just this in memory:

fastbins[0] → [32B chunk] → [32B chunk] → NULL
fastbins[1] → [48B chunk] → NULL
fastbins[2] → [64B chunk] → [64B chunk] → [64B chunk] → NULL
fastbins[3] → NULL
...
Enter fullscreen mode Exit fullscreen mode

What malloc(64) does:

int idx = fastbin_index(64);  // (64 >> 4) - 2 = 2
if (fastbins[idx] != NULL) {
    block = fastbins[idx];       // grab head instantly
    fastbins[idx] = block->next; // pop it off
    return block;                // done, O(1)
}
// otherwise sbrk...
Enter fullscreen mode Exit fullscreen mode

this is much much more faster than carding through every single node.

--

Smallbins work similar to fastbins, but they contain 63 bins, and range from size 128–1008 bytes

Large bins are more sneaky.
When you free a mid - large chunk, it goes to the unsorted bins area, and it is sorted in the correct bin by the next malloc. This lazy evaluation of heap defers sorting until it is actually needed.


Clues:

Here are your clues — no code, just the shape of the problem.

What needs to change

Three things touch the free list in your allocator. Find them all before writing a single line:

  1. Where free blocks are stored (currently head/tail globals)
  2. Where free blocks are looked up (find_free_block)
  3. Where free blocks are returned (free) Every change lives in exactly those three places. --- ##### Clue 1 — The new data structure Replace head/tail with an array. Each slot is an independent singly-linked list. Think about: how many slots do you need, and what size does each slot correspond to? ##### Clue 2 — The index You already understand (sz >> 4) - 2. But your block->size is the user size, not the total chunk size. Does that matter for which bin you pick? Also think about: what happens if someone calls malloc(9)? It's not a multiple of 16. Does your index still work cleanly? ##### Clue 3 — The fallback Fastbins only cover sizes up to 128 bytes. What should happen for sizes above that? You already have working code for that case. ##### Clue 4 — LIFO vs your current list Fastbins are LIFO (last freed = first reused). That's actually simpler than what you have now — pushing and popping from the head only, no tail pointer needed. Start with the data structure change and find_free_block. Show me when you have a first attempt. 🔥

#todo
I decided to post this simply because i was delaying everything, simply because I couldn't finish everything. I will come back to fastbins soon.

─────────𓇼 ⋆.˚ 𖦹 ⋆。°✩─────────



⟡ Author's Note:
I'd gotten into a bad slump post sem exams, so I could barely write or code (っ- ‸ - ς)
I researched a lot of stuff regarding this topic, so I decided to simply post it, and come back to it later.
I've been recovering, and will be back to my actual pace soon!
(૭ 。•̀ ᵕ •́。 )૭

Top comments (0)