Before starting with phase 7, I would like to share this piece of knowlegde I gained that helped me connect with my my mini malloc project better-
Today, while I was discussing this project with my mentor, I realized a very important concept.
So, imagine this is the memory block:
+--------+--------+--------+--------+--------+--------+
| Alloc | Free | Alloc | Free | Alloc | Free |
| 32B | 64B | 16B | 128B | 24B | 48B |
+--------+--------+--------+--------+--------+--------+
Total memory = 312B
Let
A = Set of allocated blocks
F = Set of free blocks
Then
A ∪ F = Total memory chunk = Universal set
and
A ∩ F = ∅ = null set
Mathematically:
n(A) = 72
n(F) = 240
=> n(A U F) = 72 + 240 = 312
=> n(A ∩ F) = 312 - (72 + 240) = 0
I found this really interesting. We have all studies set theory in math. Seeing it being applied in real life is truly fascinating.
My mentor also taught me another important concept -- How do I ensure that the code I design is reliable or not?
My malloc code is reliable if and only if
A ∪ F = Total memory chunkANDA ∩ F = ∅
And this is correct. Logically, as well as mathematically. Think about it. A memory allocator code that does not fulfill this case leads to a segmentation fault (invalid memory access). There shouldn't be any lost memory, untracked by both sets. There shouldn't be an overlap between both sets. For all cases, whether the number of allocations was 1 or infinite, this condition must hold true.
Researchers, that design code, follow a similar mechanism.
They define-
- Invariants: Conditions that must always hold
- Preconditions: Conditions that must be true before the code runs
- Postconditions: Conditions that must be true after the code runs
After that, they prove it mathematically or enforce them and then test them aggressively.
In systems programming, this is often called - reasoning with invariants.
So interesting right? :)
Let's start with Phase 7!
Issues with current freeing mechanism
Each time we want to free, we have to traverse the entire list.
Time Complexity: O(n)
Glibc maintans multiple free list based on size. These are called bins.
| Bin type | Size range | Behavior |
|---|---|---|
| fastbins | ≤80 bytes | No coalescing (super fast), LIFO |
| smallbins | 16-512 bytes | Exact size buckets, FIFO |
| largebins | >512 bytes | Range-based buckets |
Thus, when you do malloc(16) and then free it, you need to search one specific bin. Not entire LL.
Observing Issues:
To observe this issue in real life, I created a simple C program allocating small malloc blocks
int main()
{
for(int i=0; i<1000000; i++) malloc(32);
return 0;
}
First, I ran it using my allocator
The program was taking O(n2) complexity.
Which means for 1 million iterations, it had to traverse a O(n) complexity each time. The amount of work to be done is crazy.
~/anu/programming/systems_progg/mini-malloc$ time LD_PRELOAD=./malloc_v6.so ./out
^C
real 32m17.768s
user 32m16.111s
sys 0m0.710s
I checked it after 5 minutes, and decided to check it using top
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11926 anu 20 0 19328 17976 1124 R 99.7 0.1 5:00.83 out
3977 anu 20 0 5319380 336968 140652 S 9.3 2.2 5:02.18 gnome-s+
I let it run for 30 mintutes to see if it finishes, and it didnt. Observe that the virtual memory size has increased from 19MB to 43 MB. And the CPU% stayed above 90% throughout.
I decided to kill my program using Ctrl + C.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11926 anu 20 0 43168 41816 1124 R 90.9 0.3 32:06.04 out
3977 anu 20 0 5343972 340916 142616 S 18.2 2.2 5:24.09 gnome-s+
Then, I ran the same program using glibc, and the difference is massive.
anu@laptop:~/anu/programming/systems_progg/mini-malloc$ time ./out
real 0m0.075s
user 0m0.039s
sys 0m0.035s
What my allocator couldn't achive in 30 minutes, glibc's malloc achived in milliseconds.
This experiment depicts the importance of bins.
glibc uses different bins for different types of allocations.
For small allocations like this, data is stored in a fastbin. Which is based on LIFO principle. Memory can instantly be allocated and freed without having to search through the list.
Using bins, the search time for find_free_bin() should be reduced from O(n) to O(1).
How glibc implements fastbins
Glibc has a fastbin index macro, that is the core logic behind fastbins.
#define fastbin_index(sz) \
((((unsigned int)(sz)) >> (SIZE_SZ == 8 ? 4 : 3)) - 2)
So, what's happening here?
- sz is an integer that contains the size to be allocated
- it is type casted to an unsigned bit
- then, we the bitwise right shift operator is used.
Eg. 32 in binary --> 0010 0000
after bitwise right shift --> 2
4.SIZE_SZ is malloc's variable that defines architecture type.
| Architecture | SIZE_SZ | Bits shifted |
|---|---|---|
| 64-bit | 8 | 4 |
| 32-bit | 4 | 3 |
5.So based on the architecture, bits are shifted
Now, shifting 4 bits to the right is exactly the same as dividing by 16 (24), but much faster since CPU does it in one clock cycle.
Why 16?
In 64 bit Linux, every chunk must be a multiple of 16 due to alignment requirement.
And all 16 bit numbers have last 4 numbers as 0s.
So, important information is never lost.
16 >> 4 = 1
32 >> 4 = 2
48 >> 4 = 3
64 >> 4 = 4
Each time, you get clear integers.
6.And finally 2 is subtracted.
Why?
The minimum fastbin chunk size is 32, since it is the smallest meaningful allocation with metadata.
Thus, the index 0 must correspond to 32b, not 16b.
this is how indexes are stored
size | binary | >> 4 | - 2 | bin index | what's stored
------|--------------|------|-----|-----------|-------------------------
32 | 0010 0000 | 2 | 0 | bins[0] | free list of 32B chunks
48 | 0011 0000 | 3 | 1 | bins[1] | free list of 48B chunks
64 | 0100 0000 | 4 | 2 | bins[2] | free list of 64B chunks
80 | 0101 0000 | 5 | 3 | bins[3] | free list of 80B chunks
128 | 1000 0000 | 8 | 6 | bins[6] | free list of 128B chunks
the -2 shifts the array so it starts at the correct place.
So, fastbins is just this in memory:
fastbins[0] → [32B chunk] → [32B chunk] → NULL
fastbins[1] → [48B chunk] → NULL
fastbins[2] → [64B chunk] → [64B chunk] → [64B chunk] → NULL
fastbins[3] → NULL
...
What malloc(64) does:
int idx = fastbin_index(64); // (64 >> 4) - 2 = 2
if (fastbins[idx] != NULL) {
block = fastbins[idx]; // grab head instantly
fastbins[idx] = block->next; // pop it off
return block; // done, O(1)
}
// otherwise sbrk...
this is much much more faster than carding through every single node.
--
Smallbins work similar to fastbins, but they contain 63 bins, and range from size 128–1008 bytes
Large bins are more sneaky.
When you free a mid - large chunk, it goes to the unsorted bins area, and it is sorted in the correct bin by the next malloc. This lazy evaluation of heap defers sorting until it is actually needed.
Clues:
Here are your clues — no code, just the shape of the problem.
What needs to change
Three things touch the free list in your allocator. Find them all before writing a single line:
- Where free blocks are stored (currently
head/tailglobals) - Where free blocks are looked up (
find_free_block) - Where free blocks are returned (
free) Every change lives in exactly those three places. --- ##### Clue 1 — The new data structure Replacehead/tailwith an array. Each slot is an independent singly-linked list. Think about: how many slots do you need, and what size does each slot correspond to? ##### Clue 2 — The index You already understand(sz >> 4) - 2. But yourblock->sizeis the user size, not the total chunk size. Does that matter for which bin you pick? Also think about: what happens if someone callsmalloc(9)? It's not a multiple of 16. Does your index still work cleanly? ##### Clue 3 — The fallback Fastbins only cover sizes up to 128 bytes. What should happen for sizes above that? You already have working code for that case. ##### Clue 4 — LIFO vs your current list Fastbins are LIFO (last freed = first reused). That's actually simpler than what you have now — pushing and popping from the head only, notailpointer needed. Start with the data structure change andfind_free_block. Show me when you have a first attempt. 🔥
#todo
I decided to post this simply because i was delaying everything, simply because I couldn't finish everything. I will come back to fastbins soon.
⟡ Author's Note:
I'd gotten into a bad slump post sem exams, so I could barely write or code (っ- ‸ - ς)
I researched a lot of stuff regarding this topic, so I decided to simply post it, and come back to it later.
I've been recovering, and will be back to my actual pace soon!
(૭ 。•̀ ᵕ •́。 )૭
Top comments (0)