DEV Community

Cover image for Implementing malloc() and free() — memory alignment
Adam Brandizzi
Adam Brandizzi

Posted on • Originally published at suspensao.blog.br on

Implementing malloc() and free() — memory alignment

Our implementation of malloc() and free() has progressed significantly. In the last post, we saw, for example, how to unify smaller memory blocks to avoid fragmentation. To avoid even more problems in this area, we’re going to address an important performance detail that we’ve been neglecting for a while: memory alignment.

What is memory alignment?

Modern computers read fixed-size memory blocks, corresponding to the processor’s word size. A 32-bit processor, for example, reads four bytes at a time, while a 64-bit processor reads eight. However, this reading only occurs from an aligned address—that is, a 32-bit processor can only read its four bytes if the memory address is a multiple of four.

If the address is not a multiple of the word size, the processor will need to access memory twice: first to fetch the first part of the word and then for the second. This becomes even worse if the allocator returns misaligned pointers, as all subsequent nodes will also be misaligned, doubling the number of memory accesses on the heap.

Ensuring allocated blocks’ alignment

To avoid this problem, it is necessary to allocate a multiple of the processor’s word size. However, determining this size can be complicated. A simple and effective solution for our case is to adopt multiples of 8 bytes, since the most common bus currently is 64 bits. Thus, all allocated block addresses must be multiples of 8. Let’s then define a constant for the alignment size:

define _ABMALLOC_ALIGNMENT_SIZE 8
Enter fullscreen mode Exit fullscreen mode

In the abmalloc()function, we check if the requested size is aligned, that is, if it is a multiple of _ABMALLOC_ALIGNMENT_SIZE. If it is already aligned, we don’t need to do anything:

void *abmalloc(size_t size) {
  if (size == 0) {
    return NULL;
  }
  size_t rest = size % _ABMALLOC_ALIGNMENT_SIZE;
Enter fullscreen mode Exit fullscreen mode

If the requested size is not aligned, we need to increase it. First, we calculate the remainder of dividing the requested size by _ABMALLOC_ALIGNMENT_SIZE and subtract this value from the originally requested size. Then, we add _ABMALLOC_ALIGNMENT_SIZE. This ensures that the new size is the smallest possible multiple of the required alignment.

if (rest != 0) {
  size = size - rest + _ABMALLOC_ALIGNMENT_SIZE;
}
Enter fullscreen mode Exit fullscreen mode

Done! Now, abmalloc() always returns aligned addresses.

A question and a perk

A valid question is: won’t this result in wasted memory? On platforms with 32-bit buses, for example, this could lead us to allocate 7 unused bytes when we could allocate only 3 unused bytes. Here, we assume this isn’t a problem, but there’s a relatively simple solution: just change the value of the constant to 4, for example, and recompile our code.

This adjustment brings another advantage: very small blocks cease to be allocated. For example, if someone requests a block of only 1 byte, it would hardly be reused, since a single byte is rarely needed. (Ideally, such small blocks should never be allocated, but this is a user decision.) With a minimum size of 8 bytes, these smaller blocks, when released, will have a greater chance of being reused.

Conclusion

With that, we conclude our experiment. This is not a very good implementation of malloc(). For starters, it’s not thread-safe. Furthermore, modern implementations utilize much more efficient techniques and heuristics, such as binning. There are also some very interesting advancements in research, such as Mesh.

Our goal was simply to learn a little more, through practice and gradually evolving code, about how memory allocation works. Let us know in the comments if you gained a better understanding of the topic! And if you’d like to explore further, you can check out the final result in the GitHub repository.

Top comments (0)