naens

Posted on

# Generalized Fibonacci Memory Allocator

Description of the work on the memory allocator https://github.com/naens/mem_alloc

\pagebreak

# Introduction

Hello. In this post I will describe a memory allocator that I have
written. the main purpose for the memory allocator was to learn how it
works, and also to have a way to allocate and to deallocate memory for
assembly programs that I write for CCP/M-86 in assembly language. The
goal was to write it in assembly, but in order to be sure that I can
have something that works, I decided to make a test version in C in
order to be sure that I understand how it works.

# Memory Allocation

Now I will describe why I need a memory allocator. It is needed to be
able to get some part of memory in order to store a dynamic data
structure, like a string for example. If the user has the possibility
to input a string, we cannot know in advance its size, so we get a
large buffer. But once we know it, it would be better to store only
what is needed. So, if, for example we have many strings of different
lengths, it would be better to allocate the memory depending on the
length of the string.

There are, of course many other examples why a memory allocator might
be needed. Many data structures, like trees and lists are often
implemented by using dynamic memory allocation.

Other use is to be able to store temporary, intermediate data. For
example, we read a file and generate a list of strings from it. Then
we can filter or concatenate them and finally insert the results into
a hash table or write them into a file. During all these operations,
we might need to generate some intermediate data structures. So, in
order to be able to reuse the memory that had been used by these
structures again, it is useful to free memory if it's not used
anymore.

This is exactly what memory allocators do: you can allocate an amount
of memory space you need, and once you don't need it anymore, you can
return it back to the allocator, so it can be used for other
allocations.

# Generalized Fibonacci Allocation

decided that the allocator which uses the generalized Fibonacci
sequence would be alright for my purpose: it does not waste a lot of
memory (in average approximately 82.2% of the memory is used) and does
not look too complicated to code (just some additions and subtractions

So what is a generalized Fibonacci sequence? The usual Fibonacci sequence, which is very common in books about programming languages is the sequence described by the formula an = an-1 + an-2. So, a generalized Fibonacci sequence is when the last item has a larger number than 2. In my allocator I use the sequence generated by an = an-1 + an-4. The main advantage is that the difference between the number is not so big, which makes it use the memory space more efficiently. Another common memory allocator uses the formula an = 2 * an-1, which in some way is also a generalized Fibonacci sequence.

# Implementation Principles

## Blocks

My allocator specifies the size of the items it allocates in blocks of
8 bytes. The main reason is to be able to store three additional flag
bits in the size field in the item. It is important for space
efficiency: otherwise I would lose some bytes for each allocation and
I also wanted to have the returned area aligned for the needs of the
OS, we cannot know: if the user decides to use it in order to store a
structure or a multi-byte variable there, it would better be aligned.
So if the OS returns aligned memory, my allocator will also return
aligned memory. Otherwise it does not matter.

## Array of Free Lists

The memory allocator uses an array of free lists. A free list is a
linked list of items of the same size, which is a number from the
generalized Fibonacci sequence multiplied by the size of the block.
It is not the area usable by the user: it also includes the header,
which contains the total size of the item.

In the array the items are stored in the increasing order. The
structures contained in the array are called cells in my
implementation. They contain a field which specifies the size of the
corresponding free list, and a field which is a pointer to the head of
the free list.

## Buddies

This is a buddy memory allocator, which is handy with a generalized
Fibonacci sequence: we can create a larger item by simply combining it
with another item of a smaller size.

So this is how the buddy system works: we first allocate some memory
from the Operating System and put it into the array in an appropriate
free list. When the user wants some memory, we can split the memory
available into buddies, where each buddy is of the size a number of
the sequence multiplied bu the number of bytes in the block. After
splitting, we can insert the buddy which is not used anymore into its
free list, since it also has the appropriate size.

When freeing, it's the opposite that happens: we want to return the
buddy to the free list, but if it's buddy is not in use, we can first
merge it with its buddy and insert the merged block into the
corresponding free list. The generalized Fibonacci sequence
guarantees us that whenever we split or merge items, we have something
that has a corresponding free list. In order to know how to merge and
how to find the buddy, we use flags, which will be explained later.

## The Dynamic Array

The array that contains the free lists has to be able to grow if more
items need to be allocated. That's why I use a dynamic array. Its
capacity is increased when more items need to be added. In this
situation another area big enough to hold the array is allocated and a
new array is created in this area, which is a copy of the old array,
but with greater capacity. The old array is freed. The good thing
about the array is that it only use the allocator: it does not need
special Operating System calls.

# Implementation Details

In this section I will describe in more detail how everything works.

## Data Structures

### item

The item is a structure which is the node of the free list, a doubly
linked list. The item has two modes of existence: free and used.
When it's free, it is in the free list and contains pointers to the
previous and the next items. When it's used it contains the area

But whatever the mode is, there is always a header. It contains one
single field of the size of the pointer. This field contains the size
of the item in blocks and 3 flags: `lr_bit`, `inh_bit`, and `in_use`
bit. The `in_use` bit tells us whether the item is used or not. The
`ls_bit` and the `inh_bit` are needed in order to know how to merge
buddies: the buddy can be the left buddy or the right buddy, so we
might need to go to the right or to the left in order to find the
budd.

The item is not a C structure. It's an area of type `*void`, for
which accessor functions are used. So here are some examples of
accessor functions:

``````uintptr_t
item_get_size(void *item)
{
uintptr_t size_field = ((uintptr_t*)item)[0];
return size_field >> 3;
}

void
item_set_size(void *item, uintptr_t size)
{
uintptr_t size_field = ((uintptr_t*)item)[0];
uint8_t flags = size_field & 7;
uintptr_t new_size_field = flags | (size << 3);
((uintptr_t*)item)[0] = new_size_field;
}
``````

Here are functions for getting and setting the size of the item. As
you can see, it's located at the very beginning of the item, at index
0, and it occupies the `sizeof(uintptr_t) - 3` high-order bits of the
field.

Here's another example:

``````boolean
item_get_inh_bit(void *item)
{
return (((uintptr_t*)item)[0] & 1) != 0;
}

void
item_set_inh_bit(void *item, boolean in_use)
{
uintptr_t size_field = ((uintptr_t*)item)[0] & (~(uintptr_t)1);
uint8_t tmp = in_use ? 1 : 0;
uintptr_t new_size_field = size_field | tmp;
((uintptr_t*)item)[0] = new_size_field;
}
``````

Now we need to set one single bit. It's the smallest bit in the
field, so we use 1 here.

And another important thing, is the area. The area is the part of the
item which is returned to the user. This is how we do it:

``````void*
item_get_area(void *item)
{
return &((void**)item)[1];
}
``````

When the user returns the area to us, using the `mem_free` function,
we need to get the address of the item:

``````void*
item_from_area(void *area)
{
return &((void**)area)[-1];
}
``````

We assume that the user didn't cheat and didn't write beyond the area,
which means that we trust him, and our data is still the way how we
left it. This is the notion of cooperation, which comes very often in
the field of programming.

### The cell structure

Now let's talk about the next structure, which contains items, namely,
the cell. Here is its definition:

``````struct cell {
uintptr_t size;
void *items;
};
``````

It contains the size and the pointer to the first item of the free
list. As you can see, there is a size field in the cell structure, as
well as a size in the header of the items. It must match whenever the
items are in the free list. This duplication is needed because the
list might be empty, which means, we need a way to know the size.
Also, when items are not in the free list, when they are in use, we
need to know where to return them.

### The array

``````struct array {
struct cell *data;
unsigned int size;
unsigned int capacity;
};
``````

Actually, the array is also a structure, and the array is one of its
field: `data`. Here also nothing is complicated. As I already
explained, there is a capacity, which tells how many elements can be
stored in the array, and the `size`, which tells its current size.

### memlist

When we get some chunks of memory from the Operating System, we
organize them into a linked list in order to be able to free them when
needed. So, every time we use the first size-of-pointer bytes of the
memory area we receive to store the address of the next chunk. This
way, if the Operating System requires us to free all the allocated
memory before exiting the application, we free it by using this linked
list.

Here are the lines of code that do it:

``````while (mem_list != NULL)
{
void *tmp = mem_list;
mem_list = *((void**)mem_list);
free(tmp);
}
``````

## Allocation from the Operating System

We need a source for the memory in order to give it to the user. For
this we use the memory allocator of the Operating System. But we
don't know how good it is, which means, we should not rely on it too
much. We should only use it when we need some memory, and only for
large chunks of memory.

For this reason I decided to impose the following rules:

• There is a minimum amount that we should ask from the OS, which is 64 times the size of the pointer.
• When we ask, we ask for a larger amount than the previous time.
• We do not return memory to the Operating System until the very end, when we free everything at the end of the application.

This way we do not assume that the memory allocator of the Operating
System is good or efficient. It's our job to make the memory
allocation work. The memory allocator of the Operating system can be
extremely simple. It can even not be an allocator at all, but just a
pointer in a large amount of memory, since we don't need any complex
functionality from it. It can very well be the Transient Program Area
(TPA) of CP/M-80.

## The first Items in the Array

The first four items in the array cannot be split because there is
nothing smaller. This means, in case we need to allocate a lot of
small items, it's better to have these unsplittable items as small as
possible in order to no waste space.

So let's calculate the minimal size, which will be the size of items
of the first free list in the array.

For 64-bit we have blocks of 8 bytes and pointers also of 8 bytes. An
item has to contain a header, a next pointer and a previous pointer.
Together it makes 24 bytes, which can be stored in 3 blocks. Thus
the first size will be 3.

For 32 bits blocks are of 8 bytes (it doesn't change) and pointers are
of 4 bytes. Three pointer-sized fields need 12 bytes, which we can
put in 2 blocks (16 bytes). That is, the smallest size on a 32-bit
Operating System will be 2.

For a 16-bit Operating System it's similar. Pointers are 2 bytes and
blocks are 8 bytes. I use 2 bytes for pointers because I don't want
to make things complicated by using segments. So we can put 6 bytes
(3 fields) into a single block. Which makes the smallest size for
items 1.

## The Buddy System

When we allocate memory, we get a chunk of memory that me might have
to split into buddies, and only one of the buddies will be returned to
the user.

When we free memory, we insert an item into the array and it's
possible that we might have to merge it recursively with buddies if
they are not in use.

So the main problem is to find the buddy of an item. For this we use
two flags: the `lr_bit`, and the `inh_bit`. The `lr_bit` tells us if
the buddy is a left buddy or a right buddy. The `inh_bit` is used to
restore the `lr_bit` and the `inh_bit` of the parent buddy, so that if
we merge, we know if it's a left buddy or the right buddy.

When splitting an item, we set the `inh_bit` of the left child to the
`lr_bit` of the parent and the `inh_bit` of the right child to the
`inh_bit` of the parent. This allows us not to lose the information
when splitting: when we merge we just get this information back from
where we stored it.

Here is an example:

``````*--+--[/|/]
|
+--[L|95]--+--[L|69]--+--[L|50]
|          |
|          +--[R|19]--+--[R|14]
|                     |
|                     +--[L|5]
|
|
+--[L|26]
``````

In this picture we see some examples that illustrate how the
inheritance and the `lr_bit` is restored from children. The
inheritance bit is shown by the letter L or R.

Let's look at the node 69. It's a right child with left inheritance
bit. When it's split, the right property goes to the left child as
the inheritance bit and the left property goes to the right child as
inheritance bit. When the children are merged back, both the
`lr_bit` and the `inh_bit` can be restored.

The same thing happens when we split the node with size 19. It's a
left child and this property is kept by the left child as its
left-right bit. The node 14 is the right child, and it's keeping the
inheritance bit.

The root node on the picture actually does not exist. It's there in
order to show the connection to the fake right node, which does not
contain any area. It's `in_use` bit is set to true in order to stop
the recursion when the children of its left buddy are merging.

## Allocating and Freeing

### Allocation

Now this is how allocation works. The first step is to determine the
number of blocks needed. We are given bytes and we need the number of
blocks. Also we shouldn't forget about the header, the size of which
is also included in the size of the item.

Once we have the minimal number of blocks, we look for a suitable item
in the array. Perhaps we find it, perhaps we don't. If we don't find
it, we need to allocate more memory from the Operating System.

Then, after we have our item, which can come either from the array or
from the OS, we need to split it as much as possible in order to take
as little memory as possible for our needs.

And the last thing is to set the `in_use` flag and to calculate the
address of the area (which is actually the address of the item plus
the size of the header). It's important that the user does not access
the header! So we return the area.

### Freeing

Freeing is more or less the opposite of the allocation: we calculate
the address of the item, set the `in_use` bit to false, insert the
item into the array and merge the item with free buddies as much as
possible. It's important to guarantee that whenever we give an item
to the user, we have a place in the array where to put it when it's
freed.

## Array Initialization

The array is a dynamic array: when it reaches its maximal capacity, it
is extended by allocating a bigger array and copying the old data into
the new array, after which the area occupied by the old array is
freed.

One of the important things about the array is that our allocator is
used to extend the array. For this reason we have to be sure that the
array contains a free list which is able to "accept" our array when we
need to allocate a new one. After extending the array, we also
initialize the cells for the new array in order to be able to insert
it into its free list, but it's not really a problem, since if the old
array could contain enough bytes for the old array, a twice bigger
array is more likely to have a cell with size big enough because the
size of the array grows linearly, whereas the sizes of free list have
a Fibonacci-like growth, which is exponential.

Now I will show how I calculated the defines for the initialization
of the array. It's the same for 64 bits, 32 bits and 16 bits, so I'll
only show the 64 bits.

index flsz capacity array-bytes area-bytes store-itself
0 3 1 16 16 true
1 4 2 32 24 false
2 5 4 64 32 false
3 7 4 64 48 false
4 10 8 128 72 false
5 14 8 128 104 false
6 19 8 128 144 true
7 26 8 128 200 true
8 36 16 256 280 true
9 50 16 256 392 true
10 69 16 256 544 true

As I have already described, the first size in the array will be 3.
The column name flsz stands for free list item size in blocks.

The capacity column says how many cells the array contains. And the
array-bytes column is the same thing in bytes.

The area-bytes column is how many bytes a free list at the given
index can contain. And when this value is greater or equal to the
area-bytes value, the column store-itself indicates true.
Otherwise it's false.

So, after the index 6, the array consistently can store itself. And
we can be sure of that because the growth of the size of the free
lists is greater than the growth of the capacity.

But I have decided to avoid allocating small amounts of memory space.
For 64 bits it's 512, which corresponds to the row with index 10.
As we see from this table, allocating 544 bytes for the initial array
is completely possible and that's what I do. Here is the code form
the file `mem.c`:

``````/* 64-bit OS */
#if defined(__x86_64__)
#define MIN_SIZE 3
#define SIZE_1 4
#define SIZE_2 5
#define SIZE_3 7
#define DATA_INIT_BLOCKS 69
#define ARRAY_INIT_SIZE 11
#define ARRAY_INIT_CAPACITY 16
``````

Just for info, this is what I do for 32-bit and 16-bit systems:

``````/* 32-bit OS */
#elif defined(__386__) || defined(__i386__) || defined(__DJGPP__)
#define MIN_SIZE 2
#define SIZE_1 3
#define SIZE_2 4
#define SIZE_3 5
#define DATA_INIT_BLOCKS 36
#define ARRAY_INIT_SIZE 10
#define ARRAY_INIT_CAPACITY 16

/* 16-bit OS */
#elif defined(__I86__) || defined(__86__)
#define MIN_SIZE 1
#define SIZE_1 2
#define SIZE_2 3
#define SIZE_3 4
#define DATA_INIT_BLOCKS 19
#define ARRAY_INIT_SIZE 9
#define ARRAY_INIT_CAPACITY 16

#else
#error Unsupported Operating System, sorry.
#endif
``````

# Portability and Testing

In order not to be completely dependent on only one Operating System,
and in order to know that the fact that I the defines in `mem.c`
actually serve their purpose, and because I intend to rewrite this
program in assembly for a 16-bit OS, I decided to compile and to test
this allocator on different Operating Systems.

Luckily Linux can compile both 64-bit and 32-bit binaries and run them
through Valgrind. This is very important because it allows me to be
more or less sure that at least 64-bit and 32-bit versions work
correctly.

I have also added some generated content to the allocated areas with a
checksum in order to be sure that it does not get corrupted. This
way, even if I can not use Valgrind on a 16-bit OS, the fact that it
does not report an error is already a good sign.

## OS tested

So these are the Operating Systems on which I have tested my code:

• 64-bit Linux with GCC
• 32-bit Linux with GCC
• 32-bit Linux with OpenWatcom
• 32-bit Hurd with GCC
• 32-bit ArcaOS with OpenWatcom
• 32-bit DOS with OpenWatcom
• 16-bit DOS with OpenWatcom

## Makefiles

I have used two compilers in order to compile this project: GCC and
OpenWatcom. The OpenWatcom compiler uses an archaic version of C,
where variable declarations have to be first in the block, before any
code. So in order to compile, I had to modify a lot of things in a
lot of places.

In the end I have decided to have two makefiles, for both compilers,
because they are very different and in order to preserve some
consistency in each file. The first file is GNUmakefile. It's for
OpenWatcom does not have a makefile name for itself, so I had to use
"makefile".

# Conclusion and Goodbye

So, this was my little project where I tried to learn a little bit
about memory allocation and to implement a generalized Fibonacci
memory allocator. Unfortunately it's not enough to be considered a
real memory allocator. A memory allocator has to be able to work with
multiple threads, which was not the main purpose of this project. For
this reason I could not compare it to real memory allocator nor find a
test suite which would tell how bad or how good it is. But for my
purposes, namely, to become more familiar with memory allocations, and
to be ready to write a memory allocator for a 16-bit single-threaded
environment it's exactly what is needed.

See you next time, bye.