Lautaro Jayat

Posted on Jul 5, 2023 • Originally published at lautarojayat.github.io

Implementing a hash table with separate chaining for collisions resolution

#algorithms #programming #tutorial #beginners

Table of contents

What is a hash table?
A collision resolution strategy
Implementing the hash table
Linked list related operations
Hash table related operations

What is a hash table?

A hash table can be described as a collection of key-value pairs where:

Each key appears at most once.
Each key maps to only one value.
A function is used to consistently map the potentially infinite space of possible keys to finite domain of "hash codes" that we can handle.

In respect to the third bullet point mentioned above, the underlying problem the hash function solves is as follows:

If we have a limited number of keys, and time is not a problem, we could store key-value pairs in a linked list or an array. Then, if we want to look for a value, we just traverse the whole collection, comparing our input key with each key we have stored.

Although the approach is simple, it implies that the number of lookups needed to find a value might grow as the number of stored keys also grows. To avoid this, a function is used to convert a key into something that gives us a hint about where the key is located in memory, without needing to traverse the entire collection.

The functions that make this work are called "hashing functions."

A useful hashing function should be fast to compute without wasting too many resources. If it's not, each lookup may have a consistent time but could be slower than the entire traversing and comparing approach mentioned above.

It should also minimize the probability of outputting the same value for two different inputs, which is commonly called a "collision."

As we just mentioned, we are talking about "minimizing" collisions rather than trying to avoid them completely. This is because we are mapping an infinite set of possible keys to a finite set of possible outputs. Collisions will be inevitable at some point.

A collision resolution strategy

Collisions can be minimized, but they cannot be avoided. This means we need a way to handle these collisions.

For this article, we have chosen to start with what is called "separate chaining", which consists on using linked lists to store all key-value pairs where different key maps to the same output after being passed to our hash function. This way, every time we want to store a key-value pair, we first compute the hash, then we look for the head node corresponding to that hash and traverse the list, comparing each stored key with the input key.

The approach still implies a linear time complexity for the step where we traverse the entire linked list in search of the provided key, but now the universe of possible keys is reduced to a fraction of the original.

Furthermore, in the design, we can decide how many elements can be stored in a linked list or the limit for the ratio between available linked lists and overall elements stored, and resize the entire data structure accordingly.

Implementing the hash table

To present this data structure, we will first introduce the interfaces we are going to use. Then we will move on to the linked list-related operations. Each of them will have comments to guide the reader through the code, along with some quick tests to understand how they behave.

Finally, we will proceed to the hash table-related functions, providing implementation details and additional tests. The intention is to progress from bottom to top, from the lower level to the integration level.

Defining interfaces

Our hash table will consist of a struct containing:

the number of elements already stored
an array of pointers that leads to the head of a linked list
a number representing the size of the array above

typedef struct {
    Node** collection;
    unsigned int capacity;
    unsigned int storedElements;
} HashTable;

For the collection, we are using a pointer-to-pointer semantic just to simplify the process of changing the head of the linked list.

Then, for the nodes, we will use structs containing the following:

a pointer to where we will store the key
a pointer to where we will store the value
a pointer to the next element in the list

typedef struct Node_T {
    char key[MAX_KEY_LEN];
    char value[MAX_VALUE_LEN];
    struct Node_T* next;
} Node;

Apart from that, we will define certain numbers to constrain the behavior of the linked list:

#define MAX_KEY_LEN 256
#define MAX_VALUE_LEN 256
#define INITIAL_CAPACITY 10
#define GROWTH_FACTOR 2

For the API of our hash table, we will support the following operations:

creating a hash table
storing a key-value pair
removing a key-value pair
looking up a value for a given key.

The function signatures for these operations will be as follows:

HashTable* CreateHashTable(unsigned int capacity);
bool Store(HashTable** hashTable, char* key, char* value);
char* Get(HashTable* hashTable, char* key);
bool Remove(HashTable* hashTable, char* key);

For convenience, we will create a separate set of functions that handle linked list operations such as creating a node, removing a node, getting the value of a node, and clearing a whole list.
The function signatures for these operations will be as follows:

Node* CreateNode(char* key, char* value);
bool RemoveNode(Node** head, char* key);
unsigned int ClearList(Node** headNode);
char* GetNodeValue(Node* head, char* key);

Lastly, there are a couple of additional "internal" functions that will be needed for determining when to resize the hash table, computing the hash, and performing the actual resizing:

unsigned int _computeHash(char* key, unsigned int capacity);
bool _needsToResize(HashTable* hashTable);
HashTable* _resize(HashTable* hashTable);

Linked list related operations

We won't be providing a comprehensive explanation of linked lists here, as we have covered this topic in a previous chapter.
If you have never heard about this data structure, please follow this link.

Here, we will only cover the necessary operations to integrate linked lists within our hash table.

Creating a node

When creating a node, we want to check that both the key and the value are valid pointers to a string, and also that their lengths are within the bounds we want to handle.
Then, we will allocate memory for the struct and copy the key and value to the corresponding fields in the struct.
At this point, we don't care if the keys already exist in a node of our linked list. That will be handled at a higher scope.

// We accept the pointers to the strings and return a pointer to a node
Node* CreateNode(char* key, char* value) {
    // are the pointers valid?
    if (key == NULL || value == NULL) {
        printf("error: key and value must be string\n");
        return NULL;
    }

    // are the lengths of the strings valid?
    if (strlen(key) > MAX_KEY_LEN || strlen(value) > MAX_VALUE_LEN) {
        printf("error: key and value should not exceed max lengths\n");
        return NULL;
    }

    // At this point we can start asking for memory
    Node* newNode = malloc(sizeof(Node));
    if (newNode == NULL) {
        printf("error: could not allocate memory for new node\n");
        return NULL;
    }
    // by default we set the next pointer to null
    // the calling function will take care of setting the correct value
    newNode->next = NULL;

    // then we copy the key and the value to their buffers in the struct
    strcpy(newNode->key, key);
    strcpy(newNode->value, value);

    // And we return the pointer to the new node
    return newNode;
};

Notice that we are using functions declared in string.h for string manipulation. Here we are using strlen to get the length of a string. It accepts a pointer to a string and returns a number representing the length. Then we are using strcpy to copy a string into a buffer.

Testing node creation

To test the function described above we will do the following:

void _testNewNode() {
    // we will use this as a temp variable to store a string
    char* s;
    // this will be used as a good input
    char* goodString = "hey how are you?";
    // here we will create a string that exceeds the limits for
    // our string length
    int i;
    char badString[MAX_KEY_LEN + 3000];
    for (i = 0; i < MAX_KEY_LEN + 20; i++) {
        badString[i] = *"!";
    }

    // We create a new node with correct inputs
    // and check everything was stored correctly
    Node* newNode = CreateNode(goodString, goodString);
    s = newNode->key;
    assert(*s == *goodString);
    s = newNode->value;
    assert(*s == *goodString);
    free(newNode);
    printf("Testing with good string: PASS\n");

    // Now we provide an invalid poiner and expect
    // the function to return NULL
    Node* newNode2 = CreateNode(NULL, goodString);
    assert(newNode2 == NULL);
    free(newNode2);
    printf("Testing with null string pointer: PASS\n");

    // Now we provide a long string and expect a NULL pointer
    Node* newNode3 = CreateNode((char*)badString, goodString);
    assert(newNode3 == NULL);
    free(newNode3);
    printf("Testing with bad string: PASS\n");
}

Clearing the list

To clear the list, we will take a pointer to a head node and then iterate over all its members. Each node will be freed, and then the head will be set to NULL.
To provide feedback, we will return the number of nodes that were removed.

// we receive the pointer of the head node
// and return an integer with the number of nodes that were deleted.
unsigned int ClearList(Node** headNode) {
    // we access the head node
    Node* head = *headNode;
    // and set a temporary node that will help us in the iteration
    Node* tmp;
    // we create a counter
    unsigned int deletedNodes = 0;
    // and start a loop that will end when the current node is NULL
    while (head != NULL) {
        // we store the current node ina  tmp variable
        tmp = head
        // we assign the next node as the current node
        head = tmp->next;;
        // free the current node
        free(head);
        // and update the counter
        deletedNodes++;
    }

    // finaly we update where the caller pointer is pointing to
    *headNode = NULL;
    printf("deleted %d nodes, linked list is now empty\n", deletedNodes);
    // and we return the counter
    return deletedNodes;
};

Testing list deletion

A simple test of the happy path can be done by creating a list with an arbitrary number of nodes and then passing the list into our function.
After that, we can assert if the returned value corresponds to the number of nodes we created.

void _testClearList() {
    // we create a valid stirng for input and output
    char goodString[] = "to delete";
    // we set the number of values we want our list to have
    unsigned int expectedNodes = 10;
    // start a counter
    unsigned int i;
    // we create the first node
    Node* head = CreateNode(goodString, goodString);
    // and we add loop to generate the rest of the nodes
    for (i = 1; i < expectedNodes; i++) {
        // we store the current head
        Node* tmp = head;
        // assign which should be the new head
        head = CreateNode(goodString, goodString);
        // and update the next node with the old head
        head->next = tmp;
    };
    // we clear the list and store the result
    unsigned int clearedNodes = ClearList(&head);
    // the result should be equal to the nodes we created
    assert(clearedNodes == expectedNodes);
}

Removing a node from the list

To remove a node from the list, we need to access the list using the head node. Then, we traverse the list until we find the key we are looking for.
When we find the key, we free that node and make the previous node point to the node that comes after the deleted one.
If the target node is the head node, we simply delete the head and assign the next node as the new head.
In our implementation, we will return a boolean value to indicate whether the process succeeded or not.

// we accept a pointer to a pointer to the head node
// and a pointer that holds a reference to the key we
// are loking for
bool RemoveNode(Node** headP, char* key) {
    // first we check if the pointers are valid
    if (headP == NULL || *headP == NULL) {
        printf("could not remove node, the list is empty\n");
        return false;
    }
    // then we dereference to access the actual head
    Node* currentNode = *headP;

    // we check if the head node is the one we are looking for
    if (strcmp(currentNode->key, key) == 0) {
        // in case it is, we assign the next node as the new head
        *headP = currentNode->next;
        // we free the one we want to remove
        free(currentNode);
        // and return a success
        return true;
    }
    // at this point we know we need to traverse the list
    // so we store a reference to the next node
    Node* nextNode = currentNode->next;
    // and start looping and checking until we reach the end
    while (nextNode != NULL) {
        // if the next node is the node we are looking for
        if (strcmp(nextNode->key, key) == 0) {
            // make the previous (currentNode) point to
            // the next node of the one we are going to remove
            currentNode->next = nextNode->next;
            // then we remove the one we found
            free(nextNode);
            // and return
            return true;
        }
        // if not, we just move one step
        // and update the pointers for the next iteration
        currentNode = nextNode;
        nextNode = nextNode->next;
    }
    // if we reach this point, that means the key was not in any
    // of the stored pointers
    printf("node of key %s could not be found\n", key);
    return false;
};

Testing node removal

To test if the node removal is working, we will assert for success or failure by using the boolean output of our function.

void _testRemoveNode() {
    // We will use this string for key-values we dont want to delete
    char notToDelete[] = "do not delete me";
    // we will use this to mark a node to delete
    char toDelete[] = "to delete";

    int i;
    printf("Testing deletion of the last node\n");
    // We create the head node, this will end up at the tail
    Node* head = CreateNode(toDelete, toDelete);
    // we prepend 3 more nodes that we dont want to delete
    for (i = 1; i < 4; i++) {
        Node* tmp = head;
        head = CreateNode(notToDelete, notToDelete);
        head->next = tmp;
    };
    // we decide to remove the last node
    bool success = RemoveNode(&head, toDelete);
    // and assert
    assert(success == true);

    printf("Testing deletion of node in the middle\n");
    // now we will delete a node in the middle
    Node* deleteMeNode = CreateNode(toDelete, toDelete);
    deleteMeNode->next = head;
    head = deleteMeNode;
    // and make a sandwich prepending 3 more nodes
    for (i = 1; i < 4; i++) {
        Node* tmp = head;
        head = CreateNode(notToDelete, notToDelete);
        head->next = tmp;
    };
    // we remove the node
    success = RemoveNode(&head, toDelete);
    // and everything went OK
    assert(success == true);

    printf("Testing deletion of the head node\n");
    // now we prepend one node and delete it
    deleteMeNode = CreateNode(toDelete, toDelete);
    deleteMeNode->next = head;
    head = deleteMeNode;
    // we try to remove it
    success = RemoveNode(&head, toDelete);
    // and everything
    assert(success == true);

    printf("Testing early return when providing empty lists\n");
    Node* badPointer = NULL;
    success = RemoveNode(NULL, toDelete);
    assert(success == false);
    success = RemoveNode(&badPointer, toDelete);
    assert(success == false);
    while (head != NULL) {
        Node* tmp = head->next;
        free(head);
        head = tmp;
    }
}

If desired, we could also traverse the list to assert that there is no node with the key we wanted to remove.

Getting a node value

To traverse the linked list and find a value by a key, we will accept a pointer to the head node and a key. We will validate that those pointers are not null pointers. If everything is fine, we will loop over each node, comparing the input string with each key of each node.
If we find a match, we will allocate memory for that string and return a pointer to that buffer.
If no match is found, we will return null.
It will be the responsibility of a higher scope to free that allocated memory.

// We receive a pointer to to the head
// a pointer to a string
// and we return a pointer to a copy of the string
char* GetNodeValue(Node* head, char* key) {
    // we check if the pointers are valid
    if (head == NULL) {
        printf("could not find node because the list was empty\n");
        return NULL;
    }

    if (key == NULL) {
        printf("cant check for node value due to nil key\n");
        return NULL;
    }
    Node* tmp = head;

    // if so, we start iterating over each node
    // we will stop if the key the same as our input
    // or if we reach the last
    while (tmp != NULL && strcmp(tmp->key, key) != 0) {
        tmp = tmp->next;
    }

    // at this point, if tmp is null is because we exited the loop
    // after checking the last node and we didn't find anything
    if (tmp == NULL) {
        // so we return
        return NULL;
    }

    // if we reach this line, this mean we found something
    // so we ask for a block of memory of the size we need
    char* buff = malloc(strlen(tmp->value) * sizeof(char*));
    // but we exit if we could not allocate that memory
    if (buff == NULL) {
        return NULL;
    }
    // in the best case scenario we copy the key to our buffer
    strcpy(buff, tmp->value);
    // and return the buffer
    return buff;
}

Hash table related operations

To simplify things, we can say that our hash table is essentially a dynamic array of pointers to the head of a linked list.
To determine in which linked list a key-value pair should be stored, we pass the key to a hash function that outputs a number between 0 and the number of slots in our array.
Here's a conceptual example:

# each key maps to a given index
hash(k1) -> 0
hash(k2) -> 2
hash(k3) -> 3

[   0   ,   1   ,   2   ,   3   ,   4   ]
    |       |       |       |       |
  k1-v1    NULL   k2-v2   k3-v3    NULL

Then, if a new key maps to an index where the linked list already has a value, we link the new node to it:

# a new value maps to an index where
# the linked list already has a value
hash(k4) -> 0

# so we bind the new node
[   0   ,   1   ,   2   ,   3   ,   4   ]
    |       |       |       |       |
  k4-v4    NULL   k2-v2   k3-v3    NULL
    |
  k1-v1

Creating a hash table

As we presented above in the section on defining interfaces, our hash table is implemented as a struct that contains an array of pointers to linked lists, along with two numbers: one to keep track of the elements stored and the other to indicate the array capacity.

Creating an instance of this struct is straightforward. We provide a number representing the desired capacity, attempt to allocate the necessary memory, and set the defaults as shown in the following example:

// we return a pointer to the hash table
// and accept an int for the capacity
HashTable* CreateHashTable(unsigned int capacity) {
    // if capacity is less than an arbitrary number
    // we set a default capacity
    if (capacity < 4) {
        capacity = 10;
    }

    // we try to allocate an array of pointers to
    // linked lists head pointers
    Node** collection = calloc(capacity, sizeof(Node*));
    // And return if the operation failed
    if (collection == NULL) {
        printf("error: could not initialize underlying collection\n");
        return NULL;
    }
    // At this point we can try to allocate the memory for the hash table.
    HashTable* hashTable = malloc(sizeof(HashTable));

    // If we fail we free the array previously allocated
    if (hashTable == NULL) {
        printf("error: could not initialize hash table\n");
        free(collection);
        return NULL;
    }

    // now we set all the fields and return
    hashTable->collection = collection;
    hashTable->capacity = capacity;
    hashTable->storedElements = 0;
    return hashTable;
}

Testing hash table creation

We will include some simple assertions for the previous function. We want to verify if the capacity and defaults are correctly set.

void _testCreateHashTable() {
    // first we create a hash table with a capacity
    // that is lower than our min accepted value
    HashTable* hashTable = CreateHashTable(3);
    // so we expect it to be the default value
    assert(hashTable->capacity == 10);

    // we free everything
    free(hashTable->collection);
    free(hashTable);
    // and now we create one with an OK capacity
    hashTable = CreateHashTable(15);

    // and we assert it is using our input
    // to set this field
    assert(hashTable->capacity == 15);

    // as a final step we can iterate and see that
    // all slots are NULL pointers
    unsigned int i;
    for (i = 0; i < 15; i++) {
        assert(hashTable->collection[i] == NULL);

    }
    // and we clean everything up
    free(hashTable->collection);
    free(hashTable);
}

Storing a key-value pair in our hash table

Now we come to the part where we start performing useful operations with our hash table!
Please note that there are three steps that will be somewhat abstract for us at this point. These steps involve checking if we need to resize our hash table, the actual resizing process, and the operation where we compute the hash. These "private" functions will be presented after this section.

Note: Just for presentation purposes, we will include tests for storing, getting, and removing values at the end of the article.

To store a value, we need to follow the following flow:

check that our provided inputs are valid
check if we need to resize the hash table, and if so, perform the resizing
Compute the hash for the key to determine which linked list should store our value
Traverse the list and check if a node with the same key exists. If a matching node is found, update its value; otherwise, append a new node to the list.

Please find some guidance in the comments:

// we will return boolean to signal if the process succeeded
// and receive a pointer to a hash table, a key and a value
bool Store(HashTable** hashTableP, char* key, char* value) {
    // if some of the inputs is a NULL pointer, we return
    if (hashTableP == NULL || *hashTableP || key == NULL || value == NULL) {
        printf("error: bad values provided\n");
        return false;
    }
    // get our actual pointer in a handy variable
    HashTable* hashTable = *hashTableP;

    // we check if the hash table needs to be resized
    // and we do it if so
    if (_needsToResize(hashTable)) {
        printf("needs to resize\n");
        // try to resize
        HashTable* newHashtable = _resize(&hashTable);
        if (newHashTable != NULL) {
            // we replace everything if succeeded
            *hashTableP = newHashTable;
            hashTable = newHashTable;
        } // or print and error if not
        else printf("error: could not resize hash table, we will try on next Store operation\n");
    }

    // and now we compute the hash using the capacity of the array
    // used to store the number of buckets for our linked lists
    unsigned int position = _computeHash(key, hashTable->capacity);

    // and we get the head node of our hash table
    Node* head = hashTable->collection[position];

    // we traverse it and check our node key
    // with the provided key
    while (head != NULL) {
        if (strcmp(head->key, key) == 0) {
            // in case we find a node with the
            // same key, we exit the loop
            break;
        }
        // if not, we go to the next node
        head = head->next;
    }

    // at this point we migth have a node stored in
    // the "head" variable.
    if (head != NULL) {
        // if so, we copy our value to the "value"
        // field in the found node
        strcpy(head->value, value);
        return true;
    }

    // if we didn't find it, we create a new node
    Node* newNode = CreateNode(key, value);

    // if the node couldn't be created, we return
    if (newNode == NULL) {
        printf("error: could not alocate memory for node\n");
        return false;
    }
    // if everything went ok, we prepend the new node
    newNode->next = hashTable->collection[position];
    hashTable->collection[position] = newNode;
    return true;
}

Implementing a simple hash function

NNow, let's focus on the function that takes a key and produces a number that will serve as an index for the underlying array used to store references to the heads of our linked lists. As the array grows, we need to be able to update the range within which this number should fall.
For our implementation we will be using a simplified version of a rolling hash strategy In this strategy, we iterate over each character of the key, producing a hash, and then use that hash to compute the next one. In each iteration, we will take the modulo operation to ensure that our results fall within the desired bounds.

// we accept a pointer to where our key lies and
// a number for the upper limit four our hash
unsigned int _computeHash(char* key, unsigned int capacity) {
    // our hash starts with 0
    unsigned int hash = 0;
    // as our counter
    unsigned int counter = 0;
    // and we select a multiplier
    unsigned int primeNumber = 31;
    //then we loop over the string
    while (key[counter] != '\0') {
        // we multiply our hash with and add the curren character
        hash = (hash * primeNumber) + key[counter];
        // and update the hash
        hash = hash % capacity;
        counter++;
    }
    // finally we return a number that will be between 0
    // and our upper limit (which happens to be our array capacity)
    return hash;
}

Testing our hashing function

Now, our goal is to test whether our hash function consistently produces the same output for a given key. We'll conduct a simple test to gauge the consistency of the results when we manipulate the upper limit of the hash value and repeat the operation multiple times with same key.

However, it's important to note that this test is not an absolute proof of consistency or a demonstration of homogeneous distribution of the hash values. To provide such proof, we would need a formal, logical, and mathematical demonstration, which exceeds the scope of this article.

void _testCreateHashConsistency() {
    // we will use a simple string
    char* string = "hey, how are you";
    // and two counters
    int i, j;
    // we will compute a hash for 990 different capacities
    for (j = 10; j < 1000; j++) {
        unsigned int hash = _computeHash(string, j);
        // and then generate the hash 100 times
        for (i = 0; i < 100; i++) {
            // and check those 100 times we received the same hash
            assert(hash == _computeHash(string, j));
        }
    }
}

Checking if we need to resize

Determining when to resize our hash table is somewhat arbitrary. A common approach is to calculate a ratio between the number of buckets or linked lists available and the number of elements stored in our data structure.

To implement this operation, we will accept a hash table as input and calculate whether adding one more element would cause the ratio to exceed our predetermined limit.

// We return a boolean and accept a hash table
bool _needsToResize(HashTable* hashTable) {
    // we calculate the ratio between the actual stored elements plus one
    // and the current capacity (the number of buckets for our linked lists)
    // as our numbers were int, we need to cast them to float first
    float ratio = (float)(hashTable->storedElements + 1) / (float)hashTable->capacity;
    // if the ratio is greater than 1.5 we know we need to resize
    if (ratio > (float)1.5) {
        return true;
    }
    // if less, we can continue adding nodes
    return false;
};

Doing the actual resizing

To resize our hash table, we follow a process of creating a new hash table with a larger capacity, which means more buckets for storing linked lists. We then traverse all the existing linked lists and attempt to store their elements in the new hash table. To simplify this process, we can use the same Store method that we implemented for our hash tables.

Once all elements have been copied to the new hash table, we can free the memory occupied by the old hash table and return a pointer to the new hash table. If we encounter any issues during the allocation process, we need to clean up all allocated memory and inform the upper scope to handle the failure appropriately.

// we accept a pointer to a hash table and return a pointer to a hash table
HashTable* _resize(HashTable* hashTable) {
    // We store the pointer to avoid mutating parameters
    HashTable* oldHashTable = hashTable;
    // we try to allocate memory for the new hash table
    HashTable* newHashTable = CreateHashTable(oldHashTable->capacity * GROWTH_FACTOR);
    // And return if we failed
    if (newhashtable == NULL){
        return NULL
    }

    unsigned int i;
    bool success = true;
    // then we will loop over all linked lists
    for (i = 0; i < oldHashTable->capacity; i++) {
        Node* currentNode = oldHashTable->collection[i];
        // And for each linked list, we will loop for all its nodes
        while (currentNode != NULL) {
            // we will try to store the node in the new linked lists
            // while keeping track if we succeeded
            success = Store(&newHashTable, currentNode->key, currentNode->value);
            // we return if there was a problem
            if (!success) break;
            // and continue with the next node if everything went ok
            currentNode = currentNode->next;
        }
        // if we exited the while loop with an error, we stop the outer loop too
        if (!success) {
            printf("error: could not complete sesizing\n");
            break;
        }
    }

    // if we succeeded we can clear the old hash table
    if (success) {
        int i;
        for (i = 0; i < oldHashTable->capacity; i++) {
            ClearList(&hashTable->collection[i]);
        }
        free(oldHashTable->collection);
        free(oldHashTable);
        printf("success!!\n");
        return newHashTable;
    }
    printf("cleaning up after resizing attempt\n");
    // at this point we know we failed and we need to clean the new hash table
    for (i = 0; i < newHashTable->capacity; i++) {
        Node* currentNode = newHashTable->collection[i];
        while (currentNode != NULL) {
            Node* tmp = currentNode->next;
            free(currentNode);
            currentNode = tmp;
        }
    }
    free(newHashTable->collection);
    free(newHashTable);
    return NULL;
};

Testing the resizing

To verify if the resizing operation is being performed correctly, we can do a simple check by attempting to store fifty key-value pairs in the hash table and then checking if the capacity has changed. This check can help us confirm if the resizing process is triggered and if the capacity of the hash table is updated accordingly.

void _testResizing() {
    // we create a buffer for our strings
    char input[256];
    // initialize some ints
    int i, INITIAL_CAP;
    INITIAL_CAP = 5;
    // create a little hash table
    HashTable* hashTable = CreateHashTable(INITIAL_CAP);
    // and assert the initial conditions
    assert(hashTable->capacity == INITIAL_CAP);
    // then we procedurally create some key-values
    // and try to store them in the hash table
    for (i = 0; i < 50; i++) {
        sprintf(input, "string_%d", i);
        bool success = Store(&hashTable, input, input);
        assert(success == true);
    }
    // at this point the capacity should be pretty higher
    // than te initial one
    assert(hashTable->capacity > INITIAL_CAP);
    // finally we clean everything up
    for (i = 0; i < hashTable->capacity; i++) {
        ClearList(&hashTable->collection[i]);
    }
    free(hashTable->collection);
    free(hashTable);
}

Getting a value from the hash table

To retrieve a value by its key from the hash table, we need to perform the following steps:

Validate that both the hash table and the key are valid inputs.
Calculate the hash of the key using our hash function. This will give us the index of the linked list where the value may be stored.
If the slot for that linked list is empty, it means the key is not present in our hash table. We return null to indicate the absence of a value.
If we find a linked list in that slot, we need to search for the key within the linked list. We delegate this search operation to our GetNodeValue function.
The GetNodeValue function will traverse the linked list, comparing each node's key with the input key. If a matching key is found, we return the corresponding value.
If we reach the end of the linked list without finding a matching key, we return null to indicate that the key is not present in our hash table.

//we get a pointer to a hash table, and a pointer to a string
char* Get(HashTable* hashTable, char* key) {
    // if the pointers are NULL, or the string is empty, we consider that an error
    if (hashTable == NULL || key == NULL || strlen(key) == 0) {
        printf("error: bad values were provided:\n%p\n%s\n%ld\n", hashTable, key, strlen(key));
        return NULL;
    }
    // at this point we can compute the hash
    unsigned int position = _computeHash(key, hashTable->capacity);
    // and get the correct linked list
    Node* head = hashTable->collection[position];
    // then we check if we need to return
    if (head == NULL) return NULL;
    // If not, we delegate the rest of the flow
    char* value = GetNodeValue(head, key);
    // and return what the GetNodeValue give us
    return value;
};

Remving a key-value pair from the hash table

To remove a key-value pair from the hash table, we need to follow these steps:

Validate the inputs to ensure they are valid (hash table and key).
Calculate the hash of the key using our hash function to determine the linked list where the pair may be stored.
If there is no linked list associated with the calculated hash, it means the key is not present in the hash table. We can return from the function at this point.
If a linked list is found at the calculated hash, we pass a pointer to the head pointer of the linked list to our RemoveNode function.
The RemoveNode function will traverse the linked list, searching for the node with a matching key. If found, it will remove the node from the linked list and update the head pointer if necessary.

//we get a pointer to a hash table, and a pointer to a string
bool Remove(HashTable* hashTable, char* key) {
    // if the pointers are NULL, or the string is empty, we consider that an error
    if (hashTable == NULL || key == NULL || strlen(key) == 0) {
        printf("error: bad values were provided:\n%p\n%s\n%ld\n", hashTable, key, strlen(key));
        return false;
    }
    // at this point we can compute the hash
    unsigned int position = _computeHash(key, hashTable->capacity);
    // if there is no linked list, we can return
    if (hashTable->collection[position] == NULL) return true;
    // if we find a linked list, we delegate the node removal
    bool success = RemoveNode(&hashTable->collection[position], key);
    // and output the result
    return success;
};

Testing a complete flow

Now we get to test a complete flow!
In it, we will try to:

create a hash table
store some key-value pairs
change their values
try to get a value for a non present key
remove a key and hen check it is not present in the hash table

void _testStoreGetAndRemove() {
    // we create some keys
    char* keys[] = { "key1","key2","key3","key4","key5","key6" };
    // we initialize a hash table
    HashTable* hashTable = CreateHashTable(5);
    int i;
    // we store all the keys
    for (i = 0; i < 6; i++) {
        bool success = Store(&hashTable, keys[i], keys[i]);
        // and check everything went ok
        assert(success == true);
    }
    // then we loop over all the keys and check they have a correct value
    for (i = 0; i < 6; i++) {
        char* value = Get(hashTable, keys[i]);
        assert(strcmp(value, keys[i]) == 0);
        free(value);
    }
    // we create a new expected string for all the values
    char* defaultValue = "default";
    // and change the values for all the keys
    for (i = 0; i < 6; i++) {
        bool success = Store(&hashTable, keys[i], defaultValue);
        assert(success == true);
    }
    // Then we get all the keys and assert their new values
    for (i = 0; i < 6; i++) {
        char* value = Get(hashTable, keys[i]);
        assert(strcmp(value, defaultValue) == 0);
        free(value);
    }
    // we try to get a value for a non present key
    assert(Get(hashTable, "missingKey") == NULL);
    // we remove one key
    bool success = Remove(hashTable, keys[1]);
    assert(success == true);
    // and assert there is no value for it now
    assert(Get(hashTable, keys[1]) == NULL);
    // and now we clean everything up
    for (i = 0; i < hashTable->capacity; i++) {
        Node* n = hashTable->collection[i];
        ClearList(&n);
    }
    free(hashTable->collection);
    free(hashTable);
}

DEV Community