One of the first things you learn how to do in a programming language is how to create arrays: a continuous data segment in memory.
If you have taken an introductory CS course, without fail, the first data structure you will learn is linked lists.
In this article, we will discuss the differences and when you want to employ one over the other.
Arrays
Arrays can be considered data structures. While being extremely simple, they permit random access. What this means is accessing any position in the array is extremely fast.
But arrays definitely do have several shortcomings.
Reallocation
This isn't so bad when using a language that handles memory for you like in Python, but in a language like C, it is a bit more tedious.
We only can increase the storage capacity of an array if it was first allocated using malloc()
or calloc()
. Arrays placed onto the stack cannot be resized. To resize an array, we need to allocate a new array with a bigger capacity then copy from the old array over to the new array and free the old one.
int *arr = malloc(sizeof(int) * 8);
// Fill in array until we run out of space
int *new_arr = malloc(sizeof(int) * 16);
memcpy(new_arr, arr, sizeof(int) * 8);
free(arr);
// Continue on...
In the example, we allocate an array on the heap with the storage capacity of eight integers. Later, when we run out of space, we allocate a second array with twice the capacity as the old one. Next, we copy over the integers into the new, bigger array and free old array.
This set of operations can be performed by the C standard library function realloc()
. It accepts two arguments: the first being a pointer that points to heap allocated memory by malloc()
or calloc()
and the second is the new size for the array. It returns a pointer to the newly allocated memory and the pointer passed in is invalidated.
int *arr = malloc(sizeof(int) * 8);
// ...
int *new_arr = realloc(arr, 16);
// Carry on with our operations...
Insertion and removal
What if we need to place some data into the middle of the array opposed to appending it to the end?
To begin with, we need to shift all the elements from the spot where we will place the new data all ways to the end of the array.
Example:
| 1 | 2 | 3 | 4 | 5 | |
Shift the array from 3rd index
| 1 | 2 | 3 | | 4 | 5 |
Place new data into freed space
| 1 | 2 | 3 | 6 | 4 | 5 |
For small arrays, it's not a huge deal, but when dealing with arrays with the sizes of hundreds, thousands or even millions of elements this can be an extremely costly operation to do.
Removing isn't much better either. Instead of shifting the elements to the end, we now shift all elements from the end to the spot where the deleted element was.
Linked Lists
A linked list node consists at the minimum two things: the data it holds and a pointer or reference to the next node. Linked lists shine where arrays do not.
Reallocation
Nodes in linked lists are created on a need be bases. Sure, while holding an extra pointer in conjunction with the data increases the total size of the linked list, we get the benefit not having to reallocate increasing more massive arrays. In arrays, when reallocating, we double its capacity. If you have an array of a thousand elements and need to reallocate, then you will have to allocate two thousand elements. Assuming an integer is 8 bytes on a 64-bit platform, then the number of bytes you have just allocated is 16000 bytes. This gets worse as the array continue to grow massive.
The number of bytes you allocate per new node is always the size of the node structure. If the node only contains an integer and a pointer, and not considering padding, then the size in bytes of a node will be 16 bytes.
Inserting and removal
Inserting elements into a linked list is considerably faster than performing the same operation on an array.
First, we need to traverse the list until we hit the "index" of the linked list node that we want. Linked lists do not have indices, but we pretend they do because it makes doing operations like insertion easier.
Next, we invalidate the pointer from the node before the insertion spot and have it point to the new node and the new node will now point to where the previous node did.
So if we have node1, node2, and new_node it would like something like the following:
node1 ---> node2
new_node
node1 -x-> node2
new_node
^ |
| V
node1 node2
A lot better than shifting all the elements in an array.
Removing an element is just as simple. The node after the one that is to be deleted will now be pointed by the node before the removed node.
So, node1's pointer will point from new_node to node2.
Accessing elements
This is where arrays have linked lists bested. Arrays have random access, but in linked lists, getting the node you want requires you to go over each node until finding the requested "index". Again, linked lists do not indices but it helps with the conversation.
Linked lists are also not cache friendly. When creating a linked list node, you are creating it on the heap. When the previous node points to it, there is no guarantee that the new node is close to the previous one. So, when you are traveling over each node, you are jumping all over your computer's memory.
This definitely doesn't help the with CPU being able to cache your nodes. When the CPU detects that a segment of memory is being constantly accessed, then it is cached into a special place for faster access.
Which one to use?
Now knowing the differences between the two can help you make a more informed decision on which one to use for a specific need.
If the number of elements isn't increasing rapidly and you are more concerned with faster reading and writing, then I would suggest using arrays.
If you know that the number of elements will be increasing rapidly, then a linked list might be a more suitable option.
Top comments (0)