gulc: writing a clone of C++’s vector in C

Disclaimer: This article is the second of the series about the library ‘gulc’. If you want a complete overview of the project, I recommend to read the first article here.

In this article, gulc takes a step forward with the implementation of a core data structure, the vector (or, if you prefer, dynamic array). As a programmer that works with both C and C++, I always felt that C needed something like C++’s std::vector (and many other things, but we’ll get to those in the next articles) and, more in general, a dynamic array data structure.

Vector clones in C

Implementing a vector clone in C, apart from the basic layout of its metadata (length, capacity, pointer to stored data) is typically done in two different ways.

The first typically consist in making a sort of template with macros and then provide the user the ability to generate the code for different types with a macro like VECTOR_IMPL(type). While this is definitely more type safe, it leads to more code being generated and thus a bigger binary size.

The second way instead revolves around using void pointers to keep it the most generic possible and have a special field in the vector’s metadata structure: the element size, which allows for correct offsetting operations.

We are going to take a sort of a hybrid approach (with some things inspired by glib’s GArray) in a certain sense, combining the void pointers solution (with some adjustments) with macros that help with the usage. Furthermore, this vector clone will also support custom element destruction.

The interface

First of all, we are going to design the interface of gulc’s vector in its dedicated header file (gulc_vector.h) using the helper macros (defined in gulc_common.h) like GULC_TYPE_DECLARE, GULC_FN_DECLARE and GULC_NAME, which help declaring and deducing types and functions according to the user’s preference on whatever to use or not the prefix gulc_ before them.

The first thing we are going to declare is the vector’s metadata and, to simplify development, an alias for the vector metadata struct used in the implementation so we don’t have to write GULC_NAME(Vector) everytime.

GULC_PTR_FN_DECLARE(VectorElementDestructor, void, void*)  

typedef GULC_NAME(VectorElementDestructor) gulc_TypeVectorElemDestr;  

GULC_TYPE_DECLARE(  
  Vector,  
  struct  
  {  
    const size_t length;  
    const size_t capacity;  
    uint8_t data[];  
  }  
)  

typedef GULC_NAME(Vector) gulc_TypeVector;

You may be asking: “why don’t we use a void pointer to keep track of the stored data?” and the answer is actually pretty simple: arithmetic operations (needed for offsetting when accessing data) on void pointers are not defined in the C standard because we don’t know how many bytes we have to move the pointer forward or backwards. This forces us to either convert the pointer to another pointer type or use a pointer to a certain type, and uint8_t, being a type with a width of 1 byte (so 8 bits), supports arithmetic operations and the value of a uint8_t pointer changes by only 1 byte when incremented or decremented, making it a “byte pointer”.

This approach still has a problem: with a pointer the stored data is not located next to the other attributes and thus operations on the vector would be a little slower.

To fix this, we use the flexible array member, a special type of array that allows you to define the size of the array at runtime rather than compile time. It’s a feature introduced in the C99 standard and it allows us to have the vector’s stored data right after the length and the capacity, removing the indirection to another part of memory that we had with the pointer.

We also mark the length and capacity attributes as const so that they can’t be modified (or at least easily modified) by users.

Then we have the functions that create/destroy the vector and make basic operations on it, like adding space for /deletion of elements at the end of or at a certain position and resizing/clear/shrinking functions. There are also some support macros that allow a friendlier vector creation and element access.

GULC_FN_DECLARE(gulc_TypeVector*, VectorCreate, size_t elementSize, size_t capacity, gulc_TypeVectorElemDestr elementDestructor)  
GULC_FN_DECLARE(void, VectorDestroy, gulc_TypeVector** vectorP)  

GULC_FN_DECLARE(void*, VectorEmplaceBack, gulc_TypeVector** vectorP)  
GULC_FN_DECLARE(void, VectorPopBack, gulc_TypeVector** vectorP)  
GULC_FN_DECLARE(void*, VectorEmplace, gulc_TypeVector** vectorP, size_t index)  
GULC_FN_DECLARE(void, VectorErase, gulc_TypeVector** vectorP, size_t index)  

GULC_FN_DECLARE(void, VectorResize, gulc_TypeVector** vectorP, size_t newSize)  
GULC_FN_DECLARE(void, VectorClear, gulc_TypeVector** vectorP)  
GULC_FN_DECLARE(void, VectorShrinkToFit, gulc_TypeVector** vectorP)  

#define GULC_VECTOR_OF(type) GULC_NAME(VectorCreate)(sizeof(type), 0, NULL)  

#define GULC_VECTOR_OF_DESTR(type, destructor) GULC_NAME(VectorCreate)(sizeof(type), 0, destructor)  

#define GULC_VECTOR_OF_SIZED(type, cap) GULC_NAME(VectorCreate)(sizeof(type), cap, NULL)  
#define GULC_VECTOR_OF_SIZED_DESTR(type, cap, destructor) GULC_NAME(VectorCreate)(sizeof(type), cap, destructor)  

#define GULC_VECTOR_AT(vec, type, i) (((type*)(vec)->data)[i])

You can find every function documented on Github here.

The implementation

Now it’s time for the actual implementation. As you may have noticed, the fields for the element destructor and size are not present in the metadata showed earlier. This is because they are not supposed to be easily modifiable by the end user of the library, so we hide them in the implementation file (gulc_vector.c) in an another struct, which contains the vector’s metadata plus the element size and destructor. To go back and forth from one “form” to another, I defined some utility macros.

typedef struct  
{  
    size_t elementSize;  
    gulc_TypeVectorElemDestr elementDestructor;  
    size_t length;  
    size_t capacity;  
    uint8_t data[];  
} gulc_VectorImpl;  

#define TO_GULC_VECTOR_IMPL(vec) ((gulc_VectorImpl*)((vec) + 1) - 1)  
#define TO_GULC_VECTOR(vec) ((gulc_TypeVector*)((vec) + 1) - 1)

Since we are using the flexible array member, I think it’s better to spend a little to talk about how we actually use it rather than the vector implementation, which is pretty standard.

When we allocate memory dynamically for a struct with a flexible array member, we ask for the amount of bytes required for the struct plus the size of the flexible array, which in our case will be the capacity of the vector times the element size in bytes of each element.

When we instead free the memory, we simply free the previously allocated memory. No need to free anything else since we have our data all compact.

Here I leave just a peek of the implementation for the create and destroy function.

GULC_FN_IMPL(gulc_TypeVector*, VectorCreate, size_t elementSize, size_t capacity, gulc_TypeVectorElemDestr elementDestructor)  
{  
    GULC_VERIFY(elementSize != 0, "A vector's element size can't be 0!");  

    gulc_VectorImpl* vector = GULC_NAME(SafeAlloc)(sizeof(*vector) + elementSize * capacity);  

    vector->elementSize = elementSize;  
    vector->elementDestructor = elementDestructor;  
    vector->length = 0;  
    vector->capacity = capacity;  

    return TO_GULC_VECTOR(vector);  
}  

GULC_FN_IMPL(void, VectorDestroy, gulc_TypeVector** vectorP)  
{  
    if(vectorP == NULL || *vectorP == NULL)  
        return;  

    gulc_VectorImpl* vec = TO_GULC_VECTOR_IMPL(*vectorP);  

    GULC_NAME(VectorClear)(vectorP);  

    GULC_NAME(Free)(&vec);  
    *vectorP = NULL;  
}

Notice how in the destroy function we get the vector “extended” metadata hidden in the implementation and free that (the pointer to the memory we allocated in VectorCreate), not the metadata the user sees.

The usage

Finally, we see some actual usage of our fresh vector clone. Considering the obstacles we had to deal during the implementation I think it turned out pretty good with a quite easy to use interface.

#include <gulc/gulc_vector.h>  
#include <stdio.h>  

int main(void)  
{  
  gulc_Vector* intVector = GULC_VECTOR_OF(int);  

  // emplace back makes room for another element  
  // and returns the pointer to that space  
  for(int i = 0; i < 10; i++)  
  {  
    int* insertedElement = gulc_VectorEmplaceBack(&intVector);  
    *insertedElement = i;  
  }  

  printf("My int vector:\n");  

  for(size_t i = 0; i < intVector->length; i++)  
    printf("[%zu]: %d\n", i, GULC_VECTOR_AT(intVector, int, i));  

  gulc_VectorPopBack(&intVector);  
  GULC_VECTOR_AT(intVector, int, 0) = 100;  

  printf("Vector length: %zu | First element: %d\n", intVector->length, GULC_VECTOR_AT(intVector, int, 0));  

  gulc_VectorDestroy(&intVector);
  return 0;
}

As you can see, nothing too difficult goes on here. The only thing worth noting is the way we insert values in the vector by simply assigning the content pointed by the return value of VectorEmplaceBack to the value we want to insert.

This is it for this article about writing a vector clone in C, as part of the gulc project. I hope you enjoyed reading this article and maybe you even learned something new. More articles about the gulc project are coming very soon so stay tuned for more!

Hands-on debugging session: instrument, monitor, and fix

Join Lazar for a hands-on session where you’ll build it, break it, debug it, and fix it. You’ll set up Sentry, track errors, use Session Replay and Tracing, and leverage some good ol’ AI to find and fix issues fast.

RSVP here →

DEV Community

gulc: writing a clone of C++’s vector in C

Vector clones in C

The interface

The implementation

The usage

Hands-on debugging session: instrument, monitor, and fix

Top comments (0)

The Next Generation Developer Platform

Read next

I built a tool to automate CUDA kernel profiling and optimization, would love your feedback:D

Relembrando allSettled aqui com os colegas

Building a CI/CD Pipeline for Terraform with GitHub Actions (Step-by-Step Guide)

Web Dev Day 7: JavaScript Guide Part 2

Okay