DEV Community

Paul J. Lucas
Paul J. Lucas

Posted on

A Simple Dynamic Array for C

#c

Introduction

When many programmers get inspired to implement a dynamic array (meaning an array that grows automatically as new elements are appended) in C, they often try to incorporate the element’s type into the array. Since C lacks a template facility like C++, that often means (ab)using macros like:

#define TYPED_ARRAY(TYPE)                            \
  typedef struct TYPE##_array TYPE##_array;          \
  struct TYPE##_array {                              \
    TYPE  *elements;                                 \
    size_t len;                                      \
    size_t cap;                                      \
  };                                                 \
  static void TYPE##_array_init( TYPE##_array *a ) { \
    *a = (TYPE##_array){ 0 };                        \
  }                                                  \
  /* ... and several more functions ... */
Enter fullscreen mode Exit fullscreen mode

The problems with (ab)using macros like this for typed arrays in C are:

  1. They’re difficult to write and maintain.
  2. The entire implementation must be in the header (leading to increased compilation times and code bloat).
  3. Error messages involving macros are often very difficult to read.

Instead of trying to shoehorn typed arrays via macros like this into C, let’s try to implement the simplest possible dynamic array in C.

A Simple Dynamic Array

Here’s a fairly simple structure for a dynamic array:

typedef struct array array_t;
struct array {
  void   *elements;  // Pointer to array of elements.
  size_t  esize;     // Size of an element.
  size_t  len;       // Length of array.
  size_t  cap;       // Capacity of array.
};
Enter fullscreen mode Exit fullscreen mode

Instead of somehow trying to incorporate element type information, we’re just going to do the simplest thing and use an element’s size in bytes and treat elements as opaque chunks of memory. We’ll leave how to interpret those bytes as a type to the user. The rest of the members should be self explanatory.

The primary functions needed are for initialization, clean-up, appending elements, and accessing elements.

Initialization & Clean-Up

Initialization is trivial:

inline void array_init( array_t *array, size_t esize ) {
  *array = (array_t){ .esize = esize };
}
Enter fullscreen mode Exit fullscreen mode

The designated initializer .esize sets the element’s size; the other members are initialized to zero (or equivalent). If you wanted an array of int:

array_t int_array;
array_init( &int_array, sizeof(int) );
Enter fullscreen mode Exit fullscreen mode

Clean-up, while not trivial, is still straightforward:

typedef void (*array_free_fn_t)( void *element );

void array_cleanup( array_t *array, array_free_fn_t free_fn ) {
  if ( array == NULL )
    return;
  if ( free_fn != NULL ) {
    char *element = array->elements;
    for ( size_t i = 0; i < array->len; ++i ) {
      (*free_fn)( element );
      element += array->esize;
    }
  }
  free( array->elements );
  array_init( array, array->esize );
}
Enter fullscreen mode Exit fullscreen mode

A clean-up function is needed for elements that require their own clean-up, e.g., strings. The conversion to char* is needed since arithmetic can’t be done on pointers in standard C.

Accessing Elements

To access an element at a given index, we can implement:

inline void* array_at_nc( array_t const *array, size_t index ) {
  return (char*)array->elements + index * array->esize;
}

inline void* array_at( array_t const *array, size_t index ) {
  return index < array->len ? array_at_nc( array, index ) : NULL;
}
Enter fullscreen mode Exit fullscreen mode

The _nc version is for “no check,” i.e., does no bounds-checking. If you’re iterating, the _nc version is fine since the index is guaranteed to be < len:

for ( size_t i = 0; i < int_array.len; ++i ) {
  int n = *(int*)array_at_nc( &int_array, i );
  // ...
}
Enter fullscreen mode Exit fullscreen mode

Appending Elements

To append an element, we can implement:

inline void* array_push_back( array_t *array ) {
  array_reserve( array, 1 );
  return array_at_nc( array, array->len++ );
}
Enter fullscreen mode Exit fullscreen mode

We’ll get to array_reserve shortly, but, in the mean time, notice that array_push_back does not take the element to be appended; instead, for simplicity, it merely makes room for the element and returns a pointer to the raw memory for it at which point the user can assign to it:

*(int*)array_push_back( &int_array ) = 42;
Enter fullscreen mode Exit fullscreen mode

After all, the core functionality of a dynamic array is only to grow when necessary; we leave dealing with the typed elements to the user. This is similar in spirit to the standard library functions of qsort and bsearch that implement only core functionality (sorting and searching, respectively) leaving the elements to the user.

The array_reserve function is where most of the “meat” is. It reserves space for res_len additional elements. A naive implementation would grow the array by exactly res_len elements. The problem with that is if you (a) don’t know how many elements there will be in advance (which is often the case) and (b) append elements one by one, it would call realloc that, not only has to reallocate the array, but also memcpy the elements every time — which is really expensive, specifically, runs in O(n2) time.

Instead, we grow the array by at least res_len additional elements, specifically 1.5n:

#define ARRAY_CAP_MIN  4  /* Minimum array capacity. */

bool array_reserve( array_t *array, size_t res_len ) {
  assert( array != NULL );
  if ( res_len <= array->cap - array->len )
    return false;
  if ( array->cap == 0 )
    array->cap = ARRAY_CAP_MIN;
  size_t const min_cap = array->len + res_len;
  while ( array->cap < min_cap )
    array->cap += array->cap >> 1;      // grow by ~1.5x
  array->elements = realloc( array->elements, array->cap * array->esize );
  return true;
}
Enter fullscreen mode Exit fullscreen mode

While this has the potential to waste memory (more in a bit), the benefit is that the expense of reallocation exponentially decreases over time and so each append is amortized to O(1).

Why 1.5 instead of 2? The problem with 2 is that doubling the size of each new allocation is always greater than the sum of all previous allocations combined, which means realloc can’t reuse even a block coalesced from previous allocations. For example, assuming previous allocations of 4, 8, and 16 (summing to 28), the next allocation will be 32, but 32 > 28, so realloc can’t reuse that block.

In contrast, growing by 1.5x yields allocations 4, 6, and 9 (summing to 19), and the next allocation will be 13, and 13 ≤ 19, so realloc can reuse that block; hence 1.5x is easier on the memory allocator by reducing fragmentation. Additionally, growing by 1.5x wastes at most 33% of allocated memory whereas 2x wastes at most 50%.

The use of ARRAY_CAP_MIN having a value of 4 eliminates the expensive initial size increases (that would start out as 2, 3, and 4).

Other Functions

The functions array_init, array_reserve, array_push_back, and array_cleanup are all you really need for a dynamic array. For convenience, you could also add functions like array_back, array_front, array_pop_back, array_qsort, and array_bsearch. Those, if wanted, are left as exercises for the reader.

Adding Type Information

Now that we’ve got a nice, small, and efficient implementation of a dynamic array, it turns out that it is possible to add type information thus making the dynamic array type-safe by using just a few small macros.

As far as I know, the following technique was invented by Daniel Hooper. The trick is to put a generic data type of interest (here, array) inside an anonymous union with a pointer to the desired element type:

#define typed_array(TYPE) \
  union { array_t array; TYPE *ptr_type; }

typed_array(int) int_array;
Enter fullscreen mode Exit fullscreen mode

By using a union, ptr_type takes up no additional space. It’s also never written to nor read from: it exists only to provide type information at compile-time.

Why TYPE* and not just TYPE? Because (a) TYPE* is sufficient and (b) sizeof(TYPE*) is just the size of a pointer that’s always less than sizeof(array_t).

Given such a union, we can now implement additional wrapper macros for array:

#define typed_array_init(ARRAY) \
  array_init( &(ARRAY)->array, sizeof( *(ARRAY)->ptr_type ) )

#define typed_array_push_back(ARRAY) \
  (typeof((ARRAY)->ptr_type))array_push_back( &(ARRAY)->array )

#define typed_array_at_nc(ARRAY,INDEX) \
  (typeof((ARRAY)->ptr_type))array_at_nc( &(ARRAY)->array, (INDEX) )
Enter fullscreen mode Exit fullscreen mode

Obviously, these macros require typeof from C23, but typeof has also been supported by gcc and clang as an extension to older C versions for some time.

Given those macros, we can now do:

typed_array(int) int_array;
typed_array_init( &int_array );
*typed_array_push_back( &int_array ) = 42;
Enter fullscreen mode Exit fullscreen mode

that eliminates the explicit use of sizeof as well as all the casts. Optionally, you can add even more macros for pointer iteration:

#define typed_array_type(ARRAY)  typeof( *(ARRAY)->ptr_type )

#define typed_array_begin(ARRAY) (ARRAY)->array.elements

#define typed_array_end(ARRAY) \
  typed_array_at_nc( (ARRAY), (ARRAY)->array.len )
Enter fullscreen mode Exit fullscreen mode

Then:

for ( typed_array_type(&ia) *e = typed_array_begin(&ia);
      e < typed_array_end(&ia); ++e ) {
  printf( "%d\n", *e );
}
Enter fullscreen mode Exit fullscreen mode

Caveat

One caveat with typed_array, specifically with its use of an anonymous union, is that C doesn’t consider anonymous unions (or structures) the same type even when their members are identical:

typed_array(int) i;
typed_array(int) j;
// ...
i = j;  // error: assigning incompatible type
Enter fullscreen mode Exit fullscreen mode

However, you can use typedef to make them the same type:

typedef typed_array(int) int_array;

int_array i;
int_array j;
// ...
i = j;  // OK now
Enter fullscreen mode Exit fullscreen mode

Verbosity

If you think having to spell out typed_array all the time is too verbose, you could rename array to be something like vp_array (void pointer array) and then just name typed_array as array, especially if you always plan to use the typed version.

Conclusion

Personally, I find that the dynamic array implementation shown here exemplifies the less is more principle and keeps in the spirit of C. I’ve used array in my include-tidy project and it works just fine.

If you really want a type-safe version, it’s easily added on as an extra. Keeping it as an extra rather than trying to incorporate type information directly into array keeps the two layers nicely separated and makes each simpler.

Top comments (0)