DEV Community

Noah11012
Noah11012

Posted on

What I would like to see in containers for C

#c

Unlike C++ which has a rich assortment of containers for basic and more demanding usage, the standard makes no mention of containers for C. Most of the time, you will have to create your own or find a library that will fulfill all your project's needs.

If you need a simple map to hold the frequency of a word from an input like a file or the contents of a website, then a simple implementation specifically for a key to value of string to integer will suffice.

But instead, let's say you need a map that can be more than a string to integer mapping. You need something more general or at the very least, more types of maps like, for example, the reverse: a mapping of an integer value to a string.

This post is an opinionated piece. These suggestions are what I would like to see a container library. What I might say in the subsequent paragraphs may disagree on what you may have in mind for a container library.

With that being said, let's get to it.

Generic vs. Type Safety

What is more important? Generic maps that allow the user to create their custom mapping or type safety focused containers to keep their users safer? Each approach has its own benefits and downsides.

Type Safety

If we're going down the safety lane, one implementation is to create several predefined mappings that a user can be pick from. The obvious downside on the user's perspective is they only have so much to pick from. If they need a mapping for their own types then unfortunately for them they will have to create a map for their specific needs.

On the library maintainer's side, it isn't great either. To save time, copying and pasting are used and slight changes to the code are made to fit with what is needed.

This is error-prone and not scalable or maintainable. If you haven't heard this, then I'm telling this to you now: if you're copying and pasting code, most likely there is something awry with your code.

If you need to change the code for one mapping then to keep consistent you need to change all the implementations. This on its own can allow bugs to creep into your codebase.

Choosing to keep your users safer isn't something you necessarily shouldn't do, but you need to know that are definite downsides to this approach.

Generic

Container libraries that use the generic way of doing things have their immediate appeal: I can map any type to another of any type.

On the library maintainers' side, much of the burden is placed onto their shoulders to make sure the library is generic and isn't totally unsafe to use. An implementation that could be made is to use a void pointer. Pointers in C are an important aspect of the language. Any amount of time spending with C and at some point, you will have used a pointer.

Simply put, a pointer in C is a variable that holds the address of some memory. The only information a pointer has of what type it's pointing to is the base type before the asterisk.

int *ptr = ...;

In this example, the memory this pointer is pointing to is an integer.

If anybody has told you this, they were either protecting you from the more finicky or messy part about pointers or were lying to you whether it was intentional or not.

The base type of a pointer declaration is not necessarily the actual thing being pointed to. For all we know, it could be a floating point number. When we come to deference the pointer, the base type tells the compiler how to interpret the memory.

double number = 123.45;
int *ptr = &number;

printf("%d", *ptr);

Now I'm sure this revelation will cause some distrust between the next pointer you use.

Most of the time, the base type is the actual type being pointed at. However, this is not a guarantee.

What is so special about a void pointer? Well, for one, it has no base type. This usually means we don't care what the type of the memory is being pointed to. Second, because we don't know the type upfront, we can use this to our advantage to create generic containers.

The problem with generic containers using void pointers is we need to know the size of the object we are pointing to. This could mean putting an additional variable to hold this information.

Sticking with the map theme, a simple structure holding this information could look something like the following:

typedef struct
{
     int size_of_a_key;
     int size_of_a_value

     void *keys;
     void *value;
} Map;

On the user's side, there is a small inconvenience. At the creation of a map instance, they will have to provide the size of both the key and value. A small price to pay for getting a generic container.

Final

Personally, what would I prefer? For a project counting the frequency of a word on a website will probably go for a container library focused on type safety. Others may need to make a mapping from one custom type to another and this is something a type safety oriented library will not have. This is the job of a generic container.

With that being said, I would go for a generic container library regardless of what I was doing.

API Design

Besides having an efficient implementation of a container, you need to have an intuitive API for your users to use. Ease of use isn't just one factor you should consider when creating your API.

A few questions come into mind when deciding on how to report errors back to the calling code. If you have a return type, a special value can be designated as an error code to be returned to express something erroneous has occurred and please take care of it.

You can't always do this. For example, if you have a function to return the value that is paired with a specific key, it's not going to be possible to have a special value for an error code.

Another important factor to keep in mind when creating an API is to be consistent. Returning an error code from one function but not doing the same for the other is a bit annoying. Consistency is important.

Output parameters can be of use for us. Have a pointer to an integer and fill it with the error code if an error happened or zero if successful.

void map_get(void *key, void *value, int *error);

Passing in NULL or 0 could express indifference to an error happening.

But what about the function used to construct the container? A first choice might look like the following:

Map *map_new(int size_of_key, int size_of_value);

Nothing fancy with this. Takes a few arguments and returns a value. You will have noticed the function returns a pointer and not an actual object. Most of the time, especially with constructor functions, this is used to indicate the use of heap allocated memory. Creating a map object on the stack then returning a pointer to it will yield garbage results when the stack frame used to hold the local variables for the function is destroyed when the function returns.

Heap allocation is expensive compared to simply push and popping of the stack frame. For cases when it is not necessary, I would want to construct an object on the stack. To make this happen, the interface for the constructor function may look something like this:

void map_new(Map *map, int size_of_key, int size_of_value);

In this version, we have an output parameter for a map object. We handle how the object is created and the function takes care of the internal state and other necessary allocations.

This has another benefit we can take advantage of. Now that the return part of the function is freed up, we can use it for returning an error code.

typedef enum
{
    M_OK,
    M_BAD_MEM_ALLOC
} MapCode;

MapCode map_new(Map *map, int size_of_key, int size_of_value);

Summary

Every project has their own needs and will require different libraries that offer much of the same features as others provide but have differing philosophies. It is up to the library maintainers' to make the decision that will fulfill all the needs of the project.

Top comments (2)

Collapse
 
curtisfenner profile image
Curtis Fenner • Edited

If they need a mapping for their own types then unfortunately for them they will have to create a map for their specific needs.

... copying and pasting are used and slight changes to the code are made to fit with what is needed.

This isn't needed - C has macros! While large macros aren't usually a great idea, because a concept like a map is extremely well-defined and unlikely to need continuous development, the boost of having less indirection and type-safety is probably worth it.

Here's how it can end up looking

// maps.h
#include "stdlib.h"

#define DEFINE_MAP_IMPLEMENTATION(keyType, valueType, MapType)                 \
  struct MapType##_struct {                                                    \
    size_t count;                                                              \
    keyType* keys;                                                             \
    valueType* values;                                                         \
  };                                                                           \
  MapType* MapType##_create() {                                                \
    MapType* map = (MapType*)malloc(sizeof(MapType));                          \
    map->count = 0;                                                            \
    map->keys = NULL;                                                          \
    map->values = NULL;                                                        \
    return map;                                                                \
  }                                                                            \
  void MapType##_destroy(MapType* map) {                                       \
    free(map->keys);                                                           \
    free(map->values);                                                         \
    free(map);                                                                 \
  }

#define DEFINE_MAP_INTERFACE(keyType, valueType, MapType)                      \
  typedef struct MapType##_struct MapType;                                     \
  MapType* MapType##_create();                                                 \
  void MapType##_destroy(MapType*);

Here's a use of the above to make a InvertedIndex type:

// inverted_index.h
#include "maps.h"

typedef struct {
  size_t documentID;
  size_t offset;
} IndexUse;

DEFINE_MAP_INTERFACE(char const*, IndexUse, InvertedIndex)

and its implementation file:

// inverted_index.c
#include "inverted_index.h"

DEFINE_MAP_IMPLEMENTATION(char const*, IndexUse, InvertedIndex)
Collapse
 
noah11012 profile image
Noah11012

Using macros instead of void pointers for generic containers is a better idea, however, I find a huge macro for the implementation to be cumbersome.