
Paul J. Lucas


Avoid the Temptation of Header-Only Libraries

#c

Introduction

One desire that arises in most programming languages is to have type-generic code, that is, code whose purpose or algorithms don’t depend on the type of data. Most commonly, this manifests as the desire to have generic “containers,” e.g., arrays, lists, or sets of some type T where T can be not only any type built into the language, but also any user-defined type, e.g., lists of integers, sets of strings, etc.

Languages that support generic containers do so in different ways. C++, for example, uses “templates” whose type parameters are instantiated at compile-time with specific types: a list<T> instantiated with T = int yields list<int>.

C has only minimal support for generic code via _Generic and preprocessor macros. Some people try to use these features to implement generic containers by writing very long and elaborate macros, e.g.:

#define LIST(T)                        \
  struct list_node_##T {               \
    struct list_node_##T *next;        \
    T data;                            \
  };                                   \
                                       \
  struct list_##T {                    \
    struct list_node_##T *head, *tail; \
  };                                   \
                                       \
  static void list_init_##T( void ) {  \
  // ...

That is, the type T, a macro parameter, forms part of the names of the structures and functions, e.g., LIST(int) defines struct list_int and list_init_int.
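To make the idea concrete, here is a minimal, complete sketch of such a macro. It is an assumption-laden illustration rather than the macro elided above: the signature of list_init_##T, the list_push_##T function, and demo_list_int are hypothetical names added only for this example. Note that T must be a single identifier for the ## pasting to work.

```c
#include <stdlib.h>

/* Hypothetical, pared-down LIST(T): one init and one push function per T. */
#define LIST(T)                                                   \
  struct list_node_##T {                                          \
    struct list_node_##T *next;                                   \
    T data;                                                       \
  };                                                              \
                                                                  \
  struct list_##T {                                               \
    struct list_node_##T *head, *tail;                            \
  };                                                              \
                                                                  \
  static void list_init_##T( struct list_##T *list ) {            \
    list->head = list->tail = NULL;                               \
  }                                                               \
                                                                  \
  /* Appends data; returns 0 on success, -1 if malloc fails. */   \
  static int list_push_##T( struct list_##T *list, T data ) {     \
    struct list_node_##T *node = malloc( sizeof *node );          \
    if ( node == NULL )                                           \
      return -1;                                                  \
    node->next = NULL;                                            \
    node->data = data;                                            \
    if ( list->tail != NULL )                                     \
      list->tail->next = node;                                    \
    else                                                          \
      list->head = node;                                          \
    list->tail = node;                                            \
    return 0;                                                     \
  }

/* Defines struct list_int, list_init_int(), and list_push_int(). */
LIST(int)

/* Builds the list 1, 2, 3, then sums the elements while freeing the nodes. */
int demo_list_int( void ) {
  struct list_int list;
  list_init_int( &list );
  for ( int i = 1; i <= 3; ++i ) {
    if ( list_push_int( &list, i ) != 0 )
      break;                            /* out of memory: stop early */
  }
  int sum = 0;
  for ( struct list_node_int *node = list.head; node != NULL; ) {
    struct list_node_int *next = node->next;
    sum += node->data;
    free( node );
    node = next;
  }
  return sum;
}
```

Every .c file that expands LIST(int) gets its own static copies of list_init_int and list_push_int, which is the crux of the problems described next.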

The benefit of using macros like this is that the code is “header only,” that is, all the code is in a single .h file. This makes the code easy to use: simply #include the header. There’s no corresponding .c file that needs to be compiled separately. (Of course, you should still list the .h file as a dependency.) While this works, it has a number of problems.

In the earliest days of C++ (C with Classes), macros were used to implement generic containers. The problem with macros in general is that they obey neither scope nor type rules, and they don’t work well with tools. Hence the addition of templates to C++.

Problems

Using macros to define generic containers has a number of problems:

  1. All functions must be declared static. Otherwise, every .o file whose corresponding .c file includes the header would contain external definitions for all functions, and the linker would complain about duplicate symbols.

  2. While static declarations solve the duplicate symbols problem, every .o file still contains its own private copy of the function definitions, and all of those copies end up in the final executable. This code bloat increases the executable’s size, sometimes dramatically, both on disk and in memory.

  3. It’s hard to debug the code in the macros themselves since each macro expands to a single line of code.

For example, even if you use only one type for T, say, int, hence LIST(int), but include the header in, say, five .c files, then the final executable will have five copies of all the code.

Note that if and only if all functions are marked inline in addition to static and the functions are trivial enough to actually be inlined by the compiler, then the code bloat problem goes away. However, any useful containers library will invariably have non-trivial functions that can’t be inlined.

Mitigation Tactics

For generic code as described, there’s no standard way (meaning, there’s no compiler-independent way) to solve the code bloat problem.

If you’re using gcc or clang, you can use the weak attribute, __attribute__((weak)), so that the linker resolves the duplicate definitions to a single one; if you’re using MSVC, you’re out of luck since no equivalent attribute exists.
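As a rough sketch of the gcc/clang approach, the macro below generates a weak function instead of a static one. The names LIST_WEAK, LIST_SUM, list_sum_##T, and demo_weak_sum are hypothetical, invented for this illustration, and the fallback to static on other compilers simply reintroduces the per-file copies.

```c
#include <stddef.h>

/* Hypothetical helper: weak linkage on gcc/clang, static elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
# define LIST_WEAK __attribute__((weak))
#else
# define LIST_WEAK static   /* fallback: a private copy per .c file */
#endif

/* Imagine this macro living in a header included by several .c files.
   Each .o file then holds a weak definition of list_sum_T, and the
   linker resolves every call to just one of them. */
#define LIST_SUM(T)                                          \
  LIST_WEAK T list_sum_##T( T const *values, size_t n ) {    \
    T sum = 0;                                               \
    for ( size_t i = 0; i < n; ++i )                         \
      sum += values[i];                                      \
    return sum;                                              \
  }

LIST_SUM(int)

/* Sums a small array through the weak function. */
int demo_weak_sum( void ) {
  int const values[] = { 1, 2, 3, 4 };
  return list_sum_int( values, sizeof values / sizeof values[0] );
}
```

Whether the copies that lose out are also dropped from the executable depends on the toolchain. With GNU ld, for instance, options such as -ffunction-sections and -Wl,--gc-sections may be needed as well, which is one more reason this isn't a portable fix.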

If you’re working on code only for a specific system or you’re the only user, then fine: you can use compiler-specific solutions; but if you want your code to be widely cross-platform, then you shouldn’t use compiler-specific solutions.

Conclusion

The hard reality is that C doesn’t really support generic header-only libraries. You should implement a library using the generic void* in a conventional .h and .c pair. Since you have to add the .h to your dependencies anyway, also adding the corresponding .c is trivial.
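A conventional void*-based list might look something like the following sketch. The file split is shown with comments in a single listing, and all names (struct list, list_init, list_push, list_pop) are hypothetical choices for illustration, not an existing library's API.

```c
#include <stdlib.h>

/* ---- list.h (hypothetical): declarations only ---- */
struct list_node {
  struct list_node *next;
  void *data;                 /* the caller owns the pointed-to object */
};

struct list {
  struct list_node *head, *tail;
};

void  list_init( struct list *list );
int   list_push( struct list *list, void *data );
void *list_pop( struct list *list );

/* ---- list.c (hypothetical): compiled once, so one copy of the code ---- */
void list_init( struct list *list ) {
  list->head = list->tail = NULL;
}

/* Appends data; returns 0 on success, -1 if malloc fails. */
int list_push( struct list *list, void *data ) {
  struct list_node *node = malloc( sizeof *node );
  if ( node == NULL )
    return -1;
  node->next = NULL;
  node->data = data;
  if ( list->tail != NULL )
    list->tail->next = node;
  else
    list->head = node;
  list->tail = node;
  return 0;
}

/* Removes the first element and returns its data, or NULL if empty. */
void *list_pop( struct list *list ) {
  struct list_node *node = list->head;
  if ( node == NULL )
    return NULL;
  void *data = node->data;
  list->head = node->next;
  if ( list->head == NULL )
    list->tail = NULL;
  free( node );
  return data;
}
```

The trade-off is that the compiler no longer checks element types: callers cast the void* returned by list_pop back to whatever type they stored.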

As an alternative to void*, you can use flexible array members to implement generic containers in C, but that’s a story for another time.

Epilogue

You might be wondering, “Don’t C++ templates have the same problems?” Only partially. First, since templates are part of the language proper and not the preprocessor, a C++ compiler can mark all instantiated functions as weak in whatever manner a given platform needs — but it’s the compiler’s problem, not your problem.

Second, the library implementation can use factorization tricks. For example, all std::list<T*> can be implemented in terms of std::list<void*>, that is, a single instantiation of the latter can be used for all pointers regardless of T. There are other tricks possible as well.

Note that with a lot more work, you could probably do some factorization tricks in C as well, e.g., have PTR_LIST(T) that’s implemented in terms of LIST(void*).
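One way such a factorization could look is sketched below, with a few assumptions. Because ## can only paste a single identifier, LIST(void*) itself wouldn't expand correctly, so the sketch uses a typedef named void_ptr. The pared-down LIST(T) here (push and pop at the front only), PTR_LIST(T), and demo_ptr_list are all hypothetical names for illustration.

```c
#include <stdlib.h>

/* Pared-down, hypothetical LIST(T): push to and pop from the front. */
#define LIST(T)                                                     \
  struct list_node_##T { struct list_node_##T *next; T data; };     \
  struct list_##T { struct list_node_##T *head; };                  \
                                                                    \
  static int list_push_##T( struct list_##T *list, T data ) {       \
    struct list_node_##T *node = malloc( sizeof *node );            \
    if ( node == NULL )                                             \
      return -1;                                                    \
    node->data = data;                                              \
    node->next = list->head;                                        \
    list->head = node;                                              \
    return 0;                                                       \
  }                                                                 \
                                                                    \
  static int list_pop_##T( struct list_##T *list, T *data ) {       \
    struct list_node_##T *node = list->head;                        \
    if ( node == NULL )                                             \
      return -1;                                                    \
    *data = node->data;                                             \
    list->head = node->next;                                        \
    free( node );                                                   \
    return 0;                                                       \
  }

/* T must be a single identifier for ## to work, hence the typedef. */
typedef void *void_ptr;
LIST(void_ptr)    /* the one real instantiation shared by all pointer types */

/* PTR_LIST(T): typed wrappers around list_void_ptr that only convert. */
#define PTR_LIST(T)                                                       \
  struct ptr_list_##T { struct list_void_ptr impl; };                     \
                                                                          \
  static inline int ptr_list_push_##T( struct ptr_list_##T *list,         \
                                       T *data ) {                        \
    return list_push_void_ptr( &list->impl, data );                       \
  }                                                                       \
                                                                          \
  /* Returns the most recently pushed pointer, or NULL if empty. */       \
  static inline T *ptr_list_pop_##T( struct ptr_list_##T *list ) {        \
    void_ptr data = NULL;                                                 \
    list_pop_void_ptr( &list->impl, &data );                              \
    return data;                                                          \
  }

PTR_LIST(int)
PTR_LIST(double)

/* Returns 1 if an int* and a double* both round-trip intact, 0 if not,
   or -1 if memory runs out. */
int demo_ptr_list( void ) {
  int i = 42;
  double d = 1.5;
  struct ptr_list_int ints = { { NULL } };
  struct ptr_list_double doubles = { { NULL } };

  if ( ptr_list_push_int( &ints, &i ) != 0 )
    return -1;
  if ( ptr_list_push_double( &doubles, &d ) != 0 ) {
    ptr_list_pop_int( &ints );          /* release the node pushed above */
    return -1;
  }
  int *pi = ptr_list_pop_int( &ints );
  double *pd = ptr_list_pop_double( &doubles );
  return pi == &i && pd == &d;
}
```

The typed wrappers are trivial enough to be inlined, so the only non-trivial code per .c file is the single list_void_ptr instantiation. That code is still static, though, so the duplication across .c files described earlier remains.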

References

  • The Design and Evolution of C++, Bjarne Stroustrup, AT&T Bell Laboratories, Addison-Wesley, Reading, Massachusetts, 1994, §15.1.
