DEV Community

Paul J. Lucas


_Generic in C


Introduction

Among other things, C11 added the _Generic keyword that enables compile-time selection of an expression based on the type of its argument.

Personally, I think _Generic is too, well, generic of a name. It should have been called something like _Typeswitch.

The motivating use-case is the ability for library authors to provide a veneer of C++ function overloading in C.

Motivating Example

The motivating example is the ability of the standard C math library in math.h to provide specialized functions for different floating-point types while exposing only a single function name in the API. For example, the library provides these three functions:

double      cbrt ( double );       // cube root of double
float       cbrtf( float );        // ... of float
long double cbrtl( long double );  // ... of long double

While you certainly can use those functions as they are, it would be nice if you could always use just cbrt and have the compiler select the right function automatically based on the type of its argument:

double d;
float f;
long double l;

double      rv_d = cbrt( d );      // calls cbrt()
float       rv_f = cbrt( f );      // calls cbrtf()
long double rv_l = cbrt( l );      // calls cbrtl()

To make this work in C++, you’d simply overload the functions; to make this work in C, the math library defines a macro using _Generic:

#define cbrt(N)         \
  _Generic( (N),        \
    float      : cbrtf, \
    long double: cbrtl, \
    default    : cbrt   \
  )( (N) )

The _Generic keyword is part of the C language proper, not part of the preprocessor. However, the only way you can practically use _Generic is via a preprocessor macro.

For the above example, the ( (N) ) at the end is just calling whichever function _Generic selected and passing N as its argument.

_Generic works as follows:

  • It takes a single controlling expression followed by an association list of one or more type/expression pairs.
  • If the type (not the value) of the controlling expression matches a particular type (before the :) in the association list, then the result of the _Generic expression is the expression for that type (after the :).
  • There can be at most one occurrence of any particular type. (Remember that a typedef type is not a distinct type.)
  • Optionally, one “type” may instead be default, which matches only if no other type does.
  • The controlling expression is not evaluated; only its type is considered. Of the associated expressions, only the selected one is ever evaluated.
  • Hence, _Generic is strictly compile-time and has zero run-time overhead.

Additionally, when comparing types:

  1. Top-level const, volatile, restrict, and _Atomic qualifiers are discarded. (For example, an expression of type int const will match int.)
  2. Array-to-pointer and function-to-pointer conversions happen as usual.
  3. However, no other conversions occur, including the usual arithmetic conversions. (For example, short is not promoted to int.)
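
The three matching rules above can be observed with a small sketch. The names TYPE_LABEL and check_matching_rules are hypothetical, made up only for this illustration:

```c
#include <assert.h>
#include <string.h>

// Hypothetical helper (not from any library): maps an expression's type
// to a string so the matching rules can be observed.
#define TYPE_LABEL(X)   \
  _Generic( (X),        \
    short  : "short",   \
    int    : "int",     \
    char*  : "char*",   \
    default: "other"    \
  )

static void check_matching_rules( void ) {
  int const ci = 1;
  short     s  = 2;
  char      buf[8] = "abc";

  // 1. The top-level const is discarded, so int const matches int.
  assert( strcmp( TYPE_LABEL( ci ), "int" ) == 0 );
  // 2. The array decays to a pointer to its first element.
  assert( strcmp( TYPE_LABEL( buf ), "char*" ) == 0 );
  // 3. A short stays a short: it is not promoted to int.
  assert( strcmp( TYPE_LABEL( s ), "short" ) == 0 );
  //    Inside an arithmetic expression, however, promotion does happen.
  assert( strcmp( TYPE_LABEL( s + s ), "int" ) == 0 );
}
```

Calling check_matching_rules() from a test should pass silently on a conforming C11 compiler.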

For the above example, if you’re wondering how function names like cbrtf, cbrtl, and cbrt are expressions, remember that the name of a function is a shorthand for a pointer to itself. That is, for any function f, f is equivalent to &f. Hence for the above example, the function names are expressions whose value is their own pointer.

Also for the above example, a keen observer might notice that the name of the macro cbrt is the same as the default function cbrt and wonder why that doesn’t cause an infinite preprocessor macro expansion loop. It doesn’t because the preprocessor will not expand a macro that references itself. Hence, if the resulting expression of _Generic is cbrt, that will not be expanded again.

Additionally for the above example, the reason default is used rather than double is that you want cbrt (the macro) when called with any other type, say int, to call cbrt (the function). The int is then converted to double as if by assignment because the function’s prototype declares a double parameter.

Lastly for the above example, the resulting pointer-to-function followed by ( actually calls the function because, for any pointer-to-function pf, pf() is a shorthand for (*pf)().
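
To see the whole mechanism without depending on the math library, here is a minimal sketch of the same pattern. The functions halve, halvef, and halvel and the last_called variable are hypothetical stand-ins invented for this illustration; only the macro's shape mirrors cbrt:

```c
#include <assert.h>
#include <string.h>

// Records which function ran last, purely so the dispatch can be observed.
static char const *last_called;

static double      halve ( double x )      { last_called = "halve";  return x / 2; }
static float       halvef( float x )       { last_called = "halvef"; return x / 2; }
static long double halvel( long double x ) { last_called = "halvel"; return x / 2; }

// Same shape as the cbrt macro: select a function by the argument's type,
// then call it.  The inner halve is not re-expanded by the preprocessor.
#define halve(N)          \
  _Generic( (N),          \
    float      : halvef,  \
    long double: halvel,  \
    default    : halve    \
  )( (N) )
```

With these definitions, halve( 3.0f ) should call halvef, halve( 5.0L ) should call halvel, and halve( 4 ) should fall through to default and call the double version with the int converted to double.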

A printf Example

When using printf, it’s sometimes difficult to remember the correct format specifier for particular types. Using _Generic, you can implement a helper macro:

#define PRINTF_FORMAT(T)        \
  _Generic( (T),                \
    _Bool             : "%d",   \
    char              : "%c",   \
    signed char       : "%hhd", \
    unsigned char     : "%hhu", \
    short             : "%hd",  \
    int               : "%d",   \
    long              : "%ld",  \
    long long         : "%lld", \
    unsigned short    : "%hu",  \
    unsigned int      : "%u",   \
    unsigned long     : "%lu",  \
    unsigned long long: "%llu", \
    float             : "%f",   \
    double            : "%f",   \
    long double       : "%Lf",  \
    char*             : "%s",   \
    char const*       : "%s",   \
    wchar_t*          : "%ls",  \
    wchar_t const*    : "%ls",  \
    void*             : "%p",   \
    void const*       : "%p"    \
  )

#define PRINT(X)  printf( PRINTF_FORMAT(X), X )

PRINT(42);       // printf( "%d", 42 )
PRINT(-273.15);  // printf( "%f", -273.15 )
PRINT("hello");  // printf( "%s", "hello" )

One problem with this macro is that it won’t work for any other kind of pointer. A way to fix this is:

#define PRINTF_FORMAT(T)               \
  _Generic( (T),                       \
    /* ... */                          \
    char*             : "%s",          \
    char const*       : "%s",          \
    wchar_t*          : "%ls",         \
    wchar_t const*    : "%ls",         \
    default           : PTR_FORMAT(T)  \
  )

#define PTR_FORMAT(P)       \
  _Generic( TO_VOID_PTR(P), \
    void const*: "%p",      \
    void*      : "%p"       \
  )

#define TO_VOID_PTR(P)      (1 ? (P) : (void*)(P))

That is, in PRINTF_FORMAT, change the void* and void const* cases to a default case that calls PTR_FORMAT(T) to handle the pointer cases.

The macro TO_VOID_PTR seems strange since 1 always evaluates to true so the result is always (P). That may seem pointless, but we want it for the way the ?: operator determines the type of its result:

  • If one of the if-true or if-false expressions of ?: is a pointer to void and the other is a pointer to any object type, then the type of the result is also pointer to void (plus const if either pointed-to type is const).

Since we’ve explicitly cast P to void* for the if-false expression, that forces the type of the result also to be void* (or void const*) regardless of the pointer type.

Note that a simple (void*)(P) (cast to void*) by itself won’t work because that would cast any type to void*. We want the type of the result to be void* only if P is a pointer.
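
A small sketch can confirm what type TO_VOID_PTR yields. The VOID_PTR_KIND macro and struct point are hypothetical, introduced only to make the result visible:

```c
#include <assert.h>
#include <string.h>

#define TO_VOID_PTR(P)  (1 ? (P) : (void*)(P))

// Hypothetical helper: names the type that TO_VOID_PTR produces.
#define VOID_PTR_KIND(P)         \
  _Generic( TO_VOID_PTR(P),      \
    void const*: "void const*",  \
    void*      : "void*"         \
  )

struct point { int x, y; };   // an arbitrary non-void pointee type
```

Given a struct point *p, VOID_PTR_KIND( p ) should select "void*"; given a struct point const *pc, it should select "void const*", even though neither pointer is a void pointer itself.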

A Typename Example

You can implement a macro similar to PRINTF_FORMAT to get the name of a type:

#define TYPENAME(T)                    \
  _Generic( (T),                       \
    _Bool             : "_Bool",       \
    char              : "char",        \
    /* ... */                          \
    void const*       : "void const*", \
    default           : "other"        \
  )

size_t s = 0;
printf( "Real type of size_t is %s\n", TYPENAME(s) );

_Generic with size_t

The type size_t is a standard type commonly used to represent the size in bytes of an object or to index into an array. It is an implementation-defined typedef for some unsigned integer type, in practice almost always unsigned int, unsigned long, or unsigned long long.

Because size_t is a typedef, you can’t list it as a distinct type in a _Generic association list (assuming you include other unsigned types) because it would be a duplicate type. But what if you really want to treat size_t differently? The trick (as with many other problems in software) requires an extra level of indirection, specifically by checking for size_t first and using default for all other types:

#define F(X)                   \
  _Generic( (X),               \
    size_t : f_size_t,         \
    default: F_NOT_SIZE_T(X)   \
  )( (X) )

#define F_NOT_SIZE_T(X)        \
  _Generic( (X),               \
    /* ... */                  \
    unsigned char     : f_uc,  \
    unsigned short    : f_us,  \
    unsigned int      : f_ui,  \
    unsigned long     : f_ul,  \
    unsigned long long: f_ull, \
    /* ... */                  \
  )

The caveat, of course, is that whatever type size_t is a typedef for (e.g., unsigned long) will always select f_size_t, so that type’s own association in F_NOT_SIZE_T can never be reached.

The same trick can be used for any typedef’d type, e.g., uintmax_t, uintptr_t, etc., or your own typedefs.
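
Here is a minimal, compilable sketch of the two-level dispatch. The f_* functions are hypothetical stand-ins that simply report their own name, and the sketch assumes F_NOT_SIZE_T is invoked with its argument so the preprocessor expands it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical stand-ins for the f_* functions: each reports its own name.
static char const* f_size_t( size_t v )             { (void)v; return "f_size_t"; }
static char const* f_uc    ( unsigned char v )      { (void)v; return "f_uc"; }
static char const* f_us    ( unsigned short v )     { (void)v; return "f_us"; }
static char const* f_ui    ( unsigned int v )       { (void)v; return "f_ui"; }
static char const* f_ul    ( unsigned long v )      { (void)v; return "f_ul"; }
static char const* f_ull   ( unsigned long long v ) { (void)v; return "f_ull"; }

// Second level: every unsigned type, including whichever one size_t is.
#define F_NOT_SIZE_T(X)        \
  _Generic( (X),               \
    unsigned char     : f_uc,  \
    unsigned short    : f_us,  \
    unsigned int      : f_ui,  \
    unsigned long     : f_ul,  \
    unsigned long long: f_ull  \
  )

// First level: size_t wins; everything else is delegated.
#define F(X)                   \
  _Generic( (X),               \
    size_t : f_size_t,         \
    default: F_NOT_SIZE_T(X)   \
  )( (X) )
```

F( sizeof(int) ) should call f_size_t, while F applied to an unsigned char should call f_uc. Which of f_ui, f_ul, or f_ull becomes unreachable through F depends on the platform's definition of size_t.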

const Overloading

Consider a singly linked list:

typedef struct slist slist;
struct slist {
  slist *next;
  void  *data;
};

and a function to scan the list looking for a node for which a given predicate function returns true:

typedef _Bool (*slist_pred_fn)( void const* );

slist* slist_find( slist *start, slist_pred_fn pred );

A problem you can encounter is if you try to pass a const slist:

void f( slist const *list ) {
  // ...
  slist const *found = slist_find( list, &my_pred );
  // ...
}

That will generate a “discards const” warning because you’re passing a const slist to a function that takes a non-const slist.

In C++, this would be an error.

While you could ignore the warning, it’s always best to write warning-free code. But how can it be fixed? You could cast away the const, but that’s ugly. In C++, you could add an overload of slist_find() that takes slist const*; in C, you’d have to write a distinctly named function:

inline
slist const* const_slist_find( slist const *start,
                               slist_pred_fn pred ) {
  return slist_find( (slist*)start, pred );
}

and call that instead for a const slist. While it works, it’s also ugly. However, _Generic can be used to hide the ugliness:

#define slist_find(LIST,PRED)      \
  _Generic( (LIST),                \
    slist*      : slist_find,      \
    slist const*: const_slist_find \
  )( (LIST), (PRED) )

Now, you can always call slist_find() and it will “just work” for either a const or non-const slist.

As with the earlier example, this works because the preprocessor will not expand a macro that references itself.

Assuming the above declarations are in a .h file and the actual implementation of slist_find() is in a .c file (which includes the .h file), then you’ll run into the problem where the slist_find() (at this point, a macro) in the definition will get expanded by the preprocessor resulting in syntax errors. There are a few different fixes for this:

  1. #undef slist_find just prior to the definition (but then if it’s also used later in the .c file, you won’t get the benefit of const overloading).
  2. Use a different name for the non-const function, e.g., nonconst_slist_find.
  3. Use extra parentheses like (slist_find) in the definition.

An example of #3 is:

// slist.c
slist* (slist_find)( slist *start, slist_pred_fn pred ) {
   // ...
}

This fix works for two reasons:

  1. In C, any declaration of the form T x (declare x of type T) can have extra parentheses added like T (x) without changing its meaning.
  2. The preprocessor will expand a function-like macro only if it’s followed by (.

Since the slist_find in the definition is followed by ) and not (, the preprocessor will not expand it.
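
Putting the pieces together in one translation unit might look like the sketch below. The linear-search body, the is_negative predicate, and demo_const_overload are assumptions made for illustration; the article doesn't show the real implementation. The sketch also uses static inline rather than plain inline so that it links on its own:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct slist slist;
struct slist {
  slist *next;
  void  *data;
};

typedef bool (*slist_pred_fn)( void const* );

slist* slist_find( slist *start, slist_pred_fn pred );

// Declared before the macro, so this calls the real function.
static inline slist const* const_slist_find( slist const *start,
                                             slist_pred_fn pred ) {
  return slist_find( (slist*)start, pred );
}

#define slist_find(LIST,PRED)      \
  _Generic( (LIST),                \
    slist*      : slist_find,      \
    slist const*: const_slist_find \
  )( (LIST), (PRED) )

// The extra parentheses keep the macro from expanding here.
slist* (slist_find)( slist *start, slist_pred_fn pred ) {
  for ( ; start != NULL; start = start->next ) {
    if ( pred( start->data ) )
      return start;
  }
  return NULL;
}

// Hypothetical predicate: true for nodes whose data is a negative int.
static bool is_negative( void const *data ) {
  return *(int const*)data < 0;
}

static void demo_const_overload( void ) {
  int   values[] = { 1, -2 };
  slist second = { NULL,    &values[1] };
  slist first  = { &second, &values[0] };

  slist       *found   = slist_find( &first, is_negative );
  slist const *clist   = &first;
  slist const *c_found = slist_find( clist, is_negative );

  assert( found == &second );
  assert( c_found == &second );
}
```

Both calls in demo_const_overload use the same name, yet the second one goes through const_slist_find and should compile without a “discards const” warning.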

Static if

We can also use _Generic to implement a “static if,” that is, an if that’s evaluated at compile-time (similar to if constexpr in C++):

#define STATIC_IF(EXPR,THEN,ELSE)     \
  _Generic( &(char[1 + !!(EXPR)]){0}, \
    char (*)[2]: (THEN),              \
    char (*)[1]: (ELSE)               \
  )

This works by:

  1. Converting EXPR to either 0 or 1 via !!.
  2. Creating a compound literal array having one element plus a second element only if EXPR is true.
  3. Taking the compound array’s address via & at which point its type is either “pointer to array 2 of char” (i.e., char(*)[2] if true) or “pointer to array 1 of char” (i.e., char(*)[1] if false).
  4. If the type is char(*)[2], the result is THEN; else:
  5. If the type is char(*)[1], the result is ELSE.

Reminder: in C, a “pointer to array N of T” (for some size N of some type T) is not the same as the “pointer to T” that results from the name of an array regardless of its size “decaying” into a pointer to its first element (e.g., array A being a shorthand for &A[0]). Pointers to arrays of different sizes are distinct types.
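
As a quick illustration of the steps above, here is a sketch that uses STATIC_IF with integer constant conditions. The describe_long function is a hypothetical example, not part of the technique itself:

```c
#include <assert.h>
#include <string.h>

#define STATIC_IF(EXPR,THEN,ELSE)     \
  _Generic( &(char[1 + !!(EXPR)]){0}, \
    char (*)[2]: (THEN),              \
    char (*)[1]: (ELSE)               \
  )

// Hypothetical use: pick a description of long at compile-time.
static char const* describe_long( void ) {
  return STATIC_IF( sizeof(long) > sizeof(int),
                    "long is wider than int",
                    "long is the same size as int" );
}
```

Note that EXPR has to be an integer constant expression: otherwise the compound literal would be a variable length array, which isn't allowed. STATIC_IF( 1, 10, 20 ) should yield 10 and STATIC_IF( 0, 10, 20 ) should yield 20; any non-zero condition is normalized to 1 by the !!.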

We can also build on TO_VOID_PTR to make IS_PTR_TO_CONST:

#define IS_PTR_TO_CONST(P)  \
  _Generic( TO_VOID_PTR(P), \
    void const* : 1,        \
    default     : 0         \
  )

Given those macros (and __VA_OPT__, which was standardized in C23), we can write a generalized macro that can const overload any function whose const variant is named with a const_ prefix:

#define CONST_OVERLOAD(FN, PTR, ...) \
  STATIC_IF( IS_PTR_TO_CONST(PTR),   \
    const_ ## FN,                    \
    (FN)                             \
  )( (PTR) __VA_OPT__(,) __VA_ARGS__ )

#define slist_find(LIST,PRED) \
  CONST_OVERLOAD( slist_find, (LIST), (PRED) )

Some Type Traits

Using _Generic, we can define macros similar to some type traits functions in C++.

Get whether T is a C string:

#define IS_C_STR(T)   \
  _Generic( (T),      \
    char*       : 1,  \
    char const* : 1,  \
    default     : 0   \
  )

Get whether T is a signed, unsigned, or any integral type:

#define IS_SIGNED(T)                      \
  _Generic( (T),                          \
    char       : IS_CHAR_SIGNED,          \
    signed char: 1,                       \
    short      : 1,                       \
    int        : 1,                       \
    long       : 1,                       \
    long long  : 1,                       \
    default    : 0                        \
  )

#define IS_UNSIGNED(T)                    \
  _Generic( (T),                          \
    _Bool             : 1,                \
    char              : !IS_CHAR_SIGNED,  \
    unsigned char     : 1,                \
    unsigned short    : 1,                \
    unsigned int      : 1,                \
    unsigned long     : 1,                \
    unsigned long long: 1,                \
    default           : 0                 \
  )

#define IS_INTEGRAL(T)  (IS_SIGNED(T) || IS_UNSIGNED(T))

As a reminder, in C, it’s implementation-defined whether char is signed or unsigned. To determine which:

#define IS_CHAR_SIGNED  STATIC_IF( (char)-1 < 0, 1, 0 )

That is, if -1 cast to char is actually < 0, then char is signed; otherwise it’s unsigned.
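
One way to sanity-check IS_CHAR_SIGNED is to compare it against CHAR_MIN from limits.h, which is negative exactly when char is signed. The following sketch repeats the macros so it stands alone; check_signedness is a hypothetical name:

```c
#include <assert.h>
#include <limits.h>

#define STATIC_IF(EXPR,THEN,ELSE)     \
  _Generic( &(char[1 + !!(EXPR)]){0}, \
    char (*)[2]: (THEN),              \
    char (*)[1]: (ELSE)               \
  )

#define IS_CHAR_SIGNED  STATIC_IF( (char)-1 < 0, 1, 0 )

#define IS_SIGNED(T)                      \
  _Generic( (T),                          \
    char       : IS_CHAR_SIGNED,          \
    signed char: 1,                       \
    short      : 1,                       \
    int        : 1,                       \
    long       : 1,                       \
    long long  : 1,                       \
    default    : 0                        \
  )

static void check_signedness( void ) {
  char c = 'x';
  // CHAR_MIN is negative exactly when plain char is signed.
  assert( IS_CHAR_SIGNED == (CHAR_MIN < 0) );
  assert( IS_SIGNED( c ) == (CHAR_MIN < 0) );
  assert( IS_SIGNED( -1 ) );     // int
  assert( !IS_SIGNED( 1u ) );    // unsigned int falls to default
}
```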

Get whether T is a floating-point type:

#define IS_FLOATING_POINT(T)  \
  _Generic( (T),              \
    float               : 1,  \
    double              : 1,  \
    long double         : 1,  \
    float _Complex      : 1,  \
    double _Complex     : 1,  \
    long double _Complex: 1,  \
    default             : 0   \
  )

Get whether T is any arithmetic type:

#define IS_ARITHMETIC(T) \
  (IS_INTEGRAL(T) || IS_FLOATING_POINT(T))

With C23’s typeof, get whether A is an array (as opposed to a pointer):

#define IS_ARRAY(A)       \
  _Generic( &(A),         \
    typeof(*A) (*)[]: 1,  \
    default         : 0   \
  )

This works because if A is actually an array:

  1. The &(A) yields “pointer to array of type T.”
  2. The A (inside typeof) “decays” into a pointer to its first element yielding “pointer to T,” i.e., T*.
  3. The *A dereferences T* yielding the element type T.
  4. Finally, T (*)[] yields “pointer to array of type T” which matches 1 above and _Generic returns 1 (true).

If A isn’t an array, e.g., a pointer, then none of the above works and _Generic matches the default case and returns 0 (false).

If you’re using a version of C prior to C23, both gcc and clang support typeof (or __typeof__) as an extension.
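
For instance, a pre-C23 variant might look like the following sketch, which assumes a gcc- or clang-compatible compiler and spells the operator __typeof__. The squares array is a hypothetical example:

```c
#include <assert.h>

// __typeof__ is the GNU spelling accepted by gcc and clang before C23;
// with a C23 compiler, typeof can be used instead.
#define IS_ARRAY(A)               \
  _Generic( &(A),                 \
    __typeof__(*(A)) (*)[]: 1,    \
    default               : 0     \
  )

static int squares[] = { 1, 4, 9, 16 };

// The result is usable at compile-time.
_Static_assert( IS_ARRAY( squares ), "squares must be an array" );
```

For an int *p, IS_ARRAY( p ) should be 0 because &(p) has type int**, which doesn't match int (*)[].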

Get whether P is a pointer (as opposed to an array):

#define IS_POINTER(P)         \
  _Generic( &(typeof(P)){0},  \
    typeof(*P) ** : 1,        \
    default       : 0         \
  )

This works similarly to STATIC_IF and IS_ARRAY. The reason the &(typeof(P)){0} is necessary instead of simply &(P) is for the case where you take the address of an object via & to yield a pointer rather than pass a pointer directly, e.g.:

#define MEM_ZERO(P) do {                                   \
  static_assert( IS_POINTER(P), #P " must be a pointer" ); \
  memset( (P), 0, sizeof( *(P) ) );                        \
} while (0)

struct S { /* ... */ };
struct S s;
MEM_ZERO( &s );

If &(P) were used, passing &s (an rvalue) would result in &(&s), which is illegal. However, using &(typeof(P)){0} results in a compound literal of type “pointer to struct S,” and compound literals are lvalues whose address you can take.
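
A self-contained sketch of IS_POINTER and MEM_ZERO follows. It again assumes the __typeof__ spelling so that it also builds before C23, and the members of struct S are made up for the example:

```c
#include <assert.h>   // static_assert (a keyword as of C23)
#include <string.h>   // memset

#define IS_POINTER(P)                 \
  _Generic( &(__typeof__(P)){0},      \
    __typeof__(*(P)) ** : 1,          \
    default             : 0           \
  )

#define MEM_ZERO(P) do {                                   \
  static_assert( IS_POINTER(P), #P " must be a pointer" ); \
  memset( (P), 0, sizeof( *(P) ) );                        \
} while (0)

struct S { int a; int b; };   // hypothetical members
```

MEM_ZERO( &s ) for a struct S s should zero both members, whereas MEM_ZERO( arr ) for an array arr should fail to compile because IS_POINTER( arr ) is 0.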

No SFINAE (Substitution Failure is not an Error)

Consider a string buffer type:

struct strbuf {
  char   *str;
  size_t  len;
  size_t  cap;
};
typedef struct strbuf strbuf_t;

Suppose you want to implement a macro STRLEN() that will get the length of either an ordinary C string or a strbuf_t. You might write something like:

#define STRLEN(S)               \
  _Generic( (S),                \
    char const* : strlen((S)),  \
    strbuf_t*   : (S)->len      \
  )

That is, if the type of S is:

  • char const*, call strlen(S); or:
  • strbuf_t*, return (S)->len.

That seems fairly straightforward. There’s just one problem: it won’t compile. Instead, you’ll get:

  1. strlen(S): warning: incompatible pointer types passing strbuf_t* to a parameter of type const char*.
  2. (S)->len: error: type const char is not a structure or union.

The problem with _Generic is that all expressions must be valid — even the expressions that are not selected. Specifically for this example:

  1. You can’t call strlen() on a strbuf_t*; and:
  2. You can’t refer to ->len on a char const*.

In C++ with SFINAE, something that isn’t valid when substituted is not an error: it’s simply ignored; unfortunately, not so in C.

The way to fix this is to make every _Generic expression similar. In this case, we can add a function:

static inline size_t strbuf_len( strbuf_t const *sbuf ) {
  return sbuf->len;
}

Then rewrite STRLEN():

#define STRLEN(S)             \
  _Generic( (S),              \
    char const* : strlen,     \
    strbuf_t*   : strbuf_len  \
  )( (S) )

This works because each expression is the name of a function to call and each is passed a single pointer of the type it expects. Note that it’s necessary to put the argument S outside the _Generic: if it were inside, then one function call would always be passing the wrong type.
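
Here is the function-name version assembled into something compilable. One addition that is not in the macro above: a char* association, so that string literals and char arrays (which decay to char*, not char const*) also match:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct strbuf {
  char   *str;
  size_t  len;
  size_t  cap;
};
typedef struct strbuf strbuf_t;

static inline size_t strbuf_len( strbuf_t const *sbuf ) {
  return sbuf->len;
}

// The char* association is an addition made here so that string literals,
// which decay to char* rather than char const*, also match.
#define STRLEN(S)             \
  _Generic( (S),              \
    char*       : strlen,     \
    char const* : strlen,     \
    strbuf_t*   : strbuf_len  \
  )( (S) )
```

STRLEN( "abc" ) should yield 3 via strlen, and STRLEN( &sbuf ) should yield sbuf.len via strbuf_len.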

If for whatever reason you don’t want to add an inline function, there is an alternative fix:

#define STRLEN(S)                                 \
  _Generic( (S),                                  \
    char const* : strlen( (char const*)(S) ),     \
    strbuf_t*   : ONLY_IF_T(strbuf_t*, (S))->len  \
  )

#define ONLY_IF_T(T,X)          \
  _Generic( (X),                \
    T       : (X),              \
    default : ((T)only_if_t())  \
  )

void* only_if_t( void );

Similar to an earlier example, this fix works by using an extra level of indirection. For STRLEN, if the type of S is char const*:

  • Then for the char const* case, the call to strlen() is already the right type. (The cast to char const* may seem unnecessary, but wait.)
  • But the strbuf_t* case must still be valid, so it calls ONLY_IF_T.

ONLY_IF_T treats the expression X as type T, but only if it really is of type T, in this case strbuf_t*:

  • If the type of X is actually of type strbuf_t*, then the result is just X.
  • However, if the type of X is any other type, it casts the result of only_if_t() to strbuf_t*.

The result of all this is that the strbuf_t* case of STRLEN compiles because it treats the type of S as if it were of type strbuf_t*, so the reference to ->len is valid.

But wait! Since we’re currently considering the case where the type of S is char const*, the strbuf_t* case casts whatever only_if_t() returns to strbuf_t* and refers to ->len through it. That would crash, or fail to link, if that line of code were ever actually executed; but it never is.

Why not? Remember: we’re currently considering the case where the type of S is char const* which means the char const* case in STRLEN will be selected. All the contortions for the strbuf_t* case are only to make the expression valid for the compiler’s sake. Once the expression passes a validity check, it’ll be discarded anyway.

A keen observer might be wondering what the only_if_t() function is for and what its definition is. Its declaration exists only to be something that can be cast to T. It’s intentionally not defined for two reasons:

  1. It’s never called since it only ever ends up in cases that the _Generic macros above discard anyway.
  2. If you were to make a mistake in the implementation of the macros resulting in the function actually being called, you’ll get an “undefined symbol” error at link-time to make you aware of your mistake.

For completeness, let’s consider the other case for STRLEN where the type of S is strbuf_t*:

  • For the char const* case, the call to strlen() passes a strbuf_t* (the wrong type), which is why that cast to char const* is there: to make the code valid. This case will be discarded anyway, so it doesn’t matter.
  • For the strbuf_t* case, the call to ONLY_IF_T will simply return S.

This fix is more convoluted than the initial fix (such is life in C without SFINAE), but, depending on what you’re doing, might be a better fit.
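
To check that the never-defined only_if_t() really doesn't cause a link error, the alternative can be assembled into one sketch. The struct layout is the one from earlier, condensed into a single typedef:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct strbuf {
  char   *str;
  size_t  len;
  size_t  cap;
} strbuf_t;

void* only_if_t( void );   // declared, intentionally never defined

#define ONLY_IF_T(T,X)          \
  _Generic( (X),                \
    T       : (X),              \
    default : ((T)only_if_t())  \
  )

#define STRLEN(S)                                 \
  _Generic( (S),                                  \
    char const* : strlen( (char const*)(S) ),     \
    strbuf_t*   : ONLY_IF_T(strbuf_t*, (S))->len  \
  )
```

With a char const *cs and a strbuf_t sbuf, both STRLEN( cs ) and STRLEN( &sbuf ) should compile and link, since the call to only_if_t() only ever appears in an association that _Generic discards.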

Conclusion

_Generic allows you to implement a veneer of function overloading (including const overloading) in C and do a limited form of compile-time type introspection.
