DEV Community

Paul J. Lucas

Advanced C Preprocessor Macros for a Type-Safe Varargs Substitute

#c

Introduction

As you may know, variadic functions in C are those that can take a varying number of arguments, the best known of which are printf and its related functions. As you may also know, variadic functions have a number of caveats, as I previously explained. Can type-safe variadic arguments be implemented in standard C? Yes!

A Type-Safe Variadic Argument

First, we need a type that can store a value for any one of C’s built-in types — a type-safe variadic argument (TSVA):

enum tsva_type {
  TSVA_BOOL,
  TSVA_CHAR,
  TSVA_SCHAR,
  TSVA_SHORT,
  TSVA_INT,
  TSVA_LONG,
  TSVA_LONG_LONG,
  TSVA_UCHAR,
  TSVA_USHORT,
  TSVA_UINT,
  TSVA_ULONG,
  TSVA_ULONG_LONG,
  TSVA_FLOAT,
  TSVA_DOUBLE,
  TSVA_PTR_CHAR,
  TSVA_PTR_CONST_CHAR,
  TSVA_PTR_VOID,
  TSVA_PTR_CONST_VOID
};
typedef enum tsva_type tsva_type_t;

struct tsva_value {
  union {
    bool                b;

    char                c;
    signed char         sc;
    short               s;
    int                 i;
    long                l;
    long long           ll;

    unsigned char       uc;
    unsigned short      us;
    unsigned int        ui;
    unsigned long       ul;
    unsigned long long  ull;

    float               f;
    double              d;

    char               *pc;
    char const         *pcc;

    void               *pv;
    void const         *pcv;
  };
  tsva_type_t           type;
};
typedef struct tsva_value tsva_value_t;

That is, tsva_value uses an anonymous union to store an argument’s value and the type member to store its type.

For char and void, the union has pointers to those types. Ideally, the union should have T* and T const* members for all the built-in types. They were elided here for brevity.

The union omits the long double type since it might be twice the size of double. Including it would double sizeof(tsva_value_t) for the sake of a type that’s rarely used.

The union also omits the _Complex floating-point types as well as the new C23 _Decimal32, _Decimal64, and _Decimal128 types since those would also increase sizeof(tsva_value_t) for types that are rarely used.

That said, sizeof(tsva_value_t) doesn’t really matter (as we’ll see). If you need any of the omitted types, feel free to add them.
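To put rough numbers on that trade-off, here’s a minimal sketch that compares a cut-down value struct with and without a long double member. The names small_value, big_value, and print_sizes are made up for this illustration, and the sizes it prints depend on your platform’s ABI.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical cut-down version of tsva_value whose largest union
// member is a double or a pointer.
struct small_value {
  union {
    int     i;
    double  d;
    void   *pv;
  };
  int type;
};

// The same struct with a long double member added to the union.
struct big_value {
  union {
    int          i;
    double       d;
    long double  ld;
    void        *pv;
  };
  int type;
};

// Adding a union member can only keep the size the same or grow it.
static_assert( sizeof(struct big_value) >= sizeof(struct small_value),
               "adding a union member cannot shrink the struct" );

// Prints both sizes; the actual numbers are ABI-specific.
void print_sizes( void ) {
  printf( "without long double: %zu\n", sizeof(struct small_value) );
  printf( "with long double:    %zu\n", sizeof(struct big_value) );
}
```

On ABIs where long double is 16 bytes with 16-byte alignment, the second struct will typically come out at twice the size of the first, but that’s something to measure rather than assume.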

Using Type-Safe Variadic Arguments

An example of using type-safe variadic arguments might be a function that can print its arguments like printf, but in a type-safe way:

void tsva_print( unsigned n, tsva_value_t const value[n] ) {
  for ( unsigned i = 0; i < n; ++i ) {
    switch ( value[i].type ) {
      case TSVA_CHAR:
        printf( "%c", value[i].c );
        break;
      case TSVA_INT:
        printf( "%d", value[i].i );
        break;

      // Other types ...

      case TSVA_PTR_CHAR:
      case TSVA_PTR_CONST_CHAR:
        fputs( value[i].pc, stdout );
        break;
    } // switch
  }
}

That is, type-safe variadic arguments are passed as an “array” (really, a pointer to its first element). Just as with standard variadic arguments, it’s left to the function’s author to decide how the number of arguments is communicated. In this example, the number of elements, n, is passed as an additional argument.
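To make the count-plus-array convention concrete, here’s a minimal sketch that fills in the array by hand with designated initializers. It uses a cut-down version of the types (three kinds only) and a hypothetical sibling function, tsva_format, that writes into a caller-supplied buffer rather than to stdout so the result is easy to inspect. Neither tsva_format nor format_demo is part of the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

// Cut-down versions of the article's types: three kinds only.
typedef enum { TSVA_CHAR, TSVA_INT, TSVA_PTR_CONST_CHAR } tsva_type_t;

typedef struct {
  union {
    char        c;
    int         i;
    char const *pcc;
  };
  tsva_type_t type;
} tsva_value_t;

// Hypothetical sibling of tsva_print: appends the text of each of the
// n values to buf (of size buf_size), truncating if necessary.
// Returns buf.
char* tsva_format( char *buf, size_t buf_size,
                   unsigned n, tsva_value_t const value[] ) {
  assert( buf != NULL && buf_size > 0 );
  size_t len = 0;
  buf[0] = '\0';
  for ( unsigned i = 0; i < n && len + 1 < buf_size; ++i ) {
    char  *const dst  = buf + len;
    size_t const room = buf_size - len;
    int written = 0;
    switch ( value[i].type ) {
      case TSVA_CHAR:
        written = snprintf( dst, room, "%c", value[i].c );
        break;
      case TSVA_INT:
        written = snprintf( dst, room, "%d", value[i].i );
        break;
      case TSVA_PTR_CONST_CHAR:
        written = snprintf( dst, room, "%s", value[i].pcc );
        break;
    }
    if ( written < 0 )
      break;                            // output error: stop early
    // If the text was truncated, only room - 1 characters were stored.
    len += (size_t)written < room ? (size_t)written : room - 1;
  }
  return buf;
}

// Builds the array by hand: every element names its own type.
char const* format_demo( void ) {
  static char buf[32];
  tsva_value_t const args[] = {
    { .type = TSVA_PTR_CONST_CHAR, .pcc = "x = " },
    { .type = TSVA_INT,            .i   = 42     },
    { .type = TSVA_CHAR,           .c   = '!'    },
  };
  unsigned const n = (unsigned)(sizeof args / sizeof args[0]);
  return tsva_format( buf, sizeof buf, n, args );
}
```

Calling format_demo() yields the string "x = 42!". Writing each element out like this is exactly the kind of tedium that invites a mismatch between the type tag and the union member actually set.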

But how is such a function called? Specifically, how are the tsva_value_t elements created? We can use some macros:

#define TSVA_INIT(TYPE, FIELD, VALUE) \
  (tsva_value_t){ .type = TSVA_##TYPE, .FIELD = (VALUE) }

#define TSVA_BOOL(VALUE)            TSVA_INIT( BOOL, b, (VALUE) )
#define TSVA_CHAR(VALUE)            TSVA_INIT( CHAR, c, (VALUE) )
#define TSVA_SCHAR(VALUE)           TSVA_INIT( SCHAR, sc, (VALUE) )
#define TSVA_SHORT(VALUE)           TSVA_INIT( SHORT, s, (VALUE) )
#define TSVA_INT(VALUE)             TSVA_INIT( INT, i, (VALUE) )
#define TSVA_LONG(VALUE)            TSVA_INIT( LONG, l, (VALUE) )
#define TSVA_LONG_LONG(VALUE)       TSVA_INIT( LONG_LONG, ll, (VALUE) )
#define TSVA_UCHAR(VALUE)           TSVA_INIT( UCHAR, uc, (VALUE) )
#define TSVA_USHORT(VALUE)          TSVA_INIT( USHORT, us, (VALUE) )
#define TSVA_UINT(VALUE)            TSVA_INIT( UINT, ui, (VALUE) )
#define TSVA_ULONG(VALUE)           TSVA_INIT( ULONG, ul, (VALUE) )
#define TSVA_ULONG_LONG(VALUE)      TSVA_INIT( ULONG_LONG, ull, (VALUE) )
#define TSVA_FLOAT(VALUE)           TSVA_INIT( FLOAT, f, (VALUE) )
#define TSVA_DOUBLE(VALUE)          TSVA_INIT( DOUBLE, d, (VALUE) )
#define TSVA_PTR_CHAR(VALUE)        TSVA_INIT( PTR_CHAR, pc, (VALUE) )
#define TSVA_PTR_CONST_CHAR(VALUE)  TSVA_INIT( PTR_CONST_CHAR, pcc, (VALUE) )
#define TSVA_PTR_VOID(VALUE)        TSVA_INIT( PTR_VOID, pv, (VALUE) )
#define TSVA_PTR_CONST_VOID(VALUE)  TSVA_INIT( PTR_CONST_VOID, pcv, (VALUE) )

Given those, we can call tsva_print like this:

tsva_print( 2, (tsva_value_t[]){
  TSVA_PTR_CONST_CHAR("Answer is: "), TSVA_INT(42)
} );

While that works, it’s a bit ugly and verbose. Let’s see if we can clean that up.
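One detail of the TSVA_ macros that may look suspicious is that each has the same name as an enumerator, e.g., TSVA_INT is both. That’s fine because a function-like macro is expanded only when its name is followed by “(”; everywhere else, the name refers to the enumerator. Here’s a cut-down sketch (two types only) that shows both uses side by side. The functions payload_of_int and tag_of_double are hypothetical and exist only for the demonstration.

```c
#include <assert.h>

// Cut-down versions of the article's types: two kinds only.
typedef enum { TSVA_INT, TSVA_DOUBLE } tsva_type_t;

typedef struct {
  union {
    int     i;
    double  d;
  };
  tsva_type_t type;
} tsva_value_t;

#define TSVA_INIT(TYPE, FIELD, VALUE) \
  (tsva_value_t){ .type = TSVA_##TYPE, .FIELD = (VALUE) }

// Same spellings as the enumerators above, but function-like, so the
// preprocessor expands them only when "(" follows the name.
#define TSVA_INT(VALUE)     TSVA_INIT( INT, i, (VALUE) )
#define TSVA_DOUBLE(VALUE)  TSVA_INIT( DOUBLE, d, (VALUE) )

// Returns the int stored by the TSVA_INT(...) macro.
int payload_of_int( void ) {
  tsva_value_t const v = TSVA_INT( 42 );
  // No "(" follows TSVA_INT here, so this is the enumerator.
  assert( v.type == TSVA_INT );
  return v.i;
}

// Returns the type tag stored by the TSVA_DOUBLE(...) macro.
tsva_type_t tag_of_double( void ) {
  tsva_value_t const v = TSVA_DOUBLE( 2.5 );
  return v.type;
}
```

The token pasting in TSVA_INIT is also why the first argument must match the enumerator’s suffix exactly: INT pastes to TSVA_INT, whereas a lowercase int would paste to TSVA_int, which doesn’t exist.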

Counting Variadic Arguments

One of the things that makes the current code ugly, not to mention error-prone, is having to count (correctly!) the number of arguments and pass that. Can the number of variadic arguments be counted? Yes:

#define VA_ARGS_COUNT(...) \
  ARG_11(__VA_ARGS__ __VA_OPT__(,) 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)

#define ARG_11(_1,_2,_3,_4,_5,_6,_7,_8,_9,_10,_11,...) _11

As first shown in an earlier article, the ARG_11 macro always returns its 11th argument. This time, however, it’s being used to implement VA_ARGS_COUNT. Just as before, the arguments in __VA_ARGS__ push the trailing list of numbers to the right, which “slides” the correct answer (in this case, the number of arguments) into the 11th argument position. As written, this works for at most 10 arguments. Given that, we can now do:

#define tsva_print(...)                     \
  tsva_print( VA_ARGS_COUNT( __VA_ARGS__ ), \
              (tsva_value_t[]){ __VA_ARGS__ } )

Since we’re defining a macro anyway, we might as well have it insert the (tsva_value_t[]) compound literal boilerplate code too. Now both the explicit count and the boilerplate can be eliminated:

tsva_print( TSVA_PTR_CONST_CHAR("Answer is: "), TSVA_INT(42) );

That’s better, but the required use of the TSVA_ macros is still quite verbose. Can those be eliminated? As it happens, yes (with a caveat).
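Before moving on, the counting macro is easy to check in isolation. The following small sketch uses static_assert to confirm a few counts at compile time. The function count_of_eleven is hypothetical and exists only to show what happens past the 10-argument limit. Note that __VA_OPT__ is a C23 feature, so this assumes either a C23 compiler or one that accepts it as an extension in earlier modes.

```c
#include <assert.h>

#define VA_ARGS_COUNT(...) \
  ARG_11(__VA_ARGS__ __VA_OPT__(,) 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)

#define ARG_11(_1,_2,_3,_4,_5,_6,_7,_8,_9,_10,_11,...) _11

// The arguments are only counted, never evaluated, so any tokens work.
static_assert( VA_ARGS_COUNT()        == 0, "no arguments" );
static_assert( VA_ARGS_COUNT(a)       == 1, "one argument" );
static_assert( VA_ARGS_COUNT(a, b, c) == 3, "three arguments" );
static_assert( VA_ARGS_COUNT(1,2,3,4,5,6,7,8,9,10) == 10, "upper limit" );

// With eleven arguments the result is silently wrong: the 11th
// argument is now the caller's own last argument, not a count.
int count_of_eleven( void ) {
  return VA_ARGS_COUNT(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 99);
}
```

Here count_of_eleven() returns 99 rather than 11, so if more than 10 arguments are plausible, both the number list and ARG_11 need to be extended together.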

Deducing the Type of Variadic Arguments

What we need is a way to deduce the type of each argument automatically. As I described in a previous article, _Generic can be used.

However, as noted in that article, each type case for _Generic must be valid C, whether or not it’s the one selected. Here, that means we can’t use the TSVA_ macros because every case would attempt to assign the given VALUE to its own union field, which would cause either warnings (e.g., assigning a floating-point value to .i) or errors (e.g., assigning a pointer value to .f). Instead, we can use helper functions:

static inline tsva_value_t tsva__Bool( _Bool value ) {
  return (tsva_value_t){ .type = TSVA_BOOL, .b = value };
}

static inline tsva_value_t tsva_char( char value ) {
  return (tsva_value_t){ .type = TSVA_CHAR, .c = value };
}

static inline tsva_value_t tsva_signed_char( signed char value ) {
  return (tsva_value_t){ .type = TSVA_SCHAR, .sc = value };
}

static inline tsva_value_t tsva_short( short value ) {
  return (tsva_value_t){ .type = TSVA_SHORT, .s = value };
}

static inline tsva_value_t tsva_int( int value ) {
  return (tsva_value_t){ .type = TSVA_INT, .i = value };
}

static inline tsva_value_t tsva_long( long value ) {
  return (tsva_value_t){ .type = TSVA_LONG, .l = value };
}

static inline tsva_value_t tsva_long_long( long long value ) {
  return (tsva_value_t){ .type = TSVA_LONG_LONG, .ll = value };
}

static inline tsva_value_t tsva_unsigned_char( unsigned char value ) {
  return (tsva_value_t){ .type = TSVA_UCHAR, .uc = value };
}

static inline tsva_value_t tsva_unsigned_short( unsigned short value ) {
  return (tsva_value_t){ .type = TSVA_USHORT, .us = value };
}

static inline tsva_value_t tsva_unsigned_int( unsigned int value ) {
  return (tsva_value_t){ .type = TSVA_UINT, .ui = value };
}

static inline tsva_value_t tsva_unsigned_long( unsigned long value ) {
  return (tsva_value_t){ .type = TSVA_ULONG, .ul = value };
}

static inline tsva_value_t tsva_unsigned_long_long( unsigned long long value ) {
  return (tsva_value_t){ .type = TSVA_ULONG_LONG, .ull = value };
}

static inline tsva_value_t tsva_float( float value ) {
  return (tsva_value_t){ .type = TSVA_FLOAT, .f = value };
}

static inline tsva_value_t tsva_double( double value ) {
  return (tsva_value_t){ .type = TSVA_DOUBLE, .d = value };
}

static inline tsva_value_t tsva_ptr_char( char *value ) {
  return (tsva_value_t){ .type = TSVA_PTR_CHAR, .pc = value };
}

static inline tsva_value_t tsva_ptr_const_char( char const *value ) {
  return (tsva_value_t){ .type = TSVA_PTR_CONST_CHAR, .pcc = value };
}

static inline tsva_value_t tsva_ptr_void( void *value ) {
  return (tsva_value_t){ .type = TSVA_PTR_VOID, .pv = value };
}

static inline tsva_value_t tsva_ptr_const_void( void const *value ) {
  return (tsva_value_t){ .type = TSVA_PTR_CONST_VOID, .pcv = value };
}

Given that, we can implement TSVA_VALUE using _Generic:

#define TSVA_VALUE(VALUE)                        \
  _Generic( (VALUE),                             \
    _Bool             : tsva__Bool,              \
    char              : tsva_char,               \
    signed char       : tsva_signed_char,        \
    short             : tsva_short,              \
    int               : tsva_int,                \
    long              : tsva_long,               \
    long long         : tsva_long_long,          \
    unsigned char     : tsva_unsigned_char,      \
    unsigned short    : tsva_unsigned_short,     \
    unsigned int      : tsva_unsigned_int,       \
    unsigned long     : tsva_unsigned_long,      \
    unsigned long long: tsva_unsigned_long_long, \
    float             : tsva_float,              \
    double            : tsva_double,             \
    char*             : tsva_ptr_char,           \
    char const*       : tsva_ptr_const_char,     \
    void*             : tsva_ptr_void,           \
    void const*       : tsva_ptr_const_void      \
  )( (VALUE) )

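Now every type case is valid no matter what VALUE is because each one merely names a function (which decays to a pointer to that function); only the selected function is then called with VALUE. Hence, you can now do: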

tsva_print( TSVA_VALUE("Answer is: "), TSVA_VALUE(42) );

and the type of each argument is automatically deduced. That’s better in that only a single macro is used, but it’s still verbose. Can those macros be eliminated completely?
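One detail of that deduction is worth checking first: which case a string literal selects. In C, a string literal has type char[N] (not char const[N]), which decays to char*, so "Answer is: " selects the char* case rather than the char const* one. Here’s a cut-down sketch of the same select-a-function-then-call-it pattern. The name_ helpers, TYPE_NAME, and type_name_demo are hypothetical stand-ins for the tsva_ helpers and TSVA_VALUE.

```c
#include <assert.h>
#include <string.h>

// Hypothetical helpers for this sketch: each one ignores its argument
// and returns the name of its parameter type.
static inline char const* name_int( int v ) {
  (void)v;
  return "int";
}

static inline char const* name_double( double v ) {
  (void)v;
  return "double";
}

static inline char const* name_ptr_char( char *v ) {
  (void)v;
  return "char*";
}

static inline char const* name_ptr_const_char( char const *v ) {
  (void)v;
  return "char const*";
}

// Same shape as TSVA_VALUE: _Generic selects a function by the type
// of VALUE, then that one function is called with VALUE.
#define TYPE_NAME(VALUE)                 \
  _Generic( (VALUE),                     \
    int         : name_int,              \
    double      : name_double,           \
    char*       : name_ptr_char,         \
    char const* : name_ptr_const_char    \
  )( (VALUE) )

void type_name_demo( void ) {
  char const *s = "Answer is: ";

  // A string literal decays to char*, so the non-const case is chosen.
  assert( strcmp( TYPE_NAME( "Answer is: " ), "char*" ) == 0 );

  // A pointer declared char const* selects the const case.
  assert( strcmp( TYPE_NAME( s ), "char const*" ) == 0 );

  assert( strcmp( TYPE_NAME( 42 ),  "int"    ) == 0 );
  assert( strcmp( TYPE_NAME( 4.2 ), "double" ) == 0 );
}
```

This is why a function that consumes the values has to handle both pointer-to-char cases: which one an argument lands in depends on how it was declared, not on whether the callee intends to modify it.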

Transforming Variadic Arguments

What’s needed is a way to wrap each argument x as TSVA_VALUE(x). Is there a way to have the preprocessor iterate over variadic arguments? Not directly, but you can via more macro voodoo:

#define APPLY_1(MACRO, ARG)        MACRO(ARG)
#define APPLY_2(MACRO, ARG, ...)   MACRO(ARG), APPLY_1(MACRO, __VA_ARGS__)
#define APPLY_3(MACRO, ARG, ...)   MACRO(ARG), APPLY_2(MACRO, __VA_ARGS__)
#define APPLY_4(MACRO, ARG, ...)   MACRO(ARG), APPLY_3(MACRO, __VA_ARGS__)
#define APPLY_5(MACRO, ARG, ...)   MACRO(ARG), APPLY_4(MACRO, __VA_ARGS__)
#define APPLY_6(MACRO, ARG, ...)   MACRO(ARG), APPLY_5(MACRO, __VA_ARGS__)
#define APPLY_7(MACRO, ARG, ...)   MACRO(ARG), APPLY_6(MACRO, __VA_ARGS__)
#define APPLY_8(MACRO, ARG, ...)   MACRO(ARG), APPLY_7(MACRO, __VA_ARGS__)
#define APPLY_9(MACRO, ARG, ...)   MACRO(ARG), APPLY_8(MACRO, __VA_ARGS__)
#define APPLY_10(MACRO, ARG, ...)  MACRO(ARG), APPLY_9(MACRO, __VA_ARGS__)

#define APPLY_FOR_EACH(MACRO, ...) \
  NAME2(APPLY_, VA_ARGS_COUNT(__VA_ARGS__))(MACRO, __VA_ARGS__)

#define NAME2(A,B)                NAME2_HELPER(A,B)
#define NAME2_HELPER(A,B)         A ## B

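That is, we can use VA_ARGS_COUNT to count the number of arguments (say, 4), then construct the name of the corresponding APPLY_ macro (say, APPLY_4), which expands in turn to APPLY_3, APPLY_2, and finally APPLY_1. (The preprocessor doesn’t allow a macro to expand itself recursively, which is why there’s a separate macro for each count.) Hence, the APPLY_FOR_EACH macro applies MACRO to each variadic argument, up to the 10 supported here. Given that, we can define: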

#define tsva_print(...)                      \
  tsva_print( VA_ARGS_COUNT( __VA_ARGS__ ),  \
              (tsva_value_t[]){ APPLY_FOR_EACH(TSVA_VALUE, __VA_ARGS__) } )

and finally do:

tsva_print( "Answer is: ", 42 );

and it just works.
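To see the iteration machinery on its own, here’s a sketch that applies a trivial macro to every argument and collects the results in an array. SQUARE, SUM_OF_SQUARES, sum_ints, and sum_demo are hypothetical names used only for this demonstration, and only APPLY_1 through APPLY_4 are defined to keep it short. As with VA_ARGS_COUNT, it assumes a compiler that supports __VA_OPT__.

```c
#include <assert.h>

#define VA_ARGS_COUNT(...) \
  ARG_11(__VA_ARGS__ __VA_OPT__(,) 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
#define ARG_11(_1,_2,_3,_4,_5,_6,_7,_8,_9,_10,_11,...) _11

// Only the first four APPLY_n macros, to keep the sketch short.
#define APPLY_1(MACRO, ARG)       MACRO(ARG)
#define APPLY_2(MACRO, ARG, ...)  MACRO(ARG), APPLY_1(MACRO, __VA_ARGS__)
#define APPLY_3(MACRO, ARG, ...)  MACRO(ARG), APPLY_2(MACRO, __VA_ARGS__)
#define APPLY_4(MACRO, ARG, ...)  MACRO(ARG), APPLY_3(MACRO, __VA_ARGS__)

#define APPLY_FOR_EACH(MACRO, ...) \
  NAME2(APPLY_, VA_ARGS_COUNT(__VA_ARGS__))(MACRO, __VA_ARGS__)

// The extra level of indirection makes sure VA_ARGS_COUNT(...) is
// expanded to a number before it is pasted onto APPLY_.
#define NAME2(A,B)         NAME2_HELPER(A,B)
#define NAME2_HELPER(A,B)  A ## B

// Hypothetical per-argument macro used only for this demonstration.
#define SQUARE(X)  ((X) * (X))

// SUM_OF_SQUARES(1, 2, 3) expands to
// sum_ints( 3, (int[]){ ((1) * (1)), ((2) * (2)), ((3) * (3)) } ).
#define SUM_OF_SQUARES(...)                \
  sum_ints( VA_ARGS_COUNT( __VA_ARGS__ ),  \
            (int[]){ APPLY_FOR_EACH(SQUARE, __VA_ARGS__) } )

// Returns the sum of the n elements of value.
int sum_ints( unsigned n, int const value[] ) {
  int sum = 0;
  for ( unsigned i = 0; i < n; ++i )
    sum += value[i];
  return sum;
}

int sum_demo( void ) {
  return SUM_OF_SQUARES( 1, 2, 3 );     // 1 + 4 + 9
}
```

Here sum_demo() returns 14. Note that an empty argument list would construct the name APPLY_0, which isn’t defined, so calls with zero arguments fail to compile unless such a macro is added.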

Conclusion

As this extended example once again shows, the C preprocessor has its own weird text processing language inside of C. The highlighted techniques of:

  1. Argument “sliding” as shown in ARG_11;
  2. Argument counting via VA_ARGS_COUNT; and
  3. Transforming arguments via APPLY_FOR_EACH

are often used in advanced macros. In this case, they can provide type-safe variadic arguments.

Caveats

While the version of this code that uses the explicit macros of TSVA_BOOL, TSVA_CHAR, etc., has no caveats, the version that uses _Generic does, specifically:

  1. char literals are deduced to int, not char; and
  2. _Bool literals of false and true are also deduced to int, not _Bool. (However, this has been fixed in C23; see below.)

Hence:

char c = '*';
_Bool b = false;

tsva_print(c);      // prints "*"
tsva_print(b);      // prints "false"

tsva_print('*');    // prints "42", not "*"
tsva_print(false);  // prints "0", not "false"

This isn’t a problem with _Generic. In C, the type of a char literal is simply defined to be int, not char. Prior to C11 when _Generic was added, this never mattered; but now it matters.
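One possible workaround, which isn’t part of the code above but follows from how _Generic works, is to cast the literal to char at the call site since the type of a cast expression is the type cast to. The following sketch checks all three situations. IS_CHAR and variable_is_char are hypothetical names for this illustration.

```c
#include <assert.h>

// Evaluates to 1 if VALUE has type char, 0 for any other type.
#define IS_CHAR(VALUE)  _Generic( (VALUE), char: 1, default: 0 )

// A char literal has type int, so the char case is not selected ...
static_assert( IS_CHAR( '*' ) == 0, "a char literal has type int" );

// ... but a cast to char has type char, so it is.
static_assert( IS_CHAR( (char)'*' ) == 1, "a cast yields type char" );

// A variable declared char also selects the char case.
int variable_is_char( void ) {
  char c = '*';
  return IS_CHAR( c );
}
```

Under that assumption, a call such as tsva_print( (char)'*' ) would select the char case and print “*”, at the cost of the caller having to remember the cast.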

The C++ Committee fixed it in C++ because they wanted the ability to overload functions by char and you can’t do that consistently if char variables are of type char but char literals are of type int. The C Committee should have fixed the type of char literals in C11.

Now in C23 with both auto and typeof, the problem is even worse:

auto x = '*';   // deduces int, not char
typeof('*') y;  // type is int, not char

Similarly, when _Bool was added in C99, it was a transitional type. Specifically, false and true were not added as keywords, but only as macros in stdbool.h that expand to the integers 0 and 1, respectively. At least this has finally been fixed in C23 now that false and true are proper keywords of type bool.
