Introduction
As you may know, variadic functions in C are those that can take a varying number of arguments; the best known are printf and its related functions. As you may also know, variadic functions come with a number of caveats, as I previously explained. Can type-safe variadic arguments be implemented in standard C? Yes!
A Type-Safe Variadic Argument
First, we need a type that can store a value for any one of C’s built-in types — a type-safe variadic argument (TSVA):
#include <stdbool.h>    // for bool (not needed as of C23)

enum tsva_type {
  TSVA_BOOL,
  TSVA_CHAR,
  TSVA_SCHAR,
  TSVA_SHORT,
  TSVA_INT,
  TSVA_LONG,
  TSVA_LONG_LONG,
  TSVA_UCHAR,
  TSVA_USHORT,
  TSVA_UINT,
  TSVA_ULONG,
  TSVA_ULONG_LONG,
  TSVA_FLOAT,
  TSVA_DOUBLE,
  TSVA_PTR_CHAR,
  TSVA_PTR_CONST_CHAR,
  TSVA_PTR_VOID,
  TSVA_PTR_CONST_VOID
};
typedef enum tsva_type tsva_type_t;

struct tsva_value {
  union {
    bool b;
    char c;
    signed char sc;
    short s;
    int i;
    long l;
    long long ll;
    unsigned char uc;
    unsigned short us;
    unsigned int ui;
    unsigned long ul;
    unsigned long long ull;
    float f;
    double d;
    char *pc;
    char const *pcc;
    void *pv;
    void const *pcv;
  };
  tsva_type_t type;
};
typedef struct tsva_value tsva_value_t;
That is, tsva_value uses an anonymous union to store an argument’s value and the type member to store its type.
- For char and void, the union has pointers to those types. Ideally, the union should have T* and T const* for all the built-in types. They were elided here for brevity.
- The union omits the long double type since it might be double the size of double, which would double sizeof(tsva_value_t) for a type that’s rarely used.
- The union also omits the _Complex floating-point types as well as the new C23 _Decimal32, _Decimal64, and _Decimal128 types since those would also increase sizeof(tsva_value_t) for types that are rarely used.

That said, sizeof(tsva_value_t) doesn’t really matter (as we’ll see). If you need any of the omitted types, feel free to add them.
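To make the size trade-off concrete, here is a minimal sketch that compares a reduced tagged union with and without a long double member. The sk_ names are hypothetical and exist only for this illustration; the reduced member list is an assumption made to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>

// Reduced tag set; the article's enum has many more members.
typedef enum { SK_INT, SK_DOUBLE, SK_PTR_CONST_CHAR } sk_type_t;

// Same layout idea as tsva_value: an anonymous union plus a tag.
typedef struct {
  union {
    int         i;
    double      d;
    char const *pcc;
  };
  sk_type_t type;
} sk_value_t;

// Hypothetical variant that also stores long double, for size comparison.
typedef struct {
  union {
    int         i;
    double      d;
    char const *pcc;
    long double ld;
  };
  sk_type_t type;
} sk_value_ld_t;

// Extra bytes per value that the long double member costs on this platform
// (may be 0 where long double has the same size as double).
static inline size_t sk_long_double_cost( void ) {
  return sizeof(sk_value_ld_t) - sizeof(sk_value_t);
}

// Reads a value back only through the member named by its tag.
static inline bool sk_is_int_equal( sk_value_t v, int expected ) {
  return v.type == SK_INT && v.i == expected;
}
```

The actual cost is platform-dependent, so the sketch computes it rather than stating a number.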
Using Type-Safe Variadic Arguments
An example of using type-safe variadic arguments might be a function that can print its arguments like printf, but in a type-safe way:
#include <stdio.h>

void tsva_print( unsigned n, tsva_value_t const value[n] ) {
  for ( unsigned i = 0; i < n; ++i ) {
    switch ( value[i].type ) {
      case TSVA_CHAR:
        printf( "%c", value[i].c );
        break;
      case TSVA_INT:
        printf( "%d", value[i].i );
        break;
      // Other types ...
      case TSVA_PTR_CHAR:
        fputs( value[i].pc, stdout );
        break;
      case TSVA_PTR_CONST_CHAR:
        // Read the member that matches the tag.
        fputs( value[i].pcc, stdout );
        break;
    } // switch
  }
}
That is, type-safe variadic arguments will be passed as an “array” (really, a pointer). Just as with standard variadic arguments, type-safe variadic arguments leave it to the caller to communicate how many arguments there are. In this example, the number of elements, n, is also passed as an argument.
But how is such a function called? Specifically, how are the tsva_value_t elements created? We can use some macros:
#define TSVA_INIT(TYPE, FIELD, VALUE) \
  (tsva_value_t){ .type = TSVA_##TYPE, .FIELD = (VALUE) }
#define TSVA_BOOL(VALUE) TSVA_INIT( BOOL, b, (VALUE) )
#define TSVA_CHAR(VALUE) TSVA_INIT( CHAR, c, (VALUE) )
#define TSVA_SCHAR(VALUE) TSVA_INIT( SCHAR, sc, (VALUE) )
#define TSVA_SHORT(VALUE) TSVA_INIT( SHORT, s, (VALUE) )
#define TSVA_INT(VALUE) TSVA_INIT( INT, i, (VALUE) )
#define TSVA_LONG(VALUE) TSVA_INIT( LONG, l, (VALUE) )
#define TSVA_LONG_LONG(VALUE) TSVA_INIT( LONG_LONG, ll, (VALUE) )
#define TSVA_UCHAR(VALUE) TSVA_INIT( UCHAR, uc, (VALUE) )
#define TSVA_USHORT(VALUE) TSVA_INIT( USHORT, us, (VALUE) )
#define TSVA_UINT(VALUE) TSVA_INIT( UINT, ui, (VALUE) )
#define TSVA_ULONG(VALUE) TSVA_INIT( ULONG, ul, (VALUE) )
#define TSVA_ULONG_LONG(VALUE) TSVA_INIT( ULONG_LONG, ull, (VALUE) )
#define TSVA_FLOAT(VALUE) TSVA_INIT( FLOAT, f, (VALUE) )
#define TSVA_DOUBLE(VALUE) TSVA_INIT( DOUBLE, d, (VALUE) )
#define TSVA_PTR_CHAR(VALUE) TSVA_INIT( PTR_CHAR, pc, (VALUE) )
#define TSVA_PTR_CONST_CHAR(VALUE) TSVA_INIT( PTR_CONST_CHAR, pcc, (VALUE) )
#define TSVA_PTR_VOID(VALUE) TSVA_INIT( PTR_VOID, pv, (VALUE) )
#define TSVA_PTR_CONST_VOID(VALUE) TSVA_INIT( PTR_CONST_VOID, pcv, (VALUE) )
Given those, we can call tsva_print like this:
tsva_print( 2, (tsva_value_t[]){
TSVA_PTR_CONST_CHAR("Answer is: "), TSVA_INT(42)
} );
While that works, it’s a bit ugly and verbose. Let’s see if we can clean that up.
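Before cleaning it up, here is the explicit version assembled into one compilable sketch. It is reduced to two types, and it uses a hypothetical tsva_format function (not part of the code above) that writes into a caller-supplied buffer instead of stdout so the result is easy to inspect.

```c
#include <stdio.h>

typedef enum { TSVA_INT, TSVA_PTR_CONST_CHAR } tsva_type_t;

typedef struct {
  union {
    int         i;
    char const *pcc;
  };
  tsva_type_t type;
} tsva_value_t;

#define TSVA_INIT(TYPE, FIELD, VALUE) \
  (tsva_value_t){ .type = TSVA_##TYPE, .FIELD = (VALUE) }
#define TSVA_INT(VALUE)            TSVA_INIT( INT, i, (VALUE) )
#define TSVA_PTR_CONST_CHAR(VALUE) TSVA_INIT( PTR_CONST_CHAR, pcc, (VALUE) )

// Hypothetical buffer-writing variant of tsva_print.
// Returns the length of the string left in buf, or -1 on an output error.
// Output is truncated if buf is too small.
static int tsva_format( char *buf, size_t size, unsigned n,
                        tsva_value_t const value[] ) {
  if ( size == 0 )
    return 0;
  size_t len = 0;
  buf[0] = '\0';
  for ( unsigned i = 0; i < n; ++i ) {
    int rv = 0;
    switch ( value[i].type ) {
      case TSVA_INT:
        rv = snprintf( buf + len, size - len, "%d", value[i].i );
        break;
      case TSVA_PTR_CONST_CHAR:
        rv = snprintf( buf + len, size - len, "%s", value[i].pcc );
        break;
    }
    if ( rv < 0 )
      return -1;                      // output error
    if ( (size_t)rv >= size - len )   // output was truncated
      return (int)(size - 1);
    len += (size_t)rv;
  }
  return (int)len;
}
```

A call mirrors the tsva_print call above: tsva_format( buf, sizeof buf, 2, (tsva_value_t[]){ TSVA_PTR_CONST_CHAR("Answer is: "), TSVA_INT(42) } ). The count of 2 still has to be written by hand.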
Counting Variadic Arguments
One of the things that makes the current code ugly, not to mention error-prone, is having to count (correctly!) the number of arguments and pass that. Can the number of variadic arguments be counted? Yes:
#define VA_ARGS_COUNT(...) \
  ARG_11(__VA_ARGS__ __VA_OPT__(,) 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
#define ARG_11(_1,_2,_3,_4,_5,_6,_7,_8,_9,_10,_11,...) _11
As first shown in an earlier article, the ARG_11 macro always returns its 11th argument. This time, however, it’s being used to implement VA_ARGS_COUNT. Just as before, __VA_ARGS__ will “slide” the correct answer (in this case, the number of arguments) into the 11th argument position. Given that, we can now do:
#define tsva_print(...) \
  tsva_print( VA_ARGS_COUNT( __VA_ARGS__ ), \
    (tsva_value_t[]){ __VA_ARGS__ } )
Since we’re defining a macro anyway, we might as well have it insert the (tsva_value_t[]) compound literal boilerplate code. Now neither the count nor the boilerplate needs to be specified explicitly:
tsva_print( TSVA_PTR_CONST_CHAR("Answer is: "), TSVA_INT(42) );
That’s better, but the required use of the TSVA_ macros is still quite verbose. Can those be eliminated? As it happens, yes (with a caveat).
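To see the “sliding” in the counting trick from this section, here is a small sketch that traces one expansion by hand in comments and checks a few counts at compile time with _Static_assert. It assumes a compiler that supports __VA_OPT__ (C23, or recent GCC and Clang as an extension).

```c
#define VA_ARGS_COUNT(...) \
  ARG_11(__VA_ARGS__ __VA_OPT__(,) 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
#define ARG_11(_1,_2,_3,_4,_5,_6,_7,_8,_9,_10,_11,...) _11

// Hand expansion for two arguments:
//
//   VA_ARGS_COUNT(x, y)
//     -> ARG_11(x, y, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
//
//   position:   1  2   3  4  5  6  7  8  9 10 11
//   argument:   x  y  10  9  8  7  6  5  4  3  2   <- 11th is 2
//
// Each extra argument pushes the descending list one slot to the right,
// so the value that lands in slot 11 is the argument count.

_Static_assert( VA_ARGS_COUNT() == 0, "no arguments" );
_Static_assert( VA_ARGS_COUNT(x) == 1, "one argument" );
_Static_assert( VA_ARGS_COUNT(x, y) == 2, "two arguments" );
_Static_assert( VA_ARGS_COUNT(a, b, c, d, e, f, g, h, i, j) == 10,
                "ten arguments" );
```

The arguments themselves are discarded during expansion, which is why undeclared names such as x and y are harmless here. With more than 10 arguments, slot 11 holds one of the caller’s own arguments, so the count is only meaningful for 0 through 10.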
Deducing the Type of Variadic Arguments
What we need is a way to deduce the type of each argument automatically. As I described in a previous article, _Generic can be used.
However, as noted in that article, every association in a _Generic must be valid C, not just the one that is selected. Here, that means we can’t use the TSVA_ macros because they would attempt to assign a given VALUE to every union field, which would cause either warnings (e.g., assigning a floating-point value to .i) or errors (e.g., assigning a pointer value to .f). Instead, we can use helper functions:
static inline tsva_value_t tsva__Bool( _Bool value ) {
  return (tsva_value_t){ .type = TSVA_BOOL, .b = value };
}
static inline tsva_value_t tsva_char( char value ) {
  return (tsva_value_t){ .type = TSVA_CHAR, .c = value };
}
static inline tsva_value_t tsva_signed_char( signed char value ) {
  return (tsva_value_t){ .type = TSVA_SCHAR, .sc = value };
}
static inline tsva_value_t tsva_short( short value ) {
  return (tsva_value_t){ .type = TSVA_SHORT, .s = value };
}
static inline tsva_value_t tsva_int( int value ) {
  return (tsva_value_t){ .type = TSVA_INT, .i = value };
}
static inline tsva_value_t tsva_long( long value ) {
  return (tsva_value_t){ .type = TSVA_LONG, .l = value };
}
static inline tsva_value_t tsva_long_long( long long value ) {
  return (tsva_value_t){ .type = TSVA_LONG_LONG, .ll = value };
}
static inline tsva_value_t tsva_unsigned_char( unsigned char value ) {
  return (tsva_value_t){ .type = TSVA_UCHAR, .uc = value };
}
static inline tsva_value_t tsva_unsigned_short( unsigned short value ) {
  return (tsva_value_t){ .type = TSVA_USHORT, .us = value };
}
static inline tsva_value_t tsva_unsigned_int( unsigned int value ) {
  return (tsva_value_t){ .type = TSVA_UINT, .ui = value };
}
static inline tsva_value_t tsva_unsigned_long( unsigned long value ) {
  return (tsva_value_t){ .type = TSVA_ULONG, .ul = value };
}
static inline tsva_value_t tsva_unsigned_long_long( unsigned long long value ) {
  return (tsva_value_t){ .type = TSVA_ULONG_LONG, .ull = value };
}
static inline tsva_value_t tsva_float( float value ) {
  return (tsva_value_t){ .type = TSVA_FLOAT, .f = value };
}
static inline tsva_value_t tsva_double( double value ) {
  return (tsva_value_t){ .type = TSVA_DOUBLE, .d = value };
}
static inline tsva_value_t tsva_ptr_char( char *value ) {
  return (tsva_value_t){ .type = TSVA_PTR_CHAR, .pc = value };
}
static inline tsva_value_t tsva_ptr_const_char( char const *value ) {
  return (tsva_value_t){ .type = TSVA_PTR_CONST_CHAR, .pcc = value };
}
static inline tsva_value_t tsva_ptr_void( void *value ) {
  return (tsva_value_t){ .type = TSVA_PTR_VOID, .pv = value };
}
static inline tsva_value_t tsva_ptr_const_void( void const *value ) {
  return (tsva_value_t){ .type = TSVA_PTR_CONST_VOID, .pcv = value };
}
Given that, we can implement TSVA_VALUE using _Generic:
#define TSVA_VALUE(VALUE) \
  _Generic( (VALUE), \
    _Bool : tsva__Bool, \
    char : tsva_char, \
    signed char : tsva_signed_char, \
    short : tsva_short, \
    int : tsva_int, \
    long : tsva_long, \
    long long : tsva_long_long, \
    unsigned char : tsva_unsigned_char, \
    unsigned short : tsva_unsigned_short, \
    unsigned int : tsva_unsigned_int, \
    unsigned long : tsva_unsigned_long, \
    unsigned long long: tsva_unsigned_long_long, \
    float : tsva_float, \
    double : tsva_double, \
    char* : tsva_ptr_char, \
    char const* : tsva_ptr_const_char, \
    void* : tsva_ptr_void, \
    void const* : tsva_ptr_const_void \
  )( (VALUE) )
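Now every association in the _Generic is simply the name of a function, which is valid C no matter what type VALUE has; only the selected function is then called with VALUE. Hence, you can now instead do: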
tsva_print( TSVA_VALUE("Answer is: "), TSVA_VALUE(42) );
and the type of each argument is automatically deduced. That’s better in that only a single macro is used, but it’s still verbose. Can those macros be eliminated completely?
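The function-selecting pattern can be tried in isolation. The following is a reduced sketch (four types only, an assumption made for brevity) that reuses the helper and macro names from above so the deduced tag can be inspected directly.

```c
typedef enum {
  TSVA_INT, TSVA_DOUBLE, TSVA_PTR_CHAR, TSVA_PTR_CONST_CHAR
} tsva_type_t;

typedef struct {
  union {
    int         i;
    double      d;
    char       *pc;
    char const *pcc;
  };
  tsva_type_t type;
} tsva_value_t;

static inline tsva_value_t tsva_int( int value ) {
  return (tsva_value_t){ .type = TSVA_INT, .i = value };
}
static inline tsva_value_t tsva_double( double value ) {
  return (tsva_value_t){ .type = TSVA_DOUBLE, .d = value };
}
static inline tsva_value_t tsva_ptr_char( char *value ) {
  return (tsva_value_t){ .type = TSVA_PTR_CHAR, .pc = value };
}
static inline tsva_value_t tsva_ptr_const_char( char const *value ) {
  return (tsva_value_t){ .type = TSVA_PTR_CONST_CHAR, .pcc = value };
}

// _Generic picks one function name based on the type of VALUE;
// the trailing ( (VALUE) ) then calls whichever function was picked.
#define TSVA_VALUE(VALUE)               \
  _Generic( (VALUE),                    \
    int        : tsva_int,              \
    double     : tsva_double,           \
    char*      : tsva_ptr_char,         \
    char const*: tsva_ptr_const_char    \
  )( (VALUE) )
```

With this, TSVA_VALUE(42).type is TSVA_INT and TSVA_VALUE(3.5).type is TSVA_DOUBLE. A string literal selects the char* association rather than char const* because string literals in C have type array of char, which is why tsva_print handles both pointer tags.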
Transforming Variadic Arguments
What’s needed is a way to wrap each argument x as TSVA_VALUE(x). Is there a way to have the preprocessor iterate over variadic arguments? Not directly, but you can via more macro voodoo:
#define APPLY_1(MACRO, ARG) MACRO(ARG)
#define APPLY_2(MACRO, ARG, ...) MACRO(ARG), APPLY_1(MACRO, __VA_ARGS__)
#define APPLY_3(MACRO, ARG, ...) MACRO(ARG), APPLY_2(MACRO, __VA_ARGS__)
#define APPLY_4(MACRO, ARG, ...) MACRO(ARG), APPLY_3(MACRO, __VA_ARGS__)
#define APPLY_5(MACRO, ARG, ...) MACRO(ARG), APPLY_4(MACRO, __VA_ARGS__)
#define APPLY_6(MACRO, ARG, ...) MACRO(ARG), APPLY_5(MACRO, __VA_ARGS__)
#define APPLY_7(MACRO, ARG, ...) MACRO(ARG), APPLY_6(MACRO, __VA_ARGS__)
#define APPLY_8(MACRO, ARG, ...) MACRO(ARG), APPLY_7(MACRO, __VA_ARGS__)
#define APPLY_9(MACRO, ARG, ...) MACRO(ARG), APPLY_8(MACRO, __VA_ARGS__)
#define APPLY_10(MACRO, ARG, ...) MACRO(ARG), APPLY_9(MACRO, __VA_ARGS__)
#define APPLY_FOR_EACH(MACRO, ...) \
  NAME2(APPLY_, VA_ARGS_COUNT(__VA_ARGS__))(MACRO, __VA_ARGS__)
#define NAME2(A,B) NAME2_HELPER(A,B)
#define NAME2_HELPER(A,B) A ## B
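That is, we can use VA_ARGS_COUNT to count the number of arguments (say, 4), then construct the name of the corresponding APPLY_ macro (say, APPLY_4), which applies MACRO to its first argument and hands the rest to APPLY_3, and so on down to APPLY_1. The preprocessor has no true recursion, which is why each APPLY_n is spelled out, here for up to 10 arguments. Hence, the APPLY_FOR_EACH macro applies MACRO to each variadic argument. Given that, we can define: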
#define tsva_print(...) \
  tsva_print( VA_ARGS_COUNT( __VA_ARGS__ ), \
    (tsva_value_t[]){ APPLY_FOR_EACH(TSVA_VALUE, __VA_ARGS__) } )
and finally do:
tsva_print( "Answer is: ", 42 );
and it just works.
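For reference, the pieces fit together as in the following reduced sketch. It assumes only four supported types and at most three arguments, requires __VA_OPT__ support, and uses a hypothetical tsva_format that writes into a buffer rather than stdout so the result can be checked.

```c
#include <stdio.h>

typedef enum {
  TSVA_INT, TSVA_DOUBLE, TSVA_PTR_CHAR, TSVA_PTR_CONST_CHAR
} tsva_type_t;

typedef struct {
  union { int i; double d; char *pc; char const *pcc; };
  tsva_type_t type;
} tsva_value_t;

static inline tsva_value_t tsva_int( int value ) {
  return (tsva_value_t){ .type = TSVA_INT, .i = value };
}
static inline tsva_value_t tsva_double( double value ) {
  return (tsva_value_t){ .type = TSVA_DOUBLE, .d = value };
}
static inline tsva_value_t tsva_ptr_char( char *value ) {
  return (tsva_value_t){ .type = TSVA_PTR_CHAR, .pc = value };
}
static inline tsva_value_t tsva_ptr_const_char( char const *value ) {
  return (tsva_value_t){ .type = TSVA_PTR_CONST_CHAR, .pcc = value };
}

#define TSVA_VALUE(VALUE)               \
  _Generic( (VALUE),                    \
    int        : tsva_int,              \
    double     : tsva_double,           \
    char*      : tsva_ptr_char,         \
    char const*: tsva_ptr_const_char    \
  )( (VALUE) )

#define VA_ARGS_COUNT(...) \
  ARG_11(__VA_ARGS__ __VA_OPT__(,) 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
#define ARG_11(_1,_2,_3,_4,_5,_6,_7,_8,_9,_10,_11,...) _11

// Reduced to three arguments for this sketch.
#define APPLY_1(MACRO, ARG)       MACRO(ARG)
#define APPLY_2(MACRO, ARG, ...)  MACRO(ARG), APPLY_1(MACRO, __VA_ARGS__)
#define APPLY_3(MACRO, ARG, ...)  MACRO(ARG), APPLY_2(MACRO, __VA_ARGS__)
#define APPLY_FOR_EACH(MACRO, ...) \
  NAME2(APPLY_, VA_ARGS_COUNT(__VA_ARGS__))(MACRO, __VA_ARGS__)
#define NAME2(A,B)        NAME2_HELPER(A,B)
#define NAME2_HELPER(A,B) A ## B

// Hypothetical buffer-writing variant of tsva_print.
// Returns the length of the string left in buf, or -1 on an output error.
static int tsva_format( char *buf, size_t size, unsigned n,
                        tsva_value_t const value[] ) {
  if ( size == 0 )
    return 0;
  size_t len = 0;
  buf[0] = '\0';
  for ( unsigned i = 0; i < n; ++i ) {
    int rv = 0;
    switch ( value[i].type ) {
      case TSVA_INT:
        rv = snprintf( buf + len, size - len, "%d", value[i].i );
        break;
      case TSVA_DOUBLE:
        rv = snprintf( buf + len, size - len, "%g", value[i].d );
        break;
      case TSVA_PTR_CHAR:
        rv = snprintf( buf + len, size - len, "%s", value[i].pc );
        break;
      case TSVA_PTR_CONST_CHAR:
        rv = snprintf( buf + len, size - len, "%s", value[i].pcc );
        break;
    }
    if ( rv < 0 )
      return -1;                      // output error
    if ( (size_t)rv >= size - len )   // output was truncated
      return (int)(size - 1);
    len += (size_t)rv;
  }
  return (int)len;
}

// Same-named macro, defined after the function, as with tsva_print above.
#define tsva_format(BUF, SIZE, ...)                                \
  tsva_format( (BUF), (SIZE), VA_ARGS_COUNT( __VA_ARGS__ ),        \
    (tsva_value_t[]){ APPLY_FOR_EACH(TSVA_VALUE, __VA_ARGS__) } )
```

Given char buf[64], the call tsva_format( buf, sizeof buf, "Answer is: ", 42 ) leaves "Answer is: 42" in buf. A call with no variadic arguments does not compile in this sketch because there is no APPLY_0.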
Conclusion
As this extended example once again shows, the C preprocessor has its own weird text processing language inside of C. The highlighted techniques of:
- Argument “sliding” as shown in ARG_11; and:
- Argument counting via VA_ARGS_COUNT; and:
- Transforming arguments via APPLY_FOR_EACH.
are often used in advanced macros. In this case, they can provide type-safe variadic arguments.
Caveats
While the version of this code that uses the explicit macros of TSVA_BOOL, TSVA_CHAR, etc., has no caveats, the version that uses _Generic does, specifically:
- char literals are deduced to int, not char; and:
- _Bool literals of false and true are also deduced to int, not _Bool. (However, this has been fixed in C23; see below.)
Hence:
char c = '*';
_Bool b = false;
tsva_print(c); // prints "*"
tsva_print(b); // prints "false"
tsva_print('*'); // prints "42", not "*"
tsva_print(false); // prints "0", not "false"
This isn’t a problem with _Generic. In C, the type of a char literal is simply defined to be int, not char. Prior to C11 when _Generic was added, this never mattered; but now it matters.
The C++ Committee fixed it in C++ because they wanted the ability to overload functions by char and you can’t do that consistently if char variables are of type char but char literals are of type int. The C Committee should have fixed the type of char literals in C11.
Now in C23 with both auto and typeof, the problem is even worse:
auto x = '*'; // deduces int, not char
typeof('*') y; // type is int, not char
Similarly, when _Bool was added in C99, it was a transitional type. Specifically, false and true were not added as keywords, but only macros as 0 and 1 (integers), respectively, via stdbool.h. At least this has finally been fixed in C23 now that false and true are proper keyword literals for bool.
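The char literal caveat can be checked directly with a tiny _Generic. In this sketch, TYPE_NAME and char_literal_is_int are hypothetical names introduced only for the illustration. It also shows one possible workaround, an explicit cast, which is not something the code above does for you.

```c
#include <string.h>

// Maps an expression to the name of its deduced type (reduced type list).
#define TYPE_NAME(X)      \
  _Generic( (X),          \
    _Bool  : "_Bool",     \
    char   : "char",      \
    int    : "int",       \
    default: "other"      \
  )

// Returns 1 if a char literal is deduced as int while a char variable
// and a cast char literal are both deduced as char.
static inline int char_literal_is_int( void ) {
  char c = '*';
  return strcmp( TYPE_NAME('*'), "int" ) == 0
      && strcmp( TYPE_NAME(c), "char" ) == 0
      && strcmp( TYPE_NAME((char)'*'), "char" ) == 0;
}
```

So if a literal has to be treated as a char, writing (char)'*' at the call site selects the char association. The deduced type of false depends on the language version in use, so the sketch deliberately does not test it.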