Introduction
C++ is a language designed to be written in a human-readable form and then compiled directly into CPU instructions. At runtime, the only information a program can reach is the values stored on the heap and the stack: once an executable is loaded into main memory and starts running, the program's view is bounded by its own virtual address space. This model assumes that we write human-readable source code and let the compiler translate it into whatever the CPU understands. That is perfectly fine for a great many use cases.
However, a wide range of programming tasks involve running the same algorithm over a set of application-defined types or their instances: accessing member variables, calling free or member functions in a uniform manner, converting data between the language's intrinsic representation and external formats, and so on. A typical example is serializing persistent data to a custom binary format or to XML, JSON, etc. For this kind of task, you want the program to know about its own structure and to report that information back at runtime. One way to automate such tasks is reflection.
The ability of a program to examine the type or properties of an object at runtime is called introspection. If the program can also modify its own structure and behaviour at runtime, that is called intercession. The combination of these two abilities is named reflection. To be clear: reflection is the ability of a program to examine and modify its own structure and behaviour at runtime. Python, Java, Ruby, TypeScript and a bunch of other languages come with reflection baked into the language. But what about C++?
As we know, C++ unfortunately doesn't stand out when it comes to runtime reflection. It is designed to be statically built while still allowing dynamic behaviour: all execution paths are defined in advance by the programmer, and we use conditional branches and polymorphism to get dynamic behaviour at runtime. When we want some introspection, the best facility provided by default is Run-Time Type Identification (RTTI). However, RTTI isn't always available, since many compilers let you switch it off, and it gives you barely more than the dynamic type of the object being manipulated, as we can see in the following code snippet:
#include <iostream>
#include <typeinfo>
struct Base {
    virtual ~Base() = default; // makes Base polymorphic
};
struct Derived : Base {};
int main() {
    Derived d;
    Base& b = d;
    // NOTE: the string returned by std::type_info::name() is implementation-defined
    std::cout << typeid(b).name() << '\n';
}
Thus, C++ doesn't give us easy facilities for runtime reflection, and it is often criticized for this. That doesn't mean it has no reflection capabilities at all.
In this post, we're going to explore which introspection features are currently available to us and what can be achieved within their limitations. This post is based on the work of Jean Guegant [1].
Introspection Broken Down
To recap briefly, type introspection is the part of reflection that lets us ask an object about itself. For example, you could ask an object whether it has a serialize member function in order to call it, or query whether it has a given data member. What we're doing is inspecting the object to check whether it fulfils a contract or a set of criteria. This smells like concepts, I know, but that is a subject for another post.
C++ offers quite a powerful way to check whether a type has a specific member: SFINAE. Before explaining what SFINAE is and what the acronym stands for, let's explore one of the main motivating examples for reflection: serialization. For instance, in Python, using reflection of course, one can do the following:
class PyA(object):
    def __str__(self):
        return "I'm a A"

class PyB(object):
    # Specialized method for serialization.
    def serialize(self):
        return "I'm a B"

class PyC(object):
    def __init__(self):
        # NOTE: 'serialize' is not a method.
        self.serialize = ""

    def __str__(self):
        return "I'm a C"

def serialize(obj):
    # Let's check if obj has an attribute called 'serialize'.
    if hasattr(obj, "serialize"):
        # Let's check if this 'serialize' attribute is a method.
        if hasattr(obj.serialize, "__call__"):
            return obj.serialize()
    # Else we call the __str__ method.
    return str(obj)

a = PyA()
b = PyB()
c = PyC()
print(serialize(a))  # output: I'm a A
print(serialize(b))  # output: I'm a B
print(serialize(c))  # output: I'm a C
The Python code above shows that introspection comes in pretty handy during serialization, since we can check whether an object has an attribute and query the type of that attribute. In our Python example, introspection lets us use the serialize method if it is available and fall back to the more generic __str__ otherwise. Great job! We can do it in plain C++ too! So, our goal now is to bring the following code to life:
struct A {
    virtual ~A() = default;
};
struct B {
    std::string serialize() const { return "I'm a B!"; }
};
struct C {
    std::string serialize; // NOTE: a data member, not a method
};
// to_string overloads for the A and C types
std::string to_string(const A&) { return "I'm a A!"; }
std::string to_string(const C&) { return "I'm a C!"; }
A a;
B b;
C c;
std::cout << serialize(a); // output: I'm a A!
std::cout << serialize(b); // output: I'm a B!
std::cout << serialize(c); // output: I'm a C!
I'm going to start with C++98 in order to show how C++ has evolved over the years, but don't worry, I will present the modern form too. My secondary goal with this post is to combat the mistaken idea that C++ doesn't evolve, so exposing the reader to both the old forms and the new ones gives a view of the language's progress.
The C++98-way
The solution presented below relies on 3 key concepts: overload resolution, the static behavior of sizeof, and SFINAE.
Overload resolution:
Overload resolution is the process that selects the function to call for a given call expression. Consider the following simple example:
void display_num(int);    // #1
void display_num(double); // #2
int main()
{
    display_num(399);  // #1 matches better than #2
    display_num(3.99); // #2 matches better than #1
}
In this example, the function name display_num() is said to be overloaded. When this name is used in a call, the compiler must distinguish between the candidates using additional information; mostly, this is the types of the call arguments. The rule of thumb is that the compiler picks the candidate whose parameters match the arguments most closely. So far, so good, but C++ also has sink-hole functions that accept anything: the variadic functions. Variadic functions (e.g. printf) take a variable number of arguments of any type. How does this interact with overloading? Nothing is better than an example:
void print(...); // 1
template <typename T> void print(const T& t); // 2
print(1); // Calls the templated version of print.
Keep in mind that C++ prefers both non-template and template functions over variadic functions!
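To see this ranking in action, here is a minimal sketch. The names pick, Widget and check_ranking are made up for this illustration and do not come from the original post; each overload simply reports which candidate the compiler chose:

```cpp
#include <cassert>
#include <cstring>

struct Widget {};  // hypothetical user-defined type, for illustration only

const char* pick(int)      { return "non-template"; }  // #1
template <typename T>
const char* pick(const T&) { return "template"; }      // #2
const char* pick(...)      { return "ellipsis"; }      // #3

// Self-check documenting which candidate wins for each call.
void check_ranking() {
    // #1 and #2 both match an int exactly; the non-template wins the tie.
    assert(std::strcmp(pick(1), "non-template") == 0);
    // #2 matches a double exactly, while #1 would need a conversion.
    assert(std::strcmp(pick(2.5), "template") == 0);
    // #1 is not viable for Widget; #2 still ranks above the ellipsis.
    assert(std::strcmp(pick(Widget()), "template") == 0);
    // Only the ellipsis overload accepts two arguments.
    assert(std::strcmp(pick(1, 2), "ellipsis") == 0);
}
```

Calling check_ranking() from main passes silently when the candidates are ranked as described: the ellipsis overload is only chosen when nothing else is viable, which is exactly what makes it a useful fallback later on.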
SFINAE (Substitution Failure Is Not An Error)
Let's start with the C++ principle behind this concept: when the compiler substitutes template arguments into a function template declaration and the result "would not compile", it does not reject the program; it merely drops that template from the set of candidates. In its original C++98 form, this protection only covers attempts to create invalid types, not attempts to evaluate invalid expressions. We call this principle SFINAE (pronounced like sfee-nay), which stands for "substitution failure is not an error". In rough terms, SFINAE is a rule that applies during overload resolution for templates: if substituting the deduced type for the template parameter fails, the compiler doesn't report an error; it simply ignores that particular overload. Let me show an example again:
#include <cstddef>
#include <iostream>
#include <vector>
// number of elements in a raw array:
template<typename T, unsigned N>
std::size_t len (T(&)[N])
{
    return N;
}
// number of elements for a type having size_type:
template<typename T>
typename T::size_type len (T const& t)
{
    return t.size();
}
int main()
{
    int a[10];
    std::cout << len(a);     // OK: only len() for array matches
    std::cout << len("tmp"); // OK: only len() for array matches
    std::vector<int> v = {1, 2, 3};
    std::cout << len(v);     // OK: only len() for a type with size_type matches
}
Here, we define two function templates len() taking one generic argument:
- The first function template declares the parameter as T(&)[N], which means that the parameter has to be an array of N elements of type T.
- The second function template declares the parameter as T const& and requires that the argument type has a member type size_type, which is used as the return type.
According to its signature, the second function template also matches when substituting (respectively) int[10] and char const[4] for T, but those substitutions make the return type T::size_type invalid. The second template is therefore ignored for these calls. Analogously, when passing a std::vector, only the second function template matches and the first one is ignored.
The operator sizeof:
There is a surprising amount of power in sizeof: you can apply sizeof to any expression, no matter how complex, and it yields the size of the expression's result without ever evaluating that expression at runtime. This means that sizeof is aware of overloading, template instantiation, conversion rules, everything that can take part in a C++ expression. In fact, sizeof is a complete facility for deducing the type of an expression; in the end, sizeof throws away the expression and gives you only the size of its result. To remember: sizeof returns the size of an object of the type that the expression would yield if it were evaluated.
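A minimal sketch of this property, assuming a pair of hypothetical overloads named never_defined that are declared but deliberately given no body (the names are illustrative only):

```cpp
#include <cstddef>

double never_defined(int);   // declaration only: no body exists anywhere
char   never_defined(char);  // second overload, also without a body

// Both values are compile-time constants. sizeof performs overload resolution
// to find the return type, but no call is ever made, so the missing
// definitions never cause a link error.
const std::size_t size_for_int  = sizeof(never_defined(1));    // selects never_defined(int)
const std::size_t size_for_char = sizeof(never_defined('a'));  // selects never_defined(char)
```

Here size_for_int equals sizeof(double) and size_for_char equals sizeof(char), which shows that the operand was fully type-checked even though nothing was executed.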
The real power of sizeof comes in when we start using function overloads. If we have 2 versions of the same function, we can pass some arguments to it and the compiler will figure out which overload is the best match. If the overloads have differently sized return types, we can use sizeof to discriminate which one the compiler chose for the given arguments. Are you ready? Let's go:
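#include <iostream>
typedef char no;
typedef char yes[2];
// Functions cannot return arrays, hence the references; sizeof looks through them.
template<typename T>
yes& f(const T&); // chosen when exactly one argument is passed
template<typename T>
no& f(...);       // sink-hole: chosen for any other argument list
int main(){
    std::cout << (sizeof(f<int>(1, 1)) == sizeof(f<int>(1))) << '\n'; // output: 0
    std::cout << (sizeof(f<int>(1)) == sizeof(f<int>(1))) << '\n'; // output: 1
}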
Passing a C++ object to a function through an ellipsis has undefined results, but who cares? Nobody actually calls the function. It's not even implemented!
What's the point here? We found a way to exploit the sizeof operator to detect, at compile time, which overload the compiler selects for a given argument. Thus, we can pass a value of some type T and use this technique to check whether T satisfies the signature we expect. Here we go again...
NOTE: There is one little problem. What if T makes its default constructor private? In this case, the expression T() fails to compile and so does all of our scaffolding. Fortunately, there is a simple solution: just use a strawman function returning a T.
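The strawman idea can be sketched as follows. This is an illustrative example, not code from the original post: the names Locked, make and converts_to are assumptions, and the trait only asks whether a value of type From could be passed where a To is expected.

```cpp
typedef char no;
typedef char yes[2];

// Hypothetical type whose default constructor is private:
// writing Locked() outside the class would not compile.
class Locked {
    Locked();
public:
    operator int() const;
};

// The strawman: declared, never defined. It only exists to produce
// "a value of type T" inside sizeof, where nothing is evaluated.
template <typename T>
T make();

template <typename From, typename To>
struct converts_to {
    static yes& test(To);   // viable only if From converts to To
    static no&  test(...);  // sink-hole fallback
    // sizeof(test(From())) would fail for Locked; make<From>() does not.
    static const bool value = sizeof(test(make<From>())) == sizeof(yes);
};
```

With these definitions, converts_to<Locked, int>::value is true and converts_to<Locked, double*>::value is false, and no Locked object is ever constructed. The declval helper used later in the text follows the same pattern.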
The working serialize
First, I would like to show you a tricky implementation developed by Jean Guegant:
template <typename T>
struct has_serialize
{
    // For the compile time comparison.
    typedef char no;
    typedef char yes[2];
    // 1 - This helper struct permits us to check two properties of a template argument.
    template <typename U, U u> struct has_member;
    // 2 - Two overloads for yes: one is for the signature of a normal method,
    //     one is for the signature of a const method.
    template <typename C>
    static yes& test(has_member<std::string (C::*)(), &C::serialize>* /*unused*/);
    template <typename C>
    static yes& test(has_member<std::string (C::*)() const, &C::serialize>* /*unused*/);
    // 3 - The C++ sink-hole for fallback.
    template <typename>
    static no& test(...);
    // 4 - The test is actually done here, thanks to the sizeof compile-time evaluation.
    static const bool value = sizeof(test<T>(0)) == sizeof(yes);
};
// Usage:
std::cout << has_serialize<A>::value << '\n'; // output: 0 - A has no serialize member
std::cout << has_serialize<B>::value << '\n'; // output: 1 - B has a serialize method
std::cout << has_serialize<C>::value << '\n'; // output: 0 - C::serialize is a data member, not a method
Note that here we're using the size of the return type to check how the overloaded test function is resolved. It is tricky, I know. To aid clarity: the helper struct has_member checks whether &C::serialize has the type given as its first argument! For example, for has_serialize<B>, C is replaced by B, and because B::serialize is a const method, has_member<std::string (B::*)() const, &B::serialize> is a valid type and the second overload of test is selected.
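To isolate what has_member is doing, here is a small sketch outside of any SFINAE context. The struct B mirrors the one used in the text; the typedef names accepted and rejected are made up for this illustration:

```cpp
#include <string>

// Same helper as in has_serialize. It is only declared, because we never
// need an object of this type, just the ability to name a pointer to it.
template <typename U, U u> struct has_member;

struct B {  // same shape as B in the text
    std::string serialize() const { return "I'm a B!"; }
};

// Well-formed: &B::serialize really has type std::string (B::*)() const.
typedef has_member<std::string (B::*)() const, &B::serialize>* accepted;

// Ill-formed if uncommented: a pointer to a const member function cannot be
// used where a pointer to a non-const one is required. Inside test<C>() the
// same mismatch is a substitution failure instead of a hard error.
// typedef has_member<std::string (B::*)(), &B::serialize>* rejected;
```

The second template argument must have exactly the type named by the first one, which is why the trait needs one test overload per signature it wants to accept.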
As we restrict ourselves to C++98, we lose decltype and declval, which drive most of this kind of code in C++11 and beyond. Don't panic; we can emulate them by abusing sizeof again.
template <typename T>
T declval(); // declared only: it is used inside sizeof and never called
template <typename T>
struct has_serialize
{
    // For the compile time comparison.
    typedef char no;
    typedef char yes[2];
    // 1 - The size of the array is determined by our sizeof expression.
    template <typename C>
    static yes& test(int (*)[sizeof(declval<C>().serialize(), 1)]);
    // 2 - The C++ sink-hole for fallback.
    template <typename>
    static no& test(...);
    // 3 - The test is actually done here, thanks to the sizeof compile-time evaluation.
    static const bool value = sizeof(test<T>(0)) == sizeof(yes);
};
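Here, we're passing a pointer to a fixed-size array, int (*)[x], where x is computed by the sizeof expression. If our type has no callable serialize member, the expression is invalid, the overload is SFINAE'd out and value is 0, just like in the previous version; otherwise value is 1. Keep in mind that, strictly speaking, C++98 SFINAE only covers invalid types, so this version relies on the compiler also tolerating invalid expressions.
Now you would think that it would be handy to use our has_serialize to create a serialize function like the Python one: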
template <class T>
std::string serialize(const T& obj) {
    if (has_serialize<T>::value) { // Dead branch for A?
        return obj.serialize();
    }
    else {
        return to_string(obj);
    }
}
// Usage:
A a;
serialize(a); // ERROR: no member named 'serialize' in 'A'.
But what's wrong with this solution? Why does the compiler complain? If you consider the code obtained after substitution and compile-time evaluation, you can see that the error raised by the compiler is absolutely normal:
std::string serialize(const A& obj)
{
    if (0) { // Dead branch
        return obj.serialize(); // error: no member named 'serialize' in 'A'.
    }
    else {
        return to_string(obj);
    }
}
The compiler does not skip the dead branch when it type-checks the function: both branches must compile, so obj must have both a serialize method and a to_string overload in this case.
We need a different technique: a way to force the compiler to behave as if a particular template didn't exist. Such templates are said to be disabled; by default, all templates are enabled.
The solution is to apply SFINAE so that function templates are ignored under certain constraints, by instrumenting the template code to become invalid when those constraints are not met. So, using SFINAE, we can construct a type that lets us guide overload resolution and discard candidate functions based on conditions known at compile time. Let me bring to life the last piece of the puzzle, called enable_if.
// 1 - Default template version.
template <bool, typename T = void>
struct enable_if
{}; // This struct doesn't define "type", so substitution fails if you try to access it.
// 2 - A partial specialisation that is picked when the condition is true.
template <typename T>
struct enable_if<true, T> {
    typedef T type; // This struct does have a "type" and won't fail on access.
};
// 3
enable_if<true, int>::type t1; // OK: the first argument is true, so type is int.
enable_if<false, int>::type t2; // ERROR: the first argument is false, so there is no type named 'type'.
// 4
enable_if<has_serialize<B>::value, int>::type t3; // OK: B has a serialize method, so t3 is an int.
enable_if<has_serialize<A>::value, int>::type t4; // ERROR: A has no serialize method, so there is no type named 'type'.
In 1, the base template does not define any member type, but the partial specialization on true in 2 does. This means that if the condition evaluates to false, the substitution fails and the candidate is discarded. Do you know what this means? We can trigger a substitution failure from a compile-time expression with enable_if.
template <class T>
typename enable_if<has_serialize<T>::value, std::string>::type serialize(const T& obj)
{
    return obj.serialize();
}
template <class T>
typename enable_if<not has_serialize<T>::value, std::string>::type serialize(const T& obj)
{
    return to_string(obj);
}
A a;
B b;
C c;
// The following lines work like a charm!
std::cout << serialize(a) << std::endl;
std::cout << serialize(b) << std::endl;
std::cout << serialize(c) << std::endl;
Note SFINAE at work here. When we call serialize(b), the compiler selects the first overload: since has_serialize<B>::value is true, the specialization of enable_if for true is used, and its nested type is std::string. The second overload is discarded: not has_serialize<B>::value is false, so the general form of enable_if is selected, and it has no nested type, so forming the return type results in a substitution failure.
Thus, exactly one of the two functions is instantiated for a given type T. In other words, we explicitly manage the overload set at compile time.
NOTE: enable_if is so important that it was added to the standard library in C++11. For C++11 and beyond, you can use std::enable_if from <type_traits>.
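Putting the pieces together, the following is a sketch of the whole example with std::enable_if in place of the hand-written enable_if, assuming a C++11 compiler. The trait and the A, B and C types follow the text (A is simplified to an empty struct, and the inner template parameter is renamed X to avoid shadowing struct C); check_serialize is a hypothetical helper added only to document the expected results.

```cpp
#include <cassert>
#include <string>
#include <type_traits>  // std::enable_if

struct A {};
struct B { std::string serialize() const { return "I'm a B!"; } };
struct C { std::string serialize; };  // data member, not a method

std::string to_string(const A&) { return "I'm a A!"; }
std::string to_string(const C&) { return "I'm a C!"; }

template <typename T>
struct has_serialize {
    typedef char no;
    typedef char yes[2];

    template <typename U, U u> struct has_member;

    template <typename X>
    static yes& test(has_member<std::string (X::*)(), &X::serialize>*);
    template <typename X>
    static yes& test(has_member<std::string (X::*)() const, &X::serialize>*);
    template <typename>
    static no& test(...);

    static const bool value = sizeof(test<T>(0)) == sizeof(yes);
};

// Enabled only when T has a serialize method.
template <typename T>
typename std::enable_if<has_serialize<T>::value, std::string>::type
serialize(const T& obj) { return obj.serialize(); }

// Enabled only when T has no serialize method.
template <typename T>
typename std::enable_if<!has_serialize<T>::value, std::string>::type
serialize(const T& obj) { return to_string(obj); }

// Self-check: each call is routed to the expected implementation.
void check_serialize() {
    assert(serialize(A()) == "I'm a A!");
    assert(serialize(B()) == "I'm a B!");
    assert(serialize(C()) == "I'm a C!");
}
```

Only the enable_if spelling changes compared with the C++98 version; the detection trait and the two mutually exclusive overloads stay the same.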
LINKS
[1] Jean Guegant: An introduction to C++'s SFINAE concept: compile-time introspection of a class member
[2] Jeff Preshing: How C++ Resolves a Function Call
[3] Guillaume Racicot: Reflection in C++ Part 1: The Present