Diego Saraiva

Posted on Jun 5, 2021

Reflection in C++

Introduction

C++ is a language designed to be written in a human-readable form and then compiled directly into CPU-readable instructions. Once an executable has been loaded into main memory and started, the only information the program can reach at runtime is the set of values stored on its heap and stack, because its view of the world is bounded by its own virtual address space. This model assumes that we write human-readable source code and later let the compiler translate it into whatever the CPU understands. That is perfectly fine for a great many use cases.

However, a wide range of programming tasks involve running the same algorithm over a set of application-defined types or over instances of those types: accessing member variables, calling free or member functions in a uniform manner, converting data between the language’s intrinsic representation and external formats, and so on. Serializing persistent data to a custom binary format, XML, or JSON is a typical example. In this kind of task, you want the program to know about itself in a human-readable way and to report that information back to the user at runtime. One way to automate these tasks is to use reflection.

The ability of a program to examine the type or properties of an object at runtime is called introspection. If it can furthermore modify its own structure and behaviour at runtime, that is called intercession. The combination of these two abilities is named reflection. To be clear: reflection is the ability of a program to examine, introspect, and modify its own structure and behaviour at runtime. Python, Java, Ruby, TypeScript and a bunch of other languages come with reflection baked into the language. But what about C++?

As we know, C++ unfortunately doesn't stand out when it comes to runtime reflection. It is designed to be statically built, with the ability to perform dynamic behaviour: all execution paths are defined in advance by the programmer, and we use conditional branches and polymorphism to get dynamic behaviour at runtime. When we want introspection capabilities, the best facility provided by default is Run-Time Type Identification (RTTI). Not only is RTTI not always available (it can be switched off with a compiler option, and the names it reports are implementation-defined), but it also gives you barely more than the current type of the manipulated object, as we can see in the following code snippet:

#include <iostream>
#include <typeinfo>

struct Base { 
    virtual ~Base() = default;  // polymorphic
};

struct Derived : Base {};

int main() {
    Derived d;
    Base& b = d;

    // NOTE: The string returned by std::type_info::name() is implementation-defined.
    std::cout << typeid(b).name() << '\n';
}
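To make the limits of RTTI a little more concrete, here is a minimal sketch of roughly everything it offers: comparing dynamic types and downcasting. The hierarchy (Shape, Circle, Square) and the helpers is_circle and as_circle are hypothetical names invented for this illustration.

```cpp
#include <typeinfo>

// Hypothetical hierarchy used only for this illustration.
struct Shape {
    virtual ~Shape() {}  // a virtual member makes the type polymorphic
};
struct Circle : Shape {};
struct Square : Shape {};

// typeid applied to a reference to a polymorphic type reports the dynamic type.
inline bool is_circle(const Shape& s) {
    return typeid(s) == typeid(Circle);
}

// dynamic_cast yields a null pointer when the dynamic type does not match.
inline const Circle* as_circle(const Shape& s) {
    return dynamic_cast<const Circle*>(&s);
}
```

Both helpers answer "which type is this?", but nothing in <typeinfo> lets us list the members of Shape or look one up by name.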

Thus, C++ doesn’t give us easy facilities for runtime reflection, and it is often criticized for this. But that doesn’t mean it has no reflection capabilities at all.

In this post, we’re gonna explore what introspection features are currently available to us and what is possible to achieve given their limitations. This post is based on Jean Guegant’s article on SFINAE [1].

Introspection Broken Down

To recap briefly, type introspection is the part of reflection that lets us ask an object about something in particular. For example, you could ask an object whether it has a serialize member function in order to call it, or query it to know whether it has a given data member. What we’re doing here is basically inspecting the object to check whether it fulfils a contract or a set of criteria. That smells like concepts, I know, but that is a subject for another post.

C++ offers a quite powerful way to inspect whether an object has a specific member or not: SFINAE. Before explaining what SFINAE is and what this acronym stands for, let's explore one of the main motivating examples for reflection: serialization. For instance, in Python, using reflection of course, one can do the following:

class PyA(object):
    def __str__(self):
        return "I'm a A"

class PyB(object):
    # Specialize method for serialization.
    def serialize(self):
        return "I'm a B"

class PyC(object):
    def __init__(self):
        # NOTE: 'serialize' is not a method. 
        self.serialize = ""

    def __str__(self):
        return "I'm a C"

def serialize(obj):
    # Let's check if obj has an attribute called 'serialize'.
    if hasattr(obj, "serialize"):
        # Let's check if this 'serialize' attribute is a method.
        if hasattr(obj.serialize, "__call__"):
            return obj.serialize()

    # Else we call the __str__ method.
    return str(obj)

a = PyA()
b = PyB()
c = PyC()

print(serialize(a)) # output: I'm a A
print(serialize(b)) # output: I'm a B
print(serialize(c)) # output: I'm a C

The Python code above shows us that introspection comes in pretty handy during serialization, since we can check whether an object has an attribute and query the type of that attribute. In our Python example, introspection lets us use the serialize method if it is available and fall back to the more generic __str__ method otherwise. Great job! We can do the same in plain C++ too. So, our goal now is to bring the following code to life.

struct A {
    virtual ~A() = default;
};

struct B {
    std::string serialize() const { return "I'm a B!"; }
};

struct C { 
    std::string serialize; 
};


// Function overloads for the A and C types
std::string to_string(const A&) { return "I'm a A!"; }
std::string to_string(const C&) { return "I'm a C!"; }

A a;
B b;
C c;

std::cout << serialize(a); // output: I'm a A!
std::cout << serialize(b); // output: I'm a B!
std::cout << serialize(c); // output: I'm a C!

I'm going to start with C++98 in order to show how C++ has evolved over the years, but don't worry, I will present the modern form too. My secondary goal with this post is to combat the wrong idea that C++ doesn't evolve. Exposing the reader to both the old forms and the new ones gives a view of the language's progress.

The C++98-way

The solution presented below relies on 3 key concepts: overload resolution, the static behavior of sizeof and SFINAE.

Overload resolution:

Overload resolution is the process that selects the function to call for a given call expression. Consider the following simple example:


void display_num(int);      // #1
void display_num(double);   // #2

int main()
{
    display_num(399);       // #1 matches better than #2
    display_num(3.99);      // #2 matches better than #1
}

In this example, the function name display_num() is said to be overloaded. When this name is used in a call, a C++ compiler must therefore distinguish between the various candidates using additional information; mostly, this information is the types of the call arguments. The rule of thumb is that the compiler calls the candidate function whose parameters match the arguments most closely. So far, so good, but in C++ we also have some sink-hole functions that accept everything: the variadic functions. Variadic functions are functions (e.g. printf) which take a variable number of arguments of any type. How do they take part in overload resolution? Nothing is better than an example:

void print(...);  // 1
template <typename T> void print(const T& t); // 2

print(1); // Calls the templated version of print (2).

Please keep in mind that C++ prefers non-template and template functions over variadic functions: the ellipsis overload is only picked when nothing else matches.
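As a rough sketch of that ranking, the overload set below returns a label naming the candidate that was chosen. The function name pick and the type Widget are hypothetical, introduced only for this illustration.

```cpp
#include <string>

// Hypothetical overload set: each candidate reports which one was selected.
std::string pick(int) { return "non-template"; }

template <typename T>
std::string pick(const T&) { return "template"; }

std::string pick(...) { return "ellipsis"; }

struct Widget {};
```

With these definitions, pick(1) goes to the non-template overload (an exact match that wins the tie against the template), pick(2.5) and pick(Widget()) go to the template (an exact match after deduction), and pick(1, 2) can only be handled by the ellipsis overload, since no other candidate accepts two arguments.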

SFINAE (Substitution Failure Is Not An Error)

Let’s start with the C++ principle behind this concept: when the compiler substitutes template arguments into the declaration of a function template and the result is invalid (for example, it names a member type that does not exist), it does not reject the program. We call this principle SFINAE (pronounced like sfee-nay), which stands for "substitution failure is not an error". In rough terms, SFINAE is a rule that applies during overload resolution for templates. If substituting the template parameter with the deduced type fails, the compiler won’t report an error; it’ll ignore that particular overload. Note that this protection covers only the declaration (return type, parameter types, template parameters), not the function body. Let me show an example again:

// number of elements in a raw array:
template<typename T, unsigned N> 
std::size_t len (T(&)[N])
{
    return N;
}
// number of elements for a type having size_type:
template<typename T> 
typename T::size_type len (T const& t)
{
    return t.size();
}

int a[10];
std::cout << len(a);        // OK: only len() for array matches
std::cout << len("tmp");    // OK: only len() for array matches

std::vector<int> v = {1, 2, 3};
std::cout << len(v);        // OK: only len() for a type with size_type matches

Here, we define two function templates len() taking one generic argument:

The first function template declares the parameter as T(&)[N], which means that the parameter has to be an array of N elements of type T.
The second function template declares the parameter as T const& and requires that the passed argument type has a member type size_type, which is used as the return type.

When a raw array or a string literal is passed, the second function template also matches by its parameter alone: T is deduced as int[10] and char const[4] respectively. But those substitutions produce an invalid return type, since neither array type has a member size_type. The second template is therefore ignored for these calls. Analogously, when passing a std::vector, only the second function template matches and the first one is ignored.
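To see both templates being discarded at once, here is a small sketch that adds a sink-hole overload to the two len() templates. The fallback returning 0 is an assumption made for this illustration, not part of the original example.

```cpp
#include <cstddef>
#include <vector>

// Number of elements in a raw array.
template <typename T, unsigned N>
std::size_t len(T (&)[N]) { return N; }

// Number of elements for a type having size_type.
template <typename T>
typename T::size_type len(const T& t) { return t.size(); }

// Hypothetical fallback: selected only when both templates drop out.
inline std::size_t len(...) { return 0; }
```

A pointer such as int* is not an array and has no size_type, so deduction fails for the first template, substitution fails for the second, and the call quietly resolves to the ellipsis overload instead of producing an error.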

The operator sizeof:

There is a surprising amount of power in sizeof; this is because you can apply sizeof to any expression, no matter how complex, and sizeof returns its size, without actually evaluating that expression at runtime. This means that sizeof is aware of overloading, template instantiation, conversion rules — everything that can take part in a C++ expression. In fact, sizeof is a complete facility for deducing the type of an expression; eventually, sizeof throws away the expression and returns you only the size of its result. To remember: sizeof returns the size of the object of the type that would be returned by expression, if evaluated.
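A minimal sketch of that "no evaluation" property: the function below is declared but deliberately never defined, and sizeof still works on a call to it. The names never_defined and size_of_call_result are hypothetical.

```cpp
#include <cstddef>

// Declared but intentionally never defined: there is no body to link against.
double never_defined(int);

// sizeof only needs the type of the call expression (double), so the
// missing definition is never required and the program still links.
inline std::size_t size_of_call_result() {
    return sizeof(never_defined(42));
}
```

Because the operand is never evaluated, the compiler only performs overload resolution and type deduction on it, which is exactly the part we want to exploit.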

The real power with sizeof comes in when we start using function overloads. If we have 2 versions of the same function, we can pass some parameters to that function, and the compiler will figure out which function is the best match. If each function has differently sized return types, we can use sizeof to discriminate which one the compiler chose for any given parameters. Are you ready? Let's go:

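#include <iostream>

typedef char no;
typedef char yes[2];

// A function cannot return an array by value, so we return a reference to it.
template<typename T>
yes& test(const T&);    // viable only for exactly one argument

template<typename T>
no test(...);           // sink-hole: viable for any arguments

int main(){
    std::cout << (sizeof(test<int>(1, 1)) == sizeof(test<int>(1))) << '\n'; // output: 0
    std::cout << (sizeof(test<int>(1)) == sizeof(test<int>(1))) << '\n';    // output: 1
}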

Calling a function with ellipsis with a C++ object has undefined results, but who cares? Nobody actually calls the function.
It’s not even implemented!

What's the point here? We found a way to exploit the sizeof operator to find out, at compile time, which overload the compiler selected for a given argument. Thus, we can pass a type through this machinery and check whether it satisfies the signature we expect. Here we go again...

NOTE: There is one little problem. To feed test() we need an expression of type T, and the obvious choice is T(). But what if T makes its default constructor private? In this case, the expression T() fails to compile and so does all of our scaffolding. Fortunately, there is a simple solution: just use a strawman function returning a T.
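The strawman idea can be sketched as follows. Everything here (probe, make, converts_to_int, Locked, Plain) is a hypothetical name chosen for the illustration; the check itself, "does T convert to int?", is just a convenient stand-in for any sizeof-based test.

```cpp
typedef char no;
typedef char yes[2];

yes& probe(int);  // selected when the argument converts to int
no probe(...);    // sink-hole for everything else

// Strawman: declared, never defined. It only supplies "an expression of type T".
template <typename T>
T make();

template <typename T>
struct converts_to_int {
    static const bool value = sizeof(probe(make<T>())) == sizeof(yes);
};

class Locked {
    Locked();              // private default constructor: Locked() would not compile here
public:
    operator int() const;  // declared only, never called
};

struct Plain {};
```

Since nothing inside sizeof is evaluated, make<T>() never needs a body and Locked never needs to be constructed, yet overload resolution still sees an argument of the right type.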

The working serialize

First, I would like to show you a tricky implementation developed by Jean Guegant [1]:

template <typename T>
struct has_serialize
{
    // For the compile time comparison.
    typedef char no;        
    typedef char yes[2];    

    // 1 - This helper struct permits us to check two properties of a template argument.
    template <typename U, U u> struct has_member;

    // 2 - Two overloads for yes: one is for the signature of a normal method,
    // one is for the signature of a const method.
    template <typename C>
    static yes& test(has_member<std::string (C::*)(), &C::serialize>* /*unused*/);

    template <typename C>
    static yes& test(has_member<std::string (C::*)() const, &C::serialize>* /*unused*/);

    // 3 - The C++ sink-hole for fallback.
    template <typename>
    static no& test(...);

    // 4 - The test is actually done here, thanks to the sizeof compile-time evaluation.
    static const bool value = sizeof(test<T>(0)) == sizeof(yes);
};

// Usage:
std::cout << has_serialize<A>::value << '\n'; // output: 0 - A has no serialize method.
std::cout << has_serialize<B>::value << '\n'; // output: 1 - B has a serialize method.
std::cout << has_serialize<C>::value << '\n'; // output: 0 - C::serialize is a data member, not a method.

Note that here we're using the size of the return type to check how the overloaded test function is resolved. It is tricky, I know. To aid clarity: the helper struct has_member checks whether &C::serialize has exactly the type given as its first template argument! For example, for has_serialize<B>, C is replaced by B. B::serialize is a const method, so has_member<std::string (B::*)(), &B::serialize> is invalid and that overload is discarded, while has_member<std::string (B::*)() const, &B::serialize> is valid and the second yes overload is selected.

As we restrict ourselves to C++98, we lose decltype and declval, which are the main tools for this kind of introspection in C++11 and beyond. Don't panic: we can emulate them by abusing sizeof again.

template <typename T>
T declval();

template <typename T>
struct has_serialize
{
    // For the compile time comparison.
    typedef char no;
    typedef char yes[2];

    // 1 - The size of the array is determined by our sizeof expression.
    template <typename C>
    static yes& test(int (*)[sizeof(declval<C>().serialize(), 1)]);

    // 2 - The C++ sink-hole for fallback.
    template <typename>
    static no& test(...);

    // 3 - The test is actually done here, thanks to the sizeof compile-time evaluation.
    static const bool value = sizeof(test<T>(0)) == sizeof(yes);
};

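Here, the parameter of the first overload is a pointer to a fixed-size array, int (*)[x], where x is the result of our sizeof expression. If the type does not have a callable serialize member, the expression inside sizeof is invalid and this overload will SFINAE out, just like the previous ones; otherwise it is selected and value is 1.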
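The two detectors do not answer quite the same question, and a side-by-side sketch may help. The names has_serialize_sig, has_serialize_call, fake_value and the extra type D are hypothetical, and the sketch assumes a compiler that treats invalid expressions inside sizeof as substitution failures.

```cpp
#include <string>

typedef char no;
typedef char yes[2];

// Strawman in the spirit of the declval emulation above (hypothetical name).
template <typename T>
T fake_value();

// Detector 1: requires the exact signature std::string serialize() [const].
template <typename T>
struct has_serialize_sig {
    template <typename U, U u> struct has_member;
    template <typename X>
    static yes& test(has_member<std::string (X::*)(), &X::serialize>*);
    template <typename X>
    static yes& test(has_member<std::string (X::*)() const, &X::serialize>*);
    template <typename>
    static no& test(...);
    static const bool value = sizeof(test<T>(0)) == sizeof(yes);
};

// Detector 2: only requires that the call expression x.serialize() is valid.
template <typename T>
struct has_serialize_call {
    template <typename X>
    static yes& test(int (*)[sizeof(fake_value<X>().serialize(), 1)]);
    template <typename>
    static no& test(...);
    static const bool value = sizeof(test<T>(0)) == sizeof(yes);
};

struct B { std::string serialize() const { return "I'm a B!"; } };
struct C { std::string serialize; };          // data member, not callable
struct D { int serialize() { return 42; } };  // callable, different return type
```

Both detectors accept B and reject C. They disagree on D: the signature-based version rejects it because the return type is int, while the expression-based version accepts anything that can be called as x.serialize().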

Now you would think that it will be easy to use our has_serialize to create a serialize function like the Python one:

template <class T>
std::string serialize(const T& obj) {
    if (has_serialize<T>::value) { // Dead branch for A?
        return obj.serialize();
    }
    else {
        return to_string(obj);
    }
}

A a;
serialize(a); // ERROR: no member named 'serialize' in 'A'.

But what's wrong with this solution? Why does the compiler complain? If you consider the code obtained after substitution and compile-time evaluation, you can see that the error raised by your compiler is absolutely normal:

std::string serialize(const A& obj)
{
    if (0) { // Dead branch 
        return obj.serialize(); // error: no member named 'serialize' in 'A'.
    } 
    else {
        return to_string(obj);
    }
}

The compiler won't drop the dead branch before type-checking it: both branches of an ordinary if must compile, so obj would need both a serialize method and a to_string overload.
We need a different technique: a way to force the compiler to behave as if a particular template didn’t exist. Such templates are said to be disabled; by default, all templates are enabled.

The solution is to apply the SFINAE mechanism: we write the template declaration so that, when a constraint is not met, substitution produces invalid code and the function template is ignored. So, using SFINAE we can construct a type that allows us to guide overload resolution and discard candidate functions based on conditions known at compile-time. Let me bring to life the last piece of the puzzle, called enable_if.


//1 - Default template version.
template <bool, typename T = void>
struct enable_if
{};  // This struct doesn't define "type" and the substitution will fail if you try to access it.

// 2 - A partial specialisation selected when the condition is true.
template <typename T>
struct enable_if<true, T> {
  typedef T type; // This struct does have a "type" and won't fail on access.
};

// 3
enable_if<true, int>::type t1; // OK: The first argument is true so type is int.
enable_if<false, int>::type t2; // ERROR: The first argument is false so there is no type named 'type'.
// 4
enable_if<has_serialize<B>::value, int>::type t3; // OK: B has a serialize method, so the type of t3 is int.
enable_if<has_serialize<A>::value, int>::type t4; // ERROR: A has no serialize method, so there is no type named 'type'.

In 1, the base template does not define any member types, but the partial specialization on true does in 2. This means, if the condition evaluates to false, the substitution fails and the candidate will be discarded. Do you know what this means? We can trigger a substitution failure according to a compile time expression with enable_if.
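Before applying it to serialize, here is a small sketch showing that any compile-time boolean can drive enable_if, not just has_serialize. The condition (comparing sizeof(T) with sizeof(int)), the function size_class and the type Big are assumptions made up for this illustration.

```cpp
#include <string>

// Same hand-written enable_if as above.
template <bool, typename T = void>
struct enable_if {};

template <typename T>
struct enable_if<true, T> { typedef T type; };

// Enabled only for types no larger than an int.
template <typename T>
typename enable_if<(sizeof(T) <= sizeof(int)), std::string>::type
size_class(const T&) { return "small"; }

// Enabled only for types larger than an int.
template <typename T>
typename enable_if<(sizeof(T) > sizeof(int)), std::string>::type
size_class(const T&) { return "large"; }

struct Big { char bytes[64]; };
```

The two conditions are mutually exclusive, so for any T exactly one overload survives substitution and the call is never ambiguous. The extra parentheses around the conditions keep the > operator from being parsed as the end of the template argument list.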

template <class T> 
typename enable_if<has_serialize<T>::value, std::string>::type serialize(const T& obj)
{
    return obj.serialize();
}

template <class T> 
typename enable_if<not has_serialize<T>::value, std::string>::type serialize(const T& obj)
{
    return to_string(obj);
}

A a;
B b;
C c;

// The following lines work like a charm!
std::cout << serialize(a) << std::endl;
std::cout << serialize(b) << std::endl;
std::cout << serialize(c) << std::endl;

Note SFINAE at work here. When we call serialize(b), the compiler selects the first overload: since has_serialize<B>::value is true, the specialization of enable_if for true is used, and its member type is std::string. The second overload is discarded: not has_serialize<B>::value is false, so the general form of enable_if is selected, and it doesn't have a member type, so forming the return type results in a substitution failure.

Thus, exactly one of the two functions is viable for a given type T. In other words, we explicitly manage the overload set at compile-time.

NOTE: enable_if is so important that it was added to the standard library in C++11. For C++11 and beyond, you can use std::enable_if from the <type_traits> header.
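For comparison, here is one possible C++11 rendition of the same idea, using std::enable_if together with decltype and std::declval instead of the sizeof tricks. It is only a sketch of how the pieces could fit together, not the one canonical modern form; the internal layout of has_serialize below is an assumption of this example.

```cpp
#include <string>
#include <type_traits>
#include <utility>

struct A {};
struct B { std::string serialize() const { return "I'm a B!"; } };
struct C { std::string serialize; };

std::string to_string(const A&) { return "I'm a A!"; }
std::string to_string(const C&) { return "I'm a C!"; }

template <typename T>
struct has_serialize {
    // Viable only if calling serialize() on a const U is a valid expression.
    template <typename U>
    static auto test(int)
        -> decltype(std::declval<const U&>().serialize(), std::true_type());
    // Sink-hole for fallback.
    template <typename>
    static std::false_type test(...);

    static const bool value = decltype(test<T>(0))::value;
};

template <typename T>
typename std::enable_if<has_serialize<T>::value, std::string>::type
serialize(const T& obj) { return obj.serialize(); }

template <typename T>
typename std::enable_if<!has_serialize<T>::value, std::string>::type
serialize(const T& obj) { return to_string(obj); }
```

The structure is the same as in the C++98 version (a detector plus two mutually exclusive overloads); decltype simply replaces the yes/no size comparison.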

LINKS

[1] Jean Guegant: An introduction to C++'s SFINAE concept: compile-time introspection of a class member

[2] Jean Guegant: How C++ Resolves a Function Call

[3] Guillaume Racicot: Reflection in C++ Part 1: The Present
