DEV Community


Compile-time polymorphism !!

Srijeyanthan
Hi, I am Sri, an experienced software engineer and entrepreneur, passionate about distributed systems and low-latency application development.
・2 min read

The curiously recurring template pattern, usually referred to as CRTP, is one of the more rarely used design patterns in general C++ programming. In CRTP, a class derives from a base class template instantiated with the derived class itself, as in class FileWriter : public Writer<FileWriter>. We are all familiar with dynamic polymorphism in C++ and with how the compiler handles dynamic dispatch using virtual functions.

[Diagram: classes B and C, each with its own implementation of the virtual write method]

The diagram above shows how B and C each have their own implementation of the write method, which is very straightforward. When the program is compiled, the compiler typically creates a virtual table (vtable) for each class that has virtual methods and stores the addresses of those methods in that table.

The compiler creates only one vtable per class, and it is shared by all objects of that class. Every object of class B or C, however, carries its own hidden pointer to its class's vtable, called the vptr.
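For comparison, here is a minimal sketch of the dynamic-polymorphism design described above. The diagram is not reproduced here, so the base class name A and the string-returning signature of write are assumptions made to keep the example small and testable.

```cpp
#include <string>

// Hypothetical base class A: the diagram's base class is not named in the text.
class A
{
  public:
    virtual ~A() = default;
    // Virtual: the override to run is chosen at run time through the vptr.
    virtual std::string write(const std::string& str) const = 0;
};

class B : public A
{
  public:
    std::string write(const std::string& str) const override { return "B: " + str; }
};

class C : public A
{
  public:
    std::string write(const std::string& str) const override { return "C: " + str; }
};

// The static type is A, but the dynamic type (B or C) decides which write runs.
std::string writeVia(const A& writer, const std::string& str)
{
  return writer.write(str);
}
```

Calling writeVia with a B yields "B: ..." and with a C yields "C: ...", even though writeVia only knows about A.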

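The vptr is the size of a pointer: typically 4 bytes on a 32-bit platform and 8 bytes on a 64-bit one. So if we print the size of an object whose class has at least one virtual method, the output will usually be sizeof(class data) + sizeof(vptr), possibly plus some alignment padding.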
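A small sketch to see the vptr in sizeof. The struct names Plain and WithVirtual are hypothetical, and the exact numbers depend on the compiler, ABI and padding (on a typical x86-64 Linux build with GCC or Clang they are 4 and 16 bytes), but on common implementations the class with a virtual method is larger.

```cpp
#include <cstddef>

// Same data, no virtual functions: no vptr is needed.
struct Plain
{
  int value;
  void write() const { }
};

// Same data plus virtual functions: on common ABIs each object also
// stores a hidden vptr, and padding may be added for alignment.
struct WithVirtual
{
  int value;
  virtual ~WithVirtual() = default;
  virtual void write() const { }
};

constexpr std::size_t plainSize = sizeof(Plain);
constexpr std::size_t virtualSize = sizeof(WithVirtual);
```

Printing the two values, for example with printf("%zu %zu\n", plainSize, virtualSize), shows the difference on your own platform.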

So what is the issue here? In a high-throughput system, every single function call matters, and small per-call costs add up and delay the overall process. If a program relies heavily on virtual functions, then frequently calling the write method from the example above means a vtable lookup on every call: load the vptr, load the function address from the vtable, then make an indirect call, which the compiler generally cannot inline unless it can prove the object's dynamic type.
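To make the per-call cost concrete, here is a conceptual model of what a virtual call usually boils down to, written with plain function pointers. It is a simplification for illustration, not actual compiler output, and the names VTable, Object, writeB, writeC and callWrite are hypothetical.

```cpp
#include <string>

struct Object;

// Conceptual model only: one table of function pointers per "class".
struct VTable
{
  std::string (*write)(const Object& self, const std::string& str);
};

// Each object carries its own pointer to the shared table (the "vptr").
struct Object
{
  const VTable* vptr;
};

std::string writeB(const Object&, const std::string& str) { return "B: " + str; }
std::string writeC(const Object&, const std::string& str) { return "C: " + str; }

// One table per "class", shared by every object of that class.
const VTable vtableB{ &writeB };
const VTable vtableC{ &writeC };

// Every call loads the vptr, loads the function pointer from the table,
// and then makes an indirect call.
std::string callWrite(const Object& obj, const std::string& str)
{
  return obj.vptr->write(obj, str);
}
```

Two objects built as Object{ &vtableB } share the same table, and every callWrite goes through two loads and an indirect jump before any real work happens.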

Now let's jump into compile-time polymorphism, where the binding happens at compile time.

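#include <cstdio>

// CRTP base class: T is the derived class, so the target of write()
// is known at compile time.
template <class T>
class Writer
{
  public:
    // Look at this declaration: the call is bound statically,
    // with no vptr and no vtable lookup.
    void write(const char* str) const
    {
      static_cast<const T*>(this)->writeImpl(str);
    }

  protected:
    // Protected, non-virtual constructor/destructor: Writer<T> can only be
    // used as a base, and objects must not be deleted through a Writer<T>*.
    Writer() { }
    ~Writer() { }
};

class FileWriter : public Writer<FileWriter>
{
  public:
    explicit FileWriter(FILE* aFile) : mFile(aFile) { }
    ~FileWriter() { if (mFile) fclose(mFile); }

    // The object owns the FILE*, so copying is disabled to avoid a double fclose.
    FileWriter(const FileWriter&) = delete;
    FileWriter& operator=(const FileWriter&) = delete;

    // Here comes the implementation of the write method in the subclass
    void writeImpl(const char* str) const
    {
       if (mFile) fprintf(mFile, "%s\n", str);
    }

  private:
    FILE* mFile;
};


class ConsoleWriter : public Writer<ConsoleWriter>
{
  public:
    void writeImpl(const char* str) const
    {
      printf("%s\n", str);
    }
};


// Driver code
int main()
{
    // FileWriter has no default constructor, so it must be given a FILE*.
    // "log.txt" is just an example file name.
    FileWriter fileWriter(fopen("log.txt", "w"));

    // A Writer<FileWriter> pointer pointing to a FileWriter object
    // (no new/delete: deleting through Writer<FileWriter>* would be undefined)
    Writer<FileWriter>* pWriter = &fileWriter;
    pWriter->write("Hi I am faster");

    ConsoleWriter consoleWriter;
    consoleWriter.write("Hi I am faster");

    return 0;
}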

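Now the classes have no virtual functions, so the compiler does not add a vptr (4 or 8 bytes, depending on the platform) to each object. Each call to write is bound at compile time, so the compiler can inline writeImpl instead of making an indirect call through a vtable, which can make high-frequency calls noticeably faster. Please note that CRTP does not fit every use case: Writer<FileWriter> and Writer<ConsoleWriter> are different, unrelated types, so you cannot pick a writer at run time through a single base pointer. The developer should decide where it fits and where removing the overhead of the vtable lookup is worth it.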
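With CRTP, code that should work with any writer is written as a template over Writer<T> instead of taking a base-class pointer. The sketch below assumes a minimal copy of the Writer base from the example above, a hypothetical MemoryWriter that stores lines in memory, and a hypothetical helper writeAll. The compiler instantiates writeAll separately for each T, so every write call is still bound at compile time.

```cpp
#include <string>
#include <vector>

// Minimal copy of the CRTP base, so this block compiles on its own.
template <class T>
class Writer
{
  public:
    void write(const char* str) const { static_cast<const T*>(this)->writeImpl(str); }

  protected:
    Writer() = default;
    ~Writer() = default;
};

// Hypothetical writer that records lines in memory (handy for tests).
class MemoryWriter : public Writer<MemoryWriter>
{
  public:
    void writeImpl(const char* str) const { lines.push_back(str); }
    // mutable so that the const writeImpl can record lines
    mutable std::vector<std::string> lines;
};

// Generic over any Writer<T>; one instantiation per writer type.
template <class T>
void writeAll(const Writer<T>& writer, const std::vector<std::string>& messages)
{
  for (const auto& m : messages)
    writer.write(m.c_str());
}
```

The flip side is the point raised in the discussion below: writers of different types cannot be swapped at run time behind a single pointer.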

Discussion (5)

Eljay-Adobe • Edited

For me, it's a bit tricky to wrap my head around CRTP. So thank you for this write up! (I get it now, but it took me a long time.)

Mathieu Ropert wrote up a nice explanation in his article Polymorphic Ducks.

Srijeyanthan Author

Thank you, Eljay, and thank you for the reference to Polymorphic Ducks. (y)

Roman Diviš • Edited

Did you try any kind of serious benchmark to measure the performance of early- vs. late-bound methods? Just saying “really faster” is not very precise. There is still low-level code needed to perform a call (moving parameter values onto the stack, the jump, preparing the stack for the new function, cleanup), so it would be good to support this statement with some numbers. The performance will also be strongly affected by the selected compiler, its version, compilation options, the platform, and the code itself (function parameters, local variables).

Srijeyanthan Author

Thanks Roman. The results were obtained while running our low-latency message streaming platform, leorix (leorix.io). For general-purpose use we don't see a notable time difference, but my example shows part of a log writer class that is called at high frequency, around 100K msg/sec of streaming. We evaluated performance not by timing this compile-time binding in isolation but by the number of messages per second handled by the system. With dynamic binding, it was around 92-95K msg/sec. When we traced back the bottlenecks, this was one of them!!

Roman Temchenko

But you cannot assign a different Writer to the variable without changing its type, which removes half of the value of polymorphism. And there is an implicit requirement on the impl function signature. IMHO you might as well use two completely unrelated classes whose only implicit relationship is the function name, which could probably be made explicit with a macro.
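One way to make the implicit writeImpl requirement explicit without a macro is a compile-time check inside the CRTP base. The sketch below uses a C++17 detection trait; HasWriteImpl and CountingWriter are hypothetical names, and Writer is a minimal copy of the article's base class.

```cpp
#include <type_traits>
#include <utility>

// Detection trait: true if writeImpl(const char*) can be called on a const T.
template <class T, class = void>
struct HasWriteImpl : std::false_type {};

template <class T>
struct HasWriteImpl<T, std::void_t<decltype(
    std::declval<const T&>().writeImpl(std::declval<const char*>()))>>
  : std::true_type {};

template <class T>
class Writer
{
  public:
    void write(const char* str) const
    {
      // Checked when write() is instantiated, by which time T is complete,
      // so a missing writeImpl is reported with this message.
      static_assert(HasWriteImpl<T>::value,
                    "T must provide a const writeImpl(const char*)");
      static_cast<const T*>(this)->writeImpl(str);
    }

  protected:
    Writer() = default;
    ~Writer() = default;
};

// Hypothetical writer used only to exercise the check.
class CountingWriter : public Writer<CountingWriter>
{
  public:
    void writeImpl(const char*) const { ++count; }
    // mutable so that the const writeImpl can update the counter
    mutable int count = 0;
};
```

This does not address the first point (the variable's type still changes with the writer), but it turns a missing or misspelled writeImpl into a clear compile-time error.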