Unicorn Developer

Posted on Apr 30

Silent foe or quiet ally: Brief guide to alignment in C++. Part 3

#programming #cpp #compiling

We've already covered basic field alignment and explored how inheritance layers data on top of one another. By now you might think we have uncovered every trap. But not so fast! This topic has a truly dark side that few people discuss. One short word, virtual, completely rewrites a class's "geometry" and introduces alignment corrections we can't ignore. Let's find out what really happens under the hood when alignment meets virtuality.

Introduction

We continue our deep dive into memory mechanics. If you are just joining us, I recommend reviewing the foundations first. The first part covered the basics and the magic of simple data alignment, while the second part focused on how ordinary inheritance affects memory layout.

So far, we've treated an object as a static, rigidly defined data structure. Every field address was known at compile time, and inheritance simply layered parent and child attributes. Alas, object-oriented programming is impossible without dynamic polymorphism, which C++ implements through virtual functions (backed by RTTI when the program needs to query an object's type at runtime).

What is virtuality?

Let me start with a slightly off-topic question. How does the compiler know which piece of machine code to execute at a specific line of your program? Answering it will help us understand what virtuality really means.

Imagine we write object.print(). For the processor, it's not an abstract action but a command to jump to a certain address in memory where the function's instructions live. But where does that address come from?

During compilation, each line of code becomes one or more machine instructions, and each instruction receives its own sequential address in memory. The same applies to functions: the compiler translates them into machine code and assigns each one an address. Connecting a function call in the code to the actual address of the function is called binding.

Depending on when the call address is bound to the function address, there are two scenarios.

1. Static/early binding.

By default, C++ uses static (early) binding. The compiler hardwires a specific function address at the call site during compilation. This decision relies solely on the type of the pointer or reference, not on the actual object in memory. For instance, when the compiler sees a Base* pointer, it picks the method address from the base class. The jump goes to an already known address, so it's fast and requires no extra computation, but it makes the program inflexible at runtime.

2. Dynamic/late binding.

Unlike static binding, dynamic (late) binding postpones function selection until the program runs. In this case, the compiler can't insert a specific function address into the code in advance. Instead, it generates instructions that say: "At the moment of the call, look inside the object, find the current address of the needed function, and only then jump." This provides tremendous flexibility: the program decides on the fly based on which object is actually in front of it (a derived class or a base class), rather than on the pointer type we are using.

Now that we've covered the mechanics of binding, we can answer the question of what virtuality actually is. Virtuality is a mechanism that implements dynamic polymorphism based on late binding. By marking a method with the virtual keyword, we shift it from static binding to late binding. From this point on, the address of the function's entry point is no longer a compile-time constant but a variable whose value derives from the context of a specific object at runtime.

The C++ standard doesn't specify how virtuality must be implemented, but in practice compilers follow specific ABIs (the Itanium C++ ABI, the MSVC ABI). The key components here are the vtable (virtual method table) and the vptr (virtual pointer).

Virtual functions

We have unpacked the concept of virtuality, but in practice the main tool for its implementation is the virtual function. Let's see how it works.

Here's a simple class hierarchy where we attempt to override a parent method in a derived class:

class Base
{
public:
  std::string_view NameClass() const {return "Base";}  
};

class Derived : public Base
{
public:
  std::string_view NameClass() const {return "Derived";}
};

Full code fragment

#include <iostream>
#include <format>
#include <string_view>

class Base
{
public:
  std::string_view NameClass() const {return "Base";}  
};

class Derived : public Base
{
public:
  std::string_view NameClass() const {return "Derived";}

};

int main()
{
  Derived derived {};
  Base& base {derived};

  std::cout << "=== Name class ===\n";
  std::cout << "Base has static type " << base.NameClass() <<"\n";
}

Program output

=== Name class ===
Base has static type Base

The result may seem strange. We created a Derived object, but the program insists it's Base. Early binding is to blame. The compiler sees the Base& reference and resolves the method call at build time. From its perspective, this is a safe and fast optimization: it doesn't have to check what type of object the reference points to while the program is running.

Let's move the decision from compile time to runtime by adding just one word—virtual:

class Base
{
public:
  virtual std::string_view NameClass() const {return "Base";}  
};

class Derived : public Base
{
public:
  std::string_view NameClass() const override {return "Derived";}
};

Full code fragment

#include <iostream>
#include <format>
#include <string_view>

class Base
{
public:
  virtual std::string_view NameClass() const {return "Base";}  
};

class Derived : public Base
{
public:
  std::string_view NameClass() const override {return "Derived";}

};
int main()
{
  Derived derived {};
  Base& base {derived};

  std::cout << "=== Name class ===\n";
  std::cout << "Base has static type " << base.NameClass() <<"\n";
}

Program output

=== Name class ===
Base has static type Derived

It worked. Despite the Base& reference, the program now sees the object's real type in memory. Dynamic binding kicks in: the function address is selected at runtime.

Now we can give a definition. A virtual function is a class method whose call is resolved not by the type of the reference or pointer, but by the actual (dynamic) type of the object in memory. The virtual keyword instructs the compiler to replace a direct function call with dynamic dispatch. This way, a derived class can override the base class implementation while keeping the same signature.

We'll focus on the pros and cons of virtual functions later, but first note: the C++ standard describes the expected behavior of virtual functions and leaves implementation details to compiler developers. Still, the vast majority of modern compilers (GCC, Clang, MSVC) use the same mechanism, which has become the de facto standard: virtual function tables.

Virtual function table (vtable)

A virtual table (vtable, virtual method table, dispatch table) is a static array of pointers that the compiler creates for each class that uses virtual functions (or inherits from such classes).

Let's start with a piece of theory. How does a vtable work?

The compiler creates the table once per class at compile time, not per object.
Each table entry holds the address of the most derived version (the final overrider) of a function available to that class. If a derived class overrides a function, its table stores the address of its own version; if not, it stores the address of the base class version.
Each class in the inheritance hierarchy receives its own unique vtable.
The table itself is only a static structure in memory. So that a specific object knows which table to use, the compiler implicitly adds a hidden field, the vptr, to every instance of the class. The constructor initializes it and links the object to its vtable. More on this later.
The compiler strictly determines the function order in the table at compile time. For example, if funcA() is listed first in the base class, it occupies the first index in all derived class tables. As a result, the program finds the desired address in constant time O(1) simply by adding the offset to the table address.
Besides function addresses, the vtable often contains a pointer to a structure with type information (Run-Time Type Information, RTTI). This is what lets operators such as dynamic_cast and typeid check the actual object type while the program runs.
If the destructor of a polymorphic class is declared virtual, as it should be, one of the entries in its vtable is reserved for it. This guarantees that deleting an object via a base class pointer runs the destructor of the most derived class and then the whole chain of base destructors, preventing resource leaks. Without a virtual destructor, such a delete is undefined behavior.

The chart below illustrates the process:

What do we have here? The memory-level architecture of dynamic polymorphism. The linking element is the vptr, a hidden pointer (8 bytes on 64-bit platforms) at the beginning of the object that redirects calls to the vtable. In this table, the negative index stores type metadata (RTTI), while index zero stores the address of the virtual destructor. The remaining entries contain the addresses of the functions. This fixed order is crucial, because calling any virtual method boils down to two memory reads (the vptr, then the table entry) followed by an indirect call.

When the code calls a virtual function through a pointer or a reference to the base class, the following steps execute:

The program accesses a class instance and, through its hidden pointer, finds the corresponding virtual table for its actual type.
The program selects the required entry in the table, since the compiler already knows the function index.
The program extracts the function address from that entry.
The program performs an indirect call to the function at the found address.

Theory always looks bright and shiny, but now let's see how it works in practice:

class Base
{
public:
  virtual ~Base() {};
  virtual void func1() {}
  virtual void func2() {}
};

class Derived : public Base
{
public:
  void func1() override {}
};

class Derived1 : public Base
{
public:
  void func2() override {}
};

Here's a common scenario: we've designed a base interface called Base and created its subclasses. Each child class overrides some virtual functions. Logically everything is simple and clear, but let's look under the hood.

Note: all examples use the Clang compiler.

/* vtable has 3 entries: {
       [0] = ~Base((null)), 
       [2] = func1((null)), 
       [3] = func2((null)), 
    } */

At first glance it seems strange that index 1 is skipped, but this is correct. The reason is the destructor, which can be called in two different contexts:

a complete object destruction: the object is destroyed but its memory is not freed (for example, an automatic object goes out of scope);
a deleting destruction: the object is destroyed and then its memory is released with operator delete (a delete expression, including one through a base class pointer).

The compiler must distinguish these two situations, so it creates two entry points in the virtual table. That is why index 1 is missing: the second destructor occupies it. This behavior comes from the Itanium C++ ABI. Ordinary virtual functions come after the two destructor slots.

In the assembly that Compiler Explorer generates for this code, the vtable looks like this:

 vtable for Derived:
        .quad   0
        .quad   typeinfo for Derived
        .quad   Derived::~Derived() [base object destructor]
        .quad   Derived::~Derived() [deleting destructor]
        .quad   Derived::func1()
        .quad   Base::func2()

From the table we see that the destructor occupies two entries, one for each context.

Full code fragment

#include <iostream>

class Base
{
public: 
  virtual ~Base() {};
  virtual void func1() {}
  virtual void func2() {}
};

class Derived : public Base
{
public:
  void func1() override {}
};

class Derived1 : public Base
{
public:
  void func2() override {}
};

Base bs;
Derived dr;
Derived1 dr1;

We've now figured out how the compiler builds the function table and assigns indices in a static array. A virtual table is just a passive data structure that exists as a single instance for the entire class and sits in memory. For polymorphism to work, each object needs a way to reach it: the virtual pointer.

Virtual pointer (vptr)

The vptr (virtual method table pointer) is a hidden data member that the compiler automatically adds to every object of a polymorphic class, that is, a class that declares or inherits at least one virtual function. It points to the static table of function addresses (vtable) that corresponds to the object's dynamic type.

How the pointer works

Let's delve a bit into the mechanics. The vptr's operation forms the foundation of dynamic polymorphism in object-oriented languages.

class Base
{
public:
  virtualTable *vptr; // pseudocode: the hidden pointer the compiler adds (not valid C++)
  virtual void func1() {}
  virtual void func2() {}
};

class Derived : public Base
{
public:
  void func1() override {}
};

class Derived1 : public Base
{
public:
  void func2() override {}
};

Let's take the same code and add the virtual pointer at the beginning. The mechanism works in three fundamental stages:

preparing infrastructure at compile time;
dynamic initialization during object construction;
dispatching calls in real time.

You can see our code represented in the following chart:

When compiling code with virtual functions, the compiler adds a hidden vptr to each object, referencing the virtual function table. For the Base class, it creates a vtable with the addresses of its virtual methods. For the derived classes (Derived, Derived1), it builds tables in which the addresses of overridden methods are replaced while each method keeps its fixed index. This completes the first stage.

Now we create an object:

Base* bs = new Derived();

This launches a layer-by-layer initialization process that turns raw memory into a polymorphic object. First, a block of memory is allocated for the object's data and its vptr. The parent class constructor runs first and writes the address of its own table into the pointer.

Then control passes to the Derived constructor. It performs the key operation: it overwrites the pointer value with the address of its own table:

vtable for Derived:
        .quad   0
        .quad   typeinfo for Derived
        .quad   Derived::~Derived() [base object destructor]
        .quad   Derived::~Derived() [deleting destructor]
        .quad   Derived::func1()
        .quad   Base::func2()

Once all constructors are done, the object is stable. Its internal pointer is now firmly linked to the table of the most derived class in the hierarchy. This guarantees that any polymorphic call uses the method version corresponding to the object's real type, not the pointer type. This is the second stage: dynamic initialization during object construction.

When bs->func1() executes, dynamic dispatch (late binding) kicks in; this is the third stage. The compiler generates code that ignores the static type of the Base* pointer and instead loads the current vptr value from the object's memory. That value is the address of the vtable for the object's real dynamic type. Using a fixed offset in that table, the program fetches the address of the required method implementation. The processor jumps to that address, so the overridden method from Derived is called rather than the base implementation.

Full code fragment

#include <iostream>

class Base
{
public:
  virtual ~Base() {};
  virtual void func1() {}
  virtual void func2() {}
};

class Derived: public Base
{
public:
  void func1() override {}
};

class Derived1: public Base
{
public:
  void func2() override {}
};

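int main()
{
  Base* bs = new Derived();
  bs->func1(); // dispatched through the vtable to Derived::func1
  delete bs;   // virtual destructor: Derived's deleting destructor runs
}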

Now that we understand how the vptr enables polymorphism, let's look at its physical impact on the object. Since the pointer is a full-fledged field within the structure, its presence inevitably changes the memory layout. Let's break down how this pointer triggers alignment mechanisms and affects the structure's final size in bytes.

The alignment and vptr

On modern 64-bit systems, the virtual pointer occupies 8 bytes. On mainstream ABIs, scalar types are typically aligned to their own size, so the virtual pointer requires 8-byte alignment, which in turn makes the whole object at least 8-byte aligned.

How does this affect offsets? The pointer takes the first 8 bytes, and the following fields can't be placed arbitrarily: the compiler must follow the alignment rules of each data type in the structure.

Let's recall how field placement works. Right after the vptr, the offset is 8, which already satisfies any alignment requirement up to 8 bytes, so the first field (a char, an int, a double, or another pointer) goes straight to offset 8. Padding appears elsewhere: between a smaller field and a larger one that follows it, and at the end of the object so that its size is a multiple of its alignment. Let's look at an example:

class Example
{
public:  
  virtual void func() {}
  char c;
};

What is its size? Adding up the members gives 9 bytes, but there's a catch: the answer is 16 bytes.

*** Dumping AST Record Layout
         0 | class Example
         0 | (Example vtable pointer)
         8 |  char c
           | [sizeof=16, dsize=9, align=8,
           |  nvsize=9, nvalign=8]

The final size arithmetic is straightforward: 8 bytes for the pointer, 1 byte for data, and 7 bytes of tail padding. But to understand how this object behaves within an inheritance hierarchy, we need to interpret the layout the compiler reports. What do dsize, nvsize, and nvalign mean? Let's break them down.

sizeof is the final object size in bytes.
dsize is the data size: the object's size without tail padding. Here it consists of 8 bytes of vptr and 1 byte of char. In other words, it is the raw size of the object's state before the final alignment rule is applied.
align is the address alignment requirement for the object in memory. Since the most restrictive type in the class is an 8-byte pointer, the entire object is assigned an alignment attribute of 8.
nvsize is the size of the non-virtual part of the class, that is, the class without its virtual bases. This is the amount of memory the class occupies when it serves as a base subobject of another class. In our example, it matches dsize because the class has no virtual bases.
nvalign is the alignment of that non-virtual part, which a derived class must respect when placing this class as a base subobject.

Armed with this solid foundation, we can now dive into a really dark topic—inheritance.

Inheritance and virtuality

The diamond problem

Look at the following example:

class Entity
{
public:
  int id;
  virtual void update();
};

class Movable: public Entity 
{
public:
  float velocity;
  void move() {}
};

class Renderable: public Entity
{
public:
  int textureId;
  void draw();
};

class Player: public Movable, public Renderable
{
public:
   char name[32];
};

At first glance, this appears logical and well-structured. Let's try using this class in code and see the result.

Creating a Player object and looking at its memory layout reveals something strange:

Player hero;
*** Dumping AST Record Layout
         0 | class Player
         0 |   class Movable (primary base)
         0 |     class Entity (primary base)
         0 |       (Entity vtable pointer)
         8 |       int id
        12 |     float velocity
        16 |   class Renderable (base)
        16 |     class Entity (primary base)
        16 |       (Entity vtable pointer)
        24 |       int id
        28 |     int textureId
        32 |   char[32] name
           | [sizeof=64, dsize=64, align=8,
           |  nvsize=64, nvalign=8]

Instead of carefully combining properties from all ancestors, the compiler literally "glues" two complete structures: Renderable and Movable. The diagram shows the class relationships:

The problem is that both parent classes already contain a full copy of the Entity base class. As a result, there are two independent Entity instances within a single Player object. Each has its own vptr and its own id field. From the perspective of binary structure, the object is redundant: it doesn't simply inherit functionality, but physically duplicates the state of the base class in different segments of its memory.

This structural issue pops up at the worst possible moment—when we try to simply access a player's ID:

hero.id = 1;

The compiler hits a dead end. It sees that the hero object has two paths to the id field:

via the movement branch (Movable);
via the rendering branch (Renderable).

In memory these fields exist separately from each other, so the compiler can't tell which identifier we intend to modify and reports the access as ambiguous. Even if we disambiguate it, the object's internal state can become inconsistent: the Entity area reached through Movable may store id = 1, while the second area, reached through Renderable, remains uninitialized or contains a different value. Since these are two physically distinct locations, writing to one has no effect on the other. This situation is known as the diamond problem.

Full code fragment

#include <iostream>

class Entity
{
public:
  int id;
  virtual void update();
};

class Movable : public Entity 
{
public:
  float velocity;
  void move() {}
};

class Renderable : public Entity
{
public:
  int textureId;
  void draw();
};

class Player : public Movable, public Renderable
{
public:
  char name[32];
};

int main()
{
  Player hero;
  hero.id = 1; // does not compile: 'id' is ambiguous (two Entity subobjects)
}

To eliminate this redundancy and restore consistency, we should use virtual inheritance:

class Entity
{
public:
  int id;
  virtual void update()
  {
    std::cout << "Entity update, ID " << id << std::endl;
  }
};

class Movable : virtual public Entity 
{
public:
  float velocity;
  void move()
  {
    std::cout << "Moving with velocity " << velocity << std::endl;
  }
};

class Renderable : virtual public Entity
{
public:
  int textureId;
  void draw()
  {
    std::cout << "Drawing texture " << textureId << std::endl;
  }
};

class Player : public Movable, public Renderable
{
public:
  char name[32];
  void update() override 
  {
    std::cout << "Player " << name << " updating ... " << std::endl;
  }
};

In this case, the compiler changes how it builds the object. Instead of statically embedding Entity data in each branch, it allocates a single shared Entity subobject. Now the final Player object contains only one id and one Entity subobject, regardless of the number of intermediate classes. This shared area is reached through additional offset information, which keeps the object logically and physically whole.

Full code fragment

#include <iostream>
#include <cstring>

class Entity
{
public:
  int id;
  virtual void update()
  {
    std::cout << "Entity update, ID " << id << std::endl;
  }
};

class Movable: virtual public Entity 
{
public:
  float velocity;
  void move()
  {
    std::cout << "Moving with velocity " << velocity << std::endl;
  }
};

class Renderable: virtual public Entity
{
public:
  int textureId;
  void draw()
  {
    std::cout << "Drawing texture " << textureId << std::endl;
  }
};

class Player: public Movable, public Renderable
{
public:
  char name[32];
  void update() override 
  {
    std::cout << "Player " << name << " updating ... " << std::endl;
  }
};

int main()
{
  Player hero;

  hero.id = 1;
  hero.velocity = 6.5f;
  hero.textureId = 1;
  std::strcpy(hero.name, "Tom");

  hero.update();
  hero.move();
  hero.draw();
}

Program output

Player Tom updating ... 
Moving with velocity 6.5
Drawing texture 1

Virtual inheritance mechanisms: vbptr, vbtable, VTT

Virtual inheritance breaks the usual linear memory layout. The compiler used to know the exact offset of the id field from the start of the object, but now that certainty is lost. The virtual base becomes a floating component: in one complete object it sits at one offset, in another at a different one. Its exact location depends on the most derived class of the object being built.

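To avoid guessing addresses, the compiler introduces indirect addressing via the vbptr (virtual base pointer) and the vbtable (virtual base table). Each intermediate class that virtually inherits a base receives a hidden pointer, the vbptr. It serves as an entry point to a static offset table (vbtable) that the compiler generates for the specific type whose object contains this vbptr. Strictly speaking, vbptr and vbtable are MSVC ABI terms: the Itanium C++ ABI (GCC, and Clang on Linux) stores the same virtual base offsets inside the regular vtable, at negative indices, so the existing vptr doubles as the entry point.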

The table stores integer values that define the exact distance from the current vbptr position to the start of the virtual base. Therefore, any access to a field of the virtual base turns into the following algorithm:

read the address from the vbptr;
extract the needed offset from the vbtable;
add the retrieved offset to the address where the vbptr is stored to get the final address of the data.

Now that we understand how the process goes, we can provide definitions.

vbptr (virtual base pointer) is a hidden system pointer that the compiler inserts into the layout of a derived class. It acts as a dynamic reference to an offset table, enabling an object to determine at runtime where its virtual ancestor is located in the current memory configuration.

vbtable (virtual base table) is a static data structure that the compiler generates for each type that uses virtual inheritance. This structure is an array of integer values that specify the exact distance in bytes from the vbptr to the start of each virtual base subobject.

Regular polymorphism and virtual inheritance are often confused, so here is a table that shows the difference:

After reviewing the comparison table and seeing how individual pointers work, a question arises: how does this complex machinery spring into action? If the vtable and vbtable are static tables, creating an object requires a mechanism that sets all the pointers to their proper values. In complex hierarchies with virtual inheritance, the VTT (virtual table table) plays this role in the Itanium ABI.

A VTT is a data structure that can be informally described as a "table of tables." While a regular virtual table stores function addresses, a VTT stores the addresses of the virtual tables needed for a specific inheritance branch. The key technical task of the VTT is to ensure correct object behavior during the borderline moment when a base class constructor is already running but the derived class constructor hasn't finished yet. Without this mechanism, calling a virtual function or accessing virtual base data from a constructor could crash the program, because the object isn't fully built yet and its pointers may reference incorrect or empty areas.

The VTT operation follows this algorithm.

When the most derived class is constructed, its constructor uses the VTT that the compiler generated for that class.
The constructor reads the addresses of the primary and secondary virtual tables from the VTT and writes them to the object's virtual pointers.
When constructors of base classes that have virtual bases are called, they receive, as a hidden argument, not the entire VTT but a pointer to the fragment for their branch.
The base class uses that fragment to temporarily set the object's pointers, so that for the duration of its constructor the object behaves as an instance of that specific base class.

Now that we've figured out who's responsible for what, we can move on to alignment in the virtual environment.

The pitfalls of virtual inheritance

Pointer casting

Let's look at the challenges we might face with virtual inheritance. The first issue surfaces when we convert a pointer to a derived class into a pointer to one of its bases. We are used to thinking of a pointer to an object as a static label that points to the start of a memory block. However, multiple and virtual inheritance complicate this picture.

Look at this example:

struct Parent1
{
  int num;
  virtual ~Parent1() {}
};

struct Parent2
{
  int num2;
  virtual ~Parent2() {} 
};

struct Derived : Parent1, Parent2
{
  int res;
};

Here we have a classic scenario of multiple inheritance: two parents and one child. Let's output their addresses.

Full code fragment

#include <iostream>
#include  <cstring>

struct Parent1
{
  int num;
  virtual ~Parent1() {}
};

struct Parent2
{
  int num2;
  virtual ~Parent2() {} 
};

struct Derived : Parent1, Parent2
{
  int res;
};

int main()
{
  Derived* d = new Derived();
  Parent1* p1 = d; // Parent1 sits at offset 0: no adjustment needed
  Parent2* p2 = d; // adjusted to point at the Parent2 subobject

  std::cout << "Derived address: " << d << std::endl;
  std::cout << "Parent1 address: " << p1 << std::endl;
  std::cout << "Parent2 address: " << p2 << std::endl;

  delete d;
}

Program output

Derived address: 0x55b3bf4b92b0
Parent1 address: 0x55b3bf4b92b0
Parent2 address: 0x55b3bf4b92c0

If we run this code, we'll see something strange: the addresses of Derived and Parent2 differ. This reveals a fundamental peculiarity: the same object can have different addresses depending on the type of pointer used to access it.

A pointer to a base class always points to the start of the corresponding base subobject. Since Derived contains both Parent1 and Parent2, the two subobjects can't share the same location, so the compiler places them one after the other.

So when we wrote Parent2* p2 = d;, more than a bit copy occurred: the compiler performed a pointer adjustment. It took the address of the start of Derived and added an offset so that p2 points to the beginning of the Parent2 data. In ordinary multiple inheritance, this offset is a compile-time constant, but with virtual inheritance the situation becomes dynamic:

The position of a virtual base in memory depends on the most derived class in the hierarchy.
The compiler can no longer hard-code a +16 byte adjustment.
Instead, it generates code that reads the current offset for the object's type from the offset table (the vbtable, or the vtable in the Itanium ABI) and adds it to the address.

What's going on with the alignment here? It also makes its own changes:

*** Dumping AST Record Layout
         0 | struct Derived
         0 |   struct Parent1 (primary base)
         0 |     (Parent1 vtable pointer)
         8 |     int num
        16 |   struct Parent2 (base)
        16 |     (Parent2 vtable pointer)
        24 |     int num2
        28 |   int res
           | [sizeof=32, dsize=32, align=8,
           |  nvsize=32, nvalign=8]

Parent1 sits at the very beginning (offset 0), but instead of the expected 4 bytes for int num, it takes up 16. The reason is the virtual destructor: it forces the compiler to insert an 8-byte vptr and then add 4 bytes of padding after num so that the next subobject starts at a correctly aligned address.

The second base, Parent2, starts exactly at byte 16, the very offset we saw in the code (p2 != d). It also gets 8 bytes for its own virtual pointer and 4 bytes for data. The structure ends with the res field of the Derived class at offset 28. The final object size is 32 bytes, while the pointers and fields add up to only 28. The 4 extra bytes are the padding after num inside Parent1 (offsets 12 to 15). There is no tail padding at all (dsize=32): res fits into the 4 bytes that would otherwise be Parent2's tail padding, because the Itanium ABI lets a derived class reuse the tail padding of a non-POD base. The resulting 32 bytes are also a multiple of align=8, the requirement imposed by the most stringent member, the 8-byte pointer.

Let's see what other surprises alignment may have in store.

Casting to void*

Let's go over a situation that often comes up in low-level code. We need to pass the complex Derived object to a callback function via a raw void* pointer, so we pack it into an untyped container. Then, in the handler, we try to extract it as the Parent2 base class. At first, it seems simple, but with multiple, and especially virtual, inheritance it isn't. Let's extend the previous code fragment:


void process_callback(void* raw_data)
{
  Parent2* broken_p2 = (Parent2*)raw_data;
  std::cout << "--- The result of a void cast ---" << std::endl;
  std::cout << "Original void* address: " << raw_data << std::endl;
  std::cout << "Broken Parent2 address: " << broken_p2 << " (Not offset)"
            << std::endl;
}

Look at the program output.

Program output

--- The result of a void cast ---
Original void* address: 0x5baf237792b0
Broken Parent2 address: 0x5baf237792b0 (Not offset)

When we run this code, broken_p2 points to the same address as the start of the entire object, even though the Parent2 data is offset in memory. void* completely "blinds" the compiler and erases all information about where the subobjects are located. When we cast void* directly to Parent2*, the compiler has no choice but to assume that this parent's data starts right at the void* address. In reality, the whole Parent1 subobject separates Parent2 from the object start: Parent1's vptr, the num field, and the alignment bytes.

As a result, broken_p2 is invalid. It skips the necessary pointer adjustment, and any access through it is undefined behavior. In this layout, for instance, reading num2 through broken_p2 would in practice return Parent1::num from offset 8. The whole mechanism relies on precise byte offsets, padding included. Casting through void* ignores them, so the program reads data with a shift.

How can we fix this? We need the compiler to see the correct offsets again. To do this, the pointer has to be restored in stages:

void process_callback(void* raw_data)
{
  Parent2* broken_p2 = (Parent2*)raw_data; 

  Derived* restored_d = (Derived*)raw_data;
  Parent2* safe_p2 = restored_d; 

  std::cout << "--- The result of a void cast ---" << std::endl;
  std::cout << "Original void* address: " << raw_data << std::endl;
  std::cout << "Broken Parent2 address: " << broken_p2 << " (Not offset)" << std::endl;
  std::cout << "Safe Parent2 address: " << safe_p2 << " (Offset)" << std::endl;

}

Instead of attempting a direct leap from untyped memory straight to a base class, we should first return the pointer to its original full type, Derived*. At this stage, the compiler regains the layout context of the entire object. It again knows the boundaries of all subobjects and where the service pointers sit. Only on the next cast to Parent2* does the pointer adjustment kick in. For ordinary bases like the ones here, the offset (16 bytes in our layout) is known at compile time and already includes the alignment padding, so the compiler simply adds it to the address. With virtual inheritance, the compiler instead reads the offset from the type metadata (such as the vbtable) at run time, and only then calculates the final address of the needed subobject.

Full code fragment

Compiler Explorer

#include <iostream>

struct Parent1
{
  int num;
  virtual ~Parent1() {}
};

struct Parent2
{
  int num2;
  virtual ~Parent2() {} 
};

struct Derived : Parent1, Parent2
{
  int res;
};

void process_callback(void* raw_data)
{
  Parent2* broken_p2 = (Parent2*)raw_data; 

  Derived* restored_d = (Derived*)raw_data;
  Parent2* safe_p2 = restored_d; 
  std::cout << "--- The result of a void cast ---" << std::endl;
  std::cout << "Original void* address: " << raw_data << std::endl;
  std::cout << "Broken Parent2 address: " << broken_p2 << " (Not offset)" << std::endl;
  std::cout << "Safe Parent2 address: " << safe_p2 << " (Offset)" << std::endl;

}

int main()
{
  Derived* d = new Derived();
  process_callback((void*)d);
  delete d; // release the object allocated above
  return 0;
}
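The underlying rule is that a pointer converted to void* may only be converted back to exactly the type it had before the conversion. The sketch below follows that rule in two ways. It uses static_cast, which is the idiomatic cast from void*. The helper names read_num2_via_derived and read_num2_via_parent2 are hypothetical, and the default field values exist only to make the result checkable.

```cpp
#include <cassert>

struct Parent1
{
  int num = 1;
  virtual ~Parent1() {}
};

struct Parent2
{
  int num2 = 2;
  virtual ~Parent2() {}
};

struct Derived : Parent1, Parent2
{
  int res = 3;
};

// Option 1: the callback knows the void* was made from a Derived*.
// Cast back to exactly that type first, then let the implicit upcast
// apply the Parent2 offset.
int read_num2_via_derived(void* raw_data)
{
  assert(raw_data != nullptr);
  Derived* d = static_cast<Derived*>(raw_data);
  Parent2* p2 = d;
  return p2->num2;
}

// Option 2: the producer erases a Parent2* rather than a Derived*,
// so the callback may cast straight back to Parent2*.
int read_num2_via_parent2(void* raw_data)
{
  assert(raw_data != nullptr);
  Parent2* p2 = static_cast<Parent2*>(raw_data);
  return p2->num2;
}
```

Option 2 shifts the burden to the producer: whoever packs the pointer must convert it to Parent2* first, and every consumer must agree on that type.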

Empty Base Optimization with virtual

In the first part, we covered Empty Base Optimization (EBO). By the standard, the size of any object can't be zero, so even a completely empty class occupies at least 1 byte in memory. In ordinary inheritance, the compiler can collapse that byte so that the empty base class doesn't bloat the size of the derived class. But as soon as virtuality enters the hierarchy, this optimization may break down, depending on the compiler.
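A quick way to see the effect is to compare a class that inherits an empty base normally with one that inherits it virtually. This is a minimal sketch with hypothetical class names. On mainstream compilers, the ordinary case should collapse to the size of its int field. The virtual case grows because it needs a hidden pointer to locate the virtual base.

```cpp
#include <iostream>

struct Empty {};

// Ordinary inheritance: EBO lets the empty base share its address with x.
struct PlainDerived : Empty
{
  int x;
};

// Virtual inheritance: the object also needs a hidden pointer (vptr or vbptr)
// so that the virtual base can be located at run time.
struct VirtualDerived : virtual Empty
{
  int x;
};

void print_sizes()
{
  std::cout << "sizeof(Empty): " << sizeof(Empty) << '\n'
            << "sizeof(PlainDerived): " << sizeof(PlainDerived) << '\n'
            << "sizeof(VirtualDerived): " << sizeof(VirtualDerived) << '\n';
}
```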

Spoiler alert: Clang handles this case remarkably well and optimizes the empty classes away entirely. It places all of them at offset zero, where they share an address with the vtable pointer and take up no extra space. So, for this example, we set Clang aside and turn to MSVC. Take a look at this code:

class Empty1 {};
class Empty2 {};
class Empty3 {};

class Root : virtual public Empty1
{
  int r;
};

class Root1 : virtual public Empty2
{
  int r1;
};

class Root2 : virtual public Empty3
{
  int r2;
};
class Base : virtual public Root, virtual public Root1, virtual public Root2
{
public:
  double X;
  char symbol;
  virtual void service() {}
};

Let's check the size of the Base class.

Program output

sizeof(Empty): 1 byte
sizeof(Base): 96 byte

Here is how MSVC lays out the Base class:

Compiler Explorer layout

class Base  size(96):
  +---
 0  | {vfptr}
 8  | {vbptr}
16  | X
24  | symbol
    | <alignment member> (size=7)
    | <alignment member> (size=4)
    | <alignment member> (size=4)
  +---
  +--- (virtual base Empty1)
  +---
  +--- (virtual base Root)
32  | {vbptr}
40  | r
    | <alignment member> (size=4)
  +---
  +--- (virtual base Empty2)
  +---
  +--- (virtual base Root1)
56  | {vbptr}
64  | r1
    | <alignment member> (size=4)
  +---
  +--- (virtual base Empty3)
  +---
  +--- (virtual base Root2)
80  | {vbptr}
88  | r2
    | <alignment member> (size=4)
  +---

The object has bloated to 96 bytes, even though its useful payload is less than a quarter of that. The payload is one double, one char, and three int fields, 21 bytes in total. At the object's start, we see the standard "head": 8 bytes for the vfptr (pointer to the virtual function table) and another 8 bytes for the vbptr (pointer to the virtual base table). Then comes double X, which, like the pointers, requires 8-byte alignment. But the most interesting part begins after the char symbol field. Instead of packing data more tightly, the compiler inserts alignment members (padding of 7 and 4 bytes) to make sure that each following virtual base starts on an eight-byte boundary.

The object turns into a chain of virtual bases (Root, Root1, Root2), and each of them comes at a high cost. Instead of collapsing the Empty classes as Clang did, MSVC allocates a separate vbptr for each branch. As a result, each such section consumes 16 to 24 bytes: 8 bytes for the vbptr, 4 bytes for the int field, and padding that keeps the next pointer on an 8-byte boundary.

The net result is a structure where real variables literally drown in service pointers and gaps.

Full code fragment

Compiler Explorer

#include <iostream>

class Empty1 {};
class Empty2 {};
class Empty3 {};

class Root : virtual public Empty1
{
  int r;
};

class Root1 : virtual public Empty2
{
  int r1;
};

class Root2 : virtual public Empty3
{
  int r2;
};
class Base : virtual public Root, virtual public Root1, virtual public Root2
{
public:
  double X;
  char symbol;
  virtual void service() {}
};

int main()
{
  std::cout << "sizeof(Empty): " << sizeof(Empty1) << " byte" << std::endl;
  std::cout << "sizeof(Base): " << sizeof(Base) << " byte" << std::endl;
  return 0;
}
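To compare compilers beyond sizeof, a sketch like the following prints where each virtual base subobject actually lands inside Base. The subobject_offset template and print_base_layout function are hypothetical helpers. The exact numbers depend on the ABI, so Clang and MSVC are expected to disagree.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>

class Empty1 {};
class Empty2 {};
class Empty3 {};

class Root : virtual public Empty1 { int r; };
class Root1 : virtual public Empty2 { int r1; };
class Root2 : virtual public Empty3 { int r2; };

class Base : virtual public Root, virtual public Root1, virtual public Root2
{
public:
  double X;
  char symbol;
  virtual void service() {}
};

// Hypothetical helper: byte distance from the start of obj to its Sub subobject.
// For a virtual base, the implicit upcast generally reads the offset from the
// object's metadata instead of using a compile-time constant.
template <class Sub, class Whole>
std::ptrdiff_t subobject_offset(Whole& obj)
{
  Sub* sub = &obj;
  return reinterpret_cast<char*>(sub) - reinterpret_cast<char*>(&obj);
}

void print_base_layout()
{
  Base b{};
  const std::size_t payload = sizeof(double) + sizeof(char) + 3 * sizeof(int);
  assert(sizeof(Base) >= payload);
  std::cout << "sizeof(Base): " << sizeof(Base) << ", payload: " << payload << '\n'
            << "Empty1 at " << subobject_offset<Empty1>(b) << '\n'
            << "Root at " << subobject_offset<Root>(b) << '\n'
            << "Root1 at " << subobject_offset<Root1>(b) << '\n'
            << "Root2 at " << subobject_offset<Root2>(b) << '\n';
}
```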

Performance impact

Having covered theory and examples, we should discuss the cost of virtual inheritance—specifically, how it affects processor performance through cache misses.

We know that data locality is crucial in systems programming. The more tightly fields are packed in memory, the higher the probability that the processor loads them into the cache in a single fetch. Virtual inheritance, however, inevitably reduces this density. The virtual base subobject is placed at the very end of the layout, while the vbptr used to find it sits at the beginning. As a result, the data of one logical object can end up split across different cache lines. The processor then has to fetch several lines, and each miss means a trip to a slower cache level or to main memory, which causes execution delays.

On top of all that, there's the issue of alignment. To keep pointers properly aligned, the compiler inserts extra empty bytes, and we end up with an object full of holes. When processing such objects, the processor spends memory bandwidth transferring padding bytes along with the useful data. As a result, each cache line carries less useful information, and less of the working set fits in the cache.

When we work with large arrays of data, things get even worse. In an ordinary class, fields lie in a dense block, and the processor reads them in a single linear pass. With virtual inheritance, this contiguity is lost: the base class data moves to the very end of the structure, and only a pointer to the offset table (vbptr) remains at the beginning.

So, to access a field of a virtual base through a pointer or reference, when the compiler can't know the complete object type, the processor must take three steps. It first reads the vbptr, then looks up the offset in the table, and only then jumps to the data itself at the end of the object. These extra computations and jumps between service pointers and scattered fields erode the benefits of caching, and the slowdown adds up over every element of the array.
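This split is easy to observe. The sketch below uses hypothetical Position and Entity types: the frequently used coordinates live in a virtual base, and the derived class owns a 64-byte name buffer. On a typical 64-bit GCC or Clang build, the Position subobject should land more than 64 bytes (a common cache-line size) away from the start of the object. Reading the hidden pointer and then the coordinates would therefore touch two different cache lines.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>

// Hypothetical types: frequently used data sits in a virtual base.
struct Position
{
  float x, y, z;
};

struct Entity : virtual Position
{
  char name[64];
  int id;
};

// Byte distance from the start of an Entity to its Position subobject.
// Because Position is a virtual base, the offset generally has to be read
// from the object's metadata at run time.
std::ptrdiff_t position_offset(Entity& e)
{
  Position* p = &e;
  return reinterpret_cast<char*>(p) - reinterpret_cast<char*>(&e);
}

void print_entity_layout()
{
  Entity e{};
  const std::ptrdiff_t offset = position_offset(e);
  // On mainstream ABIs, the virtual base is placed after the class's own fields.
  assert(offset >= static_cast<std::ptrdiff_t>(sizeof(e.name)));
  std::cout << "sizeof(Entity): " << sizeof(Entity) << '\n'
            << "Position offset: " << offset << '\n';
}
```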

Is virtuality worth it?

We've seen that the word virtual can easily bloat an object's memory footprint, complicate our code, and introduce subtle errors. So the question is: why do we need it at all?

On the one hand, virtual inheritance and virtual functions are fundamental design tools. They make it possible to build flexible, extensible systems, and virtual inheritance solves the diamond problem. In complex architectures, the convenience of polymorphism and clean hierarchies often outweighs the loss of a few dozen bytes. Virtuality helps us think in terms of abstractions rather than memory addresses.

On the other hand, architectural flexibility comes at a hardware cost. As we've seen, virtuality turns compact structures into bulky objects with memory holes. Also, the double indirection through the vbtable can become a bottleneck on critical paths, for example in game engines. Yet we wouldn't recommend abandoning this mechanism: in practice, context determines the choice.

If the logic inside a virtual function is complex and lengthy, the tiny delay of looking up an address in a table becomes negligible and simply dissolves in the overall execution time. Moreover, with a large number of derived classes, replacing polymorphism with manual type checks via switch or if-else chains can easily turn out slower than a single indirect call through the table. Alternatives like CRTP, which promise static dispatch at no runtime cost, quickly turn into unmaintainable monsters in multiple inheritance scenarios.
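For comparison, here is a minimal sketch of the CRTP alternative next to ordinary virtual dispatch, using hypothetical Shape and Square types. The CRTP version stores no vptr. However, each ShapeCrtp<T> is a distinct type, so keeping different shapes in one container already requires extra machinery, such as a common virtual base or std::variant. This is where the approach starts to get unwieldy.

```cpp
#include <cassert>
#include <iostream>

// Dynamic polymorphism: every object carries a vptr, and area() is
// dispatched through the vtable at run time.
struct Shape
{
  virtual ~Shape() = default;
  virtual double area() const = 0;
};

struct Square : Shape
{
  explicit Square(double s) : side(s) {}
  double area() const override { return side * side; }
  double side;
};

// Static polymorphism via CRTP: the base knows the derived type at compile
// time, so the call is resolved without a vtable and no vptr is stored.
template <class Derived>
struct ShapeCrtp
{
  double area() const { return static_cast<const Derived&>(*this).area_impl(); }
};

struct SquareCrtp : ShapeCrtp<SquareCrtp>
{
  explicit SquareCrtp(double s) : side(s) {}
  double area_impl() const { return side * side; }
  double side;
};

void print_shape_sizes()
{
  const Square dynamic_square(3.0);
  const SquareCrtp static_square(3.0);
  const Shape& shape = dynamic_square;
  // Both dispatch styles compute the same result.
  assert(shape.area() == static_square.area());
  std::cout << "sizeof(Square): " << sizeof(Square) << '\n'
            << "sizeof(SquareCrtp): " << sizeof(SquareCrtp) << '\n';
}
```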

In such hierarchies, standard virtuality remains the lesser evil, delivering clean code at an acceptable price. In the end, the choice between static and dynamic structure is always a search for balance.

Conclusion

So, we've come a long way and seen that virtuality and alignment are an inseparable pair that dictates the physical structure of an object. Understanding these mechanisms turns abstract classes into a precise layout where every byte and every offset is under engineering control. This lets us strike a balance between the flexibility of dynamic polymorphism and execution efficiency.

To ensure that your code remains under full control and that complex hierarchies create no hidden memory issues, you're welcome to use our static analyzer.

DEV Community
