DEV Community

Paul J. Lucas


Unions & std::variant in C++

Introduction

A union in C++ is almost like a union in C (so, if you haven’t read that article, you probably should), but there are enough non-trivial differences to warrant a separate article for C++. Additionally, C++ offers a better alternative to a union.

Active Member

In C++ (but not C), a union has the concept of an active member. Specifically, whichever member of a union was the last one assigned to (or initialized) is the active member:

union value {
  long   i;
  double f;
  char   c;
  char  *s;
};

value v;             // no active member
v.i = 42;            // active member is 'i'
v.c = 'a';           // active member is now 'c'

When the active member changes, the “lifetime” of the previously active member is said to have ended and the “lifetime” of the new active member is said to have begun.

Accessing any member other than the active member (which is the same as accessing any member either before its lifetime has begun or after it has ended) results in undefined behavior.

Some compilers still allow non-active member access as an extension; but if you want to write portable code, you shouldn’t do it, especially when there are legal alternatives — more later.

There is an exception; consider:

struct S1 { int i, j; };
struct S2 { int x; double r; };
union U { S1 s1; S2 s2; };

U u = { { 1, 2 } };  // active member is s1
cout << u.s2.x;      // OK: as if u.s1.i

That is, if a union has an active member of type struct S1, it’s OK to read any member m of type struct S2 provided that m is part of a common initial sequence of members of S1 and S2. For this example, since S2 starts with an int member just like S1 does, it’s OK to read x from s2 even though s1 is the active member.

Why does this exception exist? To continue to support code similar to the restricted class hierarchy example given in the C article.
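To make the exception concrete, here’s a minimal, self-contained sketch. The helper name first_int_via_s2 is made up for illustration; the static_asserts reflect that the common-initial-sequence rule applies only to standard-layout structs.

```cpp
#include <cassert>
#include <type_traits>

struct S1 { int i, j; };
struct S2 { int x; double r; };
union U { S1 s1; S2 s2; };

// The common-initial-sequence rule applies only to standard-layout structs.
static_assert( std::is_standard_layout<S1>::value, "S1 must be standard-layout" );
static_assert( std::is_standard_layout<S2>::value, "S2 must be standard-layout" );

// Hypothetical helper: makes s1 the active member, then reads the shared
// leading int through s2.
int first_int_via_s2( int i, int j ) {
  U u = { { i, j } };   // active member is s1
  return u.s2.x;        // OK: x is in the common initial sequence
}

// Usage: the value read through s2.x is the one stored in s1.i.
void usage() {
  assert( first_int_via_s2( 1, 2 ) == 1 );
}
```

Reading u.s2.r here would not be covered by the exception since r isn’t part of the common initial sequence.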

Type Punning

As I mentioned in the other article:

Type punning is a technique to read or write an object as if it were of a type other than what it was declared as. Since this circumvents the type system, you really have to know what you’re doing. In C (but not C++), a union can be used for type punning.

Because a C++ union has the concept of an active member, a union cannot be used for type punning in C++. The previous C example (repeated here) results in undefined behavior (UB) in C++:

uint32_t swap16of32( uint32_t n ) {
  union {
    uint32_t u32;
    uint16_t u16[2];
  } u = { n };
  uint16_t const t16 = u.u16[0];   // OK in C; UB in C++
  u.u16[0] = u.u16[1];
  u.u16[1] = t16;
  return u.u32;                    // OK in C; UB in C++
}

So how do you do type punning legally in C++? There are three ways:

  1. Via memcpy().
  2. As of C++20, via std::bit_cast().
  3. As of C++23, via std::start_lifetime_as().

You can use reinterpret_cast for type punning in a few specific cases, but not in the general case.
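One of those specific cases is inspecting an object’s bytes through a pointer to unsigned char (or char or std::byte). Here’s a minimal sketch; the helper name sum_of_bytes is made up for illustration, and summing is used only because the result doesn’t depend on byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: sums the bytes of a 32-bit value by viewing its
// storage as unsigned char, which is one of the permitted aliasing cases.
unsigned sum_of_bytes( std::uint32_t n ) {
  auto const *p = reinterpret_cast<unsigned char const*>( &n );
  unsigned sum = 0;
  for ( std::size_t i = 0; i < sizeof n; ++i )
    sum += p[i];
  return sum;
}

// Usage: the bytes 0x01, 0x02, 0x03, 0x04 sum to 10 in either byte order.
void usage() {
  assert( sum_of_bytes( 0x01020304u ) == 10 );
}
```

Going the other way (say, reading a uint16_t through a pointer obtained by casting a uint32_t*) is not one of the permitted cases.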

The memcpy() way is:

uint32_t swap16of32( uint32_t n ) {
  uint16_t u16[2];
  memcpy( u16, &n, sizeof(u16) );  // OK in C++
  std::swap( u16[0], u16[1] );
  memcpy( &n, u16, sizeof(n) );    // OK in C++
  return n;
}

While the above makes sense, you might think this version is less efficient since it calls memcpy() twice. The thing is that memcpy() is kind of special in that the compiler “knows” about this pattern and, with optimization enabled, typically eliminates the calls entirely, so the generated code ends up the same as for the C code.
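If you use the memcpy() pattern in more than one place, it can be wrapped in a small template. This is only a sketch: the names pun_cast and float_bits are made up for illustration, and the float example assumes a 32-bit IEEE 754 float (which the static_asserts check).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <type_traits>

// Hypothetical helper: copies the bytes of `from` into a new object of type
// To.  To must also be default-constructible for this sketch to compile.
template<typename To, typename From>
To pun_cast( From const &from ) {
  static_assert( sizeof(To) == sizeof(From), "sizes must match" );
  static_assert( std::is_trivially_copyable<To>::value &&
                 std::is_trivially_copyable<From>::value,
                 "both types must be trivially copyable" );
  To to;
  std::memcpy( &to, &from, sizeof to );
  return to;
}

// Assumption for the example below: float is a 32-bit IEEE 754 type.
static_assert( std::numeric_limits<float>::is_iec559, "IEEE 754 float assumed" );
static_assert( sizeof(float) == sizeof(std::uint32_t), "32-bit float assumed" );

// Hypothetical helper: returns the bit pattern of a float.
std::uint32_t float_bits( float f ) {
  return pun_cast<std::uint32_t>( f );
}

// Usage: 1.0f is 0x3F800000 in IEEE 754 single precision.
void usage() {
  assert( float_bits( 1.0f ) == 0x3F800000u );
}
```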

While it’s nice that the compiler knows how to optimize this pattern, it seems kind of dumb to have to have such a special case; hence the aforementioned other ways.

The bit_cast() way is:

uint32_t swap16of32( uint32_t n ) {
  auto u16 = std::bit_cast<std::array<uint16_t,2>>( n );
  std::swap( u16[0], u16[1] );
  return std::bit_cast<uint32_t>( u16 );
}

The use of std::array is necessary instead of using uint16_t[2] directly because functions can’t return arrays but they can return class objects that have an array as a member.
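Since std::bit_cast() requires C++20, code that also has to build as C++17 could guard it with the standard feature-test macro __cpp_lib_bit_cast and fall back to memcpy(). The following is a sketch of that idea rather than a recommendation; it assumes the preprocessor supports __has_include (standard as of C++17) and that std::array<uint16_t,2> has no padding (which the static_assert checks).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>
#if __has_include(<bit>)
#include <bit>                      // defines __cpp_lib_bit_cast in C++20
#endif

static_assert( sizeof(std::array<std::uint16_t,2>) == sizeof(std::uint32_t),
               "array of two uint16_t assumed to have no padding" );

std::uint32_t swap16of32( std::uint32_t n ) {
#if defined(__cpp_lib_bit_cast)
  // C++20 and later: use std::bit_cast directly.
  auto u16 = std::bit_cast<std::array<std::uint16_t,2>>( n );
  std::swap( u16[0], u16[1] );
  return std::bit_cast<std::uint32_t>( u16 );
#else
  // Earlier standards: same effect via memcpy.
  std::array<std::uint16_t,2> u16;
  std::memcpy( u16.data(), &n, sizeof n );
  std::swap( u16[0], u16[1] );
  std::memcpy( &n, u16.data(), sizeof n );
  return n;
#endif
}

// Usage: the two 16-bit halves trade places regardless of byte order.
void usage() {
  assert( swap16of32( 0x11112222u ) == 0x22221111u );
}
```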

The start_lifetime_as() (or start_lifetime_as_array()) way is:

uint32_t swap16of32( uint32_t n ) {
  auto u16 = std::start_lifetime_as_array<uint16_t>( &n, 2 );
  std::swap( u16[0], u16[1] );
  return *std::start_lifetime_as<uint32_t>( u16 );
}

What start_lifetime_as_array() in this case does is:

  1. Ends the lifetime of the uint32_t object n.
  2. Begins the lifetime of a uint16_t[2] array accessible via u16.

(At this point, accessing n would be undefined behavior since its lifetime has ended.) Then the two uint16_t array entries are swapped; then start_lifetime_as():

  1. Ends the lifetime of the uint16_t[2] array.
  2. Begins the lifetime of a uint32_t accessible via the pointer that is then dereferenced to read its value.

While this may seem rather persnickety, it allows you to express your intent to the compiler clearly so it can optimize (or not) accordingly.

Unions Containing Non-Trivial Objects

Originally in C++, you couldn’t put objects that had a non-trivial special member function (a constructor, assignment operator, or destructor) inside a union; as of C++11, however, you can:

union value {
  long        i;
  double      f;
  char        c;
  std::string s;

  value() { }
  ~value() { }
};

However, once you have at least one such member, the caveats are:

  • The compiler will automatically delete the union’s default constructor and destructor, so you have to add them back explicitly yourself (even if they do nothing).
  • To access such a member, you have to call its constructor explicitly yourself to begin its lifetime beforehand.
  • You have to remember to call its destructor explicitly yourself to end its lifetime afterward.

For example:

value v;
new(&v.s) std::string{ "hello" };
// ...
v.s.~basic_string();
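To show what that bookkeeping amounts to in practice, here’s a minimal sketch of a wrapper that pairs a union with a tag recording the active member. Everything here (tagged_value, its member names, and limiting it to just long and std::string) is made up for illustration; copying is disabled only to keep the sketch short.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical wrapper: a union plus a tag that records the active member.
class tagged_value {
public:
  tagged_value() : kind_( kind::i ) { u_.i = 0; }
  ~tagged_value() { destroy(); }

  // Copying is disabled to keep the sketch short.
  tagged_value( tagged_value const& ) = delete;
  tagged_value& operator=( tagged_value const& ) = delete;

  void set( long i ) {
    destroy();
    u_.i = i;                        // assignment makes 'i' active
    kind_ = kind::i;
  }

  void set( std::string const &s ) {
    destroy();
    new( &u_.s ) std::string( s );   // begin the lifetime of 's'
    kind_ = kind::s;
  }

  bool is_string() const { return kind_ == kind::s; }

  long as_long() const {
    assert( kind_ == kind::i );      // caller must ask for the active member
    return u_.i;
  }

  std::string const& as_string() const {
    assert( kind_ == kind::s );
    return u_.s;
  }

private:
  enum class kind { i, s };

  union storage {
    long        i;
    std::string s;
    storage() { }                    // must be added back explicitly
    ~storage() { }
  };

  // Ends the lifetime of 's' if it is the active member.
  void destroy() {
    if ( kind_ == kind::s )
      u_.s.~basic_string();
    kind_ = kind::i;
  }

  storage u_;
  kind    kind_;
};
```

Even for just two member types, every function that changes the active member has to remember to destroy the old one first.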

Having to remember which is the active member and manually manage its lifetime is both bothersome and bug-prone. As of C++17, there is a better way.

std::variant: A Type-Safe Union

The std::variant class is a type-safe union. However, unlike a union, a variant:

  • Remembers which member is the active one.
  • Automatically calls constructors & destructors when necessary.

For example, here’s the previous union converted to a std::variant:

using value = std::variant<long,double,char,std::string>;
value v;      // active member defaults to first ('long')
v = 4.2;      // active member is now 'double'
v = "hello";  // active member is now 'std::string'

To get the value out of a variant, you can use std::get():

auto s1 = std::get<std::string>( v );
auto s2 = std::get<3>( v );            // same

where it takes either one of the types comprising the variant or the zero-based index of the type. However:

  • If you specify a type or index that is not the active member, then std::bad_variant_access is thrown.
  • If you specify a type that is not unique (a variant may contain the same type more than once) or you specify an index that is out of bounds, then this is considered ill-formed (which means your code is bad and you should fix it).
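As an illustration of the first point, here’s a minimal sketch of catching std::bad_variant_access. The helper name string_or is made up for this example.

```cpp
#include <cassert>
#include <string>
#include <variant>

using value = std::variant<long,double,char,std::string>;

// Hypothetical helper: returns the string held by v, or fallback if some
// other member is the active one.
std::string string_or( value const &v, std::string const &fallback ) {
  try {
    return std::get<std::string>( v );
  }
  catch ( std::bad_variant_access const& ) {
    return fallback;
  }
}

// Usage: a variant holding a double has no string to get.
void usage() {
  assert( string_or( value{ std::string( "hi" ) }, "none" ) == "hi" );
  assert( string_or( value{ 4.2 }, "none" ) == "none" );
}
```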

Alternatively, you can use std::get_if() to get a pointer to the value or nullptr if you requested an inactive member:

if ( auto *ps = std::get_if<std::string>( &v ) )  // takes a pointer to the variant
  cout << "string = " << *ps << '\n';
else
  cerr << "not a string\n";

If you simply want to inquire whether a variant’s active member is of a particular type without getting its value, you can use std::holds_alternative():

if ( std::holds_alternative<std::string>( v ) )
  // ...

“Visiting”

Given a variant, you likely want to do different things depending on whichever type the variant holds. This is known as visiting. There are a few ways to do this:

  1. Via an if-else chain of either std::get_if() or std::holds_alternative().
  2. Via a lambda and an if constexpr-else chain of std::is_same.
  3. Via the overload pattern.
  4. Via a visitor class.

if-else Chain

An if-else chain is fairly straightforward:

if ( auto *pl = std::get_if<long>( &v ) )
  // ...
else if ( auto *pd = std::get_if<double>( &v ) )
  // ...
else if ( auto *pc = std::get_if<char>( &v ) )
  // ...
else if ( auto *ps = std::get_if<std::string>( &v ) )
  // ...

That is, you simply check for each type in turn.
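Here’s a complete version of such a chain as a minimal sketch. The helper name describe and the text it returns are made up for illustration. Note that std::get_if() takes a pointer to the variant.

```cpp
#include <cassert>
#include <string>
#include <variant>

using value = std::variant<long,double,char,std::string>;

// Hypothetical helper: describes whichever member of v is active.
std::string describe( value const &v ) {
  if ( auto *pl = std::get_if<long>( &v ) )
    return "long: " + std::to_string( *pl );
  else if ( auto *pd = std::get_if<double>( &v ) )
    return "double: " + std::to_string( *pd );
  else if ( auto *pc = std::get_if<char>( &v ) )
    return std::string( "char: " ) + *pc;
  else if ( auto *ps = std::get_if<std::string>( &v ) )
    return "string: " + *ps;
  return "valueless";   // only if v is valueless_by_exception()
}

// Usage.
void usage() {
  assert( describe( value{ 42L } ) == "long: 42" );
  assert( describe( value{ 'a' } ) == "char: a" );
}
```

One drawback of this way is that if a type is later added to the variant, nothing forces you to add a matching branch.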

if constexpr-else Chain

Alternatively, you can use a lambda with an if constexpr-else chain:

std::visit( []( auto &&t ) {
  using T = std::decay_t<decltype(t)>;
  if constexpr ( std::is_same_v<T,long> )
    // ...
  else if constexpr ( std::is_same_v<T,double> )
    // ...
  else if constexpr ( std::is_same_v<T,char> )
    // ...
  else if constexpr ( std::is_same_v<T,std::string> )
    // ...
}, v );

Unlike the if-else chain, the if constexpr-else chain is done at compile-time rather than run-time. At this point, you might ask:

If the type of the active member of v isn’t known until run-time, how can the if constexpr-else chain “know” what to do at compile-time?

First, for any lambda, the compiler synthesizes a custom class behind the scenes that has an overloaded operator() where executing the lambda calls the operator. For a lambda that has auto as a parameter type, that operator() is a template:

struct __lambda {
  template<typename T>
  void operator()( T &&t ) const { /* ... */ }
  // ...
};

For each type of the variant, the compiler instantiates a separate operator() and each instantiation performs the if constexpr at compile-time. At run-time, std::visit() determines which is the active member of v and calls the correct instantiation.
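Conceptually, that run-time step can be pictured as a switch on the variant’s index() where each case is compiled separately. The sketch below is a hand-written stand-in hard-coded for the value variant, not how any particular standard library actually implements std::visit(); the names visit_value and active_size are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>

using value = std::variant<long,double,char,std::string>;

// Hypothetical stand-in for std::visit, hard-coded for `value`.  Each case
// is compiled against a different type; which case runs is decided at
// run-time.  All cases must return the same type for `auto` to deduce.
template<typename Visitor>
auto visit_value( Visitor &&vis, value const &v ) {
  switch ( v.index() ) {
    case 0: return vis( std::get<0>( v ) );   // calls vis with a long
    case 1: return vis( std::get<1>( v ) );   // calls vis with a double
    case 2: return vis( std::get<2>( v ) );   // calls vis with a char
    case 3: return vis( std::get<3>( v ) );   // calls vis with a std::string
  }
  throw std::bad_variant_access{};            // v is valueless
}

// Hypothetical helper: returns the size of the active member's type.
std::size_t active_size( value const &v ) {
  return visit_value( []( auto const &t ) { return sizeof t; }, v );
}

// Usage.
void usage() {
  assert( active_size( value{ 'a' } ) == sizeof(char) );
  assert( active_size( value{ 42L } ) == sizeof(long) );
}
```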

The && in auto&& is a forwarding reference, but that’s a story for another time.

The use of std::decay_t strips any reference and const from the deduced type so that the std::is_same_v comparisons match the types comprising the variant.
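Putting the pieces together, here’s a complete sketch of an if constexpr-else chain that returns a value. The names name_via_constexpr and always_false are made up for illustration; the final else with a static_assert is an optional addition (not part of the chain above) that turns a forgotten type into a compile-time error.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

using value = std::variant<long,double,char,std::string>;

// Hypothetical helper: a type-dependent `false` so the static_assert below
// fires only if its branch is actually instantiated.
template<typename>
inline constexpr bool always_false = false;

// Hypothetical helper: returns the name of the active member's type.
std::string name_via_constexpr( value const &v ) {
  return std::visit( []( auto &&t ) -> std::string {
    // Since v is const, decltype(t) is, e.g., 'long const&'; decay_t turns
    // that into plain 'long'.
    using T = std::decay_t<decltype(t)>;
    if constexpr ( std::is_same_v<T,long> )
      return "long";
    else if constexpr ( std::is_same_v<T,double> )
      return "double";
    else if constexpr ( std::is_same_v<T,char> )
      return "char";
    else if constexpr ( std::is_same_v<T,std::string> )
      return "std::string";
    else
      static_assert( always_false<T>, "unhandled type in variant" );
  }, v );
}

// Usage: a default-constructed value holds the first type, long.
void usage() {
  assert( name_via_constexpr( value{} ) == "long" );
}
```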

Overload Pattern

Another common alternative allows you to eliminate both if-else chains by using a helper class:

template<class... Ts>                // 1
struct overload : Ts... {            // 2
  using Ts::operator()...;           // 3
};

template<class... Ts>
overload(Ts...) -> overload<Ts...>;  // 4

The numbered comments correspond to the notes below:

  1. The ... declares a template parameter pack.
  2. The Ts... says that the overload class multiply inherits from one or more Ts (classes of type T).
  3. The ... is a parameter pack expansion. In this case, it means that for every class T that overload inherits from, its operator() is imported (and overloaded) into overload.
  4. This is a user-defined deduction guide that tells the compiler, given an invocation like overload{t1,t2} (where t1 is an object of type T1 and t2 is an object of type T2), instantiate a class overload<T1,T2>.

Given all that, we can now do:

std::visit( overload{
  []( long lv ) { /* ... */ },
  []( double dv ) { /* ... */ },
  []( char cv ) { /* ... */ },
  []( std::string const &sv ) { /* ... */ }
}, v );

That is:

  1. Construct an overload object passing it a list of four lambdas, one for each type of the value variant.
  2. Each lambda will have caused the compiler to synthesize its own __lambda class behind the scenes (at compile-time).
  3. The overload object is then passed to std::visit() that will call the operator() of the lambda whose argument type matches the type of the active member of v.

In the case of our value variant, behind the scenes, the compiler would have instantiated a class like:

template<>
struct overload<long,double,char,std::string> :
  __lambda_long, __lambda_double, __lambda_char, __lambda_std_string {
  using __lambda_long::operator();
  using __lambda_double::operator();
  using __lambda_char::operator();
  using __lambda_std_string::operator();
};

There’s nothing special about the class name overload; it’s just the one that’s commonly used for this technique. You can alternatively name it anything you want.
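Here’s the whole pattern as one self-contained sketch. The function names name_via_overload and holds_text are made up for illustration. The second function shows an optional variation: a generic lambda acting as a catch-all for every type not handled explicitly.

```cpp
#include <cassert>
#include <string>
#include <variant>

using value = std::variant<long,double,char,std::string>;

template<class... Ts>
struct overload : Ts... {
  using Ts::operator()...;
};

template<class... Ts>
overload(Ts...) -> overload<Ts...>;

// Hypothetical helper: one lambda per type.  All lambdas must return the
// same type.
std::string name_via_overload( value const &v ) {
  return std::visit( overload{
    []( long )               { return std::string{ "long" }; },
    []( double )             { return std::string{ "double" }; },
    []( char )               { return std::string{ "char" }; },
    []( std::string const& ) { return std::string{ "std::string" }; }
  }, v );
}

// Hypothetical helper: the non-template lambda is preferred for
// std::string; the generic lambda handles everything else.
bool holds_text( value const &v ) {
  return std::visit( overload{
    []( std::string const& ) { return true; },
    []( auto const& )        { return false; }
  }, v );
}

// Usage.
void usage() {
  assert( name_via_overload( value{ 'a' } ) == "char" );
  assert( holds_text( value{ std::string( "x" ) } ) );
  assert( !holds_text( value{ 42L } ) );
}
```

Without a catch-all, leaving out a lambda for one of the types makes the call to std::visit() fail to compile, which is usually what you want.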

A Visitor Class

The thing with any of the previous ways is that you have to code each case at its point of use. If you do the same thing in more than one place in your code, it’s redundant.

Using a visitor class allows you to factor out the code into a class that can easily be used in multiple places. A visitor class has to have an operator() overload for each type comprising the variant:

struct value_visitor {
  void operator()( long lv ) const;
  void operator()( double dv ) const;
  void operator()( char cv ) const;
  void operator()( std::string const &sv ) const;
};

Then to use the visitor, simply pass an instance of it to std::visit():

std::visit( value_visitor{}, v );
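Unlike a lambda, a visitor class can also have data members. As a minimal sketch, here’s a visitor that writes to a stream it was given; the names print_visitor and to_text, and the text written, are made up for illustration.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <variant>

using value = std::variant<long,double,char,std::string>;

// Hypothetical visitor: writes the active member to the stream it holds.
struct print_visitor {
  std::ostream &out;

  void operator()( long lv ) const { out << "long " << lv; }
  void operator()( double dv ) const { out << "double " << dv; }
  void operator()( char cv ) const { out << "char " << cv; }
  void operator()( std::string const &sv ) const { out << "string " << sv; }
};

// Hypothetical helper: renders v as text by visiting it.
std::string to_text( value const &v ) {
  std::ostringstream oss;
  std::visit( print_visitor{ oss }, v );
  return oss.str();
}

// Usage.
void usage() {
  assert( to_text( value{ 42L } ) == "long 42" );
  assert( to_text( value{ std::string( "hi" ) } ) == "string hi" );
}
```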

Conclusion

Use of unions in C++ is certainly possible (as it has to be to be mostly backwards compatible with C), but has more restrictions than in C. In general, you should use std::variant instead of a union unless compatibility with C is required.
