Introduction
A union in C++ is almost like a union in C (so, if you haven’t read that article, you probably should), but there are enough non-trivial differences to warrant a separate article for C++. Additionally, C++ offers a better alternative to a union.
Active Member
In C++ (but not C), a union has the concept of an active member. Specifically, whichever member of a union was the last one assigned to (or initialized) is the active member:
union value {
    long i;
    double f;
    char c;
    char *s;
};
value v; // no active member
v.i = 42; // active member is 'i'
v.c = 'a'; // active member is now 'c'
When the active member changes, the “lifetime” of the previously active member is said to have ended and the “lifetime” of the new active member is said to have begun.
Reading any member other than the active member (which is the same as reading any member either before its lifetime has begun or after it has ended) results in undefined behavior.
Some compilers still allow non-active member access as an extension; but if you want to write portable code, you shouldn’t do it, especially when there are legal alternatives — more later.
There is an exception; consider:
struct S1 { int i, j; };
struct S2 { int x; double r; };
union U { S1 s1; S2 s2; };
U u = { { 1, 2 } }; // active member is s1
cout << u.s2.x; // OK: as if u.s1.i
That is, if a union has an active member of type struct S1, it’s OK to read any member m of type struct S2 provided that m is part of a common initial sequence of members of S1 and S2 (both of which must be standard-layout types). For this example, since S2 starts with an int member just like S1 does, it’s OK to read x from s2 even though s1 is the active member.
Why does this exception exist? To continue to support code similar to the restricted class hierarchy example given in the C article.
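The C article’s example isn’t repeated here, but a rough sketch of the general idea looks like the following. All the names (token_kind, int_token, real_token, token) are hypothetical and exist only for illustration. Every struct starts with the same tag member, so the tag can be read through any member of the union regardless of which one is active:

```cpp
#include <string>

// Hypothetical tag type: which kind of token is stored.
enum class token_kind { integer, real };

// Both structs are standard-layout and start with the same member type,
// so 'kind' is part of their common initial sequence.
struct int_token  { token_kind kind; long   value; };
struct real_token { token_kind kind; double value; };

union token {
    int_token  i;
    real_token r;
};

// Reads the tag through 'i' no matter which member is active; this is
// the one kind of non-active member read that is permitted.
inline token_kind kind_of( token const &t ) {
    return t.i.kind;
}

// Uses the tag to decide which member is safe to read in full.
inline std::string describe( token const &t ) {
    switch ( kind_of( t ) ) {
        case token_kind::integer:
            return "integer " + std::to_string( t.i.value );
        case token_kind::real:
            return "real " + std::to_string( t.r.value );
    }
    return "unknown";
}
```

Note that only the tag is read through the “wrong” member; the value is always read through the member the tag says is active.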
Type Punning
As I mentioned in the other article:
Type punning is a technique to read or write an object as if it were of a type other than what it was declared as. Since this circumvents the type system, you really have to know what you’re doing. In C (but not C++), a union can be used for type punning.
Because a C++ union has the concept of an active member, a union cannot be used for type punning in C++. The previous C example (repeated here) results in undefined behavior (UB) in C++:
uint32_t swap16of32( uint32_t n ) {
    union {
        uint32_t u32;
        uint16_t u16[2];
    } u = { n };
    uint16_t const t16 = u.u16[0]; // OK in C; UB in C++
    u.u16[0] = u.u16[1];
    u.u16[1] = t16;
    return u.u32; // OK in C; UB in C++
}
So how do you do type punning legally in C++? There are three ways:
- Via memcpy().
- As of C++20, via std::bit_cast().
- As of C++23, via std::start_lifetime_as().
You can use reinterpret_cast for type punning in a few specific cases, but not in the general case.
The memcpy() way is:
#include <cstdint>
#include <cstring>
#include <utility>

uint32_t swap16of32( uint32_t n ) {
    uint16_t u16[2];
    std::memcpy( u16, &n, sizeof(u16) ); // OK in C++
    std::swap( u16[0], u16[1] );
    std::memcpy( &n, u16, sizeof(n) ); // OK in C++
    return n;
}
While the above makes sense, you might think this version is less efficient since it calls memcpy() twice. The thing is that memcpy() is kind of special in that the compiler “knows” about this pattern and does special optimization for it, typically enough that the generated code is the same as for the C code.
While it’s nice that the compiler knows how to optimize this pattern, it seems kind of dumb to have to have such a special case; hence the aforementioned other ways.
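Before moving on to those, note that the memcpy() pattern can be wrapped in a small helper so the copy and its preconditions live in one place. This is only a sketch: pun_cast and float_bits are made-up names, not standard functions, and float_bits assumes a 32-bit IEEE 754 float:

```cpp
#include <cstdint>
#include <cstring>
#include <limits>
#include <type_traits>

// Hypothetical helper: copies the bytes of 'from' into a new object of
// type To. The static_asserts reject uses where memcpy() would be invalid.
template<typename To, typename From>
To pun_cast( From const &from ) {
    static_assert( sizeof(To) == sizeof(From),
                   "types must be the same size" );
    static_assert( std::is_trivially_copyable<From>::value,
                   "From must be trivially copyable" );
    static_assert( std::is_trivially_copyable<To>::value,
                   "To must be trivially copyable" );
    static_assert( std::is_default_constructible<To>::value,
                   "To must be default constructible" );
    To to;
    std::memcpy( &to, &from, sizeof(to) );
    return to;
}

// Example use: get the bit pattern of a float.
// Assumption: float is a 32-bit IEEE 754 type on this platform.
inline std::uint32_t float_bits( float f ) {
    static_assert( std::numeric_limits<float>::is_iec559,
                   "assumes IEEE 754 float" );
    return pun_cast<std::uint32_t>( f );
}
```

This is essentially what std::bit_cast() does for you, except that std::bit_cast() can also be used in constant expressions.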
The bit_cast() way is:
#include <array>
#include <bit>
#include <cstdint>
#include <utility>

uint32_t swap16of32( uint32_t n ) {
    auto u16 = std::bit_cast<std::array<uint16_t,2>>( n );
    std::swap( u16[0], u16[1] );
    return std::bit_cast<uint32_t>( u16 );
}
The use of std::array is necessary instead of using uint16_t[2] directly because functions can’t return arrays but they can return class objects that have an array as a member.
The start_lifetime_as() (or start_lifetime_as_array()) way is:
#include <cstdint>
#include <memory>
#include <utility>

uint32_t swap16of32( uint32_t n ) {
    auto u16 = std::start_lifetime_as_array<uint16_t>( &n, 2 );
    std::swap( u16[0], u16[1] );
    return *std::start_lifetime_as<uint32_t>( u16 );
}
What start_lifetime_as_array() in this case does is:
- Ends the lifetime of the uint32_t object n.
- Begins the lifetime of a uint16_t[2] array accessible via u16.
(At this point, accessing n would be undefined behavior since its lifetime has ended.) Then the two uint16_t array entries are swapped; then start_lifetime_as():
- Ends the lifetime of the uint16_t[2] array.
- Begins the lifetime of a uint32_t accessible via the pointer that is then dereferenced to read its value.
While this may seem rather persnickety, it allows you to express your intent to the compiler clearly so it can optimize (or not) accordingly.
Unions Containing Non-Trivial Objects
Originally in C++, you couldn’t put objects that had a non-trivial special member function (a constructor, assignment operator, or destructor) inside a union; as of C++11, however, you can:
union value {
    long i;
    double f;
    char c;
    std::string s;
    value() { }
    ~value() { }
};
However, once you have at least one such member, the caveats are:
- The compiler will automatically delete the union’s default constructor and destructor, so you have to add them back explicitly yourself (even if they do nothing).
- To access such a member, you have to call its constructor explicitly yourself to begin its lifetime beforehand.
- You have to remember to call its destructor explicitly yourself to end its lifetime afterward.
For example:
value v;
new(&v.s) std::string{ "hello" };
// ...
v.s.~basic_string();
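In real code, a union like this usually ends up wrapped in a class that records which member is active so that the destructor knows what to do. Here is a minimal sketch of that, cut down to two members to keep it short. The class int_or_string and its member names are hypothetical:

```cpp
#include <new>
#include <string>
#include <utility>

// Hypothetical wrapper that tracks by hand whether the string is active.
class int_or_string {
public:
    explicit int_or_string( long i ) : is_string_{ false } {
        u_.i = i;                                   // 'i' becomes active
    }

    explicit int_or_string( std::string s ) : is_string_{ true } {
        new( &u_.s ) std::string( std::move( s ) ); // begin lifetime of 's'
    }

    // Copying needs the same care; it is disabled to keep the sketch short.
    int_or_string( int_or_string const& ) = delete;
    int_or_string& operator=( int_or_string const& ) = delete;

    ~int_or_string() {
        if ( is_string_ )
            u_.s.~basic_string();                   // end lifetime of 's'
    }

    bool is_string() const { return is_string_; }

    // Precondition: !is_string()
    long as_long() const { return u_.i; }

    // Precondition: is_string()
    std::string const& as_string() const { return u_.s; }

private:
    union storage {
        long i;
        std::string s;
        storage() { }   // no member is active after construction
        ~storage() { }  // the enclosing class destroys 's' if needed
    } u_;
    bool is_string_;
};
```

Even this reduced version needs a flag, placement new, an explicit destructor call, and a decision about copying, and nothing stops a caller from violating the preconditions.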
Having to remember which is the active member and manually manage its lifetime is both bothersome and bug-prone. As of C++17, there is a better way.
std::variant: A Type-Safe Union
The std::variant class is a type-safe union. However, unlike a union, a variant:
- Remembers which member is the active one.
- Automatically calls constructors & destructors when necessary.
For example, here’s the previous union converted to a std::variant:
using value = std::variant<long,double,char,std::string>;
value v; // active member defaults to first ('long')
v = 4.2; // active member is now 'double'
v = "hello"; // active member is now 'std::string'
To get the value out of a variant, you can use std::get():
auto s1 = std::get<std::string>( v );
auto s2 = std::get<3>( v ); // same
where it takes either one of the types comprising the variant or the zero-based index of the type. However:
- If you specify a type or index that is not the active member, then std::bad_variant_access is thrown.
- If you specify a type that is not unique (a variant may contain the same type more than once) or you specify an index that is out of bounds, then this is considered ill-formed (which means your code is bad and you should fix it).
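To make the first case concrete, here is a small sketch that catches the exception and falls back to a default. The function name string_or is hypothetical:

```cpp
#include <string>
#include <variant>

using value = std::variant<long,double,char,std::string>;

// Returns the string held by v, or 'fallback' if some other member
// is the active one.
inline std::string string_or( value const &v, std::string const &fallback ) {
    try {
        return std::get<std::string>( v );
    }
    catch ( std::bad_variant_access const& ) {
        return fallback;
    }
}
```

Using exceptions for an expected case like this is rarely what you want, which is one reason the non-throwing alternatives exist.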
Alternatively, you can use std::get_if(), which takes a pointer to the variant, to get a pointer to the value or nullptr if you requested an inactive member:
if ( auto *ps = std::get_if<std::string>( &v ) )
    cout << "string = " << *ps << '\n';
else
    cerr << "not a string\n";
If you simply want to inquire whether a variant’s active member is of a particular type without getting its value, you can use std::holds_alternative():
if ( std::holds_alternative<std::string>( v ) )
// ...
“Visiting”
Given a variant, you likely want to do different things depending on whichever type the variant holds. This is known as visiting. There are a few ways to do this:
- Via an if-else chain of either std::get_if() or std::holds_alternative().
- Via a lambda and an if constexpr-else chain of std::is_same.
- Via the overload pattern.
- Via a visitor class.
if-else Chain
An if-else chain is fairly straightforward:
if ( auto *pl = std::get_if<long>( &v ) )
    // ...
else if ( auto *pd = std::get_if<double>( &v ) )
    // ...
else if ( auto *pc = std::get_if<char>( &v ) )
    // ...
else if ( auto *ps = std::get_if<std::string>( &v ) )
    // ...
That is, you simply check for each type in turn.
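Filled in as a complete function, the chain might look like the following sketch. The function name describe and the strings it returns are made up for illustration:

```cpp
#include <string>
#include <variant>

using value = std::variant<long,double,char,std::string>;

// Checks each alternative in turn at run-time; note that std::get_if()
// takes the address of the variant.
inline std::string describe( value const &v ) {
    if ( auto *pl = std::get_if<long>( &v ) )
        return "long " + std::to_string( *pl );
    if ( auto *pd = std::get_if<double>( &v ) )
        return "double " + std::to_string( *pd );
    if ( auto *pc = std::get_if<char>( &v ) )
        return std::string( "char " ) + *pc;
    if ( auto *ps = std::get_if<std::string>( &v ) )
        return "string " + *ps;
    // Reached only if v holds no value at all (valueless_by_exception()).
    return "valueless";
}
```

One drawback is that if a new type is later added to the variant, nothing forces you to add a matching branch here.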
if constexpr-else Chain
Alternatively, you can use a lambda with an if constexpr-else chain:
std::visit( []( auto &&t ) {
    using T = std::decay_t<decltype(t)>;
    if constexpr ( std::is_same_v<T,long> )
        // ...
    else if constexpr ( std::is_same_v<T,double> )
        // ...
    else if constexpr ( std::is_same_v<T,char> )
        // ...
    else if constexpr ( std::is_same_v<T,std::string> )
        // ...
}, v );
Unlike the if-else chain, the if constexpr-else chain is done at compile-time rather than run-time. At this point, you might ask:
If the type of the active member of v isn’t known until run-time, how can the if constexpr-else chain “know” what to do at compile-time?
First, for any lambda, the compiler synthesizes a custom class behind the scenes that has an overloaded operator() where executing the lambda calls the operator. For a lambda that has auto as a parameter type, that operator() is a member template:
struct __lambda {
    template<typename T>
    void operator()( T &&t ) const { /* ... */ }
    // ...
};
For each type comprising the variant, the compiler instantiates operator() for that type, and each instantiation evaluates its if constexpr at compile-time. At run-time, std::visit() determines which is the active member of v and calls the matching instantiation.
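The && in auto&& is a forwarding reference, but that’s a story for another time. The use of std::decay ensures you get the type you expect: it strips any reference and const from the deduced type so that the comparisons against long, double, and so on match.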
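Putting the pieces together, here is a sketch of a complete function using this technique. The name type_name_of is hypothetical. The final static_assert is an addition of mine rather than part of the pattern itself; it turns a forgotten alternative into a compile-time error:

```cpp
#include <string>
#include <type_traits>
#include <variant>

using value = std::variant<long,double,char,std::string>;

// Returns the name of the type of the active member of v.
inline std::string type_name_of( value const &v ) {
    return std::visit( []( auto &&t ) -> std::string {
        // decltype(t) is a reference to const here; decay strips both.
        using T = std::decay_t<decltype(t)>;
        if constexpr ( std::is_same_v<T,long> )
            return "long";
        else if constexpr ( std::is_same_v<T,double> )
            return "double";
        else if constexpr ( std::is_same_v<T,char> )
            return "char";
        else {
            // Fails to compile if an alternative was not handled above.
            static_assert( std::is_same_v<T,std::string>,
                           "unhandled alternative" );
            return "std::string";
        }
    }, v );
}
```

If a new type were added to value without a matching branch, the lambda’s instantiation for that type would hit the static_assert and the build would fail.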
Overload Pattern
Another common alternative allows you to eliminate both if-else chains by using a helper class:
template<class... Ts> // 1
struct overload : Ts... { // 2
    using Ts::operator()...; // 3
};
template<class... Ts>
overload(Ts...) -> overload<Ts...>; // 4
The numbered comments correspond to the notes below:
1. The ... declares a template parameter pack.
2. The Ts... says that the overload class multiply inherits from one or more Ts (classes of type T).
3. The ... is a parameter pack expansion. In this case, it means that for every class T that overload inherits from, its operator() is imported (and overloaded) into overload.
4. This is a user-defined deduction guide that tells the compiler, given an invocation like overload{t1,t2} (where t1 is an object of type T1 and t2 is an object of type T2), instantiate a class overload<T1,T2>.
Given all that, we can now do:
std::visit( overload{
    []( long lv ) { /* ... */ },
    []( double dv ) { /* ... */ },
    []( char cv ) { /* ... */ },
    []( std::string const &sv ) { /* ... */ }
}, v );
That is:
- Construct an overload object passing it a list of four lambdas, one for each type of the value variant.
- Each lambda will have caused the compiler to synthesize its own __lambda class behind the scenes (at compile-time).
- The overload object is then passed to std::visit() that will call the operator() of the lambda whose argument type matches the type of the active member of v.
In the case of our value variant, behind the scenes, the compiler would have instantiated a class like the following, where each __lambda_X stands for the class synthesized for the lambda that takes an X:
template<>
struct overload<__lambda_long,__lambda_double,__lambda_char,__lambda_std_string> :
    __lambda_long, __lambda_double, __lambda_char, __lambda_std_string {
    using __lambda_long::operator();
    using __lambda_double::operator();
    using __lambda_char::operator();
    using __lambda_std_string::operator();
};
There’s nothing special about the class name overload; it’s just the one that’s commonly used for this technique. You can alternatively name it anything you want.
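For reference, here are all the pieces in one place as a sketch that compiles as C++17. The function name to_text is hypothetical. Since std::visit() requires every alternative to produce the same return type, every lambda here returns a std::string:

```cpp
#include <string>
#include <variant>

// The helper class and its deduction guide, as described above.
template<class... Ts>
struct overload : Ts... {
    using Ts::operator()...;
};
template<class... Ts>
overload(Ts...) -> overload<Ts...>;

using value = std::variant<long,double,char,std::string>;

// Converts whichever member is active to text.
inline std::string to_text( value const &v ) {
    return std::visit( overload{
        []( long lv )               { return std::to_string( lv ); },
        []( double dv )             { return std::to_string( dv ); },
        []( char cv )               { return std::string( 1, cv ); },
        []( std::string const &sv ) { return sv; }
    }, v );
}
```

A nice property of this pattern is that if no lambda can accept one of the variant’s types, the call to std::visit() fails to compile rather than failing at run-time.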
A Visitor Class
The thing with any of the previous ways is that you have to code each case at its point of use. If you do the same thing in more than one place in your code, it’s redundant.
Using a visitor class allows you to factor out the code into a class that can easily be used in multiple places. A visitor class has to have a set of operator() overloaded for each type comprising the variant:
struct value_visitor {
    void operator()( long lv ) const;
    void operator()( double dv ) const;
    void operator()( char cv ) const;
    void operator()( std::string const &sv ) const;
};
Then to use the visitor, simply pass an instance of it to std::visit():
std::visit( value_visitor{}, v );
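To show the reuse, here is a sketch of one visitor class used from two different functions. The names is_numeric_visitor, is_numeric, and count_numeric are all hypothetical:

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

using value = std::variant<long,double,char,std::string>;

// One operator() per type comprising the variant; all return bool.
struct is_numeric_visitor {
    bool operator()( long ) const               { return true; }
    bool operator()( double ) const             { return true; }
    bool operator()( char ) const               { return false; }
    bool operator()( std::string const& ) const { return false; }
};

// First point of use: test a single value.
inline bool is_numeric( value const &v ) {
    return std::visit( is_numeric_visitor{}, v );
}

// Second point of use: count the numeric values in a collection.
inline std::size_t count_numeric( std::vector<value> const &values ) {
    std::size_t n = 0;
    for ( auto const &v : values )
        if ( std::visit( is_numeric_visitor{}, v ) )
            ++n;
    return n;
}
```

The per-type logic is written once in the visitor, so both functions stay in sync if it changes.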
Conclusion
Use of unions in C++ is certainly possible (as it has to be to be mostly backwards compatible with C), but has more restrictions than in C. In general, you should use std::variant instead of a union unless compatibility with C is required.