Paul J. Lucas

Posted on Jul 6

Unreachable: The Standard Function for Inserting Undefined Behavior

#cpp #c

Introduction

C23 and C++23 added unreachable() (a function-like macro declared in <stddef.h>) and std::unreachable() (a function declared in <utility>), respectively, both of which insert undefined behavior into your program. (If you don’t know what undefined behavior is, you should definitely read that article first.)

Ordinarily, undefined behavior should be avoided at all costs. So why does unreachable() exist and when would you ever want to use it? Its name gives a hint: it tells both compilers and programmers that the line of code on which unreachable() is “called” is never actually reached. What use is that? It allows you to:

Suppress warnings.
Perform a bit of optimization.

It can do those things because the compiler is allowed to assume that undefined behavior never happens.
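
As a rough illustration of that assumption, here is a minimal sketch. The function digit_to_char and its precondition are hypothetical and not taken from any library mentioned here; the #ifndef fallback is only one possible way to supply the macro before C23 and assumes gcc, clang, or msvc.

```c
#include <stddef.h>     // C23: defines the unreachable() macro

#ifndef unreachable     // pre-C23 fallback (an assumption, not standard)
# if defined(__GNUC__) || defined(__clang__)
#  define unreachable() __builtin_unreachable()
# elif defined(_MSC_VER)
#  define unreachable() __assume(0)
# else
#  include <stdlib.h>
#  define unreachable() abort()   // defined behavior, but no optimization
# endif
#endif

// Hypothetical helper: the caller promises that 0 <= digit <= 9.
static char digit_to_char( int digit ) {
  if ( digit < 0 || digit > 9 )
    unreachable();      // tells the compiler the promise always holds
  return (char)('0' + digit);
}
```

Because reaching unreachable() is undefined behavior, the compiler may drop the range check entirely; whether it actually does depends on the compiler and the optimization level. Calling digit_to_char(42) would be a bug in the caller, not something the function detects.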

Suppressing Warnings

It’s probably easiest to explain how unreachable() can be used to suppress warnings by way of example. The C Exception library includes the following:

typedef void (*cx_terminate_handler_t)( cx_exception_t const* );

static cx_terminate_handler_t cx_impl_terminate_handler;

That is, it declares a pointer-to-function type for a “terminate handler” and a global such pointer allowing the library user to install a custom function to be called upon encountering an unrecoverable situation, e.g., invoking throw without any active try. The library calls the handler via:

[[noreturn]] static void cx_terminate( void ) {
  (*cx_impl_terminate_handler)( &cx_impl_exception );
  unreachable();
}

If unreachable() were not there, you’d get a warning like:

warning: function declared 'noreturn' should not return

The contract is that the terminate handler must terminate the program, i.e., not return. Hence, cx_terminate is declared [[noreturn]]. The problem is that [[noreturn]] applies to function declarations, not to function types, so it can’t be part of a typedef; hence the compiler has no way to know that a function called via pointer-to-function won’t return. By inserting the unreachable(), you’re explicitly telling the compiler that the code after the call is unreachable, hence cx_terminate will not return, hence there is no reason to warn.
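
To try the pattern outside the library, something like the following sketch should do. Every demo_ name is invented for this example and is not part of the C Exception library; the fallback definition assumes gcc or clang before C23.

```c
#include <stddef.h>     // C23: defines the unreachable() macro
#include <stdio.h>
#include <stdlib.h>

#ifndef unreachable     // pre-C23 fallback; assumes gcc or clang
# define unreachable() __builtin_unreachable()
#endif

// Hypothetical handler type; the contract is that a handler never returns.
typedef void (*demo_handler_t)( char const *reason );

static void demo_default_handler( char const *reason ) {
  fprintf( stderr, "terminating: %s\n", reason );
  abort();
}

static demo_handler_t demo_handler = demo_default_handler;

// Installs a new handler (NULL restores the default); returns the old one.
demo_handler_t demo_set_handler( demo_handler_t new_handler ) {
  demo_handler_t const old_handler = demo_handler;
  demo_handler = new_handler != NULL ? new_handler : demo_default_handler;
  return old_handler;
}

// _Noreturn is the C11 spelling; C23 also accepts [[noreturn]].
_Noreturn void demo_terminate( char const *reason ) {
  (*demo_handler)( reason );
  unreachable();        // the handler's contract says we never get here
}
```

If an installed handler does return, the program reaches unreachable() and has undefined behavior, so the "must not return" contract has to be documented for whoever writes handlers.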

A similar example is:

bool cx_impl_try_condition( cx_impl_try_block_t *tb ) {
  switch ( tb->state ) {
    case CX_IMPL_INIT:
      // ...
      return true;
    case CX_IMPL_CAUGHT:
      // ...
      [[fallthrough]];
    case CX_IMPL_TRY:
    case CX_IMPL_THROWN:
      // ...
      return true;
    case CX_IMPL_FINALLY:
      // ...
      return false;
  }
  unreachable();
}

The code for the individual cases isn’t important here; what is important is that the code switches on an enumeration, has a case for every enumeration constant, and every case ends with return.

If unreachable() were not there, some compilers would give you a warning like:

warning: control reaches end of non-void function

Such a compiler is being (overly?) cautious and considering the possibility that tb->state might contain a value other than one of the declared enumeration constants, hence none of the cases will match, and the function will “fall out the bottom” and return without returning a value. By inserting the unreachable(), you’re reassuring the compiler that this can’t happen.

Note that it’s better to put unreachable() after the switch and not in a default case since a default prevents the compiler from catching unhandled enumeration constants.
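
The same pattern in a self-contained form, as a sketch: pixel_format_t and bits_per_pixel are made-up names for illustration, and the fallback definition assumes gcc or clang before C23.

```c
#include <stddef.h>     // C23: defines the unreachable() macro

#ifndef unreachable     // pre-C23 fallback; assumes gcc or clang
# define unreachable() __builtin_unreachable()
#endif

// Hypothetical enumeration, for illustration only.
typedef enum {
  PIXEL_GRAY8,
  PIXEL_RGB24,
  PIXEL_RGBA32
} pixel_format_t;

unsigned bits_per_pixel( pixel_format_t format ) {
  // Deliberately no default label: an enumerator added later but not
  // handled here can still be reported by the compiler.
  switch ( format ) {
    case PIXEL_GRAY8:
      return 8;
    case PIXEL_RGB24:
      return 24;
    case PIXEL_RGBA32:
      return 32;
  }
  unreachable();        // every valid format has returned above
}
```

If an enumerator were later added to pixel_format_t but not to the switch, the -Wswitch warning in gcc and clang would report the missing case; a default label would silence it.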

Optimization

Similarly, it’s probably easiest to explain how unreachable() can be used to perform (small) optimizations by way of example. The cdecl program includes the following:

static bool c_ast_check_alignas( c_ast_t const *ast ) {
  if ( ast->align.kind == C_ALIGNAS_NONE )
    return true;

  // ... lots of code ...

  switch ( ast->align.kind ) {
    case C_ALIGNAS_NONE:
      unreachable();
    case C_ALIGNAS_BYTES:
      if ( !is_01_bit( ast->align.bytes ) ) {
        print_error( &ast->align.loc,
          "\"%u\": alignment must be a power of 2\n",
          ast->align.bytes
        );
        return false;
      }
      break;
    case C_ALIGNAS_SNAME:
      // nothing to do
      break;
    case C_ALIGNAS_TYPE:
      return c_ast_check( ast->align.type_ast );
  } // switch

  return true;
}

Briefly, the function checks that the semantics of an alignas are valid. The first if checks whether a declaration actually contains an alignas: if not, it returns immediately avoiding the “lots of code.” However, later on, a switch is done on the kind of alignment. Since C_ALIGNAS_NONE has already been checked for by the if, there’s no reason to include a case for it — except omitting it would cause the compiler to give you a warning like:

warning: enumeration value 'C_ALIGNAS_NONE' not handled in switch

If unreachable didn’t exist, you could instead do either of the following to suppress the warning:

    case C_ALIGNAS_NONE:
      abort();

or:

    case C_ALIGNAS_NONE:
      break;

With the first, the compiler will still generate code for the comparison to C_ALIGNAS_NONE and the call to abort(), code that will never be executed.

With the second, both the comparison to C_ALIGNAS_NONE and the break will be optimized away by the compiler (which is good), but it doesn’t convey to programmers that C_ALIGNAS_NONE has already been accounted for.

Using unreachable() here is better than either of those alternatives since it generates no code and also conveys to programmers that the case is impossible.
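
A compilable miniature of the same idea follows, as a sketch only: align_kind_t, align_t, and align_is_valid are simplified stand-ins invented for this example, not cdecl's actual types, and the fallback definition assumes gcc or clang before C23.

```c
#include <stdbool.h>
#include <stddef.h>     // C23: defines the unreachable() macro

#ifndef unreachable     // pre-C23 fallback; assumes gcc or clang
# define unreachable() __builtin_unreachable()
#endif

// Simplified, hypothetical stand-ins for the types in the text.
typedef enum { ALIGN_NONE, ALIGN_BYTES, ALIGN_TYPE } align_kind_t;

typedef struct {
  align_kind_t kind;
  unsigned     bytes;   // meaningful only when kind == ALIGN_BYTES
} align_t;

bool align_is_valid( align_t const *align ) {
  if ( align->kind == ALIGN_NONE )
    return true;        // early exit: nothing to check

  switch ( align->kind ) {
    case ALIGN_NONE:
      unreachable();    // already handled by the if above
    case ALIGN_BYTES:
      // valid only if exactly one bit is set, i.e., a power of 2
      return align->bytes != 0
          && (align->bytes & (align->bytes - 1)) == 0;
    case ALIGN_TYPE:
      return true;
  }
  unreachable();        // all enumerators are handled above
}
```

The ALIGN_NONE case keeps the switch exhaustive, so no "not handled" warning appears, while telling the compiler it need not generate anything for that case.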

Conclusion

In a few corner cases, unreachable() is useful to tell both compilers and programmers that a particular code path is unreachable. For the compiler, this can suppress warnings and allow it to optimize away code.

Top comments (4)

Alf P. Steinbach • Jul 6

Yep, but what's wrong with just for(;;){}. Works fine with C++17.

Paul J. Lucas • Jul 6 • Edited

It would work, sure, at least in the examples I've given. If you're someone in charge of implementing the standard C or C++ libraries, you might even do:

#define unreachable()   for (;;) { }

In programs, using unreachable() expresses your intent much better than doing for(;;){} explicitly.

However, the for implementation doesn't actually generate undefined behavior, which is what you want. Both gcc and msvc provide their own built-ins to generate undefined behavior: __builtin_unreachable() and __assume(false), respectively. Presumably, they wouldn't have gone to the trouble of implementing those unless they were better.

One example I found where it actually makes a small difference is given by this answer. If you replace the unreachable() with for(;;){}, you get different branch ordering and prediction. I've confirmed this.

In your codebase, if you want unreachable in C++17, you can always implement your own using the possible implementation given here.

Alf P. Steinbach • Jul 7

the for implementation doesn't actually generate undefined behavior

In C++23 and earlier it does.

Not sure about C++26. There is a paper (isocpp.org/files/papers/P2809R3.html) whose title seems to indicate that it will no longer be UB in C++26. Which if so is sad, yet another case where the committee chooses to make things more complex and unreliable. :(

Instead of writing up an explanation of the UB I just quote the Google AI synopsis I got when I searched for the standard's wording - which in C++23 is in §intro.progress:

❞ In C++, an infinite loop that performs no observable behavior (such as I/O, volatile access, or atomic/synchronization operations) can lead to undefined behavior. This applies to "empty loops" like for(;;); or while(true); without any code inside the loop body that interacts with the outside world in a way defined by the C++ standard as observable.
The C++ standard includes a "forward progress guarantee" which states that a thread must eventually do something observable, such as terminate, call an I/O function, access a volatile object, or perform an atomic/synchronization operation. If a loop continues indefinitely without performing any of these actions, the implementation is allowed to assume it will eventually terminate or perform an observable action.
This allows compilers to perform aggressive optimizations, including removing such empty infinite loops entirely, as they are considered to violate the forward progress guarantee. The consequence of this optimization can be unexpected behavior, as the compiler might deduce that code following the "infinite" loop is unreachable and optimize it away as well.
It is important to note that this applies specifically to trivial infinite loops that lack observable side effects. If an infinite loop performs any of the actions listed in the forward progress guarantee (e.g., printing to console, reading from a file, modifying a volatile variable), it does not fall under this undefined behavior category.

Paul J. Lucas • Jul 7

Ah, OK: C and C++ diverge on how infinite loops are handled.

That aside, the point is to use unreachable regardless of how it's defined under the hood. If you want to define it using for, go ahead; but there are alternatives.