Unicorn Developer

Posted on Jan 16

How do exceptions work in C++ on Linux?

#cpp #linux #programming #development

Our treasured language lets us leverage powerful tools and guards us from countless implementation details. Although exceptions have a bad name among many developers, a detailed analysis of how they work can greatly expand your understanding of how C++ really works. That's what we'll do!

Before diving into the depths of exceptions, let's pick the right angle on how they could possibly work. In general, there are two options: via the landing pad or via the frame handler.

Sorry, what? Indeed, we get off to a rough start. Let's try again.

We can create exceptions via tables and linked lists. Well... that doesn't help much either. Okay, once again.

The exception implementation may or may not follow the Itanium ABI. Damn, it's still unclear.

Yeah, folks, it's not such an easy thing. Various platforms may implement exceptions in different ways. Even on the same platform, multiple approaches can coexist, and each has its own pros and cons. As the saying goes, "How do you eat an elephant (poor creature)? One bite at a time!" To keep from getting indigestion, we'll adhere to the same principle.

This article will highlight the Linux world. To avoid cluttering the internet with lengthy texts, we'll divide this article into several ones, each of which will be devoted to its own topic. At the beginning of the article, we'll add a table of contents with links to related sections. If it's not there, it means that we haven't made any new discoveries yet!

We'll explore other platforms in future articles when their time comes.

Each article will be accompanied by relevant code snippets scrutinized for details. Whenever possible, we'll rely on the libcxx library from LLVM, and in other cases, on libstdc++ from GCC. To avoid flooding the text with code snippets, we'll link directly to the relevant places in the repositories.

"Hey, do you know the internet is crammed with texts about exceptions?"

Yep, it's true. However, the author got the impression that most of these materials fall into two categories: overly technical descriptions that are more like specifications, or attempts—often quite successful ones—to manually implement exceptions from scratch.

Unfortunately, the author couldn't find a detailed examination of this mechanism with code references and a breakdown of that code. Perhaps this series of articles will fill what the author considers to be a useful niche.

Well, let's roll?

101 for the impatient

Let's add a little teaser so you don't close this article too quickly. We'll quickly describe how exceptions are implemented on Linux—just a couple of sentences. Don't be afraid, we'll give a detailed description afterwards.

Okay, we can implement exceptions either by generating extra code that runs inside try ... catch blocks or by generating metadata. In both cases, generation occurs when the source code is translated into assembly. Thus, the try ... catch block is transformed into a set of data structures and function calls. Their specific form depends on the chosen implementation approach.

When the exception is thrown, the control flow "jumps" from one place in the program to another. What unpopular mechanism of our beloved language provides arbitrary control flow jumps to specific locations? That's right, it's the good old goto. However, it won't help us with jumps outside the function, so we'll have to use its steroid-taking stepbrother, setjmp/[longjmp](https://en.cppreference.com/w/c/program/longjmp.html).

Before jumping somewhere, we should ask ourselves: "Where to?" We wouldn't want to be in the shoes of the famous traveler in unknown places from "The Wizard of Oz", right? To answer the question, we create a linked list, each element of which stores data about the frame context, such as the state of registers. We can search for the required catch block by traversing this list.

This implementation is often called portable exceptions. Quite a fitting name, as the mechanism doesn't rely on platform-specific tables. Instead, it depends on the setjmp/longjmp mechanism, which generally works in a similar way on all Unix-based platforms. We need to generate only the calls to the setjmp/longjmp functions (possibly even in the form of compiler intrinsics) and the mechanism implementing the linked list.
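To make the idea less abstract, here is a minimal sketch of the setjmp/longjmp approach applied to a foo/bar pair. All names in it (HandlerFrame, g_top_handler, g_thrown_value, sjlj_throw, sjlj_foo, sjlj_bar) are made up for illustration and don't come from any real runtime. The sketch also assumes that no objects with non-trivial destructors live between the throw and the handler, because longjmp doesn't run destructors.

```cpp
#include <csetjmp>
#include <cstdlib>

// Hypothetical node of the handler list: one node per active "try" region.
struct HandlerFrame {
    std::jmp_buf context;    // saved register state of the frame that can catch
    HandlerFrame *previous;  // the next outer handler
};

// Hypothetical per-thread state: the innermost handler and the value in flight.
thread_local HandlerFrame *g_top_handler = nullptr;
thread_local int g_thrown_value = 0;

// A stand-in for "throw": remember the value, pop the innermost handler, jump to it.
[[noreturn]] void sjlj_throw(int value) {
    HandlerFrame *target = g_top_handler;
    if (target == nullptr) {
        std::abort();  // nobody can catch: a real runtime would call std::terminate
    }
    g_thrown_value = value;
    g_top_handler = target->previous;
    std::longjmp(target->context, 1);
}

int sjlj_bar() {
    sjlj_throw(-1);
}

int sjlj_foo() {
    HandlerFrame frame;
    frame.previous = g_top_handler;
    g_top_handler = &frame;              // this registration is paid for on every call
    if (setjmp(frame.context) == 0) {    // the "try" part
        int result = sjlj_bar();
        g_top_handler = frame.previous;  // normal exit: unregister the handler
        return result;
    }
    return g_thrown_value;               // the "catch (...)" part, reached via longjmp
}
```

Calling sjlj_foo() returns the "thrown" value, and the handler list is empty again afterwards. Note that the registration code at the top of sjlj_foo runs whether or not anything is ever thrown.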

You might say, "Hey, I'll have to call these functions even when I don't throw exceptions!" Eh...yes? This surely increases the cost of program execution.

That's why the second approach—exceptions implemented via metadata—has become so widespread.

Metadata is commonly called exception handling tables, but the author prefers the former name, metadata. It better illustrates what is happening under the hood of this mechanism. The exception mechanism built on metadata is called zero-cost exception handling. Zero cost sounds tempting, doesn't it?

Instead of maintaining a resource-intensive linked list, we create an exception handling table. We'll emit this table into the binary for every stack frame that can handle exceptions or call destructors. All the necessary data that was previously placed in linked list nodes is now stored in these tables.
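Here is a toy model of such a table, just to show the shape of the idea. The CallSiteEntry layout, the kFooTable contents, and find_landing_pad are all hypothetical: real tables are stored in a compact encoded form, and the offsets below are invented.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified call-site record.
struct CallSiteEntry {
    std::uintptr_t start;        // offset of the first covered instruction
    std::uintptr_t length;       // size of the covered region in bytes
    std::uintptr_t landing_pad;  // offset of the handler code, 0 if there is none
};

// An invented table for an imaginary foo(): constant data, built at compile time.
constexpr CallSiteEntry kFooTable[] = {
    {0x00, 0x10, 0x00},  // prologue: nothing to handle
    {0x10, 0x05, 0x20},  // the call inside the try block: handler at offset 0x20
    {0x15, 0x0b, 0x00},  // epilogue: nothing to handle
};

// Consulted only when an exception is in flight: find the landing pad
// that covers the given program counter offset, or 0 if there is none.
std::uintptr_t find_landing_pad(const CallSiteEntry *table, std::size_t count,
                                std::uintptr_t pc_offset) {
    for (std::size_t i = 0; i < count; ++i) {
        const CallSiteEntry &entry = table[i];
        if (pc_offset >= entry.start && pc_offset < entry.start + entry.length) {
            return entry.landing_pad;
        }
    }
    return 0;
}
```

The point of the sketch is that the table is read-only data: the normal execution path never touches it, and the lookup runs only after a throw.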

Zero-cost sounds riveting, but following the control flow dictated by an exception throw does involve extra overhead. At the very least, we need to call the destructors of stack variables because the C++ standard guarantees it. We can't just catch a free ride straight to the station where the exception will be processed.

Still, throwing an exception is usually assumed to be, pardon the pun, an "exceptional" situation. What happens when the control flow follows the regular execution path? That's right, we don't spend extra resources on maintaining a linked list. "Don't pay for what we don't use" is an old motto among fans of our beloved language.

However, implementing exceptions through exception handling tables can lead to execution time pessimisation, even if exceptions are thrown rarely or not at all. This is a very important point that is very easy to overlook and even easier to mystify. We hope to return to the issue of pessimisation in other articles.

That's it for the brief overview of exception innards on Linux! It doesn't seem so scary. If you, dear reader, need more details (which we're sure is why you came here), let's keep going.

Innards

Let's start with the fact that exceptions aren't platform-dependent. They're compiler-dependent. Although the C++ standard clearly describes the rules for exceptions at the language level, compilers are free to choose how they implement this mechanism.

Therefore, it seems reasonable to view the exception implementation as a layered system. So, what does our "little onion" consist of?

At the top, of course, sits the C++ standard that simply says, "Take such a beautiful syntax, write try ... catch, throw whatever you want—I guarantee you RAII and the order of destructor calls." How it works isn't important to the standard. What matters is what the end user, i.e. me, the developer, can and can't write!
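As a quick reminder of what that guarantee looks like in practice, here is a small sketch. The Tracer type and the function names are invented for this example. Each destructor appends its letter to a log, so we can read off the order in which objects are destroyed while the exception travels to its handler.

```cpp
#include <string>

// Hypothetical helper: appends its id to a log when destroyed.
struct Tracer {
    std::string &log;
    char id;
    Tracer(std::string &l, char i) : log(l), id(i) {}
    ~Tracer() { log += id; }
};

void thrower(std::string &log) {
    Tracer a(log, 'a');
    Tracer b(log, 'b');
    throw 42;  // b is destroyed first, then a
}

std::string unwinding_order() {
    std::string log;
    try {
        Tracer outer(log, 'o');
        thrower(log);
    } catch (int) {
        log += '!';  // the handler runs only after all destructors have finished
    }
    return log;
}
```

Calling unwinding_order() yields "bao!": locals are destroyed in reverse order of construction, inner frames before outer ones, and only then does the catch block run.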

Below the standard lies the so-called Itanium C++ ABI. We'll talk about why it's called that later. For now, just note that this layer consists of two others. The first one, let's call it C++-specific, directly translates constructs from the C++ standard. In turn, the second layer is responsible for finding the necessary catch block and calling the destructors of objects that are destroyed in the process. We can call this second layer language-independent—except for cases where language-specific constructs do occur there.

Now we need to find the place where the exception is delivered and call destructors during this delivery. The exact mechanism for that is called stack unwinding, when a certain system—in our case, the C++ runtime—scans through the low-level function representation (stack frames), going through them in reverse execution order. In the context of exceptions, this is implemented on Linux in two ways: DWARF and SJLJ. This is the last layer of our little onion.

So, shall we get into the details?

C++ Standard

Friends, we believe in you and in humanity and assume that since you've come to read about the language innards, you're already familiar with that language. We won't describe the basics of using exceptions here, but if it turns out to be necessary, we'll write about it in a separate article and provide a link. For now, we suggest you read the description on cppreference.

Itanium

Things are heating up! This layer is where the secret inner magic happens, driving the entire exception mechanism in C++ (and not only in this language). To avoid getting burned and burning out too quickly from the abundance of various stuff, we suggest moving forward gradually.

First, we'll look at the C++-specific layer responsible for throwing and catching exceptions. It's usually called the Itanium C++ ABI, cxxabi, or something like that. Next, we'll move on to the language-independent part of the ABI, which allows us to find the necessary catch blocks, call destructors, and perform other useful tricks. It's often referred to as the base-level Itanium ABI or the Unwinder. Note that, formally, this layer also belongs to the Itanium ABI.

What does Itanium have to do with this, huh? Well, I'm sorry, but there's no particularly deep reason. Itanium was one of the first 64-bit platforms, developed by Intel and HP. Although AMD64 ultimately won the battle of 64-bit architectures, the ABI specification and low-level C++ implementation were originally created for Itanium. People appreciated it, and it caught on, with system-specific tweaks, of course. As a result, the Itanium ABI became an established name for this kind of specification in the Linux world. Windows people have their own vibes though, as usual.

If you want to know more about the processor-dependent specification, you can read info here. A clear description of how exception handling tables work in HP's aC++ compiler is available here, and a more up-to-date explanation of how all this is currently implemented on Linux can be found here.

To avoid rewriting the ABI specification for no reason, let's take a small example and use it to gradually delve deeper into the innards of C++ runtime implementation.

During our journey, we'll learn new things, talk about them as they come in, and, at the end of each section, provide a summary.

Itanium C++ ABI

Let's meet our test subject:

int bar()
{
    throw -1;
}

int foo()
{
    try {
        return bar();
    }
    catch (...) {
        return -1;
    }
}

int main()
{
    return foo();
}

It contains:

a function that throws an exception;
a function that catches an exception;
the main function that runs our example.

From here on, we'll use this code snippet to drive our study.

Let's compile this! We use Clang 21.1.0 for x86-64 (the latest version at the time of writing):


bar():
        push    rbp
        mov     rbp, rsp
        mov     edi, 4
        call    __cxa_allocate_exception@PLT
        mov     rdi, rax
        mov     dword ptr [rdi], -1
        mov     rsi, qword ptr [rip + typeinfo for int@GOTPCREL]
        xor     eax, eax
        mov     edx, eax
        call    __cxa_throw@PLT
foo():
        push    rbp
        mov     rbp, rsp
        sub     rsp, 32
        call    bar()
        mov     dword ptr [rbp - 24], eax
        jmp     .LBB1_1
.LBB1_1:
        mov     eax, dword ptr [rbp - 24]
        mov     dword ptr [rbp - 4], eax
        jmp     .LBB1_4
        mov     rcx, rax
        mov     eax, edx
        mov     qword ptr [rbp - 16], rcx
        mov     dword ptr [rbp - 20], eax
        mov     rdi, qword ptr [rbp - 16]
        call    __cxa_begin_catch@PLT
        mov     dword ptr [rbp - 4], -1
        call    __cxa_end_catch@PLT
.LBB1_4:
        mov     eax, dword ptr [rbp - 4]
        add     rsp, 32
        pop     rbp
        ret
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     dword ptr [rbp - 4], 0
        call    foo()
        add     rsp, 16
        pop     rbp
        ret
DW.ref.__gxx_personality_v0:
        .quad   __gxx_personality_v0

If we quickly skim through the code, we will spot some intriguing points. Our functions now include calls to new functions, typeinfo has appeared out of nowhere, and at the very bottom, we see the definition of some mysterious symbol: __gxx_personality_v0.

Let's focus on the bar function. In the C++ code, it actually does one thing: it throws an exception using the throw -1 expression. In the assembly, that single expression unfolds into calls to the __cxa_allocate_exception and __cxa_throw functions. We didn't write these functions, which means that the compiler knew something about them in advance. So, they're either intrinsics or library functions.

Two sources will help us figure this out: the C++ ABI specification and the C++ runtime source code. Personally, the author prefers to dig into libcxx from LLVM, so we'll use it as a basis.

Indeed, we can find definitions of these functions in the libcxx source code. The __cxa_allocate_exception function, as indicated by its name, allocates space on the heap for exceptions.

When we write throw MyException(42);, the compiler doesn't just copy the object somewhere and hope for the best. Since exceptions can be thrown in many functions, threads, and even languages, the runtime environment needs a reliable, self-contained object that holds both the exception and everything necessary to handle it.

Therefore, it's easier to provide __cxa_allocate_exception with only the size of the actual object, wait till it does something with it, and get a pointer to the memory region where the exception object will be written.

push    rbp
mov     rbp, rsp
mov     edi, 4
call    __cxa_allocate_exception@PLT

The client code—the one generated for us by the compiler—is responsible for constructing the object in that memory. But take a closer look: the function allocates far more space than is needed to store the exception object itself.

char *raw_buffer =
    (char *)__aligned_malloc_with_fallback(header_offset + actual_size);

This is necessary for several reasons.

Firstly, the runtime will later need metadata about the thrown exception to recognize how to handle it, and the __cxa_exception structure is responsible for this. It contains data about the exception type (std::type_info), a pointer to the destructor (because the object needs to be destroyed at some point in the future), various counters, handlers, and other fun stuff. At the end of this structure, there is a certain _Unwind_Exception. For now, let's pretend we don't see it.

Secondly, the function handles the platform-specific alignment rules. Some processors are very picky: objects must start at addresses that are multiples of 8, 16, or more. The function rounds the total size up so that the exception object is correctly aligned.

As a result, the complete structure of the exception looks as follows:

__cxa_exception
_Unwind_Exception
thrown object (int in our case)

If memory allocation fails, hello std::terminate! But if everything goes according to plan, all allocated memory will be zeroed. After that, the client receives a pointer to the location where the exception object should be created. It's important that the returned pointer isn't the beginning of the allocated block, but points to the exact location where the exception object should be located. The __cxa_exception header is located in memory immediately before it.
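The pointer arithmetic described above can be sketched as follows. This is a toy model and not the libcxxabi code: ToyHeader and the toy_* functions are hypothetical, the header holds only a few stand-in fields, and plain malloc replaces the real allocator with its fallback.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the __cxa_exception header. The alignas makes
// sizeof(ToyHeader) a multiple of the strictest fundamental alignment,
// so the thrown object placed right after the header is aligned too.
struct alignas(std::max_align_t) ToyHeader {
    const void *exception_type;  // would point to std::type_info
    void (*destructor)(void *);  // would destroy the thrown object
    int handler_count;           // would count active handlers
};

// Allocates header + object, zeroes the block, and returns a pointer
// to the place where the thrown object should be constructed.
void *toy_allocate_exception(std::size_t thrown_size) {
    std::size_t total = sizeof(ToyHeader) + thrown_size;
    void *raw = std::malloc(total);
    if (raw == nullptr) {
        std::abort();  // the real runtime calls std::terminate here
    }
    std::memset(raw, 0, total);
    return static_cast<char *>(raw) + sizeof(ToyHeader);
}

// Steps back from the thrown object to the header that precedes it.
ToyHeader *toy_header_from_thrown(void *thrown) {
    return reinterpret_cast<ToyHeader *>(static_cast<char *>(thrown) - sizeof(ToyHeader));
}

// Frees the whole block, starting from the header.
void toy_free_exception(void *thrown) {
    std::free(toy_header_from_thrown(thrown));
}
```

The caller only ever sees the pointer to the thrown object, while the runtime can always get back to its bookkeeping by subtracting the header size.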

By the way, here's a fun fact: libstdc++ (a library from GCC) can throw exceptions even when heap memory is exhausted thanks to the epic implementation of the arena pool allocator. If you're curious about high-quality, performance-oriented code, it's worth a look—you won't regret it!

Alright, we get the place for our exception. What's next? Let's take another look at the generated code:

call    __cxa_allocate_exception@PLT
mov     rdi, rax
mov     dword ptr [rdi], -1
mov     rsi, qword ptr [rip + typeinfo for int@GOTPCREL]
xor     eax, eax
mov     edx, eax
call    __cxa_throw@PLT

We see a call to the __cxa_throw function. The code shows how we form three arguments before calling it. First, we write our exception object to the space allocated for it; the pointer to that space is the first argument. Then we obtain a pointer to typeinfo. When handling exceptions, we need to know their type, and what could be better than old but gold RTTI! The __cxa_throw function also has a third argument, which in our case is zeroed. This argument is a pointer to the exception destructor. We can't use operator delete because we didn't create the object using operator new, so we have to pass a pointer to the destructor. The built-in int type doesn't have a separate destructor, so there's nothing to pass.
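We can replay the same three steps by hand. The sketch below assumes a toolchain whose cxxabi.h header exposes __cxa_allocate_exception and __cxa_throw through the abi namespace alias, as GCC and Clang on Linux do. The function names throw_minus_one_by_hand and catch_by_hand are invented for this example.

```cpp
#include <cxxabi.h>
#include <new>
#include <typeinfo>

// Roughly what the compiler emits for `throw -1;`, written out by hand.
[[noreturn]] void throw_minus_one_by_hand() {
    // Step 1: ask the runtime for storage sized for the thrown object.
    void *storage = abi::__cxa_allocate_exception(sizeof(int));
    // Step 2: construct the exception object in that storage.
    ::new (storage) int(-1);
    // Step 3: hand over the object, its type info, and a null destructor
    // pointer, since int needs no destruction.
    abi::__cxa_throw(storage, const_cast<std::type_info *>(&typeid(int)), nullptr);
}

int catch_by_hand() {
    try {
        throw_minus_one_by_hand();
    } catch (int value) {
        return value;  // an ordinary handler catches the hand-made exception
    }
    return 0;
}
```

An ordinary catch (int) handler can't tell the difference: catch_by_hand() returns -1, just like the compiler-generated version would.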

The described signature matches the one we see in the source code. It uses the __cxa_eh_globals object, which is local to each thread and stores the stack of exceptions that have reached their catch block and the counter of exceptions that have not yet been processed. After that, __cxa_exception is initialized. An interesting point is the setting of the referenceCount data member. It's not described in the Itanium ABI specification, as it appeared much later, with the release of C++11, to support std::exception_ptr. At the end, one of two functions is called: _Unwind_SjLj_RaiseException or _Unwind_RaiseException.
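To see why a reference count is useful at all, here is a short sketch that uses only the standard std::exception_ptr API. The capture and rethrow_later function names are invented for this example.

```cpp
#include <exception>

// Throws, catches, and keeps the exception alive past the end of the handler.
std::exception_ptr capture() {
    try {
        throw -1;
    } catch (...) {
        return std::current_exception();  // shares ownership of the exception object
    }
    return std::exception_ptr{};
}

// Rethrows a previously captured exception and reports its value.
int rethrow_later(const std::exception_ptr &captured) {
    try {
        std::rethrow_exception(captured);
    } catch (int value) {
        return value;
    }
    return 0;
}
```

The exception object survives after the catch block in capture() has finished, and it can be rethrown any number of times while at least one std::exception_ptr still refers to it. Something has to track those owners, and a counter in the header is a natural place for that.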

We'll talk about them in more detail later, but for now, all we need to know is that they shouldn't return the control flow under any circumstances. If this happens, it means that something has gone terribly wrong, and it's a direct path to std::terminate.

Fun fact: all this means that we can throw any object, so to speak, without exception. Well, you get the idea.

Great, we've figured out how exceptions are thrown. We don't yet know exactly how they'll be delivered, but we'll get there soon enough. For now, let's look at the second way we can interact with exceptions: catching them.

If we look at the assembly for the foo function, we immediately notice something strange: the code between the jmp .LBB1_4 command and the LBB1_4 label itself isn't executed.

        jmp     .LBB1_4
        mov     rcx, rax
        mov     eax, edx
        mov     qword ptr [rbp - 16], rcx
        mov     dword ptr [rbp - 20], eax
        mov     rdi, qword ptr [rbp - 16]
        call    __cxa_begin_catch@PLT
        mov     dword ptr [rbp - 4], -1
        call    __cxa_end_catch@PLT
.LBB1_4:

Indeed, it's skipped every time we execute the function sequentially from start to finish. If we do some mental gymnastics and completely remove this code from the assembly, we'll end up with the normal execution path for the foo function—calling the bar function and returning its value:

foo():
        push    rbp
        mov     rbp, rsp
        sub     rsp, 32
        call    bar()
        mov     dword ptr [rbp - 4], eax
        mov     eax, dword ptr [rbp - 4]
        add     rsp, 32
        pop     rbp
        ret

Everything will work as it should, except for one tiny detail: exceptions won't be handled in this implementation. Yeah, maybe because we cut out the implementation of our catch block! But how can we get into it if the normal execution path constantly jumps over it?

We'll answer this question a little later, when we go even deeper into our layered structure of the exception mechanism. For now, let's focus on what happens once we've already entered the catch block. Two new functions, __cxa_begin_catch and __cxa_end_catch, are called.

call    __cxa_begin_catch@PLT
; ....
call    __cxa_end_catch@PLT

The __cxa_begin_catch function takes a pointer as a parameter. Its code shows that this pointer is converted into a pointer to _Unwind_Exception. Where this pointer originally came from remains a mystery for now. Let's just believe that we have it. It's also worth noting that if we throw an exception more complex than just int (i.e., the catch block would catch an object with a copy constructor), an additional call to __cxa_get_exception_ptr would be added before __cxa_begin_catch. Let's leave the analysis of this behavior as homework.

First, __cxa_begin_catch attempts to retrieve the already known __cxa_exception by stepping back from _Unwind_Exception. Earlier, we saw how the two exception handling structures lie in memory next to the object. After that, the function increments the __cxa_exception.handlerCount counter, which counts the handlers in which the exception is still active. Next, we get __cxa_eh_globals, add the current exception to the top of the stack, and decrease the number of exceptions that have not yet been caught. Remember, when throwing exceptions, we also worked with this structure and performed the reverse operation with the uncaughtExceptions data member.
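The uncaught exception counter is visible from ordinary code through std::uncaught_exceptions (C++17). The sketch below uses invented names (Probe, Observation, observe_uncaught_counter) and samples the counter at three moments: in a destructor that runs during unwinding, inside the handler, and after the handler.

```cpp
#include <exception>

// Hypothetical helper: records the counter value when it is destroyed.
struct Probe {
    int &seen;
    explicit Probe(int &s) : seen(s) {}
    ~Probe() { seen = std::uncaught_exceptions(); }
};

struct Observation {
    int during_unwinding;
    int inside_handler;
    int after_handler;
};

Observation observe_uncaught_counter() {
    Observation result{-1, -1, -1};
    try {
        Probe probe(result.during_unwinding);
        throw -1;  // probe is destroyed while the exception is still in flight
    } catch (...) {
        // The handler is active: the exception no longer counts as uncaught.
        result.inside_handler = std::uncaught_exceptions();
    }
    result.after_handler = std::uncaught_exceptions();
    return result;
}
```

The destructor sees 1, while the handler and the code after it see 0. That matches the bookkeeping described above: throwing increments the counter, and entering the catch block decrements it.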

Fun fact: this function can also handle exceptions from other languages, even though the C++ standard doesn't provide such functionality.

In both cases, the function returns a pointer to the exception object that was originally thrown. The body of the catch block runs next, after which __cxa_end_catch is called. Once no handler needs the exception anymore, it destroys the exception and frees up the memory allocated for it. Moreover, this function includes functionality for handling rethrown exceptions. By the way, speaking of them...

Before we move on to examine the layer that delivers the exception to its catch block, let's pause for a moment in that very block. What happens if we rethrow the caught exception? Let's replace return -1; with throw; inside the catch block in our original code and translate it back into assembly:

Look at the assembly with rethrow:

foo():
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        call    bar()
        mov     dword ptr [rbp - 16], eax
        jmp     .LBB1_1
.LBB1_1:
        mov     eax, dword ptr [rbp - 16]
        add     rsp, 16
        pop     rbp
        ret
        mov     rcx, rax
        mov     eax, edx
        mov     qword ptr [rbp - 8], rcx
        mov     dword ptr [rbp - 12], eax
        mov     rdi, qword ptr [rbp - 8]
        call    __cxa_begin_catch@PLT
        call    __cxa_rethrow@PLT
        jmp     .LBB1_8
        mov     rcx, rax
        mov     eax, edx
        mov     qword ptr [rbp - 8], rcx
        mov     dword ptr [rbp - 12], eax
        call    __cxa_end_catch@PLT
        jmp     .LBB1_5
.LBB1_5:
        jmp     .LBB1_6
.LBB1_6:
        mov     rdi, qword ptr [rbp - 8]
        call    _Unwind_Resume@PLT
        mov     rdi, rax
        call    __clang_call_terminate
.LBB1_8:

__clang_call_terminate:
        push    rbp
        mov     rbp, rsp
        call    __cxa_begin_catch@PLT
        call    std::terminate()@PLT

The assembly code for the foo function has become even longer. The call to the __cxa_rethrow function has appeared. Its purpose is to cancel the effect of __cxa_begin_catch and rethrow the exception.

mov     rcx, rax
mov     eax, edx
mov     qword ptr [rbp - 8], rcx
mov     dword ptr [rbp - 12], eax
mov     rdi, qword ptr [rbp - 8]
call    __cxa_begin_catch@PLT
call    __cxa_rethrow@PLT

In fact, it does the same thing as our old friend __cxa_throw: it throws an exception. The only difference is that we already have the memory allocated for it, so we just need to update the already familiar __cxa_exception and __cxa_eh_globals objects. In the end, _Unwind_SjLj_RaiseException or _Unwind_RaiseException is still called. At the very end, if we somehow miraculously get there, std::terminate is called.
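The fact that a rethrow reuses the existing exception object can be observed from plain C++. In this sketch (the Addresses type and the rethrow_keeps_object function are invented names), the inner handler modifies the exception through a reference and rethrows it, and the outer handler checks what it receives.

```cpp
struct Addresses {
    const void *first;   // address seen by the inner handler
    const void *second;  // address seen by the outer handler
    int value;           // value seen by the outer handler
};

Addresses rethrow_keeps_object() {
    Addresses result{nullptr, nullptr, 0};
    try {
        try {
            throw -1;
        } catch (int &e) {
            result.first = &e;
            e = 7;   // modify the exception object in place
            throw;   // rethrow: no new object is allocated
        }
    } catch (int &e) {
        result.second = &e;
        result.value = e;
    }
    return result;
}
```

Both handlers see the same address, and the outer one observes the value 7 written by the inner one, which is exactly what we'd expect if no new memory is allocated for the rethrown exception.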

It would seem that after calling __cxa_rethrow in the assembly, there should be nothing, but here's the problem: there is a jump to the LBB1_8 label, followed by a fall through to the code under the __clang_call_terminate label that, in turn, calls std::terminate. This is probably just an additional safeguard from the compiler, so let's not fixate on it too long.

call    __cxa_rethrow@PLT
    jmp     .LBB1_8
    ;...
.LBB1_8:
__clang_call_terminate:
    push    rbp
    mov     rbp, rsp
    call    __cxa_begin_catch@PLT
    call    std::terminate()@PLT

There is some more code between LBB1_6 and LBB1_8. We end up there "somehow" in cases where the current function can't handle the exception, and the catch block has to be searched where this very function has been called from. The _Unwind_Resume function, which we won't discuss right now, is responsible for this. If it suddenly returns control (which should not happen), a jump to the __clang_call_terminate label occurs, and we already know what happens there.

.LBB1_6:
    mov     rdi, qword ptr [rbp - 8]
    call    _Unwind_Resume@PLT
    mov     rdi, rax
    call    __clang_call_terminate
.LBB1_8:

As we've already seen, the __cxa_end_catch function can distinguish not only between C++ exceptions and exceptions from external languages, but also between thrown and rethrown exceptions. All this ensures that nothing breaks during their handling, except for the developer's belief in a bright future.

Yes, it seems we need to recap a little.

We've examined the internal mechanics of exception handling in C++ using a simple code example. It includes the bar function, which throws an exception, the foo function, which catches it, and main, which calls foo. We've compiled this code into assembly using Clang 21.1.0 for x86-64. We've analyzed the assembly code and source code of libcxxabi and seen how the compiler and runtime implement exception throwing and catching logic based on the Itanium C++ ABI.

In bar, the exception throw is converted into two key calls:

__cxa_allocate_exception allocates memory on the heap not only for the exception itself (in our case, int), but also for the __cxa_exception header. This structure contains data about the thrown exception: std::type_info, a pointer to the destructor, counters, and _Unwind_Exception. Memory is aligned according to platform requirements and filled with zeros. If allocation fails, std::terminate is called;
In turn, __cxa_throw initializes __cxa_exception, updates the thread-local __cxa_eh_globals storage with the stack of caught exceptions and the counter of uncaught ones. It ends with a call to _Unwind_RaiseException or _Unwind_SjLj_RaiseException. If they return control, std::terminate is called.

We've also seen how exception handling unfolds in several calls:

__cxa_begin_catch obtains a pointer to _Unwind_Exception, increments the handler counter in __cxa_exception, adds the exception to the __cxa_eh_globals stack, and decrements the uncaught exception counter, then returns a pointer to the original exception for use in the catch block. To obtain non-trivial exception objects, a call to the __cxa_get_exception_ptr function may be added;
__cxa_end_catch decrements the handler counter, removes the exception from the __cxa_eh_globals stack, calls the destructor, and frees memory.

If the catch block contains the throw; expression, the current exception will be rethrown. In this case, the assembly calls the __cxa_rethrow function: it updates __cxa_exception and __cxa_eh_globals, and then calls the familiar _Unwind_RaiseException or _Unwind_SjLj_RaiseException. If the current function has no suitable catch block, _Unwind_Resume is used to continue the process of searching for the required handler.

Itanium Unwind ABI

During our previous dive into the inner workings of C++ exception handling, we encountered several peculiarities, whether we intended to or not. We saw several functions and one structure with the _Unwind prefix. We still have questions about how exactly the control gets into the catch block. And what the heck is this __gxx_personality_v0 symbol that the author has completely ignored? Let's figure this out.

_Unwind_SjLj_RaiseException looks scary, so let's put it aside for now. Of the things we have already seen, we're left with _Unwind_Exception, _Unwind_RaiseException, and _Unwind_Resume. Let's deal with the first one.

The _Unwind_Exception structure serves several interesting purposes. First, the runtime needs to know what kind of exception it's dealing with: native or foreign. The exception_class data member answers this question by identifying the language and vendor that threw the exception. The C++ standard doesn't officially support catching exceptions from other languages, but the low-level mechanism is still capable of handling them. For that case, the structure includes an exception_cleanup data member, which contains a pointer to the exception cleanup function. A runtime that catches a foreign exception calls it to release the memory allocated for that exception. Also, in _Unwind_Exception, there are two private data members reserved for the runtime's needs. The specification doesn't say what they're supposed to be used for, but we'll see what LLVM does with them later on.
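For reference, here is a sketch that mirrors the layout given in the Itanium ABI specification. The ToyUnwindException name and the helper functions are made up, and the reason parameter is simplified to int. The class value is eight characters packed into a 64-bit integer: to the best of the author's knowledge, libstdc++ uses "GNUCC++\0" and libcxxabi uses "CLNGC++\0", but treat these two constants as an assumption to verify against the sources.

```cpp
#include <cstdint>

// Mirror of the structure from the specification, for illustration only.
struct ToyUnwindException {
    std::uint64_t exception_class;  // who threw: vendor and language
    void (*exception_cleanup)(int reason, ToyUnwindException *exc);
    std::uint64_t private_1;        // reserved for the unwinder
    std::uint64_t private_2;        // reserved for the unwinder
};

// Packs 7 characters plus the terminating zero into a 64-bit class value.
constexpr std::uint64_t make_exception_class(const char (&tag)[8]) {
    std::uint64_t value = 0;
    for (int i = 0; i < 8; ++i) {
        value = (value << 8) | static_cast<unsigned char>(tag[i]);
    }
    return value;
}

// A runtime treats an exception as native only if the class matches its own.
constexpr bool is_native(const ToyUnwindException &exc, std::uint64_t own_class) {
    return exc.exception_class == own_class;
}
```

A catch site can therefore cheaply decide whether the bytes around _Unwind_Exception really form a __cxa_exception that it knows how to interpret, or a foreign object that it may only hand to exception_cleanup.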

Let's move on to _Unwind_RaiseException. As we remember, it's called from __cxa_throw and __cxa_rethrow, and it's the main driver that finds the required catch block and delivers the exception to it. It has one parameter: a pointer to the intended _Unwind_Exception. Looking at the code, we can see that two big things happen there: the unwind_phase1 and unwind_phase2 functions are called. We remember that _Unwind_RaiseException shouldn't return control. Apparently, after executing unwind_phase2, "we're not in Kansas anymore."

Also, at the very beginning of the function, we see the following code:

unw_context_t uc;
unw_cursor_t cursor;
__unw_getcontext(&uc);

This is a call to libunwind, a library responsible for stack unwinding. Stack unwinding is a process in which the runtime sequentially looks at the contents of stack frames. It starts with the very last frame, in our case, the frame of the _Unwind_RaiseException function, and then goes, one by one, through the frame of each function that has not yet returned control at the time of unwinding. The author assumes that the reader already knows what a stack frame is. If not, let's just say that stack unwinding is a process of looking at what's happening at a particular execution moment inside every function whose calls eventually led us to the current point.
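We can try a frame walk ourselves without throwing anything. The sketch below assumes that the toolchain's unwind.h header provides the _Unwind_Backtrace extension and _Unwind_GetIP, as GCC and Clang on Linux do. The count_frame and count_stack_frames names are invented for this example.

```cpp
#include <unwind.h>

namespace {

// Called by the unwinder once per stack frame, from the innermost outwards.
_Unwind_Reason_Code count_frame(struct _Unwind_Context *context, void *arg) {
    if (_Unwind_GetIP(context) != 0) {  // instruction pointer inside this frame
        ++*static_cast<int *>(arg);
    }
    return _URC_NO_REASON;  // tell the unwinder to continue with the next frame
}

}  // namespace

// Walks the current call stack and returns the number of frames visited.
int count_stack_frames() {
    int frames = 0;
    _Unwind_Backtrace(count_frame, &frames);
    return frames;
}
```

The exact number depends on inlining and on how the program was started, so the only thing we can safely rely on is that at least one frame is reported. The exception machinery performs a similar walk, but with a much more interesting callback.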

Now look inside the unwind_phase1 function. Well, the amount of code explodes here, but we don't need to digest all of it, only the interesting parts. First, we see the declaration of the while (true) loop. In this loop, we move up the stack, as indicated by the int stepResult = __unw_step(cursor); line. We're interested in the declaration of the unw_proc_info_t frameInfo; variable.

The unw_proc_info_t structure carries data about the current function that is important for stack unwinding. There are pointers to the start and end addresses of the function, to something called language specific data area, and to the handler data member. Ultimately, executing the unwind_phase1 function boils down to calling this handler.

_Unwind_Personality_Fn p = get_handler_function(&frameInfo);
//...
_Unwind_Reason_Code personalityResult = (*p)(
    1, _UA_SEARCH_PHASE, exception_object->exception_class,
    exception_object, (struct _Unwind_Context *)(cursor));

For now, let's note that one of its arguments is _UA_SEARCH_PHASE, and that it can return the following values:

_URC_HANDLER_FOUND: unwind_phase1 saves the stack pointer of the last viewed frame and returns control with a zero exit code;
_URC_CONTINUE_UNWIND: the while loop continues on the next frame;
or some other value, which causes unwind_phase1 to return an error code.

We can't say anything more at this point. Let's keep going!

Next comes unwind_phase2. Aside from extra security measures, such as support for a shadow stack, this phase is similar to the first one. Once again, we see unw_proc_info_t and handler. This time, handler is called either with _UA_CLEANUP_PHASE or with _UA_CLEANUP_PHASE | _UA_HANDLER_FRAME. The latter is used when unwinding has reached the frame that was saved at the end of unwind_phase1.

After calling handler, we either continue unwinding the stack, return with an error (usually, this should not happen), or do one interesting trick. Please note what happens in the case _URC_INSTALL_CONTEXT block:

__unw_phase2_resume(cursor, framesWalked);
return _URC_FATAL_PHASE2_ERROR;

Here, we can see a call to __unw_phase2_resume, followed by the return of the _URC_FATAL_PHASE2_ERROR error. We put two and two together and conclude that __unw_phase2_resume shouldn't return control, and most likely, this is where the jump we've been searching for so long occurs!
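
The "call that never comes back, followed by an error return" pattern is easy to reproduce in miniature. The sketch below swaps in setjmp/longjmp for the real context restore, which is our substitution and not what libunwind does internally. All names in it (toy_resume, toy_install_context, toy_run) are hypothetical.

```cpp
#include <csetjmp>

// Hypothetical result codes for this sketch.
enum ToyCode { kInstalled = 1, kFatalPhase2Error = 2 };

// Saved context of the pretend "handler frame".
static std::jmp_buf g_handlerContext;

// Stand-in for the resume routine: restores the saved context.
void toy_resume() {
    std::longjmp(g_handlerContext, kInstalled);
}

// Same shape as the block we just looked at: a call, then an error return.
int toy_install_context() {
    toy_resume();
    return kFatalPhase2Error; // reached only if the jump did not happen
}

// Returns kInstalled when control arrived at the landing site via the jump.
int toy_run() {
    if (setjmp(g_handlerContext) == 0) {
        return toy_install_context(); // first pass: context saved, now jump
    }
    return kInstalled;                // second pass: we landed here via longjmp
}
```

Calling toy_run() returns kInstalled, so the return statement after toy_resume() is never executed. Keep in mind that longjmp skips destructors, which is why this toy uses only trivial types.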

Sorry, but __unw_phase2_resume is a macro. We know that nobody likes macros except those who like them. Be patient for a little while longer; it'll be over soon. The implementation of this macro depends on two things: whether the shadow stack is enabled and the platform. If the shadow stack is used, then, oh la la, we see inline assembly for different platforms. These assembly blocks contain the instructions that perform the control flow jump.

If we don't need additional security bells and whistles, we simply call __unw_resume_with_frames_walked, which calls __unw_resume, which in turn calls AbstractUnwindCursor::jumpto. Interestingly, jumpto is a virtual function! The other day, while reading source code, the author was quite amused to find that even such a low-level library has virtual functions. We can find the implementation further down the code. Oh look, wow, there are templates!
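
The combination of a virtual function and a template can be sketched as follows. This only illustrates the design pattern: ToyAbstractCursor, ToyCursor, the register structs, and toy_resume_via are hypothetical, and the toy jumpto returns a string, whereas a real jump would not return at all.

```cpp
#include <string>

// Hypothetical register sets; real ones would hold saved machine registers.
struct ToyRegistersX86_64 { static const char* name() { return "x86_64"; } };
struct ToyRegistersArm64  { static const char* name() { return "arm64"; } };

// Platform-independent interface used by the generic unwinding code.
class ToyAbstractCursor {
public:
    virtual ~ToyAbstractCursor() = default;
    virtual std::string jumpto() = 0;
};

// One template instantiation per platform supplies the actual behavior.
template <typename Registers>
class ToyCursor final : public ToyAbstractCursor {
public:
    std::string jumpto() override {
        return std::string("restore ") + Registers::name();
    }
};

// Generic code only sees the abstract interface.
std::string toy_resume_via(ToyAbstractCursor& cursor) {
    return cursor.jumpto();
}
```

The generic caller stays free of platform details, while each instantiation of the template knows which register set it has to restore.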

From there, we reach the platform-dependent implementation (on the author's machine, it's x86-64), where __libunwind_Registers_x86_64_jumpto is called. This function no longer contains any inline assembly, as it's written entirely in assembly. And there, the context of the target frame is actually restored, and the execution jumps right to it.

Well, there we have it, we've figured out how control flow reaches parts of our little program that would never be touched along the normal execution path! To do this, we descended all the way to the very bottom of the software stack that supports the exception runtime.

Before we find out what this handler is, let's quickly take a look at the last function from the _Unwind_* family: _Unwind_Resume. Do you recall how it appeared when we were rethrowing exceptions in the catch block?

_Unwind_Resume looks very familiar. In many ways, it's similar to _Unwind_RaiseException, except that it skips the first phase and goes straight to the second one. This makes sense, since we've already found the required handler, so all that's left is to move from frame to frame until we reach it.
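
A small program makes the "move from frame to frame" part visible. In the sketch below, neither inner nor middle has a catch block, so each frame only runs its destructor and then lets the unwinding continue. Compilers that follow the Itanium ABI typically end such cleanup code with a call to _Unwind_Resume, though it's worth checking the assembly generated by your own compiler. Tracer, inner, middle, and run_demo are names made up for this example.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

static std::vector<std::string> g_log;

// Records its own destruction so we can observe the order of cleanups.
struct Tracer {
    std::string name;
    explicit Tracer(std::string n) : name(std::move(n)) {}
    ~Tracer() { g_log.push_back("~" + name); }
};

void inner() {
    Tracer t("inner");
    throw std::runtime_error("boom");
}

void middle() {
    Tracer t("middle");
    inner(); // no catch here: only the destructor runs, then unwinding goes on
}

std::vector<std::string> run_demo() {
    g_log.clear();
    try {
        middle();
    } catch (const std::runtime_error& e) {
        g_log.push_back(std::string("caught ") + e.what());
    }
    return g_log;
}
```

run_demo() returns the entries "~inner", "~middle", and "caught boom", in that order: both cleanups finish before the catch block gets control.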

However, there's one nuance: in addition to the familiar unwind_phase2, the code also mentions a certain unwind_phase2_forced. What does it force, and how does it differ from the regular version?

We can cheat a little and look for other places where this unwind_phase2_forced is used. During our search, we'll inevitably stumble upon the _Unwind_ForcedUnwind function. It's very similar to _Unwind_RaiseException, but the first phase never occurs, and the second phase is handled specifically through unwind_phase2_forced.

If we look at the documentation for the implementation of the Itanium ABI specification, we can see the following example of how the _Unwind_ForcedUnwind function works:

The setjmp procedure saves the state for restoration (including the frame pointer) in its usual place. The longjmp_unwind procedure calls _Unwind_ForcedUnwind, passing it a stop function that compares the current frame pointer with the previously saved frame pointer.

This gives us a small clue about what's going on under the hood. The _Unwind_ForcedUnwind function is used where we need to unwind the stack, but we don't need to throw a classic C++ exception. Moreover, comments indicate that it's not used in C++ (author's note: at runtime). So, where is it used?

For example, it can be used when a thread is exiting. You can see this in the pthread implementation in the glibc library. The function's call is located here. A detailed analysis of nptl is far beyond the scope of this article, so we'll leave it at that.
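
We can observe the effect without reading nptl. The sketch below assumes a glibc-based Linux system and a build with thread support (for example, the -pthread flag). Other C libraries may not unwind the stack on pthread_exit at all. Guard, worker, and destructor_runs_on_pthread_exit are hypothetical names.

```cpp
#include <pthread.h>
#include <atomic>

static std::atomic<bool> g_destructorRan{false};

// Sets a flag when destroyed, so we can tell whether the frame was unwound.
struct Guard {
    ~Guard() { g_destructorRan.store(true); }
};

// Thread entry point that leaves via pthread_exit instead of returning.
static void* worker(void*) {
    Guard guard;
    pthread_exit(nullptr); // never returns; the frame is cleaned up on the way out
}

// Reports whether the Guard destructor ran while the thread was exiting.
bool destructor_runs_on_pthread_exit() {
    g_destructorRan.store(false);
    pthread_t thread;
    if (pthread_create(&thread, nullptr, worker, nullptr) != 0) return false;
    pthread_join(thread, nullptr);
    return g_destructorRan.load();
}
```

On glibc, the function should report true even though no C++ exception is thrown anywhere in the program. On a C library that does not unwind on pthread_exit, it may report false.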

Here's another interesting fact: GCC's libstdc++ uses a "special exception" called __forced_unwind for forced unwinding. This allows various structures in the vendor's C++ standard library to distinguish cases of forced stack unwinding from regular ones. In fact, no exception is thrown: the personality routine simply sets the corresponding typeid. As a result, neither the C++ exception itself nor the __cxa_exception structure is created.

if (actions & _UA_FORCE_UNWIND)
{
    throw_type = &typeid(abi::__forced_unwind);
}
else if (foreign_exception)
{
    throw_type = &typeid(abi::__foreign_exception);
}

Wait wait wait... What is a personality routine?

Before we answer this question, let's do a quick recap. We've explored low-level exception handling mechanisms in C++ based on the Itanium ABI and focused on functions and structures from the _Unwind_* family.

We've examined the _Unwind_Exception structure, which allows us to distinguish between native and external exceptions and store runtime-specific data. We've seen how _Unwind_RaiseException triggers two phases of stack unwinding: the first phase to find a suitable handler and the second phase to fully unwind to the handler. We also figured out how to jump to the catch block using platform-dependent assembly code.

We've also looked at the _Unwind_Resume function for rethrowing exceptions and the _Unwind_ForcedUnwind function for forced stack unwinding without using C++ exceptions, which occurs, for example, when exiting a thread in pthreads.

Ultimately, our research led us to a new concept: the personality routine.

As a parting thought

Friends, we've read, read, read, 'til our eyes went red! It's time for a short break.

We still need to figure out what kind of beast this personality routine is, how the runtime determines whether it has entered the correct catch block, and how destructors are called. We've also completely skipped the _Unwind_SjLj* family of functions for now, and we haven't even touched topics from "101 for the impatient": what are these exception tables and SjLj lists?

So, in the best traditions of Middle Eastern folktales, we'll pause at the most interesting point and invite you to join us in the next article.

Meanwhile, as usual, El Psy Kongroo.
