Having implemented function overloading in C in a previous post, we will now look at what each of those methods looks like in the compiled binary, at the assembly level. To do so, we'll need a disassembler.
Disassemblers are tools used to analyze binary files and understand how software works at the machine code level.
They allow users to convert compiled code into assembly language, acting as a microscopic view of a program, so that one can read and understand the instructions it executes.
This is particularly useful as it can reveal the inner workings of the code and help identify areas for performance improvement or algorithmic optimization.
In this post, we will use IDA (Interactive Disassembler) to view the programs produced from our compiled code.
The image above depicts a loaded binary file in IDA. IDA does a lot of the heavy lifting for us as it can detect the architecture and system that the given file was built for.
We will start by loading the binary that was created by compiling the code that used the _Generic keyword.
We are currently inspecting the main function of the program, as shown in the pane on the left-hand side. The area of interest for us starts at the following instructions:
mov [rbp+var_20], 2
movss xmm0, cs:dword_2050
mov [rbp+var_10], 4
This section shows our initial variables being loaded. If we look at the instructions that follow, we notice function calls to double_int, double_float and double_long.
We can observe from this that the mapping between the function calls and the arguments was done by the compiler at compile time.
This, however, is not the case for the code that uses clang's overloadable attribute, as we see in the following image:
We can see that the three functions we saw in the _Generic case have been replaced with three functions that all share the prefix _Z10double_num, followed by i, f and l respectively, one for each data type. This is known as name mangling.
Name mangling, also referred to as function name decoration, is a technique that is leveraged by compilers to ensure that function names or symbols in an object file or executable are unique.
Name mangling is crucial for a programming language that supports functions or symbols having the same name but different arguments or types (i.e. function overloading).
Name mangling involves transforming the original function name into a new, unique name that includes additional information about the function's parameters and types. This new name is then stored in the symbol table, which the linker uses to resolve function calls at link time.
The specific rules for name mangling vary by compiler and programming language, but generally, the mangled name includes the original function name, a list of parameter types, and other information such as the namespace or class name.
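For example, in C++, the function "add(int a, int b)" might be mangled to _Z3addii on a system using the GNU C++ compiler: _Z is the prefix, 3 is the length of the name add, and each i stands for an int parameter.
Clang's documentation for the overloadable attribute mentions that functions declared with it are mangled according to the same rules as C++ function names.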
We now move to the code that used varargs to achieve function overloading.
We can see here that all three function calls are to double_num, with the addition that a format specifier is passed along with the original value.
The code may look somewhat uniform in the main function, but if we examine the code of the double_num function itself, we see this large flow graph:
While it may not actually be the case, the size of the flow graph, together with the additional checks being performed, will most likely have a performance impact, which is often undesirable.
Hopefully, this post helps whoever reads it consider a different view of overloading and how it works under the hood.
References
https://itanium-cxx-abi.github.io/cxx-abi/abi.html
https://www.h-schmidt.net/FloatConverter/IEEE754.html
https://stackoverflow.com/questions/40157489/in-the-itanium-c-abi-why-does-the-mangled-name-for-template-functions-not-res
https://hex-rays.com/ida-free/
Attributes in Clang — Clang 17.0.0git documentation (llvm.org)