DEV Community

Aadhitya A for Kubernetes Community Days Chennai


How is C/C++ code compiled on a system?

Most of us have used C or C++ at some point, either indirectly through products (e.g. games built with Unreal Engine) or directly in our own projects. Both languages are still hugely popular, and they underpin many other languages too: the reference Python interpreter, CPython, is written in C, and Go's original compiler was written in C as well. But have you ever wondered how C or C++ code is actually compiled? I didn't know either at first, until my seniors shared a nice tip about it. So let's get into it 😁


Tools needed?

Before getting into how it all works, I recommend using GNU's GCC compiler, since it's widely used. LLVM's Clang works just as well for trying this out, and each GCC command below has a Clang equivalent.

In Action!

Alright, to get things started, let's write a plain C++ Hello World program. You'll understand at the end why I started with such a simple one.

// Hello.cpp
#include <iostream>

using namespace std;

int main()
{
    cout << "Hello World! \n";
    return 0;
}


Right, now let's compile the code using g++ Hello.cpp or clang++ Hello.cpp.

If you run the output file, a.out (a.exe on Windows), you'll get pretty straightforward output, no surprises:

Hello World!


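Now, to make things interesting, compile the code again, but this time with the -S flag. It tells the compiler to stop once it has translated the code into assembly, instead of going on to assemble and link an executable: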

# For GCC
gcc -S Hello.c        # for C
g++ -S Hello.cpp      # for C++

# For Clang
clang -S Hello.c      # for C
clang++ -S Hello.cpp  # for C++

You'll get a file called Hello.s. If you open it, the syntax looks quite different from usual, and if you read a bit more you'll see why: it's assembly code!

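Here's an example. It was generated by GCC 9.4 on x86-64 Ubuntu (see the .ident line) without optimization; the exact output differs across compilers and compiler versions, target platforms and CPUs, and optimization levels.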

// From GCC (the arrows are my annotations, not part of the output)
    .file   "Hello.cpp"
    .text
    .section    .rodata
    .type   _ZStL19piecewise_construct, @object
    .size   _ZStL19piecewise_construct, 1
_ZStL19piecewise_construct:   <--- std::piecewise_construct, a 1-byte tag object
    .zero   1
    .local  _ZStL8__ioinit
    .comm   _ZStL8__ioinit,1,1
.LC0:                          <--- label for the "Hello World!" string literal
    .string "Hello World!"
    .text
    .globl  main
    .type   main, @function
main:                          <--- the main function starts here
.LFB1522:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    leaq    .LC0(%rip), %rsi
    leaq    _ZSt4cout(%rip), %rdi
    call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@PLT
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE1522:
    .size   main, .-main
    .type   _Z41__static_initialization_and_destruction_0ii, @function
_Z41__static_initialization_and_destruction_0ii:
.LFB2006:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $16, %rsp
    movl    %edi, -4(%rbp)
    movl    %esi, -8(%rbp)
    cmpl    $1, -4(%rbp)
    jne .L5
    cmpl    $65535, -8(%rbp)
    jne .L5
    leaq    _ZStL8__ioinit(%rip), %rdi
    call    _ZNSt8ios_base4InitC1Ev@PLT
    leaq    __dso_handle(%rip), %rdx
    leaq    _ZStL8__ioinit(%rip), %rsi
    movq    _ZNSt8ios_base4InitD1Ev@GOTPCREL(%rip), %rax
    movq    %rax, %rdi
    call    __cxa_atexit@PLT
.L5:
    nop
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE2006:
    .size   _Z41__static_initialization_and_destruction_0ii, .-_Z41__static_initialization_and_destruction_0ii
    .type   _GLOBAL__sub_I_main, @function
_GLOBAL__sub_I_main:
.LFB2007:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $65535, %esi
    movl    $1, %edi
    call    _Z41__static_initialization_and_destruction_0ii
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE2007:
    .size   _GLOBAL__sub_I_main, .-_GLOBAL__sub_I_main
    .section    .init_array,"aw"
    .align 8
    .quad   _GLOBAL__sub_I_main
    .hidden __dso_handle
    .ident  "GCC: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0"
    .section    .note.GNU-stack,"",@progbits
    .section    .note.gnu.property,"a"
    .align 8
    .long    1f - 0f
    .long    4f - 1f
    .long    5
0:
    .string  "GNU"
1:
    .align 8
    .long    0xc0000002
    .long    3f - 2f
2:
    .long    0x3
3:
    .align 8
4:

That's a lot of code just for a Hello World program! (Sounds a bit shocking, right?! It shocked me too.) So let me explain some of it to make things clearer...

    .file   "Hello.cpp"
    .text
    .section    .rodata
    .type   _ZStL19piecewise_construct, @object
    .size   _ZStL19piecewise_construct, 1

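This first part is mostly bookkeeping: .file records the name of the source file, .text and .section .rodata switch between the section for code and the section for read-only data, and .type and .size declare that _ZStL19piecewise_construct is a data object 1 byte long.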

_ZStL19piecewise_construct:
    .zero   1
    .local  _ZStL8__ioinit
    .comm   _ZStL8__ioinit,1,1
.LC0:
    .string "Hello World!"
    .text
    .globl  main
    .type   main, @function
main:
.LFB1522:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    leaq    .LC0(%rip), %rsi
    leaq    _ZSt4cout(%rip), %rdi
    call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@PLT

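In the excerpt above, _ZStL19piecewise_construct is the standard library's std::piecewise_construct object (an empty 1-byte tag pulled in indirectly by <iostream>), and _ZStL8__ioinit is a static std::ios_base::Init object that this version of <iostream> declares in every file that includes it, so that cout is ready before main runs. These odd names are mangled: C++ compilers encode namespaces, names and parameter types into each symbol so that overloaded functions get unique names. _ZSt4cout, for instance, is simply std::cout.

.LC0 is not a function. It's a label marking the string literal "Hello World!", which lives in the read-only .rodata section. The real work happens under main: pushq %rbp and movq %rsp, %rbp set up the stack frame, the two leaq instructions load the address of the string into %rsi and the address of cout into %rdi (the first two argument registers in the System V x86-64 calling convention used on Linux), and call then runs the operator<< overload that prints a const char*. In the full listing, movl $0, %eax then sets main's return value to 0. The .cfi_* directives are call frame information that debuggers and C++ exception handling use to unwind the stack, and endbr64 is a landing-pad marker for Intel's Control-flow Enforcement Technology (CET).

The rest of the full listing (_GLOBAL__sub_I_main and _Z41__static_initialization_and_destruction_0ii) is start-up code: .init_array makes it run before main, and it constructs __ioinit and registers its destructor with __cxa_atexit. That's why even Hello World produces so much assembly.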

Now, let's look at a slightly more complex one (don't panic xD): adding two numbers!

#include <iostream>

using namespace std;

int main()
{
    int a=5, b=10;
    cout << a+b << "\n";
    return 0;
}

Compile this program with the -S flag as before and you'll get its assembly file. Let's focus on the important part, the main function:

main:
.LFB1522:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $16, %rsp
    movl    $5, -8(%rbp)  <-- NOTE HERE
    movl    $10, -4(%rbp) <-- NOTE HERE
    movl    -8(%rbp), %edx
    movl    -4(%rbp), %eax
    addl    %edx, %eax
    movl    %eax, %esi
    leaq    _ZSt4cout(%rip), %rdi
    call    _ZNSolsEi@PLT
    leaq    .LC0(%rip), %rsi
    movq    %rax, %rdi
    call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@PLT
    movl    $0, %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

In the main section above, you can see that 5 and 10 are first stored in memory on the stack, not in registers. Let's take a closer look...

movl    $5, -8(%rbp)
movl    $10, -4(%rbp)

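movl $5, -8(%rbp) stores the 32-bit value 5 in the stack slot 8 bytes below the frame pointer %rbp, which is where the compiler placed a, and b goes 4 bytes below it (subq $16, %rsp reserved that space). The next instructions load both values into the registers %edx and %eax, addl adds them (the result, 15, ends up in %eax), and the sum is moved to %esi so it can be passed, together with the address of cout in %rdi, to _ZNSolsEi, the mangled name of the ostream::operator<<(int) member. That call returns a reference to cout in %rax, which becomes the first argument of the next call that prints "\n". All this shuffling happens because plain -S compiles at -O0; with optimizations such as -O2, GCC would typically fold a+b into the constant 15 at compile time.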

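This is roughly how C or C++ code is compiled to assembly, and I've only shown the tip of the iceberg (yes, it goes deep!). Two more steps turn Hello.s into the a.out you ran earlier: the assembler translates it into an object file (g++ -c Hello.cpp stops there and gives you Hello.o), and the linker combines that object file with the C++ standard library into an executable. I'll share a blog post soon on how pointers look at the assembly level, and much more. Until then, see ya! ;)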

Comments (4)

Paul J. Lucas

I recommend you to use GNU's GCC Compiler as it's standard and widely used.

It's not standard. There's no standards body that standardizes compilers. Widely used? Yes. Standard? No.

cout << "Hello World!";

Missing \n.

... using -S flag ...

You really should check this out (Compiler Explorer).

... do note that it differs across compilers used ...

It also differs on the platform, CPU, and optimization level used.

You also should really talk about name mangling.

Aadhitya A

Hey Paul, thanks for your comment and your tip. I'll recheck all again and edit the blog post accordingly. Thanks for your quick response and your tip

Tiago Melo Jucá

"GCC is not standard"
Well, it's very compliant. It does have some faults. And probably the software we write too. It happens. There is not standards for compilers? This sounds exactly what the C++ standard should be. And this is why if you find something non compliant about GCC, you can just open a bug request.

Missing new line character for a low level article? Omg, seriously?

I do agree Compiler Explorer is great, but there is no problem at all in using -S flag for any explanation. A lot of blog posts do this, and just because we have a nice tool for studying compilers doesn't mean -S is useless. Actually, for beginners, it's nice to teach about this feature.

And yes, he could talk about name mangling. Also, he could talk about the ABI. Also, about parsing, generating the Abstract Syntax Tree, linking, dynamic/static linking differences... and it goes on. Finally, he could write a book and draw a dragon as cover, couldn't he? C'mon.

About differences across platforms, cpu, and opt level, when I read the statement was already there, but probably it was your tip. This was a good one.

Criticism is good, but sorry, in this case, most is just pointless. Are you trying to prove you're smart? I'm pretty sure you are (I'm not being ironic at all here). That said, there are better ways to achieve this :)

Paul J. Lucas

There is not standards for compilers?

Correct.

This sounds exactly what the C++ standard should be.

No. Any language standard can only standardize the language itself, not compilers.

And this is why if you find something non compliant about GCC, you can just open a bug request.

True, but irrelevant. Compliance to a standard is not the same thing as a standard.

Suppose there is a standard for a wrench. A standards body wrote that standard. Manufacturer X can make X's wrench that conforms to that standard. Manufacturer Y can make Y's wrench that also conforms to that standard. However, the standards body does not standardize either X's or Y's wrench. Again, they write only the standard and do not standardize particular wrenches. A wrench can be standards-conforming, but that is not the same as being a standard itself.

Words matter, especially when it comes to computers. Words have precise meanings. Please learn what these words mean.

Missing new line character for a low level article? Omg, seriously?

So that gives you license to get it wrong? So, yes, seriously.

... but there is no problem at all in using -S flag for any explanation ...

I never said there was a problem. I'm saying that using Compiler Explorer is a lot better.

And yes, he could talk about name mangling. Also, he could talk about the ABI. Also, about parsing, generating the Abstract Syntax Tree, linking, dynamic/static linking differences... and it goes on. Finally, he could write a book and draw a dragon as cover, couldn't he? C'mon.

Of all those, name mangling is the relevant one since it would explain why names in the assembly output have different names. You obviously missed that point.

Criticism is good, but sorry, in this case, most is just pointless.

You're entitled to your opinion.