lostghost

Posted on Jul 1

Linux from the developer's perspective. Part 2 - Compilation and linking

#programming #linux

This blog is part of a series

How does a C program get compiled? For C-like languages, compilation involves four steps:

Preprocessing, compile-time metaprogramming
Compilation itself, translation of the source code to assembly
Assembling, turning assembly into machine code in an object file
Linking, turning the object file into an executable or a library

Of course, all these categories, except for linking, are to some degree arbitrary. Preprocessing is an anomaly, a language within a language, a crutch: it exposes the limited expressive power of the base language. Compilation is to a degree arbitrary too, because you can embed assembly code directly into C code, and the compiler copies it into its output without compiling it (only substituting the operands). The assembly isn't quite "the" assembly either: it's GNU assembly. Originally, the assembly language was described in the ISA manual, and the manufacturer provided the assembler that read exactly that language. The GNU assembler (GAS) is not that. It's a portable assembler that supports many architectures with a common set of directives, and on x86 it uses AT&T syntax by default rather than the Intel syntax of the manuals. Still, the mental framework of these four steps is a net positive, even if past a certain level of experience you start to see the gaps in it.

We already discussed the preprocessor in the previous blog, so let's now turn our attention to compilation. Compile our test program like this:

[lostghost1@archlinux c]$ gcc -S main.c

Or rather, for cleaner unoptimized assembly, without unwind tables, the compiler identification string and stack-protector code:

[lostghost1@archlinux c]$ gcc -S -O0 -fno-asynchronous-unwind-tables -fno-unwind-tables -fno-ident -fno-stack-protector main.c

Resulting assembly with explanatory comments:

    .file   "main.c"
    .text
    .globl  main
    .type   main, @function
main:
    pushq   %rbp                  # Prologue: save old base pointer
    movq    %rsp, %rbp            # Set new base pointer
    subq    $16, %rsp             # Allocate 16 bytes for local variables

    movl    %edi, -4(%rbp)        # Save argc (1st argument, int) at -4(%rbp)
    movq    %rsi, -16(%rbp)       # Save argv (2nd argument, char **) at -16(%rbp)

    cmpl    $1, -4(%rbp)          # Compare argc to 1
    jg      .L2                   # If argc > 1, jump to .L2 (print argument)

    movl    $1, %eax              # argc <= 1: set return value to 1
    jmp     .L3                   # Return

.L2:
    movq    -16(%rbp), %rax       # Load argv into %rax
    addq    $8, %rax              # Advance to argv[1] (first argument, skipping program name)
    movq    (%rax), %rax          # Dereference: load pointer to argument string
    movq    %rax, %rdi            # Move that pointer to %rdi (argument for puts)
    call    puts@PLT              # Print argv[1] with puts()
    movl    $0, %eax              # Set return value to 0

.L3:
    leave                         # Epilogue: restore frame pointer and stack
    ret                           # Return to caller

    .size   main, .-main
    .section    .note.GNU-stack,"",@progbits

As you can see, many C constructs translate into assembly directly. For example:

int a = 10, b = 20, c;
c = a + b;

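Translates, roughly, to the following (in simplified Intel syntax rather than the AT&T syntax GCC produced above, keeping a and b in registers and treating c as a global variable):

mov eax, 10
mov ebx, 20
add eax, ebx
mov dword ptr [c], eax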

Another example:

int arr[4] = {1, 2, 3, 4};
int *p = &arr[2];
*p = 99;

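Translates, roughly, to (same simplified Intel syntax, with arr a global array):

lea rax, [arr + 8]         ; p = &arr[2]: compute the address (2 * 4-byte int = offset 8), no memory read
mov dword ptr [rax], 99    ; *p = 99: store through the pointer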

So in a way, C is just higher-level assembly. But in other ways, it isn't. Some constructs have no fixed translation at all: the standard leaves operations such as signed integer overflow or out-of-bounds array access undefined, and the compiler is free to assume they never happen. Structs and unions have no direct assembly counterpart either; they are compiled down to a base address plus fixed offsets, and enums become plain integers. Calling conventions vary between CPUs and OSes. If you want to explore how exactly code translates into assembly, there is a really useful website for that: Compiler Explorer, at godbolt.org.

After compilation comes assembling, which translates assembly code into machine code for a given ISA. But the assembler doesn't output bare machine code: it wraps it in a binary object file, which on Linux uses the ELF format.

But the resulting artifact is an object file, which isn't yet the final process image. It contains sections (.text, .data, .bss) and their contents (machine code and data), a symbol table that includes the symbols it needs from other object files and libraries, and relocation entries. Any address the machine code needs is left as a placeholder relative to a section or symbol, because we don't yet know at which address these sections will be loaded, so we can't run the program yet. What lays out the sections in memory, grouping them into segments and filling in the addresses, is the linker, and it does so according to a linker script. On Arch Linux, these are in /lib/ldscripts/.

Let's examine one. Take elf_x86_64.x.

OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", "elf64-x86-64") /* default, big-endian and little-endian output formats */
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start) /* which symbol is the entry point of the executable */
SEARCH_DIR("/usr/x86_64-pc-linux-gnu/lib64"); SEARCH_DIR("/usr/lib"); SEARCH_DIR("/usr/local/lib"); SEARCH_DIR("/usr/x86_64-pc-linux-gnu/lib"); /* directories searched for libraries while linking */
SECTIONS
{
  /* Read-only sections, merged into text segment: */
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000));
  . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
  /* Place the build-id as close to the ELF headers as possible.  This
     maximises the chance the build-id will be present in core files,
     which GDB can then use to locate the associated debuginfo file.  */
  .note.gnu.build-id  : { *(.note.gnu.build-id) }
  .interp         : { *(.interp) }
  .hash           : { *(.hash) }

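This excerpt shows how the output sections are laid out: the text segment starts at address 0x400000 (unless overridden with ld's -Ttext-segment option), and the first read-only sections are placed right after the ELF headers.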
Let's now link the program manually:

[lostghost1@archlinux c]$ gcc -c main.c 
[lostghost1@archlinux c]$ ld main.o --dynamic-linker /lib64/ld-linux-x86-64.so.2 /usr/lib/crt1.o -lc -o main
[lostghost1@archlinux c]$ ./main hello
hello

When invoking ld, our linker, we had to specify the path to the dynamic loader (confusingly, via the --dynamic-linker option), because we are building a dynamic rather than a static executable; more on the distinction later. crt1.o is a special object file, part of the C runtime shipped with the C library, which contains the entry point symbol, _start. In glibc, _start sets up the arguments and calls __libc_start_main, which eventually calls main. -lc links against libc, glibc in our case; alternatives such as musl libc exist.

Now let's inspect the binary:

[lostghost1@archlinux c]$ readelf -a main
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x401060
  Start of program headers:          64 (bytes into file)
  Start of section headers:          13088 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         12
  Size of section headers:           64 (bytes)
  Number of section headers:         24
  Section header string table index: 23

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .interp           PROGBITS         00000000004002e0  000002e0
       000000000000001c  0000000000000000   A       0     0     1
  [ 2] .hash             HASH             0000000000400300  00000300
       0000000000000018  0000000000000004   A       4     0     8
  [ 3] .gnu.hash         GNU_HASH         0000000000400318  00000318
       000000000000001c  0000000000000000   A       4     0     8
  [ 4] .dynsym           DYNSYM           0000000000400338  00000338
       0000000000000048  0000000000000018   A       5     1     8
  [ 5] .dynstr           STRTAB           0000000000400380  00000380
       0000000000000039  0000000000000000   A       0     0     1
  [ 6] .gnu.version      VERSYM           00000000004003ba  000003ba
       0000000000000006  0000000000000002   A       4     0     2
  [ 7] .gnu.version_r    VERNEED          00000000004003c0  000003c0
       0000000000000030  0000000000000000   A       5     1     8
  [ 8] .rela.dyn         RELA             00000000004003f0  000003f0
       0000000000000018  0000000000000018   A       4     0     8
  [ 9] .rela.plt         RELA             0000000000400408  00000408
       0000000000000018  0000000000000018  AI       4    18     8
  [10] .plt              PROGBITS         0000000000401000  00001000
       0000000000000020  0000000000000010  AX       0     0     16
  [11] .text             PROGBITS         0000000000401020  00001020
       0000000000000075  0000000000000000  AX       0     0     16
  [12] .rodata           PROGBITS         0000000000402000  00002000
       0000000000000004  0000000000000004  AM       0     0     4
  [13] .eh_frame         PROGBITS         0000000000402008  00002008
       0000000000000088  0000000000000000   A       0     0     8
  [14] .note.gnu.pr[...] NOTE             0000000000402090  00002090
       0000000000000040  0000000000000000   A       0     0     8
  [15] .note.ABI-tag     NOTE             00000000004020d0  000020d0
       0000000000000020  0000000000000000   A       0     0     4
  [16] .dynamic          DYNAMIC          0000000000403e60  00002e60
       0000000000000180  0000000000000010  WA       5     0     8
  [17] .got              PROGBITS         0000000000403fe0  00002fe0
       0000000000000008  0000000000000008  WA       0     0     8
  [18] .got.plt          PROGBITS         0000000000403fe8  00002fe8
       0000000000000020  0000000000000008  WA       0     0     8
  [19] .data             PROGBITS         0000000000404008  00003008
       0000000000000004  0000000000000000  WA       0     0     1
  [20] .comment          PROGBITS         0000000000000000  0000300c
       000000000000001b  0000000000000001  MS       0     0     1
  [21] .symtab           SYMTAB           0000000000000000  00003028
       0000000000000180  0000000000000018          22     5     8
  [22] .strtab           STRTAB           0000000000000000  000031a8
       00000000000000a6  0000000000000000           0     0     1
  [23] .shstrtab         STRTAB           0000000000000000  0000324e
       00000000000000cc  0000000000000000           0     0     1

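We see that we still have the section headers, along with the program headers! Section headers are only needed by tools such as linkers and debuggers; at run time, only the program headers are used. Since we won't be debugging this executable, let's remove the section headers: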

[lostghost1@archlinux c]$ strip --strip-section-headers main
[lostghost1@archlinux c]$ readelf -a main
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x401060
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         12
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 0

There are no sections in this file.

There are no section groups in this file.

Much better!

Now for the difference between static and dynamic executables. Object files that call external functions contain unresolved symbols. These can be resolved during linking: when the executable is laid out into segments, the linker copies in the code it needs and patches each call site with the actual address of the function. This makes for a static executable. However, we can choose to postpone resolving the symbols until program start. The executable then only declares which libraries it needs and which symbols from them, and at program start the dynamic loader runs first, finds those libraries, loads them and resolves the symbols. This makes for a dynamic executable.

Let's see which one our program is:

[lostghost1@archlinux c]$ ldd main
    linux-vdso.so.1 (0x00007ffedcd23000)
    libc.so.6 => /usr/lib/libc.so.6 (0x0000756dbada8000)
    /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x0000756dbafc0000)

Both libc and the loader are needed at run time, which makes the executable dynamic. (linux-vdso.so.1 is a special case: a virtual shared object that the kernel maps into every process, so there is no file for it on disk.)

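Glibc can produce static executables (gcc -static), but it handles them poorly: functions such as getaddrinfo() or getpwnam() still try to load shared libraries at run time. musl was designed with static linking in mind, so to build a static executable, install musl: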

[lostghost1@archlinux c]$ yay -S musl clang
[lostghost1@archlinux c]$ musl-clang --static main.c -o main
[lostghost1@archlinux c]$ ldd main
    not a dynamic executable
[lostghost1@archlinux c]$ ./main hello
hello

This executable has all its symbols resolved - no dynamic loader needed!

Lastly, let's touch upon building static and dynamic libraries themselves. A static library is just an archive of object files, created with ar:

[lostghost1@archlinux c]$ cat main.c
#include <stdio.h>
#include "sayhello.h"
int main(int argc, char** argv){
    sayhello();
    return 0;
}
[lostghost1@archlinux c]$ cat sayhello.h
#ifndef _SAYHELLO_H
#define _SAYHELLO_H
void sayhello();
#endif
[lostghost1@archlinux c]$ cat sayhello.c
#include <stdio.h>
void sayhello(){
    printf("Hello!\n");
}
[lostghost1@archlinux c]$ musl-clang -c sayhello.c
[lostghost1@archlinux c]$ musl-clang -c main.c
[lostghost1@archlinux c]$ ar q libsayhello.a sayhello.o
ar: creating libsayhello.a
[lostghost1@archlinux c]$ musl-clang --static main.o -L. -lsayhello -o main
[lostghost1@archlinux c]$ ldd main
    not a dynamic executable
[lostghost1@archlinux c]$ ./main 
Hello!

Here, -L. means "also look in this directory", and -lsayhello means "look for libsayhello". Normally the linker prefers the shared libsayhello.so and falls back to libsayhello.a; because we specified --static, only libsayhello.a is considered.

As for a dynamic library:

[lostghost1@archlinux c]$ rm main
[lostghost1@archlinux c]$ gcc -shared sayhello.o -o libsayhello.so
[lostghost1@archlinux c]$ gcc main.o -L. -lsayhello -o main
[lostghost1@archlinux c]$ ldd main
    linux-vdso.so.1 (0x00007ffc384aa000)
    libsayhello.so => not found
    libc.so.6 => /usr/lib/libc.so.6 (0x000074e040a48000)
    /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x000074e040c65000)
[lostghost1@archlinux c]$ ./main 
./main: error while loading shared libraries: libsayhello.so: cannot open shared object file: No such file or directory
[lostghost1@archlinux c]$ LD_LIBRARY_PATH=. ./main 
Hello!

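By default, the current directory isn't searched, neither for executables (which is why we have to type ./main rather than main) nor for libraries. This is for security reasons: otherwise, whatever happens to sit in the current directory could be run or loaded instead of what we intended. The dynamic loader only searches its configured directories (such as /usr/lib), which is why we have to point it at the current directory with the LD_LIBRARY_PATH environment variable.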

Of course, the shared library advertises its exported symbol in its dynamic symbol table:

[lostghost1@archlinux c]$ readelf -a libsayhello.so
...
Symbol table '.dynsym' contains 7 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterT[...]
     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND [...]@GLIBC_2.2.5 (2)
     3: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     4: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMC[...]
     5: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND [...]@GLIBC_2.2.5 (2)
     6: 0000000000001110    20 FUNC    GLOBAL DEFAULT   11 sayhello

And that's all I have to share when it comes to compiling and linking a C program. In the next blog, we will examine loading and running an ELF executable file. See ya then!

Top comments (1)

Roi Kadmon • Jul 5

Fantastic job, very in-depth.
I think that some people who program in C for the first time may not necessarily come from a low-level background of knowing Assembly, but from a higher-level background of programming in Python, so they may not necessarily be aware of what a CPU architecture is.
It could be helpful to give a general overview of a CPU architecture defining the instructions supported by a CPU "type" and the encoding of the instructions, and also explaining the prominent architectures:

The x86 / x86_64 architecture, which this article mostly references, being dominant in the desktop, server segment and supercomputers;
The ARM architecture, with its 32-bit variant prevailing in the embedded market, and its 64-bit variant dominant in the smartphone and recently laptop segments.