Rahul Shankar

Dissecting Hello World: Removing printf, libc, and main.

Quick reminder of printf

printf is a function used in C programs to display text on the screen. This is not the official definition, but I do not wish to pursue that definition here. We use it by including the stdio.h header. For consistency, all programs will follow the same output standard: one string literal, "Hello World!\n", printed to the terminal.

#include <stdio.h>

int main() {
  printf("Hello World!\n");
  return 0;
}

Autopsy of printf:

For the compilation, I am also using the --save-temps flag to get access to the .s file. Below you can see the file size and the output of the ldd command, which shows the dynamically linked libraries (I'll call them DLLs from now on).

~>ls -lh a.out 
-rwxrwxr-x 1 noob noob 16K Feb 13 09:43 a.out
~>ldd ./a.out
    linux-vdso.so.1 (0x00007513c2293000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007513c2000000)
    /lib64/ld-linux-x86-64.so.2 (0x00007513c2295000)

As you can see there are 3 different objects, but the one we are concerned about is libc.so.6 which is what contains the definitions for printf. Ideally, the goal at the end of this is to not have that library.

For the sake of brevity, I'll say that the .s file is the assembly breakdown of the C program. This is what we will use to figure out what is being called under the hood.

.LC0:
    .string "Hello World!"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    leaq    .LC0(%rip), %rax
    movq    %rax, %rdi
    call    puts@PLT
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:

There are 3 sections we need to pay attention to. The first is .LC0, which contains our string. For the scope of this blog, focus on the fact that our string literal is stored here. The second is main, specifically .LFB0 and .LFE0, which are labels meant for debugging that indicate where the Local Function Begins and Ends. The final part is that the assembly calls puts rather than printf. This is because the compiler saw that the format string has no format specifiers and ends in a newline, so it swapped in the cheaper puts, which appends the newline itself (that is why .LC0 holds "Hello World!" without the \n). This is something to keep in mind.
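A minimal sketch of when that substitution can and cannot happen, assuming GCC's usual behaviour. The function names are made up for illustration. Compiling it with --save-temps and comparing the two functions in the .s file should show the difference.

```c
#include <stdio.h>

/* No format specifiers, trailing newline, result ignored:
   GCC is free to emit a call to puts("Hello World!") here. */
void hello_plain(void) {
  printf("Hello World!\n");
}

/* The return value is used, so the call has to stay a real printf:
   puts does not report the number of characters written. */
int hello_counted(void) {
  return printf("Hello World!\n");
}
```

Calling hello_counted() returns 13, the number of characters in the string including the newline.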

Level 1: direct write function

write is a POSIX function, exposed through the C library, that writes data from a buffer to a file descriptor (not to the descriptor itself, but to the file it refers to). In our case we will be writing to the standard output stream, STDOUT_FILENO. However, this requires the unistd.h header file, so we are not in the clear yet.

#include <unistd.h>

int main() {
  write(STDOUT_FILENO, "Hello World!\n", sizeof("Hello World!\n") - 1);
  return 0;
}

Secondly, to solve the DLL requirement, we will statically link the libraries into the executable (gcc's -static flag). The file size will increase, since those helper libraries are now part of the executable.

~>ls -lh a.out 
-rwxrwxr-x 1 noob noob 767K Feb 13 09:50 a.out
~>ldd ./a.out 
    not a dynamic executable

I'm skipping .LC0 because it still just holds our string literal (this time with the trailing \n, since write does not add one). The difference is in the main label.

.LFB0:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $13, %edx
    leaq    .LC0(%rip), %rax
    movq    %rax, %rsi
    movl    $1, %edi
    call    write@PLT
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:

First, the compiler calculated the size of the string as 13 at compile time: sizeof on a string literal is a constant expression, so no run-time work is needed. Second, we have 3 arguments, and each one goes into its own register (%edi, %rsi and %edx); the details are not significant for now. Finally, write is called as it is. From my understanding, this is because GCC has no special built-in handling for write the way it does for printf, so there is nothing for it to substitute.
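The length argument can be checked at compile time. Here is a small sketch of why the example subtracts 1 from sizeof. The MSG macro and the function names are made up for illustration and do not appear in the original program.

```c
#include <stddef.h>
#include <string.h>

#define MSG "Hello World!\n"

/* sizeof counts the terminating NUL byte, so the literal occupies 14 bytes. */
_Static_assert(sizeof(MSG) == 14, "13 characters plus the terminating NUL");

/* Length as used in the write() call: a compile-time constant. */
size_t msg_len_sizeof(void) {
  return sizeof(MSG) - 1;
}

/* The same length computed by scanning for the NUL byte. */
size_t msg_len_strlen(void) {
  return strlen(MSG);
}
```

Both functions return 13, which matches the movl $13, %edx in the listing above.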

Level 2: Direct system call (noob C dev tries assembly edition):

Since we have access to the assembly code, we can try writing assembly instructions directly in the C file. The only thing you need to know for now is that each assembly block starts with asm, which indicates that the instructions in the block are assembly, followed by volatile, which tells the compiler not to optimise the block away or move it and to emit it as written.

int main() {
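  asm volatile("mov $1, %%rax\n\t"  /* system call number 1 = write */
               "mov $1, %%rdi\n\t"  /* file descriptor 1 = stdout */
               "mov %0, %%rsi\n\t"  /* address of the string */
               "mov $14, %%rdx\n\t" /* byte count; 14 also sends the trailing NUL, 13 is the exact length */
               "syscall\n\t"
               :
               : "r"("Hello world!\n")
               : "%rax", "%rdi", "%rsi", "%rdx"); /* syscall itself also overwrites rcx and r11 */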
  return 0;
}

In the code above, we have essentially reproduced what the write call did, loading the same arguments into registers ourselves, and then invoked the kernel directly: %rax holds the system call number (1 is write), and %rdi, %rsi and %rdx hold the file descriptor, buffer address and length. The more interesting thing is the "r" constraint attached to the string literal. In the assembly we wrote one of the operands as %0. This %0 is a placeholder: the r constraint tells GCC to pick a general-purpose register, put the string's address in it, and replace %0 in our assembly with that register's name (e.g., %rcx).

.LFB0:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    leaq    .LC0(%rip), %rcx
#APP
# 2 "3-BadAssemblyPrintf.c" 1
    mov $1, %rax
    mov $1, %rdi
    mov %rcx, %rsi
    mov $14, %rdx
    syscall

# 0 "" 2
#NO_APP
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:

Checking the .s file, you will notice that there is now a section in main that starts with #APP and ends with #NO_APP. It contains the assembly copied word for word from the C file. The more important thing to note is the absence of the call instruction: in its place we have syscall, indicating that we are in fact making a system call directly.

~>ls -lh a.out 
-rwxrwxr-x 1 noob noob 767K Feb 13 09:59 a.out

Now if we run the same static build command, you'll notice that the file size is still the same. This is because, by default, gcc links the standard library and its startup code into every executable. This is important to note, because if we compile without them using -nostdlib, the linker warns that it cannot find the entry symbol _start, and although the program prints its message, it then ends in a segfault.

~>gcc 3-BadAssemblyPrintf.c --save-temps -static -nostdlib
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000
~>./a.out 
Hello world!
Segmentation fault (core dumped)

By default, the kernel starts the program at the ELF entry point, which the linker sets to the _start symbol. The C runtime's startup code provides _start, which in turn calls main. It also calls exit with main's return value to end the program. So to truly become free of any libraries, we first have to implement our own _start function.
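The inline assembly in this level moves each value into place by hand. As a sketch of an alternative (not the code from this post), GCC's register constraints can place the arguments directly, which also lets us pass the exact byte count and read the return value. It assumes x86-64 Linux, and raw_write is a hypothetical name.

```c
/* x86-64 Linux only. raw_write is a hypothetical helper, not a libc function.
   __asm__ is the spelling of asm that also works under strict -std=c11. */
long raw_write(int fd, const void *buf, unsigned long len) {
  long ret;
  __asm__ volatile("syscall"
                   : "=a"(ret)      /* rax out: bytes written or -errno */
                   : "a"(1L),       /* rax in: system call number 1 = write */
                     "D"((long)fd), /* rdi: file descriptor */
                     "S"(buf),      /* rsi: buffer address */
                     "d"(len)       /* rdx: byte count */
                   : "rcx", "r11", "memory"); /* syscall overwrites rcx and r11 */
  return ret;
}
```

A call such as raw_write(1, "Hello world!\n", 13) writes exactly the 13 characters of the string, without the terminating NUL byte.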

Level 3: Proper assembly-level self-contained binary that does what printf does

First, the _start function's code is basically the main function's code, so we can simply rename it. Easy enough. But doing so means that we have to implement our own exit as well. This is why our previous code ended in a segfault: nothing had called main, so when it executed ret, it tried to return to an address that was never there.

void _start(void) {
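  asm volatile("mov $1, %%rax\n\t"  /* system call number 1 = write */
               "mov $1, %%rdi\n\t"  /* file descriptor 1 = stdout */
               "mov %0, %%rsi\n\t"  /* address of the string */
               "mov $14, %%rdx\n\t" /* byte count, including the trailing NUL */
               "syscall\n\t"
               :
               : "r"("Hello world!\n")
               : "%rax", "%rdi", "%rsi", "%rdx");
  asm volatile("mov $60, %%rax\n\t" /* system call number 60 = exit */
               "mov $0, %%rdi\n\t"  /* exit status 0 */
               "syscall\n\t" ::
                   : "%rax", "%rdi");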
}

You'll notice that there are 2 assembly blocks here. I did this to mirror my original program, i.e. the printf statement and the return 0 statement are 2 different lines, and now each block represents one of them. The value 60 is the number of the exit system call and 0 is the status being returned, so this is effectively calling exit(0).

_start:
.LFB0:
    .cfi_startproc
    endbr64
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    leaq    .LC0(%rip), %rcx
#APP
# 3 "4-GoodAssemblyPrintf.c" 1
    mov $1, %rax
    mov $1, %rdi
    mov %rcx, %rsi
    mov $14, %rdx
    syscall

# 0 "" 2
# 11 "4-GoodAssemblyPrintf.c" 1
    mov $60, %rax
    mov $0, %rdi
    syscall

# 0 "" 2
#NO_APP
    nop
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:

Looking at the assembly code, we can see the similarity to the previous listing, with 2 major differences. The first is the exit snippet, which is again copied word for word from the C code. The second is that we now have 2 pairs of # lines, the same kind we had in the previous listing. The # lines are 'breadcrumbs' left by the compiler for the assembler and debugger. They help map the generated assembly back to the original C source code during the build process, but they don't affect the final program.

~>ls -lh a.out 
-rwxrwxr-x 1 noob noob 9.1K Feb 13 10:41 a.out

Thus I have successfully printed Hello World without any libraries: thanks to the -nostdlib flag, nothing from libc is linked in, and the file size is much smaller (9.1K versus 767K for the static build). Now the executable only contains our code and some ELF headers. The printf function is no longer being used. While it was a challenge to do, it taught me how low-level programs work and sparked enough interest to look into assembly. It might be something I do in the future.

Here is the code for those interested: Github

References:

Video which inspired me
