Quick reminder of printf
printf is a function used in C programs to display text on the screen. That is not the official definition, but it is enough for this post. printf is declared in the stdio.h header, so we include that first. For consistency, every program in this post produces the same output: one string literal, "Hello World!\n", printed to the terminal.
#include <stdio.h>
int main() {
    printf("Hello World!\n");
    return 0;
}
Autopsy of printf:
To compile, I also pass the --save-temps flag, which keeps gcc's intermediate files so we get access to the .s file. Below you can see the file size and the output of the ldd command, which lists the dynamically linked libraries (I'll call them dlls from now on, although on Linux they are really shared objects, .so files).
~>ls -lh a.out
-rwxrwxr-x 1 noob noob 16K Feb 13 09:43 a.out
~>ldd ./a.out
linux-vdso.so.1 (0x00007513c2293000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007513c2000000)
/lib64/ld-linux-x86-64.so.2 (0x00007513c2295000)
As you can see, there are 3 different objects, but the one we care about is libc.so.6, which contains the definition of printf. (linux-vdso.so.1 is a small object the kernel maps into every process, and ld-linux-x86-64.so.2 is the dynamic loader that loads the others.) The goal by the end of this post is to not need that library at all.
For the sake of brevity, I'll just say that the .s file is the assembly version of the C program. This is what we will use to figure out what is being called under the hood.
.LC0:
.string "Hello World!"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
leaq .LC0(%rip), %rax
movq %rax, %rdi
call puts@PLT
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
There are 3 things to pay attention to. The first is .LC0, which holds our string; for the scope of this blog, just note that our string literals are stored here. The second is main, specifically .LFB0 and .LFE0, which are labels meant for debugging that mark where the Local Function Begins and Ends. The final part is that the assembly calls puts, not printf. The compiler noticed that the format string has no conversion specifiers and ends in a newline, so it swapped in the simpler puts and dropped the \n, since puts appends a newline itself (notice that .LC0 holds "Hello World!" without the \n). Keep this in mind.
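The puts substitution can be poked at with a small sketch. As far as I know, GCC only turns printf into puts when the call's return value is thrown away, because printf returns the number of characters printed and puts does not. The two function names below are made up for illustration. Compiling this file with --save-temps and reading the .s should show a puts call for the first function and a real printf call for the second (passing -fno-builtin-printf turns the rewrite off entirely).

```c
#include <stdio.h>

/* Return value ignored, no conversion specifiers, trailing '\n':
   GCC is free to emit this as puts("Hello World!"). */
void hello_ignored(void) {
    printf("Hello World!\n");
}

/* Here the return value (characters printed) is used, and puts() does
   not return that count, so GCC has to keep the real printf call. */
int hello_counted(void) {
    return printf("Hello World!\n");
}
```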
Level 1: direct write function
write is not part of standard C; it is a POSIX function, and glibc's write is a thin wrapper around the kernel's write system call. It writes data from a buffer to a file descriptor (more precisely, to the file the descriptor refers to). In our case we write to standard output, whose descriptor is STDOUT_FILENO (1). However, this requires the unistd.h header, and write itself still lives in libc, so we are not in the clear yet.
#include <unistd.h>
int main() {
    write(STDOUT_FILENO, "Hello World!\n", sizeof("Hello World!\n") - 1);
    return 0;
}
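One thing worth knowing before relying on write: unlike printf, it does not promise to write everything in one call. It returns how many bytes it actually wrote (which can be fewer than requested, for example on pipes) or -1 with errno set. A common pattern is to loop until the whole buffer is out. Here is a minimal sketch; write_all is a hypothetical helper name, not a libc function.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes are out.
   Returns 0 on success, -1 on error (errno is left as write() set it). */
int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal before writing: retry */
            return -1;         /* real error */
        }
        buf += n;              /* skip past the bytes that were written */
        len -= (size_t)n;
    }
    return 0;
}
```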
To get rid of the dll requirement, we statically link the libraries into the executable with gcc's -static flag. The trade-off is file size: the library code we need is now part of the executable itself.
~>ls -lh a.out
-rwxrwxr-x 1 noob noob 767K Feb 13 09:50 a.out
~>ldd ./a.out
not a dynamic executable
I'm skipping .LC0 here, since it still just holds the string literal. The difference is in the main label:
.LFB0:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $13, %edx
leaq .LC0(%rip), %rax
movq %rax, %rsi
movl $1, %edi
call write@PLT
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
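First, the compiler already worked out the size of the string as 13. That isn't really an optimisation: sizeof is evaluated at compile time, so sizeof("Hello World!\n") - 1 is simply the constant 13. Second, write takes 3 arguments, and the x86-64 calling convention passes the first three integer or pointer arguments in %rdi, %rsi and %rdx (here %edi = 1 for stdout, %rsi = the string's address, %edx = 13). Finally, write is called as it is. As far as I understand, this is because GCC has built-in knowledge of standard C functions like printf but not of POSIX functions like write, so it has nothing smarter to rewrite the call into.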
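To see that the 13 really is a compile-time constant rather than something computed while the program runs, here is a small sketch. GREETING and greeting_len are names I made up for illustration; the _Static_assert only compiles because sizeof(GREETING) - 1 is known at compile time.

```c
#include <string.h>

#define GREETING "Hello World!\n"

/* sizeof on a string literal counts the terminating '\0', hence the - 1.
   It is a constant expression, so it can be checked at compile time. */
_Static_assert(sizeof(GREETING) - 1 == 13, "12 visible characters + newline");

/* strlen() counts at run time instead (though GCC will usually fold it
   to the constant 13 for a literal like this). */
size_t greeting_len(void) {
    return strlen(GREETING);
}
```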
Level 2: Direct system call (noob C dev tries assembly edition)
Since we have access to the assembly code, we can write assembly instructions directly into the C file using GCC's inline assembly. Each block starts with asm, which marks its contents as assembly, followed by volatile, which tells the compiler that the block has side effects and must not be optimised away, even though it produces no outputs.
int main() {
    asm volatile("mov $1, %%rax\n\t"
                 "mov $1, %%rdi\n\t"
                 "mov %0, %%rsi\n\t"
                 "mov $14, %%rdx\n\t"
                 "syscall\n\t"
                 :
                 : "r"("Hello world!\n")
                 : "%rax", "%rdi", "%rsi", "%rdx");
    return 0;
}
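In the code above, we have essentially taken the register setup from the write version's .s file and added %rax = 1, which is the system call number for write on x86-64 Linux; the syscall instruction then asks the kernel to do the write directly. The more interesting part is the "r" constraint attached to the string literal. In the assembly we refer to one of the values as %0. This %0 is a placeholder: the "r" constraint tells GCC to pick a register, load the string's address into it, and replace %0 in our assembly with that register's name (here GCC picked %rcx; it cannot pick %rsi, because %rsi is in the clobber list). The clobber list, the last line of the asm block, tells GCC which registers the block overwrites. Two caveats: the syscall instruction itself also overwrites %rcx and %r11, which this list doesn't declare (it only works here because nothing uses them afterwards), and "Hello world!\n" is 13 bytes, so a length of 14 also writes the string's terminating NUL byte.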
.LFB0:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
leaq .LC0(%rip), %rcx
#APP
# 2 "3-BadAssemblyPrintf.c" 1
mov $1, %rax
mov $1, %rdi
mov %rcx, %rsi
mov $14, %rdx
syscall
# 0 "" 2
#NO_APP
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
Looking at the .s file, you will notice a section in main that starts with #APP and ends with #NO_APP. It contains the assembly copied word for word from the C file, with %0 replaced by %rcx. More importantly, the call instruction is gone, and in its place is syscall, meaning we now ask the kernel directly instead of going through a libc function.
~>ls -lh a.out
-rwxrwxr-x 1 noob noob 767K Feb 13 09:59 a.out
If we run the same static build command, the file size stays the same. That's because gcc still links in the C runtime startup code and libc by default, even though we no longer call any libc function ourselves. This matters because if we compile without them (-nostdlib), the linker warns that it cannot find the entry symbol _start, and running the executable prints the message and then segfaults:
~>gcc 3-BadAssemblyPrintf.c --save-temps -static -nostdlib
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000
~>./a.out
Hello world!
Segmentation fault (core dumped)
Normally, the kernel starts a program at its entry point, _start, which comes from the C runtime startup files that gcc links in. _start sets things up, calls main (through glibc's __libc_start_main), and then calls exit with main's return value to end the process. So to truly be free of any libraries, we first have to implement our own _start.
Level 3: Proper assembly-level, self-contained binary that does what printf does
First, _start's code is basically main's code, so we can simply rename main to _start. Easy enough. But doing so means we also have to implement exit ourselves. That is why the previous version segfaulted: without libc's _start, the linker made main the entry point, so there was no caller to return to, and its final ret jumped to whatever was on top of the initial stack (argc), which is not a valid address.
void _start(void) {
    asm volatile("mov $1, %%rax\n\t"
                 "mov $1, %%rdi\n\t"
                 "mov %0, %%rsi\n\t"
                 "mov $14, %%rdx\n\t"
                 "syscall\n\t"
                 :
                 : "r"("Hello world!\n")
                 : "%rax", "%rdi", "%rsi", "%rdx");
    asm volatile("mov $60, %%rax\n\t"
                 "mov $0, %%rdi\n\t"
                 "syscall\n\t" ::
                 : "%rax", "%rdi");
}
You'll notice there are 2 assembly blocks here. I did this to mirror the original program, where the printf statement and the return 0 statement are 2 separate lines; now each block plays one of those roles. The value 60 in %rax is the exit system call number, and 0 in %rdi is the exit status, so the second block is effectively exit(0).
_start:
.LFB0:
.cfi_startproc
endbr64
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
leaq .LC0(%rip), %rcx
#APP
# 3 "4-GoodAssemblyPrintf.c" 1
mov $1, %rax
mov $1, %rdi
mov %rcx, %rsi
mov $14, %rdx
syscall
# 0 "" 2
# 11 "4-GoodAssemblyPrintf.c" 1
mov $60, %rax
mov $0, %rdi
syscall
# 0 "" 2
#NO_APP
nop
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
Looking at the assembly, it is similar to the previous version, with 2 differences. First is the exit snippet, again copied word for word from the C code. Second, there are more # lines, the same kind we saw in the previous listing. These are breadcrumbs left by the compiler for the assembler and debugger: each one records the source file and line that the following inline assembly came from. They map the generated assembly back to the original C source, but they don't affect the final program. The nop, popq and ret after the exit syscall are never reached, since exit does not return.
~>ls -lh a.out
-rwxrwxr-x 1 noob noob 9.1K Feb 13 10:41 a.out
So I have printed hello world without any libraries: with the -nostdlib flag nothing from libc is linked in, and the file dropped to 9.1K (from 16K dynamically linked and 767K statically linked). The executable now contains little more than our code and the ELF headers, and printf is no longer involved at all. It was a challenge, but it taught me how programs work at a low level and got me interested enough to look further into assembly code. That might be something I do in the future.
Here is the code for those interested: Github