Ever wondered how printf("Hello world!") works under the hood? In high-level languages we take formatted output for granted, but what if you had to implement it from scratch in assembly? In this post we are going dark and exploring the world of assembly.
Formatted output looks deceptively simple on the surface but is surprisingly complex under the hood. Behind even the simplest call like printf("%d", 42) lies parsing, variable argument handling, and formatting, all managed by the luxuries of the standard library provided by the C runtime.
Getting Started
Before we begin we should install our tools. This time I will be using FASM on Linux (any distro is fine). FASM is pretty lightweight and easy to use. If you are on Windows it works just as well, except for the syscalls, which are Linux-specific.
But what is a syscall?
A syscall is the way for us developers to request a service from the operating system. It can be hardware access, process execution, and other things of that kind. Why do we have this? For security: a process or an app isn't allowed to access this kind of stuff freely. Imagine what kind of bad stuff would happen if it could, right?
Interlude
Before we even begin writing code we need to tell FASM what we intend to build. Since we are on Linux and we want 64-bit (I'm not in the mood for 32-bit), we need to put this magic text on the first line of our main.asm file.
format elf64 executable
Next is the entry point of our code. Usually it starts at _start, not main, but we can name it whatever we want.
format elf64 executable
entry _start
But before we even start writing our code we must define our application layout. This is where our code and constant data will live in the executable. These layouts, or segments, also need additional parameters: whether the segment is allowed to be read, executed, or written to.
We will define 3 of them: the first one is where we store our hello world string, the second one stores our global variables, and the third is where our code will live.
; ... stuff from before
segment readable
segment readable writable
segment readable executable
All right, cool, now we can start writing our code. But wait, we don't have functions in assembly.
That's right. To create a function we need to use our imagination and use a label. But what is a label? A label is a bookmark for a place in our code, so instead of remembering a memory address we can reference it by name later, for example to jump to that specific part. We can use these labels to create a function, since all we do is call it and then come back.
; ...stuff from before
segment readable executable
_start:
If we try to compile it using fasm main.asm main and then run it, we can see that the program is terminated by signal SIGSEGV (Address boundary error). Basically we went outside the memory we are allowed to access. But why? Well, since this is just an empty executable, the label _start sits right at the doorstep of the memory the operating system allows us to use, with no instructions behind it.
But before we think about that, we need a way to exit our program gracefully. If we open these awesome resources we can find the exit syscall, and we can also see rax, rdi, etc. These are our registers: a fixed set of variables inside the CPU that we can read and write.
Okay cool, now how do we exit?
First we need to set rax to 60, because that is the number of the exit syscall, and then set rdi to our exit code, for example 40. How do we set a register? By using the mov instruction to move our constant into the register.
; ...stuff from before
_start:
mov rax, 60 ; our syscall exit
mov rdi, 40 ; exit code
syscall
If we compile and run it we will see nothing, but if you are using a shell like bash you can type echo $? to show the exit code of the previous program. It should output 40.
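To see the same mechanism from the other side, here is a small sketch in Python (not FASM, just a convenient way to poke at the OS). It starts a child process that exits with a given status and reads that status back, which is roughly what the shell does before echo $? prints it. The helper name run_child_exit is made up for this example.

```python
import subprocess
import sys

def run_child_exit(code: int) -> int:
    """Start a child Python process that exits with `code`; return the status the parent sees."""
    # os._exit ends the child immediately with the given status,
    # much like our raw exit syscall does.
    child = subprocess.run([sys.executable, "-c", f"import os; os._exit({code})"])
    return child.returncode
```

Calling run_child_exit(40) hands back 40, the same number our assembly program puts in rdi.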
But before we can do hello world we need to define our data first. We will define it inside our readable segment, and we also need a label to reference it later.
; ...stuff
segment readable
helloWorld: db "Hello, World", 10, 0
helloWorldLen = $-helloWorld
; ...stuff
You may notice that there is helloWorldLen. We will need it when we call our syscall, because it needs the length of the string we want to print. $ is the current address, so $-helloWorld is the number of bytes from the helloWorld label up to that line. There is also 10, 0 in the data: 10 is the newline and 0 is the null terminator that marks the end of the string. Note that the length counts that trailing 0 as well, so write will send it too; the terminal just ignores it. db is the FASM directive for define byte, which lets us put byte-sized values at that label. Since our string is just ASCII we can confidently use bytes.
Now, how do we do "hello world" in assembly with this constant? It's actually simple. Among the Linux syscalls there is one called write, which lets us write to a file descriptor. The file descriptor we are interested in is 1, because that is standard output, which here is our terminal.
; ...stuff
segment readable executable
_start:
mov rax, 1 ; syscall write
mov rdi, 1 ; Stdout
mov rsi, helloWorld ; load address of helloWorld
mov rdx, helloWorldLen ; length in bytes (an assemble-time constant, not a memory load)
syscall
; ...our syscall exit
If you compile and run it, it will output Hello, World. Congratulations!
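For comparison, here is the same write syscall reached from Python through os.write, which is a thin wrapper around it. This is only a sketch to show the three ingredients (file descriptor, bytes, length); the names MESSAGE and write_stdout are mine, not part of the assembly program.

```python
import os

MESSAGE = b"Hello, World\n"   # same text as helloWorld, without the trailing 0

def write_stdout(data: bytes) -> int:
    """Write raw bytes to file descriptor 1 (stdout); return how many bytes were written."""
    # The length is taken from the bytes object, where the assembly
    # version has to pass it explicitly in rdx.
    return os.write(1, data)
```

write_stdout(MESSAGE) prints the greeting and returns the number of bytes written, which is the same value the kernel leaves in rax after the syscall.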
Basic Hello World Printf
Now comes the fun part: writing basic hello world, but with our own printf. First we need to rewrite our constants and our global variable to help us later down the line. Don't forget to remove our first hello world syscall too.
; ...stuff
segment readable
hello: db "Hello, %s", 10, 0
world: db "World", 0
segment readable writable
buffer: rb 1024
; ...stuff
The buffer global variable will be used by our printf to collect the string before we print it to the terminal. This allows us to make just 1 syscall. It's an optimization, since syscalls can be slow, but in this case we are just lazy devs and ask the OS sparingly.
We also need a printf label. We put it above _start, and we use the call instruction to call our "function" (the label).
; ...stuff
segment readable executable
printf:
_start:
call printf
mov rax, 60 ; syscall exit
mov rdi, 40 ; Exit code
syscall
If you try to run this you will probably get SIGSEGV. In assembly, code runs top to bottom, so when we call printf it jumps to the printf label, and since that is empty, execution falls through to the code below it. The code below it is call printf again, so it just calls itself recursively until the stack overflows (yo, stack overflow mentioned). To go back to the caller we need the ret instruction. But how does it work under the hood?
There is a thing called the stack that stores our data (local variables) and addresses. When we call, the CPU pushes the address of the next instruction onto the stack and then jumps to the specified address. When we ret, it pops the stack and uses that value as the return address.
Okay, so we just add ret to our code.
; ...stuff
printf:
ret
; ...stuff
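If the call and ret dance is hard to picture, here is a toy interpreter in Python (a model I made up for illustration, not how the CPU is really built). It understands only labels, call, ret, and exit, and it caps the stack at a small made-up limit so the runaway version stops quickly.

```python
def run(program: list, entry: str, stack_limit: int = 16) -> str:
    """Run a toy program; return "exit" on a clean exit or "overflow" if the stack fills up."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    stack = []
    pc = labels[entry]
    while True:
        op = program[pc]
        if op[0] == "call":
            if len(stack) == stack_limit:
                return "overflow"
            stack.append(pc + 1)   # return address = the instruction after the call
            pc = labels[op[1]]     # jump to the label
        elif op[0] == "ret":
            pc = stack.pop()       # resume right after the call
        elif op[0] == "exit":
            return "exit"
        else:
            pc += 1                # a label does nothing, execution falls through

# printf is empty, so execution falls through into _start and calls printf again.
WITHOUT_RET = [("label", "printf"), ("label", "_start"), ("call", "printf"), ("exit",)]
# With ret, control comes back to the instruction after the call.
WITH_RET = [("label", "printf"), ("ret",), ("label", "_start"), ("call", "printf"), ("exit",)]
```

run(WITHOUT_RET, "_start") keeps pushing return addresses until the limit is hit, while run(WITH_RET, "_start") returns cleanly.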
Voila, no SIGSEGV. Now we need to think about what parameters our function should take. For starters, let's use rax as the pointer to the string we want to print. But you may notice that the write syscall requires us to provide the length too. We can loop over the data at that address until we find the null terminator, and while doing that increment a counter and copy each byte into our global buffer.
; ...stuff
_start:
mov rax, hello
call printf
; ...stuff
But before all the fun stuff, how do we loop? Well... there is no loop construct in assembly.
We can use a label to mark the location, and instead of call we use the jmp instruction, something like this.
; ...other stuff
printf:
.iter_loop:
jmp .iter_loop ; [new stuff]
ret
; ...other stuff
But wait! If you try to run this code it will loop forever. While we are at it, we also need to prepare another register to act as the counter, but we need to preserve its old value. Why? Because other parts of our code might be using it for something important and we don't want to trash it. We don't want to store everything in globals, right? We can use the stack: the push instruction pushes data from a register onto the stack, just like call pushes the return address, and near the end, before ret, we pop it to restore it.
; ...stuff
; Print stuff like C printf (but budget)
; @params rax - const char* (pointer to string)
printf:
push rdx ; [new stuff]
.iter_loop:
jmp .iter_loop
pop rdx ; [new stuff]
ret
; ...stuff
Now we can set up our counter register in peace. With the xor instruction we can zero out a register by xoring it with itself.
; ...other stuff
printf:
push rdx
xor rdx, rdx ; [new stuff]
; ...stuff
After all of that we can read the data at the address inside rax and compare it with 0, since that is our null terminator. To fetch data from an address we use [address] in the mov instruction. Since our data is a byte we specify that with byte, as in byte [address], and we also use a register that holds only a byte (8 bits), since our full registers are 8 bytes (64 bits). For this case let's trash register r10 through its 8-bit variant r10b.
; ...other stuff
printf:
push rdx
xor rdx, rdx
.iter_loop:
mov r10b, byte [rax] ; [new stuff]
jmp .iter_loop
; ...stuff
Wait, we are not done yet. We are going to check whether r10b is 0 and, if so, jump out to a label so we don't get stuck in an infinite loop. We can use the cmp instruction to compare 2 values, either a register with a register or a register with a constant.
; ...stuff
.iter_loop:
mov r10b, byte [rax]
cmp r10b, 0
; ...stuff
But after comparing we need to do something with the result. The result lives in the flags register, which we usually cannot access directly. Conditional versions of jmp read those flags and let us branch in our code. Since we want to jump when r10b is equal to 0, we use je, for jump if equal.
; ...stuff
.iter_loop:
mov r10b, byte [rax]
cmp r10b, 0
je .done
jmp .iter_loop
.done:
pop rdx
ret
; ...stuff
Now if we try to run this it will still loop forever, because we never increment our pointer in rax or our counter. We can use the inc instruction to increment each by one.
; ...stuff
.iter_loop:
mov r10b, byte [rax]
cmp r10b, 0
je .done
inc rdx ; [new stuff]
inc rax ; [new stuff]
jmp .iter_loop
; ...stuff
If we try to run this now.... nothing happens, and we do not loop forever. Finally.
The next step is to copy the data into the buffer. It's actually simple: we do the same mov as in .iter_loop but in reverse, storing into buffer instead of loading from rax. For readability let's add a .push_char label.
; ...stuff
printf:
push rdx
xor rdx, rdx
.iter_loop:
mov r10b, byte [rax]
cmp r10b, 0
je .done
; [new stuff]
.push_char:
mov byte [buffer], r10b
inc rdx
inc rax
jmp .iter_loop
.done:
pop rdx
ret
; ...stuff
Now you may notice that we just keep overwriting the first byte of buffer without advancing the index. We can use rcx to hold a pointer into buffer, starting at its base, and increment it after we push each char.
; ...stuff
printf:
push rdx
push rcx ; [new stuff]
xor rdx, rdx
mov rcx, buffer ; [new stuff]
.iter_loop:
mov r10b, byte [rax]
cmp r10b, 0
je .done
.push_char:
mov byte [rcx], r10b ; [new stuff]
inc rcx ; [new stuff]
inc rdx
inc rax
jmp .iter_loop
.done:
pop rcx ; [new stuff]
pop rdx
ret
; ...stuff
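One detail in the listing above is easy to miss: the pops come in the opposite order of the pushes, because the stack is last in, first out. Here is a tiny Python model of that, with a dict standing in for the registers. The function name and the junk values are made up for the illustration.

```python
def call_preserving(regs: dict) -> dict:
    """Model a function that saves rdx and rcx, trashes them, then restores them."""
    stack = []
    stack.append(regs["rdx"])   # push rdx
    stack.append(regs["rcx"])   # push rcx

    regs["rdx"] = 0             # xor rdx, rdx (the body overwrites both)
    regs["rcx"] = 0x1234        # mov rcx, buffer (made-up address)

    regs["rcx"] = stack.pop()   # pop rcx: last pushed, first popped
    regs["rdx"] = stack.pop()   # pop rdx
    return regs
```

If the two pops were swapped, the caller would get rdx and rcx back with each other's values.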
Lastly we need to ask the OS to write what is in the buffer. Since the length is already stored in rdx, we don't need to set it manually.
; ...stuff
.done:
; [new stuff]
mov rax, 1
mov rdi, 1
mov rsi, buffer
syscall
pop rcx
pop rdx
ret
; ...stuff
If you compile and run it, Hello, %s should be printed on the terminal. Congratulations! The data at the address inside rax has been successfully copied into buffer and printed.
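Here is the whole routine so far as a rough Python model, in case the register juggling hides the shape of it: copy bytes until the 0, count them, then do one write. The variable names follow the registers; budget_print and the bytes-as-memory setup are assumptions of this sketch. Like the assembly, it does not check whether the string fits in the 1024-byte buffer.

```python
import os

def budget_print(memory: bytes, start: int) -> int:
    """Copy a 0-terminated string from `memory` into a buffer, then write it in one go."""
    buffer = bytearray(1024)       # buffer: rb 1024
    rax = start                    # pointer to the string
    rdx = 0                        # byte counter, also the write position here
    while memory[rax] != 0:        # cmp r10b, 0 / je .done
        buffer[rdx] = memory[rax]  # mov byte [rcx], r10b
        rdx += 1                   # inc rcx / inc rdx
        rax += 1                   # inc rax
    return os.write(1, bytes(buffer[:rdx]))   # one syscall with length rdx
```

budget_print(b"Hello, %s\n\x00", 0) prints the format string untouched, since nothing looks at the % yet.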
Variable argument Printf
Okay, now the fun part: how do we get variable parameters into our printf function? If we take a look at the OSDev website's list of registers, there are at least 14 registers that we can play around with, and we have already used up 4.
No need to rack our brains over control flow hell. We can use the stack and calculate the location of the next parameter dynamically. We know that inside printf we pushed a 64-bit register 2 times, and there is also the return address.
Picture our stack at that moment: from the top down it holds the saved rcx, the saved rdx, and the return address.
In memory the stack is actually inverted, because it grows downward; drawing it growing upward is just for easy visualization.
There are registers called rsp and rbp that deal with the stack. rsp is the one keeping track of the top: each time we push something, rsp is decremented by 8, so the stack goes down. If we push our parameter before the call, it ends up sitting just above the return address.
Now we can see where we are going. Since each push (and call) stores exactly 8 bytes, we can start from rsp, add 8 three times, and land on &world. We will keep the result in rbx.
So we can update our code to push the parameter before calling printf and set rbx to the first parameter.
; ...stuff
; Print stuff like C printf (but budget)
; @params rax - const char* (pointer to string)
; @params stack (8 bytes each, last push is first params)
; @trash rbx, r10
printf:
push rdx
push rcx
xor rdx, rdx
mov rcx, buffer
mov rbx, [rsp+(8*3)] ; [new stuff]
; ...stuff
_start:
mov rax, hello
push world
call printf
; ...stuff
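To double-check the 8*3 offset, here is a small Python model of the stack at the moment printf has finished its two pushes. The starting address 0x7000 and the string labels stored in the slots are made up; only the arithmetic matters.

```python
class Stack:
    """Toy downward-growing stack: every push moves rsp down by 8 bytes."""

    def __init__(self, top: int = 0x7000):
        self.rsp = top
        self.mem = {}

    def push(self, value) -> None:
        self.rsp -= 8
        self.mem[self.rsp] = value

    def load(self, offset: int):
        """Read the slot at [rsp + offset]."""
        return self.mem[self.rsp + offset]

def layout_inside_printf() -> Stack:
    s = Stack()
    s.push("&world")          # _start: push world
    s.push("return address")  # call printf
    s.push("saved rdx")       # printf: push rdx
    s.push("saved rcx")       # printf: push rcx
    return s
```

With that layout, load(0) is the saved rcx, load(8) the saved rdx, load(16) the return address, and load(24) is our first parameter.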
After all that we can focus on parsing. We add another cmp instruction after the null terminator check, and a label for the special symbol, like this.
; ...stuff
.iter_loop:
mov r10b, byte [rax]
cmp r10b, 0
je .done
; [new stuff]
cmp r10b, '%'
jne .push_char
.symbol:
.push_char:
mov byte [rcx], r10b
; ...stuff
So what is jne? It is the same as je but for not equal; notice the n in the middle.
Next we increment rax to the next char and check whether it is s. If not, we just jump to .iter_loop to continue printing.
; ...stuff
.symbol:
inc rax
mov r10b, byte [rax]
cmp r10b, 's'
jne .iter_loop
; ...stuff
The next part is simple: use rbx to get the character we want and copy it into our buffer.
; ...stuff
.symbol:
inc rax
mov r10b, byte [rax]
cmp r10b, 's'
jne .iter_loop
.string_symbol:
mov r10b, byte [rbx] ; [new stuff]
mov [rcx], r10b ; [new stuff]
; ...stuff
If we compile it now it should print Hello, W. Not our "Hello, World" yet, but still progress. We forgot about our loop, which is pretty simple to add.
; ...stuff
.string_symbol:
mov r10b, byte [rbx]
cmp r10b, 0
je .iter_loop
mov [rcx], r10b
inc rcx
inc rdx
jmp .string_symbol
; ...stuff
If we try to run it now....
Bam! We got another SIGSEGV. This time we forgot to increment our rbx register, so the loop keeps writing the same character until it runs past the end of our writable memory. Silly me. Don't worry, our modern OS can handle this kind of stuff pretty well; we just need to update our code.
; ...stuff
.string_symbol:
mov r10b, byte [rbx]
cmp r10b, 0
je .iter_loop
mov [rcx], r10b
inc rbx ; [new stuff]
inc rcx
inc rdx
jmp .string_symbol
; ...stuff
Okay cool, now if we run it, it should print Hello, Worlds. So where is that s coming from? Well, if you recall the data we defined before, it was Hello, %s. We forgot to step rax past the s after handling it in .symbol. Easy fix.
; ...stuff
.symbol:
inc rax
mov r10b, byte [rax]
cmp r10b, 's'
jne .iter_loop
inc rax ; [new stuff]
; ...stuff
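The parsing flow up to this point can be sketched in Python like this. It is a model with a single %s argument, and the skip_specifier switch exists only so the stray s bug can be reproduced; expand_s is a name I made up. Like the assembly, it expects the format to end with a 0 byte.

```python
def expand_s(fmt: bytes, arg: bytes, skip_specifier: bool = True) -> bytes:
    """Expand %s in a 0-terminated format string, following the same branches as the assembly."""
    out = bytearray()
    i = 0                          # index into fmt, plays the role of rax
    while fmt[i] != 0:             # cmp r10b, 0 / je .done
        if fmt[i] != ord("%"):     # cmp r10b, '%' / jne .push_char
            out.append(fmt[i])
            i += 1
            continue
        i += 1                     # .symbol: inc rax
        if fmt[i] != ord("s"):     # jne .iter_loop
            continue
        if skip_specifier:
            i += 1                 # the fix: step past the 's'
        out += arg                 # .string_symbol: copy the argument
    return bytes(out)
```

With skip_specifier=False the s is printed as an ordinary character after the argument, which is the extra letter we just saw.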
Now we are done! But are we forgetting something? Yes, how about 2 parameters? Since we are using the stack, we can just push another address to act as a parameter. We add another constant like this.
; ...stuff
segment readable
hello: db "Hello, %s. %s", 10, 0
world: db "World", 0
goodDay: db "It was a good day", 0
; ...stuff
And then we push it before our world, since the last push is the first parameter.
; ...stuff
_start:
mov rax, hello
push goodDay ; [new stuff]
push world
call printf
; ...stuff
If we try to run it now.... well, it didn't output the new string. We are forgetting something... that's right, we are incrementing rbx but we never move it to the next parameter. It's actually simple, we just add 8 bytes to it, right?
Well, you are not entirely wrong. What we should do is copy rbx into a different register and walk the string with that one, so that stepping through the characters doesn't disturb our parameter pointer. We are going to trash another register, let's say r11.
; ...stuff
.symbol:
; ...stuff
; [new stuff]
mov r11, rbx
add rbx, 8
.string_symbol:
mov r10b, byte [r11] ; [new stuff]
cmp r10b, 0
je .iter_loop
mov [rcx], r10b
inc r11 ; [new stuff]
inc rcx
inc rdx
jmp .string_symbol
; ...stuff
If we try to run it now... well, it seems the It at the start of our goodDay string is missing. What did we do wrong this time?
Actually, the way we loaded rbx has been wrong all this time. mov rbx, [rsp+(8*3)] reads the value stored in that stack slot, which is the address of the world string, so adding 8 to it lands a couple of bytes into goodDay instead of on the next stack slot. What we want in rbx is the address of the stack slot itself. So how do we load an address? There is an instruction called lea, and it literally means Load Effective Address.
lea rbx, [rsp+(8*3)]
If we run it now.. well, now our output is a jumbled mess. What is wrong now?
If you recall, rbx is now the address of the stack slot, so we need to dereference it with [] to get the string pointer when moving it into r11.
; ...stuff
.symbol:
; ...stuff
mov r11, [rbx] ; [new stuff]
add rbx, 8
.string_symbol:
; ...stuff
Okay, I think we are done now. Let's run it again and see what happens...
Yatta, it prints Hello, World. It was a good day. Now we've got our first proper variable parameters for our printf!
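The mov versus lea mix-up is worth a second look, so here is a Python model of it. All addresses are made up: the stack pointer is 0x7000, world lives at 0x1000, and goodDay starts 6 bytes later because "World" plus its 0 byte is 6 bytes long.

```python
def mov_load(memory: dict, address: int) -> int:
    """Model of `mov reg, [address]`: read the 8-byte value stored at address."""
    return memory[address]

def lea(address: int) -> int:
    """Model of `lea reg, [address]`: compute the address only, no memory read."""
    return address

RSP = 0x7000                                    # made-up stack pointer inside printf
WORLD, GOOD_DAY = 0x1000, 0x1006                # made-up string addresses
MEMORY = {RSP + 24: WORLD, RSP + 32: GOOD_DAY}  # the two argument slots

def args_via_mov() -> list:
    rbx = mov_load(MEMORY, RSP + 24)   # rbx = WORLD, a pointer to the string
    first = rbx
    rbx += 8                           # lands inside the string data
    return [first, rbx]

def args_via_lea() -> list:
    rbx = lea(RSP + 24)                # rbx = address of the first slot
    first = mov_load(MEMORY, rbx)      # mov r11, [rbx]
    rbx += 8                           # next slot
    second = mov_load(MEMORY, rbx)
    return [first, second]
```

args_via_mov() yields 0x1008 as the second pointer, which is 2 bytes into goodDay and explains the missing It, while args_via_lea() yields the real start of goodDay.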
Okay, now we just need to display a number... did you think we would end right here? No.
Let's update our constant to include the %d format specifier, and update our printf call.
; ...stuff
segment readable
hello: db "Hello, %s %d. %s", 10, 0
world: db "World", 0
goodDay: db "It was a good day", 0
; ...stuff
_start:
mov rax, hello
push goodDay
push 42
push world
call printf
; ...stuff
We want to treat our number as a value, not as a pointer to an address.
So let's start by extending our parsing to handle the d character. It will look like this.
; ...stuff
.symbol:
inc rax
mov r10b, byte [rax]
cmp r10b, 's'
je .prep_string_symbol ; [new stuff]
cmp r10b, 'd' ; [new stuff]
jne .iter_loop ; [new stuff]
.number_symbol:
.prep_string_symbol: ; [new stuff]
inc rax
mov r11, [rbx]
add rbx, 8
.string_symbol:
; ...stuff
If you recall, in ASCII the digits start at 48, so we can turn a single digit into its character by adding 48. But first we need to get the value from the stack.
; ...stuff
inc rax
mov r11, [rbx]
add rbx, 8
.number_symbol:
; ...stuff
So we need to loop: divide by 10, take the remainder, and convert it into ASCII. Pretty simple.
To divide we use the div instruction. It divides the value in rdx:rax by its operand, leaving the quotient in rax and the remainder in rdx. But wait, isn't rax where our format pointer lives? We can just push it onto the stack to save it. div also uses rdx, which holds our counter, so we trash register r12 to keep the counter in the meantime. And where is the divisor? We are going to use r10.
; ...stuff
inc rax
mov r11, [rbx]
add rbx, 8
push rax
mov r12, rdx
mov r10, 10
mov rax, r11 ; [new stuff]
.number_symbol:
xor rdx, rdx ; clear the high half of the dividend before every div
div r10
test rax, rax
jnz .number_symbol
pop rax
mov rdx, r12 ; restore our counter
jmp .iter_loop
; ...stuff
Now we just need to push the converted digit into our buffer, by adding 48 to it, or just '0' for readability. Pretty simple update.
; ...stuff
.number_symbol:
xor rdx, rdx
div r10
add dl, '0' ; [new stuff]
; [new stuff]
mov byte [rcx], dl
inc r12
inc rcx
test rax, rax
jnz .number_symbol
; ...stuff
Now if you run it (and haven't slept through this whole ordeal) it will output Hello, World 24. It was a good day. Wait, the number is reversed! That's right, the digits come out least significant first, so we need to flip them. To make it easier we just create another buffer.
; ...stuff
segment readable writable
buffer: rb 1024
bufferNum: rb 1024 ; [new stuff]
bufferNumLen = $-bufferNum ; [new stuff]
; ...stuff
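Why the digits came out backwards is easy to see in a few lines of Python. This is a sketch of the divide-by-10 loop only, for non-negative numbers (div is unsigned); the function name is made up.

```python
def digits_lsd_first(n: int) -> bytes:
    """Peel decimal digits off n with repeated division; least significant digit comes first."""
    out = bytearray()
    while True:
        n, remainder = divmod(n, 10)       # div r10: quotient -> rax, remainder -> rdx
        out.append(ord("0") + remainder)   # add dl, '0'
        if n == 0:                         # test rax, rax / jnz .number_symbol
            break
    return bytes(out)
```

For 42 the first remainder is 2 and the second is 4, so appending them in order gives 24.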
Now, instead of rcx pointing into the main buffer, we will use the new buffer to store the digits, but in reverse. First we need to preserve our rcx and then reuse it to hold the end address of bufferNum. We also decrement it as we go, since we are filling the buffer back to front: step back one byte, then store the digit.
; ...stuff
push rax
push rcx ; [new stuff]
mov rcx, bufferNum + bufferNumLen ; [new stuff] one past the end of bufferNum
mov r12, rdx
mov r10, 10
mov rax, r11
.number_symbol:
xor rdx, rdx
div r10
add dl, '0'
dec rcx ; [new stuff] step back first...
mov byte [rcx], dl ; ...then store the digit
test rax, rax
jnz .number_symbol
pop rcx ; [new stuff]
pop rax
mov rdx, r12
jmp .iter_loop
; ...stuff
Right, now we just need to copy these digits into the main buffer.
First we pop the old rcx into somewhere else, for example r11. Then we copy from the number buffer into it, incrementing the counter as we go, until rcx reaches the end of bufferNum again.
; ...stuff
jnz .number_symbol
pop r11 ; old buffer pointer
pop rax ; our params fmt
mov rdx, r12
.copy_number:
; [new stuff]
mov r10b, byte [rcx]
mov [r11], r10b
inc rcx ; advance through bufferNum
inc r11
inc rdx
cmp rcx, bufferNum + bufferNumLen
jnz .copy_number
mov rcx, r11
jmp .iter_loop
; ...stuff
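The back-to-front trick can be sketched in Python as well. This is one possible way to arrange it, not a line-by-line copy of the assembly: fill a scratch buffer from its end, then copy the used tail into the main buffer. append_number and its parameters are names I made up, and like the assembly it handles non-negative numbers only.

```python
def append_number(buffer: bytearray, length: int, n: int) -> int:
    """Append the decimal digits of n to buffer[length:]; return the new length."""
    scratch = bytearray(1024)        # plays the role of bufferNum
    pos = len(scratch)               # one past the end of the scratch buffer
    while True:
        n, remainder = divmod(n, 10)
        pos -= 1                     # step back, then store the digit
        scratch[pos] = ord("0") + remainder
        if n == 0:
            break
    digits = scratch[pos:]           # most significant digit is now first
    buffer[length:length + len(digits)] = digits   # copy into the main buffer
    return length + len(digits)
```

Starting from an empty 16-byte buffer, append_number(buf, 0, 42) leaves the bytes 4 and 2 at the front in the right order and returns 2.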
That's it, we are done!
Afterword
It was an interesting journey, learning a bit about how the CPU and memory work. I will leave the other formats to you guys as homework, starting with the hex format and then the pointer one.
That's it from me, folks. If you have feedback, let me know. So go on, do whatever you usually do.