Unknown Rori

Posted on Jul 18 • Edited on Aug 4

Crafting C-like printf in assembly

#programming #assembly #c #tutorial

Ever wondered how printf("Hello world!") works under the hood? In high-level languages we take formatted output for granted, but what if you had to implement it from scratch in assembly? In this post we are going dark and exploring the world of assembly.

Formatted output looks deceptively simple on the surface but is surprisingly complex under the hood. Behind even the simplest call like printf("%d", 42) lie parsing, variable argument handling, and formatting, all of it managed by the luxuries of the standard library that the C runtime provides.

Getting Started

Before we begin we should install our tools. This time I will be using FASM on Linux (any distro is fine). FASM is pretty lightweight and easy to use. If you are on Windows, FASM itself works the same way, but the syscalls in this post are Linux-specific.

But what is syscall?

A syscall is the way for us developers to request a service from the operating system. It can be hardware related, process execution, and other things of that kind. Why do we have this? For security: a process or an app isn't allowed to access this kind of stuff freely. Imagine what kind of bad stuff would happen otherwise, right?

Interlude

Before we even begin writing, we need to declare our intent to FASM. We are on Linux and we want 64-bit (I'm not in the mood for 32-bit), so we put this magic text on the first line of our main.asm file.

format elf64 executable

Next is the entry point of our code. It is usually called _start rather than main, but we can name it whatever we want.

format elf64 executable
entry _start

But before we start writing our code we must define our application layout. This is where our code and constant data will live in the executable. Each of these segments also needs permission flags: whether it may be read, executed, or written to.

We will define three of them: the first one stores our hello world string, the second one stores our global variables, and the third is where our code will live.

; ... stuff from before
segment readable

segment readable writable

segment readable executable

All right, cool, now we can start writing our code. But wait, we don't have functions in assembly.

That's right. To create a function we need to use our imagination, and labels. But what is a label? A label is a bookmark: instead of remembering a memory address in our code we give it a name, so we can reference it later, for example to jump to that specific part. We can use labels to create functions, since a function call is just jumping there and then going back.

; ...stuff from before
segment readable executable
_start:

If we assemble it using fasm main.asm main and then run it, we can see that the program is terminated by signal SIGSEGV (Address boundary error). Basically we went outside our allowed access. But why? Since this is just an empty executable, the label _start sits at the very edge of the memory the operating system gave us, and the CPU keeps going past it.

But before we think about that, we need a way to exit our program gracefully. By opening these awesome resources we can find the exit syscall. We can also see rax, rdi, and so on: these are our registers, a fixed set of variables inside our CPU that we can read and write.

Okay cool, now how do we exit?

First we need to set rax to 60 because that is the number of the exit syscall, and then set rdi to our exit code, for example 40. How do we set a register? By using the mov instruction to move our constant into the register.

; ...stuff from before
_start:
    mov rax, 60 ; our syscall exit
    mov rdi, 40 ; exit code
    syscall

If we assemble and run it we will see nothing, but if you are using a shell like bash you can type echo $? to show the exit code of the previous program. It should output 40.

But before we can do hello world we need to define our data. We will define it inside our readable segment, with a label so we can reference it later.

; ...stuff
segment readable
    helloWorld: db "Hello, World", 10, 0
    helloWorldLen = $-helloWorld-1 ; length without the null terminator
; ...stuff

You may notice helloWorldLen: the write syscall needs the length of the string we want to print. $ is the current address, so $- followed by a label gives the number of bytes defined since that label. There is also 10, 0 in the data: 10 is the newline character and 0 is the null terminator that marks the end of the string. db is the FASM directive for define byte, which lets us place byte-sized values at that label. Since our string is just ASCII we can confidently use bytes.

Now, how do we do "hello world" in assembly using this constant? It is actually simple. Among the Linux syscalls there is one called write, which lets us write to a file descriptor. The file descriptor we are interested in is 1, because that is standard output, our terminal.

; ...stuff
segment readable executable
_start:
    mov rax, 1 ; syscall write
    mov rdi, 1 ; Stdout
    mov rsi, helloWorld ; load address of helloWorld
    mov rdx, helloWorldLen ; length in bytes (an assemble-time constant, not a memory load)
    syscall
; ...our syscall exit

If you assemble and run it, it will output Hello, World. Congratulations!
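Since the snippets so far are fragments, here is a minimal sketch of what the whole main.asm might look like at this point, assuming the same FASM on 64-bit Linux setup. The length subtracts 1 so that the null terminator is not sent to the terminal.

```asm
format ELF64 executable
entry _start

segment readable
    helloWorld: db "Hello, World", 10, 0
    helloWorldLen = $-helloWorld-1  ; 13 bytes: text plus newline, without the 0

segment readable executable
_start:
    mov rax, 1              ; syscall number for write
    mov rdi, 1              ; file descriptor 1 = stdout
    mov rsi, helloWorld     ; address of the first byte
    mov rdx, helloWorldLen  ; number of bytes to write
    syscall

    mov rax, 60             ; syscall number for exit
    mov rdi, 40             ; exit code
    syscall
```

Assemble it with fasm main.asm main, run ./main, and echo $? should still report 40.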

Basic Hello World Printf

Now comes the fun part: writing basic hello world, but with printf. First we need to rewrite our constants and our global variable to help us later down the line. Don't forget to remove our first hello world syscall too.

; ...stuff
segment readable
    hello: db "Hello, %s", 10, 0
    world: db "World", 0

segment readable writable
    buffer: rb 1024
; ...stuff

The buffer global variable will be used by our printf to collect the string before we print it to the terminal. This lets us call just one syscall. It is an optimization, since syscalls can be slow, but in this case we are just lazy devs who ask the OS sparingly.

We also need a printf label. We put it above _start and use the call instruction to call our "function", or label.

; ...stuff
segment readable executable
printf:

_start:
    call printf

    mov rax, 60 ; syscall exit
    mov rdi, 40 ; Exit code
    syscall

If you assemble and run this you will probably get SIGSEGV. Assembly code runs top to bottom, so when we call printf it jumps to printf, and since printf is empty, execution falls through to the code below it. The code below is call printf, so it just calls itself recursively and causes a stack overflow (yo, stack overflow mentioned). To return to the caller we need the ret instruction. But how does it work under the hood?

There is a thing called the stack that stores our data (local variables) and addresses. When we call, the CPU actually pushes the address of the next instruction onto the stack and then jumps to the specified address. When we ret, it pops the stack and uses that value as the return address.
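To see that call is really just a push plus a jmp, here is a small sketch that does both steps by hand. The labels set_code and after_call are made up for this illustration and are not part of our printf.

```asm
format ELF64 executable
entry _start

segment readable executable
; hypothetical helper: puts 7 in rdi, then returns to whatever
; address is on top of the stack
set_code:
    mov rdi, 7
    ret                 ; pops the return address and jumps to it

_start:
    push after_call     ; what `call set_code` would push for us
    jmp set_code        ; what `call set_code` would jump to
after_call:
    mov rax, 60         ; syscall exit
    syscall             ; rdi still holds 7
```

After running it, echo $? should print 7, which means ret found its way back to after_call.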

Okay so we just add ret to our code.

; ...stuff
printf:
    ret
; ...stuff

Voila, no SIGSEGV. Now we need to think about what parameters our function should take. For starters, let's use rax as the pointer to the string we want to print. But you may notice that syscall write requires the length too. We can loop over the data at that address until we find the null terminator, and while doing that increment a counter and copy each byte into our global buffer.

; ...stuff
_start:
    mov rax, hello
    call printf
; ...stuff

But before all the fun stuff, how do we loop? Well... there is no for or while in assembly.

We can use a label to mark the location, and instead of call we use the jmp instruction, something like this.

; ...other stuff
printf:
.iter_loop:
    jmp .iter_loop ; [new stuff]
    ret
; ...other stuff

But wait! If you try to run this code it will loop forever. While we are at it, we need to prepare another register to act as the counter, but we need to preserve its old value. Why? Because other parts of our code might use it for something important and we don't want to trash it. We don't want to store everything in globals, right? We can use the stack: the push instruction pushes a register onto the stack, just like call pushes the return address, and near the end, before ret, we pop it to restore it.

; ...stuff

; Print stuff like C printf (but budget)
; @params   rax - const char* (pointer to string)
printf:
    push rdx ; [new stuff]
.iter_loop:
    jmp .iter_loop

    pop rdx ; [new stuff]
    ret
; ...stuff
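Here is a tiny standalone sketch of the same push and pop idea, outside of our printf. The helper name noisy is hypothetical: it uses rdi as scratch space but hands it back untouched.

```asm
format ELF64 executable
entry _start

segment readable executable
; hypothetical helper that needs rdi as scratch space
noisy:
    push rdi            ; save the caller's value
    mov rdi, 99         ; trash it freely
    pop rdi             ; restore it before returning
    ret

_start:
    mov rdi, 5
    call noisy
    mov rax, 60         ; syscall exit
    syscall             ; exit code is still 5, not 99
```

One thing to keep in mind: every push needs a matching pop before ret, otherwise ret would pop our saved value instead of the return address.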

Now we can set up our counter register in peace. With the xor instruction we can zero a register by xoring it with itself.

; ...other stuff
printf:
    push rdx
    xor rdx, rdx ; [new stuff]
; ...stuff

After all of that we can read the data at the address in rax and compare it with 0, our null terminator. To fetch data from an address we use [address] in the mov instruction. Since our data is a byte, we specify the size with byte [address], and the destination must be a register that holds exactly one byte (8 bits), while our full registers are 8 bytes (64 bits). For this case let's trash register r10 through its 8-bit variant, r10b.

; ...other stuff
printf:
    push rdx
    xor rdx, rdx 
.iter_loop:
    mov r10b, byte [rax] ; [new stuff]
    jmp .iter_loop

; ...stuff

Wait, we are not done yet. We need to compare r10b with 0 and then jump out to a label so we don't get stuck in an infinite loop. The cmp instruction compares two values, either register with register or register with constant.

; ...stuff
.iter_loop:
    mov r10b, byte [rax]
    cmp r10b, 0
; ...stuff

After comparing we need to act on the result. The result lives in the flags register, which we usually cannot access directly. Conditional versions of jmp read those flags and let us branch. Since we want to know whether r10b is equal to 0, we use je, jump if equal.

; ...stuff
.iter_loop:
    mov r10b, byte [rax]
    cmp r10b, 0
    je .done
.done:
    pop rdx
    ret
; ...stuff

Now if we try to run this it will still loop forever, because we never increment our pointer in rax or our counter. We can use the inc instruction to increment each by one.

; ...stuff
.iter_loop:
    mov r10b, byte [rax]
    cmp r10b, 0
    je .done
    inc rdx ; [new stuff]
    inc rax ; [new stuff]
    jmp .iter_loop
; ...stuff

If we try to run this now... nothing happens and we do not loop forever. Finally.

The next step is to copy the data into the buffer. It is actually simple: we use the same mov as in .iter_loop but in reverse, storing into buffer instead of loading from rax. For readability let's add a .push_char label.
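"Nothing happens" is hard to trust, so here is one way to peek at the counter: a sketch of the same loop that hands the count back as the exit code. The name count_bytes and returning the count in rdi are assumptions for this demo only.

```asm
format ELF64 executable
entry _start

segment readable
    hello: db "Hello, %s", 10, 0

segment readable executable
; same loop as above, but the count is handed back in rdi
count_bytes:
    xor rdx, rdx            ; counter = 0
.iter_loop:
    mov r10b, byte [rax]    ; load the current byte
    cmp r10b, 0             ; null terminator?
    je .done
    inc rdx                 ; count it
    inc rax                 ; move to the next byte
    jmp .iter_loop
.done:
    mov rdi, rdx
    ret

_start:
    mov rax, hello
    call count_bytes
    mov rax, 60             ; syscall exit
    syscall                 ; exit code = number of bytes before the 0
```

The string has 9 visible characters plus the newline, so echo $? should print 10.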

; ...stuff
printf:
    push rdx

    xor rdx, rdx
.iter_loop:
    mov r10b, byte [rax]
    cmp r10b, 0
    je .done

; [new stuff]
.push_char:
    mov byte [buffer], r10b
    inc rdx
    inc rax
    jmp .iter_loop
.done:
    pop rdx
    ret

; ...stuff

Now you may notice that we keep overwriting the first byte of buffer without advancing the index. We can use rcx to hold the address of buffer and increment it after we push each char.

; ...stuff
printf:
    push rdx
    push rcx ; [new stuff]

    xor rdx, rdx
    mov rcx, buffer ; [new stuff]
.iter_loop:
    mov r10b, byte [rax]
    cmp r10b, 0
    je .done
.push_char:
    mov byte [rcx], r10b ; [new stuff]
    inc rcx ; [new stuff]
    inc rdx
    inc rax
    jmp .iter_loop
.done:
    pop rcx ; [new stuff]
    pop rdx
    ret
; ...stuff

Lastly we need to ask the OS to write what is in buffer. Since the length is already stored in rdx, we don't need to set it manually.

; ...stuff
.done:
; [new stuff]
    mov rax, 1
    mov rdi, 1
    mov rsi, buffer
    syscall

    pop rcx
    pop rdx
    ret
; ...stuff

If you assemble and run it, Hello, %s should be printed on the terminal. Congratulations! The string at the address in rax has been successfully copied into buffer.
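For reference, here is a sketch of the complete file at this stage, stitched together from the fragments above. I spell the flag writeable here, which is the spelling the FASM manual uses.

```asm
format ELF64 executable
entry _start

segment readable
    hello: db "Hello, %s", 10, 0

segment readable writeable
    buffer: rb 1024

segment readable executable
; @params   rax - const char* (pointer to string)
printf:
    push rdx
    push rcx

    xor rdx, rdx            ; rdx = number of bytes copied
    mov rcx, buffer         ; rcx = write position in buffer
.iter_loop:
    mov r10b, byte [rax]
    cmp r10b, 0
    je .done
.push_char:
    mov byte [rcx], r10b
    inc rcx
    inc rdx
    inc rax
    jmp .iter_loop
.done:
    mov rax, 1              ; syscall write
    mov rdi, 1              ; stdout
    mov rsi, buffer
    syscall                 ; rdx already holds the length

    pop rcx
    pop rdx
    ret

_start:
    mov rax, hello
    call printf

    mov rax, 60             ; syscall exit
    mov rdi, 40             ; exit code
    syscall
```

Note that the syscall itself overwrites rax, rdi, and rsi (and the kernel clobbers rcx and r11), so the caller should not expect those to survive the call.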

Variable argument Printf

Okay, now the fun part: how do we get variable parameters into our printf function? If we look at the OSDev website's list of registers, there are at least 14 registers that we can play around with, and we have already used up 4.

No need to rack our brains over control-flow hell. We can use the stack and dynamically calculate where the next parameter is. We know that printf pushes two 64-bit registers onto the stack, and there is also the return address.

We can visualize our stack at that moment: from the top it holds the saved rcx, then the saved rdx, then the return address.

In memory the stack is actually inverted: it grows downward, toward lower addresses. Drawing it top-down is just for easy visualization.

There are registers called rsp and rbp. rsp is the one keeping track of the top: each time we push something rsp is decremented, so the stack goes down. If we push our parameter before we call, it sits right behind the return address, at the next higher address.

Now we can see where we are going. Since each push (and the call) stores 8 bytes, we can take rsp, add 8 three times, and we land on &world. We will keep that in rbx.
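The layout can be checked with a tiny standalone sketch. The helper first_param is hypothetical and only returns whatever sits in the slot three steps above rsp.

```asm
format ELF64 executable
entry _start

segment readable executable
; hypothetical helper: returns its first stack parameter in rdi
first_param:
    push rdx                ; ends up at [rsp+8]
    push rcx                ; ends up at [rsp+0]
    ; [rsp+16] = return address pushed by call
    ; [rsp+24] = value the caller pushed before call
    mov rdi, [rsp+(8*3)]
    pop rcx
    pop rdx
    ret

_start:
    push 7                  ; the parameter
    call first_param
    mov rax, 60             ; syscall exit
    syscall                 ; exit code is the parameter
```

If the offsets are right, echo $? should print 7.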

So we can update our code to set rbx to the first parameter, and push the parameter before calling printf.

; ...stuff
; Print stuff like C printf (but budget)
; @params   rax - const char* (pointer to string)
; @params   stack (8 bytes each, last push is first params)
; @trash    rax, rbx, rdi, rsi, r10, r11
printf:
    push rdx
    push rcx

    xor rdx, rdx
    mov rcx, buffer
    mov rbx, [rsp+(8*3)] ; [new stuff]
; ...stuff
_start:
    mov rax, hello
    push world
    call printf
; ...stuff

With that in place we can focus on parsing. We add another cmp after the null terminator check and a label for the special symbol, like this.

; ...stuff
.iter_loop:
    mov r10b, byte [rax]
    cmp r10b, 0
    je .done
; [new stuff]
    cmp r10b, '%'
    jne .push_char
.symbol:

.push_char:
    mov byte [rcx], r10b
; ...stuff

So what is jne? It is the same as je but for not equal. Notice the n in the middle.

Next we advance rax to the next char and check whether it is s. If not, we just jump back to .iter_loop to continue printing.

; ...stuff
.symbol:
    inc rax
    mov r10b, byte [rax]
    cmp r10b, 's'
    jne .iter_loop
; ...stuff

The next part is simple: use rbx to get the character we want and copy it into our buffer.

; ...stuff
.symbol:
    inc rax
    mov r10b, byte [rax]
    cmp r10b, 's'
    jne .iter_loop
.string_symbol:
    mov r10b, byte [rbx] ; [new stuff]
    mov [rcx], r10b ; [new stuff]
; ...stuff

If we run it now it should print Hello, W. Not yet our "Hello, World" but still progress. We forgot about our loop, which is pretty simple to add.

; ...stuff
.string_symbol:
    mov r10b, byte [rbx]
    cmp r10b, 0
    je .iter_loop

    mov [rcx], r10b

    inc rcx
    inc rdx
    jmp .string_symbol
; ...stuff

If we try to run it now....

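Bam! We got another SIGSEGV. We forgot to increment our rbx register, so the loop never reaches the null terminator and keeps writing until it runs past the memory we own. Silly me. Don't worry, a modern OS handles this kind of thing pretty well by killing the process. We just need to update our code.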

; ...stuff
.string_symbol:
    mov r10b, byte [rbx]
    cmp r10b, 0
    je .iter_loop

    mov [rcx], r10b

    inc rbx ; [new stuff]
    inc rcx
    inc rdx
    jmp .string_symbol
; ...stuff

Okay cool, now if we run it, it should print Hello, Worlds. So where is that s coming from? If you recall, the string we defined was Hello, %s, and we forgot to step rax past the s once we handle it in .symbol. Easy fix.

; ...stuff
.symbol:
    inc rax
    mov r10b, byte [rax]
    cmp r10b, 's'
    jne .iter_loop
    inc rax ; [new stuff]
; ...stuff

Now we are done! But are we forgetting something? Yes: what about two parameters? Since we are using the stack we can just push another address as a parameter. Let's add another constant like this.

; ...stuff
segment readable
    hello: db "Hello, %s. %s", 10, 0
    world: db "World", 0
    goodDay: db "It was a good day", 0
; ...stuff

And then we push it before our world

; ...stuff
_start:
    mov rax, hello
    push goodDay ; [new stuff]
    push world
    call printf
; ...stuff

Now if we run it.... well, it didn't output the new string. We are forgetting something. That's right: we increment rbx but we never move it to the next parameter. It's actually simple, we just add 8 bytes to it, right?

Well, you are not entirely wrong. What we should do is copy rbx into a different register for walking the string, and then advance rbx itself so we don't skip a parameter. We are going to trash another register, let's say r11.

; ...stuff
.symbol:
    ; ...stuff
    ; [new stuff]
    mov r11, rbx
    add rbx, 8
.string_symbol:
    mov r10b, byte [r11] ; [new stuff]
    cmp r10b, 0
    je .iter_loop

    mov [rcx], r10b

    inc r11 ; [new stuff]
    inc rcx
    inc rdx
    jmp .string_symbol
; ...stuff

If we try to run it now... well, it seems the It at the start of our goodDay string is missing. What did we do wrong this time?

Actually, the way we load rbx has been wrong this whole time. mov with [] loads the value stored at that address, which here is the first parameter itself (the address of world), so adding 8 just moved us 8 bytes further into the string data. What we want is the address of the stack slot. So how do we get that? There is an instruction called lea, and it literally means Load Effective Address.

    lea rbx, [rsp+(8*3)]

If we run it now.. well, now our output is a jumbled mess. What is wrong now?

Recall that rbx now holds the address of the stack slot, so we need to dereference it with [] when moving the parameter into r11.

; ...stuff
.symbol:
    ; ...stuff
    mov r11, [rbx] ; [new stuff]
    add rbx, 8
.string_symbol:
    ; ...stuff

Okay, I think we are done now. Let's run it again and see what happens...

Yatta, it prints Hello, World. It was a good day. Now we have our first proper variable parameters for our printf!
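The mov versus lea difference is easy to mix up, so here is a small standalone sketch of walking two stack slots the way our printf now does. The values 12 and 30 are arbitrary.

```asm
format ELF64 executable
entry _start

segment readable executable
_start:
    push 30                 ; second parameter (pushed first)
    push 12                 ; first parameter, now at [rsp]

    lea rbx, [rsp]          ; rbx = address of the first slot, nothing is read
    mov rdi, [rbx]          ; rdi = 12, the value stored in that slot
    add rbx, 8              ; step to the next 8-byte slot
    add rdi, [rbx]          ; rdi = 12 + 30

    mov rax, 60             ; syscall exit
    syscall                 ; exit code = sum of both parameters
```

echo $? should print 42. Had we used mov rbx, [rsp] instead of lea, rbx would hold 12 itself, and [rbx] would try to read memory at address 12.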

Okay, now we just need to display a number... did you think we would end right here? No.

Let's update our constant to include the %d format specifier, and update our printf call.

; ...stuff
segment readable
    hello: db "Hello, %s %d. %s", 10, 0
    world: db "World", 0
    goodDay: db "It was a good day", 0
; ...stuff
_start:
    mov rax, hello
    push goodDay
    push 42
    push world
    call printf
; ...stuff

We want to treat our number as a value, not as a pointer to an address.

So let's start by extending our parsing to handle the d character. It will look like this.

; ...stuff
.symbol:
    inc rax
    mov r10b, byte [rax]
    cmp r10b, 's'
    je .prep_string_symbol ; [new stuff]
    cmp r10b, 'd' ; [new stuff]
    jne .iter_loop ; [new stuff]
.number_symbol:

.prep_string_symbol: ; [new stuff]
    inc rax
    mov r11, [rbx]
    add rbx, 8
.string_symbol:
    ; ...stuff

If you recall, in ASCII the digits start at 48, so we can convert a digit by adding 48 to it. But first we need to get the value from the stack.

; ...stuff
    inc rax
    mov r11, [rbx]
    add rbx, 8

.number_symbol:

; ...stuff

So we need to loop: divide by 10, take the remainder, and convert it into ASCII. Pretty simple.

To divide we use the div instruction. It requires the number being divided to be in rax. But wait, isn't that where our format pointer is? We can just push it onto the stack to save it. div also uses rdx, which holds our byte counter, so we trash register r12 to keep the counter in the meantime. But where is the divisor? We are going to use r10.
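Before wiring div into printf, here is a standalone sketch of a single division step, just to see which register ends up holding what. The number 42 is arbitrary.

```asm
format ELF64 executable
entry _start

segment readable executable
_start:
    mov rax, 42             ; dividend (low half of rdx:rax)
    xor rdx, rdx            ; high half must be zero
    mov r10, 10             ; divisor
    div r10                 ; rax = 4 (quotient), rdx = 2 (remainder)

    add dl, '0'             ; 2 -> '2', which is ASCII 50
    movzx rdi, dl
    mov rax, 60             ; syscall exit
    syscall                 ; exit code = ASCII code of the last digit
```

echo $? should print 50. The remainder is always the lowest digit, which is why the digits will come out back to front later.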

; ...stuff
    inc rax
    mov r11, [rbx]
    add rbx, 8

    push rax
    mov r12, rdx
    mov r10, 10
    mov rax, r11 ; [new stuff]
.number_symbol:
    xor rdx, rdx ; div divides rdx:rax, so clear the high half every round
    div r10

    test rax, rax
    jnz .number_symbol

    pop rax
    mov rdx, r12 ; restore our byte counter
    jmp .iter_loop

; ...stuff

Now we just need to push the converted digit into our buffer by adding 48, or just '0' for readability. Pretty simple update.

; ...stuff
.number_symbol:
    xor rdx, rdx
    div r10
    add dl, '0' ; [new stuff]

    ; [new stuff]
    mov byte [rcx], dl
    inc r12
    inc rcx

    test rax, rax
    jnz .number_symbol
; ...stuff

Now if you run it, and have not been sleeping through this whole ordeal, it will output Hello, World 24. It was a good day. Wait, the number is reversed! That's right, we get the lowest digit first, so we need to flip it. To make it easier we just create another buffer.

; ...stuff
segment readable writable
    buffer: rb 1024
    bufferNum: rb 1024 ; [new stuff]
    bufferNumLen = $-bufferNum ; [new stuff]
; ...stuff

Now, instead of rcx pointing into the main buffer, we will use the new buffer and fill it in reverse. First we need to preserve our rcx and reuse it to hold the end address of bufferNum. We also want to decrement it, since we are writing backwards.

; ...stuff
    push rax
    push rcx ; [new stuff]
    mov rcx, bufferNum + bufferNumLen ; [new stuff] one past the end of bufferNum
    mov r12, rdx
    mov r10, 10
    mov rax, r11
.number_symbol:
    xor rdx, rdx
    div r10
    add dl, '0'

    dec rcx ; [new stuff] step back first, then store
    mov byte [rcx], dl

    test rax, rax
    jnz .number_symbol

    pop rcx ; [new stuff]
    pop rax
    mov rdx, r12
    jmp .iter_loop
; ...stuff

Right, now we just need to copy the digits into the main buffer.
First we pop the old rcx somewhere, for example r11, then use it to copy the number buffer into the main buffer while incrementing the counter. After the loop rcx points at the first digit, so we copy from there up to the end of bufferNum.

; ...stuff
    jnz .number_symbol

    pop r11 ; old buffer
    pop rax ; our params fmt
    mov rdx, r12
.copy_number:
; [new stuff]
    mov r10b, byte [rcx]
    mov [r11], r10b
    inc rcx ; advance in bufferNum
    inc r11
    inc rdx
    cmp rcx, bufferNum + bufferNumLen ; stop at the end of bufferNum
    jne .copy_number

    mov rcx, r11
    jmp .iter_loop
; ...stuff

That's it we are done!
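Because the post builds printf up in fragments, here is one way the finished file might look when everything is stitched together. Treat it as a sketch: it writes the digits into bufferNum from the back (decrement first, then store) so the copy loop can stop exactly at the end of bufferNum, and it spells the segment flag writeable as in the FASM manual.

```asm
format ELF64 executable
entry _start
segment readable
    hello: db "Hello, %s %d. %s", 10, 0
    world: db "World", 0
    goodDay: db "It was a good day", 0

segment readable writeable
    buffer: rb 1024
    bufferNum: rb 1024
    bufferNumLen = $-bufferNum

segment readable executable
printf:
    push rdx
    push rcx
    xor rdx, rdx                ; rdx = bytes collected so far
    mov rcx, buffer             ; rcx = write position in buffer
    lea rbx, [rsp+(8*3)]        ; rbx = address of the first parameter slot
.iter_loop:
    mov r10b, byte [rax]
    cmp r10b, 0
    je .done
    cmp r10b, '%'
    jne .push_char
.symbol:
    inc rax
    mov r10b, byte [rax]
    cmp r10b, 's'
    je .prep_string_symbol
    cmp r10b, 'd'
    jne .iter_loop
    inc rax                     ; skip the d
    mov r11, [rbx]              ; r11 = the number itself
    add rbx, 8
    push rax
    push rcx
    mov rcx, bufferNum + bufferNumLen ; one past the end of bufferNum
    mov r12, rdx                ; div needs rdx, park the counter in r12
    mov r10, 10
    mov rax, r11
.number_symbol:
    xor rdx, rdx
    div r10                     ; rax = quotient, rdx = remainder
    add dl, '0'
    dec rcx                     ; fill bufferNum from the back
    mov byte [rcx], dl
    test rax, rax
    jnz .number_symbol
    pop r11                     ; old write position in buffer
    pop rax                     ; format pointer
    mov rdx, r12
.copy_number:
    mov r10b, byte [rcx]
    mov [r11], r10b
    inc rcx
    inc r11
    inc rdx
    cmp rcx, bufferNum + bufferNumLen
    jne .copy_number
    mov rcx, r11
    jmp .iter_loop
.prep_string_symbol:
    inc rax                     ; skip the s
    mov r11, [rbx]              ; r11 = pointer to the string parameter
    add rbx, 8
.string_symbol:
    mov r10b, byte [r11]
    cmp r10b, 0
    je .iter_loop
    mov [rcx], r10b
    inc r11
    inc rcx
    inc rdx
    jmp .string_symbol
.push_char:
    mov byte [rcx], r10b
    inc rcx
    inc rdx
    inc rax
    jmp .iter_loop
.done:
    mov rax, 1                  ; syscall write
    mov rdi, 1                  ; stdout
    mov rsi, buffer
    syscall                     ; rdx already holds the length
    pop rcx
    pop rdx
    ret

_start:
    mov rax, hello
    push goodDay
    push 42
    push world
    call printf
    mov rax, 60                 ; syscall exit
    mov rdi, 40
    syscall
```

It should print Hello, World 42. It was a good day. Keep in mind the limits of this budget version: the number is treated as unsigned, nothing checks that the output fits in the 1024-byte buffer, and the caller never pops its parameters, which is only fine here because we exit right after.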

Afterword

It was an interesting journey, learning a bit of how the CPU and memory work. I will leave the other formats to you guys as homework, starting with the hex format and then the pointer one.

That's it from me, folks. If you have feedback, let me know. So go on, do whatever you usually do.
