First of all, this blog contains spoilers for Architecture 1001: x86-64 Assembly, the course prepared by Xeno Kovah. If you want to work through it yourself, do not read this blog and go to the link below instead. This post, and probably two more, will be entirely about what I learned during the course. Thank you to Xeno Kovah for preparing such a wonderful course.
Architecture 1001: x86-64 Assembly Course
Life of RSP in Stack Memory
RSP, the stack pointer register, is a 64-bit general-purpose register.
General purpose registers are extra registers that are present in the CPU and are utilized anytime data or a memory location is required. These registers are used for storing operands and pointers.
https://www.geeksforgeeks.org/general-purpose-registers/
We will look at the C/C++ code, look at the disassembly, and draw a stack diagram to see what is happening on the stack. We will use Visual Studio's compiler. Disassembly produced by other compilers (such as GCC) can differ from the assembly code that we will see in this blog.
main()
0000000140001010 sub rsp,28h
0000000140001014 call func (0140001000h)
0000000140001019 mov eax,0F00Dh
000000014000101E add rsp,28h
0000000140001022 ret
func()
0000000140001000 mov eax,0BEEFh
0000000140001005 ret
Before we start drawing a stack diagram, we need to understand how the stack works. The stack grows from higher to lower memory addresses: the bottom of the stack has the higher address and the top of the stack has the lower address. In the code above, execution starts in main(), which then calls func(), so func()'s stack frame sits at lower memory addresses than main()'s frame.
When a function calls another function (in our scenario, main() calls func()), the call stores a return address at the top of the caller's frame. Here that is func()'s return address back into main(), stored at the top of main()'s frame.
0000000140001014 call func (0140001000h)
0000000140001019 mov eax,0F00Dh
After calling func() and executing it, we need to return to main() to continue executing main().
0000000140001019 mov eax,0F00Dh
is the instruction we will return to in main(), and 00000001'40001019
is the return address into main(). So func()'s return address into main() is stored at the top of main()'s frame. It is a bit confusing at first: where are the bottom and the top of the stack, and which address is lower or higher? Don't worry, it will all make sense by the end of the post.
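As a quick check of the reasoning above, here is a minimal C sketch that computes the return address from the listing. The helper name return_address is my own invention, and the 5-byte length is simply the distance between the two addresses shown in the disassembly (0000000140001014 and 0000000140001019).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the return address is the address of the
   instruction that follows the call. */
uint64_t return_address(uint64_t call_address, uint64_t call_length)
{
    assert(call_length > 0); /* an instruction is at least 1 byte long */
    return call_address + call_length;
}
```

With the values from the listing, return_address(0x140001014, 5) gives 0x140001019, the address of mov eax,0F00Dh.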
RSP (rsp1) starts at the top of the stack of the function that called main().
0000000140001010 sub rsp,28h
We subtract 0x28 (40 bytes) from rsp inside main(); we will see later why this undefined memory is reserved. The rsp register is now at position rsp2 in the diagram, at 14FDE0, the top of the undefined space.
0000000140001014 call func (0140001000h)
We call func(). rsp (rsp3) moves to the top of main()'s frame at 14FDD8, where the call stores func()'s return address. After func() executes, we need to return to main() and continue from where we left off. When func() executes ret, the return address at 14FDD8 is popped into the instruction pointer and rsp (rsp4) moves back up into main()'s frame.
After main() finishes executing, rsp (rsp5) returns to where it was before main() started.
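The walk from rsp1 to rsp5 can be sketched as a tiny C model that tracks only the value of rsp. This is an illustration, not how the CPU is implemented: the function name trace_rsp is hypothetical, and the starting value 14FE08 used in the note below is not stated in the post, it is derived from rsp2 (14FDE0 + 28h).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: only rsp is tracked, not the memory contents. */
enum { RSP_STEPS = 5 };

void trace_rsp(uint64_t rsp_at_entry, uint64_t out[RSP_STEPS])
{
    assert(out != NULL);
    uint64_t rsp = rsp_at_entry;
    out[0] = rsp;              /* rsp1: on entry to main()             */
    rsp -= 0x28; out[1] = rsp; /* rsp2: sub rsp,28h                    */
    rsp -= 8;    out[2] = rsp; /* rsp3: call pushes the return address */
    rsp += 8;    out[3] = rsp; /* rsp4: ret in func() pops it          */
    rsp += 0x28; out[4] = rsp; /* rsp5: add rsp,28h                    */
}
```

Starting from 14FE08, the five recorded values are 14FE08, 14FDE0, 14FDD8, 14FDE0 and 14FE08, which match the positions described above.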
main()
0000000140001020 sub rsp,28h
0000000140001024 call func (0140001000h)
0000000140001029 add rsp,28h
000000014000102D ret
func()
0000000140001000 sub rsp,18h
0000000140001004 mov dword ptr [rsp],5CA1AB1Eh
000000014000100B mov eax,dword ptr [rsp]
000000014000100E add rsp,18h
0000000140001012 ret
Because we already know from the previous example how we get into func(), we can go straight to the line below.
0000000140001004 mov dword ptr [rsp],5CA1AB1Eh
When we see dword ptr [rsp]
or qword ptr [rsp]
or word ptr [rax]
in a line like this, it generally represents a variable initialization from a reverse engineering perspective. What I understand from mov dword ptr [rsp],5CA1AB1Eh
is that, because it is a DWORD, it is a 4-byte piece of memory, and a 4-byte variable is generally an integer. So our code must be
C/C++
int a = 0x5CA1AB1E;
RSP is now at 14FDC0. Each slot in the diagram is 8 bytes wide because rsp is a 64-bit (8-byte) register. We only use half of this 8-byte slot to store 0x5CA1AB1E because it is a 4-byte integer.
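To make the "half of the slot" idea concrete, here is a small C sketch that models one 8-byte stack slot as a byte array. The function name and the 0xAA filler are assumptions for illustration only; 0xAA is just a marker for "undefined" bytes, not what the compiler really leaves there.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 8-byte stack slot. */
int store_int_in_slot(uint8_t slot[8], int value)
{
    assert(sizeof(int) == 4);           /* holds for MSVC, GCC and Clang on x86-64 */
    memset(slot, 0xAA, 8);              /* the whole slot starts out "undefined"   */
    memcpy(slot, &value, sizeof value); /* mov dword ptr [rsp],5CA1AB1Eh           */

    int a;
    memcpy(&a, slot, sizeof a);         /* mov eax,dword ptr [rsp]                 */
    return a;
}
```

After the call, the first 4 bytes of the slot hold the integer and the last 4 bytes still hold the filler, which is the unused half of the slot.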
So our stack diagram will be
With the frames marked, our stack diagram will be
Oranges are main() frame and greens are func() frame.
Before we continue with more examples, we need to clarify some of the undefined space in the stack. The Windows x64 calling convention requires the stack to stay 16-byte aligned whenever a function is called, so the compiler adds padding to the frame to keep that alignment.
Let's draw the stack diagram again.
As we can see, some of the undefined memory is now explained. It is still undefined memory, but now we know what its purpose is.
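The alignment arithmetic can be checked with a few lines of C. This is a sketch under assumptions: the helper names are mine, and 14FE08 (rsp on entry to main()) is derived from the diagram values rather than stated in the post. The idea is that the call instruction pushes an 8-byte return address, so on entry to a function rsp is 8 bytes off a 16-byte boundary, and the sub rsp instruction brings it back onto one.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: is the address a multiple of 16? */
int is_aligned16(uint64_t address)
{
    return (address % 16) == 0;
}

/* rsp after a function reserves frame_size bytes with sub rsp,frame_size */
uint64_t rsp_after_sub(uint64_t rsp_at_entry, uint64_t frame_size)
{
    assert(frame_size <= rsp_at_entry);
    return rsp_at_entry - frame_size;
}
```

For main(), 14FE08 is not aligned but 14FE08 - 28h = 14FDE0 is. For func() in the second example, 14FDD8 - 18h = 14FDC0 is aligned as well.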
main()
0000000140001050 sub rsp,28h
0000000140001054 call 0000000140001000
0000000140001059 add rsp,28h
000000014000105D ret
func()
0000000140001000 sub rsp,28h
0000000140001004 mov rax,0F01DAB1EF007BAh
000000014000100E mov qword ptr [rsp+8],rax
0000000140001013 mov eax,0B57AC1E5h
0000000140001018 mov qword ptr [rsp],rax
000000014000101C mov rax,57ABBADABAD00h
0000000140001026 mov qword ptr [rsp+10h],rax
000000014000102B mov rax,qword ptr [rsp]
000000014000102F mov rcx,qword ptr [rsp+8]
0000000140001034 add rcx,rax
0000000140001037 mov rax,rcx
000000014000103A add rsp,28h
000000014000103E ret
Let's reverse func().
0000000140001004 mov rax,0F01DAB1EF007BAh
000000014000100E mov qword ptr [rsp+8],rax
We move 0F01DAB1EF007BAh (an 8-byte value) into rax (a 64-bit register) because it has more than 8 hex digits, so it needs more than 4 bytes.
[0xFFFFFFFF -> max value of 4 bytes, 8 hex digits]
Then we move rax to qword ptr [rsp+8] (which should be a variable). It is stored at [rsp + 8], the 8-byte slot at 14FDB8.
C/C++ code needs to look like
long long a = 0xF01DAB1EF007BA;
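The "more than 8 hex digits means more than 4 bytes" rule can be sketched in C. The helper name bytes_needed is hypothetical; it just counts how many bytes a value occupies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest number of bytes that can hold value. */
int bytes_needed(uint64_t value)
{
    int count = 1;
    while (value > 0xFF) {
        value >>= 8; /* drop the lowest byte (2 hex digits) */
        count++;
    }
    assert(count >= 1 && count <= 8);
    return count;
}
```

bytes_needed(0xFFFFFFFF) is 4, while bytes_needed(0xF01DAB1EF007BA) is 7. Seven bytes do not fit in a 4-byte int, so the next size up, an 8-byte long long, is the natural guess.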
0000000140001013 mov eax,0B57AC1E5h
0000000140001018 mov qword ptr [rsp],rax
We are using eax (a 32-bit register), so 0B57AC1E5h fits in 4 bytes.
But the QWORD in the second line tells us that the variable holding 0B57AC1E5h uses 8 bytes of memory. Writing to eax also clears the upper 32 bits of rax, so the full 8-byte value is 00000000B57AC1E5h. We store it at [rsp], which is 14FDB0.
C/C++ code needs to look like
long long b = 0xB57AC1E5;
000000014000101C mov rax,57ABBADABAD00h
0000000140001026 mov qword ptr [rsp+10h],rax
C/C++ code needs to look like
long long c = 0x57ABBADABAD00;
We store it in [rsp + 10h], which is 16 bytes away from rsp (0x10 = 16 decimal).
000000014000102B mov rax,qword ptr [rsp]
000000014000102F mov rcx,qword ptr [rsp+8]
0000000140001034 add rcx,rax
0000000140001037 mov rax,rcx
- we move the value at [rsp] into rax: rax = 0B57AC1E5h
- we move the value at [rsp + 8] into rcx: rcx = 0F01DAB1EF007BAh
- we add rax to rcx, which is rcx += rax.
- we move rcx to rax, because rax is the return value register
C/C++ code needs to look like
return a + b; // returns 0xF01DAB1EF007BA + 0xB57AC1E5
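Putting the pieces together, a plausible reconstruction of the whole function looks like the sketch below. The function name func_locals and the variable names are assumptions; only the constants, the sizes and the stack offsets come from the disassembly.

```c
#include <assert.h>

/* Plausible reconstruction; names are assumptions. */
long long func_locals(void)
{
    assert(sizeof(long long) == 8);  /* each variable fills one 8-byte slot */
    long long a = 0xF01DAB1EF007BA;  /* [rsp+8]   */
    long long b = 0xB57AC1E5;        /* [rsp]     */
    long long c = 0x57ABBADABAD00;   /* [rsp+10h] */
    (void)c;                         /* stored but never read in the listing */
    return a + b;                    /* add rcx,rax / mov rax,rcx */
}
```

Note that c is written to the stack but never used afterwards, which matches the listing: there is a store to [rsp+10h] and no load from it.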
main()
0000000140001000 sub rsp,38h
0000000140001004 mov eax,0FFFFBABEh
0000000140001009 mov word ptr [rsp],ax
000000014000100D mov rax,0BA1B0AB1EDB100Dh
0000000140001017 mov qword ptr [rsp+8],rax
000000014000101C mov eax,4
0000000140001021 imul rax,rax,1
0000000140001025 movsx ecx,word ptr [rsp]
0000000140001029 mov dword ptr [rsp+rax+10h],ecx
000000014000102D mov eax,4
0000000140001032 imul rax,rax,1
0000000140001036 movsxd rax,dword ptr [rsp+rax+10h]
000000014000103B add rax,qword ptr [rsp+8]
0000000140001040 mov ecx,4
0000000140001045 imul rcx,rcx,4
0000000140001049 mov dword ptr [rsp+rcx+10h],eax
000000014000104D mov eax,4
0000000140001052 imul rax,rax,4
0000000140001056 movzx eax,word ptr [rsp+rax+10h]
000000014000105B add rsp,38h
000000014000105F ret
Let's check the code first.
0000000140001004 mov eax,0FFFFBABEh
0000000140001009 mov word ptr [rsp],ax
We move 0FFFFBABEh into eax (32 bits), but we only use its lower 16 bits.
0000000140001009 mov word ptr [rsp],ax
We take the lower 16 bits of eax, which is ax, and store them in a 2-byte (WORD) variable.
C/C++ code should be
short a = 0xBABE;
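The relationship between eax and ax can be written as a one-line mask in C. The helper name low16 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: ax is the low 16 bits of eax. */
uint16_t low16(uint32_t eax)
{
    uint16_t ax = (uint16_t)(eax & 0xFFFF);
    assert(ax == (uint16_t)eax); /* masking and truncating agree */
    return ax;
}
```

low16(0xFFFFBABE) gives 0xBABE, which is the value that ends up in the 2-byte variable.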
000000014000101C mov eax,4
0000000140001021 imul rax,rax,1
0000000140001025 movsx ecx,word ptr [rsp]
0000000140001029 mov dword ptr [rsp+rax+10h],ecx
In this code, things will become interesting.
000000014000101C mov eax,4
We move decimal 4, which is the size of an integer, into eax.
0000000140001021 imul rax,rax,1
Then we multiply it by 1, which is the index into an integer array.
If it were multiplied by 2, the index would be 2 and it would represent arr[2].
0000000140001025 movsx ecx,word ptr [rsp]
Then we sign-extend (movsx instruction) the short variable (16 bits) to 32 bits and move it into ecx (a 32-bit register). To store a 2-byte variable in a 4-byte space (an integer), we need to extend it and fill the 2-byte gap.
0xBABE becomes 0xFFFFBABE under sign extension because its most significant bit is set, which makes it negative as a signed value.
0xBABE = 1011 1010 1011 1110 (sign bit = 1, so the value is negative)
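Sign extension and zero extension can be sketched with plain bit operations in C. The function names are hypothetical; they mimic what movsx and movzx do when going from 16 to 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* movsx: copy bit 15 (the sign bit) into bits 16..31 */
uint32_t sign_extend16(uint16_t value)
{
    uint32_t result = value;
    if (value & 0x8000) {
        result |= 0xFFFF0000u;
    }
    assert((result & 0xFFFF) == value); /* the low 16 bits never change */
    return result;
}

/* movzx: the upper bits are filled with zeros */
uint32_t zero_extend16(uint16_t value)
{
    return (uint32_t)value;
}
```

sign_extend16(0xBABE) gives 0xFFFFBABE because bit 15 is set, while a value such as 0x1234 stays 0x00001234. zero_extend16(0xBABE) gives 0x0000BABE, which is the movzx behaviour used at the end of this listing.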
0000000140001029 mov dword ptr [rsp+rax+10h],ecx
Because rax = 4 in decimal, which is also 4 in hex (0x4), the instruction effectively becomes mov dword ptr [rsp+14h],ecx
14FDD0[rsp] + 10h = 14FDE0[rsp + 10h]
14FDE0[rsp + 10h] + 4h = 14FDE4[rsp + 14h]
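The address calculation behind [rsp + rax + 10h] can be sketched as a small C helper. The name int_element_address is hypothetical, and the array base offset 10h is read off the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring [rsp + rax + 10h], where rax = 4 * index. */
uint64_t int_element_address(uint64_t rsp, uint64_t array_offset, uint64_t index)
{
    const uint64_t element_size = 4; /* mov eax,4: size of one int */
    assert(array_offset % element_size == 0);
    return rsp + array_offset + element_size * index;
}
```

With rsp = 14FDD0 and the array starting at offset 10h, index 0 lands on 14FDE0 and index 1 lands on 14FDE4.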
Same pattern in
000000014000102D mov eax,4
0000000140001032 imul rax,rax,1
0000000140001036 movsxd rax,dword ptr [rsp+rax+10h]
000000014000103B add rax,qword ptr [rsp+8]
0000000140001040 mov ecx,4
0000000140001045 imul rcx,rcx,4
0000000140001049 mov dword ptr [rsp+rcx+10h],eax
We get the value in b[1] -> [rsp + 14h], sign-extend it (movsxd) and move it into rax.
We add the value at [rsp + 8] to rax -> rax = b[1] + 0BA1B0AB1EDB100Dh;
000000014000100D mov rax,0BA1B0AB1EDB100Dh
0000000140001017 mov qword ptr [rsp+8],rax
0BA1B0AB1EDB100Dh is the value of a variable at [rsp + 8]; let's call it c.
So now rax = b[1] + c.
0000000140001040 mov ecx,4
0000000140001045 imul rcx,rcx,4
0000000140001049 mov dword ptr [rsp+rcx+10h],eax
We locate b[4] and store the lower 32 bits of rax, which is eax, there.
C/C++ code
b[4] = b[1] + c;
We move eax to [rsp + 20h], but why 20h?
0000000140001040 mov ecx,4
0000000140001045 imul rcx,rcx,4
rcx = 16 in decimal, which is 10h (16 bytes). So [rsp + rcx + 10h] = [rsp + 20h]
14FDD0[rsp] + 10h = 14FDE0[rsp + 10h]
14FDE0[rsp + 10h] + 10h = 14FDF0[rsp + 20h]
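Putting this example together, a plausible reconstruction in C is sketched below. Everything not visible in the disassembly is an assumption: the function name array_example, the variable names, the array length of 6 (the listing only touches b[1] and b[4]) and the unsigned short return type (guessed from the final movzx). The explicit casts rely on narrowing conversions truncating the value, which is how MSVC, GCC and Clang behave.

```c
#include <assert.h>

/* Plausible reconstruction; names, array length and return type are assumptions. */
unsigned short array_example(void)
{
    short a;
    int b[6] = {0};               /* zeroed here only to keep the sketch well defined */
    long long c;

    assert(sizeof(short) == 2 && sizeof(int) == 4);
    a = (short)0xBABE;            /* mov word ptr [rsp],ax                */
    c = 0xBA1B0AB1EDB100D;        /* mov qword ptr [rsp+8],rax            */
    b[1] = a;                     /* movsx, then store at [rsp+14h]       */
    b[4] = (int)(b[1] + c);       /* movsxd + add, store eax at [rsp+20h] */
    return (unsigned short)b[4];  /* movzx eax,word ptr [rsp+20h]         */
}
```

Only the low 16 bits of b[4] survive the return, which is why the last instruction of the listing is a movzx from a word.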
main()
0000000140001020 sub rsp,28h
0000000140001024 mov ecx,11h
0000000140001029 call 0000000140001000
000000014000102E add rsp,28h
0000000140001032 ret
func()
0000000140001000 mov dword ptr [rsp+8],ecx
0000000140001004 sub rsp,18h
0000000140001008 mov eax,dword ptr [rsp+20h]
000000014000100C mov dword ptr [rsp],eax
000000014000100F mov eax,dword ptr [rsp]
0000000140001012 add rsp,18h
0000000140001016 ret
Something weird happens here. Let's check what it is.
0000000140001024 mov ecx,11h
In main(), we move 11h into the ecx register, which holds the first argument that the caller passes.
What is the caller function?
The caller is the function that makes the call and sends the arguments; here it is main(). The callee is the function that is called and receives the parameters; here it is func().
The Windows x64 calling convention passes arguments like this:
func1(int a, int b, int c, int d, int e, int f);
//a in RCX, b in RDX, c in R8, d in R9, f then e pushed on stack
https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170
So we store the first argument (11h) in ecx (32 bits), which is the lower half of rcx (64 bits).
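The register assignment for integer arguments can be summarized in a small lookup function. This is a sketch: the enum and function names are hypothetical, and it only covers integer arguments as in the Microsoft example above.

```c
#include <assert.h>

/* Hypothetical names for where an integer argument is passed. */
typedef enum { ARG_RCX, ARG_RDX, ARG_R8, ARG_R9, ARG_STACK } ArgLocation;

/* position is 1-based: 1 is the first argument */
ArgLocation int_arg_location(int position)
{
    assert(position >= 1);
    switch (position) {
    case 1:  return ARG_RCX;
    case 2:  return ARG_RDX;
    case 3:  return ARG_R8;
    case 4:  return ARG_R9;
    default: return ARG_STACK; /* 5th argument and beyond */
    }
}
```

For our example, int_arg_location(1) is ARG_RCX, which is why 11h shows up in ecx right before the call.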
0000000140001000 mov dword ptr [rsp+8],ecx
Before func() reserves its own space and continues executing, the first thing it does is store the value of ecx in a slot that lies inside main()'s frame.
Normally, if we saw this while executing main(), we would think an integer variable was being created, but here it happens while executing func()'s instructions. This is the hint that a variable was sent to func() as a parameter.
0000000140001004 sub rsp,18h
Then we reserve undefined memory in func().
0000000140001008 mov eax,dword ptr [rsp+20h]
Now rsp is at position rsp4. We read our parameter back from main()'s frame at [rsp + 20h].
14FDC0 + 20h = 14FDE0
000000014000100C mov dword ptr [rsp],eax
We store the value in an integer variable at memory address [rsp].
C / C++ code will be
int func(int a) {
    int i = a;
    return i; // mov eax,dword ptr [rsp]
}
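One detail that is easy to miss: [rsp+8] at the start of func() and [rsp+20h] after sub rsp,18h are the same memory location. A minimal C sketch of that offset arithmetic follows; the helper name is hypothetical, and 14FDD8 is the value of rsp on entry to func() taken from the earlier diagram.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: an offset taken before sub rsp,frame_size
   grows by frame_size once rsp has moved down. */
uint64_t offset_after_sub(uint64_t offset_at_entry, uint64_t frame_size)
{
    uint64_t offset = offset_at_entry + frame_size;
    assert(offset >= offset_at_entry); /* no wrap-around */
    return offset;
}
```

offset_after_sub(0x8, 0x18) is 0x20, and both 14FDD8 + 8h and 14FDC0 + 20h point at 14FDE0, the slot in main()'s frame where the parameter was saved.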
So we are getting close to finding out what those 4 x 8 bytes of undefined, grey-colored memory are. Let's see one more example.
main()
0000000140001050 sub rsp,38h
0000000140001054 mov qword ptr [rsp+20h],55h
000000014000105D mov r9d,44h
0000000140001063 mov r8d,33h
0000000140001069 mov edx,22h
000000014000106E mov ecx,11h
0000000140001073 call 0000000140001000
0000000140001078 add rsp,38h
000000014000107C ret
func()
0000000140001000 mov qword ptr [rsp+20h],r9
0000000140001005 mov qword ptr [rsp+18h],r8
000000014000100A mov qword ptr [rsp+10h],rdx
000000014000100F mov qword ptr [rsp+8],rcx
0000000140001014 sub rsp,18h
0000000140001018 mov rax,qword ptr [rsp+28h]
000000014000101D mov rcx,qword ptr [rsp+20h]
0000000140001022 add rcx,rax
0000000140001025 mov rax,rcx
0000000140001028 add rax,qword ptr [rsp+30h]
000000014000102D add rax,qword ptr [rsp+38h]
0000000140001032 add rax,qword ptr [rsp+40h]
0000000140001037 mov dword ptr [rsp],eax
000000014000103A mov eax,dword ptr [rsp]
000000014000103D add rsp,18h
0000000140001041 ret
Now that we know how to read the assembly from the previous examples, we can see that the 4 x 8-byte space is reserved for the first 4 arguments, and that main()'s frame grows by another 16 bytes for the 5th argument: 8 bytes for the 5th argument itself and 8 bytes of padding to keep the stack 16-byte aligned.
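The offsets used in func() follow a simple pattern that can be sketched in C. The helper name arg_offset is hypothetical. On entry to the callee, [rsp] holds the return address, so argument N (1-based) lives at [rsp + 8*N]; once the callee has executed sub rsp,frame_size, every offset grows by frame_size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: offset from rsp to the slot of argument `position`,
   after the callee has reserved frame_size bytes (0 = at function entry). */
uint64_t arg_offset(int position, uint64_t frame_size)
{
    assert(position >= 1);
    return 8u * (uint64_t)position + frame_size;
}
```

At entry (frame size 0) this gives 8h, 10h, 18h and 20h for the first four arguments, matching the four stores at the top of func(). After sub rsp,18h it gives 20h, 28h, 30h, 38h and 40h for arguments 1 to 5, matching the loads and adds in the listing.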
0000000140001000 mov qword ptr [rsp+20h],r9
0000000140001005 mov qword ptr [rsp+18h],r8
000000014000100A mov qword ptr [rsp+10h],rdx
000000014000100F mov qword ptr [rsp+8],rcx
Another important thing is that the 4th parameter is stored deeper in the stack, at a higher address ([rsp+20h]), than the 1st ([rsp+8]).
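For completeness, here is a plausible C reconstruction of func() for this example. The function name, the parameter names and the parameter types are assumptions: the listing saves and adds full 64-bit registers, which suggests 8-byte parameters, while the result is stored as a dword, which suggests an int local.

```c
#include <assert.h>

/* Plausible reconstruction; names and types are assumptions. */
int func_five_args(long long a, long long b, long long c, long long d, long long e)
{
    assert(sizeof(int) == 4);
    int i = (int)(a + b + c + d + e); /* mov dword ptr [rsp],eax */
    return i;                         /* mov eax,dword ptr [rsp] */
}
```

Called with the values main() passes (11h, 22h, 33h, 44h and 55h), the sum is 0FFh.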