Hello everyone. Mirrai here. In Part 1 we covered registers, the Windows x64 calling convention, shadow space, and how RSP and RIP work. If you haven't read that, I recommend starting there. Today we're going to cover the actual instructions and syntax of assembly. By the end you'll be able to read most of what a debugger shows you and understand what each instruction is doing and why. Keep in mind there are a lot of other instructions I won't cover here; this post sticks to the basic ones.
With that said, let's get into it.
Moving Data Around
The most common instruction you'll see is mov. It copies a value from a source to a destination. That's it.
mov rax, 5 ; put the value 5 into RAX
mov rbx, rax ; copy RAX into RBX
mov rax, [rbx] ; load the value at the memory address in RBX into RAX
mov [rbx], rax ; store RAX's value into memory at the address in RBX
The square brackets mean "the memory at this address". Without brackets you're working with the value in the register itself, even if that value happens to be an address. With brackets you're dereferencing that address, which basically means going to that address and reading or writing what's there. It's like pointers in C. What if the register doesn't hold a valid address? Well, you crash, that's what.
One thing to keep in mind: you can't move memory directly to memory.
mov [rax], [rbx] ; INVALID — assembler will reject this
You always need a register in between.
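If the bracket rule still feels abstract, here's a tiny Python model of it. This is only a sketch, not real assembly: the `regs` and `mem` dictionaries, the helper names, and the address 0x1000 are all made up for illustration.

```python
# Toy model: registers are a dict, memory is a dict keyed by address.
# The address 0x1000 and the value stored there are made up for illustration.
regs = {"rax": 0, "rbx": 0}
mem = {0x1000: 0xDEADBEEF}

def mov_reg_imm(dst, value):
    """mov dst, imm : put a constant into a register."""
    regs[dst] = value

def mov_load(dst, addr_reg):
    """mov dst, [addr_reg] : read memory at the address held in addr_reg."""
    addr = regs[addr_reg]
    if addr not in mem:
        # On a real machine this is the crash (an access violation).
        raise MemoryError(f"access violation reading {addr:#x}")
    regs[dst] = mem[addr]

def mov_store(addr_reg, src):
    """mov [addr_reg], src : write a register's value to memory."""
    addr = regs[addr_reg]
    if addr not in mem:
        raise MemoryError(f"access violation writing {addr:#x}")
    mem[addr] = regs[src]

mov_reg_imm("rbx", 0x1000)   # mov rbx, 0x1000  (RBX now holds an address)
mov_load("rax", "rbx")       # mov rax, [rbx]   (follow the address)
print(hex(regs["rax"]))      # 0xdeadbeef
```

Notice there is no helper that takes two addresses. A memory-to-memory copy has to be a `mov_load` followed by a `mov_store`, with a register in the middle.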
LEA - Load Effective Address
lea looks similar to mov but it does something different. Instead of loading the value at an address it loads the address itself.
lea rdx, [text_1] ; put the address of text_1 into RDX
mov rdx, [text_1] ; put the value AT text_1 into RDX
You saw lea in Part 1 when we loaded the string pointers for MessageBoxA. We needed the address of the string not whatever bytes happened to be at that address. That's when you use lea.
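Here's a rough Python sketch of that difference. The address 0x2000 is hypothetical (the real one is picked at link and load time), and `lea` and `mov_load_qword` are just illustrative helper names.

```python
# Hypothetical layout: pretend text_1 ended up at address 0x2000.
TEXT_1 = 0x2000
memory = {TEXT_1: b"Hello World\x00"}

def lea(address):
    """lea dst, [address] : produce the address, never touch memory."""
    return address

def mov_load_qword(address):
    """mov dst, [address] : read 8 bytes at the address (little-endian)."""
    return int.from_bytes(memory[address][:8], "little")

rdx_from_lea = lea(TEXT_1)             # a pointer MessageBoxA can use
rdx_from_mov = mov_load_qword(TEXT_1)  # the first 8 characters packed into a number

print(hex(rdx_from_lea))                   # 0x2000
print(rdx_from_mov.to_bytes(8, "little"))  # b'Hello Wo'
```

Passing the second value to MessageBoxA would make it treat the characters "Hello Wo" as an address, which is almost certainly not a valid one.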
Stack Operations
You already know RSP points to the top of the stack and that it grows downward. push and pop are how you interact with it directly.
push rax ; RSP -= 8, then stores RAX at the new RSP
pop rbx ; loads value at RSP into RBX, then RSP += 8
Every push decrements RSP by 8. Every pop increments it by 8. So when you're debugging and you see RSP change, you can divide the difference by 8 to count how many pushes have happened.
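A minimal Python model of push and pop makes the bookkeeping easy to see. The `Stack` class and the starting RSP value are assumptions for the sake of the example.

```python
class Stack:
    """Toy stack: memory is a dict keyed by address, RSP is a plain integer."""

    def __init__(self, rsp):
        self.rsp = rsp
        self.mem = {}

    def push(self, value):
        self.rsp -= 8               # RSP moves down first...
        self.mem[self.rsp] = value  # ...then the value is stored at the new RSP

    def pop(self):
        value = self.mem[self.rsp]  # read the value at RSP first...
        self.rsp += 8               # ...then RSP moves back up
        return value

stack = Stack(rsp=0x7FF0)  # made-up starting address
start = stack.rsp

stack.push(0x11)
stack.push(0x22)
pushes_outstanding = (start - stack.rsp) // 8  # 2

top = stack.pop()  # 0x22: last in, first out
```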
Arithmetic
add rax, 5 ; RAX = RAX + 5
add rax, rbx ; RAX = RAX + RBX
sub rax, 3 ; RAX = RAX - 3
sub rsp, 40 ; the shadow space allocation from Part 1
Simple enough. What matters is that arithmetic instructions affect the FLAGS register. FLAGS (RFLAGS in 64-bit mode) is a special register that records facts about the result of the last operation as individual bits. The important ones are:
- ZF (Zero Flag) — set to 1 if the result was zero
- SF (Sign Flag) — set to 1 if the result was negative
- CF (Carry Flag) — set if there was an unsigned overflow
- OF (Overflow Flag) — set if there was a signed overflow
You don't set these manually. They get set automatically whenever arithmetic or comparison instructions run, and conditional jumps read them. That's how branching works in assembly. In shellcode and related areas you rarely deal with signed values, unless of course you find a use case for them.
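To see how the four flags come out of a 64-bit add or sub, here's a sketch in Python. The function names are mine, and it only models these four flags, nothing else the CPU tracks.

```python
MASK = (1 << 64) - 1   # keep results to 64 bits
SIGN = 1 << 63         # the top bit, which is the sign bit for signed values

def flags_after_add(a, b):
    a &= MASK
    b &= MASK
    full = a + b
    r = full & MASK
    return {
        "ZF": int(r == 0),
        "SF": int(bool(r & SIGN)),
        "CF": int(full > MASK),  # carry out of bit 63
        # signed overflow: inputs share a sign, result has the other sign
        "OF": int(bool(~(a ^ b) & (a ^ r) & SIGN)),
    }

def flags_after_sub(a, b):
    a &= MASK
    b &= MASK
    r = (a - b) & MASK
    return {
        "ZF": int(r == 0),
        "SF": int(bool(r & SIGN)),
        "CF": int(a < b),        # a borrow was needed
        # signed overflow: inputs differ in sign, result sign differs from a
        "OF": int(bool((a ^ b) & (a ^ r) & SIGN)),
    }

print(flags_after_sub(5, 5))  # {'ZF': 1, 'SF': 0, 'CF': 0, 'OF': 0}
print(flags_after_sub(4, 5))  # {'ZF': 0, 'SF': 1, 'CF': 1, 'OF': 0}
```

The second line is the 4 - 5 case: the result is -1, so SF is set, and because 4 is smaller than 5 as an unsigned number, CF is set too.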
Logical Operations
and rax, rbx ; RAX = RAX AND RBX (bitwise)
or rax, rbx ; RAX = RAX OR RBX (bitwise)
xor rax, rbx ; RAX = RAX XOR RBX (bitwise)
not rax ; flip every bit in RAX
XOR deserves special attention because of one idiom you'll see everywhere in shellcode and compiled code.
xor rcx, rcx ; zero out RCX
Why use this instead of mov rcx, 0? Two reasons.
First, it's shorter in the encoding. mov rcx, 0 encodes to multiple bytes, including null bytes (0x00) for the zero immediate. Shellcode often can't contain null bytes because it is frequently delivered through string functions like strcpy, which treat null as a terminator and stop copying. xor rcx, rcx avoids this entirely.
Second, XOR of any value with itself is always zero regardless of what was in the register. It's guaranteed, and modern x86 CPUs recognize it as a zeroing idiom and handle it efficiently.
You'll see this pattern constantly. Any time you need to zero a register look for xor reg, reg.
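You can check both reasons with a few lines of Python. The byte strings below are one common encoding of each instruction; an assembler may pick a different, shorter form for mov rcx, 0, but the zero immediate still has to be stored as 0x00 bytes. `c_string_copy` is a made-up stand-in for what strcpy does with the payload.

```python
# One common encoding of each instruction (assumed here; your assembler's
# output may differ for the mov).
MOV_RCX_0 = bytes.fromhex("48 c7 c1 00 00 00 00")  # mov rcx, 0
XOR_RCX_RCX = bytes.fromhex("48 31 c9")            # xor rcx, rcx

def c_string_copy(payload):
    """Model of strcpy: copy bytes up to (not including) the first null."""
    return payload.split(b"\x00")[0]

print(len(MOV_RCX_0), len(XOR_RCX_RCX))  # 7 3
print(c_string_copy(MOV_RCX_0).hex())    # 48c7c1 (truncated: the instruction is broken)
print(c_string_copy(XOR_RCX_RCX).hex())  # 4831c9 (survives intact)

# Reason two: any value XORed with itself is zero.
samples = [0, 1, 0xDEADBEEF, (1 << 64) - 1]
print(all(v ^ v == 0 for v in samples))  # True
```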
Comparisons and Conditional Jumps
This is where FLAGS becomes important. cmp subtracts one value from another but throws away the result. It only keeps the FLAGS side effects.
cmp rax, 5 ; compute RAX - 5, discard result, update FLAGS
After cmp you use a conditional jump to act on the result.
cmp rax, 5
je equal_label ; jump if ZF=1 (result was zero, meaning RAX == 5); jz is the same instruction
jne not_equal ; jump if ZF=0 (RAX != 5); jnz is the same instruction
If RAX is 5 then the zero flag (ZF) is set to 1 because the result of the subtraction is, well, zero. If RAX were 4 the result would be -1, which isn't zero, so ZF would not be set.
jmp is the unconditional version — it always jumps.
jmp some_label ; always go here
Here's a simple loop in assembly. It adds one to RCX until it reaches five, then returns.
xor rcx, rcx ; counter = 0
loop_start:
cmp rcx, 5 ; Is rcx = 5?
je loop_end ; if true, exit loop, else continue
inc rcx ; Increments rcx by 1
jmp loop_start ; Jumps to loop_start until condition is met
loop_end:
ret
Keep in mind I don't have to use shadow space or alignment here because I'm not calling any Windows functions.
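If it helps, here's the same control flow written as Python, with each line matched to the instruction it stands in for. It's a sketch of the logic only; `run_loop` and `trace` are names I made up.

```python
def run_loop(limit=5):
    rcx = 0                        # xor rcx, rcx
    trace = []                     # not in the assembly, just records each step
    while True:                    # loop_start:
        zf = int(rcx - limit == 0) #   cmp rcx, 5  (only the flag is kept)
        if zf:                     #   je loop_end
            break
        rcx += 1                   #   inc rcx
        trace.append(rcx)
                                   #   jmp loop_start
    return rcx, trace              # loop_end: ret

final_rcx, trace = run_loop()
print(final_rcx, trace)  # 5 [1, 2, 3, 4, 5]
```

In a high-level language you'd just write a for loop. The assembly version is the same thing with the compare, the exit branch, and the jump back spelled out by hand.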
Call and Ret
If you read Part 1 you will have noticed that the code I shared used the call instruction, and I just used ret a minute ago. It's time to explain them in more depth.
call SomeFunction
This is conceptually equivalent to the following (pseudocode, since RIP can't be pushed directly):
push <address of next instruction> ; push the return address
jmp SomeFunction ; jump to the function
The return address is the address of the instruction immediately after the call. When the function finishes it uses ret, which pops that address off the stack and jumps to it. This is why RSP has to be correct when ret executes: if something corrupted the stack, the return address is wrong and execution goes somewhere unexpected. Classic stack buffer overflow exploitation works by corrupting that return address intentionally.
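Here's a toy Python model of call and ret to make the stack traffic visible. All addresses are invented, and the 5-byte instruction size assumes a direct near call; other forms of call have different sizes.

```python
class CPU:
    """Toy CPU with just RIP, RSP, and a stack stored as a dict."""

    def __init__(self, rip, rsp):
        self.rip = rip
        self.rsp = rsp
        self.stack = {}

    def call(self, target, instruction_size=5):
        return_address = self.rip + instruction_size  # instruction after the call
        self.rsp -= 8
        self.stack[self.rsp] = return_address         # push the return address
        self.rip = target                             # jump to the function

    def ret(self):
        self.rip = self.stack[self.rsp]               # pop the return address...
        self.rsp += 8
                                                      # ...and continue from there

cpu = CPU(rip=0x1000, rsp=0x8000)   # made-up addresses
cpu.call(0x2000)                    # call SomeFunction
saved = cpu.stack[cpu.rsp]          # 0x1005: where ret will go
cpu.ret()
print(hex(cpu.rip))                 # 0x1005

# If something overwrites the saved slot before ret, ret goes there instead.
cpu.call(0x2000)
cpu.stack[cpu.rsp] = 0x41414141     # simulated stack corruption
cpu.ret()
print(hex(cpu.rip))                 # 0x41414141
```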
Putting It Together
Here's an extended version of the Hello World from Part 1, this time with a loop that shows the message box twice.
BITS 64
default rel
global main
extern ExitProcess
extern MessageBoxA
section .data
text_1 db "Hello World", 0
text_2 db "Hello from Mirrai", 0
section .text
main:
sub rsp, 40 ; shadow space + alignment
xor r12, r12 ; Set r12 to zero. Our counter register
loop_start:
cmp r12, 2 ; check if r12 == 2
je loop_end ; if so, exit loop
xor rcx, rcx ; hWnd = NULL
lea rdx, [text_1] ; lpText
lea r8, [text_2] ; lpCaption
mov r9, 1 ; uType = MB_OKCANCEL
call MessageBoxA
inc r12 ; increments r12 by 1
jmp loop_start
loop_end:
xor rcx, rcx
call ExitProcess
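Notice we used R12 for the counter instead of RCX. RCX is volatile and is also the first argument register, so it gets overwritten on every iteration. R12 is non-volatile, so MessageBoxA has to preserve it.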
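A quick Python sketch shows what goes wrong with a volatile counter. `fake_message_box` is a stand-in I made up: it models the worst case the calling convention allows, where a callee overwrites every volatile integer register. The cap of 10 calls exists only so the broken case terminates.

```python
# Volatile integer registers in the Windows x64 calling convention.
VOLATILE = ("rax", "rcx", "rdx", "r8", "r9", "r10", "r11")

def fake_message_box(regs):
    """Stand-in for an API call: free to trash every volatile register."""
    for name in VOLATILE:
        regs[name] = 0xCCCCCCCC  # arbitrary garbage
    regs["rax"] = 1              # pretend return value
    # r12 is deliberately left alone: a callee must preserve it

def count_calls(counter_reg, cap=10):
    regs = {name: 0 for name in VOLATILE + ("r12",)}
    calls = 0
    while regs[counter_reg] != 2 and calls < cap:  # cmp counter, 2 / je loop_end
        fake_message_box(regs)                     # call MessageBoxA
        calls += 1
        regs[counter_reg] += 1                     # inc counter
    return calls

print(count_calls("r12"))  # 2  (the loop ends as intended)
print(count_calls("rcx"))  # 10 (the counter was trashed, only the cap stops it)
```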
Load this in x64dbg. Step through it and watch R12 increment. Watch RSP change when sub rsp, 40 reserves the shadow space and when you step into a call. Watch RIP move through the loop. This is how you internalize assembly.
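While you're watching RSP, it's worth checking the "shadow space + alignment" arithmetic behind sub rsp, 40 yourself. Here's a small sketch. The starting RSP is a made-up value, assumed to be 16-byte aligned right before main is called, as the convention requires.

```python
def rsp_after_prologue(rsp_before_call, frame):
    """RSP once the caller's call and our own sub rsp, frame have both run."""
    rsp_at_entry = rsp_before_call - 8  # call pushed an 8-byte return address
    return rsp_at_entry - frame         # sub rsp, frame

caller_rsp = 0x14FF00  # hypothetical, 16-byte aligned

good = rsp_after_prologue(caller_rsp, 40)  # 32 bytes shadow space + 8 padding
bad = rsp_after_prologue(caller_rsp, 32)   # shadow space only

print(hex(good), good % 16)  # 0x14fed0 0
print(hex(bad), bad % 16)    # 0x14fed8 8
```

With 40 the stack is 16-byte aligned again by the time we call MessageBoxA. With only 32 it would be off by 8, which is why the extra 8 bytes are there.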
ASM Cheat-sheet
| Instruction | What it does |
|---|---|
| mov dst, src | copy src into dst |
| lea dst, [addr] | load the address itself into dst |
| push reg | RSP -= 8, then store reg at [RSP] |
| pop reg | load the value at [RSP] into reg, then RSP += 8 |
| add dst, src | dst = dst + src |
| sub dst, src | dst = dst - src |
| xor dst, dst | zero dst |
| cmp a, b | set FLAGS based on a - b |
| jmp label | unconditional jump |
| je/jz, jne/jnz | conditional jumps (taken if ZF=1 / ZF=0) |
| call func | push return address, jump to func |
| ret | pop return address, jump to it |
| inc reg | reg = reg + 1 |
| dec reg | reg = reg - 1 |
What's Next
Practice. It might seem hard at first but trust me it gets easier with time. All you need is persistence. As usual leave questions in the comments and see ya next time.