Mirrai

Posted on Jun 13 • Edited on Jul 19

x64 Windows Assembly Fundamentals Part 2: Learning the Language

#beginners #tutorial #infosec #security

Hello everyone. Mirrai here. In Part 1 we covered registers, the Windows x64 calling convention, shadow space, and how RSP and RIP work. If you haven't read it, I recommend starting there. Today we're going to cover the actual instructions of assembly. By the end you'll be able to read most of what a debugger shows you and understand what each instruction is doing and why. Keep in mind there are a lot of other instructions I won't cover here; these are the basic ones you'll see most often.

With that said, let's get into it.

Moving Data Around

The most common instruction you'll see is mov. It copies a value from a source to a destination. That's it.

mov rax, 5          ; put the value 5 into RAX
mov rbx, rax        ; copy RAX into RBX
mov rax, [rbx]      ; load the value at the memory address in RBX into RAX
mov [rbx], rax      ; store RAX's value into memory at the address in RBX

The square brackets mean "the memory at this address". Without brackets you're working with the value in the register itself. With brackets you're treating that value as an address and dereferencing it, which basically means going to that address and reading or writing what's there. It's like pointers in C. What if RBX doesn't hold a valid address? Well, you crash, that's what.

One thing to keep in mind: you can't move memory directly to memory.

mov [rax], [rbx]    ; INVALID — assembler will reject this

You always need a register in between.
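
As a minimal sketch of that rule, here is how an 8-byte value could be copied from one memory location to another. Using RCX as the scratch register is just a choice for this example; any free general-purpose register works.

```nasm
; Assumes RBX holds a readable address and RAX holds a writable one.
mov rcx, [rbx]      ; load 8 bytes from the address in RBX into RCX
mov [rax], rcx      ; store those 8 bytes at the address in RAX
```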

LEA - Load Effective Address

lea looks similar to mov but it does something different. Instead of loading the value at an address it loads the address itself.

lea rdx, [text_1]   ; put the address of text_1 into RDX
mov rdx, [text_1]   ; put the value AT text_1 into RDX

You saw lea in Part 1 when we loaded the string pointers for MessageBoxA. We needed the address of the string not whatever bytes happened to be at that address. That's when you use lea.
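
lea only performs the address calculation and never reads memory. Here's a small sketch of that, assuming a made-up table of 8-byte values called table (it is not part of the Part 1 code).

```nasm
default rel

section .data
table   dq 10, 20, 30       ; hypothetical data: three 8-byte values

section .text
    lea rbx, [table]        ; RBX = address of table, nothing is read
    lea rdx, [rbx + 16]     ; RDX = address of the third entry, still no memory read
    mov rax, [rdx]          ; now memory is read: RAX = 30
```

The first two instructions only compute addresses. The mov at the end is the only one that touches memory.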

Stack Operations

You already know RSP points to the top of the stack and that it grows downward. push and pop are how you interact with it directly.

push rax        ; RSP -= 8, then stores RAX at the new RSP
pop rbx         ; loads value at RSP into RBX, then RSP += 8

Every push decrements RSP by 8. Every pop increments it by 8. This is why when you're debugging and you see RSP changing you can count how many pushes have happened.
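
The stack is last in, first out, so values come back in the reverse order they went in. A short sketch with arbitrary values:

```nasm
mov rax, 1
mov rbx, 2
push rax        ; RSP -= 8, top of stack is 1
push rbx        ; RSP -= 8, top of stack is 2
pop rax         ; RAX = 2, RSP += 8
pop rbx         ; RBX = 1, RSP += 8
; RSP is back where it started, and RAX and RBX have swapped values
```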

Arithmetic

add rax, 5      ; RAX = RAX + 5
add rax, rbx    ; RAX = RAX + RBX
sub rax, 3      ; RAX = RAX - 3
sub rsp, 40     ; the shadow space allocation from Part 1

Simple enough. What matters is that arithmetic instructions affect the FLAGS register (called RFLAGS in 64-bit mode). FLAGS is a special register that stores information about the result of the last operation as individual bits. The important ones are:

ZF (Zero Flag) — set to 1 if the result was zero
SF (Sign Flag) — set to 1 if the result was negative
CF (Carry Flag) — set if there was an unsigned overflow
OF (Overflow Flag) — set if there was a signed overflow

You normally don't set these manually. They get updated automatically whenever arithmetic, logical, or comparison instructions run, while mov and lea leave them untouched. Conditional jumps read them. That's how branching works in assembly. In shellcode and related areas you rarely deal with signed values, unless you find a specific use case for them.
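
To make the flags concrete, here is a small sketch using the 8-bit register AL so the numbers stay small. The values are picked purely for illustration.

```nasm
mov al, 0xFF
add al, 1       ; result wraps to 0x00: ZF=1, CF=1 (unsigned overflow), SF=0, OF=0

mov al, 0x7F
add al, 1       ; result is 0x80: SF=1, OF=1 (signed 127 + 1 overflowed), ZF=0, CF=0

mov al, 5
sub al, 5       ; result is 0: ZF=1, SF=0, CF=0, OF=0
```

Notice that the same add can be an overflow in the unsigned view, the signed view, both, or neither. The CPU sets CF and OF every time and leaves it to you to read the one that matches how you are interpreting the value.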

Logical Operations

and rax, rbx    ; RAX = RAX AND RBX (bitwise)
or  rax, rbx    ; RAX = RAX OR RBX  (bitwise)
xor rax, rbx    ; RAX = RAX XOR RBX (bitwise)
not rax         ; flip every bit in RAX

XOR deserves special attention because of one idiom you'll see everywhere in shellcode and compiled code.

xor rcx, rcx    ; zero out RCX

Why use this instead of mov rcx, 0? Two reasons.

First, it's shorter in the encoding. mov rcx, 0 encodes to several bytes, including null bytes (0x00), because the zero is stored in the instruction as an immediate. Shellcode often can't contain null bytes because many string functions like strcpy treat null as a terminator and will stop copying. xor rcx, rcx avoids this entirely.

Second, XOR of any value with itself is always zero regardless of what was in the register. It's guaranteed and the CPU handles it efficiently.

You'll see this pattern constantly. Any time you need to zero a register look for xor reg, reg.
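
For reference, here are the machine-code bytes for a few ways of zeroing RCX. The bytes shown for mov rcx, 0 are the straightforward 64-bit encoding; an assembler may pick a shorter form depending on its optimization settings, so treat that line as an assumption and check your own listing output.

```nasm
xor rcx, rcx    ; 48 31 C9              3 bytes, no nulls
xor ecx, ecx    ; 31 C9                 2 bytes, no nulls, writing ECX also clears the upper half of RCX
mov ecx, 0      ; B9 00 00 00 00        5 bytes, four nulls
mov rcx, 0      ; 48 C7 C1 00 00 00 00  7 bytes, four nulls (unoptimized form)
```

Compilers usually emit the 32-bit form xor ecx, ecx, so don't be surprised when that is what the debugger shows. It zeroes the full 64-bit register just the same.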

Comparisons and Conditional Jumps

This is where FLAGS becomes important. cmp subtracts one value from another but throws away the result. It only keeps the FLAGS side effects.

cmp rax, 5      ; compute RAX - 5, discard result, update FLAGS

After cmp you use a conditional jump to act on the result.

cmp rax, 5
je  equal_label     ; jump if ZF=1 (result was zero, meaning RAX == 5); jz is the same instruction
jne not_equal       ; jump if ZF=0 (RAX != 5); jnz is the same instruction

If RAX is 5, the zero flag (ZF) is set to 1 because the result of the subtraction is, well, zero. If RAX were 4, the result would be -1, which isn't zero, so ZF would not be set.

jmp is the unconditional version — it always jumps.

jmp some_label      ; always go here
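
Combining a conditional jump with jmp gives you the assembly version of if/else. This is a sketch with made-up label names and arbitrary values in RBX:

```nasm
    cmp rax, 5
    jne not_five        ; RAX != 5: skip the "if" body
    mov rbx, 1          ; "if" body: runs only when RAX == 5
    jmp done            ; skip over the "else" body
not_five:
    mov rbx, 2          ; "else" body: runs only when RAX != 5
done:
    ; execution continues here in both cases
```

Without the jmp done, execution would fall straight through from the "if" body into the "else" body, because labels don't stop anything. They are just names for addresses.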

Here's a simple loop in assembly. It adds one to RCX until it reaches five, then returns.

xor rcx, rcx            ; counter = 0

loop_start:
    cmp rcx, 5         ; is RCX == 5?
    je loop_end        ; if so, exit the loop; otherwise continue
    inc rcx            ; increment RCX by 1
    jmp loop_start     ; go back to the top and check again

loop_end:
    ret

Keep in mind I don't have to use shadow space or alignment here because I'm not calling any Windows functions.
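
The same idea can be written counting down, which lets the flags from dec do the comparison for us. This is a sketch of an alternative, not a replacement for the loop above, and the label name is made up for the example.

```nasm
    mov rcx, 5          ; counter = 5

count_down:
    ; loop body would go here; it runs 5 times
    dec rcx             ; RCX -= 1, sets ZF=1 when RCX reaches zero
    jnz count_down      ; keep looping while RCX != 0

    ret
```

Because dec updates ZF on its own, no separate cmp is needed.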

Call and Ret

If you read Part 1 you would have noticed the code I shared used the call instruction, and I just used ret a minute ago. It's time to explain them in more depth.

call SomeFunction

This is roughly equivalent to the following pseudocode (you can't actually push RIP like this):

push address_of_next_instruction    ; push the return address
jmp SomeFunction                    ; jump to the function

The return address is the address of the instruction immediately after the call. When the function finishes it uses ret, which pops that address off the stack and jumps to it. This is why RSP has to be correct when ret executes: if something corrupted the stack, the return address is wrong and execution goes somewhere unexpected. Classic stack buffer overflow exploitation works by corrupting that return address intentionally.
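
Here's a minimal sketch of call and ret working together. The function name add_one is made up for this example, and since no Windows functions are called there is no shadow space.

```nasm
main:
    mov rax, 41
    call add_one        ; pushes the address of the next instruction, jumps to add_one
    ret                 ; add_one returns to this line with RAX = 42; this ret then returns to main's caller

add_one:
    inc rax             ; RAX = RAX + 1
    ret                 ; pops the return address pushed by call and jumps to it
```

If you step through this in a debugger, RSP drops by 8 on the call and goes back up by 8 on the ret inside add_one.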

Putting It Together

Here's an extended version of the Hello World from Part 1, this time with a loop that shows the message box twice.

BITS 64
default rel
global main

extern ExitProcess
extern MessageBoxA

section .data
text_1  db "Hello World", 0
text_2  db "Hello from Mirrai", 0

section .text
main:
    sub rsp, 40             ; shadow space + alignment
    xor r12, r12            ; Set r12 to zero. Our counter register

loop_start:
    cmp r12, 2              ; check if r12 == 2
    je loop_end             ; if so, exit loop

    xor rcx, rcx            ; hWnd = NULL
    lea rdx, [text_1]       ; lpText
    lea r8,  [text_2]       ; lpCaption
    mov r9,  1              ; uType = MB_OKCANCEL
    call MessageBoxA

    inc r12                 ; increments r12 by 1
    jmp loop_start

loop_end:
    xor rcx, rcx
    call ExitProcess

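Notice we used R12 for the counter instead of RCX. RCX is volatile, and we overwrite it for the first argument anyway. R12 is non-volatile, so MessageBoxA won't trash it. Strictly speaking, non-volatile also means main should save and restore R12 for its own caller; we get away with skipping that here because ExitProcess never returns.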
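
A function that does return to its caller is expected to preserve non-volatile registers like R12 under the Windows x64 calling convention. Here's a sketch of how the same program might look if main returned normally instead of calling ExitProcess. It assumes main is called by something that handles the return, for example a C runtime startup routine, so how you link it depends on your setup.

```nasm
BITS 64
default rel
global main

extern MessageBoxA

section .data
text_1  db "Hello World", 0
text_2  db "Hello from Mirrai", 0

section .text
main:
    push r12                ; save the caller's R12; RSP is now 16-byte aligned
    sub rsp, 32             ; shadow space only, the push took care of alignment
    xor r12, r12            ; counter = 0

loop_start:
    cmp r12, 2              ; check if r12 == 2
    je loop_end             ; if so, exit loop

    xor rcx, rcx            ; hWnd = NULL
    lea rdx, [text_1]       ; lpText
    lea r8,  [text_2]       ; lpCaption
    mov r9,  1              ; uType = MB_OKCANCEL
    call MessageBoxA

    inc r12
    jmp loop_start

loop_end:
    xor eax, eax            ; return value 0
    add rsp, 32             ; release the shadow space
    pop r12                 ; restore the caller's R12
    ret
```

The push moves RSP by 8, which happens to fix the alignment that the extra 8 bytes in sub rsp, 40 were there for. That is why only 32 bytes are reserved here.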

Load this in x64dbg. Step through it and watch R12 increment. Watch RSP change when sub rsp, 40 allocates the shadow space and again on every call and ret. Watch RIP move through the loop. This is how you internalize assembly.

ASM Cheat-sheet

Instruction	What it does
`mov dst, src`	copy src into dst
`lea dst, [addr]`	load address into dst
`push reg`	RSP -= 8 then store reg at [RSP]
`pop reg`	load value at [RSP] into reg then RSP += 8
`add dst, src`	dst = dst + src
`sub dst, src`	dst = dst - src
`xor dst, dst`	set dst to zero
`cmp a, b`	set FLAGS based on a - b
`jmp label`	unconditional jump
`je/jz - jne/jnz`	conditional jumps (jump if ZF=1 - jump if ZF=0)
`call func`	push return addr, jump to func
`ret`	pop return addr, jump to it
`inc reg`	reg = reg + 1
`dec reg`	reg = reg - 1

What's Next

Practice. It might seem hard at first but trust me it gets easier with time. All you need is persistence. As usual leave questions in the comments and see ya next time.

DEV Community