DEV Community

Cover image for picoCTF "MATRIX" Walkthrough (Caution: an extremely lengthy post)
SJ W
SJ W

Posted on • Updated on

picoCTF "MATRIX" Walkthrough (Caution: an extremely lengthy post)

Introduction

For this post, we are going to tackle one of the hardest challenges when it comes to the category of Reverse Engineering on picoCTF that is worth 500 points. Certainly, it was a lot harder and took more time to understand and ultimately obtain the flag than two previous challenges I completed, but it was a pretty great learning experience I would say, which taught me a good deal about the mindset and technique I should use to approach reverse engineering in general. Since the challenge is a lot harder than others, the post may be a bit longer than the usual to explain some of the findings, but I will do my best to explain things concisely and right to the point. Without further ado, let's dive in to it!

1. Analyzing the Binary

Start

Just with two previous challenges, this requires one to download the binary and examine the content inside. Let's download and import it to Ghidra for analysis.

Image description

Image description

  • Here are various details regarding the binary. In some cases, there may be some noteworthy details that might aid one in finding out what one is working with, but I don't really see anything too interesting in it, so we move on to the next phase where Ghidra commences analysis.

Image description

  • Let's use the default options for the analysis of the binary.

Image description

  • Here we are in the part of Ghidra for examining the innards of the binary. You can check the bottom-right corner to check the progress of an analysis. Once it's done, we start analyzing it ourselves!

"Main" Function

So, the very first thing I do is to check a list of functions that might give away some hints on how we should approach the challenge. Let's check out what kind of functions it has.

Image description

Image description

I don't even see anything too noteworthy that piques my interest, and in fact, there is no main function that we usually encounter in a binary written in C. Hmm, I think this binary is written in C and compiled with GCC, but I don't see any function called main. However, there is a function called __libc_start_main, which is called inside the function entry. Let's check out the function entry.

Image description

The function __libc_start_main is provided with the total 7 arguments. There are a few interesting arguments worth checking out. In Ghidra, any function that cannot be named, due to it being stripped during the compilation process or for some reason, gets assigned a random name that starts with the substring FUN_. As you can see, there are a few arguments whose name starts with the substring FUN_. So what one can deduce is that one of the functions passed to the function __libc_start_main is called when the program starts? Rummaging through the Internet, I come across this help page for the function __libc_start_main.

Image description

The page basically states that the function __libc_start_main is called to initialize the environment for a process, prior to executing the main function we call. The very first argument passed to the function is actually a function pointer for the main function that will be executed, once the preparation for running a process is done. The name of the function we must check out is FUN_001010d0. Let's move on to checking out the very function FUN_001010d0.

Image description

Image description

Seems like we are at the right place to start figuring out how things work!

  • In the screenshot right above this paragraph, line 39 outputs the string Have a flag!, which subsequently in the next line outputs the flag we need to complete the challenge. Since we are in the initial stage of analysis, it is still very unclear what we should do to lead the binary to output the flag in general. Let's investigate the function for more clues.
  • From line 32 to 34, it repeatedly calls the function FUN_00101350 inside the do...while block. The function FUN_00101350 is provided with total two arguments, &local_168 and local_16c. In line 37, it checks the first value of local_16c to see if it amounts to a NULL byte (0), and once it checks that it is indeed a NULL byte, it proceeds to the next line to check whether the value of the variable local_16a is a NULL byte as well. Based on the information I laid out, you can presume that the variable local_16c is passed to the function FUN_00101350, and somehow gets modified after each execution. The character & indicates that it accepts the address of the variable local_168. So what does the variable &local_168, the first argument of the function FUN_00101350, contain?

Image description

  • The variable local_16a gets assigned the address of the data DAT_001020f0. Since its name starts with the substring DAT_, we can presume that it may be pointing to a constant data that is located somewhere within the binary. Let's check out what data it contains.

Image description

Okay, we end up at the address of 0x1020f0, and it contains a byte, which is equal to the value of 0x81. And in the next bytes, the values 0x75, 0x00, 0x80 are laid out. Still not sure about its purpose. After further investigating and scrolling around, I find something interesting.

Image description

  • Let's read this thing from bottom to top: m a k e i t o u t a l i v e ?.

Image description

  • Another interesting artifacts that read like the following, when you read it from bottom to top: Congratulations,.

Actually, let's run the binary in a VM to check out how it works at face value.

Image description

  • Okay, at the outset, it actually outputs the message WELCOME to the M A T R I X \n Can you make it out alive?. Did you notice that at the last part of the output, it contains the substring make it out alive? that we found during the analysis of the binary?
  • Subsequently, I input a random string to see what happens, and press ENTER, and am instantly greeted with the message You were eaten by a grue.
  • So, I basically have to provide the right password to the binary to eventually obtain the flag. What's the right password? Let's go back to Ghidra and investigate further statically.

Exploring the function "FUN_00101350"

Image description

Back to Ghidra! Since the function FUN_00101350 seems to be the one that dictates the overall flow of the binary, it's time to check out its contents.

Image description

Image description

Okay, so this is quite a lot to digest, with nearly 200 lines of code welcoming us right before our eyes. Nonetheless, we have to go over the entirety of the function, in order to figure out what's going on.

Image description

Here are some references that you should know prior to continuing. These will be referenced throughout the post, and will greatly help you understand how things work basically:

  • param_1[0] (param_1 + 0 bytes == 0x00007fffffffdfc0) - holds the address 0x1020f0 (0x00005555555560f0), from which the real program starts. When the function FUN_00101350 gets invoked, the binary passes the address 0x00007fffffffdfc0 to the FUN_00101350 as the first argument.
  • param_1[1] (param_1 + 8 bytes == 0x00007fffffffdfc8) - holds the offset of two bytes, with which the address, at which the current command is located, is calculated, by adding 0x1020f0 to the offset.
  • param_1[2] (param_1 + 16 bytes == 0x00007fffffffdfd0) - holds the address of the top of one of the makeshift stacks. This stack gets used the most often. The starting address of the stack is at
  • param_1[3] (param_1 + 24 bytes == 0x00007fffffffdfd8) - holds the address of the top of one of the makeshift stacks. This stack gets used the least often and is mainly for temporarily storing values needed in another stack.

Image description

  • The register r12 stores the lowest address of the stack at param_1[3], which is 0x0000555555559ab0.
  • The register r13 stores the lowest address of the stack at param_1[2], which is 0x00005555555592a0.
    • Every item on the stack is 2 bytes worth, which means that an item occupies 4 digits at once. And also, considering the fact that the binary is in little-endian, the value starts from the right, so you can read like 0x0001|0001|0000|0061. In the screenshot above, the address at 0x00005555555592a6 holds the value 0061, which is basically the hexadecimal value of a in ASCII. a is the very first character of the input I provided to the binary. So I assume that every character gets placed at the exact address of 0x00005555555592a6 for processing. What about 0x0001|0001|0000| in the front? We will see what they represent.

Image description

  • At this point, all we know is that the function receives two arguments, the first one being a pointer to another pointer - &local_168 gets passed as the first argument, and & indicates the memory address of the variable local_168. local_168, by the way, is assigned an address of 0x1020f0 prior to entering this function - local_168 = &DAT_001020f0; - so we know for sure that it is a pointer.
  • In line 17, the binary dereferences the pointer param_1, whose value is 0x00007fffffffdfc0, and assigns it to the variable lVar4. lVar4 ends up with the address of 0x1020f0.
  • In line 18, it adds 1 to the argument param_1, which becomes 0x00007fffffffdfc8. Since param_1 is a pointer of the long type (8 bytes), when you add 1 to it, it is equal to adding 8. And then it change its type from long * to ushort *, which means that it turns into a pointer that can reference up to two bytes only. And lastly, it dereferences the pointer - the first two bytes from the address of param_1 + 1 (0x00007fffffffdfc8) - and assigns it to the variable uVar2.
  • In line 19, the binary adds the value of the variable uVar2 to 1, and assigns it to the variable uVar9.
  • In line 20, the binary assigns the value of uVar9 to the address param_1 + 1 (0x00007fffffffdfc8). The value at param_1 + 1 increments.
  • In line 21, it adds the address 0x1020f0 (0x00005555555560f0) stored in the variable lVar4 to the value of uVar2, which results in the address bigger than 0x1020f0, subsequently converts its type from long * to byte *, and lastly dereference it to fetch a byte from the calculated address.
  • Summary - The variable uVar2 in line 18 holds the offset, which in line 21 is added to the address 0x1020f0 and subsequently dereferenced for a command that needs to be executed. Let's see what kind of commands there are by going through the screenshots below.

Image description

Image description

Image description

Image description

Image description

Going through the screenshots, you may notice that there are some notable case clauses with the following values:

  • 0 (0x00)
  • 1 (0x01)
  • 0x10 - 0x14
  • 0x20 - 0x21
  • 0x30 - 0x34
  • default - 0x80 - 0x81, 0xc0 - 0xc1

Each case clause performs different functions for the binary, but one thing in common is that the most of them use the line at the beginning to fetch the address of the top of the stack at param_1[2]. The binary actually utilizes two makeshift stacks to store various data needed in calculation. The memory address at param_1[2] stores the top address of one of the makeshift stacks that the binary uses. Stacks used in this binary consist of values up to 2 bytes each, so when the binary pushes or pops the stack, the address of the top of the stack increases or decreases by two bytes.

Explaining what each value does in detail may be way too time-consuming, so instead, I will give you the brief explanations of what each does.

  • 0 - It doesn't do anything

  • 1 - Executing this results in the end of the binary. This is where the binary determines whether you have successfully finished the challenge.

  • 0x10 - Copies the value at the top of the stack (-0x2) and pushes to the stack at param_1[2], basically duplicating the value. The value at param_1[2] increases by 2.

  • 0x11 - Pops the stack, whose value at param_1[2] decreases by 2.

  • 0x12 - Adds the value at the top of the stack (-0x2) to the value of the item right below (-0x4), and store it at the offset (-0x4). Then pops the stack.

  • 0x13 - Subtracts the value at the top of the stack (-0x2) to the value of the item right after (-0x4), and store it at the offset (-0x4). Then pops the stack.

  • 0x14 - Swap the value at the top of the stack (-0x2) with one right after (-0x4).

  • 0x20 - This involves two stacks - the one at param_1[2] and param_1[3]. The top of the stack param_1[2] pops and pushes to the stack at param_1[3].

  • 0x21 - This involves two stacks - the one at param_1[2] and param_1[3]. The top of the stack param_1[3] pops and pushes to the stack at param_1[2].

  • [0x30] - Pops the top of the stack at param_1[2] and puts the offset at param_1[1]. This is where we increments and stores the value in line 20 of the function. It basically changes the flow of the program.

  • [0x31] - Pops the top of the stack at param_1[2] TWICE. If the second popped item has the value of 0x0000, then change the offset at param_1[1] to that of the first popped item.

  • 0x80 - Pushes the value of a byte to the stack param_1[2]. Increases the offset at param_1[1] by 2.

  • 0x81 - Pushes the value of two bytes to the stack param_1[2]. Increases the offset at param_1[1] by 3.

  • 0xc0 - Accepts the input from a user.

  • 0xc1 - Outputs whatever is on the stack param_1[2].

Most of the commands above only increment the offset at param_1[1] by a number from one to three at most, but ones that are surrounded by square brackets (0x30 and 0x31) drastically change the offset using the value stored on the stack param_1[2], which means that they kind of act like the JMP instruction in Assembly.

Okay, so basically the offset at param_1[1] starts with the value 0, which means that if you add the value 0 to the base address 0x1020f0, it results in the address of 0x1020f0. The very first command located at that address is the following:

Image description

  • There is a byte value of 0x81 at the address 0x1020f0. Revisiting the explanation on the command related to the byte 0x81, we know that it has to do with pushing the value of two bytes to the stack param_1[2]. So which value does it push to the stack? It combines the next two bytes (0x1020f1 => 0x75, 0x1020f2 => 0x00) after the current address 0x1020f0, pushes the very combined value 0x0075 to the stack param_1[2], and lastly, increases the offset at param_1[1] by 3, which results in the next command at the address 0x1020f3 => 0x80.

The very next command 0x80 at 0x1020f3 is also really similar to that of 0x81, albeit it pushes only the value of one byte located right after the given address to the top of the stack, and increases the offset at param_1[1] by 2. So why does it try to achieve from pushing a bunch of bytes to the stack? When you run the binary, it actually outputs the intro message before you have to input an answer. Basically, it is preparing to output the message Welcome to the M A T R I X\nCan you make it out alive?!

Image description

Once it fully pushes all the necessary bytes to the stack, it eventually reaches the address 0x102161, which holds the byte 0x81. It pushes the value 0x013b to the top of the stack, increases the offset by 3, and proceeds to the next command 0x30 at 0x102164. At this point, the very value at the top of the stack at param_1[2] is 0x013b. The command 0x30 pops the stack at param_1[2] to param_1[1]. The next command after this will be calculated by adding 0x013b at param_1[1] to the base address 0x1020f0, which results in the address 0x10222b.

Having learned how the binary works, one thing that comes to my mind is basically this: At which address does it accept our input? Let's go back to the dynamic analysis of the binary and check the address at which we wind up for providing our answer to the binary.

One way to figure out the offset at which the command for accepting an input from a user is to stop the process upon the invocation of the function that accepts the input! Upon going through Ghidra, we discover the following code.

Image description

  • Inside the main function FUN_001010d0, we discover the lines above, at which the pointers to some unknown functions are passed to the variables. Let's click each function and where we end up at.

Image description

Clicking the function FUN_00101320, we end up at the address 0x101320. Inside it, we discover the line that calls the function getc whose argument is stdin. The function getc alongside stdin as the argument causes the program to stop and accept an input from a user! So what we have to do right now is to stop the program upon the process calling the function getc. And then skipping a few steps, we end up finding out the exact address, at which the command for accepting an input from a user gets invoked.

Image description

Image description

The process stops upon entering the function getc. With that, I type the GDB command next multiple times, until the program accepts an input from a user. I type in asdfasdfasdfasdf and press ENTER to continue.

Image description

Eventually, the process exits the function getc and returns back to the function FUN_00101320. One thing to note is that the instruction for calling the function getc is located at 0x55555555532b inside the function FUN_00101320. Let's briefly go back to check out the line that is responsible for calling the function FUN_00101320. In order to do that, we go back to GDB and type the command si a few times, until we exit the function FUN_00101320.

Image description

We wind up at the address 0x5555555555c5. Going back to Ghidra and searching for any address that ends with 5c5, we find the following:

Image description

The last two screenshots contain the identical instructions, and upon clicking the instruction at the address of 001015c5 in Ghidra, we learn that the command c0 is responsible for accepting an input from a user.

Image description

Image description

Image description

Image description

The very next offset at param_1[1] after accepting an input from a user is 0x7c, and it is stored in the rax register. We know this for sure, for the fact that inside the function FUN_00101350, it passes the offset to the variable uVar2. When you click the variable uVar2 and checks the assembly code to see which line is associated with the line, you see that the instruction at the address 00101357 passes the value to the rax register.

Processing input from a user

Adding the offset 0x7c to the base address 0x1020f0 results in the address 0x10216c.

Image description

Let's basically go through what is going on above. The only valid characters that the binary accepts for an input are the followings:

  • u
  • d
  • l
  • r

Providing a character other than those above results in the binary outputting the message You were eaten by a grue. This is how it works.

  1. After accepting an input, the binary pushes the first character of your input to the stack at param_1[2].
  2. It first pushes the value 0x75, a hex value for the character u, to the top of the stack at param_1[2].
  3. Using the command 0x13, it subtracts the first character of your input from the value at the top of the stack, which is 0x75, pops the stack, and replaces the value at the top of the stack with the calculation result.
  4. It then pushes the new offset value that may potentially replace the offset at param_1[1] to the stack.
  5. Using the command 0x31, if the calculation result is 0, it replaces the offset at param_1[1] with the new offset value that was pushed to the stack at param_1[2] in the previous step. Lastly, the stack at param_1[2] pops twice. If the calculation result is not zero, proceeds to the next step:
    • u --> 0x102190
    • d --> 0x10219a
    • l --> 0x1021a4
    • r --> 0x1021b0
  6. The steps 2-5 repeat until a character matches 0x64 == d, 0x6c == l, or 0x72 == r. If not, you fail the challenge, and are greeted with the message You were eaten by a grue. Once the binary pushes the value 0x00FB to the stack at param_1[2], you can assure that you are doomed to fail the challenge, because the offset 0xFB added to the base address 0x1020f0 is the very address for printing out the message for the failure of the challenge!

A series of characters consisting of those characters? u means UP, d means DOWN, l means LEFT, and r means RIGHT. Yes, you are basically navigating the maze that a single wrong step leads to your demise.

Creating the script that simulates the binary

Image description

Image description

Image description

So far, we have extensively covered the binary. It's really grueling to manually go through every single command to fully keep track of how the binary works. Starting at the base address 0x1020f0, it ends at the address 0x1026c1, which means that there are about a thousand of the commands to go through to fully understand the inner-workings of the binary. Going through this manually might take forever to cover every little detail it contains. What about creating a Python script that simulates the inner-workings of the binary? That's what I set out to do.

Why creating a script that replicates the inner-workings of the binary? I can fully control and observe how the state of both stacks at param_1[2] and param_1[3] changes after processing each character, without the help of GDB, outside the VM.

In order to replicate the inner-workings of the binary, we have to sort of understand how the binary works right after it accepts an input from a user. Some of the things we will take into considerations are the followings:

  • The state of the stack at param_1[2] - With every character being processed by the binary, the binary produces a sort of accumulative results and saved them to the stack for future calculations.
  • The state of the stack at param_1[3], which is temporarily used for storing data from the stack at param_1[2].
  • How each command manipulates those two stacks above - The commands we discussed above directly manipulates the stack at param_1[2].

Here is how I came up with the script that simulates the binary:

  1. Parse all the available commands starting from the address of 0x1020f0 to the address of 0x1026c1, since those are what the function FUN_00101350 executes into the JSON file.
  2. Create a script that simulates the binary.

Creating a map of the maze inside the binary

The only allowed characters are u, d, l, and r, as an input, and we can presume that those are characters used to navigate the maze that somehow leads to the exit. So, if there is the maze, can we actually map it? One thing that we have to take into consideration is that any step that leads the offset at param_1[1] to become 0x00FB is to be avoided! Let's go back to Ghidra and go through the list of commands a bit to find all the the addresses with the value of fb.

Image description

Image description

  • There are a lot of 0xfb in a list of commands from the base address 0x1020f0 to 0x1026c1.
  • The first address with 0xfb is 0x0010218d. This address actually is not that relevant to the maze, since 0xfb at this address is pushed to the stack at param_1[2] upon finding out that one of the characters of an input of a user is not u, d, l, r.
  • The really interesting address with the value of 0xfb starts at the very next address 0x102265. The next values of 0xfb are found at the address 0x102269, 0x10226d, 0x102271, etc. 0xfb is placed next to each other at the offset of 4 bytes consistently, which continues until the address of 0x102667, the one with the last value of 0xfb.

Image description

Starting from the first relevant address 0x102265 that contains the value of 0xfb and scrolling down the list of commands, you notice that every value 0xfb is preceded by the value of 0x81, which makes sense since the command 0x81 is used to push the value of two bytes located right next to it. Basically, 0x81 pushes the value of 0x00fb to the stack, prior to the command 0x30 changing the value of the offset at param_1[1].

Image description

Okay, so what we are sure about is that the actual map may be starting from the address of 0x102264, which holds the value of 0x81. Increasing the address in increments of 4, you notice that each address ends up with either the value of 0x81 or 0x30. 0x30's are definitely safe, for the fact that it doesn't replace the offset at param_1[1] with the value of 0xFB.

Image description

Image description

The address 0x102664 is the last address in increments of 4 starting from the address 0x102264, before the last value of 0xFB at the address 0x102267. So if I were to subtract 0x102264 from 0x102664, I get the value of 1024 in decimal. Considering the fact that the binary increases the address in increments of 4, dividing 1024 by 4 results in the value of 256. If you multiply 16 by 16, you get 256. Are we dealing with a 16 * 16 grid? With that very conjecture, let's create a simple Python script for visualizing the maze.

Here is what I did:

  1. Create a two dimensional array for storing 16 arrays of 16 elements.
  2. Starting from the address at 0x102264, you evaluate the followings:
    • If the value at the address is 0x81, and the value at the very next address is 0xFB, you are dead. Mark it with -
    • If the value at the address is 0x81, and the value at the very next address is not 0xFB, you are alive. Mark it with O
    • If the value at the address is 0x30, you are alive. Mark it with O
  3. Increase the address in increments of 4, and repeat the step 2, until the last address of 0x102664.

When you run the script above and output the grid, you get the following result.
Image description

Image description

That looks like a breakthrough! So, where do we actually start the maze from? It's got to be the very first corner at the top left of the maze, and our goal is to get out of the maze through the only exit at the bottom right corner.

If we were to start from the top left, we can technically exit the maze with the following: rrrddrrrrrrddddddlllllddrrrrdddrruuuruuuuuuurrddddddddlddrd. Let's see if it works or not.

Image description

It is wrong! Frustrating. Let's check out why we got it wrong and how we can fix this, by analyzing the binary. The key to analyzing why we got it wrong is checking the states of those two stacks at param_1[2] amd param_1[3].

Analysis of the first failed attempt

Let's modify the script that simulates the binary for debugging purposes, by having a look at the stacks.

Image description

Image description

Image description

Image description

Image description

So moving right three times seems perfectly okay, and interestingly increments the value at the lowest item on the stack at param_1[2] by 1 with each move to the right. The problem occurs when we try to move downwards. Let's see what is being pushed at the very address that seems to be causing the problem. Starting from the very first item in the 2-D array we created for visualizing the grid, we wind up at the address 0x1022F4.

Image description

The address 0x1022F4 holds the value of 0x81, and it seems to push the value of 0x0574 and ultimately replace the offset at param_1[1] with it. Are there several addresses with the value of 0x0574 in the list of commands?

Image description

There are about 5 addresses between the address of 0x102264 and 0x102664 with the value of 0x74. Let's factor that into the map, fix the script, and see the difference between the previous one and the new one.

Image description

I marked every address that pushes 0x574 with the plus sign, and yes, they are all placed in the path to the exit. So, how do we possibly pass this? In midst of trying various methods, I find something very, very interesting.

Image description

So I know for sure that, from the starting point, I can move to the right the five times max, and when I do that, the third value from the lowest on the stack at param_1[2] increments by 1. Is the third value from the lowest on the stack the key to passing through the plus sign that we discovered earlier? Let's go back to the starting point, and see if we can move downward this time.

Image description

I was able to move downward with the input rrrrrlllllrrrdd! The third value from the lowest on the stack decrements by 1, whenever you go past the plus sign! So, since there are total five plus signs that we have to get past, in order to exit the maze, that means that the third value from the lowest on the stack must be at least 5. With that in mind, let's create the input and check if it works.

Image description

I provide the input rrrrrlllllrrrrrlllllrrrrrlllllrrrrrlllllrrrrrlllllrrrddrrrrrrddddddlllllddrrrrdddrruuuruuuuuuurrddddddddlddrd, and yes, it seems like I got it right! Let's try it on the binary being run on the server by picoCTF, and obtain the flag!

Image description

Yay! I got the flag, and submit it for the whopping 500 points!

This is the end of the walkthrough, but I feel like I didn't explain about how I came to the conclusions in some parts. I will review this post and revise and fill in some of the missing parts from time to time! Thanks for reading!

Link to the scripts I made for this challenge

Top comments (0)