## DEV Community

Gealber Morales

Posted on • Originally published at gealber.com

# Challenge RE #8

At this point we don't need further introduction, I'll bring you my solution to the 8th challenge.

## Description

The description of the challenge can be found in Challenge RE #8. In the previous ones I forgot the share the link, my bad.

## Introduction

Assembly code to understand

``````
<f>:
0:       push   r12
2:       test   rsi,rsi
5:       mov    r12,rdi
8:       push   rbp
9:       mov    rbp,rdx
c:       push   rbx
d:       mov    rbx,rsi
10:       je     32
12:       nop    WORD PTR [rax+rax*1+0x0]
18:       mov    rsi,QWORD PTR [rbx]
1b:       mov    rdi,rbp
1e:       call   r12
21:       test   eax,eax
23:       je     56
25:       js     40
27:       mov    rbx,QWORD PTR [rbx+0x18]
2b:       test   rbx,rbx
2e:       xchg   ax,ax
30:       jne    18
32:       pop    rbx
33:       pop    rbp
34:       xor    eax,eax
36:       pop    r12
38:       ret
39:       nop    DWORD PTR [rax+0x0]
40:       mov    rbx,QWORD PTR [rbx+0x10]
44:       test   rbx,rbx
47:       je     32
49:       mov    rsi,QWORD PTR [rbx]
4c:       mov    rdi,rbp
4f:       call   r12
52:       test   eax,eax
54:       jne    25
56:       mov    rax,rbx
59:       pop    rbx
5a:       pop    rbp
5b:       pop    r12
5d:       ret
``````

This time `f` it's way more longer than in our previous challenges, instead of complaining about this and give up at the first moment. It's better to try to understand small pieces of this code.

Let's start from instructions in memory 0 to instruction in memory 0x10. From these instructions we can notice, that our parameters came from registers `rdi`, `rsi` and `rdx`. We perform a check on one of them, to see if it's equal to 0.

Now interesting enough, we have that in `rdi` we receive a callback function. If you take a look, `rdi` it's moved to `r12` who later it's been called on instruction in 1e. For that reason we can infer, that one of the arguments passed to `f` it's a callback function.

What can we tell about the signature of this callback function? Well tracking the use of register `r12` you can notice that the function takes two arguments, `rsi` and `rdi` and return the value int `eax`. Every `call r12` it's followed by a check on `eax`, where if we have zero or NULL value we jump to the end of the program, otherwise we jump to position 0x25. Looking at the use of its arguments, we can take a save assumption that the first one can be number while the second...well the second one is not so straight forward, let's take a look at its use in the program.

### Identifying `rbx` layout

The second argument it's been taken from `rbx`, this register it's only modified once, when we copied `rsi` into it on instruction `mov rbx,rsi`. After that we've been calling it in this three times:

``````;; first time
mov rsi,QWORD PTR [rbx]

;; more down in position 0x27
mov rbx,QWORD PTR [rbx+0x18]

;; more down in position 0x40
mov rbx,QWORD PTR [rbx+0x10]
``````

In the last two cases, we are overwriting the value of `rbx`, with `QWORD PTR [rbx+0x18]` and `QWORD PTR [rbx+0x10]`. The padding or offset from `rbx` it's 0x18 = 24 and 0x10 = 16, so we can assume the data stored in `rbx` has to following arrangement.

``````[0......|0x10.....|0x18....]
``````

And the data in `[rbx+0x10]` and `[rbx+0x18]` can overwrite `rbx`. I don't know you, but two me this smells like a binary tree, where `[rbx+0x10]` and `[rbx+0x18]` are the left and right children of the node `rbx`.

This is a huge insight. If we are on the right path, we can be looking at a binary search here, but's let's not take hasty conclusions.

### First resume

At the moment, we have that:

1. `r12` contains a callback function
2. `rbx` contains a binary tree

With this in mind we can start declaring our `Node` structure and our callback function.

``````typedef struct Node
{
int data;
struct Node *left;
struct Node *right;
} Node;

// signature of callback function
int g_callback(int val, Node *n);
``````

### Signature of f

We already know that the arguments for `f` comes from registers:

1. `rdi`, which is a callback function
2. `rsi`, which is our binary tree
3. `rdx`, which it's data of a node of our binary tree, let's assume the data are just numbers, for simplicity. In any case later we can check if this is the case.

What it's missing it's our returned value. For this, looking at the last `ret` you can notice that what we return it's actually the value in `rbx` which hold a `Node` of our binary tree.

With this in mind we can assume the following signature for `f`

``````
Node *f(int (int, Node *), Node *, int);

``````

### Main logic

With the signature, will be easier to infer the logic in the program. Having a clear idea of what are the structures involved also help us a lot. I mean, we haven't even looked at the logic, and you might be guessing that this has to be some kind of search in a binary tree. Maybe I'm the only one imagining that, given that I'm my only user in my blog π. Anyway let's continue.

Let's gather all our finding and put it in C code

``````typedef struct Node
{
int data;
struct Node *left;
struct Node *right;
} Node;

int g_callback(int val, Node *n);

Node *f(int cb(int, Node *), struct Node *n, int data)
{
return NULL;
}

``````

Ok, this looks good. Does it? Don't doubt on yourself great Gealber, this is amazing.

Let's analyze the logic by small chunks, first chunk

``````  18:       mov    rsi,QWORD PTR [rbx]
1b:       mov    rdi,rbp
1e:       call   r12
21:       test   eax,eax
23:       je     56
25:       js     40
;; in 56 we just return the Node
``````

Nothing fancy, now that we know more about the layout of `rbx`, we know it's a `Node`, so this code can be interpreted as a call to to the callback function we passed as an argument. In C code would be:

``````int ret = cb(n, data);
if (ret == 0)
return n;
if (ret < 0) {
// do something
}
``````

Second chunk of code...maybe chunk is not the right word Gealber π€

``````  27:       mov    rbx,QWORD PTR [rbx+0x18]
2b:       test   rbx,rbx
2e:       xchg   ax,ax
30:       jne    18
``````

In this case, we first overwrite our current node for our right child, and later perform a check on it, followed by a jump to 18 if it's not `NULL`. Interesting, the code in 0x18 it's our previous chunk of code. This seems as a recursion call, because the code in 0x18 it's actually the start of our `f` method. Given the nature of binary trees and the operations we normally perform with them, is not a crazy assumption to say that this is a recursion call.

Taking this into C, would be like this

``````Node *f(int cb(int, Node *), struct Node *n, int data)
{
int ret = cb(n, data);
if (ret == 0)
return n;

if (ret < 0) {
// do something
}

n = n->right;
if (n != NULL)
n = f(cb, n, data);

return NULL;
}
``````

The next most important part, is the code in 0x40. This code it's quite similar to our previous one, just that in this case we check with our left node. The way to reach this code is from instruction 0x25, with `js 40`.

A way to put this in C would be

``````n = n->left;
if (n != NULL)
n = f(cb, n, data);
``````

We already can make a correct forecast of what this function does, in a simple way this code performs a search in a binary tree, with a supplied comparison callback function.

Let's put all this together in a clean way

``````typedef struct Node
{
int data;
struct Node *left;
struct Node *right;
} Node;

// f performs a search in a binary tree
// using cb callback function as our comparison data
Node *f(int cb(int, Node *), struct Node *n, int data)
{
int ret = cb(n, data);
if (ret == 0)
return n;

if (ret < 0) {
return f(cb, n->left, data);
}

return f(cb, n->right, data);
}
``````

## Final notes

Take into account that `data` doesn't has to be an int, given that we are supplying a comparison function we can declare data as `void *`. As long as the comparison function can handle it, won't be a problem.

As a personal note, I found this challenge quite instructive, not because it taught me a new data structure. What I found valuable it's that it showed me how a structure is defined in assembly, basically there's just an offset between field and another. The more I do this challenges the more I realize that C is so close to assembly π.