maximilian feldthusen

How to turn an x86 binary executable back into C source code

  • Objective: turn an x86 binary executable back into C source code.
  • Understand how the compiler turns C into assembly code.
  • Understand low-level OS structures and the executable file format.

Arithmetic Instructions

mov eax, 2    ; eax = 2
mov ebx, 3    ; ebx = 3
add eax, ebx  ; eax = eax + ebx
sub ebx, 2    ; ebx = ebx - 2

Accessing Memory

mov eax, [1234] ; eax = *(int*)1234
mov ebx, 1234   ; ebx = 1234
mov eax, [ebx]  ; eax = *ebx
mov [ebx], eax  ; *ebx = eax
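
In C terms, a bracketed operand is a pointer dereference. The sketch below mirrors the register-indirect load and store; the function name `load_then_store` and the increment between the two accesses are assumptions added for illustration. The absolute-address form `[1234]` is left out because dereferencing an arbitrary constant address is not valid in an ordinary C program.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the two register-indirect instructions:
 *   mov eax, [ebx]   -> eax = *ebx
 *   mov [ebx], eax   -> *ebx = eax
 * `ebx` plays the role of the pointer register, `eax` the value register. */
int32_t load_then_store(int32_t *ebx)
{
    int32_t eax = *ebx;   /* load: read 4 bytes from the address in ebx */
    eax = eax + 1;        /* change the value so the store is observable */
    *ebx = eax;           /* store: write 4 bytes back to that address */
    return eax;
}
```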

Conditional Branches

cmp eax, 2  ; compare eax with 2
je label1   ; if (eax == 2) goto label1
ja label2   ; if (eax > 2) goto label2 (unsigned)
jb label3   ; if (eax < 2) goto label3 (unsigned)
jbe label4  ; if (eax <= 2) goto label4 (unsigned)
jne label5  ; if (eax != 2) goto label5
jmp label6  ; unconditional goto label6
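
`ja`, `jb`, and `jbe` treat the operands as unsigned; signed comparisons use `jg`, `jl`, and `jle` instead. When reading a binary, the condition code therefore hints at the signedness of the original C variable. A minimal sketch, with hypothetical function names, of how the same bit pattern compares differently:

```c
/* An unsigned comparison: compilers use the "above/below" condition
 * codes (ja, jb, ...) for this kind of test. */
int above_two_unsigned(unsigned int x)
{
    return x > 2u;
}

/* A signed comparison: compilers use the "greater/less" condition
 * codes (jg, jl, ...) for this kind of test. */
int above_two_signed(int x)
{
    return x > 2;
}
```

Passing -1 shows the difference: converted to `unsigned int` it becomes the largest representable value, so it is above 2, while as a signed `int` it is not.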

Function calls

Calling a function:

call func ; store return address on the stack and jump to func

Inside the function, the first operation is to save any register it is about to overwrite:

push esi ; save esi

Right before leaving the function:

pop esi ; restore esi
ret ; read return address from the stack and jump to it

Modern Compiler Architecture

C code --> Parsing --> Intermediate representation --> Optimization -->
Low-level intermediate representation --> Register allocation --> x86 assembly

High-level Optimizations

Inlining

For example, the call to foo:

int foo(int a, int b) {
    return a + b;
}
c = foo(a, b + 1);

translates to

c = a + b + 1;

Loop unrolling

The loop:

for (i = 0; i < 2; i++) {
    a[i] = 0;
}

becomes

a[0] = 0;
a[1] = 0;
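
The loop above disappears completely because its trip count is a small known constant. When the trip count is only known at run time, compilers often unroll by a fixed factor and keep a short remainder loop. A hand-written sketch of that shape, assuming an unroll factor of 4 and a hypothetical function name `zero_unrolled`:

```c
#include <stddef.h>

/* Zero n ints, unrolled by a factor of 4. The factor is an assumption
 * for illustration; real compilers choose their own. */
void zero_unrolled(int *a, size_t n)
{
    size_t i = 0;

    /* Main loop: four stores per iteration, one loop test per four elements. */
    for (; i + 4 <= n; i += 4) {
        a[i]     = 0;
        a[i + 1] = 0;
        a[i + 2] = 0;
        a[i + 3] = 0;
    }

    /* Remainder loop: handles the last n % 4 elements. */
    for (; i < n; i++) {
        a[i] = 0;
    }
}
```

In a disassembly, this pattern shows up as a loop body with several near-identical stores followed by a second, smaller loop.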

Loop-invariant code motion

The loop:

for (i = 0; i < 2; i++) {
    a[i] = p + q;
}

becomes:

temp = p + q;
for (i = 0; i < 2; i++) {
    a[i] = temp;
}

Common subexpression elimination

The assignments:

a = b + (z + 1);
p = q + (z + 1);

becomes

temp = z + 1;
a = b + temp;
p = q + temp;
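
A sketch of the same transformation as two C functions that should always produce identical results. The function names and the use of output parameters are assumptions made for this example:

```c
/* Original form: z + 1 is computed twice. */
void without_cse(int b, int q, int z, int *a, int *p)
{
    *a = b + (z + 1);
    *p = q + (z + 1);
}

/* After common subexpression elimination: z + 1 is computed once
 * and the temporary is reused in both assignments. */
void with_cse(int b, int q, int z, int *a, int *p)
{
    int temp = z + 1;
    *a = b + temp;
    *p = q + temp;
}
```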

Constant folding and propagation

The assignments:

a = 3 + 5;
b = a + 1;
func(b);

become:

func(9);

Dead code elimination

Code that can never execute is deleted:

a = 1;
if (a < 0) {
    printf("ERROR!");
}

becomes

a = 1;

Low-Level Optimizations

Strength reduction

Code such as:

y = x * 2;
y = x * 15;

becomes:

y = x + x;
y = (x << 4) - x;
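
The shift form works because x << 4 is 16x, and 16x - x = 15x. A small sketch to check the equivalence; the function names are hypothetical, and unsigned 32-bit values are used so that overflow wraps around in a well-defined way:

```c
#include <stdint.h>

/* Multiply by 15 with a multiplication. */
uint32_t times15_mul(uint32_t x)
{
    return x * 15u;
}

/* Same result using a shift and a subtraction: 16x - x = 15x. */
uint32_t times15_shift(uint32_t x)
{
    return (x << 4) - x;
}
```

When decompiling, a shift followed by a subtraction or addition of the same register is usually worth reading back as a multiplication by a constant.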

Code block reordering

Code such as:

if (a < 10) goto l1;
printf("ERROR");
goto l2;
l1:
    printf("OK");
l2:
    return;

becomes:

if (a >= 10) goto l1;
printf("OK");
l2:
    return;
l1:
    printf("ERROR");
    goto l2;
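
When the two blocks are swapped, the branch condition has to be negated exactly: the opposite of `a < 10` is `a >= 10`. A sketch with hypothetical function names that return the message instead of printing it, so both layouts can be compared at the boundary value:

```c
/* Original layout: the branch is taken when a < 10. */
const char *check_original(int a)
{
    if (a < 10) goto l1;
    return "ERROR";
l1:
    return "OK";
}

/* Reordered layout: the OK path falls through and the
 * condition is negated. */
const char *check_reordered(int a)
{
    if (a >= 10) goto l1;
    return "OK";
l1:
    return "ERROR";
}
```

Both functions agree for every input, including a == 10, which is the case an off-by-one negation would get wrong.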

Register allocation

  • Memory access is slower than register access.
  • Try to fit as many local variables as possible in registers.
  • The mapping of local variables to stack locations and registers is not constant: the same variable can live in different places at different points in a function.

Instruction scheduling

Assembly code like:

mov eax, [esi]
add eax, 1
mov ebx, [edi]
add ebx, 1

becomes:

mov eax, [esi]
mov ebx, [edi]
add eax, 1
add ebx, 1

Both loads are issued first, so each add no longer has to wait directly behind the load it depends on.
