DEV Community

maximilian feldthusen

How to turn an x86 binary executable back into C source code

  • Objective: turn an x86 binary executable back into C source code.
  • Understand how the compiler turns C into assembly code.
  • Learn the low-level OS structures and the executable file format.

Arithmetic Instructions

mov eax,2 ; eax = 2 
mov ebx,3 ; ebx = 3
add eax,ebx ; eax = eax + ebx 
sub ebx, 2 ; ebx = ebx - 2
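
Read back into C, the four instructions above might look like the sketch below. The function name `arithmetic_demo` and the out-parameter are hypothetical; the local variables are simply named after the registers they stand for.

```c
/* Hypothetical helper: models the registers as plain local variables. */
int arithmetic_demo(int *ebx_out) {
    int eax = 2;       /* mov eax, 2   */
    int ebx = 3;       /* mov ebx, 3   */
    eax = eax + ebx;   /* add eax, ebx */
    ebx = ebx - 2;     /* sub ebx, 2   */
    *ebx_out = ebx;    /* hand the final ebx back to the caller */
    return eax;        /* final eax */
}
```

A decompiler does this in reverse: it starts from register updates and has to guess which C variables they belonged to.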

Accessing Memory

mov eax, [1234] ; eax = *(int*)1234
mov ebx, 1234 ; ebx = 1234
mov eax, [ebx] ; eax = *(int*)ebx
mov [ebx], eax ; *(int*)ebx = eax
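
In C, the bracketed operands correspond to pointer dereferences. A minimal sketch, assuming hypothetical helper names `load` and `store` and a pointer parameter named after the register that would hold it:

```c
/* Hypothetical helper: read through a pointer, like `mov eax, [ebx]`. */
int load(const int *ebx) {
    return *ebx;
}

/* Hypothetical helper: write through a pointer, like `mov [ebx], eax`. */
void store(int *ebx, int eax) {
    *ebx = eax;
}
```

The sketch takes the address from the caller instead of using the literal address 1234, since dereferencing an arbitrary constant address in a normal user program would most likely crash.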

Conditional Branches

cmp eax, 2 ; compare eax with 2
je label1 ; if (eax == 2) goto label1
ja label2 ; if (eax > 2) goto label2 (unsigned)
jb label3 ; if (eax < 2) goto label3 (unsigned)
jbe label4 ; if (eax <= 2) goto label4 (unsigned)
jne label5 ; if (eax != 2) goto label5
jmp label6 ; unconditional goto label6
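
The `ja`, `jb` and `jbe` branches treat the operands as unsigned; signed comparisons use `jg`, `jl` and `jle` instead. So the branch mnemonic hints at the C type of the variable. A small sketch of the difference, with hypothetical function names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned comparison: the kind a compiler typically lowers to cmp + ja. */
bool above_two_unsigned(uint32_t eax) {
    return eax > 2u;
}

/* Signed comparison: the kind a compiler typically lowers to cmp + jg. */
bool greater_than_two_signed(int32_t eax) {
    return eax > 2;
}
```

The bit pattern 0xFFFFFFFF is above 2 when read as unsigned, but it is -1 when read as a signed 32-bit value, so the two functions disagree on it.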

Function calls

Calling a function:

call func ; store return address on the stack and jump to func

Inside the function, the first operation is typically to save any register the function must preserve:

push esi ; save esi

Right before leaving the function:

pop esi ; restore esi
ret ; read return address from the stack and jump to it

Modern Compiler Architecture

C code --> Parsing --> Intermediate representation --> optimization -->
Low-level intermediate representation --> register allocation --> x86 assembly

High-level Optimizations

Inlining

For example, the function call:

int foo(int a, int b) {
    return a + b;
}
c = foo(a, b + 1);

translates to

c = a + b + 1;

Loop unrolling

The loop:

for (i = 0; i < 2; i++) {
    a[i] = 0;
}

becomes

a[0] = 0;
a[1] = 0;
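
When the trip count is not a small constant, compilers often unroll only partially and keep a short remainder loop. The sketch below assumes an unroll factor of 4 and a hypothetical helper name `zero_unrolled`; real compilers choose the factor themselves.

```c
#include <stddef.h>

/* Hypothetical helper: zero n ints, unrolled by a factor of 4. */
void zero_unrolled(int *a, size_t n) {
    size_t i = 0;

    /* Main body: four stores per iteration, one loop test per four elements. */
    for (; i + 4 <= n; i += 4) {
        a[i]     = 0;
        a[i + 1] = 0;
        a[i + 2] = 0;
        a[i + 3] = 0;
    }

    /* Remainder loop: handles the last n % 4 elements. */
    for (; i < n; i++) {
        a[i] = 0;
    }
}
```

When reading a binary, this two-loop shape is a strong hint that the original source contained a single simple loop.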

Loop-invariant code motion

The loop:

for (i = 0; i < 2; i++) {
    a[i] = p + q;
}

becomes:


temp = p + q;
for (i = 0; i < 2; i++) {
    a[i] = temp;
}
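
The transformation is only valid because `p` and `q` cannot change inside the loop. A self-contained sketch of both forms, with hypothetical function names, makes that assumption explicit by passing them by value:

```c
#include <stddef.h>

/* Before: p + q is recomputed on every iteration. */
void fill_naive(int *a, size_t n, int p, int q) {
    for (size_t i = 0; i < n; i++) {
        a[i] = p + q;
    }
}

/* After: the invariant sum is computed once, outside the loop. */
void fill_hoisted(int *a, size_t n, int p, int q) {
    int temp = p + q;
    for (size_t i = 0; i < n; i++) {
        a[i] = temp;
    }
}
```

If `p` and `q` were read through pointers that might alias `a`, the compiler could not always prove the sum is invariant and might have to leave it inside the loop.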

Common subexpression elimination

The assignments:

a = b + (z + 1);
p = q + (z + 1);

become

temp = z + 1;
a = b + temp;
p = q + temp;
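
As a runnable sketch of the same idea, the two hypothetical functions below compute `a` and `p` with and without the shared temporary. The second result is returned through an out-parameter only so that both values can be inspected.

```c
/* Before: z + 1 is evaluated twice. */
int cse_before(int b, int q, int z, int *p_out) {
    int a = b + (z + 1);
    *p_out = q + (z + 1);
    return a;
}

/* After: z + 1 is evaluated once and reused through temp. */
int cse_after(int b, int q, int z, int *p_out) {
    int temp = z + 1;
    int a = b + temp;
    *p_out = q + temp;
    return a;
}
```

Both versions produce the same `a` and `p`; only the number of additions differs.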

Constant folding and propagation

The assignments:

a = 3 + 5;
b = a + 1;
func(b);

become:

func(9);

Dead code elimination

Delete code that can never run:

a = 1;
if (a < 0) {
    printf("ERROR!");
}

becomes

a = 1;

Low-Level Optimizations

Strength reduction

Code such as:

y = x * 2;
y = x * 15;

becomes:

y = x + x;
y = (x << 4) - x;
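
The second rewrite works because `x << 4` is `16 * x`, and `16 * x - x` is `15 * x`. A minimal sketch with hypothetical function names, using unsigned 32-bit values so that overflow wraps around in a well-defined way:

```c
#include <stdint.h>

/* x * 2 rewritten as an addition. */
uint32_t times2(uint32_t x) {
    return x + x;
}

/* x * 15 rewritten as a shift and a subtraction: 16x - x. */
uint32_t times15(uint32_t x) {
    return (x << 4) - x;
}
```

When decompiling, shift-and-subtract sequences like this are worth folding back into the original multiplication.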

Code block reordering

Code such as:

if (a < 10) goto l1;
printf("ERROR");
goto l2;
l1:
    printf("OK");
l2:
    return;

becomes:

if (a >= 10) goto l1;
printf("OK");
l2:
    return;
l1:
    printf("ERROR");
    goto l2;
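
For the two layouts to behave identically, the inverted test has to be the exact negation of `a < 10`, which is `a >= 10`. The sketch below uses hypothetical function names and returns the message instead of printing it, so the two layouts can be compared directly:

```c
/* Original layout: the error path falls through, the OK path is a jump. */
const char *original_layout(int a) {
    const char *msg;
    if (a < 10) goto l1;
    msg = "ERROR";
    goto l2;
l1:
    msg = "OK";
l2:
    return msg;
}

/* Reordered layout: the OK path falls through, the error path is moved
   out of line after the return. */
const char *reordered_layout(int a) {
    const char *msg;
    if (a >= 10) goto l1;
    msg = "OK";
l2:
    return msg;
l1:
    msg = "ERROR";
    goto l2;
}
```

The boundary value `a == 10` is the case to check: both layouts must report the error there.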

Register allocation

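  • Memory access is slower than register access.
  • The compiler tries to fit as many local variables as possible in registers.
  • The mapping of local variables to stack locations and registers is not constant: the same variable may live in different places at different points in the function.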

Instruction scheduling

Assembly code like:

mov eax, [esi]
add eax, 1
mov ebx, [edi]
add ebx, 1

Becomes:

mov eax, [esi]
mov ebx, [edi]
add eax, 1
add ebx, 1
