
Aviral Singh

Are binaries really executable code ?

What is the difference between a binary file and an executable( machine code )?

so this question has been in my head for a long time. the two terms are used almost interchangeably, right? but are they really the same thing? lets find out .. and i want to learn this from the very first principles. so in order to understand this we will use C, cause its simple and fun to mess around with, without unnecessary abstractions. first i need to get some fundamentals clear that i am rusty on.

to answer this we need to understand how c actually turns your code into something you can run — and that journey is where the answer lives.

#include <stdio.h>
#include <string.h>

int main(void) {
    // a simple declaration of two char arrays with the same content
    char text[100] = "hello";
    char other[100] = "hello";
    // this compares the addresses of the two arrays, NOT their content - always false
    if (text == other) {
        printf("match\n"); // never runs
    }
    // this compares the actual characters - works correctly
    if (strcmp(text, other) == 0) {
        printf("match\n"); // runs!
    }
    return 0;
}

strcmp? what is the full form? (its short for "string compare".) and what about strncpy? i kind of wonder what these are made of.

ok lets try to look at the source code of strncpy, i hope thats what its called ..

/* Copyright © 1991–2018 Free Software Foundation, Inc. This file is part of the GNU C Library. The GNU C Library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. The GNU C Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with the GNU C Library; if not, see http://www.gnu.org/licenses/. */

please dont sue me or anything, i know i dont own any of this. i am just an idiot, that is it, and if you do mind, please tell me first so i can delete it

#include <string.h>
#undef strncpy
#ifndef STRNCPY
 #define STRNCPY strncpy
#endif

what is this? what is a preprocessor directive anyway? what is it telling the compiler?


char *
STRNCPY (char *s1, const char *s2, size_t n)
{
  size_t size = __strnlen (s2, n);
  if (size != n)
    memset (s1 + size, '\0', n - size);
  return memcpy (s1, s2, size);
}
libc_hidden_builtin_def (strncpy)

so here is the first thing that is bugging me from this example. what the hell is a preprocessor directive? and what is a preprocessor? a preprocessor is like a butler that formats and changes the code into a more “refined” version that is then fed to the compiler. it runs before the actual compilation and is not part of translating C into machine code, so the lines starting with # are not C statements at all. they are instructions to the preprocessor about how to prepare the text that the compiler will see. so here is what happens in c before and during the compilation -

the preprocessor just copy pastes the contents of every header that you have included into your main c program .. i am not even kidding, its just a big copy and paste. if you have an #include line in your code, it is literally replaced by everything that is in that header file (mostly declarations of functions, as we will see). to see this with your own eyes, write a program, include all the headers that you want and then just write this in the terminal

gcc -E main.c

i am assuming that your file name is main.c, alright? dont be dumb like me and copy and paste the entire line with the wrong file name and then wonder why it doesnt work. the -E flag runs only the preprocessor and then stops, so you can see the copy and paste happening with your own eyes. whatever comes out, usually as one big chunk of text, is what gets fed to the compiler to be turned into machine code.

one thing really caught my eye: c doesnt allow you to have two definitions of the same function. even if they are literally identical, the c compiler will throw an error as soon as it comes across the second one. now this might look harmless, cause well, we wont define a function twice, right? but it can be dangerous in the grand scheme of things. the preprocessor doesnt care whether two things are the same or not, it just does the copy and paste (and some other similar stuff that we will talk about in a sec). so before we talk about it i want you to answer a question for me. Q. i just said that the compiler wont let you define something twice, right? so lets say that i use a popular header in my c program, something like stdio.h, and lets say that another header file that i am using in my code also includes it … then shouldnt it give out an error? lets try it.

this is the main.c, and here i am including some header files without particularly using them, but that is not the point alright -

#include <stdio.h>
#include <stdlib.h>
#include "math_my.h"
int main(){
    printf("%d\n",add(4,6));
    // the add function is defined in math_my.h
    // which is shown below ..
}

this is the math_my.h

#include <stdio.h>
int add(int a , int b ){
 return a+b;
}

now notice something that i told you about before: stdio.h is included twice (once in main.c and once in math_my.h) and yet the compiler wont give out any errors? why is that? this question is really interesting …

here look at the compilation message

avirals554@Avirals-MacBook-Air c-term % gcc explore.c -o explore
avirals554@Avirals-MacBook-Air c-term % ./explore
10
42
look, it compiled? but why is that? how? i mean it shouldnt, right? even though we are literally including stdio.h twice? hmmm… ok lets read stdio.h itself alright

avirals554@Avirals-MacBook-Air c-term % cat /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdio.h
i just hope i dont get a copyright violation or something, that is why i have added this. so the important thing in all of this is this line, #include <_stdio.h>. so this is including another header file? and what is this stuff??


#ifdef _USE_EXTENDED_LOCALES_
#include <xlocale/_stdio.h>
#endif /* _USE_EXTENDED_LOCALES_ */

hmm interesting .. lets read the _stdio.h file … ok, apart from the copyright stuff, this is what i am able to extract from this file -

#ifndef _STDIO_H_
#define _STDIO_H_
void clearerr(FILE *);
int fclose(FILE *);
int feof(FILE *);
int fgetc(FILE *);
int fprintf(FILE * __restrict, const char * __restrict, ...);
int fputc(int, FILE *);
int scanf(const char * __restrict, ...);
int printf(const char * __restrict, ...);
#include <_types.h>
#include <sys/_types/_va_list.h>
#include <sys/_types/_size_t.h>
#if __DARWIN_C_LEVEL >= 200809L
 ssize_t getline();
#endif

alright, i get some of it. the 2nd part is the list of function declarations, right? but not the definitions themselves? so where is it getting the actual definitions from? i mean, if the preprocessor is just gonna copy and paste this then … where does it get the definitions of these functions? …

so here is the fun thing. the compiler actually compiles this as it is into machine code, meaning the compiler doesnt get the definitions of those functions at all … so you might ask, wait …. then how does the code work at all with just the declarations of those functions?? doesnt it need those definitions? it does, but it gets them AFTER the compilation, through a little thing called a linker, which finds those definitions. on my machine there is a file named libSystem.dylib, and i am not sure exactly what this library has, but my best guess is that it holds those definitions, not as .h text but in some already compiled format. and here is what bugs me: if that code is already compiled then it cannot be c code anymore, right? it must be machine code, cause the compilation is already done. so now i need to find the step just before the linker and ask gcc to show me only that. the process is like this -

code → preprocessor → compiler → linker → executable

alright, we already know what step (2), the preprocessor, produces, right? we saw it with the command that we talked about above: gcc -E main.c

alright, so that output is what the compiler sees. and about the same name problem: the compiler doesnt have a problem with the same declaration appearing twice, and that is the key insight. stdio.h only contains declarations (just the list of functions without the actual code), and you can declare a function as many times as you want. what you cant do is define it twice, meaning the full function with the curly braces and the code inside. thats what would cause an error. on top of that, the #ifndef _STDIO_H_ / #define _STDIO_H_ pair we saw at the top of _stdio.h is an include guard: the second time the file gets pasted in, _STDIO_H_ is already defined, so the preprocessor skips its contents entirely.

ok but to know that i need to see what the compiler produces right ? so i looked it up and got this -

gcc -c main.c

this creates a .o file with the same name, so we get a main.o file (explore.o in my case, since my file is explore.c). if i try to read it via cat i get garbage, meaning its probably machine code, so we will use this -

objdump -d explore.o

this objdump thing is just something that can apparently read BINARIES, and if you look at the title of this blog, i am asking about the difference between binaries and executables, so here we are .. this .o file is an incomplete binary, meaning that it contains the machine code for what was in the code that the preprocessor handed over, but it doesnt have the definitions of the library functions yet. so we must read the .o file. after running the command i get this -


objdump -d explore.o
explore.o: file format mach-o arm64
Disassembly of section __TEXT,__text:
0000000000000000 <ltmp0>:
 0: d10043ff sub sp, sp, #0x10
 4: b9000fe0 str w0, [sp, #0xc]
 8: b9000be1 str w1, [sp, #0x8]
 c: b9400fe8 ldr w8, [sp, #0xc]
 10: b9400be9 ldr w9, [sp, #0x8]
 14: 0b090100 add w0, w8, w9
 18: 910043ff add sp, sp, #0x10
 1c: d65f03c0 ret
0000000000000020 <_square>:
 20: d10083ff sub sp, sp, #0x20
 24: a9017bfd stp x29, x30, [sp, #0x10]
 28: 910043fd add x29, sp, #0x10
 2c: b81fc3a0 stur w0, [x29, #-0x4]
 30: b85fc3a0 ldur w0, [x29, #-0x4]
 34: b85fc3a1 ldur w1, [x29, #-0x4]
 38: 94000000 bl 0x38 <_square+0x18>
 3c: b9000be0 str w0, [sp, #0x8]
 40: b9400be8 ldr w8, [sp, #0x8]
 44: 11008908 add w8, w8, #0x22
 48: b9000be8 str w8, [sp, #0x8]
 4c: b9400be0 ldr w0, [sp, #0x8]
 50: a9417bfd ldp x29, x30, [sp, #0x10]
 54: 910083ff add sp, sp, #0x20
 58: d65f03c0 ret
000000000000005c <_main>:
 5c: d10103ff sub sp, sp, #0x40
 60: a9037bfd stp x29, x30, [sp, #0x30]
 64: 9100c3fd add x29, sp, #0x30
 68: 52800008 mov w8, #0x0 ; =0
 6c: b81ec3a8 stur w8, [x29, #-0x14]
 70: b81fc3bf stur wzr, [x29, #-0x4]
 74: b81f83a0 stur w0, [x29, #-0x8]
 78: f81f03a1 stur x1, [x29, #-0x10]
 7c: 52800080 mov w0, #0x4 ; =4
 80: b9000fe0 str w0, [sp, #0xc]
 84: 528000c1 mov w1, #0x6 ; =6
 88: 94000000 bl 0x88 <_main+0x2c>
 8c: 910003e9 mov x9, sp
 90: aa0003e8 mov x8, x0
 94: f9000128 str x8, [x9]
 98: 90000000 adrp x0, 0x0 <ltmp0>
 9c: 91000000 add x0, x0, #0x0
 a0: f9000be0 str x0, [sp, #0x10]
 a4: 94000000 bl 0xa4 <_main+0x48>
 a8: b9400fe0 ldr w0, [sp, #0xc]
 ac: 94000000 bl 0xac <_main+0x50>
 b0: aa0003ea mov x10, x0
 b4: f9400be0 ldr x0, [sp, #0x10]
 b8: 910003e9 mov x9, sp
 bc: aa0a03e8 mov x8, x10
 c0: f9000128 str x8, [x9]
 c4: 94000000 bl 0xc4 <_main+0x68>
 c8: b85ec3a0 ldur w0, [x29, #-0x14]
 cc: a9437bfd ldp x29, x30, [sp, #0x30]
 d0: 910103ff add sp, sp, #0x40
 d4: d65f03c0 ret

now this is machine code, right? but its very very small, and you can kind of see the labels like main, square and all that

88: 94000000 bl … ← call to add()
a4: 94000000 bl … ← call to printf()
ac: 94000000 bl … ← call to square()
c4: 94000000 bl … ← call to printf()

so i just googled where in this code the “calls” to the functions are, and here they are: each bl instruction is a function call. but notice something else: every one of them is encoded as the same 94000000 and points back at its own address, so the real target is not filled in yet. that means our code is already compiled to machine code, and the library functions it calls are compiled machine code too, but the two are not connected. once the linker connects them you get an EXECUTABLE, get it? that is the difference between an executable and a binary.

a binary is any file that contains machine-readable bytes — not human-readable text. your .o file is a binary. a JPEG image is a binary. a PDF is a binary. none of them are text.

an executable is a specific type of binary — one that is complete and ready to run as a program. the .o file is binary but NOT executable because it still has holes (those 94000000 placeholders). after the linker fills in those holes, THATS when it becomes an executable.

every executable is a binary. but not every binary is an executable.

ps: the definitions are more nuanced than what i have told you here, cause there are many many types of binaries
