Ifeanyi Okeakwalam

Can Source Code Be Reproduced from Machine Code If It Is Lost?

When I started my career as a developer 10 years ago, I often asked myself a simple question: "Ifeanyi, can you regenerate your source code from its bytecode?" The answer seemed pretty straightforward to me then. I felt it was technically impossible, because I had never heard of a tool that could help me do that.

Before I dive into this topic properly, I would like to explain some basic concepts. In computing, code generation is the process by which a compiler's code generator converts some intermediate representation of source code into a form (e.g., machine code) that can be readily executed by a machine.
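To make that concrete, here is a small sketch using Python's built-in `compile` function and the standard `dis` module. Please note the assumptions: Python produces bytecode for its own virtual machine rather than native machine code, so this is only an analogy for what a C compiler does, and the `area` function and the `compile_function` helper are names I made up for illustration.

```python
import dis

# A tiny "source file", held in a string for the sake of the example.
SOURCE = "def area(width, height):\n    return width * height\n"


def compile_function(source: str, name: str):
    """Compile source text and return the named function object."""
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace[name]


area = compile_function(SOURCE, "area")

# The executable form: raw bytes that the interpreter runs.
raw_bytes = area.__code__.co_code

# The same bytes, decoded into instruction names.
opnames = [ins.opname for ins in dis.get_instructions(area)]

if __name__ == "__main__":
    print(raw_bytes.hex())
    dis.dis(area)
```

The exact bytes and instruction names differ between Python versions, which is a good reminder that the compiled form is tied to the machine (or virtual machine) that executes it.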

We also know that source code is the fundamental component of a computer program that is created by a programmer. It can be read and easily understood by a human being. When a programmer types a sequence of C statements into Windows Notepad, for example, and saves it as a text file, that text file is said to contain the source code.

Now the big question is, why can't programmers figure out the source code or basic algorithm of a program based on the machine code?

Often because it's illegal, not because it's impossible. Here is a real case: the destruction of the World Trade Center's Twin Towers on 9/11 caused Morgan Stanley to lose the source code of their flagship financial application, written in VisualAge Smalltalk (VAST).

Over the years, a couple of people had written decompilers for VAST but had been threatened with legal action by IBM. Smalltalk-80 (descending from Xerox) has always included a decompiler, and Dan Ingalls extended the implementation to record local variable names, the only names that can’t be inferred from the bytecode.

That way, one could use the system without a source file, losing only the ability to record comments. But VAST, descending from Smalltalk/V, never had a decompiler. A colleague of mine then implemented a decompiler that was used to recover a good portion of the Morgan Stanley application.
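The point about names and comments can be illustrated with Python bytecode, which I am using here only as a stand-in for Smalltalk bytecode. In this sketch (the `total` function and the `load_function` helper are hypothetical names of mine), two sources that differ only by a comment compile to identical bytecode, while the local variable names are stored alongside the compiled code.

```python
WITH_COMMENT = (
    "def total(price, quantity):\n"
    "    return price * quantity  # unit price times quantity\n"
)
WITHOUT_COMMENT = (
    "def total(price, quantity):\n"
    "    return price * quantity\n"
)


def load_function(source: str, name: str):
    """Compile source text and return the named function object."""
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace[name]


commented = load_function(WITH_COMMENT, "total")
uncommented = load_function(WITHOUT_COMMENT, "total")

# The comment leaves no trace in the compiled instructions.
same_bytecode = commented.__code__.co_code == uncommented.__code__.co_code

# Local variable names are recorded next to the bytecode.
local_names = commented.__code__.co_varnames

if __name__ == "__main__":
    print(same_bytecode)
    print(local_names)
```

So a decompiler working from this kind of bytecode could give back the original variable names, but the comment is gone for good.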

Recovering source from machine code is typically more difficult than from Smalltalk bytecode, and if symbolic information has been stripped from an executable, then names will have to be invented. But there is no theoretical impossibility here.

A processor “makes sense” of machine code when it executes it. A decompiler, likewise, makes sense of the machine code, but does so by constructing some form of parse tree, which can then be printed as source. So the real issue is the legality of the process, not its technical feasibility. If one doesn’t have the right to decompile (a form of reverse engineering), one is taking the risk of prosecution by doing so.
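Here is a minimal sketch of that idea. Everything in it is an assumption for illustration: the instruction set (`LOAD`, `PUSH`, `ADD`, `SUB`, `MUL`) is a toy stack machine I invented, not a real processor or the VAST bytecode. The decompiler walks the instructions, builds an expression tree, and prints it as source. Because the toy code carries no symbol names, the variable names `v0`, `v1` are invented, just as they would be for a stripped executable.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Var:
    name: str


@dataclass
class Const:
    value: int


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Var, Const, BinOp]

# Hypothetical opcodes of the toy machine and their source operators.
OPERATORS = {"ADD": "+", "SUB": "-", "MUL": "*"}


def decompile(program: List[Tuple]) -> Node:
    """Rebuild an expression tree by simulating the stack symbolically."""
    stack: List[Node] = []
    for instr in program:
        opcode = instr[0]
        if opcode == "LOAD":
            # No symbol table is available, so a name is invented from the slot.
            stack.append(Var(f"v{instr[1]}"))
        elif opcode == "PUSH":
            stack.append(Const(instr[1]))
        elif opcode in OPERATORS:
            right = stack.pop()
            left = stack.pop()
            stack.append(BinOp(OPERATORS[opcode], left, right))
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    if len(stack) != 1:
        raise ValueError("program does not reduce to a single expression")
    return stack[0]


def to_source(node: Node) -> str:
    """Print the tree back out as source text."""
    if isinstance(node, Var):
        return node.name
    if isinstance(node, Const):
        return str(node.value)
    return f"({to_source(node.left)} {node.op} {to_source(node.right)})"


PROGRAM = [("LOAD", 0), ("LOAD", 1), ("MUL",), ("PUSH", 2), ("ADD",)]

if __name__ == "__main__":
    print(to_source(decompile(PROGRAM)))  # ((v0 * v1) + 2)
```

A real decompiler also has to recover control flow, types, and calling conventions, which is where most of the difficulty lies, but the basic shape is the same: instructions in, tree out, source printed from the tree.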

Hence, the answer to this question is pretty straightforward at this juncture. Source code can be reproduced from machine code, and this is done with decompilers. Decompilers are usually unable to perfectly reconstruct the original source code, so their output is often hard to read, with invented names and no comments. Nevertheless, decompilers remain an important tool in the reverse engineering of computer software.

Just imagine that you could regenerate the source code from any machine code you can lay your hands on. Imagine the superpower of reproducing the source code of your favourite software. Sounds very illegal, right? Lol. In fact, reverse engineering is generally legal. In trade secret law, reverse engineering, like independent development, is considered an allowed method of discovering a trade secret. However, in patent law, because the patent owner has exclusive rights to use, own or develop the patented invention, reverse engineering is not a defense.

If you gained something from this article, take a minute of your time to hit the share button and share this piece with your network on social media. Also give me a follow on all social media platforms via @ifycoool.

Visit my official blog via blog.ifeanyiokeakwam.com
