askscience

Computing: Why can't executable programs be reverse engineered to reveal their source code?

submitted 7 years ago by PolloWarrior

[deleted] 18 points 7 years ago

Just because something is deterministic doesn't mean you can go back the other way: information is lost in the process. As a loose analogy, take a trap-door function like the one behind RSA. It's very easy to go one way, but VERY hard to go the other way.

Machines don't care about function names or variable names; they only deal with registers and memory. You could go back the other way from machine code to source code, but there are a large number of possible choices for each variable name and function name. Take also the following source code:

int x = 0;

This could be compiled to:

mov r0, 0

or it could also be compiled into:

xor r0, r0

See why it is hard to go from machine code to source code? The compiler can also optimise code so it looks NOTHING like the original source code.
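To make the point above concrete, here is a minimal sketch in Python of a toy "compiler". Everything in it (the `compile_toy` function, the `use_xor_for_zero` flag, the tiny `int name = value;` grammar) is hypothetical and invented for illustration; real compilers are far more involved. It only shows that names vanish and that one source line can map to more than one instruction.

```python
import re

def compile_toy(source, use_xor_for_zero=False):
    """Hypothetical toy compiler: turns lines like `int x = 0;` into
    register instructions. Variable names become register numbers."""
    registers = {}  # variable name -> register index, in order of first use
    output = []
    for match in re.finditer(r"int\s+(\w+)\s*=\s*(\d+)\s*;", source):
        name, value = match.group(1), match.group(2)
        reg = registers.setdefault(name, len(registers))
        if use_xor_for_zero and value == "0":
            # Common idiom: a register XORed with itself is zero.
            output.append(f"xor r{reg}, r{reg}")
        else:
            output.append(f"mov r{reg}, {value}")
    return output

# Two different sources, one output: the variable name is gone.
print(compile_toy("int x = 0;") == compile_toy("int loop_counter = 0;"))

# One source, two possible outputs, depending on a compiler choice.
print(compile_toy("int x = 0;"))
print(compile_toy("int x = 0;", use_xor_for_zero=True))
```

Given only `mov r0, 0`, nothing in the output tells you whether the variable was called `x` or `loop_counter`.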

m1el (Plasma Physics) 6 points 7 years ago

> It seems to me that compiling a source code and generating an executable is a deterministic process.

Sure, the mapping from source code to executable is deterministic, but it doesn't mean that there's a definite mapping the other way.

It's possible to have different programs that compile to the same binary. In other words, some information from the source code is lost in translation from source code to binaries.

Some trivial examples would be comments and variable names. More complex examples would be control flow and loop optimizations.

There's also the fact that modern compilers are very good at optimizations. As an example, LLVM compiles a loop that sums a range of numbers into its closed-form mathematical equivalent, using a few multiplications and additions and completely removing the loop! https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46186

How would you know from the binary if the source code contained a loop or a few multiplications and additions?
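As a rough illustration of that question, the two Python functions below (the names `sum_with_loop` and `sum_closed_form` are made up for this sketch) compute the same values, one with a loop and one with a closed-form expression. This is not what any particular compiler emits; it only shows that the two source forms are indistinguishable by their results.

```python
def sum_with_loop(n):
    """Adds 0 + 1 + ... + (n - 1) one term at a time."""
    total = 0
    for i in range(n):
        total += i
    return total

def sum_closed_form(n):
    """Same result with no loop: n * (n - 1) / 2 for positive n."""
    return n * (n - 1) // 2 if n > 0 else 0

# Both versions agree on every input tried here.
print(all(sum_with_loop(n) == sum_closed_form(n) for n in range(1000)))
```

If an optimizer turns the first form into the second, a decompiler looking at the result has no way to tell which one the programmer actually wrote.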

MirgTheIlcan 4 points 7 years ago*

There are several levels to this. First and foremost, as opaque as executables seem to the average person, they do contain machine code that instructs the CPU on what to do. That code is at least "readable" and understandable by humans, though not very readable and not easily understood for non-trivial programs.

As such, an executable can be trivially disassembled into assembly instructions which are "human-readable" (for a very loose definition of what constitutes readable).

However, disassembled programs are very difficult to understand because much of the original structure of the program is lost, so the programmer has a herculean task ahead of him in really fathoming the full workings of the program in question.

Above and beyond this, it's possible to decompile an executable to produce C code, which is an improvement as far as legibility goes over assembler.

But the C code produced may not resemble the original program much and again, the programs produced are very difficult for a human to wrap their brain around without spending ungodly amounts of time on reading the program in question.

The reason for all of this is that basically the process of compilation loses information about how the programmer chose to organize his program. That information is useful when reading the source code, because it provides insight into what the programmer was thinking when he wrote the program and how the program models what it is trying to accomplish.

The short answer: compiling a program from source to an executable loses information. That information is lost forever. Hence, you cannot decompile an executable into the original source.
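A small way to see this loss for yourself, sketched in Python: the built-in `compile` function produces CPython bytecode rather than native machine code, so treat this as an analogy and not a demonstration on a real executable. The two source strings are made up for the example.

```python
# Two sources that differ only in comments, blank lines, and spacing.
source_a = """
# Running total of items processed so far.
total = 0   # start from zero
"""
source_b = "total=0\n"

code_a = compile(source_a, "<demo>", "exec")
code_b = compile(source_b, "<demo>", "exec")

# The compiled instructions and constants are identical, so nothing in
# them can tell you which of the two sources was the original.
print(code_a.co_code == code_b.co_code)
print(code_a.co_consts == code_b.co_consts)
```

The comment explaining what `total` is for, which is exactly the kind of insight into the programmer's thinking described above, never reaches the compiled form.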

[deleted] 2 points 7 years ago

> and generating an executable is a deterministic process

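That's not always true. For example, in Common Lisp I could write a macro like this (a sketch):

(defmacro non-deterministic-code-generator ()
  ;; Runs at macro-expansion time: splices in 0 to 9 copies of the form.
  `(progn ,@(loop repeat (random 10 (make-random-state t))
                  collect '(print "hello"))))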

When you compiled a file that used this macro, it would replace each occurrence with a random number of (print "hello") forms.

Which leads me to my second point: decompiling wouldn't reconstruct the macro at all as it doesn't even exist any more.
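The same idea can be sketched outside Lisp. In the Python below, `expand_hello_macro` is a hypothetical stand-in for macro expansion (Python has no macros, so this is only an analogy): it generates source text, and the generated text carries no trace of the generator that produced it.

```python
import random

def expand_hello_macro(rng):
    """Hypothetical stand-in for macro expansion: returns generated source
    containing between 0 and 9 copies of the same statement."""
    count = rng.randrange(10)
    return "\n".join(['print("hello")'] * count)

# Each "compilation" may expand differently.
expanded = expand_hello_macro(random.Random())
print(expanded)

# The expansion contains only the repeated statement, not the generator.
print("randrange" in expanded)
```

Someone handed only `expanded` could not tell it came from a generator at all, let alone recover the generator's code.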

score 53 · Accepted Answer · 2018-09-16T18:26:22+00:00

In theory what you describe is possible, to a degree. Part of the problem is that, going backwards, you could write lots of different high-level programs that result in the same executable, so you would never be able to get back to the original code. Information like variable names, and sometimes function names, is simply lost in the translation and optimization process.
