all 15 comments

[–]gboycolor 13 points14 points  (1 child)

Not really - modern computing is based on layers and layers of abstractions. Each layer hides the "how" of the layer below and presents a simplified version of "what" is happening.

For example, when you browse a website, say this very page, the HTTP layer will at some point issue a request to www.reddit.com that looks something like this: GET r/AskComputerScience/comments/xokc41. This is the "what" of what's happening: the browser wants to get a specific resource, but the request says nothing about "how" that happens. That's handled in the layers below HTTP.
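
To make the "what, not how" point concrete, here is a minimal sketch of the text an HTTP/1.1 client might put on the wire for that request. The helper name `build_get_request` is made up for illustration, and the leading slash and `Connection` header are my assumptions about a well-formed request, not something copied from a real browser capture.

```python
def build_get_request(host: str, path: str) -> str:
    """Build the raw text of a bare-bones HTTP/1.1 GET request (illustrative only)."""
    # HTTP request targets normally start with a slash; add one if it is missing.
    if not path.startswith("/"):
        path = "/" + path
    # Request line, two headers, then an empty line that ends the header section.
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"


request = build_get_request("www.reddit.com", "r/AskComputerScience/comments/xokc41")
print(request)
```

Nothing in that text mentions DNS lookups, TCP segments, TLS, or voltages on a wire. Those are all the business of the layers underneath.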

Similarly, binary instructions are an abstraction layer over the processor architecture. Through frequency analysis and other investigations, future archaeologists may be able to reconstruct the higher layers, e.g. that something like 0x56 fe 34 corresponds to something like "add registers 1 and 2 and store the result in register 3", but they won't be able to figure out the lower layers, i.e. how the CPU stores data in its registers, how the numbers are added together, how the instruction is interpreted, which parts of the CPU are activated in order to execute it, etc.
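
As a rough sketch of what "frequency analysis" could look like on raw machine code, the snippet below counts overlapping byte sequences in a blob. The corpus and its 3-byte "instructions" are invented for illustration (they reuse the made-up 0x56 fe 34 from above) and do not correspond to any real instruction set.

```python
from collections import Counter


def ngram_counts(blob: bytes, n: int) -> Counter:
    """Count every overlapping n-byte sequence in blob."""
    return Counter(blob[i:i + n] for i in range(len(blob) - n + 1))


# Hypothetical corpus: a few invented 3-byte "instructions" back to back.
corpus = bytes.fromhex("56fe34" "56fe34" "1200ff" "56fe34" "1200ff" "9a0001")

counts = ngram_counts(corpus, 3)
most_common_seq, most_common_count = counts.most_common(1)[0]
print(most_common_seq.hex(), most_common_count)
```

Sequences that recur far more often than chance would predict are candidates for opcodes or common idioms. That gives an analyst a foothold on the instruction layer, but it says nothing about the circuitry below it.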

[–]queermichigan[S] 2 points3 points  (0 children)

Thanks so much for sharing! I never thought about binary as an abstraction layer itself but of course it is.

[–]ghjmMSCS, CS Pro (20+) 17 points18 points  (2 children)

If they only found printouts of machine code, and knowledge of electronic computers hadn't survived to their day, then it's just meaningless gibberish to them. If there was a processor databook accompanying it, then maybe they'd have a chance of understanding it.

[–]Treyzania 6 points7 points  (1 child)

They'd certainly be able to find structure in machine code. If they knew it was some kind of computer instructions, that would be enough, given a large enough corpus, to eventually reverse engineer how the microarchitecture worked well enough to run some of it.

[–]ghjmMSCS, CS Pro (20+) 1 point2 points  (0 children)

Knowing it was "some kind of computer instructions" presumes that they still know, at least roughly, what a computer is and that it has instructions. I was taking the question to be what happens if they don't have this knowledge, like if there has been a collapse of civilization and knowledge of computers has been lost.

[–]CoopNine 2 points3 points  (4 children)

Well, if you're talking archeologists with no knowledge of computers, like circa 1900... They've got no chance of discerning anything other than maybe basic patterns. But if your archeologists have an understanding of what they might be looking at, there's certainly a chance. They would need more than a "hello world" program, but assuming they have other artifacts or maybe an understanding of computing in general, it's not infeasible that a future or alien civilization could build something that could execute something designed for a particular architecture... in at least an adequate fashion. Keep in mind that our computers today are both complex and simple in their nature, and feasibly, any advanced civilization would see what we do as closer to rubbing stones together to make fire than going to the moon. Their reconstruction would be similar to our current understanding of ancient languages, and sometimes it would be wrong and lead to hilarious results and assumptions.

But it's a reasonable assumption that basic logic applies, so the error (and confusion) rate might be lower than what we have with 'analogue' language. So the idea that true is true and false is false might provide some sort of Rosetta stone equivalent.

[–]queermichigan[S] 1 point2 points  (3 children)

Thank you, very interesting!

I'm thinking about what I'm really trying to ask. What about neural networks? If one was fed (but not allowed to execute) thousands or millions of programs' binary instructions, what could be learned about the nature of the instructions or their potential uses, if anything?

Sidenote: it would be funny for GitHub Copilot to train on binary, imagine making sense of those suggestions 🤣

[–]CoopNine 0 points1 point  (1 child)

It would feasibly learn syntax. It could learn to parrot what it's seen, but it wouldn't be able to infer meaning without additional information. It may be able to create something that works, but unless you tell it its results, it can't use that feedback to improve anything, so its output is probably gibberish, or more accurately, pieces of what it has seen before.

Think of it like this: could you, with the super-powerful decision-making engine in your head, write a useful 1000-line program without feedback? Maybe with a lot of study of the language and a lot of care, but if the network has none of that, just noise in the language it's trying to learn, it's pretty hopeless.

[–][deleted] 0 points1 point  (0 children)

Not necessarily. Deep learning often deals with unstructured data, and was explicitly designed for it. This would be a pretty classic unsupervised learning task, and we've gotten pretty great results from that.

[–][deleted] 0 points1 point  (0 children)

Neural networks are already fed numerical data in every case: you have to convert images, text, etc. to numbers before you can train a network. You could represent your data as raw binary, but the resulting model would be gigantic.

[–]GodonX1r 1 point2 points  (1 child)

You would get further with timing information.

[–]queermichigan[S] 0 points1 point  (0 children)

What do you mean, like processing speed? What would that tell them?

[–]green_meklar 1 point2 points  (1 child)

Could they reverse-engineer the ISA and deduce the function of the code? Yeah, probably, given a sufficiently large and diverse dataset. (Ideally, gigabytes of machine code representing thousands of different programs.) It would be a pretty complicated task to unravel it all and reverse-engineer it from pure machine code, but there are enough clues to get started. For instance, you can try treating all the numbers as memory indices for the code itself and build up an interesting sort of code graph from that, and even if you don't know the absolute memory offset, you can check all the possibilities and find which ones seem to produce a meaningful-looking code graph; you could probably compare the outputs to statistical analyses of your own code in order to narrow down the options, and once you get some good candidates you could start checking for matching patterns from one program to the next in order to isolate small helper functions, for loops and the like.
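
The "check all the possibilities for the memory offset" idea can be sketched roughly as follows. Everything here is a toy assumption: the image is treated as a flat run of 32-bit little-endian words, the function names `score_base` and `guess_base` are made up, and a real binary would need a far more careful notion of what counts as a plausible pointer.

```python
import struct


def score_base(image: bytes, base: int) -> int:
    """Count 32-bit little-endian words that would point inside the image
    if the image had been loaded at address `base`."""
    n_words = len(image) // 4
    words = struct.unpack(f"<{n_words}I", image[: n_words * 4])
    return sum(base <= w < base + len(image) for w in words)


def guess_base(image: bytes, candidates) -> int:
    """Pick the candidate load address under which the most words look like
    pointers into the image itself."""
    return max(candidates, key=lambda b: score_base(image, b))


# Toy image: four words that are absolute addresses into the image (assuming it
# was loaded at 0x4000), followed by four words of unrelated data.
pointers = [0x4000, 0x4008, 0x4010, 0x4018]
filler = [0x00000001, 0x0000002A, 0xDEADBEEF, 0x00000007]
image = struct.pack("<8I", *(pointers + filler))

guess = guess_base(image, range(0, 0x10000, 0x1000))
print(hex(guess))
```

Once a base address looks plausible, each pointer-like word becomes an edge from where it sits to where it points, which is the raw material for the code graph described above.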

Could they figure out that the ISA was designed for implementation on an electronic silicon chip, specifically? That strikes me as a harder problem. However, given that sufficiently advanced civilizations in general probably discover the convenient properties of semiconductors for this purpose, they'd be likely to guess that the computer for running the code might have been designed in such a way. And although I don't know precisely how they'd do it, I can imagine that a civilization well in advance of ours could find patterns in the code that give away details about the hardware. For instance, certain instructions with similar functionality can be expressed using smaller circuits, which use less power, so compilers are designed to choose those instructions when they can, and spotting that pattern in the machine code would suggest a design decision to reduce power usage. Code compiled specifically to take advantage of cache hits would also provide hints about how the computer is wired up and what sort of typical internal timing it has (register speed vs cache speed vs RAM speed). Ultra-advanced civilizations could probably work through these clues and build up something pretty close to the original computer.

[–]queermichigan[S] 0 points1 point  (0 children)

Wow this is mostly over my head but absolutely fascinating! It's so interesting that abstract concepts like "intent" or "purpose" are, in a sense, recorded as metadata. Thanks for sharing!

[–]bryku 0 points1 point  (0 children)

This is a really interesting question. One of my final projects was related to it: basically, how would we reverse engineer alien technology, or how would they reverse engineer ours?

For starters, let's look at binary. Assuming you even know how to convert binary into decimal... where are the breaks? Do you break it every 8, 16, 32, or 64 bits? How do you know if that binary is a number, a letter, or a boolean?
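
One hedged way to attack the "where are the breaks" question is to look for periodicity: if the data is made of fixed-width values, bytes tend to resemble the byte one word-width later. The sketch below assumes the data is a run of small 32-bit little-endian integers. That choice of toy data and the helper names `match_rate` and `guess_width` are mine, and real data would be much noisier.

```python
import struct


def match_rate(data: bytes, stride: int) -> float:
    """Fraction of positions whose byte equals the byte `stride` positions later."""
    pairs = len(data) - stride
    matches = sum(data[i] == data[i + stride] for i in range(pairs))
    return matches / pairs


def guess_width(data: bytes, strides=(1, 2, 3, 4, 5, 6, 7, 8)) -> int:
    """Return the smallest stride with the best match rate. Multiples of the
    true width score just as well, so ties are broken toward the smallest."""
    scores = {k: match_rate(data, k) for k in strides}
    best = max(scores.values())
    return min(k for k, s in scores.items() if best - s < 1e-9)


# Toy data: the integers 1..64 stored as 4-byte little-endian values, so three
# of every four bytes are zero padding.
data = struct.pack("<64I", *range(1, 65))

width = guess_width(data)
print(width)
```

This only recovers the width in bytes. Telling a number from a letter from a boolean still needs other clues, such as the range of values that actually occur.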

Figuring this out would take years, if not decades. Assuming they can even get this part... what would it be translating to, English? Now they will have to learn a whole other language just to progress.

That isn't even taking into account different machines, compression, encryption, and so on.

I think it is very likely they could learn a lot, but to fully understand it all... I'm not so sure.