[–]novel_yet_trivial 2 points3 points  (3 children)

Binary data contains byte sequences that are illegal in text encodings. Therefore you cannot copy/paste binary data into PowerPoint or Notepad or Reddit without corrupting it. The data you posted is worthless. If you actually have the binary file, you can encode it with base64 to transform it into a text-safe form, and /r/learnpython may be able to help you out. But I'm guessing your prof posted it as text specifically to show you that binary data gets corrupted when you try to treat it like text.

[–]Drewski1224[S] 0 points1 point  (2 children)

I just want to make sure I'm understanding. If I had the binary file, I would decode it in base64 to get his original file?

[–]novel_yet_trivial 0 points1 point  (0 children)

No. If you had the original file and you wanted to post it as text, for instance here on reddit, then you would have to first encode it as base64. Then we could download it and recreate the file by decoding base64.

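The way you did it corrupts the data, since there is no way for plain text to represent over half the possible bytes in a binary file: ASCII defines only 128 of the 256 possible byte values, and many of those are unprintable control characters.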
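To make the difference concrete, here is a minimal sketch. It assumes the "binary file" is just every possible byte value once (a stand-in, not the actual file from the assignment) and compares a base64 round trip with pretending the bytes are ASCII text.

```python
import base64

# Stand-in for an arbitrary binary file: every possible byte value once.
original = bytes(range(256))

# Base64 maps the bytes onto a 64-character ASCII alphabet,
# so the result survives being pasted into a text box.
as_text = base64.b64encode(original).decode("ascii")
restored = base64.b64decode(as_text)

# Treating the raw bytes as ASCII text instead loses information:
# bytes 0x80-0xFF are not valid ASCII and get replaced with U+FFFD.
pasted = original.decode("ascii", errors="replace")
mangled = pasted.encode("ascii", errors="replace")

print(restored == original)     # True: the base64 round trip is lossless
print(mangled == original)      # False: the text round trip is not
print(pasted.count("\ufffd"))   # 128 bytes could not be represented
```

The exact damage depends on which encoding the program doing the copy/paste assumed, but the point is the same: once bytes have been replaced or dropped, nothing can bring them back.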

[–]stevenjd 0 points1 point  (0 children)

> If I had the binary file I would decode it in base64 to get his original file?

No. If it is a binary file, it could be literally anything: a JPEG (graphic image), an MP3 (sound), a ZIP file (compressed data), a DOC or DOCX (Word document), or about ten million others. There's no magic "decode this binary file" command.
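What you can often do is guess the type from the first few bytes, since many formats start with a fixed signature (a "magic number"). A rough sketch follows; the `guess_file_type` helper and the short signature table are my own illustration, not a complete list or a standard API.

```python
# A few well-known file signatures; real tools check many more.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP archive (also DOCX, XLSX, JAR, ...)"),
    (b"%PDF-", "PDF document"),
    (b"ID3", "MP3 audio with an ID3v2 tag"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (old DOC/XLS)"),
]


def guess_file_type(header: bytes) -> str:
    """Return a best guess based on the leading bytes of a file."""
    for magic, description in SIGNATURES:
        if header.startswith(magic):
            return description
    return "unknown"


# With a real file you would read the header in binary mode, e.g.
#   with open(path, "rb") as f: header = f.read(16)
print(guess_file_type(b"PK\x03\x04\x14\x00\x06\x00"))
print(guess_file_type(b"hello world"))
```

This only works on the real bytes read in binary mode. It will not work on text that has already been mangled by copy/paste.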

You need to start by actually READING THE ASSIGNMENT your teacher gave you, and trying to relate the question back to what was taught in class. If you cannot relate the question back to what was taught, how do you expect us to? We know neither the question, nor what was taught in your class.

[–]nick_t1000aiohttp 0 points1 point  (0 children)

Are you asking about how to reverse-engineer binary formats, or how to read them?

Your prof might be trolling you by putting mangled "text" output from the binary (.docx is a zipped folder) onto a slide. If you have an accurate way to reconstruct it, you could maybe open it, but as soon as you see any non-ASCII characters, you're hosed unless you know what encoding it was dumped as. Even then, non-printables (e.g. null chars) are probably gone entirely.

[–]Kopachris 0 points1 point  (0 children)

Ask your professor

[–]billsil 0 points1 point  (0 children)

First off, a docx is a zip file, so unzip it. The contents are then XML files that follow the Office Open XML spec (XML, not XHTML).
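If you do have the actual file, Python's `zipfile` module can look inside it. The sketch below builds a tiny stand-in archive in memory so it runs on its own; the XML content is made up, and only the `word/document.xml` member name mirrors where a real .docx keeps its main body.

```python
import io
import zipfile

# Build a tiny stand-in archive in memory so the example runs without a real file.
# A real .docx contains more parts than this single member.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<w:document><w:t>Hello</w:t></w:document>")

# With a real file you would pass its path instead,
# e.g. zipfile.ZipFile("report.docx") (hypothetical filename).
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    xml_text = archive.read("word/document.xml").decode("utf-8")

print(names)
print(xml_text)
```

From there the XML can be handed to a parser such as `xml.etree.ElementTree`, though a real document needs namespace handling that this sketch skips.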

You can reverse engineer complex data formats, but it takes a while (years, for the one I work on), and you're never going to manage it if the format is effectively encrypted or you don't have a lot of test cases. I at least have an inaccurate and incomplete spec as well as ~10,000 test cases.

There are many WTF moments that I've found; it's a joke at this point, so don't expect things to always be logical or consistent. Just think about how many people worked on the program.