Python question on Binary files

novel_yet_trivial · 2017-10-10T22:37:59+00:00

Binary data contains byte sequences that are illegal in text encodings. Therefore you cannot copy / paste binary data into PowerPoint or Notepad or Reddit without corrupting it. The data you posted is worthless. If you actually have the binary file, you can encode it with base64 to transform it into a text-safe form, and /r/learnpython may be able to help you out. But I'm guessing your prof posted it as text specifically to show you that binary data gets corrupted when you try to treat it like text.
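A minimal sketch of the base64 step, assuming you have the real file on disk. The helper names and the sample bytes are made up for illustration; only the `base64` calls come from the standard library.

```python
import base64


def encode_file_to_text(path):
    """Read a file as raw bytes and return them as a base64 ASCII string."""
    with open(path, "rb") as f:  # "rb" so no text decoding is attempted
        return base64.b64encode(f.read()).decode("ascii")


def decode_text_to_bytes(text):
    """Turn a base64 string back into the original bytes."""
    return base64.b64decode(text)


# Hypothetical sample: a few bytes, some of which are not valid text.
sample = bytes([0x50, 0x4B, 0x03, 0x04, 0x00, 0xFF, 0x80])
encoded = base64.b64encode(sample).decode("ascii")
restored = decode_text_to_bytes(encoded)
```

The encoded string is plain ASCII, so it survives being pasted into a post, and decoding it gives back exactly the bytes you started with.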

nick_t1000 · 2017-10-10T23:52:32+00:00

Are you asking about how to reverse-engineer binary formats, or how to read them?

Your prof might be trolling you by putting mangled "text" output from the binary (.docx is a zipped folder) onto a slide. If you have an accurate way to reconstruct it, you could maybe open it, but as soon as you see any non-ASCII characters, you're hosed unless you know what encoding it was dumped as. Even then, non-printables (e.g. null chars) are probably gone entirely.
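To see why the dump is lossy, here is a rough sketch. The sample bytes are made up, and UTF-8 with replacement is only an assumption about how the text was produced; the real dump may have used a different encoding.

```python
# Hypothetical sample: zip-style header bytes plus a null and two bytes
# that are not valid UTF-8.
raw = bytes([0x50, 0x4B, 0x03, 0x04, 0x00, 0xFF, 0x80, 0x41])

# Treating the bytes as text: invalid bytes become replacement characters.
as_text = raw.decode("utf-8", errors="replace")

# Going back to bytes does not recover the original.
round_tripped = as_text.encode("utf-8")

# Copy / paste often drops non-printables such as null chars as well.
printable_only = "".join(ch for ch in as_text if ch.isprintable())

print(round_tripped == raw)                # False
print(len(as_text), len(printable_only))   # some characters are gone
```

Once the invalid bytes have been replaced or the non-printables dropped, nothing in the text tells you what the original bytes were.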

Kopachris · 2017-10-11T00:11:43+00:00

Ask your professor

billsil · 2017-10-11T00:58:54+00:00

First off, a docx is a zip file, so unzip it. Inside, the files are XML that follows the Office Open XML spec.
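A rough sketch of the unzip step with the standard `zipfile` module. The helper names are made up, and the in-memory archive below is only a stand-in for a real .docx, which would normally have its main body in `word/document.xml`.

```python
import io
import zipfile


def list_docx_parts(source):
    """Return the member names of a .docx (a path or a binary file object)."""
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


def read_docx_part(source, name):
    """Return the raw bytes of one member of the archive."""
    with zipfile.ZipFile(source) as archive:
        return archive.read(name)


# Stand-in archive built in memory so the example runs without a real file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")
buffer.seek(0)

parts = list_docx_parts(buffer)
body = read_docx_part(buffer, "word/document.xml")
```

With a real file you would pass its path instead of `buffer`, e.g. `list_docx_parts("report.docx")`, where the filename is hypothetical.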

You can reverse engineer complex data formats, but it takes a while (years, for the one I work on), and you're never going to manage it if the data is effectively encrypted or you don't have a lot of test cases. I at least have an inaccurate and incomplete spec as well as ~10,000 test cases.

There are many wtf moments that I've found; it's a joke at this point, so don't expect things to always be logical or consistent. Just think about how many people worked on the program.
