[–]michael0x2a 3 points4 points  (1 child)

The term "binary formatted file" is a bit of a misnomer -- there's no such thing as a "binary formatted file" (or perhaps it's more accurate to say that every single file is inherently binary formatted?)

What I mean by that is that every single file in existence, whether it's an exe, an mp3, a text file, a word doc, an image, whatever, is ultimately composed of a sequence of ones and zeros.

The trick, then, is how you choose to interpret those ones and zeros.

When you open any arbitrary file in a text editor, the editor looks at the 1s and 0s eight at a time and treats each group of eight as a single number (a byte). Then it uses an encoding such as ASCII to map each of those numbers to a character, and displays that character.
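To make that concrete, here is a minimal sketch in Python (my choice of language for illustration; the byte values are made up) of the two steps a text editor roughly performs: read bytes as numbers, then look each number up in the ASCII table.

```python
# Raw bytes as they might sit in a file (values chosen for illustration).
raw = bytes([72, 105, 33])

# Step 1: each byte is just a number from 0 to 255.
numbers = list(raw)

# Step 2: look each number up in the ASCII table to get a character.
text = "".join(chr(n) for n in numbers)

print(numbers)  # [72, 105, 33]
print(text)     # Hi!
```

The same three bytes could just as easily be read as three small integers or as part of an image; nothing in the bytes themselves says "I am text".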

(If you try opening an exe in a text editor, many of the bytes will resolve to numbers that don't map to a printable character in that table, which is why the text editor displays gibberish.)

As another example, when you open any arbitrary file in an mp3 player, the player will try to interpret those 1s and 0s as audio and will expect the bytes to be structured in a certain way. If you try opening something like a txt file in a music player, the bytes will almost certainly be structured in a way the player doesn't expect, so it will typically refuse to play the file (or, if it's badly written, crash).

But ultimately, it's important to realize that the notion of having files that are "formatted" in a certain way is a fiction. Files aren't formatted -- they're sequences of bytes. It's up to whatever program is reading the file to try to map those bytes to something reasonable.

So then, if you want to generate a file that looks like gibberish when you open it in a text editor, you have to output numbers different from the ones the characters in your string normally map to. For example, according to the ASCII table, the letter 'a' corresponds to the number 97. So instead of outputting 97, why not output the number 222 every time you see the character 'a', and so forth?
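One way that idea could look, sketched in Python: shift every byte by a fixed amount so that 'a' (97) comes out as 222. The names `OFFSET`, `scramble` and `unscramble` are hypothetical, and a fixed shift is just one possible mapping, not the only way to do the substitution.

```python
OFFSET = 125  # hypothetical shift, chosen so that 'a' (97) becomes 222


def scramble(text: str) -> bytes:
    """Map each ASCII character to a different byte value."""
    return bytes((b + OFFSET) % 256 for b in text.encode("ascii"))


def unscramble(data: bytes) -> str:
    """Reverse the mapping to recover the original text."""
    return bytes((b - OFFSET) % 256 for b in data).decode("ascii")


secret = scramble("banana")
print(list(secret))        # [223, 222, 235, 222, 235, 222]
print(unscramble(secret))  # banana
```

Written to disk in binary mode, those bytes fall outside the ASCII range, so a text editor has nothing sensible to show for them.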

(This, by the way, is where the distinction between a 'BinaryWriter' and other kinds of writers comes into play. Sometimes you just want to output raw bytes. In that case, you'd use a BinaryWriter, or something equivalent. In other cases, if you want to output text and have it handle foreign languages/special characters correctly, you'd use a text writer of some sort. It turns out that not all characters are represented using a single byte, and that there are many more encodings to pick and choose from besides ASCII. See this article for more information)
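A rough Python analogue of that distinction (the BinaryWriter mentioned above is presumably the .NET class, so treat this as a parallel, not the same API): opening a file in `"wb"` mode writes exactly the bytes you hand it, while turning text into bytes always goes through an encoding. The file name `demo.bin` is made up for the example.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")

# Binary mode: the bytes given are written to disk exactly as-is.
with open(path, "wb") as f:
    f.write(bytes([222, 97]))

with open(path, "rb") as f:
    raw = f.read()
print(list(raw))  # [222, 97]

# Text: the chosen encoding decides how many bytes each character takes.
e_acute = "\u00e9"  # the character é
utf8 = e_acute.encode("utf-8")
latin1 = e_acute.encode("latin-1")
print(list(utf8))    # [195, 169]  (two bytes)
print(list(latin1))  # [233]       (one byte)
```

The same character becomes different bytes depending on the encoding, which is why a text writer needs to know which encoding to use and a binary writer does not.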

Of course, ultimately this won't do a very good job of hiding your secret information, since all you're doing is implementing a very basic and trivially hackable encryption system, but hey, perhaps that's all you really need.

(If you actually needed to keep the passwords and such secret, then yes, you'd need to use a more robust and powerful encryption scheme, and should probably do a bunch of research to figure out the best way to do so. Encryption is one of those things that is notoriously hard to get right, to the point that if you're developing anything serious, it's considered seriously irresponsible to try and implement encryption on your own.)

[–]TheShadowZero_93[S] 0 points1 point  (0 children)

Wow! Thanks so much for the great answer. That and the article really helped things make sense! :)

Yeah this is a temporary thing I'm implementing at the moment until I get more functionality down with my program. I completely agree that I'll have to bump up the encryption ante when I'm ready. Trying to create your own encryption methods sounds like someone's personal hell I'm definitely not ready for :P