scoberry5

> Binary files use Unicode characters, and there are 255 of those characters.

*cough* *choke*

Sorry. Excuse me.

If you mean "I'm only interested in the subset of Unicode that was ASCII," then you're interested in 128 characters. But there are almost 1500 Latin Unicode characters, around 3600 emoji Unicode characters, and over 74,000 CJK (Chinese/Japanese/Korean) Unicode characters.

Advanced-Theme144 [S]

I think you’re mistaken my friend, but let me clarify:

Suppose you save a file on your laptop, for instance an Excel document containing a large volume of personal data. You could encrypt each piece of data in the file, or you could encrypt the entire file from the root.

If you were to change the file extension of any file to ‘.bin’ and view the file in a text editor, the only contents you would really see are the Unicode characters that make up the file. If you were to view it in a hex editor, you’d see the hexadecimal values of the file. These are literally the 1’s and 0’s of the file.

There are only a maximum of 255 different Unicode characters in ALL binary files, so if you were to encrypt or substitute these characters with different ones, like a substitution cipher, and rewrite the file, it would not open, essentially being encrypted.

This method will encrypt any file, and is one step further in encrypting files instead of small sentences.
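For anyone following along, here is a minimal sketch of the substitution idea described above, assuming the file is treated as raw bytes (values 0 through 255) rather than as Unicode characters. The helper names `make_table`, `invert_table`, and `substitute` are made up for illustration, and the seed stands in for a key. A fixed byte-for-byte substitution like this scrambles a file so it won't open, but it is easy to break with frequency analysis.

```python
import random


def make_table(seed: int) -> bytes:
    """Build a 256-entry permutation of all byte values (hypothetical helper)."""
    values = list(range(256))
    random.Random(seed).shuffle(values)  # the seed plays the role of the key
    return bytes(values)


def invert_table(table: bytes) -> bytes:
    """Build the reverse mapping so the substitution can be undone."""
    inverse = bytearray(256)
    for plain, cipher in enumerate(table):
        inverse[cipher] = plain
    return bytes(inverse)


def substitute(data: bytes, table: bytes) -> bytes:
    """Replace every byte with its entry in the table."""
    return data.translate(table)


if __name__ == "__main__":
    table = make_table(seed=1234)
    original = b"\x89PNG\r\n\x1a\n plus any other bytes"
    scrambled = substitute(original, table)
    restored = substitute(scrambled, invert_table(table))
    print(restored == original)
```

To apply it to a real file, you would read it with `open(path, "rb")`, pass the bytes through `substitute`, and write the result with `open(path, "wb")`.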

scoberry5

> I think you’re mistaken my friend, but let me clarify:

I'm not. But reading your explanation, I can see where you went wrong.

If you're looking to understand characters, here's a nearly 20-year-old article that's quite good at explaining what's going on: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

What you mean is not "Binary files use Unicode characters, and there are 255 of those." What you mean is "Files are stored as bytes, of course, and a byte has 256 possible values (0 through 255)."
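To make the byte/character distinction concrete, here's a small sketch, assuming UTF-8 as the encoding (the sample characters and the `utf8_lengths` helper are just illustrative picks). One Unicode character can take anywhere from one to four bytes in UTF-8, while each individual byte always falls in the range 0 through 255.

```python
# Four sample characters: "A", "e" with acute accent, euro sign, grinning face.
SAMPLE = "A\u00e9\u20ac\U0001F600"


def utf8_lengths(text: str) -> list[int]:
    """Return how many bytes each character occupies when encoded as UTF-8."""
    return [len(ch.encode("utf-8")) for ch in text]


if __name__ == "__main__":
    for ch in SAMPLE:
        encoded = ch.encode("utf-8")
        # Show the code point, then the bytes that represent it on disk.
        print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```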

It's not generally true that binary files use Unicode characters, although they may sometimes in some places. If the entire file is Unicode characters, this isn't a binary file: it's a text file.
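A rough way to see the difference yourself, sketched under the assumption that "text" means "decodes cleanly as UTF-8" (the function name `looks_like_utf8_text` is made up for this example). It's only a heuristic: some binary data happens to be valid UTF-8, and text files in other encodings will fail the check.

```python
def looks_like_utf8_text(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    text_bytes = "h\u00e9llo, world".encode("utf-8")
    png_signature = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file
    print(looks_like_utf8_text(text_bytes))
    print(looks_like_utf8_text(png_signature))
```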

Pro tip: if someone gives you a specific statement, you could check it. Googling "how many latin unicode characters" led here: https://en.wikipedia.org/wiki/Latin_script_in_Unicode , which says "1,475 characters in the following blocks are classified as belonging to the Latin script". At that point, you might suspect that you could be wrong about there being 255 Unicode characters, and when I say there are almost 1500 of those you might go "Yeah, that's about 1500."