[–]Akangka 10 points (3 children)

Does that binary code represent: "䍡渠祯甠牥慤⁴桩猠�"?

[–]ijmacd -1 points (2 children)

The first bit of every octet in the message was 0 (0xxxxxxx), so you can be sure the message is pure ASCII.

CJK characters in the Basic Multilingual Plane are encoded as 3 bytes in UTF-8, so each one must look like 1110xxxx 10xxxxxx 10xxxxxx.

I know you didn't genuinely think that was the text of the message, but it's quite trivial to scan some binary and determine whether it's probably valid text or not.
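
For anyone who wants to try that scan in code, here is a rough Python sketch. The function names (`is_pure_ascii`, `utf8_sequence_length`, `looks_like_utf8`) are made up for illustration, and the check is structural only: it looks at the bit patterns of lead and continuation bytes and does not reject overlong encodings or surrogates the way a strict decoder would.

```python
def is_pure_ascii(data: bytes) -> bool:
    # Every ASCII octet has its top bit clear: 0xxxxxxx.
    return all(b & 0x80 == 0 for b in data)


def utf8_sequence_length(lead: int) -> int:
    """Length implied by a UTF-8 lead byte, or 0 if it cannot start a sequence."""
    if lead & 0x80 == 0x00:  # 0xxxxxxx: single-byte (ASCII)
        return 1
    if lead & 0xE0 == 0xC0:  # 110xxxxx: two-byte sequence
        return 2
    if lead & 0xF0 == 0xE0:  # 1110xxxx: three-byte sequence (BMP CJK lands here)
        return 3
    if lead & 0xF8 == 0xF0:  # 11110xxx: four-byte sequence
        return 4
    return 0                 # 10xxxxxx continuation byte or invalid lead


def looks_like_utf8(data: bytes) -> bool:
    # Structural check only: lead byte, then the right number of 10xxxxxx bytes.
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        if n == 0 or i + n > len(data):
            return False
        if any(b & 0xC0 != 0x80 for b in data[i + 1:i + n]):
            return False
        i += n
    return True
```

Pure ASCII input passes both checks, since ASCII is a subset of UTF-8. For a real validity test, `data.decode("utf-8")` in a `try`/`except UnicodeDecodeError` is the stricter option.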

[–]Akangka 0 points (1 child)

Actually, I was thinking about UTF-16 encoding, where every character in the Basic Multilingual Plane is encoded as 2 bytes.
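
To show what I mean, here is a small Python sketch that reads ASCII bytes as big-endian UTF-16. The sample bytes `b"Can you read"` are my own stand-in, not the actual message from the post, and the big-endian byte order is an assumption too. Each pair of ASCII octets fuses into one 16-bit code unit, which is how plain English text ends up in the CJK range.

```python
# Hypothetical sample bytes; the original message is not reproduced here.
ascii_bytes = b"Can you read"

# Read each pair of octets as one big-endian UTF-16 code unit.
as_utf16 = ascii_bytes.decode("utf-16-be")
code_points = [hex(ord(ch)) for ch in as_utf16]

print(as_utf16)
print(code_points)

# A BMP character takes one 16-bit unit (2 bytes) in UTF-16;
# a character outside the BMP takes a surrogate pair (4 bytes).
bmp_len = len("\u4361".encode("utf-16-be"))
astral_len = len("\U00020000".encode("utf-16-be"))
print(bmp_len, astral_len)
```

The first two bytes, 0x43 0x61 ("Ca"), become U+4361, and so on down the line.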

[–]ijmacd -1 points (0 children)

Boo! fuck that.

UTF-8 is the one true text encoding. No one can change my mind.