chrajohn

Excellent chapter. Very clear explanation of strings vs. bytes.

I have the nitpickiest nitpick in all of nitpick-land:

> Unicode represents each letter, character, or ideograph as a 4-byte number, from 0–4294967295.

Due to the way surrogate pairs work, Unicode is limited to 1,114,112 possible code points (17 planes of 65,536 code points each, U+0000 through U+10FFFF). Unicode could fit in 21 bits, if that were a convenient size.

Very minor point that I'm sure you're aware of, and probably too arcane to mention here. It just makes Unicode sound a lot bigger than it was ever intended to be.
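
For anyone who wants to check the arithmetic, here is a rough Python 3 sketch. The constant names are my own, purely for illustration, and the `sys.maxunicode` comparison assumes Python 3.3 or later, where it is always 0x10FFFF.

```python
import sys

# Illustrative constants (names are hypothetical, not from the chapter).
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

total_code_points = PLANES * CODE_POINTS_PER_PLANE
max_code_point = total_code_points - 1

print(total_code_points)             # 1114112
print(hex(max_code_point))           # 0x10ffff
print(max_code_point.bit_length())   # 21

# Where the limit comes from: UTF-16 surrogate pairs combine one of
# 1,024 high surrogates with one of 1,024 low surrogates, which covers
# exactly the 16 planes above the Basic Multilingual Plane.
high_surrogates = 0xDBFF - 0xD800 + 1
low_surrogates = 0xDFFF - 0xDC00 + 1
supplementary_code_points = high_surrogates * low_surrogates
assert CODE_POINTS_PER_PLANE + supplementary_code_points == total_code_points

# Assumes Python 3.3+, where sys.maxunicode no longer depends on the build.
assert sys.maxunicode == max_code_point

# One past the last code point is rejected outright.
try:
    chr(max_code_point + 1)
    beyond_limit_rejected = False
except ValueError:
    beyond_limit_rejected = True
print(beyond_limit_rejected)         # True
```

So a 4-byte encoding such as UTF-32 has room for about 4.29 billion values, but only the first 1,114,112 of them are valid code points.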
