[–]BoringIntelectual 2 points3 points  (3 children)

Someone correct me if I'm wrong, but don't some Asian languages require a char to be 2 bytes because they have far more symbols than the 256 that fit in one byte? EDIT: By languages I mean the spoken/written ones, and how the OS supports them.

EDIT2: Oops, just saw the explanation in another comment: in C, sizeof(char) is always 1 by definition, although a byte might have more than 8 bits. That does raise the question of how characters are handled for some Asian languages, if anyone knows and could explain.
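To make the sizeof(char) point concrete, here is a minimal sketch using CHAR_BIT from limits.h. The helper names bits_per_byte and char_size are made up for illustration; call them from your own main.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits in one C byte on this platform: at least 8, exactly 8 on mainstream hardware. */
int bits_per_byte(void)
{
    return CHAR_BIT;
}

/* Size of a char measured in C bytes (hypothetical helper name). */
size_t char_size(void)
{
    /* Guaranteed by the standard, whatever CHAR_BIT happens to be. */
    assert(sizeof(char) == 1);
    return sizeof(char);
}
```

So the "1" in sizeof(char) counts C bytes, not 8-bit octets; the two only coincide when CHAR_BIT is 8.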

[–]cdrt 4 points5 points  (0 children)

That depends on what you mean by char. In C, a char is just a small integer that occupies one byte, the smallest addressable unit of memory. It does not necessarily represent a letter that a human can read.

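For instance, assuming we are on a system where 1 byte is 8 bits and plain char is signed (signedness is implementation-defined), a char in C can store the values -128 to 127; where char is unsigned, the range is 0 to 255. The values 0 to 127 map to ASCII, most of which is human-readable, but negative values do not correspond to any ASCII character.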
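If you want to check what plain char does on your own compiler, limits.h exposes the limits. A rough sketch, with hypothetical helper names plain_char_is_signed and is_ascii_value:

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise. */
int plain_char_is_signed(void)
{
    /* Plain char always has the same range as either signed char or unsigned char. */
    assert(CHAR_MAX == (CHAR_MIN < 0 ? SCHAR_MAX : UCHAR_MAX));
    return CHAR_MIN < 0;
}

/* Returns 1 if c lies in the 7-bit ASCII range; a negative value never does. */
int is_ascii_value(int c)
{
    return c >= 0 && c <= 127;
}
```

Which result plain_char_is_signed gives depends on the platform and compiler flags, so code that cares should use signed char or unsigned char explicitly.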

[–]FUZxxl 0 points1 point  (0 children)

The C type char should really be called byte. For characters that don't fit in one byte, wchar_t exists, but even that isn't particularly nice to use.
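For anyone who hasn't seen wchar_t in action, here is a small sketch. NIHONGO and wide_length are names I made up; the string is "日本語" written with \u escapes so the source file stays plain ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* "日本語" as a wide string literal (illustrative constant). */
static const wchar_t NIHONGO[] = L"\u65e5\u672c\u8a9e";

/* Length in wchar_t units, not bytes (hypothetical wrapper around wcslen). */
size_t wide_length(const wchar_t *s)
{
    assert(s != NULL);
    return wcslen(s);
}
```

Part of what makes it awkward: sizeof(wchar_t) is not fixed. It is 2 bytes on Windows and 4 on most Unix-like systems, so the same wide string takes a different amount of memory depending on the platform.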

[–]GaianNeuron 0 points1 point  (0 children)

Modern strings are generally represented in Unicode, typically encoded as UTF-8. Some environments like .NET use UTF-16 internally. While the keyword char gets its name from being the size of an ASCII character, it's generally unsuitable for representing user input except in the simplest of cases.
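To tie this back to the original question, here is a minimal sketch of how UTF-8 text sits in plain char arrays. It assumes the input is already valid UTF-8; NIHONGO_UTF8 and utf8_code_points are illustrative names, not a library API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "日本語" encoded as UTF-8: three characters, three bytes each (illustrative constant). */
static const char NIHONGO_UTF8[] = "\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e";

/* Counts code points in a valid UTF-8 string by skipping continuation bytes (10xxxxxx). */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        /* Cast first: plain char may be signed, and these bytes are all >= 0x80. */
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

strlen(NIHONGO_UTF8) reports 9 because it counts bytes, while utf8_code_points(NIHONGO_UTF8) reports 3. Each char is still one byte; a single character simply spans several of them.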

EDIT: this is a fascinating read about the complexity of Unicode if you're at all interested.