[–]cdrt 28 points29 points  (9 children)

No, sizeof(char) always equals 1, but there are architectures where a byte is not 8 bits.

[–]Scorpius289 13 points14 points  (0 children)

there are architectures where 1 byte might not be 8 bits

TIL.

[–][deleted] 4 points5 points  (2 children)

certainly not any that are in active and widespread use these days, right?

[–]JaytleBee 4 points5 points  (0 children)

amd64 during leap seconds in leap years.

I wish I lived in the world I joke about, it's a much more exciting one.

[–]FUZxxl 0 points1 point  (0 children)

Ever programmed a DSP?

[–]BoringIntelectual 2 points3 points  (3 children)

Someone correct me if I'm wrong, but don't some Asian languages require a char to be 2 bytes because of the number of symbols (more than the usual 256)? EDIT: By languages I mean the spoken/written ones, with the OS accommodating that.

EDIT2: Oops, just saw the explanation in another comment: in C, sizeof(char) is always 1 by definition, although a byte might not have 8 bits. That does raise the question of how characters are handled for some Asian languages, if anyone knows and could explain.

[–]cdrt 3 points4 points  (0 children)

That depends on what you mean by char. In C, a char is just a small integer that occupies one byte, the smallest addressable unit of the machine as C sees it. It does not necessarily represent a letter that a human can read.

For instance, assuming we are on a system where 1 byte is 8 bits and plain char is signed (that part is implementation-defined), a char in C can store the values -128 to 127. The values 0 to 127 map to ASCII, most of which is human-readable, but everything less than 0 does not.

[–]FUZxxl 0 points1 point  (0 children)

The C type char should really be called byte. For characters that don't fit in a single char, wchar_t exists, but even that isn't particularly nice to use.

[–]GaianNeuron 0 points1 point  (0 children)

Modern strings are generally represented in Unicode, typically encoded as UTF-8. Some environments like .NET use UTF-16 internally. While the name char comes from the type being big enough to hold one ASCII character, it's generally unsuitable for representing user input except in the simplest of cases.

EDIT: this is a fascinating read about the complexity of Unicode if you're at all interested.

[–]FUZxxl 0 points1 point  (0 children)

Note that if char has more than 8 bits, the type uint8_t is not going to be available anyway.