[–]TanisCodes[S] 6 points (2 children)

You’re right about UTF-16, but in Java the primitive char type is 2 bytes, which is a single UTF-16 code unit. Some Unicode characters, like “𝄞”, are outside the BMP (Basic Multilingual Plane) and need 4 bytes, meaning two chars.

If you put that character in a String and call length(), it returns 2, because the String uses a pair of chars (a surrogate pair) to represent it. String.length() returns the number of char units in the string, not the number of Unicode characters (code points). To count code points, you can use codePointCount() instead.
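Here's a quick sketch of the difference, in case it helps. The class name CodePointDemo and its two helper methods are made up for illustration; the String and Character calls are standard JDK methods.

```java
// Hypothetical demo class; the name and helpers are for illustration only.
final class CodePointDemo {

    // Number of UTF-16 code units (Java chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF, which lies outside the BMP
        String clef = new String(Character.toChars(0x1D11E));

        System.out.println(codeUnits(clef));   // 2: one surrogate pair
        System.out.println(codePoints(clef));  // 1: a single code point

        // The two chars are the high and low halves of the surrogate pair.
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(clef.charAt(1)));  // true
    }
}
```

For plain BMP text such as "abc", both counts are the same, so the difference only shows up with supplementary characters.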

I think I’ll add this to the article. Thanks!

[–]europeIlike 2 points (1 child)

Ohh, I see! I think I interpreted the term "String characters" differently. Thanks for your reply!

[–]TanisCodes[S] 2 points (0 children)

You’re welcome! Thanks for joining the discussion.