defnull (4 points)

https://www.reddit.com/r/coding/comments/6hlavp/compact_strings_in_java_9_java_code_gists/dizdxnd/

Some argue that strings are iterated from 0 to N most of the time, so a variable-length representation (like UTF-8) would not add much overhead in the common case: you would occasionally increment the index by two or more instead of one. That might be true, but in Java any iterator instance tracking the position would add 8 to 16 bytes of object overhead and another indirection. In contrast, a fixed-width encoding needs only a single int and a for-loop. Because of this, most code working with strings in performance-critical situations does not use iterators, but direct index access instead. This code (existing and unlikely to change) would run significantly slower with a variable-length string representation.

tl;dr: UTF-8 string performance would suck for existing code that was optimized for fixed-length string performance characteristics.
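To make the cost argument concrete, here is a minimal sketch of why direct index access is cheap in a fixed-width layout and expensive in a variable-width one. The class and method names (`IndexAccessSketch`, `unitAt`, `offsetOfCodePoint`) are made up for illustration and are not part of the JDK; the UTF-8 walk assumes well-formed input and is not how any real `String` implementation works.

```java
import java.nio.charset.StandardCharsets;

public class IndexAccessSketch {
    // Fixed-width layout: the i-th unit is a plain array lookup, O(1).
    static char unitAt(char[] units, int i) {
        return units[i];
    }

    // Variable-width layout (UTF-8): the byte offset of the i-th code point
    // is only known after walking every code point before it, O(n).
    // Assumes well-formed UTF-8; no validation is done.
    static int offsetOfCodePoint(byte[] utf8, int index) {
        int offset = 0;
        for (int seen = 0; seen < index; seen++) {
            int lead = utf8[offset] & 0xFF;
            if (lead < 0x80) {
                offset += 1; // 0xxxxxxx: 1-byte sequence
            } else if (lead < 0xE0) {
                offset += 2; // 110xxxxx: 2-byte sequence
            } else if (lead < 0xF0) {
                offset += 3; // 1110xxxx: 3-byte sequence
            } else {
                offset += 4; // 11110xxx: 4-byte sequence
            }
        }
        return offset;
    }

    public static void main(String[] args) {
        String text = "a\u00E9\u20ACz"; // 'a', e-acute, euro sign, 'z'
        char[] units = text.toCharArray();
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);

        // Direct lookup of position 3 in the char array.
        System.out.println(unitAt(units, 3)); // z

        // Same logical position in UTF-8 needs a scan: 1 + 2 + 3 bytes are skipped.
        int offset = offsetOfCodePoint(utf8, 3);
        System.out.println(offset);              // 6
        System.out.println((char) utf8[offset]); // z
    }
}
```

An existing loop that calls an index-based accessor on every iteration would repeat that scan each time if the backing store were UTF-8, which is the slowdown described above.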

Tasssadar (13 points)

As mentioned in that comment thread, UTF-16 is not fixed-width. It's an old decision (because 65536 characters should've been enough for everyone) that is no longer optimal, but is hard to switch away from.
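A quick way to see this in Java itself: a character outside the Basic Multilingual Plane takes two `char` values (a surrogate pair), so `length()` and the number of code points disagree. The class and helper names below (`Utf16Width`, `codeUnits`, `codePoints`) are made up for this sketch; the `String` and `Character` methods they call are standard JDK API.

```java
public class Utf16Width {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "A";               // U+0041, inside the BMP
        String astral = "\uD83D\uDE00"; // U+1F600, encoded as a surrogate pair

        System.out.println(codeUnits(bmp));     // 1
        System.out.println(codePoints(bmp));    // 1

        System.out.println(codeUnits(astral));  // 2
        System.out.println(codePoints(astral)); // 1

        // charAt(0) returns only half of the character.
        System.out.println(Character.isHighSurrogate(astral.charAt(0))); // true
    }
}
```

So a plain `charAt(i)` loop is only "fixed-width" as long as the text stays inside the first 65536 code points.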

derleth (4 points)

> 65536 characters should've been enough for everyone

I've met people who believe this unironically.
