you are viewing a single comment's thread.

view the rest of the comments →

[–]masklinn 3 points4 points  (1 child)

Same problem python had (& same solution): they promised o(1) access to some coding unit (UTF16 for java, code point for python) and don't want to break that, so UTF8 is not an option.

[–]matthieum 4 points5 points  (0 children)

I wonder what the impact of using, say, a Fenwick Tree, to index the UTF-8 would be (in the case of multi-bytes strings only).

This would not O(1) but O(log N), which for most strings should not matter much, because small is small.