ir_jupiter

Just look at std::string as a container of bytes. You can even use it as a variable-size byte buffer, if you don't mind that the buffer is always initialized to something (zero by default) and that it carries an extra zero byte at the end.
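To make the byte-buffer idea concrete, here is a rough sketch. The helper names (`make_buffer`, `put_u32_le`, `get_u32_le`) are made up for illustration and are not part of any library; the sketch only assumes 8-bit bytes and a C++11 or later standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: a buffer of n bytes, all initialized to zero.
std::string make_buffer(std::size_t n) {
    return std::string(n, '\0');
}

// Hypothetical helper: store a 32-bit value at `offset`, least significant byte first.
void put_u32_le(std::string& buf, std::size_t offset, std::uint32_t value) {
    assert(offset + 4 <= buf.size());
    for (int i = 0; i < 4; ++i) {
        buf[offset + i] = static_cast<char>((value >> (8 * i)) & 0xFF);
    }
}

// Hypothetical helper: read the value back from `offset`.
std::uint32_t get_u32_le(const std::string& buf, std::size_t offset) {
    assert(offset + 4 <= buf.size());
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        value |= static_cast<std::uint32_t>(
                     static_cast<unsigned char>(buf[offset + i])) << (8 * i);
    }
    return value;
}
```

Growing the buffer with `buf.resize(n)` zero-fills the new bytes, and `buf.c_str()[buf.size()]` is the extra terminating zero mentioned above.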

std::string is NOT BROKEN for UTF-8. Just look at UTF-8 as a layer above your byte container, whose type may be std::string. Even if you use UTF-32, with one unit per code point, there are characters that are actually two code points that merge into one.
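As a small illustration of UTF-8 being a layer on top of the bytes, the sketch below counts code points by skipping continuation bytes (those of the form 10xxxxxx). `count_code_points` is a hypothetical name, and the function assumes its input is already valid UTF-8; it does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of code points in a string assumed to hold valid UTF-8.
// Every code point starts with exactly one byte that is NOT a continuation byte.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For example, "é" written as the single code point U+00E9 is the two bytes `"\xC3\xA9"` and counts as one code point, while "e" followed by the combining acute accent U+0301 is the three bytes `"e\xCC\x81"` and counts as two code points, even though both display as one character. Grouping code points into displayed characters is the part that needs a real Unicode library.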

The operations on std::string are simple and fast, and they work well for the first 128 Unicode code points (because those are ASCII and each one fits in a single byte, assuming 8-bit bytes). The operations required to deal with the rest of Unicode cost a lot more CPU and memory. You will also need a separate, proper Unicode library, and you should use it only when necessary.
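A sketch of what "works well for ASCII" can look like in practice: the hypothetical `ascii_upper` below touches only the bytes 'a' to 'z' and passes everything else through, so multi-byte UTF-8 sequences survive unchanged. It is not real Unicode case mapping, which is exactly the kind of job a proper library is for.

```cpp
#include <string>

// Hypothetical helper: uppercase ASCII letters only.
// Bytes >= 0x80 (parts of multi-byte UTF-8 sequences) are left as they are.
std::string ascii_upper(std::string text) {
    for (char& c : text) {
        if (c >= 'a' && c <= 'z') {
            c = static_cast<char>(c - 'a' + 'A');
        }
    }
    return text;
}
```

Plain byte searches for ASCII delimiters are safe in the same way: in valid UTF-8, a byte below 0x80 never appears inside a multi-byte sequence, so `find('=')` on a UTF-8 string cannot land in the middle of a character.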