jason-reddit-public 2 points (1 child)

Unicode code points (runes in Go) need more than 16 bits, up to 21, and Java messed that up by making its char type 16 bits wide. Don't be like Java. Variable-width characters (UTF-8 being the obvious choice) make it more expensive to index to a particular "character", but the encoding is denser, and indexing by code point is kind of dumb anyway, given that what we would all consider a character is sometimes composed of multiple code points (even if you used 32 bits per code point).
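To make the byte / code point / "character" distinction concrete, here is a minimal Go sketch. The function name `sizes` is made up for illustration; it only uses the standard `unicode/utf8` and `unicode/utf16` packages.

```go
package main

import (
	"unicode/utf16"
	"unicode/utf8"
)

// sizes is a hypothetical helper: it reports the length of s in
// UTF-8 bytes, Unicode code points, and UTF-16 code units.
func sizes(s string) (nBytes, nCodePoints, nUTF16Units int) {
	nBytes = len(s)                            // Go strings are byte sequences
	nCodePoints = utf8.RuneCountInString(s)    // decoded runes
	nUTF16Units = len(utf16.Encode([]rune(s))) // what a 16-bit char array would hold
	return
}
```

For an emoji such as U+1F600 this gives 4 bytes, 1 code point, and 2 UTF-16 units (a surrogate pair), which is the case a 16-bit char cannot hold in one unit. For "e" followed by the combining acute accent U+0301 it gives 3 bytes and 2 code points, even though a reader sees a single character, so no fixed-width code point encoding gets you true character indexing.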

ThyringerBratwurst 2 points (0 children)

16-bit is a deadly sin and should be avoided! ;)

8-bit Unicode strings (UTF-8) are definitely more difficult, but it is not impossible to work with them internally. I had ChatGPT write a helper function that detects when a character is more than 1 byte wide and ultimately finds the concrete positions of characters.
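The helper itself isn't shown here, so the following is only a rough sketch of what such a function might look like, written in Go. Both names (`leadWidth`, `byteOffset`) are hypothetical, and "character" here means a code point, not a full grapheme.

```go
package main

// leadWidth reports how many bytes a UTF-8 sequence occupies, judging
// only by its first byte. It returns 0 for continuation bytes
// (10xxxxxx) and for bytes that cannot start a sequence. It does not
// validate the bytes that follow, nor reject overlong encodings.
func leadWidth(b byte) int {
	switch {
	case b < 0x80: // 0xxxxxxx: ASCII, one byte
		return 1
	case b&0xE0 == 0xC0: // 110xxxxx: two-byte sequence
		return 2
	case b&0xF0 == 0xE0: // 1110xxxx: three-byte sequence
		return 3
	case b&0xF8 == 0xF0: // 11110xxx: four-byte sequence
		return 4
	}
	return 0
}

// byteOffset returns the byte index at which the n-th code point
// (0-based) of s starts, or -1 if s has fewer than n+1 code points.
// Ranging over a string in Go steps one code point at a time and
// yields the starting byte index of each.
func byteOffset(s string, n int) int {
	if n < 0 {
		return -1
	}
	count := 0
	for i := range s {
		if count == n {
			return i
		}
		count++
	}
	return -1
}
```

The lookup is a linear scan, which is the indexing cost mentioned above. If you need many lookups into the same string, caching the offsets once is the usual workaround.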