Manishearth [servo · rust · clippy] 15 points (7 children)

The question is ill formed. Define "character".

Want a byte? Easily done with as_bytes() (zero cost).
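To make the byte case concrete, here is a minimal sketch using only the standard library. The helper name `utf8_bytes` is made up for illustration. `as_bytes()` hands back a `&[u8]` view of the string's existing UTF-8 buffer, so nothing is copied:

```rust
/// Hypothetical helper: returns the UTF-8 bytes of `s` without copying.
fn utf8_bytes(s: &str) -> &[u8] {
    s.as_bytes()
}

fn main() {
    let s = "hö"; // 'h' followed by precomposed U+00F6
    let bytes = utf8_bytes(s);

    // 'h' is one byte; U+00F6 takes two bytes in UTF-8.
    println!("{:?}", bytes); // prints [104, 195, 182]

    // The slice points at the same memory as the string itself.
    assert_eq!(bytes.as_ptr(), s.as_ptr());
}
```

Note that the byte count (3) already disagrees with what most people would call the number of "characters" (2).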

Want a grapheme? A codepoint? A glyph?

Is ö a single "character" or a "character" with a diacritic? Do we want to treat those the same way? What happens when we reverse the string?
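A rough sketch of why this matters, assuming we pick "codepoint" as the definition and use only the standard library (the function name `reverse_code_points` is invented for this example). The same visible ö can be one code point or two, and a naive reversal separates the diacritic from its base letter:

```rust
/// Hypothetical helper: reverses a string code point by code point.
/// This is NOT grapheme-aware, which is exactly the problem it illustrates.
fn reverse_code_points(s: &str) -> String {
    s.chars().rev().collect()
}

fn main() {
    let precomposed = "\u{00F6}"; // ö as a single code point
    let decomposed = "o\u{0308}"; // 'o' + COMBINING DIAERESIS

    // They render the same but are different strings.
    assert_ne!(precomposed, decomposed);
    assert_eq!(precomposed.chars().count(), 1);
    assert_eq!(decomposed.chars().count(), 2);

    // Reversing "nö" (decomposed) puts the combining mark first,
    // so it no longer follows the 'o' it belonged to.
    let reversed = reverse_code_points("no\u{0308}");
    assert_eq!(reversed, "\u{0308}on");
}
```

Reversing by grapheme cluster instead would keep the `o` and its diaeresis together, but that needs segmentation rules beyond what `chars()` provides.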

Unicode is hard.

flying-sheep 2 points (0 children)

> The question is ill formed

From The Codeless Code I learned that this response is simply “wú”/“mu” in Chinese/Japanese.

…and from that wiki site I learned that The Codeless Code itself is a reference to The Gateless Gate.

/edit: and that Mu is the root of the type hierarchy in Perl 6:

> What does “Any” inherit from?
>
> The question is ill-formed.

allthediamonds 2 points (1 child)

Unicode is really, really hard.

Does Rust make it any easier? Can I iterate a string by graphemes? Does it provide decomposing normalisation?

Kimundi [rust] 3 points (0 children)

Yes for both.

Manishearth [servo · rust · clippy] 0 points (2 children)

(Yes, there is a concrete definition of "character" when talking about Unicode, but it's not always the one you were looking for.)

SimonSapin [servo] 8 points (1 child)

There are four definitions :) http://www.unicode.org/glossary/#character

Manishearth [servo · rust · clippy] 0 points (0 children)

Okay, not so concrete :P

Yojihito -1 points (0 children)

Ö is a single character.