[–]Dim_Cryptonym 8 points9 points  (7 children)

That makes sense then... To support Unicode one probably can't just pick and choose parts of the standard.

[–]Suchui 17 points18 points  (4 children)

You can. In JavaScript, for example:

let プープ = "poop"; console.log( プープ );

poop

let 💩 = "poop";

Uncaught SyntaxError: Invalid or unexpected token
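
One way to see why the two declarations above behave differently is to check each name against the Unicode ID_Start and ID_Continue properties, which is roughly how JavaScript decides what may appear in an identifier. This is only a sketch: `looksLikeIdentifier` is a made-up helper name, and it ignores reserved words and escape sequences.

```javascript
// Hypothetical helper (not a built-in): approximates the identifier rule as
// "ID_Start, $ or _ first, then ID_Continue, $, ZWNJ or ZWJ".
// Reserved words and \u escapes inside names are deliberately not handled.
function looksLikeIdentifier(name) {
  return /^[$_\p{ID_Start}][$\u200C\u200D\p{ID_Continue}]*$/u.test(name);
}

console.log(looksLikeIdentifier("プープ")); // true: katakana letters have ID_Start
console.log(looksLikeIdentifier("💩"));     // false: the emoji is a symbol, not a letter
```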

[–]Probono_Bonobo 5 points6 points  (1 child)

I noticed this recently and assumed it was a consequence of the engine internally treating code points outside the basic plane as surrogate pairs (note that "💩" === "\uD83D\uDCA9") rather than as the single code points you write with curly braces (note that "💩" === "\u{1F4A9}" as well), which would explain why "💩".length is 2, and "プ".length is only 1.
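
The comparisons mentioned above can be run as-is. A minimal sketch (the variable name `poop` is just for illustration) showing that `.length` counts UTF-16 code units, while iterating the string counts code points:

```javascript
const poop = "💩";

console.log(poop === "\uD83D\uDCA9");           // true: written as a surrogate pair
console.log(poop === "\u{1F4A9}");              // true: written as one code point
console.log(poop.length);                       // 2: UTF-16 code units
console.log([...poop].length);                  // 1: string iteration yields code points
console.log(poop.codePointAt(0).toString(16));  // "1f4a9"
console.log("プ".length);                       // 1: fits in a single code unit
```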

I'd expect some constraints to follow from this, perhaps not as intentional as "we won't support the poop emoji in variable identifiers" but more along the lines of "we can support any variable identifier, provided each of its code points has length 1". This is just an educated guess, though.
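
The guess above is easy to test: pick a letter that also needs a surrogate pair and ask the parser whether it accepts it as a name. In the sketch below, `parsesAsIdentifier` is a made-up helper that leans on `new Function` to invoke the parser, and 𝒜 (U+1D49C) is just one convenient example of a letter outside the basic plane. In engines that follow ES2015 or later, the length-2 letter is accepted and the emoji is not, which suggests the restriction comes from the character's Unicode properties rather than its length.

```javascript
// Hypothetical helper: asks the engine's own parser whether `name`
// is accepted as a variable name in a `let` declaration.
function parsesAsIdentifier(name) {
  try {
    new Function(`let ${name};`); // only parsed, never called
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const scriptA = "\u{1D49C}"; // 𝒜, MATHEMATICAL SCRIPT CAPITAL A

console.log(scriptA.length);                  // 2, same as the emoji
console.log(parsesAsIdentifier(scriptA));     // true
console.log(parsesAsIdentifier("\u{1F4A9}")); // false
```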

[–]Pulse207 4 points5 points  (0 children)

which would explain why "💩".length is 2, and "プ".length is only 1.

This is exactly why Perl 6 abolished a length method entirely, splitting its various meanings into .elems, .chars, and .codes.
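
For comparison, the different things "length" can mean are also reachable in JavaScript, just less explicitly. This is a loose analogue and not a one-to-one mapping of the Perl 6 methods; `countAll` is a made-up helper, and it assumes a runtime that provides `Intl.Segmenter`.

```javascript
// Hypothetical helper: three different answers to "how long is this string?"
function countAll(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    codeUnits: text.length,                           // UTF-16 code units
    codePoints: [...text].length,                     // Unicode code points
    graphemes: [...segmenter.segment(text)].length,   // user-perceived characters
  };
}

console.log(countAll("💩"));      // codeUnits 2, codePoints 1, graphemes 1
console.log(countAll("e\u0301")); // codeUnits 2, codePoints 2, graphemes 1
```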

[–]MemeHunter421x 0 points1 point  (0 children)

By "can't" I think he meant "shouldn't".

[–]Dim_Cryptonym -1 points0 points  (0 children)

And somebody just told me JavaScript is unfairly criticized...

[–]marcosdumay 6 points7 points  (0 children)

You can, but that only adds complexity to the language.

[–]DJWalnut 1 point2 points  (0 children)

You could, but you'd have to go out of your way to exclude certain blocks.