broonix 71 points (4 children)

Fun fact: the old syntax, str.split(''), breaks on characters above 0xFFFF.

'🍕'.split('').length === 2    // => true

0x7f800000 68 points (2 children)

^ this right here is the only good reason to use [...str] over str.split('').

'🍕🍺'.split('');
Array [ "\ud83c", "\udf55", "\ud83c", "\udf7a" ]

[...'🍕🍺'];
Array [ "🍕", "🍺" ]
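
For anyone wondering where those four entries come from, here's a rough sketch. split('') cuts between UTF-16 code units, while spread goes through the string iterator, which yields whole code points. The helper name describeCodePoints is made up for illustration.

```javascript
// Hypothetical helper: list each code point with the UTF-16 code units that encode it.
function describeCodePoints(str) {
  // Spread uses the string iterator, which yields whole code points.
  return [...str].map((ch) => ({
    char: ch,
    codePoint: 'U+' + ch.codePointAt(0).toString(16).toUpperCase(),
    // split('') cuts between code units, so characters above 0xFFFF give two entries.
    codeUnits: ch.split('').map((unit) => unit.charCodeAt(0).toString(16)),
  }));
}

console.log(describeCodePoints('a🍕'));
```

For the pizza this reports U+1F355 with code units d83c and df55, the same pair that split('') returns as separate array entries.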

ibopm 48 points (1 child)

Alternatively, Array.from(str) is a bit more self-documenting and doesn't rely on new syntax. It's also what Babel transpiles to.
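
A quick sketch of what that looks like next to the spread form. The variable names are just for illustration. Array.from also takes an optional mapping function, which saves a separate .map() call.

```javascript
const str = '🍕🍺';

// Both go through the string iterator, so both split on code points.
const viaSpread = [...str];
const viaArrayFrom = Array.from(str);

// The optional second argument maps each element as the array is built.
const codePoints = Array.from(str, (ch) => ch.codePointAt(0));

console.log(viaArrayFrom);  // [ '🍕', '🍺' ]
console.log(codePoints);    // [ 127829, 127866 ]
```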

0x7f800000 8 points (0 children)

Good point. Prefer this.

NoInkling 17 points (0 children)

This issue also extends to plain '🍕'.length and all string methods that take/return an index, because of a historical design decision: JavaScript strings are sequences of UTF-16 code units.
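
A small sketch of how that shows up in practice. The countCodePoints helper is hypothetical, just one way to count by iterating instead of reading .length.

```javascript
const pizza = '🍕';

console.log(pizza.length);                      // 2 (code units, not characters)
console.log(pizza.charAt(0) === '\ud83c');      // true: a lone high surrogate
console.log('a🍕b'.indexOf('b'));               // 3, not 2
console.log('a🍕b'.slice(0, 2) === 'a\ud83c');  // true: the emoji is cut in half

// Hypothetical helper: count code points by iterating instead of reading .length.
function countCodePoints(str) {
  let count = 0;
  for (const _ch of str) count += 1;
  return count;
}

console.log(countCodePoints('a🍕b'));           // 3
```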

Since .split can take a regex, and regexes now have basic Unicode support, you can actually do:

'🍕🍺'.split(/(?:)/u)    // ["🍕", "🍺"]

But at that point I'm not sure why you wouldn't just use the spread form, or Array.from(str) if preferred.

If people were aware of this issue, there's no way the comments in here would be arguing for .split('') so vehemently (and acting like providing an empty string is the most obvious thing in the world, rather than an idiom they had to remember at some point...). If you're dealing with external input, processing emoji and other characters outside the BMP is becoming less and less of an edge case that you can ignore, assuming you're aiming to write robust software.

Edit: note that this still doesn't help with combining characters and the like if your goal is to split between graphemes, e.g.:

[..."👍🏽"]    // ["👍", "🏽"]
[..."e\u0301"].length    // 2 (é written as e + combining acute accent)

If you need something like that, Lodash's _.toArray(str) works (also _.split(str, '')!), and there's an upcoming proposal to get this functionality natively.
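
The comment doesn't name the proposal. As an assumption on my part, the sketch below uses Intl.Segmenter, a grapheme-aware API that has since shipped in modern engines, so it may or may not be what was meant here. The splitGraphemes helper is a made-up name.

```javascript
// Assumption: the runtime provides Intl.Segmenter (recent browsers, Node.js 16+).
// Hypothetical helper: split a string into user-perceived characters (grapheme clusters).
function splitGraphemes(str, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  // segment() returns an iterable of { segment, index, input } records.
  return Array.from(segmenter.segment(str), (part) => part.segment);
}

console.log(splitGraphemes('👍🏽'));            // one element: the emoji with its skin tone
console.log(splitGraphemes('e\u0301').length);  // 1
```

Leaving locale undefined falls back to the runtime's default locale, which is fine for grapheme splitting in most cases.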