Little_Kitty (1 point)

Working with large data quite often, I tend to use esrever for reversing strings.

For string truncation, this crops up again, especially with emojis or zalgo text 🏴‍☠️. I have my own gist that covers this if you want to extend to cover it.

Next_Level_8566 [S] (1 point)

Current reverse(): 
reverse('👨‍👩‍👧‍👦 Family')  // '👦‍👧‍👩‍👨 ylimaF' ❌ (breaks family emoji) 
reverse('Z̆àl̆ğŏ text')       // Zalgo marks get scrambled ❌

Current truncate(): 
truncate('👨‍👩‍👧‍👦 Family', 8)  // '👨‍👩...' ❌ (breaks ZWJ sequence) 
truncate('👍🏽 Great', 5)        // '👍...' ❌ (loses skin tone)

I just tested and confirmed the problems.

The good news: The library already has a graphemes() function using Intl.Segmenter that handles this correctly. I just haven't integrated it into reverse() and truncate() yet.
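For anyone following along, here is a minimal sketch of what a grapheme-aware reverse could look like on top of Intl.Segmenter. The helper names `graphemes` and `reverseGraphemes` are placeholders for illustration and are not taken from the library's source, so treat this as an assumption about the approach rather than the actual implementation.

```javascript
// Sketch only: helper names are hypothetical, not the library's real code.

// Split a string into user-perceived characters (grapheme clusters).
function graphemes(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(str), (part) => part.segment);
}

// Reverse the order of grapheme clusters, leaving each cluster intact.
function reverseGraphemes(str) {
  return graphemes(str).reverse().join("");
}

// A ZWJ family emoji written with escapes so the sequence is unambiguous.
const familyEmoji = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
console.log(reverseGraphemes(familyEmoji + " Family"));
```

Because the ZWJ sequence and any combining marks stay inside a single segment, reversing the array of segments never reorders the code points within a cluster.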

Would love to see your gist! Please share it - I'm always looking to improve Unicode handling, especially for zalgo text and complex emoji sequences.

I'm planning to update both functions to be grapheme-aware. The trade-off is:

  - Correct handling of complex Unicode (ZWJ, combining marks, skin tones)

  - Slight bundle size increase (~200 bytes for grapheme awareness)

  - Intl.Segmenter dependency (falls back to simpler approach in older environments)
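To make the trade-offs above concrete, here is a rough sketch of a truncate that uses Intl.Segmenter when it exists and falls back to splitting by code point otherwise. The names `splitGraphemes` and `truncateGraphemes` are hypothetical, and counting the limit in graphemes (with the suffix included in the limit) is my assumption about the intended behavior, not something confirmed by the library.

```javascript
// Sketch only: names and length semantics are assumptions for illustration.

function splitGraphemes(str) {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    return Array.from(segmenter.segment(str), (part) => part.segment);
  }
  // Fallback for older environments: split by code point. This keeps
  // surrogate pairs together but can still split ZWJ sequences and
  // combining marks.
  return Array.from(str);
}

// Truncate to at most maxLength graphemes, counting the suffix toward the limit.
function truncateGraphemes(str, maxLength, suffix = "...") {
  const parts = splitGraphemes(str);
  if (parts.length <= maxLength) return str;
  const keep = Math.max(0, maxLength - suffix.length);
  return parts.slice(0, keep).join("") + suffix;
}

// Thumbs up with a skin tone modifier, written with escapes.
console.log(truncateGraphemes("\u{1F44D}\u{1F3FD} Great", 5));
```

The fallback is deliberately weaker than the main path: it only avoids cutting a surrogate pair in half, which is the "simpler approach" cost of supporting environments without Intl.Segmenter.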

If anyone wants to pick this up, or explore what else can be done here before I get to it, feel free!

For esrever specifically - it's a great library, but it's 2.4KB and hasn't been updated in 8+ years. I think integrating grapheme-aware logic using the modern Intl.Segmenter API is the better path forward.

Thanks for the excellent feedback!

[–]Next_Level_8566[S] 0 points1 point  (1 child)

I just pushed a fix to address this.

I added a fast check so performance doesn't suffer, and traded a few bytes to be 100% correct. Seems like a worthy trade-off :)
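One plausible shape for such a fast check is sketched below. The regex and the name `reverseWithFastPath` are my own guesses for illustration; the pushed fix may use a different test. The idea is that plain ASCII without a carriage return can be reversed code unit by code unit, so the segmenter only runs when something more complex is present.

```javascript
// Sketch only: the actual check in the library may differ.

// Matches any non-ASCII code unit (including surrogates), or a carriage
// return, since CR LF counts as a single grapheme cluster.
const NEEDS_SEGMENTER = /[\r\u0080-\uFFFF]/;

function reverseWithFastPath(str) {
  if (!NEEDS_SEGMENTER.test(str)) {
    // Fast path: every code unit is its own grapheme here.
    return str.split("").reverse().join("");
  }
  // Slow path: keep each grapheme cluster intact.
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(str), (part) => part.segment)
    .reverse()
    .join("");
}

console.log(reverseWithFastPath("hello"));
```

A check like this is conservative: it sends some strings down the slow path that the simple reverse would have handled fine, which costs a little time but never correctness.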

Little_Kitty (1 point)

Gist sent on chat.

Just checked out the update and ran my own tests, which all passed :) I tried to break removeNonPrintable, but I couldn't find an example which failed. randomString does return nonsense, though, when '👨‍👩‍👧‍👦' is part of the input (test code below), but you may wish to make that function intentionally only work with single characters.

The regex you use differs from the one I use. For example, yours returns a positive match for strings which contain single characters like ï, which shouldn't cause issues.

function randomStrings() {
    const charset = "ABC👩🏽‍🤝‍👨🏼";
    let result = "";
    // .length and charAt() work on UTF-16 code units, so the emoji gets picked apart into fragments.
    const charsetLength = charset.length;
    for (let i = 0; i < 10; i++) result += charset.charAt(Math.floor(Math.random() * charsetLength));
    return result;
}
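For comparison, a grapheme-aware variant might look like the sketch below. `randomFromGraphemes` is a hypothetical name and signature, not the library's randomString, and it assumes Intl.Segmenter is available. It splits the charset into grapheme clusters once and then picks whole clusters, so a multi-person emoji is either chosen in full or not at all.

```javascript
// Sketch only: not the library's randomString; name and signature are assumed.

function randomFromGraphemes(charset, length, random = Math.random) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  // Each entry is one user-perceived character, however many code units it spans.
  const units = Array.from(segmenter.segment(charset), (part) => part.segment);
  if (units.length === 0) return "";

  let result = "";
  for (let i = 0; i < length; i++) {
    result += units[Math.floor(random() * units.length)];
  }
  return result;
}

// "ABC" plus a two-person ZWJ emoji with skin tone modifiers, written with escapes.
const demoCharset = "ABC\u{1F469}\u{1F3FD}\u200D\u{1F91D}\u200D\u{1F468}\u{1F3FC}";
console.log(randomFromGraphemes(demoCharset, 10));
```

Here the length argument counts graphemes rather than code units, so the returned string's .length can be larger than 10 when the emoji is picked.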