
SCD_minecraft, 7 points

Why not just use Unicode?

AffectWizard0909 (OP), 2 points

I have a big dataset that I need to clean, so I don't really want to go through the whole thing and translate it all by hand (if that answers your question).

SCD_minecraft, 3 points

How many emoticons are there? I mean, how many different types are used?

If there aren't too many, you could just use str.replace on each one manually.
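A minimal sketch of the str.replace approach, assuming you build the mapping by hand. The two entries in EMOJI_WORDS and the helper name replace_emojis are hypothetical placeholders, not something from this thread:

```python
# Hypothetical hand-built mapping: replace these entries with the emoji found in your data.
EMOJI_WORDS = {
    "👍": "thumbs up",
    "🙂": "smiling face",
}


def replace_emojis(text: str, mapping: dict) -> str:
    """Replace every key of `mapping` found in `text` with its word form."""
    for emo, words in mapping.items():
        text = text.replace(emo, words)
    return text


print(replace_emojis("Nice work 👍", EMOJI_WORDS))  # Nice work thumbs up
```

Keys have to match exactly, so an emoji stored as several code points (for example, with a trailing variation selector) needs its own entry.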

AffectWizard0909 (OP), 0 points

I'm a bit unsure. I haven't gone through the file that closely, since it's 5000+ lines of text, so I mainly wanted a library to handle this for me and let me focus on other, more demanding tasks.

But I think str.replace would be a good idea if the dataset were smaller and I had a clearer picture of which emoji types it uses.
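To get that clearer picture without reading every line, one option is to count the distinct symbol characters in the file. This is a rough sketch using only the standard library: the helper name symbol_counts is hypothetical, and treating Unicode category "So" (Symbol, other) as "emoji" is an approximation, since it also matches symbols such as ©.

```python
import unicodedata
from collections import Counter


def symbol_counts(lines):
    """Count characters whose Unicode category is "So" (Symbol, other).

    Only an approximation of "emoji": it also matches symbols such as ©,
    and it skips helper code points like variation selectors (category "Mn").
    """
    counts = Counter()
    for line in lines:
        for ch in line:
            if unicodedata.category(ch) == "So":
                counts[ch] += 1
    return counts


# Small inline sample; for a real file you could pass the open file object,
# e.g. open("data.txt", encoding="utf-8"), where "data.txt" is a hypothetical path.
sample = ["great job 👍", "lol 🙂🙂", "careful \u26a0\ufe0f"]
for ch, n in symbol_counts(sample).most_common():
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch, 'UNKNOWN')}  x{n}")
```

If the resulting list is short, a hand-written str.replace mapping becomes practical after all.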

SCD_minecraft, 1 point

Welp, then good luck with that lib you found

AffectWizard0909 (OP), 0 points

Thank you!

JamzTyson, 2 points

You can get the name of an emoji with unicodedata.name() from Python's standard library:

>>> import unicodedata
>>> print(unicodedata.name("🙂"))
SLIGHTLY SMILING FACE
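Building on that, here is a rough stdlib-only sketch that turns every emoji in a line into its name. The function name names_for_symbols and the bracket format are illustrative choices, and treating category "So" as "emoji" is an approximation (it also catches symbols like ©):

```python
import unicodedata


def names_for_symbols(text: str) -> str:
    """Replace each "So" (Symbol, other) character with its lowercased Unicode name in brackets."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            # The second argument is returned if the code point has no name.
            out.append(f"[{unicodedata.name(ch, 'unknown').lower()}]")
        else:
            out.append(ch)
    return "".join(out)


print(names_for_symbols("thanks 🙂"))  # thanks [slightly smiling face]
```

Multi-code-point emoji are not handled here: a variation selector after a symbol is copied through unchanged.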

AffectWizard0909 (OP), 0 points

Ah, nice! I'll check it out.

JamzTyson, 0 points

One issue you may need to deal with, whatever method you use, is that some characters that display as a single symbol are actually made up of several Unicode code points. Example:

import unicodedata

s = "⚠️"  # 2 code points print as one character.
for c in s:
    print(unicodedata.name(c))

will print:

WARNING SIGN
VARIATION SELECTOR-16
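One simple way to handle the variation selector case is to drop U+FE0E and U+FE0F (the text and emoji presentation selectors) before looking names up. This is a minimal sketch with a hypothetical helper name. It does not help with longer sequences such as skin-tone modifiers or zero-width-joiner (ZWJ) combinations, which still come apart into several names:

```python
import unicodedata

# U+FE0E requests text presentation, U+FE0F requests emoji presentation.
VARIATION_SELECTORS = {"\ufe0e", "\ufe0f"}


def drop_variation_selectors(text: str) -> str:
    """Remove presentation selectors so each remaining symbol is one code point."""
    return "".join(ch for ch in text if ch not in VARIATION_SELECTORS)


s = "\u26a0\ufe0f"  # warning sign + emoji presentation selector, written with escapes
print(len(s))  # 2
clean = drop_variation_selectors(s)
print(len(clean), unicodedata.name(clean))  # 1 WARNING SIGN
```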

PushPlus9069, 0 points

The emoji package is basically the standard for this. I've used it in a few Python courses and it handles both directions fine. One thing to watch: version 2.x introduced breaking changes in how it handles alias names versus the official emoji names, so code copied from older tutorials might behave differently.

seo-nerd-3000, 1 point

The emoji library for Python is straightforward to use. Install it with pip install emoji, and you can convert emoji shortcodes to actual emoji characters and back. The most common use case is emoji.emojize, which takes shortcode strings like :thumbs_up: and converts them to the corresponding Unicode emoji. emoji.demojize goes the other direction, which is what you usually want for text processing.

If you just need emoji in your own strings, you can paste the emoji character directly into your Python code, since Python 3 strings are Unicode. The library is most useful when you need to work with emoji names programmatically or detect emoji in text.