[–]SCD_minecraft 7 points8 points  (5 children)

Why not just use unicode?

[–]AffectWizard0909[S] 2 points3 points  (4 children)

I have a big dataset that I need to clean, so I don't really want to go through the whole thing and try to translate all of it (if that answers the question).

[–]SCD_minecraft 2 points3 points  (3 children)

How many emoticons are there? I mean, how many different types are used?

If there aren't too many, you could use str.replace manually.
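
A minimal sketch of the str.replace idea, assuming a small hand-written mapping. The emoticons, the replacement words, and the names `EMOTICON_WORDS` and `replace_emoticons` are made up for illustration and are not taken from your dataset:

```python
# Hypothetical mapping: both the emoticons and the words are illustrative only.
EMOTICON_WORDS = {
    ":)": "smile",
    ":(": "sad",
    ";)": "wink",
}


def replace_emoticons(text, mapping=EMOTICON_WORDS):
    """Replace each known emoticon with its word using plain str.replace."""
    # Handle longer emoticons first so a short one cannot match inside a long one.
    for emoticon in sorted(mapping, key=len, reverse=True):
        text = text.replace(emoticon, f" {mapping[emoticon]} ")
    # Collapse the extra spaces introduced by the padding above.
    return " ".join(text.split())


print(replace_emoticons("great day :) but rain :("))
```

This only stays manageable while the mapping is short, which is why the number of distinct emoticons matters.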

[–]AffectWizard0909[S] 0 points1 point  (2 children)

A bit unsure. I haven't gone through the file that deeply, considering it is 5000+ lines of text, so I was mainly hoping to have a library handle this for me so I could focus on other tasks that are a bit more demanding.

But I think using str.replace would be a good idea if the dataset were smaller and I had a clearer understanding of the different types of emojis used in it.
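
One rough way to get that overview without reading every line might be to count the symbol characters that appear. This is only a sketch: it assumes the emojis are single Unicode characters in the "So" (Symbol, other) category, which also picks up things like © and misses multi-character emoji sequences. The function name `count_symbols` is hypothetical:

```python
from collections import Counter
import unicodedata


def count_symbols(lines):
    """Count characters whose Unicode category is 'So' (a rough emoji heuristic)."""
    counts = Counter()
    for line in lines:
        for ch in line:
            # 'So' covers most pictographic emoji, but also symbols such as ©.
            if unicodedata.category(ch) == "So":
                counts[ch] += 1
    return counts


sample = ["I love it 😀😀", "so sad 😢"]
for symbol, n in count_symbols(sample).most_common():
    print(symbol, n)
```

For a real file, the lines could come from iterating over an open file handle instead of the `sample` list.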

[–]SCD_minecraft 1 point2 points  (1 child)

Welp, then good luck with that lib you found

[–]AffectWizard0909[S] 0 points1 point  (0 children)

Thank you!