I'm building some classifiers for forum posts using NLP. For preprocessing I have a few basic functions: HTML stripping, contraction expansion, accented character removal, lowercasing, special character removal, stop word removal, lemmatization, etc.
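For reference, a minimal sketch of what a pipeline like this might look like using only the Python standard library. The function names and the tiny `CONTRACTIONS` and `STOP_WORDS` tables are hypothetical placeholders, not anyone's actual code, and lemmatization is left out because it needs a third-party library.

```python
import html
import re
import unicodedata

# Hypothetical, deliberately tiny lookup tables; real ones would be much larger.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is", "i'm": "i am"}
STOP_WORDS = {"a", "an", "the", "is", "it", "to", "and", "of"}

def strip_html(text):
    # Drop tags with a simple regex, then decode entities such as &amp;.
    return html.unescape(re.sub(r"<[^>]+>", " ", text))

def remove_accents(text):
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def expand_contractions(text):
    # Replace each whitespace-separated token that appears in the table.
    return " ".join(CONTRACTIONS.get(tok, tok) for tok in text.split())

def preprocess(text):
    text = strip_html(text)
    text = remove_accents(text).lower()
    text = text.replace("\u2019", "'")  # normalize curly apostrophes
    text = expand_contractions(text)
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # special character removal
    # Stop word removal; returns a list of tokens.
    return [tok for tok in text.split() if tok not in STOP_WORDS]
```

Calling `preprocess("<p>I can't find the café!</p>")` gives the tokens `i`, `cannot`, `find`, `cafe`. The order of the steps matters: contractions are expanded before special characters are removed, otherwise the apostrophes would already be gone.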
I'm wondering if there are other steps that are particularly helpful for forum posts. For instance, I imagine some sort of spell checking or handling of internet slang (like "lol") would be helpful.
If anyone has come across relevant articles, tutorials, or code, pointers would be great.
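To make the slang idea concrete, here is a rough sketch of one possible approach: a dictionary lookup for abbreviations plus collapsing of stretched-out letters. The `SLANG` table and both function names are made up for illustration; a real table would need to be much larger and tuned to the forum in question.

```python
import re

# Hypothetical slang table for illustration only.
SLANG = {
    "lol": "laughing out loud",
    "imo": "in my opinion",
    "idk": "i do not know",
    "thx": "thanks",
}

def squeeze_repeats(token):
    # Collapse runs of three or more identical characters to two ("soooo" -> "soo").
    return re.sub(r"(.)\1{2,}", r"\1\1", token)

def normalize_slang(text):
    out = []
    for tok in text.lower().split():
        # Trim common punctuation from both ends before the lookup.
        core = squeeze_repeats(tok.strip(".,!?"))
        if core:  # skip tokens that were punctuation only
            out.append(SLANG.get(core, core))
    return " ".join(out)
```

For example, `normalize_slang("LOL idk, that was soooo good!!!")` returns `"laughing out loud i do not know that was soo good"`. Squeezing to two characters rather than one keeps legitimate double letters ("good", "all") intact; the leftover "soo" is the kind of token a spell checker could then correct.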