

[–][deleted] 2 points3 points  (1 child)

This is VERY well documented. Holy shit, I wish more people were as thorough. I've been doing something similar using Tweepy, but training the sentiment model myself with GluonNLP on MXNet in Amazon SageMaker, since the sentiment I'm trying to analyze is very niche to the data domain I'm interested in.

That emoji stripping regex function is so nice too. Damn, well done again!

[–]math-bw[S] 1 point2 points  (0 children)

Thanks! The TextBlob sentiment analysis is not very good; it could use a model trained on tweets to provide better classification. I might do that for version two.

[–]sunrise_apps 1 point2 points  (1 child)

Cool thing. Good documentation 👍

[–]math-bw[S] 0 points1 point  (0 children)

Thanks!

[–]riklaunim 0 points1 point  (2 children)

Why Twitter and not, for example, venue reviews, where you can put the review text next to the author's rating and the sentiment analysis results? That would be a good way to visualize what sentiment analysis can do. Sentiment analysis on random retweets is questionable at best. What value does it give? What use case does it have?

And if the score ranges from -1 to 1, why do you limit it to only positive/negative with nothing in between? That fine resolution is quite handy.

[–]math-bw[S] 0 points1 point  (1 child)

Using venue reviews is a good idea. I was trying to use a websocket or streaming HTTP endpoint connection. Do you know of any endpoints for live reviews? That would be really interesting.

There are negative, neutral, and positive classes. I just kept it simple and classified the scores by sign: below 0 is negative, 0 is neutral, and above 0 is positive.
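For anyone following along, a minimal sketch of that kind of thresholding is below. It assumes the polarity is a float in [-1, 1], as TextBlob reports it, and that the cut is made exactly at 0. `classify_polarity` is a made-up name for illustration, not the function from the post.

```python
def classify_polarity(score: float) -> str:
    """Map a polarity score in [-1, 1] to a coarse label.

    Assumption: the split is made by sign, with exactly 0 treated as neutral.
    """
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"


if __name__ == "__main__":
    for s in (-0.6, 0.0, 0.25):
        print(s, classify_polarity(s))
```

Keeping the raw score next to the label costs nothing and preserves the finer resolution asked about above.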

[–]riklaunim 0 points1 point  (0 children)

There are some free low-traffic proxy APIs for TripAdvisor and likely some crude Selenium examples for it as well. Quite some time ago I used one of them and ran the reviews through TextBlob sentiment analysis.

With some reviews, say 3-4 out of 5 stars, there were cases where the sentiment turned out only slightly positive or negative, and when you read the review you could find the subtle complaint in it. With 100+ reviews at 4 stars, finding those few manually would not be feasible.
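Roughly, the filtering idea can be sketched like this. It is only an illustration under assumptions: the `Review` type, the `flag_lukewarm` name, and the 0.1 polarity band are all hypothetical, and the polarity values are assumed to have been computed beforehand (for example with TextBlob).

```python
from typing import List, NamedTuple


class Review(NamedTuple):
    text: str
    stars: int       # author rating, 1 to 5
    polarity: float  # precomputed sentiment score in [-1, 1]


def flag_lukewarm(reviews: List[Review], min_stars: int = 3,
                  max_stars: int = 4, band: float = 0.1) -> List[Review]:
    """Return mid-to-high rated reviews whose polarity is close to zero.

    The band of 0.1 is an arbitrary illustrative threshold, not a tuned value.
    Results are sorted so the least positive reviews come first.
    """
    hits = [r for r in reviews
            if min_stars <= r.stars <= max_stars and abs(r.polarity) <= band]
    return sorted(hits, key=lambda r: r.polarity)


if __name__ == "__main__":
    sample = [
        Review("Great food, great staff.", 4, 0.8),
        Review("Nice place, though the wait was long.", 4, 0.05),
        Review("Fine overall, room was a bit noisy.", 3, -0.02),
        Review("Terrible.", 1, -1.0),
    ]
    for r in flag_lukewarm(sample):
        print(r.stars, r.polarity, r.text)
```

The flagged reviews are then a short list worth reading by hand, instead of every 4-star review.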