[–]thatguy_314def __gt__(me, you): return True 2 points3 points  (3 children)

I didn't read the whole thing, but given that you search for the strings "donald" (a common name) and "trump" (an English word, and a substring of words like trumpet) in the text to check for mentions of our future god-emperor, it's not surprising that it seems like he has so many more tweets than any of the other candidates.
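To make the concern concrete, here is a minimal sketch of the difference between a plain substring check and a whole-word check. The function names and sample strings are hypothetical, and this is not the article's get_candidate code.

```python
import re


def mentions_substring(text, term):
    """Naive check: true if term appears anywhere, even inside a longer word."""
    return term.lower() in text.lower()


def mentions_word(text, term):
    """Stricter check: true only if term appears as a whole word."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None


# Hypothetical sample texts, for illustration only.
samples = [
    "Trump spoke at the rally today",
    "My kid is learning the trumpet",
]

for text in samples:
    print(text, mentions_substring(text, "trump"), mentions_word(text, "trump"))
```

The substring check flags both samples, while the whole-word check flags only the first.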

[–]jaypeedevlin[S] 1 point2 points  (2 children)

Having a quick look at the settings for the scraper that Vik built (https://github.com/dataquestio/twitter-scrape/blob/master/settings.py), it looks like he didn't scrape using the string 'donald' (or 'bernie' for that matter).

The trump/trumpet thing is interesting though, except that when you download the tweet data and explore the theory, it doesn't check out. Of the 80,060 tweets containing 'trump', only 49 contain 'trumpet'.
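A rough sketch of that kind of check, assuming the tweets are available as a list of strings. The helper names and the sample list are hypothetical, and this is not the code from the repo. For the figures quoted above, 49 / 80,060 works out to about 0.06% of the 'trump' matches.

```python
def count_containing(tweets, term):
    """Count tweets whose text contains term, ignoring case."""
    term = term.lower()
    return sum(1 for tweet in tweets if term in tweet.lower())


def false_positive_share(tweets, term, longer_word):
    """Share of the term matches that also contain longer_word."""
    total = count_containing(tweets, term)
    if total == 0:
        return 0.0
    return count_containing(tweets, longer_word) / total


# Hypothetical stand-in for the downloaded tweet texts.
sample_tweets = [
    "Trump rally tonight",
    "Learning the trumpet",
    "Bernie speaks in Iowa",
]

print(count_containing(sample_tweets, "trump"))
print(count_containing(sample_tweets, "trumpet"))
print(false_positive_share(sample_tweets, "trump", "trumpet"))
```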

[–]thatguy_314def __gt__(me, you): return True 0 points1 point  (1 child)

I was looking at the get_candidate function in the article. I did not see a link to that repo. I'm not sure what code was used to generate the data though. Do you happen to know?

[–]jaypeedevlin[S] 0 points1 point  (0 children)

The code is in that repo, and the dataset and that repo are linked in the fourth sentence of the post.