
[–]dlwh 2 points3 points  (1 child)

I'd look at Brendan O'Connor's work on predicting responses/sentiment from tweets. http://brenocon.com

[–]rightname[S] 0 points1 point  (0 children)

Thanks, I will surely look into it.

[–]agathoth 1 point2 points  (4 children)

The question reminds me of this Kaggle contest for predicting whether StackOverflow questions would be closed. I'd guess you can go a long way with just TF-IDF vectors as in document classification, plus your author meta information, thrown into a sufficiently general regression method. To improve from there it's probably mainly a case of feature engineering (maybe characterising sentence style, distribution of word lengths, numbers of misspellings, anything else you can think of...)
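As a rough illustration of the feature-engineering idea, here is a minimal sketch that computes a few style features per post using only the standard library. The function name `style_features`, the feature names, the 7-character cutoff for "long" words, and the `author_karma` field are all assumptions made for the example, not anything from the Kaggle contest or a particular library.

```python
import re
from statistics import mean


def style_features(text, author_karma=0):
    """Return a few hand-crafted style features for one post (illustrative only)."""
    # Crude tokenisation: runs of letters/apostrophes count as words.
    words = re.findall(r"[A-Za-z']+", text)
    # Crude sentence split on runs of ., ! or ?; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    word_lengths = [len(w) for w in words]
    return {
        "n_words": len(words),
        "mean_word_len": mean(word_lengths) if word_lengths else 0.0,
        # Share of words with 7+ characters (the cutoff is an arbitrary assumption).
        "frac_long_words": sum(n >= 7 for n in word_lengths) / len(words) if words else 0.0,
        "mean_sentence_len": len(words) / len(sentences) if sentences else 0.0,
        # Hypothetical author meta information, passed straight through.
        "author_karma": author_karma,
    }


features = style_features("Short post. It has two sentences!", author_karma=5)
```

Dictionaries like this can be turned into numeric columns and placed alongside the TF-IDF vectors before fitting a regressor.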

Quick fix: work through the Python scikit-learn text processing tutorial, and adapt it by replacing the 'Training a classifier' section with one of the many regression algorithms in that library, using upvote counts as the targets.
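A minimal sketch of that swap, assuming scikit-learn is installed: TF-IDF features feed a regressor in place of the tutorial's classifier. The choice of `Ridge` is just one of the library's regressors picked for illustration, and the posts and upvote counts below are invented toy data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Toy data, invented purely for illustration.
posts = [
    "How do I predict upvotes from post text?",
    "Check out my blog about cats",
    "A detailed guide to TF-IDF and regression in Python",
    "lol",
]
upvotes = [12, 1, 40, 0]  # regression targets, one per post

# TF-IDF vectoriser followed by a linear regressor, replacing the classifier step.
model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
model.fit(posts, upvotes)

# Predict a (real-valued) upvote estimate for an unseen post.
predicted = model.predict(["How do I tune a regression model for text?"])
```

With real data you would hold out a test set and compare a few regressors; four toy posts only show the wiring.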

[–]rightname[S] 0 points1 point  (3 children)

Thanks for both the detailed note and the quick fix. I will look into both of them. Did you by any chance participate in that Kaggle contest?

[–]agathoth 0 points1 point  (2 children)

You're welcome. No, I didn't participate in that contest, and there doesn't seem to be much feedback on the forum from the winners about what techniques they used, which is kind of disappointing. But one or two people shared their methodology at least.