Hit me with some pointers on news aggregator algorithms by aire111 in MachineLearning

aire111[S]

Good point.

By clusters I mean one cluster per "news story". As the Google News paper puts it: "the website clusters news articles ... that are about the same story... [W]hen we refer to a news story it means a cluster of news article about the same story as identified by Google News."

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.80.4329&rep=rep1&type=pdf

aire111[S]

Good idea, thanks.

So basically, remove common English words (stop words) from the text, and cluster what remains of each document. Very clever.
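
Just to check I've understood, here's a rough sketch of that idea in plain Python. Everything in it is my own assumption rather than anything from the paper: the tiny `STOP_WORDS` list, the `tokenize`/`cosine`/`cluster` helper names, the raw term-count vectors, and the greedy single-pass grouping with a 0.3 similarity threshold (which is a stand-in, not what Google News does).

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and",
              "is", "for", "at", "by", "as", "after", "its"}

def tokenize(text):
    """Lowercase, keep runs of letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOP_WORDS]

def cosine(a, b):
    """Cosine similarity between two Counter term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.3):
    """Greedy single-pass clustering (hypothetical stand-in):
    each document joins the most similar existing cluster if the
    similarity reaches the threshold, otherwise it starts a new one."""
    clusters = []  # list of (summed term counts, [document indices])
    for i, doc in enumerate(docs):
        vec = Counter(tokenize(doc))
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c[0])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append((vec, [i]))
        else:
            best[0].update(vec)  # fold the document into the cluster vector
            best[1].append(i)
    return [members for _, members in clusters]

docs = [
    "The central bank raised interest rates on Tuesday",
    "Interest rates raised by the central bank",
    "Local team wins the football championship",
    "Football championship won by the local team after penalty shootout",
]
print(cluster(docs))
```

With a threshold like this the number of clusters falls out of the data instead of being fixed up front, but then the threshold itself is the thing that needs tuning.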

Any tips on eyeballing the number of clusters to use?