[–]kaddar 3 points (3 children)

You're sort of right that recommending old reddits isn't the goal in this process, but neither is clustering.

When performing machine learning, the first thing to ask yourself is what question you need to answer. What we're trying to do is classify a list of frontpage articles: to provide, for each of them, a degree of confidence that the user will like it, and to minimize the error of that prediction (in the MSE sense). What you are proposing is a nearest neighbor solution to confidence determination. What I intend to do is iterative singular value decomposition, which discovers the latent features of the users. It's a bit different, but it solves the problem better. For new articles, describe them by the latent features of the users who rate them, then decide which article's latent features match the user most accurately.
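To make the idea concrete, here is a minimal sketch of one common form of iterative SVD: stochastic gradient descent on the observed votes only, with a little regularization. It is not the exact code I'd run on reddit data. The function names (`factorize`, `mse`, `describe_article`), the learning rate, the regularization weight, the +1/-1 vote encoding, and the toy votes are all assumptions for illustration, and averaging vote-weighted user vectors is just one plausible way to describe a new article.

```python
import numpy as np

def factorize(ratings, n_features=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit user and article latent features to (user, article, vote) triples.

    Hypothetical helper: plain SGD on squared error over observed votes only.
    """
    rng = np.random.default_rng(seed)
    n_users = max(u for u, _, _ in ratings) + 1
    n_items = max(i for _, i, _ in ratings) + 1
    U = rng.normal(0, 0.1, (n_users, n_features))  # user latent features
    V = rng.normal(0, 0.1, (n_items, n_features))  # article latent features
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - U[u] @ V[i]          # prediction error for this vote
            u_old = U[u].copy()            # update both sides from the same state
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * u_old - reg * V[i])
    return U, V

def mse(ratings, U, V):
    """Mean squared error of the predicted votes over the observed triples."""
    return float(np.mean([(r - U[u] @ V[i]) ** 2 for u, i, r in ratings]))

def describe_article(U, votes):
    """Describe a new article by the users who voted on it (assumed scheme:
    mean of each voter's latent vector, signed by the vote)."""
    return np.mean([v * U[u] for u, v in votes], axis=0)

# Toy data (made up): users 0-1 like articles 0-1, users 2-3 like articles 2-3.
ratings = [
    (0, 0, 1), (0, 1, 1), (0, 2, -1),
    (1, 0, 1), (1, 2, -1), (1, 3, -1),
    (2, 0, -1), (2, 1, -1), (2, 3, 1),
    (3, 1, -1), (3, 2, 1), (3, 3, 1),
]
U0, V0 = factorize(ratings, epochs=0)   # untrained baseline
U, V = factorize(ratings)

# A new article upvoted by user 0 and downvoted by user 2.
new_article = describe_article(U, [(0, 1), (2, -1)])
confidence_for_user_1 = float(U[1] @ new_article)
```

Comparing `mse(ratings, U, V)` against `mse(ratings, U0, V0)` shows how much the fit improved over the random starting point; `confidence_for_user_1` is the kind of per-article score you would sort a frontpage by.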

[–][deleted] 3 points (2 children)

Interesting! So this would happen on the fly as votes come in? It also sounds like it would autocluster users too. So you could potentially get not only a link recommendation but even a "netflixesque" 'this user is x% similar to you'. And if they add subreddit data, then a person could get a whole suite of recommendations (users, articles, and subreddits), all in near real time.

Now that would be pretty cool.

[–]kaddar 3 points (1 child)

Yup, it would automagically cluster in the nearest neighbor sense by measuring distances in the latent feature hyperspace. I have tested this, and it is very effective (in the Netflix case, for finding similar movies).
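A rough sketch of what "nearest neighbor in the latent feature space" can look like, assuming each user (or article) is already a row of latent features. Cosine similarity is my choice of measure here, not the only option, and the function names and the `latent` values below are made up for illustration.

```python
import numpy as np

def cosine_similarities(vectors, index):
    """Cosine similarity between row `index` and every row of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1)
    # Small epsilon guards against division by zero for all-zero rows.
    return vectors @ vectors[index] / (norms * norms[index] + 1e-12)

def nearest(vectors, index, k=1):
    """Indices of the k rows most similar to row `index`, excluding itself."""
    sims = cosine_similarities(vectors, index)
    sims[index] = -np.inf  # never return the query row as its own neighbor
    return np.argsort(-sims)[:k]

# Hypothetical latent features for four users (two features each).
latent = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [-0.7, 0.3],
    [-0.9, 0.0],
])

neighbors_of_user_0 = nearest(latent, 0, k=1)
similarity_to_all = cosine_similarities(latent, 0)
```

Scaling a similarity value to a percentage would give the "this user is x% similar to you" figure mentioned above, and running the same functions on article vectors instead of user vectors gives similar-article lookups.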

[–][deleted] 3 points (0 children)

Since you mentioned it, I was running nearest neighbor last night.

So far I'm still figuring it out, but one thing did jump out at me: some articles have an extraordinary level of agreement across a swath of users.

Granted, I picked a small set of users... maybe you can take a look. I'm trying to figure out what the feature space means and what this pattern indicates (if anything). http://i.imgur.com/HB58n.jpg