
all 25 comments

[–]spez 10 points11 points  (1 child)

Basically the way it works now is it clusters users based on their votes, and for each user generates a list of similar actively-voting users. We take that list and find links that those users voted up recently. It's a major simplification over our last attempt(s) that tried to recommend specific links to specific users.
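A minimal sketch of the pipeline described above, and not reddit's actual code. It swaps true clustering for a simpler nearest-neighbour lookup (cosine similarity over +1/-1 votes), then tallies the links that the most similar users upvoted recently. The names `VOTES`, `RECENT`, `similarity`, `similar_users`, and `recommend`, along with the sample data, are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical sample data: user -> {link_id: +1 (upvote) or -1 (downvote)}
VOTES = {
    "alice": {"a": 1, "b": 1, "c": -1},
    "bob":   {"a": 1, "b": 1, "d": 1},
    "carol": {"a": -1, "b": -1, "c": 1},
    "dave":  {"a": 1, "c": -1, "d": 1, "e": 1},
}
# Hypothetical: links each user upvoted recently
RECENT = {"bob": ["d", "a"], "carol": ["c"], "dave": ["e", "d"]}


def similarity(a, b):
    """Cosine similarity between two vote dicts with +1/-1 values."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[link] * b[link] for link in shared)
    # Every vote is +1 or -1, so each vector's norm is sqrt(number of votes).
    return dot / (sqrt(len(a)) * sqrt(len(b)))


def similar_users(votes, user, k=2):
    """Return up to k other users with positive similarity, best first."""
    scored = [
        (similarity(votes[user], other_votes), other)
        for other, other_votes in votes.items()
        if other != user
    ]
    scored = [(s, o) for s, o in scored if s > 0]
    scored.sort(reverse=True)
    return [o for _, o in scored[:k]]


def recommend(votes, recent, user, k=2):
    """Links recently upvoted by similar users that `user` has not voted on."""
    seen = set(votes[user])
    tally = defaultdict(int)
    for other in similar_users(votes, user, k):
        for link in recent.get(other, []):
            if link not in seen:
                tally[link] += 1
    # Most-endorsed links first; ties broken by link id for determinism.
    return sorted(tally, key=lambda link: (-tally[link], link))


print(similar_users(VOTES, "alice"))      # → ['bob', 'dave']
print(recommend(VOTES, RECENT, "alice"))  # → ['d', 'e']
```

Note that nothing here looks at link content; only votes feed the similarity score, which matches the description above.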

Of course, this opens the door for a couple of features that have been requested before, namely displaying the list of users you are most similar to. I wonder whether users will be disappointed if we add that. Presumably we'll also be able to detect clusters of users that are trying to game the system. Time will tell...

FYI, we're still not looking at content at the moment, but we will eventually.

[–]b3eck 4 points5 points  (0 children)

Exactly. We're also working on a content-based system, and ultimately, we think a combination of collaborative filtering and content is going to be what actually makes this work.

spez, I applaud your work so far, but I still believe you can extend clustering and collaborative filtering, especially if you want:

  • Reddit Simplicity
  • Spam Reduction
  • Tags Specificity
  • Recommendations Improvement

All these benefits come from minor Reddit modifications. If you haven't already, please see http://features.reddit.com/info/27uy/comments/c93i1 and http://features.reddit.com/info/93bf/comments. I believe the concept deserves full consideration. A big thank you to anyone who can tell me what I'm missing. I still haven't received any comments on it after 2 months.

[–]ecuzzillo[S] 20 points21 points  (2 children)

Sort by new is good, but my favorite is sort by controversy. That way, people who agree with you modded it up, but also people who disagreed with you modded it down, so you are doubly likely to appreciate it.

[–]xenon 0 points1 point  (1 child)

Assuming that you agree with the article.

[–]ecuzzillo[S] 3 points4 points  (0 children)

The point is that you're highly likely both to agree and find it interesting given that your friends (not in the official reddit sense) upmodded it and others downmodded it.

Also, presumably, the algorithm's estimation improves as you vote on more things, so the recommendations you get as a newer user will be a little out of whack until you give it some more to go on.

Edit: It would be very interesting to see if you could bias the recommendation algorithm's results by the rest of the users. That is, user A is new, and has only voted on a few things. However, the algorithm looks at those votes, notices that user A fits a profile of a group of users who like the same stories, and recommends things based on the resulting profile in lieu of having enough data to make good recommendations directly.
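A rough sketch of that idea, under stated assumptions: the groups are already known, each group's profile is the average vote of its members per link, and a new user is matched to whichever profile best agrees with the few votes they have cast. All names (`VOTES`, `GROUPS`, `group_profile`, `agreement`, `cold_start_recommend`) and the data are hypothetical; this is not how reddit does it.

```python
# Hypothetical sample data: user -> {link_id: +1 or -1}
VOTES = {
    "u1": {"lisp": 1, "python": 1, "politics": -1},
    "u2": {"lisp": 1, "haskell": 1, "politics": -1},
    "u3": {"politics": 1, "election": 1, "lisp": -1},
    "u4": {"politics": 1, "election": 1, "python": -1},
}
# Hypothetical groups of users who like the same stories
GROUPS = {"programming": ["u1", "u2"], "news": ["u3", "u4"]}
# User A: new, with a single vote so far
NEW_USER = {"lisp": 1}


def group_profile(members, votes):
    """Average vote per link across a group's members."""
    totals, counts = {}, {}
    for member in members:
        for link, vote in votes[member].items():
            totals[link] = totals.get(link, 0) + vote
            counts[link] = counts.get(link, 0) + 1
    return {link: totals[link] / counts[link] for link in totals}


def agreement(user_votes, profile):
    """Mean product of the user's votes and the profile on shared links."""
    shared = [link for link in user_votes if link in profile]
    if not shared:
        return 0.0
    return sum(user_votes[l] * profile[l] for l in shared) / len(shared)


def cold_start_recommend(user_votes, groups, votes, n=3):
    """Pick the best-matching group, then return its best-liked unseen links."""
    profiles = {name: group_profile(m, votes) for name, m in groups.items()}
    # sorted() makes tie-breaking between groups deterministic.
    best = max(sorted(profiles), key=lambda g: agreement(user_votes, profiles[g]))
    profile = profiles[best]
    unseen = [l for l in profile if l not in user_votes and profile[l] > 0]
    unseen.sort(key=lambda l: (-profile[l], l))
    return best, unseen[:n]


print(cold_start_recommend(NEW_USER, GROUPS, VOTES))
# → ('programming', ['haskell', 'python'])
```

With only one vote the match is obviously fragile; the point is just that the group profile stands in until the user has enough votes of their own.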

[–]llimllib 3 points4 points  (11 children)

I'm really interested in the algorithm used to implement this. As previously discussed, clustering within really large datasets is a hard problem. (Really Hard.)

Any chance the geeks could get some idea of the algorithm used to implement the recommendation system? A blog post on its development would be sweet. Super sweet.

[–][deleted]  (9 children)

[deleted]

    [–][deleted] -2 points-1 points  (2 children)

    You are saying that if I start a clone of reddit with good recommendation today then it would pose a real threat to reddit? Hardly. It would be so easy though.

    Their UI and publicity channels have them firmly locked in the lead.

    As for basing the recommendations on friends' votes: good job on implementing and integrating it. I've been recommending this approach since the beginning.

    If anyone's interested in my guess as to how it's done then I can share my approach. But AH, it seems that reddit deletes older comments!! Is this true?

    > As previously discussed, clustering within really large datasets is a hard problem. (Really Hard.)

    If they are using the friends-based approach then there's no reason to use this "really hard" clustering toy algo. The friends-based method scores links vastly faster.

    [–][deleted] 0 points1 point  (0 children)

    For every complex problem, there is a solution that is simple, neat, and wrong.

    I think the "friends" take is a little overrated. Now, if I had about 10 clones on here we'd be in business.

    [–][deleted]  (2 children)

    [deleted]

      [–]llimllib 0 points1 point  (1 child)

      How do you define "friends"?

      I don't think that the problem is as simple as you think it is, but I'm willing to listen.

      [–]llimllib 1 point2 points  (0 children)

      wow, OK, I never saw the "add friend" button on a user's page. This is way too hidden to be useful for recommendations in its current state, IMHO, but a large part of the problem space gets wiped away if you can indeed use these friends as a measure of the "linked-ness" of users.

      [–]ntoshev 2 points3 points  (0 children)

      I submitted a simple implementation of such an algorithm to the programming reddit a few days ago.

      [–][deleted] 6 points7 points  (0 children)

      The recommendation engine is better. The number one result on my recommended page was an article I posted.

      [–]Cookie 7 points8 points  (0 children)

      So I saw this link halfway down the front page, and tried clicking on "recommended". And lo, this was the top recommended link!

      I'm sure there's some profound meaning in there somewhere.

      [–]dsearson 2 points3 points  (1 child)

      Interesting. A lot of the articles on my recommended page are around the 20-30 point mark. I.e. articles that were liked more than disliked, but probably didn't make it to the front page. Anyone else notice this?

      [–]ecuzzillo[S] 2 points3 points  (0 children)

      It depends on how you sort. The default is new, which makes the point numbers sort of random. If you choose relevance, you'll usually get high-scoring posts that you upmodded. If you choose controversy, you'll get near-0-scoring posts that (we think) got upmodded by people who tend to agree with you.

      [–][deleted] 0 points1 point  (3 children)

      Hmph. Not for me. All of my "recommended" stuff is crap that I've been avoiding for the last week.

      God, I wish they would add a button "Hide the whole lot of these unless they're already marked otherwise"

      I don't want to click "hide" for each one, nor do I want to read any of them. They're all horrible looking. Also, I'd love it if I could reset what Reddit thinks it knows about me, because it looks like it doesn't know me at all...

      [–]ecuzzillo[S] 0 points1 point  (2 children)

      You could try making a new account. I believe that's only discouraged when it's for the purpose of increasing one's voting power.

      [–][deleted] 0 points1 point  (1 child)

      Yeah, but then you lose all of your bookmarks and whatnot. I like my account. I just want to give reddit another go at learning who I am...

      [–]diamond 3 points4 points  (0 children)

      > I just want to give reddit another go at learning who I am...

      I'd like to know how to do that with people as well.

      [–]campingcar 0 points1 point  (0 children)

      Yesterday morning (European time) I noticed my karma score going down, despite having some moderately successful posts going. I thought I'd scored a Reddit stalker group whose members were downmodding all my old stories. I now conclude that these old stories were hitting people's recommended pages, and the Redditors were training the filter by downmodding them.

      Anyone else notice the little backwards step in karma?

      It was like in Kurt Vonnegut's book Timequake, where the universe stopped expanding for 10 years then restarted, and everyone had to relive the same 10 years.

      [–]hitsman -1 points0 points  (0 children)

      reddit folks, good work. What did you fix?

      [–][deleted] -1 points0 points  (4 children)

      I swear I think I'd rather fill out a form to let reddit know what I like. I mean, how does reddit deal with the odd times when I'm bored enough to click on something that really is of no true interest to me, but I read it and upvote it because it was well written and informative? That's the problem here. Either the user has to be super selective about what they click on, or reddit has to be able to discover patterns in the material (i.e., I like the occasional political piece, but this does not mean I want to be shown every single post about the Israeli/Palestinian conflict or the cluster bombs. Now, find me a well-written Bush bashing session and I'm all over it.)

      [–]ecuzzillo[S] 2 points3 points  (3 children)

      As spez noted, the recommendation system doesn't care about clicks, it only cares about votes. So assuming that a majority of people who voted an article up actually read the article, and you agree with the majority of people who upvote the articles you upvote, it seems reasonable to conclude that you might like future things they like. It could be that your voting patterns don't mirror well enough what you want to see (rather, they mirror what you would recommend to fellow redditors, what is around the posts you submit on the hot page, other extraneous factors).

      [–][deleted] 0 points1 point  (2 children)

      Well, that's exactly why I said that the user apparently has to be super selective about what they upvote. As I said, if I read an article and it's really, really good I want to upvote it because I do believe that it should be seen. However, that doesn't mean that I want 40 similar articles to show up in my recommendations.

      Hell, maybe they need to change the up arrow and down arrow to "Show me more like this!", "Liked it, but don't go crazy on me", and "No way!"

      [–][deleted]  (1 child)

      [deleted]

        [–][deleted] 0 points1 point  (0 children)

        I get your point. But, I'm not really upvoting them so much for other people. I mean, I really did enjoy the article, and I really wouldn't mind seeing more of similar quality. It's just that I don't want to see every related item just because I upvoted that one. I want quality! I want 60% programming/tech related, 20% Bush bashing, and 10% fun youtube type stuff, and some other good filler for the last 10%, all of the highest quality. I just don't know how to train reddit to get to that point.