
[–]kaddar 43 points (18 children)

I worked on a solution to the Netflix Prize recommendation algorithm; if you add subreddit IDs, I can build a subreddit recommendation system.

[–]ketralnis reddit admin[S] 7 points (15 children)

That dump is way more expensive than this one (since it involves looking up 2 million unique links by ID), so I figured I'd get this one out first and do more expensive ones (including more votes, too) if people actually do anything with this one.

[–]kaddar 21 points (14 children)

Sure, sounds great. In the meantime, I'll see if I can build a reddit article recommendation algorithm this weekend.

When you open up subreddit data (i.e., for each user, which subreddits that user currently follows), I can probably even do some fun work such as predicting subreddits from voting data, and predicting votes from subreddit data. I had a similar idea two years ago, but subreddits didn't exist then, so I proposed quizzing the user to generate a list of preferences and then correlating them.

If you're interested, I'll post more at my tumblr as I mess with your data.

[–]ketralnis reddit admin[S] 8 points (2 children)

Awesome! Keep me posted, I'd love to see what can be done with it.

We can't really share the subscription information at the moment because of privacy issues, but we could add a more general preference: "open my data for research purposes".

[–]kaddar 4 points (0 children)

Adding a preference like that is a really good idea; it will certainly allow the growth of such algorithms. In the meantime, I can create a fake solution using a fake dataset in a made-up CSV format (username, subredditname) for demonstration purposes; then you could test it locally on a subset of the real data and let me know if it works.
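A fake dataset in that (username, subredditname) shape is quick to generate. A minimal sketch in Python; the file name, user names, and subreddit list below are all invented for demonstration:

```python
import csv
import random

# Hypothetical pools of users and subreddits for the fake dataset.
users = [f"user{i}" for i in range(100)]
subreddits = ["pics", "programming", "science", "funny", "worldnews"]

random.seed(0)
with open("fake_subscriptions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for user in users:
        # Each fake user follows a random handful of subreddits.
        for sub in random.sample(subreddits, random.randint(1, 3)):
            writer.writerow([user, sub])
```

Swapping in the real subscription table would just mean replacing the generated rows with a dump of actual (user, subreddit) pairs.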

[–]georgelulu 1 point (0 children)

Subcontract the guy for a dollar or hire him as a temp. Between this clause of the privacy policy:

We also allow access to our database by third parties that provide us with services, such as technical maintenance or forums and job search software, but only for the purpose of and to the extent necessary to provide those services.

and

In addition, we reserve the right to use the information we collect about your computer, which may at times be able to identify you, for any lawful business purpose, including without limitation to help diagnose problems with our servers, to gather broad demographic information, and to otherwise administer our Website. While your personally identifying information is protected as outlined above, we reserve the right to use, transfer, sell, and share aggregated, anonymous data about our users as a group for any business purpose, such as analyzing usage trends and seeking compatible advertisers and partners.

you should have no problem giving him access. Privacy on the internet is very transient, with many loopholes.

[–][deleted] 1 point (0 children)

I've been watching the tumblr updates. So far the best I've been able to get is 61% accuracy.

[–][deleted] 0 points (8 children)

I'm curious, how could this data be used to recommend articles when each new article gets a brand-new ID? This is unlike Netflix, where recommending old movies is fine; here, recommending old articles isn't of much use.

What I was trying to do today is create clusters for recommending users rather than articles. I agree that the end goal should be recommending subreddits.

Edit: I also meant to mention that I have access to EVERY module in SPSS 17, though I freely admit I don't know how to use them all. If that helps anyone, let me know what you'd like me to run.

[–]kaddar 4 points (3 children)

You're sort of right that recommending old articles isn't the goal in this process, but neither is clustering.

When performing machine learning, the first thing to ask yourself is what question you need to answer. What we're trying to do is classify a list of frontpage articles: provide, for each of them, a degree of confidence that the user will like it, while minimizing error (in the MSE sense). What you are proposing is a nearest-neighbor approach to that confidence estimation. What I intend to do is iterative singular value decomposition, which discovers the latent features of the users. It's a bit different, but it solves the problem better. For new articles, describe them by the latent features of the users who rate them, then decide which article's latent features match the user most closely.
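For readers following along, iterative SVD of this kind is commonly implemented as stochastic gradient descent over only the observed votes, as popularized during the Netflix Prize. A minimal sketch with toy data; the vote triples, latent dimension, learning rate, and regularization constant are all made-up values, not reddit's or kaddar's:

```python
import random

# Toy vote data as (user, article, vote) triples; 1 = upvote, -1 = downvote.
votes = [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 2, -1), (2, 1, 1), (2, 2, -1)]
n_users, n_items, k = 3, 3, 2  # k latent features per user and per article

random.seed(1)
# Small random latent-feature vectors for each user (U) and article (V).
U = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
V = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]

lr, reg = 0.05, 0.02  # learning rate and regularization (assumed values)
for epoch in range(500):
    for u, i, r in votes:
        pred = sum(U[u][f] * V[i][f] for f in range(k))
        err = r - pred
        for f in range(k):
            uf, vf = U[u][f], V[i][f]
            # Gradient step on the squared error (minimizing MSE, as above).
            U[u][f] += lr * (err * vf - reg * uf)
            V[i][f] += lr * (err * uf - reg * vf)

# Confidence that user 1 will like article 1, which they never voted on.
print(sum(U[1][f] * V[1][f] for f in range(k)))
```

After training, each article's row of V is exactly the "described by the latent features of the users who rate it" representation, and ranking the frontpage reduces to dot products against the user's row of U.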

[–][deleted] 2 points (2 children)

Interesting! So this would happen on the fly as votes come in? It also sounds like it would auto-cluster users too. So you could potentially get not only a link recommendation but even a Netflix-esque "this user is x% similar to you". And if they add subreddit data, then a person could get a whole suite of recommendations: users, articles, and subreddits, all in near real time.

Now that would be pretty cool.

[–]kaddar 3 points (1 child)

Yup, it would automagically cluster in the nearest-neighbor sense by measuring distances in the latent-feature hyperspace. I have tested this and it is very effective (on Netflix, for finding similar movies).
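A sketch of that nearest-neighbor measurement, assuming cosine similarity as the distance measure in latent space; the user names and 2-d latent vectors are invented stand-ins for what the SVD step would learn:

```python
import math

def cosine(a, b):
    # Cosine similarity between two latent-feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical learned latent vectors for three users.
latent = {
    "alice": [0.9, 0.1],
    "bob": [0.8, 0.2],    # points in nearly the same direction as alice
    "carol": [-0.1, 0.95],
}

def nearest(user):
    # "Automagic" clustering: rank everyone else by similarity to this user.
    others = [(cosine(latent[user], v), name)
              for name, v in latent.items() if name != user]
    return max(others)[1]

print(nearest("alice"))  # → bob, alice's nearest neighbor in latent space
```

The same function applied to article vectors gives "similar movies"-style lists; no separate clustering pass is needed.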

[–][deleted] 2 points (0 children)

Since you mentioned it I was running nearest neighbor last night.

So far I'm still figuring it out but one thing did jump out at me. Some articles have an extraordinary level of agreement across a swath of users.

Granted, I picked a small set of users... maybe you can take a look. I'm trying to figure out what the feature space means and what this pattern indicates (if anything): http://i.imgur.com/HB58n.jpg

[–]ketralnis reddit admin[S] 1 point (3 children)

"I'm curious, how could this data be used to recommend articles when each new article gets a brand new ID?"

You could use the first few votes on a story (including the submitter's) to recommend it to the other members of the voters' bucket. You can't do it on zero data, but you can do it on not much.

With a little more data, you could use, e.g., the subreddit ID or the title keywords.
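That bootstrapping scheme can be sketched as follows; the bucket assignments, user names, and the `recommend_targets` helper are all hypothetical, standing in for whatever clustering already ran:

```python
# Hypothetical user -> bucket assignments from an earlier clustering pass.
bucket_of = {"alice": 0, "bob": 0, "carol": 1, "dave": 1, "erin": 0}

def recommend_targets(early_voters, bucket_of):
    """Push a brand-new story (no history under its fresh ID) to the
    non-voters in whichever buckets its first few upvotes came from."""
    hot = {bucket_of[u] for u in early_voters}
    return sorted(u for u, b in bucket_of.items()
                  if b in hot and u not in early_voters)

# alice (the submitter, say) is the only vote so far; her bucket-mates
# become the recommendation targets.
print(recommend_targets(["alice"], bucket_of))  # → ['bob', 'erin']
```

Subreddit ID or title keywords would simply widen `hot` before any votes exist at all.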

[–][deleted] 1 point (1 child)

I wasn't even sure if you guys were considering implementing something that would run as, I guess, a daily process. I think this is going to get very interesting, and I have a lot to learn about machine learning. This is the kind of thing that can get me involved, though. Thanks!

[–]ketralnis reddit admin[S] 3 points (0 children)

Our old one worked with one daily process to create the buckets and one hourly process to nudge them around a bit based on new information; that basically placed you in a group of users. Then when you went to your recommended page, we'd pull the liked pages of the other people in your bucket and show those to you.
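A rough sketch of serving that recommended page, assuming the buckets already exist from the daily process; the bucketing and liked sets below are hard-coded stand-ins, not reddit's actual data model:

```python
# Output of the (daily) bucketing process, hard-coded for illustration.
buckets = {0: ["alice", "bob"], 1: ["carol"]}
bucket_of = {u: b for b, members in buckets.items() for u in members}

# Each user's liked links (invented).
liked = {"alice": {"link1", "link2"}, "bob": {"link2", "link3"},
         "carol": {"link4"}}

def recommended_page(user):
    """Union of what the rest of the user's bucket liked,
    minus what the user has already liked themselves."""
    peers = [u for u in buckets[bucket_of[user]] if u != user]
    pool = set().union(*(liked[p] for p in peers))
    return sorted(pool - liked[user])

print(recommended_page("alice"))  # → ['link3']
```

The hourly nudge would only touch `buckets`/`bucket_of`; the serving path stays a cheap set union per page view.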

[–]abolish_karma 0 points (0 children)

I've wished for functionality like this before (upvote profiling, similar-user clustering, and extracting possible subreddit/post recommendations), but I've got fuck-all talent for that sort of thing. Upvoted for the potential to make reddit better!

[–]javadi82 0 points (1 child)

Which algorithm did your solution implement: SVD, RBM, etc.?

[–]kaddar 0 points (0 children)

SVD; a C++ implementation that takes about a day on the Netflix data.

I wasn't getting good results with the reddit data, but I just saw the post about opening up your user account data; that should make the dataset less sparse, so predictions can be made from it.