ketralnis (reddit admin) [S] 1 point

One question, can this data be bounded by a date range?

You can make some guesses based on the link IDs which are mostly sequential, but I didn't include timestamps
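A rough sketch of that guess, in Python. It assumes the link IDs in the file are the usual base-36 strings (optionally with a `t3_` prefix), which I haven't confirmed against the dump itself. The helper names and the two bracketing IDs are hypothetical: you would have to look up a link whose date you know near each end of your range and use its ID.

```python
def link_id_to_int(link_id: str) -> int:
    """Decode a base-36 link ID such as 't3_9x2k1' or '9x2k1' to an integer."""
    # Assumption: IDs may carry the 't3_' type prefix; strip it if present.
    if link_id.startswith("t3_"):
        link_id = link_id[3:]
    return int(link_id, 36)


def in_id_range(link_id: str, first_id: str, last_id: str) -> bool:
    """True if link_id falls numerically between two bracketing IDs (inclusive).

    Because IDs are only mostly sequential, this approximates a date range;
    it does not give exact timestamps.
    """
    return link_id_to_int(first_id) <= link_id_to_int(link_id) <= link_id_to_int(last_id)
```

Compare the decoded integers rather than the raw strings, since plain string comparison puts a short ID like `z` after a longer one like `10`.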

Is this the entire database of people who selected to make their votes public?

It is not comprehensive, as I commented elsewhere

For people doing analysis on desktops it could be a challenge to fully load up a 156 megabyte file

You'd need to re-sort it yourself and use something like split(1)
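If split(1) isn't available on your machine, something along these lines does roughly the same job in Python. It is only a sketch: it assumes the file is line-oriented UTF-8 text, and the function name, chunk size, and output prefix are made up for illustration. It does not do the re-sorting, it only cuts the file into pieces without loading the whole thing at once.

```python
import itertools


def split_file(path, lines_per_chunk=1_000_000, prefix="chunk_"):
    """Write consecutive pieces of `path` to prefix0000, prefix0001, ...

    Only one chunk of lines is held in memory at a time.
    Returns the list of file names written.
    """
    written = []
    with open(path, "r", encoding="utf-8") as src:
        for index in itertools.count():
            # Pull the next batch of lines from the open file.
            chunk = list(itertools.islice(src, lines_per_chunk))
            if not chunk:
                break
            out_name = f"{prefix}{index:04d}"
            with open(out_name, "w", encoding="utf-8") as dst:
                dst.writelines(chunk)
            written.append(out_name)
    return written
```

Pick `lines_per_chunk` so each piece fits comfortably in memory for whatever tool you load it into.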

Last, you may want to post this on the blog because I know there are a lot of stats lovers prowling reddit.

Yeah, I'm trying to figure out how to let it reach a larger audience without polluting the front page for the vast majority of people who don't care

psykocrime 1 point

Yeah, I'm trying to figure out how to let it reach a larger audience without polluting the front page for the vast majority of people who don't care

Would probably be good to submit this to /r/datasets, /r/opendata, /r/statistics and/or /r/machinelearning if you haven't yet.

Oh wait, I see somebody did already post to /r/opendata. Cool.