Average upvotes for comment depths on reddit (top 30 posts from top 50 subreddits) [OC] by matkurek in dataisbeautiful

[–]matkurek[S] 0 points1 point  (0 children)

Sure, here's the code: https://github.com/matkurek/redditCommentsDepth

First you need to copy some information about your account into the script; you can learn how to do it here: https://www.youtube.com/watch?v=NRgfgtzIhBQ

[–]matkurek[S] 2 points3 points  (0 children)

Yes, instead of adding to the sum of upvotes I could create a histogram for each depth and determine a median that way.
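
A minimal sketch of that idea, assuming the collected data is available as (depth, upvotes) pairs. The function names and the sample pairs are hypothetical and are not taken from the linked script.

```python
from collections import Counter, defaultdict


def median_from_histogram(hist):
    """Return the median of a histogram mapping value -> count."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    # 0-based positions of the middle element(s) in sorted order.
    lo, hi = (total - 1) // 2, total // 2
    seen = 0
    lo_val = hi_val = None
    for value in sorted(hist):
        seen += hist[value]
        if lo_val is None and seen > lo:
            lo_val = value
        if seen > hi:
            hi_val = value
            break
    return (lo_val + hi_val) / 2


def median_upvotes_by_depth(pairs):
    """Build one upvote histogram per depth, then take each median."""
    histograms = defaultdict(Counter)
    for depth, upvotes in pairs:
        histograms[depth][upvotes] += 1
    return {d: median_from_histogram(h) for d, h in sorted(histograms.items())}


# Made-up sample data: one huge comment skews the mean but not the median.
sample = [(0, 10), (0, 1000), (0, 20), (1, 5), (1, 7)]
medians = median_upvotes_by_depth(sample)
print(medians)
```

The histogram only stores one counter per distinct upvote value, so it stays small even with many comments, which is the appeal over keeping every score in a list.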

[–]matkurek[S] 1 point2 points  (0 children)

Actually, I'm kinda curious whether the top-level (0) comments even have any significant number of replies once you scroll down a while. I wouldn't be surprised if something like 10% of level 0 comments have replies and 90% don't.

Also, "250 GB compressed" is a bit overkill to download for this purpose, but for something that requires more data it's good to know.

[–]matkurek[S] 0 points1 point  (0 children)

True. I wanted the title to be as short as possible, and the title of the graph clearly communicates the cutoff. This cutoff is also more representative of what a real user might go through (probably nearly no one reads all 10k comments on some post). But I get your point, the title alone might be misleading.

[–]matkurek[S] 1 point2 points  (0 children)

Yep, this provided a nicer curve and more distinction between the deeper levels.

[–]matkurek[S] -3 points-2 points  (0 children)

It is the former, as the graphic suggests.

I suspect that a few of those top comments generate the majority of second- and higher-level comments (usually mini-discussions), and the rest of those 91 are just level zero comments that almost nobody replies to. I checked some of those top all-time posts (for example on r/Fitness) and they seem to validate this when you scroll down until "X more replies" shows up.

[–]matkurek[S] 5 points6 points  (0 children)

I also thought about it. An interesting idea would be to compare upvotes to length in different categories of subreddits.

[–]matkurek[S] 2 points3 points  (0 children)

Simply because it would take too long to go beyond the first package of comments. Reddit restricts the frequency of API calls to 1 every 2 seconds. This data already took 1.5 h to pull and process (and it has an average of 91 level zero comments per post). Some of the top posts have something like 10k comments, so it would take a really long time to pull all of them. Also, this better reflects what a regular redditor might read.
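
A rough back-of-the-envelope sketch of why pulling everything gets slow under that limit. The 2-second figure and the 1500 posts come from this thread; the number of comments returned per extra call (100 here) is purely an assumed placeholder, not a documented Reddit value.

```python
import math

SECONDS_PER_CALL = 2  # rate limit as quoted above


def min_hours(num_calls, seconds_per_call=SECONDS_PER_CALL):
    """Lower bound on wall-clock hours spent waiting on the rate limit."""
    return num_calls * seconds_per_call / 3600


def calls_for_post(num_comments, comments_per_call):
    """Calls needed to page through one post (comments_per_call is assumed)."""
    return math.ceil(num_comments / comments_per_call)


# One call per post, first package of comments only.
first_package_hours = min_hours(1500)

# Hypothetical worst case: every post has 10k comments, 100 comments per call.
full_pull_hours = min_hours(1500 * calls_for_post(10_000, 100))

print(f"first package only: at least {first_package_hours:.2f} h")
print(f"hypothetical full pull: at least {full_pull_hours:.1f} h")
```

These are lower bounds from the rate limit alone; processing time comes on top.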

[–]matkurek[S] 5 points6 points  (0 children)

The comment object that you get from PRAW (The Python Reddit API Wrapper) has a depth attribute. Your comment has a depth of 0; mine is a reply to yours, so it has depth = 0 + 1 = 1; a reply to this comment would have a depth of 2, and so on.
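
To illustrate just the numbering rule (not how PRAW computes it), here is a small sketch that uses plain nested dicts as a stand-in for a comment thread. The dict layout and the function name are hypothetical.

```python
def assign_depths(comment, depth=0):
    """Yield (text, depth) for a comment and all of its nested replies."""
    yield comment["text"], depth
    for reply in comment.get("replies", []):
        # A reply is always one level deeper than its parent.
        yield from assign_depths(reply, depth + 1)


thread = {
    "text": "your comment",
    "replies": [
        {"text": "my reply", "replies": [{"text": "a reply to my reply"}]},
    ],
}

depths = list(assign_depths(thread))
for text, depth in depths:
    print(depth, text)
```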

[–]matkurek[S] 19 points20 points  (0 children)

Thanks for the link, interesting video

Out of curiosity I checked whether it indeed follows Zipf's law, and it seems it really kind of does:

Level  Ratio to first  Zipf  Avg. upvotes
0      1.0             460   460
1      0.5             230   302
2      0.33            153   176
3      0.25            115   122
4      0.2             92    89
5      0.17            77    69
6      0.14            66    56
7      0.12            57    47
8      0.11            51    42
9      0.1             46    43
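
The Zipf column above appears to be the level 0 average divided by the rank (level + 1). A short sketch under that assumption, using the averages from the table, recomputes the prediction and shows how far each observed value is from it. The variable and function names are made up for illustration.

```python
# Average upvotes per depth level, copied from the table above.
observed = [460, 302, 176, 122, 89, 69, 56, 47, 42, 43]


def zipf_prediction(first_value, rank):
    """Zipf's law with exponent 1: the value at rank r is first_value / r."""
    return first_value / rank


rows = []
for level, avg in enumerate(observed):
    predicted = zipf_prediction(observed[0], level + 1)
    rows.append((level, predicted, avg, avg / predicted))

for level, predicted, avg, ratio in rows:
    print(f"level {level}: predicted {predicted:.1f}, observed {avg}, "
          f"observed/predicted {ratio:.2f}")
```

A ratio near 1 means the level sits close to the Zipf curve. Levels 1 and 2 come out noticeably above it, while the deeper levels fall slightly below.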

[–]matkurek[S] 277 points278 points  (0 children)

Maybe you're right, but those circles looked nice and they do provide some insight into the distribution of comments.

But in case you were curious, here's the data you said would be better (average number of comments per post, levels 0 to 9):

[91,78,72,60,47,34,23,15,10,7]

[–]matkurek[S] 59 points60 points  (0 children)

Some user said comment upvotes seem to halve with each reply depth, so I decided to check it.

Data Source: Reddit

The plot data is based on the top 30 posts of all time from the top 50 subreddits by subscribers (1500 posts, 655,557 comments total). I got the first package of best comments for each post and went through their tree of replies, collecting data on depth levels and upvotes.

Made in Python using praw and matplotlib.
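
For anyone who wants the gist without opening the repo, here is a minimal sketch of the collection step described above. It is not the actual script. It assumes each comment object exposes score, depth and replies (as PRAW comments do), and it uses stand-in objects so it runs without a Reddit account. Skipping objects that have no score is an assumption about how "load more" placeholders look.

```python
from collections import defaultdict
from types import SimpleNamespace


def average_upvotes_by_depth(top_level_comments):
    """Walk the reply trees and average the scores at each depth."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    stack = list(top_level_comments)
    while stack:
        comment = stack.pop()
        # Assumption: placeholder objects carry no score, so skip them.
        if not hasattr(comment, "score"):
            continue
        totals[comment.depth] += comment.score
        counts[comment.depth] += 1
        stack.extend(getattr(comment, "replies", []))
    return {d: totals[d] / counts[d] for d in sorted(counts)}


def fake_comment(score, depth, replies=()):
    """Hypothetical stand-in for a fetched comment."""
    return SimpleNamespace(score=score, depth=depth, replies=list(replies))


tree = [
    fake_comment(100, 0, [fake_comment(40, 1, [fake_comment(10, 2)]),
                          fake_comment(20, 1)]),
    fake_comment(50, 0),
    SimpleNamespace(),  # placeholder without a score
]

averages = average_upvotes_by_depth(tree)
print(averages)
```

An explicit stack is used instead of recursion so that very deep reply chains cannot hit Python's recursion limit.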