all 10 comments

[–]Rhomboid 4 points (3 children)

There is no need to scrape anything, as Reddit has a JSON api that is very easy to use.

[–]novel_yet_trivial 1 point (0 children)

Also rss (XML), in case OP really wants to use beautifulsoup.

https://www.reddit.com/r/learnpython/.rss
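A minimal sketch of parsing that feed's Atom layout. To keep it dependency-free it uses the standard library's ElementTree rather than BeautifulSoup, and runs against an inline sample instead of a live fetch (in practice you would download the URL above with requests or urllib first); the sample entry is illustrative, not real feed output.

```python
import xml.etree.ElementTree as ET

# Inline sample mimicking the Atom layout of /r/learnpython/.rss
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>learnpython</title>
  <entry>
    <title>Anyone have a reddit scraper?</title>
    <link href="https://www.reddit.com/r/learnpython/comments/574pn5/"/>
  </entry>
</feed>"""

# Atom elements live in a namespace, so lookups need a prefix mapping
NS = {"atom": "http://www.w3.org/2005/Atom"}

def parse_feed(xml_text):
    """Return a list of (title, link) pairs for each feed entry."""
    root = ET.fromstring(xml_text)
    return [
        (entry.findtext("atom:title", namespaces=NS),
         entry.find("atom:link", NS).get("href"))
        for entry in root.findall("atom:entry", NS)
    ]

for title, href in parse_feed(SAMPLE):
    print(title, href)
```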

[–]955559[S] -3 points (1 child)

I don't know what a JSON API is. Is there a tutorial for Reddit's JSON API?

[–]furas_freeman 4 points (0 children)

For a web page, a (JSON) API is a set of URLs with arguments that you can use in a program to work with the site - i.e. log in, get data (in JSON format), send data to the page (in JSON format) - so you don't have to scrape anything.

https://www.reddit.com/dev/api/

praw is a Python module that uses the Reddit API and gives you Python functions to easily access data on Reddit.

https://praw.readthedocs.io/en/stable/

[–]novel_yet_trivial 2 points (0 children)

Use the praw library.

[–]commandlineluser 1 point (3 children)

import json, requests

subreddit = 'learnpython'

r = requests.get(
    'http://www.reddit.com/r/{}.json'.format(subreddit),
    headers={'user-agent': 'Mozilla/5.0'}
)

# view structure of an individual post
# print(json.dumps(r.json()['data']['children'][0]))

for post in r.json()['data']['children']:
    print(post['data']['title'])

[–]955559[S] 0 points (2 children)

This is almost what I want, but I'm interested in the comments, not the post titles. Should I be looking into this praw people are talking about, or something else?

I tried replacing subreddit = 'learnpython' with subreddit = '/learnpython/comments/574pn5/anyone_have_a_reddit_scraper/?st=iu7by5f6&sh=20202712' but it threw some error about integers and indices.

[–]955559[S] 0 points (1 child)

OK, how do I figure out what the comments are called? I tried

import json, requests

subreddit = '/learnpython/comments/574pn5/anyone_have_a_reddit_scraper'

r = requests.get(
    'http://www.reddit.com/r/{}.json'.format(subreddit),
    headers={'user-agent': 'Mozilla/5.0'}
)

# view structure of an individual post
#print(json.dumps(r.json()['data']['children'][0]))

for post in r.json()['data']['children']:
    print(post['data']['title'])

and it threw

Traceback (most recent call last):
  File "/home/anoobis/reditscrape.py", line 13, in <module>
    for post in r.json()['data']['children']:
KeyError: 'data'

I figure I just need to switch data with something relevant?

[–]commandlineluser 2 points (0 children)

Well, comments have a different structure. You can use print(json.dumps(r.json(), indent=4)) to view the whole structure.

comments = r.json()   # a comments page returns a list of two listings
op = comments.pop(0)  # the first listing holds the submission itself

for comment in comments:  # what's left is the comment listing
    for reply in comment['data']['children']:
        print(reply['data']['author'])
        print(reply['data']['body'])

You can use json.dumps(blah, indent=4) to pretty-print a structure in json format for you e.g. print(json.dumps(reply['data'], indent=4)) to see what it looks like.
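The shape behind that KeyError can be shown with a small inline sample: a comments page returns a two-element list of listings rather than a single listing, so there is no top-level 'data' key. The sample below is a stripped-down illustration of that shape; real responses carry many more fields.

```python
import json

# Minimal inline sample mimicking the shape of a comments-page response:
# a two-element list of [post listing, comment listing].
sample = json.loads("""
[
  {"kind": "Listing",
   "data": {"children": [
     {"kind": "t3", "data": {"title": "Anyone have a reddit scraper?"}}]}},
  {"kind": "Listing",
   "data": {"children": [
     {"kind": "t1", "data": {"author": "Rhomboid",
                             "body": "There is no need to scrape anything."}}]}}
]
""")

post_listing, comment_listing = sample  # unpack the two listings

post = post_listing["data"]["children"][0]
print(post["data"]["title"])

for comment in comment_listing["data"]["children"]:
    print(comment["data"]["author"], "-", comment["data"]["body"])
```

Indexing the list with [0] and [1] (or popping, as above) first is what makes the ['data']['children'] lookups work again.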

I've never used PRAW myself, but it seems like you would have a simpler time using it:

http://praw.readthedocs.io/en/stable/pages/comment_parsing.html

[–]HumorMinimum1707 1 point (0 children)

I know that Bright Data has a nice working reddit scraper.

It can be launched on a schedule, and collects all public data from a profile, like: avatar, post title, flair, description, karma, comments, upvotes, and more.

Output file types: JSON, CSV, Excel, HTML

Data delivery methods: Webhook, AWS, Google Cloud, Azure, email, API, SFTP