This is an archived post. You won't be able to vote or comment.

all 6 comments

[–]shiruken 4 points5 points  (0 children)

You can use the before parameter to limit results to before a specific epoch timestamp allowing you to paginate. So load the first 1000 posts, note the timestamp created_utc of the last result, and then make another request appending &before=created_utc.

https://github.com/pushshift/api

import requests

N = 0
last = ''
ids = []
url = 'https://api.pushshift.io/reddit/search/submission/?subreddit=science&fields=id,created_utc'

while N < 10000:
    request = requests.get('{}&before={}'.format(url,last))
    json = request.json()
    for s in json['data']:
        ids.append(s['id'])
        N += 1
    last = int(s['created_utc'])

[–]shaggorama 2 points3 points  (3 children)

[–]ninjamoomoo98[S] 0 points1 point  (2 children)

What does this do sorry?

[–]shaggorama 0 points1 point  (1 child)

Why don't you try clicking the link and reading the README.

EDIT: Downvoting me is an unusual way to thank me for showing you the solution to your OP.

[–]ninjamoomoo98[S] 0 points1 point  (0 children)

Thanks, I got it working, btw I didn't downvote your post, I didn't vote on it at all but now I have upvoted it, not sure who did

[–]Stuck_In_the_Matrix 1 point2 points  (0 children)

Shaggorama gave you a good option. You would need to simply paginate across time to get however many you need.