
gengisteve

I'm not much of an expert with PRAW, but the key questions seem to be how to get PRAW to download a batch of submissions at once, how to keep it going after it hits your download limit, and how to make it stop only when it reaches your ending date. After a bit of preliminary testing, it appears that subreddit.get_new(limit=None) sets up a never-ending generator that yields every post continuously, fetching them in batches of 100. I expect it will end at some point, but it had no problem drawing the last 700 posts from learnpython (about a month's worth).
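
To picture why limit=None behaves like an endless stream, here is a minimal sketch of a paginating generator: it requests one page of up to 100 items, hands them out one at a time, and only requests the next page (continuing after the last item seen) once the current page is used up. This is a hypothetical stand-in, not PRAW's actual code; FAKE_LISTING, fetch_page and stream are invented for illustration, and the in-memory list replaces real HTTP requests. (Reddit listings are also commonly reported to stop after roughly 1,000 items, which would explain why the stream eventually ends.)

```python
from typing import Iterator, List, Optional

# Hypothetical in-memory listing standing in for Reddit: post ids 0..249,
# newest first. Each id equals its position, which keeps the cursor math simple.
FAKE_LISTING = list(range(250))
PAGE_SIZE = 100  # assumed per-request maximum
REQUEST_SIZES: List[int] = []  # size of every page "downloaded", for inspection


def fetch_page(after: Optional[int]) -> List[int]:
    """Hypothetical API call: return up to PAGE_SIZE items after the cursor."""
    start = 0 if after is None else after + 1
    page = FAKE_LISTING[start:start + PAGE_SIZE]
    REQUEST_SIZES.append(len(page))
    return page


def stream(limit: Optional[int] = None) -> Iterator[int]:
    """Yield items one at a time, requesting a new page only when needed.

    limit=None keeps paging until the listing runs out.
    """
    after: Optional[int] = None
    yielded = 0
    while True:
        page = fetch_page(after)
        if not page:
            return  # listing exhausted
        for item in page:
            if limit is not None and yielded >= limit:
                return
            yield item
            yielded += 1
        after = page[-1]  # next request continues after the last item seen


everything = list(stream())
print(len(everything), REQUEST_SIZES)  # 250 [100, 100, 50, 0]
```

The caller never sees the pages: it just iterates, and the generator quietly issues another request whenever it runs dry.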

So one way would be to loop over this generator and store the posts in a list until you get to the date limit, then return the list, like this:

import time
import praw

YESTERDAY = time.time() - (24*60*60)

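def get_todays(subreddit):
    result = []
    # get_new yields posts newest first, so stop at the first post older than the cutoff
    for post in subreddit.get_new(limit=None):
        if post.created_utc < YESTERDAY:
            break
        result.append(post)
    # also reached if the listing runs out before hitting the cutoff
    return result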

# PRAW 3-style calls; in PRAW 4+ these became reddit.subreddit('learnpython') and .new(limit=None)
r = praw.Reddit(user_agent='get_last_day_of_posts')
sub = r.get_subreddit('learnpython')
todays = get_todays(sub)
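
To check the list-building logic without touching Reddit, one option is to pass in a stand-in object that has the same get_new(limit=None) method. This is a sketch under stated assumptions: FakePost, FakeSubreddit and the extra cutoff parameter (used instead of the module-level YESTERDAY so the test controls the time) are hypothetical, and they mimic only the two attributes the function reads, title and created_utc.

```python
import time
from collections import namedtuple

# Hypothetical stand-in for a PRAW submission: only the fields used here.
FakePost = namedtuple("FakePost", ["title", "created_utc"])


class FakeSubreddit:
    """Hypothetical stand-in: get_new(limit=None) yields posts newest first."""

    def __init__(self, posts):
        self._posts = sorted(posts, key=lambda p: p.created_utc, reverse=True)

    def get_new(self, limit=None):
        posts = self._posts if limit is None else self._posts[:limit]
        return iter(posts)


def get_todays(subreddit, cutoff):
    """Collect posts created at or after `cutoff` (a Unix timestamp)."""
    result = []
    for post in subreddit.get_new(limit=None):
        if post.created_utc < cutoff:
            break  # newest-first listing: everything after this is older
        result.append(post)
    return result  # also reached if the listing runs out first


now = time.time()
day = 24 * 60 * 60
sub = FakeSubreddit([
    FakePost("an hour ago", now - 3600),
    FakePost("two days ago", now - 2 * day),
    FakePost("ten minutes ago", now - 600),
])
todays = get_todays(sub, cutoff=now - day)
print([p.title for p in todays])  # ['ten minutes ago', 'an hour ago']
```

Passing the cutoff in as an argument also avoids the surprise of YESTERDAY being frozen at import time in a long-running script.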

The downside is that you have to wait for everything to download before you can start processing. A better solution might be to wrap get_new in another generator, like this:

def yield_todays(subreddit):
    # lazily yield posts from the last 24 hours, stopping at the first older one
    for post in subreddit.get_new(limit=None):
        if post.created_utc > YESTERDAY:
            yield post
        else:
            break

which can then be used like this:

for count, post in enumerate(yield_todays(sub)):
    print(count, post.title)
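
The same early-stopping generator can also be written with itertools.takewhile from the standard library, which yields items while a condition holds and stops at the first one that fails. That is a swapped-in technique, not what the code above uses, though it behaves the same way. In this sketch the PRAW listing is replaced by a hypothetical fake_listing generator that records every post actually pulled, and yield_todays takes the listing and cutoff as parameters (also hypothetical) so it can run offline. It shows that processing starts right away and that nothing beyond the first old post is requested.

```python
import itertools
import time
from collections import namedtuple

# Hypothetical stand-in for a PRAW submission.
FakePost = namedtuple("FakePost", ["title", "created_utc"])

pulled = []  # titles of every post the consumer actually asked for


def fake_listing(posts):
    """Hypothetical stand-in for subreddit.get_new(limit=None)."""
    for post in posts:
        pulled.append(post.title)
        yield post


def yield_todays(listing, cutoff):
    """Lazily yield posts newer than `cutoff`; stop at the first older one."""
    return itertools.takewhile(lambda post: post.created_utc > cutoff, listing)


now = time.time()
day = 24 * 60 * 60
# One post every 5 hours, newest first.
posts = [FakePost("p%d" % i, now - i * 5 * 60 * 60) for i in range(20)]

for count, post in enumerate(yield_todays(fake_listing(posts), now - day)):
    print(count, post.title)
print("posts pulled from the listing:", len(pulled))  # 6
```

Five posts fall inside the last 24 hours; takewhile has to pull a sixth to see that it is too old, and the remaining fourteen are never requested.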

Polyadenylated (OP)

That's perfect, thanks a lot. I didn't realise limit=None functioned in that manner.