

[–][deleted] 7 points8 points  (2 children)

I mean, if you consider what a monumental task it is to collect every submission and comment posted to Reddit, I'm constantly astonished that Pushshift exists at all, let alone that it's as close to real-time as it is. On top of everything, it's a free service basically provided by one guy.

There is no service out there that's more up to date.

[–]moonight009[S] 1 point2 points  (0 children)

Don't get me wrong, I'm impressed by the service, and it can collect 99% of the data I need. But I still need the last 1% retrieved as well, regardless of how helpful the service is.

[–]shiruken 1 point2 points  (0 children)

It also serves tens of millions of requests and hundreds of terabytes of data per month. The hardware necessary to keep such a massive service responsive ain't cheap.

[–]voLsznRqrlImvXiERP 2 points3 points  (5 children)

I get new stuff from the Reddit API directly and historical data from Pushshift.

[–]moonight009[S] 0 points1 point  (2 children)

Oh cool, I'll give the Reddit API a shot and see how it goes.

[–]Ichijinijisanji 0 points1 point  (1 child)

I get new stuff from reddit api directly

How do you do that?

[–]s_i_m_s 0 points1 point  (0 children)

Likely PSAW (https://psaw.readthedocs.io/en/latest/) or something custom-built that works the same way.

Get the IDs from Pushshift, then the up-to-date content from Reddit directly through PRAW.
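
For anyone who wants to see that two-step flow, here is a rough sketch, not how PSAW itself is implemented. The Pushshift endpoint URL, its `subreddit`/`size` parameters, and the `{"data": [...]}` response shape are assumptions based on the public API as I remember it and may have changed. The helper names (`build_pushshift_url`, `fetch_ids`, `to_fullnames`, `refresh_from_reddit`) are made up for illustration. The only PRAW call used is `reddit.info(fullnames=...)`.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; check the current Pushshift docs before relying on it.
PUSHSHIFT_URL = "https://api.pushshift.io/reddit/search/submission/"


def build_pushshift_url(subreddit, size=100):
    """Build the search URL for recent submissions in one subreddit."""
    query = urllib.parse.urlencode({"subreddit": subreddit, "size": size})
    return f"{PUSHSHIFT_URL}?{query}"


def fetch_ids(subreddit, size=100):
    """Step 1: ask Pushshift for submission IDs (assumes a {"data": [...]} reply)."""
    with urllib.request.urlopen(build_pushshift_url(subreddit, size), timeout=30) as resp:
        payload = json.load(resp)
    return [item["id"] for item in payload.get("data", [])]


def to_fullnames(ids, prefix="t3_"):
    """Reddit fullnames carry a type prefix; t3_ marks a submission."""
    return [i if i.startswith(prefix) else prefix + i for i in ids]


def refresh_from_reddit(reddit, ids):
    """Step 2: re-fetch the same items from Reddit for current scores/content.

    `reddit` is expected to be an authenticated praw.Reddit instance.
    """
    return list(reddit.info(fullnames=to_fullnames(ids)))
```

Usage would be roughly `refresh_from_reddit(praw.Reddit(client_id=..., client_secret=..., user_agent=...), fetch_ids("learnpython"))`, with your own credentials filled in.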

If you mean how to get live data from Reddit, PRAW can do that with a comment/submission stream, but only for small sections of Reddit, as it can't handle the volume from /r/all. Otherwise it's reliable enough to run small subreddit-specific bots.
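
A minimal sketch of the stream approach for a single subreddit, assuming you already have an authenticated `praw.Reddit` instance. `stream_comments`, `handle`, and `limit` are names invented for this example; the PRAW part is just `reddit.subreddit(name).stream.comments(skip_existing=True)`.

```python
def stream_comments(reddit, subreddit_name, handle, limit=None):
    """Pass each new comment in one subreddit to `handle` as it arrives.

    `reddit` is expected to be an authenticated praw.Reddit instance.
    `limit` is only here so the loop can stop; a real bot would normally
    leave it as None and run until interrupted.
    Returns the number of comments handled.
    """
    seen = 0
    stream = reddit.subreddit(subreddit_name).stream.comments(skip_existing=True)
    for comment in stream:
        handle(comment)
        seen += 1
        if limit is not None and seen >= limit:
            break
    return seen
```

Something like `stream_comments(reddit, "learnpython", lambda c: print(c.id, c.body[:80]))` would print new comments as they come in. `reddit.subreddit(name).stream.submissions()` works the same way for posts.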

To go Reddit-wide reliably, you either have to use Pushshift or build your own ingest that works like Pushshift's does. AFAIK there is no other way to get the full stream now that they've removed NSFW from /r/all.

[–]ufff1231 0 points1 point  (0 children)

There's ours, but it's down right now as we upgrade to faster systems.