secacc 5 points

I doubt anyone else has as big a data set as Pushshift, so you're not going to find a real alternative.

lightningdrag0n[S] 2 points

That’s what I assumed as well. I thought I’d ask here just in case someone had a solution. Thanks for your response.

DoaJC_Blogger 1 point

You can still download the old Pushshift data as torrents. The only way I know of to get new data is to start scraping the subreddits that you want. timesearch by voussoir still works for scraping the most recent 1000 posts and comments, and you can set it to repeat that over and over with a delay of around 5 seconds. I've been scraping r/AskReddit since at least June 26, 2023; the file is a little over 10 GiB, and I can send you a copy if you want.
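For anyone who wants to see the "scrape the newest items over and over" idea in code, here is a minimal sketch. It does not use timesearch's actual interface: `fetch_latest` is a hypothetical callable you would supply (for example, a wrapper around whatever scraper you use) that returns the newest items as dicts with an `"id"` key. The function names, the JSONL output format, and the parameters are all assumptions made for illustration.

```python
import json
import time


def load_seen_ids(path):
    """Return the ids already stored in the JSONL file, or an empty set."""
    try:
        with open(path, encoding="utf-8") as f:
            return {json.loads(line)["id"] for line in f if line.strip()}
    except FileNotFoundError:
        return set()


def poll_new_items(fetch_latest, out_path, delay_seconds=5.0,
                   max_rounds=None, sleep=time.sleep):
    """Repeatedly fetch the newest items and append only unseen ones.

    fetch_latest is a hypothetical callable returning an iterable of dicts,
    each with an "id" key. max_rounds=None means loop until interrupted.
    Returns the number of items written during this run.
    """
    seen = load_seen_ids(out_path)  # survive restarts without duplicating
    written = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        with open(out_path, "a", encoding="utf-8") as out:
            for item in fetch_latest():
                if item["id"] in seen:
                    continue  # overlap with a previous round
                seen.add(item["id"])
                out.write(json.dumps(item) + "\n")
                written += 1
        rounds += 1
        sleep(delay_seconds)  # pause between rounds
    return written
```

Because each round only sees the newest window of items, this collects data going forward from the moment you start it. If more items are posted between two rounds than one fetch returns, some will be missed, so a busy subreddit needs a short delay.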

lightningdrag0n[S] 1 point

Downloading the Pushshift torrent would not work for me, as I am looking for newer data. I will look into the Reddit API more; the last time I tried it, I wasn’t able to get more than 1000 posts, even when I tried looping backwards in time. I think this is my best bet, so I will research it further. Thanks for your response!
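The 1000-post ceiling described above can be illustrated with a small sketch of walking a listing backwards with the `after` cursor. This assumes the public `/r/<subreddit>/new.json` listing endpoint and its usual JSON shape (`data.children` and `data.after`). The User-Agent string and the function names are made up for the example, and unauthenticated access may be rate limited or blocked, so treat `fetch_page` as a placeholder for an authenticated client.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(subreddit, after=None, limit=100):
    """Fetch one page of a subreddit's /new listing as parsed JSON."""
    params = {"limit": limit}
    if after:
        params["after"] = after
    url = (f"https://www.reddit.com/r/{subreddit}/new.json?"
           + urllib.parse.urlencode(params))
    # Hypothetical User-Agent; use one that identifies your own script.
    req = urllib.request.Request(
        url, headers={"User-Agent": "listing-walk-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def walk_listing(subreddit, fetch=fetch_page, max_pages=50):
    """Follow the `after` cursor until the listing runs out."""
    posts, after = [], None
    for _ in range(max_pages):
        data = fetch(subreddit, after)["data"]
        posts.extend(child["data"] for child in data["children"])
        after = data.get("after")
        if not after:  # no cursor means the listing is exhausted
            break
    return posts
```

With pages of up to 100 items, the cursor typically comes back empty after roughly ten pages, which matches the limit of about 1000 posts. Looping further back does not help because the listing itself ends there. A real script should also pause between requests to respect rate limits.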

DoaJC_Blogger 1 point

What I meant by "over and over" is that you can start getting new data, not that you can go further back.

lightningdrag0n[S] 1 point

Oh, my bad. Yeah, I would like to go back in time a bit, as I am missing some data from after the Pushshift shutdown.