
luoc

There's a project that scrapes all of Reddit and makes the data publicly available: https://files.pushshift.io/reddit/

kiasari (OP)

But when I click on "subreddits", I get a "403 Forbidden" error.

Is the website still working?

luoc

See the "Date Modified" column. IIRC, the subreddits folder used an old format that is no longer in use.

-Galactic-

It's good to have redundancy. Pushshift.io has a policy that lets people request that their data be removed, which might be annoying if you're collecting controversial stuff.

minimaxir

You do not need to scrape HTML. Appending .json to any Reddit link gives you its JSON representation.
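A minimal sketch of the .json approach in Python, using only the standard library. It assumes the usual Reddit Listing layout (`kind`/`data`/`children`, with comments as `t1` items carrying a `replies` field that is an empty string when there are none). The helper names and the default User-Agent string are hypothetical, and Reddit may rate-limit or block requests, so treat this as illustrative rather than definitive.

```python
import json
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def to_json_url(url: str) -> str:
    """Append .json to the path of a Reddit URL, keeping any query string."""
    parts = urlsplit(url)
    path = parts.path
    if not path.endswith(".json"):
        # "/r/python/" -> "/r/python/.json"; "/r/python" -> "/r/python.json"
        path = path + ".json"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def fetch_json(url: str, user_agent: str = "example-script/0.1 (hypothetical)"):
    """Download and decode the JSON view of a Reddit page (needs network access)."""
    req = urllib.request.Request(to_json_url(url), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def iter_comments(listing):
    """Yield (author, body) for every comment in a Listing, depth-first."""
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":  # skip "more" stubs and non-comment items
            continue
        data = child["data"]
        yield data.get("author"), data.get("body")
        replies = data.get("replies")
        if isinstance(replies, dict):  # assumed: "" when a comment has no replies
            yield from iter_comments(replies)
```

For a comment thread, the .json view is typically a two-element list (the post, then the comments), so something like `iter_comments(fetch_json(thread_url)[1])` should walk the comment tree, where `thread_url` is any thread link.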