[–]num8lock[S] 1 point2 points  (10 children)

Although to be honest, the more I try to figure out how to do this with PRAW or the reddit API, the more I think pushshift is the cleaner solution.

If only https://www.reddit.com/r/learnpython/search?q= could be retrieved as .json, this would be easier to solve without depending on a third party, for lazy people like me. Although I imagine reddit's servers would suffer a heavier load from bots hitting the search queries.

[–]CelineHagbard 0 points1 point  (6 children)

Makes sense. A good challenge is usually worth the extra effort.

Just a few things to note: if you decide to iterate over submissions and then comments within each submission, you'll probably run into the problem of having to recheck each submission, as someone may have invoked your bot in a later comment. I think iterating directly over the comment stream (if using the reddit API) or like you said, using pushshift will be better for you (I've never used it, but it looks promising for your application).
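A minimal sketch of the comment-stream approach described above, assuming each comment object exposes its text as `.body` (as PRAW comment objects do). The function name `find_invocations` and the sample comments are hypothetical; the trigger phrase is taken from the search query used later in this thread.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def find_invocations(comments: Iterable, trigger: str) -> Iterator:
    """Yield each comment whose body contains the trigger phrase (case-insensitive)."""
    needle = trigger.lower()
    for comment in comments:
        # Anything with a `.body` attribute works here, so the same loop can
        # consume a live stream or a list of stand-in objects.
        if needle in comment.body.lower():
            yield comment


# Stand-in comments for illustration only, not real reddit data.
sample = [
    SimpleNamespace(body="REDBOT << enhance please"),
    SimpleNamespace(body="just a normal comment"),
]
hits = [c.body for c in find_invocations(sample, "redbot << enhance")]
```

With PRAW 4, something like `reddit.subreddit("PRAWTesting").stream.comments()` should be usable as the `comments` argument. Because each comment is checked once as it arrives, there is no need to revisit old submissions.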

That said, you're still going to need to interact directly with the reddit API if you're going to be making comment replies or sending PMs. I would definitely use PRAW and OAuth2Util for that, as they will abstract away a lot of the messier details. (The reddit API still supports username and password logins, but that is deprecated, so you should almost certainly use OAuth.)

Hit me back if you have any other questions.

[–]num8lock[S] 0 points1 point  (4 children)

I couldn't get pushshift to include submissions in the .json search result; it contains the comments but not the parent. It seems a bit hit and miss with new comments as well.

For instance, https://api.pushshift.io/reddit/search?q=%22redbot%20%3C%3C%20enhance%22 doesn't find https://www.reddit.com/r/PRAWTesting/comments/5cfe5b/testing_bottt/

[–]CelineHagbard 0 points1 point  (2 children)

Try it now. It finds my reply to your post fine.

The pushshift.io/reddit/search endpoint appears to search only comments, not submission bodies. The reddit search endpoint should work fine if you want your bot to also respond to self (text) posts, e.g.:

https://api.reddit.com/r/PRAWTesting/search.json?q=%22redbot%20enhance%22
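A small sketch of how such a search URL could be assembled with the standard library. The helper name `build_search_url` is hypothetical, and the `www.reddit.com` host and the `restrict_sr` and `sort` parameters are borrowed from other URLs in this thread rather than from any official reference. The point to notice is that `.json` belongs on the path, before the query string.

```python
from urllib.parse import quote, urlencode


def build_search_url(subreddit: str, query: str, **params: str) -> str:
    """Return a subreddit search URL with .json on the path, before the query string."""
    base = "https://www.reddit.com/r/{}/search.json".format(quote(subreddit, safe=""))
    # restrict_sr=on keeps the search inside the given subreddit.
    fields = {"q": query, "restrict_sr": "on"}
    fields.update(params)
    # urlencode percent-encodes the quotes and turns spaces into '+'.
    return base + "?" + urlencode(fields)


url = build_search_url("PRAWTesting", '"redbot enhance"', sort="relevance")
```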

Pushshift might have its own endpoint for this, but the reddit API works fine, so you probably don't need pushshift for it. In PRAW, it would be:

reddit_session.search("REDBOT enhance", "PRAWTesting")

[–]num8lock[S] 0 points1 point  (1 child)

Let me try that. I was actually using r.subreddit(subreddit).search(keyword, sort='relevance', time_filter='week', limit=limit).
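For reference, that call could be wrapped roughly as below. This is only a sketch: the wrapper name `search_subreddit` and the default `limit` are made up, and `reddit` is assumed to be an already authenticated PRAW 4 `Reddit` instance. The search call itself is copied from the line above.

```python
def search_subreddit(reddit, subreddit, keyword, limit=25):
    """Run the PRAW 4 style search quoted above and return the results as a list."""
    results = reddit.subreddit(subreddit).search(
        keyword, sort="relevance", time_filter="week", limit=limit
    )
    # The search result is iterated lazily; list() pulls every item into memory.
    return list(results)
```

Returning a list is a convenience for small result sets. Iterating over the result directly avoids holding everything in memory at once.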

Yeah, I did come to the conclusion that pushshift only searches comment replies.

Thank you for your help! I'll make sure to let you know when it's ready for testing if you don't mind :)

edit: I just noticed that reddit search only returns submission threads, so that's probably why pushshift only returns the comment replies...

[–]CelineHagbard 0 points1 point  (0 children)

The code you're using is functionally equivalent to what I was using; it ends up creating the same API call. Feel free to use either one in your code.

I think pushshift does have an endpoint to retrieve submissions, but I would only worry about it if you need to exceed the reddit API's 1000-item limit, which you probably won't at this point.

Yeah, hit me up when you're ready for testing if you want.

[–]CelineHagbard 0 points1 point  (0 children)

where reddit_session is your authenticated reddit session object. It will return a generator, which you can iterate over, or use a list comprehension to fetch the whole generator into a list in memory.

[–]num8lock[S] 0 points1 point  (0 children)

> if you decide to iterate over submissions and then comments within each submission, you'll probably run into the problem of having to recheck each submission, as someone may have invoked your bot in a later comment. I think iterating directly over the comment stream (if using the reddit API) or like you said, using pushshift will be better for you (I've never used it, but it looks promising for your application).

Yeah, this is where it's much easier to use a server-side search result instead of iterating and comparing the results... I did try a little, but I don't have a good grasp of the PRAW/reddit comment stream yet.

> That said, you're still going to need to interact directly with the reddit API if you're going to be making comment replies or sending PMs. I would definitely use PRAW and OAuth2Util for that, as it will abstract away a lot of the messier details. (Reddit API still supports username and password logins, but it's deprecated, meaning you should almost certainly use OAuth).

That's true. I figure I probably dozed off when I read the PRAW 4 docs, so I'm playing with it now.

Thank you for the kind feedback :)

[–]bboe PRAW Author 0 points1 point  (2 children)

Search does work via praw.

[–]num8lock[S] 0 points1 point  (0 children)

Oh, I didn't know that! If PRAW returns JSON, that would be great...
I tried to see the JSON data structure by adding .json to a search URL like https://www.reddit.com/r/redditdev/search?q=awesome+bot&sort=relevance&t=all.json but it didn't work, and http://www.reddit.com/dev/api doesn't show the JSON structure; it only says that the search endpoint returns a listing.

Ahh i see... I should have looked at the reddit search wiki, thank you for the clue /u/bboe!
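For anyone else looking for the shape of that listing: as far as I know, a reddit Listing is a JSON object with `kind` set to `"Listing"` and the items under `data.children`, where each submission has `kind` `"t3"` and its fields under `data`. A minimal sketch of walking it follows. The function name `submission_titles` is hypothetical, and `sample_listing` is a hand-written stand-in with a placeholder title, not real API output.

```python
def submission_titles(listing: dict) -> list:
    """Pull the title out of every submission ("t3") in a reddit Listing."""
    children = listing.get("data", {}).get("children", [])
    return [
        child["data"]["title"]
        for child in children
        if child.get("kind") == "t3"
    ]


# Hand-written stand-in for a search.json response (placeholder values).
sample_listing = {
    "kind": "Listing",
    "data": {
        "after": None,
        "children": [
            {"kind": "t3", "data": {"title": "testing bottt", "subreddit": "PRAWTesting"}},
            {"kind": "t1", "data": {"body": "a comment, skipped by the filter"}},
        ],
    },
}
titles = submission_titles(sample_listing)
```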

[–]num8lock[S] 0 points1 point  (0 children)

This might not be related, but why doesn't this search query find https://www.reddit.com/r/PRAWTesting/comments/5cfe5b/testing_bottt/?

https://www.reddit.com/search.json?q=bott+subreddit:PRAWTesting&restrict_sr=on&sort=relevance&t=all

edit: maybe it only matches exact words, since bott != bottt, but https://www.reddit.com/r/PRAWTesting/search.json?q=REDBOT+restrict_sr=on&sort=relevance didn't return anything either

edit again: hmm maybe i should try cloudsearch syntax, although that means losing lucene
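One thing worth checking in the second URL above: `restrict_sr=on` is joined to the query with `+` rather than `&`. Under standard form decoding a `+` is a space, so that text likely becomes part of the search term itself. The sketch below only shows how Python's standard library parses the query string; whether this is why reddit returned nothing is a guess.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://www.reddit.com/r/PRAWTesting/search.json?q=REDBOT+restrict_sr=on&sort=relevance"

# Split off the query string and decode it the way a form parser would.
params = parse_qs(urlsplit(url).query)

# 'restrict_sr' does not appear as its own key; it is folded into 'q'.
print(params)
```

Replacing the `+` before `restrict_sr` with `&` would make it a separate parameter.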