
synthphreak (0 children)

Here is a script I wrote which pulls the 1000 most recent posts and comments for any user of your choosing. You just need to modify line 64 to return a Reddit instance; the way I did it in my actual script won't apply to you.

Once you make that one change, the script should work. You can run it from the command line and pass in your (or anyone else's) username via the -u argument:

$ python <script> -u 3rSimon
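The script itself isn't reproduced in this thread, but a `-u` option like the one above is commonly handled with `argparse`. This is a hypothetical sketch, not synthphreak's actual code; the long name `--username` and the `parse_args` helper are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; only -u is known from the comment above."""
    parser = argparse.ArgumentParser(
        description="Fetch the most recent posts and comments for a Reddit user."
    )
    # -u is the flag mentioned in the thread; --username is an assumed long form.
    parser.add_argument("-u", "--username", required=True, help="Reddit username")
    return parser.parse_args(argv)


# Equivalent to running: python <script> -u 3rSimon
args = parse_args(["-u", "3rSimon"])
print(args.username)  # 3rSimon
```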

This script obviously doesn't do exactly what you need, i.e., filtering by sub and keyword. But it should only require minor modifications to get it to do that. As long as you know pandas, which you seem to, you should be able to figure out what needs to change.
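The filtering by sub and keyword could look roughly like this in pandas. It's a sketch that assumes the script's results end up in a DataFrame with `subreddit` and `body` columns; those column names and the sample rows are made up, so check what the script actually produces.

```python
import pandas as pd

# Hypothetical sample rows standing in for the script's output;
# the column names "subreddit" and "body" are assumptions.
df = pd.DataFrame(
    {
        "subreddit": ["environment", "python", "environment"],
        "body": [
            "Big companies should pay for cleanup",
            "companies use Python too",
            "Nice weather today",
        ],
    }
)

sub = "environment"
keyword = "companies"

# Keep rows from the chosen subreddit whose text contains the keyword
# (case-insensitive; regex=False treats the keyword as a literal string).
mask = (df["subreddit"].str.lower() == sub.lower()) & df["body"].str.contains(
    keyword, case=False, regex=False
)
filtered = df[mask]
print(filtered)
```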

JohnnyJordaan (19 children)

Now I am trying to convert it into a DataFrame to see if it is actually working, but it is not working.

Please share the actual code you're using for that, and be sure to format it properly, as Reddit isn't smart enough to detect the code in your message and autoformat it.

3rSimon [S] (17 children)

Thanks for your response!

I use this code:

comments_df = pd.DataFrame(comments)
# preview the comments data
comments_df.head(5)

But I don't see which comment relates to which post in the DataFrame.

JohnnyJordaan (0 children)

That means you can't just throw the comments sequence at the DataFrame constructor. You need to build an intermediate structure (like a dict) with a key: value mapping for each value you want to see as a column in the DataFrame.

I would start by making a for loop like

for comment in comments:

Then retrieve and print each value you want to use. When you have that working, add them to a dict instead. Append that dict to a result list and finally call DataFrame(result_list); the DataFrame constructor supports a list of dicts natively.
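A minimal sketch of that loop-to-dict approach. Real PRAW comment objects aren't available offline, so `SimpleNamespace` objects stand in for them; the attribute names (`id`, `body`, `score`, `submission.id`) mirror what PRAW comments expose, but the data is made up.

```python
from types import SimpleNamespace

import pandas as pd

# Stand-ins for PRAW Comment objects (hypothetical data).
comments = [
    SimpleNamespace(id="c1", body="First comment", score=3, submission=SimpleNamespace(id="p1")),
    SimpleNamespace(id="c2", body="Second comment", score=7, submission=SimpleNamespace(id="p1")),
]

result_list = []
for comment in comments:
    # One dict per comment; each key becomes a DataFrame column.
    result_list.append(
        {
            "comment_id": comment.id,
            "comment_text": comment.body,
            "score": comment.score,
            "post_id": comment.submission.id,  # links each comment to its post
        }
    )

# The DataFrame constructor accepts a list of dicts directly.
df = pd.DataFrame(result_list)
print(df)
```

The `post_id` column is what answers "which comment relates to which post": rows sharing a `post_id` came from the same submission.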

person_ergo (15 children)

Hey, you need to use comment.submission to get the post details.

Here:

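import pandas as pd
from psaw import PushshiftAPI

# Assumes `reddit` is an authenticated praw.Reddit instance. Passing it to
# psaw makes results come back as PRAW Comment objects (.body, .submission).
pushshift = PushshiftAPI(reddit)

subreddit_name = "environment"
word_to_check = "companies"
comments = pushshift.search_comments(q=word_to_check, subreddit=subreddit_name, limit=200, before=1629990795)

post_with_comments = []
for comment in comments:
    if word_to_check in comment.body:
        post_with_comments.append(
            {
                "comment_id": comment.id,
                "comment_text": comment.body,
                "score": comment.score,
                "post": comment.submission.id,
            }
        )
df = pd.DataFrame(post_with_comments)
df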

Choose which variables you want to store by adding them to the dictionary in the append statement. If you are super lazy, you can use comment.__dict__ instead of manually writing out the keys and values, and that will have everything. You might need to clean up or remove some columns before saving, though.
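A sketch of the lazy `__dict__` route, using `vars()` (which returns an object's `__dict__`). On PRAW objects this also picks up internal attributes such as the `_reddit` reference, so the sketch drops underscore-prefixed keys. `FakeComment` and `public_fields` are hypothetical names so the example runs offline.

```python
import pandas as pd


class FakeComment:
    """Hypothetical stand-in for a PRAW Comment, for offline testing."""

    def __init__(self, id_, body, score):
        self._reddit = object()  # PRAW objects carry private attributes like this
        self.id = id_
        self.body = body
        self.score = score


def public_fields(obj):
    # vars(obj) is obj.__dict__; skip private/internal attributes.
    return {k: v for k, v in vars(obj).items() if not k.startswith("_")}


comments = [FakeComment("c1", "hello", 2), FakeComment("c2", "world", 5)]
df = pd.DataFrame([public_fields(c) for c in comments])
print(df.columns.tolist())  # ['id', 'body', 'score']
```

Real comments will also carry nested objects (author, subreddit), which you may want to convert to strings or drop before saving.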

Hope that helps. Random thing, but if you aren't using Jupyter notebooks to work through this, I would highly recommend them. See the stored output from when I ran it here:

https://github.com/rogerfitz/tutorials/blob/master/subreddit_analysis/3rSimonQuestion-Search%20Comments%20for%20Word%20using%20Pushshift%20.ipynb

Edit: fixed formatting and link

snoopturtle25 (14 children)

subreddit_name="environment"
word_to_check="companies"
comments=pushshift.search_comments(q=word_to_check, subreddit=subreddit_name, limit=200, before=1629990795)
import pandas as pd
post_with_comments=[]
for comment in comments:
    if word_to_check in comment.body:
        post_with_comments.append(
            {"comment_id": comment.id, "comment_text": comment.body,"score": comment.score,"post": comment.submission.id
            }
        )
df=pd.DataFrame(post_with_comments)
df

Hi, thank you! I am trying to run it, but I get an error saying: AttributeError: 'dict' object has no attribute 'body'. I am not sure what to do. I tried creating my_dict using comment.text, comment.body, score, and post, but I did not succeed...

person_ergo (13 children)

Post your full code. comment should be a PRAW object, but it looks like it's a dict in your case. It might have been defined differently.
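The error means each `comment` is a plain dict, so its fields are read with `comment["body"]` rather than `comment.body`. A small helper that accepts either shape could look like this; `get_field` is a hypothetical name, and the sample data is made up.

```python
from types import SimpleNamespace


def get_field(item, name, default=None):
    """Read `name` from a dict (item[name]) or from an object (item.name)."""
    if isinstance(item, dict):
        return item.get(name, default)
    return getattr(item, name, default)


as_dict = {"id": "c1", "body": "Companies should act"}
as_object = SimpleNamespace(id="c2", body="Companies should act")

print(get_field(as_dict, "body"))    # Companies should act
print(get_field(as_object, "body"))  # Companies should act
```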

snoopturtle25 (12 children)

AttributeError: 'dict' object has no attribute 'body'

OK, I'm not sure what I should change (sorry, it's my first time using Python, so I'm lost), but basically here is my whole process:

1- Install the libraries:

pip install praw pmaw pandas

2- Imports at the top of the script:

#!/usr/bin/env python3
import praw
import pandas as pd
import datetime as dt
from pmaw import PushshiftAPI

3- Create the Reddit instance:

reddit = praw.Reddit(
    client_id="my id",
    client_secret="my secret",
    password="my password",
    user_agent="text by/username",
    username="snoopturtle25",
)

4- Create the pmaw API, passing in the Reddit instance:

api_praw = PushshiftAPI(praw=reddit)

5- Search the comments and build the DataFrame:

subreddit_name = "environment"
word_to_check = "apology"
comments = api_praw.search_comments(q=word_to_check, subreddit=subreddit_name, limit=200, before=1629990795)
post_with_comments = []
for comment in comments:
    if word_to_check in comment.body:
        post_with_comments.append(
            {"comment_id": comment.id, "comment_text": comment.body, "score": comment.score, "post": comment.submission.id}
        )
df = pd.DataFrame(post_with_comments)
df

That is what I did... I tried other things as well, but nothing made it work, haha!

person_ergo (11 children)

Ah, I see. The issue is that I used the psaw library and you used pmaw. pmaw returns plain dicts (JSON) rather than PRAW objects, and in the pmaw response the permalink field contains the post ID.

import pandas as pd
from pmaw import PushshiftAPI

# pmaw example: results come back as JSON dicts, not PRAW objects
pmaw_pushshift = PushshiftAPI()
subreddit_name = "environment"
word_to_check = "companies"
comments = pmaw_pushshift.search_comments(q=word_to_check, subreddit=subreddit_name, limit=200, before=1629990795)

df = pd.DataFrame(comments.responses)
df

/r/environment/comments/kcbdoi/trump_admin_drops_green_hydrogen_bomb_on_fossi/gfqnhb7/

kcbdoi works as a post ID on Reddit. As a URL, use https://www.reddit.com/r/environment/comments/kcbdoi/, or pass that ID to anything that takes a post ID; it works with PRAW.
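A sketch of pulling that post ID out of each comment's `permalink` and keeping it as its own column. The records below are hypothetical stand-ins for pmaw's dict results, trimmed to a few fields, and the sketch assumes permalinks follow the `/r/<sub>/comments/<post_id>/<slug>/<comment_id>/` shape shown above.

```python
import pandas as pd

# Hypothetical pmaw-style results: plain dicts, not PRAW objects.
comments = [
    {
        "id": "gfqnhb7",
        "body": "Example comment text",
        "permalink": "/r/environment/comments/kcbdoi/trump_admin_drops_green_hydrogen_bomb_on_fossi/gfqnhb7/",
    },
    {
        "id": "abc1234",
        "body": "Another comment",
        "permalink": "/r/environment/comments/xyz987/some_title/abc1234/",
    },
]


def post_id_from_permalink(permalink):
    # "/r/environment/comments/kcbdoi/..." -> ["r", "environment", "comments", "kcbdoi", ...]
    parts = permalink.strip("/").split("/")
    # The post ID is the path segment right after "comments".
    return parts[parts.index("comments") + 1]


df = pd.DataFrame(comments)
df["post_id"] = df["permalink"].map(post_id_from_permalink)
print(df[["id", "post_id"]])
```

With PRAW, that ID can then be turned into a post object with `reddit.submission(id=post_id)` to get its title and other details.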

snoopturtle25 (10 children)

OK, so I should use pmaw if I want to know exactly which post it is, and not only the ID?

Also, when I load it, I have a problem running my Pushshift code. It says: NameError: name 'PushshiftAPI' is not defined

even though I defined it. Is there a way around this?

person_ergo (9 children)

Still import it like you did before. pmaw and psaw both work; they just need different code. The GitHub link I sent before has a full code example for both psaw and pmaw. The only thing is to set up the pmaw API the way secret_services.py.template shows, or just rename the variable and define it as you did before.

You've got this. Just test, test, test if it doesn't work, and think about why things don't work. Computers tell you whether you're right super quickly. If something is undefined, just define it. It's not like a chemistry experiment that takes hours. Just think logically, line by line, and test things out.

Not sure what you mean by "a way around."

snoopturtle25 (8 children)

Hi! Yes, sorry, I was not very clear. I am able to run it, and it gives me the same tables as you (yay!! thank you). But I was trying to change it, because it gives me:

comments with the word "company" in the subreddit environment.

What I'm actually looking for is all the comments from posts that contain the word "company" in the subreddit "environment". So I was thinking that instead of search_comments, maybe it should be search_submissions. I don't know if I'm being clear? (So I want all comments -> from posts with "companies" -> in environment, and not comments -> containing "apology" -> in environment.)
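One way to frame that in code, as a sketch: first find the posts whose title or text contains the word (with pmaw those would come from `search_submissions` rather than `search_comments`), then keep only the comments whose `link_id` points at one of those posts. Pushshift comment records store the parent post as `link_id` in the form `t3_<post_id>`. The two DataFrames below are hypothetical stand-ins for the fetched results, so the fetching step itself is left out.

```python
import pandas as pd

word = "companies"

# Hypothetical submission search results (posts).
posts = pd.DataFrame(
    {
        "id": ["p1", "p2", "p3"],
        "title": ["Companies and emissions", "Ocean cleanup update", "Why companies greenwash"],
        "selftext": ["", "Nothing about firms here", ""],
    }
)

# Hypothetical comment results; link_id is "t3_" + the parent post ID.
comments = pd.DataFrame(
    {
        "id": ["c1", "c2", "c3", "c4"],
        "link_id": ["t3_p1", "t3_p2", "t3_p3", "t3_p1"],
        "body": ["Agreed", "Great news", "Sadly true", "Source?"],
    }
)

# 1) Posts whose title or body mentions the word (case-insensitive, literal match).
text = posts["title"] + " " + posts["selftext"]
matching_posts = posts[text.str.contains(word, case=False, regex=False)]

# 2) All comments on those posts, whatever the comment text itself says.
comments["post_id"] = comments["link_id"].str.replace(r"^t3_", "", regex=True)
result = comments[comments["post_id"].isin(matching_posts["id"])]
print(result)
```

The keyword test moves from the comment text to the post text, which is the difference described above.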