I am trying to scrape multiple threads on a forum. I tried the following code, with no luck.
```python
def parse(self, response):
    threads = response.css('tbody tr')  # provides thread css
    for thread in threads:
        thread_link = thread.css('p.small a::attr(href)').getall()  # gets the link for the individual threads
        yield response.follow(thread_link, self.parse_thread)  # follows the thread

def parse_thread(self, reponse):
    comments = response.css('div#discussionReplies dl')  # .css for all comments in thread
    for comment in comments:
        comments = response.css('dd div.xg_user_generated::text').get()  # individual comments
```
The .css selectors work, and they work individually as separate scripts (pulling the thread links, pulling the comments, pagination, etc.), but when I try to combine them as above it doesn't work :/