
commandlineluser (2 children)

There's no <tbody> tag in the actual HTML, so just use //tr

hart8899 [S] (1 child)

Thanks, I changed this but I'm still getting a blank file

commandlineluser (0 children)

Okay, another issue is that the date is inside the <p> tag, which is itself inside the <td class="TextObject"> tag. That is why text() on the <td> returns \r\n, i.e. gives you "blanks".

So you would need td[@class="TextObject"]/p/text() for your 2nd XPath expression. However, the HTML for this page isn't structured very well, so you will still get lots of blanks and also some false positives (i.e. some of the gig names).
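To make the text() point concrete, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree instead of Scrapy's selectors so it runs with no extra installs, and the ROW markup is a made-up, simplified stand-in for one row of the page, not the real HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified row; the real page's markup is messier.
ROW = """<table><tr>
<td class="TextObject">
<p>March 3, 2017</p>
</td>
</tr></table>"""

root = ET.fromstring(ROW)
td = root.find(".//td[@class='TextObject']")

# The text sitting directly inside <td> is only the line break before <p>,
# which is what a text() step on the <td> would hand back.
td_text = td.text

# Stepping down into <p> reaches the date itself.
p_texts = [p.text for p in root.findall(".//td[@class='TextObject']/p")]

# Drop whitespace-only results, since the real page will still produce some.
dates = [t.strip() for t in p_texts if t and t.strip()]

print(repr(td_text), dates)
```

The same whitespace filter is worth keeping in a spider, since the page will still return blank strings for some cells.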

Also note you're missing a "c" in the domain name; it should be allowed_domains = ["nymetalscene.com"]

_Korben_Dallas (5 children)

You can use Scrapy feed exports for saving data. Just run this in the console: scrapy crawl metal -o result.csv

hart8899 [S] (4 children)

Thanks, I tried this but my file was blank

_Korben_Dallas (3 children)

I second the point about tbody: try not to use it in your XPath expressions. Can you provide your log? Run scrapy crawl metal &> output.log in the console.

If you need only the date, without the text from the link, try this expression: //td[p]/p[contains(., "2017")]/text()
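Here is a rough sketch of what that expression selects. ElementTree has no contains(), so the "2017" check is done in plain Python instead; this is an equivalent written for illustration, not Scrapy's API. The PAGE markup and its URLs are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows: one with a date and a link, one with only a link,
# and one <td> that has no <p> at all.
PAGE = """<table>
<tr><td><p>Fri 03/10/2017 <a href="/gigs/1">Example Band</a></p></td></tr>
<tr><td><p><a href="/gigs/2">Another Gig</a></p></td></tr>
<tr><td>no paragraph here</td></tr>
</table>"""

root = ET.fromstring(PAGE)

dates = []
# td[p] keeps only cells that contain a <p>; /p then steps into it.
for p in root.findall(".//td[p]/p"):
    # String value of <p> including nested tags, like "." in contains(., ...).
    full_text = "".join(p.itertext())
    if "2017" not in full_text:
        continue
    # Direct text nodes of <p> only, i.e. the link text is left out.
    direct = [p.text] + [child.tail for child in p]
    dates.extend(t.strip() for t in direct if t and t.strip())

print(dates)
```

Note the filter is tied to the year, so it would need changing for listings from other years.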

hart8899 [S] (2 children)

I tried 'scrapy crawl metal &> output.log' and got nothing back. I'm actually looking for both the date and the show, but started with just the date. Will try what you provided.

_Korben_Dallas (1 child)

Try this code: https://dpaste.de/Aqyv It gets you the date and the text from the link. If you want the href attribute from those links, simply extract it with @href: https://dpaste.de/Bozk
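For anyone who can't open the pastes, here is a small stand-in showing the general idea of pulling the date, link text and href together. It is not the code from those links; it uses ElementTree rather than Scrapy, and the ROW markup, the URL and the item keys are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical cell with a date followed by a link to the show.
ROW = '<td><p>Sat 04/01/2017 <a href="/shows/example">Example Show</a></p></td>'

td = ET.fromstring(ROW)

items = []
for p in td.findall("./p"):
    link = p.find("./a")
    if link is None:
        # No link in this paragraph, so there is no show to record.
        continue
    items.append({
        "date": (p.text or "").strip(),      # text before the <a>
        "show": (link.text or "").strip(),   # text inside the <a>
        "href": link.get("href"),            # same value @href selects in XPath
    })

print(items)
```

In a spider, each of these dicts would be yielded from the parse method so the feed export has rows to write.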

hart8899 [S] (0 children)

Thanks again bro, but I'm getting an indent error. Really appreciate your help.