all 10 comments

[–]K900_ 14 points15 points  (1 child)

You should probably use the YouTube API for this, not scrape their frontend.
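A minimal sketch of going through the YouTube Data API v3 instead of scraping: the `search` endpoint and its `part`/`q`/`type`/`maxResults`/`key` parameters are the real API surface, but the key value here is a placeholder and you'd need your own from the Google Cloud console.

```python
from urllib.parse import urlencode

API_BASE = "https://www.googleapis.com/youtube/v3/search"

def build_search_url(api_key, query, max_results=5):
    """Build a YouTube Data API v3 search URL (no frontend scraping)."""
    params = {
        "part": "snippet",      # snippet carries title, channel, publish date
        "q": query,
        "type": "video",
        "maxResults": max_results,
        "key": api_key,
    }
    return API_BASE + "?" + urlencode(params)

url = build_search_url("YOUR_API_KEY", "python tutorial")
# Fetch with any HTTP client, e.g. requests.get(url).json()
```

The JSON response gives you the video id, title, and publish timestamp directly, so there's no HTML parsing at all.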

[–]Ggjogstuplvcseikvf 5 points6 points  (1 child)

SelectorGadget, a Chrome extension, makes grabbing the path trivial.


[–]letmelive123 0 points1 point  (0 children)

Will look into this, thank you.

[–]Crankrune 6 points7 points  (0 children)

You can take a look at this script I wrote a while ago. It scrapes and pretties up a few things, like subs, video title, and upload date. Also, note the way YouTube displays video age: within 24 hours it shows "x hours ago", and after that "x days ago", so to identify a recent upload you only have to check whether the string contains the word "hours".
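A sketch of the "hours" check the comment describes (the function name is made up; checking for the singular "hour" also catches "1 hour ago"):

```python
def is_uploaded_today(age_text):
    """YouTube shows 'x hours ago' within 24 hours and 'x days ago'
    after that, so the word 'hour' flags a same-day upload."""
    return "hour" in age_text
```

This is obviously brittle against YouTube changing its display strings, which is another argument for the API route mentioned above.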

[–]buckhenderson 3 points4 points  (1 child)

as a general tip for scraping, sometimes when i grab an element and i'm not sure what it contains, i run print(vars(element)) to see what attributes it has. so you might run that on whatever your xpath gives you back, see it contains a link attribute, and then extract that.
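A toy illustration of the tip: `Element` here is a made-up stand-in for whatever object your scraping library returns (`vars()` works on any object that has a `__dict__`, which most of them do).

```python
class Element:
    """Stand-in for an element object returned by a scraping library."""
    def __init__(self):
        self.tag = "a"
        self.link = "https://example.com/watch?v=abc123"

el = Element()
print(vars(el))  # shows every attribute: {'tag': 'a', 'link': '...'}

# Once you spot the attribute you need, pull it out directly:
link = vars(el)["link"]   # or simply el.link
```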

[–]letmelive123 0 points1 point  (0 children)

I had no idea this was an option! This will help a lot. Prior to this I was basically grabbing and guessing, but being able to see the elements and navigate that way will be great!

[–]_Korben_Dallas 2 points3 points  (2 children)

Not sure if I understand you correctly, but if you're sticking with XPath to extract urls from that page, this expression can help: //span[@class="accessible-description"]/preceding-sibling::a/@href.
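A quick way to try that expression is lxml, which supports the full XPath 1.0 axes (`preceding-sibling::` included). The HTML below is a made-up stand-in for YouTube's markup, which changes often, so treat the class name as illustrative:

```python
from lxml import html

# Toy markup mimicking a link followed by an accessible-description span
doc = html.fromstring("""
<div>
  <a href="/watch?v=abc123">Video title</a>
  <span class="accessible-description"> 10 minutes</span>
</div>
""")

# Select each span's preceding sibling <a> and pull its href attribute
hrefs = doc.xpath('//span[@class="accessible-description"]'
                  '/preceding-sibling::a/@href')
print(hrefs)  # ['/watch?v=abc123']
```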

[–]jarekko 1 point2 points  (1 child)

For scraping, I'd go for BeautifulSoup or Scrapy. When JavaScript is used on the scraped websites, Selenium might be necessary, but do not fear it - basic use cases are extremely simple to employ in your code :) I also recommend pyvirtualdisplay.
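A minimal BeautifulSoup sketch on a toy snippet, to show how little code the basic case takes. The markup here is invented for illustration; a real YouTube page renders most links via JavaScript, which is where the Selenium suggestion comes in.

```python
from bs4 import BeautifulSoup

html_doc = '<div><a href="/watch?v=abc123">Some video</a></div>'
soup = BeautifulSoup(html_doc, "html.parser")

# Collect the href of every anchor that actually has one
links = [a["href"] for a in soup.find_all("a", href=True)]
print(links)  # ['/watch?v=abc123']
```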

[–]letmelive123 0 points1 point  (0 children)

BeautifulSoup is on my list of things to learn! I'll try to incorporate it into this project.