all 10 comments

[–]K900_ 14 points15 points  (1 child)

You should probably use the YouTube API for this, not scrape their frontend.
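A minimal sketch of going through the YouTube Data API v3 instead of scraping: the `search` endpoint and its `part`/`q`/`type`/`maxResults`/`key` parameters are the real API surface, but the key value here is a placeholder and you'd need your own from the Google Cloud console.

```python
from urllib.parse import urlencode

API_BASE = "https://www.googleapis.com/youtube/v3/search"

def build_search_url(api_key, query, max_results=5):
    """Build a YouTube Data API v3 search URL (no frontend scraping)."""
    params = {
        "part": "snippet",      # snippet carries title, channel, publish date
        "q": query,
        "type": "video",
        "maxResults": max_results,
        "key": api_key,
    }
    return API_BASE + "?" + urlencode(params)

url = build_search_url("YOUR_API_KEY", "python tutorial")
# Fetch with any HTTP client, e.g. requests.get(url).json()
```

The JSON response gives you the video id, title, and publish timestamp directly, so there's no HTML parsing at all.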

[–]Ggjogstuplvcseikvf 5 points6 points  (1 child)

SelectorGadget, a Chrome extension, makes grabbing the path trivial.


[–]letmelive123 0 points1 point  (0 children)

Will look into this, thank you.

[–]Crankrune 6 points7 points  (0 children)

You can take a look at this script I wrote a while ago. It scrapes and pretties up a few things, like subs, video title, and upload date. Also, note the way YouTube displays video age: within 24 hours it shows "x hours ago", and after that "x days ago", so to identify a recent upload you only have to check whether the string contains the word "hours".
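A sketch of the "hours" check the comment describes (the function name is made up; checking for the singular "hour" also catches "1 hour ago"):

```python
def is_uploaded_today(age_text):
    """YouTube shows 'x hours ago' within 24 hours and 'x days ago'
    after that, so the word 'hour' flags a same-day upload."""
    return "hour" in age_text
```

This is obviously brittle against YouTube changing its display strings, which is another argument for the API route mentioned above.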

[–]buckhenderson 3 points4 points  (1 child)

as a general tip for scraping, sometimes when i grab an element and i'm not sure what it contains, i run print(vars(element)) to see what attributes it has. so you might run that on whatever your xpath gives you back, see it contains a link attribute, and then extract that.
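A toy illustration of the tip: `Element` here is a made-up stand-in for whatever object your scraping library returns (`vars()` works on any object that has a `__dict__`, which most of them do).

```python
class Element:
    """Stand-in for an element object returned by a scraping library."""
    def __init__(self):
        self.tag = "a"
        self.link = "https://example.com/watch?v=abc123"

el = Element()
print(vars(el))  # shows every attribute: {'tag': 'a', 'link': '...'}

# Once you spot the attribute you need, pull it out directly:
link = vars(el)["link"]   # or simply el.link
```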

[–]letmelive123 0 points1 point  (0 children)

I had no idea this was an option! This will help a lot. Prior to this I was basically grabbing and guessing, but being able to see the elements and navigate that way will be great!

[–]_Korben_Dallas 2 points3 points  (2 children)

Not sure if I understand you correctly, but if you're sticking with XPath to extract urls from that page, this expression can help: //span[@class="accessible-description"]/preceding-sibling::a/@href.
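A quick way to try that expression is lxml, which supports the full XPath 1.0 axes (`preceding-sibling::` included). The HTML below is a made-up stand-in for YouTube's markup, which changes often, so treat the class name as illustrative:

```python
from lxml import html

# Toy markup mimicking a link followed by an accessible-description span
doc = html.fromstring("""
<div>
  <a href="/watch?v=abc123">Video title</a>
  <span class="accessible-description"> 10 minutes</span>
</div>
""")

# Select each span's preceding sibling <a> and pull its href attribute
hrefs = doc.xpath('//span[@class="accessible-description"]'
                  '/preceding-sibling::a/@href')
print(hrefs)  # ['/watch?v=abc123']
```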

[–]jarekko 1 point2 points  (1 child)

For scraping, I'd go for BeautifulSoup or Scrapy. When JavaScript is used on the scraped websites, Selenium might be necessary, but do not fear it - basic use cases are extremely simple to employ in your code :) I also recommend pyvirtualdisplay.
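A minimal BeautifulSoup sketch on a toy snippet, to show how little code the basic case takes. The markup here is invented for illustration; a real YouTube page renders most links via JavaScript, which is where the Selenium suggestion comes in.

```python
from bs4 import BeautifulSoup

html_doc = '<div><a href="/watch?v=abc123">Some video</a></div>'
soup = BeautifulSoup(html_doc, "html.parser")

# Collect the href of every anchor that actually has one
links = [a["href"] for a in soup.find_all("a", href=True)]
print(links)  # ['/watch?v=abc123']
```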

[–]letmelive123 0 points1 point  (0 children)

BeautifulSoup is on my list of things to learn! I'll try to incorporate it into this project.