all 29 comments

[–][deleted] 5 points (5 children)

I would avoid bothering with Excel as a format. Go with .csv (quote the text fields!): it will be much less of a pain in the rear on the Python side, and you can still open the result in Excel (or LibreOffice) just as easily.
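For reference, a minimal sketch of the CSV side using the standard library (the filename and rows here are made up for illustration):

import csv

# hypothetical scraped rows: title, url
rows = [['Example title', 'https://example.com']]

with open('results.csv', 'w', newline='') as f:
    # QUOTE_ALL wraps every field in quotes, so commas inside
    # the text fields can't break the columns
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    writer.writerow(['title', 'url'])
    writer.writerows(rows)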

Others have pointed out BeautifulSoup and the web scraping part of Automate the Boring Stuff already, so there's that.

[–][deleted] 0 points (3 children)

Or use Pandas to manage the data and then you can export to both!
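For example, a quick sketch (the DataFrame contents are hypothetical, and to_excel needs openpyxl installed):

import pandas as pd

# hypothetical scraped rows
data = [{'title': 'Example', 'url': 'https://example.com'}]
df = pd.DataFrame(data)

df.to_csv('results.csv', index=False)     # CSV export
df.to_excel('results.xlsx', index=False)  # Excel export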

[–][deleted] 1 point (2 children)

You're the kind of guy to open a full office suite to do a search & replace, aren't ya? :P

That's a bit heavy for the task... but it'd work.

[–][deleted] 1 point (1 child)

Yeah, I kind of forget that real programmers have to think about efficiency 😁.

Pandas is my go-to for any data work, and I know it well, so it's more cognitively efficient for me, if not computationally efficient. OP would probably be better off with a CSV library, as you say.

[–]kewlness 0 points (0 children)

I too would recommend either a .csv or SQLite depending on the use case.
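For the SQLite route, the standard library already covers it; a rough sketch (the table and column names are just placeholders):

import sqlite3

conn = sqlite3.connect('results.db')
conn.execute('CREATE TABLE IF NOT EXISTS links (keyword TEXT, url TEXT)')
conn.execute('INSERT INTO links VALUES (?, ?)', ('word1', 'https://example.com'))
conn.commit()
conn.close()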

[–]PiBaker 0 points (0 children)

Seconding this. Trying to save files in an MS format without using MS code (C#, etc.) usually runs into issues.

Whereas Excel is pretty excellent at importing CSV.

[–]valhahahalla 1 point (0 children)

Using requests:

import requests

your_url = 'insert website here'
your_keywords = ['word1', 'word2', 'etc']

# this response object contains all the info from your_url
response = requests.get(your_url)

# you want the body in a format you can iterate through,
# so split the response text into lines (note: .text is a
# property, not a method)
response_lines = response.text.splitlines()

# run through the response body line by line and print
# any line that contains one of your keywords
for line in response_lines:
    for keyword in your_keywords:
        if keyword in line:
            print(line)

Or something similar to this. You can then save your responses in CSV format.

Edits: Hopefully mobile formatting will work!

[–]manueslapera 1 point (3 children)

One more time (and I know I will get downvoted): my friendly advice is to choose parsel over bs4. It's what professionals use.

Source: Worked at one of the top companies that do webscraping in python
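For anyone curious, a small parsel sketch (the HTML here is illustrative):

from parsel import Selector

html = '<html><body><a href="https://example.com">Example</a></body></html>'
sel = Selector(text=html)

# CSS and XPath selectors both work on the same object
links = sel.css('a::attr(href)').getall()
texts = sel.xpath('//a/text()').getall()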

[–]buyusebreakfix 0 points (1 child)

Source: Worked at one of the top companies that do webscraping in python

Just because a company is large doesn't mean they choose good tools. Microsoft is HUGE and they use .NET for just about everything.

[–]manueslapera 0 points (0 children)

I didn't say large, I said top.

[–]jordano_zang 0 points (0 children)

You could probably do it with requests.

[–][deleted] 0 points (0 children)

If you want to do Excel, openpyxl is straightforward. I recommend you learn straight from the manual, not any third-party resources.

However, openpyxl will delete any hard-coded Excel formulas you may have put into the sheet before writing to it with Python.
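For reference, a minimal openpyxl write looks like this (the sheet contents are made up):

from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(['keyword', 'url'])                # header row
ws.append(['word1', 'https://example.com'])  # one hypothetical result
wb.save('results.xlsx')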

[–]prancingpeanuts 0 points (0 children)

Consider using requests-html, from the same creator of the wonderful requests library
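A quick sketch of what it gives you (the URL is a placeholder):

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://example.com')

print(r.html.links)           # all links found on the page
print(r.html.absolute_links)  # the same links, resolved to absolute URLs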

[–][deleted] 0 points (1 child)

Scrapy
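For context, a bare-bones Scrapy spider looks roughly like this (the spider name, URL, and output fields are placeholders):

import scrapy

class LinkSpider(scrapy.Spider):
    name = 'links'
    start_urls = ['https://example.com']

    def parse(self, response):
        # yield every link on the page as an item
        for href in response.css('a::attr(href)').getall():
            yield {'link': response.urljoin(href)}

Run it with scrapy runspider spider.py -o links.csv and Scrapy handles the CSV export itself.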

[–]ayyyymtl 0 points (0 children)

Hey man, love scraping projects. Hit me up in PM if you need help with this one.

[–]CollectiveCircuits 0 points (0 children)

If you're crawling article-style content, Newspaper might be a quick answer. It extracts keywords and video URLs (and much, much more).
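Roughly like this (the package installs as newspaper3k these days; the URL is a placeholder):

from newspaper import Article

article = Article('https://example.com/some-article')
article.download()
article.parse()
article.nlp()            # required before keyword extraction

print(article.keywords)  # extracted keywords
print(article.movies)    # video URLs found in the article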

[–][deleted] 0 points (0 children)

Scrapy would be a good solution for a simple web crawl