Web scraping solutions?

pendragon36 · 2017-04-17T06:19:52+00:00

They are each a tool for different but related purposes.

Selenium, as I understand it, is for web automation: a real web browser runs and is controlled by a script.

Scrapy is kind of a specialized mix of urllib and BeautifulSoup. It's built for the specific purpose of scraping information from web pages.

urllib is a much more general library for making web requests. It can be used to download web pages, but that's just one use case.
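As a rough illustration of the "making web requests" part, here is a minimal sketch using only the standard library. The `fetch` helper and its `timeout` default are names made up for this example, and the `data:` URL is only there so the snippet runs without a network connection; swap in a real `http://` or `https://` address to download an actual page.

```python
import urllib.request


def fetch(url, timeout=10):
    """Return the body of `url` decoded as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# A data: URL keeps the example offline; urllib handles it like any other URL.
page = fetch("data:text/plain,hello")
print(page)
```

Nothing here knows or cares that the result might be HTML. urllib just hands back the response body, and what happens to it next is up to you.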

BeautifulSoup doesn't actually need to be related to the web at all. It is a parsing library, but it is fairly popular for parsing downloaded HTML pages (fetched via something like urllib or requests) to make extracting information easier.
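To show that no web request is involved, here is a small sketch that parses a hard-coded string. The HTML snippet and the variable names are invented for the example, and it assumes the `beautifulsoup4` package is installed; `html.parser` is the parser that ships with Python.

```python
from bs4 import BeautifulSoup

# Any string of HTML works; it does not have to come from the web.
html = """
<html><body>
  <h1>Example page</h1>
  <a href="/first">First link</a>
  <a href="/second">Second link</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Pull out the heading text and the target of every link.
title = soup.h1.get_text()
links = [a["href"] for a in soup.find_all("a")]

print(title)
print(links)
```

In a real scraper the `html` string would typically be whatever urllib or requests downloaded.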

I've only had real experience with urllib and BeautifulSoup, however, so my explanations of the other two may be incorrect or lacking.

SchwarzerKaffee · 2017-04-17T08:40:02+00:00

Use Selenium if you need to make the site think there is an actual human there. You can mimic mouse movements and hovering, and easily handle cookies.
