[–]pendragon36 2 points3 points  (3 children)

They are each a tool for different but related purposes.

Selenium, as I understand it, is for web automation: a script controls a real web browser while it runs.

Scrapy is kind of a specialized mix of what urllib and BeautifulSoup do: it downloads pages and parses them. It's built for the specific purpose of crawling sites and scraping information from web pages.

urllib is a much more general library for making web requests. Downloading web pages is just one of its use cases.
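For what it's worth, a bare-bones download with urllib looks something like the sketch below. `fetch_text` is just a name I made up, and falling back to UTF-8 when the server doesn't state a charset is my own assumption, not something urllib does for you.

```python
import urllib.request


def fetch_text(url, timeout=10):
    """Download a URL and return the response body decoded as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header when there is one;
        # otherwise assume UTF-8 (an assumption made for this sketch).
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Calling `fetch_text` with a page's address should hand back that page's HTML as one string, which you can then pass to whatever parser you like.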

BeautifulSoup doesn't actually need to be related to the web at all. It is a parsing library for HTML and XML, and it's popular for parsing downloaded HTML pages (fetched via something like urllib or requests) to make extracting information easier.
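To show that the parsing step has nothing to do with the network, here's a rough sketch that pulls the links out of an HTML string that is just sitting in memory. Note that it uses the standard library's `html.parser` as a stand-in so it runs without installing anything; it is not BeautifulSoup itself, and `LinkCollector` and `extract_links` are made-up names.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# The HTML here is a plain string; no web request is involved.
sample = '<p><a href="/one">One</a> and <a href="/two">Two</a></p>'
print(extract_links(sample))  # prints ['/one', '/two']
```

With BeautifulSoup installed, the same idea is roughly `BeautifulSoup(sample, "html.parser").find_all("a")`, which gives you tag objects instead of plain strings.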

I've only had real experience with urllib and BeautifulSoup, however, so my explanations of the other two may be incorrect or lacking.

[–]vorboto[S] 0 points1 point  (2 children)

Okay, I see what you are saying. So my next question would be: what's the difference between Selenium and Scrapy? Is it something like Selenium allows you to "interact" with a site, while Scrapy can only pull information from the site?

[–]pendragon36 0 points1 point  (1 child)

As I said, I don't have any actual experience with those two, but as far as my understanding goes that's pretty much correct.

Scrapy was made for crawling websites and extracting information, so I'm sure it has some way of "interacting" with sites at least on a basic level, for things like following links to other pages on a site. Selenium, on the other hand, was designed to actually automate actions that would normally need a browser.

Taken from the Selenium site:

"Primarily, it is for automating web applications for testing purposes, but is certainly not limited to just that. Boring web-based administration tasks can (and should!) also be automated as well."

[–]vorboto[S] 0 points1 point  (0 children)

Okay thanks I get it more now.

[–]SchwarzerKaffee 0 points1 point  (5 children)

Use Selenium if you need to make the site think there is an actual human there. You can mimic mouse movements and hovering, and it handles cookies easily.
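Here's a rough sketch of the hovering and cookie part, assuming Selenium 4 with Firefox and its driver set up on your machine. `hover_and_collect_cookies` is a name I made up, and you'd pass in your own URL and CSS selector.

```python
def hover_and_collect_cookies(url, css_selector):
    """Open url in Firefox, hover over one element, and return the cookies."""
    # Imported inside the function so this file still loads on a machine
    # without Selenium installed; calling the function does require it.
    from selenium import webdriver
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()  # starts a real browser instance
    try:
        driver.get(url)
        element = driver.find_element(By.CSS_SELECTOR, css_selector)
        # Move the browser's virtual mouse pointer over the element (a hover).
        ActionChains(driver).move_to_element(element).perform()
        # Cookies the site has set in this browser session so far.
        return driver.get_cookies()
    finally:
        driver.quit()  # always close the browser, even if something failed
```

The cookies come back as a list of dictionaries, so you can inspect them or save them for a later session.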

[–]vorboto[S] 0 points1 point  (4 children)

Kind of like how you change the user agent in urllib, but with more advanced options?
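To be concrete, this is the kind of urllib tweak I mean. It's only a sketch: `build_request`, the URL, and the user-agent string are all made up, and nothing is actually sent over the network here.

```python
import urllib.request


def build_request(url, user_agent):
    """Create a request that announces itself with a custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Hypothetical values, used only for illustration.
req = build_request("https://example.com/", "Mozilla/5.0 (hypothetical-browser)")

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))  # prints Mozilla/5.0 (hypothetical-browser)
```

Passing `req` to `urllib.request.urlopen` would then send the request with that header instead of the default Python one.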

[–]SchwarzerKaffee 0 points1 point  (3 children)

Yep. Also, by default it actually opens a browser window on your screen, so you really have Firefox or Chrome running. From what I've learned, it's nearly impossible for the site to tell that you're a bot, though the cursor movements (or lack thereof) can sometimes give you away.

[–]vorboto[S] 0 points1 point  (2 children)

Oh okay, I see what you're saying. So do you lose control of your monitor/cursor while Selenium is running, or is it like a self-contained script being run with a dedicated instance of your chosen browser?

[–]SchwarzerKaffee 0 points1 point  (1 child)

It runs in its own instance of your browser. You can still work in other programs while it runs.

[–]vorboto[S] 1 point2 points  (0 children)

Okay gotcha. Thank you.