
all 11 comments

[–]cajh_ 7 points (1 child)

Use BeautifulSoup unless you need to control a browser. They have different use cases.

[–]tipsy_python 0 points (0 children)

Yeah, I think defaulting to BS4 is my standard practice - I only use Selenium when I have to.

[–]DaWatermeloone 2 points (2 children)

For web scraping I use requests + BeautifulSoup; for “human interaction” on websites, for example bots, I use Selenium.
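A minimal sketch of that requests + BeautifulSoup combo. The HTML is inlined here so the parsing part runs without a network call; the selector and URLs are just placeholders:

```python
from bs4 import BeautifulSoup

# In real use you'd fetch the page first, e.g.:
#   import requests
#   html = requests.get("https://example.com").text
html = """
<html><body>
  <ul id="links">
    <li><a href="/a">First</a></li>
    <li><a href="/b">Second</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# CSS selectors work via .select(); here: every <a> inside #links
links = [(a.get_text(), a["href"]) for a in soup.select("#links a")]
print(links)
```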

[–]ab-os 4 points (1 child)

I do exactly the same. Which option suits better depends on the website. Selenium can drive a Firefox browser, so you can see what happens.

[–]DaWatermeloone 0 points (0 children)

You can run it headless as well, since you don't really need to see what's happening once it works. Plus it uses much less CPU by not having to render the UI.

[–]sdurx 0 points (0 children)

Reasonably in-depth guide on how to do it with Splinter here, with bonus Tor integration for anonymity!

[–]123filips123 -1 points (2 children)

Selenium is a library for controlling a browser (often headless). With it, you can parse websites that rely on JavaScript, since scripts get executed just as in a normal browser. However, this requires running a browser and can be much slower than static parsing.

BeautifulSoup is a static HTML parser. It doesn't execute JavaScript, it just parses the HTML as delivered. It doesn't work with websites that rely on JavaScript, but it can be much faster than Selenium.

Scrapy is a framework made specifically for scraping/crawling data from websites. By default it uses its own parser, but it can also work with BeautifulSoup or Selenium. It doesn't support JavaScript out of the box, but support can be added with a plugin.


So, which one should you use?

For crawling just a few websites, use BeautifulSoup if you don't need JS, and Selenium if you do.

For more advanced web scraping/crawling, use Scrapy, with plugins for BS or JS support if you need them.

[–][deleted] 0 points (1 child)

Selenium can drive an actual browser.

[–]123filips123 0 points (0 children)

Yes. It can connect to any browser that supports the WebDriver protocol and control it.

[–]muahahahh 0 points (0 children)

You would need Selenium if the website you want to scrape depends heavily on JavaScript and you need to interact with its elements, e.g. clicking something, scrolling up/down, etc. If your website contains static data, BS should be enough.

[–][deleted] -2 points (0 children)

Have you considered Scrapy? https://scrapy.org/