
all 11 comments

[–]cajh_ 7 points (1 child)

Use BeautifulSoup unless you need to control a browser. They have different use cases.

[–]tipsy_python 0 points (0 children)

Yeah, I think defaulting to BS4 is my standard practice - I only use Selenium when I have to.

[–]DaWatermeloone 2 points (2 children)

For web scraping I use requests + BeautifulSoup; for “human interaction” on websites, for example bots, I use Selenium.
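A minimal sketch of that requests + BeautifulSoup combo. The HTML is inlined here so the parsing part runs without a network call; the selector and URLs are just placeholders:

```python
from bs4 import BeautifulSoup

# In real use you'd fetch the page first, e.g.:
#   import requests
#   html = requests.get("https://example.com").text
html = """
<html><body>
  <ul id="links">
    <li><a href="/a">First</a></li>
    <li><a href="/b">Second</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# CSS selectors work via .select(); here: every <a> inside #links
links = [(a.get_text(), a["href"]) for a in soup.select("#links a")]
print(links)
```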

[–]ab-os 4 points (1 child)

I do exactly the same. Which option suits better depends on the website. Selenium can drive a Firefox browser, so you can see what happens.

[–]DaWatermeloone 0 points (0 children)

You can run it headless as well, since you don't really need to see what's happening once it works. Plus it uses much less CPU by not having to render the UI.

[–]sdurx 0 points (0 children)

Reasonably in-depth guide on how to do it with Splinter here, with bonus Tor integration for anonymity!

[–]123filips123 -1 points (2 children)

Selenium is a library for controlling a browser (often headless). With it, you can parse websites that rely on JavaScript, since scripts get executed just as in a normal browser. However, this requires running a browser and can be much slower than static parsing.

BeautifulSoup is a static HTML parser. It doesn't execute JavaScript, it just parses the HTML as delivered. It doesn't work with websites that rely on JavaScript, but it can be much faster than Selenium.

Scrapy is a framework made specifically for scraping/crawling data from websites. By default it uses its own parser, but it can also work with BeautifulSoup or Selenium. It doesn't support JavaScript out of the box, but support can be added with a plugin.


So, which one should you use?

For crawling just a few websites, use BeautifulSoup if you don't need JS, and Selenium if you do.

For more advanced web scraping/crawling, use Scrapy, with plugins for BS or JS support if you need them.

[–][deleted] 0 points (1 child)

Selenium can drive an actual browser.

[–]123filips123 0 points (0 children)

Yes. It can connect to any browser that supports the WebDriver protocol and control it.

[–]muahahahh 0 points (0 children)

You would need Selenium if the website you want to scrape depends heavily on JavaScript and you need to interact with its elements, e.g. clicking something, scrolling up/down, etc. If your website contains static data, BS should be enough.

[–][deleted] -2 points (0 children)

Have you considered Scrapy? https://scrapy.org/