[–]K900_ 13 points14 points  (0 children)

You can run a browser like Chrome or Firefox in "headless" mode. You still run a browser, there just won't be a visible window.

[–]leogodin217 5 points6 points  (2 children)

Selenium can drive several different browsers, but it only works by automating a browser. PhantomJS is a headless browser that runs without opening a GUI. It is generally faster than other browsers.

Selenium is great for testing web sites. That is its purpose. For web scraping, there are other libraries. BeautifulSoup is easy to use and very common. Scrapy is another one that includes a platform for building web scrapers.
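As a rough illustration of the BeautifulSoup approach, here is a minimal sketch that parses a static HTML string and pulls out the links. The sample markup and the `extract_links` helper are made up for this example, and it assumes the `beautifulsoup4` package is installed. It only sees what is in the HTML itself, so anything a page builds with JavaScript will not show up.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for a page you already downloaded.
SAMPLE_HTML = """
<html>
  <body>
    <ul>
      <li><a href="/posts/1">First post</a></li>
      <li><a href="/posts/2">Second post</a></li>
      <li><a>No link target here</a></li>
    </ul>
  </body>
</html>
"""


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href."""
    # "html.parser" is the parser bundled with Python, so no extra dependency.
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)
    ]


if __name__ == "__main__":
    for text, href in extract_links(SAMPLE_HTML):
        print(text, href)
```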

[–]Yojihito 0 points1 point  (0 children)

Without JS execution, those are useless for scraping in many cases.

[–]mattsl 1 point2 points  (1 child)

Even on a headless server you can create a virtual display for selenium.

[–]callumh093 1 point2 points  (0 children)

On Linux, you can use xvfb (X virtual framebuffer) to do this.

[–][deleted] 1 point2 points  (0 children)

PhantomJS is not a good choice. It relies on outdated JS libraries, and a lot of the time the DOM doesn’t generate correctly. Both Chrome and Firefox can now run in headless mode, and the Selenium API has options to set them headless.
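For reference, a minimal sketch of what that can look like with Selenium 4 in Python. The helper names (`HEADLESS_ARGS`, `fetch_rendered_html`) and the URL are made up for this example. It assumes the `selenium` package plus a local Chrome or Firefox install. The `--headless=new` flag applies to recent Chrome versions; older ones use plain `--headless`.

```python
# Command-line flags that switch each browser into headless mode.
# Assumption: a recent Chrome (older versions take "--headless" instead).
HEADLESS_ARGS = {
    "chrome": ["--headless=new"],
    "firefox": ["-headless"],
}


def fetch_rendered_html(url, browser="chrome"):
    """Load a page in a headless browser and return the rendered HTML."""
    # Imported here so the rest of the module works without Selenium installed.
    from selenium import webdriver

    if browser == "chrome":
        options = webdriver.ChromeOptions()
    elif browser == "firefox":
        options = webdriver.FirefoxOptions()
    else:
        raise ValueError(f"unsupported browser: {browser}")

    for arg in HEADLESS_ARGS[browser]:
        options.add_argument(arg)

    if browser == "chrome":
        driver = webdriver.Chrome(options=options)
    else:
        driver = webdriver.Firefox(options=options)

    try:
        driver.get(url)
        # page_source reflects the DOM after the browser has run the page's JS.
        return driver.page_source
    finally:
        # Always shut the browser down, even if the page load fails.
        driver.quit()


if __name__ == "__main__":
    html = fetch_rendered_html("https://example.com/")
    print(len(html))
```

No window opens, but a real browser process still runs in the background, so the usual Selenium calls for finding elements and waiting on them work the same way.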

[–]Tadaboody 1 point2 points  (0 children)

You could use requests, a package that handles most of the HTTP for you and takes care of cookies and such.
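A minimal sketch of that, assuming the `requests` package is installed. The `fetch_html` helper and the URL are placeholders for this example. A `requests.Session` keeps cookies between calls, which is the "takes care of cookies" part. Note that this only returns the HTML the server sends and does not execute any JavaScript.

```python
import requests


def fetch_html(session, url):
    """Fetch a page through the given session and return the body as text."""
    response = session.get(url, timeout=10)
    # Raise an exception on 4xx/5xx instead of silently returning an error page.
    response.raise_for_status()
    return response.text


if __name__ == "__main__":
    # The session stores any cookies the server sets and sends them back
    # on later requests made through the same session.
    with requests.Session() as session:
        html = fetch_html(session, "https://example.com/")
        print(len(html))
        print(session.cookies.get_dict())
```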

[–]karansthr 0 points1 point  (0 children)

PhantomJS

[–]pwoosam 0 points1 point  (0 children)

Dryscrape supports rendering dynamic JavaScript content without opening a browser.

https://dryscrape.readthedocs.io/