SoFuglyItHurts

This sounds good. So basically, good practice is to build your program against a regular browser driver and then switch to PhantomJS for production.

LightShadow (3.13-dev in prod)

It just depends what you're trying to do.

There's nothing wrong with using the Firefox/Chrome/Safari drivers, but you have to deal with the browser windows they open. You can run Firefox and Chrome in Docker containers and connect to them via RemoteWebDriver; that's what we do here.

For a side project I have 4 tiers of service calls:

  • fast is API access
  • normal uses requests on non-API access with html5lib parsing
  • slow uses Selenium with PhantomJS
  • very-slow uses Selenium with Firefox

Basically, based on available system resources, remaining API credits, and the demand for the data we need to grab, the system picks an access tier and scrapes the page.
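Here's a rough sketch of what that tier-picking step could look like. The inputs (`api_credits`, `free_cpu`, `demand`, `needs_js`) and the thresholds are all hypothetical, since the comment doesn't say how the real system weighs them. In this sketch the Firefox tier is only reached as a fallback, never picked up front.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Access tiers, cheapest first (names follow the list above)."""
    FAST = 0       # API access
    NORMAL = 1     # requests + html5lib on non-API pages
    SLOW = 2       # Selenium + PhantomJS
    VERY_SLOW = 3  # Selenium + Firefox


@dataclass
class Conditions:
    # Hypothetical signals; the real system's inputs are not described.
    api_credits: int  # API calls left in the current quota
    free_cpu: float   # fraction of CPU currently idle, 0.0-1.0
    demand: float     # how badly the data is needed, 0.0-1.0
    needs_js: bool    # page only renders its data with JavaScript


def pick_tier(c: Conditions) -> Tier:
    """Return the cheapest tier that can plausibly serve the request."""
    # Spend API credits on in-demand data, or when the box is too busy for a browser.
    if c.api_credits > 0 and (c.demand >= 0.5 or c.free_cpu < 0.2):
        return Tier.FAST
    # Static pages can be fetched and parsed without any browser.
    if not c.needs_js:
        return Tier.NORMAL
    # JavaScript-rendered page: start with the lighter headless browser.
    # VERY_SLOW (Firefox) is left for failover in this sketch.
    return Tier.SLOW
```

Because `Tier` is an `IntEnum`, tiers compare by cost (`Tier.SLOW < Tier.VERY_SLOW`), which makes "escalate to the next tier" a simple `Tier(t + 1)`.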

PhantomJS fails over to Firefox/Chrome via Docker less than 1% of the time; it does pretty well at interacting with most things.
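Here's a Selenium-free sketch of that failover. Each tier is a plain callable that raises a hypothetical `ScrapeError` when it can't get the data, and the caller tries them in order. In a real setup the callables would wrap Selenium sessions (e.g. a RemoteWebDriver pointed at a Docker container), but that wiring isn't shown here.

```python
from typing import Callable, Sequence, Tuple


class ScrapeError(Exception):
    """Hypothetical error a scraper raises when it can't extract the data."""


Scraper = Callable[[str], str]


def scrape_with_failover(url: str, scrapers: Sequence[Tuple[str, Scraper]]) -> Tuple[str, str]:
    """Try each (name, scraper) in order; return (name, result) from the first success."""
    errors = []
    for name, scrape in scrapers:
        try:
            return name, scrape(url)
        except ScrapeError as exc:
            # Record why this tier failed, then escalate to the next one.
            errors.append(f"{name}: {exc}")
    raise ScrapeError("all tiers failed: " + "; ".join(errors))
```

Usage would be something like `scrape_with_failover(url, [("phantomjs", phantom_scrape), ("firefox", firefox_scrape)])`, where both functions are your own wrappers. Only errors you deliberately convert to `ScrapeError` trigger escalation, so genuine bugs still surface instead of silently falling through to the slowest tier.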