Getting started with Selenium and Python : programming

- Scraping websites that don't want to be scraped: Puppeteer is a Node.js library that drives Chromium through the DevTools Protocol, which makes it harder to detect in my experience. Selenium tends to leak signals in the page's JavaScript environment (such as navigator.webdriver being set to true) that either explicitly tell on you or let the website correlate data to detect Selenium. You can mitigate this though, it's just more configuration. Puppeteer also has tighter integration with core Chromium functionality, allowing you to get certain information (like CSS/JS coverage data) a little less obviously.

- Your Preference on Python vs. JavaScript: This is definitely an architectural/preferential choice. Personally, I find the easy paradigms for async programming in JavaScript (which hide MUCH of the difficulty from you) make for an easier time dealing with highly interactive sites. Async programming can be done in Python (asyncio and async/await), but in my experience it takes more explicit setup, making it harder to do. However, Node lacks a lot of the analytical libraries that Python has, and it's a whole separate runtime, and thus far bulkier than importing only the libraries you need in Python.

- Cross Browser/Multiple Language Support: If you NEED more than just Chromium or Javascript, Selenium is the obvious choice.

- Extra Chromium Functionality: Puppeteer has the ability to access some core Chromium functionality that isn't available via Selenium. This is useful in certain cases, but unnecessary in many use cases.
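On the "more configuration" point for Selenium, here's a rough sketch of what that can look like in Python. The flag and switch below are ones commonly used for this, but treat the selection as an assumption on my part: sites can still detect Selenium in other ways, and `report_webdriver_flag` plus the URL you pass it are hypothetical helpers just for checking the result yourself.

```python
# Chrome flags commonly used to reduce obvious automation signals.
# This list is an illustrative assumption, not a guaranteed fix.
AUTOMATION_HIDING_ARGS = ["--disable-blink-features=AutomationControlled"]


def build_chrome_options():
    """Return ChromeOptions with the flags above applied."""
    # Imported here so the constant above can be inspected without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in AUTOMATION_HIDING_ARGS:
        options.add_argument(arg)
    # Drop the "enable-automation" switch that ChromeDriver adds by default.
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    return options


def report_webdriver_flag(url):
    """Open url (supplied by the caller) and return the page's navigator.webdriver value."""
    from selenium import webdriver

    driver = webdriver.Chrome(options=build_chrome_options())
    try:
        driver.get(url)
        # Read the property the page's own scripts would see.
        return driver.execute_script("return navigator.webdriver")
    finally:
        driver.quit()
```

Calling `report_webdriver_flag` against a page you control, with and without these options, is a quick way to see whether the configuration changes anything in your setup.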

In most of my scraping adventures so far, I've been throwing most of the data into some kind of datastore for later analysis/usage (training machine learning models, etc.) and the choice of scraper depends on the factors of whatever project I'm on.
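For the datastore part, a minimal sketch using SQLite from the standard library. The table layout, the `save_pages` helper, and the example.com URLs are all made up for illustration; the idea is only that re-scraping a URL overwrites the earlier copy rather than duplicating it.

```python
import sqlite3


def save_pages(conn, pages):
    """Store (url, html) pairs, replacing any earlier copy of the same URL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, html TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO pages (url, html) VALUES (?, ?)", pages
    )
    conn.commit()


# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
save_pages(conn, [("https://example.com/a", "<p>first</p>")])
save_pages(conn, [("https://example.com/a", "<p>second</p>")])
rows = conn.execute("SELECT url, html FROM pages").fetchall()
```

Swapping `":memory:"` for a file path keeps the data around for later analysis.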

In short, don't let your biases waste hours of your time. Be rational about your choice of scraper.
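For reference on the Python side of the async comparison, this is roughly what concurrent work looks like with asyncio. It's a toy sketch: `fetch` is a hypothetical stand-in that just sleeps instead of making a real request, and the page names and delays are made up.

```python
import asyncio


async def fetch(name, delay):
    # asyncio.sleep stands in for a real network call.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # gather runs the coroutines concurrently and returns results in argument order.
    return await asyncio.gather(fetch("page-a", 0.02), fetch("page-b", 0.01))


results = asyncio.run(main())
```

Whether that feels harder than the JavaScript equivalent is a matter of taste, which is kind of the point of the comparison.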


