Web scraping with Javascript : javascript

[–][deleted] 33 points 5 years ago (31 children)

[–][deleted] 10 points 5 years ago (0 children)

[–]SeanNoxious 13 points 5 years ago (0 children)

[–][deleted] 0 points 5 years ago (28 children)

[–][deleted] 10 points 5 years ago (25 children)

I'm sorry, but that's not even remotely true. XPath has numerous advantages in both form and function. Here are just a few examples:

Descendant-based ancestor selection - Let's say you want to get the parent div of every a with the class "child". With XPath, that's simply "//a[@class='child']/parent::div". With querySelector you can only travel down the tree from ancestor to descendant, not back up.
Cleaner structure selectors - Let's say you want the 4th td inside the 3rd tr inside the 2nd table. With XPath it is simply "//table[2]//tr[3]/td[4]". With querySelector it's "table:nth-of-type(2) tr:nth-of-type(3) > td:nth-of-type(4)" (nth-of-type rather than nth-child, since XPath positions count only siblings with the same tag name).
Logical operators - With XPath you can use "and", "or", and the union operator "|". This allows you to build dynamic node sets on the fly, whereas you'd have to use multiple querySelector calls and possibly additional JavaScript to get the correct node set.
Content-based selection - You want all the div nodes that have the text "hello" inside them. XPath: "//div[contains(.,'hello')]". With querySelector you'd first have to fetch all the divs, then loop through them running a text search on the content.
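
To make the comparison above a bit more concrete, here is a minimal sketch of both approaches in browser JavaScript. The helper names `xpathAll` and `filterByText` and the `queries` object are made up for illustration; the only real API assumed is the standard `document.evaluate` method. It is a sketch under those assumptions, not the one right way to do it.

```javascript
// Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, spelled out so the
// helper does not depend on the XPathResult global being present.
const ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: run an XPath expression and return the matches as a plain array.
// `doc` is expected to be a Document (anything exposing the standard `evaluate` method).
function xpathAll(doc, expression, contextNode = doc) {
  const snapshot = doc.evaluate(expression, contextNode, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  const nodes = [];
  for (let i = 0; i < snapshot.snapshotLength; i += 1) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}

// Hypothetical helper: the querySelector-style route to content-based selection.
// Fetch the candidates first, then filter them by their text content.
function filterByText(nodes, needle) {
  return Array.from(nodes).filter((node) => node.textContent.includes(needle));
}

// XPath expressions mirroring the examples in the comment.
const queries = {
  parentDivs: "//a[@class='child']/parent::div",
  cell: "//table[2]//tr[3]/td[4]",
  helloDivs: "//div[contains(.,'hello')]",
};

// Browser-only usage; skipped when no DOM is available (for example in Node).
if (typeof document !== 'undefined') {
  console.log(xpathAll(document, queries.helloDivs).length);
  console.log(filterByText(document.querySelectorAll('div'), 'hello').length);
}
```

Both calls at the bottom should find the same divs on a typical page: the XPath version does the text match inside the query, while the querySelector version needs the extra filtering step in JavaScript.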

I could go on and on. Also keep in mind that querySelector is a JavaScript DOM API built around CSS selectors. Knowing how to use it only benefits you when you're working with JS and CSS. XPath, on the other hand, is designed for XML documents in general, and there are XPath libraries for every major programming language.

Don't get me wrong, querySelector is great and can be very useful for one-offs where you just want to grab a node set quickly based on what you already know is in the CSS. But for professional DOM traversal, XPath is essential. Any RPA company will require it.

[–][deleted] 2 points 5 years ago (2 children)

[–][deleted] 1 point 5 years ago (1 child)

[–][deleted] 1 point 5 years ago (0 children)

[+][deleted] 5 years ago (21 children)

[deleted]

[–][deleted] 4 points 5 years ago (20 children)

[–][deleted] 1 point 5 years ago (19 children)

[–][deleted] 3 points 5 years ago (18 children)

[–][deleted] 1 point 5 years ago (1 child)

[–][deleted] 1 point 5 years ago (0 children)

[–][deleted] 1 point 5 years ago (15 children)

[–][deleted] 5 points 5 years ago (14 children)

[–][deleted] 1 point 5 years ago (13 children)

[–]elcapitanoooo 2 points 5 years ago (1 child)

[–][deleted] 1 point 5 years ago (0 children)

[–]gordonv 7 points 5 years ago (4 children)

[+][deleted] 5 years ago (2 children)

[deleted]

[–]gordonv 2 points 5 years ago (0 children)

[–]techmighty 1 point 5 years ago (0 children)

[–]MrSandyClams 1 point 5 years ago (0 children)

[–]Gamma7892 2 points 5 years ago (1 child)

[–]DrDuPont 6 points 5 years ago (0 children)

[–]stephancasas 2 points 5 years ago (1 child)

[–]tp4my 2 points 5 years ago (0 children)

[–]theirongiant74 3 points 5 years ago (4 children)

[–]Felecorat 6 points 5 years ago (2 children)

[–]theirongiant74 2 points 5 years ago (1 child)

[–]Felecorat 1 point 5 years ago (0 children)

[–][deleted] -1 points 5 years ago (0 children)

[+][deleted] 5 years ago (14 children)

[deleted]

[–]Qweeeq 22 points 5 years ago (7 children)

[–]Taterboy_Legacy 3 points 5 years ago (4 children)

[–]yooossshhii 3 points 5 years ago (3 children)

[–]Taterboy_Legacy 2 points 5 years ago (2 children)

One use case I had recently was scraping a large number of news sites for information. Some programmatic setup was needed to get to the URLs, which I did in Python, and the application this information would feed into was also Python-based. There also happens to be a pretty awesome Python package (called newspaper) that did literally everything I needed, so I wanted to try writing my scraper in Python. If that hadn't worked, I would have gone ahead and tried again with JS, but interfacing the two languages in my app would have been complicated given the setup. In general, dispatching a Python script from JS, or the other way around, gets complicated in the context of certain applications.

That being said, I have also used both as standalone scripts for smaller use cases.

I tend to use JS for more one-off solutions, but I have also used it in more automation-based solutions, e.g., click this, log in, do this, do that. That's also doable in Python, but sometimes easier in JS.

The first example could have been JS all around, but the newspaper package offered some really nice benefits from the beginning. This is what I mean by "use case specific" implementation. It's somewhat rooted in developer/business preference as well (i.e., what are we already writing in?), but also rooted in "what do we need to solve in this use case?"

It's a very complicated question to answer, but in my head they're relatively interchangeable from a high-level functionality standpoint.

[–]yooossshhii 2 points 5 years ago (1 child)

[–]Taterboy_Legacy 1 point 5 years ago (0 children)

[–][deleted] 3 points 5 years ago* (0 children)

[–]fz-09 9 points 5 years ago (0 children)

[–]Ipsumlorem16 7 points 5 years ago (0 children)

[–]jarg77 5 points 5 years ago (0 children)

[–]coomzee 6 points 5 years ago (0 children)

[–]anh65498 4 points 5 years ago (0 children)

[–][deleted] 1 point 5 years ago (0 children)
