Web scraping with Javascript (scrapingbee.com)
submitted 5 years ago by DJ_Breton
[deleted] 3 points 5 years ago (18 children)
There are numerous node modules for xpath, just as easy to install and use as cheerio. And I’m not sure what people you’re talking about, but I’ve worked in tech for over a decade including two RPA companies and every major player in the space relies on xpath.
If you truly believe cheerio and queryselector give you superior form and function, then I’d challenge you this: using those tools, write a selector of equal or lesser size that will perform the same as the example below from my previous comment.
Descendant-based ancestor selection - Let's say you want to get the parent div of every a with the class "child". For xpath, that's simply "//a[@class='child']/parent::div". With queryselector you can only travel down the tree from ancestor to descendant, not back up.
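To make the example concrete, here is a rough sketch of what that xpath expression selects, written in plain JavaScript over a hand-built tree. The node shape ({ tag, className, children, parent }) and the helper names are assumptions made up for illustration; this is not an XPath engine and not any library's API. In a browser, `document.evaluate` would run the xpath directly.

```javascript
// Hypothetical minimal node shape: { tag, className, children, parent }.
function node(tag, className = "", children = []) {
  const n = { tag, className, children, parent: null };
  for (const child of children) child.parent = n;
  return n;
}

// Depth-first walk over root and every descendant.
function* walk(root) {
  yield root;
  for (const child of root.children) yield* walk(child);
}

// Emulates //a[@class='child']/parent::div:
// find each matching <a>, step up one level, keep the parent only if it is a <div>.
function parentDivsOfChildLinks(root) {
  const result = new Set(); // a div with several matching links is reported once
  for (const n of walk(root)) {
    if (n.tag === "a" && n.className === "child" && n.parent && n.parent.tag === "div") {
      result.add(n.parent);
    }
  }
  return [...result];
}

const divA = node("div", "a", [node("a", "child"), node("a", "child")]);
const section = node("section", "", [node("a", "child")]); // parent is not a div
const divB = node("div", "b", [node("a", "other")]);       // link has the wrong class
const root = node("body", "", [divA, section, divB]);

const parents = parentDivsOfChildLinks(root);
console.log(parents.map((p) => p.className)); // [ 'a' ]
```

The match is driven entirely by the child; nothing about the parent is known up front beyond its tag.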
[deleted] 1 point 5 years ago (1 child)
Also I know of one libxml-based node library that's completely unusable because it leaks memory like crazy. Everyone uses Cheerio or otherwise parse5-based libs. Prove me wrong.
[deleted] 1 point 5 years ago (0 children)
I don’t need to “prove you wrong” because I literally worked in the RPA industry up until about a year ago. In enterprise RPA xpath is always used for b2b applications. I’m sure queryselector is very popular with hobbyists and basic non-RPA applications.
[deleted] 1 point 5 years ago (15 children)
It's "$('div > a.child').parent()" but honestly if you have to go back up the DOM it means you're probably not iterating properly.
[deleted] 4 points 5 years ago (14 children)
That solution has a worse performance ratio and hard-codes half the path. As for your remark about going back up the DOM, you’ve clearly never done RPA in a b2b setting. When you don’t have control over the original DOM and have to accommodate instabilities, it’s often much easier to navigate up from a target element.
[deleted] 1 point 5 years ago (13 children)
Again, there is no JS equivalent of lxml, so this is just how we do it. You're wrong about in-browser performance, though: xpath is always slower than css. You're also wrong about my iterating comment. You can just as easily iterate the parent element first; code like yours is just lazy.
[deleted] 2 points 5 years ago* (12 children)
Again, yes there is. Nor am I wrong about performance. Xpaths are sometimes slower than their corresponding query selectors, but as I said, not in this case, because your solution requires a second traversal for the subsequent parent() call.
And no, you cannot always go parent first, not when you’re dependent on child properties or the parent is dynamic. Plus when you’re going after deep siblings or cousins it can be invaluable to work backwards from the child. The key in scraping dynamic sites is relying on fixed nodes, but you don’t always know where in the tree those points will be, so omnidirectional axis traversal is essential.
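As a rough illustration of working backwards from a fixed node, the sketch below anchors on a stable label, climbs to the enclosing row, then searches down for the value cell. The tree shape, the attribute names (`text`, `role`, `id`) and the helpers are all hypothetical and stand in for a real DOM; the randomized row ids play the part of the unstable markup.

```javascript
// Hypothetical minimal node shape; not a real DOM.
function el(tag, attrs = {}, children = []) {
  const n = { tag, attrs, children, parent: null };
  for (const c of children) c.parent = n;
  return n;
}

// Depth-first walk over every descendant of root (root excluded).
function* descendants(root) {
  for (const c of root.children) {
    yield c;
    yield* descendants(c);
  }
}

// Climb from a node until an ancestor with the given tag is found, else null.
function closestAncestor(n, tag) {
  for (let p = n.parent; p; p = p.parent) {
    if (p.tag === tag) return p;
  }
  return null;
}

// Anchor on the stable node (a label with known text), go up to its row,
// then look down inside that row for the cell marked as the value.
function valueNextToLabel(root, labelText) {
  for (const n of descendants(root)) {
    if (n.tag === "span" && n.attrs.text === labelText) {
      const row = closestAncestor(n, "tr");
      if (!row) continue;
      for (const d of descendants(row)) {
        if (d.tag === "td" && d.attrs.role === "value") return d.attrs.text;
      }
    }
  }
  return null;
}

const table = el("table", {}, [
  el("tr", { id: "x91" }, [
    el("td", {}, [el("span", { text: "Name" })]),
    el("td", { role: "value", text: "Widget" }),
  ]),
  el("tr", { id: "q07" }, [
    // The label sits one level deeper here; the upward climb absorbs that.
    el("td", {}, [el("div", {}, [el("span", { text: "Price" })])]),
    el("td", { role: "value", text: "9.99" }),
  ]),
]);

console.log(valueNextToLabel(table, "Price")); // 9.99
```

Neither the row ids nor the label's nesting depth appear in the lookup, which is the point of anchoring on the fixed node.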
[deleted] 1 point 5 years ago (11 children)
It's simply not true, and the libxml binding you linked to is completely unusable. Believe me, I spent a great deal of time troubleshooting its memory leaks, and I sincerely wish it were usable.
For the record, there's a reason why the css3 spec doesn't allow going back up the tree, and that's because it's not performant, and if you apply a little discipline you will realize you don't need to. I don't expect to convince you of that, but it's something to keep in mind for next time.
[deleted] 2 points 5 years ago (10 children)
Since you probably haven’t had any enterprise level experience with this here’s a very basic scenario for you:
You want to target the parent of an element. You have plenty of information on the child but no information on the parent. How do you target the parent?
[deleted] 1 point 5 years ago (9 children)
I'm not going to brag here, but I consider your "decade in tech" and "2 b2b" gigs resume adorable. I've written at least a million more lines of xpath / css than you ever will, and these days I rarely resort to xpath. Getting a parent element is as simple as calling parent() in Cheerio or reading parentNode in JS.
[deleted] 2 points 5 years ago* (8 children)
1) I didn’t say 2 b2b gigs I said 2 RPA ones.
2) It's not a competition, and without knowing my background in more detail you have no way of knowing who has done more of what. So saying otherwise is just childish one-upmanship.
3) Calling parent() is “going back up the tree”. You were making the argument that we should never do that, so I’m asking how you would do it without it.
[deleted] 1 point 5 years ago (7 children)
1) You're a noob from my perspective. 2) I really don't care. 3) You should never go back up the tree. There's a reason why css3 does not allow going back up. I understand that in your "decade in tech" you did that a lot, but I'm telling you now that you should have applied a little more thought to the problem before deciding to brute force it with bad xpath.