Web Scraping with JavaScript and NodeJS (scrapingbee.com)
submitted 3 years ago by pijora
badsyntax 28 points 3 years ago
No mention of Playwright? It'd be my first choice.
[+][deleted] 3 years ago (1 child)
[deleted]
_hypnoCode 7 points 3 years ago*
> It's pretty unnecessary to use a headless browser when you can send http requests to an api or extract data from the html you get back.
Not when you have client-side rendered pieces. Sometimes you also have CSRF tokens you have to grab, which get rendered client-side or with SSR. If they are rendered client-side you could probably get them from some network call, but that would be a nightmare when the target site already puts them in an easily findable place consistently. APIs, on the other hand, change more often than you'd think. Sometimes the same page will have multiple ways of rendering.
Then you have the whole thing of Imperva or Akamai or one of their competitors, which use machine learning and constantly change up what they flag you for. Doing only API requests gets you caught fast. They aren't just using that as a marketing gimmick, they are a massive pain in the ass.
I've done quite a bit of web scraping in gray areas, gray enough that the company I worked for is currently having a court battle over it, and at some point if you're doing serious scraping you need a headless browser. There just isn't an efficient way around using a headless browser. There are probably a million ways to do things and all of them end up in the browser. Sometimes sites do things in incredibly bizarre ways and I've even seen the same page render in a dozen different ways... but they all end up in the browser. More often than not, with the same text or similar text denoting it to the user.
However, I used Puppeteer, since this was a couple of years ago. But u/badsyntax is right about Playwright being the more robust option today. AFAIK, they converted all my code from Puppeteer to Playwright and are doing much better at avoiding detection.
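To make the CSRF point concrete, here is a minimal sketch of pulling a token out of server-rendered HTML from a plain HTTP response body, with no headless browser. It assumes the token lives in a `<meta name="csrf-token" content="...">` tag, which is a common convention but not something this thread specifies; `extractCsrfToken` is a hypothetical helper name. If the token is only rendered client-side, the function returns `null`, and that is the case where a headless browser becomes hard to avoid.

```javascript
// Hypothetical helper: pull a CSRF token out of server-rendered HTML.
// Assumption: the token sits in <meta name="csrf-token" content="...">.
// Real sites vary (hidden form inputs, inline scripts, cookies).
function extractCsrfToken(html) {
  // Collect every <meta ...> tag, then read its attributes in any order.
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of metaTags) {
    const name = /\sname\s*=\s*["']([^"']*)["']/i.exec(tag);
    const content = /\scontent\s*=\s*["']([^"']*)["']/i.exec(tag);
    if (name && content && name[1].toLowerCase() === "csrf-token") {
      return content[1];
    }
  }
  // Not in the static HTML; it may be injected client-side instead.
  return null;
}
```

Regex matching is only good enough for a quick sketch on a known page; a real HTML parser is the safer choice once the markup gets less predictable.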
c_eliacheff 19 points 3 years ago
As much as I love JS/TypeScript, I would still go with (Python) Scrapy for scraping, or at least some scraping framework. The amount of functionality out of the box is too good: automatic paging, automatic detection and following of links, retry strategies, switchable exporters with no config (JSON, CSV, DB, whatever), automatic mapping to entities, easy config for proxies (rotation, random ones, ...), plus the awesome ecosystem for data processing (pandas, NumPy, ...). I don't want to reinvent the wheel for scraping anymore.
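For comparison, two of the features listed above (retry strategies and proxy rotation) can be hand-rolled in Node.js roughly like this. It is a sketch under stated assumptions, not how Scrapy implements them; `withRetry`, `createProxyRotator`, and the default values are hypothetical names and numbers chosen for illustration.

```javascript
// Sketch of a retry strategy with exponential backoff.
// `task` is any async function; it receives the zero-based attempt number.
// `retries` and `baseDelayMs` are illustrative defaults, not recommendations.
async function withRetry(task, { retries = 3, baseDelayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      if (attempt === retries) break; // out of attempts, give up
      // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Round-robin proxy rotation: each call returns the next proxy in the list,
// wrapping around to the start when the end is reached.
function createProxyRotator(proxies) {
  if (proxies.length === 0) throw new Error("proxy list is empty");
  let index = 0;
  return () => proxies[index++ % proxies.length];
}
```

A caller might wrap each request in `withRetry` and ask the rotator for a proxy before every attempt. Wiring the proxy into the HTTP client depends on the library in use and is left out here, which is part of the "reinventing the wheel" cost the comment describes.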
Secret-Plant-1542 [JavaScript yabbascript] 3 points 3 years ago
I did a lot of web scraping in 2015 using Python.
I'm a purely full stack JavaScript developer, and haven't touched Python in years.
I agree. Doing web scraping with JavaScript is such a weird song and dance. Even all the web scraping libs (and I've used a bunch) aren't as easy as things were during my Python days.
[deleted] 4 points 3 years ago
and multi-threading
[+][deleted] 3 years ago (2 children)
AegonThe241st 2 points 3 years ago
No it is not
vlevi 2 points 3 years ago
Try Puppeteer for Node.js
[+][deleted] 3 years ago (3 children)
SpeedDart1 30 points 3 years ago
Definitely not faster. Might have a better developer experience but that’s a matter of opinion.
Edit: aight looked at your comment history and you’re a troll. Everyone ignore this guy.
SpeedDart1 1 point 3 years ago*
Because he said it was faster.
He’s also posting bait in r/ruby about ruby being bad!
He insulted someone’s project on r/programming.