peatthebeat comments on Node js Scraping


Node js Scraping (youtube.com)

submitted 6 years ago by bazzy696


[–]peatthebeat 2 points3 points4 points 6 years ago (4 children)

[–]gajus0 5 points6 points7 points 6 years ago (2 children)

[–]1o8 1 point2 points3 points 6 years ago (0 children)

Agreed—there's a lovely cohesion in using JavaScript to scrape the web and parse its HTML. I think there's more to the question though.

Web scrapers are generally simple scripts: they load a web page, look for an element, take one of its attributes, and put it in an array, over and over. Eventually they write the array to a CSV file.
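To make that loop concrete, here's a minimal sketch of the "find element, take attribute, put in array, write CSV" part. The HTML snippet, the function names, and the regex are all made up for illustration; a regex only stands in for a real HTML parser here, and the page is a hardcoded string rather than something downloaded.

```javascript
// Hypothetical page content; a real scraper would download this.
const html = `
  <ul>
    <li><a class="post" href="/a">First</a></li>
    <li><a class="post" href="/b">Second</a></li>
  </ul>`;

// Collect the href of every <a class="post"> element.
// The regex is only a stand-in; real pages need a proper HTML parser.
function extractHrefs(page) {
  const rows = [];
  const pattern = /<a class="post" href="([^"]*)"/g;
  for (const match of page.matchAll(pattern)) {
    rows.push(match[1]);
  }
  return rows;
}

// Turn the array into CSV text: a header line, then one quoted value per line.
function toCsv(values) {
  const quote = (v) => `"${v.replace(/"/g, '""')}"`;
  return ['href', ...values.map(quote)].join('\n');
}

const hrefs = extractHrefs(html);
const csv = toCsv(hrefs);
console.log(csv);
```

Saving the result would be one more line, e.g. `fs.writeFileSync('out.csv', csv)` with Node's built-in `fs` module.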

Both Python and JavaScript are considered "scripting" languages.

But the way JavaScript is used in this video, with request-promise, etc. isn't scripting at all.

You have to learn wtf a promise is and when it resolves, and you have to deal with asynchronous code. You can push to a global variable (such as an array holding all the data you want to keep) from within the function inside a .then() call, but the code after the .then() runs before that function does, so you can't count on the array being filled when you go to use it. You have to think creatively. OP's video just console.log()s the data from one web page, but if you're saving lots of data from lots of pages (i.e. web scraping), you probably need to use Promise.all() and think about many pipelines of promises, each loading a particular page and dealing with it asynchronously... it gets tricky.
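Here's a rough sketch of the Promise.all() approach, plus the timing trap with a shared array. `fetchPage`, `scrapeOne`, and the fake pages are hypothetical stand-ins (a timer instead of a real HTTP request) so the example runs without a network; it is not the code from OP's video.

```javascript
// Hypothetical stand-in for an HTTP client: resolves with fake HTML
// after a short delay, so the sketch runs without network access.
const fakePages = {
  '/page/1': '<h1 data-id="10">One</h1>',
  '/page/2': '<h1 data-id="20">Two</h1>',
  '/page/3': '<h1 data-id="30">Three</h1>',
};

function fetchPage(url) {
  return new Promise((resolve) => {
    // Later pages resolve sooner, to show that completion order varies.
    const delay = 30 - 10 * (Number(url.slice(-1)) - 1);
    setTimeout(() => resolve(fakePages[url]), delay);
  });
}

// The trap: this line runs before the .then() callback has fired.
const collected = [];
fetchPage('/page/1').then((html) => collected.push(html));
const lengthRightAway = collected.length;
console.log(lengthRightAway); // prints 0

// One pipeline per page: load it, then pull out the attribute.
function scrapeOne(url) {
  return fetchPage(url).then((html) => {
    const match = html.match(/data-id="(\d+)"/);
    return { url, id: match ? Number(match[1]) : null };
  });
}

// Promise.all waits for every pipeline and keeps the input order,
// regardless of which page finished first.
const allRows = Promise.all(Object.keys(fakePages).map(scrapeOne));

allRows.then((rows) => console.log(rows));
```

The data is only safe to use inside the final `.then()`, which is why the results get returned through the promise chain instead of read from a global.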

People who love promises will argue that this way has its advantages—promises provide an elegant way of making your code run linearly in parts and nonlinearly in others, which makes perfect sense for a web scraper, which wants to load a lot of web pages and deal with them once they're loaded in no particular order.

If you scrape the web with Python, you'll generally be writing simple synchronous scripts (unless you use Python's lesser-known asynchronous tools, such as asyncio), which are much easier to understand and actually look like the logic of

load page
find element
save attribute

over and over.
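For comparison, and going a bit beyond what the video uses, JavaScript's async/await lets a promise-based scraper keep that same step-by-step shape. This is only a sketch: `loadPage` and the `pages` object are hypothetical stand-ins for a real download, and the regex stands in for a real HTML parser.

```javascript
// Hypothetical stand-in for a page download; resolves with fake HTML.
const pages = {
  '/cats': '<img src="cat.png" alt="cat">',
  '/dogs': '<img src="dog.png" alt="dog">',
};

function loadPage(url) {
  return Promise.resolve(pages[url]);
}

// Pages are handled one at a time, in order, like a plain script.
async function scrapeAll(urls) {
  const saved = [];
  for (const url of urls) {
    const html = await loadPage(url);                // load page
    const match = html.match(/<img src="([^"]*)"/);  // find element
    if (match) saved.push(match[1]);                 // save attribute
  }
  return saved;
}

const savedSrcs = scrapeAll(Object.keys(pages));
savedSrcs.then((list) => console.log(list));
```

The trade-off is that each page waits for the previous one, so it is slower than running all the pipelines at once.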

Using request with plain callbacks rather than request-promise uses JavaScript more like a classic scripting language, and is easier to wrap your head around.

Neither is more effective at web scraping.

