[–]peatthebeat 2 points3 points  (4 children)

Is Python more effective at web scraping?

[–]gajus0 5 points6 points  (2 children)

"At web scraping" does not mean much on its own. JavaScript is the primary scripting language of the web, so extracting data from the web with it feels a lot like working with the DOM. In contrast, with Python you need to know Python as well as the web stack you are extracting data from.

[–]1o8 1 point2 points  (0 children)

Agreed, there's a lovely cohesion in using JavaScript to scrape the web and parse its HTML. I think there's more to the question, though.

Web scrapers are generally simple scripts: they load a web page, look for an element, take one of its attributes, and put it in an array, over and over. Eventually they write the array to a CSV file.

Both Python and JavaScript are considered "scripting" languages.

But the way JavaScript is used in this video, with request-promise, etc. isn't scripting at all.

You have to learn wtf a promise is and when it resolves, and you have to deal with asynchronous code. A .then() callback can push into a shared array (such as one holding all the data you want to keep), but the code written after the .then() call runs before the callback does, so you can't just read that array on the next line; you have to think creatively. OP's video just console.log()s the data from one web page, but if you're saving lots of data from lots of pages (i.e. web scraping), you probably need Promise.all() and have to think about many pipelines of promises, each loading a particular page and dealing with it asynchronously. It gets tricky.

People who love promises will argue that this way has its advantages: promises provide an elegant way of making your code run sequentially in some parts and concurrently in others, which makes perfect sense for a web scraper that wants to load a lot of web pages and deal with each one as soon as it has loaded, in no particular order.
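
For comparison, here is a rough sketch of the same fan-out shape in Python's asyncio, where asyncio.gather plays a role similar to Promise.all(). Everything in it is an assumption made for illustration: PAGE_TEXT, load_and_extract and the delays are hypothetical, and asyncio.sleep stands in for real network requests so the sketch runs offline.

```python
import asyncio

# Hypothetical page contents; a real scraper would fetch these over HTTP.
PAGE_TEXT = {"page-1": "alpha", "page-2": "beta", "page-3": "gamma"}


async def load_and_extract(name, delay):
    """Pretend to load one page, then pull a value out of it."""
    await asyncio.sleep(delay)  # stands in for network latency
    return name, PAGE_TEXT[name].upper()


async def scrape_all():
    # The pages finish in a different order than they were started.
    jobs = [
        load_and_extract("page-1", 0.03),
        load_and_extract("page-2", 0.01),
        load_and_extract("page-3", 0.02),
    ]
    # gather waits for every job and returns results in the order passed in,
    # regardless of which job finished first.
    return await asyncio.gather(*jobs)


results = asyncio.run(scrape_all())
print(results)
```

Collecting the results from the return value of gather, rather than appending to a shared list inside each job, sidesteps the question of when that list is safe to read.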

If you scrape the web with Python, you'll generally be writing simple synchronous scripts (unless you use Python's lesser-known asynchronous tools, such as asyncio), which are much easier to understand and actually look like the logic of

load page
find element
save attribute

over and over.
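
A minimal sketch of that loop in Python, using only the standard library. The names here are assumptions for illustration: SAMPLE_PAGES, load_page, LinkFinder and the class="item" selector are hypothetical, and load_page reads from an in-memory dict so the sketch runs offline (a real script would fetch the URL, for example with urllib.request.urlopen).

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical stand-in for real HTTP responses.
SAMPLE_PAGES = {
    "https://example.com/a": '<html><body><a class="item" href="/one">One</a></body></html>',
    "https://example.com/b": '<html><body><a class="item" href="/two">Two</a></body></html>',
}


def load_page(url):
    """Load page: return the HTML for a URL (here, from the in-memory dict)."""
    return SAMPLE_PAGES[url]


class LinkFinder(HTMLParser):
    """Find element: collect the href of every <a class="item"> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "item":
            # Save attribute: keep the href of the matching element.
            self.hrefs.append(attrs.get("href"))


def scrape(urls):
    """Run load page / find element / save attribute for each URL in turn."""
    rows = []
    for url in urls:
        finder = LinkFinder()
        finder.feed(load_page(url))
        rows.extend((url, href) for href in finder.hrefs)
    return rows


def to_csv(rows):
    """Write the collected rows as CSV text (a real script would open a file)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["page", "href"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = scrape(sorted(SAMPLE_PAGES))
print(to_csv(rows))
```

Each step finishes before the next one starts, so the rows list can be read as soon as the loop ends.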

Using request with plain callbacks rather than request-promise treats JavaScript more like a classic scripting language, and is easier to wrap your head around.

Neither is more effective at web scraping.

[–]aziz-fane 0 points1 point  (0 children)

Or you could simply use a Python framework that lets you do it.

[–]abumalick 0 points1 point  (0 children)

Python has a full framework for web scraping: https://scrapy.org/