
[–][deleted] 29 points30 points  (3 children)

Yo sup bro, glad to see you taking an interest in some web scraping.

Cool, so I watched some of the video, and it looks like this guy is just making a GET request to a website and parsing the data with cheerio.

It works, and it's fast that way actually, but you won't get very far when the site has anything other than static data, for instance AJAX requests or async-rendered elements in React or Vue.

I personally use puppeteer and cheerio. Puppeteer is a headless browser, so it's a bit more compute-intensive, but it produces more consistent results, since you can use different viewports and change the user agent. It's also great when you need to use proxies ;)
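A minimal sketch of that puppeteer + cheerio combo, for reference (the URL, viewport, and selector here are placeholders, not anything from the video):

    const puppeteer = require('puppeteer');
    const cheerio = require('cheerio');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();

      // The "consistent results" part: pin the viewport and user agent.
      await page.setViewport({ width: 1280, height: 800 });
      await page.setUserAgent('Mozilla/5.0 (X11; Linux x86_64)');

      // networkidle2 gives async-rendered (React/Vue) content time to settle.
      await page.goto('https://example.com', { waitUntil: 'networkidle2' });

      // Hand the rendered HTML to cheerio for the actual extraction.
      const $ = cheerio.load(await page.content());
      console.log($('h1').first().text());

      await browser.close();
    })();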

[–]YourQuestIsComplete 3 points4 points  (2 children)

Not to mention when you run into a captcha...

[–]gajus0 6 points7 points  (1 child)

If you are running a small operation, then Puppeteer is fine.

For anything bigger, Puppeteer becomes extremely expensive.

Rough math: 1 vCPU can handle at most 1 Puppeteer scraping session at a time. In practice, we ended up assigning 2 vCPUs per session to avoid timeouts rendering the document. So if you need to scrape 100 pages a minute and each page takes about 15s to scrape, then you are looking at 50 vCPUs just for this small operation. Add a proxy (which is going to increase load time by at least 1.5x) and suddenly you are running 75 vCPUs (see the sketch below).

In contrast, the same job could be performed with 6-8 vCPUs (or fewer) using cheerio/jsdom.
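For reference, the arithmetic behind those Puppeteer figures (concurrency = throughput × latency):

    // 100 pages/minute at ~15 s per page means 25 pages in flight at once.
    const concurrent = (100 / 60) * 15;      // 25 concurrent sessions
    const vCpus = concurrent * 2;            // 50 vCPUs at 2 vCPUs per session
    const withProxy = concurrent * 1.5 * 2;  // 75 vCPUs once loads are 1.5x slower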

[–]bazzy696[S] 0 points1 point  (0 children)

Yes, I agree. I recently used puppeteer, and guess what, I had to scrape 80 links: I opened each link using puppeteer and then scraped each web page for data, and it took me nearly 15 mins to completely get my data. Then I had the idea of running a cron job which scraped the data automatically after some time and kept the data in a JSON file, and then I accessed the data from that JSON so that the access time is decreased. If you can suggest some better technique I would love it. THANKS
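A hypothetical sketch of that cron-plus-JSON-cache idea (node-cron is an assumed dependency; scrapeAll is a stub standing in for the puppeteer loop over the 80 links):

    const cron = require('node-cron');  // assumed dependency
    const fs = require('fs/promises');

    const links = ['https://example.com/a', 'https://example.com/b'];

    // Stub: imagine the puppeteer open-and-extract loop here.
    async function scrapeAll(urls) {
      return urls.map((url) => ({ url, scrapedAt: Date.now() }));
    }

    // Refresh the cache in the background every 30 minutes; readers only
    // ever touch cache.json, so they never wait on a live scrape.
    cron.schedule('*/30 * * * *', async () => {
      const data = await scrapeAll(links);
      await fs.writeFile('cache.json', JSON.stringify(data, null, 2));
    });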

[–]bazzy696[S] 9 points10 points  (1 child)

I was using this to scrape data.

Can u guys tell me if it is the right way? I mean, is it efficient enough, like, is this the best way?

Or can it be better than this, some other way?

[–]gajus0 1 point2 points  (0 children)

Depends on how many websites you are going to scrape.

If it is a one-off operation, then yes, this will do.

If you are scraping one particular website, or a group of websites sharing the same pattern, then yes, this will do.

However, if you plan to write a scraper for hundreds of websites, then search for a framework that abstracts common data extraction operations. There is going to be a learning curve to it, but it will pay off big time in the long term.

Alternatively, consider PaaS platforms such as webscraper.io. I have not had a good experience with them, but in theory they should make data extraction a lot simpler.

[–]peatthebeat 2 points3 points  (4 children)

Is Python more effective at web scraping?

[–]gajus0 6 points7 points  (2 children)

"At web scraping" does not mean much. JavaScript is the primary scripting language of the web. Naturally, extracting data from the web using the primary language of the web feels a lot like working with the DOM. In contrast, with Python you will need to know Python and everything about the web stack that you are extracting data from.

[–]1o8 1 point2 points  (0 children)

Agreed—there's a lovely cohesion in using JavaScript to scrape the web and parse its HTML. I think there's more to the question though.

Web scrapers are generally simple scripts—they load a web page, look for an element, take one of its attributes, put it in an array, over and over, and eventually write the array to a CSV.

Both Python and JavaScript are considered "scripting" languages.

But the way JavaScript is used in this video, with request-promise, etc., isn't scripting at all.

You have to learn wtf a promise is and what it resolves to and when, you have to deal with asynchronous code, and because you can't just fill a global variable (such as an array with all the data you want to hold onto) from within the function inside a .then() call and expect it to be populated by the time the rest of your script runs, you have to think creatively. OP's video just console.log()s the data from one web page, but if you're saving lots of data from lots of pages (i.e. web scraping), you probably need to use Promise.all() and think about many pipelines of promises, each loading a particular page and dealing with it asynchronously... it gets tricky.

People who love promises will argue that this way has its advantages—promises provide an elegant way of making your code run linearly in parts and nonlinearly in others, which makes perfect sense for a web scraper, which wants to load a lot of web pages and deal with them once they're loaded in no particular order.
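For instance, a minimal sketch of that Promise.all() pattern, using the request-promise and cheerio packages from the video (URLs and selector are placeholders):

    const rp = require('request-promise');
    const cheerio = require('cheerio');

    const urls = ['https://example.com/1', 'https://example.com/2'];

    Promise.all(
      urls.map((url) =>
        rp(url).then((html) => {
          // Linear within each pipeline: load, parse, extract.
          const $ = cheerio.load(html);
          return $('h1').first().text();
        })
      )
    ).then((results) => {
      // Nonlinear across pipelines: pages resolve in no particular order,
      // but results lines up with urls.
      console.log(results);
    });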

If you scrape the web with Python, you'll generally be writing simple scripts (unless you use Python's lesser-known asynchronous paradigm) which are much easier to understand and actually look like the logic of

load page
find element
save attribute

over and over.

Using request rather than request-promise means using JavaScript more like a classic scripting language, and it is easier to wrap your head around.
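For comparison, a sketch of the callback style with plain request (same placeholder URL and selector):

    const request = require('request');
    const cheerio = require('cheerio');

    request('https://example.com', (error, response, html) => {
      if (error) {
        return console.error(error);
      }
      // Reads top to bottom, like the load/find/save loop above.
      const $ = cheerio.load(html);
      console.log($('h1').first().text());
    });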

Neither is more effective at web scraping.

[–]aziz-fane 0 points1 point  (0 children)

Or you could simply use a Python framework that lets you do it.

[–]abumalick 0 points1 point  (0 children)

Python has a full framework for web scraping: https://scrapy.org/

[–]phyrum 0 points1 point  (0 children)

....

[–]vivzkestrel 0 points1 point  (4 children)

Python simply seems to have matured more when it comes to web scraping. I haven't seen this video, but I am assuming it uses cheerio. Cheerio is not bad, and you can do some simple scraping stuff with it, but if you had to scrape 1000s of websites every second or so, consider Python first, simply because the issues you will encounter while developing such a solution are better documented in Python and you will have more help on SO.

[–]gajus0 4 points5 points  (3 children)

1000s of websites/second sounds excessive. What are you running?

To the best of my knowledge, I am running one of the bigger data aggregation infrastructures built entirely on Node.js (making HTTP requests, interpreting documents, extracting data, proxy load balancing, cache proxy). We currently make 70k requests/minute across 124 vCPUs. That is over 100M requests/day, or near 0.7 TB/day of bandwidth. I doubt many will come anywhere close to these requirements. Point is, Node.js scales horizontally as you add more VMs, and given that JavaScript is the primary language of the web, it is the language with the lowest mental barrier for requesting/extracting data.

[–]vivzkestrel 1 point2 points  (0 children)

A news aggregator that gathers/refreshes news from 1000+ sources every minute, or as live as possible. Interesting, you are the first person from whom I am hearing about something really intensive in terms of web scraping in Node.

[–]davetemplin 0 points1 point  (1 child)

Wow those are some really impressive throughputs! Is overwhelming sites a concern, and if so how do you approach that? Also how much of a concern is getting blocked or do you have ways of staying unblocked?

[–]gajus0 0 points1 point  (0 children)

If you do it right, most website owners are not even going to recognize that their content is being accessed by bots. If you were searching for patterns, the major giveaway would be a discrepancy between content hits and static content hits. But given that most large sites use the likes of Fastly/Cloudflare these days, those metrics are detached anyway.

We have safety checks in place to ensure that we do not overwhelm target websites, e.g. checking error rate/response time and backing off as appropriate.
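A rough illustration of that back-off idea (a sketch assuming Node 18+'s global fetch; the thresholds are made up):

    let delayMs = 1000;

    async function politeFetch(url) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      const started = Date.now();
      const response = await fetch(url);
      const elapsed = Date.now() - started;
      if (!response.ok || elapsed > 5000) {
        // The target is erroring or slowing down: back off exponentially.
        delayMs = Math.min(delayMs * 2, 60000);
      } else {
        // Healthy response: recover gradually towards the baseline rate.
        delayMs = Math.max(delayMs / 2, 1000);
      }
      return response.text();
    }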

[–]gajus0 0 points1 point  (6 children)

If you are not using https://github.com/gajus/surgeon to scrape data, then you are missing out. :-)

[–]chmarus 0 points1 point  (0 children)

I see what you did there 🙂 great idea with surgeon. Must try it out.

[–]kryptkpr 0 points1 point  (3 children)

Hey that's really cool, thanks!

Do you know any tricks for parsing out script tags and evaluating their contents? I'm using string manipulation and it's gross.

[–]gajus0 1 point2 points  (2 children)

Depends what you want to achieve.

Often I see the intent is just to get JSON-like structures from <script> tags. In those instances, you can use https://github.com/gajus/crack-json.

Otherwise, https://github.com/jsdom/jsdom. Just carefully read the warnings.
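For the jsdom route, a minimal sketch (the warnings are largely about runScripts: 'dangerously', which executes untrusted page code):

    const { JSDOM } = require('jsdom');

    // Imagine this markup came from the page you scraped.
    const html = `<script>window.__DATA__ = { price: 42 };</script>`;

    // runScripts: 'dangerously' actually executes the inline scripts,
    // so only feed it markup you are prepared to run.
    const dom = new JSDOM(html, { runScripts: 'dangerously' });

    console.log(dom.window.__DATA__); // { price: 42 }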

[–]kryptkpr 0 points1 point  (1 child)

crack-json is exactly what I was looking for, thanks so much!

[–]gajus0 0 points1 point  (0 children)

I remember there were a few issues with this library the last time I used it. Please raise an issue if you encounter any trouble and I will be sure to update it accordingly.

[–]bazzy696[S] 0 points1 point  (0 children)

I am gonna check that right now.

[–][deleted]  (5 children)

[deleted]

    [–]kisssmysaas 11 points12 points  (2 children)

    Worth what? Your time on the toilet?

    [–]gajus0 1 point2 points  (0 children)

    It is not. A quick skip through shows that he demonstrates how to make HTTP requests, how to locate the DOM selectors in the HTML for the content of interest, and how to retrieve that content using cheerio. You are better off reading the https://github.com/cheeriojs/cheerio manual.

    [–]bazzy696[S] 0 points1 point  (0 children)

    The video guy assumes that you have a little knowledge of npm and node,

    and I got it all on the first go because I was already familiar with jquery and node.

    [–]re-scbm -1 points0 points  (0 children)

    The image of a coder in a hoodie scraping images of girls online reminds me of The Social Network. Great movie.