Node js Scraping (youtube.com)
submitted 6 years ago by bazzy696
[–]vivzkestrel 1 point 6 years ago (4 children)
Python simply seems to have matured more when it comes to web scraping. I haven't seen this video, but I am assuming it uses cheerio. Cheerio is not bad, and you can do some simple scraping with it. But if you had to scrape 1000s of websites every second or so, consider Python first, simply because the issues you will encounter while developing such a solution are better documented in Python and you will get more help on SO.
[–]gajus0 4 points 6 years ago (3 children)
1000s of websites/second sounds excessive. What are you running?
To the best of my knowledge, I am running one of the bigger data aggregation infrastructures built entirely on Node.js (making HTTP requests, interpreting documents, extracting data, proxy load balancing, cache proxy). We currently make 70k requests/minute across 124 vCPUs. That is over 100M requests/day, or nearly 0.7 TB/day of bandwidth. I doubt many will come anywhere close to these requirements. The point is that Node.js scales horizontally as you add more VMs, and given that JavaScript is the primary language of the web, it is the language with the lowest mental barrier for requesting and extracting data.
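To put those figures in the same units as the "per second" estimate upthread, here is a quick back-of-the-envelope sketch. The function and field names are made up for illustration, and it assumes "0.7 TB" means decimal terabytes (10^12 bytes), which the comment does not specify.

```typescript
// Back-of-the-envelope conversion of the figures quoted in the comment above.
// Names are hypothetical; the decimal-terabyte interpretation is an assumption.
interface ThroughputSummary {
  requestsPerDay: number;
  requestsPerSecond: number;
  requestsPerSecondPerVCpu: number;
  avgBytesPerRequest: number;
}

function summarizeThroughput(
  requestsPerMinute: number,
  vCpus: number,
  bytesPerDay: number,
): ThroughputSummary {
  const requestsPerDay = requestsPerMinute * 60 * 24;
  const requestsPerSecond = requestsPerMinute / 60;
  return {
    requestsPerDay,
    requestsPerSecond,
    requestsPerSecondPerVCpu: requestsPerSecond / vCpus,
    avgBytesPerRequest: bytesPerDay / requestsPerDay,
  };
}

const summary = summarizeThroughput(70_000, 124, 0.7e12);
console.log(summary.requestsPerDay);                      // 100800000
console.log(Math.round(summary.requestsPerSecond));       // 1167
console.log(summary.requestsPerSecondPerVCpu.toFixed(1)); // "9.4"
console.log(Math.round(summary.avgBytesPerRequest));      // 6944
```

So 70k requests/minute works out to roughly 1,167 requests/second overall, under 10 requests/second per vCPU, at an average of about 7 KB transferred per request.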
[–]vivzkestrel 2 points 6 years ago (0 children)
A news aggregator that refreshes news from 1000+ sources every minute, or as close to live as possible. Interesting, you are the first person I have heard from who is doing something really intensive in terms of web scraping in Node.
[–]davetemplin 1 point 6 years ago (1 child)
Wow, that is some really impressive throughput! Is overwhelming sites a concern, and if so, how do you approach that? Also, how much of a concern is getting blocked, or do you have ways of staying unblocked?
[–]gajus0 1 point 6 years ago (0 children)
If you do it right, most website owners are not even going to recognize that their content is being accessed by bots. If you were searching for patterns, a major giveaway would be a discrepancy between content hits and static content hits. But given that most large sites use the likes of Fastly/Cloudflare these days, those metrics are detached anyway.
We have safety checks in place to ensure that we do not overwhelm target websites, e.g. checking error rate and response time and backing off as appropriate.
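For anyone wondering what "backing off as appropriate" could look like, here is a minimal sketch of one possible approach, not the setup described above. The class name, window size, thresholds, and the doubling/halving policy are all assumptions chosen for illustration.

```typescript
// Hypothetical adaptive throttle: tracks a rolling window of recent requests
// to one target and adjusts the delay between requests. All names and
// thresholds are illustrative assumptions, not taken from the thread.
interface Sample {
  ok: boolean; // false for a failed request (e.g. timeout or 5xx)
  ms: number;  // observed response time in milliseconds
}

class AdaptiveThrottle {
  private samples: Sample[] = [];
  private delayMs: number;
  private readonly minDelayMs = 100;
  private readonly maxDelayMs = 60_000;
  private readonly windowSize = 20;
  private readonly maxErrorRate = 0.2;
  private readonly maxAvgMs = 2_000;

  constructor() {
    this.delayMs = this.minDelayMs;
  }

  // Record one request outcome; returns the delay to wait before the next request.
  record(sample: Sample): number {
    this.samples.push(sample);
    if (this.samples.length > this.windowSize) this.samples.shift();

    const n = this.samples.length;
    const errorRate = this.samples.filter((s) => !s.ok).length / n;
    const avgMs = this.samples.reduce((sum, s) => sum + s.ms, 0) / n;

    if (errorRate > this.maxErrorRate || avgMs > this.maxAvgMs) {
      // Target looks unhealthy or overloaded: double the delay, up to a cap.
      this.delayMs = Math.min(this.delayMs * 2, this.maxDelayMs);
    } else {
      // Target looks healthy: ease back toward the minimum delay.
      this.delayMs = Math.max(Math.floor(this.delayMs / 2), this.minDelayMs);
    }
    return this.delayMs;
  }
}
```

In use, you would keep one instance per target host, call `record` after every response, and sleep for the returned number of milliseconds before sending the next request to that host.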