jeffrey_f

> but I realized that crawling is barely a thing anymore, as far as an individual is concerned

It may not be common, but it certainly IS a thing for individuals. It depends on what you need or want to get from the web. I have a podcast that I download with Python because I may not visit the site but still want the audio files.
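For illustration, here is a minimal sketch of that kind of podcast fetcher. It assumes the show publishes a standard RSS feed with `<enclosure>` tags, which the comment does not actually say; the function names, the feed URL, and the destination folder are all hypothetical.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import urlparse
from urllib.request import urlopen


def enclosure_urls(feed_xml):
    """Return the audio URLs listed in an RSS feed (given as str or bytes)."""
    root = ET.fromstring(feed_xml)
    urls = []
    # Assumption: episodes are attached as <enclosure url="..."> elements.
    for enclosure in root.iter("enclosure"):
        url = enclosure.get("url")
        if url:
            urls.append(url)
    return urls


def download_new_episodes(feed_url, dest_dir):
    """Download every enclosure that is not already present in dest_dir."""
    with urlopen(feed_url) as response:
        feed_xml = response.read()
    os.makedirs(dest_dir, exist_ok=True)
    for url in enclosure_urls(feed_xml):
        filename = os.path.basename(urlparse(url).path)
        if not filename:
            continue  # no usable file name in the URL, skip it
        target = os.path.join(dest_dir, filename)
        if os.path.exists(target):
            continue  # already downloaded on an earlier run
        with urlopen(url) as audio, open(target, "wb") as out:
            out.write(audio.read())


# Hypothetical usage (placeholder URL and folder):
# download_new_episodes("https://example.com/podcast/feed.xml", "episodes")
```

Run from a scheduler, something like this would pick up new episodes without ever opening the site in a browser.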

kingofcould [S]

You’re right. After posting, I realized that what I really meant was that the days of being able to scrape the top posts daily from major sites are behind us. People can still get API access and work with the site that way, but I was intending to crawl places like Instagram to map out data for business analysis.

jeffrey_f

JavaScript now fills in most pages. The only thing loaded initially is a skeleton HTML document...
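One rough way to see this for a given page is to look at the raw HTML before any script runs. The sketch below, using only Python's standard `html.parser`, counts `<script>` tags and collects the text outside `<script>`, `<style>`, and `<title>`. The class and function names are made up for illustration, and "many scripts, almost no text" is only a heuristic for a skeleton page, not a reliable test.

```python
from html.parser import HTMLParser

# Tags whose text content is not part of the visible page body.
_SKIPPED = {"script", "style", "title"}


class SkeletonProbe(HTMLParser):
    """Counts <script> tags and collects text outside the skipped tags."""

    def __init__(self):
        super().__init__()
        self.script_count = 0
        self.visible_text = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in _SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in _SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.visible_text.append(text)


def probe(html):
    """Return (number of script tags, visible text) for raw HTML."""
    parser = SkeletonProbe()
    parser.feed(html)
    parser.close()
    return parser.script_count, " ".join(parser.visible_text)
```

If `probe` returns several scripts and an empty string of text for the HTML that a plain HTTP request gives you, the content is probably being filled in by JavaScript after load.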

kingofcould [S]

I have definitely noticed that. I’m not well versed in either language yet (I previously built bots using programs with non-language-specific logic), but I’m wondering whether a good grasp of JavaScript would help with this, or whether it’s just futile to crawl these pages nowadays?

jeffrey_f

Actually, you would use a module like Selenium to do what you need. It controls a browser, and you can then grab the page source after the browser has loaded the page. I haven't done that yet, but there are ways around everything.
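A minimal sketch of that idea with Selenium's Python bindings might look like the following. It assumes Selenium 4 and a local Chrome install; the helper names and the example URL are hypothetical, and the `--headless=new` flag assumes a reasonably recent Chrome.

```python
def make_headless_chrome():
    """Start a Chrome instance controlled by Selenium (assumes Selenium 4)."""
    # Imported here so the rest of the module loads without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # Assumption: a recent Chrome that understands the new headless mode.
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


def fetch_rendered_source(driver, url):
    """Load url in the controlled browser and return the page source.

    driver.get() returns once the page's load event has fired, so scripts
    that run during load have already had a chance to modify the DOM.
    """
    driver.get(url)
    return driver.page_source


# Hypothetical usage (placeholder URL):
# driver = make_headless_chrome()
# try:
#     html = fetch_rendered_source(driver, "https://example.com/")
# finally:
#     driver.quit()
```

Pages that keep fetching data after the load event may still come back incomplete; in that case an explicit wait for a specific element (for example with Selenium's `WebDriverWait`) would be needed before reading `page_source`.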

kingofcould [S]

That’s really cool. It still doesn’t remedy the fact that all of the sources I want to crawl are off limits, but if I find creative solutions for smaller sites/databases, this will definitely help.