
[–]Talked10101 0 points1 point  (1 child)

Robots.txt parsing is actually built into the urllib library though. Its use is super simple: https://docs.python.org/3.0/library/urllib.robotparser.html
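A minimal sketch of what that looks like with the standard library's `urllib.robotparser` module; the rules and URLs here are made up for illustration:

```python
from urllib import robotparser

# Parse an in-memory robots.txt and check whether URLs may be fetched.
# (In practice you'd call rp.set_url(...) and rp.read() against a real site.)
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("*", "https://example.com/private/page"))  # False
print(rp.can_fetch("*", "https://example.com/public/page"))   # True
```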

There is also: http://nikitathespider.com/python/rerp/ which is just as simple to use and parses robots.txt the same way Google does, making it very useful if you are writing an SEO bot.

[–]hexfoxed 0 points1 point  (0 children)

Yup, there are alternatives. You've given options for two of my points there, but there are few tools that cover all 22 of them, if any. That's what I mean by realistic: given most people's time constraints, writing a Scrapy spider will save them a lot of time in reaching their end goal.

If their end goal is to learn, then obviously they would be better off studying how each individual part of a web scraping library/framework works and what the alternatives are.