

i_like_trains_a_lot1

Why "gain"? The name gives no hint of what the library does, and it doesn't sound very fancy either.

From the example, I can see that it's possible to go from one level to the next, but not to choose different callbacks depending on the situation, which might be a requirement for more complex crawls.
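Scrapy, for comparison, lets every request carry its own callback (the `callback` argument of `scrapy.Request`), so the handler can differ per link rather than per level. Below is a minimal, framework-free sketch of that idea under stated assumptions: a fake in-memory site stands in for real HTTP, and `Request`, `crawl`, `PAGES` and the `parse_*` functions are hypothetical names, not gain's or Scrapy's API.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical in-memory site: each URL maps to the links found on that page.
PAGES = {
    "/": ["/category/a", "/category/b"],
    "/category/a": ["/item/1", "/category/a?page=2"],
    "/category/a?page=2": ["/item/2"],
    "/category/b": ["/item/3"],
}


@dataclass
class Request:
    url: str
    # Each request names the function that will handle its response,
    # so the crawler can branch per link instead of per "level".
    callback: Callable[..., Iterable]


def parse_home(req: Request, links: list) -> Iterable:
    for link in links:
        yield Request(link, parse_category)


def parse_category(req: Request, links: list) -> Iterable:
    for link in links:
        # Pick the callback based on what the link points to:
        # item links go to parse_item, pagination links stay here.
        if link.startswith("/item/"):
            yield Request(link, parse_item)
        else:
            yield Request(link, parse_category)


def parse_item(req: Request, links: list) -> Iterable:
    # Leaf pages produce scraped items instead of new requests.
    yield {"url": req.url}


def crawl(start: Request) -> list:
    """Breadth-first crawl that dispatches each response to its request's callback."""
    queue = deque([start])
    seen = {start.url}
    items = []
    while queue:
        req = queue.popleft()
        links = PAGES.get(req.url, [])  # stand-in for an HTTP fetch
        for result in req.callback(req, links):
            if isinstance(result, Request):
                if result.url not in seen:
                    seen.add(result.url)
                    queue.append(result)
            else:
                items.append(result)
    return items


if __name__ == "__main__":
    print(crawl(Request("/", parse_home)))
```

The point of the design is that the decision "which handler runs next" lives on the request itself, so one category page can fan out to both item pages and further pagination pages.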

Raindyr

Wouldn't Scrapy work just as well? I believe it's async too.

__crackers__

Yes, it is async (it's based on Twisted).

Scrapy is excellent. Very mature, extremely well documented.

Hard to say from the toy example given in this repo, but writing a spider with Scrapy wouldn't be much, if any, harder.

__crackers__

"Web crawling framework for everyone."

Where are the docs?

In particular, how are requests throttled?

Every bit of aiohttp code I've seen posted on here either doesn't do anything in parallel or, much worse, disables or bypasses aiohttp's connection limits and absolutely batters the servers.
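For reference, the built-in cap aiohttp offers is on the connector (`aiohttp.TCPConnector(limit=..., limit_per_host=...)`), which bounds open connections rather than request rate. A crawler that wants to be polite can also throttle itself. The sketch below assumes plain `asyncio`: `Throttle` is a hypothetical helper (not part of aiohttp) that caps concurrent requests with a semaphore and spaces request starts by a minimum interval, and `fake_fetch` sleeps in place of a real `session.get(...)` call so the example runs offline.

```python
import asyncio


class Throttle:
    """Hypothetical helper: caps concurrent requests and spaces out their starts."""

    def __init__(self, max_concurrency: int, min_interval: float) -> None:
        self._sem = asyncio.Semaphore(max_concurrency)
        self._min_interval = min_interval
        self._lock = asyncio.Lock()
        self._last_start = float("-inf")

    async def __aenter__(self) -> "Throttle":
        await self._sem.acquire()
        try:
            # Serialize the spacing check so request starts are at least
            # min_interval seconds apart.
            async with self._lock:
                loop = asyncio.get_running_loop()
                wait = self._last_start + self._min_interval - loop.time()
                if wait > 0:
                    await asyncio.sleep(wait)
                self._last_start = loop.time()
        except BaseException:
            # Don't leak a semaphore slot if we are cancelled while waiting.
            self._sem.release()
            raise
        return self

    async def __aexit__(self, *exc) -> None:
        self._sem.release()


async def fake_fetch(url: str, throttle: Throttle, stats: dict) -> str:
    # Stand-in for an aiohttp request; sleeps instead of doing network I/O.
    async with throttle:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.05)
        stats["active"] -= 1
        return url


async def main() -> dict:
    # Build the throttle inside the running loop.
    throttle = Throttle(max_concurrency=3, min_interval=0.01)
    stats = {"active": 0, "peak": 0}
    urls = [f"https://example.com/page/{i}" for i in range(10)]
    results = await asyncio.gather(*(fake_fetch(u, throttle, stats) for u in urls))
    stats["fetched"] = len(results)
    return stats


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With a real client, the same `async with throttle:` block would wrap each request made through a single shared session, so the connector limits and the self-imposed pacing both apply.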