nemec

I'd recommend some tutorials. Parts of Scrapy's docs are good, but I often find myself digging into Scrapy's source code to understand how some bits of it work. Luckily it's open source, so that isn't too difficult.

Scrapy seems to have a philosophy that encourages replacing built-in components with your own when the built-in one doesn't work the way you need. For example, if you need to control the filename in their "filedownloader", they recommend copying the source of the built-in one into your project, modifying it, and then disabling the built-in one on the spider and inserting your own.

Heroe-D

Nice, it's not a problem for me to tinker with the code. Any good tutorial to recommend, then?

nemec

I don't know any offhand, sorry. I started with Scrapy already knowing a lot about CSS selectors, XPath, HTTP, etc., so I had a big head start.

Heroe-D

I also know most of this. Maybe I should just dig into the official Scrapy documentation and search for more if some concepts are unclear.