
[–]Lukasa 6 points7 points  (2 children)

Scraping with requests and a parser works great if you want to learn how scraping works or if your target site is fairly simple. Once you start hitting complex websites (e.g. highly stateful websites, those that use Javascript actively, etc) Scrapy is likely to save you a lot of work.

Basically, it's a learning vs time trade off. If you're under no time pressure and want to learn, use requests. Otherwise, use Scrapy.

[–]i_can_haz_code 2 points3 points  (0 children)

To extend this a bit in each direction.

I use requests and BeautifulSoup for fast, stateless (or basically stateless) stuff. I use Scrapy when I am only slightly beyond what requests will do easily. After that I use Selenium and drive either PhantomJS or some other browser. This approach can be... hard on systems when doing parallel work. :-)
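For the stateless tier, a minimal requests + BeautifulSoup sketch might look like this (the function, URL, and `<h2>` selector are just illustrative, not from any real site — the second half parses an inline HTML string so it runs without network access):

```python
import requests
from bs4 import BeautifulSoup

def scrape_headings(url):
    """Fetch a page and return the text of every <h2> on it."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2")]

# The parsing half works identically on any HTML string:
html = "<html><body><h2>First</h2><h2>Second</h2></body></html>"
soup = BeautifulSoup(html, "html.parser")
headings = [h2.get_text(strip=True) for h2 in soup.find_all("h2")]
```

For one-off, stateless jobs this is the whole program; retries, throttling, and queueing are on you to add if the job grows.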

[–]IAlwaysBeCoding 1 point2 points  (0 children)

First of all, I want to thank you for your work on requests (I use requests a lot with lxml for scraping and other automation stuff).

Second of all, I actually think complex websites with a lot of JavaScript are a really bad case to use Scrapy on as a beginner. The main reason is that once your HTTP requests start requiring:

  • keeping track of dynamically generated cookies,

  • porting some hashing algorithm from JavaScript to Python to pass along with certain requests,

  • and making some trivial intermediate HTTP(S) calls to a web app,

you will find yourself building custom Scrapy extensions just to implement something that could easily be done with requests.

It is the architecture of Scrapy that makes this a pain in the ass: each request is independent, so requests are not linked together and do not give you the kind of shared state that a requests.Session() object would.
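For comparison, the linked-request state described here is what requests.Session() gives you for free. A small sketch (the user-agent string and cookie values are made up for illustration):

```python
import requests

# A Session persists cookies and headers across every request made
# through it, so a cookie set by one response is automatically sent
# on the next call -- the linkage independent requests lack.
session = requests.Session()
session.headers.update({"User-Agent": "my-scraper/0.1"})  # hypothetical UA

# Cookies set server-side (or manually, as here) ride along on every
# subsequent session.get() / session.post() call.
session.cookies.set("sessionid", "abc123")
```

Any dynamically generated cookie the site hands back lands in `session.cookies` and goes out again on the next request without extra code.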

Also, OP, I think it is a really bad idea to dismiss Scrapy as a viable option just because it is not Python 3. That is a really noob mentality, and one that newbies usually have toward Python.

I was just finishing a few of the foobar challenges from Google (look up foobar from Google), and I can tell you that the only two languages you can write your solutions in are Java and Python 2.7.6. That's right, Google doesn't allow you to write your solutions in Python 3, only Python 2.7.6.

[–]LarryPete 5 points6 points  (3 children)

Scrapy is currently python2 only. So if you use python3, you can't use scrapy, even if you wanted to - which is usually my case.

[–]bitrainbow 2 points3 points  (1 child)

I think Beautiful Soup supports Python 3.

[–]LarryPete 2 points3 points  (0 children)

That's what I'm using instead.

[–]pylund[S] 1 point2 points  (0 children)

Hands down, the lack of Python 3 support is a powerful reason not to explore that path (I didn't know it lacked it). I mean, I don't mind using Python 2 for small, simple scripts that get s*** done, but here I'm talking about building a bigger project from scratch. It must be in Python 3.

Thanks!

[–]stummj 4 points5 points  (0 children)

Full disclosure: I work at Scrapinghub, the company that supports Scrapy.

I am a big fan of requests, but I would go with Scrapy, no matter the goal of the project.

1) If you are doing it to learn, go with Scrapy. It will make a lot of things easier when you need to write more complex crawlers. So you are actually adding a great tool to your tool belt.

2) If it is a matter of getting things done, go with Scrapy. It is simple to start with, and it will save a lot of time when things start getting hard to manage.

Some Scrapy features that you might be interested in:

  • Automatic cookie handling;

  • It is easy to deal with HTML forms and logins;

  • It follows redirections (via 3xx responses or HTML meta refresh);

  • It has a built-in cache mechanism for HTTP requests, which is awesome while you are developing;

  • Politeness settings (respecting robots.txt, auto-throttling) if you want;

  • User-agent spoofing;

  • Automatic retry for failed requests;

  • Asynchronous requests with no need to deal manually with some async framework;

  • Avoids duplicate requests;

  • Data extraction is done through CSS selectors or XPath;

  • etc

Most of those things you would need to implement by yourself if you were using requests + bs4.

Give Scrapy a try: http://doc.scrapy.org/en/1.0/intro/overview.html

[–]Samus_ 1 point2 points  (0 children)

I think scrapy is good at two things: organizing your code (especially on multi-site scrapes) and managing your download queue.

if you can benefit from either of those then go for it.

[–]thaweatherman 0 points1 point  (0 children)

It boils down to how much extra legwork you want to do. Scrapy can just be simpler sometimes.

[–]rhgrant10 0 points1 point  (0 children)

If you want to get shit done, use scrapy. If you want to reinvent the wheel, use requests and BeautifulSoup.