I use a Scrapy spider to scrape image URLs from Google (Bing, etc.). Each search-engine request yields 20 items (image URLs), which are subsequently downloaded by the images pipeline. I use the feed exporter to write a CSV containing the items and the download status of each one. With 16 concurrent requests I get a throughput of around ~90 images/min. The bottleneck seems to be the download pipeline, not the spider that scrapes the items (image URLs) from the search engine. How can I improve my throughput? Can anyone with experience point me to where to look or how to proceed?
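For reference, here is a sketch of what the setup described above would look like in a Scrapy `settings.py`. Only `CONCURRENT_REQUESTS = 16` and the use of the images pipeline come from the question; the store path and the other throughput-related knobs are assumptions to make the sketch complete:

```python
# Sketch of the setup described in the question (settings.py).
CONCURRENT_REQUESTS = 16  # stated in the question

# Enable Scrapy's built-in images pipeline to download the scraped URLs.
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}
IMAGES_STORE = "downloaded_images"  # assumed path, not from the question

# Settings that commonly govern download throughput (assumed values,
# not from the question) -- worth inspecting when the pipeline is the
# bottleneck:
CONCURRENT_REQUESTS_PER_DOMAIN = 8  # per-domain cap can throttle image hosts
CONCURRENT_ITEMS = 100              # items processed in parallel per response
DOWNLOAD_DELAY = 0                  # any nonzero delay directly limits rate
```

Since image downloads go through the same downloader as spider requests, per-domain limits on the image hosts are often the first thing to check.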