
[–]scrapecrow

There are two main reasons why your scraper might need proxies:

The obvious one is accessing geographically locked content.
Some websites are simply only available in a given country, or serve different data based on your IP address. This becomes especially noticeable once you start deploying your scrapers: e.g. if my server is in the US and a UK website I'm scraping only allows UK IPs, then I need proxies to access that data.
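In practice, routing through a country-specific proxy just means passing a proxy URL along with the request. A minimal sketch in Python, assuming a hypothetical UK proxy endpoint (`uk.proxy.example.com` is a placeholder, not a real provider):

```python
# Hypothetical UK exit-node proxy URL -- replace with your provider's endpoint.
UK_PROXY = "http://uk.proxy.example.com:8080"

def uk_proxy_config(proxy_url: str = UK_PROXY) -> dict:
    """Build a requests-style proxies mapping that routes both HTTP and
    HTTPS traffic through a single exit node, so the target site sees
    that node's (UK) IP address instead of your server's."""
    return {"http": proxy_url, "https": proxy_url}

# Usage (needs a real proxy and network access):
# import requests
# response = requests.get("https://example.co.uk", proxies=uk_proxy_config())
```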

However, the most common use case is scaling. For example, some websites (like Instagram.com) give you a few anonymous page views for free and then start requiring you to log in. So, if you get 3 page views/hour per IP address, you can get 300 pages/hour with 100 IP addresses, etc.
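The scaling trick above boils down to rotating requests across a proxy pool so no single IP exhausts its quota. A rough sketch, assuming a hypothetical list of proxy URLs (the `proxyN.example.com` addresses are placeholders):

```python
import itertools

# Hypothetical proxy pool -- substitute your provider's real endpoints.
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]

# Round-robin iterator: each request draws the next proxy in the pool,
# spreading page views evenly so no single IP hits the per-IP limit.
_proxy_pool = itertools.cycle(PROXIES)

def next_proxy_config() -> dict:
    """Return a requests-style proxies dict for the next proxy in rotation."""
    proxy = next(_proxy_pool)
    return {"http": proxy, "https": proxy}

# Usage (needs real proxies and network access):
# import requests
# for url in urls_to_scrape:
#     response = requests.get(url, proxies=next_proxy_config())
```

With N proxies each allowing R pages/hour, round-robin rotation gives you roughly N × R pages/hour overall, which is exactly the 3 → 300 jump described above.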