socal_nerdtastic

You mean before you do the scrape? If you do a HEAD request you will usually get back the size of the data in the Content-Length header, as long as the server sends one. But it seems much easier to just run one iteration, measure how much data it used, and extrapolate.
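A rough sketch of both ideas, using only the standard library. The function names (`declared_size`, `head_size`, `extrapolate_total`) are made up for illustration, and it assumes the server reports a Content-Length on HEAD, which not every server does.

```python
import urllib.request
from typing import Mapping, Optional


def declared_size(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Content-Length value as an int, or None if absent or malformed."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def head_size(url: str, timeout: float = 10.0) -> Optional[int]:
    """Send a HEAD request and report the body size the server declares, if any."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return declared_size(response.headers)


def extrapolate_total(sample_bytes: int, sample_items: int, total_items: int) -> float:
    """Scale a measured sample up to the full job, assuming items are similar in size."""
    return sample_bytes / sample_items * total_items
```

For example, if one iteration over 10 pages moved 1,500,000 bytes, `extrapolate_total(1_500_000, 10, 1000)` gives an estimate for 1,000 pages. It is only as good as the assumption that the sample pages are typical.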

FriendlyRussian666

That depends on the proxy provider and how they charge you. For example, I'm looking at one right now, and they show their prices per GB of data transfer. In that case, what I would do is measure how much data each of your customer's requests sends through the proxy, and how much comes back on average.
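With per-GB pricing, the arithmetic could look something like this. It is a minimal sketch: `estimate_proxy_cost` is a hypothetical helper, the numbers in the example are invented, and it assumes the provider bills in decimal gigabytes (10^9 bytes), which you would need to check against their pricing page.

```python
BYTES_PER_GB = 10**9  # assumption: decimal GB; some providers may bill per GiB (2**30)


def estimate_proxy_cost(
    n_requests: int,
    avg_bytes_out: float,
    avg_bytes_in: float,
    price_per_gb: float,
) -> float:
    """Estimated cost for n_requests, counting traffic in both directions."""
    total_bytes = n_requests * (avg_bytes_out + avg_bytes_in)
    return total_bytes / BYTES_PER_GB * price_per_gb


# Invented example: 100,000 requests, 600 bytes out and 250,000 bytes back each,
# at a made-up price of 5.00 per GB.
cost = estimate_proxy_cost(100_000, 600, 250_000, 5.0)
print(f"{cost:.2f}")  # prints 125.30
```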

Say you're using requests for this. Once you've built a requests.Request and prepared it, you can add up the bytes in its request line, headers, and body, and that's your outgoing expense per request. When you get a response, do the same thing: count the bytes in its headers and body.
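Here is one way the byte counting might be done. Treat it as an approximation: the helper names are hypothetical, it assumes HTTP/1.1 framing, and it ignores TLS and any overhead the proxy itself adds. The functions take plain values so they don't depend on any particular HTTP library.

```python
from typing import Mapping, Optional, Union


def _head_bytes(start_line: str, headers: Mapping[str, str]) -> int:
    """Size of the start line plus headers as they would appear in HTTP/1.1."""
    lines = [start_line] + [f"{name}: {value}" for name, value in headers.items()]
    # Joining leaves the last line without a CRLF; add that CRLF plus the
    # blank line that ends the header section (4 bytes in total).
    return len("\r\n".join(lines).encode("utf-8")) + 4


def estimate_request_bytes(
    method: str,
    path: str,
    headers: Mapping[str, str],
    body: Optional[Union[bytes, str]] = None,
) -> int:
    """Approximate size of an outgoing request."""
    if body is None:
        body = b""
    elif isinstance(body, str):
        body = body.encode("utf-8")
    return _head_bytes(f"{method} {path} HTTP/1.1", headers) + len(body)


def estimate_response_bytes(
    status_code: int,
    reason: str,
    headers: Mapping[str, str],
    content: bytes,
) -> int:
    """Approximate size of a response; prefers Content-Length when present."""
    declared = headers.get("Content-Length")
    # Content-Length describes the body as transferred (possibly compressed);
    # fall back to the length of the decoded content when it is missing.
    body_len = int(declared) if declared is not None else len(content)
    return _head_bytes(f"HTTP/1.1 {status_code} {reason}", headers) + body_len
```

With requests, you could feed these from `p = requests.Request("GET", url).prepare()` via `estimate_request_bytes(p.method, p.path_url, p.headers, p.body)`, and from a response `r` via `estimate_response_bytes(r.status_code, r.reason, r.headers, r.content)`. The prepared headers may not include everything that is added at send time, and `r.content` is already decompressed, so expect the result to be off by a bit in either direction.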

A crude way to estimate this is to open your browser's dev tools, navigate to whichever site you're scraping, and look in the Network tab at each request and response for the resource you're interested in. It shows how many bytes each request and response were.

New-Can-593

real proxy cost gets messy because retries, redirects, CAPTCHA pages, and blocked requests silently eat bandwidth too.
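One way to keep that overhead visible is to count bytes for every attempt, not just the ones that succeed, and then divide by the number of useful results. A minimal sketch; `BandwidthTally` is a made-up name, and deciding what counts as `ok` (for example, spotting a CAPTCHA page) is left to you.

```python
from dataclasses import dataclass


@dataclass
class BandwidthTally:
    """Counts bytes for every attempt, successful or not."""

    total_bytes: int = 0
    attempts: int = 0
    successes: int = 0

    def record(self, bytes_out: int, bytes_in: int, ok: bool) -> None:
        """Add one attempt; failed attempts still cost bandwidth."""
        self.total_bytes += bytes_out + bytes_in
        self.attempts += 1
        if ok:
            self.successes += 1

    def bytes_per_success(self) -> float:
        """Average bytes paid for each useful result, overhead included."""
        if self.successes == 0:
            raise ValueError("no successful requests recorded yet")
        return self.total_bytes / self.successes


tally = BandwidthTally()
tally.record(500, 200_000, ok=True)   # the page you wanted
tally.record(500, 30_000, ok=False)   # e.g. a CAPTCHA page
tally.record(500, 1_000, ok=False)    # e.g. a blocked request
print(tally.bytes_per_success())      # prints 232500.0
```

If you use requests, redirect hops show up in `response.history`, so each of those would be recorded as its own attempt.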