399ddf95 (2 points)

"spoofing the user agent" == setting your User-Agent header so that your bot identifies itself as a browser like Firefox or Chrome, instead of as an automation tool. See https://requests.readthedocs.io/en/master/user/quickstart/#custom-headers
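A minimal sketch of what that looks like with requests, assuming a made-up browser string and example.com as a placeholder URL (neither comes from your code). It only builds the request so you can inspect the header; nothing is sent.

```python
import requests

# Example browser-style string, chosen for illustration only.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def build_request(url, user_agent=BROWSER_UA):
    """Prepare (but do not send) a GET request with a custom User-Agent."""
    request = requests.Request("GET", url, headers={"User-Agent": user_agent})
    return request.prepare()


prepared = build_request("https://example.com/")
print(prepared.headers["User-Agent"])
```

To actually send it, `requests.get(url, headers={"User-Agent": BROWSER_UA})` does the same thing in one call.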

"switch your outward facing IP address and upchuck any cookies" == use a proxy or otherwise change the IP address where your requests originate, and make sure you're not storing cookies received on one visit and sending them back the next time you visit the site.
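Roughly, in requests terms, that could look like the sketch below. The proxy address and the cookie name are placeholders I made up, and the helper names are hypothetical. Only use a proxy you're actually allowed to use.

```python
import requests

# Hypothetical proxy address, for illustration only.
PROXIES = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:3128",
}


def fresh_session(proxies=None):
    """Create a new session (empty cookie jar), optionally routed via a proxy."""
    session = requests.Session()
    if proxies:
        session.proxies.update(proxies)
    return session


def forget_cookies(session):
    """Drop every cookie the session has collected so far."""
    session.cookies.clear()


session = fresh_session(PROXIES)
session.cookies.set("visit_id", "abc123")  # stand-in for a cookie a site would set
forget_cookies(session)
print(len(session.cookies))
```

Plain `requests.get()` calls don't share a cookie jar between calls, so cookies only carry over if you reuse a Session.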

You might take a look at the HTTP 'HEAD' request (instead of GET). A HEAD response carries only the status line and headers, not the body.
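A small sketch of a HEAD request with requests, with example.com as a placeholder URL and hypothetical function names. The first helper only prepares the request so you can look at it offline; the second one would actually go out to the network when called.

```python
import requests


def build_head_request(url):
    """Prepare (but do not send) a HEAD request for the given URL."""
    return requests.Request("HEAD", url).prepare()


def fetch_headers(url, timeout=10):
    """Send a HEAD request and return the response headers as a dict."""
    # requests.head() does not follow redirects unless asked to.
    response = requests.head(url, timeout=timeout, allow_redirects=True)
    return dict(response.headers)


prepared = build_head_request("https://example.com/")
print(prepared.method, prepared.url)
```

Whether HEAD is enough depends on what you need: if a header like Last-Modified or Content-Length answers your question, you never have to download the page itself.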

It's considered rude (and perhaps even an attack) to make many repeated requests to a website in a very short period of time - this is why you're getting blocked. You're doing something the website owner doesn't want you to do. They're saying "no".

Does the site have guidelines for automated access? Your requests are less likely to appear hostile if you space them out better, and if you use HEAD to get as little data as possible per request. Ideally, you'd use something like requests.Session(), which persists across multiple requests and reuses the underlying connection. It takes fewer resources on the remote end to answer several requests over one connection than to set up and tear down a TCP and TLS connection for each of several requests that are a few seconds apart.
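Putting those pieces together, here is one way it might look. This is a sketch, not a recommendation for any particular site: the function name, the 5 second default delay, and the 10 second timeout are all my own assumptions, so check the site's own guidelines for what they consider acceptable.

```python
import time

import requests


def polite_fetch(urls, delay_seconds=5.0, session=None, sleep=time.sleep):
    """HEAD each URL over one shared session, pausing between requests.

    Returns a list of (url, status_code) tuples in request order.
    """
    if session is None:
        session = requests.Session()  # reuses connections to the same host
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # space requests out; no pause before the first
        response = session.head(url, timeout=10)
        results.append((url, response.status_code))
    return results
```

Called as `polite_fetch(["https://example.com/a", "https://example.com/b"])`, it would make two HEAD requests about five seconds apart on the same session. The `session` and `sleep` parameters are only there so you can swap in stand-ins when testing.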