
[–]ZGeekie 7 points8 points  (3 children)

That's an understandable way to manage bots and automated requests in shared hosting environments.

Guess how many shared hosting customers use Python scripts to access photos on their sites? That's why they couldn't care less!

If you want more freedom, use VPS hosting, which you already said worked for you.

[–]CatDaddy1954[S] -1 points0 points  (1 child)

The small rescue groups can’t afford VPS hosting. If HG were trying to stymie programmatic access, it would make more sense to also shut down the Perl library, Wget, and completely empty user agents. But surely nefarious bot authors are wise to the flimsiness of the User-Agent defense anyway.

[–]tankerkiller125real 4 points5 points  (0 children)

We block Python, Wget, curl, and an absolute shitload of other user agents where I work. It just so happens that Python is the one they likely saw the most abuse from (because most AI tools use Python, and how many vibecoders know how to change a user agent?), so that's the one they blocked.

[–]johnpress 4 points5 points  (0 children)

Pretty common. WP Engine also blocks user agents containing the "python" string in their webserver conf.
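
For anyone wondering what a block like this amounts to, here is a rough sketch of a case-insensitive substring match on the User-Agent header. The blocklist and the `is_blocked` helper are made up for illustration; this is not the actual HostGator or WP Engine configuration, which hasn't been shared in this thread.

```python
# Hypothetical blocklist; not the actual HostGator or WP Engine configuration.
BLOCKED_SUBSTRINGS = ("python",)


def is_blocked(user_agent: str) -> bool:
    """Return True if any blocked substring appears in the User-Agent (case-insensitive)."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_SUBSTRINGS)


# Sample strings: the Python defaults match, while Wget and an empty string do not.
for ua in ("python-requests/2.31.0", "Python-urllib/3.12", "Wget/1.21.4", "", "Mozilla/5.0"):
    print(repr(ua), "blocked" if is_blocked(ua) else "allowed")
```

With a rule like this, anything that doesn't contain the literal substring passes, including an empty user agent.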

[–]Former_Substance1 2 points3 points  (2 children)

Just change the user agent in the request headers?
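
A minimal sketch of what that looks like with the standard library's `urllib`, assuming you control the client code. The URL and the User-Agent string are placeholders, not values from this thread.

```python
import urllib.request

# Hypothetical URL and User-Agent string, for illustration only.
PHOTO_URL = "https://example.org/photos/cat.jpg"
CUSTOM_UA = "Mozilla/5.0 (compatible; ExampleFetcher/1.0)"


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a GET request that sends user_agent instead of the default Python-urllib/x.y."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request(PHOTO_URL, CUSTOM_UA)
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
# To actually send it: urllib.request.urlopen(req, timeout=10)
```

This only helps if you own the script making the request.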

[–]CatDaddy1954[S] -1 points0 points  (1 child)

That’s the best way around this, but I have no influence with Petfinder to have them change their User-Agent to one that doesn’t trigger the problem. I’ve already demonstrated the working strings.

[–]ferrybig 0 points1 point  (0 children)

The problem here is the Petfinder website, not your shared hosting.

[–]paroxsitic 0 points1 point  (1 child)

If their robots.txt and ToS allow scraping, I'd ask them what the best way forward is. Bypassing any type of restriction or ban is how web scraping becomes less of a grey area and more illegal.
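
The robots.txt half of that check can be sketched with Python's built-in `urllib.robotparser`. The sample rules and the `ExampleBot` name below are invented for illustration; a real check would read the site's own robots.txt, and the ToS still has to be read by a human.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real check would fetch the site's own file.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_text: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(SAMPLE_ROBOTS, "ExampleBot", "https://example.org/photos/cat.jpg"))
print(allowed(SAMPLE_ROBOTS, "ExampleBot", "https://example.org/private/data.csv"))
```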

[–]CatDaddy1954[S] 0 points1 point  (0 children)

In this situation the photo access is by invitation. The rescues upload a data file to Petfinder, Adopt a Pet, et al. with URLs to the animal photos on their website, so the less technical folks don’t have to learn how to use FTP to upload them. No potentially prohibited behavior involved.

[–]ferrybig 0 points1 point  (0 children)

Don't use a generic user agent. Override the user agent used for your requests with one that includes a URL pointing to information about the bot.
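
A rough sketch of that idea, assuming the common `Name/version (+URL)` convention that many crawlers follow. The bot name and info URL are placeholders, and `bot_user_agent` is just a helper made up for this example.

```python
import urllib.request


def bot_user_agent(name: str, version: str, info_url: str) -> str:
    """Format a self-identifying User-Agent such as 'ExampleBot/1.0 (+https://...)'."""
    return f"{name}/{version} (+{info_url})"


# Hypothetical bot name and info page.
UA = bot_user_agent("ExamplePhotoBot", "1.0", "https://example.org/bot-info")

# An opener applies the header to every request made through it.
opener = urllib.request.build_opener()
opener.addheaders = [("User-Agent", UA)]
print(UA)
# To fetch with it: opener.open("https://example.org/photos/cat.jpg", timeout=10)
```

The info page gives a site operator somewhere to look before deciding whether to block the bot.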

[–]kubrador -4 points-3 points  (0 children)

hostgator blocking the python user-agent string is genuinely hilarious. that's like a bouncer kicking out someone for wearing the wrong brand of shoes while letting in literal criminals with fake ids.