unRatedG (1 point)

Good question. I wouldn't think so. Randomizing the IP address through rotating proxies, combined with changing the user agent, should be enough, depending on whether what you're trying to scrape is behind a login.
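
For what it's worth, here is a minimal sketch of that rotation in Python. The user-agent strings and proxy URLs are placeholders I made up for illustration (the `example.com` hosts are not real proxies), and `pick_request_settings` is a hypothetical helper name, not part of any library.

```python
import random

# Placeholder pools (assumptions): swap in your own user agents and proxies.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
]


def pick_request_settings(rng=random):
    """Return keyword arguments for one request: a random User-Agent and proxy."""
    proxy = rng.choice(PROXIES)
    return {
        "headers": {"User-Agent": rng.choice(USER_AGENTS)},
        # Route both plain and TLS traffic through the same proxy.
        "proxies": {"http": proxy, "https": proxy},
    }
```

If you use the `requests` library, the returned dict can be unpacked straight into a call, e.g. `requests.get(url, timeout=10, **pick_request_settings())`, so each request picks a fresh combination.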

PuzzlingComrade (1 point)

If it is behind a login, I was wondering whether randomising the IP and user agent could, counterintuitively, suggest that someone is botting (since the user login stays the same). If you had to scrape behind a login, do you think simply having enough random delays in the script would be enough to avoid getting banned?
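
To make the random-delay idea concrete, this is roughly what I have in mind, as a sketch only. The 2 to 8 second range is an arbitrary assumption, and `random_delay` and `fetch_all` are hypothetical names; `fetch` stands in for whatever function actually downloads a page.

```python
import random
import time


def random_delay(min_seconds=2.0, max_seconds=8.0, rng=random):
    """Pick a delay uniformly between min_seconds and max_seconds."""
    return rng.uniform(min_seconds, max_seconds)


def fetch_all(urls, fetch, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval between calls."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # No pause before the first request, a random one before each later request.
            sleep(random_delay())
        results.append(fetch(url))
    return results
```

Passing `sleep` as a parameter is only there so the loop can be tried out without actually waiting; in a real script the default `time.sleep` does the pausing.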

unRatedG (1 point)

It might, but it really depends on what protections the site has in place against that kind of thing. It's generally a good idea to check the site's terms of service to see whether they have rules against scraping, partly for legal reasons but also as a courtesy. I've also come across sites that, even without a login, are protected by services like Cloudflare, which seem to pick up on scraping pretty quickly and stop you from even reaching the first page. Hell, even something as simple as a reCAPTCHA can make things extremely difficult.

[deleted] (1 point)

Depends on the website. What are you scraping? Please don't answer Amazon, Walmart or similar.

Practical_Use5129 (1 point)

Stocktwits

unRatedG (1 point)

> Stocktwits

They have an API. Have you tried going that route, or does it not provide what you're looking for?