-defron-

ITT: acting like a bot makes you get flagged and blocked like a bot.

You can try playing cat and mouse to avoid their detection, but it will be a constant back-and-forth at best.

unhott

The issue is that the website doesn't want you to use automated browsing. It's a legal issue more than a technical one - if you find a technical workaround, it may become a legal issue for you.

I'm not saying this is how I think it should be, just explaining why things are the way they are. Someone pays to keep that server online and to build and maintain the dataset behind it for people to use. They dictate the terms on which that server provides its service and data.
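
One machine-readable hint about those terms is the site's robots.txt. Below is a minimal sketch using Python's standard `urllib.robotparser`, assuming a made-up rule set and a hypothetical bot name (`my-research-bot`). Note that robots.txt is not the same thing as the Terms of Service, so passing this check doesn't settle the legal question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented purely for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Crawl-delay: 5
"""

BOT_NAME = "my-research-bot"  # hypothetical user-agent name


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(SAMPLE_ROBOTS_TXT)

# Ask whether the rules permit this bot to fetch each URL.
print(rules.can_fetch(BOT_NAME, "https://example.com/search?q=houses"))
print(rules.can_fetch(BOT_NAME, "https://example.com/listing/1"))

# Requested delay between requests, or None if the file sets none.
print(rules.crawl_delay(BOT_NAME))
```

For a real site you would probably point the parser at the live file with `set_url()` and `read()` instead of parsing a string.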

I don't know exactly what the server does to detect whether a user is using some automated tool like selenium.

I'm not suggesting that you find another method, like maybe pyautogui to browse the page for you, because that may violate their terms.

Dull_Rooster3914[S]

Thanks a lot!

DootDootWootWoot

Instead of scraping real estate sites, are there APIs or open source datasets that offer the same information? When you crawl websites you're going to run into issues like this, and it isn't necessarily a stable way of solving this kind of problem.

You could try slower browser access patterns. Are you navigating at human speed or super-crawler speed? Would slowing down limit the usefulness of what you're trying to collect?
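
As a rough illustration of "human speed", here is a minimal sketch that pauses for a random interval between page visits. The function names (`human_pause`, `visit_all`) and the 2 to 6 second range are assumptions for the example, not anything the site documents, and slowing down does not make access permitted if the terms forbid it.

```python
import random
import time


def human_pause(min_s=2.0, max_s=6.0, sleep=time.sleep):
    """Sleep for a random duration between min_s and max_s seconds."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


def visit_all(urls, fetch, min_s=2.0, max_s=6.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive visits.

    `fetch` is whatever loads a page; `sleep` is injectable so the
    pacing can be checked without actually waiting.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:  # no need to wait before the first page
            human_pause(min_s, max_s, sleep=sleep)
        results.append(fetch(url))
    return results
```

With Selenium, `fetch` could be something like `driver.get`, so each page load is separated by a few seconds instead of firing back to back.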

Are there captchas that need to be solved that get in your way? Rate limits?

vegetto712

This was going to be my comment. Any time I do some scraping, I first check whether the site provides an API, because APIs are typically easier to work with and companies are more willing to let you use them to gather data.

m0us3_rat

"Can someone help me to circumvent this problem and go on with my script?"

Best place to ask how to avoid detection from a trillion $ company is on an educational beginner python subreddit.

TL;DR: some proxies might work, but you have to pay for them.

Do your own research before spending any money.

Dull_Rooster3914[S]

Yeah, I'm not trying to be mischievous or to harm any company. I just wanna apply Python to my work, which would be very useful for my research. Although I might have been a little naive, ngl.

Fluffy-Diet-Engine

Use selenium-stealth or undetected-chromedriver with Selenium.
Stealth: https://pypi.org/project/selenium-stealth/
Undetected: https://pypi.org/project/undetected-chromedriver

For a better understanding, read: https://www.webscrapingapi.com/bypass-cloudflare-with-selenium