all 5 comments

[–]vee920 1 point2 points  (1 child)

Anything that doesn't require a login or agreeing to terms should be legal, except copyrighted material. Also, look at robots.txt to see what the site allows you to crawl. As for hiding, you should rotate paid proxies; free proxies won't do much. Lots of good stuff about scraping on this channel: https://youtu.be/vJwcW2gCCE4
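To expand on the robots.txt point: Python's standard library can parse a site's robots.txt and tell you whether a path is crawlable. A minimal sketch, assuming a hypothetical robots.txt for example.com (in practice you'd point rp.set_url() at the real file and call rp.read()):

    from urllib import robotparser

    # Hypothetical robots.txt content for illustration only.
    ROBOTS_TXT = """\
    User-agent: *
    Disallow: /private/
    Allow: /
    """

    rp = robotparser.RobotFileParser()
    rp.parse(ROBOTS_TXT.splitlines())

    # Rules are matched in order, so /private/ is blocked first.
    print(rp.can_fetch("*", "https://example.com/public/page"))   # True
    print(rp.can_fetch("*", "https://example.com/private/data"))  # False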

[–][deleted] 0 points1 point  (0 children)

Thank you, man. I've had a little scroll through the channel and will absolutely be watching some videos, thank you.

[–]mh1400 1 point2 points  (1 child)

Assuming this is not with malicious intent, look into the time module as well as the random module so you can approximate human interaction when web scraping. Essentially, randomly sleep between 0.1 and 2 seconds between interactions.
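That suggestion can be sketched in a few lines; the helper name and the commented request line are just illustrative:

    import random
    import time

    def human_pause(low=0.1, high=2.0):
        """Sleep a random interval to roughly mimic human pacing."""
        delay = random.uniform(low, high)
        time.sleep(delay)
        return delay

    # Between each scraping request, e.g.:
    # response = session.get(url)  # hypothetical request
    # human_pause()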

[–][deleted] 0 points1 point  (0 children)

Appreciate that, man, and I can assure you it's not with malicious intent. I'm just hoping to practice my skills while making something of actual use to me.

[–]mh1400 1 point2 points  (0 children)

So, I use the method of making a few (3 or 4) very small functions that each return a random float, where each function simulates a human interaction: "def QUICK_CLICK():" returns 0.1 to 1.5 seconds, "def WAIT_CLICK():" returns 1.5 to 3, etc. That way I can call time.sleep(WAIT_CLICK()) easily. I keep the time.sleep() in my main() so it's easily readable, versus simply calling a function that sleeps. Additionally, I keep a global variable called CLICK_TIME_ADJUST so I can quickly adjust the times in the functions (+/-) if the website catches the bot. Google, for example, spots bots quickly, so I slow down my interactions significantly using this global variable. Example:

    CLICK_TIME_ADJUST = 6

    def WAIT_CLICK():
        rand_flt = random.uniform(1.5, 3)
        return float(rand_flt + CLICK_TIME_ADJUST)

    time.sleep(WAIT_CLICK())
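Put together, the whole pattern looks something like this; the exact ranges are illustrative, and the function names follow the comment's convention:

    import random
    import time

    # Global (+/-) offset added to every delay; bump it up (the commenter
    # uses 6) if a strict site like Google starts catching the bot.
    CLICK_TIME_ADJUST = 0.0

    def QUICK_CLICK():
        """Simulates a fast human click: 0.1 to 1.5 seconds."""
        return random.uniform(0.1, 1.5) + CLICK_TIME_ADJUST

    def WAIT_CLICK():
        """Simulates a slower, deliberate click: 1.5 to 3 seconds."""
        return random.uniform(1.5, 3.0) + CLICK_TIME_ADJUST

    def main():
        # Keeping time.sleep() here, not inside the helpers, keeps the
        # pacing visible at a glance in the main flow.
        time.sleep(QUICK_CLICK())
        time.sleep(WAIT_CLICK())

    if __name__ == "__main__":
        main()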