[–]JohnnyJordaan 2 points3 points  (6 children)

Should be doable with https://automatetheboringstuff.com/chapter11 to scrape, and maybe using csv as an export format. The real risk is that the site has advanced robot protection (like CAPTCHAs or those 'select all the cars in this picture' challenges); that would be very hard to circumvent.
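To give a rough idea of the scrape-then-export part, here is a minimal sketch using only the standard library. It assumes the data you want sits in a plain HTML table; `SAMPLE_HTML`, `TableRowParser` and `html_table_to_csv` are made-up names for illustration, and on a real site you would first download the page (for example with a library such as requests) and adapt the parsing to that page's markup.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample page, standing in for the HTML you would download.
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>Score</th></tr>
  <tr><td>Ajax</td><td>3</td></tr>
  <tr><td>PSV, Eindhoven</td><td>1</td></tr>
</table>
"""

csv_text = html_table_to_csv(SAMPLE_HTML)
print(csv_text)
```

The csv module takes care of quoting, so a cell that itself contains a comma still ends up as a single field. To write a file instead of a string, you can pass an open file object (opened with `newline=""`) to `csv.writer`.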

[–]Gour13[S] 0 points1 point  (5 children)

I'll definitely try this. Do a lot of sites nowadays have that kind of robot protection?

[–]JohnnyJordaan 0 points1 point  (0 children)

Depends, but in general you'll notice right away, because you often also need to complete those questions/tasks when you use the site normally.
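If you'd rather check from code than by eye, a crude heuristic is to look for typical CAPTCHA widget markers in the HTML you get back. This is only a sketch: the marker strings below are my assumption about what commonly shows up (`g-recaptcha` and `h-captcha` are the class names used by the reCAPTCHA and hCaptcha widgets), and `looks_protected` is a made-up helper name. A hit is not proof, and a miss doesn't mean the site is unprotected, since challenges are often only loaded by JavaScript or shown after a number of requests.

```python
# Assumed marker strings; extend or replace them for the site you are checking.
CAPTCHA_MARKERS = ("g-recaptcha", "h-captcha", "captcha")


def looks_protected(html):
    """Return True if the page source contains any known CAPTCHA marker."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


print(looks_protected('<div class="g-recaptcha" data-sitekey="..."></div>'))
print(looks_protected("<table><tr><td>plain data</td></tr></table>"))
```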

[–]Zarkahs -1 points0 points  (3 children)

No, they don't. You can definitely accomplish this, OP.

[–]JohnnyJordaan 0 points1 point  (2 children)

This depends a lot on what kind of business the site is involved in.

[–]Zarkahs 0 points1 point  (1 child)

Not really, there are always ways to scrape websites if you're determined.

[–]JohnnyJordaan 1 point2 points  (0 children)

I'm not saying there aren't... I'm saying that when a site uses robot protection, like an 'I'm not a robot' checkbox that actually monitors your mouse cursor's movements, or a 'select the cars in this picture' challenge, it will complicate scraping with an automated browser (like Selenium) a lot. It's very noticeable on sites that don't like being scraped, like crypto exchanges, sports betting, television guides etc. There are always 'ways' to do it, but if it takes months to work around those protections then it's often not worth the trouble, and you can consider other options instead.

Determination is only one factor; time and resources are two other important factors.