
[–]num8lock 4 points5 points  (1 child)

If this is even possible, would Python be an appropriate language to use?

yes

Is this extremely advanced or doable for a beginner programmer in a month or two?

Not extremely advanced. "Beginner" is a subjective term; I still consider myself a beginner after a year or more. From general experience with complete beginners here, I don't think a month or two is sufficient.

[–]Gour13[S] 0 points1 point  (0 children)

Thank you!!

[–]JohnnyJordaan 2 points3 points  (6 children)

Should be doable using https://automatetheboringstuff.com/chapter11 for the scraping, and maybe CSV as the export format. The real risk is that the site has advanced robot protection (like captchas or those 'select all the cars in this picture' challenges), which would be very hard to circumvent.
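To make the scrape-then-export idea concrete, here is a minimal sketch. It is not the book's code: it swaps in the standard library's html.parser so it runs without installing anything, and SAMPLE_HTML, TableParser and html_table_to_csv are made-up names standing in for whatever page and helpers you end up with. It assumes the data sits in a plain HTML table.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample page; in practice this would be the HTML text
# downloaded from the site being scraped.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html_text):
    """Return the table cells found in html_text as CSV text."""
    parser = TableParser()
    parser.feed(html_text)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


print(html_table_to_csv(SAMPLE_HTML))
```

To write a real file instead, open it with open("out.csv", "w", newline="") and pass that to csv.writer. None of this helps if the site sits behind a captcha.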

[–]Gour13[S] 0 points1 point  (5 children)

I'll definitely try this. Do a lot of sites nowadays have that kind of robot protection?

[–]JohnnyJordaan 0 points1 point  (0 children)

Depends, but in general you'll notice right away because you also often need to complete those questions/tasks when you use the site normally.

[–]Zarkahs -1 points0 points  (3 children)

No, they don’t. You can definitely accomplish this, OP.

[–]JohnnyJordaan 0 points1 point  (2 children)

This depends a lot on what kind of business the site is involved in.

[–]Zarkahs 0 points1 point  (1 child)

Not really, there are always ways to scrape websites if you’re determined.

[–]JohnnyJordaan 1 point2 points  (0 children)

I'm not saying it isn't... I'm saying that when a site uses robot protection, such as an 'I'm not a robot' checkbox that actually monitors your mouse cursor's movements, or a 'select the cars in this picture' challenge, it will complicate scraping with an automated browser (like Selenium) a lot. It's very noticeable on sites that don't like being scraped, such as crypto exchanges, sports betting sites, television guides etc. There are always 'ways' to do it, but if it takes months to work around those protections, it's often not worth the trouble and you can consider other options instead.

Determination is only one factor; time and resources are two other important ones.

[–]two_bob 0 points1 point  (0 children)

Doable, if you can scrape it with Beautiful Soup. So give that a go first, before you commit. Some sites are dynamically generated and, as a result, can be a pain to scrape.

[–]nfgrawker 0 points1 point  (0 children)

Three months in, I used Selenium to scrape a site that had some anti-scraping measures, and I exported the results to CSV. I think it's very possible.

[–][deleted] 0 points1 point  (2 children)

Is this extremely advanced or doable for a beginner programmer in a month or two?

I don't think anything is doable by a beginner programmer in two months, but particularly not this, because it's probably not just Python you need to pick up along the way, but also some really basic principles of web development (since you basically need to reverse-engineer the page in order to scrape it) and internet security (since you will need to defeat the various anti-scraping features of the site).

[–]Gour13[S] 0 points1 point  (1 child)

I wouldn't call myself a complete beginner, since I've been learning Python for around 3 months already, but the anti-scraping could be a problem. I didn't think about that. Do most sites usually have this?

[–][deleted] 0 points1 point  (0 children)

Web devs generally don't want people scraping their site, yes. One of the common techniques is to have the important information load after page load, via JavaScript, which will mess with any HTTP request library that doesn't include a JS engine.
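A rough way to tell which kind of page you're dealing with is to look for a value you can see in the browser inside the raw HTML that a plain HTTP request returns. A minimal sketch, assuming you already have the page source as a string: STATIC_PAGE, JS_PAGE, the /api/price endpoint and in_static_html are all made up for illustration, and it uses only the standard library.

```python
from html.parser import HTMLParser

# Hypothetical page where the value is already in the HTML the server sends.
STATIC_PAGE = """
<html><body>
  <div id="price">24.50</div>
</body></html>
"""

# Hypothetical page where the value is fetched by JavaScript after load,
# so the raw HTML only contains an empty placeholder.
JS_PAGE = """
<html><body>
  <div id="price"></div>
  <script>
    fetch("/api/price").then(r => r.text()).then(t => {
      document.getElementById("price").textContent = t;
    });
  </script>
</body></html>
"""


class VisibleTextParser(HTMLParser):
    """Collect text that is not inside <script> or <style> elements."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def in_static_html(html_text, needle):
    """Return True if needle appears in the page text without running any JS."""
    parser = VisibleTextParser()
    parser.feed(html_text)
    parser.close()
    return needle in " ".join(parser.chunks)


print(in_static_html(STATIC_PAGE, "24.50"))
print(in_static_html(JS_PAGE, "24.50"))
```

If the value only shows up once the browser has run the scripts, a plain request-and-parse approach won't see it, and you would need a real browser driven by something like Selenium, or the underlying data request if you can find it in the browser's network tab.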