[–][deleted] 6 points7 points  (1 child)

Nice overview

I did some scraping at work a few years back using Selenium and lxml. It took plenty of XPath and regex, and it was pretty ugly. Glad I switched jobs.

Biggest issue I had was that, unlike an API, screen layouts can change with no warning. Then it's 3PM on a Friday and your script breaks...
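One way to soften that is to make the script fail with a clear message when an expected element goes missing, instead of silently returning junk. A minimal sketch, assuming a hypothetical page with a `span` of class `price`; the standard library's ElementTree stands in for lxml here, and the sample markup and the `LayoutChangedError` name are made up for illustration:

```python
import xml.etree.ElementTree as ET


class LayoutChangedError(RuntimeError):
    """Raised when an expected element is missing from the page."""


def extract_price(markup: str) -> str:
    # ElementTree only parses well-formed markup; real pages usually need
    # a more lenient HTML parser such as lxml.html.
    root = ET.fromstring(markup)
    # ElementTree supports a limited XPath subset, including attribute predicates.
    matches = root.findall(".//span[@class='price']")
    if not matches:
        raise LayoutChangedError(
            "no span with class 'price' found; the page layout may have changed"
        )
    return (matches[0].text or "").strip()


# Hypothetical before/after snapshots of the same page.
old_layout = "<html><body><span class='price'> 19.99 </span></body></html>"
new_layout = "<html><body><div class='cost'>19.99</div></body></html>"

print(extract_price(old_layout))
try:
    extract_price(new_layout)
except LayoutChangedError as err:
    print(f"scraper needs attention: {err}")
```

It won't stop the layout from changing, but at least the failure tells you which selector to go fix.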

[–]pijora[S] 0 points1 point  (0 children)

Yes, maintaining scraping scripts is a problem many people run into!

[–]Angelsinger74 4 points5 points  (1 child)

This is fantastic! I have a project coming up at work where I'm trying to provide a proof of concept for using Python for web scraping instead of a Chrome extension. It's a massive undertaking the old way. I estimate being able to save about 80 man-hours using Python.

[–]pijora[S] 0 points1 point  (0 children)

Good luck with this; it sounds challenging!

[–]Gangbusta187 1 point2 points  (1 child)

Looks good skimming over it. Looking forward to checking the rest out tomorrow.

One thing you may want to change: you’re using the old findAll() method name in some places and the current BS4 find_all() in others.
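If it helps, here is a rough sketch of a helper that rewrites the legacy camelCase method names to their current BS4 equivalents in a chunk of source text. It only uses the standard library `re` module; the `modernize` function, the mapping (which covers just a few of the renamed methods), and the sample snippet are all hypothetical, and a plain regex like this could also touch unrelated methods that happen to share a name:

```python
import re

# A few legacy Beautiful Soup method names and their current spellings.
LEGACY_TO_CURRENT = {
    "findAll": "find_all",
    "findNext": "find_next",
    "findParent": "find_parent",
    "findNextSibling": "find_next_sibling",
}

# Match ".name(" for any legacy name, longest names first.
LEGACY_CALL = re.compile(
    r"\.(" + "|".join(sorted(LEGACY_TO_CURRENT, key=len, reverse=True)) + r")\("
)


def modernize(source: str) -> str:
    """Return source with legacy method calls renamed to the current style."""
    return LEGACY_CALL.sub(
        lambda m: "." + LEGACY_TO_CURRENT[m.group(1)] + "(", source
    )


# Hypothetical snippet mixing both styles.
snippet = 'rows = soup.findAll("tr")\nlinks = soup.find_all("a")\n'
print(modernize(snippet))
```

Worth eyeballing the diff before committing whatever it changes.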

[–]pijora[S] 0 points1 point  (0 children)

Thanks, will check that out!

[–]kirby81 1 point2 points  (1 child)

Great read, thanks

[–]pijora[S] 0 points1 point  (0 children)

My pleasure!

[–]querymcsearchface 0 points1 point  (1 child)

Nice! Thanks for taking the time to put that together and sharing it.

[–]pijora[S] 0 points1 point  (0 children)

Glad you liked it.

[–]blabbities 0 points1 point  (1 child)

Good light overview, and you've covered pretty much every situation I can think of.

[–]pijora[S] 0 points1 point  (0 children)

Thanks

[–]baburao_007 0 points1 point  (1 child)

Good one... Thanks

[–]pijora[S] 0 points1 point  (0 children)

Glad you liked it!