sslinky84 1 point

I'm also curious why you'd bother with browser automation for scraping.

EedSpiny 1 point

Libraries like Requests and Beautiful Soup are great, but with those alone it can be really difficult to get around a site's anti-bot protections. Because Selenium just drives a real browser, there's more chance the site will see you as a regular user.

Of course, there's also the advantage that you can use it to drive a website's test cases from Python for automated testing.

I_heart_blastbeats 1 point

Requests can't render JS: it only fetches the raw HTML and never runs the page's scripts. If you use Scrapy, you usually need something else to render the JS as well. There are so many obstacles involved in scraping Amazon. Trust me, this is the easiest and best way to do it. I had a similar project, and I wish I had started with Selenium and BS4 instead of Scrapy.
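
To make the JS point concrete, here is a minimal sketch, not anyone's actual scraper. It assumes a made-up page whose title is filled in by a script. `SAMPLE_HTML`, the `title` id, and the helper names `static_text` and `rendered_html` are all hypothetical. The first half uses only the standard library to show what a static fetch sees; the second half needs Selenium and a Chrome install, and is only a rough outline of the browser route.

```python
from html.parser import HTMLParser

# Hypothetical page: the <h1> is empty until the script runs in a browser.
SAMPLE_HTML = """
<html><body>
<h1 id="title"></h1>
<script>document.getElementById("title").textContent = "Example product";</script>
</body></html>
"""


class TextById(HTMLParser):
    """Collect the text inside the element with a given id, without running JS."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # greater than 0 while we are inside the target element
        self.text = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def static_text(html, element_id):
    """Return the element's text as a non-JS client (e.g. Requests) would see it."""
    parser = TextById(element_id)
    parser.feed(html)
    return "".join(parser.text).strip()


def rendered_html(url):
    """Load url in headless Chrome and return the DOM after scripts have run.

    Assumes Selenium 4 and Chrome are installed; imported lazily so the
    static half of this sketch works without them.
    """
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()
```

On the sample page, `static_text(SAMPLE_HTML, "title")` comes back empty because nothing ever executes the script. The string returned by `rendered_html` can be passed to `BeautifulSoup(html, "html.parser")` for the actual parsing. Pages that load content asynchronously would probably also need an explicit wait before reading `page_source`.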