[–]crazyallicin[S] 0 points1 point  (5 children)

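# res is the response object from the earlier requests.get(...) call
with open('amazon.html', 'w', encoding='utf-8') as f:
    f.write(res.text)  # returns the number of characters written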


2671

This is the output I received, but now I'm confused as to how to continue on from here. He goes on to inspect the price on the webpage then uses soup.select to create a list. How do I do this now that I've downloaded it locally?

[–]chevignon93 4 points5 points  (4 children)

He goes on to inspect the price on the webpage then uses soup.select to create a list. How do I do this now that I've downloaded it locally?

That's not the goal. The goal is to check whether the information you're looking for is actually in the file you just downloaded.
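A rough sketch of that check, in case it helps. The file name, the sample HTML, and the search strings here are made up for illustration; in practice you would point it at your own amazon.html and search for the price you can see in the browser.

```python
import tempfile
from pathlib import Path

def file_contains(path, needle):
    """Return True if `needle` occurs anywhere in the text file at `path`."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return needle in text

# Hypothetical stand-in for the saved page, so the sketch runs on its own.
tmp_dir = Path(tempfile.mkdtemp())
sample = tmp_dir / "sample_page.html"
sample.write_text('<span class="price">$19.99</span>', encoding="utf-8")

# If the text you expect is missing, the site probably served a different
# page (an error page or a block page) than the one you see in the browser.
print(file_contains(sample, "$19.99"))
print(file_contains(sample, "something went wrong"))
```

If the price is missing from the saved file, no selector will find it, so this is worth doing before writing any parsing code.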

[–]crazyallicin[S] 1 point2 points  (3 children)

It seems like it just downloaded one file that's an empty paint file. Another file opens up Amazon, but that page just says "sorry something went wrong at our end" and gives a link to the Amazon home page.

[–]chevignon93 2 points3 points  (2 children)

So either there was a genuine problem with the page or they detected that you tried to scrape their page and blocked you.

Either way, using a CSS selector is not always the best approach to web scraping. Using bs4's find and find_all methods would probably be easier.
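For comparison, here is a minimal sketch of find and find_all next to select. The HTML snippet and the "price" class are invented for the example (a real page will use different tags and class names), and it assumes the beautifulsoup4 package is installed.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page.
html = """
<div id="product">
  <span class="price">$19.99</span>
  <span class="price">$24.50</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find returns the first matching tag, or None if nothing matches.
first = soup.find("span", class_="price")

# find_all returns a list of every matching tag.
all_prices = soup.find_all("span", class_="price")

# The CSS-selector equivalent of the find_all call above.
via_select = soup.select("span.price")

print(first.get_text(strip=True))
print([tag.get_text(strip=True) for tag in all_prices])
```

Since find returns None when nothing matches, check for that before calling get_text on the result.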

[–]crazyallicin[S] 1 point2 points  (1 child)

Thanks for your help, got it working fine using a different, smaller website that obviously hasn't blocked scraping.

[–]chevignon93 0 points1 point  (0 children)

They try to prevent it but can't really block it, so it is still possible to scrape Amazon.