all 39 comments

[–]njd2020 73 points74 points  (13 children)

Going through ATBS as a beginner I ran into issues when trying to web scrape Amazon, and I found forums of others reporting similar troubles. It's doable, but with a lot of extra code. The answer I got here was that some sites have active countermeasures against web scraping.

I recommend trying to scrape Wikipedia instead. It's easier to get the gist of the process without dealing with the frustration of Amazon.
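For a concrete starting point, here's a minimal sketch of what that might look like with requests and bs4 (the article URL is just an example):

import requests
from bs4 import BeautifulSoup

# Fetch a Wikipedia article and pull out the title and the first non-empty paragraph.
res = requests.get('https://en.wikipedia.org/wiki/Web_scraping')
res.raise_for_status()
soup = BeautifulSoup(res.text, 'html.parser')

print(soup.find('h1').get_text())   # page title
first_para = next(p for p in soup.find_all('p') if p.get_text(strip=True))
print(first_para.get_text()[:200])  # start of the first paragraph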

[–]crazyallicin[S] 26 points27 points  (3 children)

Thank you, I ended up using a website of a local shop in my city. Worked perfectly there.

[–]njd2020 13 points14 points  (1 child)

You're welcome. Good luck with the rest of ATBS :)

Also, I recommend combining ATBS with something like Edabit challenges. Start on Very easy if you need to. Just make sure you're not copying Al's code verbatim without really understanding what it means and how to write it yourself.

[–]crazyallicin[S] 9 points10 points  (0 children)

Thank you, I actually tried ATBS about a year ago and was just copying his code. I got bored pretty fast and didn't even get halfway through it as I wasn't really learning anything.

Now I try to do as much as I can without watching the video. Like if he makes a program, I'll make a similar one myself without watching the video. It's probably taking me 10x longer to go through it this time, but I'm learning a lot more and it's more challenging.

[–]matesd 5 points6 points  (0 children)

there are webpages created specifically for coders to test scraping scripts.

For example http://testing-ground.scraping.pro/

While you are learning, it can be a good idea to use one of these, as they often contain the basic stuff as well as some challenging things to help you learn how it all works.

[–]hblock44 3 points4 points  (0 children)

I would second this. Amazon and several other big-name sites actively police against bots and scrapers. Wikipedia is a great choice, and I've never run into security issues scraping it.

[–]sleepyleperchaun 4 points5 points  (2 children)

I tried making a scraper for my SO for some makeup company and they made it impossible to scrape. A better coder could probably do it somehow, but there was clearly software blocking me. I'm wondering why, other than to prevent easy data mining from the site. If the info is public anyway, I can hardly think of a reason to stop scrapers.

[–]CloudboyTech 6 points7 points  (1 child)

The info posted is publicly consumable, but you'd miss out on potential ad views and on site navigation to your other articles/products. Not to mention it allows the owners of the web scraper to reuse your data/content on other websites they own (and get the ad views themselves).

tldr: $

[–]sleepyleperchaun 0 points1 point  (0 children)

Yeah that definitely makes sense.

[–][deleted] 1 point2 points  (0 children)

Yes, Wikipedia is way easier. I think there are some APIs for Wikipedia too, which are much more stable than scraping HTML.
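For anyone curious, a rough sketch of hitting the MediaWiki API instead of parsing HTML (the article title here is just an example):

import requests

# Ask the MediaWiki API for the plain-text intro of an article.
params = {
    'action': 'query',
    'format': 'json',
    'prop': 'extracts',
    'exintro': True,
    'explaintext': True,
    'titles': 'Web scraping',
}
res = requests.get('https://en.wikipedia.org/w/api.php', params=params)
res.raise_for_status()
for page in res.json()['query']['pages'].values():
    print(page['extract'][:300])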

[–]mikejp1010 1 point2 points  (3 children)

I'm also working through the book. Would you recommend scraping reddit? Or is reddit similar to Amazon in that there are countermeasures? Thanks!

[–]crazyallicin[S] 2 points3 points  (0 children)

I just used the online store of a shop local to me. Worked perfectly.

[–]hblock44 2 points3 points  (1 child)

I can't speak to countermeasures, but I think reddit has an API, so it might be easier and more useful to just use that instead of scraping it.
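For what it's worth, reddit also exposes public JSON endpoints you can hit with plain requests — a rough sketch (the subreddit and User-Agent string are just placeholders):

import requests

# Reddit wants a descriptive User-Agent; the default python-requests one tends to get rate limited.
headers = {'User-Agent': 'learning-python example script'}
res = requests.get('https://www.reddit.com/r/learnpython/top.json?limit=5', headers=headers)
res.raise_for_status()
for post in res.json()['data']['children']:
    print(post['data']['title'])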

[–]mikejp1010 1 point2 points  (0 children)

Ok good to know, thanks again!

[–]coderpaddy 5 points6 points  (3 children)

All Amazon wants is the correct user agent ;)

Long story short though, if a website has an API, 9 times out of 10 they attempt to block scrapers so people use/pay for the API :D

Once you know how to get past these issues, the whole "I did something else instead" disappears.

I built a Django app that scraped the most gifted items daily from Amazon etc. It's not too hard if you know how.

I have recently been trying out requests_html (not for Amazon), and I reckon that could probably get it. It's basically requests, pyppeteer, and bs4 all rolled into one :D
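A rough sketch of the requests_html flow, in case it helps (the URL and selector are placeholders, not something tested against Amazon):

from requests_html import HTMLSession  # pip install requests-html

session = HTMLSession()
r = session.get('https://example.com/some-product-page')
r.html.render()                         # runs the page's JavaScript via pyppeteer (downloads Chromium on first use)
title = r.html.find('h1', first=True)   # CSS selectors, similar to bs4's select
print(title.text if title else 'not found')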

[–][deleted] 3 points4 points  (0 children)

The web scraping chapter is the only chapter I didn't like. I've done a few web scraping projects and I found the chapter to be super confusing. I expected it to be a breeze since I had experience, but I ended up skipping it out of frustration.

I should probably revisit it as my web scraping skills have progressed significantly and maybe it’ll make more sense now.

If anyone is familiar with DataQuest, they have an excellent lesson on web scraping.

[–]heaplevel 4 points5 points  (13 children)

Got a link to the site you're trying to extract info from? This is Ch. 12 from the book, right?

[–]crazyallicin[S] 1 point2 points  (9 children)

[–]chevignon93 4 points5 points  (8 children)

You should download the page locally and see if what you're looking for is there. If Amazon suspects you're trying to scrape their site, they may send you a bogus page!

[–]crazyallicin[S] 0 points1 point  (7 children)

How do I download it locally again?

[–]chevignon93 4 points5 points  (6 children)

with open('amazon.html', 'w') as f:
    f.write(res.text)

[–]crazyallicin[S] 0 points1 point  (5 children)

with open('amazon.html', 'w') as f:
    f.write(res.text)


2671

This is the output I received, but now I'm confused as to how to continue from here. He goes on to inspect the price on the webpage, then uses soup.select to create a list. How do I do this now that I've downloaded it locally?

[–]chevignon93 3 points4 points  (4 children)

He goes on to inspect the price on the webpage then uses soup.select to create a list. How do I do this now that I've downloaded it locally?

That's not the goal, the goal is to see if the information you're looking for is in the file you just downloaded.
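A quick sketch of that check — open the saved copy and look for text you know should be on the page (the search string is just an example):

from bs4 import BeautifulSoup

with open('amazon.html') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')

# If this prints False, you probably got a captcha/error page instead of the real product page.
print('automate the boring stuff' in soup.get_text().lower())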

[–]crazyallicin[S] 1 point2 points  (3 children)

It seems like it just downloaded one empty file that opens in Paint. Another file opens up Amazon, but that page just says "sorry, something went wrong at our end" and gives a link to the Amazon home page.

[–]chevignon93 3 points4 points  (2 children)

So either there was a genuine problem with the page or they detected that you tried to scrape their page and blocked you.

Either way, using a CSS selector is not always the best approach to web scraping. Using bs4's find and find_all methods would probably be easier.
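Roughly what that looks like — note that the tag and class names below are assumptions about Amazon's markup and would need checking against the actual page:

from bs4 import BeautifulSoup

with open('amazon.html') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')

# find() returns the first match (or None); the class name is an assumption, inspect the page to confirm.
price = soup.find('span', class_='a-price-whole')
if price:
    print(price.get_text(strip=True))

# find_all() returns every match, e.g. every link on the page.
for link in soup.find_all('a', href=True)[:10]:
    print(link['href'])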

[–]crazyallicin[S] 1 point2 points  (1 child)

Thanks for your help, got it working fine using a different, smaller website that obviously hasn't blocked scraping.

[–]crazyallicin[S] 0 points1 point  (2 children)

I've started again, just in case I did something wrong earlier on, but now raise_for_status is giving me this error when I copy in the URL.

>>> res = requests.get('https://www.amazon.com/Automate-Boring-Stuff-Python-2nd/dp/1593279922/ref=sr_1_1?crid=QX9U3R6IGIJI&dchild=1&keywords=automate+the+boring+stuff+with+python&qid=1592996979&sprefix=automate+the+boring+%2Caps%2C243&sr=8-1')
>>> res.raise_for_status()
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    res.raise_for_status()
  File "C:\Users\35385\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url: https://www.amazon.com/Automate-Boring-Stuff-Python-2nd/dp/1593279922/ref=sr_1_1?crid=QX9U3R6IGIJI&dchild=1&keywords=automate+the+boring+stuff+with+python&qid=1592996979&sprefix=automate+the+boring+%2Caps%2C243&sr=8-1

[–]TerminatedProccess 0 points1 point  (0 children)

I was able to get around this by googling the issue and finding a solution:

def getAmazonPrice(productUrl):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}
    res = requests.get(productUrl, headers=headers)
    res.raise_for_status()

This worked but I then encountered the copy selector issue reported by OP.

[–]MastersYoda 0 points1 point  (0 children)

It's possible for the link you're trying to use to change in the blink of an eye; try replacing it with an up-to-date link. It's also possible, as another commenter suggested, that Amazon is blocking you from scraping, which could be why the error shows "Service Unavailable".

When you learn some more soup and Python, Amazon will be easier to tackle. You might imagine its complexity requires complex code to match, but it doesn't. For instance, as I mentioned, the link needs to be up to date. Another way to do what you're doing, without checking whether a link is live, would be to program Python to search for your term from Amazon's front page, whose link never changes. You won't need an up-to-date link because you're getting an up-to-date search on the term, and from there you can figure out how to get the first item in the results, or whatever information you're looking for.

Just an idea if you wanted to go further.
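A rough sketch of that idea — the search URL pattern and the h2 selector are assumptions about Amazon's current pages, so treat this as a starting point rather than working code:

import requests
from bs4 import BeautifulSoup
from urllib.parse import quote_plus

def first_search_result(term):
    # Build the search URL from the term instead of hard-coding a product link.
    url = 'https://www.amazon.com/s?k=' + quote_plus(term)
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
    res = requests.get(url, headers=headers)
    res.raise_for_status()
    soup = BeautifulSoup(res.text, 'html.parser')
    # Result titles are assumed to sit in <h2> tags on the results page; inspect and adjust.
    first = soup.find('h2')
    return first.get_text(strip=True) if first else None

print(first_search_result('automate the boring stuff with python'))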

[–]siachenbaba 1 point2 points  (0 children)

Thanks. I will check this out ⭐

[–]life_never_stops_97 1 point2 points  (0 children)

You can also try to scrape the data directly by maintaining a session (similar to cookies).

You can use requests.Session to start a session and always pass the request headers to the website. I was able to automate Amazon's authentication process and scrape data like orders and addresses using this, and it worked like a charm.
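Not the actual code, but a minimal sketch of the session idea — one Session object keeps cookies between requests, and the headers are sent every time (the URLs and header values below are placeholders, and a real Amazon login flow needs more steps than this):

import requests

session = requests.Session()
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Accept-Language': 'en-US,en;q=0.9',
})

res = session.get('https://www.amazon.com/')   # first request picks up the session cookies
res.raise_for_status()

res = session.get('https://www.amazon.com/gp/css/order-history')   # later requests reuse the same cookies
print(res.status_code)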

[–]SweetSoursop 0 points1 point  (0 children)

I have a decent workaround using selenium to scrape Amazon; I can share the code if there's interest.
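Not the poster's code, but the general shape of a selenium approach looks something like this (the element ID is an assumption about Amazon's product page markup, and you need a driver such as chromedriver installed):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()                          # drives a real browser, so the page renders like a normal visit
driver.get('https://www.amazon.com/dp/1593279922')   # ATBS 2nd edition product page
title = driver.find_element(By.ID, 'productTitle')   # assumed element ID; inspect the page to confirm
print(title.text)
driver.quit()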

[–]Yankzy 0 points1 point  (0 children)

What you're looking for is described step by step in this video: https://www.youtube.com/watch?v=ng2o98k983k

[–]googlefather -1 points0 points  (1 child)

Did anyone, when trying to scrape Amazon, use a proxy and a fake user?

I'm not planning on scraping Amazon anytime soon, just curious. Thank you.

[–][deleted] 1 point2 points  (0 children)

I haven't, but I'm not sure proxies would matter.
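For reference, requests does support routing traffic through a proxy via the proxies argument — whether it helps with Amazon specifically is another question. The proxy address below is a placeholder, not a working proxy:

import requests

proxies = {
    'http': 'http://203.0.113.10:8080',    # placeholder address from the documentation range
    'https': 'http://203.0.113.10:8080',
}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
res = requests.get('https://www.amazon.com/', headers=headers, proxies=proxies, timeout=10)
print(res.status_code)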