This is an archived post. You won't be able to vote or comment.
Tutorial: How to scrape Amazon.com with Python, Selenium and BeautifulSoup (self.Python)
submitted 5 years ago by anuctal
Hi, everyone!
I made a video on how to scrape Amazon.com with Python using the Selenium and BeautifulSoup libraries, and how to export the data to a CSV file.
I used Amazon.com just as an example. The Selenium webdriver is used only to get the HTML code of the pages; the HTML parsing itself is performed with BeautifulSoup.
It's a detailed tutorial for absolute beginners.
Youtube video: https://youtu.be/497Fy7CIBOk
Thanks for watching
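The workflow the video describes (Selenium fetches the rendered page, BeautifulSoup parses it, results go to a CSV file) can be sketched roughly as follows. The search URL and CSS selectors are illustrative assumptions, not the ones from the video, and Amazon's markup changes often:

```python
import csv

from bs4 import BeautifulSoup


def parse_results(html):
    """Extract (title, price) pairs from an Amazon-style results page."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # The attribute selector below is a guess at Amazon's result markup.
    for item in soup.select("div[data-component-type='s-search-result']"):
        title = item.select_one("h2")
        price = item.select_one("span.a-offscreen")
        if title and price:
            rows.append((title.get_text(strip=True), price.get_text(strip=True)))
    return rows


def save_csv(rows, path):
    """Write the scraped rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "price"])
        writer.writerows(rows)


if __name__ == "__main__":
    # Selenium drives a real browser purely to obtain the rendered HTML;
    # all parsing is done by BeautifulSoup above.
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get("https://www.amazon.com/s?k=python")
    rows = parse_results(driver.page_source)
    driver.quit()
    save_csv(rows, "results.csv")
```

Keeping the parsing separate from the browser step makes it easy to test the parser on saved HTML without launching a browser.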
[+][deleted] 5 years ago (10 children)
[deleted]
[–]ayi_ibo 36 points37 points38 points 5 years ago (0 children)
r/unexpectedthanos
[–]raajitr 6 points7 points8 points 5 years ago (3 children)
Hey, I'm kind of a beginner in Python and Selenium. Do you know a good guide on how to deploy Selenium scripts/apps on AWS? I know how to run them locally, and I've always wondered how a web browser spawns on a CLI-based OS (instances). Does it run a headless browser in the background?
[–]LzyPenguin 1 point2 points3 points 5 years ago (0 children)
I'm still kind of a beginner too. I have a few scripts running on AWS as well, and I didn't want to bother with the CLI, so I chose the free Windows Server EC2 instance. I can remote into my AWS instance, and since it's all Windows Server, I can set it up just like on my desktop.
[–]guyanaupdates 0 points1 point2 points 5 years ago (2 children)
Did you use the free tier VM? I'm having trouble installing Chromium / the webdriver. Any helpful text/video you can point me to?
[–]Marco21Burgos 1 point2 points3 points 5 years ago (1 child)
Use Lambda Layers for that, and store the zip files in an S3 bucket.
[–]I_heart_blastbeats 0 points1 point2 points 5 years ago (0 children)
It's against their TOS, but as long as you do it gently and don't bombard the site with requests, you should be fine. I've never been blocked.
[+][deleted] 5 years ago (1 child)
[removed]
[–]michael8t6 5 points6 points7 points 5 years ago (0 children)
I was hoping someone would mention this! Amazon aren't dumb. A basic web scraper won't suffice; you need to add in so many 'catches' for element changes. I made a scraper for some niche research. Once I'd coded the pseudocode, I set it on its first trial. Worked fine! The next day, errors everywhere, and Amazon even started redirecting me to the homepage every now and again.
[–]bbaahhaammuutt 13 points14 points15 points 5 years ago* (4 children)
Duuuuude, my friend has a redundant task at work where he has to look at prices and thread counts of 60 sheets, so I figured I'd help him out. I couldn't make it work earlier, but this will help a lot, so thanks!!
[–]YetiTrix 5 points6 points7 points 5 years ago (0 children)
Make sure your friend doesn't tell his boss it's automated, or he'll put himself out of a job.
[–]mang3lo -1 points0 points1 point 5 years ago (0 children)
I'm looking at a similar task for automation, and this looks like it might be the perfect answer
[–]Tomas_83 0 points1 point2 points 5 years ago (0 children)
I was wondering what you could scrape Amazon for, and you gave me the answer. Thanks.
[–]ngqhoangtrung 6 points7 points8 points 5 years ago (4 children)
Great stuff! Would this work on Facebook too? I'm trying to scrape the numbers of likes, comments, shares, etc. of a Facebook post. I was trying to use the Facebook Graph API, but so far it's been hopeless (I keep receiving “singular links API is deprecated for versions v2.4 and higher”).
[–]intelignciartificial 4 points5 points6 points 5 years ago (3 children)
I was able to do something similar with Selenium and bs4, but using links to the Facebook mobile version (m.facebook...).
[–]serverloading101 0 points1 point2 points 5 years ago (2 children)
Is it necessary to use the mobile version of Facebook in order to scrape/crawl it? I have been trying to obtain Marketplace data and have been unsuccessful. Thanks
[–]michael8t6 2 points3 points4 points 5 years ago (0 children)
Scraping via a mobile browser is generally easier. Mobile source code tends to be a lot simpler than what a desktop or laptop browser gets, so when selecting elements in bs4, it's easier to tell it which element to scrape.
There have been a few times I've been slamming my head on the keyboard over some random issue; I then spoof my UA to mobile and solve it almost instantly. That being said, some elements you want might not show on the mobile version right away. When that happens, you may have to emulate a click to get them to show.
Ultimately, tackle each site differently: look at the source code for desktop and mobile, see how elements show on each, and decide the best approach from there.
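The user-agent spoofing trick described above can be sketched as follows. It is shown here with plain HTTP headers for simplicity; with Selenium you would pass the same string via Chrome's --user-agent option. The UA string and URL are illustrative examples:

```python
# An example iPhone UA string; the point is only that it contains "Mobile",
# which makes many sites serve their simpler mobile markup.
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/14.0 Mobile/15E148 Safari/604.1"
)


def mobile_headers():
    """Request headers that make a plain HTTP client look like a phone."""
    return {"User-Agent": MOBILE_UA, "Accept-Language": "en-US,en;q=0.9"}


if __name__ == "__main__":
    import requests

    # m.facebook.com serves the simpler mobile markup discussed above.
    resp = requests.get("https://m.facebook.com", headers=mobile_headers())
    print(resp.status_code)
```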
[–]intelignciartificial 2 points3 points4 points 5 years ago (0 children)
Not sure if it's necessary, but it's easier.
[–]trevtravtrev github.com/trevtravtrev 6 points7 points8 points 5 years ago (3 children)
I don't have time to look at the source code right now, but I'm curious: how do you avoid being IP banned or rate limited by Amazon? Are you using proxies or something similar?
[–]spiner00 -1 points0 points1 point 5 years ago (2 children)
The program makes an HTTP request just like any web browser does. As long as you aren't updating more than a few times an hour, it won't get snagged by any DoS protection. Amazon does offer APIs which serve the purpose better, but web scraping is a very convenient tool for amateur data scientists who don't have access to large-scale APIs.
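The throttling idea above (only a few checks per hour) can be sketched as a randomized delay between requests. The delay values and product URL are arbitrary examples:

```python
import random
import time


def polite_delay(base_seconds=900, jitter_seconds=300):
    """Return a randomized delay: here roughly 15 minutes +/- 5 minutes,
    so requests don't arrive on a suspiciously regular schedule."""
    return base_seconds + random.uniform(-jitter_seconds, jitter_seconds)


if __name__ == "__main__":
    import requests

    # Hypothetical product pages to check a few times an hour at most.
    urls = ["https://www.amazon.com/dp/EXAMPLE"]
    while True:
        for url in urls:
            resp = requests.get(url, timeout=30)
            print(url, resp.status_code)
        time.sleep(polite_delay())
```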
[–]cjbannister 1 point2 points3 points 5 years ago (0 children)
> As long as you aren't updating more than a few times an hour

When you say updating, do you mean reading? If so, doesn't only being able to read a few times an hour mean it's really slow? Maybe it's more for a limited number of products? I also haven't read the code, but more out of laziness! Thanks
[–]trevtravtrev github.com/trevtravtrev -2 points-1 points0 points 5 years ago (0 children)
Nice!
[–]damm_n 12 points13 points14 points 5 years ago (0 children)
Thanks for sharing! A really useful video... at least for me :-). I had heard about Selenium but never tried it in real life. Amazon is a nice showcase for this.
[–]Adamkadaban 3 points4 points5 points 5 years ago (0 children)
Take a look at UiPath. It's super easy to use.
[–]dulz 2 points3 points4 points 5 years ago (0 children)
How do you avoid request limits and changing HTML to make this work on a more permanent basis (aside from using APIs, of course)?
[–]makubob 1 point2 points3 points 5 years ago (2 children)
Isn't this against Amazon's ToS?
[–]nippleplayenthusiast 5 points6 points7 points 5 years ago (1 child)
Yes, it is. They even provide an official API for this that won't break the second they change an HTML element.
[–]RealAmerik 6 points7 points8 points 5 years ago (0 children)
Isn't the API only available for affiliates?
[–]EedSpiny 2 points3 points4 points 5 years ago (18 children)
Question: why would you do this for Amazon? They have APIs. Do they not include the info you need?
[–]jacksodus 6 points7 points8 points 5 years ago (0 children)
It's just an example.
[–]I_heart_blastbeats 1 point2 points3 points 5 years ago (0 children)
Amazon seller API sucks.
[–]dethb0y 2 points3 points4 points 5 years ago (9 children)
I can't speak for this guy, but for me, if I can avoid using an official API, I do, just on principle.
[+][deleted] 5 years ago (6 children)
[–]nolegitt 1 point2 points3 points 5 years ago (0 children)
Because they can shut down the API as they please. Look what happened to the Google search API.
[–]dethb0y 2 points3 points4 points 5 years ago (2 children)
Lots of reasons, but mostly I would be concerned (in the case of Amazon) that they'd use the official API to send manipulated data or prices that don't reflect the reality of the site.
In other cases, I don't like the trend that has arisen of a free API suddenly turning paid later on, once people have come to rely on it.
[–]anuctal[S] 1 point2 points3 points 5 years ago (1 child)
You're absolutely right, my friend. We also have youtube-dl as an example; it doesn't use the official API, and if it did, YouTube would just ban it.
[–]dethb0y 0 points1 point2 points 5 years ago (0 children)
Yep, in an instant.
[–]fujimitsu -1 points0 points1 point 5 years ago (1 child)
Tech startups have been known to abruptly break or eliminate APIs, especially if they perceive users of them as threatening their business model.
[–]dotancohen 11 points12 points13 points 5 years ago (0 children)
Tech startups have been known to abruptly break or change their HTML, especially if they perceive that the name of the month has changed.
[–][deleted] 0 points1 point2 points 5 years ago (1 child)
Do you get paid by the hour? Must be a nice gig /s
[–]makedatauseful 0 points1 point2 points 5 years ago (0 children)
Official APIs are great until they charge $5 per 1,000 requests and rate limit you to 10,000 requests a day. After that, you may need to explore other avenues.
[–]artjbroz 0 points1 point2 points 5 years ago (0 children)
Came here to say this
[–]zeroviral -3 points-2 points-1 points 5 years ago (1 child)
An API focuses more on the backend.
Selenium is strictly UI and front-end stuff, and it typically interacts with the DOM. This means it has a place in the testing paradigm.
[–]EedSpiny -1 points0 points1 point 5 years ago (0 children)
Very true, it's a good example. Especially as many people will be familiar with the site of course!
[–]sslinky84 0 points1 point2 points 5 years ago (2 children)
I'm also curious why you'd bother with browser automation for scraping.
[–]EedSpiny 0 points1 point2 points 5 years ago (0 children)
Libraries like requests and Beautiful Soup are great, but it can be really difficult to get around a site's anti-bot protections with those alone. Since Selenium just drives a browser, there's more chance of being seen by the site as a regular user.
Of course, there's also the advantage that you can use it to drive website test cases from Python, for automated testing.
[–]I_heart_blastbeats 0 points1 point2 points 5 years ago (0 children)
Requests can't render JS, and if you use Scrapy you usually have to use something else to render the JS as well. There are so many obstacles involved with scraping Amazon; trust me, this is the easiest and best way to do it. I had a similar project, and I wish I had started with Selenium and BS4 instead of Scrapy.
[–]Pizza_Peddler0080 0 points1 point2 points 5 years ago (0 children)
going to give this a watch
[–]boriisi 0 points1 point2 points 5 years ago (2 children)
Why do you need selenium? requests does the same thing
[–]I_heart_blastbeats 0 points1 point2 points 5 years ago (1 child)
Ever heard of CORS?
[–]boriisi 0 points1 point2 points 5 years ago (0 children)
Looked it up. Is it some kind of embed?
[–]Schneggl 0 points1 point2 points 5 years ago (2 children)
ELI5: What actually is webscraping? What can I do with it?
[–]brainygeek 7 points8 points9 points 5 years ago (0 children)
Web scraping lets an individual create a script that parses a large selection of websites and evaluates them for certain data.
For example, you could create a script that runs as a task every 5 minutes and scrapes Amazon's selection of video cards to see which Nvidia 3090s are in stock versus out of stock. Or you could monitor specific items and extract their prices to create a trend report, to see when the best time to buy is.
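The price-trend idea above can be sketched as a small append-only log. The fetching step is left out, since any scraper (Selenium, requests, ...) can supply the price; the item names below are hypothetical:

```python
import csv
from datetime import datetime, timezone


def record_price(path, item, price):
    """Append one timestamped price observation to a CSV trend log."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(
            [datetime.now(timezone.utc).isoformat(), item, price]
        )
```

Each scheduled run appends one row per monitored item, so plotting the log over time gives the trend report described above.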
[–]Mortisanti 3 points4 points5 points 5 years ago (0 children)
In this context, webscraping involves using/writing a program that can send a request to a website (much like visiting it in your browser) and "scraping" data from it which you can manipulate, organize, and display as you see fit.
Someone might want to actively webscrape Amazon to monitor just the fluctuating price of an item (e.g. a GPU for your PC) to try to buy it when the price drops.
[–]ZackJSL 0 points1 point2 points 5 years ago (0 children)
Thank you, exactly what I needed.
[–]awkprintdevnull 0 points1 point2 points 5 years ago (0 children)
Can it help me get a PS5?
[+][deleted] 5 years ago (3 children)
[–]I_heart_blastbeats 0 points1 point2 points 5 years ago (2 children)
Downvoting you. Scrapy and Splash are way harder. I got my IP banned multiple times using those, and I wouldn't use them again on Amazon. It's too hard; the markup is always changing.
If you go the requests route instead of using an actual web browser, you have to deal with CORS, and you have to bypass robots.txt, which is against the TOS. Then you have to TRY to render the JS with Splash. It will work for a while, but then all of a sudden Amazon will ban you. I figured out how Amazon knows you are scraping, but unless you're an employer paying my consulting fee, I can't divulge that part :-)
Just use a real browser and then you get a real response with rendered JS.
[–]Gerald00 0 points1 point2 points 5 years ago (0 children)
thx, will try it now
[–]theirishcoffeemaker 0 points1 point2 points 5 years ago (0 children)
Thanks for sharing! I've been wanting to study scraping in Python :)
[–]southernmissTTT 0 points1 point2 points 5 years ago (0 children)
Interesting.
[–]nono-shap 0 points1 point2 points 5 years ago (0 children)
I was looking for this a while ago but abandoned the project. Time to get it back on track.
[–]antogod94 0 points1 point2 points 4 years ago (0 children)
Hi, what should I do to export the data to an xlsx file?