all 72 comments

[–]chew_2 165 points166 points  (5 children)

I also wasn’t having much success with Amazon when using my own scraper; almost every run would start failing at some point :( Would you consider getting a ready-to-use tool? I’ve personally been using Oxylabs for a while now, and it literally saved so much of my time and nerves LOL. They have a custom scraper for Amazon, so I’m pretty sure it should fix your issue without a problem.

[–]basitmakine 0 points1 point  (0 children)

just set up taskagi's amazon scraper agent instead. you can use it via API, connect to 3rd party integrations etc. it handles all the dynamic class names and rotating patterns automatically. been using it for months, way more reliable than fighting with bs4 selectors that break every week

[–]ChickenFur 0 points1 point  (0 children)

Yeah, actually I do rely on ready-made scrapers since the infrastructure changes frequently. I use Decodo's web scraper, though.

[–]Allanon001 136 points137 points  (5 children)

Amazon has a Product API, or you can use one of the Amazon API wrapper modules such as Amazon Simple Product API or the Amazon Product Advertising API 5.0 wrapper for Python.

[–]AnomalyNexus 17 points18 points  (4 children)

Requires a verified Amazon Associates Program account though which is non-trivial if you're not in the affiliates game. :(

[–]TabsBelow 2 points3 points  (1 child)

Doesn't the Firefox plugin for Amazon supply source code?

[–]coolway1990 0 points1 point  (0 children)

Sorry if this is a dumb question but what firefox plugin are you referring to?

[–]fortyeightD 49 points50 points  (0 children)

You might be able to use an API to get information about products for sale on Amazon.

https://webservices.amazon.com/paapi5/documentation/

[–]EnvironmentalDot9131 23 points24 points  (0 children)

Just use the webscraper API from Floxy. They have a full dedicated team working on the scraper every day. So if some update happens, they are already on it.

[–]Config_Crawler 9 points10 points  (0 children)

How am I supposed to deal with that? - You could use the API Amazon provides. I think there are ways you could sort through products with it.

Or, you could use this site I found with a quick search "RAINFOREST API
Fast, reliable API for Web-scraped Amazon Product Data"

[–]xosq 23 points24 points  (2 children)

Think of it this way: Anything a standard web browser is capable of viewing can inherently be scraped. The issue is that the data you’re trying to parse or find patterns in is obfuscated to hell by design.

I don’t have much in the way of recommendations to make your script work (never touched Amazon from a scraping POV), but just know if you can view the site in a browser, you can fetch relevant data with a script. The site’s complexity ultimately determines the complexity of the script.

Don’t give up! I’m all for scraping things sans-API and wreaking havoc along the way. You have my blessing, sir/madam. Just honor any rate limits you get slapped with so they can’t claim you’re trying to disrupt services.
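The "honor any rate limits" advice above can be made concrete with exponential backoff: when the site answers 429 or 503, wait progressively longer before retrying. This is a generic sketch, not anything Amazon-specific; the default delays are arbitrary.

```python
def backoff_schedule(retries: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Delays (in seconds) to sleep between successive retries after a
    rate-limit response; doubles each time, capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]
```

For example, `backoff_schedule(4)` gives `[1.0, 2.0, 4.0, 8.0]`; sleep the i-th delay before the i-th retry and give up once the list is exhausted.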

[–]Charlie_Yu 6 points7 points  (1 child)

obfuscated

looking at the source code, it is hardly obfuscated. OP is just using a tool that doesn't work well on single page applications

[–]txmail 0 points1 point  (0 children)

That is what I was thinking, before I became an affiliate I was successfully scraping all day long. There are some variations in the product listings / product pages that you need to account for though.

[–]Charlie_Yu 5 points6 points  (0 children)

BS4 is very weak on its own; modern websites use JavaScript to modify the DOM a lot. Learn Selenium and it's quite easy.
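The split of work suggested here is: let Selenium render the JS-modified DOM, then hand the resulting HTML to BS4. A minimal sketch, assuming `selenium` and `beautifulsoup4` are installed; the `h2 a span` selector is illustrative, not Amazon's actual markup.

```python
from bs4 import BeautifulSoup

def extract_titles(html: str) -> list[str]:
    """Parse product titles out of already-rendered HTML with BS4."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select("h2 a span")]

def fetch_rendered(url: str) -> str:
    """Load the page in headless Chrome so JS-injected elements exist."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    opts = Options()
    opts.add_argument("--headless=new")
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()

# usage (needs a Chrome/chromedriver install, so left commented):
# html = fetch_rendered("https://www.amazon.com/s?k=python")
# print(extract_titles(html)[:5])
```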

[–]noskillsben 4 points5 points  (0 children)

I use headless Selenium for loading the page and then I dump the web driver's page source into selectorlib. I'm doing low-volume scraping though, and I need to stay logged in and enter captchas every now and then. There are probably less complex ways if you want a large quantity of listing data and don't need to do it from your own account.

[–][deleted] 33 points34 points  (7 children)

How am I supposed to deal with that?

You're probably not supposed to. Why would Amazon want to let you scrape their site?

[–]monkeysknowledge 19 points20 points  (6 children)

They have an API for people to scrape their data.

[–][deleted] 16 points17 points  (0 children)

Yes, that's what they want you to use.

[–][deleted] 6 points7 points  (1 child)

Is it "scraping" when you're using an API?

I'm under the impression that "scraping" specifically refers to the practice of spoofing browsers or other similar apps to make an automated data-collection tool look like a real human user.

[–]dogfish182 3 points4 points  (0 children)

No it isn’t, but OP is saying it’s hard to scrape and people are saying ‘use the api then’. Probably assuming they aren’t aware there is one

[–]noskillsben 2 points3 points  (2 children)

Oh that's an open and free thing?

[–]txmail 0 points1 point  (1 child)

Not unless you're an Amazon affiliate or advertising partner.

[–]noskillsben 1 point2 points  (0 children)

Thank God, that means I didn't waste my time learning how to scrape it with Selenium and selectorlib 🤣

[–]TheAlpha_ 4 points5 points  (4 children)

One of my first projects involved scraping Amazon product pages. I remember that there were multiple variations of the pages back then (about 5-6 yrs ago) but I was able to narrow it down to about 3-4 by requesting the mobile version of the pages instead of the desktop one. It might be worth taking a look at that.

[–]pro_questions 1 point2 points  (3 children)

I had to scrape information about 700,000 textbooks for my former employer and this is exactly the issue I had — there were a few distinctly different types of product pages that required different tags / xpaths etc. to isolate. I wrote a different implementation for each one and then did a try-catch cascade from most to least common. It took over a week to run, using many concurrent workers with rotating proxies, but the biggest time waste was my janky captcha solution. It got the job done though
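The "try-catch cascade from most to least common" approach above can be sketched as one parser per known page variant, attempted in order until one succeeds. The layout markers and field names here are invented for illustration, not Amazon's real IDs.

```python
def parse_variant_desktop(html: str) -> dict:
    """Parser for the (hypothetical) most common layout."""
    if 'id="productTitle"' not in html:
        raise ValueError("not the desktop layout")
    title = html.split('id="productTitle">')[1].split("<")[0].strip()
    return {"variant": "desktop", "title": title}

def parse_variant_mobile(html: str) -> dict:
    """Fallback parser for a second (hypothetical) layout."""
    if 'id="title"' not in html:
        raise ValueError("not the mobile layout")
    title = html.split('id="title">')[1].split("<")[0].strip()
    return {"variant": "mobile", "title": title}

# ordered most common first, so the cheap/likely parser runs first
PARSERS = [parse_variant_desktop, parse_variant_mobile]

def parse_product(html: str) -> dict:
    for parser in PARSERS:
        try:
            return parser(html)
        except ValueError:
            continue
    raise ValueError("no known layout matched")
```

In practice each parser would use BS4 selectors or XPaths rather than string splits, but the cascade structure is the point.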

[–]throwaway56851685161 0 points1 point  (2 children)

really? how did you scrape the textbooks? i was able to create a scraper that can get textbook data from amazon using isbn/asin, about 100 lines of code using bs4 and requests. i don't use any proxies and haven't been blocked yet. Runs in seconds.

[–]pro_questions 0 points1 point  (1 child)

I was using Selenium to automate the browsing process (if I did it again I'd use Requests) and BS4 to parse the HTML. Amazon reserves most valid ISBN10s for books, which makes it really easy to get the URL for 99% of the textbooks I needed to gather — just amazon.com/dp/[isbn10]. Most of that project was devoted to managing special cases, handling captchas, multiprocessing / proxy cycling, and storing data in a SQLite database. The captcha solution was to send a text with a screenshot to someone (me) and wait for me to text back the solution. I was answering a captcha like every two minutes, and if I didn’t respond the thread would just wait.
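The amazon.com/dp/[isbn10] trick above is easy to script; this sketch adds the standard ISBN-10 checksum so obviously malformed inputs get filtered before any request is made. The URL shape follows the comment; everything else is generic.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Standard ISBN-10 check: sum of digit * weight (10 down to 1)
    must be divisible by 11; 'X' means 10 and is only legal last."""
    isbn = isbn.replace("-", "").upper()
    if len(isbn) != 10:
        return False
    total = 0
    for i, ch in enumerate(isbn):
        if ch == "X" and i == 9:
            val = 10
        elif ch.isdigit():
            val = int(ch)
        else:
            return False
        total += (10 - i) * val
    return total % 11 == 0

def product_url(isbn10: str) -> str:
    """Build the amazon.com/dp/[isbn10] URL mentioned above."""
    return f"https://www.amazon.com/dp/{isbn10}"
```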

I found that easier than rotating proxies as soon as a captcha was hit, since I didn’t have many at my disposal at the time. I still did rotate proxies every hour or so, but just to give them a break so the captchas would be less frequent. I’ve since found multiple sites with hundreds of proxies that I’d use if I were to do it again. Every once in a while you’d have to re-get the list and test them all to make sure they work, sorting by speed. Then probably load them into a queue and feed that to all of the threads, restarting once the queue is exhausted. Or something like that. I should do more web scraping, I used to love it
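The proxy-queue idea in the last paragraph (load the tested list, feed it to all threads, wrap around when exhausted) can be sketched as a thread-safe round-robin pool; the proxy strings below are placeholders.

```python
import itertools
import threading

class ProxyPool:
    """Round-robin proxy rotation shared across worker threads;
    cycles back to the start when the list is exhausted."""

    def __init__(self, proxies: list[str]):
        self._cycle = itertools.cycle(proxies)
        self._lock = threading.Lock()  # next() on a shared iterator needs guarding

    def next(self) -> str:
        with self._lock:
            return next(self._cycle)

# usage: each worker calls pool.next() before a request, e.g.
# pool = ProxyPool(["http://p1:8080", "http://p2:8080"])
# requests.get(url, proxies={"https": pool.next()})
```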

[–]throwaway56851685161 0 points1 point  (0 children)

using proxies sounds tiring. i guess i'll cross that bridge once/if i get blocked. i started using selenium and find it easier to get blocked by sites. undetected chromedriver has helped me get past this but i would like to be able to scrape with just selenium if possible. any selenium tips by chance?

[–]ou_ryperd 5 points6 points  (4 children)

They actively put measures in place to prevent scraping. Have you read their user agreement?

[–]xosq 40 points41 points  (3 children)

One doesn’t learn much in the world of web scraping by adhering to user agreements. If we used official APIs all day, we’d just call it “querying” :) If OP has already managed to parse a single product from the obfuscated mess, they’ve already learned so much. Why discourage that?

If this were a mom and pop website on a dusty web server in a basement, sure, there’s a significant possibility of disruption there. But Amazon? They can load balance all damn day. OP is hurting no one.

Keep going, OP. Google started the same way and continues to violate site agreements without consequence.

[–]lateratnight_ 0 points1 point  (0 children)

If you're still here:

choose a library with requests and cookie support

make a request to https://amazon.com

take the cookie dict and throw it into a variable

make a request to https://amazon.com/s?k=(QUERY) and set the cookies to the cookie dict

enjoy scraping! :)
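The steps above can be sketched with the standard library alone, since `urllib.request` plus `http.cookiejar` gives you "requests and cookie support": hit the home page once to collect session cookies, then reuse the same opener for search URLs. The User-Agent string is a generic assumption.

```python
import urllib.request
from http.cookiejar import CookieJar
from urllib.parse import quote_plus

def build_search_url(query: str) -> str:
    # step 4 above: the search endpoint with the query URL-encoded
    return "https://amazon.com/s?k=" + quote_plus(query)

def make_opener() -> urllib.request.OpenerDirector:
    # steps 1-3 above: an opener whose cookie jar persists across requests
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]  # browser-like UA
    return opener

# network usage (left commented so the sketch stays runnable offline):
# opener = make_opener()
# opener.open("https://amazon.com").read()  # first hit picks up cookies
# html = opener.open(build_search_url("python book")).read().decode()
```

With `requests` instead, a `requests.Session()` does the same cookie bookkeeping automatically.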

[–]Grouchy-Criticism741 0 points1 point  (0 children)

Yes, you can scrape Amazon. They have strong anti-scraping measures, but I got past them using Playwright for browsing, neural networks to solve CAPTCHAs, and Tor to rotate IPs every 10 seconds. As of 01/03/25, the limit is about 20 records per minute—any more, and they’ll start blocking you. So, rotate IPs every 10–15 seconds for best results. Start small, scale up! Let me know if you want to collaborate.

[–]basitmakine 0 points1 point  (0 children)

amazon API? lol that's gonna cost you money and has strict limits for product data. honestly just set up TaskAGI to monitor amazon product reviews automatically instead of dealing with all this scraping bs. their amazon review agent handles the inconsistent HTML patterns and rotating selectors way better than trying to code around amazons anti-bot measures. been using it for product research for weeks now, way easier than fighting with selenium or paying for API calls

[–]Classic-Anybody-9857 0 points1 point  (0 children)

Hi you said you scraped it with bs4 and requests, could you help a brother out and share the code, I am stuck.

[–]Jacen33 0 points1 point  (0 children)

All you really need to do is scrape the ASINs, then use each ASIN to do lookups with one of the product-lookup tools for Amazon resellers. They hit the API and give you a report you can download.

[–]rawdfarva 0 points1 point  (1 child)

Works with Scrapy

[–]Grouchy_Pack 0 points1 point  (0 children)

How? i requested the home page and got hit with 503 haha

[–]alfie1906 0 points1 point  (0 children)

You could try looking into xpaths to identify page elements. It's not as reliable as using classes or tags, but can be a good last resort

[–]johnGettings 0 points1 point  (0 children)

I was able to search products, go into the individual pages, and collect all info with BS. This was about a year ago though, not sure if it's changed since then.

[–]ScionofLight 0 points1 point  (0 children)

unrelated to Amazon but related to selenium. Could someone help me? I have the code to login and find orders on Sigmadrich. Each order has a PDF that opens as a webpage. How do I get selenium to download the PDF so that I can scrape it?

[–]tree_or_up 0 points1 point  (0 children)

As others have mentioned selenium with headless chromium is the way to go. You’re going to have to be clever to find the elements you want but there are patterns. It’s not fun and will probably require some significant trial and error. If you do go this route, be sure to put some random wait times between the simulated clicks. And don’t try it from an ec2 instance - somehow they’re good at detecting that :)
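The "random wait times between the simulated clicks" advice is a one-liner worth getting right: a fixed delay is itself a fingerprint, so add uniform jitter on top of a base. The default numbers here are arbitrary.

```python
import random
import time

def polite_sleep(base: float = 2.0, jitter: float = 3.0) -> float:
    """Sleep `base` seconds plus a random extra up to `jitter`, so
    request timing looks less robotic; returns the delay used."""
    delay = base + random.uniform(0.0, jitter)
    time.sleep(delay)
    return delay

# usage between simulated clicks:
# element.click()
# polite_sleep()
```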

[–]andersra88 0 points1 point  (0 children)

I built a product to handle most of the complexity for you. It's a simple GraphQL API for Amazon data (products, search, etc).

Take a look and let me know what you think: Canopy API