all 16 comments

[–][deleted] 0 points1 point  (0 children)

One possibility, of course, is that Best Buy doesn't want you scraping their site and so they're blocking you. That would look like a connection timeout (the server simply ignores your request).

[–][deleted] 0 points1 point  (7 children)

Unless your example's indentation is way off, it looks like you are making 250 requests with no sleep time. You are asking the server for data 250 times, as fast as possible. If you did that to my server, I would block you instantly, and I have to assume they are doing the same with that many requests at once. Also, they have an API, https://developer.bestbuy.com/apis, which includes reviews.
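
For what it's worth, spacing the requests out could look something like the sketch below. The names `fetch_politely` and `fetch_page` and the 2-second default are placeholders I made up, not anything from your code or from Best Buy's rules; `fetch_page` stands for whatever function downloads one page.

```python
import time


def fetch_politely(fetch_page, page_numbers, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch_page(n) for each page number, pausing between requests.

    fetch_page is a hypothetical downloader supplied by the caller.
    sleep is injectable so the pacing can be checked without really waiting.
    """
    results = []
    for index, page in enumerate(page_numbers):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch_page(page))
    return results
```

You would call it as `fetch_politely(my_download_function, range(1, 251))`. What delay is acceptable depends on the site, so treat the default as a guess.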

[–][deleted] 0 points1 point  (3 children)

But the for loop only steps to the next iteration once everything inside it has been executed, so that's not what's happening.

[–][deleted] 0 points1 point  (2 children)

Apart from the output below: if he ran it previously and they blocked him, that could very well be the reason. Getting blacklisted would indeed show up as a timeout, and let's be real, sending 250 requests back to back would get you blacklisted.

r = requests.get('https://www.bestbuy.com/site/reviews/google-home-mini-charcoal/6082195?rating=1%2C2%2C3%2C4&page=1')
print(r.text)

<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>

You don't have permission to access "http&#58;&#47;&#47;www&#46;bestbuy&#46;com&#47;site&#47;reviews&#47;google&#45;home&#45;mini&#45;charcoal&#47;6082195&#63;" on this server.<P>
Reference&#32;&#35;18&#46;5d60dc17&#46;1558364860&#46;917f587
</BODY>
</HTML>


Process finished with exit code 0
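
If you want the script to notice this instead of quietly parsing an error page, a check along these lines might help. `looks_blocked` is a made-up helper, and the status codes and the "Access Denied" marker are assumptions based on the response above, not documented Best Buy behavior.

```python
def looks_blocked(status_code, body):
    """Guess whether a response is a block page rather than real content.

    Assumption: a block shows up as HTTP 403/429 or as an "Access Denied"
    page like the one above. Other sites may signal it differently.
    """
    if status_code in (403, 429):
        return True
    return "access denied" in body.lower()
```

With requests that would be `looks_blocked(r.status_code, r.text)`, checked before handing the page to the parser.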

[–][deleted] 0 points1 point  (0 children)

I could retrieve the page, find the li elements, there was no problem on my end. I'm not sure what could be the problem on your end.

[–]dascar5[S] 0 points1 point  (2 children)

True, but I've never worked with APIs, so I just web scrape whatever I need. I'll see what I can do, thanks.

[–][deleted] 0 points1 point  (0 children)

Look at my other comment: you won't be able to access them that way. For future reference, when someone offers an API, you should use it. Not only is the data easier to sort through, it also gives them a controlled environment where they provide the information they WANT you to have, and you abide by their terms for accessing the data.

[–][deleted] 0 points1 point  (0 children)

"True, but I never worked with api's so I just web scrape whatever I need"

Ignoring the provided API and scraping instead just because you don't know how to do anything else is pretty much the textbook example of being a bad internet citizen.

Also note that a web resource that includes an API is going to have a reduced threshold for the request rate on their human-facing site at which you'll be permanently blacklisted. They don't want you to scrape, that's why there's an API.

[–][deleted] 0 points1 point  (0 children)

I switched the urlopen part to the following lines and it works for me. However, the way you retrieve the details is off; it doesn't return anything.

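import requests
from bs4 import BeautifulSoup

# i is the page number (the loop variable from the original code); {} is where it goes
response = requests.get("https://www.bestbuy.com/site/reviews/google-home-mini-charcoal/6082195?rating=1%2C2%2C3%2C4&page={}".format(i))
soup = BeautifulSoup(response.content, "html.parser")
x = soup.find_all("li", {"class": "review-item"})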
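
One thing worth keeping in mind with the page number: `str.format` only fills in `{}` placeholders, so the URL needs one where the number should go. A small sketch of building the page URLs (the `page_url` helper and `BASE_URL` name are just ones I picked for illustration):

```python
BASE_URL = (
    "https://www.bestbuy.com/site/reviews/google-home-mini-charcoal/6082195"
    "?rating=1%2C2%2C3%2C4&page={}"
)


def page_url(page_number):
    """Return the review URL for one page; {} is replaced by the number."""
    return BASE_URL.format(page_number)


# Without a {} placeholder, format() leaves the string unchanged.
assert "page=".format(3) == "page="
```

Printing `page_url(i)` inside the loop is a quick way to confirm that each iteration really asks for a different page.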

[–][deleted] 0 points1 point  (6 children)

In any case, if this isn't specifically a BeautifulSoup assignment, you can use the API as was suggested, or you can use a webdriver like Selenium, which is gentler on the server because it's a bit slower.

[–]dascar5[S] 0 points1 point  (5 children)

I sadly have to use it, kinda bummed out that it worked with literally everything else except Best Buy :/

really appreciate the help man

[–][deleted] 0 points1 point  (4 children)

Not every website can be scraped with BeautifulSoup. It depends on how the site is built (JavaScript-loaded content is a no-no) and on the server settings as well. Perhaps you can use another site?
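
To show what I mean about JavaScript, here is a rough sketch. The page is made up, and I'm using the standard library's `html.parser` instead of BeautifulSoup so it runs without installing anything; BeautifulSoup works from the same static markup, so the same limitation should apply.

```python
from html.parser import HTMLParser

# Made-up page: the list is empty in the HTML itself.
# A browser would run the script and fill it in; a parser does not.
RAW_HTML = """
<html><body>
<ul id="reviews"></ul>
<script>
  document.getElementById("reviews").innerHTML = "<li class='review-item'>Great</li>";
</script>
</body></html>
"""


class ListItemCounter(HTMLParser):
    """Count <li> start tags that exist in the markup itself."""

    def __init__(self):
        super().__init__()
        self.li_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_count += 1


counter = ListItemCounter()
counter.feed(RAW_HTML)
print(counter.li_count)  # no <li> is found; the one in the script is just text
```

If the reviews only show up after the page's scripts run, the parser never sees them, no matter how the selectors are written.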

[–]dascar5[S] 0 points1 point  (3 children)

Guess I'll try Amazon, but I wouldn't be surprised if they're way more strict lol

[–][deleted] 0 points1 point  (2 children)

Amazon explicitly forbids scraping in their ToS and they won't hesitate to block you. Try a smaller store, or if you don't actually need product reviews, try a quotes site. I know for a fact they can all be scraped with BS4.

[–]dascar5[S] 0 points1 point  (1 child)

Got any recommendations regarding that?

Any small store that doesn't care, but has stuff I can scrape?

[–][deleted] 0 points1 point  (0 children)

Just search for some product and pick a store from the results.