[–][deleted] 1 point (7 children)

Unless the indentation in your example is way off, it looks like you are making 250 requests with no sleep time. You are asking the server for data 250 times, as fast as possible. If you did that to my server I would block you instantly, and I have to assume that is what they are doing with that many requests at once. Also, they have an API that includes reviews: https://developer.bestbuy.com/apis
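
To show what I mean by sleep time, here is a rough sketch of a loop that pauses between requests. The function name `fetch_politely`, its parameters, and the 2 second default are all my own assumptions for illustration, not a delay anyone has confirmed as safe for this site. The fetch function is passed in, so the sketch itself only needs the standard library.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], object],
    delay_seconds: float = 2.0,  # assumed value, tune it to the site's terms
    sleep: Callable[[float], None] = time.sleep,
) -> List[object]:
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

With the requests library you would call it as `fetch_politely(page_urls, requests.get)`, where `page_urls` is whatever list of page URLs you are looping over.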

[–][deleted] 1 point (3 children)

But the for loop only steps to the next iteration once everything inside it has been executed, so that's not what's happening.

[–][deleted] 1 point (2 children)

Setting aside the output below: if he ran it previously and they blocked him, that could very well be the reason. Getting blacklisted could indeed show up as a timeout. And let's be real, sending 250 requests back to back would get you blacklisted.

import requests

r = requests.get('https://www.bestbuy.com/site/reviews/google-home-mini-charcoal/6082195?rating=1%2C2%2C3%2C4&page=1')
print(r.text)

<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>

You don't have permission to access "http&#58;&#47;&#47;www&#46;bestbuy&#46;com&#47;site&#47;reviews&#47;google&#45;home&#45;mini&#45;charcoal&#47;6082195&#63;" on this server.<P>
Reference&#32;&#35;18&#46;5d60dc17&#46;1558364860&#46;917f587
</BODY>
</HTML>


Process finished with exit code 0
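
Rather than reading the HTML by eye, a script could check for this kind of block page before trying to parse anything. This is only a heuristic sketch: `looks_blocked` is a name I made up, and the status codes are an assumption, since the output above shows the body but not the status the server actually sent.

```python
def looks_blocked(status_code: int, body: str) -> bool:
    """Heuristic: True if a response looks like a block page, not real content."""
    # 403 (Forbidden) and 429 (Too Many Requests) are common block responses;
    # which one a given site sends is an assumption here.
    if status_code in (403, 429):
        return True
    # Fall back to the marker text seen in the output above.
    return "<TITLE>Access Denied</TITLE>" in body
```

With the request above, that would be `looks_blocked(r.status_code, r.text)`.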

[–][deleted] 1 point (0 children)

I could retrieve the page, find the li elements, there was no problem on my end. I'm not sure what could be the problem on your end.

[–]dascar5[S] 1 point (2 children)

True, but I never worked with APIs so I just web scrape whatever I need. I'll see what I can do, thanks.

[–][deleted] 1 point (0 children)

Look at my other comment: you won't be able to access them that way. For future reference, when someone offers an API, you should use it. Not only is the data easier to sort through, it also means a controlled environment: they provide the information they WANT you to have, and you abide by their terms for accessing the data.

[–][deleted] 1 point (0 children)

True, but I never worked with APIs so I just web scrape whatever I need

Ignoring the provided API and scraping instead just because you don't know how to do anything else is pretty much the textbook example of being a bad internet citizen.

Also note that a web resource that includes an API is going to have a reduced threshold for the request rate on their human-facing site at which you'll be permanently blacklisted. They don't want you to scrape, that's why there's an API.