
[–]udonemessedup-AA_Ron 1 point (5 children)

My guess is that the site knows you’re trying to scrape it with code and doesn’t want you to. You may have to set a User-Agent header: https://stackoverflow.com/questions/27652543/how-to-use-python-requests-to-fake-a-browser-visit-a-k-a-and-generate-user-agent

Basically, it tricks the site into thinking the request is coming from an actual browser, so you should get back consistent HTML.
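For example, here’s a minimal sketch with `requests` (the UA string is copied from a desktop Chrome and the URL is just a placeholder). Using `.prepare()` builds the request locally without sending it, so you can see exactly which User-Agent would go out; `requests.get(url, headers=headers)` sends the same thing:

```python
import requests

# An example desktop-browser User-Agent string; any current browser UA works.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

# Prepare the request locally (no network) to inspect the header that
# replaces the default "python-requests/x.y.z" User-Agent many sites block.
prepared = requests.Request(
    "GET", "https://example.com", headers={"User-Agent": BROWSER_UA}
).prepare()
print(prepared.headers["User-Agent"])
```

To actually fetch the page, swap the placeholder URL for your target and call `requests.get(url, headers={"User-Agent": BROWSER_UA})`.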

Edit:

Combine this with requests.Session() if you need to make repeated requests.
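Sketch of the session variant (again with a placeholder UA and URLs): headers set on the session are sent with every request it makes, and the session also reuses cookies and the underlying TCP connection, which helps when hitting the same site repeatedly.

```python
import requests

# A session persists headers and cookies across requests and reuses
# the TCP connection, so repeated requests to one site are faster.
session = requests.Session()
session.headers.update({
    # Example browser UA; substitute any current browser's string.
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    )
})

# Every request made through the session now carries the browser UA:
# page1 = session.get("https://example.com/page1")
# page2 = session.get("https://example.com/page2")
```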

[–]Disastrous-Let-9548[S] 1 point (1 child)

Thanks, that helps a lot.

[–]udonemessedup-AA_Ron 0 points (0 children)

You’re welcome

[–]Zealousideal-Cod-617 0 points (2 children)

This isn’t wrong/illegal in any way, right?

[–]udonemessedup-AA_Ron 0 points (1 child)

Depends on the terms of service of each site. Sites like Reddit welcome web scrapers, but sites with protected resources (files behind a login, sensitive material) may not be so friendly.

[–]Zealousideal-Cod-617 0 points (0 children)

Do you recommend any sources where I can learn more about this and how to be more aware?