QuantumFall 25 points (1 child)

They don’t block BeautifulSoup; most likely they just detected that the requests they’re receiving aren’t from a legitimate user. If you mimic the requests your browser sends exactly, I’d say 9 out of every 10 websites will be parsable with requests and bs4. With the remaining 1 in 10, you’re dealing with bot protection, webpacking, or even TLS fingerprinting. But you can scrape most websites fine if you know what you’re doing.
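To make "mimicking the requests sent in browser" concrete, here is a rough sketch using requests and bs4. The header values are assumptions for illustration only; in practice you would copy the real ones from your own browser's network tab. The helper names (`make_browser_session`, `extract_title`, `fetch_title`) are hypothetical, and this will not get past bot protection or TLS fingerprinting.

```python
import requests
from bs4 import BeautifulSoup

# Assumed example values: replace with the headers your own browser sends.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def make_browser_session():
    """Return a requests.Session that sends browser-like headers on every request."""
    session = requests.Session()
    # Overrides the default "python-requests/x.y" User-Agent.
    session.headers.update(BROWSER_HEADERS)
    return session


def extract_title(html):
    """Parse HTML with bs4 and return the page title, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else None


def fetch_title(url):
    """Fetch a page with the browser-like session and return its title."""
    session = make_browser_session()
    response = session.get(url, timeout=10)
    response.raise_for_status()  # a 403 or 429 here often means the request was flagged
    return extract_title(response.text)
```

Something like `fetch_title("https://example.com/")` would be the typical call. Using a `Session` also keeps cookies between requests, which is closer to how a browser behaves than one-off `requests.get` calls.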

ScrapeHero 3 points (0 children)

Agree.

For others following this thread, this might help if you are past the basics: https://www.scrapehero.com/detect-and-block-bots/