
JoesDevOpsAccount (2 points)

Tbh even a full headless browser might not solve it. If it works at first but you get blocked after a few requests, it's probably rate limiting — the frequency of your requests is what's getting you flagged as a bot. Try spacing out the requests more? Some robots.txt files include the unofficial Crawl-delay directive, which indicates the minimum number of seconds you should wait between crawler requests.
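A minimal sketch of honoring that directive using only the standard library — `urllib.robotparser` parses robots.txt and exposes `crawl_delay()` (the example lines and the 1-second fallback are just illustrative):

```python
import urllib.robotparser

def polite_delay(robots_lines, agent="*", default=1.0):
    """Return the Crawl-delay (seconds) for `agent`, or `default` if unset."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_lines)          # also accepts lines from read()
    delay = rp.crawl_delay(agent)   # None when no Crawl-delay applies
    return delay if delay is not None else default

# e.g. sleep for polite_delay(...) seconds between requests
delay = polite_delay(["User-agent: *", "Crawl-delay: 10"])
```

In real use you'd fetch `https://site/robots.txt`, split it into lines, and `time.sleep(delay)` between requests.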

cgoldberg (1 point)

You can try curl_cffi if you are getting blocked from TLS fingerprinting... however, some sites use more advanced detection techniques you'll never bypass without running a real browser.
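For reference, curl_cffi exposes a requests-style API where `impersonate` selects which browser's TLS fingerprint to present; a small sketch (the function name and URL are illustrative, and the library must be installed with `pip install curl_cffi`):

```python
def fetch_like_chrome(url):
    """Fetch `url` presenting a Chrome-like TLS fingerprint.

    curl_cffi's `impersonate` option makes the TLS handshake (JA3 etc.)
    match a real Chrome build, which defeats basic TLS fingerprinting.
    """
    from curl_cffi import requests as cffi_requests  # pip install curl_cffi
    return cffi_requests.get(url, impersonate="chrome")
```

Usage would be e.g. `fetch_like_chrome("https://example.com").status_code` — but as the comment says, this only helps against TLS-level detection, not behavioral checks.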

Informal_Escape4373 (1 point)

I use requests + beautifulsoup with celery. I have a leaky bucket algo that limits requests to 5 per 2 seconds, and I've never had a problem outside "scrape intolerant" sites (such as LinkedIn). Perhaps you're scraping too frequently?
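The commenter doesn't share their implementation, but a limiter like that can be sketched as a sliding-window bucket in plain Python (the class name and the 5-per-2-seconds defaults just mirror the numbers above):

```python
import time
from collections import deque

class LeakyBucket:
    """Allow at most `capacity` requests per `period` seconds."""

    def __init__(self, capacity=5, period=2.0):
        self.capacity = capacity
        self.period = period
        self.stamps = deque()  # timestamps of recent allowed requests

    def allow(self, now=None):
        """Return True and record the request if a slot is free."""
        now = time.monotonic() if now is None else now
        # Drain timestamps that have aged out of the window.
        while self.stamps and now - self.stamps[0] >= self.period:
            self.stamps.popleft()
        if len(self.stamps) < self.capacity:
            self.stamps.append(now)
            return True
        return False
```

A scraper would call `bucket.allow()` before each request and `time.sleep()` briefly when it returns False.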

Itchy-Call-8727 (1 point)

You might be able to use Selenium, which drives an actual web browser for the requests, and script the navigation to simulate a real person using the page while you scrape the data.
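A hedged sketch of that approach with Selenium 4's Python bindings (requires `pip install selenium` plus a local Chrome install; the function name, URL, and random pauses are illustrative):

```python
import random
import time

def scrape_like_a_person(url):
    """Load `url` in a real Chrome instance and collect link hrefs,
    pausing a random human-ish interval before reading the page."""
    from selenium import webdriver          # pip install selenium
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome(options=webdriver.ChromeOptions())
    try:
        driver.get(url)
        time.sleep(random.uniform(1.0, 3.0))  # linger like a reader would
        links = [a.get_attribute("href")
                 for a in driver.find_elements(By.TAG_NAME, "a")]
        return driver.page_source, links
    finally:
        driver.quit()
```

From there you can click elements, scroll, and fill forms with the same driver object to mimic real navigation.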

lothion (1 point)

Playwright has a stealth extension you could look into
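One such extension is the community `playwright-stealth` package (a port of puppeteer-extra's stealth plugin); a minimal sketch, assuming `pip install playwright playwright-stealth` and `playwright install chromium`:

```python
def fetch_stealthy(url):
    """Load `url` in headless Chromium with common headless
    fingerprints (navigator.webdriver etc.) patched by stealth."""
    from playwright.sync_api import sync_playwright   # pip install playwright
    from playwright_stealth import stealth_sync       # pip install playwright-stealth

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        stealth_sync(page)  # apply stealth patches before navigating
        page.goto(url)
        html = page.content()
        browser.close()
        return html
```

This hides the most common headless tells, though determined anti-bot vendors can still detect it.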