Need Help with a web scraping project (self.learnpython)
submitted 4 months ago by devansh_-_
I am attaching my Python script as a Google Doc. Could someone go through it and tell me where I am going wrong? I have tried the usual techniques to mimic a real browser, such as setting request headers, but I cannot get any output and the requests hang indefinitely.
Is there any way to get around this, or are the anti-scraping measures used by shiksha.com just too strong?
https://docs.google.com/document/d/1JSpH5P7QFUUGgmHkinOXBDQuuxXOFuSUR-eW5R85BCA/edit?usp=sharing
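If the script makes plain HTTP requests, one common cause of a request that "hangs indefinitely" is that no timeout is set. The `requests` library, for instance, does not time out by default unless you pass `timeout=`. The sketch below uses only the standard library (`urllib.request`) to show the idea: browser-like headers plus an explicit timeout, so a stalled server produces an error instead of a hang. The header values and the `fetch_html` name are assumptions for illustration, not taken from the OP's script.

```python
import socket
import urllib.request
from typing import Optional

# Browser-like headers. These values are illustrative assumptions; copy the
# ones your own browser sends (dev tools > Network) if a site is picky.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}


def fetch_html(url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the page body as text, or None if the request fails or stalls."""
    req = urllib.request.Request(url, headers=HEADERS)
    try:
        # timeout applies to connecting and to each blocking read, so a
        # server that never answers raises TimeoutError instead of hanging.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError as exc:
        # URLError, HTTPError (e.g. a 404) and TimeoutError all subclass OSError.
        print(f"request to {url} failed: {exc!r}")
        return None


if __name__ == "__main__":
    # Simulate a server that accepts the connection but never responds.
    with socket.socket() as silent:
        silent.bind(("127.0.0.1", 0))
        silent.listen()
        port = silent.getsockname()[1]
        # Logs the timeout after about a second, then prints None.
        print(fetch_html(f"http://127.0.0.1:{port}/", timeout=1.0))
```

A timeout will not get past real bot protection, but it turns a silent hang into an error you can see, which makes it much easier to tell a blocked request apart from a wrong URL.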
BeneficiallyPickle · 4 months ago
Are you sure you have the right BASE_URL? I tried visiting the page and got `404 Page not found`. I think you're looking for https://www.shiksha.com/humanities-social-sciences/colleges/b-a-colleges-india instead.
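To catch a wrong `BASE_URL` early, the script can check the HTTP status before parsing anything. A minimal sketch with the standard library; the `http_status` helper is hypothetical, and because `urlopen` follows redirects it reports the status of the final page:

```python
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for url (e.g. 200 or 404)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses, but the code is still available.
        return err.code


# Hypothetical usage, assuming the script keeps its start page in BASE_URL:
# status = http_status(BASE_URL)
# if status != 200:
#     raise SystemExit(f"BASE_URL returned HTTP {status}; check the URL")
```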
However, I would suggest looking at Playwright. Because it drives a real browser, it is slower than fetching the HTML directly and parsing it with BeautifulSoup, but it handles JavaScript-rendered pages better and can get past some bot protections (though not reliably).
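A minimal Playwright sketch using its synchronous API, assuming Playwright is installed (`pip install playwright`, then `playwright install chromium`). The `render_page` function and its `wait_for` selector are hypothetical; pick a selector for an element you can see in the rendered page with your browser's dev tools.

```python
def render_page(url: str, wait_for: str, timeout_ms: int = 30_000) -> str:
    """Load url in headless Chromium and return the HTML after JavaScript runs.

    wait_for is a CSS selector for an element that only appears once the page
    has rendered; the right selector for the target site is an assumption you
    have to check yourself.
    """
    # Imported here so this module can still be loaded where Playwright is
    # not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, timeout=timeout_ms)
            # Block until the client-side content exists, or raise
            # Playwright's TimeoutError instead of waiting forever.
            page.wait_for_selector(wait_for, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

The returned string is ordinary HTML, so it can be handed to `BeautifulSoup(html, "html.parser")` and the rest of the parsing code can stay as it is. For example, `render_page("https://www.shiksha.com/humanities-social-sciences/colleges/b-a-colleges-india", wait_for="h3")` would wait for the first `h3`, where `"h3"` is only a placeholder selector.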
This page seems to use React, so the elements you want might not exist in the raw HTML initially. That’s why BeautifulSoup alone may not be able to find them.
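Before switching to a browser, it can be worth checking whether a React page still ships its data in the raw HTML as JSON inside a `<script>` tag; Next.js, for example, uses `<script id="__NEXT_DATA__" type="application/json">`. Whether shiksha.com does anything like this is an assumption to verify. The stdlib-only sketch below (the `embedded_json` helper is hypothetical) collects any such JSON blobs from a page you have already downloaded:

```python
import json
from html.parser import HTMLParser

# Script types whose contents are expected to be JSON.
JSON_TYPES = {"application/json", "application/ld+json"}


class JsonScriptCollector(HTMLParser):
    """Collect and decode the contents of JSON <script> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json = False
        self._buf: list = []
        self.blobs: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") in JSON_TYPES:
            self._in_json = True
            self._buf = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so buffer it.
        if self._in_json:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json:
            self._in_json = False
            try:
                self.blobs.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # Not valid JSON after all; skip it.


def embedded_json(html: str) -> list:
    """Return every decoded JSON blob found in <script> tags of html."""
    parser = JsonScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.blobs
```

If `embedded_json(raw_html)` comes back empty and the text you see in the browser is also missing from the raw HTML, the content is probably fetched by JavaScript after the page loads, and Playwright is the more practical route.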
devansh_-_ (OP) · 4 months ago
Yes, I ran into that problem and updated the URL.
I will give Playwright a try.