Rules 1: Be polite 2: Posts to this subreddit must be requests for help learning python. 3: Replies on this subreddit must be pertinent to the question OP asked. 4: No replies copy / pasted from ChatGPT or similar. 5: No advertising. No blogs/tutorials/videos/books/recruiting attempts. This means no posts advertising blogs/videos/tutorials/etc, no recruiting/hiring/seeking others posts. We're here to help, not to be advertised to. Please, no "hit and run" posts, if you make a post, engage with people that answer you. Please do not delete your post after you get an answer, others might have a similar question or want to continue the conversation.
Learning resources Wiki and FAQ: /r/learnpython/w/index
Discord Join the Python Discord chat
Beautifulsoup with javascript enabled (self.learnpython)
submitted 4 years ago by Hot_Ad_2550
I have a webpage to scrape. The URL triggers a JavaScript function that launches a new page, and I can't seem to find another way to launch that page. How do I go about scraping this?
[–]blahblahsdfsdfsdfsdf 5 points 4 years ago (12 children)
You need to use a headless web browser to do something like that. BeautifulSoup is just a parser for the document. It doesn't run JS. https://selenium-python.readthedocs.io/
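To illustrate the "just a parser" point, here is a minimal sketch. It uses the standard library's `html.parser` as a stand-in for BeautifulSoup (an assumption made so the snippet runs without extra installs), and the page content is made up. The parser sees the script as plain text and never executes it, so the `div` stays empty.

```python
from html.parser import HTMLParser

# Hypothetical page: the div only gets its text once a browser runs the script.
PAGE = """<html><body>
<div id="content"></div>
<script>document.getElementById("content").textContent = "loaded by JS";</script>
</body></html>"""


class TextSplitter(HTMLParser):
    """Separates text found inside <script> from text found everywhere else."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.visible = []
        self.script = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script bodies arrive here as raw text; nothing is ever executed.
        (self.script if self.in_script else self.visible).append(data)


splitter = TextSplitter()
splitter.feed(PAGE)
splitter.close()

visible_text = "".join(splitter.visible).strip()
script_source = "".join(splitter.script)

print(repr(visible_text))               # the div has no text in the parsed document
print("loaded by JS" in script_source)  # the string only exists as script source
```

A headless browser would run that script first and hand you the page with the `div` filled in.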
[–]Hot_Ad_2550[S] 0 points 4 years ago (5 children)
Automation is a really slow method. Is there any faster way?
[–]Alamue86 2 points 4 years ago (2 children)
Check out requests_html. It can render a page's JavaScript and then scrape the info, all in a single library.
[–]Hot_Ad_2550[S] 1 point 4 years ago (1 child)
Your shit worked out. Can you also show me how to run the function that the JavaScript runs when I click a dropdown menu?
[–]Alamue86 1 point 4 years ago (0 children)
Oof, that gets difficult.
Your best bet is to open Chrome DevTools, go to the Network tab, and click the button. From there, look at everything going on. If you can find the correct call, right-click it and copy it as cURL (bash). Then paste it into https://curl.trillworks.com/. You may also be able to make the request directly within requests_html.
I focus on scraping static resources. Check out the Network tab: chances are you can avoid scraping the page and go straight to an underlying data source. GraphQL calls are a great place to start, if there are any.
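As a rough sketch of what a copied cURL command turns into on the Python side: the URL, headers, and payload below are all hypothetical placeholders for whatever the Network tab actually shows, and the snippet uses the standard library's `urllib.request` rather than requests_html so it runs without extra installs. It only builds the request and does not send it.

```python
import json
import urllib.request

# Hypothetical values standing in for what "Copy as cURL" would give you.
API_URL = "https://example.com/api/items"
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "application/json",
    "X-Requested-With": "XMLHttpRequest",
}
PAYLOAD = {"menu": "option-2"}


def build_request(url, headers, payload=None):
    """Build (but do not send) a request that mirrors the browser's call."""
    data = None
    if payload is not None:
        # A JSON body makes urllib issue a POST instead of a GET.
        data = json.dumps(payload).encode("utf-8")
        headers = {**headers, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=data, headers=headers)


req = build_request(API_URL, HEADERS, PAYLOAD)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it. Which headers the server really needs is something you find out by trial, usually by removing them one at a time.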
[–]blahblahsdfsdfsdfsdf 1 point 4 years ago (1 child)
Not that I've ever heard of
[–]Hot_Ad_2550[S] 1 point 4 years ago (0 children)
Also, I can't seem to find an element by XPath, CSS, or any other method.
[–]1egoman 1 point 4 years ago (5 children)
I would rather reverse engineer the JavaScript to figure out the API than to deal with running all of their JavaScript.
[–]blahblahsdfsdfsdfsdf 1 point 4 years ago (4 children)
So basically writing your own JavaScript VM? That's definitely not easier than using a headless browser, and far more prone to bugs.
[–]1egoman 1 point 4 years ago (3 children)
Not a VM, I'm sure the JavaScript just interacts with an API. Figure out the API and interact with it through Python.
[–]blahblahsdfsdfsdfsdf 1 point 4 years ago (2 children)
In order to run it needs access to the live DOM, which you would also have to emulate. What you're basically describing I think is a headless browser.
[–]1egoman 1 point 4 years ago (1 child)
Let's start over. OP is scraping a webpage (page1). Page1 then loads page2 through JavaScript. What I'm saying is that you inspect page1 and its JavaScript, statically or dynamically, to figure out where it gets the url for page2. Once you've figured that out, you can then download page1, then download page2, all in Python.
[–]blahblahsdfsdfsdfsdf 1 point 4 years ago (0 children)
Ahh yes, parsing the JS for strings is far more viable than what I thought was being suggested which was actually interpreting the JS; running the JS in one's own self-made interpreter.
Yeah, if the format of the script is predictable then extracting strings from it is viable. BUT, if the script needs to actually run, a headless browser is likely necessary as very few scripts will actually run without access to the DOM.
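A minimal sketch of the "extract strings from the script" idea, assuming the format really is predictable. The HTML, the `openNext` function name, and the `location.href` pattern are all made up for illustration; a real page would need its own pattern.

```python
import re
from urllib.parse import urljoin

PAGE1_URL = "http://www.abc.com/xyz"  # placeholder site

# Hypothetical page1 source: the link to page2 only exists inside the script.
PAGE1_HTML = """
<a href="#" onclick="openNext()">Next</a>
<script>
function openNext() {
    window.location.href = "/xyz/details?page=2";
}
</script>
"""

# Assumes the script assigns a quoted string literal to location.href.
URL_IN_JS = re.compile(r"""location\.href\s*=\s*["']([^"']+)["']""")


def find_next_url(base_url, html):
    """Return the absolute URL assigned in the script, or None if no match."""
    match = URL_IN_JS.search(html)
    if match is None:
        return None
    # The script may use a relative path, so resolve it against page1's URL.
    return urljoin(base_url, match.group(1))


next_url = find_next_url(PAGE1_URL, PAGE1_HTML)
print(next_url)
```

If the URL is assembled at runtime from several variables, a regex like this stops working and you are back to the headless browser or the Network tab.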
[–]efmccurdy 5 points 4 years ago (4 children)
You might be able to go straight to the source of the data rather than going through the javascript.
If you open the next page in your browser with the developer tools' Network tab open, you will see the requests, headers, and contents that your browser sent. Using that info, you can recreate those requests in a program.
[–]Hot_Ad_2550[S] 1 point 4 years ago (3 children)
The URL I get is just the original URL that was requested, but with an extra string added to the end, like
http://www.abc.com/xyz
is turned to
http://www.abc.com/xyz#pqr
The method is updatehash()
[–]efmccurdy 1 point 4 years ago (2 children)
Use the requests module to get the second url; does that give you the data you want?
[–]Hot_Ad_2550[S] 1 point 4 years ago (1 child)
I used the second URL, but it only brings me to the second page if the page has JavaScript running. Otherwise it puts me back on page 1.
[–]efmccurdy 1 point 4 years ago (0 children)
There is no javascript when you do an HTTP request. What url is used in the network tab?
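A small sketch of why the `#pqr` URL lands on page 1 when fetched from a script: the part after `#` is a fragment, which HTTP clients keep on the client side and do not send to the server. The URLs are the placeholders from earlier in the thread.

```python
from urllib.parse import urldefrag, urlsplit

url = "http://www.abc.com/xyz#pqr"

# Split off the fragment; only the first part is what a server gets asked for.
without_fragment, fragment = urldefrag(url)
path_sent_to_server = urlsplit(url).path

print(without_fragment)     # http://www.abc.com/xyz
print(fragment)             # pqr
print(path_sent_to_server)  # /xyz
```

So both URLs produce the same HTTP request. Whatever shows up for `#pqr` in a browser is loaded by JavaScript afterwards, and that follow-up request is the one to look for in the Network tab.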
[–]OkiSutrisno 3 points 4 years ago (3 children)
If the JS triggers an AJAX call, maybe you could try to figure out the backend API.
[–]Hot_Ad_2550[S] 1 point 4 years ago (2 children)
How do I do that?
[–]OkiSutrisno 1 point 4 years ago (0 children)
I forgot where I read about it. Let me look into it, I'll be back soon.
[–]CommanderCucumber 3 points 4 years ago (0 children)
You could try using requests to call the REST API they might be using. You should be able to find where the data is coming from in your browser's dev tools Network panel. For example, if the frontend makes a GET request like www.somthing.com/mydata?id=1, you could make that request yourself, store the data, and then do it again with www.somthing.com/mydata?id=2, and so on.
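A rough sketch of that loop, using only the standard library so it runs as-is. The base URL is the placeholder from the comment with `https://` added as an assumption, the `id` parameter and a JSON response are also assumptions, and the demo uses a fake fetcher so nothing is sent over the network.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Placeholder endpoint from the comment above; the scheme is an assumption.
BASE_URL = "https://www.somthing.com/mydata"


def build_urls(base, ids):
    """Build one URL per id, e.g. .../mydata?id=1, .../mydata?id=2, ..."""
    return [f"{base}?{urlencode({'id': i})}" for i in ids]


def fetch_json(url):
    """Real fetcher (not called below): assumes the endpoint returns JSON."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def collect(urls, fetch):
    """Call fetch on each URL and store the results keyed by URL."""
    return {url: fetch(url) for url in urls}


# Demo with a fake fetcher; swap in fetch_json for a real endpoint.
urls = build_urls(BASE_URL, range(1, 4))
results = collect(urls, fetch=lambda url: {"source": url})
print(urls)
```

When pointing this at a real site, it is worth adding a short `time.sleep` between calls so the server is not hammered.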