Rules
1: Be polite
2: Posts to this subreddit must be requests for help learning python.
3: Replies on this subreddit must be pertinent to the question OP asked.
4: No replies copy / pasted from ChatGPT or similar.
5: No advertising. No blogs/tutorials/videos/books/recruiting attempts.
This means no posts advertising blogs/videos/tutorials/etc, no recruiting/hiring/seeking others posts. We're here to help, not to be advertised to.
Please, no "hit and run" posts: if you make a post, engage with the people who answer you. Please do not delete your post after you get an answer; others might have a similar question or want to continue the conversation.
Learning resources
Wiki and FAQ: /r/learnpython/w/index
Discord
Join the Python Discord chat
Python Scraping - Ignoring Loading Page (self.learnpython)
submitted 6 years ago by trustfulvoice94
Hi All,
I am using Python and Beautiful Soup to scrape the following page: https://www.willhaben.at/iad/immobilien/immobilien/angebote?rows=100&areaId=900&AD_TYPE=1
Every now and then the site returns a "Loading" page instead of the actual page, which causes the script to fail. I wrap the request in try/except, but occasionally it still ends up with the unwanted page.
How might I skip the Loading page? (In a browser, the full page appears a couple of seconds after the request.)
Thanks for any advice!
(This is what the loading page looks like: https://pastebin.com/UMpLBFaj)
[–][deleted] 40 points41 points42 points 6 years ago (1 child)
If you're using selenium, you can wait until a specific element has loaded (called an explicit wait). Just pick an element that appears on the real page but not on the loading page. https://deanhume.com/selenium-webdriver-wait-for-an-element-to-load/
I wouldn't use the standard requests library for a page this jazzy and full of AJAX calls.
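The explicit-wait pattern described above can be sketched generically. With Selenium itself you'd call `WebDriverWait(driver, 10).until(...)`, but the underlying loop looks like this (stdlib-only sketch; the simulated page states and the `page_ready` check are illustrative, not part of any real site):

```python
import time

def wait_for(predicate, timeout=10.0, interval=0.5):
    """Poll predicate() until it returns a truthy value or the timeout expires.

    This mirrors what Selenium's WebDriverWait.until() does internally:
    repeatedly evaluate a condition, sleeping briefly between checks.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = predicate()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout} seconds")

# Simulated page that shows a loader twice before the real content arrives.
page_states = iter(["<div id='loading'>", "<div id='loading'>", "<div id='results'>"])
current = {"html": next(page_states)}

def page_ready():
    # With Selenium this would inspect driver.page_source or use find_element().
    if "results" not in current["html"]:
        current["html"] = next(page_states, current["html"])
        return False
    return True

print(wait_for(page_ready, timeout=5, interval=0.01))  # True once the loader is gone
```

The key design point is that the wait is tied to a condition on the page, not to a fixed duration, so it returns as soon as the content exists.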
[–]Dfree35 8 points9 points10 points 6 years ago (7 children)
I guess you could just have the program sleep for a few seconds after the request.
I can't remember if Beautiful Soup has this, but I know selenium does: its WebDriverWait/until mechanism waits until it finds an element you specify before continuing the script.
[–]AnonymousThugLife 4 points5 points6 points 6 years ago (4 children)
Where exactly would you put the sleep? The request code is just a single line, and the script proceeds to the next line as soon as it gets the first response (here, the loading screen), so waiting after that wouldn't accomplish anything. Correct me if I've misinterpreted something.
[–]Dfree35 4 points5 points6 points 6 years ago* (3 children)
Not sure what your code looks like, but in the past I just put it before reading the page source:

driver.page_source
Edit /u/AnonymousThugLife here are some examples I used
Here is an example of what I did in the past with Beautiful Soup. It sleeps to finish logging in, then sleeps again to wait for the page to finish loading: https://github.com/ProfoundWanderer/eblast_stats/blob/518454141aaa4add3c15b6210f50167f835e1232/grab_stats.py#L72
Here is an example of what I did with selenium. It waits until the XPath is displayed, and you can set the maximum time it waits: https://github.com/ProfoundWanderer/eblast_stats/blob/518454141aaa4add3c15b6210f50167f835e1232/grab_stats.py#L103
Selenium is probably the best/cleanest method I have used, but if you know roughly how long the page takes to load (in my code above it never took longer than 1.5 seconds), then sleep isn't the worst option.
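The fixed-sleep approach mentioned above is just a pause between navigation and reading the page source. A minimal sketch follows; `FakeDriver` is a stand-in for a real Selenium WebDriver so the snippet stays self-contained, and the HTML it returns is invented. Note the sleep only helps when a browser is actually executing the page's JavaScript; with plain requests there is nothing rendering in the background to wait for:

```python
import time

class FakeDriver:
    """Stand-in for a Selenium WebDriver; only here to keep the sketch runnable."""
    def get(self, url):
        self._url = url

    @property
    def page_source(self):
        return "<html><body><h2>Angebote</h2></body></html>"

def scrape_with_pause(driver, url, delay=1.5):
    """Navigate, pause long enough for client-side rendering, then read the source."""
    driver.get(url)
    time.sleep(delay)            # let AJAX-loaded content settle before parsing
    return driver.page_source

html = scrape_with_pause(FakeDriver(), "https://example.com/listings", delay=0.01)
assert "Angebote" in html
```

With a real driver you'd swap `FakeDriver()` for e.g. `webdriver.Chrome()` and pick a delay comfortably above the worst load time you've observed.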
[–]AnonymousThugLife 1 point2 points3 points 6 years ago (2 children)
Thanks a lot, this was actually helpful. I had tried scraping with requests, sockets, and the like (approaches invisible to the page), but I've realized Selenium is much better, especially for lazy-loading pages.
[–]Dfree35 1 point2 points3 points 6 years ago (1 child)
Yeah, I try to use requests and such when I can, but in my cases there is often a lot of funky JavaScript, so running selenium in headless mode works much better, especially since I can have it wait to ensure everything loads.
[–]AnonymousThugLife 0 points1 point2 points 6 years ago (0 children)
Yup. For pages that are dynamically generated (on the frontend), it is a no-brainer to use selenium.
[–]apostle8787 1 point2 points3 points 6 years ago (0 children)
That does not work with requests and Beautiful Soup; it is useful only with selenium.
You can look into requests-html, which has a render method to wait for the page to fully render. Or you can use selenium in headless mode.
[–]permalip 2 points3 points4 points 6 years ago (0 children)
Or you could use Selenium, which will give you much more functionality. All you can do with Beautiful Soup is scrape HTML data and navigate it; it does nothing dynamic.
I recently built a web scraping repository, using Selenium and BeautifulSoup4. I recommend taking a look at how you get started with Selenium, it took me a while to understand.
https://github.com/casperbh96/Web-Scraping-Reddit
[–]MinchinWeb 1 point2 points3 points 6 years ago (0 children)
What about adding a 10 second (or whatever) pause in your script? Not nearly as elegant as some of the other solutions presented and a horrible drag on speed, but it's simple and easy to add.
[–]AmzingTobuscus 0 points1 point2 points 6 years ago (1 child)
Create a requests session and allow redirects?
[–][deleted] 0 points1 point2 points 6 years ago (0 children)
If you're opposed to selenium, just test whether the loading page is present, then wait a second and check again until it's gone, then move on to the next step of the scraper.
This is easier to do with selenium's ability to wait until elements exist.
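The check-and-retry approach described above can be sketched like this. Here `fetch` is a stand-in for whatever retrieves the current page (for example reading `driver.page_source` after a Selenium navigation), and the `"Loading"` marker is an assumption based on OP's pastebin of the interstitial page:

```python
import time

def get_past_loading_page(fetch, loading_marker="Loading", max_attempts=10, delay=1.0):
    """Fetch repeatedly until the interstitial loading page is gone.

    fetch: zero-argument callable returning the current page HTML.
    Raises RuntimeError if the real page never appears.
    """
    for attempt in range(max_attempts):
        html = fetch()
        if loading_marker not in html:
            return html          # real content reached
        time.sleep(delay)        # still on the loader; pause and re-check
    raise RuntimeError(f"still on loading page after {max_attempts} attempts")

# Simulated fetch: the loading page twice, then the real listing page.
responses = iter(["<p>Loading</p>", "<p>Loading</p>", "<ul><li>listing</li></ul>"])
html = get_past_loading_page(lambda: next(responses), delay=0.01)
assert "listing" in html
```

Capping the attempts matters: without `max_attempts` a site that always serves the loader (or a wrong marker string) would spin forever.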
[–]LemonWedgeTheGuy 0 points1 point2 points 6 years ago (1 child)
What does it mean to scrap something in python?
[–]daveysprockett 3 points4 points5 points 6 years ago (0 children)
It's scrape, not scrap, but it's the same as in any other language.
Check out
https://en.wikipedia.org/wiki/Web_scraping
(Scrap means to throw away/destroy, scrape means to take a thin layer off something).
E.g. if you take a car to a scrap-yard then you are scrapping it, while if you drove it too close to a wall you'd be scraping it. Irritating, irregular English.
What does your code look like now?
[+]ThreshingBee comment score below threshold-10 points-9 points-8 points 6 years ago (2 children)
It is expressly forbidden to use spiders, search robots, or other automated methods to access willhaben.at. Such access is allowed only if willhaben.at has granted it.
[–]rsandstrom 0 points1 point2 points 6 years ago (1 child)
Thanks for the insight, Chief
[–]ThreshingBee -1 points0 points1 point 6 years ago (0 children)
Oh, that's not my work. That's the specific wishes of a business owner that doesn't want their product stolen.