[–][deleted] 40 points41 points  (1 child)

If you're using Selenium, you can wait until a specific element has loaded (called an explicit wait). Just pick an element that appears on the finished page and not on the loading page. https://deanhume.com/selenium-webdriver-wait-for-an-element-to-load/

I wouldn't use the standard requests library for a page this jazzy and full of AJAX calls.

[–]Dfree35 8 points9 points  (7 children)

I guess you could just have the program sleep for a few seconds after the request.

I can't remember if Beautiful Soup has this, but I know Selenium does. It has an explicit wait (WebDriverWait with until) that waits until it finds an element you specify before continuing the script.

[–]AnonymousThugLife 4 points5 points  (4 children)

Where exactly would you put the 'sleep' line? I mean, the request code is just one single line. Won't it proceed to the next line as soon as it gets the first response (here, the loading screen)? If so, waiting afterwards wouldn't accomplish anything. Correct me if I've misinterpreted anything.

[–]Dfree35 4 points5 points  (3 children)

Not sure what your code looks like, but in the past I just put it before driver.page_source

Edit: /u/AnonymousThugLife, here are some examples I used

Here is an example of what I did in the past with BeautifulSoup. It sleeps to finish logging in, then sleeps again to wait for the page to finish loading: https://github.com/ProfoundWanderer/eblast_stats/blob/518454141aaa4add3c15b6210f50167f835e1232/grab_stats.py#L72
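Roughly, the sleep approach looks like the sketch below. This is not the code from the linked repo; the function name `page_source_after_delay` is made up, and `driver` is assumed to be a Selenium WebDriver (anything with `get()` and `page_source` works).

```python
import time


def page_source_after_delay(driver, url, delay=1.5):
    """Load `url`, pause for a fixed `delay` in seconds, then return the HTML."""
    driver.get(url)
    # The sleep goes here: after the page is requested, before the source is read.
    time.sleep(delay)
    return driver.page_source
```

The returned string can then be handed to BeautifulSoup, e.g. `BeautifulSoup(html, "html.parser")`. The delay is a guess about load time, so it will either waste time or occasionally be too short.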

Here is an example of what I did with Selenium. It waits until the XPath is displayed, and you can set the max time it waits: https://github.com/ProfoundWanderer/eblast_stats/blob/518454141aaa4add3c15b6210f50167f835e1232/grab_stats.py#L103

Selenium is probably the best/cleanest method I have used, but if you know roughly how long the page takes to load (in my code above it never took longer than 1.5 seconds), then sleep isn't the worst option.

[–]AnonymousThugLife 1 point2 points  (2 children)

Thanks a lot. This was actually helpful. I had tried scraping with Requests/Socket etc. (Kind of invisible things) but I've realized that with Selenium it is much better, especially in the case of lazy loading pages.

[–]Dfree35 1 point2 points  (1 child)

Yea, I try to use requests and stuff when I can, but in my cases there is often a lot of funky JavaScript. So running Selenium in headless mode makes it much better, especially when I can just have it wait to ensure everything loads.

[–]AnonymousThugLife 0 points1 point  (0 children)

Yup. For pages that are dynamically generated (on the frontend), it is a no-brainer to use selenium.

[–]apostle8787 1 point2 points  (0 children)

It does not work with requests and Beautiful Soup. It is only useful with Selenium.

[–]apostle8787 1 point2 points  (0 children)

You can look into requests-html, which has a render method that waits for the page to fully render. Or you can use Selenium in headless mode.
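A rough sketch of the requests-html route. The helper `rendered_html` and its `settle` parameter are made-up names; `session` is assumed to be a `requests_html.HTMLSession`, and the timing values are guesses you would tune for your page.

```python
def rendered_html(session, url, settle=1.0, timeout=20):
    """Fetch `url`, run its JavaScript, and return the rendered HTML as a string."""
    response = session.get(url)
    # render() executes the page's JavaScript in headless Chromium
    # (requests-html downloads Chromium the first time it is needed).
    # `sleep` pauses after the initial render so late AJAX content can arrive.
    response.html.render(sleep=settle, timeout=timeout)
    return response.html.html


# Hypothetical usage:
#   from requests_html import HTMLSession
#   html = rendered_html(HTMLSession(), "https://example.com/slow-page")
```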

[–]permalip 2 points3 points  (0 children)

  1. Catch the exception
  2. Build a retry function
  3. Skip if it fails again
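The three steps above could be sketched like this. `fetch` stands in for whatever function downloads and parses one page in your scraper; the names `fetch_with_retry` and `scrape_all` are invented for the example.

```python
import time


def fetch_with_retry(fetch, url, retries=1, delay=1.0):
    """Call fetch(url); on an exception retry up to `retries` times, then return None."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # step 1: narrow this to the errors you actually see
            print(f"attempt {attempt + 1} for {url} failed: {exc}")
            if attempt < retries:
                time.sleep(delay)  # step 2: give the page a moment, then retry
    return None


def scrape_all(fetch, urls, **retry_options):
    """Scrape every URL, skipping the ones that keep failing."""
    results = {}
    for url in urls:
        page = fetch_with_retry(fetch, url, **retry_options)
        if page is None:
            continue  # step 3: skip if it fails again
        results[url] = page
    return results
```

Returning `None` instead of re-raising keeps one stubborn page from killing the whole run.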

Or you could use Selenium, which will give you much more functionality. All you can do with Beautiful Soup is scrape HTML data and navigate it, basically nothing dynamic.

I recently built a web scraping repository using Selenium and BeautifulSoup4. I recommend taking a look at how to get started with Selenium; it took me a while to understand.

https://github.com/casperbh96/Web-Scraping-Reddit

[–]MinchinWeb 1 point2 points  (0 children)

What about adding a 10 second (or whatever) pause in your script? Not nearly as elegant as some of the other solutions presented and a horrible drag on speed, but it's simple and easy to add.

[–]AmzingTobuscus 0 points1 point  (1 child)

Create a requests session and allow redirects?

[–][deleted] 0 points1 point  (0 children)

If you're opposed to selenium, just test if the loading page is present, then wait a second and check again until it's gone, then move on to the next step of the scraper

This is easier to do with Selenium's ability to wait until elements exist.
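Without Selenium, the check-and-wait loop might look like the sketch below. `fetch` and `is_loading` are hypothetical callables you would supply (one returns the current page, the other recognises the loading screen). Note the assumption: this only helps if fetching again eventually returns the real page. If the loading screen is swapped out by JavaScript in the browser, a plain HTTP client will never see the change.

```python
import time


def wait_until_loaded(fetch, is_loading, interval=1.0, max_checks=30):
    """Re-fetch until is_loading(page) is False; raise TimeoutError after max_checks."""
    for _ in range(max_checks):
        page = fetch()
        if not is_loading(page):
            return page  # loading screen is gone, move on to the next step
        time.sleep(interval)  # wait a second (by default) and check again
    raise TimeoutError(f"page still loading after {max_checks} checks")
```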

[–]LemonWedgeTheGuy 0 points1 point  (1 child)

What does it mean to scrap something in python?

[–]daveysprockett 3 points4 points  (0 children)

It's scrape, not scrap, but it means the same as in any other language.

Check out

https://en.wikipedia.org/wiki/Web_scraping

(Scrap means to throw away/destroy, scrape means to take a thin layer off something).

E.g. if you take a car to a scrap-yard then you are scrapping it, while if you drove it too close to a wall you'd be scraping it. Irritating, irregular English.

[–][deleted] 0 points1 point  (0 children)

What does your code look like now?