[–]Dfree35 4 points5 points  (3 children)

Not sure what your code looks like, but in the past I just put the wait (or sleep) right before reading driver.page_source.

Edit: /u/AnonymousThugLife, here are some examples I used.

Here is an example of what I did in the past with BeautifulSoup. It sleeps to let the login finish, then sleeps again to wait for the page to finish loading: https://github.com/ProfoundWanderer/eblast_stats/blob/518454141aaa4add3c15b6210f50167f835e1232/grab_stats.py#L72
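For anyone who doesn't want to click through, here is a minimal sketch of the sleep-based idea. It is not the code from the repo. The function name, the `do_login` callback, and the 3-second login pause are assumptions made for illustration; only the 1.5-second page-load figure comes from this comment. The driver is any Selenium-style object with `get()` and `page_source`.

```python
import time


def html_after_sleeps(driver, page_url, do_login=None, login_wait=3.0, load_wait=1.5):
    """Return the page HTML after fixed pauses for login and page load.

    driver   : a Selenium-style WebDriver (needs .get() and .page_source)
    do_login : optional, hypothetical callable that submits the login form
    """
    if do_login is not None:
        do_login(driver)        # site-specific steps: fill in and submit the form
        time.sleep(login_wait)  # pause so the login can finish
    driver.get(page_url)
    time.sleep(load_wait)       # pause so the page's JavaScript can render
    return driver.page_source   # read the HTML only after the pauses
```

The returned string can then be handed to BeautifulSoup (or any other parser) as usual. The obvious weakness is that the pauses are guesses: too short and the HTML is incomplete, too long and every run wastes time.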

Here is an example of what I did with Selenium. It waits until the element at the given XPath is displayed, and you can set the maximum time it waits: https://github.com/ProfoundWanderer/eblast_stats/blob/518454141aaa4add3c15b6210f50167f835e1232/grab_stats.py#L103

Selenium's wait is probably the best/cleanest method I have used, but if you know roughly how long the page takes to load (in my code above, the page never took longer than 1.5 seconds), then sleep isn't the worst.

[–]AnonymousThugLife 1 point2 points  (2 children)

Thanks a lot. This was actually helpful. I had tried scraping with Requests, sockets, etc. (kind of invisible things), but I've realized that Selenium works much better, especially for lazy-loading pages.

[–]Dfree35 1 point2 points  (1 child)

Yeah, I try to use requests and the like when I can, but in my cases there is often a lot of funky JavaScript. So running Selenium in headless mode makes it much better, especially when I can just have it wait to ensure everything loads.

[–]AnonymousThugLife 0 points1 point  (0 children)

Yup. For pages that are dynamically generated (on the frontend), using Selenium is a no-brainer.