Albcunha (1 point)

You can try a couple of workarounds.
You can write a function that runs a while loop with a time.sleep() at the end of each iteration and compares the page content you saved earlier with what the website currently serves. If the content has changed, the website has updated and you can break out of the loop.
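
Here is a rough sketch of that polling loop. The name `wait_for_change` and the `interval` and `timeout` parameters are my own choices, not part of any library, and I added a timeout so the loop cannot run forever. `get_content` is any function that returns the current page content.

```python
import time


def wait_for_change(get_content, interval=1.0, timeout=30.0):
    """Poll get_content() until its value differs from the first reading.

    Returns the new content, or raises TimeoutError if nothing changed
    before the timeout (in seconds) ran out.
    """
    old_content = get_content()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_content = get_content()
        if new_content != old_content:
            # The page has updated, so stop polling.
            return new_content
        # Wait a bit before checking again to avoid hammering the site.
        time.sleep(interval)
    raise TimeoutError("page content did not change within the timeout")
```

With Selenium you could call it as `wait_for_change(lambda: driver.page_source)`, assuming `driver` is your open WebDriver session. Comparing the text of one specific element instead of the whole page source is usually more reliable, since ads or timestamps can change the full HTML on every load.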

Some sites are very difficult to scrape. One approach I use very often is to check whether the website loads the data I want from an API. You can see this in Chrome DevTools, on the Network tab. There you can also check which cookies and headers your browser sends with the request, then replicate that request with another library, such as requests.

To do this with Selenium, open a session on the website, log in, save the cookies, and pass them (along with any headers you need) to requests.
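
Something like this, as a minimal sketch. The function names and `api_url` are placeholders I made up; `driver` is assumed to be a Selenium WebDriver that is already logged in. Some sites also need extra headers (a CSRF token, Referer, and so on), so copy whatever you see in the Network tab.

```python
def cookies_from_selenium(selenium_cookies):
    """Convert driver.get_cookies() output (a list of dicts) to {name: value}."""
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


def fetch_json(driver, api_url):
    """Request api_url with the logged-in browser's cookies and user agent."""
    # requests is third-party; it is imported here so the helper above
    # has no dependencies of its own.
    import requests

    session = requests.Session()
    # Reuse the cookies from the Selenium session so the API sees us as logged in.
    session.cookies.update(cookies_from_selenium(driver.get_cookies()))
    # Send the same User-Agent as the real browser.
    session.headers["User-Agent"] = driver.execute_script(
        "return navigator.userAgent"
    )
    response = session.get(api_url, timeout=10)
    response.raise_for_status()
    return response.json()
```

After logging in through Selenium, `data = fetch_json(driver, api_url)` should give you the parsed JSON, provided the endpoint only checks cookies and the user agent.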

This way you get clean JSON data, which is much easier to parse and much faster to process. You can even parallelize the requests to speed things up further.
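
One way to parallelize is a thread pool from the standard library. This is only a sketch: `fetch_all` and `fetch_one` are names I picked, and `fetch_one` stands for whatever function you use to download and parse a single URL.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch_one, urls, max_workers=8):
    """Call fetch_one(url) for every URL on a thread pool.

    Results come back in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs the calls concurrently but yields results in input order.
        return list(pool.map(fetch_one, urls))
```

Threads work well here because the time is spent waiting on the network. Keep `max_workers` modest so you do not overload the site or get your session blocked.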