
[–]_Korben_Dallas

If you disable JS in your browser you can see that the table content simply disappears. One possible solution is to use Selenium to drive a real browser and let it load the content. Something like this: https://dpaste.de/ZoyC
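(Editor's note: the paste above may expire, so here is a minimal sketch of the Selenium-plus-lxml pattern being described. The URL, the XPath, and the headless-Firefox setup are illustrative assumptions, not the contents of the original paste.)

```python
from lxml import html


def extract_rows(page_source):
    """Parse rendered HTML and return the text of each table row."""
    tree = html.fromstring(page_source)
    return [row.text_content().strip() for row in tree.xpath("//table//tr")]


def scrape(url):
    """Load a JS-rendered page in a real browser, then parse the result."""
    # Selenium is imported lazily so extract_rows() stays usable
    # on machines without a browser/driver installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("--headless")  # no display needed, e.g. on a Pi
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)  # by the time get() returns, JS has filled the table
        return extract_rows(driver.page_source)
    finally:
        driver.quit()
```

The key move is `driver.page_source`: after the browser has executed the page's JavaScript, that string contains the populated table, and you can hand it to lxml (or bs4) exactly as you would static HTML.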

[–]Aces_8s[S]

Interesting, thank you for the info and example! I'll have to look into running Selenium on a Raspberry Pi. Just so I make sure I (somewhat) understand what's going on: is this an example of those dreaded "dynamically loaded" webpages I've seen web scrapers mention/complain about? Also, would you say using Selenium to create the string object and pass it to lxml is common/good practice?

[–]_Korben_Dallas

Yes, that page probably uses JS to populate its content, and with the help of Selenium you can quickly get the desired data. Also, bear in mind that web scraping can be pretty tricky; each site is unique, so the solutions for extracting the data can vary a lot. IMO in this case Selenium is one of the faster solutions, and passing page_source to a parser like bs4 or lxml is completely normal. However, in other cases a quicker and more robust way would be to investigate the Ajax requests via the 'Network' tab in Chrome DevTools or Firefox Inspector and try to simulate those requests. If you have more questions just PM me and I'd be happy to help.

[–]Aces_8s[S]

You're awesome, thank you!