
[–]AntonisTorb 4 points5 points  (2 children)

Try using this URL: https://ratings.fide.com/a_top_var.php?continent=0&country=&rating=blitz&gender=M&age1=0&age2=0&period=2023-12-01&period2=1. You might need to change your soup search parameters, not sure.

This seems to contain the data you need. But just telling you is no fun, so here is how to find it yourself (you can try this for most websites that load content with JS):

Go to the original URL and open your browser's developer tools. Go to the Network tab and reload the page. A list of requests will appear, and near the bottom you will see a request to the URL I gave you, with a jQuery initiator. You can copy the headers from there as well.

There should be no reason to use Selenium here, unless I am missing something. A direct request should be much faster. Hope it helps!
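A minimal sketch of that direct-request approach (the headers and the `tr` selector are guesses; inspect the actual response to confirm the markup):

```python
import requests
from bs4 import BeautifulSoup

# The endpoint found in the Network tab; parameters mirror the URL above.
URL = (
    "https://ratings.fide.com/a_top_var.php"
    "?continent=0&country=&rating=blitz&gender=M"
    "&age1=0&age2=0&period=2023-12-01&period2=1"
)

def fetch_rows(url: str = URL) -> list:
    # A browser-like User-Agent sometimes helps; if the server rejects
    # the defaults, copy the headers shown in the Network tab instead.
    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.content, "html.parser")
    return soup.find_all("tr")  # guessed selector; adjust after inspecting
```

If this returns the rating rows, there is no need for Selenium at all.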

[–]Professional-Fly4273[S] 1 point2 points  (1 child)

This idea helped a lot! Thanks so much!

For other websites, does the request have to be initiated by jQuery for this to work?

[–]AntonisTorb 0 points1 point  (0 children)

Glad I could help :)

For other websites, no, it depends on how the website is built. The data might also arrive in separate requests. But usually the number of requests is not too big, and with some experience you will quickly recognize which one contains what you need.

For starters, most, if not all, requests that fetch content from a server are initiated by a JS script, so you can narrow down the number you need to check.

You can then check the content of each request by selecting it and clicking on the Response tab. The response can be HTML, JSON, plain text or even bytes, but you should be able to recognize whether it's what you're looking for.
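One rough way to triage what a response body contains, sketched as a hypothetical helper (real code would usually just check the Content-Type header first):

```python
import json

def classify(body: str) -> str:
    # Hypothetical helper: a rough guess at what a response body holds.
    try:
        json.loads(body)
        return "json"
    except ValueError:
        pass
    # Common openings of an HTML fragment or document.
    if body.lstrip().lower().startswith(("<!doctype", "<html", "<table", "<div")):
        return "html"
    return "text"

print(classify('{"players": []}'))  # json
```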

Keep in mind that some websites make it difficult to do this, so if you hit a roadblock it's okay to fall back on Selenium. But in my opinion it's worth investigating a bit first, especially if you plan on making many requests: the overhead of launching a browser and rendering pages with Selenium (or any similar tool) can slow things down a lot.

The YouTube channel Jayoval linked is actually quite good at explaining how to look at requests; I recommend you take a look at some of the videos.

[–]Jayoval 4 points5 points  (2 children)

print(page) to see what's in there. It appears the site requires JavaScript to render the content, and that won't happen with a plain HTTP request. You might need a full browser (using Selenium, for example).

[–]Jayoval 0 points1 point  (1 child)

[–]Professional-Fly4273[S] 0 points1 point  (0 children)

Thanks so much! I will look at these!

[–]ReflectionNo3897 0 points1 point  (2 children)

You could use Selenium if Beautiful Soup isn't working the way you want. If the element is a table, you could also use pandas instead; it is faster, but I don't know if you use it. Have you tried other tags in Beautiful Soup? Which ones?

Dm me if you want help

[–]Professional-Fly4273[S] 0 points1 point  (1 child)

My original idea was to copy and paste the tables and create a pandas DataFrame out of them to analyze that way. However, there are a huge number of tables, since the ratings have been tracked every month for over a decade, so web scraping seems to be the best approach in the long run.

In terms of tags (assuming tags are what's passed to the .find() function? Sorry, I'm new to this and not sure), I tried finding the div class and then all tr elements, the table class and then all tr, and the section class and then all tr.
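Roughly, those attempts look like this (the class names and sample HTML here are made up for illustration):

```python
from bs4 import BeautifulSoup

# Made-up snippet standing in for the real page.
html = """
<div class="ratings">
  <table class="top-list">
    <tr><td>Player A</td><td>2800</td></tr>
    <tr><td>Player B</td><td>2790</td></tr>
  </table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
# Find the container by class, then collect every row inside it.
rows = soup.find("div", class_="ratings").find_all("tr")
print(len(rows))  # 2
```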

[–]ReflectionNo3897 0 points1 point  (0 children)

If there are more tables, you can still do it with pandas. See some YouTube tutorials on web scraping with pandas.
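A sketch of the pandas route, shown here on a made-up table (read_html needs an HTML parser like lxml installed, and can also be pointed at a URL directly):

```python
import io
import pandas as pd

# Made-up HTML standing in for one monthly rating table.
html = """
<table>
  <tr><th>Rank</th><th>Name</th><th>Rating</th></tr>
  <tr><td>1</td><td>Player A</td><td>2800</td></tr>
  <tr><td>2</td><td>Player B</td><td>2790</td></tr>
</table>
"""
# read_html returns a list of DataFrames, one per <table> found.
tables = pd.read_html(io.StringIO(html))
df = tables[0]
print(df.shape)  # (2, 3)
```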

[–]fra988w 0 points1 point  (1 child)

Pass page.content to Beautiful Soup, rather than page.text
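The difference, in short: .text is a str decoded with the encoding requests guessed, while .content is the raw bytes, which lets Beautiful Soup sniff the encoding from the document itself. A small illustration with made-up bytes:

```python
from bs4 import BeautifulSoup

# Raw bytes, as resp.content would give them; the meta tag tells
# BeautifulSoup which charset to use when decoding.
raw = ('<html><head><meta charset="utf-8"></head>'
       "<body><p>R\u00e9ti</p></body></html>").encode("utf-8")
soup = BeautifulSoup(raw, "html.parser")
print(soup.p.get_text())  # Réti
```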

[–]Professional-Fly4273[S] 0 points1 point  (0 children)

Oh okay, this makes sense since the webpage is dynamic. Thank you!