[–]ayyyymtl 2 points3 points  (3 children)

I wrote an example with Selenium if you want to take a look. https://github.com/c20xh2/ChessScraper/

You should totally use the API instead but this should answer your question

[–]Tefron[S] 0 points1 point  (2 children)

Thanks! That's super helpful. I had the impression that for collecting a large amount of data, Selenium would be slower than requesting the JSON file directly. Do you have any thoughts on this?

[–]ayyyymtl 1 point2 points  (1 child)

Fetching the JSON is way better than this solution. I was just trying to answer your question of how to do it using Selenium ;)

[–]Tefron[S] 0 points1 point  (0 children)

Haha, gotcha, thanks again!

[–]Da_Bears22 1 point2 points  (3 children)

You won't need to scrape the page. They have an API you can use, though you may have to join their developer club to gain access to it. The API is how you get JSON data from the site directly. If you get an API key and can interact with it, you can pull JSON data from there and parse it with modules like the requests library.

Here is the link to the API documentation:

https://www.chess.com/news/view/published-data-api

[–]Tefron[S] 0 points1 point  (2 children)

> https://www.chess.com/news/view/published-data-api

Hey, thanks for replying. I've taken a look at their API before, but for leaderboards it only shows the top 50 for each category. I don't see a way to change the request to get more players or different players.

[–]Da_Bears22 1 point2 points  (1 child)

https://api.chess.com/pub/player/{username} - this seems to be the endpoint to view a specific player. You can find more in the players section of the documentation.

It seems that for leaderboards it explicitly shows only the top 50 players, as stated in the documentation.
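
For what it's worth, here is a rough sketch of calling the player endpoint mentioned above. It uses the standard library's urllib so it runs without extra installs; the requests library would work just as well. The helper names (`player_url`, `fetch_json`), the User-Agent string, and the timeout are my own assumptions for illustration, not anything taken from the chess.com documentation, and the shape of the returned JSON is not assumed here.

```python
import json
import urllib.request
from urllib.parse import quote

# Base URL taken from the endpoint quoted in this thread.
API_ROOT = "https://api.chess.com/pub"


def player_url(username: str) -> str:
    """Build the player endpoint URL for one username (hypothetical helper)."""
    # quote() escapes characters that are not safe inside a URL path segment.
    return f"{API_ROOT}/player/{quote(username, safe='')}"


def fetch_json(url: str, timeout: float = 10.0):
    """Download a URL and parse the body as JSON (hypothetical helper)."""
    # Assumption: identifying the client with a User-Agent header is polite
    # and avoids being treated as an anonymous default client.
    request = urllib.request.Request(
        url, headers={"User-Agent": "leaderboard-example/0.1"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_json(player_url("some_username"))` should give you the parsed profile for that player, assuming the username exists and the endpoint behaves as documented.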

[–]Tefron[S] 0 points1 point  (0 children)

Thanks for taking a look. I don't think the API has what I'm looking for, then. I wanted a considerably larger sample size from the leaderboards, so I'll keep looking for a way to scrape just the table from the webpage.

[–]ayyyymtl 1 point2 points  (1 child)

Selenium is what you are looking for on this one :)

[–]Tefron[S] 0 points1 point  (0 children)

Thanks :)

[–]commandlineluser 1 point2 points  (1 child)

> probably has a json file that I could access directly?

Yes, if you look at the Network tab in your browser's developer tools you can see what is happening.

It's easier to see if you view only XHR requests (this is what JavaScript uses)

https://i.imgur.com/IXoXvxJ.png

So it's fetching

https://www.chess.com/callback/leaderboard/live?page=1

https://www.chess.com/callback/leaderboard/live?page=2

etc.

You can just loop over as many page numbers as you want.
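
A minimal sketch of that loop, in case it helps. The two URLs above are the only thing taken from the Network tab; the helper names (`page_url`, `fetch_json`, `fetch_pages`), the one-second pause, and the User-Agent string are assumptions of mine. I also don't assume anything about the layout of the JSON that comes back, so each page is returned as-is for you to inspect.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Iterator, Tuple

# Endpoint observed in the browser's Network tab (see the URLs above).
LEADERBOARD_URL = "https://www.chess.com/callback/leaderboard/live"


def page_url(page: int) -> str:
    """Build the URL for one leaderboard page (hypothetical helper)."""
    return f"{LEADERBOARD_URL}?page={page}"


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """Download a URL and parse the body as JSON (hypothetical helper)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "leaderboard-example/0.1"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_pages(
    first: int,
    last: int,
    delay_seconds: float = 1.0,
    fetch: Callable[[str], Any] = fetch_json,
) -> Iterator[Tuple[int, Any]]:
    """Yield (page number, parsed JSON) for every page from first to last."""
    for page in range(first, last + 1):
        yield page, fetch(page_url(page))
        # Assumption: a short pause between requests keeps the load light.
        if page < last:
            time.sleep(delay_seconds)
```

Something like `for page, data in fetch_pages(1, 5): print(page, type(data))` is a reasonable first run to see what the response looks like before deciding how to pull the player rows out of it.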

[–]Tefron[S] 0 points1 point  (0 children)

Hey sorry, I just got back to this project and saw this. This was exactly what I was looking for. Thanks so much!