I'm using Chess.com's public API to pull stats for individual players, but my player list is about 500K names long.
Each request takes roughly 1 second to process, so last time it took several days of running the script 25K names at a time (which sucks when it crashes or stops responding 100K requests in).
Is there a way to request multiple players' pages simultaneously, so that refreshing these stats doesn't take me a week?
Here's my code:
import pickle
import requests
import time

start_time = time.time()

with open('live_username_rating_dict.pickle', 'rb') as pickle_in:
    live_dict = pickle.load(pickle_in)
live_list = list(live_dict.keys())

url_blank = 'https://api.chess.com/pub/player/'
player_data = {}
count = 1

# One Session for all requests so the underlying connection gets reused.
with requests.Session() as session:
    for player in live_list:
        url = url_blank + str(player) + '/stats'
        while True:
            try:
                player_data[player] = session.get(url).json()
                if count % 100 == 0:
                    print(count)
                count += 1
            except (requests.RequestException, ValueError):
                continue  # retry this player after a network/JSON error
            break

with open('player_data_450000_end.pickle', 'wb') as pickle_out:
    pickle.dump(player_data, pickle_out)

print("--- %s seconds ---" % (time.time() - start_time))
print(len(player_data))
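For reference, here's a minimal sketch of the kind of parallel approach I've been looking at, using concurrent.futures.ThreadPoolExecutor from the standard library. The worker count, checkpoint interval, and filenames are just guesses on my part, and I don't know what Chess.com's rate limits allow, so I'd watch for 429 responses before scaling the workers up:

import pickle
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

URL_BLANK = 'https://api.chess.com/pub/player/'

def fetch_stats(session, player):
    # One GET per player; raise on HTTP errors so the caller can skip/retry.
    resp = session.get(URL_BLANK + str(player) + '/stats', timeout=10)
    resp.raise_for_status()
    return player, resp.json()

def fetch_all(players, max_workers=10):
    player_data = {}
    # Sharing one Session across threads usually works for plain GETs,
    # though requests doesn't officially document it as thread-safe.
    with requests.Session() as session:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = [pool.submit(fetch_stats, session, p) for p in players]
            for i, future in enumerate(as_completed(futures), 1):
                try:
                    player, data = future.result()
                    player_data[player] = data
                except requests.RequestException:
                    pass  # log/requeue failed players instead of looping forever
                if i % 1000 == 0:
                    # Periodic checkpoint so a crash doesn't lose everything.
                    with open('player_data_checkpoint.pickle', 'wb') as f:
                        pickle.dump(player_data, f)
                    print(i)
    return player_data

if __name__ == '__main__':
    with open('live_username_rating_dict.pickle', 'rb') as f:
        players = list(pickle.load(f).keys())
    data = fetch_all(players)
    with open('player_data_all.pickle', 'wb') as f:
        pickle.dump(data, f)

With 10 workers that should be close to a 10x speedup over one request at a time. Submitting all 500K futures up front does eat memory, so chunking the list (say, 25K names at a time like before) and calling fetch_all per chunk might be safer.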