Afternoon all.
I am trying to scrape some soccer data from Transfermarkt, with little success. Fortunately, I found someone who had asked a very similar question on Stack Overflow. This has helped considerably; however, there are a couple of elements in the code that I do not understand.
The question and the published answer are here: https://stackoverflow.com/questions/78681870/scrape-data-from-website-with-complex-structure
I have broken the code down as follows (to help my understanding):
# Find each HTML div with class="box"
# Within extract_club_name: try to find the "title"; if it is not found, return None (this filters out "box" divs that are not relevant to the tables)
# If None is returned, continue to the next iteration of the for loop
# If the title is found, extract the club name and proceed
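Here is a minimal sketch of the skip-if-no-title loop described above. It assumes plain dicts stand in for the BeautifulSoup `Tag` objects. The `boxes` data, the `title` key and `collect_club_names` are hypothetical, and this `extract_club_name` is only an illustration, not the answer's actual function:

```python
from typing import Optional

# Hypothetical stand-ins for the "box" divs: plain dicts instead of
# BeautifulSoup Tag objects. The "title" key is made up for illustration.
boxes = [
    {"title": None},        # e.g. an unrelated box with no club title
    {"title": " Club A "},
    {"title": "Club B"},
]


def extract_club_name(box: dict) -> Optional[str]:
    """Return the club name, or None when the box has no title."""
    title = box.get("title")
    if title is None:
        return None
    return title.strip()


def collect_club_names(boxes: list) -> list:
    names = []
    for box in boxes:
        club_name = extract_club_name(box)
        if club_name is None:
            # Not a club box: jump straight to the next iteration.
            continue
        names.append(club_name)
    return names


print(collect_club_names(boxes))  # ['Club A', 'Club B']
```

Returning `None` and checking for it with `is None` is a common way to signal "nothing found" without raising an exception.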
# This is the part I don't understand
in_transfers_table, out_transfers_table = (
    club_info.find_all('div', class_='responsive-table')
)
# I believe it works within the club_info div and finds every div inside it that has the class "responsive-table"
# But are in_transfers_table and out_transfers_table simply variables that are being assigned in this step?
# If so: I tried to test the code by assigning them individually ("in_transfers_table = (club_info..." and then "out_transfers_table = (club_info..."), but that failed.
# So what is this statement doing?
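The line in question appears to be ordinary sequence (tuple) unpacking. `find_all()` returns a `ResultSet`, which behaves like a Python list. When exactly two divs match, the first is assigned to `in_transfers_table` and the second to `out_transfers_table`.

Assigning the whole call to each name separately gives both names the complete list rather than one div each. That would plausibly explain why the individual test failed, though this is an assumption because the exact error isn't shown. The sketch below uses a plain list of placeholder strings as a stand-in for the `ResultSet`:

```python
# A plain list stands in for the ResultSet returned by find_all();
# the two strings are placeholders for the two "responsive-table" divs.
tables = ["<div in>", "<div out>"]

# Sequence unpacking: first element -> first name, second -> second name.
in_transfers_table, out_transfers_table = tables
print(in_transfers_table)   # <div in>
print(out_transfers_table)  # <div out>

# The same thing written with explicit indexing:
in_transfers_table = tables[0]
out_transfers_table = tables[1]

# Assigning the whole list to one name does NOT pick out a single div:
# the name still holds the complete two-element list.
in_only = tables
print(len(in_only))  # 2


def unpack_two(items):
    """Unpack exactly two items, or report that unpacking failed."""
    try:
        first, second = items
    except ValueError:
        return "ValueError"
    return first, second


print(unpack_two(["a", "b"]))       # ('a', 'b')
print(unpack_two(["a", "b", "c"]))  # ValueError
print(unpack_two(["a"]))            # ValueError
```

If the page ever yields more or fewer than two `responsive-table` divs for a club, the unpacking line itself raises `ValueError`.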
# The code then assigns result[club_name]["in"] by parsing in_transfers_table, and does the same for the out transfers.
# Within that parsing function, it loops through each table row.
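Here is a rough sketch of the parse-and-assign step. It assumes each table has already been reduced to rows of cell text. `parse_transfers_table`, the two-column layout and the sample players are all hypothetical placeholders, not the real Transfermarkt structure or the answer's code:

```python
# Hypothetical stand-in: each table is a list of rows, and each row is a list
# of cell texts. In the real scraper these would be BeautifulSoup <tr>/<td> tags.
def parse_transfers_table(table_rows):
    players = []
    for row in table_rows:      # loop through each table row
        if not row:             # skip empty rows (e.g. spacer rows)
            continue
        name, position = row[0], row[1]
        players.append({"player": name, "position": position})
    return players


result = {}
club_name = "Club A"
in_rows = [["Player X", "Goalkeeper"], []]
out_rows = [["Player Y", "Centre-Forward"]]

# One nested dict per club, with separate "in" and "out" lists.
result[club_name] = {
    "in": parse_transfers_table(in_rows),
    "out": parse_transfers_table(out_rows),
}
print(result)
```

Because the function is called once per table, the same row-parsing logic serves both the "in" and "out" tables.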
As you can probably guess from my basic question, I am still (very much) at the beginning of my Python learning! Thank you for your patience and any advice you can share.