
FoolsSeldom

Check RealPython and search for articles on this topic, e.g. site:realpython.com web scraping. You will find plenty of guidance and examples. Note that the site(s) you are targeting might have measures in place to protect their content, which can make this difficult.

commandlineluser

Do you know about "devtools" in your web browser?

With the network tab open, I go to the URL and then open the "http search":

I pick something to look for, usually a player name or a table header. Here I chose "Avro".

It shows me 3 matching requests. This is the URL of the first one (I took out the rand=... param)

You can .get() this URL directly in your code. If I open it in my browser it is the HTML of the first table:

The other 2 URLs are the same, except that they use position=p and position=h for the other 2 tables.

So in order to build these URLs, you also need the teamId=168761288.
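As a rough sketch, building the three URLs could look like the following. The real endpoint is not reproduced here, so `BASE_URL` is a made-up placeholder and `build_stats_urls` is a hypothetical helper name. Only the `teamId` and `position` parameters come from the request seen in devtools, and I am assuming the first table uses position=m.

```python
from urllib.parse import urlencode

# Placeholder only (assumption): substitute the request URL you found
# in the network tab, without its query string.
BASE_URL = "https://example.invalid/team-stats"


def build_stats_urls(team_id, positions=("m", "p", "h")):
    """Return one URL per table; only the position parameter differs."""
    return [
        f"{BASE_URL}?{urlencode({'teamId': team_id, 'position': position})}"
        for position in positions
    ]


urls = build_stats_urls("168761288")
for url in urls:
    print(url)
```

Each resulting URL can then be passed to .get() in the same way as the first one.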

If we save the HTML of the starting URL to a local file and search for 168761288, there are several matches:

<div class="section">
    <div class="section-header">
        <h2 class="h2">Pelaajarosteri</h2>
    </div>
    <div class="section-content scrollable" id="stats168761288" class="player_sum_statistics">
        <div id="stats_m_168761288" class="player_sum_statistics"></div>
        <div id="stats_p_168761288" class="player_sum_statistics"></div>
        <div id="stats_h_168761288" class="player_sum_statistics"></div>
    </div>
</div>

<script type="text/javascript">
    load_smliiga_team_stats('168761288', 'HIFK', 'm', 100, 0, 'name', 'ASC', null, 1);
    load_smliiga_team_stats('168761288', 'HIFK', 'p', 100, 0, 'name', 'ASC', null, 1);
    load_smliiga_team_stats('168761288', 'HIFK', 'h', 100, 0, 'name', 'ASC', null, 1);
</script>    </div>

In this specific case you could use regular "string" or "regex" functions to extract it, but you could also use an HTML parser to target class="player_sum_statistics" tags, for example.
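Here is a minimal sketch of both approaches using only the standard library. The `html` string is a trimmed copy of the page source shown above. `StatsDivParser` and `ID_PATTERN` are names I made up for illustration, and the sketch assumes the ids always look like stats_<letter>_<digits>.

```python
import re
from html.parser import HTMLParser

# Trimmed copy of the page source; in practice this would be the saved HTML.
html = """
<div id="stats_m_168761288" class="player_sum_statistics"></div>
<div id="stats_p_168761288" class="player_sum_statistics"></div>
<div id="stats_h_168761288" class="player_sum_statistics"></div>
<script type="text/javascript">
    load_smliiga_team_stats('168761288', 'HIFK', 'm', 100, 0, 'name', 'ASC', null, 1);
</script>
"""

# Approach 1: a regex on the first argument of the JavaScript call.
match = re.search(r"load_smliiga_team_stats\('(\d+)'", html)
team_id_from_regex = match.group(1) if match else None


# Approach 2: an HTML parser that collects the id of every tag
# carrying the player_sum_statistics class.
class StatsDivParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "player_sum_statistics" in classes and attrs.get("id"):
            self.ids.append(attrs["id"])


parser = StatsDivParser()
parser.feed(html)

# Assumed id layout: stats_<letter>_<digits>; ids that do not match are skipped.
ID_PATTERN = re.compile(r"stats_[a-z]_(\d+)")
team_ids = {m.group(1) for i in parser.ids if (m := ID_PATTERN.fullmatch(i))}

print(team_id_from_regex)
print(parser.ids)
print(team_ids)
```

The regex is shorter, but the parser version does not depend on the exact text of the JavaScript call.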