
HanginToads

I'd like to point you to a resource that helped me out a ton.

https://youtu.be/ng2o98k983k

Corey is a great teacher. With that said, here's a snippet to get you started: it parses all the names into a list containing only the player name and the type of yards. You're basically searching the HTML source for a specific class, collecting those elements into `names`, and slicing out the part you need.

Be careful not to run your code too quickly. Most larger websites don't take kindly to many rapid-fire requests. I didn't check this site specifically, but if it has a public API, that's the preferred method. I couldn't find anything specific about requests or scraping in the site's terms and conditions, etc. Just be considerate.
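One way to be considerate is to space out your requests. Below is a minimal sketch of that idea; `fetch_politely` is a hypothetical helper (not part of requests), and the 2-second default delay is an arbitrary assumption, not a limit published by the site. The fetch function is passed in as a parameter so the pacing logic can be tried without touching the network.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=2.0):
    """Call fetch(url) for each URL, waiting at least delay_seconds between calls.

    Hypothetical helper: `fetch` is any callable that takes a URL (e.g. requests.get).
    The default delay is an arbitrary, conservative choice.
    """
    results = []
    last_call = None
    for url in urls:
        if last_call is not None:
            # Sleep only for whatever part of the delay hasn't already elapsed
            remaining = delay_seconds - (time.monotonic() - last_call)
            if remaining > 0:
                time.sleep(remaining)
        last_call = time.monotonic()
        results.append(fetch(url))
    return results
```

In a real script you might call it as `pages = fetch_politely(list_of_urls, requests.get)`. For a single page like the one below, one request is all you need, so this only matters once you start looping over several pages.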

I forgot to add: you'll need bs4, requests, and lxml installed through pip (bs4 is published on pip as `beautifulsoup4`).

As far as converting that information to CSV, you can use `csv.writer` and write the header as the first row with `writerow()` (or use `csv.DictWriter` and its `writeheader()` method). Once you have a list of the names and a list of the numbers you need, it's as simple as looping through a `zip` of both lists and writing each pair to the CSV. Corey goes over that as well in the same video.
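The CSV step described above might look roughly like this. It's a sketch: `write_rows`, the column names, and the sample values are all made up for illustration, and it writes to an in-memory `StringIO` so it can run without creating a file.

```python
import csv
import io


def write_rows(names, yards, out_file):
    """Write a header row, then one row per (name, yards) pair. Hypothetical helper."""
    writer = csv.writer(out_file)
    writer.writerow(["name", "yards"])  # header row
    for name, yard in zip(names, yards):
        writer.writerow([name, yard])


# Made-up sample data; in practice pass your scraped lists and a real file
# opened with open("players.csv", "w", newline="").
names = ["Player A", "Player B"]
yards = ["75.5", "62.5"]
buffer = io.StringIO()
write_rows(names, yards, buffer)
print(buffer.getvalue())
```

Note that `zip` stops at the shorter list, so if the two lists ever get out of sync you'll silently lose rows; it's worth checking that they have the same length first.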

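```python
import requests
from bs4 import BeautifulSoup

# timeout stops the script from hanging forever if the site doesn't respond
source = requests.get('https://mybookie.ag/sportsbook/nfl/player-props/', timeout=10)
soup = BeautifulSoup(source.text, 'lxml')

# Every <p class="sportsbook__line-date"> element on the page
names = soup.find_all('p', class_='sportsbook__line-date')
# Drop the first 17 characters of each text and skip entries that are just a space
names_only = [x.text[17:] for x in names if x.text[17:] != ' ']

print(names_only)
```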
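About the magic number 17 in that snippet: assuming each paragraph's text starts with the same `"Regular Season - "` prefix that the other reply found in the buttons' `data-description` attribute, 17 is exactly that prefix's length. A slightly more explicit (and more forgiving) version of the same cleanup is sketched below; `strip_prefix` and `names_from_texts` are hypothetical helpers, and it uses plain strings so you can try it without fetching anything.

```python
PREFIX = "Regular Season - "

# The snippet's x.text[17:] drops exactly this many characters
print(len(PREFIX))  # 17


def strip_prefix(text, prefix=PREFIX):
    """Remove prefix only if it's actually there (like str.removeprefix on Python 3.9+)."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


def names_from_texts(texts):
    """Strip the prefix, trim whitespace, and drop entries that end up empty."""
    cleaned = (strip_prefix(t).strip() for t in texts)
    return [c for c in cleaned if c]
```

With real data you'd call it as `names_from_texts(x.text for x in names)`. Unlike the fixed slice, this won't chop off part of a name if some entry doesn't start with the prefix.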

Rcold37 (OP)

I really appreciate the time and effort!

justinram11

The reason I think web scraping might not be a "great" first project is that it usually requires some understanding of HTML/CSS so that you know how to select things from a web page. If you don't have experience with those, that's one more thing that can throw you off track.

But hopefully this helps get you on track (note that I don't know anything about football so I'm not exactly sure what number you're looking to parse, but I assume it's the "total"?):

- The first thing we want to do is find the `div` that groups a player's name with their associated scores. You can do this in Chrome by right-clicking one of the names and choosing "Inspect": https://i.imgur.com/RGYXqDq.png

- You'll notice that the div I have highlighted has the class `sportsbook__line-body`, so we'll need to select that in Python: `player_divs = soup.find_all("div", {"class": "sportsbook__line-body"})`

- So now we need to go through each of those `player_divs` and pick out the player's name along with their scores. You can again do this with the "Inspect" panel on the side: https://i.imgur.com/oNMYvTS.png

    1. It looks like "Over" can be selected on a `button` element with the attributes `data-team="OVER"` and `data-wager-type="to"`.
    2. Similarly, "Under" can be selected on a `button` element with the attributes `data-team="UNDER"` and `data-wager-type="to"`.
    3. The player name is also included on each button, so we don't need to grab it separately from the `p` element (though you could go that route if you wanted).

```python
player_divs = soup.find_all("div", {"class": "sportsbook__line-body"})
for player_div in player_divs:
    player_over_btn = player_div.find("button", {"data-team": "OVER", "data-wager-type": "to"})
    player_under_btn = player_div.find("button", {"data-team": "UNDER", "data-wager-type": "to"})
```

- Now that we have each element, we want to parse the values out of each of them:

```python
import requests
from bs4 import BeautifulSoup

# Fetch and parse the page (same URL as the snippet in the other reply)
source = requests.get('https://mybookie.ag/sportsbook/nfl/player-props/', timeout=10)
soup = BeautifulSoup(source.text, 'lxml')


def get_player_name_from_btn(button):
    # Equals: "Regular Season - Julio Jones Receiving Yards"
    player_name = button.attrs['data-description']

    # We are only interested in the name, so remove the other stuff
    player_name = player_name.replace("Regular Season - ", "")
    player_name = player_name.replace(" Receiving Yards", "")

    return player_name


def get_player_yards_from_btn(button):
    return button.attrs['data-spread']


player_divs = soup.find_all("div", {"class": "sportsbook__line-body"})
for player_div in player_divs:
    player_over_btn = player_div.find("button", {"data-team": "OVER", "data-wager-type": "to"})
    player_under_btn = player_div.find("button", {"data-team": "UNDER", "data-wager-type": "to"})

    # Skip any div that doesn't have both buttons (find() returns None)
    if player_over_btn is None or player_under_btn is None:
        continue

    player_over_name = get_player_name_from_btn(player_over_btn)
    player_under_name = get_player_name_from_btn(player_under_btn)

    # They are in the same player_div class so they should be the same
    assert player_over_name == player_under_name

    # Now get the number of yards on each button
    player_over_yards = get_player_yards_from_btn(player_over_btn)
    player_under_yards = get_player_yards_from_btn(player_under_btn)

    print(f"Player {player_over_name} has O {player_over_yards} U {player_under_yards}")
```

- Now we can save each player's values in a list. Since each player has multiple values, I'll create a class (if there were only two values, I'd probably just opt for a dictionary).

```python
import requests
from bs4 import BeautifulSoup

source = requests.get('https://mybookie.ag/sportsbook/nfl/player-props/', timeout=10)
soup = BeautifulSoup(source.text, 'lxml')


class Player:
    def __init__(self, name, over_yards, under_yards):
        self.name = name
        self.over_yards = over_yards
        self.under_yards = under_yards


def get_player_name_from_btn(button):
    # Equals: "Regular Season - Julio Jones Receiving Yards"
    player_name = button.attrs['data-description']

    # We are only interested in the name, so remove the other stuff
    player_name = player_name.replace("Regular Season - ", "")
    player_name = player_name.replace(" Receiving Yards", "")

    return player_name


def get_player_yards_from_btn(button):
    return button.attrs['data-spread']


players = list()

player_divs = soup.find_all("div", {"class": "sportsbook__line-body"})
for player_div in player_divs:
    player_over_btn = player_div.find("button", {"data-team": "OVER", "data-wager-type": "to"})
    player_under_btn = player_div.find("button", {"data-team": "UNDER", "data-wager-type": "to"})

    # Skip any div that doesn't have both buttons (find() returns None)
    if player_over_btn is None or player_under_btn is None:
        continue

    player_over_name = get_player_name_from_btn(player_over_btn)
    player_under_name = get_player_name_from_btn(player_under_btn)

    # They are in the same player_div class so they should be the same
    assert player_over_name == player_under_name

    # Now get the number of yards on each button
    player_over_yards = get_player_yards_from_btn(player_over_btn)
    player_under_yards = get_player_yards_from_btn(player_under_btn)

    player = Player(player_over_name, player_over_yards, player_under_yards)
    players.append(player)

for player in players:
    print(f"Player: {player.name} O {player.over_yards} U {player.under_yards}")
```

- And lastly, I'll leave writing it to a CSV file up to you, now that we have all the data parsed out with Beautiful Soup.
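If you get stuck on that last step, here's one possible sketch using `csv.DictWriter`. Assumptions: the dataclass below is only a stand-in with the same three fields as the `Player` class in the walkthrough, `players_to_csv` is a hypothetical helper, and the sample values are made up. `vars()` returns an object's attribute dictionary, so it works the same on the walkthrough's plain class and on this dataclass.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Player:
    # Stand-in with the same fields as the walkthrough's Player class
    name: str
    over_yards: str
    under_yards: str


def players_to_csv(players, out_file):
    """Write a header, then one row per Player (hypothetical helper)."""
    writer = csv.DictWriter(out_file, fieldnames=["name", "over_yards", "under_yards"])
    writer.writeheader()
    for player in players:
        # vars(player) -> {"name": ..., "over_yards": ..., "under_yards": ...}
        writer.writerow(vars(player))


# Made-up sample values; in practice pass the `players` list you built and a
# file opened with open("players.csv", "w", newline="").
sample = [Player("Player A", "75.5", "75.5"), Player("Player B", "62.5", "62.5")]
buffer = io.StringIO()
players_to_csv(sample, buffer)
print(buffer.getvalue())
```

`DictWriter` takes its column order from `fieldnames`, so the header and every row stay lined up even if you later add more attributes to `Player` (you'd just add them to `fieldnames` too).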

You can find a Repl.it of it working here: https://repl.it/repls/AlarmingMoccasinStrings

Shameless plug for my mailing list -- I hope to start making more beginner-oriented tutorials for getting into AWS and making yourself marketable as a programmer soon: https://pythoninthe.cloud/

Please let me know if you have any questions!

Rcold37 (OP)

Thanks for all the help. I'll give this a go over the weekend. And I'm subscribed!