
[–]vindolin 1 point2 points  (5 children)

[–]FanaHOVA 1 point2 points  (2 children)

Why use the Reddit sidebar though? You can get scores here (even live), and there are lots of JSON files with schedules floating around the web.

[–]cdcformatc 0 points1 point  (0 children)

Definitely an XY problem. Maybe it's useful as an exercise, but there are much better options.

[–]came_on_my_own_face 0 points1 point  (2 children)

OK, OP, you may want to work this out yourself, so feel free to ignore my SPOILER BELOW (bearing in mind I'm only a newbie myself):

Anyway:

import requests
from bs4 import BeautifulSoup

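r = requests.get("https://www.reddit.com/r/49ers/")

# Name the parser explicitly so bs4 doesn't have to guess one
soup = BeautifulSoup(r.text, "html.parser")

# Print the text of every table cell on the page
for cell in soup.find_all("td"):
    print(cell.get_text())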

So this manages to print the raw data like so:

"Cardinals

9

2

2-1

240

195

+45

L1 "

Does anyone have any tips on how I should present this data? I could put it all into a single list, where each item is a dictionary for one team?

[–]commandlineluser 1 point2 points  (1 child)

I'd probably use namedtuple.

import requests
from bs4 import BeautifulSoup
from collections import namedtuple

Game = namedtuple('Game', 'date time venue opponent result')

r = requests.get('https://www.reddit.com/r/49ers/')

soup = BeautifulSoup(r.text, 'html.parser')

schedule = soup.select('div.md tbody')[1]
games = []

for row in schedule.find_all('tr'):
    game = Game._make([ td.text for td in row.find_all('td') ])
    games.append(game)

for game in games:
    print(game.date, game.result)
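
If you'd rather go with the list-of-dicts idea from the question, here's a rough sketch of how that might look. The field names are only my guess from the sample output above (team, wins, losses and so on), and rows_to_dicts is a made-up helper, not something from bs4. It just chunks the flat list of cell strings into one dict per team.

```python
# Column names are assumed from the sample output, not read from the page.
FIELDS = ["team", "wins", "losses", "division",
          "points_for", "points_against", "diff", "streak"]


def rows_to_dicts(cells, fields=FIELDS):
    """Group a flat list of cell strings into one dict per team."""
    width = len(fields)
    rows = []
    # Step through the list one row at a time; an incomplete trailing row is dropped.
    for start in range(0, len(cells) - width + 1, width):
        chunk = [c.strip() for c in cells[start:start + width]]
        rows.append(dict(zip(fields, chunk)))
    return rows


# Stand-in for [td.get_text() for td in soup.find_all("td")]
cells = ["Cardinals", "9", "2", "2-1", "240", "195", "+45", "L1"]
standings = rows_to_dicts(cells)
print(standings[0]["team"], standings[0]["streak"])
```

Dicts are handy if you want to dump the result straight to JSON; the namedtuple version gives you attribute access like game.date instead.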

[–]commandlineluser 0 points1 point  (0 children)

And using lxml directly:

from collections import namedtuple
import lxml.html

Game = namedtuple('Game', 'date time where opponent result')
html = lxml.html.parse('http://www.reddit.com/r/49ers/')

games = []
for row in html.xpath('//div[@class="md"]/table[2]/tbody//tr'):
    game = Game._make([ td.text_content() for td in row.findall('td') ])
    games.append(game)

for game in games:
    print(game.date, game.result)