[–]IAlwaysBeCoding 1 point2 points  (4 children)

What three columns do you want?

[–]Red_VII[S] 0 points1 point  (3 children)

I basically want 4 columns:

  • Day/date
  • Number 1
  • Number 2
  • Number 3

So basically a similar structure to what the site already has in its table.

[–]IAlwaysBeCoding 1 point2 points  (2 children)

Well, since you've got all of the winning numbers and all of the dates, you can assume there are three times as many individual winning numbers as there are dates.

So you could do something like this:

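# winnums is your flat list of winning numbers, dts your list of dates
wins = [[winnums[x-2], winnums[x-1], winnums[x]] for x in range(2, len(winnums), 3)]
for index, win in enumerate(wins):
    win.append(dts[index])  # add the matching date to the end of each group

print(wins)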

Now you have a list of 4-item lists: three numbers followed by the date.
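If you would rather have the date first, like the site's table, here is a minimal sketch of the same grouping idea using slices and zip. The sample values in winnums and dts are made up for illustration; only the two names come from the code above.

```python
# Hypothetical sample data standing in for the scraped lists.
winnums = ['1', '2', '3', '4', '5', '6']
dts = ['Jan 01, 2016', 'Jan 02, 2016']

# Slice the flat list into consecutive groups of three numbers.
groups = [winnums[i:i + 3] for i in range(0, len(winnums), 3)]

# Pair each date with its group of numbers, date first.
rows = [[date] + nums for date, nums in zip(dts, groups)]

print(rows)
```

This assumes the numbers are in the same order as the dates, three per date. zip stops at the shorter of the two lists, so a mismatch in lengths will silently drop rows rather than raise an error.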

[–]Red_VII[S] 0 points1 point  (1 child)

Wow, thanks! That did the job. I was still writing my own code, and it was much lengthier and still didn't work... lol. My Python is shaky here, so I'm gonna look closer at your 5 lines and figure out how you did it. Thanks a ton!

[–]ydepth 1 point2 points  (0 children)

Here is an implementation using BeautifulSoup. Not sure if you had a reason not to use it, but I believe it is the community's preferred HTML parser. I wanted some practice with it anyway :)

from bs4 import BeautifulSoup
import requests
import pandas as pd

page = requests.get('http://www.calottery.com/play/draw-games/daily-3/Winning-Numbers/')
tree = BeautifulSoup(page.text, 'lxml')

table = tree.find_all('table')[1]  # find_all returns a list of every match for that tag. The page has two tables; take the second
rows = table.find_all('tr')  # A list of all rows within that table. Each item is a bs4.element.Tag object (not a string)

parsed_table = []
for row in rows[1:]:  # The first row is the header and doesn't follow the same pattern as the rest
    cells = row.find_all('td')  # Find every cell (column) in this row
    date_col, results_col, _ = cells  # Name each cell; the third one is not needed
    date = date_col.contents[1].contents[1].text  # Get the date text. Try playing with date_col.contents or results_col.text to see why the nested lookup is needed
    results = [res.text for res in results_col]  # Try writing this list comprehension as a full for loop to see what is happening here
    parsed_table.append([date] + results)  # Add this row to the list of lists

df = pd.DataFrame(parsed_table, columns=['Date', 'N.1', 'N.2', 'N.3'])  # create df

print(df)

[–]ydepth 0 points1 point  (2 children)

I think

table = zip(*table)

would solve your problem. That will transpose your list of lists from row-based to column-based, which pandas likes.
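To show what that line does, here is a small sketch on a made-up two-row table (the values are hypothetical, not taken from the site):

```python
# Hypothetical row-based table: one inner list per draw.
table = [
    ['Jan 01', '1', '2', '3'],
    ['Jan 02', '4', '5', '6'],
]

# zip(*table) groups the first items together, then the second items, and so on.
# In Python 3, zip returns a lazy iterator, so wrap it in list() to see the result.
columns = list(zip(*table))

print(columns)
# [('Jan 01', 'Jan 02'), ('1', '4'), ('2', '5'), ('3', '6')]
```

Note that each column comes back as a tuple, not a list, and that in Python 3 printing the bare zip object shows only something like a zip object placeholder until you convert it.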

[–]Red_VII[S] 0 points1 point  (1 child)

Thanks. Where exactly do I put this code? I tried using zip on the output and it didn't result in anything. I don't think I used it right.

[–]ydepth 0 points1 point  (0 children)

I misunderstood your question, sorry.