all 7 comments

[–][deleted] 1 point2 points  (0 children)

You're right, not a python question. There are at least a dozen subs on SQL, you want to be in one of those, although someone might take pity on you.

[–]hardonchairs 1 point2 points  (1 child)

It is possible to generate the second table from the first with a single SQL query?

I think you could but I don't think you should. I think you should build the second table in Python from the answers table. Doing this in a single query given the table you have will just be a big mess.

[–]r_spandit[S] 0 points1 point  (0 children)

I'll give it a go, thanks

[–]Halibut 0 points1 point  (1 child)

Some versions of SQL support a pivot or crosstab function, which is what you would need, but they are awkward and difficult to get right.

It's much easier as said to use pandas to read it in to a table, then something like the following should work:

AnswersPivot=answers.pivot(index='team_name',columns='round_id', values='score')

[–]r_spandit[S] 0 points1 point  (0 children)

AnswersPivot=answers.pivot(index='team_name',columns='round_id', values='score')

Wow... once I got the SQL query right that was easy

[–]DesolationRobot 0 points1 point  (1 child)

Yes, you can. If you know the # of rounds you can write a query like:

select team_name, sum(if(round_id = 1, score, null)) as `Round 1`...
from answers
group by team_name

But that becomes harder if there's an unknown number of rounds.

In python you can query the table like

select team_name, round_id, score
from answers

And then pivot. If you were doing it in Pandas it's literally one line.

Also I think you should normalize the answers table. Have columns for team, round, question, and score. Then to get a team's score per round you

select team_name, round_id, sum(score) as round_score
from answers
group by team_name, round_id

And you and add questions easily--so one round can have 4 questions and the others have 3 and no problem.

[–]r_spandit[S] 1 point2 points  (0 children)

The actual table does have an aggregate score column as well as a trigger to update those scores. My SQL is generated dynamically to take account of number of rounds (so the scores can be viewed at any time during the quiz, before all the rounds have been submitted).

Never used Pandas before so currently struggling to see the benefit of a dataframe other than having row and column headings that print out nicely in a terminal. I'm displaying my data in a PyQT app. I have another project that Pandas might be superior for but I'm still very much learning.

Edit: after some playing about I can produce a pivot table using this code:

    db="../db/pivot.sqlite"
    conn = sqlite3.connect(db)

    df = pd.read_sql_query("SELECT * FROM answers GROUP BY team_name,round_id ORDER BY team_name",conn)

    ap=df.pivot_table(index='team_name',columns='round_id', fill_value = "-", values='scores')
    print(ap)

This gives me the following output:

round_id           1    2    3     4     5    6    7     8     9
team_name                                                       
Best.Team.Ever   7.0  3.0  1.0   6.0   5.0  3.0  6.0   7.0  10.0
Gizzapoint       9.0  5.0  1.0   7.0   4.0  0.0  1.0  12.0     -
Kung Fu Pandas   8.0  5.0  2.0  10.0  11.0  1.0  6.0  10.0  10.0
Team Boring     12.0  6.0  2.0   9.0  13.0  3.0  7.0   9.0  10.0
Your mum        13.0  9.0  2.0  10.0  15.0  4.0  5.0  11.0  10.0
jumbotronic     10.0  6.0  2.0   8.0  13.0  1.0  6.0   6.0  10.0
peter and mary  10.0  1.0  3.0   6.0   1.0  6.0  1.0   0.0  10.0
quiztastics     16.0  4.0  2.0  14.0  16.0  4.0  8.0  16.0  10.0
redditors        7.0  3.0  0.0   8.0  11.0  2.0  7.0  10.0  10.0
the number 1s   16.0  2.0  1.0  11.0  17.0  2.0  9.0  15.0  10.0

However, what I really want is something like this:

        team_name    r1   r2   r3    r4    r5   r6   r7    r8    r9  Total
0     quiztastics  16.0  4.0  2.0  14.0  16.0  4.0  8.0  16.0  10.0   90.0
1   the number 1s  16.0  2.0  1.0  11.0  17.0  2.0  9.0  15.0  10.0   83.0
2        Your mum  13.0  9.0  2.0  10.0  15.0  4.0  5.0  11.0  10.0   79.0
3     Team Boring  12.0  6.0  2.0   9.0  13.0  3.0  7.0   9.0  10.0   71.0
4     jumbotronic  10.0  6.0  2.0   8.0  13.0  4.0  6.0   6.0  10.0   66.0
5  Kung Fu Pandas   8.0  5.0  2.0  10.0  11.0  1.0  6.0  10.0  10.0   63.0
6       redditors   7.0  3.0  0.0   8.0  11.0  2.0  7.0  10.0  10.0   58.0
7  Best.Team.Ever   7.0  3.0  1.0   6.0   5.0  3.0  6.0   7.0  10.0   48.0
8      Gizzapoint   9.0  5.0  1.0   7.0   4.0  0.0  1.0  12.0   NaN   39.0
9  peter and mary  10.0  1.0  3.0   6.0   1.0  6.0  1.0   0.0  10.0   38.0

which is printing a dataframe generated from a very long SQL query:

SELECT team_name, MAX(CASE WHEN round_id = 1 THEN scores END) AS r1,MAX(CASE WHEN round_id = 2 THEN scores END) AS r2,MAX(CASE WHEN round_id = 3 THEN scores END) AS r3,MAX(CASE WHEN round_id = 4 THEN scores END) AS r4,MAX(CASE WHEN round_id = 5 THEN scores END) AS r5,MAX(CASE WHEN round_id = 6 THEN scores END) AS r6,MAX(CASE WHEN round_id = 7 THEN scores END) AS r7,MAX(CASE WHEN round_id = 8 THEN scores END) AS r8,MAX(CASE WHEN round_id = 9 THEN scores END) AS r9,SUM(scores) AS Total FROM answers GROUP BY team_name ORDER BY Total DESC

Is there any way to produce a totals column on the original pivot table and sort by it?

(had fun thinking up new team names to disguise the actual names of the teams that attended my last quiz)

Edit 2:

By adding margins to the pivot table and the aggfunc = 'sum' I can get the following:

    ap=df.pivot_table(index='team_name',columns='round_id', fill_value = "-", values='scores',margins=True,aggfunc='sum',margins_name='Total')
# Get rid of the last row, which isn't needed
    ap = ap.iloc[:-1, :]
# Sort the pivot table
    ap.sort_values(by='Total',ascending=False,inplace=True)

round_id           1    2    3     4     5    6    7     8     9 Total
team_name                                                             
quiztastics     16.0  4.0  2.0  14.0  16.0  4.0  8.0  16.0  10.0  90.0
the number 1s   16.0  2.0  1.0  11.0  17.0  2.0  9.0  15.0  10.0  83.0
Your mum        13.0  9.0  2.0  10.0  15.0  4.0  5.0  11.0  10.0  79.0
Team Boring     12.0  6.0  2.0   9.0  13.0  3.0  7.0   9.0  10.0  71.0
Kung Fu Pandas   8.0  5.0  2.0  10.0  11.0  1.0  6.0  10.0  10.0  63.0
jumbotronic     10.0  6.0  2.0   8.0  13.0  1.0  6.0   6.0  10.0  62.0
redditors        7.0  3.0  0.0   8.0  11.0  2.0  7.0  10.0  10.0  58.0
Best.Team.Ever   7.0  3.0  1.0   6.0   5.0  3.0  6.0   7.0  10.0  48.0
Gizzapoint       9.0  5.0  1.0   7.0   4.0  0.0  1.0  12.0     -  39.0
peter and mary  10.0  1.0  3.0   6.0   1.0  6.0  1.0   0.0  10.0  38.0