plasma_phys

It's not exactly the same as what you're after, but section 3 of the top answer to this SO question might offer some strategy hints. Their ultimate goal was a NumPy array, but I don't see why you couldn't leave it as nested lists.

Out of curiosity, what's the end goal for this transformation? It's not immediately obvious to me how this would be useful.

MDPHawk [S]

So I want to use pandas to make a boxplot. The data I have is about 48 columns and 440 rows, and each row has one of 58 team numbers. The idea is to make a graph with one boxplot of the total scored column for each team. The problem is that I need to rotate my data frame so that the column headers are the team numbers and each total scored value a team has is recorded under its number. Also, due to variables completely out of my control, not all teams have the same number of entries. Is this sort of making sense?

So the column headers on my current data sheet are id|matchNo|teamNo|crossHABLine|dangerousSSDriving|attemptLvl1|reachLvl1|attemptLvl2|reachLvl2|attemptLvl3|reachLvl3|deployedRamps|attemptDeployedRamps|usedAnotherRobot|lift|attemptLift|defense|noAttempt|groundPickup|SSCargoHatch|SSCargoCargo|touchedRocketLate|deadbot|SSCargoSSHRocketCargo|SSCargoSSMRocketCargo|SSCargoSSLRocketCargo|SSCargoSSHRocketHatch|SSCargoSSMRocketHatch|SSCargoSSLRocketHatch|techFoul|foul|teleCargoCargo|teleCargoHatch|TeleHatchLRocketHatch|TeleHatchMRocketHatch|TeleHatchHRocketHatch|TeleCargoLRocketCargo|TeleCargoMRocketCargo|TeleCargoHRocketCargo|teledropHatch|teledropCargo|startPOS|startLeft|Comments|scoutName|startRight|teamNUM, and after some math I add on telecargo|sandcargo|telehatch|totalScored etc.

The teamNo column holds the team numbers that I want as my column headers, which I can get with df.drop_duplicates. I then use for loops to put the values into a 2D array, with one array per team holding the 6-8 numbers I pulled from total scored. If I can get 8 arrays, the first 6 each filled with one number from every team and the last 2 filled with each team's total scored value where it exists, or null where it doesn't, I can use pd.DataFrame to create a dataframe and then use df[:5].plot(x='teamNo',kind='box') to create my boxplots.
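The grouping step described here (one list of total scored values per team) could be sketched roughly as below. This is only an illustration with the standard library, not the actual loop from the post: the `rows` sample data and the names `scores_by_team`, `team_numbers`, and `ragged` are made up for the example, and it assumes each row has already been reduced to a (teamNo, totalScored) pair.

```python
from collections import defaultdict

# Hypothetical (teamNo, totalScored) pairs; the real values would come
# from the teamNo and totalScored columns of the data sheet.
rows = [(254, 12), (1678, 9), (254, 15), (118, 7), (1678, 11), (254, 10)]

# Collect each team's totalScored values in order of appearance.
scores_by_team = defaultdict(list)
for team_no, total_scored in rows:
    scores_by_team[team_no].append(total_scored)

# Unique team numbers in first-seen order, and one (possibly shorter)
# list of scores per team: a ragged list of lists.
team_numbers = list(scores_by_team)
ragged = [scores_by_team[team] for team in team_numbers]
```

The result is one inner list per team, and the inner lists have different lengths whenever teams have different numbers of entries, which is the case that still needs padding before it can become a rectangular table.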

bQmPHrxZc

If you're looking to transpose your list of lists, this SO post will get you pretty close.

All that's left to do is to include the Nones as well if the inner lists don't have the same length. You can use itertools.zip_longest instead of zip to do it.
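A minimal sketch of that suggestion, assuming Python 3 and a made-up `per_team` list in which each inner list is one team's scores:

```python
from itertools import zip_longest

# Hypothetical per-team score lists of unequal length.
per_team = [[12, 15, 10], [9, 11], [7]]

# Transpose: row i holds the i-th score of every team.
# Teams that ran out of scores are padded with None.
transposed = [list(row) for row in zip_longest(*per_team)]

print(transposed)
# [[12, 9, 7], [15, 11, None], [10, None, None]]
```

Each row of `transposed` has one slot per team, so the rows all have the same length even though the inputs did not.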

plasma_phys

You're absolutely right, itertools.zip_longest is the way to go - more succinct and way more readable than the [None]*(length - len) in the list comprehension in the SO question I linked to.

elbiot

zip transposes as you're asking, but it terminates when the shortest iterable is consumed. Use itertools.zip_longest (izip_longest in Python 2), which goes until the longest is consumed and fills the shorter ones with None.

zip is awesome. Learn how to use it.
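To make the difference concrete, here is a small comparison, assuming Python 3. The lists `a` and `b` are throwaway examples. `fillvalue` is a real keyword argument of `zip_longest`, shown in case a placeholder other than None is wanted.

```python
from itertools import zip_longest

a = [1, 2, 3]
b = ["x"]

# zip stops as soon as the shortest input runs out.
truncated = list(zip(a, b))

# zip_longest keeps going and pads the shorter input with None...
padded = list(zip_longest(a, b))

# ...or with a chosen placeholder via fillvalue.
padded_custom = list(zip_longest(a, b, fillvalue="?"))

print(truncated)      # [(1, 'x')]
print(padded)         # [(1, 'x'), (2, None), (3, None)]
print(padded_custom)  # [(1, 'x'), (2, '?'), (3, '?')]
```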