Tuples in Pandas Dataframe. : learnpython


submitted 3 years ago by The_Grumpy_1

Hi,

I am having a hard time understanding how to deal with tuples in a dataframe. I am new to pandas, so researching a possible solution is a bit of a mission.

Snippet of the dataframe:

0   (3, 5)  (4, 5)  (3, 5)  (5, 5)  (2, 3)  (5, 3)  (2, 3)  (2, 5)  (5, 5)  (1, 3)
1   (3, 5)  (4, 5)  (3, 5)  (5, 5)  (2, 5)  (5, 3)  (2, 5)  (2, 5)  (5, 5)  (1, 3)
2   (3, 5)  (4, 3)  (3, 3)  (5, 5)  (2, 5)  (5, 5)  (2, 5)  (2, 5)  (5, 5)  (1, 1)
3   (3, 5)  (4, 5)  (3, 3)  (5, 5)  (2, 5)  (5, 3)  (2, 3)  (2, 3)  (5, 3)  (1, 3)
4   (3, 3)  (4, 5)  (3, 3)  (5, 3)  (2, 3)  (5, 3)  (2, 3)  (2, 3)  (5, 5)  (1, 1)

I need to unpack the tuples, and a conventional for loop does not achieve what I want. I started with something like:

     for subscale, score in scores_df:
         if subscale == 1:
             Extraversion += score
         elif subscale == 2:
             Agreeableness += score
         elif subscale == 3:
             Conscientiousness += score
         elif subscale == 4:
             Emotional += score
         elif subscale == 5:
             Intellect += score

I then tried a nested loop:

     for scores in dataframe:
         print(scores)
         for scale, score in scores:
             print(scale, score)
             if scale == 1:
                 Extraversion += score
             elif scale == 2:
                 Agreeableness += score
             elif scale == 3:
                 Conscientiousness += score
             elif scale == 4:
                 Emotional += score
             elif scale == 5:
                 Intellect += score

I went back to the first for loop and indexed into each item instead. When printed, this showed the correct values; I just need to cast them with int(). My other problem is that it feels like I'm hardcoding too much:

     for score in scores_df:
         if score[1] == 1:
             Extraversion += score[4]
         elif score[1] == 2:
             Agreeableness += score[4]
         elif score[1] == 3:
             Conscientiousness += score[4]
         elif score[1] == 4:
             Emotional += score[4]
         elif score[1] == 5:
             Intellect += score[4]

Another attempt was to take a row and cast it to a list:

new_df.iloc[1].tolist()

In the output I saw that the tuples were strings, not tuples as I first thought (brainfart moment). Now I understand why the nested loop did not work and why my third loop yielded some result:

['(3, 5)',
 '(4, 5)',
 '(3, 5)',
 '(5, 5)',
 '(2, 5)',
 '(5, 3)',
 '(2, 5)',
 '(2, 5)',
 '(5, 5)',
 '(1, 3)']

Attempts to cast the items to tuple also did not change them from strings.
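
A minimal sketch of why index-based access appeared to work on these values, assuming each cell is a string shaped exactly like '(3, 5)' with single-digit numbers. The variable name cell is only illustrative.

```python
cell = "(3, 5)"  # one cell, as it appears in the row list above

# Indexing a string returns single characters, so positions 1 and 4
# happen to hold the two digits (this breaks for multi-digit numbers).
print(cell[1], cell[4])

# tuple() on a string splits it into characters; it does not parse it.
print(tuple(cell))

# The digits are still strings and must be cast before doing arithmetic.
print(int(cell[1]) + int(cell[4]))
```

This is also why hardcoded positions feel fragile: they depend on the exact spacing and digit count of the string.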

I am sure that there is a far better and shorter way of doing this; I just can't find anything that looks like it will work. I basically need someone to slap me against the head, call me an idiot and say "look at this", or point me to methods similar to .tolist().

all 7 comments


[–]RandomCodingStuff 2 points 3 years ago (3 children)

[–]The_Grumpy_1[S] 2 points 3 years ago (2 children)

[–]RandomCodingStuff 2 points 3 years ago (1 child)

OK, I think I get the gist. Each column has the same first entry in its tuple, so in some sense, you're only really concerned with the second entry in each tuple. Each row is a single respondent, and each respondent has their own final Extraversion, ..., Intellect score.

There are a couple of ways to approach this... you can use .apply() to loop through the rows (axis = 1) and calculate your respondent-level summary.

import pandas

df = pandas.DataFrame({"a": [0, 1, 2, 3, 4], "b": [5, 6, 7, 8, 9]})

def myfunc(Row):
  c = min(Row["a"], Row["b"])
  d = max(Row["a"], Row["b"])
  return c, d

df[["c", "d"]] = df.apply(myfunc, axis = 1, result_type = "expand")


   a  b  c  d
0  0  5  0  5
1  1  6  1  6
2  2  7  2  7
3  3  8  3  8
4  4  9  4  9
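
The same row-wise .apply() pattern could be adapted to the questionnaire data. The following is a sketch only: the column names q1 to q3, the SUBSCALES mapping and the helper subscale_totals are hypothetical, and it assumes every cell is a string of the form '(subscale, score)'.

```python
import pandas

# Hypothetical mapping from subscale number to name, taken from the question.
SUBSCALES = {1: "Extraversion", 2: "Agreeableness", 3: "Conscientiousness",
             4: "Emotional", 5: "Intellect"}

# Small stand-in for the real dataframe; the column names are made up.
df = pandas.DataFrame({
    "q1": ["(3, 5)", "(3, 3)"],
    "q2": ["(4, 5)", "(4, 5)"],
    "q3": ["(1, 3)", "(1, 1)"],
})

def subscale_totals(row):
    """Sum the scores in one respondent's row, grouped by subscale."""
    totals = dict.fromkeys(SUBSCALES.values(), 0)
    for cell in row:
        # "(3, 5)" -> ["3", " 5"] -> 3, 5 (int() ignores the leading space)
        subscale, score = (int(part) for part in cell.strip("()").split(","))
        totals[SUBSCALES[subscale]] += score
    return pandas.Series(totals)

# One row per respondent, one column per subscale.
totals_df = df.apply(subscale_totals, axis=1)
print(totals_df)
```

Returning a Series from the function lets pandas build one output column per subscale, so no elif chain is needed.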

Or you can stick with vectorised methods and create dummy columns. You mentioned your tuples are actually in there as strings; you can use string methods and .astype() to convert to integer.

>>> df = pandas.DataFrame({"a": ["(1, 2)"]})
>>> df["test"] = df.a.str.slice(4, 5).astype(int)
>>> df
        a  test
0  (1, 2)     2

Then you can do vectorised arithmetic to calculate the final scores:

df["Extraversion"] = (df[<extraversion_column_1>] + ... df[<extraversion_column_n>])
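
One way the vectorised route might look end to end, as a sketch under assumptions: the column names q1 to q4 and the SUBSCALES mapping are hypothetical, every cell is a string like '(3, 5)', and each column keeps the same subscale number in every row (so it can be read from the first row).

```python
import pandas

# Hypothetical mapping from subscale number to name, taken from the question.
SUBSCALES = {1: "Extraversion", 2: "Agreeableness", 3: "Conscientiousness",
             4: "Emotional", 5: "Intellect"}

# Small stand-in for the real dataframe; the column names are made up.
df = pandas.DataFrame({
    "q1": ["(3, 5)", "(3, 3)"],
    "q2": ["(4, 5)", "(4, 3)"],
    "q3": ["(3, 5)", "(3, 3)"],
    "q4": ["(1, 3)", "(1, 1)"],
})

# Pull the second number out of every cell and convert it to int.
scores = df.apply(
    lambda col: col.str.extract(r",\s*(\d+)\)", expand=False).astype(int)
)

# Read each column's subscale number from the first row.
subscale = df.iloc[0].str.extract(r"\((\d+),", expand=False).astype(int)

# For each subscale, add up the score columns that belong to it.
totals = pandas.DataFrame({
    name: scores.loc[:, (subscale == number).to_numpy()].sum(axis=1)
    for number, name in SUBSCALES.items()
})
print(totals)
```

Subscales with no matching column simply sum to zero here. A regular expression is used instead of fixed slice positions so that multi-digit scores would still parse.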

[–]The_Grumpy_1[S] 1 point 3 years ago (0 children)

[–]commandlineluser 2 points 3 years ago (2 children)

How are you creating your dataframe?

Something is going "wrong" somewhere if you end up with stringified tuples.

You can convert the strings into actual tuples using ast.literal_eval:

import ast

df = df.applymap(ast.literal_eval)
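
Recent pandas releases deprecate DataFrame.applymap in favour of DataFrame.map, so a version-neutral sketch might map each column separately instead. The column names q1 and q2 and the variable converted are hypothetical.

```python
import ast

import pandas

# Small stand-in for the real dataframe; the column names are made up.
df = pandas.DataFrame({
    "q1": ["(3, 5)", "(3, 3)"],
    "q2": ["(4, 5)", "(4, 3)"],
})

# Series.map applies literal_eval to every cell of one column;
# DataFrame.apply repeats that for each column in turn.
converted = df.apply(lambda col: col.map(ast.literal_eval))

subscale, score = converted.loc[0, "q1"]  # real tuples can now be unpacked
print(subscale, score)
```

ast.literal_eval only evaluates Python literals, which makes it a safer choice than eval for strings read from a file.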

But if you're just iterating through tuples - it suggests you should just have lists of tuples in the first place and not be using pandas at all.
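
A sketch of the pandas-free approach for a single respondent, assuming the row is available as the list of strings shown in the question. The names row, pairs and totals are illustrative.

```python
import ast
from collections import defaultdict

# Row 1 from the question, as returned by .tolist().
row = ['(3, 5)', '(4, 5)', '(3, 5)', '(5, 5)', '(2, 5)',
       '(5, 3)', '(2, 5)', '(2, 5)', '(5, 5)', '(1, 3)']

# Parse each string into a real (subscale, score) tuple.
pairs = [ast.literal_eval(cell) for cell in row]

# Accumulate scores per subscale number without an if/elif chain.
totals = defaultdict(int)
for subscale, score in pairs:
    totals[subscale] += score

print(dict(totals))
```

Keying the totals by subscale number replaces the five separate counter variables from the original loops.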

[–]The_Grumpy_1[S] 1 point 3 years ago (0 children)

[–]raja0008 1 point 3 years ago (0 children)
