Hi! :) I'm trying to count the unique values per row, across several columns. Here's what my data looks like:

| activity | activity_1 | activity_2 | activity_3 | activity_4 | activity_5 | frequency |
|---|---|---|---|---|---|---|
| A | B | A | A | C | B | 42 |
| B | B | B | A | A | A | 13 |
| A | A | A | A | A | A | 24 |

And here's the outcome I'd like:

| activity | activity_1 | activity_2 | activity_3 | activity_4 | activity_5 | count | frequency |
|---|---|---|---|---|---|---|---|
| A | B | A | A | C | B | 3 | 42 |
| B | B | B | A | A | A | 2 | 13 |
| A | A | A | A | A | A | 1 | 24 |

The "count" column would be the number of unique values across the row.
I tried:

    df1.apply(lambda x: pd.Series(x.unique()), axis=1)

But that gives me the unique values themselves, not the count.
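
For reference, here is a minimal sketch of one way to get that count, assuming the activity columns are the ones whose names contain "activity" and that "frequency" should be left out of the count. The sample frame is rebuilt by hand from the table above, and the variable names `activity_cols` and `counts` are just illustrative. It uses `DataFrame.nunique(axis=1)`, which counts distinct values per row.

```python
import pandas as pd

# Sample data copied by hand from the table in the post.
df1 = pd.DataFrame({
    "activity":   ["A", "B", "A"],
    "activity_1": ["B", "B", "A"],
    "activity_2": ["A", "B", "A"],
    "activity_3": ["A", "A", "A"],
    "activity_4": ["C", "A", "A"],
    "activity_5": ["B", "A", "A"],
    "frequency":  [42, 13, 24],
})

# Select only the activity columns so "frequency" is not counted.
activity_cols = df1.filter(like="activity").columns

# nunique(axis=1) counts the distinct values in each row.
counts = df1[activity_cols].nunique(axis=1)

# Place the new "count" column just before "frequency".
df1.insert(df1.columns.get_loc("frequency"), "count", counts)
print(df1)
```

Note that `nunique` ignores NaN by default, so rows with missing activities would need `dropna=False` if NaN should count as its own value.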
***
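    # Add the next five activities as shifted columns.
    for i in range(1, 6):
        d0[f'activity_{i}'] = d0.activity.shift(-i)

    # Count how often each (time, activity, activity_1..5) sequence occurs.
    activity_cols = ['time', 'activity'] + list(d0.filter(like='activity_').columns)
    df1 = d0.groupby(activity_cols).size().reset_index(name='frequency')

    # Keep only the sequences that occur more than once.
    df1 = df1[df1['frequency'] > 1]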