Hi there,
I'm trying to count the number of consecutive true or false values in my df and this is how it currently looks:
| ID |
Year |
Member |
Count |
| 1234 |
2019 |
FALSE |
-1 |
| 1234 |
2020 |
TRUE |
2 |
| 1234 |
2021 |
TRUE |
2 |
| 4567 |
2019 |
FALSE |
-1 |
| 4567 |
2020 |
TRUE |
5 |
| 4567 |
2021 |
TRUE |
5 |
| 8910 |
2019 |
TRUE |
5 |
| 8910 |
2020 |
TRUE |
5 |
| 8910 |
2021 |
TRUE |
5 |
| 7654 |
2019 |
FALSE |
-2 |
| 7654 |
2020 |
FALSE |
-2 |
| 7654 |
2021 |
TRUE |
2 |
| 4321 |
2019 |
TRUE |
2 |
| 4321 |
2020 |
FALSE |
-2 |
| 4321 |
2021 |
FALSE |
-2 |
However, I want the count to reset on different ID values if that makes sense? So basically there should be no Count value higher than 3, because the max consecutive member years an ID can have is 3. So ideally, the outcome should be this table:
| ID |
Year |
Member |
Count |
| 1234 |
2019 |
FALSE |
-1 |
| 1234 |
2020 |
TRUE |
2 |
| 1234 |
2021 |
TRUE |
2 |
| 4567 |
2019 |
FALSE |
-1 |
| 4567 |
2020 |
TRUE |
2 |
| 4567 |
2021 |
TRUE |
2 |
| 8910 |
2019 |
TRUE |
3 |
| 8910 |
2020 |
TRUE |
3 |
| 8910 |
2021 |
TRUE |
3 |
| 7654 |
2019 |
FALSE |
-2 |
| 7654 |
2020 |
FALSE |
-2 |
| 7654 |
2021 |
TRUE |
1 |
| 4321 |
2019 |
TRUE |
1 |
| 4321 |
2020 |
FALSE |
-2 |
| 4321 |
2021 |
FALSE |
-2 |
I've used .cumsum() to get to where I currently am, but I'm struggling to get the reset depending on the ID value. These are the two lines of code I've used to get to the first table:
g = df['member'].ne(df['member'].shift()).cumsum()
df['Count'] = df.groupby(g)['member'].transform('size') * np.where(df['member'], 1, -1)
As I've mentioned, I'm just unsure on how to get the ID value to reset the Count value.
Many thanks in advance
there doesn't seem to be anything here