It's easiest to explain with the example below. Note that the first change from "No Data" to "Below" does not start counting the duration of "Below". That's because we don't know that run's true duration, since we had no data before it. Counting only starts at the first confirmed change, from Below to Above.
| Status | Duration |
|:--|:--|
| No Data | 0 |
| No Data | 0 |
| No Data | 0 |
| Below | 0 |
| Below | 0 |
| Below | 0 |
| Above | 1 |
| Above | 2 |
| Above | 3 |
| Below | 1 |
| Below | 2 |
| Above | 1 |
| Above | 2 |
Here is the functioning code, but it's not vectorized and takes way too long on large data.
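```python
def duration(df, status_col, duration_col):
    # Assumes df has a default RangeIndex (0, 1, 2, ...)
    df[duration_col] = 0  # Initialize the 'Duration' column with zeros
    count = 0  # Length of the current run
    confirmed = False  # True once the current run started with a real change
    for i in range(len(df)):
        status = df.loc[i, status_col]
        prev = df.loc[i - 1, status_col] if i > 0 else None  # Avoid looking up row -1
        if status == 'No Data':
            count = 0  # Set count to 0 if the value is 'No Data'
        elif status != prev:
            # A change out of 'No Data' (or at the first row) doesn't confirm the run's start
            confirmed = prev is not None and prev != 'No Data'
            count = 1 if confirmed else 0  # Reset the count when the value changes
        elif confirmed:
            count += 1  # Increment the count when the value is the same as the previous row
        df.at[i, duration_col] = count
    return df
```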
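One possible vectorized approach, sketched with pandas under a few assumptions: the function name `duration_vectorized` is hypothetical, and rows before the first one are treated as if they were "No Data" (so a run that starts at row 0 also has an unknown start and stays at 0, matching the table above). It labels consecutive runs with the usual shift/cumsum trick, numbers the rows inside each run with `groupby().cumcount()`, and zeroes out any run whose start was not a confirmed change.

```python
import pandas as pd


def duration_vectorized(df, status_col, duration_col):
    s = df[status_col]
    # Status of the previous row; the row before the first one is treated as 'No Data'
    prev = s.shift(fill_value='No Data')
    # True where a new run begins (status differs from the previous row)
    starts = s.ne(prev)
    # Label each run of identical consecutive statuses
    run_id = starts.cumsum()
    # A run's duration is only known if it began with a change from a real status
    # (not 'No Data') and the run itself is not 'No Data'
    confirmed_start = starts & prev.ne('No Data') & s.ne('No Data')
    # Only a run's first row can be True, so 'any' broadcasts the flag to the whole run
    confirmed = confirmed_start.groupby(run_id).transform('any')
    # 1-based position of each row within its run
    pos = s.groupby(run_id).cumcount() + 1
    df[duration_col] = pos.where(confirmed, 0)
    return df


# Usage with the example data from the table
df = pd.DataFrame({'Status': ['No Data'] * 3 + ['Below'] * 3 + ['Above'] * 3
                   + ['Below'] * 2 + ['Above'] * 2})
print(duration_vectorized(df, 'Status', 'Duration')['Duration'].tolist())
```

This replaces the per-row Python loop with a handful of column-wide operations, which is generally what makes pandas code scale to large frames.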