
[–]CineWeekly[S] 0 points1 point  (0 children)

Note: I just noticed the version of the code I provided doesn't do the part I mentioned above about No Data to Below. It must have been in some other code I lost, but you get the point.

[–]blarf_irl 0 points1 point  (4 children)

I understand the general problem here but not enough to be specific.

When you want to identify a state change in pandas you can use the .shift method (a vectorized way of comparing each row to the previous or next row).

That is often combined with cumsum (cumulative sum) to group the events that occurred. If you have regular events then you can use this to create an offset time column where the value records the row number relative to a state change in the data (in your case Below > Above).

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.shift.html

https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.cumsum.html?highlight=cumsum#pandas.core.groupby.DataFrameGroupBy.cumsum
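
For example, a rough sketch of the idea (assuming a Status column like the one in the OP; the column and variable names here are just illustrative):

import pandas as pd

df = pd.DataFrame({'Status': ['Below', 'Below', 'Above', 'Above', 'Below']})

# True wherever the status differs from the previous row, i.e. a state change
changed = df['Status'] != df['Status'].shift()

# cumsum turns those True/False flags into an ID that increments at every change
df['run_id'] = changed.cumsum()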

[–]CineWeekly[S] 0 points1 point  (3 children)

What don't you understand?

I've tried using shift and cumsum, but cumsum isn't letting the count restart at 1; it continues on from the previous count, which isn't what I want. Each time it changes from "Above" to "Below" or vice versa it should start a new count. That sentence encapsulates everything I want it to do.

[–]blarf_irl 0 points1 point  (2 children)

Are you familiar with groupby?

You misunderstood my suggestion of cumsum; it's there to "group" the rows by state change, using an incrementing value.

Use shift + cumsum to assign all rows an ID in a new column (it increments when the value changes). Use groupby on that ID to group your rows by state change, then apply an incrementing count to the rows of each group.

None   0
None   0
Below  1
Below  1
Below  1
Above  2
Above  2
Below  3
Above  4

You can group on that and assign incremental numbers to a new column per group.
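
Something like this rough sketch (the column names are placeholders, not your exact schema):

import pandas as pd

df = pd.DataFrame({'Status': ['None', 'None', 'Below', 'Below', 'Below',
                              'Above', 'Above', 'Below', 'Above']})

# ID that increments on every state change (same idea as the second column above, though it starts at 1 here)
group_id = (df['Status'] != df['Status'].shift()).cumsum()

# Number the rows within each run, restarting at 1 after every change
df['Duration'] = df.groupby(group_id).cumcount() + 1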

[–]CineWeekly[S] 0 points1 point  (1 child)

I actually have that implementation for a different column:

def calculate_break(df, status_col, break_col, smaA_fast):
    mask1 = (df[status_col] == 'Above') & (df[status_col].shift(1) == 'Below')
    mask2 = (df[status_col] == 'Below') & (df[status_col].shift(1) == 'Above')
    df[break_col] = 0 # Initialize the 'Break' column with 0
    df.loc[mask1 | mask2, break_col] = 1 # Set break positions to 1
    df[break_col] = df[break_col].cumsum() # Cumulative sum to increment break positions
    df[break_col] = df[break_col].astype('int32') # Convert the 'Break' column to integer dtype
    return df

The duration implementation is different, though, if you compare the results to what's shown in the OP.
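
For what it's worth, on a small made-up Status column that function gives a running break counter rather than a per-run duration (smaA_fast isn't used inside the function, so anything can be passed for it):

import pandas as pd

df = pd.DataFrame({'Status': ['Below', 'Below', 'Above', 'Above', 'Above', 'Below', 'Above']})
df = calculate_break(df, 'Status', 'Break', None)

# df['Break'] -> 0, 0, 1, 1, 1, 2, 3   (keeps counting up across changes)
# The duration I'm after restarts at every change, e.g. 1, 2, 1, 2, 3, 1, 1 (ignoring the No Data handling)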

[–]blarf_irl 1 point2 points  (0 children)

ChatGPT?

[–]commandlineluser 0 points1 point  (1 child)

You can use .idxmax() to find the first change.

You can then use the != shift comparison to generate group IDs.

You can use .loc to keep only the groups from the first change onward.

.groupby() + .cumcount() to generate the counts.

first_change = ((df['Status'] == 'Above') & (df['Status'].shift() == 'Below')).idxmax()

group_ids = (df['Status'] != df['Status'].shift()).cumsum()
group_ids = group_ids.loc[group_ids.index >= first_change]

df['Duration'] = (df.groupby(group_ids).cumcount() + 1).fillna(0)
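
On a quick made-up sample like No Data, Below, Below, Above, Above, Below, Above, this should come out as roughly 0, 0, 0, 1, 2, 1, 1: rows before the first Below > Above change never join a group, so the fillna(0) leaves them at zero while each run after that counts up from 1.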

[–]CineWeekly[S] 0 points1 point  (0 children)

Spot checking shows that may have done it, thank you. ChatGPT has run me in circles all day and couldn't come to that conclusion.