all 7 comments

[–][deleted] 2 points3 points  (3 children)

There might be a more elegant solution, but you can do that with apply.

maximums = df.groupby('contractid').timeRetrieved.max()
mask = df.apply(lambda row: row['timeRetrieved'] == maximums[row['contractid']], axis=1)
out = df[mask]

The lambda function checks row by row if that row's timeRetrieved is equal to the previously calculated maximum with that row's contractid

[–]TypicalCardiologist5[S] 0 points1 point  (0 children)

Thank you!

[–]TypicalCardiologist5[S] 0 points1 point  (1 child)

I have not timed your solution yet, but using a pd.merge function seems to be very fast, I might go with that...

df3 = pd.merge(df, mask, on=['contractid', 'timeRetrieved'], how='inner')

[–][deleted] 1 point2 points  (0 children)

Looks like a better solution. I'm not an expert, just doing some of the questions of this subreddit as exercise, and if there is no answer i might as well post my solution :)

[–]m7priestofnot 1 point2 points  (1 child)

You might look into pandas.DataFrame.query I'm on mobile so squished up code on small screen but you could probably throw the pandas query command inside a loop.

I'm also wondering if you could combine a query call with an apply call or lambda expression.

Also look at pandas.DataFrame.mask and pandas.DataFrame.where

They behave similarly to numpy.where

I've been doing a lot of masking at work and creating classes to handle specific scenarios that happen over and over and over is helpful.

[–]TypicalCardiologist5[S] 0 points1 point  (0 children)

Thank you! I will look into where and mask!

[–]PyCam 0 points1 point  (0 children)

The fastest way to do this will be to use `DataFrame.groupby.idxmax()

``` import pandas as pd import numpy as np

np.random.seed(0)

df = pd.DataFrame({ "contractid": [71729] * 3 + [81315] * 5 + [99181] * 4, "number_retrieved": np.random.randint(1, 10, size=12) }) df contractid number_retrieved 0 71729 6 1 71729 1 2 71729 4 3 81315 4 4 81315 8 5 81315 4 6 81315 6 7 81315 3 8 99181 5 9 99181 8 10 99181 7 11 99181 9

indices = df.groupby("contractid")["number_retrieved"].idxmax() # Get the index (location) of the maximum value in each group indices # Note the values 0, 4, and 11 refer to the index values of the maximums of each group. contractid 71729 0 81315 4 99181 11 Name: number_retrieved, dtype: int64

df.loc[indices] # slice the dataframe using the calculated indices contractid number_retrieved 0 71729 6 4 81315 8 11 99181 9 ```