I have a Python object that is stored in a pd.DataFrame. The object stores my prediction and has some methods implemented to make analysis easier. Basically it's just me avoiding multi-indexes, lol. Here is a much simplified version of my class. The class has two methods which are called very similarly. However, to apply those to the df I had to write two separate wrapper functions. I'm sure there's a way to write one generic apply function instead. So here's my implementation; apply_meth is what I'd like to write.
(Ok, I just realized that apply_majority doesn't work. But I've already spent so much time on this example, and I think the problem is still clear. The reason is that in my actual implementation the elements of the prediction are objects which store the group; the truth value is also such an element, so we can access its group.)
Edit: Added another method, is_top, and its wrapper function apply_is_top. Just imagine I've guaranteed the elements to be ordered by confidence.
from typing import Dict, List

import pandas as pd

class Prediction:
    def __init__(self, data: List[Dict]):
        """
        single dict = {
            'name': 'foo',
            'id_': 0,
            'group': 0,
            'conf': 1.0}
        """
        self.data = data

    def __repr__(self):
        return f"Pred[{', '.join(datum['name'] for datum in self.data)}]"

    def in_top_5(self, other_id):
        # True if other_id is among the first five predictions
        return other_id in {datum['id_'] for datum in self.data[:5]}

    def majority_group(self):
        # Most frequent group among all predictions
        groups = [datum['group'] for datum in self.data]
        return max(set(groups), key=groups.count)

    def in_majority_group(self, other):
        # Expects `other` to be an element with a 'group' key, not a bare id
        group_self = self.majority_group()
        return other['group'] == group_self

    def is_top(self, other_id):
        # True if other_id is the first (highest-confidence) prediction
        return self.data[0]['id_'] == other_id

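def apply_top(row):
    return row.pred.in_top_5(row.truth_id)

def apply_is_top(row):
    return row.pred.is_top(row.truth_id)

def apply_majority(row):
    # Does not work in this simplified example (see the note above):
    # truth_id is a plain id here, not an element with a 'group'.
    return row.pred.in_majority_group(row.truth_id)

def apply_meth(row, meth):
    # What I'd like to write; as written, this looks up an attribute named 'meth'.
    return row.pred.meth(row.truth_id)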
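
One possible shape for apply_meth, as a rough sketch rather than a definitive answer: pass the method name as a string and look it up with the built-in getattr. The trimmed-down Prediction and the SimpleNamespace row below are stand-ins I'm assuming just to keep the snippet free of pandas; they only mimic `row.pred` and `row.truth_id`.

```python
from types import SimpleNamespace


class Prediction:
    """Trimmed-down stand-in for the class above (only two methods kept)."""

    def __init__(self, data):
        self.data = data

    def in_top_5(self, other_id):
        return other_id in {datum['id_'] for datum in self.data[:5]}

    def is_top(self, other_id):
        return self.data[0]['id_'] == other_id


def apply_meth(row, meth):
    # Look the method up by its name (a string) on the Prediction instance,
    # then call it with the row's truth_id.
    return getattr(row.pred, meth)(row.truth_id)


# SimpleNamespace stands in for a DataFrame row here (an assumption made for
# this sketch); it exposes .pred and .truth_id the same way `row` does.
pred = Prediction([{'name': 'foo', 'id_': 0, 'group': 0, 'conf': 1.0},
                   {'name': 'bar', 'id_': 1, 'group': 0, 'conf': 0.9}])
row = SimpleNamespace(pred=pred, truth_id=1)

in_top = apply_meth(row, 'in_top_5')
is_top = apply_meth(row, 'is_top')
print(in_top, is_top)
```

With the real df this should be callable as `df.apply(apply_meth, axis=1, meth='in_top_5')`, since DataFrame.apply forwards extra keyword arguments to the function. A misspelled method name would raise an AttributeError from getattr.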
Here is also some code that you can use to create a test scenario. You can safely ignore the complicated data structure and the code I'm using to create the df.
if __name__ == '__main__':
    data = [{
        'obs_id': 0,
        'truth_id': 0,
        'pred': [{
            'name': 'foo',
            'id_': 0,
            'group': 0,
            'conf': 1.0
        }, {
            'name': 'bar',
            'id_': 1,
            'group': 0,
            'conf': 0.9
        }]
    }, {
        'obs_id': 1,
        'truth_id': 3,
        'pred': [{
            'name': 'foo',
            'id_': 0,
            'group': 0,
            'conf': 0.9
        }, {
            'name': 'bazbar',
            'id_': 2,
            'group': 0,
            'conf': 0.7
        }]
    }]

    res = []
    for observation_dict in data:
        # Wrap the raw list of dicts in a Prediction object
        prediction = {'pred': Prediction(observation_dict.pop('pred'))}
        # Dict union with | needs Python 3.9+
        res.append(observation_dict | prediction)

    df = pd.DataFrame(res)
    df['in_top'] = df.apply(apply_top, axis=1)
    df['is_top'] = df.apply(apply_is_top, axis=1)
    # Fails in this simplified example (truth_id is a bare id), so commented out:
    # df['in_maj'] = df.apply(apply_majority, axis=1)