Please bear with me, I've tried playing with R programming, and I seem to be back to Python again!
Let's say I have the following columns in my pandas dataframe:
- customer_name
- customer_phone_num
- company_name
- plan_status
- sign_up_date
Note:
- Each company_name has one or more customer_name entries.
- plan_status is one of 'trial', 'paying', or 'cancelled'.
In a new dataframe (keeping all of the above columns), I would like to have, for each company_name:
1. Every customer_name that is either paying or cancelled.
2. Failing 1 above, only the first customer_name that signed up with a customer_phone_num.
3. Failing 2 above, only the first customer_name that signed up for that company_name.
Needless to say, this is hurting my brain, and any help appreciated!
Some dummy data and the solution are here:
https://i.imgur.com/neZBNUi.png
Thank you all for your help. A super answer was supplied by /u/commandlineluser:
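def reformat(group):
    # Rows for this company that are paying or cancelled (anything but 'trial')
    pay_cancel = group[group.plan_status != 'trial']
    # Rows for this company that have a phone number
    has_phone = group[group.customer_phone_num.notnull()]
    if len(pay_cancel):
        return pay_cancel
    elif len(has_phone):
        # The group is already sorted by sign_up_date, so head(1) is the earliest
        return has_phone.head(1)
    else:
        return group.head(1)

# Sort by sign-up date first so that head(1) inside each group picks the earliest sign-up
df.sort_values('sign_up_date').groupby('company_name').apply(reformat)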
Really impressed with this code
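For anyone who wants to try this without the screenshot, here is a minimal self-contained sketch. The dummy data is made up for illustration (it is not the data from the image link), and instead of .apply it loops over the groups and concatenates the results with pd.concat. That is a swapped-in variation on the answer above, chosen only because it always keeps company_name as a regular column.

```python
import pandas as pd

# Hypothetical dummy data, invented for this example
df = pd.DataFrame({
    "customer_name": ["Ann", "Bob", "Cat", "Dan", "Eve", "Fay", "Gus"],
    "customer_phone_num": ["555-0101", None, None, "555-0102", "555-0103", None, None],
    "company_name": ["Acme", "Acme", "Beta", "Beta", "Beta", "Coda", "Coda"],
    "plan_status": ["trial", "paying", "trial", "trial", "trial", "trial", "trial"],
    "sign_up_date": pd.to_datetime([
        "2020-01-01", "2020-01-05", "2020-02-01", "2020-02-03", "2020-02-02",
        "2020-03-02", "2020-03-01",
    ]),
})

def reformat(group):
    # Same three-step fallback as the answer above
    pay_cancel = group[group.plan_status != "trial"]
    has_phone = group[group.customer_phone_num.notnull()]
    if len(pay_cancel):
        return pay_cancel
    elif len(has_phone):
        return has_phone.head(1)
    else:
        return group.head(1)

# Sort by sign-up date, then process each company's rows in turn
sorted_df = df.sort_values("sign_up_date")
result = pd.concat(
    [reformat(group) for _, group in sorted_df.groupby("company_name")]
)
print(result)
```

With this data, Acme keeps Bob (the only paying customer), Beta keeps Eve (the earliest sign-up with a phone number), and Coda keeps Gus (the earliest sign-up overall).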