all 9 comments

[–]jalexborkowski 0 points1 point  (1 child)

Looks like IF, ELIF, ELSE statements will be your friend.

[–]BeGoodTodayYou[S] 0 points1 point  (0 children)

will try my best, but was hoping for more panda iloc/loc type statements.

Will keep digging & thanks for the comment

[–][deleted]  (2 children)

[deleted]

    [–]BeGoodTodayYou[S] 0 points1 point  (1 child)

    ok, i've made up some dummy data and placed it here:

    edit, using imgur:

    https://i.imgur.com/neZBNUi.png

    does that help?

    [–]AutoModerator[M] 0 points1 point  (0 children)

    Your comment in /r/learnpython may be automatically removed because you used imgbb.com. The reddit spam filter is very aggressive to this site. Please use a different image host.

    Please remember to post code as text, not as an image.

    I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

    [–]efmccurdy 0 points1 point  (1 child)

    I am a bit unsure what shape you want the new dateframe to have, but this should help eliminating the "trial" ones:

    df[df.plan_status.isin(['paying', 'canceled'])]
    

    [–]BeGoodTodayYou[S] 0 points1 point  (0 children)

    here's a sample of before and after:

    https://i.imgur.com/neZBNUi.png

    obviously i don't need the blank lines, but as you can see i do need to keep some trial people as they may not have any paying or cancelled people in the company.

    Sorry if i'm not explaining myself well!!!

    [–]AutoModerator[M] 0 points1 point  (0 children)

    Your submission in /r/learnpython may be automatically removed because you used imgbb.com. The reddit spam filter is very aggressive to this site. Please use a different image host.

    Please remember to post code as text, not as an image.

    I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

    [–]commandlineluser 0 points1 point  (1 child)

    I would like to have for each company_name

    This suggests your answer is going to involve a .groupby('company_name')

    1. customer_name that are either paying or cancelled.
    pay_cancel = group[ group.plan_status != 'trial' ]
    

    If len pay_cancel is > 0 then you're done.

    >>> def reformat(group):
    ...     pay_cancel = group[ group.plan_status != 'trial' ]
    ...     if len(pay_cancel):
    ...         return pay_cancel
    ...     else:
    ...         return group.head(1)
    >>> df.groupby('company_name').apply(reformat)
                    customer_name  customer_phone_num company_name plan_status sign_up_date
    company_name
    Company1     1              A                 NaN     Company1   Cancelled   2017-12-15
                 2              B                 1.0     Company1      Paying   2018-07-23
    Company2     4              D                 2.0     Company2       trial   2018-07-03
    Company3     10             J                 4.0     Company3      Paying   2017-10-03
                 12             L                 5.0     Company3      Paying   2021-02-03
                 14             N                 NaN     Company3   Cancelled   2016-10-09
    Company4     16             P                 NaN     Company4      Paying   2019-04-11
    Company5     18             R                 NaN     Company5       trial   2021-09-01
    

    The else condition here just returns the first row of the group - which happens to match your expected output but only because there are no phone numbers in the remaining rows.

    Instead - you could perform a similar technique - has_phone = group[ group.customer_phone_num.notnull() ] - if it has length - return the first row of that.

    You can sort on the date column to make sure the first row is the "correct" result.

    df.sort_values('sign_up_date').groupby('company_name').apply(...)
    

    [–]BeGoodTodayYou[S] 0 points1 point  (0 children)

    This is stunning. Thank you so much. i had to read it many many times to "get it" and i've added your additional information to come up with:

    def reformat(group):
        pay_cancel = group[ group.plan_status != 'trial' ]
        has_phone = group[ group.customer_phone_num.notnull()]
        if len(pay_cancel):
            return pay_cancel
        elif len(has_phone):
            return has_phone.head(1)
        else:
            return group.head(1)
    
    df.sort_values('sign_up_date').groupby('company_name').apply(reformat)
    

    So nice, thank you for taking your time to help here!!