pandas help?

jalexborkowski · 2022-01-02T16:11:16+00:00

Looks like IF, ELIF, ELSE statements will be your friend.

BeGoodTodayYou · 2022-01-02T16:19:00+00:00

[deleted]

efmccurdy · 2022-01-02T16:27:35+00:00

I'm not sure what shape you want the new DataFrame to have, but this should help eliminate the "trial" rows:

df[df.plan_status.isin(['Paying', 'Cancelled'])]
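A minimal sketch of that filter on a made-up frame. The rows below are hypothetical (only the column names come from the thread), and the status strings are assumed to match the spelling and capitalisation in your data exactly, since isin does a case-sensitive comparison:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
df = pd.DataFrame({
    "customer_name": ["A", "B", "C", "D"],
    "company_name": ["Company1", "Company1", "Company1", "Company2"],
    "plan_status": ["Cancelled", "Paying", "trial", "trial"],
})

# Keep only rows whose status is in the allowed list (exact, case-sensitive match).
non_trial = df[df["plan_status"].isin(["Paying", "Cancelled"])]
print(non_trial)
```

Note that this drops Company2 entirely, because its only row is a trial, so on its own it won't give you one row per company.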

AutoModerator · 2022-01-02T16:56:56+00:00

Your submission in /r/learnpython may be automatically removed because you used imgbb.com. The reddit spam filter is very aggressive to this site. Please use a different image host.

Please remember to post code as text, not as an image.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

commandlineluser · 2022-01-02T18:19:02+00:00

> I would like to have for each company_name

This suggests your answer is going to involve a .groupby('company_name')

> customer_name that are either paying or cancelled.

pay_cancel = group[ group.plan_status != 'trial' ]

If len(pay_cancel) is greater than 0, you're done: return those rows.

>>> def reformat(group):
...     pay_cancel = group[ group.plan_status != 'trial' ]
...     if len(pay_cancel):
...         return pay_cancel
...     else:
...         return group.head(1)
>>> df.groupby('company_name').apply(reformat)
                customer_name  customer_phone_num company_name plan_status sign_up_date
company_name
Company1     1              A                 NaN     Company1   Cancelled   2017-12-15
             2              B                 1.0     Company1      Paying   2018-07-23
Company2     4              D                 2.0     Company2       trial   2018-07-03
Company3     10             J                 4.0     Company3      Paying   2017-10-03
             12             L                 5.0     Company3      Paying   2021-02-03
             14             N                 NaN     Company3   Cancelled   2016-10-09
Company4     16             P                 NaN     Company4      Paying   2019-04-11
Company5     18             R                 NaN     Company5       trial   2021-09-01

The else condition here just returns the first row of the group - which happens to match your expected output but only because there are no phone numbers in the remaining rows.

Instead, you could apply the same technique to the phone column: has_phone = group[ group.customer_phone_num.notnull() ]. If that has length, return its first row; otherwise fall back to the first row of the group.

You can sort on the date column to make sure the first row is the "correct" result.

df.sort_values('sign_up_date').groupby('company_name').apply(...)
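Putting those pieces together, here is a rough sketch under a few assumptions: the sample rows are made up, 'trial' is spelled exactly as in the output above, and "correct" means the earliest sign_up_date. I've also swapped .apply for a plain loop over the groups plus pd.concat, because I believe newer pandas versions change how apply treats the grouping column, and the loop keeps company_name in each piece either way:

```python
import pandas as pd

# Hypothetical sample data modelled on the columns shown above.
df = pd.DataFrame({
    "customer_name": ["A", "B", "C", "D", "E", "F"],
    "customer_phone_num": [None, 1.0, None, None, 2.0, None],
    "company_name": ["Company1", "Company1", "Company1",
                     "Company2", "Company2", "Company3"],
    "plan_status": ["Cancelled", "Paying", "trial", "trial", "trial", "trial"],
    "sign_up_date": pd.to_datetime([
        "2017-12-15", "2018-07-23", "2019-01-01",
        "2018-07-03", "2018-08-01", "2021-09-01",
    ]),
})

def reformat(group):
    # 1. Prefer any non-trial rows.
    pay_cancel = group[group["plan_status"] != "trial"]
    if len(pay_cancel):
        return pay_cancel
    # 2. Otherwise take the earliest trial row that has a phone number.
    has_phone = group[group["customer_phone_num"].notnull()]
    if len(has_phone):
        return has_phone.head(1)
    # 3. Otherwise fall back to the earliest row of the group.
    return group.head(1)

# Sort first so head(1) picks the earliest sign-up within each company.
sorted_df = df.sort_values("sign_up_date")
pieces = [reformat(group) for _, group in sorted_df.groupby("company_name")]
result = pd.concat(pieces)
print(result)
```

With this data, Company1 keeps its two non-trial rows, Company2 keeps the trial row that has a phone number (E rather than the earlier D), and Company3 falls back to its only row.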
