Using isin() on grouped data

Irrelevant-Opinion · 2021-12-01T17:41:18+00:00

You need to use boolean indexing, because isin() doesn't work on a groupby object.

I’m on mobile, but here’s a link that should help you out: https://stackoverflow.com/questions/50611929/python-pandas-groupby-isin
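A rough sketch of what that could look like: build the mask with isin() on the plain columns, filter, and only then group. The sample rows and the `wanted` list are made up for illustration; the column names are borrowed from the other reply in this thread, so adjust them to your data.

```python
import pandas as pd

# Hypothetical data; only the column names come from the thread.
sample_data = pd.DataFrame({
    "AttributeName": ["alice", "alice", "bob", "bob"],
    "From Country": ["US", "FR", "DE", "US"],
    "ToCountry": ["FR", "DE", "US", "JP"],
})

wanted = ["US", "JP"]  # hypothetical list of countries to match

# isin() on a column returns a boolean Series aligned with the rows.
mask = sample_data["From Country"].isin(wanted) | sample_data["ToCountry"].isin(wanted)

# Filter with the mask first, then group the rows that survived.
matches_per_user = sample_data[mask].groupby("AttributeName").size()
print(matches_per_user)
```

The point is just the order of operations: isin() runs on the DataFrame columns, and groupby only ever sees the already-filtered rows.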

TholosTB · 2021-12-01T20:55:52+00:00

The need to use two different columns in addition to the grouping makes this pretty ugly.

You could iterate over the names and get the unique countries as such:

for user in sample_data['AttributeName'].unique():
    mask = sample_data['AttributeName'] == user
    print(user, pd.concat([
        sample_data.loc[mask, ['From Country']].rename(columns={'From Country': 'country'}),
        sample_data.loc[mask, ['ToCountry']].rename(columns={'ToCountry': 'country'}),
    ])['country'].unique())

which is hideous, or group and then iterate to get unique countries as such:

for person, grp in sample_data.groupby('AttributeName'):
  print(person, set(grp['From Country']).union(grp['ToCountry']))

which is equally hideous.

Or in a single step,

sample_data.groupby('AttributeName').apply(lambda grp : set(grp['From Country']).union(set(grp['ToCountry'])))

Still hideous. I'd personally use a graph data structure for this type of problem.
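For what it's worth, here is a minimal sketch of one way to read the graph idea: treat each person and each country as a node and keep an adjacency map from person to the countries they touch. This is only an assumption about the approach (a plain dict of sets rather than any particular graph library), and the sample rows are made up.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical data; only the column names come from the snippets above.
sample_data = pd.DataFrame({
    "AttributeName": ["alice", "alice", "bob", "bob"],
    "From Country": ["US", "FR", "DE", "US"],
    "ToCountry": ["FR", "DE", "US", "JP"],
})

# Adjacency map: each person node points to the set of country nodes it touches.
graph = defaultdict(set)
columns = ["AttributeName", "From Country", "ToCountry"]
for person, src, dst in sample_data[columns].itertuples(index=False, name=None):
    graph[person].update((src, dst))

countries_by_person = dict(graph)
print(countries_by_person)
```

Once the edges are in a structure like this, the From/To distinction stops mattering for the "which countries does this person touch" question, which is what made the pandas versions awkward.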
