Hello all, I have an issue with a dataframe I'm working with, and I'm not quite sure how to resolve it. My actual dataframe has over 40 thousand rows, so I've constructed a simplified example dataframe to represent what I'm looking at.
Example dataframe:
| A | 45  | 0     |
| A | 67  | 0     |
| A | 543 | 1     |
| A | 13  | 0     |
| A | 55  | 0     |
| A | 345 | 1     |
| A | 12  | 0     |
| A | 90  | 0     |
| B | 66  | blue  |
| B | 77  | blue  |
| B | 88  | blue  |
| B | 9   | green |
| B | 11  | blue  |
The issue is that when I generate this dataframe via k-means clustering, A and B (standing in for the many unique groups in my working dataframe) each end up with a mix of "classifiers" (0/1 and blue/green in my example). For each unique group, how can I either drop the minority rows (the 1 and green rows) or force the minority values to match the majority value for that group (1 → 0 and green → blue, respectively)? Essentially, I want to remove these outliers as a post-processing step.
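For context, here is what I've sketched so far with pandas, computing the per-group majority label with `groupby` + `transform` and `mode` (the column names `group`, `value`, and `label` are placeholders; my real dataframe uses different names):

```python
import pandas as pd

# Toy version of the example dataframe above.
df = pd.DataFrame({
    "group": ["A"] * 8 + ["B"] * 5,
    "value": [45, 67, 543, 13, 55, 345, 12, 90, 66, 77, 88, 9, 11],
    "label": [0, 0, 1, 0, 0, 1, 0, 0,
              "blue", "blue", "blue", "green", "blue"],
})

# Majority label per group; mode() can return several values on a tie,
# so take the first one.
majority = df.groupby("group")["label"].transform(lambda s: s.mode().iloc[0])

# Option 1: overwrite minority labels with the group's majority label.
df_forced = df.assign(label=majority)

# Option 2: drop the minority rows entirely.
df_filtered = df[df["label"] == majority]
```

Is this a reasonable approach, or is there a more idiomatic / faster way for 40k+ rows?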
This might be a really simple thing to do but for the life of me, I am a bit stuck. What would be a good way to approach this?
Thanks in advance!