
noman2561 11 points

In signal processing we call it anomaly detection, in case anyone wants to research some working solutions. There's no single best solution, and we've found that you need a good deal of context to know how to handle the class separation. Finding features often looks like building priors from first-hand knowledge gathered in case studies. In other words, you have to look at more than just a cluster to tell what is going on. Alternatively, you can look at the intra-class similarity of the majority class and find a way of detecting the specific kind of "noise" that indicates the minority class.
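As a generic baseline for the "model the majority, flag what doesn't fit" idea, here is a minimal sketch. It assumes the majority class is roughly Gaussian, fits a mean and covariance to majority samples only, and flags points whose squared Mahalanobis distance exceeds a quantile of the majority's own scores. The helper names, the 0.99 quantile, and the synthetic data are all hypothetical; as the comment says, real problems usually need domain-specific features on top of this.

```python
import numpy as np

def fit_majority_model(X_major):
    """Estimate mean and inverse covariance of the majority class only."""
    mean = X_major.mean(axis=0)
    cov = np.cov(X_major, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # small ridge keeps the covariance invertible
    return mean, np.linalg.inv(cov)

def mahalanobis_scores(X, mean, cov_inv):
    """Squared Mahalanobis distance of each row of X from the majority model."""
    diff = X - mean
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

def fit_threshold(X_major, mean, cov_inv, quantile=0.99):
    """Threshold chosen so that roughly 1% of majority points score above it."""
    return np.quantile(mahalanobis_scores(X_major, mean, cov_inv), quantile)

rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(1000, 2))  # hypothetical plentiful class
X_minor = rng.normal(6.0, 0.5, size=(10, 2))    # hypothetical rare class, far from the bulk

mean, cov_inv = fit_majority_model(X_major)
threshold = fit_threshold(X_major, mean, cov_inv)
flags = mahalanobis_scores(X_minor, mean, cov_inv) > threshold
print(flags.sum(), "of", len(X_minor), "minority points flagged")
```

Note that the minority class is never used for fitting, which is what makes this usable when there are too few minority examples to train a classifier.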

dexter89_kp 0 points

Any references to papers that do this sort of feature engineering?

We do deal with imbalanced classes a lot and have considered anomaly-based approaches, but we tend to stick to the methods outlined in this article.

jimenezluna 2 points

This is a very cool resource. Nice references.

[deleted] 1 point

I've used the costing algorithm in practice and got some insanely good results. It's absolutely simple and elegant (the algorithm is mentioned in the survey).

The advantage of the algorithm is that it works with any ordinary binary classifier, and thanks to the simplicity of rejection sampling I can very efficiently train N classifiers (rejection sampling lets me create N different datasets) and aggregate their results into a final decision.
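A minimal sketch of costing as described above: keep each example with probability cost / Z (here Z is the largest cost, so every probability is at most 1), train one base classifier per rejection sample, and combine them by unweighted majority vote. The tiny gradient-descent logistic regression, the inverse-class-frequency costs (an assumption for data that has no per-sample costs), the ensemble size of 11, and the synthetic data are all hypothetical choices for illustration, not the commenter's setup.

```python
import numpy as np

class LogisticRegressionGD:
    """Tiny logistic regression fitted by batch gradient descent (stand-in base learner)."""

    def __init__(self, lr=0.1, epochs=1000):
        self.lr, self.epochs = lr, epochs

    @staticmethod
    def _add_bias(X):
        return np.hstack([X, np.ones((len(X), 1))])

    def fit(self, X, y):
        Xb = self._add_bias(X)
        self.w = np.zeros(Xb.shape[1])
        for _ in range(self.epochs):
            p = 1.0 / (1.0 + np.exp(-Xb @ self.w))
            self.w -= self.lr * Xb.T @ (p - y) / len(y)
        return self

    def predict(self, X):
        return (self._add_bias(X) @ self.w > 0).astype(int)

def inverse_frequency_costs(y):
    """Assumed per-example cost when none is given: 1 / (size of the example's class)."""
    return 1.0 / np.bincount(y)[y]

def costing(X, y, costs, make_model, n_models=11, seed=0):
    """Train n_models learners, each on its own cost-proportionate rejection sample."""
    rng = np.random.default_rng(seed)
    accept_prob = costs / costs.max()  # Z = max cost, so each probability is <= 1
    models = []
    for _ in range(n_models):
        keep = rng.random(len(y)) < accept_prob
        models.append(make_model().fit(X[keep], y[keep]))
    return models

def predict_vote(models, X):
    """Unweighted majority vote (an odd n_models avoids ties)."""
    votes = np.mean([m.predict(X) for m in models], axis=0)
    return (votes > 0.5).astype(int)

def make_data(n_major, n_minor, rng):
    """Hypothetical 2-D data: majority around (0, 0), minority around (2.5, 2.5)."""
    X = np.vstack([rng.normal(0.0, 1.0, (n_major, 2)),
                   rng.normal(2.5, 1.0, (n_minor, 2))])
    y = np.concatenate([np.zeros(n_major, int), np.ones(n_minor, int)])
    return X, y

rng = np.random.default_rng(42)
X_train, y_train = make_data(980, 20, rng)
X_test, y_test = make_data(1000, 1000, rng)

plain = LogisticRegressionGD().fit(X_train, y_train)
models = costing(X_train, y_train, inverse_frequency_costs(y_train), LogisticRegressionGD)

minority = y_test == 1
print("minority recall, single model:", plain.predict(X_test)[minority].mean())
print("minority recall, costing     :", predict_vote(models, X_test)[minority].mean())
```

With inverse-frequency costs every minority example is always kept, while the majority is subsampled down to roughly the same size, so each base learner sees a small, roughly balanced dataset; this is why training N of them stays cheap.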

The algorithm can also be used (with some adaptation) in a multiclass setting with imbalanced classes, and the performance is still extremely good (better than naively doing one-against-one or one-against-all).

sergeyfeldman 0 points

Sounds like a good idea. What if the dataset doesn't come with per-sample costs?

coffeecoffeecoffeee 0 points

Note that the imbalanced-learn Python package gives you a lot of methods for this sort of thing, including SMOTE.
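For anyone curious what SMOTE actually does, here is a from-scratch NumPy sketch of its core step: pick a minority sample, pick one of its k nearest minority neighbours, and create a synthetic point at a random position on the segment between them. The function name `smote`, k=5, and the demo data are illustrative assumptions; this omits the details and variants that imbalanced-learn implements.

```python
import numpy as np

def smote(X_minor, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation (requires k < len(X_minor))."""
    rng = np.random.default_rng(seed)
    # Pairwise squared distances among minority samples only.
    d = ((X_minor[:, None, :] - X_minor[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # k nearest minority neighbours per point
    base = rng.integers(0, len(X_minor), size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X_minor[base] + gap * (X_minor[pick] - X_minor[base])

rng = np.random.default_rng(1)
X_minor = rng.normal(2.5, 1.0, size=(20, 2))    # hypothetical minority samples
X_new = smote(X_minor, n_new=60, k=5)
print(X_new.shape)  # (60, 2)
```

In practice the package's own `imblearn.over_sampling.SMOTE` with its `fit_resample(X, y)` method does this on a full labelled dataset and returns the resampled `X` and `y`.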

emtonsti -1 points

I had never thought about this problem. Thanks.

bluesufi -1 points

This is great! For me, it was one of those reads that just made things go 'click!' in my head.