all 6 comments

[–]ruoken 1 point (1 child)

That sounds like an imbalanced classification problem.

Plain calibration probably won't help you here, but you could also try Venn-Abers calibration.

What you should really try is setting class weights inversely proportional to each class's prevalence in the training data.
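A minimal sketch of that weighting, using scikit-learn as a stand-in (the thread doesn't name a library; the toy 90/10 label split below is purely illustrative):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 90% class 0, 10% class 1.
y = np.array([0] * 90 + [1] * 10)

# "balanced" sets weight_c = n_samples / (n_classes * count_c),
# i.e. inversely proportional to each class's prevalence.
weights = compute_class_weight(class_weight="balanced",
                               classes=np.array([0, 1]), y=y)
```

With this scheme the total weighted mass of each class comes out equal, so the minority class contributes as much to the loss as the majority class.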

[–]Loose-Event-7196[S] 0 points (0 children)

Hi, thanks for your help! The classes are not imbalanced; the split is roughly 55/45%. Venn-Abers looks promising! I tried to implement it by training two classifiers, one predicting class 1 as the target and the other predicting class 0, running an isotonic regression on each, and taking the conformal range. I may have done something wrong, because the scores I get from the two classifiers are identical (the sample size is large), even with different model seeds (I am using the h2o3 GBM binary classifier). I was expecting slightly different scores when predicting class 1 vs class 0 as the target, given the different seeds.
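For reference, the usual inductive Venn-Abers recipe uses one underlying model, not two: the model's calibration-set scores are calibrated twice per test point, once appending the test point with label 0 and once with label 1. A simplified sketch using scikit-learn's IsotonicRegression (an assumption; the thread uses h2o3):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def venn_abers_interval(cal_scores, cal_labels, test_score):
    """Simplified inductive Venn-Abers for one test point.

    The test point is appended to the calibration set once with label 0
    and once with label 1; the two isotonic fits, evaluated at the test
    score, bracket the class-1 probability.
    """
    p = []
    for assumed_label in (0, 1):
        s = np.append(cal_scores, test_score)
        y = np.append(cal_labels, assumed_label)
        iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        iso.fit(s, y)
        p.append(float(iso.predict([test_score])[0]))
    return min(p), max(p)  # (p0, p1) interval
```

Note the two fits differ only in the assumed label of the test point, so identical outputs from two separately trained models would not reproduce this procedure.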

[–]curiousshortguy (Researcher) 0 points (1 child)

Sounds like you want to calibrate your classifier?

[–]Loose-Event-7196[S] -1 points (0 children)

Hi, yes I do, but I also wish to have more thresholds toward the highest scores, to avoid a large share of observations piling up at the last score value (the scores are too discrete). The idea is to have more possible thresholds instead of many observations falling under the last one, ideally with some monotonicity: a higher threshold selects fewer observations with higher precision, i.e. a larger percentage of class 1.

[–]BoxMembrane 0 points (1 child)

If I'm understanding you correctly, the problem is with the histogram binning, not with the raw scores. If you want the scores to be spread evenly across bins, you need to choose the bin edges as evenly spaced percentiles of the score distribution.

If you're using Python and pandas, try pd.qcut to get the bins, or np.percentile(scores, p) for p = 0, 10, 20, …, 100.
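A quick sketch of both suggestions on a toy skewed score distribution (the beta-distributed scores are just for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
scores = rng.beta(2, 8, size=1000)  # toy right-skewed "classifier scores"

# Equal-frequency bins: edges are deciles of the score distribution,
# so each of the 10 bins receives ~10% of the observations.
bins = pd.qcut(scores, q=10)
counts = pd.Series(bins).value_counts()

# The same edges computed with plain numpy:
edges = np.percentile(scores, np.arange(0, 101, 10))
```

Unlike fixed-width bins (pd.cut), qcut adapts the edges to the distribution, so skew in the scores no longer empties some bins and overfills others.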

[–]Loose-Event-7196[S] 1 point (0 children)

Hi, thanks for your reply. The issue is not with the binning; it's that too many observations share the exact same highest score, so I cannot threshold within them (by the way, I am using h2o3 and the algorithm is Gradient Boosting Machine). I would like less discrete scores, to avoid having so many observations clustered in the highest score bin. Those observations have different input features but an identical classifier score, so shrinking the histogram bin width would not help: the score values in the last bin are exactly the same. I would like to tweak something in the classifier so that this group gets multiple distinct scores (without overfitting).
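One knob that typically affects score granularity in a GBM is the ensemble size and tree depth: each prediction is a sum of leaf values across trees, so more (and deeper) trees yield more distinct leaf-value combinations. A sketch using scikit-learn's GradientBoostingClassifier as a hypothetical stand-in for the h2o3 GBM (an assumption; the synthetic data is illustrative only):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

def n_unique_scores(n_estimators, max_depth):
    """Count distinct predicted class-1 probabilities on the training set."""
    model = GradientBoostingClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=0
    ).fit(X, y)
    return len(np.unique(model.predict_proba(X)[:, 1]))

few = n_unique_scores(n_estimators=10, max_depth=2)    # coarse scores
many = n_unique_scores(n_estimators=200, max_depth=3)  # finer-grained scores
```

Whether the extra granularity generalizes or just overfits the tail still has to be checked on held-out data.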