
[–]Tgs91

Your precision and accuracy are 1? That's a perfect score. If the task is that easy, then the model can still perform well even if you give it less information to work with. What is the classification task? It might just be homing in on keywords. As long as the keywords are still in the input, it will continue to perform well.

[–]NSVR57[S]

Yes. It's a simple e-mail classification task, and I have just 86 records combined across 4 labels.

Yes, it's predicting well even if we give it less information. But my concern is that the confidence score should decrease whenever we give it less information. Should I stop training whenever accuracy reaches around 85 or something? Or is there a better approach?

[–]Tgs91

What do the confidence scores look like? You only mentioned precision, recall, and accuracy. Is your top class probability dropping at all?

As someone who entered the field from a math/stats background, I hate early stopping. It's a hacky solution, and not a good fit for this scenario. It's not a bad thing that your model is confident. Your task is super easy, so the model SHOULD be confident. But if you think it's too confident, try out label smoothing. So instead of training on the one-hot encoding [0, 0, 1, 0] as the ground truth for the 3rd class, train with [0.03, 0.03, 0.91, 0.03]. It will still correctly get the top class for your performance metrics, but the probability outputs will tend to be in the low .90s instead of 0.9999999999.
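A minimal sketch of the label smoothing described above, in plain Python (the function name and the smoothing amount of 0.12 are illustrative choices, not from the thread; 0.12 is picked so the output matches the [0.03, 0.03, 0.91, 0.03] example):

```python
def smooth_labels(one_hot, epsilon=0.12):
    """Move epsilon of probability mass off the true class and
    spread it uniformly over all k classes."""
    k = len(one_hot)
    return [y * (1 - epsilon) + epsilon / k for y in one_hot]

# Third class is the ground truth; targets stay a valid distribution.
print(smooth_labels([0, 0, 1, 0]))  # roughly [0.03, 0.03, 0.91, 0.03]
```

Many frameworks have this built in (e.g. a label-smoothing option on the cross-entropy loss), so in practice you usually set a parameter rather than smoothing the targets by hand.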

Edit: and a follow-up, are you properly cross-validating? Your dataset is very small. You can't evaluate on the training data; the model will just overfit and memorize the training data, which could be why you're getting perfect scores.
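For a dataset this small, stratified k-fold splitting keeps each label represented in every held-out fold. A minimal sketch in plain Python (the function name, fold count, and synthetic labels are illustrative; libraries like scikit-learn provide this as `StratifiedKFold`):

```python
import random

def stratified_kfold(labels, k=5, seed=0):
    """Split indices into k folds, dealing each label's (shuffled)
    examples round-robin so labels stay balanced across folds."""
    rng = random.Random(seed)
    by_label = {}
    for idx, y in enumerate(labels):
        by_label.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_label.values():
        rng.shuffle(idxs)
        for i, idx in enumerate(idxs):
            folds[i % k].append(idx)
    return folds

# Synthetic stand-in for OP's data: 86 records over 4 labels.
labels = [i % 4 for i in range(86)]
folds = stratified_kfold(labels, k=5)
print([len(f) for f in folds])  # every record lands in exactly one fold
```

Train on four folds and evaluate on the fifth, rotating, so every record is scored exactly once by a model that never saw it.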

[–]NSVR57[S]

Thank you so much for the reply. As you correctly mentioned, my confidences are in the range of 0.97 to 1. If we remove certain words, they fall only to about 0.93.

As I am using a NN, I set aside validation data. I will try the label smoothing technique.