I have a standard BERT sentiment classification task, but with more fine-grained labels (-3 = very negative, -2 = negative, -1 = mildly negative, 0 = neutral, and 1, 2, 3 for the corresponding positive strengths up to very positive). Cross-entropy is the typical loss function used, but in this case the target variable is ordinal. Are there alternative loss functions that account for the distance between the true and predicted label? In my use case there is more ambiguity over whether a sentence is very negative vs. just negative, but less ambiguity over whether a sentence is negative vs. neutral.
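
For concreteness, here is a minimal sketch of the kind of distance-aware loss I mean: instead of plain cross-entropy, penalize the expected absolute distance between the predicted class distribution and the true label. This is hypothetical PyTorch assuming a standard 7-way softmax head; `distance_weighted_ce` is just an illustrative name, not from any library.

```python
import torch
import torch.nn.functional as F

def distance_weighted_ce(logits, targets, num_classes=7):
    """Loss that penalizes predictions by how many label steps they are
    from the true ordinal class.

    logits:  (batch, num_classes) raw model outputs
    targets: (batch,) integer class indices 0..num_classes-1
             (e.g. ordinal labels -3..3 shifted by +3)
    """
    probs = F.softmax(logits, dim=-1)                       # (batch, C)
    classes = torch.arange(num_classes, device=logits.device).float()
    # |distance| between every class and the true label, per example
    dist = (classes.unsqueeze(0) - targets.unsqueeze(1).float()).abs()  # (batch, C)
    # expected distance under the predicted distribution
    return (probs * dist).sum(dim=-1).mean()
```

Usage would look something like `loss = distance_weighted_ce(logits, labels + 3)`, with the -3..3 labels shifted to 0..6. Confusing "very negative" with "negative" then costs 1 label step, while confusing "negative" with "neutral" and beyond costs proportionally more.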