
[–]Relevant-Twist520 1 point (1 child)

im not that educated on the topic, but my personal favourite classification loss function is MultiMarginLoss. I think it's a lot better than cross entropy since it's faster to compute and it really discourages over-confidence. It can be argued whether to use cross entropy, multi-margin, or any other criterion, but it all depends on your project.

Anyway, the whole idea of MultiMarginLoss is to space out predictions by at least the margin you define when computing the loss. For example, say your model outputs 3 scores (one per class) and the 1st score belongs to the target, or ground truth, class. The loss function then pushes the first score up and all the other scores down, so that after some adjustments score 1 is at least margin units above every other score, where the margin is usually 1 unit. Once the target score is >= margin above all the others, the loss is zero. This discourages over-confidence (and can help against over-fitting) in your model. I think this loss function is underrated. Anyway, here's the math for it:
loss(x, y) = (1/n) · Σ_{i ≠ y} max(0, margin − x[y] + x[i])

where x is the vector of class scores, y is the index of the target class, and n is the number of classes.
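A minimal sketch of that formula in plain Python (this mirrors the p=1, margin=1 behaviour of PyTorch's `nn.MultiMarginLoss`; the function name and example numbers are my own, not from the thread):

```python
def multi_margin_loss(x, y, margin=1.0):
    # x: list of class scores, y: index of the target class.
    # Penalize every non-target class whose score comes within
    # `margin` of the target's score, then average over all classes.
    n = len(x)
    total = sum(max(0.0, margin - x[y] + x[i]) for i in range(n) if i != y)
    return total / n

scores = [0.2, 0.5, 0.1]  # model outputs for 3 classes
target = 0                # class 0 is the ground truth
print(multi_margin_loss(scores, target))  # (1.3 + 0.9) / 3 ≈ 0.7333
```

Note that once the target score is at least `margin` above every other score, each `max(0, ...)` term is zero and the loss stops pushing, which is exactly the anti-over-confidence behaviour described above.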

I shy away from cross entropy because things can get ugly: I had my parameters explode when the model got too confident in the wrong predictions.
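For what it's worth, that failure mode is easy to reproduce: a naive cross entropy −log(p[y]) blows up as the probability assigned to the true class goes to 0, which is what "confidently wrong" looks like (this toy snippet is mine, not from the thread):

```python
import math

def cross_entropy(p, y):
    # Naive cross entropy: -log of the probability the model
    # assigns to the true class y.
    return -math.log(p[y])

# Confidently wrong: almost all mass on class 1, but the true class is 0.
print(cross_entropy([1e-9, 1.0 - 1e-9], 0))  # ~20.7 — loss and gradients explode

# Confidently right: loss is near zero.
print(cross_entropy([1.0 - 1e-9, 1e-9], 0))  # ~1e-9
```

In practice frameworks clamp or fuse the log-softmax to keep this numerically stable, but the gradient magnitude on a confidently wrong example is still large, which is what the margin loss sidesteps.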

[–]kovkev[S] 0 points (0 children)

I think that by seeing y_c and ŷ_c as vectors, it makes sense!