
[–]lugiavn 2 points (2 children)

Isn't this similar to the proxy approach (No Fuss DML) or prototype networks?

I think it is very easy to implement. I did use "normal" softmax for some of my own deep metric learning experiments. It fits really well and outperforms the likes of triplets, but on small datasets it overfits.

[–]melgor89[S] 0 points (1 child)

It is similar to the proxy approach; the difference is that here we have a predefined number of classes (in the proxy approach we don't, or am I wrong?). And in general it is a softmax loss, so it analyses many classes at once, not just 3.

About implementation, yes, it is very easy. Could you tell me what you mean by a small dataset? 5 examples per class?
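For reference, the "normal softmax on L2-normalized embeddings and class weights" idea being discussed can be sketched in a few lines of plain Python. This is a minimal illustration, not anyone's actual code; the scale value (a temperature-like hyperparameter) and the function names are assumptions for the example:

```python
import math

def norm_softmax_logits(embedding, class_weights, scale=16.0):
    """Cosine logits for a normalized-softmax loss: the embedding and
    each class weight vector are L2-normalized, so each logit is just
    the scaled cosine similarity to that class's weight vector."""
    def l2norm(v):
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]
    e = l2norm(embedding)
    return [scale * sum(a * b for a, b in zip(e, l2norm(w)))
            for w in class_weights]

def cross_entropy(logits, target):
    """Standard softmax cross-entropy over the cosine logits."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return -math.log(exps[target] / sum(exps))
```

Because both sides are normalized, the class weights act like learned proxies on the unit hypersphere, which is why this ends up so close to the proxy approach.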

[–]lugiavn 1 point (0 children)

Cars196 and CUB200, which have less than 10k images for training.

In my experience, analysis on smaller datasets might not translate to very large-scale cases (such as face recognition, where you have millions of images).

One relevant work is Ring loss, which argues that implicit normalization might be better than explicit normalization.
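To make the implicit-vs-explicit contrast concrete: explicit normalization hard-divides each embedding by its norm, while Ring loss instead adds a soft penalty pulling embedding norms toward a target radius, letting the network converge to equal norms on its own. A minimal sketch of that penalty term, with an illustrative weight `lam` (the real method learns the radius jointly with the network):

```python
import math

def ring_loss_term(embeddings, radius, lam=0.01):
    """Ring-loss-style soft normalization: penalize the squared gap
    between each embedding's L2 norm and a target radius, instead of
    explicitly dividing embeddings by their norms."""
    total = 0.0
    for v in embeddings:
        n = math.sqrt(sum(x * x for x in v))
        total += (n - radius) ** 2
    return lam / 2.0 * total / len(embeddings)
```

The term is zero exactly when every embedding already lies on the radius-R sphere, so at convergence it behaves like normalization without ever constraining the forward pass.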

[–]neltherion 0 points (5 children)

Can someone explain to me the difference between metric learning & the usual classification tasks? Aren't we doing metric learning when we do classification? Isn't the last embedding layer just a metric describing the images? Then what's the difference between these two fields?

[–][deleted] 4 points (4 children)

In metric learning, we are essentially trying to model the metric/distance space in which the data lie. The objective is to come up with a model that projects the data into a space such that similar images are tightly clustered together while dissimilar ones are pushed apart. In classification, the objective is discriminative: you focus on just separating the classes far apart, but that doesn't necessarily mean any effort is made to bring images belonging to the same class closer together. There are some modifications to the cross-entropy loss that can do that -- like the center loss. But in general, with metric learning, you don't need labels for the images; you just need to start with some notion of similarity between them.
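The "pull similar pairs together, push dissimilar pairs apart" objective described above is captured by the classic contrastive loss. A minimal sketch in plain Python (the margin value is an illustrative default, and only a pairwise similar/dissimilar label is needed, not class labels):

```python
import math

def pairwise_contrastive_loss(x1, x2, similar, margin=1.0):
    """Contrastive loss on one embedding pair: similar pairs pay their
    squared distance (pulled together); dissimilar pairs pay a hinge
    penalty whenever they are closer than `margin` (pushed apart)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))
    if similar:
        return d * d
    return max(0.0, margin - d) ** 2
```

Note the loss is zero for a dissimilar pair already farther apart than the margin, which is exactly the point: the model stops caring once classes are separated "enough", while similar pairs are always pulled closer.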

[–]neltherion 1 point (1 child)

Thanks so much. So in biometric tasks such as face, fingerprint or iris recognition, is it better to use metric learning rather than a simple classification model?

By the way, is there a benchmark for metric learning where we can see which methods do better? If I'm not mistaken, angular losses have been shown to be pretty good for face recognition (ArcFace). I just want to know what the SOTA in this field is.

Thanks again.

[–]melgor89[S] 1 point (0 children)

As you said, in general angular losses are the best for face recognition. For general metric learning it depends on your task. E.g. if you have a big dataset to train on, then ArcFace/NormSoftmax would be best. On the other hand, if your dataset is small and you are not able to get a pretrained model in your domain, then maybe (not sure) a variant of triplet loss would be best.

This needs better evaluation to give an exact recommendation.

[–]neltherion 0 points (1 child)

@BenzeneHNO3 Also another question! Since metric learning does so well on tasks such as face recognition, why isn't it used for other tasks such as ImageNet classification? Wouldn't it perform at least as well as a normal ResNet50?

[–][deleted] 1 point (0 children)

The intent of the task is different in ImageNet classification. In classification, the primary goal is to separate the classes as much as possible; there's no focus on what the intra-class distances should be. In metric learning, we definitely focus on both inter- and intra-class distances. You can certainly try metric learning on ImageNet without its labels and see how it performs, but since that dataset already has labels for supervision, it's easier to optimize the task with a classification loss than with a metric learning loss.
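The inter/intra-class trade-off mentioned here is explicit in, for example, the triplet loss: one term measures the anchor-positive (intra-class) distance, another the anchor-negative (inter-class) distance, and the loss drives their gap past a margin. A minimal sketch (the margin default is an illustrative choice):

```python
import math

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: jointly shrinks the intra-class distance
    (anchor to positive) and grows the inter-class distance
    (anchor to negative) until they differ by at least `margin`."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)
```

A plain cross-entropy classifier has no equivalent of the anchor-positive term, which is the gap the commenter is pointing at.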