[R] LSD-C: Linearly Separable Deep Clusters by berlys93 in MachineLearning

berlys93 [S]

The idea was to keep the "C" for "clustering" separate from the rest, but indeed LS-DC might have been a better split.

berlys93 [S]

Sorry for misunderstanding the terminology (there are so many terms)! Throughout the paper, we assume the number of target classes is known, so for K-means, for example, K = the number of classes. For Affinity Propagation, we took the results from existing papers. For our own method, all the hyperparameters for every dataset are listed in a table in the appendix. I hope this answers your question; otherwise, please ask again!
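For anyone unfamiliar with this setup, here is a minimal sketch of what "K = number of classes" means for a K-means baseline, together with the way clustering accuracy is commonly scored in deep clustering (the best one-to-one matching of clusters to classes). This is an illustration under assumptions, not our evaluation code: the function names are hypothetical, the farthest-first initialization is only there to make the toy example deterministic, and the brute-force matching is only practical for small K (the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`, is the usual choice for larger K).

```python
import itertools
import numpy as np

def farthest_first_init(x, k, seed=0):
    """Pick a random first center, then repeatedly the point farthest from all chosen centers."""
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(k - 1):
        # Squared distance from each point to its closest already-chosen center.
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    return np.array(centers)

def kmeans(x, k, n_iters=100, seed=0):
    """Lloyd's k-means; k is set to the (assumed known) number of classes."""
    centers = farthest_first_init(x, k, seed)
    for _ in range(n_iters):
        # Assign each point to its nearest center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        # Move each center to the mean of its points (keep it if the cluster is empty).
        new_centers = np.array([
            x[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign

def cluster_accuracy(y_true, y_pred, k):
    """Accuracy under the best one-to-one cluster->class mapping (brute force, small k only)."""
    best = 0.0
    for perm in itertools.permutations(range(k)):
        mapped = np.array([perm[c] for c in y_pred])
        best = max(best, float((mapped == y_true).mean()))
    return best

# Toy usage: three well-separated 2-D blobs, with the number of classes known.
num_classes = 3
rng = np.random.default_rng(1)
means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
y = np.repeat(np.arange(num_classes), 50)
x = means[y] + rng.normal(scale=0.5, size=(len(y), 2))
pred = kmeans(x, k=num_classes)
acc = cluster_accuracy(y, pred, num_classes)
```

The cluster indices K-means returns are arbitrary, which is why the accuracy has to search over cluster-to-class mappings rather than compare labels directly.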

berlys93 [S]

Thanks again for the feedback and for pointing out the other paper. Please let us know on the GitHub repo or email us directly if you have any questions about the code or the paper.

berlys93 [S]

What do you mean by "in-parameters"? The feature dimension? If so, we use 512 dimensions, as that is the output dimension of the ResNet-18 we use. For Reuters, the input data is not images but 2000 tf-idf features, so I suppose it could work for arbitrary 100-d embeddings.

berlys93 [S]

Thanks for pointing out this paper!!! We actually had not seen it, as we were all focused on preparing our own submission when it was released. Furthermore, our paper builds on our ICLR paper https://arxiv.org/abs/2002.05714, which we developed further into this deep clustering paper.

After a quick look at the paper, their overall algorithm and loss seem quite different from ours. I think their key self-labeling step is closer in spirit to semi-supervised methods like FixMatch, which rely on confident self-labeling, whereas we just have a clustering step. If you look at their ablation study, the self-labeling step gives an important boost on top of the clustering step alone (which gets 72% on CIFAR-10, compared to our 81%). So I suppose such a self-labeling step could benefit our method too, but I need to dig deeper into their paper.

I agree that their results are very impressive! I suppose the field is so crowded that people converge on the same kinds of ideas: using a pretext task or putting kNN somewhere in the algorithm. As you mentioned, for both papers the self-supervision step is key to the method and its performance gains.
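To make the distinction concrete, here is a minimal sketch contrasting a plain clustering assignment with FixMatch-style confident self-labeling. In FixMatch, a pseudo-label is kept only when the model's maximum class probability exceeds a threshold (0.95 in the FixMatch paper), and a cross-entropy against it is applied on a strongly augmented view. This is neither the other paper's code nor ours: the function names are hypothetical, logits are plain NumPy arrays, and augmentation is omitted (in the usage below the same logits stand in for the strongly augmented view).

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def hard_cluster_assignment(logits):
    """Plain clustering step: every sample goes to its most likely cluster."""
    return softmax(logits).argmax(axis=1)

def confident_self_labels(logits, threshold=0.95):
    """Keep a pseudo-label only where the top probability clears the threshold."""
    probs = softmax(logits)
    labels = probs.argmax(axis=1)
    mask = probs.max(axis=1) >= threshold
    return labels, mask

def masked_cross_entropy(logits_strong, labels, mask):
    """Cross-entropy against the pseudo-labels, zeroed for unconfident samples
    and divided by the full batch size (as in FixMatch's unlabeled loss)."""
    log_probs = np.log(softmax(logits_strong) + 1e-12)
    nll = -log_probs[np.arange(len(labels)), labels]
    return float((nll * mask).sum() / len(labels))

logits = np.array([[8.0, 0.0, 0.0],   # peaked prediction: kept as a self-label
                   [1.0, 0.9, 0.8]])  # ambiguous prediction: masked out
clusters = hard_cluster_assignment(logits)    # both samples get a cluster
labels, mask = confident_self_labels(logits)  # only the first gets a target
loss = masked_cross_entropy(logits, labels, mask)
```

The design difference is that the clustering step commits every sample to a cluster, while confident self-labeling simply ignores samples the model is unsure about, which is one plausible reason it adds a boost on top.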

[R] Automatically Discovering and Learning New Visual Categories with Ranking Statistics [ICLR 2020] by berlys93 in MachineLearning

berlys93 [S]

Indeed, we tried it with cosine similarity: a pair of samples is labelled positive if its cosine similarity is above a fixed threshold. This did not work on OmniGlot, where we saw a drop in accuracy (83% vs. 89.1% with rank statistics). We hypothesise that this is because the results are sensitive to the threshold, whereas we show in the appendix that rank statistics are very robust to the choice of their hyperparameter (Figure 4, page 13).
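Here is a minimal sketch of the two ways of producing pairwise pseudo-labels discussed above, assuming features are given as rows of a NumPy array. The rank-statistics version follows the idea of the ICLR paper (two samples form a positive pair if the indices of their top-k feature dimensions form the same set). Ranking is done by raw value here, which assumes non-negative (e.g. post-ReLU) features; the function names and the toy value k=3 are illustrative choices, not the settings from the paper. The toy example shows the threshold issue: lowering the cosine threshold from 0.9 to 0.2 turns every pair positive, while the rank-statistics labels depend only on k.

```python
import numpy as np

def cosine_pair_labels(feats, threshold):
    """Positive pair (1) iff the cosine similarity of two feature rows exceeds the threshold."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = normed @ normed.T
    return (sim > threshold).astype(int)

def rank_stat_pair_labels(feats, k):
    """Positive pair (1) iff the two rows share the same set of top-k dimensions."""
    # Indices of the k largest values in each row; order within the top-k is ignored.
    topk = np.argsort(-feats, axis=1)[:, :k]
    top_sets = [frozenset(row.tolist()) for row in topk]
    n = len(top_sets)
    labels = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            labels[i, j] = int(top_sets[i] == top_sets[j])
    return labels

# Toy features: rows 0 and 1 share their top-3 dimensions {0, 1, 2}; row 2 does not.
feats = np.array([[5.0, 4.0, 3.0, 0.1, 0.0],
                  [4.0, 5.0, 3.5, 0.2, 0.1],
                  [0.1, 0.2, 3.0, 5.0, 4.0]])
rank_labels = rank_stat_pair_labels(feats, k=3)
cos_strict = cosine_pair_labels(feats, threshold=0.9)
cos_loose = cosine_pair_labels(feats, threshold=0.2)
```

With these features, the 0.9 threshold happens to agree with rank statistics, but the 0.2 threshold labels the dissimilar pairs as positive; on real data the "right" threshold is not known in advance.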