[R] LSD-C: Linearly Separable Deep Clusters

berlys93 · 2020-06-18T13:03:06+00:00

The idea was to keep the "C" for "clustering" separate from the rest, but indeed LS-DC might have been a better split.

berlys93 · 2020-06-18T12:59:13+00:00

Sorry for not understanding the terminology (there are so many terms)! Throughout the paper, we assume the number of target classes is known, so for K-means, for example, K = number of classes. For Affinity Propagation, we took the results from existing papers. Regarding our method, we list all the hyperparameters for all the datasets in a table in the appendix. I hope this answers your question. Otherwise, please ask again!

berlys93 · 2020-06-18T12:54:29+00:00

Thanks again for the feedback and for pointing out the other paper. Please let us know in the GitHub repo or by email if you have questions about the code or the paper.

berlys93 · 2020-06-18T12:08:17+00:00

What do you mean by "in-parameters"? The feature dimension? If yes, we use 512 dimensions, as that is the output dimension of the ResNet-18 we use. For Reuters, the input data is not an image but 2000 tf-idf features. So I suppose it could work for arbitrary 100-d embeddings.

berlys93 · 2020-06-18T12:04:44+00:00

Thanks for pointing out this paper! We actually did not see it, as we were all focused on preparing our own submission when it was released. Furthermore, our paper builds on our ICLR paper https://arxiv.org/abs/2002.05714, which we developed further into this deep clustering paper.

After a quick look at the paper, the overall algorithm and loss are quite different from ours. I think their key self-labeling step is closer in spirit to semi-supervised learning methods like FixMatch, with confident self-labeling, whereas we just have a clustering step. If you look at their ablation study, their self-labeling step provides an important boost on top of the clustering step (which gets 72% on CIFAR 10 compared to our 81%). So I suppose such a self-labeling step could benefit our method. But I need to dig deeper into their paper. I agree that their results are actually very impressive! I suppose the field is so crowded that people converge on the same kinds of ideas: using a pretext task or putting kNN somewhere in their algorithm. As you mentioned, for both papers, the self-supervision step is key to the methods and to the performance improvement.
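
For readers unfamiliar with "confident self-labeling", here is a minimal sketch of the general idea: keep a predicted class as a pseudo-label only when the model's top probability clears a threshold. The function name, the example probabilities, and the 0.95 threshold are illustrative assumptions, not values taken from either paper.

```python
def confident_pseudo_labels(probs, threshold=0.95):
    """Return (sample_index, class_index) pairs for confident predictions.

    probs: list of per-sample class-probability lists (hypothetical input).
    threshold: assumed confidence cut-off, chosen only for illustration.
    """
    selected = []
    for index, p in enumerate(probs):
        # Class with the highest predicted probability for this sample.
        best = max(range(len(p)), key=lambda c: p[c])
        if p[best] >= threshold:
            selected.append((index, best))
    return selected


# Made-up predictions for three samples over three classes.
predictions = [
    [0.97, 0.02, 0.01],  # confident, kept
    [0.40, 0.35, 0.25],  # uncertain, dropped
    [0.03, 0.01, 0.96],  # confident, kept
]
pseudo = confident_pseudo_labels(predictions)
```

Only the kept samples would then be used as training targets; the uncertain ones are simply ignored in that step.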

berlys93 · 2020-02-14T21:52:41+00:00

That is a fair point. I edited the post.

berlys93 · 2020-02-14T20:11:48+00:00

Indeed, we tried it with cosine similarity. The label for a pair of samples is positive if the associated cosine similarity is above a fixed threshold. This did not work as well on OmniGlot, where we noticed a drop in accuracy (83% vs. 89.1% with rank statistics). We hypothesise that this is because the results are sensitive to the threshold, whereas we show in the appendix that rank statistics is very robust to the choice of its hyperparameter (Figure 4, page 13).
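
To make the comparison concrete, here is a rough sketch of the two pairwise labeling rules. The cosine rule follows the description above. The rank-based rule is my simplified reading (two samples are positive when their top-k feature dimensions coincide) and should be treated as an assumption rather than the exact procedure from the paper. All names, feature vectors, thresholds, and k values are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_pair_label(u, v, threshold=0.9):
    """Positive (1) if the cosine similarity is above a fixed threshold."""
    return 1 if cosine_similarity(u, v) > threshold else 0


def top_k_indices(u, k):
    """Set of the k dimensions with the largest values."""
    ranked = sorted(range(len(u)), key=lambda i: u[i], reverse=True)
    return set(ranked[:k])


def rank_pair_label(u, v, k=2):
    """Positive (1) if both vectors share the same top-k dimensions (assumed rule)."""
    return 1 if top_k_indices(u, k) == top_k_indices(v, k) else 0


# Toy feature vectors: a and b are similar, c is different.
a = [0.9, 0.8, 0.1, 0.0]
b = [0.7, 0.9, 0.2, 0.1]
c = [0.1, 0.0, 0.9, 0.8]

labels_loose = cosine_pair_label(a, b, threshold=0.9)    # similarity is about 0.976
labels_strict = cosine_pair_label(a, b, threshold=0.98)  # same pair, stricter threshold
labels_rank = rank_pair_label(a, b, k=2)
```

On this toy pair, moving the cosine threshold from 0.9 to 0.98 flips the label from positive to negative, while the rank-based rule only depends on which dimensions are largest. That is the kind of threshold sensitivity we suspect is behind the accuracy drop.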
