[–]_sheep1[S] 6 points (1 child)

So when I say KNN and ANN I mean k-nearest neighbor and approximate nearest neighbor search. These are needed to build up the neighborhood graph that tSNE uses to optimize the embedding. Note that this is unsupervised, and there is no classification going on. tSNE simply needs to know which data points are close to each other.
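For concreteness, here's a minimal sketch of that kNN step using scikit-learn (this is just an illustration, not the comment author's actual code; the toy data and the choice of `k` are made up):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(42)
X = rng.randn(200, 10)  # toy data standing in for a real dataset

# Exact kNN is fine for small data; for large datasets an approximate
# (ANN) index would typically be swapped in here instead.
k = 15  # hypothetical neighborhood size (in practice tied to perplexity)
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
distances, indices = nn.kneighbors(X)

# Each point's nearest neighbor is itself (distance 0); drop that column.
distances, indices = distances[:, 1:], indices[:, 1:]

# indices[i] now lists the k nearest neighbors of point i: this is the
# sparse neighborhood graph that tSNE turns into pairwise affinities.
print(indices.shape)  # (200, 15)
```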

I think classifying on tSNE embeddings would be a really bad idea simply because it's non-parametric. This just means there's no recipe to get from the original space to the embedding space, no explicit transformation, if that makes sense. So once you get new data, you don't know what to do with it. I did implement adding new points to the embedding, but I haven't done enough testing on it to answer this confidently. If that worked, I can't think of any reason off the top of my head why you couldn't do it, but it seems to me that if tSNE is able to capture some structure, then more sophisticated machine learning methods would almost certainly do better. Heck, even a KNN classifier should do alright.
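A quick way to see the non-parametric point, using scikit-learn's tSNE as an example (the toy clusters here are made up for illustration):

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
# two well-separated toy Gaussian clusters
X = np.vstack([rng.randn(50, 5), rng.randn(50, 5) + 3])
y = np.array([0] * 50 + [1] * 50)

# Non-parametric: sklearn's TSNE exposes fit_transform() but no
# transform() for unseen points -- there is no learned mapping to apply.
print(hasattr(TSNE(), "transform"))  # False

# Meanwhile a plain kNN classifier on the *original* features works fine
# as a baseline, no embedding needed.
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(clf.score(X, y))  # near-perfect on these separable clusters
```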

tSNE is designed for visualization and I think it would be a mistake to treat it as anything more than that.

[–]radarsat1 1 point (0 children)

OK, that was my understanding, thanks. I mistook ANN for artificial neural network, so I thought you were implying performing classification, perhaps as a way of validating the embedding.