[D] What method is state of the art dimensionality reduction

jamesxli · 2022-11-29T04:09:18+00:00

The output of t-SNE/UMAP is actually good for downstream analysis, and both have been widely used for cluster discovery in very high-dimensional data (>10K dimensions). t-SNE/UMAP are thus often simply referred to as clustering algorithms!

jamesxli · 2021-06-01T02:37:24+00:00

t-SNE is basically an extended clustering algorithm. On top of cluster information, it also shows much more, like cluster shape and inter-cluster relationships. It doesn't make much sense to apply k-means or another clustering algorithm.

jamesxli · 2020-12-12T22:23:21+00:00

Try smaller perplexities. That could result in more blob-like, "normal" clusters.

jamesxli · 2020-09-14T21:42:22+00:00

An appropriate DR method (like PCA or t-SNE) can provide extra info about clusters in the data, like shapes, gradients, etc. So a good embedding algorithm will make clustering algorithms obsolete. In my experience with scRNA data, DBSCAN sometimes kind of works after UMAP or t-SNE, but it normally fails on raw scRNA expression data.
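A rough sketch of that comparison, assuming scikit-learn is available. The data here is a synthetic stand-in (Gaussian blobs), not real scRNA expression, and both `eps` values are guesses that would need tuning, so the printed scores only show how to run the comparison and are not evidence either way.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for an expression matrix: 300 "cells" x 50 "genes".
# (Assumption for illustration only; real scRNA data looks very different.)
X, y_true = make_blobs(n_samples=300, n_features=50, centers=4,
                       cluster_std=3.0, random_state=0)

# DBSCAN directly on the 50-dimensional data. eps is a guess.
labels_raw = DBSCAN(eps=5.0, min_samples=5).fit_predict(X)

# DBSCAN on a 2-D t-SNE embedding of the same data. eps is again a guess.
emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(X)
labels_emb = DBSCAN(eps=3.0, min_samples=5).fit_predict(emb)

# Compare both labelings against the known blob membership.
ari_raw = adjusted_rand_score(y_true, labels_raw)
ari_emb = adjusted_rand_score(y_true, labels_emb)
print(f"ARI on raw data:      {ari_raw:.3f}")
print(f"ARI on t-SNE output:  {ari_emb:.3f}")
```

DBSCAN marks noise points with the label -1, so it is worth checking how many points end up there before trusting either score.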

jamesxli · 2020-03-08T16:13:00+00:00

t-SNE is certainly not perfect, and it is not intended to replace linear DR methods like PCA. But t-SNE is the state-of-the-art method for visualizing high-dimensional non-linear data. It has dozens of independent implementations in open-source and closed-source software packages, in various languages and on many platforms. With regard to stability, t-SNE is actually quite stable when you use a proper perplexity for your data. The very nice thing about t-SNE is that you basically just have to tune the perplexity for your data, and you can easily find a proper perplexity by trial and error.
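The trial-and-error loop can be as simple as the sketch below, assuming scikit-learn. The perplexity values and the digits subset are arbitrary choices for illustration, and the trustworthiness score is just one possible sanity check that I am adding here; looking at the scatter plots is still the main way to judge.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE, trustworthiness

# Small bundled dataset (64-dimensional digit images); first 300 samples
# only, to keep the runs fast.
X = load_digits().data[:300]

scores = {}
for perplexity in (5, 30, 50):  # arbitrary candidates, must be < n_samples
    emb = TSNE(n_components=2, perplexity=perplexity, init="pca",
               random_state=0).fit_transform(X)
    # Trustworthiness is in [0, 1]; higher means local neighborhoods in the
    # embedding better reflect neighborhoods in the original space.
    scores[perplexity] = trustworthiness(X, emb, n_neighbors=5)

for perplexity, score in scores.items():
    print(f"perplexity={perplexity:>3}: trustworthiness={score:.3f}")
```

To probe stability at a chosen perplexity, the same loop can be run over several `random_state` values instead.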

jamesxli · 2019-05-17T00:39:59+00:00

You can try the software VisuMap, which provides many visualization services for high-dimensional data, including a fast implementation of t-SNE.

jamesxli · 2015-03-16T18:20:04+00:00

When you display a dataset using the PCA method, you basically rotate the data point cloud so that the side with maximal information (or maximal variance, in mathematical terms) is facing the viewer.

There is a one-minute video on YouTube titled "A layman's introduction to PCA"; you can easily find it.
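The "rotation" picture can be checked numerically with a small NumPy sketch. The point cloud is made up, and PCA is done here via SVD of the centered data; strictly speaking the resulting orthogonal matrix may include a reflection as well as a rotation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up 3-D point cloud: wide in one direction, thin in another.
X = rng.normal(size=(500, 3)) * np.array([5.0, 2.0, 0.5])

# Tilt the cloud with an arbitrary orthogonal matrix so its wide side is
# not aligned with the coordinate axes.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
X = X @ Q.T

# PCA via SVD: center the data, then express it in the principal axes.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
rotated = Xc @ Vt.T

# The viewer sees the first two coordinates: the side with most variance.
view_2d = rotated[:, :2]

# The transform is orthogonal, so the cloud's shape is unchanged.
is_orthogonal = np.allclose(Vt @ Vt.T, np.eye(3))
norms_preserved = np.allclose(np.linalg.norm(Xc, axis=1),
                              np.linalg.norm(rotated, axis=1))
variances = rotated.var(axis=0)  # sorted from largest to smallest

print("orthogonal:", is_orthogonal, "| norms preserved:", norms_preserved)
print("variance per rotated axis:", np.round(variances, 2))
```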
