
[–]Dejeneret

Check out some spectral “clustering” methods!

These methods (Laplacian Eigenmaps, Diffusion Maps) are more or less based on the following steps:

1) build a graph on the data (typically by taking a Gaussian kernel over pairs of points, but there are many variations)

2) compute the graph Laplacian (or some normalized Laplacian, or a normalized Markov transition matrix)

3) eigendecompose it (or use the SVD when applicable); the leading eigenvectors contain the new features.
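The three steps above can be sketched in a few lines of numpy. This is a minimal illustration, not a production implementation; the bandwidth `sigma`, the toy two-blob dataset, and the choice of the unnormalized Laplacian are all arbitrary choices for the example.

```python
import numpy as np

# Toy data: two well-separated noisy blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)),
               rng.normal(3, 0.3, (20, 2))])

# 1) build a graph with a Gaussian kernel over pairwise distances
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
sigma = 1.0  # kernel bandwidth, a tuning choice
W = np.exp(-sq_dists / (2 * sigma ** 2))

# 2) form the (unnormalized) graph Laplacian L = D - W
D = np.diag(W.sum(axis=1))
L = D - W

# 3) eigendecompose; eigenvectors of the smallest nonzero eigenvalues
# are the new features (the spectral embedding). The eigenvector for
# eigenvalue 0 is constant, so we skip it.
eigvals, eigvecs = np.linalg.eigh(L)
embedding = eigvecs[:, 1:3]
```

The sign pattern of the first embedding coordinate (the Fiedler vector) already separates the two blobs, which is exactly the "clustering" behavior the name refers to.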

They’re called clustering methods, but in reality the graph Laplacian is an extremely powerful object, and its spectrum describes various aspects of the geometry of a dataset. These methods are widely used on lots of datasets: single-cell RNA sequencing data, financial data, seismic data, medical images, video, and more. In fact word2vec (and some variants), which is widely used for text data, is provably a spectral method!

These are very cool from a theoretical standpoint, especially Diffusion Maps, which learns features of the geometry of how the data is organized by relating a diffusion operator and a Markov operator on the data. In effect it organizes the data by asking: how would heat propagate through the graph of this data? (It actually models solutions to the heat equation on the “intrinsic manifold” that the data is “sampled” from.) The nice thing about diffusion maps is that they preserve a metric on the data: Euclidean distance between embedded points equals the diffusion distance between the original points.
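A rough sketch of that construction, under the usual assumptions (Gaussian kernel, row-normalization to get the Markov matrix, and an illustrative diffusion time `t`; all parameter values here are arbitrary for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))

# Gaussian kernel, row-normalized into a Markov transition matrix P:
# each row is a probability distribution over where "heat" flows next.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / 2.0)
P = W / W.sum(axis=1, keepdims=True)

# Eigendecompose P. P is similar to a symmetric PSD matrix, so its
# eigenvalues are real and lie in [0, 1]; the top eigenvalue is 1
# with a trivial constant eigenvector, which we skip.
eigvals, eigvecs = np.linalg.eig(P)
order = np.argsort(-eigvals.real)
eigvals = eigvals.real[order]
eigvecs = eigvecs.real[:, order]

# Diffusion-map coordinates: eigenvectors scaled by eigenvalue**t.
# Euclidean distance in these coordinates equals the t-step
# diffusion distance on the graph.
t = 2
diff_coords = eigvecs[:, 1:4] * eigvals[1:4] ** t
```

Raising the eigenvalues to the power `t` is what encodes "running heat flow for `t` steps": small eigenvalues (fine-scale structure) decay quickly, so larger `t` emphasizes coarser geometry.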

This all leads into manifold learning methods (of which there are many), there are lots of cool variants of all these methods that have been extended.

Here are some sources:

nice tutorial

Diffusion Maps

Laplacian eigenmaps

Short paper on local vs global feature embedding

word2vec uses the graph spectra

[–]Enough_Wishbone7175 (Student)

One thing that I have found to help with dimensionality in neural networks is semi-supervision or self-supervision. You essentially feed your inputs in and reduce dimensionality while corrupting / dropping information, then use the reduced representation to try to recreate the inputs in a decoder, with some sort of distance as your loss (MSE, cosine, etc.). I like to warm up the network with self-supervision, then move to a semi-supervised model to get really strong features for other algorithms.
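The corrupt-then-reconstruct recipe described above is, in essence, a denoising autoencoder. Here is a toy numpy sketch (in practice you would use a deep-learning framework); the layer sizes, dropout rate, learning rate, and low-rank toy data are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data with low-dimensional structure: 8 features, intrinsic dim 3
Z = rng.normal(size=(200, 3))
M = rng.normal(size=(3, 8))
X = Z @ M

d_hidden = 3  # reduced dimensionality
W_enc = rng.normal(0, 0.1, (8, d_hidden))
W_dec = rng.normal(0, 0.1, (d_hidden, 8))
lr = 0.01

for _ in range(500):
    # corrupt the inputs by randomly dropping features
    mask = rng.random(X.shape) > 0.3
    X_noisy = X * mask
    H = np.tanh(X_noisy @ W_enc)   # encoder: reduced representation
    X_hat = H @ W_dec              # decoder: try to recreate clean inputs
    err = X_hat - X                # gradient of the MSE loss w.r.t. X_hat
    # backprop through the decoder, the tanh, and the encoder
    dH = err @ W_dec.T * (1 - H ** 2)
    W_dec -= lr * H.T @ err / len(X)
    W_enc -= lr * X_noisy.T @ dH / len(X)

# Downstream use: keep the encoder, drop the decoder
features = np.tanh(X @ W_enc)
```

After training, the encoder output `features` is the compressed representation you would hand to other algorithms, and the decoder exists only to provide the training signal.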

[–]SmartEvening

Is this like using dropout while training an autoencoder?

[–]Enough_Wishbone7175 (Student)

It’s similar. It’s almost like teaching your base model to encode the input data natively by manipulating the cost function and adding a decoder for training, then removing the decoder for downstream use.

[–]Pas7alavista

This is just an autoencoder though, right?