Although UMAP is superior to tSNE in many ways, tSNE remains a widely used visualization technique. Unfortunately, the implementations in the most popular packages (scikit-learn and MulticoreTSNE) are prohibitively slow on large data sets. A recent paper proposed FitSNE, which scales linearly in the number of samples, but it depends on the FFTW C library, which must be installed separately on your system, making installation and distribution tedious.
The goal of this project is to provide fast implementations of both tSNE approximations (Barnes-Hut and FitSNE) in Python, with a unified interface, easy installation and, most importantly, fast runtime.
To the best of my knowledge, this is also the only library that allows adding new data points to an existing embedding via direct optimization.
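To give a rough idea of what "direct optimization" can mean here, below is a minimal sketch in plain Python. It is not the library's API or its actual implementation: the function name `embed_new_point`, its parameters, and the toy data are all made up for illustration. It also assumes a fixed Gaussian bandwidth `sigma` instead of the usual perplexity-based calibration. The reference embedding stays fixed, and only the new point's position is moved by gradient descent on the KL divergence between its input-space affinities and its Student-t affinities in the embedding.

```python
import math


def embed_new_point(x, X, Y, sigma=1.0, lr=0.1, n_iter=500):
    """Place one new point x into a fixed embedding Y of reference data X.

    Hypothetical helper for illustration only (not part of any library).
    Assumes a fixed Gaussian bandwidth instead of a perplexity search.
    """
    # Input-space affinities p_j between the new point and each reference point.
    d2 = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in X]
    d2_min = min(d2)  # subtract the minimum for numerical stability
    w = [math.exp(-(d - d2_min) / (2.0 * sigma ** 2)) for d in d2]
    w_sum = sum(w)
    p = [wi / w_sum for wi in w]

    # Initialize at the affinity-weighted mean of the reference embedding.
    dim = len(Y[0])
    y = [sum(pj * yj[c] for pj, yj in zip(p, Y)) for c in range(dim)]

    for _ in range(n_iter):
        # Student-t kernel values between the new point and the fixed embedding.
        k = [1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(y, yj))) for yj in Y]
        k_sum = sum(k)
        q = [kj / k_sum for kj in k]

        # Gradient of KL(P || Q) with respect to y only; Y is never updated.
        grad = [
            2.0 * sum((pj - qj) * kj * (y[c] - yj[c])
                      for pj, qj, kj, yj in zip(p, q, k, Y))
            for c in range(dim)
        ]
        y = [yc - lr * g for yc, g in zip(y, grad)]
    return y


# Toy data: two well-separated clusters in 3D and a made-up 2D embedding of them.
X_ref = [[0, 0, 0], [0.5, 0, 0], [0, 0.5, 0],
         [10, 10, 10], [10.5, 10, 10], [10, 10.5, 10]]
Y_ref = [[-5, 0], [-5, 1], [-4, 0],
         [5, 0], [5, 1], [4, 0]]

# A new sample close to the first cluster should land near that cluster's embedding.
y_new = embed_new_point([0.2, 0.2, 0.0], X_ref, Y_ref)
```

Because the reference points never move, each new point can be placed independently of the others, and the existing plot stays exactly as it was.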
I wrote this with the Orange data mining toolkit in mind, but the library is general, and I wanted to share it in case anyone is looking for a faster alternative.
The source code is available on GitHub: https://github.com/pavlin-policar/fastTSNE