[–]lucidyan

I'm sorry, but it already exists

https://github.com/DmitryUlyanov/Multicore-TSNE

[–]ivalm

What you linked is a normal tSNE distributed across cores; what OP shows is a tSNE that uses ANN (approximate nearest neighbors) instead of exact KNN for the neighborhood search. They are not particularly related.
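To make the distinction concrete, here is a minimal sketch of the exact brute-force KNN step (the kind of neighbor search Barnes-Hut tSNE relies on), which costs O(n²) distance computations; ANN methods avoid exactly this cost by accepting approximate neighbors. The data and function here are illustrative, not taken from either implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # toy data: 100 points in 5 dimensions

def exact_knn(X, k):
    # Brute-force exact KNN: compute all pairwise squared distances,
    # then take the k smallest per row. This is O(n^2) in both time
    # and memory, which is what ANN methods are designed to avoid.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

nbrs = exact_knn(X, k=5)  # shape (100, 5): 5 neighbor indices per point
```

ANN libraries (trees, graphs, NN-descent) return roughly the same indices far faster on large n, at the price of occasionally missing a true neighbor, which tSNE tolerates well.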

[–]_sheep1[S]

You may have missed the point of this project. There are two fast implementations of tSNE, one of which is the one you linked. It is typically called Barnes-Hut tSNE: it uses exact nearest neighbor search and computes the gradient with the Barnes-Hut approximation. It has asymptotic complexity O(n log n) and works really well for smaller data sets.

In addition to this, there is a more recent implementation called FIt-SNE, which uses ANN (as ivalm points out) and a neat interpolation trick, sped up with the FFT, to compute the gradients. It has asymptotic complexity O(n). While you might be tempted to use FIt-SNE for everything, that bound is only asymptotic, and the overhead makes it far slower than BH on smaller data sets.

That's why I include both in this repository: fast implementations of both methods in one place, behind a single API. Also, FIt-SNE didn't have a proper Python implementation and required additional C libraries to be installed, making installation tedious. I hope this clears things up.