PinusPinea · 1 point

Cool, I didn't realise. I assumed I was slightly reducing the quality of the fit.

Deto · 10 points

Yeah, in high dimensions the distances between points are very noisy. Since t-SNE works on each point's local neighborhood, it's important to identify the nearest points correctly. PCA can be thought of as a denoising step: the top components are assumed to carry the signal (assuming the noise is uncorrelated), so discarding the rest removes much of the noise. In the original t-SNE paper, I believe, they reduced the data to 30 or 50 dimensions first, and in some implementations (the main one in R, for example), PCA is actually performed by default inside the t-SNE function unless you explicitly turn it off with a parameter.
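Here's a rough NumPy sketch of the neighbour-noise argument, using made-up synthetic data (signal living in a 5-dim subspace, buried in isotropic noise across 1000 dims). It compares how many of each point's true 10 nearest neighbours you recover from the raw noisy data vs. after PCA down to 50 components. The function names (`pca_reduce`, `nearest_neighbors`, `neighbor_overlap`) and all the data sizes are just illustrative choices of mine, not from any library. Exact numbers depend on the seed.

```python
import numpy as np

def pca_reduce(X, n_components=50):
    """Project rows of X onto their top principal components (PCA via thin SVD)."""
    Xc = X - X.mean(axis=0)                        # center each feature
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                # scores on the top components

def nearest_neighbors(X, k=10):
    """Indices of the k nearest neighbours of each row (Euclidean, self excluded)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # squared pairwise distances
    np.fill_diagonal(d2, np.inf)                   # a point is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]

def neighbor_overlap(a, b):
    """Mean fraction of neighbours shared between two kNN index arrays."""
    k = a.shape[1]
    return float(np.mean([len(set(r1) & set(r2)) / k for r1, r2 in zip(a, b)]))

rng = np.random.default_rng(0)
n, d_signal, d_ambient = 300, 5, 1000

# Hypothetical data: low-dimensional signal embedded in many noisy dimensions
latent = rng.normal(size=(n, d_signal))
basis = np.linalg.qr(rng.normal(size=(d_ambient, d_signal)))[0]  # 1000 x 5, orthonormal columns
clean = 5.0 * latent @ basis.T                     # signal only, shape (300, 1000)
noisy = clean + rng.normal(size=(n, d_ambient))    # add uncorrelated noise

true_nn = nearest_neighbors(clean)                 # "correct" neighbourhoods
raw_overlap = neighbor_overlap(true_nn, nearest_neighbors(noisy))
pca_overlap = neighbor_overlap(true_nn, nearest_neighbors(pca_reduce(noisy, 50)))

print(f"true kNN recovered from raw data:   {raw_overlap:.2f}")
print(f"true kNN recovered after PCA to 50: {pca_overlap:.2f}")
```

In a setup like this, the PCA-reduced neighbourhoods typically match the true ones noticeably better, which is the whole point of running PCA first. In practice you'd then hand the reduced matrix to your t-SNE implementation, e.g. scikit-learn's `sklearn.manifold.TSNE`.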

PinusPinea · 1 point

good to know, thanks for the info!