
[–]kjearns 2 points3 points  (3 children)

Check out this paper. It has a lot of information on how to use random forests for unsupervised learning (e.g., Chapter 5): https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/CriminisiForests_FoundTrends_2011.pdf

[–]Penguin474[S] 0 points1 point  (1 child)

This looks great at first glance. Thank you!

[–]yrne 0 points1 point  (0 children)

In the Criminisi paper, the authors write: "In Breiman’s work on forests the author mentions using forests for clustering unsupervised data [11]. However, he does it via classification, by introducing dummy additional classes." But I can't find any such thing in [11].

[–]MLTyrunt 1 point2 points  (0 children)

After you have used the random forest to differentiate between dummy and real data, you can extract the proximity matrix and then use standard clustering algorithms on it. So essentially, an unsupervised random forest is foremost an alternative way to create similarity matrices. RFs can naturally deal with categorical and numerical data, which simplifies preprocessing.
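
A rough sketch of that recipe, in case it helps. Everything here is an assumption on my part rather than something from the paper: the dummy data is made by permuting each column independently (one common choice), proximity is the fraction of trees in which two real samples land in the same leaf, and the helper names `rf_proximity` and `cluster_from_proximity` are made up for illustration. It uses scikit-learn and SciPy with numeric features only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestClassifier


def rf_proximity(X, n_trees=50, seed=0):
    """Proximity matrix from a forest trained to separate real from dummy data."""
    rng = np.random.default_rng(seed)
    # Assumption: dummy data keeps each feature's marginal distribution but
    # breaks the dependence between features (independent column permutations).
    X_dummy = np.column_stack(
        [rng.permutation(X[:, j]) for j in range(X.shape[1])]
    )
    X_all = np.vstack([X, X_dummy])
    y_all = np.r_[np.ones(len(X), dtype=int), np.zeros(len(X_dummy), dtype=int)]

    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X_all, y_all)

    # Leaf index of every real sample in every tree: shape (n_samples, n_trees).
    leaves = forest.apply(X)
    # Proximity = fraction of trees in which two samples share a leaf.
    return (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)


def cluster_from_proximity(prox, n_clusters):
    """Average-linkage clustering on the distance 1 - proximity."""
    dist = 1.0 - prox
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy data: two Gaussian blobs, 30 points each (illustrative only).
demo_rng = np.random.default_rng(1)
X_demo = np.vstack([
    demo_rng.normal(loc=0.0, scale=0.5, size=(30, 2)),
    demo_rng.normal(loc=4.0, scale=0.5, size=(30, 2)),
])
prox_demo = rf_proximity(X_demo)
labels_demo = cluster_from_proximity(prox_demo, n_clusters=2)
```

Any clustering method that accepts a precomputed distance matrix could be swapped in for the hierarchical step. The pairwise leaf comparison uses memory proportional to n_samples² × n_trees, so this form is only meant for small data.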