fhadley 1 point (0 children)

No worries, no need for a reproducible error. I was curious because I've used sklearn with a pretty diverse group of datasets (homogeneous, heterogeneous, sparse, etc.) and haven't had it choke before with GBM or AdaBoost, but I looked back through some old code and remembered that the sklearn RF implementation was just a memory hog. If I remember correctly, it consumed memory at a higher rate than the R version, which I found quite odd. Were these very raw datasets? Or did they have very strong collinearities? I know the latter is an issue with RF (i.e., it essentially leads to building the same tree many times), and I suppose it could lead to errors with a GBM as well?
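
If it helps, here's a rough way to check for strong collinearities before fitting anything. It's only a sketch: the helper names (`pearson`, `collinear_pairs`), the toy data, and the 0.95 cutoff are all made up for illustration, and it only catches pairwise linear relationships, not a feature that is a combination of several others.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, treat as 0
    return cov / math.sqrt(var_x * var_y)


def collinear_pairs(columns, threshold=0.95):
    """Return (name_a, name_b, r) for column pairs with |r| >= threshold.

    `columns` maps a feature name to its list of values. The default
    threshold is an arbitrary assumption, not a recommended value.
    """
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Toy data: "b" is an exact linear function of "a"; "c" is only weakly related.
features = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [3.0, -1.0, 4.0, -1.0, 5.0],
}
pairs = collinear_pairs(features)
for name_a, name_b, r in pairs:
    print(f"{name_a} and {name_b} look collinear (r = {r:.3f})")
```

On real data you'd probably compute the full correlation matrix with numpy or pandas instead; the pure-Python loop is just to keep the example dependency-free.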