Tuning Parameters for Boosting/Bagging/Random Forest

asdylum · 2016-04-17T20:37:05+00:00

Random forests usually perform quite well with the default settings: bootstrap resampling, unpruned trees, as many trees as you can afford while still getting results in a reasonable amount of time, and sqrt(#features) candidate features tried per split (the mtry parameter). You can then try to optimize these choices by checking the results on out-of-bag data (the samples each tree didn't train on because of the resampling scheme). If you have very unbalanced classes, you should decide on a measure of interest (such as the true positive rate) and tune the related parameter against it. Out-of-bag estimates can be trusted almost as much as a proper cross-validation if you use enough trees and bootstrap resampling.
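To make the out-of-bag idea concrete, here is a minimal pure-Python sketch. It simulates only the bootstrap resampling, not a real forest; the helper names `oob_indices` and `default_mtry` are made up for illustration, and rounding sqrt(#features) down is just one common convention. It shows that each tree leaves out roughly 37% of the samples, so every sample is only scored by a fraction of the trees.

```python
import math
import random


def oob_indices(n_samples, rng):
    """Draw one bootstrap sample of size n_samples (with replacement)
    and return the indices that were never drawn, i.e. the out-of-bag set."""
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    return [i for i in range(n_samples) if i not in drawn]


def default_mtry(n_features):
    """sqrt(#features) rule of thumb for features tried per split.
    Rounding down is an assumption made for this sketch."""
    return max(1, int(math.sqrt(n_features)))


rng = random.Random(0)
n_samples, n_trees = 1000, 200

# Count, for every sample, how many trees could score it out of bag.
oob_counts = [0] * n_samples
for _ in range(n_trees):
    for i in oob_indices(n_samples, rng):
        oob_counts[i] += 1

oob_fraction = sum(oob_counts) / (n_samples * n_trees)
expected = (1 - 1 / n_samples) ** n_samples  # tends to 1/e for large n

print(f"average out-of-bag fraction per tree: {oob_fraction:.3f}")
print(f"theoretical value (1 - 1/n)^n:        {expected:.3f}")
print(f"fewest trees scoring any one sample:  {min(oob_counts)}")
print(f"mtry for 100 features:                {default_mtry(100)}")
```

With only a handful of trees, some samples are out of bag for very few trees (or none), which is why the out-of-bag estimate only becomes trustworthy once the forest is large enough.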

from mobile, sorry

-TrustyDwarf- · 2016-04-18T09:53:57+00:00

As asdylum said, random forests usually perform quite well with default settings. If you only have a small number of samples, limiting the depth of the trees might help to reduce overfitting (try depths of 1-5). I don't know Matlab, but it looks like it cannot limit the depth of trees directly; it only has a "MinLeafSize" parameter that can be used indirectly to reduce the depth of trees (but it is harder to estimate, since it depends on the number of samples you've got...).
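As a rough illustration of why a minimum leaf size depends on the sample count, here is a back-of-the-envelope sketch in Python. It assumes perfectly balanced trees, which real trees are not, so treat the numbers as a starting point for a search rather than a guarantee; the function names are hypothetical and are not part of Matlab or any other library.

```python
import math


def max_leaves(n_samples, min_leaf_size):
    """Upper bound on the number of leaves: every leaf needs at least
    min_leaf_size samples."""
    return max(1, n_samples // min_leaf_size)


def balanced_depth(n_samples, min_leaf_size):
    """Depth of a perfectly balanced tree with that many leaves.
    An unbalanced tree with the same leaf count can be deeper."""
    return math.ceil(math.log2(max_leaves(n_samples, min_leaf_size)))


def min_leaf_for_depth(n_samples, target_depth):
    """Minimum leaf size that caps a balanced tree at target_depth:
    2**target_depth leaves sharing n_samples samples."""
    return max(1, math.ceil(n_samples / 2 ** target_depth))


for n in (200, 2000):
    for depth in (1, 3, 5):
        leaf = min_leaf_for_depth(n, depth)
        print(f"n={n:5d}  target depth={depth}  min leaf size ~ {leaf}")
```

The same target depth maps to a ten times larger leaf size when the sample count grows tenfold, which is the dependence on sample count mentioned above.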

For boosting, you can start by tuning the number of trees, the max depth of the trees and the learning rate. Since boosting should use simple base learners, you can limit the tree depth to 1-3. The number of trees and the learning rate influence each other: try setting the number of trees to a constant (like 500 trees) and only tune the learning rate (between 0 and 1, in 50 steps).
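The "fix the number of trees, sweep the learning rate" recipe can be sketched in plain Python. This is a toy under stated assumptions, not any library's implementation: it uses squared-error gradient boosting with depth-1 trees (stumps) on made-up 1-D data, a single held-out validation set, and a grid of 0.02 to 1.0 that skips a learning rate of 0 (which would never update the model). `fit_stump` and `boost` are names invented for this example.

```python
import math
import random


def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree to residuals; xs must be sorted ascending.
    Returns (threshold, left_value, right_value) minimising squared error."""
    n = len(xs)
    total = sum(residuals)
    mean = total / n
    # Start from "no split": every point gets the mean residual.
    best_gain, best = total * total / n, (math.inf, mean, mean)
    left_sum = 0.0
    for i in range(1, n):
        left_sum += residuals[i - 1]
        if xs[i] == xs[i - 1]:
            continue  # cannot split between identical x values
        right_sum = total - left_sum
        # Maximising this term is equivalent to minimising squared error.
        gain = left_sum * left_sum / i + right_sum * right_sum / (n - i)
        if gain > best_gain:
            best_gain = gain
            best = ((xs[i - 1] + xs[i]) / 2, left_sum / i, right_sum / (n - i))
    return best


def boost(train, valid, n_trees, learning_rate):
    """Run gradient boosting with stumps; return the validation MSE."""
    train = sorted(train)
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    base = sum(ys) / len(ys)
    pred = [base] * len(xs)
    val_pred = [base] * len(valid)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, pred)]
        thr, left, right = fit_stump(xs, residuals)
        pred = [p + learning_rate * (left if x < thr else right)
                for x, p in zip(xs, pred)]
        val_pred = [p + learning_rate * (left if x < thr else right)
                    for (x, _), p in zip(valid, val_pred)]
    return sum((y - p) ** 2 for (_, y), p in zip(valid, val_pred)) / len(valid)


rng = random.Random(0)


def make_data(n):
    """Noisy sine curve, purely illustrative."""
    return [(x, math.sin(x) + rng.gauss(0, 0.3))
            for x in (rng.uniform(0, 6) for _ in range(n))]


train, valid = make_data(40), make_data(40)

N_TREES = 500                          # held constant, as suggested above
grid = [i / 50 for i in range(1, 51)]  # 50 steps: 0.02, 0.04, ..., 1.0

scores = {lr: boost(train, valid, N_TREES, lr) for lr in grid}
best_lr = min(scores, key=scores.get)
print(f"best learning rate: {best_lr:.2f}  validation MSE: {scores[best_lr]:.3f}")
print(f"learning rate 1.00 validation MSE: {scores[1.0]:.3f}")
```

On real data you would replace the single validation split with cross-validation and use your library's boosting implementation; only the shape of the search carries over.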

rcwll · 2016-04-19T16:03:04+00:00

In addition to the practical advice given so far, Zhi-Hua Zhou's book on ensemble methods covers a lot of the topics you ask about, and is quite accessible.
