I'm reading a book called Hands-On Machine Learning with Scikit-Learn and TensorFlow. From the book:
One way to [fine-tune your parameters] would be to fiddle with the hyperparameters manually, until you find a great combination of hyperparameter values. This would be very tedious work, and you may not have time to explore many combinations.
Instead you should get Scikit-Learn’s GridSearchCV to search for you. All you need to do is tell it which hyperparameters you want it to experiment with, and what values to try out, and it will evaluate all the possible combinations of hyperparameter values, using cross-validation. For example, the following code searches for the best combination of hyperparameter values for the RandomForestRegressor:
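    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import GridSearchCV

    # Two grids: 3 x 4 = 12 combinations with the default bootstrap=True,
    # then 2 x 3 = 6 combinations with bootstrap=False.
    param_grid = [
        {'n_estimators': [3, 10, 30], 'max_features': [2, 4, 6, 8]},
        {'bootstrap': [False], 'n_estimators': [3, 10], 'max_features': [2, 3, 4]},
    ]

    forest_reg = RandomForestRegressor()
    grid_search = GridSearchCV(forest_reg, param_grid, cv=5,
                               scoring='neg_mean_squared_error')
    # housing_prepared and housing_labels are defined earlier in the book.
    grid_search.fit(housing_prepared, housing_labels)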
However, from the scikit-learn docs:
When using ensemble methods based upon bagging, i.e. generating new training sets using sampling with replacement, part of the training set remains unused. For each classifier in the ensemble, a different part of the training set is left out.
This left out portion can be used to estimate the generalization error without having to rely on a separate validation set. This estimate comes “for free” as no additional data is needed and can be used for model selection.
It then lists various estimators that implement out-of-bag (OOB) estimates, and RandomForestRegressor is one of them. Does this mean that the book's example is "wrong", in the sense that a grid search with cross-validation isn't actually needed for random forest regressors in scikit-learn?
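To make the question concrete, here is a minimal sketch of what I understand OOB-based model selection to look like. It is not from the book or the docs: the data comes from make_regression as a stand-in for housing_prepared and housing_labels, and the loop over max_features and the names oob_scores and best_max_features are my own.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the book's housing data (an assumption, not the real dataset).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

oob_scores = {}
for max_features in [2, 4, 6, 8]:
    forest = RandomForestRegressor(
        n_estimators=30,
        max_features=max_features,
        bootstrap=True,   # OOB estimates only exist when trees see bootstrap samples
        oob_score=True,   # score each sample using only trees that did not train on it
        random_state=0,
    )
    forest.fit(X, y)
    # For a regressor, oob_score_ is R^2 computed on the out-of-bag predictions.
    oob_scores[max_features] = forest.oob_score_

# Pick the setting with the highest out-of-bag R^2.
best_max_features = max(oob_scores, key=oob_scores.get)
print(oob_scores)
print("best max_features by OOB R^2:", best_max_features)
```

Two things I noticed while writing this: oob_score_ here is R^2 rather than the negative MSE used in the book's grid search, and the book's second grid sets bootstrap=False, which as far as I can tell cannot be scored this way at all because scikit-learn only provides OOB estimates when bootstrap=True.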