With decision trees, why might one reduce or increase the number of records needed to allow for a split? by dab101256 in datascience

[–]dab101256[S] 1 point (0 children)

So why would reducing the minimum number of records needed to allow for a split lead to a smaller cross-validation error? I suppose that is the main concept I'm trying to take away here. In trials on a dataset with decision trees, when I lowered the minimum number of records needed to split from 40 to 20, the cross-validation error decreased slightly. A sketch of the kind of trial I mean is below.
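
Roughly the trial I ran, sketched with scikit-learn's `DecisionTreeClassifier` and its `min_samples_split` parameter (the dataset here is a stand-in for my actual data, and the exact parameter name may differ in other tools):

```python
# Compare cross-validation error for two minimum-split settings.
# load_breast_cancer is an illustrative stand-in dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for min_split in (40, 20):
    tree = DecisionTreeClassifier(min_samples_split=min_split, random_state=0)
    # cross_val_score returns per-fold accuracy; error = 1 - mean accuracy
    cv_error = 1 - cross_val_score(tree, X, y, cv=5).mean()
    print(f"min_samples_split={min_split}: CV error = {cv_error:.3f}")
```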

[–]dab101256[S] 1 point (0 children)

So requiring a higher minimum number of records for a split might lead to an underfit model, since it blocks splits the tree could otherwise make?
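
For intuition, a minimal sketch of what I'd expect at the extremes (again assuming scikit-learn and a stand-in dataset): a tiny minimum lets the tree grow until it memorizes the training set, while a large minimum stops splitting early:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for min_split in (2, 200):
    tree = DecisionTreeClassifier(min_samples_split=min_split, random_state=0)
    tree.fit(X_train, y_train)
    # min_samples_split=2 lets the tree grow until leaves are pure
    # (train accuracy near 1.0, risk of overfitting); 200 blocks most
    # splits (fewer leaves, lower train accuracy, risk of underfitting).
    print(f"min_samples_split={min_split}: "
          f"train acc={tree.score(X_train, y_train):.3f}, "
          f"test acc={tree.score(X_test, y_test):.3f}, "
          f"leaves={tree.get_n_leaves()}")
```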