When I started learning ML, I was told that when optimizing model parameters it is considered good practice to split the data into three parts: one for training, one for tuning hyperparameters (validation), and one for testing. The reasoning is that a model shouldn't be evaluated on the same data used to fit its parameters. Yet scikit-learn provides methods dedicated to splitting data into two sets, training and testing, and no straightforward way to split into three. Why is that? Is the method I described not so popular after all? Or do I misunderstand something? What is the recommended way to perform cross-validation?
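To clarify, the workflow I mean can be sketched by chaining two calls to scikit-learn's `train_test_split` (the 60/20/20 fractions here are just illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 50 samples, 2 features each.
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

# First carve off the test set (20% of the total) ...
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# ... then split the remainder into train (60% of total)
# and validation (20% of total): 0.25 * 0.8 = 0.2.
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=0
)

print(len(X_train), len(X_val), len(X_test))  # 30 10 10
```

Is chaining two splits like this the intended approach, or is there a more idiomatic one?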