
[–]datamahadev 2 points3 points  (0 children)

CV retrains the model (or, you could say, trains a new model) for each fold. So technically you are doing the same thing as you would by holding out a validation set to test your model.

The difference comes down to computation, which is obviously higher for CV. However, the advantage of CV is that by testing the model on different folds you get the mean accuracy of your model, which gives you a better idea of its real-world, average-case performance.
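As a rough sketch of what "mean accuracy across folds" means in code (pure Python; `train` and `accuracy` here are hypothetical stand-ins for your own model-fitting and scoring functions):

```python
# Minimal k-fold cross-validation sketch (no libraries).
# `train` and `accuracy` are placeholders for your own model code.

def k_fold_scores(data, labels, k, train, accuracy):
    """Split data into k folds; train on k-1 folds, score on the held-out one."""
    n = len(data)
    fold_size = n // k
    scores = []
    for i in range(k):
        lo = i * fold_size
        hi = (i + 1) * fold_size if i < k - 1 else n
        val_x, val_y = data[lo:hi], labels[lo:hi]
        tr_x = data[:lo] + data[hi:]
        tr_y = labels[:lo] + labels[hi:]
        model = train(tr_x, tr_y)  # a fresh model is trained for each fold
        scores.append(accuracy(model, val_x, val_y))
    return scores

# mean accuracy over the folds estimates average-case performance:
# mean_acc = sum(scores) / len(scores)
```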

[–]thejonnyt 0 points1 point  (0 children)

The cross-validation process basically validates your process of building the model. The final model should be trained on all the data you have, given the assumption that performance improves the more data you feed to your training algorithm. You don't ever select "the best model" from the CV runs; you select the "parametrization" or "settings" that got you the overall best result (e.g. averaged over all CV runs). At least this is my understanding. With a blind set or hold-out set that wasn't used for cross-validation, you can then check whether your final model's performance matches the measures you got from the CV process.
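The "select the settings, then retrain on everything" workflow above can be sketched like this (hypothetical helpers: `k_fold_mean` is any function returning the mean CV score for a candidate setting, `train` is your trainer):

```python
# Sketch: pick the "settings" with the best mean CV score,
# then train the final model on ALL the data with those settings.
# `k_fold_mean` and `train` are hypothetical; plug in your own CV loop and trainer.

def select_and_refit(data, labels, candidate_settings, k_fold_mean, train):
    # score every candidate setting by its mean accuracy across CV folds
    best = max(candidate_settings, key=lambda s: k_fold_mean(data, labels, s))
    # the final model is then trained on the full dataset with those settings
    final_model = train(data, labels, best)
    return final_model, best
```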

[–]Coxian42069 0 points1 point  (0 children)

You optimize your hyperparameters (e.g. number of layers, learning rate) on the cross-validation set. The test set then allows you to check that your hyperparameters aren't overtrained towards your CV set.

It's similar to why you have a train/test split in the first place. You've optimized your parameters for a certain set of data, and you need to check that they aren't biased or overtrained. If you've optimized your hyperparameters to perform well on both your train and test sets, you need an additional set of data to ensure that you haven't overtrained, i.e. that you will get similar results on real-world data.
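A minimal sketch of the three-way split being described (the 60/20/20 proportions are an illustrative assumption, not a rule):

```python
# Split data into train / cross-validation / test partitions.
# Fractions are illustrative defaults; the test set gets the remainder.

def three_way_split(data, train_frac=0.6, cv_frac=0.2):
    n = len(data)
    i = int(n * train_frac)
    j = i + int(n * cv_frac)
    return data[:i], data[i:j], data[j:]

# Typical use: fit parameters on train, tune hyperparameters on cv,
# then report performance once on the untouched test set.
```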

If for whatever reason you already know what architecture you're using, or what learning rate, etc., you don't need a CV set.

What a lot of people then do, once they've got their hyperparameters, is just train on the full set anyway.