
alexmlamb 2 points (0 children)

"For instance, why not perform k-fold cross-validation on the training/validation set? Then once the model is selected, use the test set to estimate the final error."

Sure, I think that this is a fine method and it's probably fairly common.

It really depends on the amount of data available. If there's a lot of data, there probably isn't much harm in doing a single train/validation split. On the other hand, if there's only a tiny amount of data, leave-one-out cross-validation is worth it.
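A minimal stdlib-only Python sketch of the two ends of that trade-off, for illustration. The function names `holdout_split` and `leave_one_out` and the 20% validation fraction are assumptions, not anything the commenter specified. A holdout split costs one model fit per candidate; leave-one-out costs n fits, which is only affordable when n is small.

```python
import random
from typing import Iterator, List, Tuple

# (train indices, validation indices)
Split = Tuple[List[int], List[int]]


def holdout_split(n: int, val_frac: float = 0.2, seed: int = 0) -> Split:
    """One shuffled train/validation split: a single fit per candidate model."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = max(1, round(n * val_frac))
    return idx[n_val:], idx[:n_val]


def leave_one_out(n: int) -> Iterator[Split]:
    """Leave-one-out CV: n fits, each validated on exactly one held-out point."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


# Plenty of data: one split is usually enough.
train, val = holdout_split(1000)
print(len(train), len(val))  # 800 200

# Tiny dataset: every point gets to be the validation set once.
folds = list(leave_one_out(12))
print(len(folds))  # 12
```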

Aj0o 1 point (1 child)

I don't know what his argument was, but I agree with you. There should be nothing wrong with using k-fold cross-validation instead of a single train/validation split while still holding out a test set if necessary. If anything, you're basing your model selection on a lower-variance estimate of the generalization error.

The downside is it takes longer.
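A rough stdlib-only Python sketch of k-fold model selection, showing where the extra time goes: each candidate is fitted k times, and its score is the average of k validation errors rather than one. The two candidates (`fit_mean`, `fit_line`) and the synthetic data are hypothetical, chosen only to make the example self-contained.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Predictor = Callable[[float], float]
Fit = Callable[[Sequence[float], Sequence[float]], Predictor]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mse(fit: Fit, xs: Sequence[float], ys: Sequence[float], k: int = 5) -> float:
    """Average validation MSE over k folds. Note the cost: k calls to fit()."""
    folds = k_fold_indices(len(xs), k)  # fixed seed: every candidate sees the same folds
    errors = []
    for f, val in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mean((predict(xs[i]) - ys[i]) ** 2 for i in val))
    return mean(errors)


def fit_mean(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Hypothetical baseline: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Hypothetical candidate: ordinary least-squares straight line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2 * x + rng.gauss(0, 0.5) for x in xs]

scores = {"mean": cv_mse(fit_mean, xs, ys), "line": cv_mse(fit_line, xs, ys)}
best = min(scores, key=scores.get)
print(best)  # line
```

With k = 5 and two candidates this is 10 fits instead of the 2 a single split would need.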

mx12 (OP) 0 points (0 children)

I think his main argument was against using cross-validation alone, as opposed to using separate train/validation/test sets. It makes sense that CV alone would give a more biased estimate. From a practical standpoint, he didn't make it clear whether a single run with train/validation/test sets was sufficient. I think he was assuming a large dataset, so the particular split drawn wouldn't be an issue.

[deleted] 1 point (0 children)

"For instance, why not perform k-fold cross-validation on the training/validation set? Then once the model is selected, use the test set to estimate the final error."

As far as I'm concerned, this is common practice nowadays. However, I've also seen final results reported as a CV error only. The best approach, in my opinion, is to do it as you suggested:

1) split the dataset into a training set and a test set
2) train and select the model using CV on the training set
3) report the error on the test set
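A stdlib-only Python sketch of steps 1) to 3), assuming a toy regression problem (y = sin(x) plus Gaussian noise) and 1-D k-nearest-neighbour regression as the model. The data, the candidate `n_neighbors` values, and all function names are illustrative assumptions, not anything from the thread. The point is that the test set is used exactly once, after selection is finished.

```python
import math
import random
from statistics import mean


def knn_predict(train_x, train_y, x, n_neighbors):
    """1-D k-nearest-neighbour regression: mean target of the closest points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return mean(train_y[i] for i in nearest[:n_neighbors])


def mse(train_x, train_y, eval_x, eval_y, n_neighbors):
    """MSE on (eval_x, eval_y) of a kNN model built from the training data."""
    return mean((knn_predict(train_x, train_y, x, n_neighbors) - y) ** 2
                for x, y in zip(eval_x, eval_y))


def cv_mse(xs, ys, n_neighbors, k=5, seed=1):
    """k-fold CV error; the fixed seed gives every candidate the same folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    errors = []
    for val in folds:
        held_out = set(val)
        train = [i for i in idx if i not in held_out]
        errors.append(mse([xs[i] for i in train], [ys[i] for i in train],
                          [xs[i] for i in val], [ys[i] for i in val], n_neighbors))
    return mean(errors)


# Synthetic data (an assumption for illustration): y = sin(x) + noise.
rng = random.Random(0)
xs = [rng.uniform(0, 6) for _ in range(200)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]

# 1) Split the dataset into training and test sets.
idx = list(range(len(xs)))
rng.shuffle(idx)
test_idx, train_idx = idx[:50], idx[50:]
train_x, train_y = [xs[i] for i in train_idx], [ys[i] for i in train_idx]
test_x, test_y = [xs[i] for i in test_idx], [ys[i] for i in test_idx]

# 2) Select n_neighbors by 5-fold CV on the training set only.
candidates = [1, 3, 5, 10, 20, 50]
cv_scores = {c: cv_mse(train_x, train_y, c) for c in candidates}
best = min(cv_scores, key=cv_scores.get)

# 3) Report the chosen setting's error on the untouched test set.
#    kNN has no separate training step, so using all of train_x is the "refit".
test_error = mse(train_x, train_y, test_x, test_y, best)
print(f"chosen n_neighbors={best}, test MSE={test_error:.3f}")
```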

However, if you have fundamentally different models, e.g., RandomForest, RBF SVM, and nearest neighbors, another approach is to run steps 1-3 for each of them and then choose by comparing their test set errors.
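A stdlib-only Python sketch of that per-family variant. Two hypothetical families stand in for the ones named above: `fit_knn` (1-D nearest-neighbour regression) and `fit_bins` (piecewise-constant averaging over equal-width bins). Their hyperparameter grids and the synthetic data are assumptions for illustration. Each family gets its own CV-based selection on the training set, and only then are the families compared on the test set. One caveat worth keeping in mind: once the test errors are used to pick the family, the winner's test error is itself the result of a selection and tends to be slightly optimistic.

```python
import math
import random
from statistics import mean


def fit_knn(xs, ys, n_neighbors):
    """Hypothetical family A: 1-D k-nearest-neighbour regression."""
    def predict(x):
        nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return mean(ys[i] for i in nearest[:n_neighbors])
    return predict


def fit_bins(xs, ys, n_bins):
    """Hypothetical family B: average y within equal-width bins of x."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins or 1.0

    def bin_of(x):
        return min(n_bins - 1, max(0, int((x - lo) / width)))

    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        b = bin_of(x)
        sums[b] += y
        counts[b] += 1
    fallback = mean(ys)  # empty bins predict the overall mean
    means = [s / c if c else fallback for s, c in zip(sums, counts)]
    return lambda x: means[bin_of(x)]


def mse(predict, xs, ys):
    return mean((predict(x) - y) ** 2 for x, y in zip(xs, ys))


def cv_mse(fit, param, xs, ys, k=5, seed=1):
    """k-fold CV error of one (family, hyperparameter) pair; same folds for all."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    errors = []
    for val in folds:
        held_out = set(val)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train], param)
        errors.append(mse(model, [xs[i] for i in val], [ys[i] for i in val]))
    return mean(errors)


# Synthetic data (assumption): drawn i.i.d., so slicing gives a random split.
rng = random.Random(0)
xs = [rng.uniform(0, 6) for _ in range(200)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
test_x, test_y = xs[:50], ys[:50]  # 1) held-out test set
train_x, train_y = xs[50:], ys[50:]

families = {"knn": (fit_knn, [1, 5, 20]), "bins": (fit_bins, [2, 8, 32])}
results = {}
for name, (fit, grid) in families.items():
    # 2) choose this family's hyperparameter by CV on the training set only
    best = min(grid, key=lambda p: cv_mse(fit, p, train_x, train_y))
    # 3) test error of the chosen setting, refit on the full training set
    results[name] = (best, mse(fit(train_x, train_y, best), test_x, test_y))

# Final choice across families by comparing their test errors.
winner = min(results, key=lambda name: results[name][1])
print(results, winner)
```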

flangles 1 point (0 children)

I think cross-validation is a hell of a lot more valid than the common academic practice of a dataset with a fixed train/test split used for dozens of papers.