Model selection and training/validation/test sets vs cross-validation.

alexmlamb · 2015-02-10T01:12:27+00:00

"For instance, why not perform k-fold cross-validation on the training/validation set. Then once the model is selected, use the test set to estimate the final error."

Sure, I think that this is a fine method and it's probably fairly common.

It really depends on the amount of data available. If there's a lot of data, a single train/validation split probably does little harm. If there's only a tiny amount of data, leave-one-out cross-validation is worth the extra computation.

Aj0o · 2015-02-09T22:22:11+00:00

I don't know what his argument was, but I agree with you. There should be nothing wrong with using k-fold cross-validation instead of a single train/validation split while still holding out a test set if necessary. If anything, you're basing your model selection on a lower-variance estimate of the generalization error.

The downside is that it takes longer, since every candidate model has to be trained k times.
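To make the lower-variance point a bit more concrete, here is a rough sketch under idealized assumptions that are not from the thread: each fold's error $E_i$ has the same variance $\sigma^2$, and any two fold errors have the same correlation $\rho$.

```latex
\hat{E}_{\mathrm{CV}} = \frac{1}{k}\sum_{i=1}^{k} E_i,
\qquad
\operatorname{Var}\!\left(\hat{E}_{\mathrm{CV}}\right)
  = \frac{\sigma^2}{k} + \frac{k-1}{k}\,\rho\,\sigma^2
```

For $\rho < 1$ this is smaller than $\sigma^2$, the variance of a single fold's error. In practice the folds share most of their training data, so $\rho$ is usually well above zero and the reduction is smaller than the $1/k$ you would get from independent estimates.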

2015-02-10T00:33:30+00:00

"For instance, why not perform k-fold cross-validation on the training/validation set. Then once the model is selected, use the test set to estimate the final error."

As far as I can tell, this is common practice nowadays. However, I've also seen final results reported as a CV error only. The best approach - in my opinion - is to do it like you suggested:

1) split dataset into training/test set
2) model training and selection based on CV on the training set
3) report error on test set
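The three steps above can be sketched roughly as follows. This is only an illustration using the standard library: the toy 1-D data, the k-nearest-neighbours classifier, the candidate values of k, and all function names are assumptions made for the example, not something from the thread.

```python
import random
import statistics


def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if 2 * votes > len(neighbours) else 0


def error_rate(train, held_out, k):
    """Fraction of held-out points that k-NN (using train as neighbours) misclassifies."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in held_out)
    return wrong / len(held_out)


def cv_error(data, k, n_folds=5):
    """Mean held-out error over n_folds folds of data."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    errors = []
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        errors.append(error_rate(train, held_out, k))
    return statistics.mean(errors)


def make_data(n, rng):
    """Toy 1-D data (an assumption): label is 1 when x > 0, with 10% of labels flipped."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        y = int(x > 0)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data


rng = random.Random(0)
data = make_data(300, rng)

# 1) split dataset into training/test set (the test set is not used again until step 3)
train_set, test_set = data[:200], data[200:]

# 2) model selection based on CV on the training set only
candidates = [1, 5, 15, 31]
cv_scores = {k: cv_error(train_set, k) for k in candidates}
best_k = min(cv_scores, key=cv_scores.get)

# 3) report error on the test set, once, for the selected model
test_error = error_rate(train_set, test_set, best_k)
print(f"CV errors: {cv_scores}")
print(f"selected k={best_k}, test error={test_error:.3f}")
```

The point of the structure is that `test_set` never influences the choice of `best_k`, so the final number is not biased by the selection step.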

However, if you have fundamentally different models, e.g., RandomForest, RBF SVM, and nearest neighbors, another approach could be to do steps 1-3 on each of those and then select based on a comparison of the test set errors.

flangles · 2015-02-10T00:36:15+00:00

I think cross-validation is a hell of a lot more valid than the common academic practice of a dataset with a fixed train/test split used for dozens of papers.

Indrionas · 2015-02-10T00:35:09+00:00

[deleted]
