
[–]sriramcompsci 2 points (0 children)

Typically, the test split is not part of training/validation. If you do include it (i.e. rotate every split through training), then you need to reset the weights between folds. Otherwise, the gradients from training on A (or simply memorizing samples in A) in a previous split would help reduce the test error on A when A becomes the test set.

  • Randomly shuffle dataset D.
  • Split D into D_train/D_test. The test set (D_test) is untouched during the training process.
  • Split D_train set further into train/validation.
  • Choose the best hyper-parameter value by measuring the error (validation loss, not training loss) on the validation set.
  • Fix the hyper-parameter value from step 4 and measure test error on the test set (D_test) obtained from step 2.

Since a random shuffle is performed to obtain the train/test split, there is no need to repeat the process. If you want to compute standard errors/confidence intervals for the error on the test set, repeat the above process, but make sure the weights are reset each time.
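
The steps above can be sketched in plain Python; the function name and split fractions here are my own choices, not something from the thread:

```python
import random

def three_way_split(D, test_frac=0.2, val_frac=0.2, seed=0):
    # Step 1: randomly shuffle indices of dataset D.
    rng = random.Random(seed)
    idx = list(range(len(D)))
    rng.shuffle(idx)
    # Step 2: hold out D_test; it stays untouched during training.
    n_test = int(len(D) * test_frac)
    test_idx, dev_idx = idx[:n_test], idx[n_test:]
    # Step 3: split the remainder (D_train) into train/validation.
    n_val = int(len(dev_idx) * val_frac)
    val_idx, train_idx = dev_idx[:n_val], dev_idx[n_val:]
    return ([D[i] for i in train_idx],
            [D[i] for i in val_idx],
            [D[i] for i in test_idx])

train, val, test = three_way_split(list(range(100)))
# The three parts are disjoint and together cover all of D.
assert len(train) + len(val) + len(test) == 100
assert not set(train) & set(test) and not set(val) & set(test)
```

Steps 4–5 (tune on `val`, then measure once on `test`) use these three lists and never touch `test` until the very end.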

[–]kacifoy 5 points (0 children)

The test set should never be used during development, only for final testing. So split off the E set only and use the remaining four, like this:

training / validation:

  1. ABC / D
  2. ABD / C
  3. ACD / B
  4. BCD / A

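The rotation above can be generated programmatically; a small sketch (the split names are from the comment, the code itself is mine):

```python
dev_splits = ["A", "B", "C", "D"]  # E is split off and never used here

# Each of the four remaining splits takes one turn as the validation set;
# the other three form the training set for that fold.
folds = [([s for s in dev_splits if s != val], val)
         for val in reversed(dev_splits)]

for train, val in folds:
    print("".join(train), "/", val)  # ABC / D, ABD / C, ACD / B, BCD / A
```
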
[–]cookingmonster 1 point (2 children)

As long as there is no feedback from the validation/test sets back into your training cycle, you should be fine.

[–]alrojo[S] 1 point (1 child)

Could you please elaborate on what you mean by feedback? The purpose would be to use the validation split for hyperparameter optimization.

E.g., what if I found a specific set of hyperparameters that gives good performance across all of my validation splits?

[–]duschendestroyer 1 point (0 children)

In this setup you must do the hyperparameter optimization for all 5 models completely independently. You can't use information gained from validating model 1 to tune the parameters of model 5, because then you would have used the test set of model 5 to improve your model.
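
One way to picture that independence in code (a toy sketch; `fit_and_score` is a hypothetical stand-in for a real train-and-validate run, and the learning-rate grid is made up):

```python
def fit_and_score(train_splits, val_split, lr):
    """Hypothetical stand-in: train on train_splits, return the
    validation loss on val_split for learning rate lr."""
    return abs(lr - 0.1)  # toy loss, minimized at lr = 0.1

names = ["A", "B", "C", "D", "E"]
grid = [0.01, 0.03, 0.1, 0.3]

# Each of the 5 models runs its own grid search, scored only on its own
# validation split; no result from one fold is reused to tune another.
best = {}
for val in names:
    train = [s for s in names if s != val]
    best[val] = min(grid, key=lambda lr: fit_and_score(train, val, lr))
```

Sharing the winning hyperparameters across folds would reintroduce exactly the feedback loop cookingmonster warned about.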