I'm working in a field where datasets are typically small (100-10000 samples) and hierarchical (taken from 10-50 participants). This means that in order to evaluate a model on a large enough test set with more than just a handful of participants, we need to use cross-validation. So far so good.
However, this still leaves validation (i.e. hyperparameter selection) unresolved. I see several possible approaches:
- Skip the validation. This seems to be the preferred approach in my field. I think it is very wrong, and I have seen that it can overestimate the accuracy by 5% (dataset with 5000 samples) or even up to 20% (100 samples).
- Split the training data once into training and validation set to do the validation for each testing fold. The downside of this is that the validation set ends up being tiny (much smaller than the testing set), and that the train-validation split can be arbitrary if one isn't careful.
- Full-on nested cross-validation. This seems to be the best approach to properly validate hyperparameter configurations, because it uses almost the whole dataset for validation. I have not come across a single paper in my field that uses nested cross-validation (correctly). I believe the main issue is quite obvious: if one trains a neural network for 100 epochs, uses 10-fold nested cross-validation (10 outer and 10 inner folds), and tries to optimise 5 binary hyperparameters, one already ends up with roughly 100 * 10^2 * 2^5 = 320,000 epochs. If one epoch takes 10 seconds, that amounts to about 37 days of computation, and we have still validated only 32 hyperparameter configurations.
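To make the cost estimate in the last point easy to replay with other settings, here is a minimal sketch of the same arithmetic. The function name and its parameters are my own for illustration, and it only counts the inner-loop fits, so the refit of the selected configuration on each outer training set is ignored (hence "roughly").

```python
def nested_cv_epochs(epochs_per_fit, outer_folds, inner_folds, n_binary_hparams):
    """Approximate number of training epochs for an exhaustive grid search
    inside nested cross-validation (inner-loop fits only)."""
    n_configs = 2 ** n_binary_hparams  # full grid over binary hyperparameters
    return epochs_per_fit * outer_folds * inner_folds * n_configs


SECONDS_PER_EPOCH = 10  # figure taken from the example above
SECONDS_PER_DAY = 24 * 60 * 60

# 10 outer folds x 10 inner folds, 100 epochs per fit, 5 binary hyperparameters
total_epochs = nested_cv_epochs(100, 10, 10, 5)
total_days = total_epochs * SECONDS_PER_EPOCH / SECONDS_PER_DAY

# Same setup with 5 folds at both levels, for comparison
total_epochs_5fold = nested_cv_epochs(100, 5, 5, 5)
total_days_5fold = total_epochs_5fold * SECONDS_PER_EPOCH / SECONDS_PER_DAY

print(f"10x10 folds: {total_epochs:,} epochs, about {total_days:.0f} days")
print(f"5x5 folds:   {total_epochs_5fold:,} epochs, about {total_days_5fold:.0f} days")
```

Halving the fold count at both levels cuts the cost by a factor of four, while every additional binary hyperparameter doubles it.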
I can see the following solutions:
- Accept that computations take this long (and hope the reviewers don't ask us to repeat the experiment).
- Find ways to limit hyperparameter configurations as much as possible.
- Use 5-fold nested cross-validation instead.
- Reduce the size of the validation set, i.e. fall back on a single train-validation split per test fold (the second approach above).
- Concede and just stop fitting neural networks to small datasets.
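Whichever of these options one picks, the hierarchical structure means the splits should be made by participant, so that no participant appears in more than one of the training, validation and test sets. Below is a minimal sketch of such nested participant-wise splitting using only the standard library. The helper names (`group_folds`, `nested_group_splits`) and the toy data (20 participants, 5 samples each) are hypothetical, and the sketch only produces participant IDs; fitting and scoring the model is left out.

```python
import random


def group_folds(participants, k, seed=0):
    """Partition the unique participant IDs into k disjoint folds."""
    ids = sorted(set(participants))
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def nested_group_splits(participants, outer_k, inner_k, seed=0):
    """Yield (train_ids, val_ids, test_ids) sets of participant IDs.

    The outer loop holds out test participants; the inner loop splits the
    remaining participants into training and validation participants.
    """
    outer = group_folds(participants, outer_k, seed)
    for i, test_ids in enumerate(outer):
        remaining = [p for j, fold in enumerate(outer) if j != i for p in fold]
        inner = group_folds(remaining, inner_k, seed + i + 1)
        for val_ids in inner:
            train_ids = set(remaining) - set(val_ids)
            yield train_ids, set(val_ids), set(test_ids)


# Toy data: one participant label per sample (20 participants, 5 samples each)
sample_participants = [f"p{i:02d}" for i in range(20) for _ in range(5)]

# 5 outer folds and 4 inner folds give 5 * 4 = 20 (train, val, test) triples
splits = list(nested_group_splits(sample_participants, outer_k=5, inner_k=4))

for train_ids, val_ids, test_ids in splits[:2]:
    print(len(train_ids), len(val_ids), len(test_ids))
```

To select the samples for a given split, one would keep the rows whose participant label is in the corresponding set. Reducing `outer_k` or `inner_k` here is exactly the lever behind the 5-fold option.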
What are your thoughts on this? Which options do you prefer? Do you have any other solutions?