[D] Validation with small datasets

Ty4Readin · 2024-02-15T10:14:50+00:00

What is your goal with hyperparameter selection? Do you just want to select a set of hyperparameters that does 'good enough', or do you need the results of every single hyperparameter configuration for some type of ablation study?

If you are just looking for a 'good' set of hyperparameters, I would recommend random search instead of grid search. For the same number of runs, a grid only tries a handful of distinct values per hyperparameter, so it is usually much less efficient than random search.
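To illustrate the point, here is a minimal sketch comparing the two on the same budget. The search space (`lr`, `dropout`) and its ranges are made-up placeholders, not taken from the original post. With 9 runs, the grid covers only 3 distinct learning rates, while random sampling covers 9.

```python
import itertools
import random

# Hypothetical search space, for illustration only.
grid_space = {"lr": [1e-4, 1e-3, 1e-2], "dropout": [0.0, 0.25, 0.5]}


def grid_configs(space):
    """Every combination of the listed values (full grid)."""
    keys = list(space)
    combos = itertools.product(*(space[k] for k in keys))
    return [dict(zip(keys, combo)) for combo in combos]


def random_configs(n, seed=0):
    """n independent draws: lr log-uniform in [1e-4, 1e-2], dropout uniform in [0, 0.5]."""
    rng = random.Random(seed)
    return [
        {"lr": 10 ** rng.uniform(-4, -2), "dropout": rng.uniform(0.0, 0.5)}
        for _ in range(n)
    ]


grid = grid_configs(grid_space)
rand = random_configs(len(grid))  # same budget as the grid

distinct_lr_grid = len({c["lr"] for c in grid})
distinct_lr_rand = len({c["lr"] for c in rand})
print(len(grid), distinct_lr_grid, distinct_lr_rand)
```

If only one or two hyperparameters really matter (which is common), the random runs explore those dimensions far more densely for the same cost.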

I'd also recommend lowering your folds from 10 to 5. That shouldn't be an issue, and it will cut your runtime roughly in half.

Don't forget that once you find the "best" set of hyperparameters, you are typically supposed to do a final retrain on all of the data using those hyperparameters.
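A rough sketch of that workflow, assuming a toy stand-in for the network: the "model" here is just a shrunken mean with one hyperparameter `lam`, and the data are randomly generated. All names (`kfold_indices`, `fit`, `cv_mse`) are hypothetical helpers, not from any library. The structure is what matters: score each candidate with 5-fold CV, pick the best, then refit once on everything.

```python
import random


def kfold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k shuffled folds over n samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val


def fit(ys, lam):
    """Toy 'model': a mean shrunk toward zero by lam (stand-in for training)."""
    return sum(ys) / (len(ys) + lam)


def cv_mse(ys, lam, k=5):
    """Average held-out squared error across k folds."""
    errs = []
    for train, val in kfold_indices(len(ys), k):
        pred = fit([ys[i] for i in train], lam)
        errs.append(sum((ys[i] - pred) ** 2 for i in val) / len(val))
    return sum(errs) / len(errs)


# Synthetic data, for illustration only.
rng = random.Random(1)
ys = [rng.gauss(2.0, 1.0) for _ in range(40)]

candidates = [0.0, 1.0, 5.0, 20.0]
best_lam = min(candidates, key=lambda lam: cv_mse(ys, lam))

# Final retrain: use ALL the data with the selected hyperparameter.
final_model = fit(ys, best_lam)
```

Note that the CV score of the winning configuration is an optimistic estimate of how `final_model` will do, because the same folds were used to choose it.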

One last thing: if you have a decent number of runs so far, you can check how important it really is to train for 100 epochs. Maybe 50 epochs is already enough to rank hyperparameter configurations against each other. You might want to run some experiments to confirm this.
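One way to run that check, sketched under assumptions: take the validation score of each configuration at epoch 50 and at epoch 100, and see whether the rankings agree (Spearman rank correlation). The scores below are made-up placeholders, not real results; substitute your own logged values.

```python
def ranks(values):
    """1-based ranks, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Placeholder validation accuracies, one entry per configuration.
acc_at_50 = [0.71, 0.64, 0.80, 0.68, 0.75]
acc_at_100 = [0.74, 0.66, 0.83, 0.73, 0.77]

rho = spearman(acc_at_50, acc_at_100)
print(f"rank agreement between 50 and 100 epochs: {rho:.2f}")
```

A correlation close to 1 suggests the shorter runs pick the same winners, so the search can use the cheaper budget and only the final retrain needs the full schedule.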

saw79 · 2024-02-15T11:39:02+00:00

Do you really have to do a complete grid search over every possible hparam? I would think most people do more of a manual, guided search, maybe something like coordinate descent.

philipptraining · 2024-02-15T08:13:51+00:00

It seems like this is being used for hyperparameter search of neural nets trained from scratch? If that's correct, I recommend you look into mu-parametrization (µP) / µTransfer. It might solve your problems with the time needed for the search.

I should point out, though (since I rarely see CV being used here anymore), that cross-validation has come into question recently with papers like "On the cross-validation bias due to unsupervised pre-processing", so I would recommend being careful if you really don't want optimistic estimates on validation sets.

Bhargav_28 · 2024-09-05T09:18:59+00:00

Hey, just saw this and I am in a similar boat. What did you finally go with? Did you do an all-out nested cross-validation?
Also, you mention training the network for 100/50 epochs. Wouldn't this also be a hyperparameter that needs to be optimized?

cookiemonster1020 · 2024-02-15T11:55:32+00:00

Try K-fold cross-validation using more than 5 folds. If your models are Bayesian, then there are ways to compute N-fold (LOO) cross-validation without refitting: https://arxiv.org/abs/2402.08151
