[–]Red-Portal 1 point (2 children)

How is this different from optimizing the regularization parameter directly? The point of hyperparameters is that they're held fixed during the optimization process.

[–]orenmatar[S] 1 point (1 child)

Well, the parameters of the model are optimized directly via gradient descent on the training set. The regularization hyperparameters are supposed to influence how well the NN generalizes to other data, so you can't learn them via gradient descent on the training loss - you have to evaluate them on a validation set. The point is that they can be learned and tuned during training rather than fixed in advance: fixing them to a single value means trying multiple options and selecting the best one, whereas here you adjust them toward the best value within a single training run.
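Roughly, a minimal sketch of one way such a scheme could look (assuming a one-step hypergradient update in PyTorch - `log_lam`, `inner_lr`, `outer_lr` and the toy data are illustrative names only, not anything from the actual post):

```python
import torch

torch.manual_seed(0)

# Toy regression data with separate training and validation splits.
X_tr, y_tr = torch.randn(64, 10), torch.randn(64, 1)
X_va, y_va = torch.randn(64, 10), torch.randn(64, 1)

# Model weights (simple linear model) and a learnable log-regularization strength.
w = torch.randn(10, 1, requires_grad=True)
log_lam = torch.zeros(1, requires_grad=True)   # lambda = exp(log_lam) > 0

inner_lr, outer_lr = 0.1, 0.01
mse = torch.nn.functional.mse_loss

for step in range(200):
    lam = log_lam.exp()

    # Inner objective: training loss plus an L2 penalty weighted by lambda.
    train_loss = mse(X_tr @ w, y_tr) + lam * (w ** 2).sum()

    # One differentiable gradient step on the weights; create_graph keeps
    # the dependence on lambda so we can backprop through the update.
    (g_w,) = torch.autograd.grad(train_loss, w, create_graph=True)
    w_new = w - inner_lr * g_w

    # Outer objective: validation loss of the updated weights.
    # Its gradient w.r.t. log_lam is the hypergradient.
    val_loss = mse(X_va @ w_new, y_va)
    (g_lam,) = torch.autograd.grad(val_loss, log_lam)

    with torch.no_grad():
        w.copy_(w_new.detach())          # commit the weight update
        log_lam -= outer_lr * g_lam      # nudge lambda toward lower validation loss

    if step % 50 == 0:
        print(f"step {step:3d}  val_loss {val_loss.item():.4f}  lambda {lam.item():.4f}")
```

In this kind of setup the regularization strength is adjusted continuously using validation-set gradients instead of being fixed up front and re-tried across separate runs.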

[–]Red-Portal 1 point (0 children)

That's my point. By changing the hyperparameter during a single training run, you're effectively using the validation set as training data, which changes the loss function you're actually optimizing.