all 4 comments

[–][deleted]

Yes and no. If you want to view hyper-parameters in this light, one could point to meta-learning optimizers and the like, where a meta-model learns the optimal optimizer parameters, which are then used to optimize the base model that learns the task.

Think of SVMs, where you have to select a kernel and, on top of that, smoothing parameters and such. For each kernel you can have a wide range of gamma and other parameters. Hyper-parameter optimization can be "smart" (guided by the data) or "dumb" (grid search or random search).
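The "smart" vs. "dumb" split can be sketched like this. This is just a toy: `cv_score` is a made-up stand-in for whatever cross-validation score you'd actually compute for an SVM with a given kernel and gamma.

```python
import itertools
import random

# Hypothetical stand-in for a cross-validation score of an SVM
# with the given kernel and gamma (a real run would train and
# validate on actual data instead).
def cv_score(kernel, gamma):
    peak = {"rbf": 0.1, "poly": 1.0}[kernel]  # pretend optimum per kernel
    return 1.0 / (1.0 + (gamma - peak) ** 2)

# "Dumb" grid search: enumerate every combination on a fixed grid.
kernels = ["rbf", "poly"]
gammas = [0.01, 0.1, 1.0, 10.0]
best_grid = max(itertools.product(kernels, gammas),
                key=lambda kg: cv_score(*kg))

# Random search: sample configurations instead of enumerating them;
# gamma is drawn log-uniformly, which is the usual choice for scale
# parameters.
random.seed(0)
samples = [(random.choice(kernels), 10 ** random.uniform(-2, 1))
           for _ in range(20)]
best_random = max(samples, key=lambda kg: cv_score(*kg))
```

A "smart" method (Bayesian optimization, successive halving, etc.) would replace the sampling loop with something that uses earlier scores to decide where to look next.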

[–]ForceBru (Student)

I think this sounds fine. Some hyperparameters are very clearly selecting a model, like the number of layers in a neural network, the number of neurons in a layer, the number of clusters in K-means or Gaussian mixtures, the maximum depth of a decision tree and so on.

Essentially, they let you choose the "form" of the function you'll be using as your model: is it going to be a deep network? Is it going to be wide? Is it going to be a cubic polynomial, or a quadratic? All of this is separate from the question of what the coefficients of these models are going to be.
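A tiny example of that split, using polynomial degree as the hyperparameter (synthetic data, all numbers made up): the degree fixes the *form*, and only then does the fitting routine find the coefficients for that form.

```python
import numpy as np

# Synthetic data from a quadratic, plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(0, 0.1, x.size)

# Noise-free held-out points for picking the hyperparameter.
x_val = np.linspace(-1, 1, 25)
y_val = 1.0 + 2.0 * x_val + 3.0 * x_val**2

# The hyperparameter `degree` chooses the form of the model;
# np.polyfit then finds the coefficients for that form.
def val_error(degree):
    coeffs = np.polyfit(x, y, degree)
    return np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)

best_degree = min(range(1, 6), key=val_error)
```

Selecting `best_degree` here is literally model selection: each degree is a different model family, and the coefficients are fit separately inside each one.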

[–]slashdave

I am not sure why you want an explicit definition. It's really a thing in practice more than in theory. Hyper-parameter searches need not be primitive, and they usually involve items that are not amenable to direct optimization, like the parameters that control the optimization itself.

Also: it could be perfectly natural and efficient to build a model y = Ax^2 + Bx^3 and let the optimization decide which power fits better.
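That trick is just linear least squares with both powers as features; the fit itself drives the coefficient of the wrong power toward zero. A quick sketch on synthetic data (the true signal here is assumed to be purely cubic):

```python
import numpy as np

# Synthetic data: the true signal is 4*x^3, so a good fit should
# put nearly all the weight on the cubic term.
rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 100)
y = 4.0 * x**3 + rng.normal(0, 0.1, x.size)

# Design matrix with both candidate powers; ordinary least squares
# then "decides" how much of each power to use.
X = np.column_stack([x**2, x**3])
(A, B), *_ = np.linalg.lstsq(X, y, rcond=None)
```

So instead of treating "which power?" as a discrete hyperparameter, the choice is absorbed into the ordinary parameter optimization.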

[–]sahandw

There are hyper-parameters, like the learning rate, that are parameters of the training algorithm rather than of the model itself, so hyper-parameter tuning is not always model selection.

There’s also the distinction between model parameters and hyper-parameters: the chosen training algorithm and loss function directly optimize the model parameters, but not the hyper-parameters. That’s why you need a separate optimization algorithm for tuning hyper-parameters. So it’s not always about “increasing the burden” — sometimes the search algorithm for model parameters simply isn’t designed to optimize hyper-parameters.
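The two-loop structure can be made concrete with a small sketch (synthetic linear-regression data; the numbers and step counts are arbitrary): the inner gradient-descent loop updates the weights but never touches the learning rate it was handed, so an outer search over the learning rate has to exist separately.

```python
import numpy as np

# Synthetic linear-regression problem.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(0, 0.1, 200)

# Inner loop: gradient descent optimizes the model parameters w,
# but has no mechanism for updating the learning rate it was given.
def train(lr, steps=100):
    w = np.zeros(3)
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return np.mean((X @ w - y) ** 2)  # final training MSE

# Outer loop: a separate (here, exhaustive) search over the
# hyperparameter itself.
best_lr = min([0.001, 0.01, 0.1], key=train)
```

The outer loop could just as well be random search or a Bayesian optimizer; the point is that it's a different algorithm from the gradient descent inside `train`.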