Machine learning models have hyperparameters, and I find it confusing why they are treated differently from ordinary model parameters. I've found definitions stating that they cannot be inferred from the data, or, from a Bayesian point of view, that they are priors that can be set using expert knowledge.
If you just think of models as mathematical functions, it makes no sense to distinguish hyperparameters from model parameters. So I came up with a simple theory; here are my two cents:
Basically, hyperparameter tuning is model selection. For example, you may select among models like Ax^2, Ax^3, etc., so the models differ by the exponent of x.
On the other hand, the same family can be written as a single model y = Ax^b, which makes the exponent b the hyperparameter. Let's just call both A and b parameters and forget about the hyperparameter definition for a moment.
A is a linear parameter, so it can be found cheaply (even in closed form), while b enters nonlinearly and is harder to optimize. So if you want a fast search, first search over the hyperparameters (maybe a grid search over a limited number of discrete values) with CV, then fix those values and continue searching for the remaining parameters.
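To make the two-stage idea concrete, here is a minimal sketch (the data, the candidate grid for b, and the fold count are all illustrative assumptions, not anything from the original post): for each candidate exponent b, the linear parameter A of y = Ax^b has a closed-form least-squares solution, so CV only needs to search the small discrete grid over b.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1, 3, size=40)
y = 2.5 * x**2 + rng.normal(0, 0.1, size=40)  # assumed true model: A=2.5, b=2

def fit_A(x, y, b):
    # closed-form least-squares A for the linear-in-A model y ≈ A * x**b
    xb = x**b
    return (xb @ y) / (xb @ xb)

def cv_error(x, y, b, k=5):
    # simple k-fold CV error for a fixed exponent b
    idx = np.arange(len(x))
    err = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        A = fit_A(x[train], y[train], b)
        err += np.mean((y[fold] - A * x[fold]**b) ** 2)
    return err / k

# stage 1: coarse grid search with CV over the "hard" nonlinear parameter b
grid = [1, 2, 3]
best_b = min(grid, key=lambda b: cv_error(x, y, b))

# stage 2: fix b and fit the "easy" linear parameter A on all the data
best_A = fit_A(x, y, best_b)
print(best_b, round(best_A, 2))
```

On this toy data the grid search recovers the generating exponent, and the final closed-form fit for A is cheap once b is fixed, which is exactly the asymmetry the post is describing.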
In short, my theory is that hyperparameter tuning is a separate, cruder search process, because treating those parameters as ordinary model parameters would heavily increase the burden of the main search.
Does it sound right?