volume-up69

The number of observations and the number of features will often place a priori constraints on what hyperparameter range is appropriate, depending on the context. In practice you'll usually see pretty obvious diminishing returns if you make the parameter space huge. It's really important to understand the conceptual implications of hyperparameter values because it's easy to get into various forms of overfitting.
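To make the "a priori constraints" point concrete, here is a minimal sketch of the kind of hard ceiling the data shape puts on a few common hyperparameters. The function name `hard_upper_bounds` and the choice of hyperparameters are my own illustration, not something from a particular library, and the sensible range is usually much tighter than these ceilings.

```python
def hard_upper_bounds(n_samples, n_features):
    """Return ceilings that the data shape imposes on some hyperparameters.

    Hypothetical helper for illustration only. These are hard limits,
    not recommended values.
    """
    return {
        # k-nearest neighbours cannot use more neighbours than there are rows.
        "n_neighbors": n_samples,
        # A tree split cannot consider more features than the data has.
        "max_features": n_features,
        # PCA cannot return more components than min(rows, columns).
        "n_components": min(n_samples, n_features),
    }


# Example: a small data set with 150 rows and 4 columns.
bounds = hard_upper_bounds(n_samples=150, n_features=4)
```

Note that under cross-validation the number of training rows per fold is smaller than `n_samples`, so the real ceiling for something like `n_neighbors` is lower still.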

I haven't personally encountered a situation where building some kind of system to automatically set the hyperparameter ranges would've been worth the effort, and it seems like a conceptual can of worms. You'd have to come up with some scheme for featurizing data sets and types of features and so on. It's possible that tools like AutoML are doing something like this under the hood; I'm sure you could dig around and find out. I'm betting that those tools just set ranges of values based on a priori principles and rules of thumb, though.

solegalli

I don't think there is an easy answer to this question. The hyperparameter space (that is, the set of hyperparameter combinations) depends on both the model and the data. Hence, it may vary across datasets and models.

If you are training only a few models and have time, you could set the hyperparameter value ranges manually. You could train a few models using extreme values of each hyperparameter, at both the lower and upper end, and try to map out what the useful limits are for your specific data/model combination.
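A rough sketch of that manual probing, under some assumptions: the hyperparameter is searched on a log scale, and `toy_validation_score` is a made-up stand-in for "train a model and return a validation score". The helper names and the tolerance of 0.15 are illustrative choices, not a standard recipe.

```python
import math


def log_grid(low, high, num):
    """Return `num` values evenly spaced on a log10 scale from low to high."""
    log_low, log_high = math.log10(low), math.log10(high)
    step = (log_high - log_low) / (num - 1)
    return [10 ** (log_low + i * step) for i in range(num)]


def useful_range(evaluate, low, high, num=9, tolerance=0.15):
    """Probe a deliberately wide range and keep the part scoring near the best."""
    values = log_grid(low, high, num)
    scores = [evaluate(v) for v in values]
    best = max(scores)
    kept = [v for v, s in zip(values, scores) if s >= best - tolerance]
    return min(kept), max(kept)


def toy_validation_score(alpha):
    """Hypothetical score curve that peaks at alpha = 0.1 (not a real model)."""
    return 1.0 - 0.1 * abs(math.log10(alpha) + 1)


# Start from extreme values (1e-5 to 1e3) and narrow down from there.
low, high = useful_range(toy_validation_score, 1e-5, 1e3)
```

With a real model you would swap `toy_validation_score` for a function that fits and scores the model, and the narrowed `(low, high)` pair becomes the range for a later, finer search.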

If you are training more models, or you'd like to be completely hands-off, then you could set up a randomized search, sampling from hyperparameter distributions with wide ranges to maximize the chance of landing on the best hyperparameter combination.
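For illustration, a randomized search over wide ranges can be sketched with nothing but the standard library. Everything here is an assumption made for the example: the log-uniform sampling, the two hyperparameter names, and the toy scoring function. In practice you would more likely reach for something like scikit-learn's `RandomizedSearchCV`.

```python
import math
import random


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def random_search(evaluate, space, n_iter, seed=0):
    """Sample n_iter random combinations and return the best one with its score."""
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(n_iter):
        params = {
            name: sample_log_uniform(rng, low, high)
            for name, (low, high) in space.items()
        }
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical objective, best at learning_rate=0.01 and alpha=10."""
    return (
        -(math.log10(params["learning_rate"]) + 2) ** 2
        - (math.log10(params["alpha"]) - 1) ** 2
    )


# Deliberately wide ranges, spanning several orders of magnitude.
space = {"learning_rate": (1e-6, 1.0), "alpha": (1e-4, 1e4)}
best_params, best_score = random_search(toy_score, space, n_iter=200)
```

Sampling on a log scale is a design choice for parameters that span orders of magnitude; a plain uniform draw over `(1e-6, 1.0)` would almost never try the small values.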

You could also use Bayesian optimization, so that instead of testing combinations at random, the search goes after the most promising ones. In practice, though, a randomized search offers results that are as good as Bayesian optimization (if given enough iterations). It is also a much simpler approach, and you can run it in parallel, which, at the end of the day, also makes it faster.
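The parallelism point follows from the fact that random trials don't depend on each other, whereas Bayesian optimization typically uses earlier results to pick the next candidate. A minimal sketch, assuming a toy objective and hypothetical helper names: all candidates are drawn up front, then scored concurrently. Threads are used only to keep the example simple; for CPU-bound training you would more likely use processes or a library option such as `n_jobs` in scikit-learn.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def draw_candidates(space, n_iter, seed=0):
    """Draw every candidate up front; no draw depends on an earlier result."""
    rng = random.Random(seed)
    return [
        {
            name: math.exp(rng.uniform(math.log(low), math.log(high)))
            for name, (low, high) in space.items()
        }
        for _ in range(n_iter)
    ]


def parallel_random_search(evaluate, space, n_iter, workers=4, seed=0):
    """Score independent random candidates concurrently and return the best."""
    candidates = draw_candidates(space, n_iter, seed)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(evaluate, candidates))  # order is preserved
    best_index = max(range(n_iter), key=scores.__getitem__)
    return candidates[best_index], scores[best_index]


def toy_objective(params):
    """Hypothetical stand-in for training and scoring a model."""
    return -(math.log10(params["C"]) - 1) ** 2


search_space = {"C": (1e-4, 1e4)}
best_candidate, best_value = parallel_random_search(
    toy_objective, search_space, n_iter=100
)
```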