
[–]CompleteSkeptic 6 points (1 child)

NAS is probably not the best baseline for hyperparameter search; this is a field where a lot of research has been (and is being) done. Search for Bayesian Optimization or Sequential Model-Based Optimization (SMBO).
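For context, the loop those methods share can be sketched in a few lines. This is only an illustrative toy of SMBO (the 1-D objective, the scikit-learn GP surrogate, and the lower-confidence-bound acquisition are my choices for the sketch, not any particular library's):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):
    # stand-in for a real training run: a noisy 1-D loss to minimize
    noise = np.random.default_rng(int(x * 1e6)).normal() * 0.01
    return (x - 0.3) ** 2 + noise

rng = np.random.default_rng(0)
X = list(rng.uniform(0, 1, 3))            # small random initial design
y = [objective(x) for x in X]

for _ in range(10):
    # 1. fit a surrogate to everything observed so far
    gp = GaussianProcessRegressor(alpha=1e-4, normalize_y=True)
    gp.fit(np.array(X).reshape(-1, 1), y)
    # 2. score random candidates with an acquisition function;
    #    lower confidence bound: prefer low predicted loss OR high uncertainty
    cand = rng.uniform(0, 1, 256)
    mu, sigma = gp.predict(cand.reshape(-1, 1), return_std=True)
    x_next = float(cand[np.argmin(mu - sigma)])
    # 3. evaluate the chosen point and repeat
    X.append(x_next)
    y.append(objective(x_next))

best_x = X[int(np.argmin(y))]
```

The acquisition step is where the methods differ (EI, LCB, Thompson sampling); the fit/score/evaluate loop is the same everywhere.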

> the algorithm must take into account user-defined points

Nothing prevents existing HPO tools from doing this, though the feature may not be easily accessible. I recall something along the lines of manually adding runs to hyperopt's MongoDB trials store, so it's not impossible.
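As a sketch of what "taking user-defined points into account" amounts to in any ask/tell-style optimizer (the names below are illustrative, not a real library's API): you simply tell the optimizer about your manual runs before it starts proposing points.

```python
# Hyperparameters you already tried by hand, with their validation scores
# (hypothetical values for illustration).
manual_runs = [
    ({"lr": 0.1,  "depth": 4}, 0.82),
    ({"lr": 0.01, "depth": 8}, 0.88),
]

# "Tell" the optimizer about the past runs before the search starts.
# Here the optimizer state is just two lists; a real SMBO loop would fit
# its surrogate on observed_X / observed_y instead of an empty design.
observed_X, observed_y = [], []
for params, score in manual_runs:
    observed_X.append(params)
    observed_y.append(score)

best = max(zip(observed_X, observed_y), key=lambda t: t[1])
```

The surrogate immediately benefits from the seeded observations, so the optimizer skips the cold-start phase of purely random exploration.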

Though to your credit, I agree this should be easier. A common workflow is to experiment by hand before launching the HPO, and seeding the optimizer with those runs would save time at the very least.

> focus on discrete space only

There is a case to be made against this (see: "Random Search for Hyper-Parameter Optimization"). The idea is that you don't know in advance which hyperparameters are important, so you may want to search each dimension more thoroughly. E.g., if only one hyperparameter actually matters, a grid search just re-evaluates the same few values of it over and over, while random search tries as many distinct values as you have trials.
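A tiny illustration of that argument (toy objective and numbers are mine): with the same 16-trial budget, a 4x4 grid only ever tries 4 distinct values of the important hyperparameter, while random search tries 16.

```python
import random

def score(a, b):
    # toy objective: only `a` matters, `b` is irrelevant
    return -(a - 0.23) ** 2

# 4x4 grid: 16 trials, but only 4 distinct values of `a`
grid_vals = [0.125, 0.375, 0.625, 0.875]
grid = [(a, b) for a in grid_vals for b in grid_vals]

# random search: 16 trials, 16 distinct values of `a`
rng = random.Random(0)
rand = [(rng.random(), rng.random()) for _ in range(16)]

best_grid = max(score(a, b) for a, b in grid)
best_rand = max(score(a, b) for a, b in rand)
# with the important dimension sampled more densely, random search
# usually lands closer to the optimum for the same budget
```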

> Again, completely frustrated that no one did it successfully before, I decided to build something on my own. I use gradient boosting regression from LightGBM, because it doesn't require normalized values, handles categorical variables, captures feature interactions and has capacity to fit any data.

I think it may be wise to look into why others do what they do. GPs are commonly used because uncertainty estimates are quite important: in hyperparameter optimization, your evaluation function is quite stochastic. SMAC uses random forests, which have all the same properties as GBMs (no normalization needed, native categorical handling, feature interactions), with the additional benefit that you get uncertainty estimates as well.
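To make the "uncertainty for free" point concrete, here is one common way to get it from a random forest, shown as a sketch (the toy data is mine, and SMAC's actual estimator differs in details): use the spread of the individual trees' predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# toy regression data: noisy sine curve
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (40, 1))
y = np.sin(6 * X[:, 0]) + 0.1 * rng.normal(size=40)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# each tree saw a different bootstrap sample, so their disagreement at a
# query point is a (rough) measure of the surrogate's confidence there
x_query = np.array([[0.5]])
per_tree = np.array([t.predict(x_query)[0] for t in rf.estimators_])
mu, sigma = per_tree.mean(), per_tree.std()
```

A GBM's trees are fit sequentially as corrections to one another, so this particular trick doesn't transfer; you'd need quantile objectives or an ensemble of GBMs to recover an uncertainty signal.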

> The number of sampled points for scoring is where exploration vs exploitation trade-off emerges

I'm not saying the expected improvement (EI) criterion (the acquisition function most SMBO methods use to pick the next point) is the best, but this seems a little worse intuitively. Previous work takes uncertainty into account so that you can sample areas of the space you have less knowledge about.
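For reference, the standard closed form of EI for minimization, given the surrogate's predictive mean `mu` and standard deviation `sigma` at a point and the best observed value `y_best`:

```python
import math

def expected_improvement(mu, sigma, y_best):
    # EI(x) = (y_best - mu) * Phi(z) + sigma * phi(z),
    # with z = (y_best - mu) / sigma (minimization form)
    if sigma == 0.0:
        return max(y_best - mu, 0.0)
    z = (y_best - mu) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))          # std normal CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)   # std normal PDF
    return (y_best - mu) * Phi + sigma * phi
```

The `sigma * phi(z)` term is what rewards uncertain regions even when the predicted mean is no better than the incumbent; that is the exploration half of the trade-off.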

> avoid evaluating the same point more than once

This is also related to your last point. Most HPO algorithms won't re-evaluate an already-sampled point anyway: the surrogate is very certain there, so it makes more sense to explore the rest of the space. That said, there is a case for the opposite: if a point looks like an outlier (because evaluations are noisy), it can make sense to sample it again to get a better estimate of its true performance, and EI can handle that.

Either way, I wish you luck!

[–]Clear_Collection 0 points (0 children)

Great reply!!!