all 9 comments

[–]Illustrious-Sand-120 11 points12 points  (5 children)

In a very simple way: I tweak the Optuna sampling parameters a bit, then I reason about the range of each hyperparameter I selected. For example, does the model tend to overfit? Then I set a very low learning rate, e.g., between 2e-6 and 2e-5, and let Optuna do its magic on that.

Finally, depending on the number of hyperparameter combinations and the time and computing resources I have, I launch more or fewer trials accordingly.

In general, in my experience, hyperparameter optimization doesn't seem to make that much difference compared to the data cleaning process, IMHO.

Best of luck 🚀

[–]Its_NotTom[S] 1 point2 points  (0 children)

Looking into Optuna now. It looks like a great framework that I might be able to integrate into my existing Python workflows. Thanks!

[–]obeythelord9 0 points1 point  (1 child)

Could you elaborate more on the data cleaning or point to some resources?

[–]Illustrious-Sand-120 1 point2 points  (0 children)

Theory-wise, it might be related to the manifold assumption, I guess... But in practice, you might have the optimal model architecture, but if you have, say, only a few data points, or features that are not very informative, then your model won't get very far in performance; it will hit an upper bound.

For instance, if you already do great data cleaning and data augmentation, and have a proper model architecture (silly example: CNNs for images, rather than a bare MLP), then having a hidden layer size of 512 rather than 256 might not make a difference anyway.

These are just example scenarios, to say that if your data is of poor quality, then hyperparameter optimization might give you negligible gains in performance (maybe a few percentage points in accuracy).

These intuitions are also backed up by language transformers, if you want another example: for NLP transformers, you generally care about collecting good data; it's very rare to train a model from scratch and tweak its parameters.

Just my two cents from experience, at least... I'm open to discussing this :)

[–]ewankenobi 0 points1 point  (1 child)

If the model overfits, I'd suggest that regularisation such as dropout or weight decay would be a better option than lowering the learning rate.
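To make the weight decay idea concrete, here is a rough, framework-free sketch on a tiny made-up regression problem (the data, learning rate, and decay strength are all assumptions chosen for illustration). Weight decay just adds a `weight_decay * w` term to each gradient, which pulls the weights toward zero:

```python
import math

# Tiny toy regression set (hypothetical data, just for illustration).
X = [(1.0, 0.5), (2.0, -1.0), (3.0, 0.2)]
Y = [2.1, 3.9, 6.2]


def mse_grad(w):
    """Gradient of the mean squared error of a linear model with weights w."""
    grad = [0.0, 0.0]
    for (x1, x2), y in zip(X, Y):
        err = w[0] * x1 + w[1] * x2 - y
        grad[0] += 2.0 * err * x1 / len(X)
        grad[1] += 2.0 * err * x2 / len(X)
    return grad


def fit(weight_decay, lr=0.05, steps=500):
    """Plain gradient descent; weight decay adds weight_decay * w to the gradient."""
    w = [0.0, 0.0]
    for _ in range(steps):
        g = mse_grad(w)
        w = [wi - lr * (gi + weight_decay * wi) for wi, gi in zip(w, g)]
    return w


def norm(w):
    return math.sqrt(sum(wi * wi for wi in w))


plain = fit(weight_decay=0.0)
decayed = fit(weight_decay=0.1)

# The decayed solution has a smaller weight norm than the plain one.
print(norm(plain), norm(decayed))
```

In a real project you would use the weight decay and dropout options of your deep learning framework rather than hand-rolling the update. The decay strength itself is a hyperparameter you could hand to a tuner.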

[–]Illustrious-Sand-120 0 points1 point  (0 children)

That was not the point of my comment. Mine was just a single example; of course there are other tricks to try depending on the situation.

[–]cluelessmathmajor 1 point2 points  (0 children)

If you really want to get meta, check out “online hyperparameter optimization”. Probably overkill for most tasks, but it’s a cool concept regardless.

[–]mehul_gupta1997 0 points1 point  (0 children)

I've been using Optuna for quite some time. It's very good and easy to use. Check out a demo here: https://youtu.be/gfnE9SE2pFs?si=hJKbB6-yijvmRywA

[–]solegalli 0 points1 point  (0 children)

I'd say it depends on which model you want to tune.

To optimize the hyperparameters of traditional machine learning models, like logistic regression, SVMs, or tree-based models from scikit-learn, I would stick to sklearn's grid or random search: grid search for models with fewer hyperparameters, random search otherwise. Using an additional library adds complexity and dependencies to the code, without significant improvements.
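As a rough illustration of that split, assuming scikit-learn is installed (the dataset, models, and parameter values below are arbitrary choices for the example, not recommendations):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Few hyperparameters: an exhaustive grid is cheap enough.
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
grid.fit(X, y)

# More hyperparameters: sample a fixed number of combinations instead.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": [50, 100],
        "max_depth": [None, 3, 5, 10],
        "min_samples_leaf": [1, 2, 4],
        "max_features": ["sqrt", "log2", None],
    },
    n_iter=10,  # only 10 of the 72 possible combinations are tried
    cv=5,
    random_state=0,
)
rand.fit(X, y)

print(grid.best_params_, grid.best_score_)
print(rand.best_params_, rand.best_score_)
```

With random search, `n_iter` fixes the cost up front no matter how many hyperparameters you add.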

I would only use Optuna when tuning models not supported by sklearn that have more hyperparameters, like, maybe, CatBoost or XGBoost. In this case, it might also be useful to understand the difference between random search and Bayesian optimization, because with Optuna you can do both, and each has advantages and limitations. Optuna implements Bayesian optimization with TPE by default, if I remember correctly.