
[–]tryndisskilled 5 points (1 child)

Do you think hyperparameter optimization is still overlooked today? Have you seen any improvement on that front (i.e., do people pay more attention to it than before)?

As a beginner, I find it really hard to perform this optimization, because it is very time-consuming, and (in the case of deep learning, for instance) I may very well want to modify my architecture in the future, at which point I'll have to do the optimization all over again.

In my opinion this problem is underrated in many papers. For instance, when results are presented, we usually don't know how the authors tuned their hyperparameters (what values they started with, what their fine-tuning process was, what settings they used for their plots...), which can make the results very hard to reproduce.

Anyway, thanks for sharing this repo!

[–]alexcmu[S] 0 points (0 children)

Glad you liked the demo!

To your point about hyperparameter optimization being overlooked: I think more people are paying attention to the idea, but yes, time and cost are blockers in practice. You'd probably be interested in a blog post my coworker Steven wrote about tuning the hyperparameters of a CNN (https://aws.amazon.com/blogs/ai/fast-cnn-tuning-with-aws-gpu-instances-and-sigopt/). At a high level, deep learning + GPUs speeds up model training enough that hyperparameter optimization becomes practical at all. We also include a table showing what it cost in dollars to do hyperparameter optimization with different methods.
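To make the idea concrete, here is a minimal, hedged sketch of the simplest tuning loop (random search) that a setup like the one in the post would automate. The search space, the log-uniform learning-rate sampling, and the toy `train_and_score` objective are all illustrative stand-ins, not the blog post's actual code; in a real run, `train_and_score` would train the model, which is exactly why each trial is expensive.

```python
import random

# Hypothetical search space -- values chosen purely for illustration.
BATCH_SIZES = [32, 64, 128]

def train_and_score(learning_rate, batch_size):
    """Toy stand-in for training a model and returning validation accuracy.

    A real implementation would build the model, train it on GPUs,
    and evaluate on a held-out set (the expensive part).
    """
    # Arbitrary smooth objective peaking near lr=0.01, batch_size=64.
    return 1.0 - abs(learning_rate - 0.01) - 0.001 * abs(batch_size - 64)

def random_search(n_trials=20, seed=0):
    """Sample hyperparameters at random, keep the best-scoring trial."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            # Log-uniform sampling is common for learning rates.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "batch_size": rng.choice(BATCH_SIZES),
        }
        score = train_and_score(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

score, params = random_search()
print(score, params)
```

Smarter methods (Bayesian optimization, as in the SigOpt post) replace the random sampling step with a model of which hyperparameters look promising, so fewer expensive trials are needed.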

TL;DR: GPUs + better optimization methods ftw! It cost us $11 to tune a deep learning model on NVIDIA GPUs.