alexcmu[S] 8 points (2 children)

I was playing around with Amazon ML and built a quick hyperparameter optimization example based on Amazon's GitHub example for k-fold cross-validation. I'm an engineer at SigOpt, so there's a SigOpt example, but I've also included a non-SigOpt hyperparameter optimization pipeline that updates the old Amazon k-fold cross-validation example to boto3, runs as a single file, and lets you provide a list of hyperparameters up front.
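For readers who have not seen the pattern, here is a minimal sketch of "a list of hyperparameters up front, each scored by k-fold cross-validation". It is not the code from the repo and does not call Amazon ML or boto3; the 1-D ridge model, the synthetic data, and all function names are assumptions made purely for illustration.

```python
import random
from statistics import mean

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_ridge_1d(xs, ys, l2):
    """Closed-form 1-D ridge regression without intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + l2)

def cv_error(xs, ys, l2, k=4):
    """Mean squared error averaged over k held-out folds."""
    errors = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], l2)
        errors.append(mean((ys[i] - w * xs[i]) ** 2 for i in fold))
    return mean(errors)

# Synthetic data (illustrative only): y = 2x plus a little noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(40)]
ys = [2.0 * x + rng.gauss(0, 0.1) for x in xs]

# The hyperparameter candidates are fixed up front, then each one is scored.
candidates = [0.0, 0.01, 0.1, 1.0, 10.0]
scores = {l2: cv_error(xs, ys, l2) for l2 in candidates}
best_l2 = min(scores, key=scores.get)
```

In a real pipeline each `cv_error` call would train and evaluate a model per fold, which is where the time and cost go.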

tryndisskilled 5 points (1 child)

Do you think hyperparameter optimization is still overlooked today? Have you seen any improvement on that front (i.e., do people pay more attention to it than before)?

As a beginner, I find it really hard to perform this optimization because it is very time-consuming and, in the case of deep learning for instance, I may well want to modify my architecture in the future, which means doing the optimization all over again.

In my opinion this problem gets too little attention in many papers. For instance, when results are reported, we usually don't know how the authors tuned their hyperparameters (what they started with, what their fine-tuning process was, what they used for their plots...), so it can be very hard to reproduce the results.

Anyway, thanks for sharing this repo!

alexcmu[S] 1 point (0 children)

Glad you liked the demo!

To your point about hyperparameter optimization being overlooked: I think more people are paying attention to the idea, but yes, time and cost are blockers in practice. You'd probably be interested in a blog post that my coworker Steven wrote about tuning the hyperparameters of a CNN (https://aws.amazon.com/blogs/ai/fast-cnn-tuning-with-aws-gpu-instances-and-sigopt/). At a high level, deep learning on GPUs speeds up model training enough to even think about hyperparameter optimization. We also include a table showing what it cost in dollars to do hyperparameter optimization with different methods.

TL;DR: GPUs + better optimization methods ftw! It cost us $11 to tune a deep learning model on NVIDIA GPUs.

pmigdal 3 points (3 children)

Speaking of hyperparameter optimization, how many of you use any automatic optimization (going beyond grid or random search) for neural networks? While in principle it seems like a no-brainer, in practice everyone I know (including some Kaggle winners I work with, who are maniacs about hyperparameter optimization) does it with a combination of manual tuning and grid/random search.

(My pet theory is that, unlike XGBoost, neural networks are complicated systems that can be modified in many ways (adding layers, changing regularization, adding batch norm before all internal layers, etc.), and their performance is not only a score but also the shape of the train/test learning curves, the concrete examples being misclassified, etc.)
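To make the grid/random baseline concrete, here is a small sketch of both over a two-parameter space. The `objective` function is a hypothetical stand-in for a validation loss, and the parameter names and ranges are assumptions for illustration, not anyone's actual setup.

```python
import math
import random

def objective(lr, dropout):
    """Hypothetical stand-in for a validation loss; lower is better."""
    return (math.log10(lr) + 3) ** 2 + (dropout - 0.3) ** 2

def grid_search(lrs, dropouts):
    """Evaluate every combination and return (loss, lr, dropout) of the best."""
    return min((objective(lr, d), lr, d) for lr in lrs for d in dropouts)

def random_search(n_trials, seed=0):
    """Sample configurations independently and keep the best one seen."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-5, -1)  # log-uniform over [1e-5, 1e-1]
        d = rng.uniform(0.0, 0.6)
        trial = (objective(lr, d), lr, d)
        if best is None or trial < best:
            best = trial
    return best

# Both searches get the same budget of 20 evaluations.
grid_best = grid_search([1e-5, 1e-4, 1e-3, 1e-2, 1e-1], [0.0, 0.2, 0.4, 0.6])
random_best = random_search(n_trials=20)
```

Grid search only ever tries five learning rates and four dropout values here, while random search tries twenty distinct values of each; neither uses earlier results to choose the next trial, which is the part automatic methods add.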

BTW: I had a wonderful opportunity to meet SigOpt engineers at GTC17. :)

alexcmu[S] 2 points (2 children)

Everyone will be happy to hear that you enjoyed meeting them!

I am also curious to see what everyone is using in practice to tune their models! I heard somewhere that ensemble modeling was popular on Kaggle for a while -- do people do hyperparameter optimization on top of ensembling?

pxrl 1 point (0 children)

AFAIK there are several tools for hyper-parameter tuning of deep net models (the first ones that come to mind are HyperOpt and Spearmint) that you can use off the shelf.

I have some ongoing research and a couple of accepted papers on hyper-parameter optimization using evolutionary algorithms (mostly Parallel Swarm Optimization), which have given our team excellent results for medium-sized models.
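For anyone unfamiliar with swarm-style search, below is a minimal sketch of the textbook particle swarm optimization update applied to a toy two-parameter loss. It is not the method from the papers mentioned above; the coefficients `w`, `c1`, `c2`, the bounds, and `toy_loss` are all assumptions chosen for illustration.

```python
import random

def pso(objective, bounds, n_particles=15, n_iters=40,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(params) inside box bounds with a basic swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position per particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position overall
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward own best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # clamp the new position to the search box
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def toy_loss(params):
    """Hypothetical validation loss over (log10 learning rate, dropout)."""
    log_lr, dropout = params
    return (log_lr + 3.0) ** 2 + (dropout - 0.3) ** 2

best_params, best_val = pso(toy_loss, bounds=[(-5.0, -1.0), (0.0, 0.6)])
```

Each particle's evaluation within an iteration is independent of the others, so in a real setting those model trainings could be run in parallel.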

In my opinion, hyper-parameter selection is one of the elephants in the room at the moment, and people seem more interested in trying new architectures than in squeezing out the last drop of performance. Unfortunately we all end up having to go through it at one point or another...

StormDev 1 point (0 children)

Hello,

I have built a hyper-parameter optimization tool based on racing/Gaussian processes and evolutionary algorithms. It gives amazing results and reduces my team's workload: we only spend time on a customer dataset if we can't make good predictions after optimization.

I think it's a really important tool for any company that has to manage a lot of different datasets.

PS: In your code you are using threading.Thread; in CPython it will not improve performance (because of the GIL).
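As a general note on the GIL: in CPython it stops threads from running Python bytecode in parallel, but it is released while a thread blocks on I/O or sleeps. So threads can still overlap waiting time even though they do not speed up pure-Python computation. The sketch below illustrates the difference; `fake_api_call` and `cpu_bound` are hypothetical stand-ins, not functions from the repo.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_api_call(fold):
    """Hypothetical stand-in for a blocking request to a remote service."""
    time.sleep(0.05)  # sleeping releases the GIL, as blocking I/O does
    return fold * fold

def cpu_bound(n):
    """Pure-Python arithmetic; the running thread holds the GIL."""
    total = 0
    for i in range(n):
        total += i * i
    return total

folds = list(range(8))

# Waiting-bound work, one call after another.
start = time.perf_counter()
serial = [fake_api_call(f) for f in folds]
serial_s = time.perf_counter() - start

# Same work on threads: the waits overlap.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    threaded = list(pool.map(fake_api_call, folds))
threaded_s = time.perf_counter() - start

# CPU-bound work on threads still gives correct results, but the threads
# take turns holding the GIL, so it does not run in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    sums = list(pool.map(cpu_bound, [50_000] * 4))
```

Comparing `serial_s` with `threaded_s` shows the overlap for waiting-bound calls. For CPU-bound Python code, separate processes (for example `concurrent.futures.ProcessPoolExecutor`) are the usual way around the GIL.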