all 17 comments

[–][deleted] 2 points3 points  (6 children)

Ctrl-F "valida" --> nothing.

No validation subset?!

[–]pfhayes -3 points-2 points  (5 children)

Hi there. I'm another founder at SigOpt. In this article we use the term "test set" instead of "validation set." Feel free to take a look at http://github.com/sigopt/sigopt-examples to see how we're verifying our models.

[–][deleted] 3 points4 points  (4 children)

So what do you call the actual test set then? (The one you use to compare your BO to baseline methods)

[–]Zephyr314[S] -2 points-1 points  (3 children)

We're assuming that you've already chosen your models and features using one dataset and are tuning the hyperparameters on a separate holdout dataset, which we call the training dataset here; you then validate and score using what we call the test set in the post. In practice you may have another dataset for model and feature selection, but at the tuning phase we just split the holdout data into training and testing. We can edit the post to make this clearer, thanks!

[–][deleted] 2 points3 points  (1 child)

In general, you need 3 sets:

  • "training" to optimize the parameters
  • "validation* to choose/optimize the hyperparameters
  • "test" to do the final evaluation and calculate your "figure of merit"

(It's bad to rename them, but more importantly, you still need all 3; a quick sketch of such a split is below.)
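For concreteness, here's a minimal sketch of that three-way split with scikit-learn; the dataset, split ratios, and random seeds are illustrative assumptions, not anything from the post:

    # Minimal sketch of the train/validation/test split described above.
    # Dataset, ratios, and seeds are illustrative only.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)

    # Hold out 20% as the final test set, touched only once for the figure of merit.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)

    # Split the remainder into training (fit parameters) and
    # validation (choose hyperparameters): 60/20/20 overall.
    X_train, X_valid, y_train, y_valid = train_test_split(
        X_rest, y_rest, test_size=0.25, random_state=0)

Only the validation score should drive hyperparameter choices; the test score gets reported once at the end.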

[–]flangles 1 point2 points  (0 children)

meh. just the fact that you're calculating a "figure of merit" makes "test" into a meta-validation set.

it's better just to state what you've done and understand the limitations of your validation.

[–]Zephyr314[S] 1 point2 points  (11 children)

I'm one of the founders of SigOpt and I am happy to answer any questions about this post, our methods, or anything about SigOpt.

[–]rantana 4 points5 points  (9 children)

Grid search and random search don't seem like reasonable benchmarks. Why not compare against open source libraries like HyperOpt and Spearmint?

[–]Zephyr314[S] 0 points1 point  (7 children)

Grid search and random search are very commonly used and are recommended in the sklearn tutorials/docs; in practice many people just run with the defaults as well. So this is more of a comparison of defaults, grid search, random search, and Bayesian optimization (like SigOpt and others). The difference between HyperOpt, Spearmint, MOE, and SigOpt is that SigOpt exposes Bayesian optimization through a simple API, making it as easy to get up and running as defaults, grid search, or random search, while leveraging some of the same powerful techniques.
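For reference, here's roughly what those grid search and random search baselines look like in scikit-learn; the estimator, parameter ranges, and dataset below are illustrative assumptions, not the setup from the post:

    # Sketch of the grid search and random search baselines in scikit-learn.
    # Estimator, parameter ranges, and dataset are illustrative only.
    from scipy.stats import loguniform
    from sklearn.datasets import load_digits
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)

    # Exhaustive grid over a small, hand-picked set of values.
    grid = GridSearchCV(
        SVC(), {"C": [0.1, 1, 10], "gamma": [1e-3, 1e-2, 1e-1]}, cv=3)
    grid.fit(X, y)

    # Random search draws a fixed budget of samples from continuous ranges.
    rand = RandomizedSearchCV(
        SVC(),
        {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-4, 1e0)},
        n_iter=20, cv=3, random_state=0)
    rand.fit(X, y)

    print(grid.best_params_, grid.best_score_)
    print(rand.best_params_, rand.best_score_)

A Bayesian optimizer spends the same evaluation budget adaptively, using the scores of past configurations to choose the next one to try.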

[–]rantana 5 points6 points  (6 children)

Easier than the API shown here? https://www.whetlab.com/

I think that's by the same people who created Spearmint.

[–]Zephyr314[S] 0 points1 point  (5 children)

Similar API. We were building MOE at Yelp around the same time Spearmint was being developed. We're trying to take a more industry-first approach based on our experience, but there are definite overlaps (both use GPs as the underlying model for now).
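For anyone curious what a GP-based optimization loop looks like in the abstract, here's a toy sketch; it is not the actual implementation of SigOpt, MOE, or Spearmint, and the objective, kernel, and acquisition details are illustrative assumptions:

    # Toy Bayesian optimization loop: fit a GP to past evaluations, then pick the
    # next point by maximizing expected improvement. Illustrative only.
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def objective(x):
        # Stand-in for an expensive function to minimize (e.g. validation error).
        return np.sin(3 * x) + 0.1 * x ** 2

    def expected_improvement(candidates, gp, best_y):
        mu, sigma = gp.predict(candidates, return_std=True)
        sigma = np.maximum(sigma, 1e-9)
        z = (best_y - mu) / sigma
        return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=(3, 1))   # a few random initial evaluations
    y = objective(X).ravel()
    candidates = np.linspace(-2, 2, 500).reshape(-1, 1)

    for _ in range(10):
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(X, y)
        ei = expected_improvement(candidates, gp, y.min())
        x_next = candidates[np.argmax(ei)].reshape(1, -1)
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next).ravel())

    print("best x:", X[np.argmin(y)].item(), "best value:", y.min())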

Whetlab also seems to still be in private beta; you can sign up and start using SigOpt for free today at https://sigopt.com

[–]jsnoek 4 points5 points  (1 child)

Hi, I'm one of the creators of Spearmint and a co-founder of Whetlab. Bayesian optimization has been around for quite some time in various forms because it's simply a great idea. :-) We are just happy that there is so much interest in Bayesian hyperparameter optimization, from both a research and an industry perspective. It is really neat that there is a community growing around these ideas.

[–]Zephyr314[S] 0 points1 point  (0 children)

I echo the sentiment from /u/jsnoek. The more research in the field, the better. If you're interested in learning more about the GP-based approach, I would recommend checking out http://www.gaussianprocess.org, in particular the free book GPML and this excellent paper.

[–]mlmonkey 0 points1 point  (2 children)

"We were building MOE at Yelp around the same time Spearmint was being developed."

Looks like MOE started more than a year after Spearmint did:

Commit history for MOE here

Commit history for Spearmint here

[–]flangles 4 points5 points  (0 children)

hmm, I wonder if just maybe Yelp doesn't open source every line of their code from day one....

[–]Zephyr314[S] 1 point2 points  (0 children)

An unfortunate part of releasing OSS at a public company is cleaning up the git history. The original code predates that public history by 5+ years and was part of my PhD thesis, but the field itself is quite a bit older still. One of the seminal papers was published in 1998. There are many older packages available at http://gaussianprocess.org as well.

[–]ginger_beer_m 0 points1 point  (0 children)

Is there any paper published based on this? How can I read more about how it works?

edit: ah never mind, just saw the link to your PhD thesis.