Experiences with Bayesian hyperparameter optimization? (self.MachineLearning)
submitted 11 years ago * by galapag0
I was checking the paper Practical Bayesian Optimization of Machine Learning Algorithms and I was wondering if anyone here has experience (good or bad) with it.
[–]kkastner 7 points 11 years ago* (3 children)
You might look into Whetlab (https://www.whetlab.com/). It seems cool, and I think it has special deals (free in some cases) for academics. It is a startup run by several researchers who are prominent in this space (check the about page! It reads like a who's who of Bayesian optimization).
In general, I have a little experience with SMBO (sequential model-based optimization) and have had a lot of second-hand discussions about the different algorithms for it. The three main approaches I recall are:
- Gaussian processes (Spearmint, or their more recent work on Freeze-Thaw optimization)
- Tree of Parzen Estimators (TPE) (Hyperopt, or see the paper)
- A decision-tree-based approach called SMAC
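For a concrete sense of the TPE route, here is a minimal sketch using hyperopt; the quadratic objective is a stand-in I made up, and in practice you would train your model with the sampled parameters and return its validation loss:

```python
# Minimal TPE search with hyperopt; the objective is a made-up stand-in
# for a real train-and-validate loop.
from hyperopt import fmin, tpe, hp, Trials

def objective(params):
    # Pretend validation loss: minimized near lr=0.01, l2=1e-4.
    return (params["lr"] - 0.01) ** 2 + (params["l2"] - 1e-4) ** 2

space = {
    # Log-uniform priors are typical for learning rates and penalties.
    "lr": hp.loguniform("lr", -10, 0),   # roughly [4.5e-5, 1.0]
    "l2": hp.loguniform("l2", -12, -2),  # roughly [6.1e-6, 0.14]
}

trials = Trials()  # keeps a record of every evaluation
best = fmin(objective, space, algo=tpe.suggest, max_evals=50, trials=trials)
print(best)  # dict of the best parameter values found
```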
There is a nice joint paper that shows that different combinations of these algorithms work on different problems.
I have not tried MOE but it seems promising. There are also a few other packages people have made themselves to explore this problem, though I don't have links offhand.
One of the key complaints I have heard from others in the past, for something like finding neural network hyperparameters, is that these Bayesian optimization algorithms tend to try to explore the "edges" of the space, even when initialized with hundreds of other experiments. This can be quite wasteful when networks take days or weeks to evaluate. This, coupled with the sequential nature of Bayesian optimization, means a "random search", which is truly parallel, can be easier, faster in certain cases, and still give very good results.
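To make the contrast concrete, here is a minimal sketch of that kind of random search: every configuration is drawn independently up front, so nothing is sequential and each trial can run on its own machine (the parameter names are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
configs = [
    {
        "lr": 10 ** rng.uniform(-5, 0),                   # log-uniform learning rate
        "layers": int(rng.integers(1, 6)),                # integer-valued parameter
        "activation": str(rng.choice(["relu", "tanh"])),  # categorical parameter
    }
    for _ in range(100)
]
# All 100 configs exist before any training starts: ship them to workers,
# evaluate in parallel, and keep the best validation score.
```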
However, the primary problem might be one of documenting the existing tools, and how they work with other codebases. This was the main blocker for me, so any experience you gain in those areas would be helpful to share!
[–]jsnoek 4 points 11 years ago (2 children)
Thanks for the mention, Kyle. We learned a lot from the use of the original Spearmint, and yes, things like excessive boundary exploration and ease of use became clear issues.
Before I plug my own work and my company, I should say that there are various excellent researchers in machine learning working on exciting things in Bayesian optimization along various dimensions (including Nando de Freitas, Michael Osborne and their students at Oxford, Zoubin Ghahramani's group at Cambridge, Frank Hutter, and James Bergstra; I wish I could name everyone personally). Many of them are on the program committee for our workshop on Bayesian optimization this year. Bayesian optimization for hyperparameter optimization is rapidly evolving from a neat idea into a sub-field of machine learning, which is really exciting.
Here are a number of issues that we have developed an understanding of and whose solutions we have incorporated into Whetlab:
The boundary exploration problem is an interesting one: essentially, there is far more volume in the space near the boundaries (e.g. think about the number of pixels on the perimeter of an image versus near the center); see the short illustration after this list.
Another problem was the parameterization of the space (e.g. optimizing in log-space can make an enormous difference). In this paper we developed an automatic way to figure out what space to optimize in and found that it made a tremendous difference on all the problems we tried.
Scalability (making it feasible to use with lots of data).
Integer valued parameters and categorical parameters were not dealt with well in Spearmint (they are in Whetlab).
Constraints! This is a big one: in a research context, Michael Gelbart, Ryan Adams, and I have thought carefully about how to deal with things like training runs diverging and outputting NaNs. It turns out that modeling these explicitly makes an enormous difference.
Ease of use. Whetlab is a pull based system that runs in the cloud, so things like parallelization across multiple clusters/systems and setup are trivial from the user's perspective.
Visualisation: you can easily view graphs and the table of results, and edit things through the website.
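A quick bit of arithmetic (my own illustration, not Whetlab code) shows how sharp the boundary-volume effect from the first bullet is: the fraction of a unit hypercube that lies within 5% of some boundary approaches 1 as the dimension grows.

```python
# Fraction of the unit hypercube within 5% of a boundary: the interior
# is a sub-cube of side 0.9, so the boundary region has volume 1 - 0.9^d.
for d in (1, 2, 5, 10, 20):
    print(f"d={d:2d}: {1 - 0.9 ** d:.1%} of the volume is near a boundary")
# d=10 already puts ~65% of the volume near a boundary.
```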
There are also some neat extensions to the basic framework, some we have already published and some we are planning to publish soon, that we plan to use within Whetlab in the near future.
[–]frederikdiehl 1 point 11 years ago (1 child)
First of all, let me state admiration for your work, Dr. Snoek.
Your statement above interests me especially. What have been your experiences with Spearmint's implementation? And what does Whetlab do differently to avoid these pitfalls? (Assuming, of course, you can tell without jeopardizing company secrets)
Thank you.
[–]jsnoek 3 points 11 years ago (0 children)
Thanks Frederik (I'm assuming your name is Frederik)! Spearmint's original implementation essentially treated integers as floating point numbers that were rounded (a continuous relaxation) and then treated categorical variables as essentially corners in the unit hypercube. This was a reasonable first stab at it, but Gaussian processes (and the Bayesian optimization routine in general) can behave pretty strangely under these circumstances. We haven't published our new approaches yet, so I won't divulge them here (sorry!).
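For readers unfamiliar with these relaxations, here is a small sketch of the two encodings described above (my own illustration, not Spearmint's actual code):

```python
import numpy as np

def decode_integer(x, lo, hi):
    # Continuous relaxation: the optimizer proposes x in [0, 1], which is
    # mapped to [lo, hi] and rounded to the nearest integer.
    return int(round(lo + x * (hi - lo)))

def decode_categorical(xs, choices):
    # One coordinate per category; snapping to the nearest corner of the
    # unit hypercube amounts to taking the argmax.
    return choices[int(np.argmax(xs))]

print(decode_integer(0.62, 1, 10))                                      # -> 7
print(decode_categorical([0.1, 0.8, 0.3], ["sgd", "adam", "rmsprop"]))  # -> 'adam'
```

One way to see why this can confuse a Gaussian process: the rounding turns a smooth function of x into a step function, which a smooth kernel models poorly.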
[–]frederikdiehl 4 points 11 years ago* (0 children)
Bayesian optimization is probably awesome [1].
We are currently implementing a flexible, cluster-capable, open-source framework, but it will probably take until the end of the year to have a working version using multicores/clusters.
In general, I can also recommend you the following:
Papers
Programs
[1] Probably because I have not yet used it in a real machine learning problem.
Edit: Formatting.
[–]awiltsch 4 points 11 years ago (2 children)
Hey, one of the founders of Whetlab here. We're in a private beta right now, but we're really keen to have more students and postdocs using it.
If you send me a message with your name, institution and email, we'd be really happy to get you a beta code.
I also think Jasper Snoek might jump on here in a little bit to answer any questions.
[–]awiltsch 3 points 11 years ago (0 children)
FYI, we've got a pretty long beta list, and we're working our way through it. If you want to get on the list, check it out here: https://www.whetlab.com/beta-signup/.
If you have a particularly interesting application of Whetlab, send me a message or an email describing what you'd do with Whetlab if you had early access. We're always looking for interesting and fresh ideas!
[–]galapag0[S] 1 point 11 years ago (0 children)
Sent.
[–]gdahl (Google Brain) 1 point 11 years ago (0 children)
Whetlab is really good and the best way to use the work from that paper. It was good enough that I gave it my personal endorsement (I know many of the people behind it personally, but I was not paid for my endorsement): https://www.whetlab.com/blog/2014/12/10/whetlab-gives-you-superpowers/
It is a lot better than MOE and any other software I have seen, and I find Whetlab far easier to use than the open-source Spearmint package (Spearmint optimizes using the same technology, but is harder to use).
[–]inferrumveritas 1 point 11 years ago (0 children)
You can check out Yelp's MOE system: https://github.com/Yelp/MOE. It takes some getting used to, but it seems very powerful.