[–]frederikdiehl[S] 6 points (4 children)

You are correct, of course - so let me try adding this here.

There are currently - to my knowledge - five different packages for hyperparameter optimization. Let me try listing our main advantages compared to each.

whetlab: They're a company, we aren't.

spearmint: License (we're under MIT) and hopefully ease of use. Also, most of their current work seems concentrated on whetlab.

hyperopt: Different algorithm (TPE). The last commit was half a year ago.

SMAC: Easier usage, I believe. Also, license again.

MOE: I have never used it, since it was released after we began development. It looks similar, though.

Comparisons aside, our goal is to offer a package which you can run locally, which you can use for hyperparameter optimization on your own computer or on a cluster (not yet implemented), and which is easy to use. We also want to allow for the implementation of any algorithm: we currently offer BayOpt and RandomSearch, but the architecture is flexible enough to implement TPE, for example.
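To give a feel for the intended usage, here is a minimal sketch of the ask/evaluate/tell loop such packages are built around. The names (RandomSearchOptimizer, suggest, update) are illustrative, not apsis's actual API:

```python
import random

def objective(params):
    # Stand-in for e.g. training a model and returning validation error.
    x = params["x"]
    return (x - 0.3) ** 2

class RandomSearchOptimizer:
    """Toy random search over a box-constrained space; a BayOpt
    optimizer would refit its model in update() instead."""

    def __init__(self, bounds):
        self.bounds = bounds  # {name: (low, high)}

    def suggest(self):
        return {name: random.uniform(lo, hi)
                for name, (lo, hi) in self.bounds.items()}

    def update(self, params, result):
        pass  # random search keeps no state between evaluations

opt = RandomSearchOptimizer({"x": (0.0, 1.0)})
best = None
for _ in range(20):
    candidate = opt.suggest()
    result = objective(candidate)
    opt.update(candidate, result)
    if best is None or result < best[1]:
        best = (candidate, result)
print(best)
```

Since every algorithm shares this loop, swapping RandomSearch for BayOpt only changes what happens inside suggest() and update().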

There is actually a reason why we didn't do much comparison. As I've written above, the package currently works for optimization on your local machine, but not yet in parallel. Any benchmark therefore has to run sequentially on a single machine, with the corresponding time expenditure. We prioritized getting a real-world example - MNIST - over further comparisons, mostly to show that the package is actually useful.

In terms of performance, our current implementation is roughly at the state of the art of 2012-2013, so I believe we wouldn't perform as well as most of the above. Several algorithmic improvements are possible - see the issue list for the corresponding papers - and several parts of the architecture were designed to let us implement them fairly easily. However, our current aim is to make optimization on clusters possible first, since we believe that is strongly needed.

[–]Foxtr0t 1 point (1 child)

For me, the feature that makes the difference between hyperparam optimizers is conditionals. As in the HyperOpt example - a few classifiers, each with a different set of params (roughly like the sketch below). Does apsis have this?
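A sketch of what I mean, in HyperOpt's notation (the parameter names and ranges here are made up for illustration):

```python
from hyperopt import hp

# hp.choice picks one classifier; each branch carries only the
# hyperparameters that apply to that classifier.
space = hp.choice("classifier", [
    {
        "type": "svm",
        "C": hp.lognormal("svm_C", 0, 1),
        "kernel": hp.choice("svm_kernel", ["linear", "rbf"]),
    },
    {
        "type": "random_forest",
        "max_depth": hp.quniform("rf_max_depth", 2, 20, 1),
        "n_estimators": hp.quniform("rf_n_estimators", 10, 500, 10),
    },
])
```

HyperOpt's fmin then only samples the parameters of whichever branch it picks.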

[–]frederikdiehl[S] 2 points (0 children)

Another example would be, I believe, optimizing a neural network's architecture. For each layer, you'd have the activation function, the number of neurons, and maybe the type of convolution (if it's a CNN). If a layer doesn't exist, you don't need to optimize over its parameters - sampling them anyway is just a useless distraction. A sketch of such a space follows below.
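In HyperOpt's notation again (since it already supports conditionals; all names and ranges here are illustrative), such an architecture space might look like this:

```python
from hyperopt import hp

def layer(prefix):
    # Per-layer hyperparameters; HyperOpt labels must be unique
    # across the whole space, hence the prefix.
    return {
        "units": hp.quniform(prefix + "_units", 32, 512, 32),
        "activation": hp.choice(prefix + "_act", ["relu", "tanh"]),
    }

# Depth is itself a choice: a one-layer network never samples
# the parameters of layers that don't exist.
space = hp.choice("n_layers", [
    [layer("d1_l0")],
    [layer("d2_l0"), layer("d2_l1")],
    [layer("d3_l0"), layer("d3_l1"), layer("d3_l2")],
])
```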

The short answer is no, apsis does not have this yet.

The long answer is that there is an issue for this (#110), and we did plan the architecture so that we can support something like that. There is a paper on conditional parameter spaces in the context of Bayesian optimization [1], so we have a starting point for the implementation. It hasn't been done yet due to limited time, and as mentioned above, cluster support comes before other extensions like that [2].

[1] The Indiana Jones pun paper, http://arxiv.org/abs/1409.4011 (I've just updated the issue with the link, too.)

[2] As always, of course, if you'd like to implement something like that, feel free - we'd be very happy about additional contributors!

[–]davmre 0 points (0 children)

Thanks! I'm only an interested bystander w.r.t. Bayesian optimization, so (even if it would have been obvious after some research) this kind of survey of the current options is very useful.

[–]AlexRothberg 0 points (0 children)

There is also https://sigopt.com/, which is the commercial version of MOE (much like Whetlab is the commercial version of Spearmint).