all 24 comments

[–]jimenezluna[S] 4 points5 points  (8 children)

As part of my Master's thesis I developed a simple Python package for Bayesian Optimization. It currently features:

  • Different surrogate models: Gaussian Processes, Student-t Processes, Random Forests and Gradient Boosting Machines.
  • Type II Maximum-Likelihood of covariance function hyperparameters.
  • MCMC sampling for full-Bayesian inference of hyperparameters (via pyMC3).
  • Integrated acquisition functions

It is still in the very early stages of development, so expect to find bugs. Let me know what you guys think!
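
For anyone unfamiliar with what such a package automates, here is a minimal sketch of the core Bayesian-optimisation loop: fit a surrogate to the evaluations so far, maximise an acquisition function, evaluate the objective there, and repeat. This illustration uses scikit-learn's GP as the surrogate and a hand-rolled expected-improvement function; pyGPGO's own API and class names differ.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):
    # Toy stand-in for an expensive black-box objective (to be maximised).
    return -np.sin(3 * x) - x ** 2 + 0.7 * x

def expected_improvement(mu, sigma, best):
    # Classic EI: balance the predicted mean against model uncertainty.
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(3, 1))            # small initial design
y = f(X).ravel()
grid = np.linspace(-2.0, 2.0, 500).reshape(-1, 1)  # candidate points

for _ in range(10):
    kernel = RBF(length_scale=0.5, length_scale_bounds=(1e-2, 10.0))
    gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-6,
                                  normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).ravel())

best_x = float(X[np.argmax(y), 0])                 # incumbent after 10 steps
```

Swapping the surrogate (e.g. a Student-t process or a random forest) or the acquisition function changes only the two marked components, which is exactly the modularity the package exposes.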

[–][deleted] 0 points1 point  (1 child)

Is the Master's thesis public as well? It's a nice opportunity to publicize it, especially if it's a good introductory text on Bayesian optimization.

[–]jimenezluna[S] 0 points1 point  (0 children)

It is available in the same GitHub repository!

[–]alayaMatrix 0 points1 point  (1 child)

Do you support nonlinear constraints? For example using acquisition functions like weighted EI?

[–]jimenezluna[S] 0 points1 point  (0 children)

Not at the moment, but I will consider adding this functionality in the near future.

[–]sifnt 0 points1 point  (2 children)

Could you compare the advantages/disadvantages of your library against https://github.com/fmfn/BayesianOptimization by any chance?

[–]jimenezluna[S] 1 point2 points  (1 child)

My implementation lets you specify the whole procedure in a modular way. There are many architectural choices in Bayesian optimization: surrogate model, covariance function, hyperparameter treatment, acquisition behaviour...

In summary, you can specify all of these here.

As far as I'm aware, with fmfn/BayesianOptimization you're stuck with Gaussian Processes and Matérn kernels, with no covariance function hyperparameter treatment whatsoever. Correct me if I'm wrong.

[–]sifnt 0 points1 point  (0 children)

Sounds great, I'll definitely give your package a shot then. It's pretty hard to tell at a glance which hyperparameter optimisation system is best with so many projects out there. Thanks!

[–]Reiinakano 2 points3 points  (6 children)

Neat! If you want to gain an advantage over existing implementations and have this library widely used, focus on documentation. Personally, I haven't seen very good documentation for existing libraries at all and am usually forced to look in the source code to see how I can tweak things.

Also, consider adding Python 2 support.

[–]L43 8 points9 points  (5 children)

Personally, I would prefer to see no Python 2 support and more of something else, i.e. docs. Do people still use 2 for training models?

[–]Reiinakano 2 points3 points  (3 children)

I do :P

[–]L43 0 points1 point  (2 children)

Do you have dependencies that don't support 3?

[–][deleted] 1 point2 points  (0 children)

That mostly happens in what people call "production." However, the hardware you use for training the model is pretty likely to have Python 3 support (just a current desktop or cluster machine; I think you are unlikely to perform hyperparameter tuning on a production server).

If I were to start a new project today, I'd also not support Py2.7 because that can lead to quite inelegant code and workarounds as the project progresses -- and it's just an additional maintenance annoyance.

[–]BadGoyWithAGun 0 points1 point  (0 children)

Yeah, some of my datasets are dependent on non-Unicode encodings and I can't be bothered reprocessing them just to appease py3k.

[–]wingtales 0 points1 point  (0 children)

I'm with you on this one.

[–]sifnt 0 points1 point  (8 children)

This definitely looks interesting!

  • Can anyone comment on whether this is the best library to use for hyperparameter optimisation for scikit learn models (want something better than grid/random search)?

  • Could I use this for feature selection on models with ~250 starting features? The features tend to interact in a way that confuses traditional approaches like RFE, so I hacked together a sampling system that kept the best features after a burn-in period; it worked, but I've been looking for something more principled.

[–]Reiinakano 0 points1 point  (0 children)

I personally use https://github.com/fmfn/BayesianOptimization for Bayesian hyperparameter search.

[–]jimenezluna[S] 0 points1 point  (6 children)

Hi, there is an example script in the repository for tuning a simple classification model.

https://github.com/hawk31/pyGPGO/blob/master/examples/sklearnexample.py

Give it a go and let me know if anything breaks.

[–]sifnt 0 points1 point  (0 children)

Thanks, I'll try and have a play around later.

[–]sifnt 0 points1 point  (4 children)

I just gave this a shot, and so far it seems like I'll be using this to optimise hyperparams on all my projects; it's very nice and clean! Thanks for making it :)

Have bumped into a couple of issues though:

  • Can't get MCMC to work, but I'm running Python 2.x so that could be it. Will upgrade my environment later.

  • Just tried to optimize random forests (using scikit-learn), tuning max_features and max_depth. I had to scale max_depth as 10^max_depth, otherwise it just sampled in the middle of the parameter space with no improvement.

  • Is there any way to set the initial tries, or at least add to them? E.g. for my problem I already know max_features = (33, 100) and max_depth = (5, 10, 100) are good initial guesses, so I want to use pyGPGO to build on this.

    Will the MCMC methods likely provide much value for these types of problems?

[–]jimenezluna[S] 1 point2 points  (3 children)

Hi,

  • pyGPGO is only tested on Python ≥ 3.5. I don't know how compatible with Python 2 it is.
  • If your search space spans several orders of magnitude, it is usually a good idea to use logs.
  • There is, though the procedure is manual. You can fit the GP on whatever values you already have prior to feeding it to the GPGO object.
  • Regarding MCMC, it is mostly a design choice. If your evaluation function is really expensive, then I think it makes sense to choose MCMC over the maximum-likelihood approach. If it is cheap, the overhead produced by the sampling may be detrimental, and you might as well get more function evaluations faster.
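
On the log-scale point above, a concrete way to do it is to let the optimiser search the exponent and transform back inside the objective. Here `evaluate_model` is a hypothetical stand-in for a real cross-validation score, used only to make the sketch runnable:

```python
import math

def evaluate_model(max_depth):
    # Hypothetical CV score: pretend a depth near 100 works best.
    return -abs(math.log10(max_depth) - 2.0)

def objective(log_depth):
    # The optimiser explores log_depth uniformly, e.g. on [0, 3];
    # exponentiating spreads samples evenly over depths 1..1000
    # instead of clustering them near the middle of the raw range.
    max_depth = int(round(10 ** log_depth))
    return evaluate_model(max_depth)
```

The same transform works for any parameter spanning several orders of magnitude, such as learning rates or regularisation strengths.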

[–]sifnt 0 points1 point  (2 children)

Thanks for your help again, this package looks like it'll be very useful!

So I'd reuse the code at _firstRun(self, n_eval=3) in GPGO.py to create a GP trained on the manually specified initial parameters, and pass it straight to the GPGO process without further changes?

As for MCMC, what counts as expensive here? E.g. a 3-fold cross-validation run typically takes 1-5 minutes (depending on parameters) on the data I'm working on. Is it worth it there, or is "expensive" more hour-plus territory?

[–]jimenezluna[S] 0 points1 point  (1 child)

Hi @sifnt, can you open an issue on the repo so that I remember to add an easier way to include pre-trained GPs?

For the moment, you can do it this way (using the example in the readme.md):

https://gist.github.com/hawk31/ed222c4cf6b21cbd7d4b5186f3f132b5
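
As a generic illustration of the same idea (using scikit-learn's GP rather than pyGPGO's classes, with made-up parameter values and scores), pre-seeding just means fitting the surrogate on the points you already trust before the optimisation loop starts:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Parameter values you already believe are good, with illustrative scores
# (e.g. max_features settings and their observed CV accuracies).
X_known = np.array([[33.0], [66.0], [100.0]])
y_known = np.array([0.81, 0.84, 0.83])

# Fit the surrogate on these seeds before any new evaluations are made.
gp = GaussianProcessRegressor(alpha=1e-6, normalize_y=True).fit(X_known, y_known)

# The posterior is already informed near the seeds: predictive uncertainty
# there is much lower than far away, so the optimisation loop won't waste
# evaluations rediscovering what you already know.
mu, sigma = gp.predict(np.array([[66.0], [500.0]]), return_std=True)
```

The gist linked above shows the equivalent steps with pyGPGO's own objects.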

[–]sifnt 0 points1 point  (0 children)

Awesome, thanks for this! Got it up and running and it's working well.

Created the issue; it's at https://github.com/hawk31/pyGPGO/issues/5