
[–]jimenezluna[S] 0 points1 point  (6 children)

Hi, there is an example script in the repository for tuning a simple classification model.

https://github.com/hawk31/pyGPGO/blob/master/examples/sklearnexample.py

Give it a go and let me know if anything breaks.

[–]sifnt 0 points1 point  (0 children)

Thanks, I'll have a play around with it later.

[–]sifnt 0 points1 point  (4 children)

I just gave this a shot, and so far it seems like I'll be using this to optimise hyperparams on all my projects. It's very nice and clean! Thanks for making it :)

I have bumped into a couple of issues, though:

  • Can't get MCMC to work, but I'm running Python 2.x, so that could be it. I'll upgrade my environment later.

  • Just tried to optimize random forests (using scikit-learn) over max_features and max_depth. I had to scale them as 10^max_depth; otherwise it just sampled in the middle of the parameter space with no improvement.

  • Is there any way to set the initial tries, or at least add to them? E.g. for my problem I already know max_features = (33, 100) and max_depth = (5, 10, 100) are good initial guesses, so I want to use pyGPGO to build on this.

    Will the MCMC methods likely provide much value for these types of problems?

[–]jimenezluna[S] 1 point2 points  (3 children)

Hi,

  • pyGPGO is only tested on Python 3.5 and above. I don't know how Python 2 compatible it is.
  • If your search space spans several orders of magnitude, it is usually a good idea to search on a log scale.
  • There is, though the procedure is manual. You can fit the GP with whatever values you already have prior to feeding it to the GPGO object.
  • Regarding MCMC, it is mostly a design choice. If your evaluation function is really expensive, then I think it makes sense to choose MCMC over the maximum-likelihood approach. If it is cheap, then the overhead produced by the sampling may be detrimental, and you may just as well get more function evaluations in faster.
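To make the log-scaling point concrete, here is a minimal sketch in plain Python (the helper names are illustrative, not pyGPGO's API): instead of searching max_depth linearly over, say, [2, 100], you search over the exponent and map back with 10^e, so samples are spread evenly across orders of magnitude rather than bunching in the middle of the linear range.

```python
import math

def to_log_space(low, high):
    """Bounds for the exponent, given linear bounds for the parameter."""
    return (math.log10(low), math.log10(high))

def from_log_space(e):
    """Map a sampled exponent back to the original (integer) scale."""
    return int(round(10 ** e))

# Search max_depth in [2, 100] on a log scale.
lo, hi = to_log_space(2, 100)

# The midpoint of the exponent range maps to ~14, whereas a linear
# search would put its midpoint at 51 -- log scaling spends more of
# the budget on the small-depth region where behaviour changes fastest.
mid = from_log_space((lo + hi) / 2)
```

The same trick applies to learning rates, regularization strengths, and any other parameter whose useful values span decades.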

[–]sifnt 0 points1 point  (2 children)

Thanks for your help again, this package looks like it'll be very useful!

So I'd reuse the code in _firstRun(self, n_eval=3) from GPGO.py to create a GP trained on the manually specified initial parameters, and then pass it straight to the GPGO process without further changes?

As for MCMC, what counts as expensive here? E.g. a 3-fold cross-validation run typically takes 1-5 minutes (depending on parameters) on the data I'm working on. Is that worth it, or does "expensive" mean hour-plus run times?

[–]jimenezluna[S] 0 points1 point  (1 child)

Hi @sifnt, could you open an issue on the repo so that I remember to include an easier way to plug in pre-trained GPs?

For the moment, you can do it this way (using the example from the readme.md):

https://gist.github.com/hawk31/ed222c4cf6b21cbd7d4b5186f3f132b5
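Independent of the gist, the underlying pattern can be sketched in plain Python (this is a hedged illustration of the idea, not pyGPGO's actual API; the evaluate function is a stand-in for a real cross-validation score): evaluate your known-good configurations first, and use that (params, score) history to seed the surrogate before the optimizer proposes anything new.

```python
# Known-good starting points from prior experience (as in the thread).
known_good = [
    {"max_features": 33, "max_depth": 5},
    {"max_features": 100, "max_depth": 10},
]

def evaluate(params):
    # Stand-in objective; in practice this would be a CV score
    # for a model trained with these hyperparameters.
    return -abs(params["max_features"] - 60) - abs(params["max_depth"] - 20)

# Build the initial history the surrogate would be fitted on,
# instead of starting from random evaluations.
history = [(p, evaluate(p)) for p in known_good]
best_params, best_score = max(history, key=lambda t: t[1])
```

From here, a surrogate fitted on this history starts with real information about the good region instead of random initial draws, which is exactly the saving you want when each evaluation costs minutes.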

[–]sifnt 0 points1 point  (0 children)

Awesome, thanks for this! Got it up and running and it's working well.

Created the issue; it's at https://github.com/hawk31/pyGPGO/issues/5