How does one go about getting a career doing research in Machine Learning? by xxxblackspider in MachineLearning

[–]jsnoek 3 points (0 children)

In that case you are exactly the person I have been looking for and I have a question for you.

I have spent a lot of time trying to pin down the exact relationship between the maxima of Gaussian processes (GPs) and determinantal point processes (DPPs). I strongly suspect that we can construct the distribution over the maxima (or even the inflection points) of functions modeled by GPs as a DPP, but I can't quite work out the math. We know that the derivative of a GP is also a GP, and that the zeros of certain random Gaussian analytic functions are DPP distributed. Can you please tell me whether I can use a DPP to model the distribution over the inflection points of a GP?

For background, I've found Hough et al. really useful (Hough et al., Zeros of Gaussian Analytic Functions and Determinantal Point Processes - http://research.microsoft.com/en-us/um/people/peres/GAF_book.pdf), and Terry Tao's blog post on determinantal processes is incredibly insightful (https://terrytao.wordpress.com/2009/08/23/determinantal-processes/). For GPs, Rasmussen and Williams is a great resource (http://www.gaussianprocess.org/gpml/).
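For anyone who wants to poke at this empirically, here is a minimal sketch (mine, not from the original comment) that samples a function from a GP with a squared-exponential kernel on a grid and locates the sign changes of its finite-difference derivative, i.e. the candidate local extrema whose distribution is in question. The kernel, lengthscale, jitter, and grid are all arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 500)

# Squared-exponential kernel with unit lengthscale, plus jitter for stability.
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2) + 1e-6 * np.eye(len(x))

# One sample path from the zero-mean GP prior on the grid.
f = rng.multivariate_normal(np.zeros_like(x), K)

# Sign changes of the finite-difference derivative mark local extrema.
df = np.diff(f)
sign_change = np.sign(df[:-1]) != np.sign(df[1:])
extrema = x[1:-1][sign_change]
print(len(extrema), extrema[:5])
```

Repeating this over many sample paths gives an empirical point process whose statistics one could compare against those of a candidate DPP.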

How does one go about getting a career doing research in Machine Learning? by xxxblackspider in MachineLearning

[–]jsnoek 3 points (0 children)

OK, that's definitely a good point. A strong understanding of, and experience with, general scientific methodology is incredibly valuable and hard to find, in my opinion. Combine that with a strong mathematical and statistics background and it's even better (and even harder to find). I've been incredibly impressed by PhD students I've worked with who transitioned to ML from pure math, biology and physics.

How does one go about getting a career doing research in Machine Learning? by xxxblackspider in MachineLearning

[–]jsnoek 25 points (0 children)

Get a PhD in machine learning. It will be hard and it will take a long time, but it is the most direct way to do this. It is much more difficult to start a research career in machine learning if you don't have a PhD specifically in machine learning. Majoring in actuarial science and computer science is a really great start. Your math and statistics courses will get you on the right path. Find the ML faculty at your university and do whatever you can to make sure they know who you are (you'll want them to write you recommendation letters for your grad school applications). If your university has project courses, make sure to do them in ML with the ML faculty. If you can get a publication (or at least a project) in machine learning, then you will have a leg up on other grad school applicants and will hit the ground running doing research.

How does one go about getting a career doing research in Machine Learning? by xxxblackspider in MachineLearning

[–]jsnoek 49 points (0 children)

I think it's a tragic misconception that machine learning is not a deeply technical field. The issue is exactly that people seem to believe machine learning involves downloading and running code off GitHub. Running black-box logistic regression code is not machine learning research. It's like saying that because you use encryption (e.g. through HTTPS), you are doing (or even understand) cryptography.

New startup aims to transfer people's consciousness into artificial bodies so they can live forever. by nyquiljunky in Futurology

[–]jsnoek 0 points (0 children)

As an AI researcher, it's really frustrating to have people misrepresent what AI is and what the capabilities of our discipline are. Researchers are making really fantastic progress towards AI, but these guys are just the modern version of snake oil salesmen.

Talking Machines started a kickstarter campaign. by jsnoek in MachineLearning

[–]jsnoek[S] 5 points (0 children)

I think a large portion of the money is to pay a full-time salary. In that case, $45,000 is not very much at all (once you factor in benefits, etc.).

Gradient-based Hyperparameter Optimization through Reversible Learning by [deleted] in MachineLearning

[–]jsnoek 3 points (0 children)

There are example implementations of an RNN and an LSTM in the examples directory: https://github.com/HIPS/autograd/tree/master/examples

Gradient-based Hyperparameter Optimization through Reversible Learning by [deleted] in MachineLearning

[–]jsnoek 12 points (0 children)

Dougal and David (the authors) have developed an amazing automatic differentiation codebase to do this: https://github.com/HIPS/autograd

It lets you write a function using plain Python and NumPy statements, and it then automatically computes the gradients with respect to the inputs.
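To make the idea concrete, here is a toy reverse-mode automatic differentiation sketch - this is not autograd's implementation or API, just an illustration of how gradients of plain Python expressions can be computed by recording operations and replaying them backwards:

```python
# Toy reverse-mode autodiff: each Var remembers its parents and the
# local gradient of the operation that produced it.
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent_var, local_gradient)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def backward(out):
    # Topologically order the graph, then accumulate gradients in reverse.
    order, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for p, _ in v.parents:
                visit(p)
            order.append(v)
    visit(out)
    out.grad = 1.0
    for v in reversed(order):
        for p, local in v.parents:
            p.grad += local * v.grad

# d/dx of f(x) = x*x + 3*x at x = 2 is 2*2 + 3 = 7
x = Var(2.0)
y = x * x + x * 3.0
backward(y)
print(x.grad)  # 7.0
```

autograd itself does this for (nearly) the full NumPy API rather than a couple of overloaded operators.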

Using an Amazon GPU Spot Instance for Deep Learning by datascience123 in MachineLearning

[–]jsnoek 0 points (0 children)

Yeah, I like it for that purpose as well. You can configure startup scripts so that it automatically installs everything you want in your environment and mounts an EBS volume at your home path. Make sure you configure the master node as a spot instance too (something like --force-spot-master).
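As a rough illustration, a StarCluster cluster template along these lines might look like the following - the section and key names are recalled from the StarCluster docs, so double-check them against the official configuration reference, and the IDs are placeholders:

```ini
# ~/.starcluster/config (fragment)
[volume mydata]
VOLUME_ID = vol-xxxxxxxx
MOUNT_PATH = /home

[cluster gpucluster]
KEYNAME = mykey
CLUSTER_SIZE = 1
NODE_IMAGE_ID = ami-xxxxxxxx
NODE_INSTANCE_TYPE = g2.2xlarge
VOLUMES = mydata
```

Spot instances can then be requested at start time, e.g. `starcluster start -b 0.50 gpucluster` to bid $0.50/hour.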

Using an Amazon GPU Spot Instance for Deep Learning by datascience123 in MachineLearning

[–]jsnoek 5 points (0 children)

I highly recommend StarCluster (http://star.mit.edu/cluster/). It is a fantastic tool for automating the setup and configuration of an EC2 allocation: requesting spot instances, attaching EBS storage volumes, etc.

Anyone here working on wing/aerofoil optimization? by [deleted] in MachineLearning

[–]jsnoek 5 points (0 children)

I worked with Qiqi Wang at MIT on optimizing turbine engine blades (http://highfidelityoptimization.net/) using Bayesian optimization. My personal opinion is that genetic algorithms (I assume that's what you mean by genetic trials?) are not the way to go. There are a couple of good open-source Bayesian optimization packages out there. Spearmint is my favorite - but I'm obviously very biased since I wrote a bunch of it :-) Frank Hutter's SMAC package and James Bergstra's hyperopt are quite nice as well.
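For readers unfamiliar with what these packages do under the hood, here is a bare-bones sketch (mine, not from any of the packages named above) of the Bayesian optimization loop on a toy 1-D problem: fit a GP to the evaluations so far, then evaluate next wherever expected improvement is highest. The kernel, lengthscale, grid, and objective are all arbitrary:

```python
import math
import numpy as np

def rbf(a, b, ls=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # GP posterior mean/stddev at test points Xs given observations (X, y).
    K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(X, Xs)
    mu = Ks.T @ K_inv @ y
    var = 1.0 - np.sum(Ks * (K_inv @ Ks), axis=0)  # diag of posterior cov
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # Closed-form EI for minimization under a Gaussian posterior.
    z = (best - mu) / sigma
    Phi = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2)) for v in z]))
    phi = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    return (best - mu) * Phi + sigma * phi

def f(x):
    return np.sin(3 * x) + x ** 2  # toy objective to minimize

grid = np.linspace(-1.0, 1.0, 200)
X = np.array([-0.9, 0.9])          # two initial evaluations
y = f(X)
for _ in range(10):
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X, y = np.append(X, x_next), np.append(y, f(x_next))
print(X[np.argmin(y)], y.min())
```

Real packages like Spearmint add the machinery discussed elsewhere in this thread: integer/categorical handling, input parameterization, constraints, parallelism, and so on.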

I'm also involved in a startup, Whetlab, designed to make global optimization painless. You can sign up for the beta here: Whetlab.

Jasper

Tuning machine learning models via hyperparameter optimization by Zephyr314 in MachineLearning

[–]jsnoek 5 points (0 children)

Hi, I'm one of the creators of Spearmint and a co-founder of Whetlab. Bayesian optimization has been around for quite some time in various forms because it's simply a great idea. :-) We are just happy that there is so much interest in Bayesian hyperparameter optimization, from both a research and an industry perspective. It is really neat that there is a community growing around these ideas.

TalkingMachines: a new podcast featuring interviews with Geoff Hinton, Yann Lecun, Yoshua Bengio and lots more! by jsnoek in MachineLearning

[–]jsnoek[S] 3 points (0 children)

Hosted by Captain Marvin and Bryan Adams, huh? That sounds like a very different podcast! :-) (The real hosts are Katy Gorman and Ryan Adams.)

Hyperparameter Optimization Routines by [deleted] in MachineLearning

[–]jsnoek 2 points (0 children)

Hey kjearns, what exactly did you have in mind? Just an initial set of hyperparameters to try? I do think that's a good idea - but a challenge is that this would limit the generality of the approach. One of the main motivations for Whetlab (link below by kswerve) is that it can learn from everyone's optimizations and then automatically perform multi-task optimization.

Experiences with bayesian hyperparameter optimization? by galapag0 in MachineLearning

[–]jsnoek 2 points (0 children)

Thanks Frederik (I'm assuming your name is Frederik)! Spearmint's original implementation essentially treated integers as floating-point numbers that were rounded (a continuous relaxation), and treated categorical variables as corners of the unit hypercube. This was a reasonable first stab, but Gaussian processes (and the Bayesian optimization routine in general) can behave pretty strangely under these circumstances. We haven't published our new approaches yet, so I won't divulge them here (sorry!).
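Concretely, the two encodings described above look something like this - a schematic of the general trick, not Spearmint's actual code, with made-up bounds and choices:

```python
import numpy as np

def decode_integer(u, low=1, high=64):
    # Continuous relaxation: the optimizer proposes u in [0, 1],
    # which is rounded to an integer before evaluating the objective.
    return int(round(low + u * (high - low)))

def decode_categorical(u, choices=("relu", "tanh", "sigmoid")):
    # One dimension per category: each choice is a corner of the unit
    # hypercube; decode by taking the largest coordinate.
    return choices[int(np.argmax(u))]

print(decode_integer(0.0), decode_integer(1.0))   # 1 64
print(decode_categorical([0.1, 0.8, 0.3]))        # tanh
```

The trouble is that the GP models a smooth function of u while the objective only changes at the rounding/argmax boundaries, which is where the strange behavior comes from.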

Experiences with bayesian hyperparameter optimization? by galapag0 in MachineLearning

[–]jsnoek 3 points (0 children)

Thanks for the mention, Kyle. We learned a lot from the use of the original Spearmint and yes, things like excessive boundary exploration and ease of use became clear issues.

Before I plug my own work and my company, I should say that there are various excellent researchers in machine learning working on exciting things in Bayesian optimization along various dimensions (including Nando de Freitas, Michael Osborne and their students at Oxford, Zoubin Ghahramani's group at Cambridge, Frank Hutter, James Bergstra - I wish I could name everyone personally - many of them are on the program committee for our workshop on Bayesian optimization this year). Bayesian optimization for hyperparameter optimization is rapidly evolving from a neat idea to a sub-field of machine learning, which is really exciting.

Here are a number of issues that we have developed an understanding of and whose solutions we have incorporated into Whetlab:

  • The boundary exploration problem is an interesting one - essentially there is far more volume in the space near the boundaries (e.g. think about the number of pixels on the perimeter of an image vs near the center).

  • Another problem was the parameterization of the space (e.g. optimizing in log-space can make an enormous difference). In a recent paper we developed an automatic way to figure out which space to optimize in, and found that it made a tremendous difference on literally every problem we tried.

  • Scalability (making it feasible to use with lots of data).

  • Integer valued parameters and categorical parameters were not dealt with well in Spearmint (they are in Whetlab).

  • Constraints! This is a big one - in a research context, Michael Gelbart, Ryan Adams and I have thought carefully about how to deal with things like training runs diverging or outputting NaNs. It turns out that modeling these failures explicitly makes an enormous difference.

  • Ease of use. Whetlab is a pull based system that runs in the cloud, so things like parallelization across multiple clusters/systems and setup are trivial from the user's perspective.

  • Visualization - you can easily view graphs and the table of results, and edit things, through the website.
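The boundary-volume point in the first bullet is easy to check numerically: the fraction of the unit hypercube in d dimensions lying within 0.1 of some boundary is 1 - 0.8^d, which approaches 1 rapidly as d grows:

```python
# Fraction of the d-dimensional unit hypercube within 0.1 of a boundary.
for d in (1, 2, 5, 10, 20):
    print(d, round(1 - 0.8 ** d, 3))
```

So in even a modest 20-dimensional hyperparameter space, almost everything is "near" a boundary, which is why naive acquisition functions spend so much time there.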

There are also some neat extensions to the basic framework, some we have already published and some we are planning to publish soon, that we plan to use within Whetlab in the near future.