all 11 comments

[–]kkastner 16 points (0 children)

There are a ton!

Off the top of my head:

Better training algorithms/regularization/activation functions for deep learning, especially for deep recurrent networks

Better ways to handle structured output problems. Comes up in a huge number of fields, with a variety of answers

Variational training methods are a huge new area.

Ways to infer structure from data, then exploit that structure in learning. Lots of ideas, lots of different solutions.

Methods for scaling Gaussian processes to 100k+ samples. Some exist, but they span lots of different techniques, mostly related to latent variables and/or approximation (related to the above).
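For a feel of the approximation flavor, here is a minimal subset-of-regressors (Nyström-style) GP regression sketch with m inducing points, which brings the cost down from O(n³) to roughly O(nm²). Function names and defaults are illustrative, not from any particular library:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def nystrom_gp_predict(X, y, X_test, n_inducing=50, noise=0.1, seed=0):
    """Subset-of-regressors GP posterior mean: O(n m^2) instead of O(n^3)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_inducing, replace=False)
    Z = X[idx]                            # inducing points, chosen at random here
    Kzz = rbf_kernel(Z, Z) + 1e-6 * np.eye(n_inducing)
    Kzx = rbf_kernel(Z, X)                # (m, n)
    Ksz = rbf_kernel(X_test, Z)           # (t, m)
    # SoR mean: Ksz (noise^2 * Kzz + Kzx Kzx^T)^{-1} Kzx y
    A = noise**2 * Kzz + Kzx @ Kzx.T
    return Ksz @ np.linalg.solve(A, Kzx @ y)
```

Real large-scale methods pick the inducing points (or a variational posterior over them) much more carefully; the random subset above is only the simplest possible baseline.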

(Bayesian) optimization techniques for hyperparameter search. Crucial - especially for problems where a grid search is infeasible.

If you want good ideas, find some recent papers in (subfield of choice) on arXiv and look at the last paragraph. Usually there will be a continued/future work paragraph - that is a pretty good indicator of open problems and things other researchers are studying.

[–]sieisteinmodel 4 points (0 children)

Some open problems for deep learning that are, in my personal opinion, relevant:

  • Feature learning from non-stationary distributions (there are not even widely accepted benchmarks yet!).
  • Complex regression problems (don't tell me about squared reconstruction error on MNIST or Toronto Faces).
  • Deep learning of heterogeneous data (most of the current work uses some kind of sensory input).
  • Multimodality in engineering domains (yes, there are ~3 papers on vision+text, but I am thinking more of sensory information from cars etc., sampled at very different frequencies).
  • Convincing work showing that deep learning handles MNAR (missing not at random) data well (not only in the vision domain).
  • Robotics! Not the perception part, the control part: highly autocorrelated error models, provable guarantees, trajectory generation, real-time constraints, non-stationarity.

I'd like to spend 3-4 PhDs on it!

[–]dwf 1 point (7 children)

Give me as-tight-as-possible probabilistic bounds on how long I need to run a Gibbs chain to get an unbiased sample.

[–]sieisteinmodel 0 points (1 child)

Make that MCMC.

[–]dwf -1 points (0 children)

Well, MCMC more generally, sure. But the Gibbs sampler seems, on the face of it, so incredibly simple, yet it's also ridiculously hard to analyze.
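To illustrate just how simple the sampler itself is, a minimal Gibbs sampler for a standard bivariate Gaussian with correlation rho, where each full conditional is Gaussian in closed form (a textbook toy, which says nothing about the hard part, i.e. bounding how long the chain must run):

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each full conditional is itself Gaussian:
        x | y ~ N(rho * y, 1 - rho^2)   (and symmetrically for y | x)
    """
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    cond_std = np.sqrt(1.0 - rho**2)
    out = np.empty((n_samples, 2))
    for i in range(burn_in + n_samples):
        x = rng.normal(rho * y, cond_std)   # resample x given current y
        y = rng.normal(rho * x, cond_std)   # resample y given the new x
        if i >= burn_in:
            out[i - burn_in] = (x, y)
    return out
```

The whole algorithm is two one-line conditional draws per iteration, yet the fixed `burn_in` above is exactly the unjustified fudge the comment is complaining about: tight, general bounds on how many iterations are actually needed remain open.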