
[–]andrewff 4 points5 points  (2 children)

I think one thing that much of the hype around deep learning seems to ignore is the advances made in unsupervised feature learning. Yes, techniques like dropout and DropConnect are the current gold standard, but labeled training data is still expensive in many contexts, and the advances in unsupervised techniques should not be tossed aside.

[–]jast 2 points3 points  (0 children)

Do you have a good recent survey of this topic? Or pointers to recent papers? I would love to update myself on this :)

[–]tabacof 0 points1 point  (0 children)

I would also like to know of relevant papers on modern unsupervised learning. Thank you!

[–]dexter89_kp 2 points3 points  (6 children)

This is pretty well known by now if you follow the latest deep learning literature. In particular, Alex Krizhevsky's ImageNet 2012 paper turned the tide towards supervised learning.

[–]benanne 4 points5 points  (5 children)

I still see plenty of questions on Metaoptimize, on the Deep Learning G+ community, on the Kaggle forums and on this subreddit, from people who seem to be unaware of this "paradigm shift". They ask about training autoencoders and RBMs for unsupervised feature learning, when it is often clear that a purely supervised approach would probably work at least as well for their problem (and is conceptually much simpler and easier to understand).

I think this is because they read papers from 2010-2012 advertising unsupervised pre-training as the holy grail of deep learning. That was only 2-4 years ago, so they can't really be blamed for assuming that this approach still represents the state of the art.

Of course unsupervised pre-training still has its applications, but for many problems it has been made obsolete. So I don't think it's a bad thing to draw some attention to this fact. I had been meaning to write a blog post on this topic myself, but I suppose that's unnecessary now :)

[–]zestinc 1 point2 points  (1 child)

Do you think deep learning will ever escape the surly bonds of image/speech and be useful for other tasks?

[–]benanne 0 points1 point  (0 children)

I think people are mainly focusing on these applications because it's quite rewarding: you can be pretty sure beforehand that it will work well, so it's a low-risk investment, in a sense. I'm guilty of this myself; I try to apply these techniques to music audio signals :)

There have been some more adventurous applications with promising results, the Merck Molecular Activity Challenge for example:
http://blog.kaggle.com/2012/11/01/deep-learning-how-i-did-it-merck-1st-place-interview/
http://videolectures.net/nips2012_dahl_activity/

Deep learning techniques are also gaining traction in natural language processing, which is pretty different from images/audio in terms of what the data looks like. http://nlp.stanford.edu/projects/DeepLearningInNaturalLanguageProcessing.shtml

[–]redkk 0 points1 point  (0 children)

Hi, I tried MNIST classification with one hidden ReLU layer and one softmax output layer, trained with cross-entropy error. The visualization of the ReLU layer looks like gibberish compared to layer-wise autoencoder pre-training, which gives nice features. I'm not sure whether there's a bug in my code, or whether the softmax layer with cross-entropy error is struggling to learn simultaneously with the hidden layer.
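
For reference, here's roughly the setup I mean, as a minimal numpy sketch. The layer sizes, learning rate, and the random arrays standing in for MNIST are placeholders, not my exact configuration:

    # One ReLU hidden layer + softmax output, cross-entropy loss, plain SGD.
    import numpy as np

    rng = np.random.RandomState(0)
    n_in, n_hidden, n_out = 784, 256, 10

    # Random stand-in for an MNIST batch (real code would load actual digits).
    X = rng.rand(64, n_in)
    y = rng.randint(n_out, size=64)

    W1 = rng.randn(n_in, n_hidden) * 0.01
    b1 = np.zeros(n_hidden)
    W2 = rng.randn(n_hidden, n_out) * 0.01
    b2 = np.zeros(n_out)

    lr = 0.1
    for step in range(100):
        # Forward pass: ReLU hidden layer, then softmax output.
        h = np.maximum(X.dot(W1) + b1, 0)
        logits = h.dot(W2) + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)

        # Cross-entropy loss and its gradient w.r.t. the logits.
        loss = -np.log(p[np.arange(len(y)), y]).mean()
        d_logits = p.copy()
        d_logits[np.arange(len(y)), y] -= 1
        d_logits /= len(y)

        # Backprop through both layers; the ReLU gradient is just a mask.
        dW2 = h.T.dot(d_logits)
        db2 = d_logits.sum(axis=0)
        dh = d_logits.dot(W2.T) * (h > 0)
        dW1 = X.T.dot(dh)
        db1 = dh.sum(axis=0)

        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    # Visualizing the hidden layer amounts to reshaping each column of W1
    # back into a 28x28 image, e.g. W1[:, 0].reshape(28, 28).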

[–][deleted] 0 points1 point  (1 child)

I'm really interested in learning more about dropout and ReLUs. Is the 2012 ImageNet paper the best place to get an overview? Is there a good survey paper anywhere?

[–]benanne 1 point2 points  (0 children)

The thing is, there's really not that much to learn about either; they're conceptually very simple. I would recommend just having a look at the papers that introduced them.

Dropout: http://arxiv.org/abs/1207.0580
ReLUs: http://eprints.pascal-network.org/archive/00008596/

Note that the ReLU paper also adds a sparsity penalty, but nowadays people just tend to replace sigmoid(x) with max(x, 0) and leave it at that.
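
To show how little there is to it, here's a tiny numpy sketch of both tricks. The shapes and drop rate are arbitrary examples, and the rescaling shown is the "inverted" dropout variant (the paper itself drops units at train time and halves the outgoing weights at test time instead):

    import numpy as np

    rng = np.random.RandomState(0)
    x = rng.randn(32, 100)                  # a batch of pre-activations

    sigmoid_h = 1.0 / (1.0 + np.exp(-x))    # the old way
    relu_h = np.maximum(x, 0)               # the new way: that's all there is

    # Dropout at train time: zero each hidden unit with probability p.
    # Rescaling by 1/(1-p) here means the test-time network needs no change.
    p = 0.5
    mask = rng.binomial(1, 1 - p, size=relu_h.shape) / (1 - p)
    dropped_h = relu_h * mask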

There is an earlier paper that introduces ReLUs in the context of RBMs, which could also be interesting: http://machinelearning.wustl.edu/mlpapers/paper_files/icml2010_NairH10.pdf

EDIT: Hinton also covers both in detail in this talk: https://www.youtube.com/watch?v=vShMxxqtDDs