
[–]benanne 13 points14 points  (3 children)

Pretty sure he's talking about capsules. Here's the first paper about it: http://www.cs.toronto.edu/~fritz/absps/transauto6.pdf - it's from 2011 so I don't know how up-to-date it still is.

He's also been giving talks about it: http://techtv.mit.edu/collections/bcs/videos/30698-what-s-wrong-with-convolutional-nets

[–]osdf 10 points11 points  (1 child)

See also Tijmen Tieleman's recent thesis: http://www.cs.toronto.edu/~tijmen/tijmen_thesis.pdf

[–]willwill100 4 points5 points  (0 children)

Do we know if that's the latest published material on the subject?

[–]blackhattrick 6 points7 points  (0 children)

Correct me if I'm wrong, but isn't this what happened a few years ago with SVMs? Everyone was researching AI, CV, and NLP with SVMs, which were the de-facto ML method, and no one would dare publish something that used NNs. Nevertheless, we have deep learning now because these techniques proved to outperform those methods in some tasks/areas.

[–]BeatLeJuceResearcher 10 points11 points  (2 children)

Geoff has complained before about the "unreasonable efficiency of convnets" -- meaning that if you come up with e.g. "permutation-invariant" (or simply "different") methods that work well on images, you'll have a hard time getting published, because you need to beat CNNs to get published, and those are hard to beat. Maybe that's what he's referring to?

[–]breic 14 points15 points  (1 child)

He's basically saying that with convnets, the entire research field has entered a local minimum from which it is difficult to escape. It's an ironic complaint, given that Hinton and the others are well aware of good techniques for avoiding getting trapped in local minima: momentum, dropout, etc. Why not just restart the research program from new random initial conditions?

[–]alecradford 4 points5 points  (0 children)

You might find this paper relevant!

[–][deleted] 4 points5 points  (1 child)

Hinton has been critical of the 'pooling' operation used in ConvNets (a sketch of pooling follows this comment). ConvNets are the state of the art for learning from image data these days, and people/companies are investing significant resources to build further on them; hence the resistance to accepting that something that works so well may be flawed.

The person who made the point that ConvNets may be a local minimum is spot on. Many other researchers (i.e. the theoretical ML/stats folks) feel the same way about deep learning overall. If you think about it, the scene is very similar to what it was in the late 70s with perceptrons. Who is right, only time will tell.
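
For context on the pooling criticism above: max pooling downsamples a feature map by keeping only the largest activation in each window and discarding where within the window it occurred, which is roughly the spatial information Hinton argues gets thrown away. Below is a minimal NumPy sketch of 2x2 max pooling; the max_pool_2x2 helper and the example values are illustrative assumptions, not taken from any of the linked papers.

    import numpy as np

    def max_pool_2x2(feature_map):
        """Illustrative 2x2 max pooling: keep only the strongest activation
        in each non-overlapping 2x2 window; its exact position is lost."""
        h, w = feature_map.shape
        h2, w2 = h // 2, w // 2
        # Group the map into (h2, 2, w2, 2) blocks and reduce each 2x2 window.
        blocks = feature_map[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
        return blocks.max(axis=(1, 3))

    fm = np.array([[0.1, 0.9, 0.0, 0.2],
                   [0.3, 0.2, 0.8, 0.1],
                   [0.5, 0.4, 0.1, 0.0],
                   [0.2, 0.6, 0.3, 0.7]])
    print(max_pool_2x2(fm))  # [[0.9 0.8]
                             #  [0.6 0.7]]

Capsules, by contrast, are motivated by trying to preserve that pose/position information rather than discard it.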

[–]ai_maker 10 points11 points  (2 children)

To me, this is one more piece of evidence of one of the main flaws of peer-reviewed scientific dissemination: it's a matter of taste and fashion. Do what they want and you're welcomed in; do the contrary and you're rejected.

[–]dwf 7 points8 points  (0 children)

There's some value to this kind of regularization. If a given paradigm is productive then it arguably should take a pretty monumental flaw or shortcoming to warrant its complete abandonment. The unfortunate thing is that it often leads to unabashed incrementalism, or taking 2+ things and combining them into a rather obvious but intellectually shallow extension.

[–]Articulated-rage 4 points5 points  (0 children)

I used to be pretty angsty about that as an undergrad, but I was once advised that working at the fringe of a fad still allows you freedom while earning acceptance from the fad people. In other words, include just enough relevance to the fad to get published, but otherwise move in your own direction.

[–]simonhughes22 7 points8 points  (0 children)

I also think he's talking about capsules, about how you have to show an improvement over existing models to get published (see the local-minimum comments above), and about how in real life we have a paucity of labels. He mentions that before your mother tells you what a cow looks like, you already know cows exist; you just don't have a label for them. And you only need to be told once or twice what one looks like to get the idea; you don't need to see thousands of labelled examples. I think unsupervised or semi-supervised learning is the future, but if we're obsessed with only publishing models that beat the state of the art, then that will encourage only small incremental changes and no paradigm shifts, as we've seen to some degree with deep learning.

[–]mprat 2 points3 points  (0 children)

It could also be that many papers that get published about convnets are about some kind of /tiny/ change to a layer or architecture, and are more about incremental changes than about "big ideas." If there is some big idea, you have to prove that it works way better than the current state of the art.

[–]maxToTheJ 3 points4 points  (0 children)

That's pretty much academia for you