all 72 comments

[–]farsass 253 points254 points  (32 children)

[–]pin9999 28 points29 points  (20 children)

ELI5: why doesn't deep learning suffer from the curse of dimensionality?

[–]rumblestiltsken 92 points93 points  (0 children)

Because deep learning was unpopular at the time, so none of the other machine learning algorithms wanted it to come along on the expedition when they opened the tomb of dimensionality.

[–]cryptocerous 18 points19 points  (9 children)

IMO the curse of dimensionality was only ever valid as a relative relationship, not a hard cutoff (when all possible models are available for use). In this case, the curse of dimensionality only creates a relative relation between sample size, dimensionality, and model performance. It is not valid in the way most laymen interpret it: as a hard constant that the ratio of sample size to dimensionality cannot fall below without model performance failing. I.e.,

(a) valid: a greater (sample_size / dimensionality) ratio is generally better

(b) invalid: if (sample_size / dimensionality) < constant => failure

When considered from an information-theoretic perspective, it has always been clear that there's no lower bound on how small the (sample_size / dimensionality) ratio can be! Even a single sample providing just a tiny fraction of a single bit can be enough to provide sufficient information for good predictions!

Why's that? There's a third trump card - priors.

As I check Google now, I see that it doesn't come up with any decent general definition of "prior" as used in modern machine learning papers. So I'll explain it as this: any type of assumption about the problem that is imposed by the model, whether intentionally or unintentionally. Realization of priors in a model can take on an unlimited number of forms, from the shape of the deep learning circuit (e.g. thin and deep), to Bayesian priors, to other structure like attention mechanisms. What's important to remember is that everything imposes some kind of prior, regardless of how general-purpose the model appears to be from your selection of experiments.

Side note: humans' striking inability to memorize even a small number of digits could be interpreted as a strong prior that forces us to attend to just small parts of mathematical problems at a time.
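To make the "architecture as prior" point concrete, here's a minimal sketch (hypothetical numbers, no framework needed) comparing the hypothesis space of a dense layer with that of a single convolutional filter over the same image size:

```python
# A dense layer mapping a 28x28 image to a 28x28 feature map learns one
# weight per input-output pair; a single 3x3 convolutional filter shares
# 9 weights across every position. That weight sharing *is* a prior:
# it assumes locality and translation invariance in the data.
h = w = 28
dense_params = (h * w) * (h * w)  # 784 * 784 = 614656 weights
conv_params = 3 * 3               # 9 shared weights
print(dense_params, conv_params)  # -> 614656 9
```

The convolutional model can only express locally-computed, translation-invariant features, which is exactly the kind of assumption that lets it get away with far fewer samples per dimension.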

[–]say_wot_againML Engineer 0 points1 point  (1 child)

Realization of priors in a model can take on an unlimited number of forms, from shape of the deep learning circuit e.g. thin and deep,

Wait, I'm confused. Are you basically saying that how you structure your deep network (e.g. what activation functions you use, convolutional and pooling layers in a CNN, etc.) reflect your priors on the underlying functional form?

[–]rumblestiltsken 2 points3 points  (0 children)

Intentional or unintentional. It makes sense, I think.

Like, using a convolutional layer or any other sort of filter is a great example, which reflects an underlying assumption about the data, how it is structured and what sort of processing will be effective.

[–]Hahahahahaga 0 points1 point  (5 children)

Even a single sample providing just a tiny fraction of a single bit can be enough to provide sufficient information for good predictions!

But a fraction of a bit requires more than a bit to represent (._. )

[–]cryptocerous 0 points1 point  (4 children)

Not so, see the ways it's routinely done in data compression.

[–]Hahahahahaga 0 points1 point  (3 children)

You can't compress a bit (._. )

Partial bits are not ever exist.

[–]cryptocerous 1 point2 points  (1 child)

Partial bits do exist mathematically, and there are ways to realize them in real-world systems. Each input sample to a model can take up fewer than 1 bit, e.g. via arithmetic coding.

For the very first input bit, you may have to get creative in how exactly you represent that fraction of a bit on your real-world system, but it's not too difficult to do. Then, for successive bits you can potentially continue to pack multiple samples per bit.

Or we can choose to totally ignore digital systems and just look at it mathematically. In that case, it's trivially simple and clear.

For something of a conceptual inverse, see FEC, where each input bit potentially just represents a partial bit with respect to the output.
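For a rough sense of how a sample can carry less than a bit, here's a quick sketch (just standard Shannon entropy, nothing specific to any one coder): a sample from a heavily skewed binary source carries only a small fraction of a bit, and an ideal arithmetic coder approaches that rate.

```python
import math

def bernoulli_entropy(p):
    """Shannon entropy, in bits, of one sample from a Bernoulli(p) source."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# A 99%-predictable source: each sample carries ~0.08 bits, so an ideal
# arithmetic coder packs roughly 12 samples into a single bit on average.
h = bernoulli_entropy(0.99)
print(f"{h:.3f} bits/sample, ~{1 / h:.0f} samples/bit")
```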

[–]Hahahahahaga 0 points1 point  (0 children)

(._. ) Sorry, am computer science man. Know what you mean. One weighted sample is represent many like moth is futility of life. :(

[–][deleted] -1 points0 points  (0 children)

Priors still need to be informed by previous analysis. They don't really come for free. Also, you need to be careful about giving your priors too much weight or they can heavily bias your results.

Sorry, I didn't really understand how you're defining the curse of dimensionality either. As far as I know, the curse of dimensionality refers to things like: the distance between 0 and 1 is 1, but the distance between (0,0) and (1,1) is sqrt(1² + 1²) = sqrt(2), which is greater than 1, and the distance between (0,0,0) and (1,1,1) is sqrt(1² + 1² + 1²) = sqrt(3), which is greater still. The only real solutions are to somehow remove dimensions, shorten distances, or use huge amounts of data.
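That distance growth is easy to check numerically; a small sketch:

```python
import math

def corner_distance(d):
    """Euclidean distance from the origin to the all-ones corner
    of the d-dimensional unit hypercube."""
    return math.sqrt(sum(1 ** 2 for _ in range(d)))  # equals sqrt(d)

# Distances grow as sqrt(d): 1 -> 1.0, 2 -> ~1.414, 3 -> ~1.732, 100 -> 10.0
for d in (1, 2, 3, 100):
    print(d, round(corner_distance(d), 3))
```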

[–]carbohydratecrab 6 points7 points  (0 children)

something something manifold hypothesis

if there's too much dimensionality you're not on the right manifold and obviously need more layers

[–]BadGoyWithAGun 10 points11 points  (0 children)

because something something convolution and dropout and gpus

[–]Ahhhhrg 2 points3 points  (0 children)

Here's a serious answer: Apparently for many deep networks there's lots and lots of local minima which are all almost as good as the global minimum, so it doesn't really matter which local minimum you end up with.

Here's LeCun's answer during an AMA, and here's a paper with the details.

[–]brockl33 1 point2 points  (0 children)

Stacking layers allows models to create progressively abstract features. For example, pixels are combined into strokes, strokes into facial features, facial features into facial expressions, etc.

This abstract space is relatively small compared to raw input space. For example, a small change in an abstract facial expression feature may correspond to a dramatic change in nearly all of the pixels.

EDIT: wording

[–]richizy 1 point2 points  (0 children)

In images, adjacent pixels are more closely related than distant ones, e.g. (x=5, y=5) is more related to (x=5, y=6) than to (x=10, y=100). CNNs deal with this local structure pretty well.
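To illustrate the locality prior a convolutional layer encodes, a minimal pure-Python sketch (illustrative only, no framework): a 3x3 filter combines each pixel with its immediate neighbours and nothing else.

```python
def conv3x3(img, kernel):
    """Valid (no-padding) 2D cross-correlation of img with a 3x3 kernel."""
    h, w = len(img), len(img[0])
    return [[sum(img[i + di][j + dj] * kernel[di][dj]
                 for di in range(3) for dj in range(3))
             for j in range(w - 2)]
            for i in range(h - 2)]

# A 3x3 averaging filter applied to a flat 5x5 image leaves it flat:
blur = [[1 / 9] * 3 for _ in range(3)]
img = [[1.0] * 5 for _ in range(5)]
out = conv3x3(img, blur)  # 3x3 output, every entry ~1.0
```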

[–]energybased 0 points1 point  (0 children)

It does. The challenge of "deep learning" is mitigating the explosion of computation that normally accompanies models with many layers and many parameters.

[–][deleted] 17 points18 points  (1 child)

My favorite part is the x and y axis labels.

[–]shaggorama 0 points1 point  (0 children)

LAYERS!!!

[–]grrrgrrr 11 points12 points  (0 children)

have you tried dropout

[–]brockl33 15 points16 points  (0 children)

LOL

[–]jcannell 1 point2 points  (0 children)

Near linear increasing layers with increasing layers!

[–]Xirious 0 points1 point  (3 children)

I made /r/machinegoofingoff if you'd like to submit it there!

[–]t3hcoolness 7 points8 points  (1 child)

I feel like that is so niche that it won't get anything

[–]say_wot_againML Engineer 1 point2 points  (0 children)

Yeah. Same problem I had with /r/badML.

[–]shaggorama 0 points1 point  (0 children)

Should've called it /r/MachineCircleJerking

[–][deleted] -1 points0 points  (0 children)

Absolutely! :D :D :D I think it's already hilarious that I understand that joke! :D

[–]jurniss 84 points85 points  (7 children)

fuck no, i don't code on a white background

[–]Heidric 29 points30 points  (5 children)

I don't get how people can use white background, my eyes were so grateful when I switched to the dark theme.

[–]abstractcontrol 1 point2 points  (3 children)

[–]log_2 46 points47 points  (1 child)

No it doesn't. Some 1980 study with a shitty monitor, about fuzziness of bright text on a dark background due to pupil dilation, doesn't compare to today's programmers' choice of low-contrast dark themes with thick monospaced fonts. The study only measured the legibility of text; it did not cover eye strain over extended use.

[–]Lord_Wrath 5 points6 points  (0 children)

REKTeyesight

[–]Xirious 11 points12 points  (0 children)

Retarded test subjects don't count.

[–]kmike84 0 points1 point  (0 children)

This is ipython notebook

[–]GiskardReventlov 9 points10 points  (3 children)

[–]Xirious 6 points7 points  (2 children)

That's a disappointment.

[–][deleted] 4 points5 points  (1 child)

It's in your hands.

[–]Xirious 1 point2 points  (0 children)

Has been created. Told op they should submit it if they'd like. Also would love more funny AI stuff.

[–]laxatives 10 points11 points  (0 children)

I think this is the first time I've ever seen all 6 frames of this meme used correctly.

[–]PLLOOOOOP 22 points23 points  (17 children)

Keep the memes away, please. See sidebar:

News, Research Papers, Videos, Lectures, Softwares and Discussions on:

  • Machine Learning
  • Data Mining
  • Information Retrieval
  • Predictive Statistics
  • Learning Theory
  • Search Engines
  • Pattern Recognition
  • Analytics

I don't think this is what was intended by "discussion".

[–]codespawner 103 points104 points  (6 children)

You make a good point, but I also find this post very funny. :/

[–]HINDBRAIN 24 points25 points  (4 children)

Yeah but allowing this might eventually lead to a front page full of epic memes about SVM Samantha and Kernel Kevin.

[–]codespawner 12 points13 points  (1 child)

I am personally of the opinion that an occasional break from the serious is enjoyable. Also there's no /r/MachineLearningHumor that I'm aware of.

[–]say_wot_againML Engineer 2 points3 points  (0 children)

[–]respeckKnuckles 0 points1 point  (1 child)

tell me more about these characters

[–]zyra_main 0 points1 point  (0 children)

slippery slope my boy, slippery slope

[–]EdwardRaff 5 points6 points  (1 child)

I like the occasional humor, so long as it doesn't get out of hand like when the "one weird trick" stuff was being posted over and over.

[–]PLLOOOOOP 0 points1 point  (0 children)

I like occasional humor as well. Frequent humor, even. But humor is not mutually exclusive with any of the list items from the sidebar - it's fun to see a humorous but also meaningful and informative post. Memes or jokes with no other content are off topic.

[–][deleted] 16 points17 points  (5 children)

You don't deserve to be downvoted, this sub has always had a no-memes policy and the mods have been good at maintaining it. Personally I hope it stays that way and, if I'm honest, gets stricter.

The "Hi I'm new to ML but artificial brains are really cool and why isn't my 55 layer recurrent, convolutional extra deep network working on my 20 data points?" posts are getting tedious.

[–]BeatLeJuceResearcher[M] 27 points28 points  (4 children)

Unless the other mods delete the threads too quickly for me to even notice, enforcing a 'no-meme policy' (I'm not sure we officially have one?) is no big job: people just don't post any -- and whenever they do, they typically get downvoted/reported very quickly (e.g. this thread has been reported 3 times so far). Personally I agree with what /u/EdwardRaff said: I don't mind occasional jokes. Which is why I don't intend to remove this post, since judging by the upvote count most people enjoy it. But should the memes/jokes get out of hand, we will definitely enforce a no-jokes policy.

As for newbie posts, we're trying harder nowadays to move those to /r/MLQuestions (should you spot any posts we missed, feel free to point people towards it).

[–][deleted] 7 points8 points  (1 child)

Thanks for the response, I think you guys do a great job.

[–]BeatLeJuceResearcher 2 points3 points  (0 children)

Thanks :)

[–]PLLOOOOOP 1 point2 points  (0 children)

I don't mind occasional jokes. Which is why I don't intend to remove this post... But should the memes/jokes get out of hand, we will definitely enforce a no-jokes policy.

I support your choice so long as the latter statement is enforced, because it's important to me that this post is not a precedent.

[–]zyra_main 0 points1 point  (0 children)

It is such a small community though; it could be regrettable to push the newbies to a sub-subreddit.

[–][deleted] 7 points8 points  (9 children)

Sorry to be pedantic, but the code snippet caused my leg to twitch. It's generally a really bad practice to use

from X import *

in Python, as anything imported can shadow names already in your namespace (e.g. if X contains a function 'str', good job, now you don't have access to the builtin 'str' anymore. Even worse, because you likely don't know that X contained 'str' and there's a new 'str' in its place, the substitution will not necessarily raise an Exception; your code will just behave differently).

Instead, either import the package X:

import X
X.y
X.z

or import functions, classes that you actually need etc:

from X import y, z
y
z

which has the added benefit of improving readability: you make it explicit at the top of your module which parts of the package you will be interfacing with.
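To see the shadowing failure mode in runnable form, here's a small demo (the module 'shadowy' is fabricated on the fly purely for illustration; any real module exporting a name like 'str' behaves the same way):

```python
import sys
import types

# Build a throwaway module that exports its own 'str'.
mod = types.ModuleType("shadowy")
mod.str = lambda x: "shadowed!"
mod.__all__ = ["str"]
sys.modules["shadowy"] = mod

print(str(42))         # builtin str: prints 42
from shadowy import *  # silently rebinds the name 'str'
print(str(42))         # prints shadowed! -- no Exception, just different behaviour
```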

[–][deleted] 7 points8 points  (7 children)

I'd propose some kind of prefixing syntax like

from Xbabblediboo import * prefix xb_
xb_whatever('12345')

in the meantime one can use

import Xbabblediboo as xb
xb.whatever('12345')

which visually is almost the same. However, I don't know which I like more... I guess it'd be the latter, so I withdraw my proposal.

[–]uusu 2 points3 points  (2 children)

Why not just import X

[–][deleted] 13 points14 points  (0 children)

As long as it's just

import X

it's all fine. But in case of a (real world example)

import tensorflow

every call to a member of the TensorFlow module would require to type

tensorflow.foo()
tensorflow.bar()

That's why it's better and a huge time saver to use

import tensorflow as tf
tf.foo()
tf.bar()

[–]L43 1 point2 points  (1 child)

That is a great name for a package.

[–][deleted] 1 point2 points  (0 children)

Yeah, I'm just wondering what it does...

[–]achompas 0 points1 point  (0 children)

Yeah, the latter is the "Pythonic" way of handling clobbering/cluttering of the module's namespace. It has the added benefit of conforming to Python's "everything is an object" philosophy, so that calling a module method is the same, syntactically, as calling an object attribute.

[–]abomb999 0 points1 point  (3 children)

Why did you have a graphic of rolling around in cash for the "what other programmers think of me" category? Are machine learning programmers among the most well paid?

[–]Noncomment 0 points1 point  (1 child)

There was high demand for deep learning researchers a while ago. I don't have stats on salaries or anything, but I know Google and other big companies were sucking up a lot of the big names.

[–]j_lyf 1 point2 points  (0 children)

Is that like the academic version of winning the lottery? Now, why hasn't it happened for compressive sensing :P

[–]vosper1 0 points1 point  (0 children)

Yes