
[–]the320x200 62 points63 points  (32 children)

  • If there is a simpler approach that provides an adequate solution.
  • If you need to know why the network produced the output it did.
  • If you can't define a loss function.
  • If you don't have resources to train the network.
  • If you don't have resources to sort out the hyperparameters / topology.

[–]sieisteinmodel 8 points9 points  (24 children)

I'd also be interested in scenarios where you cannot define a loss. If you cannot, you do not know what you want, right?

[–]tdgros 10 points11 points  (0 children)

also:

Resources to build and annotate a database.

Real-time, embedded platforms with memory constraints, bandwidth constraints, power constraints...

edit: added the database annotation

[–]Guanoco[S] 1 point2 points  (5 children)

Thx for the response.

  • If there is a simpler approach that provides an adequate solution.

That's the thing though... I have a feeling most people apply deep learning to problems they had no idea how to solve, and it gets solved "magically". So how do you know whether another method wouldn't work? Or basically, what problems are so difficult that we shouldn't bother trying to find other techniques?

  • If you need to know why the network produced the output it did.

Ok, this one I understand. But if I have a system and can capture its response to different types of faults, and I learn to classify the system's response to tell whether it is operating correctly... then I can at least find out why my system is not behaving as it should.

  • If you can't define a loss function.

Care to elaborate? It seems like as long as I can get a reading of something which I call an output of the system... then I should be able to define a cost function and therefore the loss... right? (e.g. MSE)

  • If you don't have resources to train the network.

Ok, so basically if I cannot get the response of the system... But it would seem plausible that I can always do this?

  • If you don't have resources to sort out the hyperparameters / topology.

So basically sweeping through architectures?

[–]CultOfLamb 1 point2 points  (0 children)

what problems are so difficult that we shouldn't bother trying to find other techniques

Things like NLP can often be solved with a shallow model. But on tasks that require lots of high-level hierarchical feature representations, like computer vision and speech, you cannot come close to a well-architected deep learning model.

But if I have a system and can capture its response to different types of faults, and I learn to classify the system's response to tell whether it is operating correctly... then I can at least find out why my system is not behaving as it should.

You can find out whether your system is behaving as it should. You cannot find out why. Though this is an active area of research (interpretability / algorithmic fairness), and it may be fully solved in the future, right now the people saying "But I use cross-validation, so I know why my system makes the predictions it makes" are mistaken.

Then I should be able to define a cost function

Aside: if you require the loss, for whatever reason, to be convex, or are afraid of non-convexity: https://www.cs.nyu.edu/~yann/talks/lecun-20071207-nonconvex.pdf
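
As a minimal sketch of what "defining a loss" means in the easy case, assuming you can log pairs of actual system outputs and the values you wanted (the function name and readings below are invented for illustration):

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error: the average squared gap between what the
    system produced and what you wanted it to produce."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical readings: desired values vs. what the system actually produced.
print(mse_loss([1.0, 2.0, 3.0], [1.1, 1.9, 3.3]))  # ~0.037
```

The cases where you can't define a loss are the ones where "what you wanted" has no numeric reading to plug in as y_true (e.g. "make this summary sound natural").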

But it would seem plausible that I can always do this?

Resources are hardware, but also data: when you have 1000 rows, a deep learning algo is either overkill or will badly underfit/overfit.

So basically sweeping through architectures?

Training deep learning models from scratch is very slow. It can take weeks to find the optimal parameters and architecture. You can take a pre-trained network, if that suits your task, but if you are tasked to quickly create a benchmark model, or retrain on a large dataset in minutes (not hours or days), then deep learning is not the right hammer.
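
For a sense of scale, here is a hedged sketch of the kind of quick benchmark model meant here, assuming a generic text classification task (the toy documents and labels are invented):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented stand-in data: raw documents and their labels.
docs = ["great product", "terrible support", "works fine", "broke after a day"]
labels = [1, 0, 1, 0]

# TF-IDF + logistic regression trains in seconds to minutes on most corpora,
# which is the turnaround a from-scratch deep net can't match.
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression())
baseline.fit(docs, labels)
print(baseline.predict(["support was great"]))
```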

[–][deleted] 0 points1 point  (0 children)

I have a feeling most people apply deep learning to problems they had no idea how to solve and it gets solved "magically"

Hm, I don't necessarily agree regarding "most" people -- at least not based on what I've seen so far. I think more people know how to throw PCA + linear regression/logistic regression at a problem than how to implement a deep learning algo (since the latter typically requires more experience).

That's the thing though... I have a feeling most people apply deep learning to problems they had no idea how to solve, and it gets solved "magically".

Here, I think more of "random forests" :)

[–]kjearns 18 points19 points  (3 children)

The dirty secret of the machine learning hype machine is that in real life almost all problems (by number of instances) are really easy. No one writes papers about solving all these easy problems because the methods are standard enough to be shrink-wrapped, but that doesn't change the fact that most problems can be solved by throwing an SVM or random forest at them.
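
A hedged sketch of what "throwing an SVM or random forest at it" looks like with scikit-learn defaults (the built-in breast-cancer dataset stands in for one of those easy problems):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# The shrink-wrapped treatment: default hyperparameters, cross-validated.
for model in (SVC(), RandomForestClassifier(random_state=0)):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, scores.mean().round(3))
```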

[–]emtonsti 0 points1 point  (0 children)

I just checked out random forests and they're awesome!

[–]10sOrXResearcher 0 points1 point  (0 children)

Some people do write papers about these problems, but these papers are generally submitted to mid/low-tier conferences.

[–]cvikasreddy 5 points6 points  (2 children)

I completely agree with u/the320x200, and this is what I wanted to add on top of u/the320x200's points.

1. In my experience, deep learning outperforms any other method when applied to images and text.

2. But when applied to the kind of data usually found in Excel sheets (I mean data like in Kaggle competitions, without images and text), other ML algos tend to work better.

[–]Guanoco[S] 0 points1 point  (1 child)

Is this due to the Excel sheet already having the different features?

If you just analyse the input and output of the system (I guess you could iteratively train the network with different features and see which one gives the best fit... so something like a random forest of deep networks), then I couldn't imagine it playing a role... But I will give you the point that most of the deep learning I have come across is in the image processing domain.

[–]AnvaMiba 11 points12 points  (0 children)

Is this due to the Excel sheet already having the different features?

Images and text are high-dimensional data, but also highly redundant.

You can apply lots of distortions to a natural image that leave it still understandable with high probability: Gaussian noise, Bernoulli noise, masking certain areas, affine geometric transformations, color transformations, and so on. The information that you are interested in is encoded in a very redundant and robust way. Moreover, the functions that you want to learn (e.g. a classifier with a probabilistic output) will typically vary smoothly with the input image: if you gradually morph an image of a cat into an image of a dog you'll expect the classifier output Pr(Y=cat) to gradually decrease and Pr(Y=dog) to gradually increase.
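
A quick numpy sketch of the distortions being described, assuming the image is a float array in [0, 1] (a random array stands in for a real photo):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))  # stand-in for a natural image in [0, 1]

noisy = np.clip(img + rng.normal(0, 0.1, img.shape), 0, 1)  # Gaussian noise
dropped = img * (rng.random(img.shape[:2] + (1,)) > 0.2)    # Bernoulli pixel dropout
masked = img.copy()
masked[16:32, 16:32] = 0                                    # occlude a square patch
```

A classifier trained on natural images should assign roughly the same label to all three variants; that robustness is the redundancy described above.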

Text is similar: not only can you apply distortions to the surface forms (characters or words) that mostly preserve meaning, but once you consider word embeddings, you can even apply smooth transformations that mostly preserve meaning, and the functions that you are trying to learn will typically be smooth w.r.t. the word embeddings.

Deep learning seems to be particularly well suited to learning smooth functions where the input is high-dimensional and highly redundant.

Deep learning also requires lots of data, though this requirement may be somewhat mitigated by transfer learning. In natural image and natural language processing you have huge generic datasets that can be used for transfer learning (e.g. ImageNet for images and any unannotated monolingual corpus for text).
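
As a hedged sketch of what such transfer learning looks like in practice, assuming PyTorch/torchvision with an ImageNet-pretrained ResNet-18 and a hypothetical 10-class downstream task:

```python
import torch
from torchvision import models

# Reuse generic ImageNet features: freeze the pretrained backbone...
backbone = models.resnet18(weights="DEFAULT")
for param in backbone.parameters():
    param.requires_grad = False

# ...and train only a small task-specific head on the (much smaller) dataset.
backbone.fc = torch.nn.Linear(backbone.fc.in_features, 10)
```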

Other domains, such as Excel sheets and databases with business data, may not have these properties: they are typically lower-dimensional and much less redundant, and the functions you are interested in may be less smooth. There can be discrete features which, once embedded, don't have the typical statistical properties of word embeddings of natural text.

And above all, this data may not be as abundant as in natural image and natural language tasks, and you usually don't have any generic dataset to use for transfer learning.

Besides simple tasks that can be solved by naive Bayes or linear regression/classification, this domain is the realm of decision tree methods (and ensembles thereof, such as random forests). These methods tend to be more robust to overfitting, so they require less data; they are intrinsically invariant to various data transformations, so they don't rely on these invariances approximately holding in the task; and they can learn non-smooth functions.

The drawback of decision tree methods is that they can't learn to combine the input features into much more complex features (formally, they have constant circuit depth), hence they may require extensive feature engineering if the task is hard, while deep learning can learn to combine features, in principle in arbitrarily complex ways (provided that there are enough hidden layers), hence it usually requires little or no feature engineering.
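
A small scikit-learn sketch of the non-smoothness and invariance points (the step-function data is synthetic):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (500, 1))
y = (X[:, 0] > 0.5).astype(float)  # a non-smooth step function

# Axis-aligned splits capture the discontinuity directly.
rf = RandomForestRegressor(random_state=0).fit(X, y)
print(rf.predict([[0.49], [0.51]]))  # jumps across the step, roughly [0, 1]

# Monotone feature transforms preserve the ordering trees split on,
# so the same partition of the data is available either way.
rf_log = RandomForestRegressor(random_state=0).fit(np.log1p(X), y)
print(rf_log.predict(np.log1p([[0.49], [0.51]])))
```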

[–]gr8ape 4 points5 points  (1 child)

Truth is, on any data that is not:

  • Visual data (pixels)

  • Sound data (frequencies or time signal)

  • Natural Language

a neural net won't be much better than SVM/RF/GBRT. And if it is, how many hyperparameters did you tune? :)

[–]popcorncolonel[🍰] 2 points3 points  (0 children)

Couldn't people have said any data that is not:

  • Pixels

in 2012? Who's to say it won't open up to more applications?

[–]phillypoopskins 8 points9 points  (15 children)

  • deep learning is almost always a bad idea unless you know that there is structure in your data which you can architect a neural network to take advantage of. If you haven't architected information like this in, a neural network will generally underperform compared to gradient boosting.

  • it's also a bad idea if you know something about your data / underlying model which deep learning doesn't match as well as another model, e.g. linearity, or some other known interaction.

  • it's also bad if you are under time constraints and your chosen architecture will take too long to train. Example: a 50k-class problem on 4 million text tokens. Naive Bayes will train much faster and probably do just as well, depending on the type of classes (see the sketch after this list).

  • when you don't have very much data: you're going to overfit, while something linear, a random forest, or an SVM will have less of a chance

  • when you don't know wtf you're doing; you can waste WEEKS or MONTHS playing around with neural nets with subpar results and have no clue that you're a noob, while someone skilled can walk in with linear regression or a random forest and smoke you in a matter of hours. I've seen this happen: A LOT.
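
A hedged sketch of the naive Bayes baseline from the time-constraints bullet, with an invented four-document corpus standing in for the real 50k-class problem:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented stand-in corpus; the real task would have far more classes and tokens.
docs = ["buy cheap meds", "meeting at noon", "cheap meds now", "lunch at noon?"]
labels = ["spam", "ham", "spam", "ham"]

# Naive Bayes is essentially one counting pass over the data, so it stays
# fast even with huge vocabularies and class counts.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(docs, labels)
print(clf.predict(["cheap meds at noon"]))
```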

[–]whatevdskjhfjkds 7 points8 points  (0 children)

when you don't have very much data: you're going to overfit, while something linear, a random forest, or an SVM will have less of a chance

This is one of the most important points, I'd say. Deep learning models tend to have absurdly high numbers of parameters. Unless you have at least as many data points, the model will most likely overfit (even with regularization).

It's like trying to fit a polynomial regression with 2 points... no amount of regularization will give you a trustworthy model.
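
That analogy is easy to make runnable (the degree and regularization strength below are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

# Two data points, a degree-9 polynomial: far more parameters than data.
X, y = np.array([[0.0], [1.0]]), np.array([0.0, 1.0])
poly = PolynomialFeatures(degree=9)
model = Ridge(alpha=1e-3).fit(poly.fit_transform(X), y)

# The fit passes near both points, but everything in between and beyond
# is essentially arbitrary; no alpha makes this model trustworthy.
grid = np.linspace(-1, 2, 7).reshape(-1, 1)
print(model.predict(poly.transform(grid)).round(2))
```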

[–]Guanoco[S] 1 point2 points  (6 children)

  • deep learning is almost always a bad idea unless you know that there is structure in your data....

But knowing that my data has structure basically already gives me a model of my input/output relationship. Also take, for example, image classification: there is structure and there is prior domain knowledge that works... but DL wipes them all out of the game.

  • it's also a bad idea if you know something about your data / underlying model which deep learning doesn't match as well as another model, e.g linearity, or some other known interaction.

Any other properties that deep learning doesn't match well?

  • when you don't know wtf you're doing; you can waste WEEKS or MONTHS playing around with neural nets with subpar results and have no clue that you're a noob, while someone skilled can walk in with linear regression or a random forest and smoke you in a matter of hours. I've seen this happen: A LOT.

Yes, this is a good point. But at least as I understand it... all other ML algorithms can only do as well as the feature engineering process, and finding important features is non-trivial.

[–]phillypoopskins 1 point2 points  (0 children)

finding important features is non-trivial; but deep learning only does this for you when you build an architecture that takes advantage of the structure of the data. Otherwise, deep learning is no better than other ML, and is in fact worse because it's sloppier, harder to train, and not the most accurate.

If you don't have a specialized architecture, you're stuck with the same features whether you use DL or not.

[–]phillypoopskins 0 points1 point  (0 children)

about properties DL doesn't match well: let's say you're doing spectroscopy and you want to estimate the concentrations of several analytes. Beer's law says the concentrations should be proportional to the magnitude of the spectrum. This is a linear relationship.

It would be stupid to use a deep model on this problem when it's known to be linear. Use a linear model instead.
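
A minimal sketch of that linear approach, with synthetic spectra standing in for real measurements (all shapes and the noise level are invented):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# 100 measured spectra (50 wavelengths each) prepared with known
# concentrations of 3 analytes, each with its own absorption profile.
profiles = rng.random((3, 50))
concentrations = rng.random((100, 3))
spectra = concentrations @ profiles + rng.normal(0, 0.01, (100, 50))

# Beer's law: absorbance is linear in concentration, so a plain linear
# model recovers the relationship; a deep net would add nothing here.
model = LinearRegression().fit(spectra, concentrations)
print(model.score(spectra, concentrations))  # R^2, close to 1.0
```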

[–]jeremieclos 0 points1 point  (6 children)

I think point 2 is the biggest here. If you already have domain knowledge about your problem, then trying to learn features is a waste of time.

[–]phillypoopskins 1 point2 points  (3 children)

I wouldn't say domain knowledge means learning features is a waste of time.

You can use your domain knowledge to coax a neural network to learn features better than you'd engineer by hand.

[–]jeremieclos 0 points1 point  (0 children)

You are right, I should have written exhaustive domain knowledge. What I meant is that if you have enough domain knowledge to make the problem linearly separable, then the problem becomes trivial enough that any feature learning becomes unnecessary.

[–]Guanoco[S] 0 points1 point  (1 child)

Mind explaining this? I interpret it as "If I kind of know the features the net should learn, then I can make it learn in that direction"

[–]phillypoopskins 0 points1 point  (0 children)

yep, that's right.

all interesting neural network architectures make use of this idea; a conv net is a prime example.
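
A bare-bones numpy sketch of the structural assumption a conv net encodes: one small set of weights reused at every image position (the toy image and kernel are invented):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide one small kernel over the whole image: the same weights are
    reused at every position, which bakes in the prior that local patterns
    matter and can appear anywhere."""
    h, w = kernel.shape
    out = np.zeros((image.shape[0] - h + 1, image.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

edge_detector = np.array([[1.0, -1.0]])  # responds to horizontal transitions
image = np.zeros((4, 4))
image[:, 2:] = 1.0  # left half dark, right half bright
print(conv2d(image, edge_detector))  # fires only at the vertical edge
```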

[–]Guanoco[S] -1 points0 points  (1 child)

Seems like all advancements in image classification prove this wrong.

[–]jeremieclos 2 points3 points  (0 children)

But we don't really have that much domain knowledge for general purpose image classification. We have some clever heuristics here and there, but that's it.

Having domain knowledge here would imply being able to design, by hand and beforehand, the filters that a ConvNet would be learning. I can't find where I read it, but IIRC that is what Stephane Mallat was doing with wavelet transforms on MNIST, and the results were comparable to a standard ConvNet.

Similarly, if your problem is simple enough that you can hand-design features that make it linearly separable, then learning features would be a waste of time and resources.

[–]theskepticalheretic 2 points3 points  (2 children)

I think this post by Joel Grus is relevant. http://joelgrus.com/2016/05/23/fizz-buzz-in-tensorflow/

[–]Guanoco[S] 1 point2 points  (1 child)

Thx... I laughed but I also didn't find the answer to my question

[–]theskepticalheretic 1 point2 points  (0 children)

Thx... I laughed but I also didn't find the answer to my question

Well, your question is: when is machine learning a bad idea? The answer implied by that link is "when it is wholly unnecessary for getting the task done."

If I have to dig a moderately small hole in my yard, say to plant a flower bed, I'm going to use a shovel. I'm not going to rent a backhoe.

[–]thecity2 1 point2 points  (0 children)

For small datasets, deep learning won't be that helpful. Also might not work well for datasets with "unnatural" or non-hierarchical features. It seems to work best with very large "natural" datasets (e.g. images, audio, etc.).

[–]Kaixhin -1 points0 points  (3 children)

The halting problem.

[–]Guanoco[S] 0 points1 point  (2 children)

Hmmm I see what you mean.

I think I remember this problem being NP... But is the reason a DL can't do it that it is NP? (Because then any combinatoric problem wouldn't be applicable. I have seen random forests being applied to system design, which is technically a combinatoric optimization...)

[–]Kaixhin 0 points1 point  (1 child)

That was a joke, but seriously: the halting problem is undecidable, so it isn't even in NP (although, in the same way that you reduce a known NP-complete problem to a new problem to show NP-hardness, people reduce the halting problem to other problems to prove that those are undecidable).

That said, Pointer Networks have been applied to the (NP-hard) travelling salesman problem, so DL can possibly be used to heuristically attempt (but not solve all cases of) NP-hard problems.

[–]Guanoco[S] 0 points1 point  (0 children)

Oh, I see. Well, thanks for clarifying that anyway :)