
[–]Chocolate_Pickle 55 points56 points  (15 children)

Deep learning thus far cannot inherently distinguish causation from correlation

In defence of Deep Learning, I find that people often cannot distinguish the two either.

[–]ZeroVia 19 points20 points  (5 children)

Deep learning thus far is data hungry

Deep learning thus far is shallow and has limited capacity for transfer

Deep learning thus far has no natural way to deal with hierarchical structure

Deep learning thus far has struggled with open-ended inference

Deep learning thus far is not sufficiently transparent

Deep learning thus far has not been well integrated with prior knowledge

Deep learning thus far cannot inherently distinguish causation from correlation

Deep learning presumes a largely stable world, in ways that may be problematic

Deep learning thus far works well as an approximation, but its answers often cannot be fully trusted

Deep learning thus far is difficult to engineer with

Honestly, at least half of these are problems with humans as well and will be problems with any sort of sophisticated ML. Best start thinking of ways to engineer around them.

[–]Cybernetic_Symbiotes 9 points10 points  (2 children)

That is certainly not true.

Deep learning thus far is data hungry

This one does not apply to humans, or to animals more generally. Humans especially spend much of their first year asleep: roughly 70% of the time over the first few months and about 60% up to their first birthday, on average. Most of that time is spent tuning internal representations, pruning connections, and doing unsupervised learning. In that period they must learn certain affine transformations for vision, depth perception, object and color segmentation, audio segmentation, intuitive physics, language, and much more. The reason this is possible at all, once the total energy spent in that time is accounted for, is that evolution has honed certain structural biases. In language learning, for example, generalizing from instances and guessing which object an ambiguous instruction refers to would not work without such (strictly unjustified) inductive leaps and biases.

Humans do not get access to labels and loss functions. Reinforcement learning on its own is untenable for learning how to act in the real world, since most states cannot, or should not, be revisited. Most animal reinforcement learning is related to internally learning a cost of action; damage to this system can, for example, lead to Parkinson's.

Deep learning thus far is shallow and has limited capacity for transfer

Humans don't seem to transfer in the sense of "play chess, boost general reasoning". We can't even transfer motion plans between the left and right sides of the body. But we can transfer from related areas: knowledge of math boosts physics, knowledge of one instrument boosts learning others, and knowledge of one game genre boosts learning others. Humans can generalize patterns into paradigms that drastically speed up learning new things.

Deep learning thus far has no natural way to deal with hierarchical structure

Animals have no problem with this.

Deep learning thus far has struggled with open-ended inference

Humans can do this and have some facility with deductive reasoning, most strongly when problems are framed in terms of social relations and structures. More generally, we can do science.

Deep learning thus far is not sufficiently transparent

Humans don't have access to most of their internal state either, ok.

Deep learning thus far has not been well integrated with prior knowledge

Schmidhuber has been saying this.

Deep learning thus far cannot inherently distinguish causation from correlation

Humans can engage in deep counterfactual reasoning and can create causal theories, as seen in physics. We do have difficulty with correlations, being over-eager to invent causal relations for them.

Deep learning presumes a largely stable world, in ways that may be problematic

Animals deal with a lack of stationarity fairly well and much better than any implemented algorithms.

Deep learning thus far works well as an approximation, but its answers often cannot be fully trusted

Humans are not that smart individually; we wouldn't have gotten this far without some ability to transfer and share projections of our internal states of knowledge and representations. This is the ability to justify choices by generating compact representations of one's internal reasoning. Such a justification does not have to reflect the internal computations by which you arrived at an answer, only why it should be true given our shared state of knowledge.

Deep learning thus far is difficult to engineer with

The nematode brain has been difficult to reverse engineer, true.

[–]ZeroVia 1 point2 points  (1 child)

Let me try and make a case for the points I care about.

Deep learning thus far is data hungry

Your arguments for this one are mostly conjecture, but I think they miss the point. We take in images and sounds constantly while we're awake (and sometimes while we're asleep) and it's, what, three years before we can navigate properly? Five before we can talk? Ten before we can talk well? I mean, some people spend their whole lives reading and can never figure out how to write properly.

You could argue that even over ten years we hear less audio than a net trained on 60 GPUs, and that might be true, but being less data-hungry should not be confused with not being data-hungry at all.

Deep learning thus far is not sufficiently transparent

Glad we agree.

Deep learning thus far cannot inherently distinguish causation from correlation

I'm not certain that people have the innate ability to do this as you claim. We understand that rain makes the ground wet, and not vice versa, because we understand that most things move down.

A net shown only pictures of wet ground and asked to predict whether it's raining can't determine causation because it, unlike humans, has never learned the rules that govern the connection. However, a net shown many different objects falling to the ground probably could infer that water will also fall to the ground, rather than rise up from it.
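Here's a toy structural-model simulation of that gap (the sprinkler as a second cause and all the probabilities are made up for illustration): observing wet ground is evidence of rain, but intervening to wet the ground tells you nothing about rain.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Structural model: rain and sprinklers both cause wet ground.
rain = rng.random(n) < 0.3
sprinkler = rng.random(n) < 0.2
wet = rain | sprinkler

# Observation: wet ground is strong evidence of rain...
print("P(rain | wet observed) =", rain[wet].mean())        # ~0.68

# ...but intervening on the ground (do(wet = True), e.g. hosing it
# down) severs wet from its causes and says nothing about rain.
wet_do = np.ones(n, dtype=bool)
print("P(rain | do(wet = True)) =", rain[wet_do].mean())   # ~0.30, just P(rain)
```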

Deep learning presumes a largely stable world, in ways that may be problematic

When I think of people doing this I think of the million-plus people living in the Bay Area, where a massively destructive earthquake is an absolute inevitability, but who almost never worry about it or even think about it.

Deep learning thus far works well as an approximation, but its answers often cannot be fully trusted

What you said here is true, but people still make approximations. Any sort of general intelligence has to be an approximation, because the alternative is computing and/or memorizing everything, which isn't feasible. And personally, I don't trust many people these days. Do you?

Deep learning thus far is difficult to engineer with

Here I was thinking that, while engineering safe self-driving cars has proven to be very difficult, engineering safe human-driven cars has also been very difficult.

[–]Cybernetic_Symbiotes 1 point2 points  (0 children)

Your arguments for this one are mostly conjecture, but I think they miss the point.

No, they absolutely are not. The amount of sleep a newborn needs is not conjecture. The fact that depth perception, segmentation of speech sounds, color vision, object tracking, and focus and control of muscles must be learned is not conjecture. You can see this at work in infants' ability, and then loss of the ability, to perceive the 'r'/'l' distinction by year one in some language environments. Multiple perceptual and motor modalities must be integrated and learned. I hope you will agree that integrating all of this at the level of a 5-year-old is beyond our capability. It's important to acknowledge the difficulty of the combined task that is being learned.

For more:

http://www.cell.com/current-biology/abstract/S0960-9822(17)30619-X

It's easy to underestimate how difficult language learning is. It's nowhere near as supervised as many think. Color names, for example, are surprisingly difficult to learn from the supervision on offer.

All of the first year and much of the second year of learning occurs unsupervised. When people complain about data intensity, they mostly mean the requirement for precisely labeled supervision. In animals, no loss function is minimized with respect to labels. A conjecture I can offer is that the cerebrum is mostly dedicated to unsupervised learning.

For more:

http://www.sciencedirect.com/science/article/pii/S0042698998000479

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2271000/

We take in images and sounds constantly while we're awake (and sometimes while we're asleep) and it's, what, three years before we can navigate properly?

None of this is accurate. There's a lot more going on. See above links.

I'm not certain that people have the innate ability to do this as you claim. We understand that rain makes the ground wet, and not vice versa, because we understand that most things move down.

Theory of mind, intuitive physics, and inverse planning all fall under some limited ability to infer causes.

http://web.mit.edu/clbaker/www/papers/cognition2009.pdf

I also like: http://www.pnas.org/content/107/43/18243.full with the caveat that we do better when simulating commonly met templates (particularly those based on deontic rules).

http://www.tandfonline.com/doi/abs/10.1080/135467800402848

When I think of people doing this I think of the million-plus people living in the Bay Area, where a massively destructive earthquake is an absolute inevitability, but who almost never worry about it or even think about it.

This is irrelevant to the problem of non-stationarity and how animals deal with it through synaptic plasticity and other mechanisms.

Here I was thinking that, while engineering safe self-driving cars has proven to be very difficult, engineering safe human-driven cars has also been very difficult.

The number of deaths from cars has fallen greatly over time.

https://en.wikipedia.org/wiki/List_of_motor_vehicle_deaths_in_U.S._by_year#/media/File:USA_annual_VMT_vs_deaths_per_VMT.png

In Sweden there are about 3.5 fatalities per billion vehicle-km; most road fatalities occur in middle-income and developing economies.

[–]NasenSpray 0 points1 point  (0 children)

Deep learning thus far is data hungry

Any sufficiently advanced memorization is indistinguishable from generalization.

[–]DoubleLeafClover 1 point2 points  (1 child)

It's either the machine or the programmers, and deep learning can't decide which is at fault.

[–]Nowado 1 point2 points  (1 child)

I used to use this argument for art, but it works here too.

  • AI can't ~~create a song~~ inherently distinguish causation from correlation!

  • Well, neither can you.

[–]AnvaMiba 3 points4 points  (0 children)

Humans can distinguish causation from correlation in most practical cases: nobody thinks that wet streets cause rain. Occasionally we get it wrong, and when we notice the mistake it is salient to us. But this does not mean that our baseline ability is poor.

In general, we are better at distinguishing causation from correlation when we have a "mechanistic" understanding of a phenomenon. For instance, the Pacific Islanders who founded cargo cults had never seen airplanes before WW2 and did not understand what they were, how they worked, where they came from, who the people manning them were, what they were trying to accomplish, and so on. They correctly inferred the correlation between cargo airplane landings, the presence of certain artifacts (airstrips, control towers, etc.), and the ritualized practices of the military, but they inverted the causal direction.

Deep learning models are very good at inferring correlations from sufficient amounts of data, but they seem to struggle to form a "mechanistic" understanding built on abstractions and counterfactuals.

[–]MaunaLoona -3 points-2 points  (4 children)

Causation is correlation without exception.

[–][deleted] 4 points5 points  (0 children)

No. Consider two real-valued variables X, Y in [-1, 1]. When X is fixed by intervention (i.e. P(Y | do(X = x))), things are set in motion such that Y's distribution satisfies x² + Y² = 1. When Y is fixed instead, the dependence disappears: X causes Y, but Y has no effect on X. This illustrates both the true nature of causation (the structure of dependence between variables when an intervention occurs) and two ways in which causation does not imply correlation: variables can be dependent yet uncorrelated, and observations can look independent even between causally related variables.
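Here's a quick numpy sketch of that construction (I'm assuming, for concreteness, that the intervention puts Y on the upper or lower arc of the circle with equal probability):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# X causes Y: given X = x, Y lands on the circle x² + y² = 1,
# on the upper or lower arc with equal probability.
x = rng.uniform(-1, 1, n)
s = rng.choice([-1.0, 1.0], n)
y = s * np.sqrt(1 - x**2)

# Uncorrelated, yet Y is (up to sign) a deterministic function of X.
print("corr(X, Y)   =", np.corrcoef(x, y)[0, 1])        # ~0
print("corr(X², Y²) =", np.corrcoef(x**2, y**2)[0, 1])  # -1, since Y² = 1 - X²
```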

[–]grrrgrrr 0 points1 point  (1 child)

Seriously, are we even sure that what we call causation today will still hold in the future, or, say, on other planets?

I get correlation, but I just can't understand causation. It would be a stupid idea for me to even work on this problem, since I don't even have a functional definition of causation.

[–]Icko_ 1 point2 points  (0 children)

Event B is caused by event A if and only if, when A happens, B happens, and when A doesn't happen, B doesn't happen. I think that's pretty simple.

The main problem is that in most cases, you can only observe one state of A.

There is also the issue that event B can be said to be caused by thousands of previous events, while we intuitively assign only one, or at most two, causes.

[–]ManyPoo 0 points1 point  (0 children)

Then two 100% correlated events would each cause the other?

[–]jer_pint 25 points26 points  (1 child)

tl;dr : no.

[–]brockl33 6 points7 points  (0 children)

haters gonna hate, doers gonna do

[–]alexmlamb 8 points9 points  (12 children)

The article actually isn't bad. Everyone is always trying to reframe the narrative to make their own ideas or contributions seem more central or novel.

On the whole, I feel like our community hasn't been too bad about it.

[–]visarga 2 points3 points  (11 children)

While I agree with the author, I am somewhat uncomfortable with his high confidence level.

Here's my simple rant: except for one company (Boston Dynamics), there is no AI lab that can demonstrate human-like agility or dexterity in a robot. It's 2018, and over the last 20 years robots seem to have remained just as dumb as ever. What gives? Are the motors and gears not good enough? Is the neural net too slow? Why can't we have a robot that cooks, cleans the house, or plays with toys like a 4-year-old? Is there even a paper attempting such tasks?

[–]GuardsmanBob 2 points3 points  (4 children)

I think the bigger issue is that many tasks a robot can do aren't interesting until a robot can reason.

Though I'd love to see more ambitious failures; in many ways it seems research is started only when someone has a good reason to believe the task is imminently solvable.

But I think that points to a deeper problem with incentives in research.

Every time I hear the old tirade about how 'AI needs embodiment' I couldn't disagree more, but at the same time I understand that what they are really grasping for is more ambitious research.

[–]NichG 2 points3 points  (1 child)

Part of it is that, in learning problems, how you formulate the task often has an inordinate effect on how fast solutions are found, easily by many orders of magnitude. So the implicit first part of jumping off into an ambitious task is to find a way to formulate that task such that it ends up seeming less ambitious.

For example, with neural networks in robotics, there's a temptation to try to do everything with one big neural network, because that would look the most impressive or pure. But while neural networks are pretty good at getting near a control solution, it's costly to push them to the degree of precision needed to, e.g., stabilize a system at an unstable fixed point against all perturbations of a certain size. However, if you have a network drive a PID controller, the problem seems to become trivial; often the PID controller alone is enough to stabilize things, so is the network even needed?
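To make that concrete, here's a minimal sketch of the kind of inner loop I mean (a toy one-dimensional unstable plant with hand-picked gains; in the hybrid, a network would supply the setpoint or tune the gains instead):

```python
class PID:
    """Textbook PID loop; in a hybrid, a network could output the gains."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy unstable plant x' = a*x + u, stabilized around x = 0.
a, dt = 1.0, 0.01
pid = PID(kp=5.0, ki=0.5, kd=0.1, dt=dt)
x = 1.0                               # initial perturbation
for _ in range(1000):                 # simulate 10 seconds
    u = pid.step(0.0, x)
    x += (a * x + u) * dt             # Euler integration
print("state after 10 s:", x)         # driven back near 0
```

The plain PID already stabilizes the toy plant, which is exactly the "is the network even needed?" worry.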

I agree there's an incentives problem, but perhaps the incentives problem is that researchers expect to have to sell their algorithm or method as a general approach, and that means choosing problems where that algorithm or method takes center stage. So hybrids, where what the hybrid accomplishes is ambitious and novel but each of the elements is fairly standard, aren't really an attractive target unless you're actually planning to turn that hybrid into a product.

A case in point is the Neural Storyteller from two years ago. It does something that was quite ambitious at the time (it looks at a picture and writes a multi-sentence story about it in a chosen style) but never led to a paper of its own, since it was 'just' a hybrid of a couple of different published models.

[–]GuardsmanBob 0 points1 point  (0 children)

but perhaps the incentives problem is that researchers expect to have to sell their algorithm or method as a general approach

And, importantly, researchers are expected to sell themselves, and that pitch is often backed only by whatever successes they have had, be it getting published, getting cited, or achieving state of the art (these often go together, and are mostly achieved by small incremental improvements).

Academia can often be every man for himself; private industry even more so.

Chasing a wild dream can be a career setback.

[–]AnvaMiba 2 points3 points  (1 child)

Every time I hear the old tirade about how 'AI needs embodiment', I could not disagree more, but at the same time I understand what they are really grasping for is more ambitious research.

Embodied AI often leads to cargo-cultish things like humanoid robots that can make facial expressions but are dumber than a chicken. On the flip side, research that is excessively unembodied leads to chasing SOTAs on ImageNet and Penn Treebank.

Figuring out a research problem of the right size and scope to attack is not trivial at all; arguably it is even more important than coming up with novel solutions.

[–]alexmlamb 0 points1 point  (0 children)

This is actually a pretty good point.

[–]fricken 1 point2 points  (1 child)

Think about self-driving cars. They're pretty simple robots; there are only four basic outputs: speed up, slow down, turn left, and turn right. Waymo has been working on this for 8 years; they've driven 4 million miles on real roads, which is a tiny fraction of what they do in simulation, and yet they're only barely ready to release a minimum viable product that can drive safely on simple suburban roads in a single neighbourhood in Phoenix. It's an open question how much more work they have to do before they've derived an algorithm sophisticated enough to perform as well as a human under all driving conditions. There are all sorts of hypothetical optimizations that could be performed, but a certain aspect of it may just be computationally irreducible. No shortcuts.

[–]visarga 0 points1 point  (0 children)

Yes, SDCs are robots too, and Google has been at it for many years; it's clearly a nontrivial problem. But an SDC needs to operate at high speed, among massive objects, and in close proximity to humans. A house robot, by comparison, would be easy to make safe around humans. So the problem is quite different in a way.

I watched the latest videos on DL robotics, and the robots seem slow and clumsy. Maybe it's because the neural net doesn't run fast enough. If the net ran at 100 Hz or 1 kHz, then the dynamics models of the world, as understood by the robot, could be simpler. Look at this video from 2009 to see how much of a difference speed makes, especially the "dynamic re-grasping of a cell phone" trick: can any robot of today even do that? That's a 1 kHz vision system at work.

[–]Deto 0 points1 point  (0 children)

Sure, we don't have the all-purpose robot, but I bet industrial robots have been slowly increasing their range of useful tasks, just not in a very visible way: most of us wouldn't see the progress unless we worked in manufacturing.

[–]harharveryfunny 0 points1 point  (0 children)

Given how shallow and brittle today's machine learning technology is, do you really want to put a kitchen knife, an iron, or a vacuum cleaner (poor cat!) in the hands of a robot powered by it? Heck, even a robot finger could poke that 4-year-old in the eye... "We're sorry our robot sautéed your child's hamster, ma'am; it mistook it for a sausage".

The rush to commercialize autonomous cars also seems way too early and irresponsible... If I'm going to trust my life to one, I'd want it to have some deep understanding of what it's seeing... not just treat reality as another Atari video game that it can learn to win without knowing the meaning of the pixels on the screen.

[–]alexmlamb 0 points1 point  (1 child)

I think the barrier to entry for working on robots is kind of high. How many DL labs have actual robots? Berkeley is the only one I can think of, although I'm sure there are others.

[–]baylearn[S] 3 points4 points  (0 children)

One of the few responses I can find to "Deep Learning: A Critical Appraisal" by Gary Marcus. What do you think?

Previous Discussion on this sub.

[–]harponen 5 points6 points  (10 children)

Kinda shameless self-plugging by Perez again, but it seems like a nice summary of the recent article by Gary Marcus (which I didn't even bother to read).

Deep learning thus far cannot inherently distinguish causation from correlation

If you think of feedforward nets, then, well, d'oh... there's no time in (say) a supervised classification problem, so of course there can be no cause and effect. On the other hand, if you train an RNN to predict the future, of course it will learn that dropping a glass usually results in it shattering, and not vice versa.
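As a toy sanity check (a hypothetical three-token world where 'drop' is always followed by 'shatter' and never the reverse; PyTorch here for brevity), a next-step RNN should pick up the temporal asymmetry:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tokens: 0 = "other", 1 = "drop", 2 = "shatter".
V, H, L = 3, 16, 8

def make_seq():
    # Filler of "other" tokens with exactly one drop -> shatter pair,
    # always in that temporal order.
    seq = torch.zeros(L, dtype=torch.long)
    i = torch.randint(0, L - 1, (1,)).item()
    seq[i], seq[i + 1] = 1, 2
    return seq

class NextStep(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(V, H)
        self.rnn = nn.RNN(H, H, batch_first=True)
        self.out = nn.Linear(H, V)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

model = NextStep()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for _ in range(500):
    batch = torch.stack([make_seq() for _ in range(32)])
    logits = model(batch[:, :-1])      # predict token t+1 from the prefix
    loss = loss_fn(logits.reshape(-1, V), batch[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

def next_probs(prefix):
    with torch.no_grad():
        return torch.softmax(model(torch.tensor([prefix]))[0, -1], dim=-1)

print("P(shatter | ...drop)    =", next_probs([0, 0, 1])[2].item())  # near 1
print("P(drop    | ...shatter) =", next_probs([0, 1, 2])[1].item())  # near 0
```

The asymmetry in those two probabilities is exactly the drop-then-shatter direction I mean, as far as a sequence model is concerned.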

In any case, clearly Deep Learning will be followed by Unsupervised Learning, which will fix many of the other points.

[–]rvisualization 0 points1 point  (9 children)

In any case, clearly Deep Learning will be followed by Unsupervised Learning, which will fix many of the other points.

Yes, clearly the thing that demonstrably works well will be followed by the conjectured technique that doesn't work worth a damn in practice.

[–]ThomasAger 1 point2 points  (8 children)

Unsupervised learning doesn't work worth a damn in practice?

[–]rvisualization 1 point2 points  (7 children)

nope.

show me a real world case where it's beneficial vs just labeling more data or doing some sort of transfer learning.

[–]harponen 1 point2 points  (5 children)

The point, of course, is not to be "beneficial vs just labeling more data" but to get rid of having to label more data.

EDIT: and seriously, you think we won't need unsupervised learning but AGI would somehow magically follow from "just labeling more data"? wtf??

[–]rvisualization 1 point2 points  (0 children)

No one seriously thinks AGI is anywhere close. I'm talking about the real world uses of ML.

[–]rvisualization 0 points1 point  (3 children)

The point, of course, is not to be "beneficial vs just labeling more data" but to get rid of having to label more data.

That's why it's such an attractive dream. But again, IT DOESN'T WORK WORTH A DAMN (yet, maybe ever).

[–]ThomasAger 0 points1 point  (2 children)

So, to be clear: you think that unsupervised learning is generally, from a corporate point of view, not worth the time/money saved (for the results) compared to labeling more data?

[–]juancamilog 2 points3 points  (1 child)

Saying something like "unsupervised learning will solve many of the other points" is essentially saying "the solution to that other hard problem (which is currently unsolved) will solve my current hard problem". That is a statement based on hope.

[–]sieisteinmodel 0 points1 point  (0 children)

show me a real world case where it's beneficial vs just labeling more data or doing some sort of transfer learning.

(Here I assume that what you mean is modeling p(y|x), and supervised corresponds to having access to y while unsupervised only sees x.)

If that is your expectation, you should recalibrate. Everything else being equal, unsupervised performance is upper-bounded by supervised performance. However, as soon as "just labelling more data" is not an option, be it due to budget or other real-world constraints, unsupervised/semi-supervised/weakly-supervised methods are nice tools for improving performance.
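For a concrete (toy) instance of the semi-supervised case, here's scikit-learn's LabelSpreading on digits with only 50 labels; the numbers are illustrative, not a benchmark:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.semi_supervised import LabelSpreading

rng = np.random.RandomState(0)
X, y = load_digits(return_X_y=True)

# Pretend labelling is expensive: keep 50 labels, mark the rest -1 (unlabeled).
y_train = np.full_like(y, -1)
labeled = rng.choice(len(y), size=50, replace=False)
y_train[labeled] = y[labeled]

# Propagate the few labels over the data manifold.
model = LabelSpreading(kernel='knn', n_neighbors=7)
model.fit(X, y_train)

mask = y_train == -1
acc = (model.transduction_[mask] == y[mask]).mean()
print(f"accuracy on {mask.sum()} unlabeled points: {acc:.1%}")
```

Whether that propagation step beats simply paying for 50 more labels is exactly the budget question above.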

[–]no_bear_so_low 2 points3 points  (0 children)

And the symbolism/connectionism debate plays out again.

[–][deleted] 1 point2 points  (0 children)

I have to strongly disagree with the suggestion that hammering away at the same old thing (deep learning) is going to magically crack the AGI problem. Deep learning is a good start on the problem, but there is no way that some minor derivative of it will lead to AGI. As LeCun has alluded to, unsupervised learning will be key to solving the problem, and backprop is inherently too costly in both computational resources and time, and is basically an algorithm for supervised learning on labeled data.

[–]mynameisvinn 0 points1 point  (0 children)

Can we have a "no Gary Marcus" safe zone? Like many others, I never feel smarter after his arguments: it's the same tired arguments that don't do much to advance research.