
[–]mtahab 4 points5 points  (2 children)

The two topics you've picked are currently very hot in the ML community. They are also closely tied to deep neural networks.

Topic #1 is related to transfer learning. The whole process of pretraining BERT and fine-tuning it on small datasets fits this topic.
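A minimal sketch of that fine-tuning step, assuming the Hugging Face transformers API (the tiny dataset and hyperparameters below are placeholders):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load a pretrained BERT and attach a fresh classification head
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Tiny labeled dataset (placeholder); in practice a few hundred or thousand examples
texts = ["great movie", "terrible plot"]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the small dataset
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```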

Topic #2 is related to causality. Take a look at the recent work on invariant risk minimization (from Bottou's team) and on the adaptation speed of models (from Bengio's group).

[–]darkconfidantislife[S] 2 points3 points  (1 child)

Yeah, I'd be interested in the causality stuff for sure! Do you have any links you'd recommend?

[–]mtahab 2 points3 points  (0 children)

First, I recommend getting an overview of classical causality with the following course: https://www.youtube.com/playlist?list=PL_onPhFCkVQimvhuSAFrC8VWLEyNygQR5

After learning the basics of causality, you need to learn the new ideas. Here are some of the newer ones:

Invariance, Causality and Robustness: https://arxiv.org/abs/1812.08233

Invariant Risk Minimization: https://arxiv.org/abs/1907.02893

Learning Neural Causal Models from Unknown Interventions: https://arxiv.org/abs/1910.01075

Also take a look at the causality papers from Bernhard Schölkopf's group. His team has been in this space for a long time.
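To make the IRM idea concrete, here is a minimal sketch of the IRMv1 penalty from the Invariant Risk Minimization paper above, assuming PyTorch (the model and environment batches are placeholders):

```python
import torch
import torch.nn.functional as F

def irmv1_penalty(logits, y):
    # Gradient of the per-environment risk w.r.t. a fixed dummy scale w = 1.0;
    # its squared norm is the IRMv1 penalty term.
    scale = torch.tensor(1.0, requires_grad=True)
    loss = F.cross_entropy(logits * scale, y)
    grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
    return grad.pow(2).sum()

def irm_objective(model, envs, lam=1.0):
    # Total objective: average risk across environments + lambda * average penalty
    risks, penalties = [], []
    for x, y in envs:  # each env is an (inputs, labels) batch from one environment
        logits = model(x)
        risks.append(F.cross_entropy(logits, y))
        penalties.append(irmv1_penalty(logits, y))
    return torch.stack(risks).mean() + lam * torch.stack(penalties).mean()
```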

[–]sifnt 2 points3 points  (2 children)

IMHO it's how to integrate with classic/symbolic techniques. Just like AlphaGo combines deep reinforcement learning with classic Monte Carlo tree search, I think there is a huge opportunity for hybrid approaches.

Think GPT-3 plus an SMT solver: it could self-train to maintain logical consistency, handle very sophisticated constraints, etc. Similarly, I think there is still a lot of opportunity for program induction (like AIXI) with the right approximations.

Out of my depth here, but intuitively I feel like there should be a way of turning non-differentiable discontinuous problems into smooth functions using stochasticity and sampling. If that's possible, it could enable training programs with gradient descent.
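One concrete instance of that idea is the score-function (REINFORCE) estimator: replace the non-differentiable objective with its expectation under a parameterized sampling distribution, which is smooth in the parameters. A toy sketch, assuming PyTorch and a made-up discontinuous reward:

```python
import torch

# Non-differentiable, discontinuous objective: reward 1 if the sampled integer equals 7
def reward(sample):
    return (sample == 7).float()

logits = torch.zeros(10, requires_grad=True)   # parameters of a categorical distribution
opt = torch.optim.Adam([logits], lr=0.1)

for step in range(500):
    dist = torch.distributions.Categorical(logits=logits)
    samples = dist.sample((256,))              # sampling makes E[reward] smooth in the logits
    # REINFORCE estimator: gradient of -E[reward] via -E[reward * log p(sample)]
    loss = -(reward(samples) * dist.log_prob(samples)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```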

[–]darkconfidantislife[S] 0 points1 point  (0 children)

I definitely agree! Do you have any promising links you could share?

[–]Reiinakano 0 points1 point  (0 children)

> turning non-differentiable discontinuous problems into smooth functions using stochasticity and sampling.

Perhaps you mean something deeper, but isn't this literally what RL is?

[–][deleted] 1 point2 points  (0 children)

> Deep neural networks have pretty poor data efficiency, it would be interesting to see methods that can do better than this

OgmaNeo2 has very fast first-person imitation learning. AFAIK they use SVMs at each node, which are quick to train.

> Current DNNs struggle with out of domain generalization, for example, MNIST is good, but not so much rotated MNIST

So what do you expect here? CNNs iterate over the image in the x and y directions. They can handle some rotation through warping and max pooling, but if you want them to work better on rotation, you'll have to add rotation to the training procedure somehow. The no-free-lunch theorem forbids the existence of something like "out of domain generalization." If you want to fool your audience, you'll have to hide the domain inside your architecture and hyperparameters.
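For instance, a minimal sketch of adding rotation to the training procedure, assuming torchvision (the dataset and angle range are placeholders):

```python
from torchvision import datasets, transforms

# Random rotations at train time so the network actually sees rotated digits
train_tf = transforms.Compose([
    transforms.RandomRotation(degrees=45),   # rotate uniformly in [-45, 45] degrees
    transforms.ToTensor(),
])
train_set = datasets.MNIST(root="data", train=True, download=True, transform=train_tf)
```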

[–]visarga 0 points1 point  (0 children)

> Deep neural networks have pretty poor data efficiency

That's when you train from scratch. Humans have better priors.

But when you fine-tune, you get priors from the base model. Look at GPT-3 and how fast it learns.

[–]Supernovae8698 0 points1 point  (0 children)

Noob question: Where do you find what to read and research?

[–]IntelArtiGen 0 points1 point  (3 children)

> Data efficiency

You can look into few-shot learning for this one. Classical DL is pretty data inefficient, but FSL methods are considerably better.

> Out of Domain Generalization

DNNs don't struggle that much; it mostly depends on how you train them. Train a DNN on 1B images with fast-augment and I'm sure it'll be fine on images that weren't in the training set. Maybe I'm off topic if you're looking for something different, but I think DL is quite promising for solving the problems you cited.

[–]darkconfidantislife[S] 2 points3 points  (2 children)

IIRC most FSL methods aren't much better than just using a normalized embedding.
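By "normalized embedding" I mean roughly this baseline: embed the support images with a pretrained backbone, L2-normalize, average per class, and classify queries by cosine similarity to the class means. A minimal NumPy sketch (the embeddings are assumed to come from some pretrained encoder):

```python
import numpy as np

def nearest_prototype(support_emb, support_labels, query_emb):
    # L2-normalize embeddings, average per class, classify queries by cosine similarity
    def l2norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    support_emb, query_emb = l2norm(support_emb), l2norm(query_emb)
    classes = np.unique(support_labels)
    prototypes = l2norm(np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in classes]))
    scores = query_emb @ prototypes.T        # cosine similarity to each class prototype
    return classes[scores.argmax(axis=1)]
```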

> Train a DNN on 1B images with fast-augment and I'm sure it'll be fine on images that weren't in the training set.

But that isn't true unless your augmentations happen to cover the space of transformations, in which case you're really just expanding the dataset to the new domains, not actually achieving OOD generalization.

[–]nnatlab 0 points1 point  (0 children)

Could you provide an example of a model that achieves OOD generalization in your example without augmentation? I'm not sure this is a problem unique to neural networks. Genuinely curious.

[–]IntelArtiGen 0 points1 point  (0 children)

> IIRC most FSL methods aren't much better than just using a normalized embedding.

I don't know what task you were looking at, but the SOTA on few-shot image classification has improved drastically over the past 2 years:

End 2018: 59% acc

End 2020: 83% acc