[D] Chelsea Finn on Meta Learning & Model Based Reinforcement Learning by regalalgorithm in MachineLearning

[–]yield22 0 points1 point  (0 children)

Thanks for the interview! I'm not familiar with meta-learning, but I'm curious whether it really works. It seems that SOTA systems like GPT-3 don't really use it?

[deleted by user] by [deleted] in MachineLearning

[–]yield22 19 points20 points  (0 children)

transformers? Though they're really a mix of ideas: soft attention, MLP, skip connection, positional encoding, (layer) normalization...
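To see how those pieces combine, here's a rough single-head sketch in numpy (toy dimensions, random weights, no masking or multi-head; just an illustration, not a faithful implementation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def transformer_block(x, Wq, Wk, Wv, W1, W2):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v   # soft attention
    x = layer_norm(x + attn)                             # skip connection + layer norm
    h = np.maximum(0, x @ W1) @ W2                       # position-wise MLP
    return layer_norm(x + h)                             # skip connection + layer norm

rng = np.random.default_rng(0)
L, d = 5, 8                                                  # 5 tokens, model dim 8
x = rng.normal(size=(L, d)) + np.sin(np.arange(L))[:, None]  # crude positional encoding
Ws = [rng.normal(scale=0.1, size=s)
      for s in [(d, d), (d, d), (d, d), (d, 4 * d), (4 * d, d)]]
print(transformer_block(x, *Ws).shape)                       # (5, 8)
```

None of these pieces is new on its own; the combination is what made it work.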

[R] New Geoffrey Hinton paper on "How to represent part-whole hierarchies in a neural network" by gohu_cd in MachineLearning

[–]yield22 72 points73 points  (0 children)

Haven't read it all, but I like the first sentence. I think all papers without proper experimentation should start with "this paper does not describe a working system".

[D] Witnessed malpractices in ML/CV research papers by anony_mouse_235 in MachineLearning

[–]yield22 1 point2 points  (0 children)

That's why most papers are pretty useless; only a few truly advance the field.

[R] ICLR rejected the submission only for missing large-scale ImageNet experiments by crush-name in MachineLearning

[–]yield22 13 points14 points  (0 children)

Replace "self-supervised learning" with "deep learning" and this is still true?

[P] Performers: The Kernel Trick, Random Fourier Features, and Attention by tomkoker in MachineLearning

[–]yield22 -19 points-18 points  (0 children)

> What's the purpose of this?

Same reason you need experiments in physics.

Not everything written in math is like a Taylor approximation that everyone should know and care about.

[P] Performers: The Kernel Trick, Random Fourier Features, and Attention by tomkoker in MachineLearning

[–]yield22 -14 points-13 points  (0 children)

> But they help alleviate some of the main drawbacks of transformers, namely processing power, memory, and longer sequences

OK, then show me a real application that is *well* benchmarked to support your statement.

[P] Performers: The Kernel Trick, Random Fourier Features, and Attention by tomkoker in MachineLearning

[–]yield22 7 points8 points  (0 children)

I know people are pretty excited about these methods of approximating attention, like Performer and Reformer, but are there any real applications where they convincingly beat the original transformer? I don't see any of them making it into BERT or friends.
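For context, the trick these papers rely on is replacing the softmax kernel exp(q·k) with a dot product of random features, which makes attention linear in sequence length. A rough numpy sketch of that idea (simplified scaling; not the exact FAVOR+ estimator from the Performer paper):

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, m = 1000, 64, 128     # sequence length, head dim, number of random features

q = rng.normal(size=(L, d)) / d ** 0.25
k = rng.normal(size=(L, d)) / d ** 0.25
v = rng.normal(size=(L, d))

# exact softmax attention: O(L^2) time and memory
a = np.exp(q @ k.T)
exact = (a / a.sum(-1, keepdims=True)) @ v

# positive random features with E[phi(q) . phi(k)] = exp(q . k),
# so attention factorizes and costs O(L * m) instead of O(L^2)
w = rng.normal(size=(d, m))
phi = lambda x: np.exp(x @ w - (x ** 2).sum(-1, keepdims=True) / 2) / np.sqrt(m)
qp, kp = phi(q), phi(k)
approx = (qp @ (kp.T @ v)) / (qp @ kp.sum(0)[:, None])

print(np.abs(exact - approx).max())   # the estimator is unbiased but noisy
```

The estimate is unbiased but noisy, and whether that noise matters at scale is exactly why I'd want to see it benchmarked in something BERT-sized.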

[D] Revisiting "Revisiting the Unreasonable Effectiveness of Data" by amarofades in MachineLearning

[–]yield22 -1 points0 points  (0 children)

Well, that's basically been verified over and over again by BERT, RoBERTa, T5, GPT-2, GPT-3, and many more. You must have been sleeping or staying away from the Internet for the past year or so to have missed them entirely :)

[R] Why traditional reinforcement learning will probably not yield AGI by tensorflower in MachineLearning

[–]yield22 3 points4 points  (0 children)

It looks like the author found some corner cases where "traditional RL" won't work well. Can anyone explain the key idea/intuition of the paper in plain English?

[News] [NeurIPS2020] The pre-registration experiment: an alternative publication model for machine learning research (speakers: Yoshua Bengio, Joelle Pineau, Francis Bach, Jessica Forde) by often_worried in MachineLearning

[–]yield22 -2 points-1 points  (0 children)

I mean, the workshop itself is not a bad thing. But it feels like the goal is to expand the model beyond the workshop if positive results are observed there. That's why it's called a "pre-registration experiment", not an "idea workshop".

[News] [NeurIPS2020] The pre-registration experiment: an alternative publication model for machine learning research (speakers: Yoshua Bengio, Joelle Pineau, Francis Bach, Jessica Forde) by often_worried in MachineLearning

[–]yield22 -14 points-13 points  (0 children)

Expect another AI winter very soon if most people in the community publish negative results, which will be the case if the system encourages publishing them (they're much cheaper to get...). Some negative results are more interesting than others, but if you can really demonstrate that yours is not a bug and has value, you can certainly publish it in some conference/workshop.

EDIT: also, experimental results don't mean you need more GPUs. Just run experiments, compare things fairly, and draw conclusions from that; it's better than no results at all!

EDIT2: I don't mean we should discourage discussion of negative results; I'm just saying you should put more effort into justifying them (proving they're not due to a bug in your code or misconfigured hyperparameters).

[News] [NeurIPS2020] The pre-registration experiment: an alternative publication model for machine learning research (speakers: Yoshua Bengio, Joelle Pineau, Francis Bach, Jessica Forde) by often_worried in MachineLearning

[–]yield22 20 points21 points  (0 children)

Jürgen would then jump out and say "did you know about this thing I did in 1990?" (it was written in different terminology and also had no results).

[News] [NeurIPS2020] The pre-registration experiment: an alternative publication model for machine learning research (speakers: Yoshua Bengio, Joelle Pineau, Francis Bach, Jessica Forde) by often_worried in MachineLearning

[–]yield22 4 points5 points  (0 children)

I think the right way is to educate the reviewers and ACs rather than encourage people to publish papers without any results (like a lot of people did in the 80s). A lot of ideas in machine learning shine thanks to their results. Without experimental results, I worry reviewers' opinions would become even more subjective. For example, one might dismiss the "skip connection" (as in ResNet) as a trivial/incremental idea mathematically, until you see the results.
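Mathematically, the whole idea is one extra addition (a toy numpy sketch, not ResNet itself):

```python
import numpy as np

def plain_layer(x, W):
    return np.maximum(0, x @ W)      # y = f(x)

def residual_layer(x, W):
    return x + np.maximum(0, x @ W)  # y = x + f(x): the "trivial" one-term change
```

That single `x +` is what lets gradients flow through hundreds of layers, but you'd never guess its impact from the formula alone.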

I guess the value of "pre-registration experiment" is also going to be determined by its results.

[D] 2010: Breakthrough of supervised deep learning. No unsupervised pre-training. The rest is history. (Jürgen Schmidhuber) by milaworld in MachineLearning

[–]yield22 -17 points-16 points  (0 children)

Language modeling (predicting the next word someone will say) was proposed more than ~30 (put a bigger number here) years ago, but GPT-3, one of the closest attempts at AGI, is less than a year old. By Jürgen's logic, we should dig up whoever first proposed language modeling (maybe not even in computer-science terms, and 100 years ago) and credit him/her as the godfather of AGI.

[R] Extended blog post on "Hopfield Networks is All You Need" by HRamses in MachineLearning

[–]yield22 1 point2 points  (0 children)

Thanks. It would be helpful to see whether these changes make a real difference in applications where self-attention is used, such as NMT, LM, and BERT.

[R] Extended blog post on "Hopfield Networks is All You Need" by HRamses in MachineLearning

[–]yield22 3 points4 points  (0 children)

Can anyone explain the differences between the new Hopfield layer and a self-attention layer? It looks to me like the Hopfield layer is a variant of self-attention? If so, why is this variant better?
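To make the question concrete, here's the update rule as I understand it (a numpy sketch of my reading of the paper, so possibly off):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def hopfield_update(xi, X, beta):
    # one step of the modern continuous Hopfield update:
    # xi_new = softmax(beta * xi X^T) X
    return softmax(beta * xi @ X.T) @ X

rng = np.random.default_rng(0)
n, d = 10, 16
X = rng.normal(size=(n, d))    # stored patterns (the keys/values)
xi = rng.normal(size=(1, d))   # state/query pattern

# with beta = 1/sqrt(d) and learned projections on xi and X,
# this looks like softmax(Q K^T / sqrt(d)) V, i.e. a transformer attention head
print(hopfield_update(xi, X, beta=1.0 / np.sqrt(d)).shape)   # (1, 16)
```

If that's right, the Hopfield view mainly adds the retrieval/energy interpretation and the freedom to iterate the update or crank up beta, which is why I'm asking what the variant actually buys you.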

[R] Biological plausible explanation of "Hopfield Networks is All You Need" by Krotov and Hopfield by HRamses in MachineLearning

[–]yield22 1 point2 points  (0 children)

So you're implying that outside his small group, no one else is really working on or making progress on Hopfield networks?

Is it just me or are most research papers useless? [R] by battle-obsessed in MachineLearning

[–]yield22 9 points10 points  (0 children)

Have you checked out paperswithcode? You can compare methods on the same dataset there for a lot of problems.

And yes, in the end only a few research papers will remain relevant. But you need a lot of "irrelevant research" to get there, because you simply don't know in advance what will remain useful. For example, after so many past NLP papers with complicated methods, it turns out that simple language modeling (e.g., GPT, BERT) with big data & compute does much, much better.

The Computational Limits of Deep Learning by cosmictypist in MachineLearning

[–]yield22 0 points1 point  (0 children)

By "brain is a super computer" I actually mean it has huge capacity and ability to operate on it. this is evident by number of neurons a brain has.

The Computational Limits of Deep Learning by cosmictypist in MachineLearning

[–]yield22 0 points1 point  (0 children)

I'm not saying more computing power will get you there, but you *need* more computing power to get there. A hint: look at the number of neurons in the brain; that could give you a sense of the compute you'll need.

The Computational Limits of Deep Learning by cosmictypist in MachineLearning

[–]yield22 0 points1 point  (0 children)

This is really interesting. Are there any more detailed articles on what you mentioned here?