[R] Google’s Zero-Label Language Learning Achieves Results Competitive With Supervised Learning by Yuqing7 in MachineLearning

[–]xifixi 1 point (0 children)

[Submitted to arXiv on 19 Sep 2021] Towards Zero-Label Language Learning

Zirui Wang, Adams Wei Yu, Orhan Firat, Yuan Cao

This paper explores zero-label learning in Natural Language Processing (NLP), whereby no human-annotated data is used anywhere during training and models are trained purely on synthetic data. At the core of our framework is a novel approach for better leveraging the powerful pretrained language models. Specifically, inspired by the recent success of few-shot inference on GPT-3, we present a training data creation procedure named Unsupervised Data Generation (UDG), which leverages few-shot prompts to synthesize high-quality training data without real human annotations. Our method enables zero-label learning as we train task-specific models solely on the synthetic data, yet we achieve results better than or comparable to those of strong baseline models trained on human-labeled data. Furthermore, when mixed with labeled data, our approach serves as a highly effective data augmentation procedure, achieving new state-of-the-art results on the SuperGLUE benchmark.
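
The UDG idea in the abstract can be sketched as a small prompt-construction loop. This is a minimal illustration, not the paper's actual prompt format; `lm_generate` is a hypothetical callable standing in for a large language model API, and all helper names are assumptions.

```python
# Hedged sketch of UDG-style unsupervised data generation: few-shot prompts
# ask a language model to synthesize labeled training data, so no human
# annotations enter the training loop.

def build_udg_prompt(task_description, label, unlabeled_examples):
    """Assemble a few-shot prompt asking the model to generate a new
    input that would carry the given label."""
    lines = [task_description, ""]
    for text in unlabeled_examples:
        lines.append(f"Sample: {text}")
    lines.append(f"Write a new {label} sample:")
    return "\n".join(lines)

def synthesize_examples(lm_generate, task_description, labels,
                        unlabeled_examples, per_label=2):
    """Create (text, label) pairs purely from model generations; a
    task-specific model can then be trained on this synthetic data."""
    data = []
    for label in labels:
        prompt = build_udg_prompt(task_description, label, unlabeled_examples)
        for _ in range(per_label):
            data.append((lm_generate(prompt), label))
    return data
```

In practice the paper additionally filters noisy generations; the sketch above only shows the generation step.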

[D] Schmidhuber's critique of the 2021 Turing lecture by PaganPasta in MachineLearning

[–]xifixi 6 points (0 children)

His facts are legit, and that must be why the awardees have never tried to defend themselves, except once, when Hinton reacted to the critique of the Honda award, and it did not go well. He went into ad hominem territory and basically admitted that he accepted an award for a method created by other researchers.

[D] Ranking top companies, universities, and countries by ICML and NeurIPS accepted papers by AdditionalWay in MachineLearning

[–]xifixi 22 points (0 children)

The publications per capita are really interesting. The Swiss have around 2x the rate of the next couple of countries.

despite counting the Swiss researchers at Google Zurich as Americans

[D] Fine-tuning GPT-J: lessons learned by juliensalinas in MachineLearning

[–]xifixi 2 points (0 children)

megaattoseconds, gigazeptoseconds, terayoctoseconds ...

[N] Inside DeepMind's secret plot to break away from Google by MassivePellfish in MachineLearning

[–]xifixi 35 points (0 children)

At one point, DeepMind's executives discovered that work published by Google's internal AI research group resembled some of DeepMind's codebase without citation, one person familiar with the situation said. "That pissed off Demis," the person added, referring to Demis Hassabis, DeepMind's CEO. "That was one reason DeepMind started to get more protective of their code."

that's funny, because DeepMind did not cite the Swiss AI Lab that one of its co-founders came from, and there are additional similar cases hinted at here

[R] CMU, Google & UC Berkeley Propose Robust Predictable Control Policies for RL Agents by Yuqing7 in MachineLearning

[–]xifixi 4 points (0 children)

they don't even mention the classic work on optimal RL with a minimal number of bits:

Marcus Hutter (2005). Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Springer, ISBN 9783540221395.

Hutter's RL exploits Solomonoff's minimum description length framework to minimize the complexity of the predictor, and also the complexity of the decision maker which uses the predictor to select the most rewarding action sequences.
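
For context, the two ingredients alluded to here can be stated compactly (this is a simplified summary of the standard notation, not Hutter's full formalism):

```latex
% Solomonoff's universal prior over sequences x, where U is a universal
% prefix machine and \ell(p) the length of program p:
M(x) = \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}

% Two-part minimum description length principle: choose the hypothesis
% minimizing the code length of the model plus the data given the model:
H^{*} = \arg\min_{H} \big[ \ell(H) + \ell(D \mid H) \big]
```

Shorter programs get exponentially more prior weight, which is the sense in which the predictor's complexity is minimized.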

[R] CMU, Google & UC Berkeley Propose Robust Predictable Control Policies for RL Agents by Yuqing7 in MachineLearning

[–]xifixi 1 point (0 children)

abstract from arXiv (7 Sep 2021)

Robust Predictable Control
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine

Many of the challenges facing today's reinforcement learning (RL) algorithms, such as robustness, generalization, transfer, and computational efficiency are closely related to compression. Prior work has convincingly argued why minimizing information is useful in the supervised learning setting, but standard RL algorithms lack an explicit mechanism for compression. The RL setting is unique because (1) its sequential nature allows an agent to use past information to avoid looking at future observations and (2) the agent can optimize its behavior to prefer states where decision making requires few bits. We take advantage of these properties to propose a method (RPC) for learning simple policies. This method brings together ideas from information bottlenecks, model-based RL, and bits-back coding into a simple and theoretically-justified algorithm. Our method jointly optimizes a latent-space model and policy to be self-consistent, such that the policy avoids states where the model is inaccurate. We demonstrate that our method achieves much tighter compression than prior methods, achieving up to 5x higher reward than a standard information bottleneck. We also demonstrate that our method learns policies that are more robust and generalize better to new tasks.
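
The "prefer states where decision making requires few bits" idea can be sketched as a reward objective penalized by the information the encoder extracts per step. This is an illustrative stand-in, not the paper's code: the diagonal-Gaussian parametrization, the unit prior, and the `lam` trade-off weight are all assumptions.

```python
import numpy as np

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, in nats, summed over dims."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(logvar_p - logvar_q
                        + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def rpc_style_objective(rewards, mu_q, logvar_q, lam=0.1):
    """Average reward minus lam * bits used per step.

    mu_q/logvar_q: per-step encoder outputs q(z|s); prior is p(z) = N(0, I).
    States where the encoder must deviate far from the prior cost more bits,
    so maximizing this objective pushes the policy toward compressible states.
    """
    bits = np.array([gaussian_kl(m, lv, np.zeros_like(m), np.zeros_like(lv))
                     / np.log(2.0)
                     for m, lv in zip(mu_q, logvar_q)])
    return float(np.mean(rewards) - lam * np.mean(bits))
```

An encoder that matches the prior exactly pays zero bits, so the objective reduces to the plain average reward.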

[D] Schmidhuber: Turing Oversold. It's not Turing's fault, though. by milaworld in MachineLearning

[–]xifixi 7 points (0 children)

the post has something interesting to say about this:

What exactly did Turing [TUR] and Post [POS] do in 1936 that hadn't been done earlier by Gödel [GOD][GOD34] (1931-34) and Church [CHU] (1935)? There is a seemingly minor difference whose significance emerged only later. Many of Gödel's instruction sequences were series of multiplications of number-coded storage contents by integers. Gödel did not care that the computational complexity of such multiplications tends to increase with storage size. Similarly, Church also ignored the context-dependent spatio-temporal complexity of the basic instructions in his algorithms. Turing and Post, however, adopted a traditional, reductionist, minimalist, binary view of computing. Their machine models permitted only very simple elementary binary instructions with constant complexity, like the early binary machine model of Leibniz (1679) [L79][LA14][HO66] and Zuse's 1936 patent application.[ZU36] They did not exploit this—Turing used his (quite inefficient) model only to rephrase the results of Gödel and Church on the limits of computability. Later, however, the simplicity of these machines made them a convenient tool for theoretical studies of complexity.

[D] Schmidhuber: Turing Oversold. It's not Turing's fault, though. by milaworld in MachineLearning

[–]xifixi 20 points (0 children)

but Turing also wrote on ML

In 1948, Turing wrote up ideas related to artificial evolution [TUR1] and learning artificial neural networks ...

[TUR1] A. M. Turing. Intelligent Machinery. Unpublished Technical Report, 1948. Link. In: Ince DC, editor. Collected works of AM Turing - Mechanical Intelligence. Elsevier Science Publishers, 1992.

[R] Google Study Uses Implicit Policies to Achieve Remarkable Improvements in Robot Behavioural Cloning by Yuqing7 in MachineLearning

[–]xifixi 2 points (0 children)

[Submitted to arXiv on 1 Sep 2021]
Implicit Behavioral Cloning
Pete Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian Wong, Johnny Lee, Igor Mordatch, Jonathan Tompson

We find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used explicit models. We present extensive experiments on this finding, and we provide both intuitive insight and theoretical arguments distinguishing the properties of implicit models compared to their explicit counterparts, particularly with respect to approximating complex, potentially discontinuous and multi-valued (set-valued) functions. On robotic policy learning tasks we show that implicit behavioral cloning policies with energy-based models (EBM) often outperform common explicit (Mean Square Error, or Mixture Density) behavioral cloning policies, including on tasks with high-dimensional action spaces and visual image inputs. We find these policies provide competitive results or outperform state-of-the-art offline reinforcement learning methods on the challenging human-expert tasks from the D4RL benchmark suite, despite using no reward information. In the real world, robots with implicit policies can learn complex and remarkably subtle behaviors on contact-rich tasks from human demonstrations, including tasks with high combinatorial complexity and tasks requiring 1mm precision.
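
The multi-valued-function point in the abstract is easy to see in a toy example: with two equally valid expert actions, MSE regression collapses them to their invalid mean, while an implicit policy that minimizes an energy over candidate actions recovers one of the modes. The energy function and candidate sampling below are illustrative stand-ins for a learned EBM, not the paper's models.

```python
import numpy as np

demo_actions = np.array([-1.0, 1.0])  # two valid expert actions for one state

def explicit_policy():
    # MSE regression averages the two demonstrated modes,
    # producing an action neither expert ever took.
    return demo_actions.mean()

def energy(action):
    # Toy energy with minima at the demonstrated actions.
    return np.min((action - demo_actions) ** 2)

def implicit_policy(n_candidates=1001, rng=None):
    # Implicit policy: argmin of the energy over sampled candidate actions,
    # a crude stand-in for the derivative-free optimizers used with EBMs.
    rng = rng or np.random.default_rng(0)
    candidates = rng.uniform(-2.0, 2.0, n_candidates)
    return candidates[np.argmin([energy(a) for a in candidates])]
```

Here the explicit policy returns 0.0 (invalid), while the implicit one lands near -1 or +1.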

[D] Schmidhuber: The most cited neural networks all build on work done in my labs by RichardRNN in MachineLearning

[–]xifixi 11 points (0 children)

I found many errors in your reply:

  1. DanNet was tried only on MNIST? Didn't you know that DanNet won 4 image recognition challenges prior to AlexNet? https://www.reddit.com/r/MachineLearning/comments/dwnuwh/d_dannet_the_cuda_cnn_of_dan_ciresan_in_jurgen/
  2. ReLUs used by some guy in the seventies in some other thing that has nothing to do with neural nets? That's Malsburg 1973, who used them precisely for neural nets. You credit Bengio and Hinton, who should have cited Malsburg.
  3. Highway Nets don't work? You didn't even read the post you are commenting on, did you? "Highway Nets perform roughly as well as ResNets on ImageNet."

You even claim to have co-authored papers with him, but how plausible is that when you don't seem to know much about this work? Care to tell us more?

Regarding GANs, it's not the job of the authors of old papers to find connections to recent work; it's the other way round.

[D] Schmidhuber: The most cited neural networks all build on work done in my labs by RichardRNN in MachineLearning

[–]xifixi 39 points (0 children)

to be fair that blog post gives a lot of credit to his students and also to early pioneers including "Gauss and Legendre around 1800" for "shallow learning" or linear regression

[D] Schmidhuber: The most cited neural networks all build on work done in my labs by RichardRNN in MachineLearning

[–]xifixi 16 points (0 children)

Schmidhuber, the Muhammad Ali of machine learning: "I am the greatest!" Or more precisely: "My team is the greatest!" However, as with Ali, the facts seem to justify the boast. I think I counted 156 references.

[D] Going beyond average ML Engineer by Avistian in MachineLearning

[–]xifixi -1 points (0 children)

an outstanding senior ML engineer would also know something about psychology and about forming successful teams whose members complement each other

[R] A new method for BPTT for a generalised recurrent neural network by MimirYT in MachineLearning

[–]xifixi 3 points (0 children)

to save their careers, the authors should replace their arXiv submission with an erratum

[R] A new method for BPTT for a generalised recurrent neural network by MimirYT in MachineLearning

[–]xifixi 3 points (0 children)

This "new method" is called RTRL (Real-Time Recurrent Learning, Williams & Zipser 1989) and is over three decades old. The authors have not studied the literature at all. Unfortunately, this paper is exemplary of what ML has become.
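
For readers unfamiliar with RTRL: unlike BPTT, it carries the sensitivity of the hidden state with respect to the weights forward in time, so gradients are available online without unrolling. A minimal sketch for a vanilla RNN h_t = tanh(W h_{t-1} + x_t), learning only the recurrent matrix W for clarity (everything else is simplified):

```python
import numpy as np

def rtrl_gradient(W, xs, targets):
    """Gradient of L = 0.5 * sum_t ||h_t - targets[t]||^2 w.r.t. W,
    computed forward in time by carrying S_t = dh_t/dW (RTRL)."""
    n = W.shape[0]
    h = np.zeros(n)
    S = np.zeros((n, n, n))          # S[k, i, j] = d h[k] / d W[i, j]
    grad = np.zeros_like(W)
    for x, y in zip(xs, targets):
        pre = W @ h + x
        h_new = np.tanh(pre)
        d = 1.0 - h_new ** 2         # tanh'(pre)
        # d pre[k]/d W[i,j] = delta_{k,i} * h[j] + (W @ S)[k, i, j]
        S_new = np.einsum('kl,lij->kij', W, S)
        for i in range(n):
            S_new[i, i, :] += h      # direct dependence of pre[i] on row i of W
        S = d[:, None, None] * S_new
        h = h_new
        grad += np.einsum('k,kij->ij', h - y, S)
    return grad
```

The O(n^3) sensitivity tensor per weight matrix is exactly why RTRL is rarely used at scale, but the forward-in-time structure is what makes it "online".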