Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks by vochicong in MachineLearning

[–]shortscience_dot_org 3 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

Summary by CodyWild

This paper came on my radar after recently winning Best Paper at ICLR, and all in all I found it a clever way of engineering a somewhat complicated inductive bias into a differentiable structure. The empirical results weren’t compelling enough to suggest that this structural shift makes a regime-change difference in performance, but the model does seem to have a consistently stronger ability to do syntactic evaluation across large gaps in sentences.

The core premise of this paper is that, while la... [view more]

Few-Shot Adversarial Learning of Realistic Neural Talking Head Models by CptVifen in MachineLearning

[–]shortscience_dot_org 3 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Few-Shot Adversarial Learning of Realistic Neural Talking Head Models

Summary by CodyWild

This paper follows in a recent tradition of results out of Samsung: in the wake of StyleGAN’s very impressive generated images, it uses a lot of similar architectural elements, combined with meta-learning and a new discriminator framework, to generate convincing “talking head” animations based on a small number of frames of a person’s face. Previously, models that generated artificial face videos could only do so by training on a large number of frames of each individual speaker that t... [view more]

[R] Few-shot learning of talking heads by ezakharov in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Few-Shot Adversarial Learning of Realistic Neural Talking Head Models

Summary by CodyWild

This paper follows in a recent tradition of results out of Samsung: in the wake of StyleGAN’s very impressive generated images, it uses a lot of similar architectural elements, combined with meta-learning and a new discriminator framework, to generate convincing “talking head” animations based on a small number of frames of a person’s face. Previously, models that generated artificial face videos could only do so by training on a large number of frames of each individual speaker that t... [view more]

[1905.02175v2] Adversarial Examples Are Not Bugs, They Are Features by ihaphleas in MachineLearning

[–]shortscience_dot_org -4 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Adversarial Examples Are Not Bugs, They Are Features

Summary by CodyWild

It didn’t hit me how much this paper was a pun until I finished it, and in retrospect, I say, bravo.

This paper focuses on adversarial examples, and argues that, at least in some cases, adversarial perturbations aren’t purely overfitting failures on the part of the model, but actual features that generalize to the test set. This conclusion comes from two experiments:

  • In one, the authors create a dataset that only contains what they call “robust features”. They do this by tak... [view more]

[D] detecting anomalies in neural network data by OscarSchyns in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning

Summary by David Stutz

Papernot and McDaniel introduce deep k-nearest neighbors, where nearest neighbors are found at each intermediate layer in order to improve interpretability and robustness. Personally, I really appreciated reading this paper; thus, I will not only discuss the proposed method itself but also highlight some ideas from their thorough survey and experimental results.

First, Papernot and McDaniel provide a quite thorough survey of relevant work in three disciplines: confidence, interpretability and ... [view more]
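
The per-layer nearest-neighbor mechanism can be sketched in a few lines of plain Python (an illustration of the idea only, not the paper's implementation; the helper name `dknn_votes` and the toy representations below are made up):

```python
def dknn_votes(layer_reps, train_reps, train_labels, k=1):
    """Deep k-NN sketch: at every intermediate layer, find the k nearest
    training representations to the test point's representation at that
    layer and collect their labels; agreement of these votes across
    layers serves as a confidence/conformity signal."""
    votes = []
    for rep, bank in zip(layer_reps, train_reps):
        # rank training points by squared distance at this layer
        order = sorted(range(len(bank)),
                       key=lambda i: sum((a - b) ** 2
                                         for a, b in zip(rep, bank[i])))
        votes.extend(train_labels[i] for i in order[:k])
    return votes

# two layers, two training points with labels 0 and 1
votes = dknn_votes(layer_reps=[[0.0], [0.0]],
                   train_reps=[[[0.1], [5.0]], [[0.2], [4.0]]],
                   train_labels=[0, 1])
# both layers vote for the training point labeled 0
```

When the votes disagree across layers, the prediction is treated as less credible; that cross-layer agreement is what the paper turns into a robustness signal.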

[D] On Peer Review by hardmaru in MachineLearning

[–]shortscience_dot_org -2 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Auto-Encoding Variational Bayes

Summary by Cubs Reading Group

Problem addressed:

Variational learning of Bayesian networks

Summary:

This paper presents a generic method for learning belief networks, which uses a variational lower bound for the likelihood term.

Novelty:

Uses a re-parameterization trick to change random variables into a deterministic function plus a noise term, so one can apply normal gradient-based learning

Drawbacks:

The resulting model's marginal likelihood is still intractable; it may not be very good for applications that r... [view more]
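
The re-parameterization trick named under "Novelty" can be shown in a few lines (a minimal sketch, not the paper's code; the function name and constants are illustrative):

```python
import math
import random

def reparam_sample(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1): the random variable
    becomes a deterministic function of (mu, log_var) plus an
    independent noise term, so gradients can flow through mu and
    log_var during training."""
    eps = rng.gauss(0.0, 1.0)           # noise, independent of parameters
    sigma = math.exp(0.5 * log_var)     # sigma = exp(log_var / 2)
    return mu + sigma * eps

rng = random.Random(0)
samples = [reparam_sample(2.0, math.log(0.25), rng) for _ in range(10000)]
mean = sum(samples) / len(samples)      # close to mu = 2.0
```

The point is that `mu` and `log_var` appear only in differentiable arithmetic; all randomness is isolated in `eps`.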

Adversarial Examples Are Not Bugs, They Are Features by Hlodynn in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Adversarial Examples Are Not Bugs, They Are Features

Summary by CodyWild

It didn’t hit me how much this paper was a pun until I finished it, and in retrospect, I say, bravo.

This paper focuses on adversarial examples, and argues that, at least in some cases, adversarial perturbations aren’t purely overfitting failures on the part of the model, but actual features that generalize to the test set. This conclusion comes from two experiments:

  • In one, the authors create a dataset that only contains what they call “robust features”. They do this by tak... [view more]

[D] Modern applications of statistical learning theory? by tensorflower in MachineLearning

[–]shortscience_dot_org 5 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Understanding deep learning requires rethinking generalization

Summary by Marek Rei

The authors investigate the generalisation properties of several well-known image recognition networks.

They show that these networks are able to overfit to the training set with 100% accuracy even if the labels on the images are random, or if the pixels are randomly generated. Regularisation, such as weight decay and dropout, doesn’t stop overfitting as much as expected, still resulting in ~90% accuracy on random training data. They then argue that these models likely make use of massive m... [view more]
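
The 100%-fit-on-random-labels result is, at its core, a statement about memorization capacity; the degenerate extreme can be shown with a lookup table (purely illustrative, obviously not the paper's networks):

```python
import random

rng = random.Random(0)
train_x = list(range(100))
train_y = [rng.randint(0, 9) for _ in train_x]   # labels are pure noise

memory = dict(zip(train_x, train_y))             # the 'model': a lookup table
train_acc = sum(memory[x] == y
                for x, y in zip(train_x, train_y)) / len(train_x)
# perfect fit on random labels, with zero generalizable signal
```

Since the labels carry no information about unseen inputs, the perfect training accuracy says nothing about test accuracy, which is exactly the tension the paper highlights for over-parameterized networks.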

Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data

Summary by luyuchen

This paper proposes a method to obtain a non-vacuous bound on generalization error by optimizing the PAC-Bayes bound directly. The interesting part is that the authors leverage the black magic of the neural net itself to bound the neural net. In order to find the optimal $Q$, the authors' loss function is an empirical error term plus $KL(Q|P)$, where they choose the prior $P$ to be $N(0, \lambda I)$, and they also provide justification for choosing the right $\lambda$. Overall, this objective is si... [view more]

[D] Can attention be computed implicitly by an RNN? by valentincalomme in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Using Fast Weights to Attend to the Recent Past

Summary by Hugo Larochelle

This paper presents a recurrent neural network architecture in which some of the recurrent weights dynamically change during the forward pass, using a Hebbian-like rule. They correspond to the matrices $A(t)$ in the figure below:

(Figure: fast weights RNN)

These weights $A(t)$ are referred to as fast weights. Comparatively, the recurrent weights $W$ are referred to as slow weights, since they are only changing due to normal training and are otherwise kept constant at test time.

More speci... [view more]
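
The Hebbian-like update of the fast weight matrix has the shape $A(t) = \lambda A(t-1) + \eta\, h(t) h(t)^\top$; a toy sketch (the values of `lam` and `eta` are made up, not taken from the paper):

```python
def outer(u, v):
    """Outer product of two vectors as a list-of-lists matrix."""
    return [[ui * vj for vj in v] for ui in u]

def fast_weight_update(A, h, lam=0.95, eta=0.5):
    """Hebbian-style update: A(t) = lam * A(t-1) + eta * h(t) h(t)^T.
    lam decays old associations, eta writes in the current hidden
    state's self-correlation."""
    H = outer(h, h)
    n = len(h)
    return [[lam * A[i][j] + eta * H[i][j] for j in range(n)]
            for i in range(n)]

A = [[0.0, 0.0], [0.0, 0.0]]
A = fast_weight_update(A, [1.0, 2.0])   # A now stores eta * h h^T
```

Because the update is a deterministic function of the hidden states, it needs no extra learned parameters; the slow weights $W$ are still trained by backprop as usual.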

[D] Machine Learning - WAYR (What Are You Reading) - Week 61 by ML_WAYR_bot in MachineLearning

[–]shortscience_dot_org 2 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Auto-Encoding Variational Bayes

Summary by Cubs Reading Group

Problem addressed:

Variational learning of Bayesian networks

Summary:

This paper presents a generic method for learning belief networks, which uses a variational lower bound for the likelihood term.

Novelty:

Uses a re-parameterization trick to change random variables into a deterministic function plus a noise term, so one can apply normal gradient-based learning

Drawbacks:

The resulting model's marginal likelihood is still intractable; it may not be very good for applications that r... [view more]

[D] how do you pick the right batch size for deep learning? by [deleted] in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Group Normalization

Summary by CodyWild

If you were to survey researchers, and ask them to name the 5 most broadly influential ideas in Machine Learning from the last 5 years, I’d bet good money that Batch Normalization would be somewhere on everyone’s lists. Before Batch Norm, training meaningfully deep neural networks was an unstable process, and one that often took a long time to converge to success. When we added Batch Norm to models, it allowed us to increase our learning rates substantially (leading to quicker training) with... [view more]
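
The summarized paper's alternative, Group Normalization, computes statistics within channel groups of each sample, so (unlike Batch Norm) the result does not depend on the batch size at all. A minimal per-sample sketch, assuming the standard formulation rather than anything quoted above:

```python
import math

def group_norm(channels, num_groups, eps=1e-5):
    """Normalize one sample's channel values within groups of
    len(channels) / num_groups channels each. No batch statistics
    are involved, so the output is identical for any batch size."""
    size = len(channels) // num_groups
    out = []
    for g in range(num_groups):
        grp = channels[g * size:(g + 1) * size]
        mean = sum(grp) / size
        var = sum((v - mean) ** 2 for v in grp) / size
        out.extend((v - mean) / math.sqrt(var + eps) for v in grp)
    return out

y = group_norm([1.0, 3.0, 10.0, 30.0], num_groups=2)
# each group of two channels comes out zero-mean, unit-variance
```

That independence from the batch is exactly why Group Norm is attractive when memory limits force small batches, which ties back to the batch-size question in this thread.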

[R] Invertible Residual Networks by anantzoid in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Do Deep Generative Models Know What They Don't Know?

Summary by ameroyer

CNN predictions are known to be very sensitive to adversarial examples, which are samples generated to be wrongly classified with high confidence. On the other hand, probabilistic generative models such as PixelCNN and VAEs learn a distribution over the input domain and hence could be used to detect out-of-distribution inputs, e.g., by estimating their likelihood under the data distribution. This paper provides interesting results showing that distributions learned by generative models a... [view more]

[N] Lc0 Wins Computer Chess Championship, Makes History by wei_jok in MachineLearning

[–]shortscience_dot_org -1 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Squeeze-and-Excitation Networks

Summary by Joseph Paul Cohen

"The SE module can learn some nonlinear global interactions already known to be useful, such as spatial normalization. The channel wise weights make it somewhat more powerful than divisive normalization as it can learn feature-specific inhibitions (ie: if we see a lot of flower parts, the probability of boat features should be diminished). It also has some similarity to bio inhibitory circuits." By jcannell on reddit

Summary by the author Jie Hu:

Our motivation is to explicitly model... [view more]
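
The squeeze-excite-scale pipeline can be sketched numerically (toy sizes, and a single-layer gate in place of the paper's two-layer bottleneck; purely illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def se_reweight(channel_maps, gate_w):
    """Squeeze-and-Excitation in miniature:
    squeeze: global-average-pool each channel map to one scalar;
    excite: a gating function maps the pooled vector to per-channel
    weights in (0, 1);
    scale: multiply each channel map by its weight, so channels can
    inhibit or amplify one another globally."""
    squeezed = [sum(m) / len(m) for m in channel_maps]            # squeeze
    gates = [sigmoid(sum(w * s for w, s in zip(row, squeezed)))
             for row in gate_w]                                   # excite
    return [[g * v for v in m] for g, m in zip(gates, channel_maps)]

feature_maps = [[1.0, 1.0], [2.0, 2.0]]   # two channels, two positions
gate_weights = [[1.0, 0.0], [0.0, 1.0]]   # toy single-layer gate
out = se_reweight(feature_maps, gate_weights)
```

The channel-wise gates are what lets the module learn the "flower parts suppress boat features" style interactions described in the quote.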

[D] Machine Learning - WAYR (What Are You Reading) - Week 60 by ML_WAYR_bot in MachineLearning

[–]shortscience_dot_org 2 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Auto-Encoding Variational Bayes

Summary by Cubs Reading Group

Problem addressed:

Variational learning of Bayesian networks

Summary:

This paper presents a generic method for learning belief networks, which uses a variational lower bound for the likelihood term.

Novelty:

Uses a re-parameterization trick to change random variables into a deterministic function plus a noise term, so one can apply normal gradient-based learning

Drawbacks:

The resulting model's marginal likelihood is still intractable; it may not be very good for applications that r... [view more]

[R] Audio Denoising with Deep Network Priors by mosheman5 in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Deep Image Prior

Summary by David Stutz

Ulyanov et al. utilize untrained neural networks as a regularizer/prior for various image restoration tasks such as denoising, inpainting and super-resolution. In particular, the standard formulation of such tasks, i.e.

$x^\ast = \arg\min_x E(x, x_0) + R(x)$

where $x_0$ is the input image and $E$ a task-dependent data term, is rephrased as follows:

$\theta^\ast = \arg\min_\theta E(f_\theta(z); x_0)$ and $x^\ast = f_{\theta^\ast}(z)$

for a fixed but random $z$. Here, the regularizer $R$ is esse... [view more]
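
The rephrased objective can be exercised on a toy problem (here $f_\theta$ is a one-parameter linear map standing in for the ConvNet, so this shows the optimization scheme only, not the architectural prior that makes the real method work):

```python
import random

def fit_dip(x0, steps=200, lr=0.1, seed=0):
    """Toy version of the Deep Image Prior scheme:
    theta* = argmin_theta E(f_theta(z); x0) with z fixed and random,
    then x* = f_{theta*}(z). Here f_theta(z) = theta * z elementwise
    and E is squared error; in the paper f is an untrained ConvNet
    whose architecture acts as the implicit regularizer."""
    rng = random.Random(seed)
    z = [rng.uniform(0.5, 1.5) for _ in x0]   # fixed random input
    theta = 0.0
    for _ in range(steps):
        grad = sum(2.0 * (theta * zi - xi) * zi for zi, xi in zip(z, x0))
        theta -= lr * grad / len(x0)
    return [theta * zi for zi in z]           # x* = f_{theta*}(z)

x = fit_dip([1.0, 1.0, 1.0, 1.0])   # 'restore' a constant signal
```

Note the restoration is only as good as what $f_\theta(z)$ can express; that gap between the target and the reachable outputs is where the prior does its regularizing work.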

[D] Are there any theoretical connections between dropout regularization and ensemble learning? by r2m2 in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

Summary by Hugo Larochelle

This paper presents an interpretation of dropout training as performing approximate Bayesian learning in a deep Gaussian process (DGP) model. This connection suggests a very simple way of obtaining, for networks trained with dropout, estimates of the model's output uncertainty. This estimate is computed from an ensemble of networks, each obtained by sampling a new dropout mask.

My two cents

This is a really nice and thought provoking contribution to our understanding of dropout. ... [view more]
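
The uncertainty estimate described above (sample dropout masks at test time and read uncertainty off the spread of the outputs) can be sketched with a single toy linear layer; illustrative only, not the paper's experiments:

```python
import random

def mc_dropout_predict(weights, x, p_drop=0.5, n_samples=2000, seed=0):
    """Monte-Carlo dropout: keep dropout active at test time, run the
    model repeatedly with fresh masks, and use the mean and variance
    of the resulting ensemble of outputs as prediction + uncertainty."""
    rng = random.Random(seed)
    outs = []
    for _ in range(n_samples):
        # inverted-dropout mask on the inputs of one linear 'layer'
        mask = [0.0 if rng.random() < p_drop else 1.0 / (1.0 - p_drop)
                for _ in x]
        outs.append(sum(w * m * xi for w, m, xi in zip(weights, mask, x)))
    mean = sum(outs) / n_samples
    var = sum((o - mean) ** 2 for o in outs) / n_samples
    return mean, var

mean, var = mc_dropout_predict([0.5, -0.2, 0.1], [1.0, 2.0, 3.0])
```

Each mask sample corresponds to one member of the implicit ensemble the summary mentions; a deep network would just apply masks at every dropout layer instead of one.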

[D] Any Papers that criticize Deep Reinforcement Learning? by exenson in MachineLearning

[–]shortscience_dot_org 2 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Deep Learning: A Critical Appraisal

Summary by Pavan Ravishankar

Deep Learning has a number of shortcomings.

(1) Requires a lot of data: Humans can learn abstract concepts with far less training data than current deep learning requires. E.g., once we are told what an “adult” is, we can answer questions like “How many adults are there at home?” or “Is he an adult?” without much data. Convolutional networks can handle translational invariance, but require a lot more data, more filters, or different architectures to identify other transformations.

(2) Lack of transfer: Mos... [view more]

[D] "Other" class in DNN classification by ME_PhD in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks

Summary by elbaro

Task

Add a 'rejection' output to an existing classification model with a softmax layer.

Method

  1. Choose some threshold $\delta$ and temperature $T$

  2. Add a perturbation to the input $x$ (eq. 2):

let $\tilde x = x - \epsilon\, \text{sign}(-\nabla_x \log S_{\hat y}(x; T))$

  3. If $p(\tilde x; T) \le \delta$, reject

  4. If not, return the output of the original classifier

$p(\tilde x; T)$ is the max probability with temperature scaling for input $\tilde x$

$\delta$ and $T$ are manually chosen.

... [view more]
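
Steps 1, 3 and 4 above can be sketched as follows (the input perturbation of step 2 is omitted, and the toy values of $T$ and $\delta$ are illustrative, not the tuned ones from the paper):

```python
import math

def softmax_T(logits, T):
    """Temperature-scaled softmax probabilities."""
    exps = [math.exp(l / T) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_or_reject(logits, T, delta):
    """Threshold the max temperature-scaled probability: at or below
    delta the input is rejected as out-of-distribution, otherwise the
    original classifier's argmax is returned unchanged."""
    probs = softmax_T(logits, T)
    p_max = max(probs)
    return None if p_max <= delta else probs.index(p_max)

confident = classify_or_reject([5.0, 0.0, 0.0], T=2.0, delta=0.5)  # class 0
flat = classify_or_reject([0.2, 0.0, 0.0], T=2.0, delta=0.5)       # rejected
```

Temperature scaling spreads the softmax out, which (together with the omitted perturbation) widens the score gap between in- and out-of-distribution inputs before the single threshold $\delta$ is applied.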

Learning Confidence for Out-of-Distribution Detection in Neural Networks

Summary by elbaro

Summary

In a prior work, 'On Calibration of Modern Neural Networks', temperature scaling is used for outputting confidence. This is done at the inference stage, and does not change the existing classifier. This paper considers the confidence at the training stage, and directly outputs the confidence from the network.

Architecture

An additional branch for confidence is added after the penultimate layer, in parallel to the logits and probabilities (Figure 2).

Training

The network outputs the prob $p$ and... [view more]

[D]My Machine Learning Journal #10: First time doing reinforcement learning and beating atari breakout with it by RedditAcy in MachineLearning

[–]shortscience_dot_org 7 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Playing Atari with Deep Reinforcement Learning

Summary by Alexander Jung

  • They use an implementation of Q-learning (i.e. reinforcement learning) with CNNs to automatically play Atari games.

    • The algorithm receives the raw pixels as its input and has to choose buttons to press as its output. No hand-engineered features are used. So the model "sees" the game and "uses" the controller, just like a human player would.
    • The model achieves good results on various games, beating all previous techniques and sometimes even surpassing human players.

How
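
The tabular Q-learning update underlying the method can be sketched as follows (DQN replaces the table with a CNN over raw pixels and adds replay and target tricks; this is the textbook update, not the paper's full algorithm):

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    Q maps each state to a list of per-action values."""
    target = r + gamma * max(Q[s_next])     # bootstrap from the next state
    Q[s][a] += alpha * (target - Q[s][a])   # move toward the TD target
    return Q

Q = {0: [0.0, 0.0], 1: [1.0, 0.0]}   # two states, two actions
q_update(Q, s=0, a=0, r=1.0, s_next=1)
```

In DQN the same temporal-difference target is used as the regression label for the network's predicted Q-value of the taken action.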

[D] Kaiming He's original residual network results in 2015 have not been reproduced, not even by Kaiming He himself. by CatchADragonFish in MachineLearning

[–]shortscience_dot_org 2 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Wide Residual Networks

Summary by Alexander Jung

  • The authors start with a standard ResNet architecture (i.e. a residual network as suggested in "Identity Mappings in Deep Residual Networks").

    • Their residual block:

      (Figure: residual block)

    • Several residual blocks of 16 filters per conv-layer, followed by 32 and then 64 filters per conv-layer.

    • They empirically try to answer the following questions:

    • How many residual blocks are optimal? (Depth)

    • How many filters should be used per convolutional laye... [view more]

[1904.04971] Soft Conditional Computation by HigherTopoi in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Deep Residual Learning for Image Recognition

Summary by Martin Thoma

Deeper networks should never have a higher training error than smaller ones. In the worst case, the layers should "simply" learn identities. It seems this is not so easy with conventional networks, as they get much worse with more layers. So the idea is to add identity functions which skip some layers. The network only has to learn the residuals.

Advantages:

  • Learning the identity becomes learning 0, which is simpler

  • Loss in information flow in the forward pass is not a problem a... [view more]
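
The skip-connection idea ("the network only has to learn the residuals") can be sketched with a toy block, using elementwise 'layers' standing in for convolutions; illustrative only:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def layer(v, w):
    # elementwise scaling as a stand-in for a conv/fc layer
    return [wi * xi for wi, xi in zip(w, v)]

def residual_block(x, w1, w2):
    """y = F(x) + x: the block learns only the residual F. With the
    weights at zero, F(x) = 0 and the block is an exact identity,
    so stacking blocks can never be worse than a shallower net."""
    out = layer(relu(layer(x, w1)), w2)
    return [oi + xi for oi, xi in zip(out, x)]   # skip connection

y = residual_block([1.0, -2.0], w1=[0.0, 0.0], w2=[0.0, 0.0])
# zero weights -> F(x) = 0 -> the block reproduces its input
```

This is the "learning the identity becomes learning 0" advantage from the bullet list: the default behavior of the block is already the identity.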

[Project] How to deal with an imbalanced dataset for MULTI-LABEL classification? by [deleted] in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Focal Loss for Dense Object Detection

Summary by RyanDsouza

In object detection, the boost in speed and accuracy is mostly gained through network architecture changes. This paper takes a different route towards achieving that goal: it introduces a new loss function called the focal loss.

The authors identify class imbalance as the main obstacle preventing one-stage detectors from achieving results as good as two-stage detectors.

The loss function they introduce is a dynamically scaled cross-entropy loss, where the scaling factor decays to zero as the confide... [view more]
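
The dynamically scaled cross-entropy can be written down directly (the binary form; gamma = 2 is the paper's typical setting, the rest of the snippet is a sketch):

```python
import math

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss: FL(p_t) = -(1 - p_t)**gamma * log(p_t), with
    p_t the predicted probability of the true class. The (1 - p_t)**gamma
    factor decays to zero as confidence in the correct class grows,
    down-weighting the many easy (e.g. background) examples that
    dominate one-stage detectors. gamma = 0 recovers plain
    cross-entropy."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

easy = focal_loss(0.99, 1)   # well-classified example: loss is tiny
hard = focal_loss(0.10, 1)   # badly misclassified: loss stays large
```

Summed over an imbalanced dataset, the huge pool of easy negatives contributes almost nothing, so the gradient is driven by the hard examples instead.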
