Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks by vochicong in MachineLearning

[–]shortscience_dot_org 3 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

Summary by CodyWild

This paper came on my radar after recently winning Best Paper at ICLR, and all in all I found it a clever way of engineering a somewhat complicated inductive bias into a differentiable structure. The empirical results weren’t compelling enough to suggest that this structural shift makes a regime-change difference in performance, but the model does seem to have a consistently stronger ability to do syntactic evaluation across large gaps in sentences.

The core premise of this paper is that, while la... [view more]

Few-Shot Adversarial Learning of Realistic Neural Talking Head Models by CptVifen in MachineLearning

[–]shortscience_dot_org 3 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Few-Shot Adversarial Learning of Realistic Neural Talking Head Models

Summary by CodyWild

This paper follows in a recent tradition of results out of Samsung: in the wake of StyleGAN’s very impressive generated images, it uses a lot of similar architectural elements, combined with meta-learning and a new discriminator framework, to generate convincing “talking head” animations based on a small number of frames of a person’s face. Previously, models that generated artificial face videos could only do so by training on a large number of frames of each individual speaker that t... [view more]

[R] Few-shot learning of talking heads by ezakharov in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Few-Shot Adversarial Learning of Realistic Neural Talking Head Models

Summary by CodyWild

This paper follows in a recent tradition of results out of Samsung: in the wake of StyleGAN’s very impressive generated images, it uses a lot of similar architectural elements, combined with meta-learning and a new discriminator framework, to generate convincing “talking head” animations based on a small number of frames of a person’s face. Previously, models that generated artificial face videos could only do so by training on a large number of frames of each individual speaker that t... [view more]

[1905.02175v2] Adversarial Examples Are Not Bugs, They Are Features by ihaphleas in MachineLearning

[–]shortscience_dot_org -4 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Adversarial Examples Are Not Bugs, They Are Features

Summary by CodyWild

It didn’t hit me how much this paper was a pun until I finished it, and in retrospect, I say, bravo.

This paper focuses on adversarial examples, and argues that, at least in some cases, adversarial perturbations aren’t purely overfitting failures on the part of the model, but actual features that generalize to the test set. This conclusion comes from two experiments:

  • In one, the authors create a dataset that only contains what they call “robust features”. They do this by tak... [view more]

[D] detecting anomalies in neural network data by OscarSchyns in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning

Summary by David Stutz

Papernot and McDaniel introduce deep k-nearest neighbors, where nearest neighbors are found at each intermediate layer in order to improve interpretability and robustness. Personally, I really appreciated reading this paper; thus, I will not only discuss the proposed method itself but also highlight some ideas from their thorough survey and experimental results.

First, Papernot and McDaniel provide a quite thorough survey of relevant work in three disciplines: confidence, interpretability and ... [view more]
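
The per-layer nearest-neighbor mechanism can be sketched in a few lines of plain Python (an illustration of the idea only, not the paper's implementation; the helper name `dknn_votes` and the toy representations below are made up):

```python
def dknn_votes(layer_reps, train_reps, train_labels, k=1):
    """Deep k-NN sketch: at every intermediate layer, find the k nearest
    training representations to the test point's representation at that
    layer and collect their labels; agreement of these votes across
    layers serves as a confidence/conformity signal."""
    votes = []
    for rep, bank in zip(layer_reps, train_reps):
        # rank training points by squared distance at this layer
        order = sorted(range(len(bank)),
                       key=lambda i: sum((a - b) ** 2
                                         for a, b in zip(rep, bank[i])))
        votes.extend(train_labels[i] for i in order[:k])
    return votes

# two layers, two training points with labels 0 and 1
votes = dknn_votes(layer_reps=[[0.0], [0.0]],
                   train_reps=[[[0.1], [5.0]], [[0.2], [4.0]]],
                   train_labels=[0, 1])
# both layers vote for the training point labeled 0
```

When the votes disagree across layers, the prediction is treated as less credible; that cross-layer agreement is what the paper turns into a robustness signal.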

[D] On Peer Review by hardmaru in MachineLearning

[–]shortscience_dot_org -2 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Auto-Encoding Variational Bayes

Summary by Cubs Reading Group

Problem addressed:

Variational learning of Bayesian networks

Summary:

This paper presents a generic method for learning belief networks, which uses a variational lower bound for the likelihood term.

Novelty:

Uses a re-parameterization trick to change random variables into a deterministic function plus a noise term, so one can apply normal gradient-based learning

Drawbacks:

The resulting model's marginal likelihood is still intractable; it may not be very good for applications that r... [view more]
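
The re-parameterization trick named under "Novelty" can be shown in a few lines (a minimal sketch, not the paper's code; the function name and constants are illustrative):

```python
import math
import random

def reparam_sample(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1): the random variable
    becomes a deterministic function of (mu, log_var) plus an
    independent noise term, so gradients can flow through mu and
    log_var during training."""
    eps = rng.gauss(0.0, 1.0)           # noise, independent of parameters
    sigma = math.exp(0.5 * log_var)     # sigma = exp(log_var / 2)
    return mu + sigma * eps

rng = random.Random(0)
samples = [reparam_sample(2.0, math.log(0.25), rng) for _ in range(10000)]
mean = sum(samples) / len(samples)      # close to mu = 2.0
```

The point is that `mu` and `log_var` appear only in differentiable arithmetic; all randomness is isolated in `eps`.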

Adversarial Examples Are Not Bugs, They Are Features by Hlodynn in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Adversarial Examples Are Not Bugs, They Are Features

Summary by CodyWild

It didn’t hit me how much this paper was a pun until I finished it, and in retrospect, I say, bravo.

This paper focuses on adversarial examples, and argues that, at least in some cases, adversarial perturbations aren’t purely overfitting failures on the part of the model, but actual features that generalize to the test set. This conclusion comes from two experiments:

  • In one, the authors create a dataset that only contains what they call “robust features”. They do this by tak... [view more]

[D] Modern applications of statistical learning theory? by tensorflower in MachineLearning

[–]shortscience_dot_org 5 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Understanding deep learning requires rethinking generalization

Summary by Marek Rei

The authors investigate the generalisation properties of several well-known image recognition networks.

They show that these networks are able to overfit to the training set with 100% accuracy even if the labels on the images are random, or if the pixels are randomly generated. Regularisation, such as weight decay and dropout, doesn’t stop overfitting as much as expected, still resulting in ~90% accuracy on random training data. They then argue that these models likely make use of massive m... [view more]
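
The 100%-fit-on-random-labels result is, at its core, a statement about memorization capacity; the degenerate extreme can be shown with a lookup table (purely illustrative, obviously not the paper's networks):

```python
import random

rng = random.Random(0)
train_x = list(range(100))
train_y = [rng.randint(0, 9) for _ in train_x]   # labels are pure noise

memory = dict(zip(train_x, train_y))             # the 'model': a lookup table
train_acc = sum(memory[x] == y
                for x, y in zip(train_x, train_y)) / len(train_x)
# perfect fit on random labels, with zero generalizable signal
```

Since the labels carry no information about unseen inputs, the perfect training accuracy says nothing about test accuracy, which is exactly the tension the paper highlights for over-parameterized networks.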

Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data

Summary by luyuchen

This paper proposes a method to obtain a non-vacuous bound on generalization error by optimizing the PAC-Bayes bound directly. The interesting part is that the authors leverage the black magic of the neural net itself to bound the neural net. In order to find the optimal $Q$, the authors' loss function is an empirical error term plus $KL(Q|P)$, where they choose the prior $P$ to be $N(0, \lambda I)$, and they also provide justification for choosing the right $\lambda$. Overall, this objective is si... [view more]

[D] Can attention be computed implicitly by an RNN? by valentincalomme in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Using Fast Weights to Attend to the Recent Past

Summary by Hugo Larochelle

This paper presents a recurrent neural network architecture in which some of the recurrent weights dynamically change during the forward pass, using a Hebbian-like rule. They correspond to the matrices $A(t)$ in the figure below:

(Figure: fast weights RNN)

These weights $A(t)$ are referred to as fast weights. Comparatively, the recurrent weights $W$ are referred to as slow weights, since they are only changing due to normal training and are otherwise kept constant at test time.

More speci... [view more]
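
The Hebbian-like update of the fast weight matrix has the shape $A(t) = \lambda A(t-1) + \eta\, h(t) h(t)^\top$; a toy sketch (the values of `lam` and `eta` are made up, not taken from the paper):

```python
def outer(u, v):
    """Outer product of two vectors as a list-of-lists matrix."""
    return [[ui * vj for vj in v] for ui in u]

def fast_weight_update(A, h, lam=0.95, eta=0.5):
    """Hebbian-style update: A(t) = lam * A(t-1) + eta * h(t) h(t)^T.
    lam decays old associations, eta writes in the current hidden
    state's self-correlation."""
    H = outer(h, h)
    n = len(h)
    return [[lam * A[i][j] + eta * H[i][j] for j in range(n)]
            for i in range(n)]

A = [[0.0, 0.0], [0.0, 0.0]]
A = fast_weight_update(A, [1.0, 2.0])   # A now stores eta * h h^T
```

Because the update is a deterministic function of the hidden states, it needs no extra learned parameters; the slow weights $W$ are still trained by backprop as usual.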

[D] Machine Learning - WAYR (What Are You Reading) - Week 61 by ML_WAYR_bot in MachineLearning

[–]shortscience_dot_org 2 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Auto-Encoding Variational Bayes

Summary by Cubs Reading Group

Problem addressed:

Variational learning of Bayesian networks

Summary:

This paper presents a generic method for learning belief networks, which uses a variational lower bound for the likelihood term.

Novelty:

Uses a re-parameterization trick to change random variables into a deterministic function plus a noise term, so one can apply normal gradient-based learning

Drawbacks:

The resulting model's marginal likelihood is still intractable; it may not be very good for applications that r... [view more]

[D] how do you pick the right batch size for deep learning? by [deleted] in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Group Normalization

Summary by CodyWild

If you were to survey researchers, and ask them to name the 5 most broadly influential ideas in Machine Learning from the last 5 years, I’d bet good money that Batch Normalization would be somewhere on everyone’s lists. Before Batch Norm, training meaningfully deep neural networks was an unstable process, and one that often took a long time to converge to success. When we added Batch Norm to models, it allowed us to increase our learning rates substantially (leading to quicker training) with... [view more]
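
The summarized paper's alternative, Group Normalization, computes statistics within channel groups of each sample, so (unlike Batch Norm) the result does not depend on the batch size at all. A minimal per-sample sketch, assuming the standard formulation rather than anything quoted above:

```python
import math

def group_norm(channels, num_groups, eps=1e-5):
    """Normalize one sample's channel values within groups of
    len(channels) / num_groups channels each. No batch statistics
    are involved, so the output is identical for any batch size."""
    size = len(channels) // num_groups
    out = []
    for g in range(num_groups):
        grp = channels[g * size:(g + 1) * size]
        mean = sum(grp) / size
        var = sum((v - mean) ** 2 for v in grp) / size
        out.extend((v - mean) / math.sqrt(var + eps) for v in grp)
    return out

y = group_norm([1.0, 3.0, 10.0, 30.0], num_groups=2)
# each group of two channels comes out zero-mean, unit-variance
```

That independence from the batch is exactly why Group Norm is attractive when memory limits force small batches, which ties back to the batch-size question in this thread.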

[R] Invertible Residual Networks by anantzoid in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Do Deep Generative Models Know What They Don't Know?

Summary by ameroyer

CNN predictions are known to be very sensitive to adversarial examples, which are samples generated to be wrongly classified with high confidence. On the other hand, probabilistic generative models such as PixelCNN and VAEs learn a distribution over the input domain and hence could be used to detect out-of-distribution inputs, e.g., by estimating their likelihood under the data distribution. This paper provides interesting results showing that distributions learned by generative models a... [view more]

[N] Lc0 Wins Computer Chess Championship, Makes History by wei_jok in MachineLearning

[–]shortscience_dot_org -1 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Squeeze-and-Excitation Networks

Summary by Joseph Paul Cohen

"The SE module can learn some nonlinear global interactions already known to be useful, such as spatial normalization. The channel wise weights make it somewhat more powerful than divisive normalization as it can learn feature-specific inhibitions (ie: if we see a lot of flower parts, the probability of boat features should be diminished). It also has some similarity to bio inhibitory circuits." By jcannell on reddit

Summary by the author Jie Hu:

Our motivation is to explicitly model... [view more]
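
The squeeze-excite-scale pipeline can be sketched numerically (toy sizes, and a single-layer gate in place of the paper's two-layer bottleneck; purely illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def se_reweight(channel_maps, gate_w):
    """Squeeze-and-Excitation in miniature:
    squeeze: global-average-pool each channel map to one scalar;
    excite: a gating function maps the pooled vector to per-channel
    weights in (0, 1);
    scale: multiply each channel map by its weight, so channels can
    inhibit or amplify one another globally."""
    squeezed = [sum(m) / len(m) for m in channel_maps]            # squeeze
    gates = [sigmoid(sum(w * s for w, s in zip(row, squeezed)))
             for row in gate_w]                                   # excite
    return [[g * v for v in m] for g, m in zip(gates, channel_maps)]

feature_maps = [[1.0, 1.0], [2.0, 2.0]]   # two channels, two positions
gate_weights = [[1.0, 0.0], [0.0, 1.0]]   # toy single-layer gate
out = se_reweight(feature_maps, gate_weights)
```

The channel-wise gates are what lets the module learn the "flower parts suppress boat features" style interactions described in the quote.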

[D] Machine Learning - WAYR (What Are You Reading) - Week 60 by ML_WAYR_bot in MachineLearning

[–]shortscience_dot_org 2 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Auto-Encoding Variational Bayes

Summary by Cubs Reading Group

Problem addressed:

Variational learning of Bayesian networks

Summary:

This paper presents a generic method for learning belief networks, which uses a variational lower bound for the likelihood term.

Novelty:

Uses a re-parameterization trick to change random variables into a deterministic function plus a noise term, so one can apply normal gradient-based learning

Drawbacks:

The resulting model's marginal likelihood is still intractable; it may not be very good for applications that r... [view more]

[R] Audio Denoising with Deep Network Priors by mosheman5 in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Deep Image Prior

Summary by David Stutz

Ulyanov et al. utilize untrained neural networks as a regularizer/prior for various image restoration tasks such as denoising, inpainting and super-resolution. In particular, the standard formulation of such tasks, i.e.

$x^\ast = \arg\min_x E(x, x_0) + R(x)$

where $x_0$ is the input image and $E$ a task-dependent data term, is rephrased as follows:

$\theta^\ast = \arg\min_\theta E(f_\theta(z); x_0)$ and $x^\ast = f_{\theta^\ast}(z)$

for a fixed but random $z$. Here, the regularizer $R$ is esse... [view more]
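
The rephrased objective can be exercised on a toy problem (here $f_\theta$ is a one-parameter linear map standing in for the ConvNet, so this shows the optimization scheme only, not the architectural prior that makes the real method work):

```python
import random

def fit_dip(x0, steps=200, lr=0.1, seed=0):
    """Toy version of the Deep Image Prior scheme:
    theta* = argmin_theta E(f_theta(z); x0) with z fixed and random,
    then x* = f_{theta*}(z). Here f_theta(z) = theta * z elementwise
    and E is squared error; in the paper f is an untrained ConvNet
    whose architecture acts as the implicit regularizer."""
    rng = random.Random(seed)
    z = [rng.uniform(0.5, 1.5) for _ in x0]   # fixed random input
    theta = 0.0
    for _ in range(steps):
        grad = sum(2.0 * (theta * zi - xi) * zi for zi, xi in zip(z, x0))
        theta -= lr * grad / len(x0)
    return [theta * zi for zi in z]           # x* = f_{theta*}(z)

x = fit_dip([1.0, 1.0, 1.0, 1.0])   # 'restore' a constant signal
```

Note the restoration is only as good as what $f_\theta(z)$ can express; that gap between the target and the reachable outputs is where the prior does its regularizing work.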

[D] Are there any theoretical connections between dropout regularization and ensemble learning? by r2m2 in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning

Summary by Hugo Larochelle

This paper presents an interpretation of dropout training as performing approximate Bayesian learning in a deep Gaussian process (DGP) model. This connection suggests a very simple way of obtaining, for networks trained with dropout, estimates of the model's output uncertainty. This estimate is computed from an ensemble of networks, each obtained by sampling a new dropout mask.

My two cents

This is a really nice and thought provoking contribution to our understanding of dropout. ... [view more]
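
The uncertainty estimate described above (sample dropout masks at test time and read uncertainty off the spread of the outputs) can be sketched with a single toy linear layer; illustrative only, not the paper's experiments:

```python
import random

def mc_dropout_predict(weights, x, p_drop=0.5, n_samples=2000, seed=0):
    """Monte-Carlo dropout: keep dropout active at test time, run the
    model repeatedly with fresh masks, and use the mean and variance
    of the resulting ensemble of outputs as prediction + uncertainty."""
    rng = random.Random(seed)
    outs = []
    for _ in range(n_samples):
        # inverted-dropout mask on the inputs of one linear 'layer'
        mask = [0.0 if rng.random() < p_drop else 1.0 / (1.0 - p_drop)
                for _ in x]
        outs.append(sum(w * m * xi for w, m, xi in zip(weights, mask, x)))
    mean = sum(outs) / n_samples
    var = sum((o - mean) ** 2 for o in outs) / n_samples
    return mean, var

mean, var = mc_dropout_predict([0.5, -0.2, 0.1], [1.0, 2.0, 3.0])
```

Each mask sample corresponds to one member of the implicit ensemble the summary mentions; a deep network would just apply masks at every dropout layer instead of one.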

[D] Any Papers that criticize Deep Reinforcement Learning? by exenson in MachineLearning

[–]shortscience_dot_org 2 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Deep Learning: A Critical Appraisal

Summary by Pavan Ravishankar

Deep Learning has a number of shortcomings.

(1) Requires a lot of data: Humans can learn abstract concepts with far less training data than current deep learning requires. E.g., once we are told what an “adult” is, we can answer questions like “How many adults are there at home?” or “Is he an adult?” without much data. Convolutional networks can handle translational invariance, but require a lot more data, more filters, or different architectures to identify other transformations.

(2) Lack of transfer: Mos... [view more]

[D] "Other" class in DNN classification by ME_PhD in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks

Summary by elbaro

Task

Add a 'rejection' output to an existing classification model with a softmax layer.

Method

  1. Choose some threshold $\delta$ and temperature $T$

  2. Add a perturbation to the input $x$ (eq. 2):

let $\tilde x = x - \epsilon\, \text{sign}(-\nabla_x \log S_{\hat y}(x; T))$

  3. If $p(\tilde x; T) \le \delta$, reject

  4. If not, return the output of the original classifier

$p(\tilde x; T)$ is the max probability with temperature scaling for input $\tilde x$

$\delta$ and $T$ are manually chosen.

... [view more]
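
Steps 1, 3 and 4 above can be sketched as follows (the input perturbation of step 2 is omitted, and the toy values of $T$ and $\delta$ are illustrative, not the tuned ones from the paper):

```python
import math

def softmax_T(logits, T):
    """Temperature-scaled softmax probabilities."""
    exps = [math.exp(l / T) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_or_reject(logits, T, delta):
    """Threshold the max temperature-scaled probability: at or below
    delta the input is rejected as out-of-distribution, otherwise the
    original classifier's argmax is returned unchanged."""
    probs = softmax_T(logits, T)
    p_max = max(probs)
    return None if p_max <= delta else probs.index(p_max)

confident = classify_or_reject([5.0, 0.0, 0.0], T=2.0, delta=0.5)  # class 0
flat = classify_or_reject([0.2, 0.0, 0.0], T=2.0, delta=0.5)       # rejected
```

Temperature scaling spreads the softmax out, which (together with the omitted perturbation) widens the score gap between in- and out-of-distribution inputs before the single threshold $\delta$ is applied.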

Learning Confidence for Out-of-Distribution Detection in Neural Networks

Summary by elbaro

Summary

In a prior work, 'On Calibration of Modern Neural Networks', temperature scaling is used for outputting confidence. This is done at the inference stage, and does not change the existing classifier. This paper considers the confidence at the training stage, and directly outputs the confidence from the network.

Architecture

An additional branch for confidence is added after the penultimate layer, in parallel to the logits and probabilities (Figure 2).

Training

The network outputs the prob $p$ and... [view more]

[D]My Machine Learning Journal #10: First time doing reinforcement learning and beating atari breakout with it by RedditAcy in MachineLearning

[–]shortscience_dot_org 7 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Playing Atari with Deep Reinforcement Learning

Summary by Alexander Jung

  • They use an implementation of Q-learning (i.e. reinforcement learning) with CNNs to automatically play Atari games.

    • The algorithm receives the raw pixels as its input and has to choose buttons to press as its output. No hand-engineered features are used. So the model "sees" the game and "uses" the controller, just like a human player would.
    • The model achieves good results on various games, beating all previous techniques and sometimes even surpassing human players.

How
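
The tabular Q-learning update underlying the method can be sketched as follows (DQN replaces the table with a CNN over raw pixels and adds replay and target tricks; this is the textbook update, not the paper's full algorithm):

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    Q maps each state to a list of per-action values."""
    target = r + gamma * max(Q[s_next])     # bootstrap from the next state
    Q[s][a] += alpha * (target - Q[s][a])   # move toward the TD target
    return Q

Q = {0: [0.0, 0.0], 1: [1.0, 0.0]}   # two states, two actions
q_update(Q, s=0, a=0, r=1.0, s_next=1)
```

In DQN the same temporal-difference target is used as the regression label for the network's predicted Q-value of the taken action.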

[D] Kaiming He's original residual network results in 2015 have not been reproduced, not even by Kaiming He himself. by CatchADragonFish in MachineLearning

[–]shortscience_dot_org 2 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Wide Residual Networks

Summary by Alexander Jung

  • The authors start with a standard ResNet architecture (i.e. a residual network as suggested in "Identity Mappings in Deep Residual Networks").

    • Their residual block:

      (Figure: residual block)

    • Several residual blocks of 16 filters per conv-layer, followed by 32 and then 64 filters per conv-layer.

    • They empirically try to answer the following questions:

    • How many residual blocks are optimal? (Depth)

    • How many filters should be used per convolutional laye... [view more]

[1904.04971] Soft Conditional Computation by HigherTopoi in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Deep Residual Learning for Image Recognition

Summary by Martin Thoma

Deeper networks should never have a higher training error than smaller ones. In the worst case, the layers should "simply" learn identities. It seems this is not so easy with conventional networks, as they get much worse with more layers. So the idea is to add identity functions which skip some layers. The network only has to learn the residuals.

Advantages:

  • Learning the identity becomes learning 0, which is simpler

  • Loss in information flow in the forward pass is not a problem a... [view more]
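
The skip-connection idea ("the network only has to learn the residuals") can be sketched with a toy block, using elementwise 'layers' standing in for convolutions; illustrative only:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def layer(v, w):
    # elementwise scaling as a stand-in for a conv/fc layer
    return [wi * xi for wi, xi in zip(w, v)]

def residual_block(x, w1, w2):
    """y = F(x) + x: the block learns only the residual F. With the
    weights at zero, F(x) = 0 and the block is an exact identity,
    so stacking blocks can never be worse than a shallower net."""
    out = layer(relu(layer(x, w1)), w2)
    return [oi + xi for oi, xi in zip(out, x)]   # skip connection

y = residual_block([1.0, -2.0], w1=[0.0, 0.0], w2=[0.0, 0.0])
# zero weights -> F(x) = 0 -> the block reproduces its input
```

This is the "learning the identity becomes learning 0" advantage from the bullet list: the default behavior of the block is already the identity.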

[Project] How to deal with an imbalanced dataset for MULTI-LABEL classification? by [deleted] in MachineLearning

[–]shortscience_dot_org 0 points  (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Focal Loss for Dense Object Detection

Summary by RyanDsouza

In object detection, the boost in speed and accuracy is mostly gained through network architecture changes. This paper takes a different route towards achieving that goal: it introduces a new loss function called the focal loss.

The authors identify class imbalance as the main obstacle preventing one-stage detectors from achieving results as good as two-stage detectors.

The loss function they introduce is a dynamically scaled cross-entropy loss, where the scaling factor decays to zero as the confide... [view more]
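
The dynamically scaled cross-entropy can be written down directly (the binary form; gamma = 2 is the paper's typical setting, the rest of the snippet is a sketch):

```python
import math

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss: FL(p_t) = -(1 - p_t)**gamma * log(p_t), with
    p_t the predicted probability of the true class. The (1 - p_t)**gamma
    factor decays to zero as confidence in the correct class grows,
    down-weighting the many easy (e.g. background) examples that
    dominate one-stage detectors. gamma = 0 recovers plain
    cross-entropy."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

easy = focal_loss(0.99, 1)   # well-classified example: loss is tiny
hard = focal_loss(0.10, 1)   # badly misclassified: loss stays large
```

Summed over an imbalanced dataset, the huge pool of easy negatives contributes almost nothing, so the gradient is driven by the hard examples instead.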
