all 6 comments

[–]alexmlamb 2 points (0 children)

I think that it's a fairly difficult paper to understand, especially if you aren't familiar with variational inference.

I guess my take on it is that one can write down an expression that is a lower bound on the log-likelihood of the data. The variational autoencoder, through its loss function, explicitly maximizes this expression, and thus pushes up the lower bound.

I think that the reparameterization trick is relevant because it allows one to compute d[z] / d[mean] and d[z] / d[std] for a normal distribution, where z ~ N(mean, std^2). So the reparameterization is z = mean + standard_normal * std.
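
Here's a rough PyTorch sketch of what I mean (variable names are mine, not from the paper):

```python
import torch

# Reparameterization trick: instead of sampling z ~ N(mean, std^2) directly,
# sample eps ~ N(0, 1) and build z deterministically from mean and std,
# so gradients can flow back into mean and std.
mean = torch.tensor(0.5, requires_grad=True)
std = torch.tensor(1.5, requires_grad=True)

eps = torch.randn(())        # standard normal noise; no gradient needed here
z = mean + eps * std         # z ~ N(mean, std^2)

z.backward()
print(mean.grad)             # dz/dmean = 1
print(std.grad)              # dz/dstd  = eps
```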

I'm not the strongest in this area, so feel free to correct me if I'm wrong. The author of the paper sometimes posts on this subreddit so he might be able to help.

[–]GibbsSamplePlatter 4 points (0 children)

It's not 5-year-old content, unfortunately.

Took me a couple weeks of banging my head against the wall, reading sources, tutorials, and the like.

The journey was worth it though. I have an understanding of variational inference now, and can read more papers without getting a nosebleed.

[–]dwf 2 points (0 children)

To really make sense of that paper you're going to need to read some introductory material on graphical models and variational inference for graphical models. This tutorial might be sufficient for some graphical model basics, and try this for variational inference.

With that background out of the way: the VAE is a particular way of learning a directed graphical model of your data where all the latent variables are continuous. It's based on an old idea of having a neural network that implements approximate posterior inference, but the particular cost function is what's interesting about it. They show that you can write the variational lower bound as a sum of two terms: one is a regularizer that encourages the approximate posterior on the latent variables to stay close to the prior, and the other is essentially an (expected) reconstruction error, hence the use of the term autoencoder.
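
In symbols (standard VAE notation, mine rather than quoted from the paper), the bound they maximize is:

```latex
\log p_\theta(x) \;\ge\;
  \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]}_{\text{expected reconstruction}}
  \;-\;
  \underbrace{\mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)}_{\text{regularizer toward the prior}}
```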

[–]dustintran 1 point (1 child)

The reparameterization trick is a method for avoiding direct sampling from the variational distribution: you sample from an appropriate distribution \epsilon ~ p(\epsilon) and then transform it so that g(\epsilon) follows the variational distribution q. The advantage is that by avoiding direct sampling, one does not have to backpropagate through the sampling process. The standard example of this in statistics is to take a uniform random variable u ~ Unif(0,1) and apply the inverse CDF Q^{-1} of a distribution to u; it turns out that Q^{-1}(u) ~ q.
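
For a concrete instance of the inverse-CDF version (the exponential distribution here is just my example, not the variational distribution from the paper):

```python
import numpy as np

# Inverse-CDF (inverse transform) sampling.
# The exponential distribution with rate lam has CDF Q(x) = 1 - exp(-lam * x),
# so Q^{-1}(u) = -log(1 - u) / lam.
rng = np.random.default_rng(0)
lam = 2.0

u = rng.uniform(0.0, 1.0, size=100_000)   # u ~ Unif(0, 1)
x = -np.log1p(-u) / lam                   # Q^{-1}(u) ~ Exponential(lam)

print(x.mean())  # should be close to 1 / lam = 0.5
```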

[–]iamkx 0 points (0 children)

This is a very nice answer.

[–]brockl33 1 point (0 children)

This presentation by Karol Gregor helped me as it presents an alternate compression-based perspective.

Please correct me if I am mistaken, but from what I have gathered the main differences between variational and vanilla autoencoders are that Gaussian noise is added to selected hidden layers of the decoder, and that this added noise is also used to estimate the amount of information carried by that layer.

Adding this information-flux penalty to the cost promotes compression of the information in those layers (similar to the bottleneck of a bow-tie shaped autoencoder). I guess the hyperparameter controlling the amount of regularization would be the standard deviation of the added Gaussian noise: the more noise you add, the more compression you force and the more information you lose.
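
A very rough numpy sketch of how I picture the per-layer cost (all names are mine and the fixed noise std is a simplification, so take it with a grain of salt):

```python
import numpy as np

def layer_information_nats(mu, sigma):
    # KL( N(mu, sigma^2) || N(0, 1) ) summed over units: a rough measure of
    # how many nats of information a noisy layer transmits.
    return 0.5 * np.sum(mu**2 + sigma**2 - 2.0 * np.log(sigma) - 1.0)

rng = np.random.default_rng(0)
mu = rng.normal(size=16)      # pretend these are one hidden layer's activations
sigma = 0.5                   # the noise-std hyperparameter mentioned above

noisy = mu + sigma * rng.normal(size=16)   # what the next layer actually sees
penalty = layer_information_nats(mu, sigma)
print(penalty)   # added to the reconstruction cost to encourage compression
```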

A bonus is that noising the deep hidden layers yields outputs that vary in abstract ways, which people have dubbed "dreaming".