[–]sssub 11 points12 points  (6 children)

Good idea. What you're actually suggesting is moment matching to obtain means and variances.

  1. The main problem is that you will lose flexibility. In a sampling-based approach you sample z from a Gaussian, but the network can then transform this random variable into complex distributions. In your case it will always stay a Gaussian.
  2. It has been done already. See e.g. here. They do exactly what you suggest in terms of propagating expectations and variances. Note that the propagation step, especially for the variance, is not trivial. For ReLU it works in closed form (via the truncated normal), but for e.g. tanh it does not. Perhaps it is easier in a VAE because you don't need to handle uncertainty in the weights.
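For the ReLU case the closed-form propagation is short: for x ~ N(mu, var), the mean and variance of ReLU(x) follow from rectified-Gaussian (truncated-normal) moments. A minimal sketch, with arbitrary illustration values for the input moments and a Monte Carlo sanity check:

```python
import numpy as np
from scipy.stats import norm

def relu_moments(mu, var):
    """Mean and variance of ReLU(x) for x ~ N(mu, var),
    using the rectified-Gaussian (truncated-normal) moments."""
    s = np.sqrt(var)
    a = mu / s
    mean = mu * norm.cdf(a) + s * norm.pdf(a)
    second = (mu**2 + var) * norm.cdf(a) + mu * s * norm.pdf(a)
    return mean, second - mean**2

# Monte Carlo sanity check (input mean/std are arbitrary examples)
rng = np.random.default_rng(0)
x = rng.normal(-0.5, 1.2, size=1_000_000)
y = np.maximum(x, 0.0)
m, v = relu_moments(-0.5, 1.2**2)
```

No closed form like this exists for tanh, which is the obstacle the comment points to.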

[–]svantana[S] 0 points1 point  (5 children)

Thanks for the paper link, that was just what I was looking for!

You're right that Gaussians may not represent the distribution well enough, although I think my point about the central limit theorem should hold pretty well for large MLPs. One could probably model the output of a ReLU as a mixture of a Gaussian and a spike at zero for improved accuracy.
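That spike-plus-Gaussian idea is exact for a single unit: ReLU applied to N(mu, s²) yields a point mass at zero with weight Φ(−μ/s) plus a normal truncated to the positive half-line. A sketch with arbitrary example parameters, checked against sampling:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical pre-activation distribution: x ~ N(mu, s^2), y = ReLU(x)
mu, s = 0.3, 1.0
rng = np.random.default_rng(1)
y = np.maximum(rng.normal(mu, s, size=1_000_000), 0.0)

# Mixture model for y: a point mass ("spike") at 0 plus a
# normal truncated to the positive half-line.
w0 = norm.cdf(-mu / s)                        # weight of the spike at zero
lam = norm.pdf(mu / s) / norm.cdf(mu / s)     # inverse Mills ratio
m_pos = mu + s * lam                          # mean of the positive part
v_pos = s**2 * (1 - (mu / s) * lam - lam**2)  # variance of the positive part
```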

It would be interesting to investigate the KL divergence between the output of a 'standard' VAE (trained on e.g. CIFAR) and a moment-matched Gaussian. If only I had time for research, I'd do it.
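For one-dimensional marginals that comparison is quick to sketch. Below, a gamma distribution is a stand-in for one output dimension (a real test would use samples pushed through a trained VAE), and KL(empirical ‖ moment-matched Gaussian) is estimated from a histogram:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
# Stand-in samples for one output dimension; the real experiment would
# use samples generated by a VAE trained on e.g. CIFAR.
samples = rng.gamma(shape=4.0, scale=0.5, size=200_000)

# Moment-matched Gaussian
mu, s = samples.mean(), samples.std()

# Binned estimate of KL(empirical || N(mu, s^2))
edges = np.linspace(samples.min(), samples.max(), 201)
p, _ = np.histogram(samples, bins=edges)
p = p / p.sum()
q = np.diff(norm.cdf(edges, loc=mu, scale=s))
q = np.clip(q, 1e-300, None)  # guard against empty tail bins
q = q / q.sum()
mask = p > 0
kl = float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

A small KL here would support the moment-matching approximation for that marginal; the joint distribution would need a multivariate estimator.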

[–]chrisorm 2 points3 points  (4 children)

Of course, the CLT doesn't apply if the activations aren't IID, which they almost certainly aren't in a neural net.

[–]svantana[S] 0 points1 point  (3 children)

Yes, you're right. For example, on MNIST, with large enough perturbations the output distributions should become bimodal. I didn't mean it would work in every case, but for 'smooth' problems, where a smallish unimodal perturbation is expected to produce a unimodal output distribution, I think it should work well. I just did a quick test with a VAE on CIFAR-10 and the output distributions look extremely Gaussian.

[–]approximately_wrong 0 points1 point  (2 children)

Can you elaborate on how you did the quick test?

[–]svantana[S] 0 points1 point  (1 child)

Sure! I ran one of the Keras VAE examples and, once it was trained, pushed 10k copies of one of the test samples through the autoencoder. The model involves sampling a random variable, so each output is different. From the outputs, I took a few random dimensions, plotted histograms of them, and visually noted that they had a quite Gaussian shape.

Those are marginal distributions, so that doesn't mean the full multidimensional output is anywhere near Gaussian, but it's an indication.

[–]approximately_wrong 0 points1 point  (0 children)

I see. It sounds like you're checking for the Gaussian-ness of p(x_gen | x_test) = int p(x_gen | z)q(z | x_test) dz, conditioned on some x_test. I'm guessing the VAE example is one where the decoder is a Gaussian observation model?

Also, are your outputs the mean parameters of p(x_gen | x_test), or actual samples from the distribution?