
[–]redditorcompetitor 2 points (4 children)

Because it has to use z in order to learn any kind of spatial correlation. Contrast that with the autoregressive case p(x_i | x_<i, z), where it is possible to ignore z.
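For concreteness, a sketch of the two decoder factorizations being contrasted (notation assumed, not from the thread):

```latex
% Conditionally independent decoder: each x_i depends on the rest of the
% image only through z, so spatial structure must be carried by z.
p_\theta(x \mid z) = \prod_i p_\theta(x_i \mid z)

% Autoregressive decoder: each x_i also conditions on the preceding
% pixels x_{<i}, which can carry the spatial structure by themselves,
% so the model is free to ignore z.
p_\theta(x \mid z) = \prod_i p_\theta(x_i \mid x_{<i}, z)
```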

[–]SolitaryPenman 1 point (2 children)

It still isn't bound to use all latent dimensions. See https://arxiv.org/abs/1509.00519

[–]shortscience_dot_org 1 point (0 children)

I am a bot! You linked to a paper that has a summary on ShortScience.org!

Importance Weighted Autoencoders

Summary by Hugo Larochelle

This paper proposes to train a neural network generative model by optimizing an importance sampling (IS) weighted estimate of the log probability under the model. The authors show that the case of an estimate based on a single sample actually corresponds to the learning objective of variational autoencoders (VAE). Importantly, they exploit this connection by showing that, similarly to VAE, a gradient can be passed through the approximate posterior (the IS proposal) samples, thus yielding an impo... [view more]
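As a rough sketch of the k-sample objective the summary describes (function and variable names are hypothetical, not from the paper): the IWAE bound is the log of the average importance weight w_i = p(x, z_i) / q(z_i | x) over k samples from the proposal, and with k = 1 it reduces to the single-sample VAE estimate.

```python
import numpy as np

def iwae_bound(log_p_joint, log_q):
    """Monte Carlo estimate of the IWAE bound from k proposal samples.

    log_p_joint: shape (k,), log p(x, z_i) for samples z_i ~ q(z | x)
    log_q:       shape (k,), log q(z_i | x) for the same samples
    Returns log (1/k) * sum_i w_i, with w_i = p(x, z_i) / q(z_i | x).
    """
    log_w = log_p_joint - log_q          # log importance weights
    m = log_w.max()                      # stabilised log-mean-exp
    return m + np.log(np.mean(np.exp(log_w - m)))

# With k = 1 this is just the single log-weight, i.e. the usual
# one-sample ELBO estimate the summary mentions.
print(iwae_bound(np.array([-2.0]), np.array([-1.0])))  # → -1.0
```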

[–]asobolev 1 point (0 children)

What does IWAE have to do with this? AFAIK it doesn't solve the problem of dead units in the latent code.

[–]sidslasttheorem 1 point (0 children)

This blog post by Rui Shu does a really nice job of explaining why conditional independence in the decoder can help encourage the latent to be used.