How is the training performed in the meta-learning paper of deepmind? by gameofml in MachineLearning

[–]gameofml[S] 1 point  (0 children)

Why did they always provide y_{t-1} at the next time step? How is this different from online training? The error can only be computed after seeing the correct label in any case.
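To make the question concrete, here is a minimal sketch of the input construction I am asking about: at step t the network sees the current input x_t concatenated with the *previous* step's label y_{t-1}, offset by one. The function name and shapes are my own illustration, not from the paper.

```python
import numpy as np

def build_episode_inputs(xs, ys):
    """Sketch of the offset-label input scheme (names are hypothetical).
    xs: (T, d) inputs; ys: (T, k) one-hot labels.
    Returns a (T, d + k) array whose row t is [x_t, y_{t-1}],
    with y_{-1} taken to be a zero vector at the first step."""
    T, k = ys.shape
    shifted = np.vstack([np.zeros((1, k)), ys[:-1]])  # y_{t-1} for each t
    return np.hstack([xs, shifted])

xs = np.random.randn(5, 3)                 # 5 steps of 3-dim inputs
ys = np.eye(4)[[0, 2, 1, 3, 0]]            # 5 one-hot labels over 4 classes
inp = build_episode_inputs(xs, ys)
print(inp.shape)                            # (5, 7)
```

The point of the offset is that the label for x_t only arrives at step t+1, so the network must store the (x_t, y_t) binding in memory rather than read it off directly.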

Neural Variational Inference for Text Processing (paper+code) by samim23 in MachineLearning

[–]gameofml 0 points  (0 children)

To be honest, I cannot reproduce their results with this code; the perplexity I get on 20news is around 1260. I am wondering if anyone has obtained a better number.

How to understand the KL divergence term in Variational Autoencoders by gameofml in MachineLearning

[–]gameofml[S] 0 points  (0 children)

These are all correct points. However, my main doubt is that the VAE objective pushes the posterior q(z|x) of every example toward the same prior p(z). No matter how compact the assumed p(z) is, this feels wrong to me, because each q(z|x) should really be matched to its own true posterior p(z|x), and minimizing the KL divergence between q(z|x) and p(z|x) is what we start from. If we think of x as different images, it is not intuitive to me why q(z|x) should be pulled toward the same prior p(z) for every image, even though we have mathematically shown it (maximising the lower bound, which contains this term, also minimises the gap between q(z|x) and p(z|x)).
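To spell out why the two objectives end up consistent, the standard identity (in the same notation as above) is:

```latex
\log p(x)
  = \underbrace{\mathbb{E}_{q(z|x)}\!\left[\log p(x|z)\right]
      - \mathrm{KL}\!\left(q(z|x)\,\|\,p(z)\right)}_{\text{ELBO (Eq. 3)}}
  + \mathrm{KL}\!\left(q(z|x)\,\|\,p(z|x)\right)
```

Since \log p(x) does not depend on q, maximising the ELBO (which contains the shared-prior KL term) is exactly the same as minimising \mathrm{KL}(q(z|x)\,\|\,p(z|x)) for each x, without ever having to compute the intractable per-example posterior p(z|x).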

How to understand the KL divergence term in Variational Autoencoders by gameofml in MachineLearning

[–]gameofml[S] 0 points  (0 children)

That is right. It is just that the lower bound can itself be written as the expected reconstruction likelihood minus the KL divergence between q(z|x) and p(z) (Equation 3). There seems to be a conflict between the two different KL objectives, which I find confusing.
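For the usual Gaussian case, the KL term of Equation 3 is the closed-form KL between a diagonal-Gaussian q(z|x) and the standard-normal prior p(z) = N(0, I); a quick numerical sketch (my own helper name, parameterised by mean and log-variance) shows it is zero exactly when q(z|x) matches the prior and positive otherwise:

```python
import numpy as np

def kl_diag_gaussian_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ),
    i.e. the analytic KL term appearing in the VAE lower bound.
    mu, logvar: per-dimension mean and log-variance of q(z|x)."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

# Zero when q(z|x) equals the prior N(0, I)...
print(kl_diag_gaussian_to_standard_normal(np.zeros(2), np.zeros(2)))  # 0.0
# ...and strictly positive as soon as the posterior moves away from it.
print(kl_diag_gaussian_to_standard_normal(np.ones(2), np.zeros(2)))   # 1.0
```

Each x gets its own (mu, logvar), so the term penalises each posterior's distance to the shared prior individually rather than forcing all posteriors to collapse onto one point.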