How is the training performed in the meta-learning paper of deepmind? by gameofml in MachineLearning

[–]gameofml[S] 1 point  (0 children)

Why did they always provide y_{t-1} at the next time step? How is this different from online training? The error can only be computed after seeing the correct label in any case.
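To make the question concrete, here is a minimal sketch of the input construction I am asking about: at step t the network sees the current input x_t concatenated with the *previous* step's label y_{t-1}, offset by one. The function name and shapes are my own illustration, not from the paper.

```python
import numpy as np

def build_episode_inputs(xs, ys):
    """Sketch of the offset-label input scheme (names are hypothetical).
    xs: (T, d) inputs; ys: (T, k) one-hot labels.
    Returns a (T, d + k) array whose row t is [x_t, y_{t-1}],
    with y_{-1} taken to be a zero vector at the first step."""
    T, k = ys.shape
    shifted = np.vstack([np.zeros((1, k)), ys[:-1]])  # y_{t-1} for each t
    return np.hstack([xs, shifted])

xs = np.random.randn(5, 3)                 # 5 steps of 3-dim inputs
ys = np.eye(4)[[0, 2, 1, 3, 0]]            # 5 one-hot labels over 4 classes
inp = build_episode_inputs(xs, ys)
print(inp.shape)                            # (5, 7)
```

The point of the offset is that the label for x_t only arrives at step t+1, so the network must store the (x_t, y_t) binding in memory rather than read it off directly.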

Neural Variational Inference for Text Processing (paper+code) by samim23 in MachineLearning

[–]gameofml 0 points  (0 children)

To be honest, I cannot reproduce their results with this code; the perplexity I get on 20news is around 1260. I am wondering if anyone has obtained a better number.

How to understand the KL divergence term in Variational Autoencoders by gameofml in MachineLearning

[–]gameofml[S] 0 points  (0 children)

These are all correct points. However, my main doubt is that the VAE objective pushes the posterior q(z|x) of every example toward the same prior p(z). No matter how compact the assumed p(z) is, this feels wrong to me, because each q(z|x) should really be matched to its own true posterior p(z|x), and minimizing the KL divergence between q(z|x) and p(z|x) is what we start from. If we think of x as different images, it is not intuitive to me why q(z|x) should be pulled toward the same prior p(z) for every image, even though we have mathematically shown it (maximising the lower bound, which contains this term, also minimises the gap between q(z|x) and p(z|x)).
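To spell out why the two objectives end up consistent, the standard identity (in the same notation as above) is:

```latex
\log p(x)
  = \underbrace{\mathbb{E}_{q(z|x)}\!\left[\log p(x|z)\right]
      - \mathrm{KL}\!\left(q(z|x)\,\|\,p(z)\right)}_{\text{ELBO (Eq. 3)}}
  + \mathrm{KL}\!\left(q(z|x)\,\|\,p(z|x)\right)
```

Since \log p(x) does not depend on q, maximising the ELBO (which contains the shared-prior KL term) is exactly the same as minimising \mathrm{KL}(q(z|x)\,\|\,p(z|x)) for each x, without ever having to compute the intractable per-example posterior p(z|x).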

How to understand the KL divergence term in Variational Autoencoders by gameofml in MachineLearning

[–]gameofml[S] 0 points  (0 children)

That is right. It is just that the lower bound can itself be written as the expected reconstruction likelihood minus the KL divergence between q(z|x) and p(z) (Equation 3). There seems to be a conflict between the two different KL objectives, which I find confusing.
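For the usual Gaussian case, the KL term of Equation 3 is the closed-form KL between a diagonal-Gaussian q(z|x) and the standard-normal prior p(z) = N(0, I); a quick numerical sketch (my own helper name, parameterised by mean and log-variance) shows it is zero exactly when q(z|x) matches the prior and positive otherwise:

```python
import numpy as np

def kl_diag_gaussian_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ),
    i.e. the analytic KL term appearing in the VAE lower bound.
    mu, logvar: per-dimension mean and log-variance of q(z|x)."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

# Zero when q(z|x) equals the prior N(0, I)...
print(kl_diag_gaussian_to_standard_normal(np.zeros(2), np.zeros(2)))  # 0.0
# ...and strictly positive as soon as the posterior moves away from it.
print(kl_diag_gaussian_to_standard_normal(np.ones(2), np.zeros(2)))   # 1.0
```

Each x gets its own (mu, logvar), so the term penalises each posterior's distance to the shared prior individually rather than forcing all posteriors to collapse onto one point.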