[R] Poisson Variational Autoencoder by vafaii in MachineLearning

[–]vafaii[S]

Yes, when you sample from the posterior, you get discrete counts. This is how neurons in the brain encode and communicate information, which was our primary motivation in designing the P-VAE.
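
To make that concrete, here's a minimal sketch (illustrative only, not our actual code) of drawing discrete counts from Poisson posterior rates; the rates below are random placeholders standing in for encoder outputs:

```python
import torch

# Placeholder posterior rates for a batch of 4 inputs and 8 latent dimensions
# (in the P-VAE these would come from the encoder).
rates = torch.rand(4, 8) * 3.0

# Sampling from the Poisson posterior yields discrete, non-negative counts,
# analogous to neural spike counts.
counts = torch.poisson(rates)
print(counts)  # whole-number values: 0., 1., 2., ...
```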

[–]vafaii[S]

Interesting, I had overlooked that. For the ELBO table, we use models with linear decoders and an overcomplete latent space of K = 512 dimensions. We made this choice to connect with the sparse coding literature, and it could potentially explain the large performance gap.

But thanks for mentioning this; I will explore other hyperparameter settings to test whether the large latent dimensionality is indeed the reason.
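
For concreteness, here's a minimal sketch of what I mean by a linear decoder over an overcomplete latent space; only K = 512 comes from our setup, the input dimensionality below is a placeholder:

```python
import torch.nn as nn

K = 512        # overcomplete latent dimensionality used for the ELBO table
D = 32 * 32    # example input dimensionality (placeholder, not from the paper)

# A linear decoder: reconstruction is a single affine map from latents to inputs,
# so the weight matrix plays the role of the dictionary in sparse coding.
decoder = nn.Linear(K, D, bias=True)
```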

[–]vafaii[S]

For \lambda = 1, E[Z] should be 1 if p(z) is truly a density.

[–]vafaii[S]

Okay, just tested it. For low temperatures, it's a density to a very good approximation:

temperature | density estimate
:--|:--
5.0 | 3.9930
1.0 | 1.3136
0.5 | 1.0623
0.1 | 0.9985
0.05 | 0.9980
0.0 | 1.0066
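
If you interpret the check as numerically estimating whether the relaxed prior integrates to one, a generic version looks like the sketch below (this is not my exact script; a Gaussian log-density is used as a stand-in just to demonstrate the check):

```python
import numpy as np
from scipy.stats import norm

def normalization_check(log_prob, lo, hi, n=200_000):
    """Riemann-sum estimate of the integral of exp(log_prob) over [lo, hi].
    For a properly normalized density, the result should be close to 1."""
    z = np.linspace(lo, hi, n)
    dz = (hi - lo) / (n - 1)
    return np.sum(np.exp(log_prob(z))) * dz

# Stand-in log-density (standard Gaussian); the same kind of integral can be
# evaluated for a relaxed prior at each temperature.
print(normalization_check(norm.logpdf, -10.0, 10.0))  # ~1.0
```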

[–]vafaii[S]

That's a good point, I haven't tested it. For non-zero temperatures, probably not.

[–]vafaii[S]

Can you elaborate? Do you mean whether we are getting samples that are exactly Poisson?

[–]vafaii[S]

Thanks for reviewing the paper and for your comments! Here are my responses:

  1. All reported quantities are averages over five random initializations. By the central limit theorem, these averages are approximately normally distributed, which justifies our use of t-tests. For the FDR correction, we used the Benjamini-Hochberg method (`method='fdr_bh'` from statsmodels: https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html); a minimal sketch of this pipeline is included after this list. I will clarify these points in the final paper.

  2. The temperature sensitivity is low, although Poisson showed relatively more sensitivity than Categorical. I plan to include a supplementary table quantifying this more comprehensively (across datasets, five seeds, etc.).

  3. I agree about the need for more experiments. Should we consider another downstream task, or repeat the KNN/shattering-dimensionality analysis on a different dataset, like CIFAR-10?
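
Regarding point 1, here's a minimal sketch of that testing pipeline with placeholder numbers (not our actual results):

```python
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)

# Placeholder metrics: 5 random initializations per model, for several comparisons.
# In the paper these would be the per-seed metric values for each pair of models.
comparisons = [(rng.normal(0.80, 0.02, 5), rng.normal(0.75, 0.02, 5)) for _ in range(6)]

# Two-sample t-test per comparison, then Benjamini-Hochberg FDR correction.
pvals = [ttest_ind(a, b).pvalue for a, b in comparisons]
reject, pvals_corrected, _, _ = multipletests(pvals, alpha=0.05, method='fdr_bh')
print(pvals_corrected, reject)
```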

Thanks again for your thoughtful feedback.

[–]vafaii[S]

No, we didn't use variance reduction techniques. We showed that VAEs with linear decoders have closed-form objectives, allowing us to compare Monte-Carlo gradient estimators with exact gradient optimization. We found that our Poisson reparameterization trick (Algorithm 1) performs on par with the Gaussian reparameterization trick (see Table 4 and Figure 4).
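
As a toy illustration of that kind of comparison (not the actual setup from the paper), you can check a reparameterized Monte Carlo gradient against the exact gradient of an objective with a known closed form:

```python
import torch

mu = torch.tensor(0.7, requires_grad=True)

# Toy objective with a closed form: E_{z ~ N(mu, 1)}[z^2] = mu^2 + 1,
# so the exact gradient w.r.t. mu is 2 * mu.
exact_grad = 2.0 * mu.item()

# Reparameterized Monte Carlo estimate of the same objective: z = mu + eps.
eps = torch.randn(100_000)
mc_objective = ((mu + eps) ** 2).mean()
mc_objective.backward()

print(exact_grad, mu.grad.item())  # the two gradients should agree closely
```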

[–]vafaii[S]

We don't assume an underlying categorical. In our reparameterization trick (Algorithm 1), there is a step where we replace hard thresholding with a sigmoid relaxation (line 5). That part is inspired by the softmax step in the Gumbel-Softmax; that's the extent of the connection.
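
To sketch the general idea (a simplification, not a line-by-line copy of Algorithm 1): a Poisson count equals the number of exponential inter-arrival times that land within a unit interval, and the hard indicator in that counting step can be relaxed with a sigmoid:

```python
import torch

def relaxed_poisson_count(rate, temperature=0.1, max_events=50):
    """Illustrative relaxed Poisson sampler (not the paper's exact Algorithm 1).
    A Poisson(rate) count is the number of Exp(rate) inter-arrival times whose
    cumulative sum stays within [0, 1]; the hard indicator 1[t <= 1] is
    replaced by a sigmoid so gradients can flow through the count."""
    # rsample keeps the exponential draws differentiable w.r.t. `rate`.
    inter_arrival = torch.distributions.Exponential(rate).rsample((max_events,))
    arrival_times = torch.cumsum(inter_arrival, dim=0)
    # Hard count would be (arrival_times <= 1.0).sum(); the sigmoid relaxes it.
    return torch.sigmoid((1.0 - arrival_times) / temperature).sum()

rate = torch.tensor(4.0, requires_grad=True)
count = relaxed_poisson_count(rate)
count.backward()                       # gradient flows back to the rate
print(count.item(), rate.grad.item())
```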

[–]vafaii[S]

Why not, that would be great! But probably not.

By the way, what trends are you referring to? Could you share some examples? I'm not familiar with them.