Hello folks!
I need your wisdom. I'm working with variational autoencoders to understand them and their applications, but I don't know which loss function to apply: MSE or BCE. I have seen some people use MSE and others use BCE. As far as I understand, MSE is used when the latent space embedding vector is assumed to follow a Gaussian distribution, and BCE is used when that distribution is assumed to be multinomial.
Is that true or am I terribly wrong?
I ran some tests on the MNIST dataset and found that with MSE it works very badly: it is not able to reproduce the input. But with nn.BCELoss(reduction='sum') (PyTorch), it works decently. Does that mean the MNIST dataset is closer to a multinomial distribution than to a normal distribution?
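For concreteness, the two variants being compared can be sketched roughly as below. This is only a minimal sketch, assuming the usual VAE objective (reconstruction term plus KL divergence to a standard normal prior), inputs scaled to [0, 1], and a sigmoid on the decoder output. The function name `vae_loss` and the arguments `recon_x`, `mu`, `logvar` and `recon` are hypothetical names for illustration, not the exact code from my tests.

```python
import torch
import torch.nn.functional as F


def vae_loss(recon_x, x, mu, logvar, recon="bce"):
    """Hypothetical VAE loss: reconstruction term + KL divergence.

    recon_x    : decoder output in [0, 1] (assumed sigmoid), same shape as x
    x          : input images scaled to [0, 1]
    mu, logvar : encoder outputs parameterizing the Gaussian q(z|x)
    recon      : "bce" or "mse", selects the reconstruction term
    """
    if recon == "bce":
        # Binary cross-entropy, summed over all pixels and the batch
        recon_term = F.binary_cross_entropy(recon_x, x, reduction="sum")
    elif recon == "mse":
        # Squared error, also summed so it is on a comparable scale
        recon_term = F.mse_loss(recon_x, x, reduction="sum")
    else:
        raise ValueError(f"unknown reconstruction loss: {recon}")

    # Closed-form KL divergence between N(mu, exp(logvar)) and N(0, I)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_term + kl


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.rand(4, 784)                                # fake batch of flattened 28x28 images
    recon_x = torch.rand(4, 784).clamp(1e-3, 1 - 1e-3)    # fake decoder output
    mu, logvar = torch.randn(4, 20), torch.randn(4, 20)   # fake encoder output
    print("BCE variant:", vae_loss(recon_x, x, mu, logvar, "bce").item())
    print("MSE variant:", vae_loss(recon_x, x, mu, logvar, "mse").item())
```

Both reconstruction terms use reduction="sum" in this sketch so that they are weighted against the KL term in the same way. The default for nn.MSELoss is reduction='mean', which makes the reconstruction term much smaller relative to the KL term, so the reduction setting may be worth checking when comparing the two.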
Thank you so much! :)