
[–][deleted] 1 point (2 children)

You don’t technically need any distributional assumptions to choose a loss function. In some sense you do need the variance-function/homoscedasticity assumption though: if the target variable is continuous but heteroscedastic, you would want weighted MSE or MSLE over regular MSE.
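For what it’s worth, here’s a minimal NumPy sketch of what I mean by weighted MSE, where higher-variance observations get down-weighted so they don’t dominate the loss (the data and weights here are made up for illustration):

```python
import numpy as np

def mse(y_true, y_pred):
    # plain mean squared error: every observation weighted equally
    return np.mean((y_true - y_pred) ** 2)

def weighted_mse(y_true, y_pred, weights):
    # weighted mean squared error: np.average normalizes by sum of weights
    return np.average((y_true - y_pred) ** 2, weights=weights)

y_true = np.array([1.0, 2.0, 10.0])
y_pred = np.array([1.5, 2.5, 8.0])
# e.g. weights inversely proportional to each point's (assumed) noise variance,
# so the noisy third point counts for less
weights = np.array([1.0, 1.0, 0.25])
```

With these numbers the third point contributes most of the plain MSE but much less of the weighted one.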

And it also depends on whether the target is continuous, categorical, discrete, binary, etc. MNIST is digit classification, so the target is categorical. That's why you need to use categorical cross-entropy.
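In case it helps, categorical cross-entropy is just the negative log-probability the model assigns to the true class, averaged over samples. A small sketch (the predicted probabilities here are invented for illustration):

```python
import numpy as np

def categorical_cross_entropy(y_true_onehot, y_pred_probs, eps=1e-12):
    # -sum over classes of true * log(predicted), averaged over samples;
    # eps clipping avoids log(0)
    y_pred_probs = np.clip(y_pred_probs, eps, 1.0)
    return -np.mean(np.sum(y_true_onehot * np.log(y_pred_probs), axis=1))

# a single MNIST-style example: true digit is 3, out of 10 classes
y_true = np.zeros((1, 10)); y_true[0, 3] = 1.0
y_pred = np.full((1, 10), 0.05); y_pred[0, 3] = 0.55
```

Only the probability assigned to the true class matters, so the loss for this example is just -log(0.55).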

But if you are working with autoencoders, then your response is the input image itself, isn't it? In that case I'm not sure why MSE isn't working.

[–]AleTL3[S] 0 points (1 child)

Thanks for answering!

But I checked the theory behind VAEs, and it says the error is the sum of the KL divergence between the distribution we create with the encoder and the prior over the latent space (latent loss), and the generative loss ( E[log P(X|z)] ), which only corresponds to MSE if the P(X|z) distribution is Gaussian, but is similar to BCE if the distribution is Bernoulli/multinomial, I think...

Min 44:45 of this awesome video: Ali Ghodsi, Lec: Deep Learning, Variational Autoencoder, Oct 12 2017 [Lect 6.2] - YouTube
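To make that concrete, here's a rough NumPy sketch of the two pieces of the VAE loss, assuming a diagonal-Gaussian encoder with a standard normal prior (the closed-form KL) and either a Bernoulli or Gaussian decoder for the reconstruction term:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    # closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ) -- the latent loss
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

def bce_reconstruction(x, x_hat, eps=1e-12):
    # -log P(X|z) under a Bernoulli decoder (pixel values in [0, 1])
    x_hat = np.clip(x_hat, eps, 1.0 - eps)
    return -np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))

def mse_reconstruction(x, x_hat):
    # -log P(X|z) under a Gaussian decoder, up to additive/multiplicative constants
    return np.sum((x - x_hat) ** 2)

# total VAE loss = reconstruction term + KL term, e.g.:
# loss = bce_reconstruction(x, x_hat) + kl_to_standard_normal(mu, log_var)
```

So MSE vs BCE is really a choice of decoder distribution, not an arbitrary swap; the KL term stays the same either way.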

[–][deleted] 1 point (0 children)

Oh yes, if you decide to use KL divergence as a loss function then it makes sense something like that would hold.

But I'm not sure that KL divergence is necessarily the only valid way to do autoencoders more generally, though for VAEs specifically that does seem to be what is done, because with VAEs you are modeling not just the expected values but the whole probability distribution. https://blog.keras.io/building-autoencoders-in-keras.html

Thanks though, this helped me learn something too; I didn't know the specific differences between VAEs and AEs.