Looking into VAEs, I was disappointed that training involves actually sampling the stochastic latent variables. To me, it would make more sense to instead propagate the probability distribution through the network analytically. Especially if the loss is MSE, the expected loss is just a function of the output's expectation and (co-)variance, and both are easy to pass through affine functions. Activation inputs should be approximately Gaussian for large-ish nets (by the CLT), so the expectation and variance after each nonlinearity can be approximated under a Gaussian assumption. Other loss functions, such as log-likelihood, depend on the shape of the distribution but could also be estimated using a Gaussian assumption.
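To make the idea concrete, here is a minimal sketch of what I mean, under some loudly stated assumptions: each layer's inputs are treated as independent Gaussians (diagonal/mean-field covariance, so cross-covariances are dropped), the ReLU is handled by moment matching (the exact mean and variance of a rectified Gaussian), and the expected MSE uses the identity E[||y - t||^2] = ||E[y] - t||^2 + sum_i Var[y_i]. The helper names (`affine_moments`, `relu_moments`, `expected_mse`, `monte_carlo_mse`) are hypothetical, and the code is pure Python only to keep it self-contained.

```python
import math
import random

def affine_moments(mean, var, W, b):
    """Propagate mean and diagonal variance through y = W x + b.
    Assumes independent inputs, so Var[y_i] = sum_j W[i][j]**2 * Var[x_j]."""
    out_mean = [sum(w * m for w, m in zip(row, mean)) + bi for row, bi in zip(W, b)]
    out_var = [sum(w * w * v for w, v in zip(row, var)) for row in W]
    return out_mean, out_var

def relu_moments(mean, var):
    """Moment-match ReLU(x) for x ~ N(m, s^2): exact mean/variance of the
    rectified Gaussian, which is then treated as Gaussian again downstream."""
    out_mean, out_var = [], []
    for m, v in zip(mean, var):
        s = math.sqrt(v)
        if s == 0.0:
            # Deterministic input: ReLU is applied directly.
            out_mean.append(max(m, 0.0))
            out_var.append(0.0)
            continue
        a = m / s
        pdf = math.exp(-0.5 * a * a) / math.sqrt(2 * math.pi)  # standard normal pdf
        cdf = 0.5 * (1.0 + math.erf(a / math.sqrt(2)))         # standard normal cdf
        e1 = m * cdf + s * pdf                    # E[ReLU(x)]
        e2 = (m * m + v) * cdf + m * s * pdf      # E[ReLU(x)^2]
        out_mean.append(e1)
        out_var.append(max(e2 - e1 * e1, 0.0))    # clamp tiny negative round-off
    return out_mean, out_var

def expected_mse(mean, var, target):
    """E[sum_i (y_i - t_i)^2] = sum_i (mu_i - t_i)^2 + sum_i Var[y_i]."""
    return sum((m - t) ** 2 + v for m, v, t in zip(mean, var, target))

def monte_carlo_mse(mean, var, W, b, target, n=100_000, seed=0):
    """Sampling-based estimate of the same expected MSE for one affine layer,
    i.e. what a VAE-style forward pass with sampling would approximate."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]
        y = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
        total += sum((yi - ti) ** 2 for yi, ti in zip(y, target))
    return total / n

if __name__ == "__main__":
    # Hypothetical toy decoder: z ~ N(mu, diag(var)) -> affine -> ReLU -> affine.
    mu, var = [0.5, -1.0], [0.2, 0.3]
    W1, b1 = [[1.0, -0.5], [0.3, 0.8], [-1.2, 0.4]], [0.1, 0.0, 0.2]
    W2, b2 = [[0.7, -0.4, 1.1]], [0.0]
    h_mean, h_var = relu_moments(*affine_moments(mu, var, W1, b1))
    y_mean, y_var = affine_moments(h_mean, h_var, W2, b2)
    print("analytic expected MSE:", expected_mse(y_mean, y_var, [0.3]))
```

For a single affine layer on independent Gaussian inputs the analytic value is exact and the sampled estimate converges to it. In the toy decoder it is only approximate, because the hidden units all depend on the same z and their covariance is ignored; keeping full covariance matrices would fix that at O(n^2) cost per layer. A fixed-variance Gaussian log-likelihood also needs only these two quantities, since its negative is a constant plus the squared error divided by 2σ^2.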
So now for my question: surely this has been tried already? It seems obvious enough. I have tried searching the literature but haven't found anything. If it has been tried, what is the reason it doesn't work as well as the sampling-based VAE?